July 18th, 2003, 12:04 PM

shannon's information theory: log e, or log2 ?
the entropy formula from shannon's information theory looks like this:
H = - E P(i) log P(i)
(E representing sigma, and P(i) the probability of message i)
the log: should that be the natural log (base e), or log2? does anyone know?
July 18th, 2003, 03:04 PM

log2
Should be obvious, since the code's alphabet is {0,1}
Andrei
July 19th, 2003, 08:39 AM

thanks. hmm, might be obvious to you but not me, and i'm still not sure because in a book i've got it says "log e" within that formula. also, in the original paper it's just log, and isn't log on its own log e? it is on my calculator and in the c language in any case. but then i've seen log2 in a version of that formula on a web page somewhere, and what you say makes sense, so i'm not sure. not sure how to find out definitely either.
July 19th, 2003, 09:11 AM

You're right. It's confusing when they don't write the logarithm's base. I found it obvious because Shannon's talking about binary information: 0 or 1, which is base 2. You should read Shannon's book on data compression, it's a very complete guide:
http://www.datacompression.com/theory.html
Good Luck!
Andrei
July 19th, 2003, 11:43 AM

ok thanks. so the book i have that says log e is incorrect. maybe i'll email the author to tell him he's a silly sod; you'd have thought someone would check and double-check before putting it into a book.
i have the "the mathematical theory of communication" book, which has shannon's original paper and a follow-up by warren weaver. the warren weaver part is great: it's in reasonably descriptive english, but unfortunately i get lost within the first few pages of shannon's part.
thanks for the reply and link.
August 7th, 2003, 10:13 AM

stating that the base is 2 for shannon's equation turns out not to be accurate after all. the base was left unspecified on purpose, because the formula can be used with any base.
from shannon's paper:
"The choice of a logarithmic base corresponds to the choice of a unit for
measuring information. If the base 2 is used the resulting units may be
called binary digits, or more briefly bits....If the base 10 is used the
units may be called decimal digits."
so basically it depends on what unit you want to measure the information in.
yes, base 2 is most often used, but any other base can be used, including base e.
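
here's a quick python sketch of that, just to illustrate. the function name entropy and the three probabilities are made up for the example, they're not from shannon's paper. it computes the same entropy in three different units by changing only the base of the log:

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy H = -sum P(i) * log_base P(i).

    Zero-probability messages are skipped, since p*log(p) -> 0 as p -> 0.
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Hypothetical distribution over three messages (an assumption for the demo).
probs = [0.5, 0.25, 0.25]

h_bits = entropy(probs, 2)         # base 2  -> bits
h_nats = entropy(probs, math.e)    # base e  -> natural units
h_digits = entropy(probs, 10)      # base 10 -> decimal digits

print(h_bits, h_nats, h_digits)
```

the three numbers describe the same amount of information, only the unit changes, a bit like measuring one length in inches or centimetres.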
August 7th, 2003, 02:53 PM

Actually, it does not really matter, because the two logs differ only by a constant factor. And, as we are mostly interested in maxima and minima of information, constant factors are no big deal.
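
To spell out where the constant comes from, here is the change-of-base rule applied to the entropy formula. The subscripted names H_b and H_e are just notation assumed for this note (entropy computed with log base b and with the natural log):

```latex
\log_b x = \frac{\ln x}{\ln b}
\quad\Longrightarrow\quad
H_b = -\sum_i P(i)\,\log_b P(i)
    = \frac{1}{\ln b}\left(-\sum_i P(i)\,\ln P(i)\right)
    = \frac{H_e}{\ln b}
```

For b = 2 the factor is 1/ln 2, roughly 1.4427, so entropy in bits is about 1.44 times entropy in natural units. Since the factor is positive for any base greater than 1, the distribution that maximizes (or minimizes) H is the same whichever base is used.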