Wednesday, 28 October 2009

Practical Example of Entropy

This article is the fruit of a discussion held between Maura Paterson, Paulo Cardoso and myself about Shannon's entropy in information theory.

Entropy, H(x), can be thought of as a way of measuring how difficult it is to get an answer correct by guessing. In other words, entropy represents the uncertainty value. If the next outcome of the challenge at hand is certain (we all know what it will be), then the entropy is 0 as there is no uncertainty.

The entropy H(x), where the value of x is selected randomly from the set {x1, x2, ..., xn} (of size n) can be calculates using the following formula:

where b is the number of symbols in the target alphabet and p(xi) is the probability mass function of outcome xi. In this article the target alphabet is the binary alphabet which only contains two symbols: {0, 1}

Why are there two ways to calculate entropy? The first formula is the general formula and suites all scenarios, while the second formula (logb(n)) is only used when the elements in the set x have equal probability to be selected (the selection is not biased)

Example 1: If I pick a weekday (Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday) uniformly at random, then ask you to guess which day I picked, the best you can hope to do is guess randomly, and the entropy is log2(7) = 2.81. In this example we've used the simplest part of the formula (logb(n)) as all elements had the same probability.

Example 2: However, if I pick a weekday using the distribution that I pick Sunday with probability 1/2, and each other day with probability 1/12 then we can use the formula to compute the fact that the entropy of this distribution is 2.29, i.e. the uncertainty is less than in the previous case. This value is calculated as follows:

-( 1/2×log2(1/2)
   + 1/12×log2(1/12) 
   + 1/12×log2(1/12)
   + 1/12×log2(1/12)
   + 1/12×log2(1/12)
   + 1/12×log2(1/12)
   + 1/12×log2(1/12) )
 -( 1/2×log2(1/2)     
  + 6/12×log2(1/12) )
= 2.29

This makes sense, because in the last scenario your best strategy is to guess that I've picked Sunday and you will be correct much more often than in the previous case.

Conditional Entropy

If you think in terms of bits, you need (almost) three bits to represent all the weekdays. Example:

000 = Sunday
001 = Monday
110 = Saturday
111 is not used.

Example 3: We established that the entropy H(x) of the original distribution was log2(7) = 2.81. Learning that the first bit is 0 gives you extra information, since we now know it can only be Sunday, Monday, Tuesday or Wednesday. In this case the entropy H(x|the first bit of x is 0) is log2(4) = 2 (since there are four equiprobable possibilities remaining). The extra information has reduced the uncertainty.

The entropy H(x|the first bit of x is 1) is log2(3) = 1.58. Thus if x is the variable representing the day of the week and y is the variable representing the first bit of the encoding we have that the conditional entropy H(x|y) = 4/7× log2(4) + 3/7×log2(3) = 4/7× 2 + 3/7×1.58 = 1.822. The first half of the equation: 4/7×2 represents the option when the first bit is 0 while the second half, 3/7×1.58, when the first bit is 1. So we see that in this case H(x) > H(x|y), so we can conclude that learning the first bit of the encoding does give you some information about x (it reduces the uncertainty over the value of x).

In order to be correct you'd need to see three bits, unless the first two bits you saw were 11, in which case you'd know the answer was 110, since we are not using the binary string 111. In fact this depends on the encoding you are using for each of the possibilities (for example, we could take the above encoding and add 10 zeros in front of each triple to get a new, very wasteful, encoding. In this case you'd have to see 13 bits most of the time.) However, the value of the entropy gives a lower bound on the expected number of bits that will be required no matter what encoding is used.

No comments:

Post a comment