Information Theory
Updated 2017-10-31
Information Measure
How much information is required to represent some outcome? If the outcome is either true or false, then only 1 bit is required to store the information with no loss.
An information measure should take into account the probabilities of outcomes in order to minimize the “footprint” of the information stored.
Suppose there are $M$ distinct, equally likely symbols; representing one of them requires $\log_2 M$ bits. If the symbols are bits in a computer, then $M = 2$ and $\log_2 2 = 1$ bit, consistent with the true/false example above.
Requirements
- If $P(x_1) > P(x_2)$, then $I(x_1) < I(x_2)$. This states that if the probability of an event is greater, then there is less information regarding that event. (Raining in November vs. snowing in July)
- If $P(x) = 1$, then $I(x) = 0$. If some event is always going to occur (e.g. there is someone on Earth that will be alive tomorrow), then it holds no information.
- A similar argument holds for $P(x) = 0$, since it’s just the complement of the certain event (i.e. no one on Earth will be alive tomorrow).
- $I(x) \geq 0$. Information cannot have negative size (honestly, WTF would negative information even mean?)
- If $x$ and $y$ are independent events, then $I(x, y) = I(x) + I(y)$.

The measure $I(x) = \log_2 \frac{1}{P(x)} = -\log_2 P(x)$ satisfies all of these requirements.
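To see why the logarithm satisfies the last requirement, note that independence means $P(x, y) = P(x)P(y)$, and the log turns that product into a sum:

$$I(x, y) = -\log_2 P(x)P(y) = -\log_2 P(x) - \log_2 P(y) = I(x) + I(y)$$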
Example:
You got admitted to Stanford University (4.8% admission rate), so the chance is about 1 in 21. It follows that the information measured is $-\log_2 \frac{1}{21} \approx 4.4$ bits. However, it was all a dream; you did not get into Stanford. Then you’re part of the 95.2% that was rejected, or 20 in 21, or 0.952. Thus, the information measured is $-\log_2 0.952 \approx 0.07$ bits.
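As a quick sanity check, here is a minimal Python sketch of the information measure applied to the admission example (the function name `information` is an illustrative choice, not from the original notes):

```python
import math

def information(p):
    """Information content I(x) = -log2 P(x), in bits."""
    return -math.log2(p)

print(information(1 / 21))   # admitted: ~4.39 bits
print(information(20 / 21))  # rejected: ~0.07 bits
```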
Shannon Entropy
The total information in a message with $N$ symbols, where each symbol $x_i$ occurs $N \cdot P(x_i)$ times, is

$$I_{total} = \sum_i N \, P(x_i) \log_2 \frac{1}{P(x_i)}$$

The average information per symbol is the entropy, which is

$$H = \frac{I_{total}}{N} = \sum_i P(x_i) \log_2 \frac{1}{P(x_i)}$$

Notice that it can also be expressed as the expectation of the information measure:

$$H = -\sum_i P(x_i) \log_2 P(x_i) = E[I(x)]$$
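A minimal Python sketch of the entropy formula, assuming probabilities are given as a plain list (names are illustrative):

```python
import math

def entropy(probabilities):
    """Shannon entropy H = -sum(P(x) * log2 P(x)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # fair coin: 1.0 bit per toss
print(entropy([0.9, 0.1]))  # biased coin: ~0.47 bits per toss
```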
Source Coding Theorem
The entropy tells us the minimum average number of bits required to represent each symbol. Any less than that and we have a lossy encoding / compression.
According to the law of large numbers, on average, any symbol $x_i$ will appear $N \cdot P(x_i)$ times in a message of $N$ symbols.
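For example, take a hypothetical source with three symbols of probabilities $\frac{1}{2}$, $\frac{1}{4}$, $\frac{1}{4}$:

$$H = \frac{1}{2}\log_2 2 + \frac{1}{4}\log_2 4 + \frac{1}{4}\log_2 4 = \frac{1}{2} + \frac{1}{2} + \frac{1}{2} = 1.5 \text{ bits/symbol}$$

A 1000-symbol message from this source therefore needs at least 1500 bits, whereas a fixed-length 2-bit code would spend 2000.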
Typical Sequences
In a stream of information that has $N$ symbols, a typical sequence is one in which each symbol $x_i$ appears approximately $N \cdot P(x_i)$ times.
It follows that we need $N \cdot H$ bits to encode a typical sequence, i.e. there are about $2^{NH}$ typical sequences.
Assuming that each typical sequence is equally likely, the probability of a typical sequence is $2^{-NH}$.
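The $2^{-NH}$ figure follows from the symbol counts above: a typical sequence contains each $x_i$ about $N \cdot P(x_i)$ times, so

$$P(\text{typical sequence}) = \prod_i P(x_i)^{N P(x_i)} = 2^{N \sum_i P(x_i) \log_2 P(x_i)} = 2^{-NH}$$

Since the typical sequences together carry essentially all of the probability, there are roughly $2^{NH}$ of them, which is why $N \cdot H$ bits suffice to index one.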
Huffman Coding
It is a method of encoding symbols with codewords. The idea centers around prefix-free codes: a codeword is never a prefix of any other symbol’s codeword.
Huffman coding assigns the shortest codewords to the symbols that appear most often. Therefore the number of bits that has to be transmitted is minimized.
Algorithm
- Create a binary tree whose leaf nodes are sorted by probability
- The leaf nodes represent the symbols and their probabilities
- Join the two nodes with the least probabilities under a parent node
- The parent node holds the sum of the probabilities of its child nodes
- Repeat this until only a single root node remains.
- Starting from the root node, record the path taken to reach each leaf node (the symbol)
- If the path to a symbol is {left, left, right}, one could encode this symbol as 001 (left = 0, right = 1) or 110 (left = 1, right = 0).
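Below is a minimal Python sketch of the algorithm above, using a heap for the repeated “join the two least probable nodes” step. Function and variable names are illustrative, not from the original notes.

```python
import heapq

def huffman_code(probabilities):
    """Build a prefix-free code from a {symbol: probability} mapping."""
    # Heap entries are (probability, tie_breaker, tree); a tree is either
    # a symbol (leaf) or a (left, right) pair (parent node).
    heap = [(p, i, symbol) for i, (symbol, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)

    # Join the two least probable nodes under a parent node until only the
    # root remains; the parent holds the sum of its children's probabilities.
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, counter, (left, right)))
        counter += 1

    # Record the path from the root to each leaf: left = 0, right = 1.
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"  # edge case: single-symbol alphabet

    walk(heap[0][2], "")
    return codes

# The most frequent symbol gets the shortest codeword.
print(huffman_code({"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.10, "e": 0.05}))
```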