Recently, I have been going through some interesting videos on information theory. I am posting the link here: Information Theory, Khan Academy, for the benefit of those who want to learn something new, fundamental and interesting. I will try to summarize what I understood from the lectures below.

**1. Ancient Information Theory**

It is interesting to note how humans started communicating with each other. Ancient humans used pictographs and ideograms engraved on rocks and in caves to share information. Later on, symbols and alphabets were devised for ease of communication.

With the passage of time, more advanced communication technologies were developed. For instance, the Greeks and Romans used torches, especially in battle, for quick long-distance communication. By the late 18th century, shutter telegraphs were the norm; they could encode all the letters of the English alphabet, and with the help of a telescope it was possible to send information over great distances. Still, these techniques were insufficient for effective communication due to their low expressive power and low speed.

With the discovery of electricity, the information age began. Visual telegraphs were soon replaced by electrostatic telegraphs, which made it possible to send large amounts of information over long distances in a short amount of time.

**2. Modern Information Theory**

How fast can two parties communicate with each other? The limiting rate at which messages can be sent depends on the **symbol rate** (baud) and the number of **differences**. The symbol rate is the number of symbols that can be transferred per second. There is a fundamental limit on how closely two pulses can be spaced: due to noise, a pulse may not be perfect, and if two pulses are too close together, inter-pulse interference is likely to occur, making it difficult for the receiver to decode the signal. The number of differences is the number of signaling events per symbol. The message space denotes the number of possible messages.
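To make the relationship concrete, here is a small sketch of my own (the variable names and example values are illustrative, not from the lectures): with d differences per symbol and n symbols per message, the message space contains d^n messages.

```python
# Sketch: size of the message space, given the number of differences
# (signaling events per symbol) and the number of symbols per message.
# Values below are illustrative, not taken from the source.

def message_space(differences: int, symbols: int) -> int:
    """Number of distinct messages expressible with the given parameters."""
    return differences ** symbols

# A binary channel (2 differences) sending 8 symbols:
print(message_space(2, 8))  # 256 possible messages
```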

How is it possible to **quantify** information? One way to approach this problem is to consider a scenario where the receiver asks the sender a series of yes/no questions. The minimum number of questions required to recover the complete message quantifies the information it contains. This unit is called a binary digit, or bit for short. Mathematically, it is simply the logarithm (to the base 2) of the size of the message space. For example, to pass the name of a book of the Bible, 6-7 bits are required; to pass the names of 2 books, 12-14 bits are required.
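The book-of-the-Bible figures above can be checked directly (assuming the usual count of 66 books, which is my assumption here, not stated in the source):

```python
import math

# Bits needed = log2 of the size of the message space.
# Assumption for illustration: the Bible has 66 books.
books = 66
bits_one = math.log2(books)       # ~6.04, hence 6-7 yes/no questions
bits_two = math.log2(books ** 2)  # ~12.09, for naming two books

print(round(bits_one, 2), round(bits_two, 2))
```

Since log2(66) is not an integer, the practical answer lands between 6 and 7 questions, matching the range quoted above.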

Human communication is a mix of randomness and statistical dependencies. Claude Shannon uses a Markov model as the basis of how we can think about communication. Given a message, a machine can be designed which generates a similar-looking text. As we progress from the zeroth-order approximation to first-order and second-order approximations, the similarity of the text generated by the machine with the message increases.
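A first-order word-level approximation of this idea can be sketched as follows (a toy illustration, not Shannon's original construction; the sample sentence is my own):

```python
import random
from collections import defaultdict

def build_chain(text: str, order: int = 1):
    """First-order (default) Markov model over words: maps each state
    (a tuple of `order` words) to the words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain

def generate(chain, order: int = 1, length: int = 10, seed: int = 0):
    """Walk the chain, picking each next word from the observed followers."""
    random.seed(seed)
    out = list(random.choice(list(chain)))
    for _ in range(length):
        followers = chain.get(tuple(out[-order:]))
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

sample = "the cat sat on the mat and the cat ran off the mat"
print(generate(build_chain(sample)))
```

Raising `order` to 2 or 3 corresponds to the higher-order approximations mentioned above: the generated text looks progressively more like the source.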

**Entropy** - Shannon gives a mathematical method to quantify information. He calls this quantity entropy: H = -Σ p·log2(p), where p is the probability of an outcome. If all outcomes are equally likely, entropy is at its maximum; if there is some predictability among the outcomes, entropy decreases. For example, a text of random letters and words has higher **Shannon information** than "normal" text. This seems counter-intuitive at first, but information theory does not deal with the **semantic information** in a message, only with the number of symbols required to communicate it. The only way to reproduce random text is to copy it as is, whereas "normal" text can be compressed using rules, thanks to its predictable nature.
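The entropy formula translates directly into code. This sketch measures the per-symbol entropy of a string from its character frequencies (my own illustration of the formula, not code from the lectures):

```python
import math
from collections import Counter

def entropy(text: str) -> float:
    """Shannon entropy H = -sum(p * log2(p)) over symbol frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(entropy("aaaa"))  # 0.0 -- fully predictable, no surprise
print(entropy("abcd"))  # 2.0 -- four equally likely symbols, maximum entropy
```

A fully predictable string carries zero bits per symbol, while four equally likely symbols need exactly 2 bits each, matching the intuition above.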

**Coding theory** - If the entropy is not maximum, then it is possible to compress a message. But in what ways can we compress it? David Huffman came up with the Huffman coding strategy, which compresses a message with the help of a binary tree by encoding symbols into bits. However, the limit of lossless compression is the entropy of the message source; compressing a message beyond this limit loses information.
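Huffman's binary-tree idea can be sketched compactly with a priority queue: repeatedly merge the two least frequent subtrees, prefixing "0" and "1" to their codes. This is a minimal illustration of the strategy, assuming character-level symbols:

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict:
    """Map each symbol to a prefix-free bit string; frequent symbols
    get shorter codes, approaching the entropy limit."""
    freqs = Counter(text)
    # Each heap entry: [frequency, tie-breaker, {symbol: code-so-far}]
    heap = [[f, i, {s: ""}] for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)   # least frequent subtree
        hi = heapq.heappop(heap)   # next least frequent
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], tie, merged])
        tie += 1
    return heap[0][2]

codes = huffman_codes("aaaabbc")
print(codes)  # 'a', the most frequent symbol, gets the shortest code
```

Note how the most frequent symbol ends up with a one-bit code while rarer symbols get longer ones, which is exactly how the average code length approaches the source entropy.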

**Error detection & correction** - During communication, **noise** in the channel corrupts the message, making it difficult for the receiver to understand it. How is it possible to deal with noise? Richard Hamming came up with the idea of parity bits, built upon the concept of repetition. Error correction is thus done by using more symbols to encode the same message, which increases the size of the message.
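The repetition idea underlying Hamming's work can be sketched in its simplest form: repeat each bit three times and let the receiver take a majority vote. (This is the basic repetition code, not Hamming's more efficient parity-bit scheme; the example bits are my own.)

```python
from collections import Counter

def encode_repetition(bits: str, n: int = 3) -> str:
    """Repeat each bit n times; redundancy lets the receiver outvote noise."""
    return "".join(b * n for b in bits)

def decode_repetition(coded: str, n: int = 3) -> str:
    """Majority vote over each block of n received bits."""
    blocks = [coded[i:i + n] for i in range(0, len(coded), n)]
    return "".join(Counter(b).most_common(1)[0][0] for b in blocks)

sent = encode_repetition("101")    # '111000111'
noisy = sent[:1] + "0" + sent[2:]  # channel noise flips one bit
print(decode_repetition(noisy))    # '101' -- the error is corrected
```

The cost is visible immediately: the encoded message is three times longer, which is the trade-off the paragraph above describes.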

**Reference**

Shannon, Claude Elwood. "A mathematical theory of communication." *ACM SIGMOBILE Mobile Computing and Communications Review* 5.1 (2001): 3-55.