

Lesson 2: Modern information theory

# Error correction

How can we communicate in the presence of noise? Created by Brit Cruise.

## Want to join the conversation?

• What happens if a parity bit gets corrupted?
• If one of the data bits gets flipped, then two or three of the parity bits are wrong. If only one parity bit is wrong, then it must be the parity bit itself that was flipped. Note that he stated repeatedly that the system only corrects single errors.
• What's the reason for the sequence of information and parity bits? Is there some logical reason for the sequence, or is it just what was used initially and has stuck ever since?
• Indeed, there is a method behind the madness for which bit is used for data and which bit is used for parity in Hamming codes.

In general, a Hamming code uses a block of 2^m - 1 bits, with m bits used for parity
(where m is some integer >= 2).
The bit positions that are powers of 2 (1, 2, 4, 8, etc.) are the parity bits.
The remaining positions are data bits.
Each parity bit covers the bit positions that have its power of 2 set in their binary representation.
This gives us the property that if we sum the positions of the failing parity bits, we can identify the bit in error.

e.g.
For the Hamming code presented in the video, m = 3.
Thus we have a block of 2^3 - 1 = 7 bits with 3 parity bits.
7 bits total - 3 parity bits = 4 data bits.
This is called a (7,4) Hamming code.

So bits 1, 2, and 4 are parity bits (these correspond to p1, p2, p3 in the video).
Bits 3, 5, 6, and 7 are data bits (these correspond to d1, d2, d3, d4 in the video).

Notice in the video that when p1 and p2 were incorrect, this caused d1 to be corrected.
Notice how we get the same result by noting the failing parity bits and summing their positions,
then correcting the bit corresponding to the sum.
i.e. 1 + 2 = 3, so bit 3 is in error, and bit 3 corresponds to d1.
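The (7,4) decoding rule can be sketched in a few lines of code (a minimal illustration assuming even parity; the function name is my own, not from the video):

```python
# Minimal sketch of (7,4) Hamming single-error correction, assuming even
# parity. bits[0] holds bit position 1; parity bits sit at positions 1, 2, 4.

def hamming74_correct(bits):
    """Return (corrected_bits, error_position); error_position is 0
    when every parity check passes."""
    syndrome = 0
    for p in (1, 2, 4):
        # Parity bit p covers every position whose binary form has p set.
        total = sum(bits[i - 1] for i in range(1, 8) if i & p)
        if total % 2 != 0:
            syndrome += p  # failing checks sum to the position of the error
    if syndrome:
        bits = bits[:]
        bits[syndrome - 1] ^= 1  # flip the single offending bit
    return bits, syndrome
```

For example, flipping bit 3 (d1) makes checks 1 and 2 fail, so the syndrome is 1 + 2 = 3 and the function flips bit 3 back.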

e.g.
For m = 4 we would have a 2^4 - 1 = 15 bit block with 4 parity bits.
The parity bits would be bits 1, 2, 4, and 8.
The data bits would be bits 3, 5, 6, 7, 9, 10, 11, 12, 13, 14, and 15.
If we found that parity bits 1, 4, and 8 didn't match, we would calculate 1 + 4 + 8 = 13, and correct bit 13.
If instead we found that only parity bit 4 didn't match, we would calculate a sum of 4, and know that parity bit 4 itself was incorrect.
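The same summing rule works for any block size, which can be sketched generically (even parity assumed; the function name is my own):

```python
# Sketch of the syndrome rule for any 2^m - 1 bit Hamming block, assuming
# even parity. bits[0] holds bit position 1.

def hamming_syndrome(bits):
    """Return the position of the single-bit error, or 0 if all checks pass."""
    n = len(bits)  # n = 2^m - 1
    syndrome, p = 0, 1
    while p <= n:  # p walks the parity positions 1, 2, 4, 8, ...
        if sum(bits[i - 1] for i in range(1, n + 1) if i & p) % 2 != 0:
            syndrome += p  # positions of failing parity bits sum to the error
        p <<= 1
    return syndrome
```

Corrupting bit 13 of a 15-bit block makes checks 1, 4, and 8 fail, and the syndrome comes out as 1 + 4 + 8 = 13, exactly as in the example above.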

Whether one uses odd or even parity doesn't matter

Hope this makes sense
• Couldn't the parity bits be mis-punched?
• Yes, the parity bits can be in error as well. An error correcting code should be able to detect which bit is in error, even if it is a parity bit.

The Hamming Code is arranged so that:
If only 1 parity bit doesn't match up, then it is the parity bit that is in error.
If more than 1 parity bit doesn't match up, it is a data bit that is in error (the combination of parity bits tells us which data bit is in error).

Hope this makes sense.
• Is this the same method used to allow QR codes to be damaged, but still readable?
• Yes, in the general sense that QR codes incorporate extra information, so that we can detect errors.

No, in the specific sense, in that QR codes do not use Hamming codes.
QR uses Reed–Solomon (RS) codes for error correction.
Hamming codes are linear error-correcting codes that use parity, while, as I understand it, Reed–Solomon treats data as coefficients of polynomials over a finite field in order to correct errors (finite fields are a concept from abstract algebra).

Hope this makes sense
• What happens if two data bits get flipped in one set of four data bits?
• That usually only happens on very unstable connections. To overcome this, you would have to include more redundancy (repeating the same information more times).
• Is this the same system that is used in today's data storage devices? I wondered because the redundant bits take up 42% of the sequence and that seems a bit too much.
• Typically, for data storage, CRCs (cyclic redundancy checks) are used for error detection, and Reed–Solomon codes are used for error correction.

These codes are used in data storage as they are well suited to handle random noise and burst errors (a contiguous chunk of errors), which tends to reflect the types of errors that occur in practice.
• First of all, congratulations, this is an amazing video for learning Hamming codes, and I hope to see more of this kind. My question: I'm confused. At one point in the video the sequence is P1 D1 P2 D2 P3 D4, but later the sequence is P1 P2 D1 P3 D2 D3 D4. Following the discussion, I see that the latter is the correct one. Is this an error in the video, or am I missing something?
• I think it is a small error in the video, but in the case shown in the video, P1 D1 P2 D2 P3 D3 D4 is actually the same as P1 P2 D1 P3 D2 D3 D4.
Hope it helps.
• I don't see why we would want error-correcting codes, given that they almost double the size of the message. We've probably come up with ultra-efficient error-correcting codes, but I'd rather just have a smaller error-detection code (for larger multi-byte sequences, a small hash) so that the other side can request a retransmit, keeping the overhead small.
• Error detection with re-transmission works fine if the time to retransmit the message is small, errors are infrequent, and we have access to the original message to retransmit it. This is true in many cases, and this scheme has been used in many network protocols, but there are some notable exceptions.

If we were communicating with a rover on Mars, where messages could take several minutes to reach the rover and interference is routine, an error-detection-with-retransmission scheme wouldn't work. An error correction scheme would just fix the small errors in the messages, but the error detection scheme could result in the same message being re-transmitted over and over again, causing hours of delays before it is perfectly transmitted.

CDs and hard drives also use error correcting codes. These are cases where, if the data gets corrupted, we don't have access to the original message. We use the error correcting codes to reconstruct the original message/data. This is why a scratched CD can still be played.

Hope this makes sense
• Can you just amplify the signal so the noise can't corrupt it?
• Yes, but that would only work in Alice and Bob's case; the hole-punching example wouldn't work that way, because you can't punch a hole harder.
• Hello, is there a study that shows which languages are more efficient? For example, I have seen that some people from India can say a lot with a smaller number of sounds (language), and somehow it is the same with some Asian languages. I have also seen this in comparisons between English and Spanish.
• First we need to quantify what it means to be "efficient." Here, you mean "to say less, but have the same message come across." In that case, we have to go back a few videos: to the concept of message space.

We appropriately defined a message space to be s^n. In English, s is basically the 26 letters of the alphabet, the space character, and punctuation. n is the number of times you can use any of these, in combination. So to have the same message conveyed, you need s^n to remain constant, but we can raise or lower s as long as we do the opposite to n.

Therefore, in order to convey the same message in the most efficient manner possible, we need a language that minimizes n, and thus has the biggest s. This can be thought of as simply: the language with the most distinct letters is the most efficient -- you will be saying less. Of course, in practice many languages are further limited by their choice of grammatical structure, etc. but at that point this question becomes more of an applied sciences question.
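The trade-off between alphabet size s and message length n can be made concrete with a toy calculation (the numbers below are my own, not from the lesson): keeping the message space s^n constant means the required length shrinks as the alphabet grows.

```python
import math

# Toy illustration: to match a message space of s1^n1 using an alphabet
# of size s2, we need s2^n2 >= s1^n1, i.e.
# n2 = ceil(n1 * log(s1) / log(s2)).

def equivalent_length(s1, n1, s2):
    """Smallest whole length n2 such that s2^n2 covers the space s1^n1."""
    return math.ceil(n1 * math.log(s1) / math.log(s2))
```

For example, a 10-symbol message over a 26-letter alphabet needs about 48 binary symbols to carry the same message space, since log2(26) is roughly 4.7.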