Computers and the Internet
User Datagram Protocol (UDP)
The User Datagram Protocol (UDP) is a lightweight data transport protocol that works on top of IP.
UDP provides a mechanism to detect corrupt data in packets, but it does not attempt to solve other problems that arise with packets, such as lost or out-of-order packets. That's why UDP is sometimes known as the Unreliable Data Protocol.
UDP is simple but fast, at least in comparison to other protocols that work over IP. It's often used for time-sensitive applications (such as real-time video streaming) where speed is more important than accuracy.
When sending packets using UDP over IP, the data portion of each IP packet is formatted as a UDP segment.
Diagram of a UDP segment within an IP packet. The IP packet contains header and data sections. The IP data section is the UDP segment, which itself contains header and data sections.
Each UDP segment contains an 8-byte header and variable length data.
The first four bytes of the UDP header store the port numbers for the source and destination.
A networked device can receive messages on different virtual ports, similar to how an ocean harbor can receive boats on different ports. The different ports help distinguish different types of network traffic.
Here's a listing of some ports in use by UDP on my laptop:
A command line terminal with the command "sudo lsof -i -n -P | grep UDP". The command outputs the following table:
Each row starts with the name of the process that's using the port and ends with the protocol and port number.
🔍 What sort of network traffic do those processes handle? If you search the web for the process name plus the port number, you can probably figure it out. You could even try it on the computer you're using now.
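To make ports a little more concrete, here is a minimal Python sketch that sends one UDP message from one socket to another on the same machine. It is only an illustration: the loopback address `127.0.0.1`, the message, and the variable names are assumptions, and port `0` simply asks the operating system to pick any free port.

```python
import socket

# Receiver: bind a UDP socket to the loopback address.
# Port 0 asks the operating system to choose any free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)  # UDP gives no delivery guarantee, so don't wait forever
receiver_port = receiver.getsockname()[1]

# Sender: there is no connection setup, it just addresses the message to a port.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"Hola", ("127.0.0.1", receiver_port))

# The receiver learns the sender's address and source port along with the data.
data, (source_ip, source_port) = receiver.recvfrom(1024)
print(f"Port {receiver_port} received {data!r} from port {source_port}")

sender.close()
receiver.close()
```

The sender never calls `bind`, so the operating system assigns it a temporary source port, which is the port number the receiver sees.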
The next two bytes of the UDP header store the length (in bytes) of the segment (including the header).
Two bytes is 16 bits, so the length can be as high as this binary number:
1111111111111111
In decimal, that's 2^16 - 1, or 65,535. Thus, the maximum length of a UDP segment is 65,535 bytes.
The final two bytes of the UDP header store the checksum, a field that's used by the sender and receiver to check for data corruption.
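The header layout described above (four 2-byte fields) can be sketched with Python's `struct` module. This is an illustration rather than a real network stack: the port numbers 5000 and 6000, the zero checksum, and the helper names `pack_header` and `unpack_header` are made up for the example.

```python
import struct

# "!" = network (big-endian) byte order, "H" = one unsigned 16-bit field.
HEADER_FORMAT = "!HHHH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 bytes

def pack_header(source_port, dest_port, data_length, checksum):
    """Build the 8-byte header. The length field counts header plus data."""
    return struct.pack(HEADER_FORMAT, source_port, dest_port,
                       HEADER_SIZE + data_length, checksum)

def unpack_header(segment):
    """Return (source port, destination port, length, checksum)."""
    return struct.unpack(HEADER_FORMAT, segment[:HEADER_SIZE])

data = b"hello"
# Hypothetical port numbers, and a placeholder checksum of 0.
segment = pack_header(5000, 6000, len(data), 0) + data

print(HEADER_SIZE)             # 8
print(unpack_header(segment))  # (5000, 6000, 13, 0)
print(2**16 - 1)               # 65535, the largest value a 16-bit field can hold
```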
Before sending off the segment, the sender:
- Computes the checksum based on the data in the segment.
- Stores the computed checksum in the field.
Upon receiving the segment, the recipient:
- Computes the checksum based on the received segment.
- Compares the computed checksum with the one stored in the field. If the two aren't equal, it knows the data was corrupted.
To understand how a checksum can detect corrupted data, let's follow the process to compute a checksum for a very short string of data: "Hola".
First, the sender would encode "Hola" into binary somehow. The following uses the ASCII/UTF-8 encoding:
H: 01001000
o: 01101111
l: 01101100
a: 01100001
That encoding gives these bytes:
01001000 01101111 01101100 01100001
Next, the sender segments the bytes into 2-byte (16-bit) binary numbers:
0100100001101111
0110110001100001
To compute the checksum, the sender adds up the 16-bit binary numbers:
  0100100001101111
+ 0110110001100001
= 1011010011010000
The computer can now send a UDP segment with the encoded "Hola" as the data and 1011010011010000 as the checksum.
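The steps above can be reproduced with a short Python sketch. The function name `simple_checksum` is hypothetical, and it implements only the simplified sum from this walkthrough, not the full UDP algorithm. Padding odd-length data with a zero byte and keeping only the low 16 bits of the sum are assumptions added so the result always fits the 2-byte field.

```python
def simple_checksum(data: bytes) -> int:
    """Add up the data as 16-bit numbers (simplified, not the real UDP checksum)."""
    if len(data) % 2:
        data += b"\x00"  # assumption: pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        # Read two bytes at a time as one 16-bit number.
        total += int.from_bytes(data[i:i + 2], "big")
    return total & 0xFFFF  # assumption: keep only the low 16 bits

data = "Hola".encode("ascii")
print(" ".join(f"{byte:08b}" for byte in data))
# 01001000 01101111 01101100 01100001

checksum = simple_checksum(data)
print(f"{checksum:016b}")
# 1011010011010000
```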
The entire UDP segment would have this layout:
|Field|Size|
|Source port number|2 bytes|
|Destination port number|2 bytes|
|Length|2 bytes|
|Checksum|2 bytes|
|Data ("Hola")|4 bytes|
What if the data got corrupted from "Hola" to "Mola" on the way?
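First let's see what the corrupted data would look like in binary.
"Mola" encoded into binary...
01001101 01101111 01101100 01100001
...and then segmented into 16-bit numbers:
0100110101101111
0110110001100001
Now let's see what checksum the recipient would compute:
  0100110101101111
+ 0110110001100001
= 1011100111010000
The recipient can now programmatically compare the checksum they received in the UDP segment with the checksum they just computed:
Received: 1011010011010000
Computed: 1011100111010000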
Do you see the difference?
When the recipient discovers that the two checksums are different, it knows that the data was corrupted somehow along the way. Unfortunately, the recipient cannot use the computed checksum to reconstruct the original data, so it will likely just discard the packet entirely.
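Here is a small Python sketch of the recipient's side of that comparison, using the same simplified sum as in this article. The names `simple_checksum` and `is_corrupted` are hypothetical, and the zero padding and 16-bit mask are assumptions rather than part of the walkthrough.

```python
def simple_checksum(data: bytes) -> int:
    """Sum the data as 16-bit numbers (the simplified scheme from this article)."""
    padded = data + b"\x00" * (len(data) % 2)  # assumption: zero-pad odd lengths
    words = [int.from_bytes(padded[i:i + 2], "big")
             for i in range(0, len(padded), 2)]
    return sum(words) & 0xFFFF  # assumption: keep only the low 16 bits

def is_corrupted(received_data: bytes, received_checksum: int) -> bool:
    """Recipient side: recompute the checksum and compare it to the received one."""
    return simple_checksum(received_data) != received_checksum

sent_checksum = simple_checksum(b"Hola")  # computed by the sender before sending

print(is_corrupted(b"Hola", sent_checksum))  # False: data arrived intact
print(is_corrupted(b"Mola", sent_checksum))  # True: data changed in transit
print(f"{simple_checksum(b'Mola'):016b}")    # 1011100111010000
```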
The actual UDP checksum computation process includes a few more steps than shown here, but this is the general process of how we can use checksums to detect corrupted data.
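For the curious, two of those extra steps are that any carry out of the 16 bits is added back into the sum, and that the final sum is inverted bit by bit (a "ones' complement" checksum). The sketch below shows just those two steps over the data alone. That is a simplifying assumption: the real UDP checksum also covers the UDP header and some fields from the IP header. The function name `internet_checksum` is hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += int.from_bytes(data[i:i + 2], "big")
        total = (total & 0xFFFF) + (total >> 16)  # add any carry back in
    return ~total & 0xFFFF  # invert every bit, keep 16 bits

data = b"Hola"
checksum = internet_checksum(data)
print(f"{checksum:016b}")
# 0100101100101111

# Recipient check: summing the data together with the checksum gives all 1s,
# so running the same function over both yields 0 when nothing was corrupted.
print(internet_checksum(data + checksum.to_bytes(2, "big")))     # 0
print(internet_checksum(b"Mola" + checksum.to_bytes(2, "big")))  # nonzero
```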
Want to join the conversation?
- UDP doesn't do anything about packets arriving out of order, right? Is that why sometimes, in live streaming, the audio and video are not synchronized? Or why the live streams sometimes lag?(35 votes)
- Yes, UDP does without handshakes. That means the information received is somewhat unreliable when it comes to ordering, duplicates and packets arriving at all.
And you correctly identified the problems that arise with that :)(33 votes)
- What might cause data to become corrupted? Also, when you get the notification that a file is corrupted, does it have the same meaning? The data has gotten messed up somehow?(8 votes)
- It might be helpful to consider what "data" is. In this context, we are talking about data that is transmitted over a network. If you are using a personal computer, then the process of transmitting data over a network involves transitioning data from your HDD or SSD to your computer's RAM via internal buses, and then to the NIC (Network Interface Card) to communicate it across a network. The data (which is represented using binary) must then traverse the connection between your device and the device you are sending data to. This traversal could entail moving the data over a WiFi network, over Ethernet cables, etc.
This is an extensive process, and we certainly take for granted its complexity when we interact with the Internet each and every day. If during this process any bits of the data (again, data is being represented as binary; 1s and 0s) were to flip (go from 0 to 1, or 1 to 0) or were to be lost then the data received by the receiving machine wouldn't be the same as what was originally sent. In this case, the machine may not be able to interpret the data anymore - as it has lost its meaning - and so you end up with data corruption.
To your latter question, yes. When information saved to the machine's non-volatile storage has been saved incorrectly and has thus lost its meaning, then the computer will notify you that the data has been corrupted.(17 votes)
- What exactly happens when videos start to glitch? For example, in Zoom meetings, sometimes people tend to look all "blocky" and the details in their video are not defined. Is that something to do with data corruption?(6 votes)
- Yes, that generally indicates problems with packets, usually a lot of them. A single dropped packet can easily be dealt with because of error-correcting encodings, so it generally shows up as just a little blurriness, wobbly sound, or something similarly barely noticeable.(7 votes)
- I'm having trouble with adding binary numbers. I watched a few of Khan Academy's videos on YouTube, but I don't understand the mathematical reasoning behind it. Can someone please explain?
- In the decimal number system, each place has a value that is a power of 10. In binary, each place has a value that is a power of 2.
What I do when adding binary numbers is convert them to decimal and then perform the operations on them. If the number is too large, use a computer program to do the converting.(0 votes)
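Both approaches can be sketched in Python. The first part follows the convert-to-decimal idea from the answer above. The second part, with the hypothetical helper `add_binary`, adds column by column from the right with a carry, the same way you would add decimal numbers by hand, except that a column "overflows" at 2 instead of 10.

```python
# Approach 1: convert to decimal, add, convert back to binary.
a = int("0100100001101111", 2)
b = int("0110110001100001", 2)
print(a, b, a + b)         # 18543 27745 46288
print(format(a + b, "b"))  # 1011010011010000

# Approach 2: add column by column from the right, carrying when a column reaches 2.
def add_binary(x: str, y: str) -> str:
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)  # pad the shorter number with leading 0s
    carry = 0
    digits = []
    for bit_x, bit_y in zip(reversed(x), reversed(y)):
        column = int(bit_x) + int(bit_y) + carry  # 0, 1, 2, or 3
        digits.append(str(column % 2))            # digit that stays in this column
        carry = column // 2                       # 1 if the column overflowed
    if carry:
        digits.append("1")
    return "".join(reversed(digits))

print(add_binary("101", "11"))  # 1000  (5 + 3 = 8)
print(add_binary("0100100001101111", "0110110001100001"))  # 1011010011010000
```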
- https://cdn.kastatic.org/ka-perseus-images/9d185d3d44c7ef1e2cd61655e47befb4d383e907.svg In this image, what does the "4 bytes" at the top indicate?(2 votes)
- Good question! The 4 bytes is the width of the header. Together, the source port number and destination port number in the first row take up 4 bytes. Since they're shown equal sized, each of them takes up 2 bytes (16 bits). Similarly, the segment length and checksum together take up 4 bytes, and each takes up 2 bytes.(7 votes)
- Is it possible for the data in the checksum (the two bytes) to be corrupt?(2 votes)
- Yes, any part of the packet could end up being corrupted during transmission.(4 votes)
- Do the terms package, packet and segment refer to the same thing? If not, could you please define each one? Thank you.(1 vote)
- "Package" is an informal term that people seem to use in place of "packet".
Segments, packets, and frames are created at different layers in the OSI model and each adds its own header to the data with more information. Segments are created at the transport layer and include port numbers. Packets are created at the network layer from segments and have IP addresses. Frames are created at the data link layer from packets and have MAC addresses.
(I know you didn't ask about frames, but for the sake of completeness I felt it should be added.)(6 votes)
- I'm sure this has come up before, but doesn't it seem possible to reconstruct a packet that is deemed corrupt by reverse-resolving for the missing or corrupted segment of the packet?(0 votes)
- It is possible! The relevant term is error correction. There's generally a tradeoff between the size of the message (which costs space) and "reverse-resolving" (which costs time). Depending on the situation, it can be better to resend than to reverse-resolve, or vice versa. TCP usually does error detection (not correction), but computer memory often uses ECC (error-correcting codes) for this exact purpose.
" he d g barks" could be error corrected to "the dog barks". Our brains perform this process frequently.
Hope this helps!(7 votes)
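As a rough illustration of that size tradeoff, here is a Python sketch of one of the simplest error-correcting schemes, a repetition code: every bit is sent three times and the receiver takes a majority vote. This is a toy chosen for illustration, not what UDP, TCP, or ECC memory actually use, and the function names are hypothetical.

```python
def encode(bits: str) -> str:
    """Send every bit three times (the message becomes three times as long)."""
    return "".join(bit * 3 for bit in bits)

def decode(received: str) -> str:
    """Take a majority vote within each group of three bits."""
    decoded = []
    for i in range(0, len(received), 3):
        group = received[i:i + 3]
        decoded.append("1" if group.count("1") >= 2 else "0")
    return "".join(decoded)

message = "1011"
sent = encode(message)
print(sent)               # 111000111111

corrupted = "101000111111"  # the second bit flipped in transit
print(decode(corrupted))  # 1011, the original message is recovered
```

The cost is visible in the output: 4 bits of message became 12 bits on the wire, and the scheme still fails if two bits in the same group of three are flipped.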
- If the message can be corrupted can't the checksum number also be? Then the message would be discarded even though the message might be completely fine.(3 votes)
- How long does it take for data to transport? I know that this will obviously depend on distance, but is there a general time frame? It's kind of hard to believe that we can watch live streams in which the video and audio are only a second or two behind the actual happenings of the game, etc.(2 votes)
- As incredible as it sounds it's milliseconds. I don't know if you heard about ping (playing online computer games without worrying about ping used to be almost unimaginable, but things might have changed with faster internet), but playing a computer game with a ping of 300 (0.3 seconds for communication between server and computer) is almost impossible when playing something that requires you to respond quickly to your environment.
You can also open a terminal on your computer (on Windows, right-click Start and choose Windows PowerShell; on Linux, press CTRL+ALT+T) and type something like
ping google.com   # or any other host really, but don't overdo it
to see how long it takes packets to travel to a target location.(2 votes)