AP®︎/College Computer Science Principles
Unit 4, Lesson 4: Parallel and distributed computing
When solving problems, we don't need to limit our solutions to running on a single computer. Instead, we can use distributed computing to spread the problem across multiple networked computing devices.
Distribution of parallel processes
Distributed computing is often used in tandem with parallel computing. Parallel computing on a single computer uses multiple processors to process tasks in parallel, whereas distributed parallel computing uses multiple computing devices to process those tasks.
Consider our example program that detects cats in images. In a distributed computing approach, a managing computer would send the image information to each of the worker computers, and each worker would report back its results.
An illustration of distributed parallel computing: a computer with arrows fanning out to four other computers and each arrow is annotated with an image filename.
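The manager and worker pattern can be sketched on a single machine. This is only a simulation under stated assumptions: threads stand in for the networked worker computers, and `detect_cat` is a hypothetical placeholder that checks the filename rather than analyzing any pixels.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_cat(filename: str) -> bool:
    """Hypothetical stand-in for a real detector: checks the filename only."""
    return "cat" in filename.lower()


def manager(filenames: list[str], num_workers: int = 4) -> dict[str, bool]:
    """Hand each image to a worker and collect the reported results."""
    # Threads play the role of worker computers here. A real distributed
    # system would send these messages over a network instead.
    with ThreadPoolExecutor(max_workers=num_workers) as workers:
        results = list(workers.map(detect_cat, filenames))
    return dict(zip(filenames, results))


images = ["cat1.jpg", "dog.jpg", "cat2.jpg", "tree.jpg"]
print(manager(images))
```

The manager never inspects an image itself; it only sends out work and gathers answers, which is the role the illustration gives to the central computer.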
Evaluating the performance
Distributed computing can improve the performance of many solutions by taking advantage of hundreds or thousands of computers running in parallel. We can measure the gains by calculating the speedup: the time taken by the sequential solution divided by the time taken by the distributed parallel solution. If a sequential solution takes S minutes and a distributed solution takes D minutes, the speedup is S / D.
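The speedup calculation is a single division, sketched below. The timings of 60 and 6 minutes are made-up values chosen for illustration, not measurements from a real system.

```python
def speedup(sequential_time: float, distributed_time: float) -> float:
    """Speedup = sequential time divided by distributed parallel time."""
    if distributed_time <= 0:
        raise ValueError("distributed_time must be positive")
    return sequential_time / distributed_time


# Illustrative numbers only: 60 minutes sequentially, 6 minutes distributed.
print(speedup(60, 6))  # 10.0
```

A speedup above 1 means the distributed solution finished faster. A speedup below 1 means distributing the work made things slower.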
The performance of distributed solutions can also suffer from their distributed nature, however. The computers must communicate over the network, sending messages with input and output values. Every message sent back and forth takes some amount of time, and that time adds to the overall time of the solution. For a distributed computing solution to be worth the trouble, the time saved by distributing the operations must be greater than the time added by the communication overhead.
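A rough way to see this trade-off is a toy cost model. The sketch below assumes the work divides evenly among workers and that the manager handles messages one at a time, so message time simply adds up. Real systems are messier, and all parameter names and numbers here are hypothetical.

```python
def distributed_time(sequential_time: float, num_workers: int,
                     message_time: float, messages_per_worker: int = 2) -> float:
    """Toy estimate: evenly divided compute time plus total message time."""
    compute = sequential_time / num_workers
    # Assumption: one message out and one back per worker, handled in turn.
    communication = num_workers * messages_per_worker * message_time
    return compute + communication


def worth_distributing(sequential_time: float, num_workers: int,
                       message_time: float) -> bool:
    """True when the distributed estimate beats the sequential time."""
    return distributed_time(sequential_time, num_workers, message_time) < sequential_time


# A long job: 60 minutes of work, 4 workers, half a minute per message.
print(distributed_time(60, 4, 0.5), worth_distributing(60, 4, 0.5))  # 19.0 True
# A short job: 1 minute of work with the same message cost.
print(distributed_time(1, 4, 0.5), worth_distributing(1, 4, 0.5))    # 4.25 False
```

In this model the long job benefits from distribution, while the short job spends more time on messages than it saves on computation.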
In the simplest distributed computing architecture, the managing computer needs to communicate with each worker:
Animation of communication in a distributed computing system. The managing computer sends a message to each of four workers, and each of the workers sends a message back.
In more complex architectures, worker nodes must communicate with other worker nodes. This is necessary when using distributed computing to train a deep learning network, for example.
Animation of communication in a distributed computing system. The managing computer sends a message to each of four workers, the workers send messages to each other, and then finally send messages back to the main computer.
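To get a feel for how the two architectures differ in communication, we can count messages. This sketch assumes one message from the manager to each worker and one back, and, for the more complex case, that every worker sends one message to every other worker. Actual systems may exchange far fewer or far more.

```python
def manager_worker_messages(num_workers: int) -> int:
    """One message out to each worker and one message back."""
    return 2 * num_workers


def with_worker_exchange_messages(num_workers: int) -> int:
    """Adds one message from every worker to every other worker (assumed)."""
    worker_to_worker = num_workers * (num_workers - 1)
    return manager_worker_messages(num_workers) + worker_to_worker


for n in (4, 8, 16):
    print(n, manager_worker_messages(n), with_worker_exchange_messages(n))
```

With four workers, that is 8 messages in the simple architecture and 20 when workers also talk to each other. Under this assumption the worker-to-worker traffic grows much faster than the number of workers.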
One way to reduce the communication time is to use cluster computing: co-located computers on a local network that all work on similar tasks. In a computer cluster, a message does not have to travel very far and, more importantly, does not have to travel over the public Internet.
Photo of a computer cluster, four large towers with stacks of processors.
Cluster computing has its own limitations; setting up a cluster requires physical space, hardware operations expertise, and of course, money to buy all the devices and networking infrastructure.
Fortunately, many companies now offer cloud computing services, which give programmers everywhere access to managed clusters. The companies manage the hardware operations, provide tools to upload programs, and charge based on usage.
Distribution of functionality
Another form of distributed computing is to use different computing devices to execute different pieces of functionality.
For example, imagine a zoo with an array of security cameras. Each security camera records video footage in a digital format. The cameras send their video data to a computer cluster located in the zoo headquarters, and that cluster runs video analysis algorithms to detect escaped animals. The cluster also sends the video data to a cloud computing server which analyzes terabytes of video data to discover historical trends.
An illustration of a distributed computing network. There are four security cameras on the left with arrows headed to three computers. Those three computers show an output of an image of a fox with a red rectangle around it. More arrows go from the three computers to a cloud of 12 computers on the right. The cloud of computers outputs a graph.
Each computing device in this distributed network works on a different piece of the problem, based on its strengths and weaknesses. The security cameras themselves don't have enough processing power to detect escaped animals or enough storage space for the other cameras' footage (which could help an algorithm track movement). The local cluster does have a decent amount of processing power and extra storage, so it can perform the urgent task of escaped animal detection. However, the cluster defers the task that requires the most processing and storage (but isn't as time sensitive) to the cloud computing server.
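The division of labor in the zoo example can be sketched as three small functions, one per kind of device. Everything here is hypothetical: the frame format, the camera names, and the idea that an "escape" is an animal seen by a camera where it is not expected. Real video analysis is far more involved.

```python
from collections import Counter


def camera_capture(camera_id: str, animals_seen: list[str]) -> dict:
    """Camera: records and forwards footage, with no analysis of its own."""
    return {"camera": camera_id, "animals": animals_seen}


def cluster_detect_escapes(frames: list[dict],
                           expected: dict[str, set]) -> list[tuple]:
    """Local cluster: the urgent check for animals where they shouldn't be."""
    alerts = []
    for frame in frames:
        for animal in frame["animals"]:
            if animal not in expected[frame["camera"]]:
                alerts.append((frame["camera"], animal))
    return alerts


def cloud_historical_trends(archive: list[dict]) -> Counter:
    """Cloud: the slower, larger analysis. Here, total sightings per animal."""
    return Counter(animal for frame in archive for animal in frame["animals"])


frames = [camera_capture("gate", ["fox"]), camera_capture("fox_den", ["fox"])]
expected = {"gate": set(), "fox_den": {"fox"}}
print(cluster_detect_escapes(frames, expected))  # [('gate', 'fox')]
print(cloud_historical_trends(frames)["fox"])    # 2
```

Each function would run on a different device in the real network, and only the data passed between them would travel over the network.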
This form of distributed computing recognizes that the world is filled with a range of computing devices with varying capabilities, and ultimately, some problems are best solved by utilizing a network of those devices.
An illustration of the web as a distributed computing system. A web browser loads khanacademy.org on the left, arrows go from the browser to three computers labeled "khanacademy.org", and arrows go from there to a cloud of 12 computers on the right.
Every application that uses the Internet is an example of distributed computing, but each application makes different decisions about how it distributes the computing. For another example, smart home assistants do a small amount of language processing locally to determine that you've asked them for help but then send your audio to high-powered servers to parse your full question.
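The smart home assistant's split between local and remote work might look like the sketch below. It is a simplification with invented names: plain text stands in for audio, `WAKE_PHRASE` is a hypothetical phrase, and `server_parse_question` merely splits words where a real server would run heavy language processing.

```python
WAKE_PHRASE = "hey assistant"  # hypothetical wake phrase


def device_detects_wake_phrase(heard: str) -> bool:
    """Cheap local check that runs on the device itself."""
    return heard.lower().startswith(WAKE_PHRASE)


def server_parse_question(request: str) -> dict:
    """Stand-in for expensive server-side processing: just splits words."""
    words = request.strip(" ,?!.").split()
    return {"words": words, "word_count": len(words)}


def handle(heard: str):
    """Only contact the server after the local check succeeds."""
    if not device_detects_wake_phrase(heard):
        return None  # nothing is sent anywhere
    request = heard[len(WAKE_PHRASE):]
    return server_parse_question(request)


print(handle("just chatting"))                     # None
print(handle("Hey assistant, what time is it?"))
```

The design choice is that the small, constant task stays on the low-powered device, and the expensive task runs only when needed on more capable hardware.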
The Internet enables distributed computing at a worldwide scale, both to distribute parallel computation and to distribute functionality. Computer scientists, programmers, and entrepreneurs are constantly discovering new ways to use distributed computing to take advantage of such a massive network of computers to solve problems.
Want to join the conversation?
- What exactly is a cloud computing service, or more generally, "the cloud"?
Furthermore, are public Internet connections more dangerous than private ones (i.e. can people see what you are doing and should you avoid doing things like checking your bank account on public Internet)?
Thirdly, why don't clusters have to run on public Internet connections?
Lastly, if computers are far apart, can't they run on private Internet instead of public Internet, helping to rule out some of the security issues that come with long-distance distributed computing?
- 1) The cloud (to simplify greatly) represents a collection of computing resources accessed over the Internet. Instead of playing video games on a console, imagine that users press keys, the key presses are sent over the Internet to the cloud, and the cloud processes them and sends back the results by streaming the new images to the TV. This is the idea behind Google's Stadia and Microsoft's xCloud. The "cloud" in this case is hardware (a gaming console) accessed over the Internet.
Similarly, watching movies over Netflix or Hulu is a cloud computing service, in which entertainment is consumed over the Internet instead of buying a movie from the store and playing it on a DVD player.
2) Public Internet connections can be more dangerous (see this link for more: https://www.khanacademy.org/computing/computers-and-internet/xcae6f4a7ff015e7d:online-data-security), but generally using HTTPS over the public Internet is as secure as (if not more secure than) a private Internet connection.
3) Clusters are usually used internally at a company, so there is little need for them to have a public Internet connection, as they only communicate locally. Computer networks can exist independently of the Internet (which is itself a collection of networks), which is why clusters can run without being tied to the Internet.
4) Absolutely. One way is using a VPN (also discussed in certain places in the above link). However, through the use of encryption and authentication, using the public internet has become much safer than people might think.
Hope this helps!
- How will this impact the economy?
- I have a question: can distributed computing be run on programs that don’t support parallel programming? Or can it only be used when certain steps of an algorithm must be performed simultaneously?
- Nope! Distributed computing involves multiple computers. It's fine if they only have one thread available; they can still process data and communicate results. However, distributed computing with computers that have multiple threads and/or hyper-threading is much more efficient.
- Why do people call it "the cloud"?
- It comes from early tech slang that referred to the Internet as "the cloud".
- How does it work?
- ? Read the article. I don't really know what you mean. If you have a question about the article, could you please be more specific about what you don't understand?
- How does multi-core processor technology affect distributed computing?
- It makes it more efficient :) Adding another core to a computer is (usually) cheaper than buying an entirely new computer. Using a mix of distributed computing and parallel threads allows for much faster computing.