Distributed computing

When solving problems, we don't need to limit our solutions to running on a single computer. Instead we can use distributed computing to distribute the problem across multiple networked computing devices.

Distribution of parallel processes

Distributed computing is often used in tandem with parallel computing. Parallel computing on a single computer uses multiple processors to process tasks in parallel, whereas distributed parallel computing uses multiple computing devices to process those tasks.

Consider our example program that detects cats in images. In a distributed computing approach, a managing computer would send the image information to each of the worker computers, and each worker would report back its results.
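The manager and worker roles described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads stand in for networked worker computers, `detect_cat` is a hypothetical placeholder that checks the file name rather than analyzing pixels, and the names `manager`, `worker`, and the image list are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_cat(image_name):
    """Hypothetical stand-in for a real detector: flags names containing 'cat'."""
    return "cat" in image_name


def worker(batch):
    """One worker processes its batch and reports (image, result) pairs back."""
    return [(name, detect_cat(name)) for name in batch]


def manager(images, num_workers):
    """Split the images into one batch per worker and merge the reported results."""
    batches = [images[i::num_workers] for i in range(num_workers)]
    results = {}
    # Assumption: threads play the part of separate worker computers here.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for reported in pool.map(worker, batches):
            results.update(reported)
    return results


images = ["cat_01.jpg", "dog_01.jpg", "cat_02.jpg", "tree.jpg"]
results = manager(images, num_workers=2)
print(results)
```

In a real distributed system, the call to `pool.map` would be replaced by messages sent over the network, which is where the communication costs discussed later come from.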

Evaluating the performance

Distributed computing can improve the performance of many solutions, by taking advantage of hundreds or thousands of computers running in parallel. We can measure the gains by calculating the speedup: the time taken by the sequential solution divided by the time taken by the distributed parallel solution. If a sequential solution takes 60 minutes and a distributed solution takes 6 minutes, the speedup is 10.
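As a quick check of the speedup arithmetic, here is a minimal Python sketch. The function name `speedup` is chosen for this example; the formula is the one given in the text.

```python
def speedup(sequential_time, distributed_time):
    """Speedup = time of the sequential solution / time of the distributed solution."""
    return sequential_time / distributed_time


# The example from the text: 60 minutes sequentially, 6 minutes distributed.
print(speedup(60, 6))  # prints 10.0
```

Both times must be in the same unit; the result is a unitless ratio.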

The performance of distributed solutions can also suffer from their distributed nature, however. The computers must communicate over the network, sending messages with input and output values. Every message sent back and forth takes some amount of time, and that time adds to the overall time of the solution. For a distributed computing solution to be worth the trouble, the time saved by distributing the operations must be greater than the time added by the communication overhead.
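To make that trade-off concrete, here is a rough Python sketch of a deliberately simplified cost model. The assumptions are mine, not the article's: the work divides evenly among the workers, and the manager spends a fixed `message_time` communicating with each worker, one after another. Real systems behave in more complicated ways.

```python
def distributed_time(sequential_time, num_workers, message_time):
    """Estimated total time under a simplified model (see assumptions above)."""
    compute = sequential_time / num_workers       # work is split evenly
    communication = num_workers * message_time    # one exchange per worker
    return compute + communication


sequential = 60.0  # minutes, as in the speedup example
for message_time in (0.1, 20.0):
    total = distributed_time(sequential, 10, message_time)
    worthwhile = total < sequential
    print(f"message_time={message_time}: total={total:.1f} min, worthwhile={worthwhile}")
```

With cheap messages, ten workers finish far sooner than the sequential solution. With very expensive messages, the same ten workers are slower than one computer working alone.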

In the simplest distributed computing architecture, the managing computer needs to communicate with each worker.

In more complex architectures, worker nodes must communicate with other worker nodes. This is necessary when using distributed computing to train a deep learning network, for example. [1]
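One way to see why worker-to-worker communication raises the cost is to count the possible communication links. The Python sketch below assumes the worst case, where every pair of workers may exchange messages; actual systems often use more economical patterns, so treat the second function as an upper bound rather than a description of any particular framework.

```python
def manager_worker_links(num_workers):
    """Each worker talks only to the manager: one link per worker."""
    return num_workers


def worker_to_worker_links(num_workers):
    """Assumes every pair of workers may exchange messages (a full mesh)."""
    return num_workers * (num_workers - 1) // 2


for n in (4, 16, 64):
    print(n, manager_worker_links(n), worker_to_worker_links(n))
```

The manager-to-worker count grows in step with the number of workers, while the pairwise count grows roughly with its square.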

One way to reduce the communication time is to use cluster computing: co-located computers on a local network that all work on similar tasks. In a computer cluster, a message does not have to travel very far and, more importantly, does not have to travel over the public Internet.

Cluster computing has its own limitations; setting up a cluster requires physical space, hardware operations expertise, and of course, money to buy all the devices and networking infrastructure.

Fortunately, many companies now offer cloud computing services which give programmers everywhere access to managed clusters. The companies manage the hardware operations, provide tools to upload programs, and charge based on usage.

Distribution of functionality

Another form of distributed computing is to use different computing devices to execute different pieces of functionality.

For example, imagine a zoo with an array of security cameras. Each security camera records video footage in a digital format. The cameras send their video data to a computer cluster located in the zoo headquarters, and that cluster runs video analysis algorithms to detect escaped animals. The cluster also sends the video data to a cloud computing server which analyzes terabytes of video data to discover historical trends.

Each computing device in this distributed network is working on a different piece of the problem, based on its strengths and weaknesses. The security cameras themselves don't have enough processing power to detect escaped animals or enough storage space for the other cameras' footage (which could help an algorithm track movement). The local cluster does have a decent amount of processing power and extra storage, so it can perform the urgent task of escaped animal detection. However, the cluster defers the task that requires the most processing and storage (but isn't as time sensitive) to the cloud computing server.
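The division of labor in the zoo example can be sketched as a small Python model. Everything here is hypothetical: the class names, the `animal_outside_enclosure` flag (which stands in for real video analysis), and the camera IDs are invented for illustration only.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Frame:
    camera_id: str
    animal_outside_enclosure: bool  # hypothetical label replacing real video analysis


@dataclass
class CloudServer:
    """Stores everything and runs the slow, non-urgent analysis."""
    archive: list = field(default_factory=list)

    def store(self, frame):
        self.archive.append(frame)

    def frames_per_camera(self):
        return Counter(frame.camera_id for frame in self.archive)


@dataclass
class LocalCluster:
    """Handles the urgent task locally, then defers the rest to the cloud."""
    cloud: CloudServer
    alerts: list = field(default_factory=list)

    def receive(self, frame):
        if frame.animal_outside_enclosure:
            self.alerts.append(frame.camera_id)  # urgent: escaped animal detected
        self.cloud.store(frame)                  # not urgent: keep for trend analysis


cloud = CloudServer()
cluster = LocalCluster(cloud)
# The cameras only record and send; they do no analysis themselves.
for frame in [Frame("cam-1", False), Frame("cam-2", True), Frame("cam-1", False)]:
    cluster.receive(frame)

print(cluster.alerts)
print(cloud.frames_per_camera())
```

The point of the sketch is the routing: the time-sensitive check happens close to the cameras, while the storage-heavy work is passed along to the machine best suited for it.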

This form of distributed computing recognizes that the world is filled with a range of computing devices with varying capabilities, and ultimately, some problems are best solved by utilizing a network of those devices.

In fact, you're currently participating in a giant example of distributed computing: the web. Your computer is doing a lot of processing to read this website: sending HTTP requests to get the website data, interpreting the JavaScript that the website loads, and constantly updating the screen as you scroll the page. But our servers are also doing a lot of work while responding to your HTTP requests, plus we send data out to high-powered analytics servers for further processing.

Every application that uses the Internet is an example of distributed computing, but each application makes different decisions about how it distributes the computing. For another example, smart home assistants do a small amount of language processing locally to determine that you've asked them for help but then send your audio to high-powered servers to parse your full question.

The Internet enables distributed computing at a worldwide scale, both to distribute parallel computation and to distribute functionality. Computer scientists, programmers, and entrepreneurs are constantly discovering new ways to use distributed computing to take advantage of such a massive network of computers to solve problems.

layaz7717
Posted 4 years ago.
What exactly is a cloud computing service, or more generally, "the cloud"?

Furthermore, are public Internet connections more dangerous than private ones (i.e. can people see what you are doing and should you avoid doing things like checking your bank account on public Internet)?

Thirdly, why don't clusters have to run on public Internet connections?

Lastly, if computers are far apart, can't they run on private Internet instead of public Internet, helping to rule out some of the security issues that come with long-distance distributed computing?

Thanks!
- Abhishek Shah
  Posted 4 years ago.
  1) The cloud (to simplify greatly) represents a collection of computing resources accessed over the Internet. Instead of playing video games on a console, imagine that users press keys, the key presses are sent over the Internet to the cloud, and the cloud processes them and sends back the results by streaming the new images to the TV. This is the new idea behind Google's Stadia and Microsoft xCloud. The "cloud" in this case is hardware (a gaming console) accessed over the Internet.
  Similarly, watching movies over Netflix or Hulu is a cloud computing service, in which entertainment is consumed over the Internet instead of buying a movie from the store and playing it on a DVD player.
  
  2) Public Internet connections can be more dangerous (see this link for more: https://www.khanacademy.org/computing/computers-and-internet/xcae6f4a7ff015e7d:online-data-security), but generally, using HTTPS over the public Internet is at least as secure as using a private Internet connection.
  
  3) Clusters are usually used internally at a company, so there is little need for them to have a public Internet connection, as they only communicate locally. A computer network can operate independently of the Internet (which is itself a collection of networks), which is why clusters can run without being tied to the Internet.
  
  4) Absolutely. One way is to use a VPN (also discussed in certain places in the above link). However, thanks to encryption and authentication, using the public Internet has become much safer than people might think.
  
  Hope this helps!
thehappykiwi8
Posted 3 years ago.
I have a question: can distributed computing be run on programs that don’t support parallel programming? Or can it only be used when certain steps of an algorithm must be performed simultaneously?
- Jme
  Posted a year ago.
  Nope! Distributed computing involves multiple computers. It's fine if they each have only one thread available; they can still process data and communicate results. However, distributed computing with computers that have multiple threads and/or hyper-threading can be much more efficient.
Iggy Belton
Posted 2 years ago.
Why do people call it "the cloud"?
- Jme
  Posted a year ago.
  It comes from early tech slang: the Internet was often called "the cloud," in part because network diagrams drew it as a cloud shape.
31595
Posted a year ago.
How will this impact the economy?
javeed.gbn
Posted 2 years ago.
How does multi-core processor technology affect distributed computing?
- Jme
  Posted a year ago.
  It makes it more efficient :) Adding another core to a computer is (usually) cheaper than buying an entirely new computer. Using a mix of distributed computing and parallel threads allows for much faster computing.
Ella Wei
Posted 3 months ago.
Parallel computing can only run on a computer with multiple processors, but with distributed computing, can we distribute tasks to computers that have only one processor?
- ldahodwala
  Posted 3 months ago.
  Yes, you can. There is no requirement that you must use parallel computing alongside distributed computing. If I have a collection of computers with only a single processor each, I can distribute a workload to them. However, none of these computers will be able to perform parallel computing, so each one will only execute one task at a time.