Lesson 2: Video series: What is AI? | Code.org
AI: Training data and bias
The most important aspect of Machine Learning is what data is used to train it. Find out how training data affects a machine's predictions and why biased data can lead to biased decisions.
Start learning at http://code.org/
Stay in touch with us!
• on Twitter https://twitter.com/codeorg
• on Facebook https://www.facebook.com/Code.org
• on Instagram https://instagram.com/codeorg
• on Tumblr https://blog.code.org
• on LinkedIn https://www.linkedin.com/company/code-org
• on Google+ https://google.com/+codeorg
Created by Code.org.
Want to join the conversation?
- Who else finds this interesting to learn about?? 😆(3 votes)
- are you telling me i answer a captcha that proves im not a bot just to train a bot to beat the captchas ?(2 votes)
- yes, since the beginning of captchas existence(1 vote)
- i think people are starting to misunderstand what AI really is. remember, it's basically something that can think for itself, but what people are doing is creating a bot with millions and millions more lines of code to make it look like it thinks for itself. i think it's just programmed to give a different answer each time you ask for something, and then they call it AI... anyway, i'm not talking about the people in this video, but there are a lot of people out there who do this...
do you agree with this?(1 vote)
- i forgot... and yeah, that is why it's impossible for AI to take control over something like us. or, like people say, can AI dominate the world? or destroy us? or something like that?... the answer is NO, because of what i said in the comment before this!! ^(1 vote)
Video transcript
Machine learning is only as good as the training data you put into it. So, it's super important to use high quality data, and lots of it. But if data is important, it's worth asking where does training data come from?
Often, computers are collecting training data from people like you and me, without any effort on our part. A video streaming service might keep track of what you watch, then it can recognize patterns in that data to recommend what you might want to watch next. Other times, you're directly asked to help, like when a website asks you to spot street signs in photos. You're providing training data to help a machine learn to see, and maybe even one day drive.
Medical researchers can use medical images as training data to teach computers how to recognize and diagnose diseases. Machine learning needs hundreds and thousands of images, and training direction from a doctor who knows what to look for, before it can correctly identify disease.
Even with thousands of examples, there can be problems with the computer's predictions. If X-ray data is only collected from men, then the computer's predictions may only work for men. It may not recognize diseases when asked to diagnose the X-ray of a woman. This blind spot in the training data creates something called bias. Biased data favors some things, and de-prioritizes or excludes others.
Depending on how training data is collected, who is doing the collecting, and how the data is fed, there is a chance that human bias is included in the data. By learning from biased data, the computer may make biased predictions, whether the people training the computer are aware of it or not.
When you are looking at training data, ask yourself two questions: Is this enough data to accurately train a computer? And, does this data represent all possible scenarios and users without bias? This is where you, as the human doing the training, play a crucial role. It's up to you to give your machine unbiased data. That means collecting tons of examples, from lots of sources.
Remember, when you pick and choose data for machine learning, you're actually programming the algorithm, using training data instead of code. The data IS the code. The better the data you provide, the better the computer will learn.
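The X-ray example from the transcript can be sketched in a few lines of Python. This is only a toy illustration and is not from the video: the records, the single "measurement" feature, the cutoffs of 60 and 40, and the helper names (`make_records`, `train_threshold`, `accuracy_by_group`, `missing_groups`) are all made up for the sketch. It assumes disease shows up at a lower measurement in one group than in the other, trains a very simple model on one group only, and then checks how well it does on each group.

```python
from collections import Counter

# Hypothetical toy records: (group, measurement, has_disease).
# Assumption for illustration only: disease appears at a lower
# measurement in group "F" than in group "M".
def make_records(group, cutoff):
    return [(group, m, m >= cutoff) for m in range(0, 100, 5)]

men = make_records("M", cutoff=60)
women = make_records("F", cutoff=40)

def missing_groups(records, expected_groups):
    """Return the groups we expected to see but that have no training examples."""
    present = {group for group, _, _ in records}
    return set(expected_groups) - present

def train_threshold(records):
    """Pick the measurement threshold that gets the most training records right."""
    candidates = sorted({m for _, m, _ in records})

    def correct(t):
        return sum((m >= t) == label for _, m, label in records)

    return max(candidates, key=correct)

def accuracy_by_group(threshold, records):
    """Fraction of correct predictions, reported separately for each group."""
    totals, hits = Counter(), Counter()
    for group, m, label in records:
        totals[group] += 1
        hits[group] += (m >= threshold) == label
    return {g: hits[g] / totals[g] for g in totals}

# Train on data collected from men only, as in the transcript's example.
biased_training = men
print(missing_groups(biased_training, {"M", "F"}))   # -> {'F'}

threshold = train_threshold(biased_training)
print(threshold)                                     # -> 60
print(accuracy_by_group(threshold, men + women))     # -> {'M': 1.0, 'F': 0.8}
```

In this sketch the model looks perfect on the group it was trained on and misses cases in the group it never saw. Checking which groups are missing from the data, and measuring accuracy per group instead of overall, are two simple ways to notice that kind of blind spot.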