Main content
Computer programming - JavaScript and the web
Course: Computer programming - JavaScript and the web > Unit 3
Lesson 3: Relational queries in SQL- Splitting data into related tables
- JOINing related tables
- Challenge: Bobby's Hobbies
- Joining related tables with left outer joins
- Challenge: Customer's orders
- Joining tables to themselves with self-joins
- Challenge: Sequels in SQL
- Combining multiple joins
- Challenge: FriendBook
- Project: Famous people
- More efficient SQL with query planning and optimization
© 2023 Khan AcademyTerms of usePrivacy PolicyCookie Notice
Splitting data into related tables
So far, we've just been working with one table at a time, and seeing what interesting data we can select out of that table. But actually, most of the time, we have our data distributed across multiple tables, and those tables are "related" to each other in some way.
For example, let's say we have a table for logging how well students do on their tests, and we include emails in case we need to email parents about slipping grades:
student_name | student_email | test | grade |
---|---|---|---|
Peter Rabbit | peter@rabbit.com | Nutrition | 95 |
Alice Wonderland | alice@wonderland.com | Nutrition | 92 |
Peter Rabbit | peter@rabbit.com | Chemistry | 85 |
Alice Wonderland | alice@wonderland.com | Chemistry | 95 |
We might also have a table for logging what books each student reads:
student_name | book_title | book_author |
---|---|---|
Peter Rabbit | The Tale of Mrs. Tiggy-Winkle | Beatrix Potter |
Peter Rabbit | Jabberwocky | Lewis Carroll |
Alice Wonderland | The Hunting of the Snark | Lewis Carroll |
Alice Wonderland | Jabberwocky | Lewis Carroll |
We might also have a table just for detailed student information:
id | student_first | student_last | student_email | phone | birthday |
---|---|---|---|---|---|
1 | Peter | Rabbit | peter@rabbit.com | 555-6666 | 2001-05-10 |
2 | Alice | Wonderland | alice@wonderland.com | 555-4444 | 2001-04-02 |
What do you think of these tables? Would you change them in any way?
There is one big thing to realize about these tables: they are describing relational data - as in, they are describing data that relates to each other. Each of these tables describe data related to a particular student, and many of the tables replicate the same data. When the same data is replicated across multiple tables, there can be interesting consequences.
For example, what if a student's email changed? Which tables would we need to change?
We'd need to change the student information table, but since we also included that data in the grades table, we'd also have to find every row about that student, and change the email there too.
It's often preferable to make sure that a particular column of data is only stored in a single location, so there are fewer places to update and less risk of having different data in different places. If we do that, we need to make sure we have a way to relate the data across the tables, which we'll get to later.
Let's say we decide to remove email from the grades table, because we realize it's redundant with the email in the student details table. This is what we'd have:
student_name | test | grade |
---|---|---|
Peter Rabbit | Nutrition | 95 |
Alice Wonderland | Nutrition | 92 |
Peter Rabbit | Chemistry | 85 |
Alice Wonderland | Chemistry | 95 |
How could we figure out the email for each student? We could find the row in the student info table, matching by name. What if 2 students had the same name? (Did you know that in Bali, every person has only 1 of 4 possible first names?) We can't rely on name to look up a student, and really, we should never rely on something like name to identify anything uniquely in a table.
So the best thing to do is to remove the
student_name
and replace that with student_id
, since that is a guaranteed unique identifier:student_id | test | grade |
---|---|---|
1 | Nutrition | 95 |
2 | Nutrition | 92 |
1 | Chemistry | 85 |
2 | Chemistry | 95 |
We would make the same change to our books table, using
student_id
instead of student_name
:student_id | book_title | book_author |
---|---|---|
1 | The Tale of Mrs. Tiggy-Winkle | Beatrix Potter |
1 | Jabberwocky | Lewis Carroll |
2 | The Hunting of the Snark | Lewis Carroll |
2 | Jabberwocky | Lewis Carroll |
Notice how we have the title and author repeated twice for Jabberwocky? That's another warning sign that we could break our table up into multiple related tables, so that we don't have to update multiple places if something changes about a book.
We could have a table just about books:
id | book_title | book_author |
---|---|---|
1 | The Tale of Mrs. Tiggy-Winkle | Beatrix Potter |
2 | Jabberwocky | Lewis Carroll |
3 | The Hunting of the Snark | Lewis Carroll |
And then our
student_books
table becomes:student_id | book_id |
---|---|
1 | 1 |
1 | 2 |
2 | 3 |
2 | 2 |
I know, this table doesn't look nearly as readable as the old table that had all of information stuffed into every row. But, tables are often not designed to be readable to humans-- they're designed to be the easiest to maintain and least prone to bugs. In many cases, it may be best to split information into multiple related tables, so that there is less redundant data and fewer places to update.
It's important to understand how to use SQL to deal with data that has been split up into multiple related tables, and bring the data back together across the tables when you need it. We do that using a concept called "join"s and that's what I'll show you next.
Want to join the conversation?
- what is the purpose of having so many different tables, when we can have a single table with all information.?(143 votes)
- Here's a good discussion:
http://stackoverflow.com/questions/9774715/mysql-multiple-tables-or-one-table-with-many-columns
Basically a single table is good when data is one-to-one. When you have thousands of rows and columns of data, where the data is one-to-many, multiple tables are better to reduce duplicate data.(331 votes)
- I have many questions about this topic.
1. Why can't you make something like a variable in SQL? I mean, you can do it in JavaScript, but why not SQL?
2. Why do you have to go row by row changing each thing if you can just change a variable name?
3. And why not put all the information from all the tables into 1 table?
4. What do you need an id for?
5. What's the importance of SQL tables?
Please answer one of my questions.(37 votes)- These are all good questions. It shows you're really thinking about this. I like that.
1. SQL itself is not a programming language. It's a Query Language. You would use another programming language to write a program that sends queries to your database.
Here I use JavaScript with a hypotheticalperformSqlQuery
function.
This leads straight to...var query = "SELECT * FROM students WHERE id = " + studentId;
var result = performSqlQuery(query);
2. You can use variables of the programming language where you create your SQL queries.
3. We want to avoid putting duplicate data into the database.
You could put things into one table like this:
(You may have to copy this somewhere to view correctly.)┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ INVOICE │
├───────────┬────────────┬──────────────┬────────────────┬─────────────────┬──────────┬────────────┬───────────────┬──────────────────────────┬────────────────────┬───────┬───────┬─────┤
│ invoicenr │ customernr │ date │ name │ address │ zipcode │ city │ country │ iban │ product │ count | price │ vat │
├───────────┼────────────┼──────────────┼────────────────┼─────────────────┼──────────┼────────────┼───────────────┼──────────────────────────┼────────────────────┼───────┼───────┼─────┤
│ "0003417" │ "93276" │ "2017-09-01" │ "Joseph Smith" │ "120 Penstreet" │ "3829AD" │ "Somecity" │ "Somecountry" │ "GB29NWBK60161331926819" │ "Brause 361 nib" │ 1 │ 2.99 │ 19│
│ "0003417" │ "93276" │ "2017-09-01" │ "Joseph Smith" │ "120 Penstreet" │ "3029AD" │ "Somecity" │ "Somecountry" │ "GB29NWBK60161331926819" │ "Art paper A4" │ 1 │ 13.29 │ 19│
│ "0003417" │ "93276" │ "2017-09-01" │ "Joseph Smith" │ "120 Penstreet" │ "3829AD" │ "Somecity" │ "Somecountry" │ "GB29NWBK60161331926819" │ "Pen holder, blue" │ 1 │ 5.19 │ 19│
└───────────┴────────────┴──────────────┴────────────────┴─────────────────┴──────────┴────────────┴───────────────┴──────────────────────────┴────────────────────┴───────┴───────┴─────┘
The problem here is that a lot of data is the same. Only the product and the price are different.
Another problem is keeping the data consistent. In this table someone made a slight mistake. Can you spot it? This invoice may never arrive at the correct address.
It's better to split the data. This way you have less data to store, but more importantly, the data is only in one place. If you want to change someone's ZIP code for example, you don't have to go looking for it in many different places.
(You may have to copy this somewhere to view correctly.)┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ CUSTOMER │
├────┬────────────┬───────────┬────────┬──────────┬────────────┬──────────────────────────┤
│ id │ customernr │ firstname │ prefix │ lastname │ address_id │ iban │
├────┼────────────┼───────────┼────────┼──────────┼────────────┼──────────────────────────┤
│ 1 │ "93276" │ "Joseph" │ "" │ "Smith" │ 1 │ "GB29NWBK60161331926819" │
└────┴────────────┴───────────┴────────┴──────────┴────────────┴──────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ ADDRESS │
├────┬─────────────┬──────────┬──────────┬────────────┬───────────────┤
│ id │ streetname │ housenr │ zipcode │ city │ country │
├────┼─────────────┼──────────┼──────────┼────────────┼───────────────┤
│ 1 │ "Penstreet" │ "120" │ "3829AD" │ "Somecity" │ "Somecountry" │
└────┴─────────────┴──────────┴──────────┴────────────┴───────────────┘
┌───────────────────────────────────────┐
│ PRODUCT │
├────┬────────────────────┬───────┬─────┤
│ id │ name │ price │ vat │
├────┼────────────────────┼───────┼─────┤
│ 1 │ "Brause 361 nib" │ 2.99 │ 19 │
│ 2 │ "Art paper A4" │ 13.29 │ 19 │
│ 3 │ "Pen holder, blue" │ 5.19 │ 19 │
└────┴────────────────────┴───────┴─────┘
┌─────────────────────────────────────────────┐
│ INVOICE │
├────┬───────────┬─────────────┬──────────────┤
│ id │ invoicenr │ customer_id │ date │
├────┼───────────┼─────────────┼──────────────┤
│ 1 │ "0003417" │ 1 │ "2017-09-01" │
└────┴───────────┴─────────────┴──────────────┘
┌──────────────────────────────────────┐
│ INVOICELINE │
├────┬────────────┬────────────┬───────┤
│ id │ invoice_id │ product_id │ count │
├────┼────────────┼────────────┼───────┤
│ 1 │ 1 │ 1 │ 1 │
│ 2 │ 1 │ 2 │ 1 │
│ 3 │ 1 │ 3 │ 1 │
└────┴────────────┴────────────┴───────┘
4. In the INVOICE table you can see acustomer_id
of 1. That means that that particular invoice belongs to the customer with id 1. In this case that's Joseph Smith.
Can you tell how the INVOICELINE table works?
5. Perhaps, if you understand the previous answers, you have enough information to answer this yourself. :)(170 votes)
- What's the different between sub query and "join"? When do we need to use join or sub query? Which is better?(27 votes)
- Lots of good answers here:
http://stackoverflow.com/questions/2577174/join-vs-sub-query
This seems to be a hot issue around the SQL campfire.(39 votes)
- What's a scenario where a book detail would change? Seems static to me.(1 vote)
- It's possible for books to have the same title and same author. Or you may need to track revisions of a book to identify specific editions or versions.(13 votes)
- What are some resources on how good relational databases are structured? I will only be querying databases and not building them. It is good to know some of the basics when it comes to structure, I imagine.(9 votes)
- How do the different tables help to organize?(3 votes)
- Typically, in a database system, each table will represent a different entity - students, classrooms, courses - or it will be a a bridge table (which is used to linked tables with many-to-many relationships).
Keeping the data related to an entity in a single table logically organized and it can reduce the number of rows to a minimum. It also keeps data easier to maintain.
Imagine you had a table that listed a row with a teacher's name, the course the teacher is teaching, and the classroom number. Now, imagine that you have one teacher who teaches eight different courses. This means you have rows with the teacher's name repeated eight times.
A few days into the term, you discover that the teacher's name has been misspelled. If you had a single table with teacher names, you could just update the name in one row in theteachers
table. However, in this case, you need to change the name eight different times.
That may not sound so bad. Now, imagine that you are working for a large company that does business with other large companies. One of the companies that supplies you with hundreds of parts decides to rebrand and change its name. Now, imagine having to change the name across hundreds of parts and thousands of documents. Imagine trying to remember every place that the name is used. Changing the name in one table is much faster.(15 votes)
- When I have multiple Tables can I maintain a directory of sorts to help me relate which table stands for what?
What happens in an Organization where there are hundreds of rows and columns and so many tables all relating to one another. And the person who created that database is no longer working there. How will I know what table is for what? And how does one maintain a directory, is there such a thing for SQL called "directory"?(5 votes)- There is a "SYSTEM" database that is maintained by database system itself, that is a collection of tables containing metadata about the "USER" databases. For example you can get a list of tables in your database by entering SELECT TABLE_NAME FROM SYSTEM.TABLES where DATABASE_NAME = "MY_MUSIC"; The actual syntax and SYSTEM tables is dependent on the specific database vendor. The column names, datatypes, primary keys, default values, foreign keys and other constraints plus much more are also in system tables(8 votes)
- By breaking down information into these multiple tables would that slow down the software or application because it is having to sift through so many tables? I guess what I'm asking is if it can be beneficial to require more detailed queries rather than writing in depth tables.(2 votes)
- Searching for it might be a bit slower, but writing the data may be quicker because you can omit duplicate data.
Updating data will be even quicker and easier, because you don't have to search everywhere. This is the main reason to separate data into tables: if you do it right, you only have to enter a piece of data once.(12 votes)
- while it did not specify which SQL in this article that was used. I am learning SQL server and when I copied the code into SQL server, it does not work. Have you considered if the code would work on different platforms?(6 votes)
- KhanAcademy says in the introduction to this unit that they use SQLite, which is one particular version. The similarities between versions are greater than the differences, so the concepts carry over, but for your particular use you might have slightly different syntax so you should carefully read the documentation for the specific system you're trying to copy into.(33 votes)
- when do games come in khan academy?(4 votes)
- Camacho,
There is a course dedicated to games: https://www.khanacademy.org/computing/computer-programming/programming-games-visualizations(5 votes)