Question 1

Under "Why Do SQL Queries Need A Plan?" you said that "we only need 23 lookups to find a value in 10 million rows" if we do a binary search on a sorted table. Why is it 23 lookups? I do not see a correlation.

Accepted Answer

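Log base 2 of 10 million is about 23. With a binary search, each lookup cuts the remaining rows in half, so the question is: 2 to the power of what equals the number of data points? 2 to the 23rd power is about 8.4 million and 2 to the 24th power is about 16.8 million, so it takes roughly 23 halvings to narrow 10 million rows down to one.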
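The arithmetic can be checked with a few lines of Python. This is only a sketch: `max_lookups` is a hypothetical helper name, and it counts the worst case for a textbook binary search, which comes out one higher than the rounded figure of 23.

```python
import math

def max_lookups(n: int) -> int:
    """Worst-case number of rows a binary search inspects among n sorted rows."""
    # Equals floor(log2(n)) + 1 for n >= 1, computed without floating point.
    return n.bit_length()

rows = 10_000_000
print(math.log2(rows))    # a little over 23
print(2 ** 23, 2 ** 24)   # 8388608 16777216
print(max_lookups(rows))  # 24
```

Either way, the point stands: a couple of dozen lookups instead of up to 10 million.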

Question 2

What SQL query plan does Khan Academy (SQLite) use?

Accepted Answer

The query plan is different for each SQL query and it is really easy to see any of the query plans used by Khan Academy's SQLite instance.

You can see them by going back to any of the queries we have been working on and pasting `EXPLAIN QUERY PLAN` on the line directly above a working `SELECT` statement. You should immediately see the query plan for that `SELECT` statement in the results window.
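Outside the Khan Academy editor, the same statement can be tried with Python's built-in `sqlite3` module. The sketch below is an illustration only: the `books` table, the `books_by_author` index, and the `query_plan` helper are made-up names, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, author TEXT, title TEXT)")
conn.execute("INSERT INTO books (author, title) VALUES ('J K Rowling', 'Book A')")

def query_plan(sql):
    """Return the human-readable detail text of each step in SQLite's plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

query = "SELECT title FROM books WHERE author = 'J K Rowling'"

plan_before = query_plan(query)  # no index on author yet: a full table scan
conn.execute("CREATE INDEX books_by_author ON books (author)")
plan_after = query_plan(query)   # now the plan searches using the index

print(plan_before)
print(plan_after)
```

Comparing the two printed plans is a quick way to see whether a query scans the whole table or uses an index.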

Question 3

If there are SQL profilers to help you identify queries that run longer, are there similar products that suggest modifications to SQL queries to make them run faster? Are there optimization metric tools that measure the differences between the two models (scanning the whole table vs. creating the index)? I am trying to understand why the manual methods are needed, since some query planning tools are listed; I feel like this could be entirely automated.

Accepted Answer

In my experience as a software analyst for a large software company, investigatory queries are sometimes needed to drill into data. These are usually one-time runs that generate a highly tailored report for a client. Manually tailoring the query is necessary because it will not be used over and over again in the software, yet it still has a deliverable. Sometimes we are working with a few thousand entries; other times we are north of 50 million. We don't like to leave it to chance, as the software has users present 24 hours a day. Taking the system down with a poorly designed query is a big 'no-no', and the cause is usually identified only after the incident. The company I worked for had a proprietary query optimizer, but sometimes we needed to break from its logic to complete a complex report with minimal user impact. Does that help explain why we would manually tailor a query instead of letting an automated tool decide?

Question 4

Please help me, I'm not clear on how to use JOIN and ON.

Accepted Answer

You use a `JOIN` clause when the data you are trying to select is stored in more than one table. In your `FROM` clause, you would have code like
```
tablename1 JOIN tablename2
```
You use the keyword `ON` to indicate the fields that relate the two tables, like
```
tablename1
    JOIN tablename2
    ON tablename1.id = tablename2.table1_id
```
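To see the pattern with real data, here is a small sketch run through Python's built-in `sqlite3` module. The `customers` and `orders` tables and their rows are invented for illustration; `orders.customer_id` plays the role of `table1_id` above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders (customer_id, item) VALUES (1, 'pen'), (2, 'ink'), (1, 'paper');
""")

# Each order row is matched to the customer whose id equals its customer_id.
rows = conn.execute("""
    SELECT customers.name, orders.item
    FROM customers
    JOIN orders
    ON customers.id = orders.customer_id
    ORDER BY orders.id
""").fetchall()

print(rows)  # [('Ada', 'pen'), ('Grace', 'ink'), ('Ada', 'paper')]
```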

Question 5

I don't really understand the difference between doing a "full table scan" and creating an "index". What does a binary search entail?

Accepted Answer

The article describes _Do a "full table scan"_ as "look at every single row in the table, return the matching rows."  Remember that your data was likely entered in a random order, not in order by author or title.  (It may be somewhat chronological if books entered were published after the database was created.)

A full table scan would mean that the system would start with the first record and check the criteria. If the record matches, it is included in the query result. If the record does not match, it is not included in the query result. The system would then check the second record in the same way, then the third, then the fourth, and so on until it has checked every record in the table.

When we _create an "index"_, we make a copy of the original data table. We can then sort that copy by author, by title, or by date (or by any other field we choose). Then we could do a _binary search_, as Jim E. has described.

If we had sorted the data in our copied table by `author`, a binary search would start with the middle row of the table and check the `author` field. Because we know that our data is sorted by `author`, if the `author` in the middle row is "Tolkien," we know that "Rowling" comes alphabetically before "Tolkien," so the records for "Rowling" must be in the first half of the table.

The system will then check the middle row of the first half of the data. If the author is "Gappah," we know that, alphabetically, "Rowling" comes after "Gappah" and the "Rowling" records must be in the second half of the first half of the table. A binary search will continue dividing the data in half until the records are found or there is no more data to check.
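The halving process can be sketched in Python. This is a simplified illustration, not how a database engine actually stores its indexes: the sorted list stands in for the sorted copy of the table, the author names beyond those mentioned above are made up, and `binary_search` is a hypothetical helper that also counts how many rows it inspects.

```python
def binary_search(sorted_authors, target):
    """Return (index, lookups) for target, or (None, lookups) if it is absent."""
    low, high = 0, len(sorted_authors) - 1
    lookups = 0
    while low <= high:
        mid = (low + high) // 2          # middle row of the remaining range
        lookups += 1
        if sorted_authors[mid] == target:
            return mid, lookups
        if sorted_authors[mid] < target:
            low = mid + 1                # target sorts later: keep the second half
        else:
            high = mid - 1               # target sorts earlier: keep the first half
    return None, lookups

authors = sorted(["Tolkien", "Gappah", "Rowling", "Austen", "Morrison", "Achebe", "Woolf"])
print(binary_search(authors, "Rowling"))  # (4, 3)
print(binary_search(authors, "Zola"))     # (None, 3)
```

A full table scan over the same seven rows could need up to seven checks; the gap grows quickly as the table gets larger.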

Question 6

Let's say there are five students, each of them took five tests during the entire semester. Now I want to not only see their average scores of the five tests, but also want to label their average scores using letters. So how can I combine the "avg" command with the "case...when..."? I tried many times myself, but it didn't work. Some help? Thanks!

Accepted Answer

Not sure if you still need help, but if you do, can you please post what you have so far so we can provide better help?

Question 7

Doesn't a binary search just return a true or false value based on whether or not an element exists within an array? If so, it would not return the specific row where the author is 'jk rowling.'

Accepted Answer

That depends on the code! The most basic version of a binary search algorithm will do as you say and return a True or False. But many search algorithms use the logic of a basic binary search to return the location or value of the element.

Here, if we wanted every book where the author is 'jk rowling', it would be pretty simple to write a search algorithm that takes in the sorted index and the value "jk rowling", does a binary search to find the location of any row where the author is "jk rowling", keeps looking up and down from there until it finds a row where the author is not "jk rowling", and then returns the range of rows where the author is "jk rowling". This is more than a basic binary search, but it uses the ordered table and binary search to find the queried data quickly.
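A rough Python sketch of that idea follows. The function name `find_author_range` and the sample data are made up for illustration, and a real database engine is considerably more sophisticated.

```python
def find_author_range(sorted_authors, target):
    """Return (first, last) positions of target in a sorted list, or None."""
    low, high = 0, len(sorted_authors) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_authors[mid] < target:
            low = mid + 1
        elif sorted_authors[mid] > target:
            high = mid - 1
        else:
            # Found one matching row; walk up and down to find the whole run.
            first = last = mid
            while first > 0 and sorted_authors[first - 1] == target:
                first -= 1
            while last < len(sorted_authors) - 1 and sorted_authors[last + 1] == target:
                last += 1
            return first, last
    return None

index = ["austen", "jk rowling", "jk rowling", "jk rowling", "tolkien"]
print(find_author_range(index, "jk rowling"))  # (1, 3)
print(find_author_range(index, "gappah"))      # None
```

Python's standard library offers `bisect.bisect_left` and `bisect.bisect_right`, which find the two ends of such a range with two binary searches instead of walking row by row.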

Question 8

```
CREATE TABLE persons (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    fullname TEXT,
    age INTEGER);

INSERT INTO persons (fullname, age) VALUES ("Bobby McBobbyFace", "12");
INSERT INTO persons (fullname, age) VALUES ("Lucy BoBucie", "25");
INSERT INTO persons (fullname, age) VALUES ("Banana FoFanna", "14");
INSERT INTO persons (fullname, age) VALUES ("Shish Kabob", "20");
INSERT INTO persons (fullname, age) VALUES ("Fluffy Sparkles", "8");

CREATE table hobbies (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    person_id INTEGER,
    name TEXT);

INSERT INTO hobbies (person_id, name) VALUES (1, "drawing");
INSERT INTO hobbies (person_id, name) VALUES (1, "coding");
INSERT INTO hobbies (person_id, name) VALUES (2, "dancing");
INSERT INTO hobbies (person_id, name) VALUES (2, "coding");
INSERT INTO hobbies (person_id, name) VALUES (3, "skating");
INSERT INTO hobbies (person_id, name) VALUES (3, "rowing");
INSERT INTO hobbies (person_id, name) VALUES (3, "drawing");
INSERT INTO hobbies (person_id, name) VALUES (4, "coding");
INSERT INTO hobbies (person_id, name) VALUES (4, "dilly-dallying");
INSERT INTO hobbies (person_id, name) VALUES (4, "meowing");

CREATE table friends (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    person1_id INTEGER,
    person2_id INTEGER);

INSERT INTO friends (person1_id, person2_id)
    VALUES (1, 4);
INSERT INTO friends (person1_id, person2_id)
    VALUES (2, 3);

SELECT persons.fullname, hobbies.name as hobbies FROM persons
    JOIN hobbies
    ON hobbies.person_id = persons.id;

SELECT a.fullname, b.fullname, friends.person1_id, friends.person2_id FROM persons, friends
    JOIN persons a
    ON friends.person1_id = a.id
    JOIN persons b
    ON friends.person2_id = b.id;
```
I need some help with fixing this code.

Accepted Answer

In Step 2 of the Friendbook Challenge, we are asked to "use another `SELECT` with a `JOIN` to show the names of each pair of friends, based on the data in the `friends` table."

You only need the two `fullname` fields in the `SELECT` clause. Also, you don't need the `persons` table in the `FROM` clause, only in the `JOIN` clauses.

Question 9

This may be a dumb question. When you create an index, is it permanent? As I understand it, the table is indexed by default using the primary key. So if you are using autoincrement, which I assume is the only way to manage a large table's primary key, what happens if you index that table on another column? The primary key would be out of order, so in the above case, wouldn't an insert result in potentially 10 million more lookups for autoincrement to identify the next integer available? I apologize in advance if this is answered elsewhere.
One further question: if a row has been removed, is that integer reused as the next available primary key? At my job, our CRM is a database with a large number of tables. I've always been frustrated that we cannot purge the database of customer accounts that have been inactive. But after learning about how the relational information would need to match, I can see that a missed table in the purge could definitely destabilize the relationships between tables.

Accepted Answer

Hello Steve,

A primary key will always have an index. In fact, it _is_ an index with a unique constraint.
If you add another index, that means your table will have two indices. They don't interfere much with each other. So your lookup by the PK will not suffer.
It will suffer when you have indices on many columns, and you're querying all those columns. Overusing indices can work against you in some cases.

A table may have an auto_increment or not. If it does have auto_increment, you can leave the PK out of your insert statements. It will calculate the next value on its own.
It will memorize a value, add 1, and give that value to the next record.
When you delete a record, it will not decrease that memorized value. It will continue counting as though nothing happened.
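This behaviour can be observed in SQLite, where the keyword is spelled `AUTOINCREMENT`, using Python's built-in `sqlite3` module. The table and names below are made up for the demonstration, and other database engines may differ in the details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY AUTOINCREMENT, fullname TEXT)")

# The id column is left out of the INSERT; SQLite assigns 1, 2, 3.
for name in ("Ann", "Ben", "Cat"):
    conn.execute("INSERT INTO persons (fullname) VALUES (?)", (name,))

# Delete the newest row, then insert another one.
conn.execute("DELETE FROM persons WHERE id = 3")
cursor = conn.execute("INSERT INTO persons (fullname) VALUES (?)", ("Dan",))
new_id = cursor.lastrowid

ids = [row[0] for row in conn.execute("SELECT id FROM persons ORDER BY id")]
print(new_id)  # 4
print(ids)     # [1, 2, 4]
```

The deleted id 3 is not handed out again; the counter simply moves on to 4.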

It is very common for companies to keep customer data even if they're inactive. There are a number of reasons.
One is billing information and taxes. By law, you have to keep records of customers for a number of years. Or at least their invoices.
Keeping the records also allows you to get statistics.
It is possible to keep a separate archive database, but that opens a whole other can of worms.

Question 10

In this discussion you talked about typing "EXPLAIN QUERY PLAN" to see which kind of plan an engine uses. My question is: where do we write this statement?

Accepted Answer

Here is an example using the code at the end of the "Combine Multiple Joins" video:
```
EXPLAIN QUERY PLAN SELECT a.title, b.title FROM project_pairs
    JOIN student_projects a
    ON project_pairs.project1_id = a.id
    JOIN student_projects b
    ON project_pairs.project2_id = b.id;
```
You will have to type it yourself to see the results.

More efficient SQL with query planning and optimization