Question 1

In the previous video introducing chi squared, you indicated that the curve for k = 3 corresponds to the distribution of the sum of three squared samples from the standard normal. In this video, you indicate that the curve corresponding to the sum of 6 squared samples is k = 5, because now we must consider "degrees of freedom." If this is the case, then a chi squared test based on two squared differences of samples (perhaps corresponding to customers coming in only on Friday and Saturday) would be based on the k = 1 curve, which seems wrong to me. Can you explain?

Accepted Answer

The degrees of freedom (df) for a chi-squared test can be calculated as follows:

*For a two-way table* 
df = (m - 1)(n - 1) // where m = # of columns & n = # of rows

*For a one-way table*
df = k - 1 // where k equals the number of groups

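So in short, yes: a one-way table with 2 groups corresponds to 1 degree of freedom. Once the total is fixed, the second count is determined by the first, so only one of the two differences is free to vary.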

Hope this helps,
- Convenient Colleague
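
One way to check the two-group case numerically: the sketch below (hypothetical function names, made-up Friday/Saturday numbers) compares the usual two-term chi-squared statistic with the squared z-score of the Friday count alone. If the two agree, the statistic is driven by a single random quantity, which is what "1 degree of freedom" is meant to capture.

```python
import math

def chi_squared_two_groups(first_count, total, p_first):
    """Two-term chi-squared statistic for a one-way table with 2 groups."""
    expected_first = total * p_first
    expected_second = total * (1 - p_first)
    second_count = total - first_count  # fixed once the total is known
    return ((first_count - expected_first) ** 2 / expected_first
            + (second_count - expected_second) ** 2 / expected_second)

def squared_z_score(first_count, total, p_first):
    """Squared z-score of the first count under a binomial model."""
    mean = total * p_first
    variance = total * p_first * (1 - p_first)
    return (first_count - mean) ** 2 / variance

# Made-up example: 100 customers, 40% hypothesized for Friday, 50 observed.
friday_count, total, p_friday = 50, 100, 0.4
two_term = chi_squared_two_groups(friday_count, total, p_friday)
one_term = squared_z_score(friday_count, total, p_friday)
print(two_term, one_term, math.isclose(two_term, one_term))
```

For large totals the z-score is approximately standard normal, so its square follows the k = 1 curve even though two squared differences were added.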

Question 2

I understand that if the chi-square value exceeds the appropriate minimum value in the chi-square distribution table, taking into account the degrees of freedom, you can reject the null hypothesis. (And that the reverse is also true: if the chi-square value does not exceed the appropriate minimum value in the chi-square distribution table, you will accept the null hypothesis.) Can someone explain to me why this is? I do not understand the theory behind the minimum chi-square value well enough to see *why* we reject a chi-square value that exceeds the value in the distribution table.

Accepted Answer

The question you answer with the test can be rephrased like this: "if the shop owner's theory is right (i.e. what percentage of customers come each day), what is the probability of seeing the given observations (30 on Monday, 14 on Tuesday, etc.) or something more unlikely?"
This is the question you answer with the test, and you can calculate that probability exactly (or you can use tables). In this case it is just below 5%.
So, if the hypothesis is right and you make observations for a week, then there is an almost 5% chance that you see what you see or something even less likely.
The 5% significance criterion is a subjective choice. Some use 1%, some 5%, some 0.5%. If I generally trusted the shop owner, and knew that he had kept track of customers for a long period, and was a clever guy, then I would still believe his hypothesis. I mean, after all, 5% corresponds to 1 in 20 - it is not a veeery rare observation. On the other hand, if I knew the shop owner was sloppy with numbers, and had a tendency to lie, etc., then I would be more likely to reject the hypothesis on the basis of my observations. However, before I confronted him, I think I would observe another week to get more certain knowledge.

A lot of talking, sorry! My point here: you get the probability from the test. That is your result! What significance level to choose depends on the situation. Sometimes it might be life changing - if it was a test for some disease, I would never be satisfied with a 5% risk. Say the doctor tells me: there is only a 5% chance that you have that life-threatening disease, given the test result, so you can go home. Then I would ask for another test! But if I was in the line for a super discount offer on Black Friday, and a clever person had calculated that there was only a 5% chance that I would get the item before it was sold out, then I would step out of the line immediately. It depends on consequences, risk, what I already know and many other things!

Long answer, hope it made sense! :-)
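
To make the comparison with the table value concrete, here is a minimal sketch of the statistic itself. Only Monday's 30 and Tuesday's 14 appear in the answer above; the remaining counts and the owner's hypothesized percentages are assumed values chosen for illustration, and the names are hypothetical. The statistic adds up how far each day is from what the hypothesis predicts, so a larger value means the observations fit the hypothesis worse, and the table value marks the point beyond which only 5% of outcomes would fall if the hypothesis were true.

```python
# Hypothesized share of customers per day (assumed for illustration).
expected_share = {"Mon": 0.10, "Tue": 0.10, "Wed": 0.15,
                  "Thu": 0.20, "Fri": 0.30, "Sat": 0.15}
# Observed counts; only Mon=30 and Tue=14 come from the discussion,
# the other days are assumed.
observed = {"Mon": 30, "Tue": 14, "Wed": 34,
            "Thu": 45, "Fri": 57, "Sat": 20}

def chi_squared_statistic(observed, expected_share):
    """Sum over days of (observed - expected)^2 / expected."""
    total = sum(observed.values())
    statistic = 0.0
    for day, count in observed.items():
        expected = total * expected_share[day]
        statistic += (count - expected) ** 2 / expected
    return statistic

statistic = chi_squared_statistic(observed, expected_share)
df = len(observed) - 1      # one-way table: number of groups minus 1
critical_value = 11.07      # 5% table value for df = 5, as quoted in this thread
reject = statistic > critical_value

print(f"statistic = {statistic:.2f}, df = {df}, reject at 5%: {reject}")
```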

Question 3

Had we counted the legs of visitors instead of the visitors themselves, and assuming each visitor has two, our chi-squared value would be twice as big for the same effective statistical question. It is thus incorrect to count legs; indeed, leg counts never take odd values. It also seems that values which are not discrete, such as the amount of food eaten each day, will give different results depending on the physical units of mass. Is it thus also incorrect to use continuous random variables for chi-squared statistics?

Accepted Answer

This is an interesting question. I don't really know.
But I want to say that if, say, John has two legs, and his two legs make independent random choices (say one leg actually comes to the restaurant on Monday, the other on Friday), not always making the same decision, then it would make sense to count legs. If not, I don't think you can count legs, Jo.
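
The scaling effect described in the question is easy to check. The sketch below uses made-up numbers and hypothetical helper names: multiplying every observed and expected value by a constant multiplies the statistic by that same constant, whether the constant is "2 legs per visitor" or "1000 grams per kilogram".

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def rescale(values, factor):
    """Multiply every value by the same factor (e.g. legs per visitor)."""
    return [v * factor for v in values]

# Made-up visitor counts for two days, and the same data counted as legs.
visitors_observed = [60, 40]
visitors_expected = [50, 50]
visitors_stat = chi_squared_statistic(visitors_observed, visitors_expected)
legs_stat = chi_squared_statistic(rescale(visitors_observed, 2),
                                  rescale(visitors_expected, 2))

# Made-up food amounts in kilograms, and the same amounts in grams.
food_kg_observed = [2.5, 1.5]
food_kg_expected = [2.0, 2.0]
kg_stat = chi_squared_statistic(food_kg_observed, food_kg_expected)
g_stat = chi_squared_statistic(rescale(food_kg_observed, 1000),
                               rescale(food_kg_expected, 1000))

print(visitors_stat, legs_stat)  # legs give twice the visitor statistic
print(kg_stat, g_stat)           # grams give 1000 times the kilogram statistic
```

Because the result depends on the unit, the statistic only has its usual interpretation when each counted item is a separate, independent observation, which matches the point about independent legs above.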

Question 4

Can someone recommend a statistics book that is written in plain English, in a way that a novice like me can understand and apply in real, practical situations? It would be even better if it could give me some insight and an intuitive feeling for why statistical tests are the way they are. Thanks a lot!

Accepted Answer

I've never read through it myself, but I've heard a lot of people say great things about "The Cartoon Guide to Statistics", by Gonick and Smith. I don't think it has any actual examples, but it is so strong on the plain English and intuition that you might be able to make better sense of your statistics textbook afterwards.

Question 5

Would I be correct in assuming that there is a software program which is able to do all this math for us? Here we just have a few data points. How could a human mind do all the calculations for data points which number in the 1,000s or even billions?

Accepted Answer

(This is Spencer's mother) Julius, there are several software programs that can do statistical analysis, and MS Excel is probably the one that most students would have access to quickly. "R" is a free program, as is PSPP (a free alternative to SPSS), and there are a few others (Stata, SAS), but these require writing code. (SPSS allows drag and drop.) I think the most efficient way to learn this would be in Excel. You would need to use the 'add-in' for data analysis, but Excel can do lots of statistical analysis (and, by the way, provides the critical value at which the null is rejected) in the descriptive statistics. You should try it! There are lots of free data sets available in CSV format that you can download and play with!

Question 6

Are the charts mentioned at the 9:00 marker something that is given, or would I have to fill them in myself while solving the problem? If so, how would I go about doing that?

Accepted Answer

You do not memorize the values. If you are doing homework/classwork, the tables are provided at the end of the book. On a test, the table should be attached to the test or the teacher should allow you to look up the critical value for a specified level of significance and degrees of freedom using the text. I hope this helps.

Question 7

Just food for thought.

Using a calculator and chi2_Cdf[11.44, infinity, 5] (the chance of getting a value of 11.44 or higher) results in a 4.33% chance. And if you think about it, if 11.07 gives 0.05, a higher number (namely 11.44) results in a lower p-value.
So at what point should we start approximating? Because in this example that fact results in a different answer.

Accepted Answer

I'm not sure what the problem is, if in fact you're trying to point out a problem. It is natural that a higher chi-square value - in your case, 11.44 - has a lower p-value than, say, 11.07. This is because the p-value is the probability of getting a statistic at least as extreme as the one you calculated purely by chance, assuming the null hypothesis is true. In other words, you are taking the area to the right of the value, and since moving the left bound further to the right cuts off more of the distribution, the area can only decrease.
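
The two numbers in the question can be reproduced without a table. The sketch below is one possible stdlib-only implementation, with a hypothetical helper name `chi2_sf`; it uses the standard closed-form expressions for the right-tail area of a chi-squared distribution with whole-number degrees of freedom. In practice a statistics package would normally be used instead.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(X >= x) for a chi-squared distribution, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite correction series.
    root = math.sqrt(x)
    normal_tail = math.erfc(root / math.sqrt(2))
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return normal_tail + 2 * density * total

p_at_critical = chi2_sf(11.07, 5)   # close to 0.05
p_at_observed = chi2_sf(11.44, 5)   # roughly 4.3%, as the calculator reported
print(f"{p_at_critical:.4f} {p_at_observed:.4f}")
```

The second value is smaller than the first because the area is taken to the right of a larger starting point.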

Pearson's chi square test (goodness of fit)
