Inference for Distributions of Categorical Data
The Practice of Statistics for AP · 110 exercises
Q.1.1
Mars, Inc., reports that their M&M’S Peanut Chocolate Candies are produced according to the following color distribution: 23% each of blue and orange, 15% each of green and yellow, and 12% each of red and brown. Joey bought a bag of Peanut Chocolate Candies and counted the colors of the candies in his sample: 12 blue, 7 orange, 13 green, 4 yellow, 8 red, and 2 brown.
State appropriate hypotheses for testing the company’s claim about the color distribution of peanut M&M’S.
2 step solution
Q.1.2
Mars, Inc., reports that their M&M’S Peanut Chocolate Candies are produced according to the following color distribution: 23% each of blue and orange, 15% each of green and yellow, and 12% each of red and brown. Joey bought a bag of Peanut Chocolate Candies and counted the colors of the candies in his sample: 12 blue, 7 orange, 13 green, 4 yellow, 8 red, and 2 brown.
Calculate the expected count for each color, assuming that the company’s claim is true. Show your work.
2 step solution
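The arithmetic for the expected counts can be checked with a short Python sketch. It uses only the claimed proportions and Joey's counts from the exercise; the variable names are illustrative, not part of the textbook.

```python
# Claimed color proportions and Joey's observed counts, as stated in the exercise.
claimed = {"blue": 0.23, "orange": 0.23, "green": 0.15,
           "yellow": 0.15, "red": 0.12, "brown": 0.12}
observed = {"blue": 12, "orange": 7, "green": 13,
            "yellow": 4, "red": 8, "brown": 2}

n = sum(observed.values())  # total number of candies in Joey's bag

# Expected count = sample size * claimed proportion for that color.
expected = {color: n * p for color, p in claimed.items()}

for color, count in expected.items():
    print(f"{color:>6}: {count:.2f}")
```

The expected counts need not be whole numbers; they are long-run averages under the company's claim.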
Q.1.3
Mars, Inc., reports that their M&M’S Peanut Chocolate Candies are produced according to the following color distribution: 23% each of blue and orange, 15% each of green and yellow, and 12% each of red and brown. Joey bought a bag of Peanut Chocolate Candies and counted the colors of the candies in his sample: 12 blue, 7 orange, 13 green, 4 yellow, 8 red, and 2 brown.
Calculate the chi-square statistic for Joey’s sample. Show your work.
2 step solution
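As a rough check on the hand calculation, the chi-square statistic (the sum of (observed - expected)^2 / expected over all colors) can be sketched in Python. The lists below are in the order blue, orange, green, yellow, red, brown; the names are illustrative.

```python
# Order: blue, orange, green, yellow, red, brown.
claimed = [0.23, 0.23, 0.15, 0.15, 0.12, 0.12]
observed = [12, 7, 13, 4, 8, 2]

n = sum(observed)
expected = [n * p for p in claimed]

# One term per color: (observed - expected)^2 / expected.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(terms)

print(round(chi_square, 2))
```

Printing the individual entries of `terms` shows which colors contribute most to the statistic.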
Q. 2.1
Let’s continue our analysis of Joey’s sample of M&M’S Peanut Chocolate Candies from the previous Check Your Understanding (page 681). To recap the setting:
Mars, Inc., reports that the candies are produced according to the following color distribution: 23% each of blue and orange, 15% each of green and yellow, and 12% each of red and brown. Joey bought a bag of Peanut Chocolate Candies and counted the colors of the candies in his sample: 12 blue, 7 orange, 13 green, 4 yellow, 8 red, and 2 brown.
1. Confirm that the expected counts are large enough to use a chi-square distribution. Which distribution (specify the degrees of freedom) should we use?
2 step solution
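The Large Counts check and the degrees of freedom can be sketched as follows. This assumes the usual rule of thumb that every expected count should be at least 5, and df = (number of categories) - 1 for a goodness-of-fit test.

```python
# Order: blue, orange, green, yellow, red, brown.
claimed = [0.23, 0.23, 0.15, 0.15, 0.12, 0.12]
observed = [12, 7, 13, 4, 8, 2]

n = sum(observed)
expected = [n * p for p in claimed]

# Large Counts condition: all expected counts are at least 5.
large_counts_ok = all(e >= 5 for e in expected)

# Goodness-of-fit test: df = number of categories - 1.
df = len(claimed) - 1

print(min(expected), large_counts_ok, df)
```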
Q. 2.2
Let’s continue our analysis of Joey’s sample of M&M’S Peanut Chocolate Candies from the previous Check Your Understanding (page 681).
The claimed color distribution is 23% each of blue and orange, 15% each of green and yellow, and 12% each of red and brown. Joey bought a bag of Peanut Chocolate Candies and counted the colors of the candies in his sample: 12 blue, 7 orange, 13 green, 4 yellow, 8 red, and 2 brown.
2. Sketch a graph like Figure 11.4 that shows the P-value.
3 step solution
Q. 2.3
Let’s continue our analysis of Joey’s sample of M&M’S Peanut Chocolate Candies from the previous Check Your Understanding (page 681).
3. Use Table C to find the P-value. Then use your calculator’s cdf command.
2 step solution
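For readers without a calculator, the upper-tail area that the cdf command returns can be approximated in plain Python. The helper below is a hypothetical stand-in, not a calculator or library function: it evaluates the chi-square upper tail through the standard series for the regularized lower incomplete gamma function.

```python
import math

def chi_square_p_value(statistic, df, terms=200):
    """Approximate P(X >= statistic) for a chi-square distribution with df degrees of freedom."""
    a, x = df / 2, statistic / 2
    if x <= 0:
        return 1.0
    # Series: gamma(a, x) = x^a * e^(-x) * sum_k x^k / (a * (a+1) * ... * (a+k)).
    term = 1.0 / a
    total = term
    for k in range(1, terms):
        term *= x / (a + k)
        total += term
    # Divide by Gamma(a) to get the regularized lower tail P(a, x).
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return 1.0 - lower

# Joey's sample: recompute the statistic from the counts in the exercise.
claimed = [0.23, 0.23, 0.15, 0.15, 0.12, 0.12]
observed = [12, 7, 13, 4, 8, 2]
n = sum(observed)
expected = [n * p for p in claimed]
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

p_value = chi_square_p_value(chi_square, df=len(claimed) - 1)
print(round(p_value, 4))
```

The series converges quickly for statistics of this size; for very large statistics a dedicated statistics library would be a safer choice.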
Q. 2.4
Let’s continue our analysis of Joey’s sample of M&M’S Peanut Chocolate Candies from the previous Check Your Understanding (page 681).
4. What conclusion would you draw about the company’s claimed color distribution for M&M’S Peanut Chocolate Candies? Justify your answer.
2 step solution
Q.3.1
Biologists wish to mate pairs of fruit flies having genetic makeup RrCc, indicating that each has one dominant gene (R) and one recessive gene (r) for eye color, along with one dominant (C) and one recessive (c) gene for wing type. Each offspring will receive one gene for each of the two traits from each parent. The following Punnett square shows the possible combinations of genes received by the offspring:
Any offspring receiving an R gene will have red eyes, and any offspring receiving a C gene will have straight wings. So based on this Punnett square, the biologists predict a ratio of 9 red-eyed, straight-winged (x):3 red-eyed, curly-winged (y):3 white-eyed, straight-winged (z):1 white-eyed, curly-winged (w) offspring. To test their hypothesis about the distribution of offspring, the biologists mate a random sample of pairs of fruit flies. Of 200 offspring, 99 had red eyes and straight wings, 42 had red eyes and curly wings, 49 had white eyes and straight wings, and 10 had white eyes and curly wings. Do these data differ significantly from what the biologists have predicted? Carry out a test at the α = 0.01 significance level.
3 step solution
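The calculations for the fruit fly test can be sketched in Python. The expected counts come from the 9:3:3:1 ratio applied to the 200 offspring. The P-value line uses a closed form that holds only for 3 degrees of freedom; it is an illustrative shortcut, not the textbook's method.

```python
import math

ratio = [9, 3, 3, 1]           # predicted ratio x : y : z : w
observed = [99, 42, 49, 10]    # counts from the exercise, same order

n = sum(observed)
expected = [n * r / sum(ratio) for r in ratio]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(ratio) - 1  # 3

# Upper-tail area for df = 3 only:
# P(X >= x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
p_value = (math.erfc(math.sqrt(chi_square / 2))
           + math.sqrt(2 * chi_square / math.pi) * math.exp(-chi_square / 2))

alpha = 0.01
reject_h0 = p_value < alpha

print(expected, round(chi_square, 2), round(p_value, 3), reject_h0)
```

Because the P-value exceeds 0.01, the sketch does not reject the predicted distribution at that level.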
Q.1
Aw, nuts! A company claims that each batch of its deluxe mixed nuts contains cashews, almonds, macadamia nuts, and 8% brazil nuts. To test this claim, a quality control inspector takes a random sample of nuts from the latest batch. The one-way table below displays the sample data.
(a) State appropriate hypotheses for performing a test of the company’s claim.
(b) Calculate the expected counts for each type of nut. Show your work.
3 step solution
Q. 2
Roulette Casinos are required to verify that their games operate as advertised. American roulette wheels have 38 slots: 18 red, 18 black, and 2 green. In one casino, managers record data from a random sample of spins of one of their American roulette wheels. The one-way table below displays the results.
(a) State appropriate hypotheses for testing whether these data give convincing evidence that the distribution of outcomes on this wheel is not what it should be.
(b) Calculate the expected counts for each color. Show your work.
4 step solution
Q.3
Aw, nuts! Calculate the chi-square statistic for the data in Exercise 1. Show your work.
3 step solution
Q.4
Roulette Calculate the chi-square statistic for the data in Exercise 2. Show your work.
3 step solution
Q. 5
Refer to Exercises 1 and 3.
(a) Confirm that the expected counts are large enough to use a chi-square distribution. Which distribution (specify the degrees of freedom) should you use?
(b) Sketch a graph like Figure 11.4 (page 683) that shows the P-value.
(c) Use Table C to find the P-value. Then use your calculator’s χ²cdf command.
(d) What conclusion would you draw about the company’s claimed distribution for its deluxe mixed nuts? Justify your answer.
8 step solution
Q.6
Refer to Exercises 2 and 4.
(a) Confirm that the expected counts are large enough to use a chi-square distribution. Which distribution (specify the degrees of freedom) should you use?
(b) Sketch a graph like Figure 11.4 (page 683) that shows the P-value.
(c) Use Table C to find the P-value. Then use your calculator’s χ²cdf command.
(d) What conclusion would you draw about whether the roulette wheel is operating correctly? Justify your answer.
7 step solution
Q.7
Birds in the trees Researchers studied the behavior of birds that were searching for seeds and insects in an Oregon forest. In this forest, % of the trees were Douglas firs, % were ponderosa pines, and % were other types of trees. At a randomly selected time during the day, the researchers observed red-breasted nuthatches: were seen in Douglas firs, in ponderosa pines, and in other types of trees. Do these data suggest that nuthatches prefer particular types of trees when they’re searching for seeds and insects? Carry out a chi-square goodness-of-fit test to help answer this question.
4 step solution
Q.8
Seagulls by the seashore Do seagulls show a preference for where they land? To answer this question, biologists conducted a study in an enclosed outdoor space with a piece of shore whose area was made up of % sand, % mud, and % rocks. The biologists chose seagulls at random. Each seagull was released into the outdoor space on its own and observed until it landed somewhere on the piece of shore. In all, seagulls landed on the sand, landed in the mud, and landed on the rocks. Carry out a chi-square goodness-of-fit test. What do you conclude?
2 step solution
Q.9
No chi-square A school’s principal wants to know if students spend about the same amount of time on homework each night of the week. She asks a random sample of students to keep track of their homework time for a week. The following table displays the average amount of time (in minutes) students reported per night:
Explain carefully why it would not be appropriate to perform a chi-square goodness-of-fit test using these data.
2 step solution
Q.10
No chi-square The principal in Exercise 9 also asked the random sample of students to record whether they did all of the homework that was assigned on each of the five school days that week. Here are the data:
Explain carefully why it would not be appropriate to perform a chi-square goodness-of-fit test using these data.
2 step solution
Q.11
Benford’s law Faked numbers in tax returns, invoices, or expense account claims often display patterns that aren’t present in legitimate records. Some patterns are obvious and easily avoided by a clever crook. Others are more subtle. It is a striking fact that the first digits of numbers in legitimate records often follow a model known as Benford’s law. Call the first digit of a randomly chosen record X for short. Benford’s law gives this probability model for X (note that a first digit can’t be 0):
A forensic accountant who is familiar with Benford’s law inspects a random sample of invoices from a company that is accused of committing fraud. The table below displays the sample data.
(a) Are these data inconsistent with Benford’s law? Carry out an appropriate test at the level to support your answer. If you find a significant result, perform follow-up analysis.
(b) Describe a Type I error and a Type II error in this setting, and give a possible consequence of each. Which do you think is more serious?
5 step solution
Q.12
Housing According to the Census Bureau, the distribution by ethnic background of the New York City population in a recent year was
The manager of a large housing complex in the city wonders whether the distribution by race of the complex’s residents is consistent with the population distribution. To find out, she records data from a random sample of residents. The table below displays the sample data.
Are these data significantly different from the city’s distribution by race? Carry out an appropriate test at the level to support your answer. If you find a significant result, perform follow-up analysis.
2 step solution
Q.13
Skittles Statistics teacher Jason Molesky contacted Mars, Inc., to ask about the color distribution for Skittles candies. Here is an excerpt from the response he received: “The original flavor blend for the SKITTLES BITE SIZE CANDIES is lemon, lime, orange, strawberry, and grape. They were chosen as a result of consumer preference tests we conducted. The flavor blend is 20 percent of each flavor.”
(a) State appropriate hypotheses for a significance test of the company’s claim.
(b) Find the expected counts for a bag of Skittles with 60 candies.
(c) How large a χ² statistic would you need to get in order to have significant evidence against the company’s claim at the α = 0.05 level? At the α = 0.01 level?
(d) Create a set of observed counts for a bag with 60 candies that gives a P-value between 0.01 and 0.05. Show the calculation of your chi-square statistic.
8 step solution
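Parts (b) through (d) can be explored with a short Python sketch. It relies on a closed form for the chi-square upper tail that holds only for 4 degrees of freedom (five flavors minus one), and it finds critical values by bisection. The observed counts below are one hypothetical bag invented for illustration, not data from the exercise.

```python
import math

N_CANDIES = 60
flavors = ["lemon", "lime", "orange", "strawberry", "grape"]
expected = [N_CANDIES * 0.20 for _ in flavors]  # 12 of each flavor

def p_value_df4(x):
    """Upper-tail area for a chi-square distribution with exactly 4 df."""
    return math.exp(-x / 2) * (1 + x / 2)

def critical_value_df4(alpha):
    """Find x with p_value_df4(x) = alpha by bisection (the P-value decreases in x)."""
    lo, hi = 0.0, 100.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if p_value_df4(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical observed counts for one 60-candy bag (illustration only).
observed = [18, 18, 8, 8, 8]
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value = p_value_df4(chi_square)

print(round(critical_value_df4(0.05), 2), round(critical_value_df4(0.01), 2))
print(round(chi_square, 2), round(p_value, 4))
```

Any set of counts that sums to 60 and gives a statistic between the two critical values would serve equally well for part (d).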
Q.14
Use your calculator’s RandInt function to generate 200 digits from 0 to 9 and store them in a list.
(a) State appropriate hypotheses for a chi-square goodness-of-fit test to determine whether your calculator’s random number generator gives each digit an equal chance to be generated.
(b) Carry out the test. Report your observed counts, expected counts, chi-square statistic, P-value, and your conclusion.
5 step solution
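The same experiment can be mimicked in Python, with `random.randint` standing in for the calculator's RandInt. This is a sketch under stated assumptions: the digits run from 0 to 9, the seed is arbitrary, and the comparison uses α = 0.05, a level the exercise does not specify.

```python
import random
from collections import Counter

random.seed(11)  # arbitrary seed so the sketch is reproducible

# Generate 200 random digits, each from 0 to 9 inclusive.
digits = [random.randint(0, 9) for _ in range(200)]

tally = Counter(digits)
observed = [tally[d] for d in range(10)]
expected = [200 / 10] * 10  # 20 of each digit if all are equally likely

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = 10 - 1

# Chi-square critical value for df = 9 at alpha = 0.05 (assumed level).
CRITICAL_VALUE_05 = 16.92
reject_h0 = chi_square > CRITICAL_VALUE_05

print(observed, round(chi_square, 2), reject_h0)
```

Changing the seed gives a different sample, so the observed counts and the statistic will vary from run to run, just as they would between calculators.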
Q.15
The University of Chicago’s General Social Survey (GSS) is the nation’s most important social science sample survey. For reasons known only to social scientists, the GSS regularly asks a random sample of people their astrological sign. Here are the counts of responses from a recent GSS:
If births are spread uniformly across the year, we expect all 12 signs to be equally likely. Are these data inconsistent with that belief? Carry out an appropriate test to support your answer. If you find a significant result, perform a follow-up analysis.
4 step solution
Q.16
Kellogg's Froot Loops cereal comes in six fruit flavors: orange, lemon, cherry, raspberry, blueberry, and lime. Charise poured out her morning bowl of cereal and counted the number of cereal pieces of each flavor. Here are her data:
Test the null hypothesis that the population of Froot Loops produced by Kellogg's contains an equal proportion of each flavor. If you find a significant result, perform a follow-up analysis.
3 step solution
Q.17
Gregor Mendel (1822–1884), an Austrian monk, is considered the father of genetics. Mendel studied the inheritance of various traits in pea plants. One such trait is whether the pea is smooth or wrinkled. Mendel predicted a ratio of 3 smooth peas for every 1 wrinkled pea. In one experiment, he observed smooth and wrinkled peas. The data were produced in such a way that the Random and Independent conditions are met. Carry out a chi-square goodness-of-fit test based on Mendel’s prediction. What do you conclude?
3 step solution
Q.18
The paper “Linkage Studies of the Tomato” (Transactions of the Canadian Institute, 1931) reported the following data on phenotypes resulting from crossing tall cut-leaf tomatoes with dwarf potato-leaf tomatoes. We wish to investigate whether the following frequencies are consistent with genetic laws, which state that the phenotypes should occur in the ratio .
The data were produced in such a way that the Random and Independent conditions are met. Carry out a chi-square goodness-of-fit test using these data. What do you conclude?
3 step solution
Q.19
An appropriate null hypothesis to test whether the trees in the forest are randomly distributed is
(a) , where the mean number of trees in each quadrant.
(b) , where the proportion of all trees in the forest that are in Quadrant
(c) , where is the number of trees from the sample in Quadrant .
(d) , where is the actual proportion of trees in the forest that are in Quadrant .
(e) , where is the proportion of trees in the sample that are in Quadrant
2 step solution
Q.20
The chi-square statistic is
(a)
(b)
(c)
(d)
(e)
3 step solution
Q.21
The P-value for a chi-square goodness-of-fit test is . The correct conclusion is
(a) reject at ; there is strong evidence that the trees are randomly distributed.
(b) reject at ; there is not strong evidence that the trees are randomly distributed.
(c) reject at ; there is strong evidence that the trees are not randomly distributed.
(d) fail to reject at ; there is not strong evidence that the trees are randomly distributed.
(e) fail to reject at ; there is strong evidence that the trees are randomly distributed.
3 step solution
Q.22
Your teacher prepares a large container full of colored beads. She claims that of the beads are red, are blue, and the remainder are yellow. Your class will take a simple random sample of beads from the container to test the teacher’s claim. The smallest number of beads you can take so that the conditions for performing inference are met is
(a) 15.
(b) 16.
(c) 30.
(d) 40.
(e) 80.
2 step solution
Q.23
Do students who read more books for pleasure tend to earn higher grades in English? The boxplots below show data from a simple random sample of students at a large high school. Students were classified as light readers if they read fewer than books for pleasure per year. Otherwise, they were classified as heavy readers. Each student's average English grade for the previous two marking periods was converted to a GPA scale where and so on.
Reading and grades (1.3) Write a few sentences comparing the distributions of English grades for light and heavy readers.
2 step solution
Q.24
Reading and grades (10.2) Summary statistics for the two groups from Minitab are provided below.
(a) Explain why it is acceptable to use two-sample t procedures in this setting.
(b) Construct and interpret a confidence interval for the difference in the mean English grade for light and heavy readers.
(c) Does the interval in part (b) provide convincing evidence that reading more causes an increase in students’ English grades? Justify your answer.
6 step solution
Q.25
Reading and grades (3.2) The Fathom scatterplot below shows the number of books read and the English grade for all students in the study. A least-squares regression line has been added to the graph.
(a) Interpret the meaning of the y-intercept in context.
(b) The student who reported reading books for pleasure had an English GPA of . Find this student’s residual. Show your work.
(c) How strong is the relationship between English grades and the number of books read? Give appropriate evidence to support your answer.
6 step solution
Q.26
Yahtzee In the game of Yahtzee, five six-sided dice are rolled simultaneously. To get a Yahtzee, the player must get the same number on all five dice.
(a) Luis says that the probability of getting a Yahtzee in one roll of the dice is . Explain why Luis is wrong.
(b) Nassir decides to keep rolling all 5 dice until he gets a Yahtzee. He is surprised when he still hasn’t gotten a Yahtzee after rolls. Should he be? Calculate an appropriate probability to support your answer.
4 step solution
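The probabilities involved can be sketched in Python. With five fair six-sided dice there are 6^5 equally likely outcomes, and 6 of them show the same number on every die. The function below takes the number of rolls as a parameter because the exercise text above does not state it; the function name is illustrative.

```python
from fractions import Fraction

NUM_DICE = 5
SIDES = 6

# 6 favorable outcomes (all ones, all twos, ...) out of 6^5 equally likely outcomes.
p_yahtzee = Fraction(SIDES, SIDES ** NUM_DICE)

def prob_no_yahtzee(n_rolls):
    """Probability of no Yahtzee in n_rolls independent rolls of all five dice."""
    return float((1 - p_yahtzee) ** n_rolls)

print(p_yahtzee)           # probability of a Yahtzee on a single roll
print(prob_no_yahtzee(1))  # probability of missing on a single roll
```

Calling `prob_no_yahtzee` with Nassir's number of rolls gives the probability needed for part (b).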
Q.1.1
The Pennsylvania State University has its main campus in the town of State College and more than smaller “commonwealth campuses” around the state. The Penn State Division of Student Affairs polled separate random samples of undergraduates from the main campus and commonwealth campuses about their use of online social networking. Facebook was the most popular site, with more than of students having an account. Here is a comparison of Facebook use by undergraduates at the main campus and commonwealth campuses who have a Facebook account:
Calculate the conditional distribution (in proportions) of Facebook use for each campus setting.
2 step solution
Q.1.2
Why is it important to compare proportions rather than counts in Question 1?
2 step solution
Q.1.3
Make a bar graph that compares the two conditional distributions. What are the most important differences in Facebook use between the two campus settings?
2 step solution
Q.2.1
In the previous Check Your Understanding (page 698), we presented data on the use of Facebook by two randomly selected groups of Penn State students. Here are the data once again.
Do these data provide convincing evidence of a difference in the distributions of Facebook use among students in the two campus settings?
State appropriate null and alternative hypotheses for a significance test to help answer this question.
2 step solution
Q.2.2
Calculate the expected counts. Show your work.
2 step solution
Q.2.3
Calculate the chi-square statistic. Show your work.
2 step solution
Q. 3.1
Use Table C to find the P-value. Then use your calculator’s χ²cdf command.
2 step solution
Q. 3.2
Interpret the P-value from the calculator in context.
2 step solution
Q. 3.3
What conclusion would you draw? Justify your answer.
2 step solution
Q. 4.1
Canada has universal health care. The United States does not but often offers more elaborate treatment to patients with access. How do the two systems compare in treating heart attacks? Researchers compared random samples of U.S. and Canadian heart attack patients. One key outcome was the patients’ own assessment of their quality of life relative to what it had been before the heart attack. Here are the data for the patients who survived a year:
Construct an appropriate graph to compare the distributions of opinions about the quality of life among heart attack patients in Canada and the United States.
3 step solution
Q. 4.2
Canada has universal health care. The United States does not but often offers more elaborate treatment to patients with access. How do the two systems compare in treating heart attacks? Researchers compared random samples of U.S. and Canadian heart attack patients. One key outcome was the patients’ own assessment of their quality of life relative to what it had been before the heart attack. Here are the data for the patients who survived a year:
Is there a significant difference between the two distributions of quality-of-life ratings? Carry out an appropriate test at the level.
3 step solution
Q. 5.1
Sample surveys on sensitive issues can give different results depending on how the question is asked. A University of Wisconsin study randomly divided 2400 respondents into three groups. All participants were asked if they had ever used cocaine. One group of 800 was interviewed by phone; 21% said they had used cocaine. Another 800 people were asked the question in a one-on-one personal interview; 25% said “Yes.” The remaining 800 were allowed to make an anonymous written response; 28% said “Yes.”
1. Was this an experiment or an observational study? Justify your answer.
2 step solution