**In This Part: Sample Size 20**
All of our estimates thus far have been based on a sample size of 10 randomly selected sub-regions out of 100. In this part, we will examine the effects of changing the sample size to 20 sub-regions.

Here is a sequence of 20 random numbers selected by sampling without replacement:

**81 48 66 94 87 60 51 30 92 97 00 41 27 12 38 64 93 79 50 59**
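As an illustration of how such a sequence can be drawn, here is a minimal Python sketch of sampling without replacement. It assumes the sub-regions are labeled 00 through 99, as the two-digit numbers above suggest; the seed is arbitrary, so the sketch will not reproduce the exact sequence shown.

```python
import random

# Sub-regions are assumed to be labeled 00-99, matching the two-digit labels above.
labels = [f"{i:02d}" for i in range(100)]

# Arbitrary seed (an assumption), used only so the draw is repeatable.
rng = random.Random(2024)

# sample() draws without replacement, so no label can appear twice.
sample_labels = rng.sample(labels, 20)

print(" ".join(sample_labels))
```

Because `sample()` never returns the same item twice, each sub-region can be chosen at most once, which is what "without replacement" means.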

Here is the corresponding sample of 20 sub-regions:

As before, we estimate the total number of penguins in the region by finding the mean of our sample and then multiplying by 100 (the number of sub-regions):

100 x [(5 + 6 + 5 + 6 + 3 + 7 + 4 + 5 + 5 + 7 + 5 + 5 + 4 + 4 + 5 + 6 + 7 + 4 + 5 + 4)/20] = 510
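The same arithmetic can be checked with a short Python sketch. The counts are copied from the calculation above; the variable names are only illustrative.

```python
# Penguin counts for the 20 sampled sub-regions, copied from the calculation above.
counts = [5, 6, 5, 6, 3, 7, 4, 5, 5, 7, 5, 5, 4, 4, 5, 6, 7, 4, 5, 4]

NUM_SUBREGIONS = 100  # total number of sub-regions in the region

# Multiply before dividing so the arithmetic stays exact: 100 * 102 / 20.
estimate = NUM_SUBREGIONS * sum(counts) / len(counts)

print(len(counts), sum(counts), estimate)  # prints: 20 102 510.0
```

The sample total is 102, so the sample mean is 5.1 penguins per sub-region, and scaling up to 100 sub-regions gives 510.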

This estimate is quite accurate: 510 is within 10 of the actual number of penguins (500). Let’s now investigate the effect that increasing the sample size has on the accuracy of our estimation procedure.

**In This Part: Comparing Sample Sizes 10 and 20**
In order to investigate whether samples of 20 sub-regions are more likely to produce better estimates than samples of 10 sub-regions, you will need to consider repeated sampling results for samples of size 20.
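Repeated sampling of this kind can be sketched in Python. The actual penguin counts for the 100 sub-regions are not reproduced here, so the sketch uses a hypothetical stand-in population whose counts total 500; the function names and the seed are also assumptions made for illustration, and the proportions it prints will not match the course's results.

```python
import random

# HYPOTHETICAL population: 100 sub-region counts that total 500 penguins.
# The real counts are not available here; this stand-in is an assumption.
population = [3, 4, 5, 6, 7] * 20

def estimate_total(rng, sample_size):
    """Estimate the total from one random sample drawn without replacement."""
    sample = rng.sample(population, sample_size)
    return len(population) * sum(sample) / sample_size

def share_within(estimates, low=450, high=550):
    """Fraction of estimates in the interval low <= estimate < high."""
    return sum(low <= e < high for e in estimates) / len(estimates)

rng = random.Random(1)  # arbitrary seed, chosen only for repeatability
for n in (10, 20):
    # 100 repeated samples of size n, each producing one estimate.
    estimates = [estimate_total(rng, n) for _ in range(100)]
    print(f"n={n}: {share_within(estimates):.2f} of estimates in [450, 550)")
```

Each pass of the loop mimics one of the two distributions of 100 estimates, and the printed proportion corresponds to the share of estimates that land in the 4H and 5L stems.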

Here is the stem and leaf plot for 100 estimates of sample size 10:

Here is the stem and leaf plot for 100 estimates of sample size 20:

**Problem D1**
Compare the two distributions above. In particular, look at how many estimates for each fall in the interval 450 to < 550 (i.e., the 4H and 5L stems). What does this suggest about the effect of sample size on the accuracy of estimation?

Which distribution has more estimates “closer” to the actual answer of 500?

**Problem D2**
Now let’s revisit our table of intervals.

In summary, as the sample size increases, the distribution of the estimates becomes more concentrated. Consequently, a larger sample size generally improves the accuracy of the estimation procedure.

**b.** Compare the proportions within the six intervals for the two different sample sizes. What does this suggest about the effect of sample size on the accuracy of the estimation procedure?

**In This Part: Box Plot Comparisons**
In the previous discussion, you investigated how increasing the sample size does two things:

We can also use another familiar method to explore this phenomenon: the Five-Number Summary and box plot.

**Problem D3**
Here is the stem and leaf plot for the 100 estimates from samples of size 10:

Use the stem and leaf plot to determine the Five-Number Summary for these estimates. These questions may help you along:

**a.** What is the position of the median, and which two values are used to calculate it?
**b.** If there are 50 values in each half, how are the quartiles calculated?
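To make the positions concrete, here is a small Python sketch of a Five-Number Summary that takes each quartile as the median of one half of the ordered data. The helper names are hypothetical, and the integers 1 through 100 are a stand-in for the 100 estimates, chosen so that each value equals its position in the ordered list.

```python
def median_of_sorted(values):
    """Median of a list that is already in increasing order."""
    n = len(values)
    mid = n // 2
    if n % 2:  # odd count: the single middle value
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2  # even count: average the two middle values

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of the halves."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count, the median itself is left out of both halves.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (ordered[0], median_of_sorted(lower), median_of_sorted(ordered),
            median_of_sorted(upper), ordered[-1])

# HYPOTHETICAL data: 1..100 stands in for 100 ordered estimates,
# so each result shows the position it comes from.
print(five_number_summary(range(1, 101)))  # prints: (1, 25.5, 50.5, 75.5, 100)
```

With 100 values, the median sits between the 50th and 51st values, the first quartile between the 25th and 26th, and the third quartile between the 75th and 76th.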

**Problem D4**

Generate the Five-Number Summary for this stem and leaf plot of the 100 estimates based on samples of size 20:

Since the number of estimates is the same as Problem D3’s, the quartiles and median will be in the same positions. Count the values in increasing order to find them.

**Problem D5**
Create two box plots for the Five-Number Summaries you generated in Problems D3 and D4, placing them side by side on the same scale to make them easier to compare.

**Problem D6**
What do the box plots suggest about the effect of sample size on the accuracy of the estimates? In particular, how do the box plots illustrate the following:

b.

**Video Segment**

In this video segment, the participants discuss what percentages of their data fell in particular interval ranges for samples of size 10 and 20. Professor Kader then introduces the Central Limit Theorem to further discuss the connection between probability and statistics. What is the trade-off between the interval range you select and the sample size when designing a statistical investigation? How would you use this information to plan a statistical investigation? How can you be more precise when choosing a sample size? How can you be more accurate?

**Problem D1**

There are more estimates from the distribution for sample size 20 that fall in the 4H and 5L stems (i.e., in the range 450-549). This suggests that the estimates from 20 sub-regions are more accurate.

**Problem D2**
**a.** Here is the completed table:

**b.** Each interval contains a higher proportion of the estimates when samples of 20 sub-regions are used. For instance, the interval 450-550 contains 83 of the 100 estimates from samples of size 20, compared to 69 of the 100 estimates from samples of size 10. In other words, a higher proportion of the estimates falls within 50 penguins of the actual population size (500) when samples of size 20 are used. This suggests that increasing the sample size noticeably improves the accuracy of the estimates.

**Problem D3**
**a.** The median is in position (100 + 1)/2 = 50.5, so it is the average of the 50th and 51st values in the ordered list. Each of these values is 500.

b.

c.

**Problem D4**

Here is the completed table:

**Problem D5**
Here are the completed box plots:

**Problem D6**
**a.** The sample-to-sample variation goes down as the sample size increases. This is exhibited by the shrinking box portion of the graphs.

b.