Against All Odds: Inside Statistics
If C and D are mutually exclusive events, then P(C or D) = P(C) + P(D).
Adequacy of a Linear Model
A line is adequate to describe the pattern in a set of data points provided the data have linear form. A residual plot is a good way of checking adequacy.
Alternative Hypothesis or Ha
The claim in a significance test that we are trying to gather evidence for – the researcher’s point of view. The alternative hypothesis is contradictory to H0 and is judged the more plausible claim when H0 is rejected.
Analysis of variance (ANOVA) is a technique used to analyze variation in data in order to test whether three or more population means are equal.
Assumptions of the Linear Regression Model
- The observed response y for any value of x varies according to a normal distribution. Repeated responses, y-values, are independent of each other.
- The mean response, μy, has a straight-line relationship with x: μy = α + βx.
- The standard deviation of y, σ, is the same for all values of x.
Graph of a frequency distribution for categorical data. Each category is represented by a bar whose area is proportional to the frequency, relative frequency, or percent of that category. If the categorical variable is ordinal, the logical order of the categories should be preserved in the bar chart.
A measure of the spread of the group means about the grand mean, the mean of all the observations. It is measured by the mean square for groups, MSG.
A sample in which some individuals or groups from the population are less likely to be selected than others due to some attribute.
In a binomial setting with n trials and probability of success p, the distribution of x = the number of successes. Shorthand notation for this distribution is b(n, p). The probabilities p(x) for the binomial distribution with parameters n and p can be calculated using the following formula: p(x) = [n! / (x!(n – x)!)] p^x (1 – p)^(n – x), for x = 0, 1, …, n.
Binomial Random Variable
The number of successes, x, in a binomial setting with n trials with probability of success p. The mean and standard deviation of a binomial random variable x can be calculated as follows: μ = np and σ = √(np(1 – p)).
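Binomial probabilities, the mean, and the standard deviation can be computed directly. A minimal Python sketch, assuming the standard formulas p(x) = C(n, x) p^x (1 – p)^(n – x), mean np, and standard deviation √(np(1 – p)); the values n = 10 and p = 0.25 are hypothetical:

```python
from math import comb, sqrt


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.25  # hypothetical number of trials and success probability

# Probabilities for x = 0, 1, ..., n; they should sum to 1.
probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]

mean = n * p                      # mean of the binomial random variable
std_dev = sqrt(n * p * (1 - p))   # standard deviation

print(round(binomial_pmf(2, n, p), 4), mean, round(std_dev, 4))
```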
A setting in which there are a fixed number of n independent trials. Each trial can result in only one of two outcomes, success or failure, and the probability of success, p, is the same for each trial.
Measurements or observations are recorded on two attributes for each individual or subject under study.
Boxplot (or Box-and-Whisker Plot)
Graphical representation of the five-number summary. The basic boxplot consists of a box that extends from the first quartile to the third quartile with whiskers that extend from each box end to the minimum and maximum data values. The basic boxplot can be modified to include identification of mild and extreme outliers. (See Unit 5.)
Variable whose values are classifications or categories. Gender, occupation, and eye color are examples of categorical variables.
An attempt to gather information about every individual in a population.
The center line on a control chart is generally the target value or mean of the quality characteristic being sampled.
Central Limit Theorem
If the sample size n is large (say n > 30), then the sampling distribution of the sample mean x̄ of n independent observations from the same population has an approximate normal distribution. If the population mean and standard deviation are μ and σ, respectively, then x̄ has an approximate normal distribution with mean μ and standard deviation σ/√n.
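A small simulation can illustrate this behavior. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a strongly right-skewed choice made for this example), and the sample size and number of samples are hypothetical.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

n = 40              # sample size (hypothetical)
num_samples = 5000  # number of repeated samples (hypothetical)

# Each entry is the mean of one sample of size n from an
# exponential population with mean 1 and standard deviation 1.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

print("mean of sample means:", round(statistics.fmean(sample_means), 3))
print("sd of sample means:  ", round(statistics.stdev(sample_means), 3))
print("sigma / sqrt(n):     ", round(1 / n**0.5, 3))
```

The mean of the sample means should land close to the population mean, and their standard deviation close to σ/√n.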
Chi-Square Test Statistic for Independence
The chi-square test for independence is used for categorical variables. For testing the null hypothesis H0: no association between the variables or H0: variables are independent, the chi-square test statistic is computed as follows: χ² = Σ (observed count – expected count)² / expected count, where the sum is taken over all cells of the two-way table.
If the null hypothesis is true, χ² will have a chi-square distribution with degrees of freedom (r – 1)(c – 1), where r and c are the number of rows and columns in the two-way table, respectively.
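The calculation can be sketched in Python. This example assumes the usual expected count for a cell, (row total × column total) / grand total, and sums (observed – expected)² / expected over all cells; the 2×2 table of counts is hypothetical.

```python
observed = [
    [30, 20],
    [10, 40],
]  # hypothetical two-way table of counts

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell if the variables are independent.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum of (observed - expected)^2 / expected over every cell.
chi_square = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1)(columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, round(chi_square, 3), df)
```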
Common Cause Variation
Variation due to day-to-day factors that influence the process.
Complement of an Event A
An event that consists of all the outcomes in the sample space that are not in A. If B is the complement of A, then B = not A.
For any event C, P(not C) = 1 – P(C).
Two events are complementary if they are mutually exclusive and combining their outcomes into a single set gives the entire sample space.
There are two sets of conditional distributions for a two-way table:
- distributions of the row variable for each fixed level of the column variable
- distributions of the column variable for each fixed level of the row variable
Conditional distributions provide one way to explore the relationship between the row and column variables.
An interval estimate computed from sample data that gives a range of plausible values for a population parameter. The interval is constructed so that the value of the parameter will be captured between the endpoints of the interval with a chosen level of confidence.
Confidence Interval for μ (t-interval)
When σ is unknown, the sample size n is small, and the population distribution is approximately normal, a t-confidence interval for μ is given by the following formula: x̄ ± t*(s/√n),
where s is the sample standard deviation and t* is a t-critical value associated with the confidence level and determined from a t-distribution with df = n – 1 degrees of freedom.
Confidence Interval for μ (z-interval)
When σ is known and either the sample size n is large or the population distribution is normal, a confidence interval for μ is given by the following formula: x̄ ± z*(σ/√n),
where z* is a z-critical value (from a standard normal distribution) associated with the confidence level.
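A z-interval can be computed with Python's standard library. This is a minimal sketch, assuming the interval x̄ ± z*(σ/√n) with z* taken from the standard normal distribution; the function name `z_interval` and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar: float, sigma: float, n: int, confidence: float = 0.95):
    """Confidence interval for mu when sigma is known."""
    # z* leaves (1 - confidence)/2 of the area in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)  # margin of error
    return x_bar - margin, x_bar + margin


low, high = z_interval(x_bar=50.0, sigma=8.0, n=64)  # hypothetical values
print(round(low, 2), round(high, 2))
```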
Confidence Interval for p
In situations where the sample size n is large, a confidence interval for the population proportion p is given by the following formula: p̂ ± z*√(p̂(1 – p̂)/n),
where p̂ is the sample proportion and z* is a z-critical value (from a standard normal distribution) associated with the confidence level.
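The same idea applies to a proportion. A minimal sketch, assuming the large-sample interval p̂ ± z*√(p̂(1 – p̂)/n); the function name `proportion_interval` and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes: int, n: int, confidence: float = 0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n  # sample proportion
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


p_low, p_high = proportion_interval(successes=120, n=400)  # hypothetical counts
print(round(p_low, 3), round(p_high, 3))
```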
Confidence Interval for Population Slope β
A confidence interval for the population slope β is given by the following formula: b ± t* × sb,
where t* is a t-critical value associated with the confidence level and determined from a t-distribution with df = n – 2; b is the least-squares estimate of the population slope calculated from the data, and sb is the standard error of b.
A number that provides information on how much confidence we have in the method used to construct a confidence interval estimate of a population parameter. It is the long-run success rate (success means capturing the parameter in the interval) of the method used to construct the confidence interval.
Two (or more) factors (explanatory variables) are confounded when their effects on a response variable are intertwined and cannot be distinguished from each other.
Continuous Random Variable
A random variable that can take on values that include an interval. The number of possible distinct outcomes is uncountable; there are too many possible values to put them all in a list.
Charts used to monitor the output of a process. The charts are designed to signal when the process has been disturbed so that it is now out of control or is about to go out of control.
A group in an experiment that does not receive the treatment under study. The control group could receive a placebo to hide the fact that no treatment is being given. In an active control group, the subjects receive what might be considered the existing standard treatment.
The upper control limit (UCL) and lower control limit (LCL) on a control chart are generally set ±3σ/√n from the center line.
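As a small illustration, the limits can be computed from the center line, the process standard deviation σ, and the sample size n. The function name `control_limits` and the process values below are hypothetical.

```python
from math import sqrt


def control_limits(center: float, sigma: float, n: int):
    """Return (LCL, UCL), each 3*sigma/sqrt(n) away from the center line."""
    spread = 3 * sigma / sqrt(n)
    return center - spread, center + spread


lcl, ucl = control_limits(center=10.0, sigma=0.2, n=4)  # hypothetical process
print(round(lcl, 2), round(ucl, 2))
```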
A sampling design in which the pollster selects a sample that is easy to obtain, such as friends, family, co-workers, and so forth.
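Denoted by r, correlation measures the direction and strength of a linear relationship between two quantitative variables. The formula for computing Pearson’s correlation coefficient is: r = [1/(n – 1)] Σ [(x – x̄)/sx][(y – ȳ)/sy], where x̄ and ȳ are the sample means, sx and sy are the sample standard deviations, and the sum is taken over all n data pairs.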
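A direct computation of r might look like the sketch below. It assumes the standardized-product form of Pearson's correlation, with sample standard deviations that divide by n – 1; the function name `correlation` and the data are hypothetical.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson's r as the average product of standardized x and y values."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    total = sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    )
    return total / (n - 1)


r = correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data pairs
print(round(r, 3))
```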
A set of rules that identify from a control chart when a process is becoming unstable or going out of control.
Degrees of Freedom for Test for Independence
(r – 1)(c – 1), where the numbers r and c are the number of rows and columns in the two-way table, respectively.
Two events are dependent if the fact that one of the events occurs affects the probability that the other occurs. Events that are not dependent are independent.
A variable whose outcome we would like to predict based on another variable (independent variable). The dependent variable is always plotted on the vertical axis of a scatterplot. Also called a response variable.
Deviations from the Mean
The deviations of each data value from the sample mean: x1 – x̄, x2 – x̄, … xn – x̄.
Discrete Random Variable
A random variable that can take on only a countable number of distinct values – in other words, it is possible to list all possible values. Any random variable that can take on only a finite number of values is a discrete random variable.
Description of the possible values a variable assumes and how often these values occur.
Graphical display of quantitative data in which each observation (or a group of a specified number of observations) is represented by a dot above a horizontal axis.
An experiment in which neither the subjects nor the individuals measuring the response know which subjects are assigned to which treatment.
Empirical Rule (68-95-99.7% Rule)
Rule that gives the approximate percentage of data that fall within one standard deviation (68%), two standard deviations (95%), and three standard deviations (99.7%) of the mean. This rule should be applied only when the data are approximately normal.
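The three percentages can be checked against the standard normal distribution. A short sketch using Python's `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area under the normal curve within k standard deviations of the mean.
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in within.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```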
Estimated Regression Line
The estimated regression line for the linear regression model is the least-squares line, ŷ = a + bx.
The number of observations that would be expected to fall into each cell (or class) of a two-way table if the null hypothesis is true. The expected counts for the chi-square test for independence are computed as follows: expected count = (row total × column total) / grand total.
A study in which researchers deliberately apply some treatment to the subjects in order to observe their responses. The purpose is to study whether the treatment causes a change in the response.
Variable that is used to predict the response variable. The explanatory variable is always plotted on the horizontal axis of a scatterplot. Also called Independent Variable.
The test statistic F = MSG/MSE, the ratio of the mean square for groups to the mean square error, which is used for testing H0: μ1 = μ2 = … = μk. When H0 is true, F has an F distribution with numerator df = k – 1 and denominator df = N – k, where k is the number of groups and N is the total number of observations.
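The F statistic can be computed from raw group data. This sketch assumes the usual definitions MSG = SSG/(k – 1) and MSE = SSE/(N – k), where SSG measures the spread of the group means about the grand mean and SSE measures the spread of observations about their own group means; the function name `one_way_f` and the data are hypothetical.

```python
def one_way_f(groups):
    """F = MSG / MSE for a list of groups of observations."""
    k = len(groups)                       # number of groups
    total_n = sum(len(g) for g in groups)  # total number of observations
    grand_mean = sum(sum(g) for g in groups) / total_n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares.
    ssg = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-groups sum of squares.
    sse = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    msg = ssg / (k - 1)
    mse = sse / (total_n - k)
    return msg / mse


f_stat = one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])  # hypothetical groups
print(round(f_stat, 3))
```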
The explanatory variables in an observational study or an experiment. Also called the independent variables.
First Quartile or Q1
The one-quarter point in an ordered set of quantitative data. To compute Q1, calculate the median of the lower half of the ordered data.
A five-number summary of a quantitative data set consists of the following: minimum, first quartile (Q1), median, third quartile (Q3), maximum.
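A five-number summary can be sketched in Python as follows. The code assumes Q1 is the median of the lower half of the ordered data and, by symmetry, Q3 is the median of the upper half; when the number of observations is odd, the middle value is left out of both halves. That last convention is an assumption of this example, and the function name and data are hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # lower half of the ordered data
    upper = ordered[half + n % 2:]   # upper half (skips the middle value if n is odd)
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


summary = five_number_summary([7, 1, 5, 3, 8, 2, 6, 4])  # hypothetical data
minimum, q1, med, q3, maximum = summary
iqr = q3 - q1  # interquartile range
print(summary, iqr)
```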
A table that displays frequencies of data falling into categories or class intervals.
Graphical representation of a frequency distribution. Bars are drawn over each class interval on a number line. The areas of the bars are proportional to the frequencies with which data fall into the class intervals.
The state of a process that is running smoothly, with its variables staying within an acceptable range.
Two events are independent if the fact that one of the events occurs does not affect the probability that the other occurs.
Variable that is used to predict the dependent variable. The independent variable is always plotted on the horizontal axis of a scatterplot. Also called Explanatory Variable.
Interquartile range or IQR
A measure of the spread of the middle half of the data: IQR = Q3 – Q1. The IQR is a resistant measure of the variability of a data set.
Joint Distribution of Two Categorical Variables
A two-way table of counts gives the joint distribution of two categorical variables. The joint distribution can be converted to percentages by dividing each cell count by the grand total and then multiplying by 100%.
A method for finding the best-fitting curve to a given set of data points by minimizing the sum of the squares of the residual errors (SSE).
Least-Squares Regression Line
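The least-squares line is the line that makes the sum of the squares of the residual errors (SSE) as small as possible. The equation of the least-squares line has the form ŷ = a + bx, where the intercept a and slope b can be calculated from n data pairs (x, y) using the following formulas: b = r(sy/sx) and a = ȳ – b x̄, where r is the correlation, x̄ and ȳ are the sample means, and sx and sy are the sample standard deviations.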
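One common way to compute the intercept and slope is sketched below. It uses the form b = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)² with a = ȳ – b x̄, which is algebraically equivalent to b = r(sy/sx); the function name `least_squares_line` and the data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept
    return a, b


xs = [1, 2, 3, 4, 5]            # hypothetical explanatory values
ys = [2.1, 3.9, 6.2, 7.8, 10.0]  # hypothetical responses
a, b = least_squares_line(xs, ys)

# Residual error = actual y - predicted y; SSE is the sum of their squares.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
sse = sum(e**2 for e in residuals)
print(round(a, 3), round(b, 3), round(sse, 4))
```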
One of the possible values or settings that a factor can assume.
A scatterplot has linear form when its dots appear to be randomly scattered on either side of a straight line.
Linear Regression Model
The simple linear regression model assumes that for each value of x the observed values of the response variable y are normally distributed about a mean μy that has the following linear relationship with x: μy = α + βx.
Margin of Error
For confidence intervals of the form point estimate ± margin of error, the margin of error gives the range of values above and below the point estimate. The margin of error is the half-width of the confidence interval.
A distribution computed from a two-way table of counts by dividing the row or column totals by the overall total. Often the marginal distributions are expressed as percentages.
The sum of the row entries or the sum of the column entries in a two-way table of counts.
Matched-Pairs t-Test Statistic
In testing H0: μD = μD0, where μD is the population mean difference, the matched-pairs t-test statistic is given by t = (x̄D – μD0) / (sD/√n),
where x̄D and sD are the mean and standard deviation of the sample differences and n is the number of pairs. If the differences are approximately normally distributed and the null hypothesis is true, then t has a t-distribution with df = n – 1 degrees of freedom.
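The statistic can be computed from paired data as in the sketch below, which assumes t = (x̄D – μD0) / (sD/√n) with n the number of pairs. The function name `matched_pairs_t` and the before/after values are hypothetical, and finding a p-value from the t-distribution is left out because it needs a t table or an additional library.

```python
from math import sqrt
from statistics import mean, stdev


def matched_pairs_t(before, after, mu_d0: float = 0.0):
    """Return (t, df) for paired observations."""
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    # stdev() is the sample standard deviation (divides by n - 1).
    t = (mean(differences) - mu_d0) / (stdev(differences) / sqrt(n))
    return t, n - 1


t_stat, df = matched_pairs_t(
    before=[10, 12, 9, 11],   # hypothetical first measurements
    after=[12, 15, 10, 14],   # hypothetical second measurements
)
print(round(t_stat, 3), df)
```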
The arithmetic average or balance point of sample data. To calculate the mean, sum the data values and divide the sum by the number of data values.
If the sample consists of observations x1, x2, …, xn, then the sample mean is x̄ = (x1 + x2 + … + xn)/n.
Mean of a Discrete Random Variable x
Given a probability distribution, p(x), the mean is calculated as follows: μ = Σ x p(x), where the sum is taken over all possible values of x.
A resistant measure of center of a data set. The median separates the upper half of the data from the lower half. To calculate the median, order the data from smallest to largest and count up (n + 1)/2 places in the ordered list.
The data value in a quantitative data set that occurs most frequently.
If C and D are independent, then P(C and D) = P(C)P(D).
A sampling design that begins by dividing the population into clusters. In stage one, the pollster chooses a (random) sample of clusters. In subsequent stages, samples are chosen from each of the selected clusters.
Data that consists of measurements or observations recorded on two or more attributes for each individual or subject under study.
Mutually Exclusive Events
Events that have no outcomes in common. Events that are disjoint.
Two variables have negative association if above-average values of one tend to accompany below-average values of the other, and vice versa. In a scatterplot, a negative association would appear as a pattern of dots running from the upper left to the lower right.
Often scatterplots do not have linear form. Instead the data might form a curved pattern. In that case, we say the scatterplot has nonlinear form.
Bell-shaped curve. The center line of the normal curve is at the mean μ. The change-of-curvature in the bell-shaped curve occurs at μ – σ and μ + σ where σ is the standard deviation.
Normal Density Curve
A normal curve scaled so that the area under the curve is 1.
Distribution that is described by a normal density curve. Any particular normal distribution is completely specified by two numbers, its mean μ and standard deviation σ.
Normal Quantile Plot
Also known as normal probability plot. A graphical method for assessing whether data come from a normal distribution. The plot compares the ordered data with what would be expected of perfectly normal data. A normal quantile plot that shows a roughly linear pattern suggests that it is reasonable to assume the data come from a normal distribution.
Null Hypothesis or H0
The claim tested by a significance test. Usually the null hypothesis is a statement about “no effect” or “no change.” The null hypothesis has the following form: H0: population parameter = hypothesized value.
A study in which researchers observe subjects and measure variables of interest. However, the researchers do not try to influence the responses. The purpose is to describe groups of subjects under different situations.
The number of observations that fall into each cell (or class) of a two-way table.
One-Sided Alternative Hypothesis
The alternative hypothesis in a significance test is one-sided if it states either that a parameter is greater than the null hypothesis value or that it is less than the null hypothesis value.
An analysis of variance in which one factor is thought to be related to the response variable.
Out of Control
The state of a process that is no longer in control. The process has become unstable or its variables are no longer within an acceptable range.
Data value that lies outside the overall pattern of the other data values.
Paired t-Confidence Interval for μD
When data are matched pairs, and the standard deviation of the population differences σD is unknown, a t-confidence interval estimate of the population mean difference, μD, is given by the formula: x̄D ± t*(sD/√n),
where t* is a t-critical value associated with the confidence level and determined from a t-distribution with df = n – 1 and x̄D and sD are the mean and standard deviation of the sample differences.
A value such that a certain percentage of observations from the distribution falls at or below that value. The pth percentile of a data set is a value such that p% of the observations fall at or below that value.
Graph of a frequency distribution for categorical data. Each category is represented by a slice of pie in which the area of the slice is proportional to the frequency or relative frequency of that category.
Something that is identical in appearance to the treatment received by the treatment group. Placebos are meant to be ineffectual and are given as control treatments.
A single number based on sample data (a statistic) that represents a plausible value for a population parameter.
The entire group of objects or individuals about which information is wanted.
For a population that is divided into two categories, success and failure, based on some characteristic, the population proportion, p, is: p = (number of successes in the population) / (population size).
Population Regression Line
The population regression line, μy = α + βx describes how the mean response y varies as x changes.
Two variables have positive association if above-average values of one tend to accompany above-average values of the other and below-average values of one tend to accompany below-average values of the other. In a scatterplot, a positive association would appear as a pattern of dots running from the lower left to the upper right.
A measure of how likely it is that something will happen or something is true. Probabilities are always between 0 and 1. Events with probabilities closer to 0 are less likely to happen and events with probabilities closer to 1 are more likely to happen.
A list of the possible values of a discrete random variable together with the probabilities associated with those values.
Chain of steps that turns inputs into outputs.
A study that starts with a group and watches for outcomes (for example, the development of cancer or remaining cancer-free) during the study period and relates this to suspected risk or protection factors that might be linked to the outcomes.
The probability, computed under the assumption that the null hypothesis is true, of observing a value from the test statistic’s distribution that is at least as extreme as the value of the test statistic that was actually observed.
Variable whose values are numbers obtained from measurements or counts. Height, weight, and points scored at a basketball game are examples of quantitative variables.
A situation in which the possible outcomes are known but we do not know which one will occur. If the situation is repeated over and over, a regular pattern to the outcomes emerges over the long run.
A variable whose possible values are numbers associated with outcomes of a random phenomenon.
Measure of the variability of a quantitative data set from its extremes: range = maximum – minimum.
A straight line that describes how a response variable y is related to an explanatory variable x.
A sample that accurately reflects the members of the entire population.
A residual error is the vertical deviation of a data point from the regression model: residual error = actual y – predicted y.
A statistic that measures some aspect of a distribution (such as its center) that is relatively unaffected by a small subset of extreme data values. For example, the median is a resistant measure of the center of a distribution while the mean is not a resistant measure of center.
The variable used to measure the outcome of a study, which we attempt to explain or predict using one or more independent variables (factors). The response variable is always plotted on the vertical axis of a scatterplot. Also called the dependent variable.
A study that starts with an outcome (for example, two groups of people, a cancer group and a non-cancer group) and then looks back to examine exposures to suspected risk or protection factors that might be linked to that outcome. (See Unit 14.)
A plot of data values versus the order in which these values were collected.
The part of the population that is actually examined in a study.
One measure of center of a data set. The mean is the arithmetic average or balance point of a set of data. To calculate the mean, sum the data and divide by the number of data items: x̄ = Σx / n.
The sample proportion, p̂, from a sample of size n is: p̂ = (number of successes in the sample) / n.
Sample Standard Deviation
One measure of variability of a data set. The standard deviation has the same units as the data values. To calculate the standard deviation, take the square root of the sample variance: s = √(s²).
One measure of variability of a data set. To calculate the variance, sum the squared deviations from the mean and divide by the number of data minus one: s² = Σ(x – x̄)² / (n – 1).
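The two calculations can be sketched in Python. The function names and data are hypothetical; the variance divides the sum of squared deviations by n – 1, and the standard deviation is its square root.

```python
from math import sqrt


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


def sample_std_dev(data):
    """Square root of the sample variance; same units as the data."""
    return sqrt(sample_variance(data))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print(round(sample_variance(data), 4), round(sample_std_dev(data), 4))
```

Python's built-in `statistics.variance` and `statistics.stdev` use the same n – 1 divisor and can serve as a cross-check.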
Occurs when a sample is collected in such a way that some individuals in the population are less likely to be included in the sample than others. Because of this, information gathered from the sample will be slanted toward those who are more likely to be part of the sample.
Plan of how to select the sample from the population.
The distribution of the values of a sample statistic (such as x̄, the median, or s) over many, many random samples chosen from the same population.
Sampling Distribution of the Sample Mean
The distribution of x̄ over a very large number of samples. If x̄ is the mean of a simple random sample (SRS) of size n from a population having mean μ and standard deviation σ, then the mean and standard deviation of x̄ are: μx̄ = μ and σx̄ = σ/√n.
Furthermore, if the population distribution is normal, then the distribution of x̄ is normal.
Sampling Distribution of the Sample Proportion
When the sample size n is large, the sampling distribution of the sample proportion p̂ is approximately normally distributed with the following mean and standard deviation: mean p and standard deviation √(p(1 – p)/n).
A graphical display of bivariate quantitative data in which each observation (x, y) is plotted in the plane.
A sampling design in which the sample consists of people who respond to a request for participation in the survey. (Also called voluntary sampling.)
In a significance test, the highest p-value for which we will reject the null hypothesis.
A method that uses sample data to decide between two competing claims, called hypotheses, about a population parameter.
Simple Random Sample of Size n
A sample of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be in the sample actually selected.
Simple Random Sampling
A sampling design that chooses a sample of size n using a method in which all possible samples of size n are equally likely to be selected.
An experiment in which the subjects do not know which treatment they are receiving but the individuals measuring the response do know which subjects were assigned to which treatments.
Skewed Right or Left
A unimodal distribution is skewed to the right if the right tail of the distribution is longer than the left and is skewed to the left if the left tail of the distribution is longer than the right. (See Unit 3.)
Special Cause Variation
Variation due to sudden, unexpected events that affect the process.
Standard Deviation of a Discrete Random Variable x
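Given a probability distribution, p(x), the standard deviation, σ, is calculated as follows: σ = √(Σ (x – μ)² p(x)), where μ is the mean of x and the sum is taken over all possible values of x.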
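For a small probability distribution, the mean, variance, and standard deviation can be computed directly. This sketch assumes μ = Σ x p(x) and σ² = Σ (x – μ)² p(x); the distribution shown (number of heads in two tosses of a fair coin) is a hypothetical example.

```python
from math import sqrt

# Hypothetical distribution: x = number of heads in two fair coin tosses.
distribution = {0: 0.25, 1: 0.50, 2: 0.25}

# Mean: each value weighted by its probability.
mean = sum(x * p for x, p in distribution.items())

# Variance: squared deviations from the mean, weighted by probability.
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())

std_dev = sqrt(variance)
print(mean, variance, round(std_dev, 4))
```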
Standard Error of the Estimate
A point estimate of σ, which is a measure of how much the observations vary about the regression line. The standard error of the estimate, se, is computed as follows: se = √(SSE/(n – 2)), where SSE is the sum of the squares of the residual errors.
Standard Error of the Slope b
The estimated standard deviation of b, the least-squares estimate for the population slope β, is: sb = se / √(Σ(x – x̄)²), where se is the standard error of the estimate.
Standard Normal Distribution
Normal distribution with μ = 0 and σ = 1.
Standard Normal Quantiles
The z-values that divide the horizontal axis of a standard normal density curve into intervals such that the areas under the density curve over each of the intervals are equal.
Stemplot (or Stem-and-Leaf Plot)
Graphical tool for organizing quantitative data in order from smallest to largest. The plot consists of two columns, one for the stems (leading digit(s) of the observations) and the other for the leaves (trailing digit(s) for each observation listed beside corresponding stem). Stemplots are a useful tool for conveying the shape of relatively small data sets and identifying outliers.
The non-overlapping groups used in a stratified sampling plan.
Stratified Random Sample
A stratified sampling plan in which the sample is obtained by taking random samples from each of the strata.
A sampling plan that is used to ensure that specific non-overlapping groups of the population are represented in the sample. The non-overlapping groups are called strata. Samples are taken from each stratum.
Shape of a distribution of a quantitative variable in which the lower half of the distribution is roughly a mirror image of the upper half.
t-Confidence Interval for μ
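When σ is unknown, the sample size n is small, and the population distribution is approximately normal, a t-confidence interval for μ is given by the following formula: x̄ ± t*(s/√n), where s is the sample standard deviation and t* is a t-critical value from a t-distribution with df = n – 1 degrees of freedom.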
t-Test Statistic for the Slope
Test of Hypotheses
Third Quartile or Q3
Two-Sample t-Confidence Interval for μ1 – μ2
Two-Sample t-Test Statistic
Two-Sided Alternative Hypothesis
Two-Way Table of Counts (Frequencies)
Describes some characteristic or attribute of interest that can vary in value.
Variance of a Discrete Random Variable x
Given a probability distribution, p(x), the variance is calculated as follows: σ² = Σ (x – μ)² p(x), where μ is the mean of x and the sum is taken over all possible values of x.
A sampling design in which the sample consists of people who respond to a request for participation in the survey. Also called self-selecting sampling.
A measure of the spread of individual data values within each group about the group mean. It is measured by the mean square error, MSE.
A plot of means of successive samples versus the order in which the samples were taken.
Transformation of a data value x into its deviation from the mean measured in standard deviations. To calculate a z-score for a data value x, subtract the mean and divide by the standard deviation: z = (x – μ)/σ, where μ is the mean and σ is the standard deviation.
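A z-score is a one-line calculation. In this sketch the function name `z_score` and the height values are hypothetical; the optional last step, which converts the z-score to the proportion of a normal distribution at or below it, assumes the data are normally distributed.

```python
from statistics import NormalDist


def z_score(x: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


z = z_score(x=74.0, mean=69.0, std_dev=2.5)  # hypothetical heights in inches

# Proportion of a normal distribution at or below this z-score.
proportion_below = NormalDist().cdf(z)
print(z, round(proportion_below, 4))
```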
In testing H0: μ = μ0, where μ is the population mean, the formula for the z-test statistic is: z = (x̄ – μ0) / (σ/√n).
The z-test statistic is used in situations where the population standard deviation σ is known and either the population has a normal distribution or the sample size n is large.
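The test can be sketched with Python's standard library, assuming z = (x̄ – μ0) / (σ/√n). The p-value below is for a two-sided alternative, which is a choice made for this example; the function name `z_test` and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar: float, mu_0: float, sigma: float, n: int):
    """Return (z, two-sided p-value) for H0: mu = mu_0 with sigma known."""
    z = (x_bar - mu_0) / (sigma / sqrt(n))
    # Two-sided: area in both tails beyond |z| under the standard normal curve.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z_stat, p_value = z_test(x_bar=52.0, mu_0=50.0, sigma=8.0, n=64)  # hypothetical
print(round(z_stat, 3), round(p_value, 4))
```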
z-Test Statistic for Proportions
In testing H0: p = p0, where p is the population proportion, the formula for the z-test statistic is: z = (p̂ – p0) / √(p0(1 – p0)/n).
The z-test is used in situations where the sample size n is large.
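A corresponding sketch for proportions, assuming z = (p̂ – p0) / √(p0(1 – p0)/n) and, again as a choice for this example, a two-sided alternative. The function name `proportion_z_test` and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes: int, n: int, p_0: float):
    """Return (z, two-sided p-value) for H0: p = p_0."""
    p_hat = successes / n  # sample proportion
    # The standard deviation uses p_0 because it is computed assuming H0 is true.
    z = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z_prop, p_value_prop = proportion_z_test(successes=230, n=400, p_0=0.5)
print(round(z_prop, 3), round(p_value_prop, 4))
```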
Unit 1 What Is Statistics?
Statistics is the art and science of gathering, organizing, analyzing, and drawing conclusions from data. Without a rudimentary knowledge of how it works, people can't make informed judgments about the wide variety of things they encounter in daily life.
Unit 2 Stemplots
As a first step in visualizing data, we use stemplots to understand measurements taken by the U.S. Army when they size up soldiers in order to design well-fitting gear and supplies for modern warfighters.
Unit 3 Histograms
Meteorologists use histograms to map when lightning strikes, and this visualization technique helps them understand the data in new ways.
Unit 4 Measures of Center
It's helpful to know the center of a distribution — which is what the clerical workers in Colorado Springs found out in the 1980s when they campaigned for comparable wages for comparable work. Mean and median are two different ways to describe the center.
Unit 5 Boxplots
Using the example of hot dog calorie counts, we use boxplots to visualize the five-number summary and make comparisons between different types of frankfurters.
Unit 6 Standard Deviation
How can we compare sales at two franchises in the Wahoo's restaurant chain? Standard deviation helps us quantify the variability in sales.
Unit 7 Normal Curves
A nature preserve that's tracked bird migrations through New England for decades records tons of bird-related data; everything from wingspan measurements to arrival dates provides examples of normal distributions.
Unit 8 Normal Calculations
Visit the Boston Beanstalks club for tall people. Height is normally distributed and we can use membership cutoffs and population data to calculate z-scores.
Unit 9 Checking Assumption of Normality
Production at Pete and Gerry's Organic Eggs provides a number of distributions that look normal — but are they?
Unit 10 Scatterplots
Plotting annual numbers of Florida powerboat registrations and manatee killings suggests an uncomfortable relationship for the marine mammals.
Unit 11 Fitting Lines to Data
Winter snowpack in the Colorado Rockies can predict spring water supply. Plotting annual measurements in a scatterplot lets resource managers draw a regression line that helps them forecast water availability.
Unit 12 Correlation
Twin studies track how similar identical and fraternal twins are on various characteristics, even if they don't grow up together. Correlation lets researchers put a number on it.
Unit 13 Two-Way Tables
One city surveyed the happiness of its residents. Two-way tables help organize the data and tease out relationships between happiness levels and opinions about aspects of the city itself.
Unit 14 The Question of Causation
This historical story describes how researchers untangled the relationship between smoking and lung cancer.
Unit 15 Designing Experiments
We move beyond observational studies — like one of marine life in the remote Line Islands — to designing experiments that manipulate various subject groups — as in the case of a medical study about osteoarthritis treatments.
Unit 16 Census and Sampling
The U.S. counts every resident every ten years — or at least tries to. Statisticians use sampling from a population as an alternative to a complete count, as utilized at a potato chip factory.
Unit 17 Samples and Surveys
A visit to the University of New Hampshire Survey Center illustrates how pollsters create accurate surveys. They can then use details from their sample to make inferences about a whole population.
Unit 18 Introduction to Probability
Probability is the mathematics of chance behavior — and can help predict events such as the daily weather, or whether an asteroid will collide with Earth.
Unit 19 Probability Models
Casinos are as well versed in probability as statisticians, and probability models help them maintain the house advantage over gamblers.
Unit 20 Random Variables
The Challenger space shuttle disaster was blamed on faulty O-rings. How can probability calculations on random variables help predict the chances of this kind of failure?
Unit 21 Binomial Distributions
Sickle cell disease is an example of binomial distribution in families with two parents who are carriers for this genetic trait.
Unit 22 Sampling Distributions
Heights of third graders in one class. Quality scores for circuit boards at a factory. Taking multiple samples allows us to visualize the sampling distribution of the sample mean.
Unit 23 Control Charts
This quality control method helped Quest Diagnostics streamline and improve their system for processing and testing lab samples so they could meet their nightly deadlines.
Unit 24 Confidence Intervals
A battery manufacturer tests just a sample of its product to verify its claims about battery life. A margin of error and a confidence level help quantify its accuracy.
Unit 25 Tests of Significance
Is a newly-discovered poem really written by William Shakespeare? Using statistical analysis of his known word use, researchers set up null and alternative hypotheses to investigate.
Unit 26 Small Sample Inference for One Mean
A brewer uses this technique to monitor quality differences in multiple batches of the same beer.
Unit 27 Comparing Two Means
Comparing the activity and calorie expenditure levels of Western office workers and African hunter gatherers adds some surprising new data to the science of obesity.
Unit 28 Inference for Proportions
Managers have no clue what conditions actually motivate their workers best, as shown by research conducted by Teresa Amabile, host of the original Against All Odds.
Unit 29 Inference for Two-Way Tables
Host Dr. Pardis Sabeti's own research examines possible genetic resistance to deadly Lassa fever in West Africa. Using Inference for Two-Way Tables helps untangle potential relationships.
Unit 30 Inference for Regression
Historical story of how statisticians built the case against DDT as the culprit behind plummeting peregrine falcon population numbers.
Unit 31 One-Way ANOVA
Does holding a heavier clipboard make you estimate that a jar of coins has more money in it than if you're holding a lighter clipboard? Psychologists use One-Way ANOVA to analyze the data from this experiment.
Unit 32 Summary
This review of the course through the preceding 31 video modules provides an overview of the practice of statistics and helps students appreciate how statistical methods can help them better understand their world.
Interactive 34 Interactive: Stemplots
The Stemplots tool organizes data — either your own input or randomly generated — into a stemplot.
Interactive 35 Interactive: Wafer Thickness
The Wafer Thickness tool generates data (represented as a histogram)