Teacher resources and professional development across the curriculum

A

allocation

An allocation is an arrangement for the values in a data set. For example, the data sets {1, 2, 3, 4, 5} and {3, 3, 3, 3, 3} each have a mean and a median equal to 3, but they are very different allocations. Allocation can also be used to describe the proximity of values to the mean; values may be closely distributed to or widely distributed from the mean, for example.

association

An association between two variables exists when a change in the values for one variable produces a systematic change in the other. If an increase in one variable tends to result in an increase in the other, the association is positive. If an increase in one variable tends to result in a decrease in the other, the association is negative.

B

bias

Bias, or systematic error, favors particular results. A measurement process is biased if it systematically overstates or understates the true value of a variable.

binomial experiment

A binomial experiment consists of n trials, where each trial is like a coin toss -- it has exactly two possible outcomes. In each trial, the probability for each outcome remains constant.

binomial probability model

The binomial probability model specifies the probabilities for each of the two possible outcomes in a binomial experiment.
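
As a rough illustration, the probability of seeing one of the two outcomes exactly k times in n trials can be computed with the standard binomial formula. The sketch below assumes the trials are independent; the helper name binomial_probability and the fair-coin example are illustrative choices, not part of this glossary.

```python
from math import comb

def binomial_probability(n, k, p):
    """Probability of exactly k successes in n independent trials,
    where each trial succeeds with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability table for the number of heads in 4 tosses of a fair coin.
table = {k: binomial_probability(4, k, 0.5) for k in range(5)}

for heads, probability in table.items():
    print(heads, probability)
```

For four tosses of a fair coin, this assigns a probability of 0.375 to exactly two heads, and the five probabilities add up to 1.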

bivariate analysis

Bivariate analysis is a kind of data analysis that explores the association between two variables.

box plot

A box plot, also known as a box-and-whiskers plot, is a graphical representation of the Five-Number Summary of a data set. A box is drawn from the lower quartile (Q1) to the upper quartile (Q3); a horizontal line across the box indicates the median. Two whiskers are drawn, one from the lower quartile to the minimum and one from the upper quartile to the maximum. Box plots can be used to make graphical comparisons between data sets and to measure the variation within parts of a data set.

C

census

A census is an attempt to include every individual in a given population in a sample.

comparative experimental study

A comparative experimental study seeks to determine "cause and effect." In an experimental study, two groups are selected, and each group is given a different treatment. At the end of the experiment, the results for each group are compared to determine whether or not the treatment had an influence on the results. For example, an experimental study might indicate that people who were told to drink more milk daily had a decreased incidence of osteoporosis.

comparative observational study

A comparative observational study seeks to determine differences in measured groups, where each group is selected based on a differentiating criterion. For example, an observational study might compare smokers to non-smokers, or men to women. The difference between an observational study and an experimental study is that in an experimental study, the researcher assigns different behaviors to the participants, while in an observational study, the different behaviors already exist and are used to place participants into groups.

comparative study

A comparative study focuses on the relationship(s) between two or more sets of data. For example, a comparative study might demonstrate that, on average, the winners of a Best Actress award are younger than the winners of a Best Actor award. Comparative studies often use box plots and other statistical comparisons to show that the distributions differ in a significant way.

contingency table

A contingency table lists the number of values in each quadrant of a scatter plot.

continuous variable

A continuous variable is a quantitative variable whose values can take on any value on a number line; it may contain a decimal or fractional value. For example, time is a continuous variable since its values can be any number zero or greater. Time can be measured on a number line, and any point on the number line is a possible point in time. This is in contrast to a discrete variable, which can only accept whole numbers as values (such as the number of raisins in a box).

co-variation

Co-variation describes the way two variables change together.

cumulative frequency

Cumulative frequency specifies how many data values are of a particular number or smaller. For example, in the data set {1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 26}, the cumulative frequency for the value 4 is nine, since there are nine values in the set that are 4 or less. The cumulative frequency for the value 2 is four; the cumulative frequency for the value 26 is 11; and so on. The statement "You scored higher than 10 other students in this class" is a statement of cumulative frequency.

cumulative frequency table

A cumulative frequency table is a representation of data that shows the cumulative frequency of each value in the data set.
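
A cumulative frequency table can be built by keeping a running total of the frequencies in order of value. The following is a minimal sketch using the example data set from the cumulative frequency entry; the variable names are illustrative.

```python
from collections import Counter
from itertools import accumulate

data = [1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 26]

# Count how often each value occurs, then total the counts in value order.
frequency = Counter(data)
values = sorted(frequency)
running_totals = accumulate(frequency[v] for v in values)
cumulative_table = dict(zip(values, running_totals))

for value, total in cumulative_table.items():
    print(value, total)
```

The row for the value 4 shows a cumulative frequency of 9, and the row for 26 shows 11, matching the figures given for this data set.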

D

data

Data are a set of values for a measured variable.

design of a comparative study

The design of a comparative study is the step-by-step description of how the study is conducted, including the selection process of participants and the process of data collection. Designs must be created in ways that reduce potential sources of bias.

deviation from the mean

Deviation from the mean for a data value is the difference between the value and the mean. The deviation from the mean can be positive, negative, or zero. For example, in the data set {1, 2, 3, 4, 5}, the mean is 3, and the deviations from the mean for each data value are {-2, -1, 0, 1, 2}. Adding all the deviations from the mean, positive and negative, must result in zero, since the mean represents a balance point for these deviations -- the point at which the excesses and deficits are perfectly balanced.
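
The example above can be checked with a few lines of code. This is only a sketch; the variable names are illustrative.

```python
data = [1, 2, 3, 4, 5]

mean = sum(data) / len(data)
# Each deviation is the data value minus the mean.
deviations = [x - mean for x in data]

print(deviations)       # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(sum(deviations))  # 0.0
```

With data that are not whole numbers, floating-point rounding may leave the sum very slightly away from zero.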

discrete data

Discrete data are data whose measurements are obtained by counting and whose values must be whole numbers. The number of people living in a town, the number of times a person has been struck by lightning, the number of licks it takes to get to the center of a lollipop -- these are all discrete data.

distribution

The distribution of data describes the shape of a data set when displayed on a histogram. There are dozens of specific statistical distributions found in data, but two of the most common are uniform distribution (intervals with equal frequency) and normal distribution (a bell-shaped histogram).

E

equal-shares allocation

See fair allocation.

experimental probability

Experimental probability is the proportion of times a particular outcome actually occurs when a random experiment is repeated a large number of times.
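
One way to see this definition in action is to simulate a random experiment many times and record the proportion of times an outcome occurs. The sketch below simulates tosses of a fair coin; the function name, the number of trials, and the fixed seed are all assumptions made for illustration.

```python
import random

def experimental_probability(trials, seed=0):
    """Proportion of simulated fair-coin tosses that come up heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(1 for _ in range(trials) if rng.random() < 0.5)
    return heads / trials

estimate = experimental_probability(10_000)
print(estimate)
```

With a large number of trials, the estimate should land close to the mathematical probability of 0.5, though it will rarely equal it exactly.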

F

fair allocation

Fair allocation, or the equal-shares allocation, is an allocation in which each data value is equal to the mean. For example, if five people are to share 35 cookies, the fair allocation is for each person to have the mean of 7 cookies.

Five-Number Summary

The Five-Number Summary of a data set is a five-item list comprising the minimum value, first quartile, median, third quartile, and maximum value of the set. It divides a data set into four parts, each of which contains approximately 25% of the values.
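
Several conventions exist for locating quartiles, and this glossary does not specify one. The sketch below assumes one common convention: the first and third quartiles are the medians of the lower and upper halves of the ordered data, leaving out the middle value when the number of values is odd. The function name is illustrative.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumes at least two values. Q1 and Q3 are the medians of the
    lower and upper halves; the middle value is excluded from both
    halves when the number of values is odd.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + n % 2:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

data = [1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 26]
print(five_number_summary(data))  # (1, 2, 3, 4, 26)
```

Under this convention, the interquartile range for the example data is 4 - 2 = 2. Other conventions can give slightly different quartiles for the same data.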

frequency

The frequency of a value in a data set is the number of times that that value appears in the set. For example, in the data set {1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 26}, the frequency of the value 3 is two, the frequency of the value 26 is one, and the frequency of the value 6 is zero.

frequency bar graph

A frequency bar graph is a graphical representation of data in which the values of the data are placed on the horizontal axis, and bars extend vertically above each value to indicate the frequency of that value. A bar graph indicating the population of a dozen cities is an example of a frequency bar graph.

frequency table

A frequency table is a representation of data that shows the frequency of each value in the data set.

G

grouped frequency table

A grouped frequency table is a representation of data that lists the number (frequency) of data values that occur within each interval (group) of a data set.
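
As a small sketch, whole-number data can be grouped into intervals of equal width and counted. The interval width of 5, the function name, and the choice to start intervals at multiples of the width are assumptions made for illustration.

```python
from collections import Counter

def grouped_frequency(data, width):
    """Count whole-number values in intervals of the given width.

    Intervals start at multiples of the width, for example 0-4 and 5-9
    when the width is 5. Intervals with no values are left out.
    """
    counts = Counter((x // width) * width for x in data)
    return {(start, start + width - 1): counts[start] for start in sorted(counts)}

data = [1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 26]
table = grouped_frequency(data, 5)

for (low, high), count in table.items():
    print(f"{low}-{high}: {count}")
```

For this data set, nine values fall in the interval 0-4, one in 5-9, and one in 25-29.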

H

histogram

A histogram, or frequency histogram, is a graphical representation of grouped continuous data. The groups of data values are placed on the horizontal axis, and a bar is drawn vertically above each group to indicate the frequency of the data in that interval.

I

interquartile range

The interquartile range is the length of the interval between the lower quartile (Q1) and the upper quartile (Q3). This interval indicates the central, or middle, 50% of a data set.

interval

An interval is a range of values for data. Some common intervals include the interval from the lowest data value to the highest data value and the interval that contains the middle 50% of data.

L

least squares line

Also called the line of best fit, the least squares line is the line that most closely approximates a data set; specifically, it is the line with the smallest sum of squared errors.

line of best fit

See least squares line.

line plot

A line plot is a graphical representation of data in which the values of the data are placed on the horizontal axis, and dots are placed vertically above each value to indicate the number of times that that value appears in the data. A line plot is sometimes called a dot plot.

M

mathematical probability

Mathematical probability, or theoretical probability, is the proportion of times a particular outcome is expected to occur when a random experiment is repeated a large number of times.

mean

The mean of a data set is the arithmetic average of the data set, which is obtained by adding all the values, then dividing by the number of values in the set. For example, in the data set {1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 26}, the mean is 5; you find it by dividing the sum of the values in the set (55) by the number of values (11). The mean may or may not be an actual value in the set.

mean absolute deviation (MAD)

The mean absolute deviation (MAD) of a data set is the average of the absolute values of all deviations from the mean in that set. For example, in the data set {1, 2, 3, 4, 5}, the mean is 3, the deviations from the mean are {-2, -1, 0, 1, 2}, the absolute deviations from the mean are {|-2|, |-1|, |0|, |1|, |2|}, and the MAD is (2 + 1 + 0 + 1 + 2) / 5 = 1.2. The MAD is a measure of, on average, how far the values in a data set are from the mean.
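
The calculation above translates directly into code. This is a minimal sketch; the function name is illustrative.

```python
def mean_absolute_deviation(data):
    """Average distance of the values from their mean."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)

print(mean_absolute_deviation([1, 2, 3, 4, 5]))  # 1.2
```

A data set in which every value equals the mean, such as {3, 3, 3, 3, 3}, has a MAD of zero.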

measure of central tendency

A measure of central tendency is a value that represents the data set. The mean, median, and mode are examples of measures of central tendency. Although all measures of central tendency represent the data set, they are not necessarily the same value.

median

The median of a data set is the value in the center of an ordered list of the data. It is also the value for which there are as many values above it as there are below it. For example, in the data set {1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 26}, the sixth value has five above it and five below it. This value, 3, is the median. If a data set contains an even number of values, the median is found by taking the mean of the two values in the center of the ordered list.

midrange

The midrange of a data set is the average of the minimum and maximum values.

mode

The mode is the most frequently occurring value in a data set. For example, in the data set {1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 26}, the mode is 4, which has a frequency of three. It is possible for a data set to have more than one mode if two or more values each have the highest frequency. It is also possible for a data set to have no mode if all of its values have the same frequency.
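
The measures of central tendency defined in this section can be compared side by side on the example data set. The sketch below uses Python's standard statistics module and assumes Python 3.8 or later, where multimode is available; the dictionary layout is an illustrative choice.

```python
from statistics import mean, median, multimode

data = [1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 26]

center = {
    "mean": mean(data),                        # sum of values / number of values
    "median": median(data),                    # middle value of the ordered list
    "modes": multimode(data),                  # every value with the highest frequency
    "midrange": (min(data) + max(data)) / 2,   # average of minimum and maximum
}

for name, value in center.items():
    print(name, value)
```

Here the mean is 5, the median is 3, the only mode is 4, and the midrange is 13.5, which shows how a single large value such as 26 pulls the mean and midrange upward while leaving the median and mode unchanged. Note that multimode returns every value when all frequencies are equal, a case the definition above describes as having no mode.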

O

outcome

An outcome is a possible result of a random experiment. Each outcome has a probability associated with it (between zero and one).

P

Pascal's Triangle

Pascal's Triangle is a special triangular tabulation of numbers. Each row in the triangle corresponds to the frequencies in a binomial probability table for n trials.
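
A row of Pascal's Triangle can be generated from the row above it, since each entry is the sum of the two entries above. The sketch below builds row n and, for the special case of a fair coin, turns the frequencies into probabilities by dividing by the total number of equally likely sequences. The function name and the fair-coin assumption are illustrative.

```python
def pascal_row(n):
    """Return row n of Pascal's Triangle, where row 0 is [1]."""
    row = [1]
    for _ in range(n):
        # Pad with a zero on each side and add neighboring entries.
        row = [left + right for left, right in zip([0] + row, row + [0])]
    return row

frequencies = pascal_row(4)
print(frequencies)  # [1, 4, 6, 4, 1]

# For a fair coin, each of the 2**4 toss sequences is equally likely.
probabilities = [count / 2**4 for count in frequencies]
print(probabilities)  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```

Row 4 says that, out of 16 possible sequences of four tosses, 6 contain exactly two heads.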

population

The population is the entire group that a study wants information about.

probability table

A probability table shows each of the possible values for an outcome of an experiment, paired with its corresponding probability.

Q

quadrants

The four quadrants of a scatter plot are created when the graph is divided at the mean of each of the two variables. For example, the first quadrant consists of points that are above the mean for both variables.
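
Counting the points in each quadrant gives the entries of a contingency table as defined in this glossary. The sketch below uses made-up data and rests on two assumptions not stated in the glossary: quadrants II, III, and IV are numbered counterclockwise from quadrant I, as on a coordinate plane, and points lying exactly on a dividing line are skipped.

```python
def quadrant_counts(xs, ys):
    """Count scatter plot points in each quadrant around the two means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    counts = {"I": 0, "II": 0, "III": 0, "IV": 0}
    for x, y in zip(xs, ys):
        if x == mean_x or y == mean_y:
            continue  # on a dividing line; skipped by assumption
        if x > mean_x:
            counts["I" if y > mean_y else "IV"] += 1
        else:
            counts["II" if y > mean_y else "III"] += 1
    return counts

# Hypothetical paired data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]
print(quadrant_counts(xs, ys))  # {'I': 2, 'II': 1, 'III': 2, 'IV': 1}
```

Most of these points fall in quadrants I and III, which is the pattern expected when the association between the two variables is positive.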

qualitative data

Qualitative data are the values of a measured qualitative variable.

qualitative variables

Qualitative variables represent categories rather than numbers -- for example, the colleges attended by the last 10 American presidents, or the five cars most likely to be stolen in the United States.

quantitative data

Quantitative data are the values of a measured quantitative variable.

quantitative variables

Quantitative variables represent numbers or quantities -- for example, the number of lions in a box of animal crackers, or the height of each student in a classroom.

quartiles

Quartiles are numbers that divide an ordered data set into four portions, each containing approximately one-fourth of the data. Twenty-five percent of the data values come before the first quartile (Q1). The median is the second quartile (Q2); 50% of the data values come before the median. Seventy-five percent of the data values come before the third quartile (Q3).

R

random assignment

In a comparative experimental study, random assignment is frequently used to select the group in which participants are placed; this is done to reduce bias. For example, if an experiment attempted to study the effect of fear on people's ability to think clearly, such an experiment would be unreasonably biased if it were to ask for volunteers to make up its groups. Random assignment makes it equally likely that any participant will be placed in any group.

random error

Random error is a nonsystematic measurement error that is beyond our control; its effects average out over a set of measurements.

random experiment

A random experiment is an experiment whose outcomes are due to chance.

random sample

A random sample is a sample that is selected completely by chance from the population.

relative frequency

Relative frequency is frequency as a proportion of the whole set. For example, in the data set {1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 26}, the relative frequency of the value 4 is 3/11, since the value 4 appears three times out of 11 total values. Relative frequencies can be expressed as fractions (3/11), decimals (approximately 0.273), or percentages (approximately 27.3%). The total of all relative frequencies in a data set should be 1 (or 100%) but may instead be very close to 1, due to round-off error.
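
Frequencies and relative frequencies for the example data set can be tabulated as follows. This sketch keeps the relative frequencies as exact fractions, which avoids round-off error in the total; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

data = [1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 26]

# Frequency: how many times each value appears.
frequency = Counter(data)
# Relative frequency: frequency divided by the number of values.
relative = {value: Fraction(count, len(data)) for value, count in frequency.items()}

print(frequency[4], relative[4])     # 3 3/11
print(round(float(relative[4]), 3))  # 0.273
print(sum(relative.values()))        # 1
```

A value that never appears, such as 6, has a frequency of zero; Counter returns 0 for it rather than raising an error.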

relative frequency bar graph

A relative frequency bar graph is a graphical representation of data in which the values of the data are placed on the horizontal axis, and bars extend vertically above each value to indicate its relative frequency. A bar graph indicating the percentage of people who voted for each presidential candidate is an example of a relative frequency bar graph.

relative frequency histogram

A relative frequency histogram is a histogram in which the relative frequency of each group appears on the vertical axis, rather than the actual frequency. Typically, the relative frequency is expressed as a percentage.

representative sample

A representative sample is one in which the relevant characteristics of the sample members are generally the same as the characteristics of the population.

S

sample

A sample is a part of the population examined in a study to gain information about the whole population.

sample mean

The sample mean is the mean of a sample. It can be used as an estimate of the mean of the population under study.

sample size

The sample size is the number of observations taken from a population to form a sample. For example, when 500 people are polled regarding an upcoming election, the size of this sample is 500. Increasing the sample size generally leads to more accurate estimates.

sampling with replacement

Sampling with replacement is a type of sampling in which it is possible for the same observation to be included more than once within a sample.

sampling without replacement

Sampling without replacement is a type of sampling in which the same observation cannot be included more than once within a sample. If the same unit is randomly selected a second time, it is ignored.
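
The two kinds of sampling can be contrasted with Python's standard random module, where choices draws with replacement and sample draws without. The population of ten numbered units, the sample size of five, and the fixed seed below are assumptions made for illustration.

```python
import random

rng = random.Random(0)           # fixed seed so the run is repeatable
population = list(range(1, 11))  # ten hypothetical units, numbered 1 to 10

# With replacement: the same unit may be drawn more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: each unit can appear at most once.
without_replacement = rng.sample(population, k=5)

print(with_replacement)
print(without_replacement)
```

Rather than ignoring a unit that comes up a second time, random.sample never selects the same unit twice; the resulting sample has the same property either way.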

scatter plot

A scatter plot is a graph that allows you to visualize the simultaneous changes taking place in two variables. Each of the paired values of the two variables is plotted as a point on a graph in two dimensions.

standard deviation

The standard deviation of a data set is the square root of the variance of that set. For example, in a data set whose variance is 2, the standard deviation is the square root of 2, which is approximately 1.414. Like the MAD, the standard deviation is a measure of the typical amount that the values in a data set vary from the mean.

stem and leaf plot

A stem and leaf plot is a representation of data in which each data value is separated into two parts -- a stem and a leaf. For example, if the data are two-digit numbers, then the stems are commonly the tens digits, and the leaves would be the units digits. The stems are listed vertically (from smallest to largest), and the corresponding leaves for the data values are listed horizontally beside the appropriate stem. On the final version of the stem and leaf plot, the leaves are usually ordered within each stem. Note that the stems on a stem and leaf plot provide a mechanism for grouping numeric data.
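
For two-digit data, the grouping described above can be sketched in a few lines. The data values and the function name below are made up for illustration, and the sketch assumes non-negative whole numbers with the tens digit as the stem.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group non-negative whole numbers by tens digit (stem),
    listing units digits (leaves) in order within each stem."""
    stems = defaultdict(list)
    for value in sorted(data):
        stems[value // 10].append(value % 10)
    return {stem: stems[stem] for stem in sorted(stems)}

# Hypothetical data, for illustration only.
plot = stem_and_leaf([23, 12, 47, 21, 30, 15, 23])

for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

This prints one row per stem, such as "2 | 1 3 3" for the values 21, 23, and 23. Stems with no leaves are left out here, although a hand-drawn plot would often list them with an empty row.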

sum of squared errors

The sum of squared errors, or SSE, is the sum of the squares of the vertical distances from the values in a data set to the corresponding points on a trend line. The line of best fit, or the least squares line, is the line with the smallest SSE.
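
The sketch below computes the SSE for a given line and finds the least squares line using the standard closed-form formulas for a line fitted to paired data. The data values and function names are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line with the smallest SSE."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_of_squared_errors(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical paired data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)
best_sse = sum_of_squared_errors(xs, ys, slope, intercept)
other_sse = sum_of_squared_errors(xs, ys, 1.0, 1.0)  # some other trend line

print(slope, intercept, best_sse, other_sse)
```

For these points, the least squares line has a slope of about 0.6 and an intercept of about 2.2, with an SSE of about 2.4. The alternative line y = x + 1 has an SSE of 4, so it fits the data less closely.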

summary measures

Summary measures are numbers that describe some significant characteristics of your data. Summary measures include the mean, the median, the mode, the maximum, the minimum, and the quartiles of a data set.

T

Three-Number Summary

The Three-Number Summary of a data set is a three-item list comprising the minimum, median, and maximum values of the set. It divides a data set into two sets, each of which contains 50% of the set.

treatment

The treatment in a comparative study is the defining difference between the groups. In an experimental study, the treatment might be a new drug being clinically tested. An observational study does not impose a treatment on individual objects; it observes the objects as they are.

tree diagram

A tree diagram is a schematic diagram that can be used to describe the possible outcomes of a random experiment.

Two-Number Summary

The Two-Number Summary of a data set is a two-item list comprising the minimum and maximum values of the set.

V

variable

A variable is a characteristic that may change (i.e., vary) from one observation to another.

variance

The variance of a data set is the average of the squares of all the deviations from the mean in that set. For example, in the data set {1, 2, 3, 4, 5}, the deviations from the mean are {-2, -1, 0, 1, 2}, and the variance is ((-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2) / 5 = 2.
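
The variance and the standard deviation can be computed together, since the standard deviation is the square root of the variance. This sketch follows the definition in this glossary, which divides by the number of values; the function names are illustrative.

```python
from math import sqrt

def variance(data):
    """Average of the squared deviations from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)

def standard_deviation(data):
    """Square root of the variance."""
    return sqrt(variance(data))

data = [1, 2, 3, 4, 5]
print(variance(data))            # 2.0
print(standard_deviation(data))  # approximately 1.414
```

Some textbooks and software divide by one less than the number of values when working with a sample; that convention would give a different result from the one shown here.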

variation

Variation is any difference in measured data. Variation can occur for many reasons, including random error and bias.
