A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

A

 allocation close window An allocation is an arrangement for the values in a data set. For example, the data sets {1, 2, 3, 4, 5} and {3, 3, 3, 3, 3} each have a mean and a median equal to 3, but they are very different allocations. Allocation can also be used to describe the proximity of values to the mean; values may be closely distributed to or widely distributed from the mean, for example. association close window An association between two variables exists when a change in the values for one variable produces a systematic change in the other. If an increase in one variable tends to result in an increase in the other, the association is positive. If an increase in one variable tends to result in a decrease in the other, the association is negative.

B

 bias close window Bias, or systematic error, favors particular results. A measurement process is biased if it systematically overstates or understates the true value of a variable. binomial experiment close window A binomial experiment consists of n trials, where each trial is like a coin toss -- it has exactly two possible outcomes. In each trial, the probability for each outcome remains constant. binomial probability model close window The binomial probability model specifies the probabilities for each of the two possible outcomes in a binomial experiment. bivariate analysis close window Bivariate analysis is a kind of data analysis that explores the association between two variables. box plot close window A box plot, also known as a box-and-whiskers plot, is a graphical representation of the Five-Number Summary of a data set. A box is drawn from the lower quartile (Q1) to the upper quartile (Q3); a horizontal line across the box indicates the median. Two whiskers are drawn, one from the lower quartile to the minimum and one from the upper quartile to the maximum. Box plots can be used to make graphical comparisons between data sets and to measure the variation within parts of a data set.

C

 census close window A census is an attempt to include every individual in a given population in a sample. comparative experimental study close window A comparative experimental study seeks to determine "cause and effect." In an experimental study, two groups are selected, and each group is given a different treatment. At the end of the experiment, the results for each group are compared to determine whether or not the treatment had an influence on the results. For example, an experimental study might indicate that people who were told to drink more milk daily had a decreased incidence of osteoporosis. comparative observational study close window A comparative observational study seeks to determine differences in measured groups, where each group is selected based on a differentiating criterion. For example, an observational study might compare smokers to non-smokers, or men to women. The difference between an observational study and an experimental study is that in an experimental study, participants are actively given different behaviors, while in an observational study, the different behaviors are predetermined and are used to place participants into groups. comparative study close window A comparative study focuses on the relationship(s) between two or more sets of data. For example, a comparative study might demonstrate that, on average, the winners of a Best Actress award are younger than the winners of a Best Actor award. Comparative studies often use box plots and other statistical comparisons to prove that the distributions are different in a significant way. contingency table close window A contingency table lists the number of values in each quadrant of a scatter plot. continuous variable close window A continuous variable is a quantitative variable whose values can take on any value on a number line; it may contain a decimal or fractional value. For example, time is a continuous variable since its values can be any number zero or greater. Time can be measured on a number line, and any point on the number line is a possible point in time. This is in contrast to a discrete variable, which can only accept whole numbers as values (such as the number of raisins in a box). co-variation close window Co-variation describes the way two variables simultaneously change together. cumulative frequency close window Cumulative frequency specifies how many data values are of a particular number or smaller. For example, in the data set {1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 26}, the cumulative frequency for the value 4 is nine, since there are nine values in the set that are 4 or less. The cumulative frequency for the value 2 is four; the cumulative frequency for the value 26 is 11; and so on. The statement "You scored higher than 10 other students in this class" is a statement of cumulative frequency. cumulative frequency table close window A cumulative frequency table is a representation of data that shows the cumulative frequency of each value in the data set.

D

 data close window Data are a set of values for a measured variable. design of a comparative study close window The design of a comparative study is the step-by-step description of how the study is conducted, including the selection process of participants and the process of data collection. Designs must be created in ways that reduce potential sources of bias. deviation from the mean close window Deviation from the mean for a data value is the difference between the value and the mean. The deviation from the mean can be positive, negative, or zero. For example, in the data set {1, 2, 3, 4, 5}, the mean is 3, and the deviations from the mean for each data value are {-2, -1, 0, 1, 2}. Adding all the deviations from the mean, positive and negative, must result in zero, since the mean represents a balance point for these deviations -- the point at which the excesses and deficits are perfectly balanced. discrete data close window Discrete data are data whose measurements are obtained by counting and whose values must be whole numbers. The number of people living in a town, the number of times a person has been struck by lightning, the number of licks it takes to get to the center of a lollipop -- these are all discrete data. distribution close window The distribution of data describes the shape of a data set when displayed on a histogram. There are dozens of specific statistical distributions found in data, but two of the most common are uniform distribution (intervals with equal frequency) and normal distribution (a bell-shaped histogram).

E

 equal-shares allocation close window see fair allocation. experimental probability close window Experimental probability is the proportion of times a particular outcome actually occurs when a random experiment is repeated a large number of times.

F

 fair allocation close window Fair allocation, or the equal-shares allocation, is an allocation in which each data value is equal to the mean. For example, if five people are to share 35 cookies, the fair allocation is for each person to have the mean of 7 cookies. Five-Number Summary close window The Five-Number Summary of a data set is a five-item list comprising the minimum value, first quartile, median, third quartile, and maximum value of the set. It divides a data set into four sets, each of which contains 25% of the set. frequency close window The frequency of a value in a data set is the number of times that that value appears in the set. For example, in the data set {1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 26}, the frequency of the value 3 is two, the frequency of the value 26 is one, and the frequency of the value 6 is zero. frequency bar graph close window A frequency bar graph is a graphical representation of data in which the values of the data are placed on the horizontal axis, and bars extend vertically above each value to indicate the frequency of that value. A bar graph indicating the population of a dozen cities is an example of a frequency bar graph. frequency table close window A frequency table is a representation of data that shows the frequency of each value in the data set.

G

 grouped frequency table close window A grouped frequency table is a representation of data in which the number (frequency) of data values that occurs within each interval (group) of a data set is listed.

H

 histogram close window A frequency histogram is a graphical representation of grouped continuous data. The groups of data values are placed on the horizontal axis, and bars are placed vertically above each value to indicate the frequency of the data for that interval.

I

 interquartile range close window The interquartile range is the length of the interval between the lower quartile (Q1) and the upper quartile (Q3). This interval indicates the central, or middle, 50% of a data set. interval close window An interval is a range of values for data. Some common intervals include the interval from the lowest data value to the highest data value and the interval that contains the middle 50% of data.

J

K

L

 least squares line close window Also called the line of best fit, the least squares line, is the line that most closely approximates a data set. line of best fit close window See least squares line. line plot close window A line plot is a graphical representation of data in which the values of the data are placed on the horizontal axis, and dots are placed vertically above each value to indicate the number of times that that value appears in the data. A line plot is sometimes called a dot plot.

M

 mathematical probability close window Mathematical probability, or theoretical probability, is the proportion of times a particular outcome is expected to occur when a random experiment is repeated a large number of times. mean close window The mean of a data set is the arithmetic average of the data set, which is obtained by adding all the values, then dividing by the number of values in the set. For example, in the data set {1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 26}, the mean is 5; you find it by dividing the sum of the values in the set (55) by the number of values (11). The mean may or may not be an actual value in the set. mean absolute deviation (MAD) close window The mean absolute deviation (MAD) of a data set is the average of the absolute values of all deviations from the mean in that set. For example, in the data set {1, 2, 3, 4, 5}, the mean is 3, the deviations from the mean are {-2, -1, 0, 1, 2}, the absolute deviations from the mean are {|-2|, |-1|, |0|, |1|, |2|}, and the MAD is (2 + 1 + 0 + 1 + 2) / 5 = 1.2. The MAD is a measure of, on average, how far the values in a data set are from the mean. measure of central tendency close window A measure of central tendency is a value that represents the data set. The mean, median, and mode are examples of measures of central tendency. Although all measures of central tendency represent the data set, they are not necessarily the same value. median close window The median of a data set is the value in the center of an ordered list of the data. It is also the value for which there are as many values above it as there are below it. For example, in the data set {1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 26}, the sixth value has five above it and five below it. This value, 3, is the median. If a data set contains an even number of values, the median is found by taking the mean of the two values in the center of the ordered list. midrange close window The midrange of a data set is the average of the minimum and maximum values. mode close window The mode is the most frequently occurring value in a data set. For example, in the data set {1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 26}, the mode is 4, which has a frequency of three. It is possible for a data set to have more than one mode if two or more values each have the highest frequency. It is also possible for a data set to have no mode if all of its values have the same frequency.

N

O

 outcome close window An outcome is a possible result of a random experiment. Each outcome has a probability associated with it (between zero and one).

P

 Pascal's Triangle close window Pascal's Triangle is a special triangular tabulation of numbers. Each row in the triangle corresponds to the frequencies in a binomial probability table for n trials. population close window The population is the entire group that a study wants information about. probability table close window A probability table shows each of the possible values for an outcome of an experiment, paired with its corresponding probability.

Q

 quadrants close window The four quadrants of a scatter plot are created when the graph is divided at the mean of each of the two variables. For example, the first quadrant consists of points that are above the mean for both variables. qualitative data close window Qualitative data are the values of a measured qualitative variable. qualitative variables close window Qualitative variables represent categories rather than numbers -- for example, the colleges attended by the last 10 American presidents, or the five cars most likely to be stolen in the United States. quantitative data close window Quantitative data are the values of a measured quantitative variable. quantitative variables close window Quantitative variables represent numbers or quantities -- for example, the number of lions in a box of animal crackers, or the height of each student in a classroom. quartiles close window Quartiles are numbers that divide an ordered data set into four portions, each containing approximately one-fourth of the data. Twenty-five percent of the data values come before the first quartile (Q1). The median is the second quartile (Q2); 50% of the data values come before the median. Seventy-five percent of the data values come before the third quartile (Q3).

R

 random assignment close window In a comparative experimental study, random assignment is frequently used to select the group in which participants are placed; this is done to reduce bias. For example, if an experiment attempted to study the effect of fear on people's ability to think clearly, such an experiment would be unreasonably biased if it were to ask for volunteers to make up its groups. Random assignment makes it equally likely that any participant will be placed in any group. random error close window Random error is a nonsystematic measurement error that is beyond our control; its effects average out over a set of measurements. random experiment close window A random experiment is an experiment whose outcomes are due to chance. random sample close window A random sample is a sample that is selected completely by chance from the population. relative frequency close window Relative frequency is frequency as a proportion of the whole set. For example, in the data set {1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 26}, the relative frequency of the value 4 is 3/11, since the value 4 appears three times out of 11 total values. Relative frequencies can be expressed as fractions (3/11), decimals (.273), or percentages (27.3%). The total of all relative frequencies in a data set should be 1 (or 100%) but may instead be very close to 1, due to round-off error. relative frequency bar graph close window A relative frequency bar graph is a graphical representation of data in which the values of the data are placed on the horizontal axis, and bars extend vertically above each value to indicate its relative frequency. A bar graph indicating the percentage of people who voted for each presidential candidate is an example of a relative frequency bar graph. relative frequency histogram close window A relative frequency histogram is a histogram in which the relative frequency of each group appears on the vertical axis, rather than the actual frequency. Typically, the relative frequency is expressed as a percentage. representative sample close window A representative sample is one in which the relevant characteristics of the sample members are generally the same as the characteristics of the population.

S

 sample close window A sample is a part of the population examined in a study to gain information about the whole population. sample mean close window The sample mean is the mean of a sample. It can be used as an estimate of the mean of the population under study. sample size close window The sample size is the number of observations taken from a population to form a sample. For example, when 500 people are polled regarding an upcoming election, the size of this sample is 500. Increasing the sample size generally leads to more accurate estimates. sampling with replacement close window Sampling with replacement is a type of sampling in which it is possible for the same observation to be included more than once within a sample. sampling without replacement close window Sampling without replacement is a type of sampling in which the same observation cannot be included more than once within a sample. If the same unit is randomly selected a second time, it is ignored. scatter plot close window A scatter plot is a graph that allows you to visualize the simultaneous changes taking place in two variables. Each of the paired values of the two variables is plotted as a point on a graph in two dimensions. standard deviation close window The standard deviation of a data set is the square root of the variance of that set. For example, in a data set whose variance is 2, the standard deviation is the square root of 2, which is approximately 1.414. Like the MAD, the standard deviation is a measure of the typical amount that the values in a data set vary from the mean. stem and leaf plot close window A stem and leaf plot is a representation of data in which each data value is separated into two parts -- a stem and a leaf. For example, if the data are two-digit numbers, then the stems are commonly the tens digits, and the leaves would be the units digits. The stems are listed vertically (from smallest to largest), and the corresponding leaves for the data values are listed horizontally beside the appropriate stem. On the final version of the stem and leaf plot, the leaves are usually ordered within each stem. Note that the stems on a stem and leaf plot provide a mechanism for grouping numeric data. sum of squared errors close window The sum of squared errors, or SSE, is the sum of the squares of the vertical distances from the values in a data set to the corresponding points on a trend line. The line of best fit, or the least squares line, is the line with the smallest SSE. summary measures close window Summary measures are numbers that describe some significant characteristics of your data. Summary measures include the mean, the median, the mode, the maximum, the minimum, and the quartiles of a data set.

T

 Three-Number Summary close window The Three-Number Summary of a data set is a three-item list comprising the minimum, median, and maximum values of the set. It divides a data set into two sets, each of which contains 50% of the set. treatment close window The treatment in a comparative study is the defining difference between the groups. In an experimental study, the treatment might be a new drug being clinically tested. An observational study does not impose a treatment on individual objects; it observes the objects as they are. tree diagram close window A tree diagram is a schematic diagram that can be used to describe the possible outcomes of a random experiment. Two-Number Summary close window The Two-Number Summary of a data set is a two-item list comprising the minimum and maximum values of the set.

U

V

 variable close window A variable is a characteristic that may change (i.e., vary) from one observation to another. variance close window The variance of a data set is the average of the squares of all the deviations from the mean in that set. For example, in the data set {1, 2, 3, 4, 5}, the deviations from the mean are {-2, -1, 0, 1, 2}, and the variance is ([-2]2 + [-1]2 + 02 + 12 + 22) / 5 = 2. variation close window Variation is any difference in measured data. Variation can occur for many reasons, including random error and bias.

W

X

Y

Z