Quantitative analysis of research data is divided into two types: descriptive and inferential statistics. Descriptive statistics are used to describe or summarize the data obtained in the study and to provide an overview of findings. Inferential statistics are used to make inferences, or draw conclusions, that can be extended beyond the immediate data themselves. We will review here the forms of descriptive and inferential statistics most commonly used in service-learning research. Note that this is not intended to be a "how to" discussion, but rather an introduction to the statistics most frequently seen in service-learning research. For more specific information on statistics and their use, the reader should consult a statistics text or experienced research colleagues.
Descriptive Statistics
- Frequency distribution: A summary of the individual scores or values on a measure (or groupings of values) and how frequently each score or value occurred. This can take the form of a table (below), or a figure, such as a histogram, line or bar graph, or pie chart.
Example:

SAT Writing Score   Percent
200-299             9%
300-399             15%
400-499             26%
500-599             27%
600-699             13%
700-800             10%
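A frequency distribution like the one above can be computed directly. The sketch below, in Python with hypothetical scores (the values and percentages are illustrative, not taken from the table), bins each score into a 100-point range and counts how often each bin occurs:

```python
from collections import Counter

# Hypothetical SAT writing scores (illustrative values only)
scores = [512, 430, 650, 287, 555, 470, 390, 710, 505, 488,
          340, 605, 520, 455, 398, 575, 260, 615, 495, 530]

def bin_label(score):
    """Map a score to its 100-point bin; the top bin is 700-800."""
    low = min((score // 100) * 100, 700)
    high = 800 if low == 700 else low + 99
    return f"{low}-{high}"

# Count how many scores fall into each bin, then report percents
freq = Counter(bin_label(s) for s in scores)
for label in sorted(freq):
    print(f"{label}: {freq[label]} ({100 * freq[label] / len(scores):.0f}%)")
```

The same counts could be displayed as a histogram or bar graph, as the text notes.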
- Measures of central tendency: There are three statistics that are used to show the "center" of a distribution. In a normal or bell-shaped distribution these three scores are all equal to each other.
- Mean: the mathematical average of all scores. The mean is typically used with interval data.
- Median: the score found at the exact middle of the set of scores. For example, if you have 300 scores and put them in numerical order, the 150th ranked score is the median. The median is particularly useful if there are a few extreme scores that "pull" the mean up or down. The median is appropriate for ordinal data.
- Mode: the most frequent value in the set of scores; the highest point in the histogram or line graph. Sometimes there is more than one modal value, such as in a bimodal distribution. The mode is used for nominal or categorical data.
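The three measures of central tendency can be computed with Python's standard `statistics` module. The hypothetical data below include one extreme score (30) to show how an outlier "pulls" the mean while leaving the median and mode unaffected:

```python
import statistics

# Hypothetical scores with one extreme value (30) that pulls the mean up
scores = [2, 3, 3, 4, 5, 5, 5, 6, 7, 30]

mean = statistics.mean(scores)      # 7.0 -- inflated by the outlier
median = statistics.median(scores)  # 5.0 -- middle of the sorted scores
mode = statistics.mode(scores)      # 5   -- most frequent value
print(mean, median, mode)
```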
- Measures of dispersion: Dispersion refers to how spread out the scores are in a distribution. There are two common statistics used to show dispersion:
- Range: A simple way to show the "width" of a distribution, the range is the highest value minus the lowest value.
- Standard deviation: A descriptive statistic that shows the relationship that the set of scores has to the mean (average) of the distribution. The higher the standard deviation, the bigger the width of the distribution and the more varied the scores are around the mean.
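Both measures of dispersion are one-liners in Python; the quiz scores below are hypothetical:

```python
import statistics

# Hypothetical quiz scores (illustrative only)
scores = [4, 8, 6, 5, 3, 7, 6, 9]

score_range = max(scores) - min(scores)  # highest minus lowest = 6
sd = statistics.stdev(scores)            # sample standard deviation = 2.0
print(score_range, sd)
```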
- Crosstabs: A table summarizing combinations of two (or more) characteristics, categories, or scores, and how frequently they occur. In the table below, the two variables being summarized are class status and sex of respondents.
Example:

Class        Percent of Males   Percent of Females
Freshmen     49%                46%
Sophomores   25%                30%
Juniors      16%                15%
Seniors      10%                9%
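A crosstab is simply a count of category combinations. The sketch below builds one from a hypothetical list of respondents (the names and counts are illustrative, not the data in the table) and reports column percentages within each sex:

```python
from collections import Counter

# Hypothetical respondents as (class status, sex) pairs -- illustrative only
respondents = [("Freshmen", "M"), ("Freshmen", "F"), ("Sophomores", "F"),
               ("Juniors", "M"), ("Freshmen", "M"), ("Seniors", "F"),
               ("Sophomores", "M"), ("Freshmen", "F"), ("Juniors", "F"),
               ("Seniors", "M")]

crosstab = Counter(respondents)  # counts each (class, sex) combination

# Percent of each sex falling in each class, as in the table above
for sex in ("M", "F"):
    col_total = sum(n for (cls, s), n in crosstab.items() if s == sex)
    for cls in ("Freshmen", "Sophomores", "Juniors", "Seniors"):
        print(f"{cls:<11}{sex}: {100 * crosstab[(cls, sex)] / col_total:.0f}%")
```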
Inferential Statistics
Inferential statistics are used to test hypotheses, make inferences, and draw conclusions that can be extended beyond the immediate data themselves. The most commonly used inferential statistics in service-learning research are described below. This section also includes a discussion of analysis of pre-test/post-test data, because this is a common measurement strategy in service-learning research.
- Correlations: A correlation demonstrates the nature and degree of association between two naturally occurring variables. The correlation coefficient is a statistical summary of the association between two constructs that have been operationalized as variables. The correlation coefficient contains two pieces of information: (a) a number, which summarizes the degree to which the two variables are linearly associated; and (b) a sign, which summarizes the nature of the relationship. The numeric value of a correlation coefficient can range from +1.0 to -1.0. Larger absolute values indicate greater linear association; numbers close to zero indicate no linear relationship. A positive sign indicates that higher values on one variable are associated with higher values on the other variable; a negative sign indicates an inverse relationship, such that higher values on one variable are associated with lower values on the other variable. A correlation coefficient is both a descriptive statistic (i.e., describing the nature of the relationship in a sample) and an inferential statistic (i.e., an estimate of the nature of the relationship in a broader population).
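The Pearson correlation coefficient described above can be computed from its definition. The sketch below uses hypothetical data (service hours and a civic-attitude score are assumed variable names, not from this chapter):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists:
    the covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: service hours and a civic-attitude score
hours = [2, 5, 1, 8, 4, 7]
attitude = [10, 14, 9, 21, 13, 18]
print(pearson_r(hours, attitude))  # close to +1: strong positive association
```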
- t-test: The t-test and one-way analysis of variance (ANOVA) are used to determine if two sets of scores (t-test) or two or more sets of scores (ANOVA) are different. One common use is to compare the average performance of one group of subjects on a measure before and after a program; either the dependent t-test or repeated-measures ANOVA can be used to determine if the two sets of scores differ significantly. In this case, the two sets of scores come from the same group of subjects. Another common use is to compare the average scores of one group versus another group, such as the post-test scores of a service-learning class versus the post-test scores of a non-service-learning class. ANOVA is used when more than two groups are being compared or when more than one independent variable is being analyzed.
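A minimal sketch of the independent-samples t statistic, using hypothetical post-test scores for a service-learning class and a traditional class (the data and group names are illustrative; a full analysis would also compare the statistic against a t distribution for significance):

```python
import math
import statistics

def independent_t(group1, group2):
    """Independent-samples t statistic with a pooled variance estimate."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))

# Hypothetical post-test scores (illustrative only)
service_learning = [78, 85, 82, 88, 80, 84]
traditional = [74, 79, 76, 81, 73, 77]
print(independent_t(service_learning, traditional))
```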
- Analysis of covariance (ANCOVA): ANCOVA tests whether certain factors (independent variables) have an effect on the dependent variable while statistically removing the effects of other variables (covariates). For example, the researcher might give a pre-test and a post-test to both a service-learning section of a course and to a traditional section that does not include a service component. Because of the possibility of self-selection into the service-learning course, the researcher may wish to control for prior volunteering. The ANCOVA analysis allows the researcher to control for differences on prior volunteering (i.e., hold statistically constant prior volunteering experience), while examining differences between treatment and non-treatment groups on the dependent variable, thus isolating the effect of the main independent variable on the dependent variable. Another common approach is to use the pre-test as the covariate— i.e., hold the pre-test scores for the two groups constant, and then evaluate whether members of the service-learning group changed more than members of the traditional course section. Because most service-learning research involves non-random assignment of subjects to groups (quasi-experimental), researchers need to use a reliability-corrected ANCOVA model when pre-test scores are available (Trochim, 2006).
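The core adjustment ANCOVA performs can be sketched in a few lines: each group's post-test mean is corrected for its pre-test mean using the pooled within-group regression slope. The data and group labels below are hypothetical, and the sketch deliberately omits the significance test and the reliability correction mentioned above:

```python
import statistics

def adjusted_means(pre, post, group):
    """ANCOVA-style adjusted post-test means: remove the pre-test
    (covariate) effect using the pooled within-group slope of post on pre.
    No significance test or reliability correction is attempted here."""
    groups = sorted(set(group))
    sxy = sxx = 0.0
    for g in groups:
        xs = [x for x, gg in zip(pre, group) if gg == g]
        ys = [y for y, gg in zip(post, group) if gg == g]
        mx, my = statistics.mean(xs), statistics.mean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    b_w = sxy / sxx  # pooled within-group slope
    grand_pre = statistics.mean(pre)
    adjusted = {}
    for g in groups:
        xs = [x for x, gg in zip(pre, group) if gg == g]
        ys = [y for y, gg in zip(post, group) if gg == g]
        adjusted[g] = statistics.mean(ys) - b_w * (statistics.mean(xs) - grand_pre)
    return adjusted

# Hypothetical data: "SL" scores 2 points above its pre-test; "TR" does not
pre   = [1, 2, 3, 4, 5, 6]
post  = [3, 4, 5, 4, 5, 6]
group = ["SL", "SL", "SL", "TR", "TR", "TR"]
print(adjusted_means(pre, post, group))  # adjusted gap of 2.0 points
```

Note how the adjustment recovers the 2-point treatment effect even though the two groups started at different pre-test levels.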
- Multiple regression: Multiple regression allows the evaluation of the association between a set of independent variables and a dependent variable. Multiple regression can also evaluate the relative importance of each independent variable to the change in the dependent variable scores. Multiple regression is an improvement over bivariate correlation because multiple regression can examine the association of many predictors (e.g., family background variables, prior volunteering, attitudes, values, moral development) with an outcome variable (e.g., post-graduation civic involvement).
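For the two-predictor case, the regression coefficients have a closed form that can be sketched directly. The data below are constructed (not from any study) so the fit exactly recovers the known coefficients:

```python
import statistics

def ols_two_predictors(x1, x2, y):
    """OLS coefficients for y = b0 + b1*x1 + b2*x2 (two predictors),
    solved from the normal equations in deviation form."""
    m1, m2, my = statistics.mean(x1), statistics.mean(x2), statistics.mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

# Constructed check: data generated as y = 1 + 2*x1 + 3*x2 exactly
x1 = [0, 1, 2, 3]
x2 = [1, 0, 2, 1]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
print(ols_two_predictors(x1, x2, y))  # recovers (1.0, 2.0, 3.0)
```

With more predictors the same normal equations are solved in matrix form, which is what statistical packages do internally.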
- Strategies for pre-test/post-test analysis: One of the most common measurement strategies in service-learning research is to give a measure (e.g., attitude, knowledge) to students at the beginning and the end of the semester to detect change or growth. (See the discussions of Experimental and Quasi-Experimental Designs in Chapter 2.) There are two basic strategies for analysis of pre-test/post-test data. The first strategy is to use a t-test to compare only the post-test scores of two groups (e.g., service-learning section versus non-service-learning section). Researchers often use this strategy if they do not have pre-test data available, or if they found no differences on pre-test scores and subsequently choose to ignore the pre-test data in analysis. Unfortunately, this strategy suffers from the limitation that it is not possible to conclude that the difference on the post-test is due to the difference in instruction, rather than to differences between the groups, general student maturation, or other events external to the course. When pre-test data are available, they should always be included in analyses, even when there are no significant differences between groups on the pre-test scores.
A second strategy is to analyze the raw difference scores (post-test minus pre-test scores) for each individual in the groups. This practice is not without controversy (Cronbach & Furby, 1970; Maruish, 1999; Pedhazur & Schmelkin, 1991; Rogosa & Willett, 1983) but is preferable to using post-test scores only, because the researcher is analyzing the change that is occurring for each participant and can make some conclusions, depending on the design of the study, that the changes in scores are due to the educational intervention rather than pre-existing differences in groups. For more precision the researcher may choose to use blocking or matching, or to add a moderator variable such as gender, service site, or some other pre-existing measure (e.g., personality, prior service experience) to the design (Cook & Campbell, 1979; Maruish, 1999; Pedhazur & Schmelkin, 1991) and conduct an ANOVA (one dependent variable) or MANOVA (more than one dependent variable) on the difference scores. In this type of analysis, the difference in groups (e.g., intervention) would be one factor (between subjects), "time" would be a factor (within subjects), and the moderator variable would be a factor in the ANOVA or MANOVA analysis. Another option is to conduct a multiple regression or ANCOVA (Edwards, 1994; Pedhazur & Schmelkin, 1991) to statistically remove the effects of moderator or other variables that produce nonequivalence of groups, and to control for pre-test scores. There are other types of scores that have been recommended for pre-test/post-test analyses (standardized difference scores, residual change scores), but these are less straightforward, have problems of their own, and are not as appropriate as raw difference scores for research on service-learning.
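The single-group version of the difference-score analysis reduces to a dependent (paired) t statistic on the raw differences. A minimal sketch with hypothetical pre- and post-test scores (a full analysis would compare the statistic to a t distribution with n - 1 degrees of freedom):

```python
import math
import statistics

def difference_score_t(pre, post):
    """Dependent (paired) t statistic computed on raw difference scores:
    mean difference divided by its standard error."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Hypothetical pre- and post-test scores for one group (illustrative only)
pre  = [10, 12, 11, 13, 9]
post = [13, 14, 15, 15, 12]
print(difference_score_t(pre, post))
```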