A one-way analysis of variance (ANOVA) is typically performed when an analyst would like to test for mean differences between three or more treatments or conditions. For example, you may want to see if first-year students scored differently than second- or third-year students on an exam.
A one-way ANOVA is appropriate when each experimental unit (study subject) is assigned to only one of the available treatment conditions. Thus, the treatment groups do not have overlapping membership and are considered independent. A one-way ANOVA is considered a “between-subjects” analysis.
The null hypothesis is that there is no difference between treatment group means, while the alternative hypothesis is that the mean of at least one treatment group differs.
H0: μ1 = μ2 = … = μk
Ha: μi ≠ μj for at least one pair of groups i ≠ j (that is, not all k means are equal)
One-way ANOVA Assumptions
In order to run a one-way ANOVA the following assumptions must be met:
- The response of interest is continuous and normally distributed for each treatment group.
- Treatment groups are independent of one another. Experimental units only receive one treatment, and they do not overlap.
- There are no major outliers.
- A check for unequal variances will help determine which version of a one-way ANOVA is most appropriate:
  - If variances are equal, then the assumptions of a standard one-way ANOVA are met.
  - If variances are unequal, then a Welch’s one-way ANOVA is appropriate.
One-way ANOVA Example
In this example, an experiment is performed to compare the dry weight of plants receiving one of three treatments. Each plant receives exactly one of the three available treatments (a control and two treatments intended to enhance plant weight). We want to answer the question, “Which treatment is optimal for enhancing plant weight?”
Dependent response variable:
Plant_weight = The dried weight of each plant
Independent categorical variable:
Treatment = One of three available treatments to enhance plant weight:
- Ctrl = Control
- Trt1 = Treatment 1
- Trt2 = Treatment 2
The data for this example is available here:
One-way ANOVA SAS Code
PROC MEANS produces basic descriptive statistics. PROC UNIVARIATE performs normality tests and produces QQ plots for each treatment group. PROC GLM performs Levene’s test for homogeneity of variances, the one-way ANOVA (and Welch’s ANOVA), and the corresponding post-hoc tests that help determine exactly where treatment differences occur.
Here is the annotated code for the example. All assumption checks are provided along with the one-way ANOVA and post-hoc tests:
```sas
*Import the data;
proc import datafile='C:\Dropbox\Website\Analysis\One-Way ANOVA\data\PlantGrowth.csv'
    out=work.plants dbms=csv replace;
run;

*Produce descriptive statistics;
proc means data=plants nmiss mean std stderr lclm uclm median min max qrange maxdec=2;
    class treatment;
    var plant_weight;
run;

*Sort the data by treatment (required by the BY statement below);
proc sort data=plants;
    by treatment;
run;

*Test for normality and draw a QQ plot for each treatment;
proc univariate data=plants normal;
    by treatment;
    var plant_weight;
    qqplot plant_weight / normal(mu=est sigma=est);
run;

*Enable ODS Graphics for the boxplot and LS-means plots;
ods graphics on;

*Test for equality of variances and perform the ANOVA and Tukey post-hoc tests;
*The CL option requests confidence limits for the LS-means and their differences;
proc glm data=plants;
    class treatment;
    model plant_weight = treatment;
    means treatment / hovtest=levene(type=abs) welch;
    lsmeans treatment / pdiff cl adjust=tukey plot=meanplot(connect cl) lines;
run;
quit;
```
One-way ANOVA Annotated SAS Output
Many times, analysts forget to take a good look at their data prior to performing statistical tests. Descriptive statistics are not only used to describe the data but also help determine if any inconsistencies are present. Detailed investigation of descriptive statistics can help answer the following questions (in addition to many others):
- How much missing data do I have?
- Do I have potential outliers?
- Are my standard deviation and standard error values large relative to the mean?
- In what range do most of my data fall for each treatment?
- treatment – Each treatment level of our independent variable.
- N obs – The number of observations for each treatment.
- N Miss – The number of missing observations for each treatment.
- Mean – The mean value for each treatment.
- Std Dev – The standard deviation of each treatment.
- Std Error – The standard error of the mean for each treatment, that is, the standard deviation divided by √n.
- Lower and Upper 95% CL for Mean – The lower and upper 95% confidence limits for the mean. Loosely, you can be 95% confident that the true mean of each treatment group lies between these two values, assuming the data are normally distributed.
- Median – The median value for each treatment.
- Minimum, Maximum – The minimum and maximum value for each treatment.
- Quartile Range – The interquartile range of each treatment, that is, the 75th percentile minus the 25th percentile.
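To make the formulas behind these columns concrete, here is a minimal Python sketch (not SAS) that computes several of the same quantities for a single group. The data are hypothetical. The t critical value is passed in by hand because Python's standard library has no t distribution; 2.262 is the two-sided 95% value for 9 degrees of freedom (n = 10). The quartile range is left out because Python's default percentile definitions differ from SAS's.

```python
import math
import statistics

def describe(values, t_crit):
    """Summary statistics for one treatment group.

    t_crit is the 97.5th percentile of Student's t with n - 1 degrees of
    freedom (about 2.262 for n = 10); it is an input, not computed here.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample SD with an n - 1 denominator
    se = sd / math.sqrt(n)             # standard error of the mean
    return {
        "n": n,
        "mean": mean,
        "std_dev": sd,
        "std_err": se,
        "lclm": mean - t_crit * se,    # lower 95% confidence limit for the mean
        "uclm": mean + t_crit * se,    # upper 95% confidence limit for the mean
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

# Hypothetical weights for one group of 10 plants (not the article's data).
group = [4.2, 5.1, 4.8, 5.6, 4.9, 5.0, 4.4, 5.3, 4.7, 5.0]
for key, value in describe(group, t_crit=2.262).items():
    print(f"{key:>8}: {value:.3f}" if isinstance(value, float) else f"{key:>8}: {value}")
```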
Prior to performing a one-way ANOVA, it is important to validate our assumptions to ensure that the comparison is appropriate and reliable. Normality can be tested with a Shapiro-Wilk test (or an equivalent test) and/or assessed with a QQ plot, which is especially useful for large sample sizes. Histograms can also be helpful, again especially for large sample sizes.
In this example, we will use PROC UNIVARIATE to produce our Shapiro-Wilk normality test and QQ plots for each treatment group.
The Shapiro-Wilk normality test for the control group plant weights:
The Shapiro-Wilk normality test for the treatment 1 plant weights:
The Shapiro-Wilk normality test for the treatment 2 plant weights:
- Test – Four normality tests are presented: Shapiro-Wilk, Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling.
- Statistic – The test statistic for each test.
- p Value – The p-value for each test. A p-value < 0.05 would indicate that we should reject the assumption of normality. The Shapiro-Wilk p-values are > 0.05 for the control, treatment 1, and treatment 2 groups (p = .75, .48, and .56, respectively). We therefore have no evidence against normality and treat the data as normally distributed.
The vast majority of points should follow the theoretical normal reference line for the data to be considered normally distributed.
QQ plot for control group plant weights:
QQ plot for treatment 1 plant weights:
QQ plot for treatment 2 plant weights:
The Shapiro-Wilk p-values are > 0.05, and the points in each treatment group's QQ plot follow the theoretical normal reference line. We therefore conclude that the normality assumption is reasonable.
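A QQ plot compares each sorted observation with the value a normal distribution would predict at the same position. The minimal Python sketch below computes those plot coordinates for hypothetical data. It assumes Blom-type plotting positions, (i - 0.375)/(n + 0.25), and this convention is assumed here to match SAS's QQPLOT; other software may use slightly different positions. The reference line uses the sample mean and standard deviation, mirroring `mu=est sigma=est`.

```python
import statistics

def qq_points(values):
    """(theoretical standard normal quantile, observed value) pairs for a QQ plot,
    using Blom-type plotting positions (i - 0.375) / (n + 0.25)."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = statistics.NormalDist()   # mean 0, standard deviation 1
    return [
        (std_normal.inv_cdf((i - 0.375) / (n + 0.25)), y)
        for i, y in enumerate(ordered, start=1)
    ]

def reference_line(values):
    """Intercept and slope of y = mu + sigma * x, estimated from the sample."""
    return statistics.mean(values), statistics.stdev(values)

weights = [4.2, 5.1, 4.8, 5.6, 4.9, 5.0, 4.4, 5.3, 4.7, 5.0]   # hypothetical data
mu, sigma = reference_line(weights)
for x, y in qq_points(weights):
    print(f"theoretical {x:+.3f}  observed {y:.2f}  line {mu + sigma * x:.2f}")
```

Points that sit close to the line value in the last column are what "following the reference line" means in the plots.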
Levene’s Test for Homogeneity of Variances
Levene’s Test for Homogeneity of Variances is produced by PROC GLM to test whether variances are considered equal across all treatment groups. The null hypothesis for this test is that variances are equal across groups. The alternative hypothesis is that variances are unequal for at least one of our treatment groups. If we reject the null hypothesis of equal variances and conclude variances are not equal, then a Welch’s ANOVA is appropriate. However, if we do not reject the null hypothesis, the standard one-way ANOVA results would be considered appropriate.
More simply: if the Levene’s test p-value is P ≤ 0.05, use the Welch’s ANOVA results; if P > 0.05, use the standard one-way ANOVA results.
In our example, p = 0.3093, so we fail to reject the null hypothesis and treat the variances as equal. Therefore, we will use the standard ANOVA results once the remaining assumptions are checked.
- Source – The variable for which Levene’s test for homogeneity is performed, plus the error term.
- DF – The degrees of freedom associated with each variable and the overall error.
- Sum of Squares – The sums of squares calculation for Levene’s test.
- Mean Square – The mean square calculation for Levene’s test.
- F Value – The F statistic from which the p-value is computed.
- Pr > F – Levene’s test for equality of variances shows a p-value of 0.3093. A significant p-value (P ≤ 0.05) would indicate that Welch’s ANOVA should be used in place of a standard one-way ANOVA. However, since the variances are considered equal (P > 0.05), the standard one-way ANOVA results can be reported.
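Levene’s test with `type=abs` is a one-way ANOVA run on the absolute deviations of each observation from its group mean. The Python sketch below illustrates that idea with hypothetical data (not the article's). For the p-value it uses the closed form of the F tail with 2 numerator degrees of freedom, P(F > f) = (1 + 2f/df_error)^(-df_error/2). That shortcut is only valid when there are exactly three groups, as in this example.

```python
import statistics

def one_way_f(groups):
    """One-way ANOVA F statistic with its numerator and denominator df."""
    pooled = [y for g in groups for y in g]
    grand_mean = statistics.mean(pooled)
    ss_model = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((y - statistics.mean(g)) ** 2 for g in groups for y in g)
    df_model, df_error = len(groups) - 1, len(pooled) - len(groups)
    return (ss_model / df_model) / (ss_error / df_error), df_model, df_error

def levene_abs(groups):
    """Levene's test, absolute-deviation version: ANOVA on |y - group mean|."""
    abs_dev = [[abs(y - statistics.mean(g)) for y in g] for g in groups]
    f_value, df_model, df_error = one_way_f(abs_dev)
    if df_model != 2:
        raise ValueError("the closed-form p-value below needs exactly three groups")
    p_value = (1 + 2 * f_value / df_error) ** (-df_error / 2)
    return f_value, df_model, df_error, p_value

# Hypothetical plant weights for three groups (not the article's data).
ctrl = [4.2, 5.1, 4.8, 5.6, 4.9]
trt1 = [4.0, 4.6, 3.9, 5.2, 4.4]
trt2 = [5.3, 5.8, 5.1, 6.0, 5.5]
print(levene_abs([ctrl, trt1, trt2]))
```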
Boxplots to Visually Check for Outliers
The GLM procedure provides side-by-side boxplots, which help visually identify major outliers and show whether variances might be unequal. The boxplot suggests one minor outlier but, subjectively, not enough evidence to justify moving to a different analysis method.
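A common rule behind box plot outlier markers is Tukey's fence. A point is flagged if it lies more than 1.5 × IQR below the first quartile or above the third quartile. The small Python sketch below applies that rule. It uses Python's "inclusive" quartile method, which may not match the percentile definition SAS uses for its box plots exactly, so borderline points could be classified differently.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return points outside Tukey's fences: below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [y for y in values if y < low or y > high]

print(iqr_outliers([1, 2, 3, 4, 5, 100]))   # [100]
```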
One-way ANOVA Results
So far, we have determined that the data for each treatment group is normally distributed, variances are equal, and we do not have major influential outliers. Our next step is to officially perform the ANOVA to determine whether there is a statistically significant difference in plant weight between treatments.
- Source – This column identifies the partitioned sources of variability in the ANOVA. The ‘Model’ source represents the variation explained by the model (here, the treatment effect). The ‘Error’ source is the leftover variation that the model does not explain. The corrected total is the sum of the two sources.
- DF – The degrees of freedom associated with each source of variability.
  - Model – The number of treatment levels minus 1: 3 - 1 = 2.
  - Error – The total sample size minus 1 minus the model df: 30 - 1 - 2 = 27.
  - Total – The total sample size minus 1: 30 - 1 = 29.
- Sum of Squares – The sum of squares for each source of variation. I will focus on a primarily non-mathematical interpretation of each source; if you prefer the mathematical interpretation, I encourage you to read the post here.
  - Model – The variation of the treatment group means around the overall (grand) mean of plant weight, weighted by the number of plants in each group.
  - Error – The variation of individual plant weights around the mean of their own treatment group. This is an overall measure of the discrepancy between the data and the model.
  - Total – The sum of the squared distances between each observed value and the grand mean. That is, SStotal = SSmodel + SSerror.
- Mean Square – In general, a mean square is the variance attributable to a source: our treatments (model) or outside sources (error). Each mean square is computed by dividing its sum of squares by its degrees of freedom.
  - Model – The variability in plant weight attributed to our treatment effect. This can be thought of as the ‘signal’ we are measuring.
  - Error – The variability attributed to sources other than our treatment effect. This can be thought of as the ‘noise’.
- F value – The model mean square divided by the mean square error; this is our ANOVA test statistic. It can be thought of as a signal-to-noise ratio. The larger the value, the more variability in plant weight the treatment explains relative to the noise, and the smaller the resulting p-value.
- Pr > F – The p-value associated with our F test statistic. The p-value for our test is 0.0154. Given α = 0.05, we reject the null hypothesis and conclude that there is a statistically significant difference in plant weight between treatments.
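The partition described above can be reproduced by hand. This Python sketch uses toy data, not the article's. It computes each row of the ANOVA table and lets you check that SSmodel + SSerror = SStotal. The p-value uses the closed-form F tail for 2 numerator degrees of freedom, so it is only reported when there are exactly three groups.

```python
import statistics

def anova_table(groups):
    """Partition the variability of a one-way layout into Model and Error."""
    all_values = [y for g in groups for y in g]
    n_total, k = len(all_values), len(groups)
    grand_mean = statistics.mean(all_values)

    # Model SS: group means around the grand mean, weighted by group size.
    ss_model = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Error SS: each observation around its own group mean.
    ss_error = sum((y - statistics.mean(g)) ** 2 for g in groups for y in g)
    # Corrected total SS: each observation around the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)

    df_model, df_error = k - 1, n_total - k
    ms_model, ms_error = ss_model / df_model, ss_error / df_error
    f_value = ms_model / ms_error
    table = {
        "df": (df_model, df_error, n_total - 1),
        "ss": (ss_model, ss_error, ss_total),
        "ms": (ms_model, ms_error),
        "F": f_value,
    }
    if k == 3:
        # Exact F tail when the numerator has 2 df: (1 + 2f/df_error)^(-df_error/2).
        table["p"] = (1 + 2 * f_value / df_error) ** (-df_error / 2)
    return table

# Toy data: df (2, 6, 8), SS (54, 6, 60), F = 27, p = 0.001.
print(anova_table([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```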
General Information About Post-hoc Tests
In our example, an ANOVA p-value = 0.0154 indicates that there is an overall difference in mean plant weight between at least two of our treatment groups. However, the ANOVA alone does not indicate which groups differ; post-hoc tests are needed for that. Post-hoc p-values are typically adjusted for the number of comparisons performed in order to control the overall (familywise) Type I error rate. Here we apply Tukey’s adjustment, which lets us determine which groups significantly differ while controlling Type I errors.
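To see why an adjustment is needed, consider how quickly the chance of at least one false positive grows with the number of comparisons. The sketch assumes the tests are independent. Pairwise comparisons that share the same groups are not independent, so treat this as a rough illustration rather than the exact error rate for this design.

```python
def familywise_error(alpha, m):
    """Chance of at least one false positive among m independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** m

# Three treatments give 3 pairwise comparisons (ctrl-trt1, ctrl-trt2, trt1-trt2).
print(round(familywise_error(0.05, 3), 4))   # 0.1426
```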
In SAS, the LSMEANS statement is typically used to perform post-hoc tests. It computes the estimated marginal mean for each treatment group and the differences between each pair of treatment groups. This approach is preferred especially for analyses with missing values or an unequal number of subjects (or experimental units) within each treatment. You can find more information on estimated marginal means using the following link.
Post-hoc Test Results
- treatment – This column designates each categorical level of our treatment effect.
- plant_weight LSMEAN – Estimated marginal mean (also known as least squares mean) values for each treatment level.
- LSMEAN Number – The integer indicator value SAS assigns to each treatment level:
  - 1: control
  - 2: treatment 1
  - 3: treatment 2
- Least Squares Means for effect treatment – This matrix contains the p-values for each pairwise treatment comparison. Its row and column headings correspond to the LSMEAN Number assigned to each treatment level. For example, row 2, column 3 shows p = 0.0116, which is Tukey’s adjusted p-value for the comparison of treatment 1 and treatment 2. As a result, we reject the null hypothesis that treatment 1 and treatment 2 have equal mean plant weights. We conclude that there is a statistically significant difference in plant weight between these two treatments.
- 95% Confidence Limits – The lower and upper 95% confidence limits for each estimated marginal (least squares) mean. Loosely, you can be 95% confident that the true marginal mean of each treatment group lies between these two values.
- i – The LSMEAN Number of the first treatment level in each comparison.
- j – The LSMEAN Number of the second treatment level in each comparison. For example, i=2, j=3 corresponds to a comparison of treatments 1 and 2.
- Difference Between Means – The difference between the estimated marginal means of the treatments identified in columns i and j. For example, i=2, j=3 gives a difference between treatments 1 and 2 of 4.659 - 5.526 = -0.867. So, on average, plants receiving treatment 1 weigh about 0.867 less (in the units of plant_weight) than plants receiving treatment 2 (p = 0.0116).
- Simultaneous 95% Confidence Limits – The lower and upper confidence limits for each difference between estimated marginal means. Because they are Tukey-adjusted, they are constructed so that, taken together as a set, all of the intervals cover their true differences with 95% confidence.
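The difference and its simultaneous confidence limits follow the Tukey-Kramer formula: difference ± (q / √2) × √(MSE × (1/n_i + 1/n_j)). Here q is the studentized range critical value for the number of groups and the error degrees of freedom. Python's standard library cannot compute q, so the sketch below takes it as an input; look it up in a table or in statistical software. The means, sizes, MSE, and q in the usage lines are placeholders, not values from this analysis. The example also shows that swapping the two groups only flips the sign of the difference and its limits.

```python
import math

def tukey_kramer(mean_i, mean_j, n_i, n_j, mse, q_crit):
    """Mean difference (i minus j) with Tukey-Kramer simultaneous confidence limits."""
    diff = mean_i - mean_j
    se_diff = math.sqrt(mse * (1 / n_i + 1 / n_j))   # SE of the difference from the pooled MSE
    half_width = q_crit / math.sqrt(2) * se_diff
    return diff, (diff - half_width, diff + half_width)

# Placeholder inputs for illustration only.
diff, (lower, upper) = tukey_kramer(4.5, 5.3, 10, 10, mse=0.4, q_crit=3.5)
print(f"i - j: {diff:.3f}, simultaneous CI [{lower:.3f}, {upper:.3f}]")

# Reversing the comparison (j minus i) flips the signs of everything.
rev, (rev_lower, rev_upper) = tukey_kramer(5.3, 4.5, 10, 10, mse=0.4, q_crit=3.5)
print(f"j - i: {rev:.3f}, simultaneous CI [{rev_lower:.3f}, {rev_upper:.3f}]")
```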
The “takeaway” here is that treatments 1 and 2 differ significantly (p = 0.0116). However, no significant differences exist between the control group and treatment 1 (p = 0.3854) or between the control group and treatment 2 (p = 0.1966). Furthermore, the estimated mean difference between treatments 1 and 2 is approximately -0.87. We can reverse this difference and say that, on average, plants in treatment group 2 weigh about 0.87 more than those in treatment group 1.
While this information is helpful, in larger studies it can be hard to digest all of these comparisons at once. PROC GLM also produces plots that help describe the results visually.
LSMEANS Plot With 95% Confidence Intervals
The estimated marginal means plot provides a visual aid for interpreting the numerical results of our post-hoc tests. Treatments are identified on the X-axis and mean plant weights on the Y-axis. The circular point marks the marginal mean of each treatment, and the high-low bars represent 95% confidence intervals. Treatments whose confidence intervals do not overlap are generally of interest and warrant a closer look at the post-hoc tests. Overlap or non-overlap of individual intervals is only a rough guide; the adjusted p-values are the formal test.
In the plot, plant weight is noticeably higher under treatment 2 than under treatment 1, consistent with the significant p-value = 0.0116. The difference in plant weight between treatments 2 and 1 is approximately 0.87.
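In a one-way model, each LS-mean equals the treatment's ordinary mean. Its confidence interval, however, uses the pooled mean square error from the ANOVA rather than the group's own standard deviation, with a t critical value on the error degrees of freedom. That is why these intervals can differ from the PROC MEANS confidence limits described earlier. The minimal Python sketch below uses placeholder inputs. `t_crit` must be supplied; for example, it is about 2.052 for 27 error degrees of freedom.

```python
import math

def lsmean_ci(group_mean, n, mse, t_crit):
    """95% CI for one LS-mean in a one-way model, using the pooled MSE."""
    half_width = t_crit * math.sqrt(mse / n)
    return group_mean - half_width, group_mean + half_width

# Placeholder inputs: a mean of 5.0 from 10 plants, MSE of 0.4, t for 27 error df.
print(lsmean_ci(5.0, 10, mse=0.4, t_crit=2.052))
```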
LS Means Lines Plot
The LS Means lines plot conveys the post-hoc test results in graphical form. Treatments that share a solid vertical line are not statistically different. The blue line connects treatment 1 and the control group. The red line connects the control group and treatment 2. Because treatments 1 and 2 are not connected by a common line, a significant difference is present between these two treatments.
One-way ANOVA Interpretation and Conclusions
The overall ANOVA p-value = 0.0154 indicates that a statistically significant difference exists between the plant weights of at least two treatment groups. Post-hoc tests reveal a significant difference between treatments 1 and 2 (p = 0.0116). However, there are no significant differences between the control group and treatment 1, or between the control group and treatment 2.
We can quantify this difference by reviewing our estimated marginal (LS) mean differences from the post-hoc tests. The estimated marginal mean difference between treatments 1 and 2 is -0.87 with a 95% confidence interval on the difference of [-1.56, -0.17]. For simpler interpretation, it is possible to turn this comparison around and say that the difference between treatment 2 and treatment 1 is about 0.87 with a 95% CI [0.17, 1.56]. This can be confirmed visually by evaluating the LS Mean Plot with 95% Confidence Intervals.
As a result, we can conclude that treatment 2 is superior to treatment 1 at enhancing plant weight. However, neither treatment is statistically different from our control group.
What to do When Assumptions are Broken or Things Go Wrong
Non-normal data or influential outliers violate ANOVA assumptions and can ultimately distort the results. If this happens, there are several options:
A nonparametric Kruskal-Wallis test is the most popular alternative. This test is considered robust to violations of normality and to outliers (among other issues), and it tests for differences in mean ranks. The Kruskal-Wallis test can have slightly lower power than ANOVA. So while it is an acceptable option, transforming the dependent variable (using a log, square root, etc.) and running a standard ANOVA may yield slightly higher power, provided the transformed data meet the assumptions. A permutational ANOVA is a more modern, but lesser-known, nonparametric alternative.
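In SAS, the Kruskal-Wallis test is typically obtained from PROC NPAR1WAY with the WILCOXON option. To show what that test computes, here is a minimal Python sketch with hypothetical data. It pools all observations and ranks them, giving tied values the average rank. It then compares each group's rank sum with what would be expected if the groups did not differ, and applies the usual tie correction. With three groups, H is referred to a chi-square distribution with 2 degrees of freedom, whose upper tail has the closed form exp(-H/2). That shortcut is only valid for three groups.

```python
import math

def average_ranks(values):
    """Ranks 1..N, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H statistic and its degrees of freedom."""
    pooled = [y for g in groups for y in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over groups of tied values.
    counts = {}
    for y in pooled:
        counts[y] = counts.get(y, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction, len(groups) - 1

def chi2_df2_pvalue(h):
    """Upper-tail chi-square p-value; exp(-h/2) is exact only for 2 df."""
    return math.exp(-h / 2)

ctrl = [4.2, 5.1, 4.8, 5.6, 4.9]      # hypothetical data
trt1 = [4.0, 4.6, 3.9, 5.2, 4.4]
trt2 = [5.3, 5.8, 5.1, 6.0, 5.5]
h, df = kruskal_wallis([ctrl, trt1, trt2])
print(f"H = {h:.3f} on {df} df, p = {chi2_df2_pvalue(h):.4f}")
```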
If your data is normally distributed, but you have unequal variances, then a Welch’s ANOVA is a viable option.
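Welch’s ANOVA is requested in the SAS code above with the WELCH option on the MEANS statement. It weights each group mean by n_i / s_i², so that groups with larger variances count less, and it adjusts the denominator degrees of freedom. The Python sketch below follows Welch's formulas as they are commonly stated. The data are hypothetical. The p-value again relies on the closed form for 2 numerator degrees of freedom, so it is only valid for three groups.

```python
import statistics

def welch_anova(groups):
    """Welch's one-way ANOVA: F statistic, numerator df, and non-integer denominator df."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [statistics.mean(g) for g in groups]
    weights = [n / statistics.variance(g) for n, g in zip(sizes, groups)]   # n_i / s_i^2
    w_total = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / w_total
    # Numerator: weighted spread of the group means around the weighted grand mean.
    numerator = sum(w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    # Lambda term shared by the denominator correction and the df adjustment.
    lam = sum((1 - w / w_total) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    df_error = (k ** 2 - 1) / (3 * lam)
    return numerator / denominator, k - 1, df_error

ctrl = [4.2, 5.1, 4.8, 5.6, 4.9]      # hypothetical data
trt1 = [4.0, 4.6, 3.9, 5.2, 4.4]
trt2 = [5.3, 5.8, 5.1, 6.0, 5.5]
f_value, df1, df2 = welch_anova([ctrl, trt1, trt2])
p_value = (1 + 2 * f_value / df2) ** (-df2 / 2)   # exact F tail only because df1 == 2
print(f"Welch F = {f_value:.3f} on ({df1}, {df2:.2f}) df, p = {p_value:.4f}")
```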
While a one-way ANOVA is appropriate for a between-subjects design (each experimental unit receives only one treatment), it is not appropriate for a within-subjects design. A within-subjects design, in which each experimental unit (subject) receives more than one treatment, can be analyzed with a repeated measures ANOVA. For example, if you wanted to see whether students’ exam scores differed across 3 tests, a single-factor repeated measures ANOVA would be an appropriate analysis.
Many variations exist for both within- and between-subjects designs. If an analyst needs to compare two between-subjects factors, a two-way ANOVA would be appropriate. If you have one between-subjects factor and one within-subjects factor, a repeated measures split-plot ANOVA would be the way to go. If you have two within-subjects factors, a doubly repeated measures ANOVA would be appropriate, and so on.
Additionally, if you have a continuous outside source of measurable variability, then an analysis of covariance (ANCOVA) can be performed to capture both the effects of a categorical treatment while also accounting for the continuous covariate.
As I update this site, I will provide the alternatives listed and much more.