A Kruskal-Wallis test is typically performed when an analyst would like to test for differences between three or more treatments or conditions. Unlike a one-way ANOVA, it does not require the response variable of interest to be normally distributed. For example, you may want to know if first-year students scored differently on an exam when compared to second-year or third-year students, but the exam scores for at least one group do not follow a normal distribution. The Kruskal-Wallis test is often considered a nonparametric alternative to a one-way ANOVA.
A Kruskal-Wallis test is typically performed when each experimental unit (study subject) is assigned only one of the available treatment conditions. Thus, the treatment groups do not have overlapping membership and are considered independent. A Kruskal-Wallis test is considered a “between-subjects” analysis.
Formally, the null hypothesis is that the population distribution functions are equal for all treatments. The alternative hypothesis is that at least one of the distribution functions differs from the others.
Informally, we are testing to see if mean ranks differ between treatments. Since mean ranks approximate the median when the distributions have similar shapes, analysts often state that we are testing for median differences, even though this is not formally correct. For this reason, descriptive statistics for the median are often provided when the Kruskal-Wallis test is performed.
H0: distribution1 = distribution2 = … = distributionk
Ha: at least one distribution differs from the others
Kruskal-Wallis Test Assumptions
The following assumptions must be met in order to run a Kruskal-Wallis test:
- Treatment groups are independent of one another. Each experimental unit receives only one treatment, so group membership does not overlap.
- The response variable of interest is ordinal or continuous.
- All samples are random.
Kruskal-Wallis Test Example in SAS
In this example, we will test to see if there is a statistically significant difference in the number of insects that survived when treated with one of three different insecticide treatments.
Dependent response variable:
bugs = number of bugs
Categorical independent variable with 3 levels:
spray = three different insecticide treatments labeled C, D, F
The data for this example is available here and represents a subset of a larger experiment:
Kruskal-Wallis Test SAS Code
PROC UNIVARIATE provides the ability to test for normality, while PROC NPAR1WAY performs classic nonparametric tests. The wilcoxon option requests an analysis of Wilcoxon (rank) scores, which yields the Kruskal-Wallis test when there are more than two groups. The dscf option produces the Dwass, Steel, Critchlow-Fligner multiple comparison (post-hoc) tests. PROC SGPLOT is used to produce boxplots of the number of bugs by treatment.
Here is the annotated code for the example. All assumption checks are provided along with the Kruskal-Wallis test:
*Import the data;
proc import datafile='C:\Dropbox\Website\Analysis\Kruskal-Wallis\Data\InsectSpraysKW.csv'
    out=work.insect dbms=csv replace;
run;

*Sort the data by spray;
proc sort data=insect;
    by spray;
run;

*Produce descriptive statistics;
proc means data=insect nmiss mean std stderr lclm uclm median min max qrange maxdec=2;
    class spray;
    var bugs;
run;

*Test for normality;
proc univariate data=insect normal cipctldf;
    by spray;
    var bugs;
    histogram bugs /normal;
    qqplot /normal (mu=est sigma=est);
run;

*Produce boxplots;
proc sgplot data=insect;
    title 'Boxplots for number of bugs by treatment';
    vbox bugs /category=spray;
run;

*Perform the Kruskal-Wallis Test;
proc npar1way data=insect wilcoxon dscf;
    class spray;
    var bugs;
run;
Kruskal-Wallis Test Annotated SAS Output
Many times, analysts forget to take a good look at their data prior to performing statistical tests. Descriptive statistics are not only used to describe the data but also help determine if any inconsistencies are present. Detailed investigation of descriptive statistics can help answer the following questions (in addition to many others):
- How much missing data do I have?
- Do I have potential outliers?
- Are my standard deviation and standard error values large relative to the mean?
- In what range do most of my data fall for each treatment?
- spray – The treatment levels corresponding to our independent variable ‘spray’.
- N obs – The number of observations for each treatment.
- N Miss – The number of missing observations for each treatment.
- Mean – The mean value for each treatment.
- Std Dev – The standard deviation of each treatment.
- Std Error – The standard error of each treatment. That is, the standard deviation / sqrt(n).
- Lower and Upper 95% CL for Mean – The lower and upper 95% confidence limits for the mean. That is to say, we can be 95% confident that the true mean falls between the lower and upper values specified for each treatment group, assuming a normal distribution.
- Median – The median value for each treatment.
- Minimum, Maximum – The minimum and maximum value for each treatment.
- Quartile Range – The interquartile range of each treatment. That is, the 75th percentile – 25th percentile.
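As a quick reference, the standard error and the confidence limits for the mean described above can be written out as follows. The symbols n, x-bar, and s (sample size, mean, and standard deviation of one treatment group) are introduced here for illustration only. The limits are based on the t distribution, which is why they should be read cautiously when a group is strongly skewed.

```latex
\mathrm{SE} = \frac{s}{\sqrt{n}},
\qquad
\bar{x} \pm t_{0.975,\,n-1} \cdot \mathrm{SE}
```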
PROC UNIVARIATE can create distribution-free 95% confidence intervals for many different percentiles. This can be helpful when describing data that does not follow a normal distribution. A subset of tables for each spray is presented below to provide confidence intervals on the median:
Median 95% confidence intervals for spray C:
Median 95% confidence intervals for spray D:
Median 95% confidence intervals for spray F:
- Level – Indicates the percentile for which the confidence interval is computed.
- Quantile – Designates the quantile corresponding to each percentile. Since 50% of the data falls above and below this point, the quantile value in this table also corresponds to the median for each group.
- 95% Confidence Limits Distribution Free – The 95% confidence interval for the median.
Side-by-side boxplots are provided by the SGPLOT procedure. The boxplots below seem to indicate one outlier for treatment groups C and D. Furthermore, both the mean (diamond) and median (middle line) values are at the 75th percentile. This indicates that the data is highly skewed by the effects of the outlier(s).
Prior to performing the Kruskal-Wallis test, it is important to evaluate model assumptions to ensure that we are performing an appropriate and reliable comparison. If normality is present, a one-way ANOVA would be a more powerful alternative.
Testing normality should be performed using a Shapiro-Wilk normality test (or equivalent), and/or QQ plots for large sample sizes. Histograms can often be helpful as well. However, this data set is so small that histograms did not add value. PROC UNIVARIATE is used to produce the Shapiro-Wilk normality test and corresponding QQ plots.
The Shapiro-Wilk normality test for insecticide treatment C:
The Shapiro-Wilk normality test for insecticide treatment D:
The Shapiro-Wilk normality test for insecticide treatment F:
- Test – Four different normality tests are presented.
- Statistic – The test statistic for each test is provided here.
- p Value – The p-value for each test is provided. A p-value < 0.05 would indicate that we should reject the assumption of normality. The Shapiro-Wilk Test p-values for treatments C and D are < 0.05, while the p-value for treatment F is 0.1.
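When there are many groups, it can be convenient to collect the normality tests into one data set instead of reading each table separately. The following is a minimal sketch, assuming the sorted work.insect data set from the example code. The output data set name normtests is hypothetical, and the column name Test used in the where statement is assumed from the ODS table layout; running PROC CONTENTS on the output data set will confirm the actual names.

```sas
*Capture the normality test tables in a data set (normtests is a hypothetical name);
ods output TestsForNormality=work.normtests;
proc univariate data=insect normal;
    by spray;
    var bugs;
run;

*Print only the Shapiro-Wilk row for each spray;
*The column name Test is an assumption, check with proc contents;
proc print data=work.normtests noobs;
    where Test = 'Shapiro-Wilk';
run;
```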
The vast majority of points should follow the theoretical normal reference line. However, for spray D, a deviation from normality can be observed which supports our Shapiro-Wilk normality test conclusion.
QQ plot for spray C:
QQ plot for spray D:
QQ plot for spray F:
The Shapiro-Wilk test p-values are < 0.05 for treatment groups C and D. In addition, the QQ plot for spray D is showing a deviation from the theoretical normal diagonal line. Since the bug amounts in 2 of our 3 treatment groups are not normally distributed, we conclude the Kruskal-Wallis test is more appropriate than the one-way ANOVA alternative.
So far, we have determined that the data is not normally distributed and that influential outliers are present. As a result, a Kruskal-Wallis test is more appropriate than a one-way ANOVA to test for significant differences between treatment groups. Our next step is to perform the Kruskal-Wallis test to determine whether the bug sprays differ in effectiveness. The NPAR1WAY procedure performs this test in SAS.
- Spray – The treatment levels corresponding to the independent variable ‘spray’.
- N – The number of observations for each treatment.
- Sum of Scores – The sum of the assigned ranks for each treatment.
- Expected Under H0 – The expected sum of the ranks for each treatment assuming an identical distribution for each treatment level.
- Std Dev Under H0 – The standard deviation of the sum of scores assuming an identical distribution for each treatment level.
- Mean Score – The mean rank for each treatment level.
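The Sum of Scores and Mean Score columns can be reproduced by ranking all observations together and then summarizing the ranks by treatment. The sketch below assumes the work.insect data set from the example code; the names ranked and bug_rank are hypothetical. Tied values receive the average of their ranks.

```sas
*Rank all observations together, averaging ranks for tied values;
proc rank data=insect out=work.ranked ties=mean;
    var bugs;
    ranks bug_rank;
run;

*Sum and mean of the ranks for each spray;
proc means data=work.ranked n sum mean maxdec=2;
    class spray;
    var bug_rank;
run;
```

Under the null hypothesis, the expected sum of ranks for a group of size n out of N total observations is n(N + 1)/2, which is the quantity shown in the Expected Under H0 column.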
Kruskal-Wallis Test Results
- Chi-Square – This value corresponds to the Kruskal-Wallis chi-square test statistic. The chi-square statistic is compared to the appropriate chi-square critical value as denoted by the DF row.
- DF – The degrees of freedom associated with the test. This will equal the number of treatment groups – 1.
- Pr > Chi-Square – The p-value corresponding to the two-sided test based on the chi-square distribution. The p-value for our test is <.0001. Given our alpha = 0.05, we would reject our null hypothesis and conclude that there is a statistically significant difference in the number of bugs that survived each treatment.
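For reference, the textbook form of the Kruskal-Wallis statistic is sketched below. The symbols are introduced here for illustration: k is the number of treatment groups, n_i and R_i are the sample size and sum of ranks for group i, N is the total number of observations, and t_j is the number of observations sharing the j-th tied value. The second expression is the usual adjustment for ties; the statistic is compared to a chi-square distribution with k − 1 degrees of freedom.

```latex
H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^{2}}{n_i} - 3(N+1),
\qquad
H_{\text{adj}} = \frac{H}{1 - \dfrac{\sum_{j} \left(t_j^{3} - t_j\right)}{N^{3} - N}}
```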
General Information About Post-hoc Analyses for the Kruskal-Wallis Test
From our example, a Kruskal-Wallis test p-value < 0.0001 indicates that there is a significant difference in the mean ranks of bugs that survived between at least two of our treatment groups. However, it does not indicate which groups are different without also performing post-hoc tests. SAS can perform the Dwass, Steel, Critchlow-Fligner multiple comparison post-hoc procedure to help determine which pairs of treatments differ.
- spray – This column identifies the paired comparison that is being performed.
- Wilcoxon Z – The z-score corresponding to the standardized two-sample Wilcoxon statistic.
- DSCF Value – This value represents the Dwass, Steel, Critchlow-Fligner statistic and equals |sqrt(2) * Wilcoxon Z|.
- Pr > DSCF – The Dwass, Steel, Critchlow-Fligner two-sided p-value for each paired comparison.
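To see what a single row of this table is built from, one pair of treatments can be compared directly with a two-sample Wilcoxon rank-sum test. The sketch below assumes the work.insect data set from the example code and restricts it to sprays C and D. The correct=no option turns off the continuity correction; that choice is an assumption made here so that the Z statistic is more directly comparable to the Wilcoxon Z column, and small differences may remain. The p-value from this run is not adjusted for multiple comparisons, so it should not be expected to match Pr > DSCF.

```sas
*Two-sample Wilcoxon rank-sum test for one pair of sprays (unadjusted);
proc npar1way data=insect wilcoxon correct=no;
    where spray in ('C', 'D');
    class spray;
    var bugs;
run;
```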
Kruskal-Wallis Test Interpretation and Conclusions
We have concluded that the number of bugs in treatment groups C and D is not normally distributed, and treatment group F is marginally non-normal. In addition, outliers exist for groups C and D. As a result, a Kruskal-Wallis test is more appropriate than a traditional one-way ANOVA to compare the effectiveness of three separate insecticide treatments.
The Kruskal-Wallis test results in a two-sided test p-value < 0.0001. This indicates that we should reject the null hypothesis that mean ranks are equal across treatments and conclude that there is a significant difference in insecticide effectiveness. Descriptive statistics indicate that the median value with 95% confidence intervals for spray C is 1.5 CI[1,3], spray D is 5.0 CI[3,5], and spray F is 15.0 CI[11,24]. That is to say, the difference in median values between treatments D and C is about 3.5 bugs (p=.0068), between treatments F and D is about 10 bugs (p=.0002), and between treatments F and C is about 13.5 bugs (p<.0001). Thus, if the objective is to find the most effective insecticide, we would choose spray C since this treatment was most effective at controlling the bug population by minimizing the number of bugs that survived.
What to do When Assumptions are Broken or Things go Wrong
The Kruskal-Wallis test is typically used as a last resort. This is because it is slightly less powerful than a one-way ANOVA when the ANOVA assumptions are met.
More modern alternatives to the Kruskal-Wallis test include permutation/randomization tests, bootstrap confidence intervals, and transforming the data to achieve normality. However, each option will have its own stipulations.
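As one illustration of the permutation approach, PROC NPAR1WAY can estimate an exact p-value for the Kruskal-Wallis test by Monte Carlo sampling. This is a minimal sketch, assuming the work.insect data set from the example code; the number of samples and the seed are arbitrary choices made here, not values from the original analysis.

```sas
*Monte Carlo estimate of the exact Kruskal-Wallis p-value;
*n= and seed= are arbitrary illustrative values;
proc npar1way data=insect wilcoxon;
    class spray;
    var bugs;
    exact wilcoxon / mc n=10000 seed=12345;
run;
```

Fixing the seed makes the estimate reproducible between runs; a larger n narrows the confidence interval reported around the estimated p-value at the cost of run time.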
A Kruskal-Wallis test is not appropriate if you have repeated measurements taken on the same experimental unit (subject). For example, if you have a pre-test, post-test, and follow-up study then each subject would be measured at three different time points. If this is the case, then a single factor repeated measures ANOVA or nonparametric Friedman test may be a more appropriate course of action.
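For the repeated measures case, one common way to obtain the Friedman test in SAS is through the Cochran-Mantel-Haenszel statistics in PROC FREQ. The sketch below assumes a hypothetical long-format data set work.long with one row per subject and time point, and hypothetical variables subject, time, and score; none of these come from the insecticide example.

```sas
*Friedman test via CMH statistics on within-subject ranks;
*work.long, subject, time, and score are hypothetical names;
proc freq data=work.long;
    tables subject*time*score / cmh2 scores=rank noprint;
run;
```

In the resulting Cochran-Mantel-Haenszel table, the "Row Mean Scores Differ" statistic corresponds to the Friedman chi-square.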
Additional Resources and References
SAS Version 9.4, SAS Institute Inc., Cary, NC.
Higgins, J.J. (2004). Introduction to Modern Nonparametric Statistics, Pacific Grove, CA: Brooks/Cole, Thomson Learning, Inc.
Conover, W.J. (1999). Practical Nonparametric Statistics. New York, NY: John Wiley & Sons, Inc.
Beall, G. (1942). The Transformation of Data from Entomological Field Experiments. Biometrika, 29, 243–262.