Kruskal-Wallis Test in R


A Kruskal-Wallis test is typically performed when an analyst would like to test for differences between three or more treatments or conditions. However, unlike a one-way ANOVA, the response variable of interest is not normally distributed. For example, you may want to know if first-years students scored differently on an exam when compared to second-year or third-year students, but the exam scores for at least one group are do not follow a normal distribution. The Kruskal-Wallis test is often considered a nonparametric alternative to a one-way ANOVA.

A Kruskal-Wallis test is typically performed when each experimental unit, (study subject) is only assigned one of the available treatment conditions. Thus, the treatment groups do not have overlapping membership and are considered independent. A Kruskal-Wallis test is considered a “between-subjects” analysis.

Formally, the null hypothesis is that the population distribution functions are equal for all treatments. The alternative hypothesis is that at least one of the distributions function is not equal. 

Informally, we are testing to see if mean ranks differ between treatments.  Since mean ranks approximate the median under similar distributions, many time analysts will indicate that we are testing for median differences even though this may not be considered formally correct. For this reason, many times descriptive statistics regarding median values are provided when the Kruskal-Wallis test is performed. 

H0: distribution1 = distribution2 = … = distributionk
Ha: distribution1 ≠ distribution2 ≠ … ≠ distributionk

Kruskal-Wallis Test Assumptions

The following assumptions must be met in order to run a Mann-Whitney U test:

  1. Treatment groups are independent of one another. Experimental units only receive one treatment and they do not overlap.
  2. The response variable of interest is ordinal or continuous.
  3. Both samples are random.

Kruskal-Wallis Test Example in R

In this example, we will test to see if there is a statistically significant difference in the number of insects that survived when treated with one of three different insecticide treatments.

Dependent response variable:
bugs = number of bugs

Categorical independent variable with 3 levels:
spray = three different insecticide treatments labeled C, D, F

The data for this example is available here and represents a subset of a larger experiment:

Kruskal-Wallis Test R Code

Each package used in the example can be installed with the install.packages commands as follows:

install.packages("ggplot2", dependencies = TRUE)
install.packages("qqplotr", dependencies = TRUE)
install.packages("dplyr", dependencies = TRUE)
install.packages("DescTools", dependencies = TRUE)
install.packages("FSA", dependencies = TRUE)
install.packages("PMCMRplus", dependencies = TRUE)

The R code below includes Shapiro-Wilk Normality Tests and QQ plots for each treatment group.  Data manipulation and summary statistics are performed using the dplyr package. Boxplots are created using the ggplot2 package. QQ plots are created with the qqplotr package. The shapiro.test and kruskal.test functions are included in the base stats package. Median confidence intervals are computed by the DescTools package. Post-hoc tests are provided by the FSA and PMCMRplus packages.

Here is the annotated code for the example.  All assumption checks are provided along with the Kruskal-Wallis test:


#Import the data

#Designate spray as a categorical factor

#Produce descriptive statistics by group
dat %>% select(spray, bugs) %>% group_by(spray) %>% 
            mean=mean(bugs, na.rm = TRUE), 
            sd=sd(bugs, na.rm = TRUE),
            LCL = mean - qt(1 - (0.05 / 2), n - 1) * stderr,
            UCL = mean + qt(1 - (0.05 / 2), n - 1) * stderr,
            median=median(bugs, na.rm = TRUE),
            min=min(bugs, na.rm = TRUE), 
            max=max(bugs, na.rm = TRUE),
            IQR=IQR(bugs, na.rm = TRUE),
            LCLmed = MedianCI(bugs, na.rm=TRUE)[2],
            UCLmed = MedianCI(bugs, na.rm=TRUE)[3])

#Produce Boxplots and visually check for outliers
ggplot(dat, aes(x = spray, y = bugs, fill = spray)) +
  stat_boxplot(geom ="errorbar", width = 0.5) +
  geom_boxplot(fill = "light blue") + 
  stat_summary(fun.y=mean, geom="point", shape=10, size=3.5, color="black") + 
  ggtitle("Boxplot of Treatments C and D") + 
  theme_bw() + theme(legend.position="none")

#Test each group for normality
dat %>%
  group_by(spray) %>%
  summarise(W = shapiro.test(bugs)$statistic,
            p.value = shapiro.test(bugs)$p.value)

#Perform QQ plots by group
ggplot(data = dat, mapping = aes(sample = bugs, color = spray, fill = spray)) +
  stat_qq_band(alpha=0.5, conf=0.95, bandType = "pointwise") +
  stat_qq_line() +
  stat_qq_point(col="black") +
  facet_wrap(~ spray, scales = "free") +
  labs(x = "Theoretical Quantiles", y = "Sample Quantiles") + theme_bw() 

#Perform the Kruskal-Wallis test
m1<-kruskal.test(bugs ~ spray, data=dat)

#Dunn's Kruskal-Wallis post-hoc test
posthocs1<-dunnTest(bugs ~ spray, data=dat, method="holm")

#Dwass, Steel, Critchlow, Fligner post-hoc test
posthocs2<-dscfAllPairsTest(bugs ~ spray, data=dat)

Kruskal-Wallis Test Annotated R Output

Descriptive Statistics

Many times, analysts forget to take a good look at their data prior to performing statistical tests. Descriptive statistics are not only used to describe the data but also help determine if any inconsistencies are present. Detailed investigation of descriptive statistics can help answer the following questions (in addition to many others):

  • How much missing data do I have?
  • Do I have potential outliers?
  • Are my standard deviation and standard error values large relative to the mean?
  • In what range most of my data fall for each treatment?
spraya    nb  meanc  sdd    stderre   LCLf   UCLf    mediang  minh maxh   IQRi    LCLmedj UCLmedj
C        12  2.08   1.98   0.570     0.828  3.34    1.5     0     7     2       1      3
D        12  4.92   2.50   0.723     3.33   6.51    5       2    12     1.25    3      5

F        12 16.7   6.21  1.79    12.7 20.6 15      9    26 10  11     24
  1. spray – The treatment levels corresponding to our independent variable ‘spray’.
  2. n – The number of observations for each treatment.
  3. mean – The mean value for each treatment.
  4. sd – The standard deviation of each treatment.
  5. stderr – The standard error of each treatment.  That is the standard deviation / sqrt (n).
  6. LCL, UCL – The upper and lower confidence intervals of the mean.  That is to say, you can be 95% certain that the true mean falls between the lower and upper values specified for each treatment group assuming a normal distribution. 
  7. median – The median value for each treatment.
  8. min, max – The minimum and maximum value for each treatment.
  9. IQR – The inner quartile range of each treatment. That is the 75th percentile –  25th percentile.
  10. LCLmed, UCLmed The 95% confidence interval for the median.


Side-by-side boxplots are provided by ggplot2.  The boxplots below seem to indicate one outlier for treatment group C and D. Furthermore, both the mean (circle with +) and median (middle line) values are at the 75th percentile.  This indicates that the data is highly skewed by the effects of the outlier(s).

Side-by-side boxplots for the number of bugs surviving each treatment

Normality Tests

Prior to performing the Kruskal-Wallis test, it is important to evaluate model assumptions to ensure that we are performing an appropriate and reliable comparison. If normality is present, a one-way ANOVA would be a more appropriate test.

Testing normality should be performed using a Shapiro-Wilk normality test (or equivalent), and/or a QQ plots for large sample sizes. Many times, histograms can also be helpful. However, this data set is so small that histograms did not add value.

In this example, we will use the shapiro.test function from the stats package to produce our Shapiro-Wilk normality test for each cylinder group, and the qqPlot function from the qqplotr package to produce QQ plots. These functions are wrapped with “tidyverse” dplyr syntax to easily produce separate analyses for each treatment group.

spraya  W Statb  p.valuec
C     0.859 0.0476
D     0.751 0.0027
F 0.885 0.1010
  1. spray – This column identifies the levels of the treatment variable along with the mean differences between the levels.
  2. W Stat – The Shapiro-Wilk (W) test statistics for each test is provided for each group.
  3. p-value – The p-value for each test is provided.  A p-value < 0.05 would indicate that we should reject the assumption of normality. The Shapiro-Wilk Test p-values for treatments C and D are < 0.05, while the p-value for treatment F is 0.1.

QQ Plots

The vast majority of points should follow the theoretical normal reference line and fall within the 95% confidence interval bands. However, for spray D, a deviation from normality can be observed which supports our Shapiro-Wilk normality test conclusion.

Normal Q-Q plots by group

The Shapiro-Wilk test p-values are < 0.05 for treatment groups C and D. In addition, the QQ plot for spray D is showing a deviation from the theoretical normal diagonal line. Since the bug amounts in 2 of our 3 treatment groups are not normally distributed, we conclude the Kruskal-Wallis test is more appropriate than the one-way ANOVA alternative.

Kruskal-Wallis Test

So far, we have determined that the data for each treatment group is not normally distributed, and we have major influential outliers. As a result, a Kruskal-Wallis test would be more appropriate than a one-way ANOVA to test for significant differences between treatment groups. Our next step is to officially perform a Kruskal-Wallis test to determine which bug spray is more effective. The kruskal.test function performs this test in R.

	Kruskal-Wallis rank sum test

data:  bugs by spray
Kruskal-Wallis chi-squareda = 26.866, dfb = 2, p-valuec = 1.466e-06
  1. chi-squared – This value corresponds to the Kruskal-Wallis chi-square test statistic. The chi-square statistic is compared to the appropriate chi-square critical value as denoted by the degrees of freedom.
  2. df – The degrees of freedom associated with the test.  This will equal the number of treatment groups – 1.
  3. p-value – The p-value corresponding to the two-sided test based on the chi-square distribution. The p-value for our test is 1.5e-6.  Given our alpha =0.05, we would reject our null hypothesis and conclude that there is a statistically significant difference in the number of bugs that survived each treatment.

General Information About Post-hoc Analyses for the Kruskal-Wallis Test

From our example, a Kruskal-Wallis test p-value = 1.5e-6 indicates that there is a significant difference in the mean ranks of bugs that survived between at least two of our treatments groups.  However, it does not provide an indication of which groups are different without also performing post-hoc tests.

In R, the FSA package can perform Dunn’s multiple comparison procedure to identify which pairs of groups significantly differ. Alternatively, the PMCMRplus package can perform the Dwass, Steel, Critchlow-Fligner multiple comparisons procedure if you would like to duplicate the results of the SAS software package. Although either post-hoc procedure is viable, Dunn’s is likely somewhat more popular.

Dunn’s multiple comparison procedure:

Dunn (1964) Kruskal-Wallis multiple comparison
  p-values adjusted with the Holm method.

  Comparisona      Zb      P.unadjc       P.adjd
1      C - D -2.080055 3.752050e-02 3.752050e-02
2      C - F -5.151538 2.583592e-07 7.750776e-07
3      D - F -3.071483 2.129984e-03 4.259968e-03
  1. Comparison – This column identifies the paired comparison that is being performed.
  2. Z – The z-score corresponding to the paired comparison.
  3. P.unadj – The unadjusted p-value corresponding to the paired comparison.
  4. P.adj – The adjusted p-value using Holm’s method corresponding to the paired comparison.

Dwass, Steel, Critchlow-Fligner multiple comparisons procedure:

Pairwise comparisons using Dwass-Steele-Critchlow-Fligner all-pairs test

data: bugs by spray

C D D 0.00680a - F 9e-05b 0.00018c P value adjustment method: single-step

Kruskal-Wallis Test Interpretation and Conclusions

We have concluded that the number of bugs in treatment groups C and D are not normally distributed, and treatment group F is marginally non-normal. In addition, outliers exist for groups C and D. As a result, a Kruskal-Wallis test is more appropriate than a traditional one-way ANOVA to compare the effectiveness of three separate insecticide treatments.

The Kruskal-Wallis test results in a two-sided test p-value = 1.5e-6. This indicates that we should reject the null hypothesis that mean ranks are equal across treatments and conclude that there is a significant difference in insecticide effectiveness. Descriptive statistics indicate that the median value with 95% confidence intervals for spray C is 1.5 CI[1,3], spray D is 5.0 CI[3,5], and spray F is 15.0 CI[11,24]. That is to say, the difference between the median values of each treatments D and C is about 3.5 bugs (p=3.8e-2), treatments F and D is about 10 bugs (p=7.8e-7), and treatments F and C is about 13.5 bugs (p=4.3e-3). Thus, if the objective is to find the most effective insecticide, we would choose spray C since this treatment was most effective at controlling the bug population by minimizing the number of bugs that survived.  

What to do When Assumptions are Broken or Things Go Wrong

The Kruskal-Wallis test is typically used as a last resort.  This is because it is a slightly lower powered test when compared to a one-way ANOVA.

More modern alternatives to the Kruskal-Wallis test include permutation/randomization tests, bootstrap confidence intervals, and transforming the data to achieve normality.  However, each option will have its own stipulations.

A Kruskal-Wallis test is not appropriate if you have repeated measurements taken on the same experimental unit (subject).  For example, if you have a pre-test, post-test, and follow-up study then each subject would be measured at three different time points.  If this is the case, then a single factor repeated measures ANOVA or nonparametric Friedman test may be a more appropriate course of action.

Additional Resources and References

Muenchen, R.A. (2011). R for SAS and SPSS Users, Second Edition. New York, NY: Springer, LLC.

Higgins, J.J. (2004). Introduction to Modern Nonparametric Statistics, Pacific Grove, CA: Brooks/Cole, Thomson Learning, Inc.

Conover W.J. (1999). Practical Nonparametric Statistics. New York, NY: John Wiley & Sons, Inc.

Beall, G. (1942). The Transformation of Data from Entomological Field Experiments. Biometrika, 29, 243–262.

Scroll to Top