Mann-Whitney U Test in SAS

Introduction

A Mann-Whitney U test is typically performed when an analyst would like to test for differences between two independent treatments or conditions, but the continuous response variable of interest is not normally distributed. For example, you may want to know if first-year students scored differently on an exam than second-year students, but the exam scores for at least one group do not follow a normal distribution. The Mann-Whitney U test is often considered a nonparametric alternative to an independent samples t-test. It is also known as the Mann-Whitney-Wilcoxon test, the Wilcoxon-Mann-Whitney test, and the Wilcoxon rank-sum test.

A Mann-Whitney U test is typically performed when each experimental unit (study subject) is assigned only one of the two available treatment conditions. Thus, the treatment groups do not have overlapping membership and are considered independent. A Mann-Whitney U test is considered a “between-subjects” analysis.

Formally, the null hypothesis is that the distribution functions of both populations are equal. The alternative hypothesis is that the distribution functions are not equal. 

Informally, we are testing to see if mean ranks differ between groups. Because a shift in mean ranks usually goes along with a shift in the median, analysts often say that we are testing for median differences, even though this is strictly correct only when the two distributions have the same shape. For this reason, descriptive statistics regarding median values are often provided when the Mann-Whitney U test is performed.

H0: distribution1 = distribution2

Ha: distribution1 ≠ distribution2

Mann-Whitney U Test Assumptions

The following assumptions must be met in order to run a Mann-Whitney U test:

  1. Treatment groups are independent of one another. Experimental units only receive one treatment and they do not overlap.
  2. The response variable of interest is ordinal or continuous.
  3. Both samples are random.

Mann-Whitney U Test Example in SAS

In this example, we will test to see if there is a statistically significant difference in the number of insects that survived when treated with one of two available insecticide treatments.

Dependent response variable:
bugs = number of bugs

Categorical independent variable:
spray = two different insecticide treatments (C or D)

The data for this example is available here and represents a subset of a larger experiment.

Mann-Whitney U Test SAS Code

PROC UNIVARIATE provides the ability to test for normality while PROC NPAR1WAY provides the ability to perform classic nonparametric tests.  The wilcoxon option will enable the Mann-Whitney U test. PROC SGPLOT is used to provide boxplots for the number of bugs by treatment.

Here is the annotated code for the example.  All assumption checks are provided along with the Mann-Whitney U test:

*Import the data;
proc import datafile='C:\Dropbox\Website\Analysis\Mann-Whitney U\Data\InsectSpraysMWU.csv'
     out=work.insect
     dbms=csv
	 replace;
run;

*Produce descriptive statistics (clm adds the 95% confidence limits for the mean);
proc means data=insect n nmiss mean std stderr clm median min max qrange maxdec=4;
class spray;
var bugs;
run;

*Test for normality and produce confidence intervals on the median;
proc univariate data=insect normal cipctldf;
class spray;
var bugs;
histogram bugs /normal;
qqplot /normal (mu=est sigma=est);
run;

*Produce boxplots;
proc sgplot data=insect;
title 'Boxplot number of bugs by treatment';
vbox bugs /category=spray;
run;

*Perform the Mann-Whitney U Test (hl requests the Hodges-Lehmann estimate);
proc npar1way data=insect wilcoxon hl;
class spray;
var bugs;
run;

Mann-Whitney U Test Annotated SAS Output

Descriptive Statistics

Many times, analysts forget to take a good look at their data prior to performing statistical tests. Descriptive statistics are not only used to describe the data but also help determine if any inconsistencies are present. Detailed investigation of descriptive statistics can help answer the following questions (in addition to many others):

  • How much missing data do I have?
  • Do I have potential outliers?
  • Are my standard deviation and standard error values large relative to the mean?
  • In what range do most of my data fall for each treatment?

Descriptive statistics on the number of bugs that survived each treatment
  1. spray – The treatment levels corresponding to our independent variable ‘spray’.
  2. N obs – The number of observations for each treatment.
  3. N Miss – The number of missing observations for each treatment.
  4. Mean – The mean value for each treatment.
  5. Std Dev – The standard deviation of each treatment.
  6. Std Error – The standard error of each treatment. That is, the standard deviation divided by sqrt(n).
  7. Lower and Upper 95% CL for Mean – The lower and upper 95% confidence limits for the mean. Assuming a normal distribution, intervals constructed this way contain the true mean for a treatment group in about 95% of repeated samples.
  8. Median – The median value for each treatment.
  9. Minimum, Maximum – The minimum and maximum value for each treatment.
  10. Quartile Range – The interquartile range of each treatment. That is, the 75th percentile minus the 25th percentile.

PROC UNIVARIATE can create distribution free 95% confidence intervals on many different percentiles.  This can be helpful when describing data that does not follow a normal distribution.  A subset of tables for each spray is presented below to provide confidence intervals on the median:

Median 95% confidence intervals for spray C:

95% Confidence interval on the median for spray C

Median 95% confidence intervals for spray D:

95% Confidence interval on the median for spray D
  1. Level – Indicates the percentile for which the confidence interval is computed.
  2. Quantile – Designates the quantile corresponding to each percentile. Since 50% of the data falls above and below this point, the quantile value in this table also corresponds to the median for each group.
  3. 95% Confidence Limits Distribution Free – The 95% confidence interval for the median.

Boxplots

Side-by-side boxplots are provided by the SGPLOT procedure.  The boxplots below seem to indicate one outlier in each treatment group. Furthermore, both the mean (diamond) and median (middle line) values are at the 75th percentile.  This indicates that the data is highly skewed by the effects of the outlier(s).

Side-by-side boxplots of the number of bugs that survived under each treatment

Normality Tests

Prior to performing the Mann-Whitney U, it is important to evaluate our assumptions to ensure that we are performing an appropriate and reliable comparison. If normality is present, an independent samples t-test would be a more appropriate test.
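For comparison, here is a minimal sketch of the independent samples t-test that would be run if normality were plausible. It assumes the same insect data set and variable names imported in the code section; it is not part of the original analysis.

```sas
/* Independent samples t-test: only appropriate if bugs is roughly normal in each group */
proc ttest data=insect;
   class spray;   /* two-level grouping variable */
   var bugs;      /* continuous response */
run;
```

PROC TTEST reports both the pooled and the Satterthwaite (unequal variance) results, so the equal-variance assumption can be checked in the same output.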

Normality should be assessed using a Shapiro-Wilk normality test (or equivalent) and/or QQ plots, which are especially useful for large sample sizes. Histograms can often be helpful as well. However, this data set is so small that histograms did not add value. PROC UNIVARIATE is used to produce the Shapiro-Wilk normality test and corresponding QQ plots.

The Shapiro-Wilk normality test for insecticide treatment C:

Normality tests on the number of bugs that survived under the effects of treatment C

The Shapiro-Wilk normality test for insecticide treatment D:

Normality tests on the number of bugs that survived under the effects of treatment D
  1. Test – Four different normality tests are presented.
  2. Statistic – The test statistic for each test is provided here.
  3. p Value – The p-value for each test is provided.  A p-value < 0.05 would indicate that we should reject the assumption of normality. The Shapiro-Wilk test p-values for treatments C and D are both < 0.05, so we conclude that the data for neither treatment are normally distributed.
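When only the Shapiro-Wilk rows are needed, the normality table can be captured with ODS and filtered. The sketch below assumes the insect data set from the code section; the output data set name work.normtests is an arbitrary choice for illustration.

```sas
/* Capture the "Tests for Normality" table from PROC UNIVARIATE */
ods output TestsForNormality=work.normtests;
proc univariate data=insect normal;
   class spray;
   var bugs;
run;

/* Keep only the Shapiro-Wilk row for each spray */
proc print data=work.normtests noobs;
   where Test = 'Shapiro-Wilk';
run;
```

This is convenient when the same check has to be repeated across many response variables or groups.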

QQ Plots

The vast majority of points should follow the theoretical normal reference line. However, for spray D, a deviation from normality can be observed which supports our Shapiro-Wilk normality test conclusion.

QQ plot for spray C:

QQ plot for the number of bugs that survived under treatment C

QQ plot for spray D:

QQ plot for the number of bugs that survived under treatment D

Since the Shapiro-Wilk test p-values are < 0.05 for both treatment groups, and the QQ plot for spray D shows a deviation from the theoretical normal diagonal line, we conclude the data is not normally distributed.

Mann-Whitney U Test

So far, we have determined that the data for each treatment group is not normally distributed, and we have major influential outliers. As a result, a Mann-Whitney U test would be more appropriate than an independent samples t-test to test for significant differences between treatment groups. Our next step is to officially perform a Mann-Whitney U test to determine which bug spray is more effective. The NPAR1WAY procedure performs this test in SAS.

Mann-Whitney U test computation in SAS
  1. Spray – The treatment levels corresponding to the independent variable ‘spray’.
  2. N – The number of observations for each treatment.
  3. Sum of Scores – The sum of the assigned ranks for each treatment.
  4. Expected Under H0 – The expected sum of the ranks for each treatment assuming an identical distribution for each treatment level.
  5. Std Dev Under H0 – The standard deviation of the sum of the ranks assuming an identical distribution for each treatment level.
  6. Mean Score – The mean rank for each treatment level.
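The quantities in this table follow from standard rank-sum results. As a sketch, write n_1 and n_2 for the two group sizes, N = n_1 + n_2, and S_i for the sum of ranks in group i (this notation is introduced here for illustration and does not appear in the SAS output). The tie-corrected form is the usual textbook expression, where t_j is the number of observations sharing the j-th tied value:

```latex
E_0(S_i) = \frac{n_i (N + 1)}{2}, \qquad
\text{Mean Score}_i = \frac{S_i}{n_i}

\operatorname{Var}_0(S_i) = \frac{n_1 n_2 (N + 1)}{12}
  - \frac{n_1 n_2 \sum_j \left(t_j^3 - t_j\right)}{12\, N (N - 1)}, \qquad
\operatorname{SD}_0(S_i) = \sqrt{\operatorname{Var}_0(S_i)}
```

With no ties the second term vanishes. Note that the standard deviation is the same for both groups, which is why the Std Dev Under H0 column shows one value repeated.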

Mann-Whitney U Test Results

PROC NPAR1WAY p-values and results
  1. Statistic – The Wilcoxon rank-sum statistic, that is, the sum of the ranks for one of the two groups (SAS uses the smaller sample). It equals the Mann-Whitney U statistic plus a constant, so both lead to the same p-value. The statistic is tested to see if it differs from the rank sum expected under H0.
  2. Z – The z score: the difference between the rank-sum statistic and its expected value under H0, divided by its standard deviation under H0. By default, SAS applies a continuity correction of 0.5 to the difference.
  3. Normal Approximation One-Sided Pr < Z – The p-value corresponding to the one-sided test based on the standard normal distribution.
  4. Normal Approximation Two-Sided Pr > |Z| – The p-value corresponding to the two-sided test based on the standard normal distribution.  This is the classic Mann-Whitney U test p-value result and will be reported most often.
  5. t Approximation One-Sided Pr < Z – The p-value corresponding to the one-sided test based on Student’s t distribution.
  6. t Approximation Two-Sided Pr > |Z| – The p-value corresponding to the two-sided test based on Student’s t distribution. This value may be more appropriate for small sample sizes.
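As a sketch of how these rows relate, let S be the rank-sum statistic reported by PROC NPAR1WAY, n_S the size of the group whose ranks are summed, and E_0(S) and SD_0(S) the expected value and standard deviation under H0 from the previous table (the symbols are introduced here for illustration). The continuity correction c shown is the SAS default as I understand it; it can be turned off with the CORRECT=NO option.

```latex
z = \frac{S - E_0(S) + c}{\operatorname{SD}_0(S)}, \qquad
c = \begin{cases}
  -0.5 & \text{if } S > E_0(S) \\
  +0.5 & \text{if } S < E_0(S)
\end{cases}

U = S - \frac{n_S (n_S + 1)}{2}
```

Because U differs from S only by a constant that depends on the sample size, a test based on U and a test based on S give identical p-values.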

Hodges-Lehmann Estimate of the differences between sprays C and D:

Hodges-Lehmann estimation
  1. Type – The type of interval computed.
  2. 95% Confidence Limits – The 95% confidence interval on the difference between the number of bugs that survived under the effects of spray C vs spray D.
  3. Interval Midpoint – The confidence interval midpoint for the difference between sprays C and D.
  4. Asymptotic Standard Error – The asymptotic standard error for the confidence interval estimate.

Mann-Whitney U Test Interpretation and Conclusions

We have concluded that the number of bugs in each treatment group is not normally distributed. In addition, outliers exist in each group. As a result, a Mann-Whitney U test is more appropriate than a traditional independent samples t-test to compare the effectiveness of two separate insecticide treatments.

The Mann-Whitney U test results in a two-sided p-value of 0.0027. This indicates that we should reject the null hypothesis that the distributions are equal and conclude that there is a significant difference in insecticide effectiveness. Descriptive statistics indicate that the median value for spray C is 1.5 and for spray D is 5.0, a difference of about 3.5 bugs. The Hodges-Lehmann estimate, which is the median of all pairwise differences between the two groups rather than the difference of the group medians, indicates that about 3 more bugs typically survive when spray D is used instead of spray C. The 95% confidence interval for this shift between spray D and spray C runs from 1 to 4 bugs. Thus, spray C is more effective than spray D at controlling the bug population.

What to do When Assumptions are Broken or Things go Wrong

The Mann-Whitney U test is typically used as a last resort.  This is because it has lower power than the independent samples t-test when the assumptions of the t-test hold.

More modern alternatives include permutation/randomization tests, bootstrap confidence intervals, and transforming the data, but each option has its own stipulations.
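One permutation-style option is available directly in PROC NPAR1WAY through the EXACT statement, which computes the p-value from the permutation distribution of the rank sums instead of a normal or t approximation. The sketch below assumes the insect data set from the code section; the Monte Carlo sample count and seed are arbitrary illustrative values.

```sas
/* Exact (permutation) p-values for the Wilcoxon rank-sum statistic */
proc npar1way data=insect wilcoxon;
   class spray;
   var bugs;
   exact wilcoxon;   /* full enumeration; practical for small samples */
run;

/* Monte Carlo estimate of the exact p-value when enumeration is too slow */
proc npar1way data=insect wilcoxon;
   class spray;
   var bugs;
   exact wilcoxon / mc n=10000 seed=20240101;   /* n and seed chosen arbitrarily */
run;
```

Exact computation can become slow as sample sizes grow, which is the main reason to fall back on the Monte Carlo version.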

If you need to compare more than two independent groups, a one-way Analysis of Variance (ANOVA) or Kruskal-Wallis test may be appropriate.
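As a rough sketch, both options use the same class/response layout as the two-group case. The data set work.insect_all is hypothetical and stands for a version of the data with three or more spray levels.

```sas
/* Kruskal-Wallis test: the wilcoxon option with more than two class levels */
proc npar1way data=work.insect_all wilcoxon;   /* work.insect_all is hypothetical */
   class spray;
   var bugs;
run;

/* One-way ANOVA for the same layout, if its assumptions are reasonable */
proc glm data=work.insect_all;
   class spray;
   model bugs = spray;
run;
quit;
```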

A Mann-Whitney U test is not appropriate if you have repeated measurements taken on the same experimental unit (subject).  For example, if you have a pre-test post-test study, then each subject would be measured at two different time intervals.  If this is the case, then a paired t-test or corresponding nonparametric Wilcoxon signed-rank test may be a more appropriate course of action.
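For the paired case, one common approach in SAS is to compute the within-subject difference and pass it to PROC UNIVARIATE, whose Tests for Location table includes both the paired t-test (Student's t) and the Wilcoxon signed-rank test. The data set work.prepost and the variables pre and post are hypothetical names used only for this sketch.

```sas
/* Compute the within-subject difference (one row per subject) */
data work.diffs;
   set work.prepost;      /* hypothetical data set with variables pre and post */
   diff = post - pre;
run;

/* Tests for Location: Student's t, sign test, and signed-rank test of diff = 0 */
proc univariate data=work.diffs;
   var diff;
run;
```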

Additional Resources and References

SAS Version 9.4, SAS Institute Inc., Cary, NC.

Higgins, J.J. (2004). Introduction to Modern Nonparametric Statistics, Pacific Grove, CA: Brooks/Cole, Thomson Learning, Inc.

Conover, W.J. (1999). Practical Nonparametric Statistics. New York, NY: John Wiley & Sons, Inc.

Beall, G. (1942). The Transformation of Data from Entomological Field Experiments. Biometrika, 29, 243–262.