Descriptive Statistics By Group in R

Want to know the easiest way to perform descriptive statistics by group in R? 

The ‘rstatix’ package has you covered.

The code below uses the employee dataset from the ‘stima’ package to compute descriptive statistics for salary by gender and minority group, using the get_summary_stats function from the ‘rstatix’ package.

Basic summary statistics for salary by minority and gender are displayed below.

By Group Descriptive Statistics Using the ‘rstatix’ Package

library("stima")
library("rstatix")
library("dplyr")
library("DescTools")

data("employee")

out <- employee %>% group_by(gender, minority) %>% get_summary_stats(salary, type="common")

df <- data.frame(out)
df

By Group Descriptive Statistics ‘rstatix’ Output

  minority gender variable   n   min    max median     iqr     mean        sd       se       ci
1      min      f   salary  40 16350  35100  23775  5025.0 23062.50  3972.369  628.087 1270.425
2   no_min      f   salary 176 15750  58125  24450  7500.0 26706.79  8011.894  603.919 1191.902
3      min      m   salary  64 19650 100000  29025  5062.5 32246.09 13059.881 1632.485 3262.261
4   no_min      m   salary 193 21300 135000  36000 26650.0 44524.77 20371.882 1466.400 2892.322

Note: The confidence interval (ci) column reported above can be misleading. It is not an interval but the half-width: the distance from the mean to either confidence limit. For the first row (female, minority), the 95% confidence limits would be approximately:

  • LCL = 23062.50 – 1270.425 = 21792.075
  • UCL = 23062.50 + 1270.425 = 24332.925
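
The half-width arithmetic above can be reproduced in base R for all four groups. This is a minimal sketch: the n, mean, se, and ci values are typed in from the output table, and the data frame name stats and the column ci_check are only illustrative. The last step recomputes the half-width as qt(0.975, n - 1) * se, which should agree closely with the reported ci column if ci is a t-based margin of error.

```r
# Values copied from the rstatix output above (one row per group)
stats <- data.frame(
  gender   = c("f", "f", "m", "m"),
  minority = c("min", "no_min", "min", "no_min"),
  n        = c(40, 176, 64, 193),
  mean     = c(23062.50, 26706.79, 32246.09, 44524.77),
  se       = c(628.087, 603.919, 1632.485, 1466.400),
  ci       = c(1270.425, 1191.902, 3262.261, 2892.322)
)

# ci is a half-width, so the limits are mean minus ci and mean plus ci
stats$LCL <- stats$mean - stats$ci
stats$UCL <- stats$mean + stats$ci

# Recompute the half-width from the t distribution for comparison
stats$ci_check <- qt(0.975, stats$n - 1) * stats$se

stats[, c("gender", "minority", "LCL", "UCL", "ci", "ci_check")]
```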

Descriptive Statistics By Group Using the ‘dplyr’ Package

Another way to compute descriptive statistics by group is with the ‘dplyr’ package. This approach takes more code, but it is also more flexible: you can compute any statistic you like, including ones that other packages do not typically provide.

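For example, 95% confidence limits for the median are computed below with the MedianCI function from the ‘DescTools’ package. As called here, MedianCI uses its default exact method; a bootstrapped interval requires method = "boot". LCLmed is the lower confidence limit and UCLmed is the upper confidence limit.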

By Group Descriptive Statistics ‘dplyr’ Code

out <- employee %>% select(salary, gender, minority) %>% group_by(gender, minority) %>% 
  summarise(n = n(), 
            mean = mean(salary, na.rm = TRUE), 
            sd = sd(salary, na.rm = TRUE),
            stderr = sd/sqrt(n),
            LCL = mean - qt(1 - (0.05 / 2), n - 1) * stderr,
            UCL = mean + qt(1 - (0.05 / 2), n - 1) * stderr,
            median = median(salary, na.rm = TRUE),
            min = min(salary, na.rm = TRUE), 
            max = max(salary, na.rm = TRUE),
            IQR = IQR(salary, na.rm = TRUE),
            LCLmed = MedianCI(salary, na.rm=TRUE)[2],
            UCLmed = MedianCI(salary, na.rm=TRUE)[3])

df <- data.frame(out)
df

By Group Descriptive Statistics ‘dplyr’ Output

  gender minority   n     mean        sd    stderr      LCL      UCL median   min    max     IQR LCLmed UCLmed
1      f      min  40 23062.50  3972.369  628.0866 21792.07 24332.93  23775 16350  35100  5025.0  21150  24750
2      f   no_min 176 26706.79  8011.894  603.9192 25514.89 27898.69  24450 15750  58125  7500.0  23550  25500
3      m      min  64 32246.09 13059.881 1632.4852 28983.83 35508.36  29025 19650 100000  5062.5  27750  30750
4      m   no_min 193 44524.77 20371.882 1466.4001 41632.44 47417.09  36000 21300 135000 26650.0  33750  40200

Note: LCL and UCL are the lower and upper limits of the 95% confidence interval of the mean. LCLmed and UCLmed are the lower and upper limits of the 95% confidence interval of the median, as calculated by the ‘DescTools’ package.
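
For comparison, a percentile bootstrap interval for a median can be sketched in base R without ‘DescTools’. The salary values below are made up for illustration and are not taken from the employee dataset; the number of resamples (2000) and the seed are arbitrary choices, and the percentile method is only one of several bootstrap interval types.

```r
set.seed(123)  # arbitrary seed so the resampling is reproducible

# Hypothetical salaries for one group (not from the employee dataset)
salary <- c(16350, 19800, 21150, 22500, 23100, 23775,
            24000, 24750, 25200, 27300, 30750, 35100)

# Resample with replacement and record the median of each resample
boot_medians <- replicate(2000, median(sample(salary, replace = TRUE)))

# Percentile method: the middle 95% of the bootstrap medians
med_ci <- quantile(boot_medians, probs = c(0.025, 0.975))

median(salary)
med_ci
```

To get one interval per group, the same steps could be wrapped in a small function and called inside summarise().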


Three-way Frequency Tables for Categorical Data

The R ‘stats’ package contains the ftable() function, which provides a flexible way to create multi-way contingency (frequency) tables of counts. Below we create a three-way frequency table using the gender, minority, and job category variables from the employee dataset.

Three-way Frequencies ‘dplyr’ Code

employee %>% select(minority, gender, jobcat) %>% ftable()

Three-way Frequencies ‘dplyr’ Output

                jobcat Clerical Custodial manager
minority gender                                  
min      f                   40         0       0
         m                   47        13       4
no_min   f                  166         0      10
         m                  109        14      70
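
The layout of a flattened table can be controlled with the row.vars and col.vars arguments of ftable(). The sketch below uses a small made-up data frame (the name toy and its values are hypothetical, not the employee data) so that it runs without the ‘stima’ package. It also shows one way to turn the counts into a long data frame.

```r
# Hypothetical data with the same three variables as above
toy <- data.frame(
  minority = c("min", "min", "no_min", "no_min", "no_min", "min"),
  gender   = c("f", "m", "f", "m", "m", "m"),
  jobcat   = c("Clerical", "Custodial", "Clerical",
               "manager", "Clerical", "Clerical")
)

# Build a three-way contingency table of counts
tab <- xtabs(~ minority + gender + jobcat, data = toy)

# Flatten it: minority and gender down the rows, jobcat across the columns
flat <- ftable(tab, row.vars = c("minority", "gender"), col.vars = "jobcat")
flat

# Long format (one row per cell) is often easier to filter or plot
counts <- as.data.frame(tab)
counts
```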