# Descriptive Statistics By Group in R

Want to know the easiest way to perform descriptive statistics by group in R?

The ‘rstatix’ package has you covered.

The code below uses the `employee` dataset from the ‘stima’ package to compute descriptive statistics for salary by gender and minority group with the `get_summary_stats()` function from the ‘rstatix’ package.

Basic summary statistics for salary by minority and gender are displayed below.

## By Group Descriptive Statistics Using the ‘rstatix’ Package

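```r
library("stima")     # provides the employee dataset
library("rstatix")   # get_summary_stats()
library("dplyr")     # %>% and group_by()
library("DescTools") # MedianCI(), used in the 'dplyr' example further down

data("employee")

# type = "common" returns n, min, max, median, iqr, mean, sd, se and ci per group
out <- employee %>%
  group_by(gender, minority) %>%
  get_summary_stats(salary, type = "common")

# Convert the tibble to a data frame so that every column is printed
df <- data.frame(out)
df
```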

## By Group Descriptive Statistics ‘rstatix’ Output

```
  minority gender variable   n   min    max median     iqr     mean        sd       se       ci
1      min      f   salary  40 16350  35100  23775  5025.0 23062.50  3972.369  628.087 1270.425
2   no_min      f   salary 176 15750  58125  24450  7500.0 26706.79  8011.894  603.919 1191.902
3      min      m   salary  64 19650 100000  29025  5062.5 32246.09 13059.881 1632.485 3262.261
4   no_min      m   salary 193 21300 135000  36000 26650.0 44524.77 20371.882 1466.400 2892.322
```

Note: The confidence interval (ci) column reported above can be misleading. It is not an interval but its half-width (the margin of error): the distance from the mean to either confidence limit. For the first row (female, minority), the 95% confidence interval is therefore approximately:

- LCL = 23062.50 - 1270.425 = 21792.075
- UCL = 23062.50 + 1270.425 = 24332.925
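The half-width comes from the t distribution: ci = qt(0.975, n - 1) × sd / sqrt(n). As a quick check, the base R sketch below recomputes se, ci, and the 95% limits for all four groups from the n, mean, and sd values printed above. Those values were copied by hand from the rounded output, so the last decimal can differ slightly from the package's own numbers.

```r
# Summary values copied from the rstatix output above
# (rows in order: min/f, no_min/f, min/m, no_min/m)
grp_n    <- c(40, 176, 64, 193)
grp_mean <- c(23062.50, 26706.79, 32246.09, 44524.77)
grp_sd   <- c(3972.369, 8011.894, 13059.881, 20371.882)

conf_level <- 0.95

# Standard error of the mean
se <- grp_sd / sqrt(grp_n)

# Half-width: critical t value with n - 1 degrees of freedom times the standard error
half_width <- qt(1 - (1 - conf_level) / 2, df = grp_n - 1) * se

# The confidence limits are the mean minus and plus the half-width
limits <- data.frame(
  group = c("min/f", "no_min/f", "min/m", "no_min/m"),
  mean  = grp_mean,
  ci    = round(half_width, 3),
  LCL   = round(grp_mean - half_width, 2),
  UCL   = round(grp_mean + half_width, 2)
)
print(limits)
```

The first row reproduces the worked example above, and the LCL and UCL columns match the t-based limits computed with ‘dplyr’ later in this post.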

## Descriptive Statistics By Group Using the ‘dplyr’ Package

Another way to compute descriptive statistics by group is with the R ‘dplyr’ package. This method is more flexible, but also a little more complicated: because you write each summary yourself, ‘dplyr’ lets you compute statistics beyond those that packages such as ‘rstatix’ typically provide.
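That flexibility comes from the split-apply-combine pattern behind `group_by()` and `summarise()`: split the rows by group, apply any summary function to each piece, then combine the results. The base R sketch below shows the same idea without extra packages. The toy data frame and the coefficient-of-variation statistic (`cv`) are made-up illustrations, not part of the employee data.

```r
# Hypothetical toy data, made up for illustration (two rows per group)
toy <- data.frame(
  gender   = c("f", "f", "f", "f", "m", "m", "m", "m"),
  minority = c("min", "min", "no_min", "no_min", "min", "min", "no_min", "no_min"),
  salary   = c(20000, 24000, 26000, 28000, 30000, 34000, 40000, 50000)
)

# Split: one salary vector per gender x minority combination
groups <- split(toy$salary, list(toy$gender, toy$minority))

# Apply: any function of a numeric vector can serve as a summary statistic,
# including one a package may not offer, such as the coefficient of variation
summarise_group <- function(x) {
  c(n = length(x), mean = mean(x), median = median(x), cv = sd(x) / mean(x))
}

# Combine: one row per group, one column per statistic
stats <- t(sapply(groups, summarise_group))
print(stats)
```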

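As an example of this flexibility, the code below also computes 95% confidence intervals for the median with the `MedianCI()` function from the ‘DescTools’ package (its default method is exact rather than bootstrapped; `method = "boot"` requests a bootstrap interval). LCLmed is the lower confidence limit and UCLmed is the upper confidence limit.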

## By Group Descriptive Statistics ‘dplyr’ Code

```r
library("dplyr")
library("DescTools")

out <- employee %>%
  select(salary, gender, minority) %>%
  group_by(gender, minority) %>%
  summarise(
    n = n(),
    mean = mean(salary, na.rm = TRUE),
    sd = sd(salary, na.rm = TRUE),
    stderr = sd / sqrt(n),
    # t-based 95% confidence limits for the mean
    LCL = mean - qt(1 - (0.05 / 2), n - 1) * stderr,
    UCL = mean + qt(1 - (0.05 / 2), n - 1) * stderr,
    median = median(salary, na.rm = TRUE),
    min = min(salary, na.rm = TRUE),
    max = max(salary, na.rm = TRUE),
    IQR = IQR(salary, na.rm = TRUE),
    # MedianCI() returns the median followed by the lower and upper limits
    LCLmed = MedianCI(salary, na.rm = TRUE)[2],
    UCLmed = MedianCI(salary, na.rm = TRUE)[3],
    .groups = "drop"
  )

df <- data.frame(out)
df
```

## By Group Descriptive Statistics ‘dplyr’ Output

```
  gender minority   n     mean        sd    stderr      LCL      UCL median   min    max     IQR LCLmed UCLmed
1      f      min  40 23062.50  3972.369  628.0866 21792.07 24332.93  23775 16350  35100  5025.0  21150  24750
2      f   no_min 176 26706.79  8011.894  603.9192 25514.89 27898.69  24450 15750  58125  7500.0  23550  25500
3      m      min  64 32246.09 13059.881 1632.4852 28983.83 35508.36  29025 19650 100000  5062.5  27750  30750
4      m   no_min 193 44524.77 20371.882 1466.4001 41632.44 47417.09  36000 21300 135000 26650.0  33750  40200
```

Note: LCL and UCL are the lower and upper limits of the 95% confidence interval for the mean. LCLmed and UCLmed are the limits of the 95% confidence interval for the median, as calculated by the ‘DescTools’ package.
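For comparison, a bootstrap percentile interval for a median can be sketched in base R: resample the values with replacement, take the median of each resample, and use the 2.5% and 97.5% quantiles of those medians as the limits. The salary values, the number of resamples (`B`), and the seed below are illustrative assumptions, so the results will not match the ‘DescTools’ output above.

```r
# Hypothetical salary values, made up for illustration
salary <- c(16350, 19800, 21000, 22500, 23100, 23775, 24300, 25200, 27600, 30000, 35100)

set.seed(123)  # make the resampling reproducible
B <- 2000      # number of bootstrap resamples

# Median of each resample drawn with replacement from the original values
boot_medians <- replicate(B, median(sample(salary, replace = TRUE)))

# Percentile method: keep the middle 95% of the bootstrap medians
ci_median <- quantile(boot_medians, probs = c(0.025, 0.975), names = FALSE)

c(median = median(salary), LCLmed = ci_median[1], UCLmed = ci_median[2])
```

The ‘DescTools’ help page for `MedianCI()` documents a bootstrap option as well; check it for your installed version before relying on it.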

## Three-way Frequency Tables for Categorical Data

The R ‘stats’ package contains the `ftable()` function, which offers a flexible way to create multi-way contingency (frequency) tables of counts. Below we create a three-way frequency table from the gender, minority, and job category (jobcat) variables in the employee dataset.

## Three-way Frequencies ‘dplyr’ Code

```r
# ftable() puts the last selected variable (jobcat) in the columns
employee %>% select(minority, gender, jobcat) %>% ftable()
```

## Three-way Frequencies ‘dplyr’ Output

```
                jobcat Clerical Custodial manager
minority gender
min      f                   40         0       0
         m                   47        13       4
no_min   f                  166         0      10
         m                  109        14      70
```
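The layout of an `ftable()` can be changed with its `row.vars` and `col.vars` arguments. The base R sketch below builds a three-way table with `xtabs()` on a small made-up data frame (hypothetical values, not the employee data) and prints it twice: once with the default layout, where the last variable forms the columns, and once with job category in the rows.

```r
# Hypothetical toy data with the same three categorical variables
toy <- data.frame(
  minority = c("min", "min", "no_min", "no_min", "no_min", "no_min"),
  gender   = c("f", "m", "f", "m", "m", "f"),
  jobcat   = c("Clerical", "Custodial", "Clerical", "Manager", "Clerical", "Clerical")
)

# Three-way contingency table of counts
tab <- xtabs(~ minority + gender + jobcat, data = toy)

# Default layout: minority and gender in the rows, jobcat (the last variable) in the columns
ft_default <- ftable(tab)
print(ft_default)

# Alternative layout: jobcat in the rows, minority x gender combinations in the columns
ft_swapped <- ftable(tab, row.vars = "jobcat", col.vars = c("minority", "gender"))
print(ft_swapped)
```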