The function summaryTable() produces a table with descriptive statistics for continuous, categorical and dichotomous variables. It is based on the function gtsummary::tbl_summary(), with several enhancements and simplifications.
To demonstrate the various functionalities of the function, we will use the dataset survival::colon.
library(survival)
library(dplyr)  # provides %>%, group_by(), slice() and ungroup()
data(cancer, package = "survival")
# Keep one row per patient: the first row within each id group
colon1 <- colon %>%
  group_by(id) %>%
  slice(1) %>%
  ungroup()
The dataset colon contains 1858 records for 929 patients (two records per patient, one for recurrence and one for death) from one of the first successful trials of adjuvant chemotherapy for colon cancer.
For simplicity, we focus here on recurrence only, two treatment groups, and four variables: rx, Male, age and extent. We also add a few missing values for the variable extent.
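The code that builds this reduced dataset is not shown here. The following is only a sketch of one way to obtain a comparable dataset. It assumes that the object is called colon2 (the name used in the calls below), that "Control" is the relabelled Obs arm, that Male is a copy of sex, that extent is stored as a factor, and that 62 values are set to missing at random. The seed is arbitrary, so the rows set to missing will differ from those in the tables shown.

```r
library(survival)
library(dplyr)

data(cancer, package = "survival")

set.seed(1)  # arbitrary seed (assumption)

colon2 <- colon %>%
  # recurrence records only, and drop the levamisole-only arm
  filter(etype == 1, rx != "Lev") %>%
  mutate(
    # assumption: "Control" is the relabelled "Obs" arm
    rx = factor(rx, levels = c("Obs", "Lev+5FU"),
                labels = c("Control", "Lev+5FU")),
    # assumption: Male is a copy of the 0/1 variable sex
    Male = sex,
    # set 62 randomly chosen values to NA, then store as a factor
    extent = factor(replace(extent, sample(n(), 62), NA))
  ) %>%
  select(rx, Male, age, extent)

table(colon2$rx)
```

Keeping Male as a numeric 0/1 variable and extent as a factor is what makes them dichotomous and categorical, respectively, under the default type rules of summaryTable().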
By default, the function produces a table with all variables present in the dataset.
Characteristic | N | N = 619¹ |
---|---|---|
rx | 619 | |
Control | | 315 (51%) |
Lev+5FU | | 304 (49%) |
Male | 619 | |
0 | | 312 (50%) |
1 | | 307 (50%) |
age | 619 | 61.0 (18.0, 85.0) |
extent | 557 | |
1 | | 17 (3%) |
2 | | 68 (11%) |
3 | | 446 (72%) |
4 | | 26 (4%) |
Missing | | 62 (10%) |

¹ n (%); Median (Min, Max)
If only specific variables are to be included, they need to be entered in the argument vars. The argument group names the variable by which the summary statistics are stratified.
Characteristic | N¹ | Control | N¹ | Lev+5FU |
---|---|---|---|---|
Male | 315 | | 304 | |
0 | | 149 (47%) | | 163 (54%) |
1 | | 166 (53%) | | 141 (46%) |
age | 315 | 60.0 (18.0, 85.0) | 304 | 62.0 (26.0, 81.0) |
extent | 285 | | 272 | |
1 | | 8 (3%) | | 9 (3%) |
2 | | 38 (12%) | | 30 (10%) |
3 | | 222 (70%) | | 224 (74%) |
4 | | 17 (5%) | | 9 (3%) |
Missing | | 30 (10%) | | 32 (11%) |

¹ N without missing values
² n (%); Median (Min, Max)
The displayed name of each variable is its label if one exists in the dataset, or the variable name otherwise (which is the case in our example). To customize the displayed name, use the argument labels. Please note that the labels need to be entered as a list, for example labels = list(age = "Age", extent = "Extent"):
Characteristic | N¹ | Control | N¹ | Lev+5FU |
---|---|---|---|---|
Male | 315 | | 304 | |
0 | | 149 (47%) | | 163 (54%) |
1 | | 166 (53%) | | 141 (46%) |
Age | 315 | 60.0 (18.0, 85.0) | 304 | 62.0 (26.0, 81.0) |
Extent | 285 | | 272 | |
1 | | 8 (3%) | | 9 (3%) |
2 | | 38 (12%) | | 30 (10%) |
3 | | 222 (70%) | | 224 (74%) |
4 | | 17 (5%) | | 9 (3%) |
Missing | | 30 (10%) | | 32 (11%) |

¹ N without missing values
² n (%); Median (Min, Max)
By default, the number of non-missing observations is added in a separate column. This can be disabled by setting the argument add_n to FALSE.
summaryTable(data = colon2,
             group = "rx",
             labels = list(rx = "Arm", age = "Age", extent = "Extent"),
             add_n = FALSE)

Characteristic | Control | Lev+5FU |
---|---|---|
Male | | |
0 | 149 (47%) | 163 (54%) |
1 | 166 (53%) | 141 (46%) |
Age | 60.0 (18.0, 85.0) | 62.0 (26.0, 81.0) |
Extent | | |
1 | 8 (3%) | 9 (3%) |
2 | 38 (12%) | 30 (10%) |
3 | 222 (70%) | 224 (74%) |
4 | 17 (5%) | 9 (3%) |
Missing | 30 (10%) | 32 (11%) |

¹ n (%); Median (Min, Max)
An "overall" column can be added by setting the argument overall to TRUE.
summaryTable(data = colon2,
             group = "rx",
             overall = TRUE,
             labels = list(age = "Age", extent = "Extent"))

Characteristic | N¹ | Control | N¹ | Lev+5FU | Overall |
---|---|---|---|---|---|
Male | 315 | | 304 | | |
0 | | 149 (47%) | | 163 (54%) | 312 (50%) |
1 | | 166 (53%) | | 141 (46%) | 307 (50%) |
Age | 315 | 60.0 (18.0, 85.0) | 304 | 62.0 (26.0, 81.0) | 61.0 (18.0, 85.0) |
Extent | 285 | | 272 | | |
1 | | 8 (3%) | | 9 (3%) | 17 (3%) |
2 | | 38 (12%) | | 30 (10%) | 68 (11%) |
3 | | 222 (70%) | | 224 (74%) | 446 (72%) |
4 | | 17 (5%) | | 9 (3%) | 26 (4%) |
Missing | | 30 (10%) | | 32 (11%) | 62 (10%) |

¹ N without missing values
² n (%); Median (Min, Max)
The function gtsummary::tbl_summary() considers numeric variables with fewer than 10 unique values as categorical by default. This is not the case in the function summaryTable().
By default, all numeric variables are considered as continuous, unless they have only two unique values, 0 and 1, in which case they are considered as dichotomous. This can be changed by setting the argument continuous_as to "categorical".
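The default rule described above can be sketched in a few lines of base R. The helper guess_type() is hypothetical and not part of the package. It also assumes that every non-numeric variable (factor, character) is treated as categorical, which the text does not state explicitly.

```r
# Hypothetical helper illustrating the documented default rule;
# this is not the package's implementation.
guess_type <- function(x) {
  if (!is.numeric(x)) {
    return("categorical")  # assumption: all non-numeric variables
  }
  values <- unique(x[!is.na(x)])
  # numeric with exactly the two values 0 and 1 -> dichotomous
  if (length(values) == 2 && setequal(values, c(0, 1))) {
    "dichotomous"
  } else {
    "continuous"
  }
}

guess_type(c(0, 1, 1, NA))        # "dichotomous"
guess_type(c(1, 2, 3, 4))         # "continuous", even with few unique values
guess_type(factor(c("a", "b")))   # "categorical"
```

Under this rule, a numeric variable with only four distinct values stays continuous, which is why a variable such as extent has to be a factor (or continuous_as has to be changed) to be tabulated by level.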
For dichotomous variables, all levels are displayed by default. To show only one row, use the argument dichotomous_as = "dichotomous". The level to display is specified using the argument value = list(variable ~ "level to show").
summaryTable(data = colon2,
             group = "rx",
             vars = "Male",
             labels = list(age = "Age"),
             dichotomous_as = "dichotomous",
             value = list(Male ~ "1"),
             missing = FALSE)

Characteristic | N¹ | Control | N¹ | Lev+5FU |
---|---|---|---|---|
Male | 315 | 166 (53%) | 304 | 141 (46%) |

¹ N without missing values
² n (%)
By default, the function displays the median and range for continuous variables and the count and percentage for categorical variables. The statistics to be displayed can be chosen using the arguments stat_cont (options: "median_IQR", "median_range" (default), "mean_sd", "mean_se" and "geomMean_sd") and stat_cat (options: "n_percent" (default), "n" and "n_N").
summaryTable(data = colon2, group = "rx",
             stat_cont = "median_IQR",
             stat_cat = "n_N",
             labels = list(age = "Age", sex = "Sex", extent = "Extent"))

Characteristic | N¹ | Control | N¹ | Lev+5FU |
---|---|---|---|---|
Male | 315 | | 304 | |
0 | | 149/315 | | 163/304 |
1 | | 166/315 | | 141/304 |
Age | 315 | 60.0 (53.0, 68.0) | 304 | 62.0 (52.0, 70.0) |
Extent | 285 | | 272 | |
1 | | 8/315 | | 9/304 |
2 | | 38/315 | | 30/304 |
3 | | 222/315 | | 224/304 |
4 | | 17/315 | | 9/304 |
Missing | | 30/315 | | 32/304 |

¹ N without missing values
² n/N; Median (Q1, Q3)
By default, no p-values or confidence intervals (CI) are displayed. p-values can be added by setting test to TRUE and CIs by setting ci to TRUE.
The default test is wilcox.test for continuous variables and fisher.test for categorical variables. This can be changed with test_cont and test_cat, respectively.
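For orientation, the two default tests can also be run directly with the stats functions of the same names. The sketch below works on a subset of survival::colon built on the spot (recurrence records, two arms, no added missing values), so it is an assumed stand-in for the example data and its p-values need not match those in the tables.

```r
library(survival)
data(cancer, package = "survival")

# One row per patient, two arms only (assumed stand-in for the example data)
d <- subset(colon, etype == 1 & rx != "Lev")
d$rx <- droplevels(d$rx)

# Continuous variable: Wilcoxon rank sum test of age between arms
p_age <- wilcox.test(age ~ rx, data = d)$p.value

# Categorical variable: Fisher's exact test on the extent-by-arm table
p_extent <- fisher.test(table(d$extent, d$rx))$p.value

c(age = p_age, extent = p_extent)
```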
The default CI type is wilcox.test for continuous variables and wilson for categorical variables. This can be changed with ci_cont and ci_cat, respectively.
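As a plausibility check, the interval reported in the table below for extent level 1 in the control arm (8 of 315 patients) can be reproduced in base R with prop.test(), which by default returns a Wilson score interval with continuity correction. Whether summaryTable() calls prop.test() internally is an assumption; the sketch only shows that the numbers agree.

```r
# Wilson score interval with continuity correction for 8 events out of 315
ci <- prop.test(x = 8, n = 315)$conf.int

# Express as percentages with one decimal
round(100 * as.numeric(ci), 1)
# 1.2 5.1
```

Passing correct = FALSE to prop.test() would give the uncorrected Wilson interval, which is slightly narrower.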
summaryTable(data = colon2,
             group = "rx",
             vars = c("age", "extent"),
             stat_cont = "mean_sd",
             test = TRUE,
             ci = TRUE,
             labels = list(age = "Age", extent = "Extent")
)

Characteristic | N¹ | Control | 95% CI | N¹ | Lev+5FU | 95% CI | p-value³ |
---|---|---|---|---|---|---|---|
Age | 315 | 59.5 (12.0) | [59, 62] | 304 | 59.7 (12.3) | [59, 62] | 0.60 |
Extent | 285 | | | 272 | | | 0.37 |
1 | | 8 (3%) | [1.2%, 5.1%] | | 9 (3%) | [1.5%, 5.7%] | |
2 | | 38 (12%) | [8.8%, 16%] | | 30 (10%) | [6.9%, 14%] | |
3 | | 222 (70%) | [65%, 75%] | | 224 (74%) | [68%, 78%] | |
4 | | 17 (5%) | [3.3%, 8.7%] | | 9 (3%) | [1.5%, 5.7%] | |
Missing | | 30 (10%) | [6.6%, 13%] | | 32 (11%) | [7.4%, 15%] | |

¹ N without missing values
² Mean (SD); n (%)
³ Wilcoxon rank sum test; Fisher's exact test
Abbreviation: CI = Confidence Interval
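By default, missing values are shown as a separate category. This can be disabled by setting missing to FALSE.
For missing = TRUE, the percentage is automatically added next to the number of missing values. This can be disabled by setting the argument missing_percentage to FALSE. In that case, as the first table below shows, the percentages of the other categories are computed among the non-missing observations only.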
summaryTable(data = colon2,
             group = "rx",
             vars = "extent",
             test = TRUE,
             ci = TRUE,
             missing_percent = FALSE,
             labels = list(extent = "Extent")
)

Characteristic | N¹ | Control | 95% CI | N¹ | Lev+5FU | 95% CI | p-value³ |
---|---|---|---|---|---|---|---|
Extent | 285 | | | 272 | | | 0.37 |
1 | | 8 (3%) | [1.3%, 5.7%] | | 9 (3%) | [1.6%, 6.4%] | |
2 | | 38 (13%) | [9.7%, 18%] | | 30 (11%) | [7.7%, 16%] | |
3 | | 222 (78%) | [73%, 82%] | | 224 (82%) | [77%, 87%] | |
4 | | 17 (6%) | [3.6%, 9.6%] | | 9 (3%) | [1.6%, 6.4%] | |
Missing | | 30 | | | 32 | | |

¹ N without missing values
² n (%)
³ Fisher's exact test
Abbreviation: CI = Confidence Interval
summaryTable(data = colon2,
             group = "rx",
             vars = "extent",
             test = TRUE,
             ci = TRUE,
             missing_percent = TRUE,
             labels = list(extent = "Extent")
)

Characteristic | N¹ | Control | 95% CI | N¹ | Lev+5FU | 95% CI | p-value³ |
---|---|---|---|---|---|---|---|
Extent | 285 | | | 272 | | | 0.37 |
1 | | 8 (3%) | [1.2%, 5.1%] | | 9 (3%) | [1.5%, 5.7%] | |
2 | | 38 (12%) | [8.8%, 16%] | | 30 (10%) | [6.9%, 14%] | |
3 | | 222 (70%) | [65%, 75%] | | 224 (74%) | [68%, 78%] | |
4 | | 17 (5%) | [3.3%, 8.7%] | | 9 (3%) | [1.5%, 5.7%] | |
Missing | | 30 (10%) | [6.6%, 13%] | | 32 (11%) | [7.4%, 15%] | |

¹ N without missing values
² n (%)
³ Fisher's exact test
Abbreviation: CI = Confidence Interval
The tables with and without missing values can also be put next to each other by setting missing to "both".
summaryTable(data = colon2,
             group = "rx",
             vars = "extent",
             missing_percent = "both",
             test = TRUE,
             labels = list(extent = "Extent")
)

Characteristic | With missing: Control | With missing: Lev+5FU | Without missing: Control | Without missing: Lev+5FU | p-value² |
---|---|---|---|---|---|
Extent | | | | | 0.37 |
1 | 8 (3%) | 9 (3%) | 8 (3%) | 9 (3%) | |
2 | 38 (12%) | 30 (10%) | 38 (13%) | 30 (11%) | |
3 | 222 (70%) | 224 (74%) | 222 (78%) | 224 (82%) | |
4 | 17 (5%) | 9 (3%) | 17 (6%) | 9 (3%) | |
Missing | 30 (10%) | 32 (11%) | | | |

¹ n (%)
² Fisher's exact test
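The two sets of percentages differ only in their denominator. The short base R calculation below uses the control-arm counts from the table above and reproduces both columns. It is an arithmetic illustration, not a call into the package.

```r
# Control-arm counts for extent, taken from the table above
counts    <- c("1" = 8, "2" = 38, "3" = 222, "4" = 17)
n_missing <- 30

# With missing: the denominator is all 315 patients in the arm
with_missing <- round(100 * c(counts, Missing = n_missing) /
                        (sum(counts) + n_missing))
with_missing
#  1  2  3  4  Missing
#  3 12 70  5  10

# Without missing: the denominator is the 285 non-missing observations
without_missing <- round(100 * counts / sum(counts))
without_missing
#  1  2  3  4
#  3 13 78  6
```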
Digits can be customized with the arguments digits_cont and digits_cat. The argument as_flex_table (default: TRUE) converts the gtsummary object to a flextable object, which is better suited to Word output.
The argument type will be introduced in a future release to enable more fine-grained customization of the variable types.