Classical longitudinal methods (e.g., multilevel models) estimate one set of parameters that is shared, fully or partially, across all units of an ensemble. If the focus of interest is the trajectory of individuals, this is only sensible under hard-to-meet assumptions such as exchangeability and/or ergodicity. When these assumptions fail, ensemble averages can systematically obscure individual differences: an average positive effect may coexist with a substantial subset of individuals for whom the effect is negative, null, or nonsignificant.
Idionomic science inverts the order of operations: each individual is modeled first, and only then are results aggregated to see what generalizes. The recommended pipeline is:
i_screener() → pmstandardize() → i_detrender() → iarimax()/looping_machine() → i_pval() / sden_test()
This vignette walks through each step with a synthetic panel of 12
subjects, each contributing 50 time points. The data includes four
variables (a, b, c,
y) with known relationships, plus three manually created
subjects that demonstrate specific pipeline behaviors.
library(idionomics)
#>
#> One Size Fits None!
set.seed(42)
panel <- do.call(rbind, lapply(1:9, function(id) {
  a <- rnorm(50)
  b <- 0.4 * a + rnorm(50)
  c <- 0.4 * b + rnorm(50)
  data.frame(id = as.character(id), time = seq_len(50),
             a = a, b = b, c = c,
             y = 0.5 * a + rnorm(50),
             stringsAsFactors = FALSE)
}))
# Subject 10: near-constant "a" — will be caught by i_screener(min_sd = 0.5)
s10 <- data.frame(id = "10", time = seq_len(50),
a = rep(3, 50), b = rnorm(50), c = rnorm(50),
y = rnorm(50), stringsAsFactors = FALSE)
# Subject 11: full positive loop (a -> b -> c -> a)
a11 <- rnorm(50)
b11 <- 0.6 * a11 + rnorm(50, sd = 0.5)
c11 <- 0.6 * b11 + rnorm(50, sd = 0.5)
s11 <- data.frame(id = "11", time = seq_len(50),
a = a11 + 0.4 * c11, b = b11, c = c11,
y = 0.5 * a11 + rnorm(50), stringsAsFactors = FALSE)
# Subject 12: negative a -> y effect
a12 <- rnorm(50)
s12 <- data.frame(id = "12", time = seq_len(50),
a = a12, b = 0.4 * a12 + rnorm(50),
c = rnorm(50), y = -0.5 * a12 + rnorm(50),
stringsAsFactors = FALSE)
panel <- rbind(panel, s10, s11, s12)

The data-generating process: a is independent noise, b = 0.4*a + noise, c = 0.4*b + noise, and y = 0.5*a + noise. The chain a -> b -> c provides signal for looping_machine(), and a -> y provides signal for iarimax(). The c -> a leg has no true signal, so the loop should not close for most subjects.

Three subjects are constructed by hand:

- Subject 10: a is constant (rep(3, 50)), so it has zero within-person variance. i_screener() will catch this.
- Subject 11: a receives a genuine contribution from c (a = a11 + 0.4 * c11), closing the c -> a leg. This subject should show a positive directed loop.
- Subject 12: the a -> y effect is negative (-0.5), creating heterogeneity in the meta-analysis.

## i_screener()

Before standardizing, screen subjects on their raw data. After pmstandardize(), all non-constant series have within-person variance = 1 by construction, making variance-based filters ineffective. i_screener() catches low-quality subjects at the right stage.
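The point is easy to demonstrate with base R (an illustration, not package code): once a series is person-standardized, its SD is 1 no matter how flat it was originally, so an SD filter must run on the raw data.

```r
# After z-scoring, a noisy series and a near-constant series are
# indistinguishable by SD alone: screening must happen before standardizing.
set.seed(1)
x_noisy         <- rnorm(50)
x_near_constant <- rnorm(50, sd = 0.01)
sd(scale(x_noisy))          # 1
sd(scale(x_near_constant))  # 1
```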
panel_clean <- i_screener(panel, cols = c("a", "b", "c", "y"), id_var = "id",
min_n_subject = 20, min_sd = 0.5, verbose = TRUE)
#> i_screener applies per-subject data quality filters.
#> min_n_subject: subject-column combinations need >= 20 non-NA observations per variable.
#> min_sd : subject-column combinations need within-person SD >= 0.5 per variable.
#> filter_type : 'joint' - Subjects excluded if any variable fails any criterion.
#> Subjects before screening : 12
#> Subjects failing at least one criterion across variables : 1
#> Subjects passing all criteria on all variables : 11

Subject 10 is removed because a has SD = 0, which falls
below min_sd = 0.5.
Three criteria are available (all optional except
min_n_subject):
| Criterion | What it catches |
|---|---|
| min_n_subject | Too few observations |
| min_sd | Near-constant series (floor/ceiling responders) |
| max_mode_pct | “Stuck” responders (e.g., >= 80% of responses identical) |
Use mode = "report" to inspect quality metrics without
committing to exclusion:
report <- i_screener(panel, cols = c("a", "b", "c", "y"), id_var = "id",
min_n_subject = 20, min_sd = 0.5, max_mode_pct = 0.80,
mode = "report", verbose = TRUE)
#> i_screener applies per-subject data quality filters.
#> min_n_subject: subject-column combinations need >= 20 non-NA observations per variable.
#> min_sd : subject-column combinations need within-person SD >= 0.5 per variable.
#> max_mode_pct : subject-column combinations need <= 80% of responses on the modal value.
#> filter_type : 'joint' - Subjects excluded if any variable fails any criterion.
#> Subjects before screening : 12
#> Subjects failing at least one criterion across variables : 1
#> Subjects passing all criteria on all variables : 11
print(report)
#> # A tibble: 12 × 18
#> id a_n_valid b_n_valid c_n_valid y_n_valid a_sd b_sd c_sd y_sd
#> <chr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 50 50 50 50 1.15 0.930 1.02 1.11
#> 2 10 50 50 50 50 0 0.994 0.979 0.934
#> 3 11 50 50 50 50 1.09 0.709 0.752 1.08
#> 4 12 50 50 50 50 1.05 1.15 1.12 1.09
#> 5 2 50 50 50 50 0.988 1.18 0.878 1.02
#> 6 3 50 50 50 50 0.996 1.19 1.13 1.11
#> 7 4 50 50 50 50 1.05 1.00 1.02 1.09
#> 8 5 50 50 50 50 1.07 1.10 1.10 1.40
#> 9 6 50 50 50 50 1.06 1.11 0.933 1.20
#> 10 7 50 50 50 50 0.995 1.04 1.15 1.03
#> 11 8 50 50 50 50 0.953 1.04 1.11 1.09
#> 12 9 50 50 50 50 0.935 1.18 1.03 1.10
#> # ℹ 9 more variables: a_mode_pct <dbl>, b_mode_pct <dbl>, c_mode_pct <dbl>,
#> # y_mode_pct <dbl>, a_pass <lgl>, b_pass <lgl>, c_pass <lgl>, y_pass <lgl>,
#> #   pass_overall <lgl>

## pmstandardize()

Person-mean standardization computes (x - person_mean) / person_sd for each subject and column. This removes between-person differences in level and scale, making coefficients comparable across subjects. Output columns are named <col>_psd.
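For intuition, the transformation can be reproduced per group with base R's ave() (a sketch without pmstandardize()'s NA and zero-variance handling):

```r
# Person-mean standardization by hand: center and scale within each id.
df <- data.frame(id = rep(c("1", "2"), each = 3),
                 a  = c(10, 12, 14, 100, 105, 110))
df$a_psd <- ave(df$a, df$id, FUN = function(x) (x - mean(x)) / sd(x))
# Within each subject, a_psd has mean 0 and SD 1; between-person
# differences in level (values near 12 vs. near 105) are removed.
tapply(df$a_psd, df$id, mean)  # 0 0
tapply(df$a_psd, df$id, sd)    # 1 1
```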
panel_std <- pmstandardize(panel_clean, cols = c("a", "b", "c", "y"), id_var = "id",
verbose = TRUE)
#> This function creates within-person z-scores (person-level standardization).
#> If all values for a feature within ID are NA, returns NA.
#> If fewer than 2 non-NA values exist, returns 0 (SD undefined).
#> If values are constant within person, returns 0 (zero variance).
#> Otherwise, returns (x - person_mean) / person_sd.
head(panel_std[, c("id", "time", "a_psd", "b_psd", "c_psd", "y_psd")])
#> # A tibble: 6 × 6
#> id time a_psd b_psd c_psd y_psd
#> <chr> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 1.22 0.843 1.64 0.618
#> 2 1 2 -0.459 -1.18 0.746 -1.61
#> 3 1 3 0.346 1.76 -0.195 1.25
#> 4 1 4 0.581 0.871 2.29 0.0759
#> 5 1 5 0.382 0.178 -0.443 -0.202
#> 6 1         6 -0.0612 0.159   0.311  -1.12

Edge cases are handled automatically: all-NA series remain NA; single-observation series return 0; constant series (zero SD) return 0 and are filtered downstream by iarimax()'s minvar guard.
## i_detrender()

i_detrender() removes the within-person linear time trend via lm(col ~ time) and stores the residuals in a new column per variable (<col>_dt). This reduces the integration order auto.arima() selects, fostering comparability between models.
Filtering is per-column and independent: a subject can produce valid
detrended residuals for one variable while receiving NA in
another if that column fails the variance or observation-count
thresholds.
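For a single subject and column, the core operation is just a linear regression on time (a base-R sketch; the package adds the NA and variance guards listed above):

```r
# Remove a within-person linear time trend by keeping lm() residuals.
set.seed(2)
sub <- data.frame(time = 1:50)
sub$x    <- 0.05 * sub$time + rnorm(50)      # upward-trending series
sub$x_dt <- resid(lm(x ~ time, data = sub))  # detrended residuals
cor(sub$x, sub$time)     # clearly positive
cor(sub$x_dt, sub$time)  # 0 by construction: trend removed
```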
panel_dt <- i_detrender(panel_std, cols = c("a_psd", "b_psd", "c_psd", "y_psd"),
id_var = "id", timevar = "time", verbose = TRUE)
#> i_detrender removes the within-person linear time trend per variable.
#> If all values for a column within ID are NA: returns NA.
#> If fewer than 20 non-NA observations: returns NA.
#> If pre-detrend or post-detrend variance < 0.01: returns NA.
#> Otherwise: returns residuals from lm(col ~ timevar) per person.
head(panel_dt[, c("id", "time", "a_psd_dt", "b_psd_dt", "c_psd_dt", "y_psd_dt")])
#> # A tibble: 6 × 6
#> id time a_psd_dt b_psd_dt c_psd_dt y_psd_dt
#> <chr> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 0.883 0.512 1.41 0.587
#> 2 1 2 -0.785 -1.50 0.523 -1.64
#> 3 1 3 0.0349 1.45 -0.409 1.22
#> 4 1 4 0.283 0.580 2.08 0.0493
#> 5 1 5 0.0983 -0.0999 -0.637 -0.227
#> 6 1         6 -0.331  -0.105   0.126  -1.15

Pipeline order matters: detrending after standardization is preferred. Reversing the order (i_detrender() then pmstandardize()) can amplify near-zero residuals to unit variance, bypassing iarimax()'s minvar filter.
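The hazard is easy to reproduce (a base-R sketch): a series that is almost pure trend leaves near-zero residuals, which standardization then inflates to unit variance.

```r
set.seed(3)
t <- 1:50
x <- 0.1 * t + rnorm(50, sd = 0.001)  # nearly deterministic trend
r <- resid(lm(x ~ t))
sd(r)         # tiny: essentially no signal left after detrending
sd(scale(r))  # 1: the noise floor rescaled to full variance
```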
## iarimax()

iarimax() fits one forecast::auto.arima() model per subject, extracts coefficients via broom::tidy(), and pools the focal predictor's coefficients with metafor::rma() (REML).
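Conceptually, the per-subject loop looks something like this (a simplified sketch of the fit-then-pool idea, not the package's actual internals; it assumes the forecast, broom, and metafor packages and a toy dataset built here):

```r
library(forecast)
library(broom)
library(metafor)

set.seed(4)
dat <- do.call(rbind, lapply(1:5, function(id) {
  x <- rnorm(50)
  data.frame(id = id, x = x, y = 0.5 * x + rnorm(50))
}))

# One auto.arima() fit per subject, keeping the focal xreg coefficient
per_subject <- lapply(split(dat, dat$id), function(d) {
  fit <- auto.arima(d$y, xreg = matrix(d$x, dimnames = list(NULL, "x")))
  co  <- tidy(fit)
  co[co$term == "x", c("estimate", "std.error")]
})
tab <- do.call(rbind, per_subject)

# Pool the per-subject slopes with a random-effects meta-analysis (REML)
rma(yi = tab$estimate, sei = tab$std.error, method = "REML")
```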
result <- iarimax(panel_dt,
y_series = "y_psd_dt",
x_series = "a_psd_dt",
id_var = "id",
timevar = "time",
verbose = TRUE)
#> Filtering data based on minimum non-NA observations and variance...
#> Filtering done: 11 subjects will be used for the analyses.
#> Running I-ARIMAX algorithm...
#> Applying ARIMAX (auto d) to case: 1 ... 9.1% done
#> Applying ARIMAX (auto d) to case: 11 ... 18.2% done
#> Applying ARIMAX (auto d) to case: 12 ... 27.3% done
#> Applying ARIMAX (auto d) to case: 2 ... 36.4% done
#> Applying ARIMAX (auto d) to case: 3 ... 45.5% done
#> Applying ARIMAX (auto d) to case: 4 ... 54.5% done
#> Applying ARIMAX (auto d) to case: 5 ... 63.6% done
#> Applying ARIMAX (auto d) to case: 6 ... 72.7% done
#> Applying ARIMAX (auto d) to case: 7 ... 81.8% done
#> Applying ARIMAX (auto d) to case: 8 ... 90.9% done
#> Applying ARIMAX (auto d) to case: 9 ... 100% done
#> I-ARIMAX algorithm finished.

By default, auto.arima() selects the differencing order
d independently per subject. When d varies
across subjects, some coefficients describe effects on levels
(d = 0) while others describe effects on changes
(d = 1), making them less comparable in the meta-analysis.
Use fixed_d to impose a common differencing order:
# Fix d = 0 for all subjects (coefficients describe effects on levels)
result_d0 <- iarimax(panel_dt, y_series = "y_psd_dt", x_series = "a_psd_dt",
id_var = "id", timevar = "time", fixed_d = 0)

summary(result)
#>
#> IARIMAX Results Summary
#> ----------------------------
#> Features:
#> Outcome : y_psd_dt
#> Focal predictor : a_psd_dt
#> ID variable : id
#>
#> Number of subjects:
#> Original Dataframe (iarimax input): 11
#> Excl. first filter (var or n). : 0
#> Skipped (auto.arima failed) : 0
#> Analyzed cases : 11
#>
#> Per-subject effects (alpha =0.05):
#> Positive : 10
#> Significant positive: 10
#> Negative : 1
#> Significant negative: 1
#>
#> Random-Effects Meta-Analysis (REML)
#> beta = 0.3801 SE = 0.0961 z = 3.9561 p = 1e-04
#> 95% Confidence Interval: [0.1918, 0.5684]
#> 95% Prediction Interval: [-0.2236, 0.9838]
#>
#> Heterogeneity
#> tau2 = 0.0856 I2 = 84.54 % Q(df = 10 ) = 65.5311 p = 0

Each bar is a per-subject confidence interval, colored green
(significantly positive), red (significantly negative), or black
(crosses zero). The blue band is the 95% CI of the REMA pooled effect.
Subject 12 (negative a -> y effect) should appear on the
negative side.
The full individual-level output is in $results_df:
head(result$results_df[, c("id", "nAR", "nI", "nMA",
"estimate_a_psd_dt", "std.error_a_psd_dt",
"n_valid", "n_params", "raw_cor")])
#> # A tibble: 6 × 9
#> id nAR nI nMA estimate_a_psd_dt std.error_a_psd_dt n_valid n_params
#> <chr> <int> <int> <int> <dbl> <dbl> <int> <int>
#> 1 1 0 0 0 0.634 0.113 50 1
#> 2 11 0 0 0 0.438 0.129 50 1
#> 3 12 0 0 0 -0.531 0.126 50 1
#> 4 2 0 0 0 0.345 0.129 50 1
#> 5 3 0 0 3 0.448 0.143 50 4
#> 6 4 0 0 0 0.570 0.119 50 1
#> # ℹ 1 more variable: raw_cor <dbl>

## looping_machine()

looping_machine() tests whether three variables form a directed positive loop (a -> b -> c -> a). It fits three I-ARIMAX legs, applies i_pval() to each, and computes Loop_positive_directed: a 0/1 indicator that is 1 only when all three focal betas are positive and significant at alpha.
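The indicator itself is a simple conjunction once the per-leg results exist. A sketch with made-up per-subject estimates and p-values (in practice these come from the three I-ARIMAX legs and i_pval()):

```r
alpha <- 0.05
# Hypothetical focal estimates and p-values for three subjects, per leg
est_ab <- c(0.5,  0.6, -0.1); p_ab <- c(0.001, 0.02, 0.70)
est_bc <- c(0.4,  0.5,  0.3); p_bc <- c(0.010, 0.01, 0.04)
est_ca <- c(0.3, -0.2,  0.2); p_ca <- c(0.030, 0.01, 0.30)

# 1 only when all three legs are positive AND significant at alpha
Loop_positive_directed <- as.integer(
  est_ab > 0 & p_ab < alpha &
  est_bc > 0 & p_bc < alpha &
  est_ca > 0 & p_ca < alpha
)
Loop_positive_directed  # 1 0 0: only subject 1 closes the loop
```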
loop_result <- looping_machine(panel_dt,
a_series = "a_psd_dt",
b_series = "b_psd_dt",
c_series = "c_psd_dt",
id_var = "id",
timevar = "time",
verbose = TRUE)
#> Calculating a to b: From a_psd_dt to b_psd_dt...
#> Filtering data based on minimum non-NA observations and variance...
#> Filtering done: 11 subjects will be used for the analyses.
#> Running I-ARIMAX algorithm...
#> Applying ARIMAX (auto d) to case: 1 ... 9.1% done
#> Applying ARIMAX (auto d) to case: 11 ... 18.2% done
#> Applying ARIMAX (auto d) to case: 12 ... 27.3% done
#> Applying ARIMAX (auto d) to case: 2 ... 36.4% done
#> Applying ARIMAX (auto d) to case: 3 ... 45.5% done
#> Applying ARIMAX (auto d) to case: 4 ... 54.5% done
#> Applying ARIMAX (auto d) to case: 5 ... 63.6% done
#> Applying ARIMAX (auto d) to case: 6 ... 72.7% done
#> Applying ARIMAX (auto d) to case: 7 ... 81.8% done
#> Applying ARIMAX (auto d) to case: 8 ... 90.9% done
#> Applying ARIMAX (auto d) to case: 9 ... 100% done
#> I-ARIMAX algorithm finished.
#> Calculating b to c: From b_psd_dt to c_psd_dt...
#> Filtering data based on minimum non-NA observations and variance...
#> Filtering done: 11 subjects will be used for the analyses.
#> Running I-ARIMAX algorithm...
#> Applying ARIMAX (auto d) to case: 1 ... 9.1% done
#> Applying ARIMAX (auto d) to case: 11 ... 18.2% done
#> Applying ARIMAX (auto d) to case: 12 ... 27.3% done
#> Applying ARIMAX (auto d) to case: 2 ... 36.4% done
#> Applying ARIMAX (auto d) to case: 3 ... 45.5% done
#> Applying ARIMAX (auto d) to case: 4 ... 54.5% done
#> Applying ARIMAX (auto d) to case: 5 ... 63.6% done
#> Applying ARIMAX (auto d) to case: 6 ... 72.7% done
#> Applying ARIMAX (auto d) to case: 7 ... 81.8% done
#> Applying ARIMAX (auto d) to case: 8 ... 90.9% done
#> Applying ARIMAX (auto d) to case: 9 ... 100% done
#> I-ARIMAX algorithm finished.
#> Calculating c to a: From c_psd_dt to a_psd_dt...
#> Filtering data based on minimum non-NA observations and variance...
#> Filtering done: 11 subjects will be used for the analyses.
#> Running I-ARIMAX algorithm...
#> Applying ARIMAX (auto d) to case: 1 ... 9.1% done
#> Applying ARIMAX (auto d) to case: 11 ... 18.2% done
#> Applying ARIMAX (auto d) to case: 12 ... 27.3% done
#> Applying ARIMAX (auto d) to case: 2 ... 36.4% done
#> Applying ARIMAX (auto d) to case: 3 ... 45.5% done
#> Applying ARIMAX (auto d) to case: 4 ... 54.5% done
#> Applying ARIMAX (auto d) to case: 5 ... 63.6% done
#> Applying ARIMAX (auto d) to case: 6 ... 72.7% done
#> Applying ARIMAX (auto d) to case: 7 ... 81.8% done
#> Applying ARIMAX (auto d) to case: 8 ... 90.9% done
#> Applying ARIMAX (auto d) to case: 9 ... 100% done
#> I-ARIMAX algorithm finished.
#> Looping Machine finished.
#> Number of cases with the positive loop present: 1

# Proportion of subjects with a detected positive directed loop
mean(loop_result$loop_df$Loop_positive_directed, na.rm = TRUE)
#> [1] 0.09090909
# Per-leg I-ARIMAX results are also accessible
summary(loop_result$iarimax_a_to_b)
#>
#> IARIMAX Results Summary
#> ----------------------------
#> Features:
#> Outcome : b_psd_dt
#> Focal predictor : a_psd_dt
#> ID variable : id
#>
#> Number of subjects:
#> Original Dataframe (iarimax input): 11
#> Excl. first filter (var or n). : 0
#> Skipped (auto.arima failed) : 0
#> Analyzed cases : 11
#>
#> Per-subject effects (alpha =0.05):
#> Positive : 11
#> Significant positive: 10
#> Negative : 0
#> Significant negative: 0
#>
#> Random-Effects Meta-Analysis (REML)
#> beta = 0.4695 SE = 0.0542 z = 8.6541 p = 0
#> 95% Confidence Interval: [0.3631, 0.5758]
#> 95% Prediction Interval: [0.1854, 0.7535]
#>
#> Heterogeneity
#> tau2 = 0.0181 I2 = 57.56 % Q(df = 10 ) = 27.0741 p = 0.0025

Because the positive directed loop is a conjunction of three one-sided tests, it is very conservative: under the global null and independence, the false positive rate is approximately \((\alpha/2)^3\).
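At alpha = 0.05 this bound is easy to compute:

```r
alpha <- 0.05
(alpha / 2)^3  # 1.5625e-05: loop false-positive rate under the
               # global null with independent legs
```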
Optional arguments include covariates (shared extra
predictors for all three legs), include_third_as_covariate
(adds the third loop variable as a covariate in each leg), and
fixed_d (fixes the differencing order across all legs for
comparability).
## i_pval()

i_pval() attaches a pval_<feature> column to results_df using the two-tailed t-distribution with ML-based degrees of freedom (n_valid - n_params).
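The computation can be checked by hand from a row of results_df (using subject 1's rounded display values from the table above, so the match is only approximate):

```r
estimate <- 0.634; se <- 0.113   # subject 1, rounded display values
n_valid <- 50; n_params <- 1
t_stat <- estimate / se
2 * pt(-abs(t_stat), df = n_valid - n_params)
# roughly 9e-07, in line with pval_a_psd_dt for subject 1
```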
result_pval <- i_pval(result)
result_pval$results_df[, c("id", "estimate_a_psd_dt", "pval_a_psd_dt")]
#> # A tibble: 11 × 3
#> id estimate_a_psd_dt pval_a_psd_dt
#> <chr> <dbl> <dbl>
#> 1 1 0.634 0.000000928
#> 2 11 0.438 0.00136
#> 3 12 -0.531 0.000106
#> 4 2 0.345 0.0103
#> 5 3 0.448 0.00295
#> 6 4 0.570 0.0000153
#> 7 5 0.443 0.00111
#> 8 6 0.448 0.000802
#> 9 7 0.618 0.00000115
#> 10 8 0.387 0.00488
#> 11 9            0.364 0.00762

## sden_test()

sden_test() runs a binomial test on the count of individually significant effects. Two variants are available and selected automatically based on the REMA pooled effect:

- ENT (Equisyncratic Null Test): tests whether the proportion of all significant effects (both sides) exceeds the nominal rate (alpha_arimax). Used when the pooled effect is non-significant.
- SDT (Sign Divergence Test): tests whether the proportion of counter-sign significant effects exceeds the one-sided rate (alpha_arimax / 2). Used when the pooled effect is significant.

The auto-selection pivot is fixed at p = 0.05, independent of alpha_arimax.
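Under the hood this is a one-sided binomial test. With the numbers from this vignette (1 significant negative case out of 11 valid, null rate alpha_arimax / 2 = 0.025), the SDT p-value can be reproduced directly:

```r
# One-sided binomial test: is 1 counter-sign hit out of 11 more than
# expected under a per-subject false-positive rate of 0.025?
binom.test(x = 1, n = 11, p = 0.025, alternative = "greater")$p.value
# 0.2430786, matching the sden_test() output below
```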
sden <- sden_test(result_pval)
#> Running Sign Divergence Test (SDT): number of negative cases (counter positive pooled-effect) values are being evaluated
summary(sden)
#>
#> SDEN Test Results
#> -----------------------
#> Features:
#> Outcome : y_psd_dt
#> Focal predictor : a_psd_dt
#> ID variable : id
#>
#> Test : SDT counter-positive (Sign Divergence Test)
#> Test selection: Automatic
#>
#> Explanation:
#> -------------
#> Automatic selection based on:
#> The pooled effect (0.3801) was positive and statistically significant (p = 1e-04).
#>
#> Testing whether the proportion of negative significant effects
#> is greater than 0.025
#>
#> Results:
#> ---------
#> Context - REMA pooled effect (p-val): 0.38 ( 1e-04 )
#>
#> Total number of significant effects (both sides): 11
#> Number of significant positive cases : 10
#> Number of significant negative cases : 1
#> Number of valid cases : 11
#>
#> Given the test, relevant significant effects are just negatives : 1
#> SDT p-value = 0.2430786

You can also force a specific test:
sden_ent <- sden_test(result, test = "ENT")
#> Running Equisyncratic Null Test (ENT)
#> The p-value of the pooled effect is statistically significant at 0.05. p = 1e-04. SDT test is suggested.
summary(sden_ent)
#>
#> SDEN Test Results
#> -----------------------
#> Features:
#> Outcome : y_psd_dt
#> Focal predictor : a_psd_dt
#> ID variable : id
#>
#> Test : ENT (Equisyncratic Null Test)
#> Test selection: Manual
#>
#> Explanation:
#> -------------
#> You manually selected the test.
#>
#> Testing whether the proportion of all significant effects (both sides)
#> is greater than 0.05
#>
#> Results:
#> ---------
#> Context - REMA pooled effect (p-val): 0.38 ( 1e-04 )
#>
#> Total number of significant effects (both sides): 11
#> Number of significant positive cases : 10
#> Number of significant negative cases : 1
#> Number of valid cases : 11
#>
#> Given the test, relevant significant effects are all (both sides) : 11
#> ENT p-value = 0