The sprtt package is a toolbox for sequential probability ratio tests (SPRTs). This vignette introduces the core functions and demonstrates a typical analysis workflow. For more comprehensive guides on specific topics, see the other package vignettes. If you are unfamiliar with SPRTs, please read the vignette vignette("sprt") first.
The foundational literature (Wald, 1945, 1947) established the mathematical framework for SPRTs. While these original papers provide theoretical depth, they require a strong background in mathematical statistics and focus primarily on abstract theory rather than practical application.
For practical implementation, we strongly recommend starting with the following simulation studies, which demonstrate robustness to assumption violations, explain common pitfalls, and provide actionable guidance:
Essential reading:
For comparisons of different sequential designs (including SPRTs), see:
Whether a statistical tool is appropriate depends strongly on the research context and intended use. SPRTs are recommended when:
SPRTs are not recommended when:
The plan_sample_size() function helps establish
realistic expectations for data requirements and resource planning.
Unlike traditional designs, SPRTs do not require classical a priori power analysis. Power is controlled through the stopping boundaries, allowing you to start data collection immediately and stop once the test reaches a decision.
However, resource planning remains essential. While the boundaries control \(\alpha\) and \(\beta\) error rates in the long run, they cannot guarantee you will collect enough data to reach a decision within your available resources.
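The boundary calculation itself is simple. As a sketch in base R (these are Wald's classical formulas, computed by hand here rather than via a sprtt function):

```r
# Wald's stopping boundaries for the chosen error rates (Wald, 1945).
alpha <- 0.05   # type I error rate
beta  <- 0.05   # type II error rate

A <- beta / (1 - alpha)    # lower boundary: accept H0 once LR <= A
B <- (1 - beta) / alpha    # upper boundary: accept H1 once LR >= B

round(c(A = A, B = B), 3)  # A = 1/19 (about 0.053), B = 19
```

With alpha = beta = .05 this yields the boundaries A = 1/19 and B = 19 used in the example reports at the end of this vignette.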
Two sample sizes are relevant for planning:
The plan_sample_size() function provides tables and
plots to help you find appropriate design parameters that balance
efficiency with feasibility.
For detailed examples and guidance, see the vignette
vignette("plan_sample_size").
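For a rough feel of where decisions tend to occur, a small Monte Carlo sketch in base R can estimate per-group stopping sample sizes for a sequential t-test. This is an illustration only, not the package's planner, and all settings (d = 0.5, alpha = beta = .05, first look at n = 5) are assumptions for the example:

```r
set.seed(1)

# Likelihood ratio of the observed t statistic under H1 (effect d) vs. H0,
# for two independent groups of equal size n (simplified illustration).
lr_t <- function(t_obs, n, d) {
  dt(t_obs, df = 2 * n - 2, ncp = d * sqrt(n / 2)) /
    dt(t_obs, df = 2 * n - 2)
}

A <- 1 / 19; B <- 19        # boundaries for alpha = beta = .05
d <- 0.5                    # minimum effect size of interest
n_start <- 5; n_max <- 100  # first look and resource limit (per group)

one_run <- function(true_d = 0.5) {
  x <- rnorm(n_start, mean = true_d)  # group 1, true effect present
  y <- rnorm(n_start)                 # group 2
  for (n in n_start:n_max) {
    lr <- lr_t(t.test(x, y, var.equal = TRUE)$statistic, n, d)
    if (lr >= B || lr <= A) return(n) # boundary crossed: stop
    x <- c(x, rnorm(1, mean = true_d))
    y <- c(y, rnorm(1))
  }
  n_max                               # resource limit reached
}

stopping_n <- replicate(500, one_run())
mean(stopping_n)  # average per-group sample size at stopping
```

Simulations like this help judge whether the expected and worst-case sample sizes fit your resources before committing to a design.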
A thoughtful data collection plan is highly recommended to keep groups somewhat balanced and to track and balance (or randomize) potential confounders.
Outliers
How to deal with outliers in sequential testing is still an open research topic. Note that implementing a naive sequential outlier analysis can inflate the \(\alpha\) error rate; see Steinhilber et al. (2025).
As SPRTs are best suited for confirmatory research, preregistering the data collection plan, hypothesis, and test specifications (e.g., effect size of interest, \(\alpha\) and \(\beta\) levels) is strongly recommended.
Preparing an analysis script in advance enables a smooth process for continuously analyzing incoming data. This ensures data collection stops immediately once the stopping criterion is reached, avoiding unnecessary additional data collection.
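Such a script might look like the following sketch. The seq_ttest() arguments mirror a preregistered specification; the exact interface and the contents of the returned object may differ in your installed version, so check ?seq_ttest:

```r
library(sprtt)

# Illustrative sketch of a continuous-analysis script.
analyze_look <- function(group_a, group_b) {
  seq_ttest(
    x = group_a,
    y = group_b,
    d = 0.5,        # minimum effect size of interest
    alpha = 0.05,   # alpha error rate
    power = 0.95    # 1 - beta error rate
  )
}

# After each new batch of data:
# res <- analyze_look(group_a, group_b)
# print(res)  # reports the likelihood ratio and the test decision;
#             # stop data collection once a boundary has been crossed.
```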
To test the analysis pipeline, use either:
The following tests are currently implemented in the sprtt package: a sequential t-test (see vignette("t_test")) and a sequential one-way ANOVA (see vignette("one_way_anova")).
Guidelines for the reporting of SPRTs can be found in Schubert et al. (2025), which explicitly covers sequential testing and SPRTs in particular.
A complete SPRT report should include: the specific variant of the SPRT used (e.g., sequential t-test), the \(\alpha\) and \(\beta\) levels, the effect size or other parameters specifying the alternative hypothesis, the starting point of the SPRT (the sample size at the first look), the final sample size when data collection was stopped, the final likelihood ratio, a plot showing the full likelihood progression across all looks, and an effect size estimate with confidence interval (Schubert et al., 2025). Note that effect size estimates in sequential designs are often biased and should be interpreted with caution.
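As a sketch of the decision logic behind such a report (a simplified illustration, not the package's internal code): the likelihood ratio of a two-sample sequential t-test compares the density of the observed t statistic under the noncentral t distribution implied by the effect size of interest with its density under the central t distribution (Schnuerch & Erdfelder, 2020):

```r
# Likelihood ratio of an observed t statistic under H1 (effect d) vs. H0,
# for two independent groups of equal size n (simplified illustration).
lr_t <- function(t_obs, n, d) {
  df  <- 2 * n - 2        # degrees of freedom
  ncp <- d * sqrt(n / 2)  # noncentrality parameter under H1
  dt(t_obs, df = df, ncp = ncp) / dt(t_obs, df = df)
}

# With alpha = beta = .05 the boundaries are A = 1/19 and B = 19:
# lr_t(...) >= 19 accepts H1, <= 1/19 accepts H0, otherwise continue.
```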
Example report for a decision reached (\(H_1\) accepted):
We preregistered a sequential t-test (Schnuerch & Erdfelder, 2020; Wald, 1945) with error probabilities \(\alpha = .05\) and \(\beta = .05\), and a minimum effect size of interest of \(d = 0.5\). The first look took place at \(n = 5\) per group. Data collection stopped at \(N = 48\) (24 per group) when the likelihood ratio crossed the upper decision boundary (\(LR_{48} = 21.3 > B = 19\)), providing sufficient evidence to accept \(H_1\). The estimated effect size was \(d = 0.61\) (95% CI [0.21, 1.00]). A plot of the full likelihood progression is provided in Figure X. All materials and the preregistration are available at [OSF link].
Example report for reaching \(N_{max}\) (non-decision):
We preregistered a sequential t-test (Schnuerch & Erdfelder, 2020; Wald, 1945) with error probabilities \(\alpha = .05\) and \(\beta = .05\), a minimum effect size of interest of \(d = 0.5\), and a maximum sample size of \(N_{max} = 200\) (100 per group). The first look took place at \(n = 4\) per group. Data collection stopped upon reaching \(N_{max} = 200\) without the likelihood ratio crossing either decision boundary (\(LR_{200} = 3.1\); lower boundary \(A = 1/19\), upper boundary \(B = 19\)). This constitutes a non-decision: the accumulated evidence was insufficient to accept either \(H_0\) or \(H_1\) with the prespecified error control. The results are therefore inconclusive, though the final likelihood ratio indicates that the data are 3.1 times more likely under \(H_1\) than under \(H_0\). Note that a non-decision due to resource depletion does not constitute evidence for \(H_0\). The estimated effect size was \(d = 0.21\) (95% CI \([-0.07, 0.49]\)). A plot of the full likelihood progression is provided in Figure X.