epiworldRcalibrate is an R package that provides fast,
data-driven calibration of agent-based epidemic models built with
epiworldR (Meyer and Vega Yon
2023). Calibration — the process of finding model parameters that
reproduce observed epidemic dynamics — is traditionally performed using
computationally expensive simulation-based methods such as Approximate
Bayesian Computation (ABC), which can require minutes to hours per run.
epiworldRcalibrate addresses this bottleneck by
implementing DeepIMC (Deep Inverse Mapping Calibration), a pretrained
Bidirectional Long Short-Term Memory (BiLSTM) neural network (Najafzadehkhoei et al. 2025) that estimates SIR
model parameters from a 60-day incidence time series in seconds. The
package manages all required Python dependencies automatically through
reticulate (Ushey, Allaire, and Tang
2024), requiring no manual environment configuration from the
user. Pre-computed ABC results are also shipped with the package,
enabling researchers to benchmark DeepIMC estimates against full
Bayesian posterior distributions without re-running expensive
simulations.
Agent-based models (ABMs) are widely used in infectious disease epidemiology because they capture individual-level behaviors and contact heterogeneity that compartmental models cannot represent (Railsback and Grimm 2019). However, a persistent challenge in applied ABM work is calibration: identifying the parameter values — transmission probability, contact rate, and recovery rate — that make the model’s simulated incidence match observed surveillance data.
The dominant calibration approach, Approximate Bayesian Computation (ABC), draws candidate parameters from a prior, simulates the model, and accepts draws whose output is sufficiently close to the observed data under a chosen discrepancy metric (Toni et al. 2009; Marjoram et al. 2003). This provides posterior distributions over parameters but requires thousands of forward simulations and can be prohibitively expensive when calibration must be repeated across locations, intervention scenarios, or forecasting horizons. An alternative line of work uses machine learning to build surrogate models that approximate the forward mapping from parameters to simulated trajectories (Lamperti, Roventini, and Sani 2018; Angione, Silverman, and Yaneske 2022). While surrogates accelerate individual forward passes, they still require iterative sampling at calibration time.
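The accept/reject loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the epiworldR implementation: `simulate_incidence` is a hypothetical deterministic discrete-time SIR that stands in for the agent-based simulator, and the prior ranges, tolerance fraction, and number of draws are arbitrary choices made for the example.

```python
import math
import random

def simulate_incidence(ptran, crate, recov, n=10_000, days=60, i0=10):
    """Hypothetical stand-in simulator: deterministic discrete-time SIR.
    Returns the number of new infections on each day."""
    s, i = float(n - i0), float(i0)
    incidence = []
    for _ in range(days):
        new_inf = min(s, ptran * crate * s * i / n)  # cannot exceed susceptibles
        new_rec = recov * i
        s -= new_inf
        i += new_inf - new_rec
        incidence.append(new_inf)
    return incidence

def distance(a, b):
    """Euclidean distance between two trajectories of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def abc_rejection(observed, recov, n_draws=3000, tol_frac=0.3, seed=1):
    """Draw from the prior, simulate, and keep draws whose trajectory lies
    within tol_frac * ||observed|| of the observed trajectory."""
    rng = random.Random(seed)
    tol = tol_frac * math.sqrt(sum(x * x for x in observed))
    accepted = []
    for _ in range(n_draws):
        ptran = rng.uniform(0.01, 0.5)  # assumed prior, for illustration only
        crate = rng.uniform(1.0, 10.0)  # assumed prior, for illustration only
        simulated = simulate_incidence(ptran, crate, recov)  # one forward run per draw
        if distance(simulated, observed) <= tol:
            accepted.append((ptran, crate))
    return accepted

# "Observed" data generated from known parameters so the result can be checked.
observed = simulate_incidence(ptran=0.1, crate=4.0, recov=0.2)
posterior = abc_rejection(observed, recov=0.2)
```

Every candidate costs one full forward simulation, which is the cost that grows when calibration is repeated across many scenarios.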
epiworldRcalibrate takes a different approach,
implementing the inverse mapping directly: the DeepIMC network learns to
map observed incidence trajectories back to the underlying model
parameters. Once trained offline, the network produces calibrated
parameter estimates via a single forward pass at inference time — no
simulation is required. In a comprehensive simulation study of 5,000
epidemic scenarios, DeepIMC reduced calibration time from approximately
77 seconds (ABC–LFMCMC) to 2.35 seconds while achieving lower parameter
recovery error and tighter, well-calibrated prediction intervals (Najafzadehkhoei et al. 2025).
epiworldRcalibrate addresses the need for fast,
accessible calibration of ABMs built with epiworldR. It
targets researchers and public health practitioners who need rapid
parameter estimates during an active outbreak, researchers conducting
simulation studies requiring repeated calibration across many scenarios,
and educators teaching infectious disease modeling who want students to
focus on model interpretation rather than calibration machinery.
Existing R packages for ABC calibration — such as abc
(Csilléry, François, and Blum 2012) —
require the user to supply a forward simulator and run it thousands of
times per calibration, making them slow by design and dependent on the
user’s model implementation. epiworldRcalibrate is, to our
knowledge, the first R package to provide a pretrained deep learning
model for SIR calibration, enabling near-instant parameter estimation
from incidence data with no simulation required at inference time.
The DeepIMC calibration model is a BiLSTM neural network (Schuster and Paliwal 1997) implemented in
PyTorch and interfaced to R via reticulate (Ushey, Allaire, and Tang 2024). BiLSTMs process
sequential input in both the forward and backward temporal directions
simultaneously, allowing the network to exploit the full shape of the
epidemic curve — including the growth phase, peak timing, and tail decay
— when estimating parameters. This bidirectional context is particularly
valuable for calibration from complete trajectories, where the tail is
as informative about transmission dynamics as the early rise.
The network consists of three stacked BiLSTM layers with 160 hidden
units per direction and a dropout rate of 0.5 applied between layers.
The final forward and backward hidden states are concatenated with two
normalized scalar inputs — population size and recovery rate — to form a
322-dimensional feature vector. This is passed through two fully
connected layers (322 → 64 → 3) with ReLU activation in the first layer.
The three outputs are constrained to epidemiologically valid ranges: a
Sigmoid activation bounds the transmission probability ptran to
\((0, 1)\), and Softplus activations
enforce positivity for the contact rate crate and the basic
reproduction number \(R_0\). All
hyperparameters — including learning rate (\(2.77 \times 10^{-4}\)), number of layers,
hidden units, and dropout — were selected via Optuna-based tuning.
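The feature dimension and the output constraints described above can be checked with a little arithmetic. The sketch below uses plain Python rather than PyTorch so that it runs without dependencies. The ordering of the three outputs and the raw values passed to `constrain` are assumptions made for illustration, not details taken from the package.

```python
import math

HIDDEN_PER_DIRECTION = 160
N_SCALARS = 2  # normalized population size and recovery rate

# Final forward and backward hidden states plus the two scalar covariates.
feature_dim = 2 * HIDDEN_PER_DIRECTION + N_SCALARS

def sigmoid(x):
    """Map any real number into (0, 1), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def softplus(x):
    """Smooth positive map log(1 + exp(x)), written to avoid overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def constrain(raw):
    """Apply the output constraints to three unconstrained values.
    The (ptran, crate, R0) ordering is a hypothetical convention."""
    raw_ptran, raw_crate, raw_r0 = raw
    return {
        "ptran": sigmoid(raw_ptran),   # bounded to (0, 1)
        "crate": softplus(raw_crate),  # strictly positive
        "R0": softplus(raw_r0),        # strictly positive
    }

estimates = constrain((-2.0, 1.5, 0.7))  # made-up raw network outputs
```

Because the constraints are applied inside the network, every prediction is a valid probability or a positive rate without any post-hoc clipping.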
The network is trained with a composite loss function combining mean squared error with an epidemiological consistency penalty:
\[\mathcal{L}(\phi) = \mathbb{E}\left[\|\hat{\theta}_\phi(Y) - \theta\|_2^2\right] + \lambda \left(\hat{R}_{0,\phi}(Y)\cdot\gamma - \hat{p}_{\text{tran},\phi}(Y)\cdot\hat{c}_{\text{rate},\phi}(Y)\right)^2\]
where \(\gamma\) is the recovery rate and \(\lambda = 1.77 \times 10^{-4}\) was tuned jointly with the other hyperparameters. This penalty encourages the predicted parameters to satisfy the theoretical identity \(R_0 \cdot \gamma = p_{\text{tran}} \cdot c_{\text{rate}}\), acting as a light regularizer that enforces epidemiological coherence without dominating the training signal.
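To make the two terms of the loss concrete, the following sketch evaluates it for a single sample in plain Python. It is an illustration of the formula only: the function name, the `(ptran, crate, R0)` tuple layout, and the numeric inputs are assumptions, and the package's PyTorch training code may organize the computation differently.

```python
def composite_loss(pred, target, recov, lam=1.77e-4):
    """Squared error plus the R0 consistency penalty for one sample.
    pred and target are (ptran, crate, R0) tuples (hypothetical layout)."""
    sq_error = sum((p - t) ** 2 for p, t in zip(pred, target))
    ptran_hat, crate_hat, r0_hat = pred
    # Penalize departures from the identity R0 * gamma = ptran * crate.
    penalty = (r0_hat * recov - ptran_hat * crate_hat) ** 2
    return sq_error + lam * penalty

target = (0.1, 4.0, 2.0)  # consistent: 2.0 * 0.2 == 0.1 * 4.0
loss_exact = composite_loss(pred=(0.1, 4.0, 2.0), target=target, recov=0.2)
loss_off = composite_loss(pred=(0.1, 4.0, 3.0), target=target, recov=0.2)
```

In the second call the squared error is 1.0 and the penalty is \((3.0 \cdot 0.2 - 0.1 \cdot 4.0)^2 = 0.04\), so with \(\lambda = 1.77 \times 10^{-4}\) the penalty contributes only about \(7 \times 10^{-6}\) to the total, which illustrates why it acts as a light regularizer.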
The network was trained entirely on synthetic SIR simulations
generated with epiworldR. For each simulation, parameters
were drawn from the following distributions:
Each simulation produced a 60-day daily incidence trajectory, which served as the sole sequence input to the network. Population size and recovery rate were included as additional scalar covariates, consistent with the assumption, standard in applied epidemic modeling, that these quantities are known or reliably estimated from external sources. Input features were normalized with MinMax scaling; the fitted scalers are bundled with the model weights and applied identically at inference time. The model was trained with batch size 64 for up to 100 epochs with early stopping based on validation loss.
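The reason the fitted scalers must travel with the weights can be shown with a small example. The class below is a minimal stand-in that reimplements the MinMax arithmetic in plain Python; the class name and the population sizes are hypothetical and are not taken from the package.

```python
class MinMaxScaler1D:
    """Minimal stand-in for a MinMax scaler on a single feature."""

    def fit(self, values):
        # Store the range seen during training.
        self.lo, self.hi = min(values), max(values)
        return self

    def transform(self, values):
        # Reuse the stored training range; do not refit on new data.
        span = self.hi - self.lo
        return [(v - self.lo) / span if span else 0.0 for v in values]

# Fit once on (hypothetical) training population sizes.
scaler = MinMaxScaler1D().fit([5_000, 10_000, 20_000])

# At inference time the same fitted scaler is applied to new inputs.
scaled_inside = scaler.transform([8_000])    # inside the training range
scaled_outside = scaler.transform([25_000])  # outside the training range
```

A value of 8,000 maps to 0.2 under the training range, while 25,000 maps above 1. Refitting a scaler on the inference data alone would map both to different numbers than the network saw during training, so the stored range has to be reused.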
The BiLSTM model requires PyTorch, NumPy, scikit-learn, and joblib —
Python libraries not natively available in R. A common barrier to
adopting Python-backed R packages is the complexity of environment
setup: users may need to install a specific Python version, create a
virtual environment, and resolve package conflicts with existing
installations. epiworldRcalibrate eliminates this barrier
through reticulate (Ushey, Allaire,
and Tang 2024), using the py_require() interface
introduced in reticulate >= 1.41, which is backed by
uv — a fast, self-contained Python package manager that
resolves and installs all required packages into a dedicated virtual
environment without requiring a system Python installation.
From the user’s perspective, setup requires a single function call, run once after installing the package:
epiworldRcalibrate::setup_python_deps(force = TRUE)
This function declares the required packages and a Python version
constraint (>=3.11,<3.12) to
reticulate::py_require(), initializes the managed
environment, and verifies that each package can be successfully
imported. In subsequent R sessions, setup_python_deps()
with force = FALSE (the default) performs only the
verification step. Python is never initialized automatically during
normal package use: calibration functions call an internal check that
raises an informative error if setup has not been completed, directing
the user to setup_python_deps(). The pretrained model
weights and fitted scalers are bundled under inst/models/,
so no network access is required after initial setup.
Pre-computed ABC results obtained via Likelihood-Free MCMC (LFMCMC)
(Marjoram et al. 2003) as implemented in
epiworldR are shipped as the
abc_calibration_params dataset. Each LFMCMC chain ran for
2,000 iterations, discarding the first 1,000 as burn-in, with parameters
proposed via multiplicative log-normal perturbations and discrepancy
measured by Euclidean distance between simulated and observed
trajectories with an adaptive tolerance \(\varepsilon = 0.05 \cdot
\|S_{\text{obs}}\|_2\). These pre-computed results allow users to
compare DeepIMC point estimates against full posterior distributions
without re-running the multi-hour procedure. The generating script is
available in data-raw/abc_calibration_results.R.
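Three ingredients of the procedure above are easy to state in code: the adaptive tolerance, the multiplicative log-normal proposal, and the burn-in split. The sketch below is illustrative only. The proposal scale `sigma` and the placeholder chain are assumptions, and the full LFMCMC acceptance rule implemented in epiworldR is deliberately not reproduced here.

```python
import math
import random

def l2_norm(values):
    """Euclidean norm of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values))

def adaptive_tolerance(observed, frac=0.05):
    """Tolerance scaled to the observed data: eps = frac * ||S_obs||_2."""
    return frac * l2_norm(observed)

def propose(theta, rng, sigma=0.1):
    """Multiplicative log-normal perturbation: each parameter is multiplied
    by exp(N(0, sigma^2)), so positive parameters stay positive.
    The scale sigma is an assumed value."""
    return [t * math.exp(rng.gauss(0.0, sigma)) for t in theta]

def within_tolerance(simulated, observed, eps):
    """True if the simulated trajectory is within eps of the observed one."""
    return l2_norm([s - o for s, o in zip(simulated, observed)]) <= eps

rng = random.Random(42)
eps = adaptive_tolerance([3.0, 4.0])     # toy "observed" vector with norm 5
proposal = propose([0.1, 4.0], rng)      # perturbed (ptran, crate)

# Placeholder chain of 2,000 iterations; the first 1,000 are burn-in.
chain = list(range(2000))
kept = chain[1000:]
```

Scaling the tolerance by the norm of the observed trajectory keeps the acceptance criterion comparable across epidemics of different sizes.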
Built on the epiworldR simulation framework (Meyer and Vega Yon 2023) and the
reticulate Python interface (Ushey,
Allaire, and Tang 2024), epiworldRcalibrate provides
the following features:
- calibrate_sir() accepts a 60-day incidence vector, population size, and recovery rate, and returns named estimates for ptran, crate, and \(R_0\) in under a minute with no simulation required at inference time.
- Python dependencies are managed through reticulate::py_require() with a single call to setup_python_deps(). No manual conda or virtualenv configuration is needed.
- The pretrained model weights and fitted scalers are bundled under inst/models/, so no external downloads are required after installation.
- Calibrated estimates can be passed to ModelSIRCONN() and related epiworldR functions for downstream stochastic simulation and uncertainty quantification via run_multiple().

| Feature | epiworldRcalibrate (DeepIMC) | ABC (abc package) |
|---|---|---|
| Calibration time | ~2 seconds | ~77 seconds (simple SIR) |
| Uncertainty quantification | Point estimate | Full posterior |
| Requires simulation at inference | No | Yes |
| Pretrained model included | Yes | No |
| Automatic Python management | Yes | No |
| epiworldR integration | Native | Manual |
This work was supported by cooperative agreement CDC-RFA-FT-23-0069 from the CDC’s Center for Forecasting and Outbreak Analytics. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the Centers for Disease Control and Prevention.