| Title: | Statistical Tools for Modelling Climate-Health Impacts |
| Version: | 1.0.0 |
| Date: | 2026-03-26 |
| Description: | Tools for producing climate-health indicators and supporting official statistics from health and climate data. Implements analytical workflows for temperature-related mortality, wildfire smoke exposure, air pollution, suicides related to extreme heat, malaria, and diarrhoeal disease outcomes, with utilities for descriptive statistics, model validation, attributable fraction and attributable number estimation, relative risk estimation, minimum mortality temperature estimation, and plotting for reporting. These six indicators are endorsed by the United Nations Statistical Commission for inclusion in the Global Set of Environment and Climate Change Statistics. Implemented methods include distributed lag non-linear models (DLNM), quasi-Poisson time-series regression, case-crossover analysis, Bayesian spatio-temporal models using the Integrated Nested Laplace Approximation ('INLA'), and multivariate meta-analysis for sub-national estimates. The package is based on methods developed in the Standards for Official Statistics on Climate-Health Interactions (SOSCHI) project https://climate-health.officialstatistics.org. For methodologies, see Watkins et al. (2025) <doi:10.5281/zenodo.14865904>, Brown et al. (2024) <doi:10.5281/zenodo.14052183>, Pearce et al. (2024) <doi:10.5281/zenodo.14050224>, Byukusenge et al. (2025) <doi:10.5281/zenodo.15585042>, Dzakpa et al. (2025) <doi:10.5281/zenodo.14881886>, and Dzakpa et al. (2025) <doi:10.5281/zenodo.14871506>. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Imports: | car, data.table, dlnm, dplyr, Epi, forcats, exactextractr, ggplot2, ggtext, gnm, graphics, grDevices, gplots, lifecycle, lme4, lubridate, metafor, mgcv, mixmeta, ncdf4, patchwork, pkgbuild, purrr, raster, RColorBrewer, readr, readxl, reshape2, rlang, scales, sf, spdep, splines, stats, stringr, tibble, tidyr, tools, tseries, tsModel (≥ 0.6-2), utils, xfun, zoo |
| VignetteBuilder: | knitr |
| Suggests: | covr, knitr, rmarkdown, devtools, DT, htmltools, INLA, mockery, mvmeta, openxlsx, patrick, pkgload, stringdist, terra, testthat (≥ 3.2.1.1), withr |
| URL: | https://climate-health.officialstatistics.org |
| Additional_repositories: | https://inla.r-inla-download.org/R/stable/ |
| Depends: | R (≥ 4.4.0) |
| Config/rcmdcheck/ignore-inconsequential-notes: | true |
| NeedsCompilation: | no |
| Packaged: | 2026-03-26 10:24:16 UTC; omekek |
| Author: | Charlie Browning [aut], Kenechi Omeke [aut, cre], Etse Yawo Dzakpa [aut], Gladin Jose [aut], Matt Pearce [aut], Ellie Watkins [aut], Claire Hunt [aut], Beatrice Byukusenge [aut], Cassien Habyarimana [aut], Venuste Nyagahakwa [aut], Felix Scarbrough [aut], Treesa Shaji [aut], Bonnie Lewis [aut], Maquines Odhiambo Sewe [aut], Vijendra Ingole [aut], Sean Lovell [ctb], Antony Brown [ctb], Euan Soutter [ctb], Gillian Flower [ctb], David Furley [ctb], Joe Panes [ctb], Charlotte Romaniuk [ctb], Milly Powell [ctb], Wellcome [fnd], Office for National Statistics [cph] (SOSCHI Project) |
| Maintainer: | Kenechi Omeke <climate.health@ons.gov.uk> |
| Repository: | CRAN |
| Date/Publication: | 2026-03-30 18:50:02 UTC |
climatehealth: Statistical Tools for Modelling Climate-Health Impacts
Description
Overview
This package provides a suite of analysis functions for measuring the relationship between various climate factors (indicators) and health outcomes.
Included Indicators
Mortality attributable to high and low outdoor temperatures
Mortality attributable to wildfire-related PM2.5
Suicides attributable to extreme heat
Mortality attributable to short-term exposure to outdoor PM2.5
Diarrhea cases attributable to extreme temperatures and rainfall
Malaria cases attributable to extreme temperatures and rainfall
License
MIT
The full range of topics includes
Temperature-related health effects
Health effects of wildfires
Mental Health
Health effects of air pollution
Water-borne diseases
Vector-borne diseases
Author(s)
Maintainer: Kenechi Omeke climate.health@ons.gov.uk
Authors:
Charlie Browning
Etse Yawo Dzakpa
Gladin Jose
Matt Pearce
Ellie Watkins
Claire Hunt
Beatrice Byukusenge
Cassien Habyarimana
Venuste Nyagahakwa
Felix Scarbrough
Treesa Shaji
Bonnie Lewis
Maquines Odhiambo Sewe
Vijendra Ingole
Other contributors:
Sean Lovell [contributor]
Antony Brown [contributor]
Euan Soutter [contributor]
Gillian Flower [contributor]
David Furley [contributor]
Joe Panes [contributor]
Charlotte Romaniuk [contributor]
Milly Powell [contributor]
Wellcome [funder]
Office for National Statistics (SOSCHI Project) [copyright holder]
See Also
Useful links:
English day of week names
Description
Provides consistent English day names regardless of system locale
Usage
.english_dow_names(day_numbers = NULL, short = FALSE)
Arguments
day_numbers |
Optional vector of day numbers (1-7, where 1=Sunday) |
short |
Logical. Return abbreviated names? Default FALSE. |
Value
Character vector of day names
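A minimal sketch of how locale-independent day names can be produced, assuming a fixed lookup table. The name english_dow_sketch is hypothetical, and returning all seven names when day_numbers is NULL is an assumed behaviour; this is not the package's actual implementation.

```r
# Hypothetical helper: look day names up in a fixed English table
# instead of asking the system locale via format(date, "%A").
english_dow_sketch <- function(day_numbers = NULL, short = FALSE) {
  full <- c("Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday")
  out <- if (short) substr(full, 1, 3) else full
  # Assumed behaviour: NULL means "return all seven names"
  if (is.null(day_numbers)) out else out[day_numbers]
}

english_dow_sketch(c(1, 7))          # "Sunday" "Saturday"
english_dow_sketch(2, short = TRUE)  # "Mon"
```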
English month names
Description
Provides consistent English month names regardless of system locale
Usage
.english_month_names(month_numbers = NULL, short = FALSE)
Arguments
month_numbers |
Optional vector of month numbers (1-12) to return |
short |
Logical. Return abbreviated names? Default FALSE. |
Value
Character vector of month names
Temporarily set English locale for date operations
Description
Temporarily sets the locale to English for date parsing and formatting
Usage
.with_english_locale(expr)
Arguments
expr |
Expression to evaluate with English locale |
Value
Result of the expression
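One common way to get this behaviour in base R is to switch LC_TIME for the duration of a call and restore it afterwards. The sketch below assumes the "C" locale (which yields English names) and uses a hypothetical helper name; the package may do this differently.

```r
# Hypothetical helper: switch LC_TIME to the "C" locale (English names),
# evaluate the expression, then restore the previous locale on exit.
with_c_time_locale <- function(expr) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  # `expr` is a lazily evaluated promise, so it is forced here,
  # after the locale has been switched.
  force(expr)
}

weekday <- with_c_time_locale(format(as.Date("2024-01-01"), "%A"))
weekday  # "Monday"
```

Using on.exit() means the original locale is restored even if the expression raises an error.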
Raise a typed error with structured metadata
Description
Creates a classed condition that can be caught and inspected by the API layer.
This is the base helper - prefer using specific helpers like
abort_column_not_found() or abort_validation() when applicable.
Usage
abort_climate(message, type = "generic_error", ..., call = rlang::caller_env())
Arguments
message |
Human-readable error message |
type |
Error type for classification. One of:
|
... |
Additional metadata to include in the error (e.g., column = "tmean") |
call |
The call to include in the error (defaults to caller's call) |
Value
Never returns; always raises an error.
Examples
# Basic usage
err <- tryCatch(
abort_climate("Something went wrong", "generic_error"),
error = identity
)
inherits(err, "climate_error")
# With metadata
err <- tryCatch(
abort_climate(
"Invalid lag value",
"validation_error",
param = "nlag",
value = -1,
expected = "non-negative integer"
),
error = identity
)
err$type
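For readers unfamiliar with classed conditions, the following base R sketch shows the general pattern of attaching a class and metadata to an error so that a caller can catch and inspect it. The constructor name and the "typed_error" class are hypothetical; this illustrates the mechanism only, not how abort_climate() is implemented.

```r
# Hypothetical constructor: build a condition object with extra classes
# and arbitrary metadata fields, then signal it with stop().
raise_typed_error <- function(message, type = "generic_error", ...) {
  cond <- structure(
    class = c(type, "typed_error", "error", "condition"),
    list(message = message, call = NULL, type = type, ...)
  )
  stop(cond)
}

err <- tryCatch(
  raise_typed_error("Invalid lag value", "validation_error",
                    param = "nlag", value = -1),
  validation_error = function(e) e  # handler is selected by class
)
err$param                      # metadata travels with the condition
inherits(err, "typed_error")
```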
Raise a column-not-found error with available columns
Description
Use this when a required column is missing from a dataset. Includes fuzzy matching to suggest the closest available column name.
Usage
abort_column_not_found(
column,
available,
dataset_name = "dataset",
call = rlang::caller_env()
)
Arguments
column |
The column name that was not found |
available |
Character vector of available column names |
dataset_name |
Optional name of the dataset for clearer messages |
call |
The call to include in the error |
Value
Never returns; always raises an error.
Examples
data <- data.frame(temp = 1)
if (!("tmean" %in% colnames(data))) {
err <- tryCatch(
abort_column_not_found("tmean", colnames(data)),
error = identity
)
err$suggestion
}
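The fuzzy matching mentioned in the description can be sketched with edit distances. The example below assumes Levenshtein distance via utils::adist() and a hypothetical helper name; the package's own matching rule is not shown in this manual and may differ.

```r
# Hypothetical helper: pick the available name with the smallest
# Levenshtein (edit) distance to the requested column.
suggest_column <- function(column, available) {
  d <- utils::adist(column, available, ignore.case = TRUE)
  available[which.min(d)]
}

suggestion <- suggest_column("tmean", c("tmean_c", "deaths", "date"))
suggestion  # "tmean_c"
```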
Raise a model error (statistical/computational failures)
Description
Use this when statistical models fail to converge, produce singular matrices, or encounter other computational issues that aren't due to obvious user error.
Usage
abort_model_error(
message,
model_type = "unknown",
...,
call = rlang::caller_env()
)
Arguments
message |
Human-readable error message |
model_type |
Type of model that failed (e.g., "dlnm", "glm", "meta-analysis") |
... |
Additional diagnostic metadata |
call |
The call to include in the error |
Value
Never returns; always raises an error.
Examples
tryCatch({
stop("convergence failed")
}, error = function(e) {
err <- tryCatch(
abort_model_error(
"Model failed to converge",
model_type = "dlnm",
original_error = conditionMessage(e)
),
error = identity
)
inherits(err, "model_error")
})
Raise a validation error (data/parameter issues)
Description
Use this for general validation failures where the user's input or data
doesn't meet requirements. For missing columns specifically, use
abort_column_not_found().
Usage
abort_validation(message, ..., call = rlang::caller_env())
Arguments
message |
Human-readable error message |
... |
Additional metadata (e.g., param = "nlag", value = -1) |
call |
The call to include in the error |
Value
Never returns; always raises an error.
Examples
# Parameter validation
nlag <- -1
if (nlag < 0) {
err <- tryCatch(
abort_validation(
"nlag must be >= 0",
param = "nlag",
value = nlag,
expected = "non-negative integer"
),
error = identity
)
inherits(err, "validation_error")
}
Aggregate air pollution results by month
Description
Aggregates daily analysis results to monthly summaries
Usage
aggregate_air_pollution_by_month(
analysis_results,
max_lag = 14L,
include_national = TRUE
)
Arguments
analysis_results |
Results from analyze_air_pollution_daily |
max_lag |
Integer. Maximum lag used in analysis. Defaults to 14. |
include_national |
Logical. Whether to include national results. Default TRUE. |
Value
Dataframe with monthly aggregates
Aggregate air pollution results by region
Description
Aggregates daily analysis results to regional summaries
Usage
aggregate_air_pollution_by_region(analysis_results, max_lag = 14L)
Arguments
analysis_results |
Results from analyze_air_pollution_daily |
max_lag |
Integer. Maximum lag used in analysis. Defaults to 14. |
Value
Dataframe with regional aggregates
Aggregate air pollution results by year
Description
Aggregates daily analysis results to annual summaries
Usage
aggregate_air_pollution_by_year(
analysis_results,
max_lag = 14L,
include_national = TRUE
)
Arguments
analysis_results |
Results from analyze_air_pollution_daily |
max_lag |
Integer. Maximum lag used in analysis. Defaults to 14. |
include_national |
Logical. Whether to include national results. Default TRUE. |
Value
Dataframe with annual aggregates
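The three aggregation functions above share the same idea: daily results are grouped and summarised. A minimal base R sketch follows, assuming hypothetical column names (region, date, an) and assuming that attributable numbers are summed; the structure of the real analysis_results object is not documented here.

```r
# Hypothetical daily results: one row per region and day with an
# attributable number (an) already calculated.
daily <- data.frame(
  region = rep(c("North", "South"), each = 60),
  date   = rep(seq.Date(as.Date("2020-01-01"), by = "day", length.out = 60), 2),
  an     = rep(c(1, 2), each = 60)
)
daily$year  <- as.integer(format(daily$date, "%Y"))
daily$month <- as.integer(format(daily$date, "%m"))

# Sum daily attributable numbers within each region, year and month
monthly <- aggregate(an ~ region + year + month, data = daily, FUN = sum)
monthly
```

Dropping month from the formula gives annual totals, and dropping both year and month gives regional totals.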
Split dataframe into multiple dataframes, based on a column's value.
Description
Split dataframe into multiple dataframes, based on a column's value.
Usage
aggregate_by_column(df, column_name)
Arguments
df |
The dataframe to aggregate. |
column_name |
The column to aggregate the data by. |
Value
A list of dataframes, split up based on the value of column_name.
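The same kind of result can be obtained in base R with split(); the sketch below uses a made-up data frame and is shown only to illustrate the shape of the returned list.

```r
df <- data.frame(region = c("A", "B", "A"), deaths = c(3, 5, 4))

# split() returns a named list with one data frame per distinct value
by_region <- split(df, df[["region"]])
names(by_region)
nrow(by_region[["A"]])
```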
Descriptive statistics
Description
Generates summary statistics for climate, environmental and health data
Usage
air_pollution_descriptive_stats(
data,
env_labels = c(pm25 = "PM2.5 (µg/m³)", tmax = "Max Temperature (°C)", precipitation
= "Precipitation (mm)", humidity = "Humidity (%)", wind_speed = "Wind Speed (m/s)"),
save_outputs = FALSE,
output_dir = NULL,
moving_average_window = 3L,
plot_corr_matrix = FALSE,
correlation_method = "pearson",
plot_dist = FALSE,
plot_na_counts = FALSE,
plot_scatter = FALSE,
plot_box = FALSE,
plot_seasonal = FALSE,
plot_regional = FALSE,
plot_total = FALSE,
detect_outliers = FALSE,
calculate_rate = FALSE
)
Arguments
data |
Dataframe containing a daily time series of climate, environmental and health data |
env_labels |
Named vector. Labels for environmental variables with units. |
save_outputs |
Logical. Whether to save outputs. Defaults to FALSE. |
output_dir |
Character. Directory to save descriptive statistics. Defaults to NULL. |
moving_average_window |
Numeric. Window size for moving average calculations. Defaults to 3 (3-day moving average). |
plot_corr_matrix |
Logical. Whether to plot correlation matrix. Defaults to FALSE. |
correlation_method |
Character. Correlation method. One of 'pearson', 'spearman', 'kendall'. |
plot_dist |
Logical. Whether to plot distribution histograms. Defaults to FALSE. |
plot_na_counts |
Logical. Whether to plot NA counts. Defaults to FALSE. |
plot_scatter |
Logical. Whether to plot scatter plots. Defaults to FALSE. |
plot_box |
Logical. Whether to plot box plots. Defaults to FALSE. |
plot_seasonal |
Logical. Whether to plot seasonal trends. Defaults to FALSE. |
plot_regional |
Logical. Whether to plot regional trends. Defaults to FALSE. |
plot_total |
Logical. Whether to plot total health outcomes per year. Defaults to FALSE. |
detect_outliers |
Logical. Whether to detect outliers. Defaults to FALSE. |
calculate_rate |
Logical. Whether to calculate rate per 100k people. Defaults to FALSE. |
Value
Invisibly returns the national data with moving averages
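To make the moving_average_window argument concrete, here is a small base R sketch of a 3-day moving average. It assumes a trailing window (current day plus the two previous days); whether the package uses a trailing or a centred window is not stated in this manual.

```r
pm25 <- c(3, 6, 9, 12, 15)
window <- 3L

# sides = 1 uses the current and previous values only (trailing window);
# the first window - 1 entries are NA because the window is incomplete.
ma <- as.numeric(stats::filter(pm25, rep(1 / window, window), sides = 1))
ma
```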
Comprehensive Air Pollution Analysis Pipeline
Description
Master function that runs the complete air pollution analysis including data loading, preprocessing (including lags), modeling, plotting, attribution calculations vs reference standards, power analysis and descriptive statistics
Usage
air_pollution_do_analysis(
data_path,
date_col = "date",
region_col = "region",
pm25_col = "pm25",
deaths_col = "deaths",
population_col = "population",
humidity_col = "humidity",
precipitation_col = "precipitation",
tmax_col = "tmax",
wind_speed_col = "wind_speed",
categorical_others = NULL,
continuous_others = NULL,
Categorical_Others = NULL,
Continuous_Others = NULL,
max_lag = 14L,
df_seasonal = 6,
family = "quasipoisson",
reference_standards = list(list(value = 15, name = "WHO")),
output_dir = "air_pollution_results",
save_outputs = TRUE,
run_descriptive = TRUE,
run_power = TRUE,
moving_average_window = 3L,
include_national = TRUE,
years_filter = NULL,
regions_filter = NULL,
attr_thr = 95,
plot_corr_matrix = TRUE,
correlation_method = "pearson",
plot_dist = TRUE,
plot_na_counts = TRUE,
plot_scatter = TRUE,
plot_box = TRUE,
plot_seasonal = TRUE,
plot_regional = TRUE,
plot_total = TRUE,
detect_outliers = TRUE,
calculate_rate = FALSE
)
Arguments
data_path |
Character. Path to CSV data file |
date_col |
Character. Name of date column |
region_col |
Character. Name of region column |
pm25_col |
Character. Name of PM2.5 column |
deaths_col |
Character. Name of deaths column |
population_col |
Character. Name of the population column. |
humidity_col |
Character. Name of humidity column |
precipitation_col |
Character. Name of precipitation column |
tmax_col |
Character. Name of maximum temperature column |
wind_speed_col |
Character. Name of wind speed column |
categorical_others |
Optional character vector. Names of additional categorical variables. |
continuous_others |
Optional character vector. Names of additional continuous variables (e.g., "tmean") |
Categorical_Others |
Deprecated alias for |
Continuous_Others |
Deprecated alias for |
max_lag |
Integer. Maximum lag days. Defaults to 14. |
df_seasonal |
Integer. Degrees of freedom for seasonal spline. Default 6. |
family |
Character. Probability distribution for the outcome variable. Options include "quasipoisson". Defaults to "quasipoisson". |
reference_standards |
List of reference standards, each a list with a "value" (the PM2.5 value) and a "name" (the name of the standard, e.g. "WHO") |
output_dir |
Directory to save outputs |
save_outputs |
Logical. Whether to save outputs |
run_descriptive |
Logical. Whether to run descriptive statistics |
run_power |
Logical. Whether to run power analysis |
moving_average_window |
Integer. Window for moving average in descriptive stats |
include_national |
Logical. Whether to include national results in plots. Default TRUE. |
years_filter |
Optional numeric vector of years to include (e.g., c(2020, 2021, 2022)). Filtering to at least 3 consecutive years is recommended so that the time series is long enough to analyse. |
regions_filter |
Optional character vector of regions to include |
attr_thr |
Numeric (0-100). Percentile threshold used in power analysis to evaluate attribution detectability. Default 95. |
plot_corr_matrix |
Logical. Plot correlation matrix. Default TRUE. |
correlation_method |
Character. Correlation method for the correlation matrix (e.g., "pearson", "spearman"). Default "pearson". |
plot_dist |
Logical. Plot distributions (hist/density) for key variables. Default TRUE. |
plot_na_counts |
Logical. Plot missingness/NA counts. Default TRUE. |
plot_scatter |
Logical. Plot scatter plots for key pairs. Default TRUE. |
plot_box |
Logical. Plot boxplots by region/season where applicable. Default TRUE. |
plot_seasonal |
Logical. Plot seasonal summaries. Default TRUE. |
plot_regional |
Logical. Plot regional summaries. Default TRUE. |
plot_total |
Logical. Plot overall totals where relevant. Default TRUE. |
detect_outliers |
Logical. Flag potential outliers in descriptive workflow. Default TRUE. |
calculate_rate |
Logical. Whether to calculate rate variables during descriptive stats (e.g., deaths per population). Default FALSE |
Value
List containing:
- data
Processed data with lag variables
- meta_analysis
Meta-analysis results with AF/AN calculations
- lag_analysis
Lag-specific analysis results
- distributed_lag_analysis
Distributed lag model results (if requested)
- plots
List of generated plots (forest, lags, distributed lags)
- power_list
A list containing power information by area
- exposure_response_plots
Exposure-response plots for each reference standard (if requested)
- reference_specific_af_an
AF/AN calculations specific to each reference standard (if requested)
- descriptive_stats
Summary statistics of key variables
Examples
example_data <- data.frame(
date = seq.Date(as.Date("2020-01-01"), by = "day", length.out = 180),
province = "Example Province",
pm25 = stats::runif(180, 8, 35),
deaths = stats::rpois(180, lambda = 5),
population = 500000,
humidity = stats::runif(180, 40, 90),
precipitation = stats::runif(180, 0, 20),
tmax = stats::runif(180, 18, 35),
wind_speed = stats::runif(180, 1, 8)
)
example_path <- tempfile(fileext = ".csv")
utils::write.csv(example_data, example_path, row.names = FALSE)
results <- air_pollution_do_analysis(
data_path = example_path,
date_col = "date",
region_col = "province",
pm25_col = "pm25",
deaths_col = "deaths",
population_col = "population",
humidity_col = "humidity",
precipitation_col = "precipitation",
tmax_col = "tmax",
wind_speed_col = "wind_speed",
continuous_others = NULL,
max_lag = 7L,
df_seasonal = 4,
family = "quasipoisson",
reference_standards = list(list(value = 15, name = "WHO")),
years_filter = NULL,
regions_filter = NULL,
include_national = FALSE,
output_dir = tempdir(),
save_outputs = FALSE,
run_descriptive = FALSE,
run_power = FALSE,
moving_average_window = 3L,
attr_thr = 95,
plot_corr_matrix = FALSE,
correlation_method = "pearson",
plot_dist = FALSE,
plot_na_counts = FALSE,
plot_scatter = FALSE,
plot_box = FALSE,
plot_seasonal = FALSE,
plot_regional = FALSE,
plot_total = FALSE,
detect_outliers = FALSE,
calculate_rate = FALSE
)
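The preprocessing step mentioned in the description includes building lagged exposure variables. A minimal sketch of that idea in base R is given below; the helper names and the "_lagK" column naming are hypothetical, and the data are assumed to be sorted by date within each region.

```r
# Hypothetical helper: shift a vector back by k days, padding with NA.
lag_vec <- function(x, k) c(rep(NA, k), x)[seq_along(x)]

add_lags <- function(data, var, max_lag, group) {
  for (k in 0:max_lag) {
    # ave() applies the shift within each group so lags never cross regions
    data[[paste0(var, "_lag", k)]] <- ave(
      data[[var]], data[[group]], FUN = function(x) lag_vec(x, k)
    )
  }
  data
}

toy <- data.frame(region = rep(c("A", "B"), each = 4), pm25 = 1:8)
lagged <- add_lags(toy, "pm25", max_lag = 2L, group = "region")
lagged$pm25_lag1
```

The first k rows of each region are NA at lag k, which is why longer lags reduce the number of usable days.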
Perform meta analysis with multiple lag structures
Description
Fits a distributed lag model, extracts the individual lag coefficients and cumulative effects, and performs a meta-analysis on them
Usage
air_pollution_meta_analysis(
data_with_lags,
max_lag = 14L,
df_seasonal = 6L,
family = "quasipoisson"
)
Arguments
data_with_lags |
Lagged data |
max_lag |
Integer. Maximum lag days. Defaults to 14 |
df_seasonal |
Integer. Degrees of freedom for seasonal spline. Default 6. |
family |
Character string indicating the distribution family used in the GAM. |
Value
Dataframe with lag-specific results, including regional and national estimates
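As a rough illustration of how region-specific estimates can be combined into a national one, the sketch below pools log relative risks with fixed-effect inverse-variance weights. This is a deliberately simplified stand-in with made-up numbers; the package description refers to multivariate meta-analysis, which this sketch does not reproduce.

```r
# Hypothetical region-specific estimates: log relative risks and
# their standard errors for a single lag.
log_rr <- c(North = 0.010, South = 0.020, East = 0.015)
se     <- c(0.005, 0.005, 0.010)

w         <- 1 / se^2                   # inverse-variance weights
pooled    <- sum(w * log_rr) / sum(w)   # weighted mean of the log RRs
pooled_se <- sqrt(1 / sum(w))

# Back-transform to the relative-risk scale with a 95% interval
pooled_rr <- exp(pooled + c(estimate = 0, lower = -1.96, upper = 1.96) * pooled_se)
pooled_rr
```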
Air Pollution Power Calculation using Meta Results
Description
Produce a power statistic by region for PM2.5 attributable mortality using meta-analysis results
Usage
air_pollution_power_list(
meta_results,
data_with_lags,
ref_pm25 = 15,
attr_thr = 95,
include_national = TRUE
)
Arguments
meta_results |
Meta-analysis results from air_pollution_meta_analysis |
data_with_lags |
Lagged data frame |
ref_pm25 |
Numeric. Reference PM2.5 value for attributable risk calculation |
attr_thr |
Integer. Percentile at which to define the PM2.5 threshold for calculating attributable risk. Defaults to 95. |
include_national |
Logical. Whether to include national level calculations. Defaults to TRUE. |
Value
A list containing power information by region
FUNCTION FOR COMPUTING ATTRIBUTABLE MEASURES FROM DLNM
Description
Calculates attributable numbers and fractions. Derived from code (c) Antonio Gasparrini 2015-2017, with modifications to produce daily values with confidence intervals.
Usage
an_attrdl(
x,
basis,
cases,
coef = NULL,
vcov = NULL,
model.link = NULL,
dir = "back",
tot = TRUE,
cen,
range = NULL,
nsim = 5000
)
Arguments
x |
AN EXPOSURE VECTOR OR (ONLY FOR dir="back") A MATRIX OF LAGGED EXPOSURES |
basis |
THE CROSS-BASIS COMPUTED FROM x |
cases |
THE CASES VECTOR OR (ONLY FOR dir="forw") THE MATRIX OF FUTURE CASES |
coef |
COEF FOR basis IF model IS NOT PROVIDED |
vcov |
VCOV FOR basis IF model IS NOT PROVIDED |
model.link |
LINK FUNCTION IF model IS NOT PROVIDED |
dir |
EITHER "back" OR "forw" FOR BACKWARD OR FORWARD PERSPECTIVES |
tot |
IF TRUE, THE TOTAL ATTRIBUTABLE RISK IS COMPUTED |
cen |
THE REFERENCE VALUE USED AS COUNTERFACTUAL SCENARIO |
range |
THE RANGE OF EXPOSURE. IF NULL, THE WHOLE RANGE IS USED |
nsim |
NUMBER OF SIMULATION SAMPLES |
Value
Attributable Fraction
Attributable Fraction lower confidence intervals
Attributable Fraction upper confidence intervals
Attributable Numbers
Attributable Numbers lower confidence intervals
Attributable Numbers upper confidence intervals
Simulation matrix of attributable numbers
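The role of nsim can be illustrated with a much simpler setting than a cross-basis. The sketch below assumes a single linear log-RR coefficient with a normal sampling distribution and made-up inputs; the real function works with a full coefficient vector and covariance matrix, so this shows the simulation idea only.

```r
set.seed(42)
# Hypothetical inputs: a log-RR per unit of exposure with its standard
# error, one day's exposure, the counterfactual value and the case count.
beta <- 0.002; beta_se <- 0.0005
x <- 40; cen <- 15; cases <- 120
nsim <- 5000

af_fun <- function(b) 1 - exp(-b * (x - cen))  # attributable fraction
af <- af_fun(beta)
an <- af * cases                               # attributable number

# Monte Carlo interval: resample the coefficient from its approximate
# normal sampling distribution and recompute AF for each draw.
af_sim <- af_fun(rnorm(nsim, mean = beta, sd = beta_se))
af_ci  <- quantile(af_sim, c(0.025, 0.975))
an_ci  <- af_ci * cases
```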
Calculate daily RR/AF/AN/AR for region-specific/national distributed lag effects for a chosen PM2.5 reference.
Description
Calculate daily RR/AF/AN/AR for region-specific/national distributed lag effects for a chosen PM2.5 reference.
Usage
analyze_air_pollution_daily(
data_with_lags,
meta_results,
ref_pm25 = 15,
ref_name = "WHO",
max_lag = 14L
)
Arguments
data_with_lags |
Dataset. Lagged data with lag variables. |
meta_results |
Dataset. Results from meta analysis. |
ref_pm25 |
Numeric. PM2.5 reference value. Defaults to 15. |
ref_name |
Character. Reference body name. Defaults to "WHO". |
max_lag |
Integer. Maximum lag days. Defaults to 14. |
Value
List with region-specific/national results for daily RR/AF/AN/AR
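How a reference value such as ref_pm25 enters the calculation can be sketched as follows. The example assumes a log-linear exposure-response with a made-up coefficient and assumes that days at or below the reference carry no attributable burden; the package's distributed lag calculation is more involved.

```r
# Hypothetical inputs: daily PM2.5, daily deaths and a log-RR per unit
# of PM2.5 (in practice this would come from the fitted model).
pm25     <- c(10, 15, 25, 40)
deaths   <- c(50, 52, 55, 60)
beta     <- 0.001
ref_pm25 <- 15

excess <- pmax(pm25 - ref_pm25, 0)  # exposure above the reference only
rr <- exp(beta * excess)            # daily relative risk vs the reference
af <- (rr - 1) / rr                 # attributable fraction
an <- af * deaths                   # attributable number of deaths
data.frame(pm25, rr, af, an)
```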
FUNCTION FOR COMPUTING ATTRIBUTABLE MEASURES FROM DLNM
Description
Calculates attributable numbers and fractions. Derived from code (c) Antonio Gasparrini 2015-2017.
Usage
attrdl(
x,
basis,
cases,
model = NULL,
coef = NULL,
vcov = NULL,
model.link = NULL,
type = "af",
dir = "back",
tot = TRUE,
cen,
range = NULL,
sim = FALSE,
nsim = 5000
)
Arguments
x |
AN EXPOSURE VECTOR OR (ONLY FOR dir="back") A MATRIX OF LAGGED EXPOSURES |
basis |
THE CROSS-BASIS COMPUTED FROM x |
cases |
THE CASES VECTOR OR (ONLY FOR dir="forw") THE MATRIX OF FUTURE CASES |
model |
THE FITTED MODEL |
coef |
COEF FOR basis IF model IS NOT PROVIDED |
vcov |
VCOV FOR basis IF model IS NOT PROVIDED |
model.link |
LINK FUNCTION IF model IS NOT PROVIDED |
type |
EITHER "an" OR "af" FOR ATTRIBUTABLE NUMBER OR FRACTION |
dir |
EITHER "back" OR "forw" FOR BACKWARD OR FORWARD PERSPECTIVES |
tot |
IF TRUE, THE TOTAL ATTRIBUTABLE RISK IS COMPUTED |
cen |
THE REFERENCE VALUE USED AS COUNTERFACTUAL SCENARIO |
range |
THE RANGE OF EXPOSURE. IF NULL, THE WHOLE RANGE IS USED |
sim |
IF SIMULATION SAMPLES SHOULD BE RETURNED. ONLY FOR tot=TRUE |
nsim |
NUMBER OF SIMULATION SAMPLES |
Value
Attributable Numbers and Fractions
Calculate Attributable Metrics for Climate-Health Associations.
Description
Computes the attributable number, fraction, and rate of cases associated with specific exposure variables (e.g., temperature or rainfall) using fitted INLA models. The function estimates these metrics at the desired spatial aggregation level (country, region, or district) and optionally disaggregates by month or year.
Usage
attribution_calculation(
data,
param_term,
model,
level,
param_threshold = 1,
max_lag,
nk,
filter_year = NULL,
group_by_year = FALSE,
case_type,
output_dir = NULL,
save_csv = FALSE
)
Arguments
data |
A data frame or list returned by the |
param_term |
Character. The exposure variable term to evaluate (e.g., |
model |
The fitted INLA model object returned by the |
level |
Character. The spatial disaggregation level. Can take one of
the following values: |
param_threshold |
Numeric. Threshold above which relative risks (RR) are
considered attributable. Defaults to |
filter_year |
Integer. The year to filter the data to. Defaults to NULL. |
group_by_year |
Logical. Whether to aggregate results by year ( |
case_type |
Character. The type of disease that the case column refers
to. Must be one of |
output_dir |
Optional. Directory path to save the output metrics if
|
save_csv |
Logical. Whether to save the generated attribution metrics to file.
Default is |
Value
A tibble containing the following columns:
- Grouping variables depending on the level and group_by_year settings.
- MRT: Minimum risk temperature (or equivalent reference exposure).
- AR_Number, AR_Number_LCI, AR_Number_UCI: Estimated, lower, and upper bounds of the attributable number of cases.
- AR_Fraction, AR_Fraction_LCI, AR_Fraction_UCI: Estimated, lower, and upper bounds of the attributable fraction (%).
- AR_per_100k, AR_per_100k_LCI, AR_per_100k_UCI: Estimated, lower, and upper bounds of the attributable rate per 100,000 population.
Generate a grid size for a certain number of plots.
Description
Generate a grid size for a certain number of plots.
Usage
calculate_air_pollution_grid_dims(n_plots)
Arguments
n_plots |
The number of plots required for the grid. |
Value
A list containing ncol and nrow values for the grid.
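One plausible rule for choosing such a grid is a near-square layout. The sketch below is an assumption about the approach, with a hypothetical function name, and may not match the dimensions the package returns.

```r
# Hypothetical rule: use a near-square layout with enough cells
grid_dims_sketch <- function(n_plots) {
  ncol <- ceiling(sqrt(n_plots))
  nrow <- ceiling(n_plots / ncol)
  list(ncol = ncol, nrow = nrow)
}

dims <- grid_dims_sketch(5)
dims  # 3 columns by 2 rows, leaving one cell empty
```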
Calculate attributable numbers and fraction of a given health outcome.
Description
Takes a calculated RR and upper and lower CIs, and applies these to the input data to calculate attributable fraction and attributable number, along with upper and lower CIs, for each day in the input data. Uses Lag 1 RR and lower/upper CIs.
Usage
calculate_daily_AF_AN(data, rr_data)
Arguments
data |
Dataframe containing a daily time series of climate and health data that was used to obtain rr_data. |
rr_data |
Dataframe containing relative risk and confidence intervals, calculated from input data. |
Value
A dataframe containing a daily time series of AF and AN, including upper and lower confidence intervals.
QAIC calculation
Description
Computes the Quasi-Akaike Information Criterion (QAIC) for models, enabling model comparison
Usage
calculate_qaic(
data,
save_csv = FALSE,
output_folder_path = NULL,
print_results = FALSE
)
Arguments
data |
Dataframe containing a daily time series of climate and health data from which to fit models. |
save_csv |
Bool. Whether or not to save the QAIC results to a CSV. |
output_folder_path |
String. Where to save the CSV file to (if save_csv == TRUE). |
print_results |
Logical. Whether or not to print model summaries and Pearson dispersion statistics. Defaults to FALSE. |
Value
Dataframe containing QAIC results for each lag.
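For orientation, one widely used form of QAIC for quasi-Poisson models is sketched below on simulated data: the Poisson log-likelihood is evaluated at the fitted means and the penalty is scaled by the estimated dispersion. Other forms exist (for example dividing the log-likelihood by the dispersion instead), and this manual does not state which one calculate_qaic() uses, so treat the formula and the function name as assumptions.

```r
set.seed(1)
n <- 200
d <- data.frame(pm25 = runif(n, 5, 40))
d$deaths <- rpois(n, lambda = exp(1.5 + 0.01 * d$pm25))

fit <- glm(deaths ~ pm25, family = quasipoisson, data = d)

qaic_sketch <- function(model) {
  # Quasi-likelihood models have no true likelihood, so the Poisson
  # log-likelihood is evaluated at the fitted means instead.
  loglik <- sum(dpois(model$y, model$fitted.values, log = TRUE))
  phi <- summary(model)$dispersion  # Pearson-based dispersion estimate
  k   <- length(coef(model))        # number of estimated coefficients
  -2 * loglik + 2 * phi * k
}

qaic <- qaic_sketch(fit)
qaic
```

Lower values indicate a better trade-off between fit and complexity when comparing models fitted to the same data.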
Passes data to casecrossover_quasipoisson to calculate RR.
Description
Splits data by region if calc_relative_risk_by_region is TRUE, passing the data for each region to casecrossover_quasipoisson to calculate RR by region. If FALSE, RR is calculated for the entire dataset.
Usage
calculate_wildfire_rr_by_region(
data,
scale_factor_wildfire_pm,
calc_relative_risk_by_region = FALSE,
save_fig = FALSE,
output_folder_path = NULL,
print_model_summaries = FALSE
)
Arguments
data |
Dataframe containing a daily time series of climate and health data from which to fit models. |
scale_factor_wildfire_pm |
Numeric. The value to divide the wildfire PM2.5 concentration variables by for alternative interpretation of outputs. Corresponds to the unit increase in wildfire PM2.5 to give the model estimates and relative risks (e.g. scale_factor = 10 corresponds to estimates and relative risks representing impacts of a 10 unit increase in wildfire PM2.5). Setting this parameter to 0 or 1 leaves the variable unscaled. |
calc_relative_risk_by_region |
Bool. Whether to calculate Relative Risk by region. Defaults to FALSE. |
save_fig |
Bool. Whether or not to save a figure showing residuals vs fitted values for each lag. Defaults to FALSE. |
output_folder_path |
String. Where to save the figure. Defaults to NULL. |
print_model_summaries |
Bool. Whether to print the model summaries to console. Defaults to FALSE. |
Value
Dataframe of relative risk and confidence intervals for each lag of wildfire-related PM2.5. Split by region if calc_relative_risk_by_region set to TRUE.
Fit quasipoisson regression models for different lags using a time-stratified case-crossover approach.
Description
Fits quasipoisson regression models using gnm
Usage
casecrossover_quasipoisson(
data,
scale_factor_wildfire_pm = 10,
wildfire_lag,
save_fig = TRUE,
output_folder_path = NULL,
print_model_summaries = TRUE
)
Arguments
data |
Dataframe containing a daily time series of climate and health data from which to fit models. |
scale_factor_wildfire_pm |
Numeric. The value to divide the wildfire PM2.5 concentration variables by for alternative interpretation of outputs. Corresponds to the unit increase in wildfire PM2.5 to give the model estimates and relative risks (e.g. scale_factor = 10 corresponds to estimates and relative risks representing impacts of a 10 unit increase in wildfire PM2.5). Setting this parameter to 0 or 1 leaves the variable unscaled. |
save_fig |
Bool. Whether or not to save a figure showing residuals vs fitted values for each lag. Defaults to TRUE. |
output_folder_path |
String. Where to save the figure. Defaults to NULL. |
print_model_summaries |
Bool. Whether to print the model summaries to console. Defaults to TRUE. |
Value
Dataframe of relative risk and confidence intervals for each lag of wildfire-related PM2.5
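The time-stratified case-crossover idea can be sketched on simulated data as below. Strata are defined by year, month and day of week, and the exposure is divided by the scale factor so the relative risk refers to a 10-unit increase. The sketch swaps in plain glm() with stratum indicators to stay dependency-free, whereas the function is described as using gnm; column names and data are made up, and only lag 0 is shown.

```r
set.seed(123)
dates <- seq.Date(as.Date("2021-01-01"), as.Date("2022-12-31"), by = "day")
d <- data.frame(date = dates,
                wf_pm25 = rgamma(length(dates), shape = 2, scale = 5))
d$deaths <- rpois(nrow(d), lambda = exp(3 + 0.002 * d$wf_pm25))

# Time-stratified design: days are compared only with other days that
# share the same year, month and day of week ("%w" is a locale-free number).
d$stratum <- factor(format(d$date, "%Y-%m-%w"))

scale_factor <- 10
d$wf_scaled <- d$wf_pm25 / scale_factor  # RR then refers to a 10-unit increase

# Stratum indicators enter as fixed effects in a quasi-Poisson model
fit <- glm(deaths ~ wf_scaled + stratum, family = quasipoisson, data = d)

est <- summary(fit)$coefficients["wf_scaled", ]
rr  <- exp(est[["Estimate"]] +
             c(rr = 0, lower = -1.96, upper = 1.96) * est[["Std. Error"]])
rr
```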
Check multicollinearity using VIF and write the results to file
Description
This function runs check_diseases_vif(), reshapes the result into a tabular
data frame, and optionally writes the table to VIF_results.csv.
Usage
check_and_write_vif(data, param_term, inla_param, case_type, output_dir = NULL)
Arguments
data |
A data frame containing the disease outcome column, |
param_term |
Character vector of exposure variable term(s) to include in the VIF assessment. |
inla_param |
Character vector of additional model covariates to include in the VIF assessment. |
case_type |
Character. The type of disease that the case column refers
to. Must be one of |
output_dir |
Character. The output directory to save the VIF results to.
Results are saved as |
Value
A data frame with columns variable, VIF, and interpretation.
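check_diseases_vif() is described as using a correlation-matrix-based VIF calculation. A minimal sketch of that idea in base R, using simulated variables: the VIF of each variable equals the corresponding diagonal element of the inverse correlation matrix. The variable names are made up and the package's interpretation thresholds are not reproduced.

```r
set.seed(1)
x1 <- rnorm(200)
x2 <- x1 + rnorm(200, sd = 0.1)  # nearly collinear with x1
x3 <- rnorm(200)                 # independent of the others
X  <- cbind(x1, x2, x3)

# The VIFs are the diagonal of the inverse correlation matrix
vif <- diag(solve(cor(X)))
vif
```

Here x1 and x2 get very large values because each is almost a copy of the other, while x3 stays close to 1.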
Check multicollinearity using VIF on model variables
Description
This function checks multicollinearity across the disease outcome, the exposure term(s) of interest, and the additional INLA covariates using a correlation-matrix-based variance inflation factor calculation.
Usage
check_diseases_vif(data, param_term, inla_param, case_type)
Arguments
data |
A data frame containing the disease outcome column, |
param_term |
Character vector of exposure variable term(s) to include in the VIF assessment. |
inla_param |
Character vector of additional model covariates to include in the VIF assessment. |
case_type |
Character. The type of disease that the case column refers
to. Must be one of |
Value
A list with:
- variables
Character vector of variables used in the VIF calculation.
- vif
Numeric vector of VIF values aligned to variables.
- vif_interpretation
Character vector of qualitative VIF interpretations ("Low", "Moderate", "High", or "Not computed").
Check if a dataframe is empty.
Description
Checks if a dataframe is empty, and raises an error if it is.
Usage
check_empty_dataframe(df)
Arguments
df |
Dataframe. The dataframe to check. |
Value
NULL. No return if the dataframe is not empty.
Check that a file exists at a passed path.
Description
Checks the files on disk to assert that the passed file is present.
Usage
check_file_exists(fpath, raise = TRUE)
Arguments
fpath |
The filepath to check exists. |
raise |
Whether or not to raise an error if the file does not exist, Default: TRUE |
Value
'exists'. Whether or not the file exists on disk.
Check that a file extension on a given path matches the expected.
Description
This function takes an expected file extension, and validates it against a user-inputted file path.
Usage
check_file_extension(fpath, expected_ext, param_nm = "fpath", raise = TRUE)
Arguments
fpath |
The filepath. |
expected_ext |
The expected file extension. |
param_nm |
The parameter name that the filepath was passed to (for error raising), Default: 'fpath' |
raise |
Whether or not to raise an error, Default: TRUE |
Value
Whether or not the passed file has a valid file extension.
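A minimal sketch of this kind of extension check, assuming a case-insensitive comparison and tolerance for a leading dot in the expected extension. The helper name has_expected_extension and the error wording are hypothetical, not the package's implementation.

```r
# Hypothetical helper mirroring the documented arguments.
has_expected_extension <- function(fpath, expected_ext, param_nm = "fpath", raise = TRUE) {
  actual <- tolower(tools::file_ext(fpath))
  # Assumption: accept both "csv" and ".csv" as the expected extension.
  expected <- tolower(sub("^\\.", "", expected_ext))
  ok <- identical(actual, expected)
  if (!ok && raise) {
    stop(sprintf("'%s' must have a .%s extension, got: %s", param_nm, expected, fpath))
  }
  ok
}

has_expected_extension("data/health.csv", "csv")
has_expected_extension("data/health.xlsx", ".csv", raise = FALSE)
```

With raise = FALSE the caller receives a logical and can decide how to handle the mismatch.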
Check for Rtools Installation on Windows
Description
Verifies whether Rtools is installed and properly configured on a Windows system.
Usage
check_has_rtools()
Details
The function uses pkgbuild::check_build_tools(debug = TRUE) to test for the presence
of Rtools and its integration with R. If Rtools is missing or misconfigured, the function
throws an error with installation instructions.
Value
Returns TRUE invisibly if Rtools is detected and functional. Otherwise, throws an error.
See Also
check_build_tools, https://cran.r-project.org/bin/windows/Rtools/
Check variance inflation factors of predictor variables using a linear model
Description
Checks variance inflation factors of predictor variables using a linear model of the predictor variables on the health outcome. Prints the statistics if print_vif == TRUE. Raises a warning if the VIF for any variable is > 2.
Usage
check_wildfire_vif(
data,
predictors,
save_csv = FALSE,
output_folder_path = NULL,
print_vif = FALSE
)
Arguments
data |
Dataframe containing a daily time series of climate and health data. |
predictors |
Character vector with each of the predictors to include in the model. Must contain at least 2 variables. |
save_csv |
Bool. Whether or not to save the VIF results to a CSV. |
output_folder_path |
String. Where to save the CSV file to (if save_csv == TRUE). |
print_vif |
Bool, whether or not to print VIF for each predictor. Defaults to FALSE. |
Value
Variance inflation factor statistics for each predictor variable.
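The VIF of a numeric predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the others. The sketch below computes it that way in base R and warns above the documented threshold of 2. The helper name vif_by_regression, the simulated data and the warning text are assumptions for illustration; the package itself may compute VIFs differently (for example through car::vif).

```r
# Hypothetical helper: VIF_j = 1 / (1 - R^2_j), with R^2_j from predictor j ~ other predictors.
vif_by_regression <- function(data, predictors, threshold = 2) {
  vif <- vapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    fit <- stats::lm(stats::reformulate(others, response = p), data = data)
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
  high <- names(vif)[vif > threshold]
  if (length(high) > 0) {
    warning("VIF above ", threshold, " for: ", paste(high, collapse = ", "))
  }
  vif
}

set.seed(1)
daily <- data.frame(temperature = rnorm(200), humidity = rnorm(200))
daily$mean_PM <- daily$temperature + rnorm(200, sd = 0.3)  # deliberately collinear with temperature
vif_values <- suppressWarnings(
  vif_by_regression(daily, c("mean_PM", "temperature", "humidity"))
)
print(round(vif_values, 2))
```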
Read in and combine climate and health data
Description
Read and combine climate and health data prepared for the spatiotemporal and DLNM analysis.
Usage
combine_health_climate_data(
health_data_path,
climate_data_path,
map_path,
region_col,
district_col,
date_col,
year_col,
month_col,
case_col,
case_type,
tot_pop_col,
tmin_col,
tmean_col,
tmax_col,
rainfall_col,
r_humidity_col,
geometry_col,
runoff_col = NULL,
ndvi_col = NULL,
spi_col = NULL,
max_lag,
output_dir = NULL
)
Arguments
health_data_path |
The path to the health data. |
climate_data_path |
The path to the climate data. |
map_path |
The path to the relevant map data. |
region_col |
Character. Name of the column in the dataframe that contains the region names. |
district_col |
Character. Name of the column in the dataframe that contains the district names. |
date_col |
Character. Name of the column in the dataframe that contains the date. Defaults to NULL. |
year_col |
Character. Name of the column in the dataframe that contains the Year. |
month_col |
Character. Name of the column in the dataframe that contains the Month. |
case_col |
Character. Name of the column in the dataframe that contains the disease cases to be considered. |
case_type |
Character. The type of disease that the case column refers to. Must be one of 'diarrhea' or 'malaria'. |
tot_pop_col |
Character. Name of the column in the dataframe that contains the total population. |
tmin_col |
Character. Name of the column in the dataframe that contains the minimum temperature data. |
tmean_col |
Character. Name of the column in the dataframe that contains the average temperature. |
tmax_col |
Character. Name of the column in the dataframe that contains the maximum temperature. |
rainfall_col |
Character. Name of the column in the dataframe that contains the cumulative monthly rainfall. |
r_humidity_col |
Character. Name of the column in the dataframe that contains the relative humidity. |
geometry_col |
Character. Name of the geometry column in the shapefile (usually "geometry"). |
runoff_col |
Character. Name of the column in the dataframe that contains the monthly runoff water data. Defaults to NULL. |
ndvi_col |
Character. Name of column containing the Normalized Difference Vegetation Index (ndvi) data. Defaults to NULL. |
spi_col |
Character. Name of the column in the dataframe that contains the standardized precipitation index. Defaults to NULL. |
max_lag |
Numeric. The maximum lag to be considered for the delay effect. It should be between 2 and 4. Defaults to 2. |
output_dir |
Path to folder where the processed map data should be saved. Defaults to NULL. |
Value
A list of dataframes containing the map, nb.map, data, grid_data, summary
Deprecated alias for run_descriptive_stats().
Description
Generic wrapper function to compute descriptive statistics and EDA outputs.
Usage
common_descriptive_stats(
df_list,
output_path,
aggregation_column = NULL,
population_col = NULL,
plot_corr_matrix = FALSE,
correlation_method = "pearson",
plot_dist = FALSE,
plot_ma = FALSE,
ma_days = 100,
ma_sides = 1,
timeseries_col = NULL,
dependent_col,
independent_cols,
units = NULL,
plot_na_counts = FALSE,
plot_scatter = FALSE,
plot_box = FALSE,
plot_seasonal = FALSE,
plot_regional = FALSE,
plot_total = FALSE,
detect_outliers = FALSE,
calculate_rate = FALSE
)
Arguments
df_list |
List of dataframes. A list of input dataframes. |
output_path |
Character. The path to write outputs to. |
aggregation_column |
Character. The column to use for aggregating the dataset into smaller subsets of regions. |
population_col |
Character. The column containing the population. |
plot_corr_matrix |
Logical. Whether or not to plot correlation matrix. |
correlation_method |
Character. The correlation method. One of 'pearson', 'spearman', 'kendall'. |
plot_dist |
Logical. Whether or not to plot distribution histograms. |
plot_ma |
Logical. Whether to plot moving averages over a timeseries. |
ma_days |
Integer. The number of days to use for a moving average. |
ma_sides |
Integer. The number of sides to use for a moving average (1 or 2). |
timeseries_col |
Character. The column used as the timeseries for moving averages. |
dependent_col |
Character. The column in the data containing the dependent variable. |
independent_cols |
Character vector. The columns in the data containing the independent variables. |
units |
Named character vector. A named character vector of units for each variable. |
plot_na_counts |
Logical. Whether to plot NA counts. |
plot_scatter |
Logical. Whether to plot scatter plots. |
plot_box |
Logical. Whether to plot box plots. |
plot_seasonal |
Logical. Whether to plot seasonal plots. |
plot_regional |
Logical. Whether to plot regional plots. |
plot_total |
Logical. Whether to plot total health outcomes per year. |
detect_outliers |
Logical. Whether to output a table containing outlier information. |
calculate_rate |
Logical. Whether to calculate the rate of health outcomes per 100k people. |
Value
Character vector. Backward-compatible output path format.
Deprecated. Use run_descriptive_stats() instead.
Deprecated alias for run_descriptive_stats_api().
Description
Deprecated alias for run_descriptive_stats_api().
Usage
common_descriptive_stats_api(
data,
aggregation_column = NULL,
population_col = NULL,
dependent_col,
independent_cols,
units = NULL,
plot_correlation = FALSE,
plot_dist_hists = FALSE,
plot_ma = FALSE,
plot_na_counts = FALSE,
plot_scatter = FALSE,
plot_box = FALSE,
plot_seasonal = FALSE,
plot_regional = FALSE,
plot_total = FALSE,
correlation_method = "pearson",
ma_days = 100,
ma_sides = 1,
timeseries_col = NULL,
detect_outliers = FALSE,
calculate_rate = FALSE,
output_path
)
Arguments
data |
The dataset used for descriptive stats (as a vector). |
aggregation_column |
Character. The column to use for aggregating the dataset into smaller subsets. |
population_col |
Character. The column containing the population. |
dependent_col |
Character. The dependent column. |
independent_cols |
Character vector. The independent columns. |
units |
Named character vector. A named character vector of units for each variable. |
plot_correlation |
Logical. Whether to plot a correlation matrix. |
plot_dist_hists |
Logical. Whether to plot histograms showing column distributions. |
plot_ma |
Logical. Whether to plot moving averages over a timeseries. |
plot_na_counts |
Logical. Whether to plot counts of NAs in each column. |
plot_scatter |
Logical. Whether to plot the dependent column against the independent columns. |
plot_box |
Logical. Whether to generate box plots for selected columns. |
plot_seasonal |
Logical. Whether to plot seasonal trends of the variables in columns. |
plot_regional |
Logical. Whether to plot regional trends of the variables in columns. |
plot_total |
Logical. Whether to plot the total of the dependent column per year. |
correlation_method |
Character. The correlation method. One of 'pearson', 'spearman', 'kendall'. |
ma_days |
Integer. The number of days to use in moving average calculations. |
ma_sides |
Integer. The number of sides to use in moving average calculations (1 or 2). |
timeseries_col |
Character. The column used as the timeseries for moving averages. |
detect_outliers |
Logical. Whether to have a table of outliers. |
calculate_rate |
Logical. Whether to plot a rate based metric of the dependent column per year. |
output_path |
Character. The path to save outputs to. |
Value
Character vector. Backward-compatible output path format.
Deprecated. Use run_descriptive_stats_api() instead.
Deprecated alias for descriptive_stats_core().
Description
Deprecated. Use descriptive_stats_core() instead.
Usage
common_descriptive_stats_core(
df,
output_path,
title,
aggregation_column = NULL,
population_col = NULL,
plot_corr_matrix = FALSE,
correlation_method = "pearson",
plot_dist = FALSE,
dependent_col,
independent_cols = c(),
units = NULL,
plot_na_counts = FALSE,
plot_scatter = FALSE,
plot_box = FALSE,
plot_seasonal = FALSE,
plot_regional = FALSE,
plot_total = FALSE,
timeseries_col = "date",
detect_outliers = FALSE,
calculate_rate = FALSE
)
Create and plot the exposure-lag-response relationship (contour plot) at country, region, or district level for each disease case type (diarrhea and malaria).
Description
Generates a contour plot showing the exposure-lag-response relationship between the exposure variables (tmax and rainfall) and the disease case type.
Usage
contour_plot(
data,
param_term,
model,
level,
max_lag,
nk,
case_type,
filter_year = NULL,
save_fig = FALSE,
output_dir = NULL
)
Arguments
data |
Data list from |
param_term |
A character vector or list containing parameter terms such
as |
model |
The fitted model from the |
level |
A character vector specifying the geographical disaggregation. Can take one of the following values: "country", "region", or "district". |
case_type |
Character. The type of disease that the case column refers
to. Must be one of |
filter_year |
Integer. The year to filter the data to. Defaults to NULL. |
save_fig |
Boolean. Whether to save the outputted plot. Defaults to FALSE. |
output_dir |
The path to save the visualisation to. Defaults to NULL. |
Value
Contour plot at country, region, or district level.
Create lagged values for PM2.5 variable and average lag column.
Description
Creates new variables in a dataframe for lags and means over lag periods.
Usage
create_air_pollution_lags(data, max_lag = 14L)
Arguments
data |
Dataframe from load_air_pollution_data() containing a daily time series of health and environmental data. |
max_lag |
Integer. The maximum lag days for outdoor PM2.5. Defaults to 14. |
Value
Dataframe with added columns for lagged PM2.5 concentration.
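A minimal base R sketch of building lag columns and an average over the lag window, assuming a single region with rows already sorted by date. The helper name add_pm25_lags and the name of the mean column are hypothetical; the pm25_lag1, ..., pm25_lagN naming follows the convention used elsewhere in this manual.

```r
# Hypothetical helper: add pm25_lag1..pm25_lagN plus the mean over lags 0..N.
add_pm25_lags <- function(data, max_lag = 14L, pm_col = "pm25") {
  n <- nrow(data)
  stopifnot(max_lag >= 1, max_lag < n)
  for (k in seq_len(max_lag)) {
    # Shift the series down by k days; the first k days have no lagged value.
    data[[paste0(pm_col, "_lag", k)]] <- c(rep(NA_real_, k), utils::head(data[[pm_col]], n - k))
  }
  lag_cols <- c(pm_col, paste0(pm_col, "_lag", seq_len(max_lag)))
  # Assumption: the average is NA until a full window of lags is available.
  data[[paste0(pm_col, "_lag0_", max_lag, "_mean")]] <- rowMeans(data[, lag_cols, drop = FALSE])
  data
}

daily <- data.frame(date = as.Date("2024-01-01") + 0:5, pm25 = c(10, 12, 8, 15, 20, 9))
lagged <- add_pm25_lags(daily, max_lag = 2L)
print(lagged)
```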
Create statistical summaries of columns in a dataframe.
Description
Create statistical summaries of columns in a dataframe.
Usage
create_column_summaries(df, independent_cols = NULL)
Arguments
df |
Dataframe. Input data. |
independent_cols |
Character vector. The columns in the data containing the independent variables. |
Value
Dataframe. Column summaries
Create a correlation matrix for columns in a dataframe.
Description
Create a correlation matrix for columns in a dataframe.
Usage
create_correlation_matrix(
df,
independent_cols = NULL,
correlation_method = "pearson"
)
Arguments
df |
Dataframe. The dataframe to use to create a correlation matrix. |
independent_cols |
Character vector. The columns in the data containing the independent variables. |
correlation_method |
string. The method to use for correlation calculations. |
Value
Matrix. Correlation matrix for selected columns in the input dataset.
Generate a grid size for a certain number of plots.
Description
This function calculates the minimum grid size required to plot a given number of plots on a figure. For example, 6 plots would require a 3x2 grid, whereas 7 would require a 3x3 grid, and so on.
Usage
create_grid(plot_count)
Arguments
plot_count |
The number of plots required for the grid. |
Value
A numeric vector: c(x, y), where x and y define the grid dimensions.
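One way to reproduce the documented examples is to take the ceiling of the square root for the first dimension and then the number of rows needed to hold every plot. The function name grid_dims is hypothetical and the rule is an assumption that matches the 6 and 7 plot cases above.

```r
# Hypothetical re-implementation consistent with the documented examples.
grid_dims <- function(plot_count) {
  x <- ceiling(sqrt(plot_count))  # smallest x such that x * x >= plot_count
  y <- ceiling(plot_count / x)    # rows needed to hold every plot
  c(x, y)
}

grid_dims(6)  # 3 2
grid_dims(7)  # 3 3
```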
Create indices for INLA models
Description
The INLA model requires a region index, a district index, and a year index. This function creates these indices using the dataset, ndistrict and nregion.
Usage
create_inla_indices(data, case_type)
Arguments
data |
Dataframe containing the district_code, region_code, and year columns, from the combine_health_climate_data() function. |
case_type |
Character. The type of disease that the case column refers to. Must be one of 'diarrhea' or 'malaria'. |
Value
The modified data with the created indices.
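A sketch of what such indices can look like, assuming consecutive integer indices starting at 1 are wanted for each grouping. The helper name add_model_indices and the index column names are hypothetical.

```r
# Hypothetical sketch; the *_index column names are assumptions.
add_model_indices <- function(data) {
  data$region_index <- as.integer(factor(data$region_code))
  data$district_index <- as.integer(factor(data$district_code))
  data$year_index <- data$year - min(data$year) + 1L
  data
}

example <- data.frame(
  region_code = c("R2", "R1", "R1", "R2"),
  district_code = c("D3", "D1", "D2", "D3"),
  year = c(2019L, 2019L, 2020L, 2021L)
)
indexed <- add_model_indices(example)
print(indexed)
```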
Generate lagged values for predictor (temperature) variables
Description
Generates new variables in a dataframe for lags and means over lag periods.
Usage
create_lagged_variables(data, wildfire_lag, temperature_lag)
Arguments
data |
Dataframe containing a daily time series of climate and health data |
wildfire_lag |
Integer. The number of days for which to calculate the lags for wildfire PM2.5. Default is 3. |
temperature_lag |
Integer. The number of days for which to calculate the lags for temperature. Default is 1. |
Value
Dataframe with added columns for lagged temperature and wildfire-related PM2.5 concentration
Create a summary of all NA values in a dataset.
Description
Create a summary of all NA values in a dataset.
Usage
create_na_summary(df, independent_cols = NULL)
Arguments
df |
Dataframe. The input dataset. |
independent_cols |
Character vector. The columns in the data containing the independent variables. |
Value
Dataframe. A summary of NA values in the dataset.
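A minimal sketch of an NA summary in base R. The helper name summarise_missing and the output column names (column, na_count, na_percent) are assumptions, not the package's actual output format.

```r
# Hypothetical helper: count and percentage of missing values per column.
summarise_missing <- function(df, cols = names(df)) {
  na_count <- vapply(df[cols], function(x) sum(is.na(x)), integer(1))
  data.frame(
    column = cols,
    na_count = unname(na_count),
    na_percent = 100 * unname(na_count) / nrow(df)
  )
}

health <- data.frame(
  deaths = c(5, NA, 7, 6),
  tmean = c(NA, NA, 21.5, 19),
  rainfall = c(0, 3.2, 1.1, 0)
)
na_summary <- summarise_missing(health)
print(na_summary)
```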
Generate splines for temperature variable
Description
Generates temperature splines for each region
Usage
create_temperature_splines(data, nlag = 0, degrees_freedom = 6)
Arguments
data |
Dataframe containing a daily time series of climate and health data |
nlag |
Integer. The number of days of lag in the temperature variable from which to generate splines (unlagged temperature variable). Defaults to 0. |
degrees_freedom |
Integer. Degrees of freedom for the spline(s). Defaults to 6. |
Value
Dataframe with additional column for temperature spline.
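A rough sketch of building a spline basis of temperature separately for each region, assuming a natural cubic spline (splines::ns) and columns named region and temp. The helper name, the column names and the choice to return one column per basis function are all assumptions, and the nlag argument is not modelled here.

```r
# Hypothetical sketch: natural spline basis of temperature, built per region.
add_temperature_spline <- function(data, degrees_freedom = 6) {
  pieces <- lapply(split(data, data$region), function(region_data) {
    basis <- splines::ns(region_data$temp, df = degrees_freedom)
    # Drop the spline attributes and keep a plain numeric matrix with named columns.
    basis <- matrix(basis, nrow = nrow(region_data),
                    dimnames = list(NULL, paste0("temp_spline", seq_len(ncol(basis)))))
    cbind(region_data, as.data.frame(basis))
  })
  do.call(rbind, c(unname(pieces), make.row.names = FALSE))
}

set.seed(3)
daily <- data.frame(
  region = rep(c("North", "South"), each = 60),
  temp = c(rnorm(60, mean = 12, sd = 5), rnorm(60, mean = 20, sd = 4))
)
with_splines <- add_temperature_spline(daily)
str(with_splines)
```

Building the basis within each region places the knots at that region's own temperature quantiles rather than at pooled ones.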
Emit a consistent deprecation warning for descriptive stats wrappers.
Description
Emit a consistent deprecation warning for descriptive stats wrappers.
Usage
deprecate_descriptive_stats(old_fn, new_fn)
Arguments
old_fn |
Character. Deprecated function name. |
new_fn |
Character. Replacement function name. |
Value
None. Emits a warning.
Save descriptive statistics
Description
Generates summary statistics for climate and health data and saves them to the specified file path.
Usage
descriptive_stats(data, variables, bin_width = 5, output_dir = ".")
Arguments
data |
Dataframe containing a daily time series of climate and health data |
variables |
Character or character vector with variable to produce summary statistics for. Must include at least 1 variable. |
bin_width |
Integer. Width of each bin in a histogram of the outcome variable. Defaults to 5. |
output_dir |
Character. The directory to output descriptive stats to. Must exist and will not be automatically created. Defaults to ".". |
Value
Prints summary statistics and a histogram of the outcome variable.
Core Functionality for Producing Descriptive Statistics
Description
Core Functionality for Producing Descriptive Statistics
Usage
descriptive_stats_core(
df,
output_path,
title,
aggregation_column = NULL,
population_col = NULL,
plot_corr_matrix = FALSE,
correlation_method = "pearson",
plot_dist = FALSE,
dependent_col,
independent_cols = c(),
units = NULL,
plot_na_counts = FALSE,
plot_scatter = FALSE,
plot_box = FALSE,
plot_seasonal = FALSE,
plot_regional = FALSE,
plot_total = FALSE,
timeseries_col = "date",
write_outlier_table = FALSE,
calculate_rate = FALSE
)
Arguments
df |
Dataframe. The input DataFrame. |
output_path |
Character. The path to write outputs to. |
title |
Character. The specific title for the subset of data being used. |
aggregation_column |
Character. Column to aggregate data by. |
population_col |
Character. Column containing population data. |
plot_corr_matrix |
Logical. Whether or not to plot correlation matrix. |
correlation_method |
Character. The correlation method. One of 'pearson', 'spearman', 'kendall'. |
plot_dist |
Logical. Whether or not to plot distribution histograms. |
dependent_col |
Character. The dependent column. |
independent_cols |
Character vector. The independent columns. |
units |
Named character vector. Units to use for plots (maps to columns parameter). |
plot_na_counts |
Logical. Whether to plot NA counts. |
plot_scatter |
Logical. Whether to plot scatter plots. |
plot_box |
Logical. Whether to plot box plots. |
plot_seasonal |
Logical. Whether to plot seasonal plots. |
plot_regional |
Logical. Whether to plot regional plots. |
plot_total |
Logical. Whether to plot total health outcomes per year. |
timeseries_col |
Character. Column containing timeseries data (e.g., date). |
write_outlier_table |
Logical. Whether to output a table containing outlier information. |
calculate_rate |
Logical. Whether to calculate the rate of health outcomes per 100k people. |
Value
None. Outputs are written to files.
Detect Outliers Using the IQR Method
Description
Detect Outliers Using the IQR Method
Usage
detect_outliers(df, independent_cols = NULL)
Arguments
df |
A data frame containing the data to check for outliers. |
independent_cols |
Character vector. The columns in the data containing the independent variables. |
Value
Dataframe. Column summaries
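The IQR method is commonly applied with fences at 1.5 times the interquartile range beyond the first and third quartiles. The sketch below assumes that conventional multiplier; the helper name flag_iqr_outliers is hypothetical and the package may use different settings.

```r
# Hypothetical sketch using the conventional 1.5 * IQR fences (an assumption).
flag_iqr_outliers <- function(x, k = 1.5) {
  q <- stats::quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])
  # Missing values are never flagged as outliers.
  !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
}

cases <- c(12, 15, 14, 13, 16, 15, 90, NA)
outlier_flags <- flag_iqr_outliers(cases)
cases[outlier_flags]
```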
Run the full diarrhea-climate analysis pipeline for calculating diarrhea cases attributable to extreme precipitation and extreme temperature
Description
The diarrhea_do_analysis function runs the complete analysis workflow
by combining multiple functions to analyze the association between diarrhea
cases and climate variables. It processes health, climate, and spatial data,
fits models, generates plots, and calculates attributable risk.
Usage
diarrhea_do_analysis(
health_data_path,
climate_data_path,
map_path,
region_col,
district_col,
date_col = NULL,
year_col,
month_col,
case_col,
tot_pop_col,
tmin_col,
tmean_col,
tmax_col,
rainfall_col,
r_humidity_col,
runoff_col,
geometry_col,
spi_col = NULL,
ndvi_col = NULL,
max_lag = 2,
nk = 2,
basis_matrices_choices,
inla_param,
param_term,
level,
param_threshold = 1,
filter_year = NULL,
family = "nbinomial",
group_by_year = FALSE,
config = TRUE,
save_csv = FALSE,
save_model = TRUE,
save_fig = FALSE,
cumulative = FALSE,
output_dir = NULL
)
Arguments
health_data_path |
Character. Path to the processed health data file. |
climate_data_path |
Character. Path to the processed climate data file. |
map_path |
Character. Path to the spatial data file (e.g., shapefile). |
region_col |
Character. Column name for the region variable. |
district_col |
Character. Column name for the district variable. |
date_col |
Character (optional). Column name for the date variable.
Defaults to |
year_col |
Character. Column name for the year variable. |
month_col |
Character. Column name for the month variable. |
case_col |
Character. Column name for diarrhea case counts. |
tot_pop_col |
Character. Column name for total population. |
tmin_col |
Character. Column name for minimum temperature. |
tmean_col |
Character. Column name for mean temperature. |
tmax_col |
Character. Column name for maximum temperature. |
rainfall_col |
Character. Column name for cumulative monthly rainfall. |
r_humidity_col |
Character. Column name for relative humidity. |
runoff_col |
Character. Column name for monthly runoff data. |
geometry_col |
Character. Column name of the geometry column in the
shapefile (usually |
spi_col |
Character (optional). Column name for the Standardized
Precipitation Index (SPI). Defaults to |
ndvi_col |
Character (optional). Column name for the Normalized Difference
Vegetation Index (NDVI). Defaults to |
max_lag |
Numeric. Maximum temporal lag to include in the distributed
lag model (e.g., |
nk |
Numeric. Number of internal knots for the natural spline of
each predictor, controlling its flexibility: |
basis_matrices_choices |
Character vector. Specifies which climate variables
to include in the basis matrix (e.g., |
inla_param |
Character vector. Specifies exposure variables included in
the INLA model (e.g., |
param_term |
Character or vector. Exposure variable(s) of primary interest
for relative risk and attribution (e.g., |
level |
Character. Spatial disaggregation level; must be one of
|
param_threshold |
Numeric. Threshold above which exposure is considered
"attributable." Defaults to |
filter_year |
Integer or vector (optional). Year(s) to filter the data by.
Defaults to |
family |
Character. Probability distribution for the outcome variable.
Options include |
group_by_year |
Logical. Whether to group attributable metrics by year.
Defaults to |
config |
Logical. Whether to enable additional INLA model configurations.
Defaults to |
save_csv |
Logical. If |
save_model |
Logical. If |
save_fig |
Logical. If |
cumulative |
Boolean. If TRUE, plot and save the cumulative risk across all years for the specific exposure at region and district level. Defaults to FALSE. |
output_dir |
Character. Directory where output files (plots, datasets, maps)
are saved. Defaults to |
Value
A list containing:
Model output from INLA
Monthly random effects plot
Yearly random effects plot
Contour plot
Relative risk map
Relative risk plot
Attributable fraction and number summary
Meta-analysis and BLUPs
Description
Run meta-analysis using temperature average and range as meta predictors. Then create the best linear unbiased predictions (BLUPs).
Usage
dlnm_meta_analysis(
df_list,
coef_,
vcov_,
save_csv = FALSE,
output_folder_path = NULL
)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region. |
coef_ |
A matrix of coefficients for the reduced model. |
vcov_ |
A list. Covariance matrices for each region for the reduced model. |
save_csv |
Boolean. Whether to save the results as a CSV. Defaults to FALSE. |
output_folder_path |
Path to folder where results should be saved. Defaults to NULL. |
Value
- mm
A model object. A multivariate meta-analysis model.
- blup
A list. BLUP (best linear unbiased predictions) from the meta-analysis model for each region.
- meta_test_res
A dataframe of results from statistical tests on the meta model.
Define minimum mortality percentiles and temperatures
Description
Calculate the temperature at which there is minimum mortality risk using the product of the basis matrix and BLUPs.
Usage
dlnm_min_mortality_temp(
df_list,
var_fun = "bs",
var_per = c(10, 75, 90),
var_degree = 2,
blup = NULL,
coef_,
meta_analysis = FALSE,
outcome_type = c("temperature", "suicide")
)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region. |
var_fun |
Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'. |
var_per |
Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(10, 75, 90). |
var_degree |
Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic). |
blup |
A list. BLUP (best linear unbiased predictions) from the meta-analysis model for each region. Defaults to NULL. |
coef_ |
A matrix of coefficients for the reduced model. |
meta_analysis |
Boolean. Whether to perform a meta-analysis. Defaults to FALSE. |
outcome_type |
Character. The indicator that the function is being used for. One of 'suicide' or 'temperature'. Defaults to c("temperature", "suicide") |
Value
Percentiles and corresponding temperatures for each geography.
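The idea of multiplying a temperature basis by a coefficient vector and taking the minimum can be sketched as follows. This is an illustration under stated assumptions: splines::bs stands in for the dlnm basis, the temperature series is simulated, and the coefficients are hypothetical (obtained by fitting a U-shaped curve with its minimum at 18 degrees) rather than BLUPs from a real model.

```r
set.seed(42)
temp <- rnorm(1000, mean = 15, sd = 6)  # simulated daily mean temperature
var_per <- c(10, 75, 90)                # knot percentiles (documented default)

# Quadratic B-spline basis with knots at the chosen temperature percentiles.
make_basis <- function(x) {
  splines::bs(x, knots = stats::quantile(temp, var_per / 100), degree = 2,
              Boundary.knots = range(temp))
}

# Hypothetical coefficients: fit a U-shaped log relative risk centred on 18 degrees.
coef_ <- stats::coef(stats::lm(I((temp - 18)^2 / 100) ~ make_basis(temp)))[-1]

# Evaluate the curve at percentiles 1..99 and pick the percentile with the lowest risk.
pred_temp <- stats::quantile(temp, (1:99) / 100)
log_rr <- as.numeric(make_basis(pred_temp) %*% coef_)
min_perc <- (1:99)[which.min(log_rr)]
min_temp <- unname(pred_temp[min_perc])

c(percentile = min_perc, temperature = round(min_temp, 1))
```

Restricting the search to percentiles 1 to 99 keeps the minimum away from the sparse extremes of the temperature distribution.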
Create population totals
Description
Creates a list of population totals by year and region for use in the attributable rate calculations.
Usage
dlnm_pop_totals(df_list, country = "National", meta_analysis = FALSE)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region. |
country |
Character. Name of country for national level estimates. Defaults to 'National' |
meta_analysis |
Boolean. Whether to perform a meta-analysis. Defaults to FALSE. |
Value
List of population totals by year and region
Power calculation
Description
Produce a power statistic by area for the attributable threshold and above as a reference.
Usage
dlnm_power_list(
df_list,
pred_list,
minperc,
attr_thr_high = 97.5,
attr_thr_low = 2.5,
compute_low = TRUE
)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region. |
pred_list |
A list containing predictions from the model by region. |
minperc |
Vector. Percentile of maximum outcome temperature for each region. |
attr_thr_high |
Numeric. Percentile at which to define the upper temperature threshold for calculating attributable risk. Defaults to 97.5. |
attr_thr_low |
Numeric. Percentile at which to define the lower temperature threshold for calculating attributable risk. Defaults to 2.5. |
compute_low |
Bool. Whether to compute power for the lower threshold. Defaults to TRUE. |
Value
A list containing power information by area.
Run national predictions from meta analysis
Description
Use the meta analysis to create national level predictions
Usage
dlnm_predict_nat(
df_list,
var_fun = "bs",
var_per = c(25, 50, 75),
var_degree = 2,
minpercreg,
mmpredall,
pred_list,
country = "National"
)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region. |
var_fun |
Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'. |
var_per |
Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(25, 50, 75). |
var_degree |
Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic). |
minpercreg |
Vector. Percentile of maximum suicide temperature for each region. |
mmpredall |
List of national coefficients and covariance matrices for the crosspred. |
pred_list |
A list containing predictions from the model by region. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
Value
A list containing predictions by region.
Reduce to overall cumulative
Description
Reduce model to the overall cumulative association
Usage
dlnm_reduce_cumulative(
df_list,
var_per = c(25, 50, 75),
var_degree = 2,
cenper = NULL,
cb_list,
model_list
)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region. |
var_per |
Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(25, 50, 75). |
var_degree |
Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic). |
cenper |
Integer. Percentile (0-100) used to calculate the centering value. Defaults to NULL. |
cb_list |
List of cross_basis matrices from create_crossbasis function. |
model_list |
List of models produced from DLNM analysis. |
Value
- coef_
A matrix of coefficients for the reduced model.
- vcov_
A list. Covariance matrices for each region for the reduced model.
Produce variance inflation factor
Description
Produces variance inflation factor for the independent variables.
Usage
dlnm_vif(df_list, independent_cols = NULL)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region. |
independent_cols |
Additional independent variables to test in model validation. Defaults to NULL. |
Value
A list. Variance inflation factors for each independent variable by region.
Enforce a file extension on a given path
Description
Ensures that the provided file path ends with the desired file extension.
Usage
enforce_file_extension(path, file_extension)
Arguments
path |
Character. A file path. |
file_extension |
Character. The file extension to enforce on 'path'. |
Value
Character. The path with the expected file extension.
Extract metadata from a climate_error
Description
Extracts the structured metadata from a typed climate error for use in API responses or logging.
Usage
extract_error_metadata(error)
Arguments
error |
A climate_error condition object |
Value
A list containing the error metadata (type, column, available, etc.)
Extract mean wildfire PM2.5 values for shapefile regions from NetCDF file
Description
Takes a NetCDF file of gridded wildfire data and shapefile for geographical regions and extracts mean values for each shapefile region.
Information on NetCDF files: https://climatedataguide.ucar.edu/climate-tools/NetCDF#:~:text=An%20nc4%20files%20is%20a,readily%20handle%20netCDF%2D4%20files.
We use a daily time series of gridded wildfire-related PM2.5 concentration from the Finnish Meteorological Institute's SILAM-CTM model. This is available open-source: https://doi.org/10.57707/fmi-b2share.d1cac971b3224d438d5304e945e9f16c.
Usage
extract_means_for_geography(
ncdf_path,
shp_path,
region_col = "region",
output_value_col = "mean_PM"
)
Arguments
ncdf_path |
Path to a NetCDF file |
shp_path |
Path to a shapefile .shp of the geographical boundaries for which to extract mean values of wildfire-related PM2.5 |
region_col |
Character. The name of the column containing region data in the shapefile. Defaults to 'region' |
output_value_col |
Character. The name of the value column to include in the output. Defaults to 'mean_PM'. |
Value
Dataframe containing a daily time series with mean wildfire-related PM2.5 values for each region
Fit GAM model
Description
Fit a generalized additive model (mgcv::gam) including pm25 and its lagged variables (pm25_lag1, ..., pm25_lagN)
Usage
fit_air_pollution_gam(
data_with_lags,
max_lag = 14L,
df_seasonal = 6L,
family = "quasipoisson"
)
Arguments
data_with_lags |
data.frame or tibble containing the outcome, confounders and pm25 lag variables. |
max_lag |
integer. Maximum lag to include. Defaults to 14. |
df_seasonal |
integer. Degrees of freedom for seasonal spline. Default 6. |
family |
character or family object passed to mgcv::gam. Default "quasipoisson". |
Value
A list with components:
model: the fitted mgcv::gam object (or NULL if fit failed)
coef_table: data.frame with columns: lag (0 for pm25, 1..N for pm25_lag#, and "0-N" for cumulative), pm25_variable, coef, se, ci.lb, ci.ub
vcov_used_for_cumulative: logical; TRUE if vcov() was used to compute cumulative SE
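The cumulative (lag 0 to N) coefficient is the sum of the individual lag coefficients, and its standard error follows from the sum of the corresponding block of the variance-covariance matrix. The sketch below illustrates this on simulated data. To stay within base R it uses stats::glm in place of mgcv::gam and omits the seasonal spline; the data and variable names are assumptions.

```r
set.seed(7)
n <- 300
pm25 <- rgamma(n, shape = 4, rate = 0.4)
dat <- data.frame(
  pm25 = pm25,
  pm25_lag1 = c(NA, head(pm25, -1)),
  pm25_lag2 = c(NA, NA, head(pm25, -2))
)
dat$deaths <- rpois(n, lambda = exp(3 + 0.004 * dat$pm25))

# Stand-in model: quasi-Poisson GLM with pm25 and two lags (rows with NA lags are dropped).
fit <- stats::glm(deaths ~ pm25 + pm25_lag1 + pm25_lag2,
                  family = stats::quasipoisson(), data = dat)

pm_terms <- c("pm25", "pm25_lag1", "pm25_lag2")
cum_coef <- sum(stats::coef(fit)[pm_terms])
# Var(sum of coefficients) = sum of all entries of their covariance block.
cum_se <- sqrt(sum(stats::vcov(fit)[pm_terms, pm_terms]))
cum_ci <- cum_coef + c(-1, 1) * stats::qnorm(0.975) * cum_se

c(coef = cum_coef, se = cum_se, ci.lb = cum_ci[1], ci.ub = cum_ci[2])
```

Using the full covariance block matters because lagged exposures are usually correlated, so summing only the individual variances would misstate the cumulative standard error.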
Calculate p-values for Wald test
Description
Calculate p-values for an explanatory variable.
Usage
fwald(mm, var)
Arguments
mm |
A model object. A multivariate meta-analysis model. |
var |
A character. The name of the variable in the meta-model to calculate p-values for. |
Value
A number. The p-value of the explanatory variable.
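As an illustration of what a Wald test on a model term typically involves, the sketch below tests all coefficients whose names contain a given string jointly against zero. The helper wald_p_sketch is hypothetical and is applied to an ordinary lm fit; it is not the package's fwald implementation.

```r
# Hypothetical helper: joint Wald test for coefficients whose names contain `var`.
wald_p_sketch <- function(model, var) {
  idx  <- grep(var, names(coef(model)), fixed = TRUE)
  b    <- coef(model)[idx]
  V    <- vcov(model)[idx, idx, drop = FALSE]
  stat <- as.numeric(t(b) %*% solve(V) %*% b)   # Wald chi-squared statistic
  pchisq(stat, df = length(b), lower.tail = FALSE)
}

set.seed(42)
d <- data.frame(x = rnorm(200), z = rnorm(200))
d$y <- 1 + 2 * d$x + rnorm(200)   # y depends on x but not on z
fit <- lm(y ~ x + z, data = d)

p_x <- wald_p_sketch(fit, "x")
p_z <- wald_p_sketch(fit, "z")
```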
Generate a run id for descriptive statistics output folders.
Description
Generate a run id for descriptive statistics output folders.
Usage
generate_descriptive_stats_run_id()
Value
Character. Run id in the format YYYYmmdd_HHMMSS_NNNN.
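A run id in this format could be built as sketched below. The meaning of NNNN is not stated above; the sketch assumes a random four-digit suffix, and make_run_id_sketch is a hypothetical name.

```r
# Hypothetical re-implementation; the random NNNN suffix is an assumption.
make_run_id_sketch <- function(now = Sys.time()) {
  stamp  <- format(now, "%Y%m%d_%H%M%S")
  suffix <- sprintf("%04d", sample.int(10000L, 1L) - 1L)   # 0000 to 9999
  paste(stamp, suffix, sep = "_")
}

run_id <- make_run_id_sketch()
```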
Generate Relative Risk Estimates by Region
Description
Computes relative risk estimates for wildfire-specific PM2.5 exposure in each region across a range of PM values.
Usage
generate_rr_pm_by_region(
data,
relative_risk_overall,
scale_factor_wildfire_pm,
wildfire_lag = 0,
pm_vals = NULL
)
Arguments
data |
Data frame containing a daily time series of mean_PM values, either from the original input csv file or produced after merging wildfire data with the initial csv file. |
relative_risk_overall |
Data frame containing relative risk estimates and confidence intervals for wildfire-related PM2.5 exposure at different lags. Must include columns: 'lag', 'relative_risk', 'ci_lower', and 'ci_upper'. |
scale_factor_wildfire_pm |
Numeric. Scaling factor used to normalize PM2.5 values to the unit of exposure used in the original relative risk estimate. |
wildfire_lag |
Integer. Lag day to filter from the input data for extrapolation. Defaults to 0. |
pm_vals |
Numeric vector. PM2.5 concentrations over which to compute relative risk. Defaults to a sequence from 0 to the maximum observed wildfire-related PM2.5 in dataset, max(mean_PM). |
Value
A data frame with relative risk estimates for each region and PM value.
Relative risk estimates across PM2.5 concentrations for a specified lag.
Description
Computes relative risk and confidence intervals across a range of PM2.5 concentrations for a specified wildfire-related lag, using log-linear extrapolation from a reference estimate.
Usage
generate_rr_pm_overall(
data,
relative_risk_overall,
scale_factor_wildfire_pm,
wildfire_lag = 0,
pm_vals = NULL
)
Arguments
data |
Data frame containing a daily time series of mean_PM values, either from the original input csv file or produced after merging wildfire data with the initial csv file. |
relative_risk_overall |
Data frame containing relative risk estimates and confidence intervals for wildfire-related PM2.5 exposure at different lags. Must include columns: 'lag', 'relative_risk', 'ci_lower', and 'ci_upper'. |
scale_factor_wildfire_pm |
Numeric. Scaling factor used to normalize PM2.5 values to the unit of exposure used in the original relative risk estimate. |
wildfire_lag |
Integer. Lag day to filter from the input data for extrapolation. Defaults to 0. |
pm_vals |
Numeric vector. PM2.5 concentrations over which to compute relative risk. Defaults to a sequence from 0 to the maximum observed wildfire-related PM2.5 in dataset, max(mean_PM). |
Value
A data frame with columns: 'pm_levels', 'relative_risk', 'ci_lower', and 'ci_upper', representing estimated relative risk and 95% confidence intervals across the specified PM2.5 levels.
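The log-linear extrapolation described above can be sketched as follows. The sketch assumes the reference relative risk is expressed per scale_factor_wildfire_pm units of PM2.5, so that RR(pm) = exp(log(RR_ref) * pm / scale); the input values are invented for illustration.

```r
# Illustrative reference estimate (values are made up).
rr_ref <- data.frame(lag = 0, relative_risk = 1.02,
                     ci_lower = 1.01, ci_upper = 1.03)
scale_factor <- 10              # assumed: reference RR is per 10 units of PM2.5
pm_vals <- seq(0, 50, by = 10)  # concentrations at which to evaluate the RR

ref <- rr_ref[rr_ref$lag == 0, ]

# Scale the log relative risk linearly with concentration.
extrapolate <- function(rr) exp(log(rr) * pm_vals / scale_factor)

rr_curve <- data.frame(
  pm_levels     = pm_vals,
  relative_risk = extrapolate(ref$relative_risk),
  ci_lower      = extrapolate(ref$ci_lower),
  ci_upper      = extrapolate(ref$ci_upper)
)
```

Under this assumption the curve equals 1 at zero exposure and reproduces the reference estimate at pm = scale_factor.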
Generate an RGB colour value with alpha from a hex value.
Description
Generate an RGB colour value with alpha from a hex value.
Usage
get_alpha_colour(hex, alpha)
Arguments
hex |
The hex code of the colour to convert. |
alpha |
The alpha of the converted colour (ranging from 0-1). |
Value
The converted RGB colour.
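One way to perform such a conversion with grDevices is sketched below. alpha_colour_sketch is a hypothetical stand-in and may differ from the package function in its return format.

```r
# Hypothetical stand-in: convert a hex colour to an RGBA hex string.
alpha_colour_sketch <- function(hex, alpha) {
  stopifnot(alpha >= 0, alpha <= 1)
  rgb_vals <- grDevices::col2rgb(hex)[, 1] / 255   # red, green, blue in 0-1
  grDevices::rgb(rgb_vals[["red"]], rgb_vals[["green"]], rgb_vals[["blue"]],
                 alpha = alpha)
}

semi_red <- alpha_colour_sketch("#FF0000", 0.5)
```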
Create lagged columns and provide the mean value.
Description
Creates new columns containing lagged values over n rows and determines the mean of the lagged columns.
Usage
get_lags_and_means(data, lagcol, nlags)
Arguments
data |
Dataframe containing a daily time series of climate and health data |
lagcol |
Character. The column to lag. |
nlags |
Character. How many rows to obtain a lag from. |
Value
Dataframe with added columns for lagged values and mean(s) of those lags.
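The lagging step can be sketched in base R as below. The output column names and the choice to average the current value together with its lags are assumptions made for illustration; lags_and_mean_sketch is a hypothetical name.

```r
# Hypothetical sketch: add lag columns and a row-wise mean over lags 0..nlags.
lags_and_mean_sketch <- function(data, lagcol, nlags) {
  n <- nrow(data)
  lag_names <- paste0(lagcol, "_lag", seq_len(nlags))
  for (k in seq_len(nlags)) {
    # Shift the column down by k rows; the first k rows have no lagged value.
    data[[lag_names[k]]] <- c(rep(NA, k), head(data[[lagcol]], n - k))
  }
  # Mean of the current value and its lags (NA until every lag is available).
  data[[paste0(lagcol, "_lag_mean")]] <-
    rowMeans(data[, c(lagcol, lag_names), drop = FALSE])
  data
}

df <- data.frame(date = as.Date("2024-01-01") + 0:4,
                 temp = c(10, 12, 14, 16, 18))
out <- lags_and_mean_sketch(df, "temp", 2)
```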
A function to predict relative risk at country, region, and district level
Description
Produces cumulative relative risk at country, region and district level from analysis.
Usage
get_predictions(data, param_term, max_lag, nk, model, level, case_type)
Arguments
data |
Data list from |
param_term |
A character vector or list containing parameter terms such
as |
model |
The fitted model from run_inla_models() function. |
level |
Character. The spatial disaggregation level.
Can take one of the following values: |
case_type |
Character. The type of disease that the case column refers
to. Must be one of |
Value
A dataframe containing cumulative relative risk at the chosen level.
Process data for national analysis
Description
Aggregate to national data and run crossbasis
Usage
hc_add_national_data(
df_list,
pop_list,
var_fun = "bs",
var_per = c(10, 75, 90),
var_degree = 2,
lagn = 21,
lagnk = 3,
country = "National",
cb_list,
mm,
minpercgeog_
)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography. |
pop_list |
List of population totals by year and geography. |
var_fun |
Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'. |
var_per |
Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(10, 75, 90). |
var_degree |
Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic). |
lagn |
Integer. Number of days in the lag period. Defaults to 21. (see dlnm::crossbasis). |
lagnk |
Integer. Number of knots in lag function. Defaults to 3. (see dlnm::logknots). |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
cb_list |
A list of cross-basis matrices by geography. |
mm |
A model object. A multivariate meta-analysis model. |
minpercgeog_ |
Vector. Percentile of minimum mortality temperature for each geography. |
Value
df_list: List. A list of data frames for each geography and national level.
cb_list: List. A list of cross-basis matrices by geography and national level.
minpercgeog_: Vector. Percentile of minimum mortality temperature for each geography and national level.
mmpredall: List. A list of national coefficients and covariance matrices.
Run ADF test and produce PACF plots for each model combination
Description
Run augmented Dickey-Fuller test for stationarity of dependent variable and produce a partial autocorrelation function plot of residuals for each model combination.
Usage
hc_adf(df_list)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography. |
Value
'adf_list'. List of ADF results for each geography.
Estimate attributable numbers
Description
Estimate attributable numbers and confidence intervals for each geography using Monte Carlo simulations.
Usage
hc_attr(
df_list,
cb_list,
pred_list,
minpercgeog_,
attr_thr_high = 97.5,
attr_thr_low = 2.5
)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography. |
cb_list |
A list of cross-basis matrices by geography. |
pred_list |
A list containing predictions from the model by geography. |
minpercgeog_ |
Vector. Percentile of minimum mortality temperature for each geography. |
attr_thr_high |
Integer. Percentile at which to define the upper temperature threshold for calculating attributable risk. Defaults to 97.5. |
attr_thr_low |
Integer. Percentile at which to define the lower temperature threshold for calculating attributable risk. Defaults to 2.5. |
Value
'attr_list'. A list containing attributable numbers per geography.
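To illustrate the idea of a Monte Carlo confidence interval for an attributable number, the sketch below uses a deliberately simplified setting: a single log-linear coefficient rather than the full cross-basis, with invented counts and an assumed coefficient and standard error. It is not the package's algorithm.

```r
set.seed(123)
deaths <- c(20, 25, 30, 22, 28)       # daily counts (illustrative)
excess <- c(0, 1.5, 3.0, 0.5, 2.0)    # degrees above a reference temperature
beta   <- 0.03                        # assumed log-RR per degree
se     <- 0.01                        # assumed standard error of beta

# Attributable number: sum over days of attributable fraction times deaths,
# with attributable fraction AF = 1 - exp(-beta * excess).
attr_number <- function(b) sum((1 - exp(-b * excess)) * deaths)

an_point <- attr_number(beta)

# Resample the coefficient from its approximate normal distribution.
an_sim <- vapply(rnorm(1000, mean = beta, sd = se), attr_number, numeric(1))
an_ci  <- quantile(an_sim, c(0.025, 0.975))
```

The empirical 2.5th and 97.5th percentiles of the simulated values serve as the interval bounds.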
Create attributable estimates tables
Description
Aggregate tables of attributable numbers, rates and fractions for total, yearly and monthly by geography and national level.
Usage
hc_attr_tables(attr_list, country = "National", meta_analysis = FALSE)
Arguments
attr_list |
A list containing attributable numbers per geography. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
meta_analysis |
Boolean. Whether to perform a meta-analysis. Defaults to FALSE. |
Value
res_attr_tot: Dataframe. Total attributable fractions, numbers and rates for each geography over the whole time series.
attr_yr_list: List. Dataframes containing yearly estimates of attributable fractions, numbers and rates by geography.
attr_mth_list: List. Dataframes containing total attributable fractions, numbers and rates by calendar month and geography.
Create cross-basis matrix
Description
Creates a cross-basis matrix of the lag-response and exposure-response functions, for each geography.
Usage
hc_create_crossbasis(
df_list,
var_fun = "bs",
var_degree = 2,
var_per = c(10, 75, 90),
lagn = 21,
lagnk = 3,
dfseas = 8
)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography. |
var_fun |
Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'. |
var_degree |
Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic). |
var_per |
Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(10,75,90). |
lagn |
Integer. Number of days in the lag period. Defaults to 21. (see dlnm::crossbasis). |
lagnk |
Integer. Number of knots in lag function. Defaults to 3. (see dlnm::logknots). |
dfseas |
Integer. Degrees of freedom for seasonality. Defaults to 8. |
Value
'cb_list'. A list of cross-basis matrices by geography.
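The exposure-response part of such a basis can be sketched with base R alone. The sketch assumes var_per is interpreted as percentiles of the observed temperature distribution, uses simulated temperatures, and builds only the exposure dimension with splines::bs; combining it with the lag dimension is what dlnm::crossbasis does.

```r
set.seed(7)
temp <- rnorm(365, mean = 15, sd = 6)   # simulated daily temperatures
var_per    <- c(10, 75, 90)
var_degree <- 2

# Internal knots at the requested percentiles (assumed interpretation of var_per).
knots <- quantile(temp, var_per / 100, na.rm = TRUE)

# Quadratic B-spline basis for the exposure-response dimension only.
basis <- splines::bs(temp, knots = knots, degree = var_degree)
```

With three internal knots and degree 2, the basis has five columns, one per spline coefficient.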
Produce check results of model combinations
Description
Runs every combination of model based on user selected additional independent variables and returns model diagnostic checks for each.
Usage
hc_model_combo_res(df_list, cb_list, independent_cols = NULL, dfseas = 8)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography. |
cb_list |
List of cross-basis matrices from hc_create_crossbasis function. |
independent_cols |
Character/list. Additional independent variables to test in model validation as confounders. Defaults to NULL. |
dfseas |
Integer. Degrees of freedom for seasonality. Defaults to 8. |
Value
qaic_results: A dataframe of QAIC and dispersion metrics for each model combination.
residuals_list: List. Residuals for each model combination.
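QAIC is not returned by stats::AIC for quasi-likelihood fits, so it is usually computed by hand. The sketch below shows one common formulation, QAIC = -2 * logLik_Poisson + 2 * k * phi, on simulated overdispersed counts; qaic_sketch is a hypothetical helper and may differ in detail from the package's calculation.

```r
# Hypothetical helper: QAIC for a quasi-Poisson glm.
qaic_sketch <- function(model) {
  # Poisson log-likelihood evaluated at the quasi-Poisson fitted values.
  loglik <- sum(dpois(model$y, model$fitted.values, log = TRUE))
  phi <- summary(model)$dispersion   # estimated overdispersion
  k   <- length(coef(model))         # number of estimated coefficients
  -2 * loglik + 2 * k * phi
}

set.seed(11)
x <- rnorm(300)
y <- rnbinom(300, mu = exp(1 + 0.3 * x), size = 5)   # overdispersed counts

fit_small <- glm(y ~ 1, family = quasipoisson())
fit_full  <- glm(y ~ x, family = quasipoisson())

qaic <- c(small = qaic_sketch(fit_small), full = qaic_sketch(fit_full))
```

Lower values indicate a better trade-off between fit and complexity, so candidate model combinations can be ranked by this quantity.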
Model Validation Assessment
Description
Produces results on QAIC for each model combination, variance inflation factor for each independent variable, ADF test for stationarity, and plots for residuals to assess the models.
Usage
hc_model_validation(
df_list,
cb_list,
independent_cols = NULL,
dfseas = 8,
save_fig = FALSE,
save_csv = FALSE,
output_folder_path = NULL,
seed = NULL
)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography. |
cb_list |
List of cross-basis matrices from hc_create_crossbasis function. |
independent_cols |
Character/list. Additional independent variables to test in model validation as confounders. Defaults to NULL. |
dfseas |
Integer. Degrees of freedom for seasonality. Defaults to 8. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
save_csv |
Boolean. Whether to save the results as a CSV. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
Value
qaic_results: A dataframe of QAIC and dispersion metrics for each model combination and geography.
qaic_summary: A dataframe with the mean QAIC and dispersion metrics for each model combination.
vif_results: A dataframe of variance inflation factors for each independent variable by geography.
vif_summary: A dataframe with the mean variance inflation factors for each independent variable.
adf_results: A dataframe of ADF test results for each geography.
Plot attributable fractions by calendar month - low temperatures
Description
Plot attributable fractions grouped over the whole time series by calendar month to explore seasonality.
Usage
hc_plot_af_cold_monthly(
attr_mth_list,
df_list,
country = "National",
attr_thr_low = 2.5,
save_fig = FALSE,
output_folder_path = NULL
)
Arguments
attr_mth_list |
A list of data frames containing total attributable fractions, numbers and rates by calendar month and geography. |
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
attr_thr_low |
Integer. Percentile at which to define the lower temperature threshold for calculating attributable risk. Defaults to 2.5. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
Value
Plots of attributable fractions by calendar month per geography.
Plot attributable fractions for cold by year
Description
Plot attributable fractions by year and geography with confidence intervals.
Usage
hc_plot_af_cold_yearly(
attr_yr_list,
save_fig = FALSE,
output_folder_path = NULL,
country = "National"
)
Arguments
attr_yr_list |
A list of matrices containing yearly estimates of attributable fractions, numbers and rates by geography. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
Value
Plots of yearly attributable fractions per geography.
Plot attributable fractions by calendar month - high temperatures
Description
Plot attributable fractions grouped over the whole time series by calendar month to explore seasonality.
Usage
hc_plot_af_heat_monthly(
attr_mth_list,
df_list,
country = "National",
attr_thr_high = 97.5,
save_fig = FALSE,
output_folder_path = NULL
)
Arguments
attr_mth_list |
A list of data frames containing total attributable fractions, numbers and rates by calendar month and geography. |
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
attr_thr_high |
Integer. Percentile at which to define the upper temperature threshold for calculating attributable risk. Defaults to 97.5. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
Value
Plots of attributable fractions by calendar month per geography.
Plot attributable fractions for heat by year
Description
Plot attributable fractions by year and geography with confidence intervals.
Usage
hc_plot_af_heat_yearly(
attr_yr_list,
save_fig = FALSE,
output_folder_path = NULL,
country = "National"
)
Arguments
attr_yr_list |
A list of matrices containing yearly estimates of attributable fractions, numbers and rates by geography. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
Value
Plots of yearly attributable fractions per geography.
Plot attributable rates by calendar month - low temperatures
Description
Plot attributable rates grouped over the whole time series by calendar month to explore seasonality.
Usage
hc_plot_ar_cold_monthly(
attr_mth_list,
df_list,
country = "National",
attr_thr_low = 2.5,
save_fig = FALSE,
output_folder_path = NULL
)
Arguments
attr_mth_list |
A list of data frames containing total attributable fractions, numbers and rates by calendar month and geography. |
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
attr_thr_low |
Integer. Percentile at which to define the lower temperature threshold for calculating attributable risk. Defaults to 2.5. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
Value
Plots of attributable rates by calendar month per geography.
Plot attributable rates by year - low temperatures
Description
Plot attributable rates by year and geography with confidence intervals.
Usage
hc_plot_ar_cold_yearly(
attr_yr_list,
save_fig = FALSE,
output_folder_path = NULL,
country = "National"
)
Arguments
attr_yr_list |
A list of matrices containing yearly estimates of attributable fractions, numbers and rates by geography. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
Value
Plots of yearly attributable rates per geography.
Plot attributable rates by calendar month - high temperatures
Description
Plot attributable rates grouped over the whole time series by calendar month to explore seasonality.
Usage
hc_plot_ar_heat_monthly(
attr_mth_list,
df_list,
country = "National",
attr_thr_high = 97.5,
save_fig = FALSE,
output_folder_path = NULL
)
Arguments
attr_mth_list |
A list of data frames containing total attributable fractions, numbers and rates by calendar month and geography. |
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
attr_thr_high |
Integer. Percentile at which to define the upper temperature threshold for calculating attributable risk. Defaults to 97.5. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
Value
Plots of attributable rates by calendar month per geography.
Plot attributable rates by year - high temperatures
Description
Plot attributable rates by year and geography with confidence intervals.
Usage
hc_plot_ar_heat_yearly(
attr_yr_list,
save_fig = FALSE,
output_folder_path = NULL,
country = "National"
)
Arguments
attr_yr_list |
A list of matrices containing yearly estimates of attributable fractions, numbers and rates by geography. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
Value
Plots of yearly attributable rates per geography.
Plot total attributable fractions and rates - low temperatures
Description
Plot total attributable fractions and rates over the whole time series by geography.
Usage
hc_plot_attr_cold_totals(
df_list,
res_attr_tot,
save_fig = FALSE,
output_folder_path = NULL,
country = "National"
)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography. |
res_attr_tot |
Matrix containing total attributable fractions, numbers and rates for each geography over the whole time series. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
Value
Plots of total attributable fractions and rates by geography
Plot total attributable fractions and rates - high temperatures
Description
Plot total attributable fractions and rates over the whole time series by geography.
Usage
hc_plot_attr_heat_totals(
df_list,
res_attr_tot,
save_fig = FALSE,
output_folder_path = NULL,
country = "National"
)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography. |
res_attr_tot |
Matrix containing total attributable fractions, numbers and rates for each geography over the whole time series. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
Value
Plots of total attributable fractions and rates by geography.
Plot statistical power for temperature mortality analysis
Description
Plots the power statistic for each reference temperature at and above the attributable risk threshold for each geography.
Usage
hc_plot_power(
power_list_high,
power_list_low,
save_fig = FALSE,
output_folder_path = NULL,
country = "National"
)
Arguments
power_list_high |
List. A list containing power information for high temperatures by geography. |
power_list_low |
List. A list containing power information for low temperatures by geography. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_folder_path |
Character. Path to folder where plots should be saved. Defaults to NULL. |
country |
Character. The name of the country for national level estimates. Defaults to 'National'. |
Value
Plots of power by temperature for the attributable thresholds and above for each geography.
Plot results of relative risk analysis
Description
Plots cumulative lag exposure-response function with histogram of temperature distribution for each geography.
Usage
hc_plot_rr(
df_list,
pred_list,
attr_thr_high = 97.5,
attr_thr_low = 2.5,
minpercgeog_,
country = "National",
save_fig = FALSE,
output_folder_path = NULL
)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography. |
pred_list |
A list containing predictions from the model by geography. |
attr_thr_high |
Integer. Percentile at which to define the upper temperature threshold for calculating attributable risk. Defaults to 97.5. |
attr_thr_low |
Integer. Percentile at which to define the lower temperature threshold for calculating attributable risk. Defaults to 2.5. |
minpercgeog_ |
Vector. Percentile of minimum mortality temperature for each geography. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
Value
Plots of cumulative lag exposure-response function with histogram of temperature distribution for each geography.
Run predictions from model
Description
Use model to run predictions. Predictions can be produced for a single input geography, or multiple disaggregated geographies.
Usage
hc_predict_subnat(
df_list,
var_fun = "bs",
var_per = c(10, 75, 90),
var_degree = 2,
mintempgeog_,
blup,
coef_,
vcov_,
meta_analysis = FALSE
)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography. |
var_fun |
Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'. |
var_per |
Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(10, 75, 90). |
var_degree |
Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic). |
mintempgeog_ |
Vector. Percentile of minimum mortality temperature for each geography. |
blup |
A list. BLUP (best linear unbiased predictions) from the meta-analysis model for each geography. |
coef_ |
A matrix of coefficients for the reduced model. |
vcov_ |
A list. Covariance matrices for each geography for the reduced model. |
meta_analysis |
Boolean. Whether to perform a meta-analysis. Defaults to FALSE. |
Value
'pred_list'. A list containing predictions by geography.
Define and run quasi-Poisson regression with DLNM
Description
Fits a quasi-Poisson case-crossover with a distributed lag non-linear model.
Usage
hc_quasipoisson_dlnm(df_list, control_cols = NULL, cb_list, dfseas = 8)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography. |
control_cols |
List. Confounders to include in the final model adjustment. Defaults to NULL. |
cb_list |
List of cross-basis matrices from hc_create_crossbasis function. |
dfseas |
Integer. Degrees of freedom for seasonality. Defaults to 8. |
Value
'model_list'. List containing models by geography.
Read temperature-related mortality indicator data
Description
Reads in data and geography names for analysis from a CSV file.
Usage
hc_read_data(
input_csv_path,
dependent_col,
date_col,
region_col,
temperature_col,
population_col
)
Arguments
input_csv_path |
Path to a CSV containing a daily time series of health outcome and climate data per geography. |
dependent_col |
Character. Name of the column in the dataframe containing the dependent health outcome variable e.g. deaths. |
date_col |
Character. Name of the column in the dataframe containing the date. |
region_col |
Character. Name of the column in the dataframe that contains the geography name(s). |
temperature_col |
Character. Name of the column in the dataframe that contains the temperature values. |
population_col |
Character. Name of the column in the dataframe that contains the population estimate per geography. |
Value
'df_list'. A list of dataframes for each geography with formatted and renamed columns.
Produce cumulative relative risk results of analysis
Description
Produces cumulative relative risk and confidence intervals from analysis.
Usage
hc_rr_results(
pred_list,
df_list,
minpercgeog_,
attr_thr_high = 97.5,
attr_thr_low = 2.5
)
Arguments
pred_list |
A list containing predictions from the model by geography. |
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography. |
minpercgeog_ |
Vector. Percentile of minimum mortality temperature for each geography. |
attr_thr_high |
Integer. Percentile at which to define the upper temperature threshold for calculating attributable risk. Defaults to 97.5. |
attr_thr_low |
Integer. Percentile at which to define the lower temperature threshold for calculating attributable risk. Defaults to 2.5. |
Value
'rr_results'. Dataframe containing cumulative relative risk and confidence intervals from analysis.
Save results of analysis
Description
Saves a CSV file of cumulative relative risk and confidence intervals.
Usage
hc_save_results(
rr_results,
res_attr_tot,
attr_yr_list,
attr_mth_list,
power_list_high,
power_list_low,
output_folder_path = NULL
)
Arguments
rr_results |
Dataframe containing cumulative relative risk and confidence intervals from analysis. |
res_attr_tot |
Matrix containing total attributable fractions, numbers and rates for each geography over the whole time series. |
attr_yr_list |
A list of matrices containing yearly estimates of attributable fractions, numbers and rates by geography. |
attr_mth_list |
A list of data frames containing total attributable fractions, numbers and rates by calendar month and geography. |
output_folder_path |
Path to folder where results should be saved. Defaults to NULL. |
Install the INLA Package from Its Official Repository
Description
This function installs the INLA package from its official repository
at https://inla.r-inla-download.org/R/stable/. On Windows, it checks
whether Rtools is available and installs the official binary package directly.
Usage
install_INLA(os = .Platform$OS.type)
Arguments
os |
The current operating system. Defaults to |
Details
On Windows systems, the function verifies that Rtools is installed using
pkgbuild::has_build_tools(). If Rtools is missing, it displays a warning
and aborts the installation. The function then installs the matching Windows
binary package from the official INLA repository.
On non-Windows systems, the package is installed normally from the repository.
Value
Invisibly returns NULL. The function is called for its side effect.
Examples
## Not run:
install_INLA()
## End(Not run)
Install the terra Package from the CRAN Archive
Description
This function installs the terra package at version 1.8-60 from the
CRAN archive.
Usage
install_terra(os = .Platform$OS.type)
Arguments
os |
The current operating system. Defaults to |
Details
On Windows systems, the function verifies that Rtools is installed using
pkgbuild::has_build_tools(). If Rtools is missing, it displays a warning
and aborts the installation. The function then forces installation from source.
Value
Invisibly returns NULL. The function is called for its side effect.
Examples
## Not run:
install_terra()
## End(Not run)
Check if an error is a climate_error
Description
Utility function to check if a caught condition is a typed climate error.
Usage
is_climate_error(error)
Arguments
error |
A condition object |
Value
TRUE if the error inherits from "climate_error", FALSE otherwise.
Examples
tryCatch({
stop("example error")
}, error = function(e) {
if (is_climate_error(e)) {
# Handle structured error
} else {
# Handle untyped error
}
})
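For context, a typed condition of this kind can be created and detected as sketched below. Only the class name "climate_error" comes from the text above; the constructor and the is_typed helper are hypothetical and stand in for the package's own functions.

```r
# Hypothetical constructor for a condition carrying the "climate_error" class.
new_climate_error_sketch <- function(message) {
  structure(
    list(message = message, call = NULL),
    class = c("climate_error", "error", "condition")
  )
}

# Stand-in for is_climate_error(): a class check on the caught condition.
is_typed <- function(e) inherits(e, "climate_error")

typed   <- tryCatch(stop(new_climate_error_sketch("bad input")),
                    error = function(e) e)
untyped <- tryCatch(stop("example error"), error = function(e) e)
```

Here is_typed(typed) is TRUE and is_typed(untyped) is FALSE, which is the distinction the handler branches on.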
Join monthly PM2.5 estimates with attributable risk data by region and time
Description
Aggregates PM2.5 data to monthly averages by region and joins it with attributable risk data using year, month, and region as keys.
Usage
join_ar_and_pm_monthly(pm_data, an_ar_data)
Arguments
pm_data |
A data frame with columns: year, month, region, mean_PM. Represents monthly PM2.5 estimates. |
an_ar_data |
A data frame with columns: year, month, region. Represents attributable risk or fraction data to be joined with PM2.5 estimates. |
Value
A data frame with monthly average PM2.5 values joined to attributable risk data.
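The aggregate-then-join step can be sketched in base R as follows. The data are invented, and the attributable_number column is a hypothetical placeholder for whatever estimates an_ar_data carries.

```r
pm_data <- data.frame(
  year = 2024, month = c(1, 1, 2, 2), region = "North",
  mean_PM = c(4, 6, 10, 14)
)
an_ar_data <- data.frame(
  year = 2024, month = c(1, 2), region = "North",
  attributable_number = c(3.2, 5.1)   # hypothetical column, illustrative values
)

# Average PM2.5 within each year, month and region.
pm_monthly <- aggregate(mean_PM ~ year + month + region,
                        data = pm_data, FUN = mean)

# Left join onto the attributable risk table using the three keys.
joined <- merge(an_ar_data, pm_monthly,
                by = c("year", "month", "region"), all.x = TRUE)
```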
Join health and climate data
Description
Joins a daily time series of wildfire PM2.5 data with a daily time series of health data.
Usage
join_health_and_climate_data(
climate_data,
health_data,
region_col = "region",
date_col = "date",
exposure_col = "mean_PM"
)
Arguments
climate_data |
Character. Dataframe containing a daily time series of climate data, which may be disaggregated by region. |
health_data |
Character. Path to a CSV file containing a daily time series of data for a particular health outcome, which may be disaggregated by region. |
region_col |
Character. Name of the region column in both datasets. Defaults to 'region' |
date_col |
Character. Name of the date column in both datasets. Defaults to 'date' |
exposure_col |
Character. Name of the column in the climate data containing the exposure values (e.g., PM2.5) in kilograms. Defaults to 'mean_PM'. |
Value
Dataframe containing a daily time series of the joined climate and health data.
Append the units to a column label.
Description
Append the units to a column label.
Usage
label_with_unit(col, units)
Arguments
col |
Character. The column name. |
units |
Named Character vector. A vector of units (str) that map to columns. |
Value
The new column label containing units (if col in units).
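A lookup of this kind might look like the sketch below. The "name (unit)" output format is an assumption, and label_with_unit_sketch is a hypothetical stand-in for the package function.

```r
# Hypothetical stand-in; the "name (unit)" format is assumed.
label_with_unit_sketch <- function(col, units) {
  if (col %in% names(units)) paste0(col, " (", units[[col]], ")") else col
}

units <- c(tmax = "deg C", pm25 = "ug/m3")

with_unit    <- label_with_unit_sketch("pm25", units)
without_unit <- label_with_unit_sketch("deaths", units)   # no unit registered
```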
Read in climate, environmental and health data and rename columns
Description
Reads in a CSV file containing a daily time series of climate, environmental and health data and renames the columns to standardised names. The function also creates year, month, day, and day-of-week columns derived from the date.
Usage
load_air_pollution_data(
data_path,
date_col = "date",
region_col = "region",
pm25_col = "pm25",
deaths_col = "deaths",
population_col = "population",
humidity_col = "humidity",
precipitation_col = "precipitation",
tmax_col = "tmax",
wind_speed_col = "wind_speed",
categorical_others = NULL,
continuous_others = NULL
)
Arguments
data_path |
Path to a CSV file containing a daily time series of data. |
date_col |
Character. Name of date column in the dataframe with format YYYY-MM-DD. Defaults to "date". |
region_col |
Character. Name of region column in the dataframe. Defaults to "region". |
pm25_col |
Character. Name of PM2.5 column in the dataframe. Defaults to "pm25". |
deaths_col |
Character. Name of all-cause mortality column in the dataframe (note that the deaths_col variable has value 1 for each recorded death). Defaults to "deaths". |
population_col |
Character. Name of population column in the dataframe. This is REQUIRED for calculating Attributable Rate (AR). Defaults to "population". |
humidity_col |
Character. Name of humidity column in the dataframe. Defaults to "humidity". |
precipitation_col |
Character. Name of precipitation column in the dataframe. Defaults to "precipitation". |
tmax_col |
Character. Name of maximum temperature column in the dataframe. Defaults to "tmax". |
wind_speed_col |
Character. Name of wind speed column in the dataframe. Defaults to "wind_speed". |
categorical_others |
Optional. Character vector of additional categorical variables (e.g., "sex", "age_group"). Defaults to NULL. |
continuous_others |
Optional. Character vector of additional continuous variables (e.g., "tmean"). Defaults to NULL. |
Value
Dataframe with formatted data and standardised column names.
Read in and format climate data
Description
Reads in a monthly time series of climate data, renames columns and creates lag variables for spatiotemporal and DLNM analysis. The climate data should start one year before the first year of the health data so that the lag variables can be calculated.
Usage
load_and_process_climatedata(
climate_data_path,
district_col,
year_col,
month_col,
tmin_col,
tmean_col,
tmax_col,
rainfall_col,
r_humidity_col,
runoff_col = NULL,
ndvi_col = NULL,
spi_col = NULL,
max_lag
)
Arguments
climate_data_path |
Path to a csv file containing a monthly time series of data for climate variables, which may be disaggregated by district. |
district_col |
Character. Name of the column in the dataframe that contains the region names. |
year_col |
Character. Name of the column in the dataframe that contains the year. |
month_col |
Character. Name of the column in the dataframe that contains the month. |
tmin_col |
Character. Name of the column in the dataframe that contains the minimum temperature data. |
tmean_col |
Character. Name of the column in the dataframe that contains the average temperature. |
tmax_col |
Character. Name of the column in the dataframe that contains the maximum temperature. |
rainfall_col |
Character. Name of the column in the dataframe that contains the cumulative monthly rainfall. |
r_humidity_col |
Character. Name of the column in the dataframe that contains the relative humidity. |
runoff_col |
Character. Name of the column in the dataframe that contains the monthly runoff water data. Defaults to NULL. |
ndvi_col |
Character. Name of column containing the Normalized Difference Vegetation Index (ndvi) data. Defaults to NULL. |
spi_col |
Character. Name of the column in the dataframe that contains the standardized precipitation index. Defaults to NULL. |
max_lag |
Numeric. Maximum lag to be considered for the delayed effect. It should be between 2 and 4. Defaults to 4. |
Value
Climate dataframe with formatted and renamed columns, plus the lag variables.
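A minimal sketch of how monthly lag variables could be built within each district, using base R only. The data, the tmax_lag1 / tmax_lag2 column names, and the lag_vec helper are hypothetical; the package may name and compute its lag columns differently.

```r
# Hypothetical monthly climate data for two districts.
climate <- data.frame(
  district = rep(c("A", "B"), each = 4),
  year     = 2019,
  month    = rep(9:12, times = 2),
  tmax     = c(30, 31, 29, 28, 25, 26, 27, 24)
)

# Lags are only meaningful on data sorted in time within each district.
climate <- climate[order(climate$district, climate$year, climate$month), ]

# Shift a vector down by k positions, padding the start with NA.
lag_vec <- function(x, k) c(rep(NA, k), head(x, -k))

max_lag <- 2
for (k in seq_len(max_lag)) {
  # ave() applies the shift separately within each district,
  # so values never leak from one district into another.
  climate[[paste0("tmax_lag", k)]] <-
    ave(climate$tmax, climate$district, FUN = function(x) lag_vec(x, k))
}
climate
```

The leading NA values in each district show why the climate series needs to begin before the health series: the first max_lag months have no complete lag history.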
Read in and format health data - diseases cases type
Description
Read in a csv file containing a monthly time series of health outcomes and population data. Renames columns and creates time variables for spatiotemporal analysis.
Usage
load_and_process_data(
health_data_path,
region_col,
district_col,
date_col = NULL,
year_col = NULL,
month_col = NULL,
case_col,
case_type,
tot_pop_col
)
Arguments
health_data_path |
Path to a csv file containing a monthly time series of data for the health outcome case type, which may be disaggregated by age (under-five or over-five cases), and by region and district. |
region_col |
Character. Name of the column in the dataframe that contains the region names. |
district_col |
Character. Name of the column in the dataframe that contains the district names. |
date_col |
Character. Name of the column in the dataframe that contains the date. Defaults to NULL. |
year_col |
Character. Name of the column in the dataframe that contains the year. Defaults to NULL. |
month_col |
Character. Name of the column in the dataframe that contains the month. Defaults to NULL. |
case_col |
Character. Name of the column in the dataframe that contains the disease cases to be considered. |
case_type |
Character. The type of disease that the case column refers to. Must be one of 'diarrhea' or 'malaria'. |
tot_pop_col |
Character. Name of the column in the dataframe that contains the total population. |
Value
A dataframe with formatted and renamed columns.
Read in and format country map data
Description
Reads in a shapefile, renames columns and creates the adjacency matrix for spatiotemporal analysis.
Usage
load_and_process_map(
map_path,
region_col,
district_col,
geometry_col,
output_dir = NULL
)
Arguments
map_path |
The path to the country's geographic data (shape file "sf" data). |
region_col |
Character. The region column in the dataframe. |
district_col |
Character. The district column in the dataframe. |
geometry_col |
Character. The geometry column in the dataframe. |
output_dir |
Character. The path to which the processed adjacency (neighbouring) matrix and the map graph are written. Defaults to NULL. |
Value
'map' The processed map
'nb.map'
'graph_file'
Load wildfire and health data
Description
Loads a dataframe containing a daily time series of health and climate data, which may be disaggregated by region.
Usage
load_wildfire_data(
health_path,
ncdf_path,
shp_path,
join_wildfire_data = TRUE,
date_col,
region_col,
shape_region_col = NULL,
mean_temperature_col,
health_outcome_col,
population_col = NULL,
rh_col = NULL,
wind_speed_col = NULL,
pm_2_5_col = NULL
)
Arguments
health_path |
Path to a CSV file containing a daily time series of data for a particular health outcome, which may be disaggregated by region. If this does not include a column with wildfire-related PM2.5, use join_wildfire_data = TRUE to join these data. |
ncdf_path |
Path to a NetCDF file containing a daily time series of gridded wildfire-related PM2.5 concentration data. |
shp_path |
Path to a shapefile (.shp) of the geographical boundaries for which to extract mean values of wildfire-related PM2.5. |
join_wildfire_data |
Boolean. If TRUE, a daily time series of wildfire-related PM2.5 concentration is joined to the health data. If FALSE, the data set is loaded without any additional joins. |
date_col |
Character. Name of the column in the dataframe that contains the date. |
region_col |
Character. Name of the column in the dataframe that contains the region names. |
shape_region_col |
Character. Name of the column in the shapefile dataframe that contains the region names. |
mean_temperature_col |
Character. Name of the column in the dataframe that contains the mean temperature column. |
health_outcome_col |
Character. Name of the column in the dataframe that contains the health outcome count column (e.g. number of deaths, hospital admissions) |
population_col |
Character. Name of the column in the dataframe that
contains the population data. Defaults to NULL. If omitted, a |
rh_col |
Character. Name of the column in the dataframe that contains daily relative humidity values. Defaults to NULL. |
wind_speed_col |
Character. Name of the column in the dataframe that contains the daily wind speed values. Defaults to NULL. |
pm_2_5_col |
Character. The name of the column containing PM2.5 values in micrograms. This is only required if health data isn't joined. Defaults to NULL. |
Value
Dataframe containing a daily time series of climate and health data.
Run Full Malaria-Climate Analysis Pipeline (malaria cases attributable to extreme rainfall and extreme temperature)
Description
The malaria_do_analysis() function executes the complete workflow for analyzing
the association between malaria cases and climate variables. It integrates
health, climate, and spatial data; fits spatio-temporal models using INLA;
and generates a suite of diagnostic and inferential outputs, including plots
and attributable risk estimates.
Usage
malaria_do_analysis(
health_data_path,
climate_data_path,
map_path,
region_col,
district_col,
date_col = NULL,
year_col,
month_col,
case_col,
tot_pop_col,
tmin_col,
tmean_col,
tmax_col,
rainfall_col,
r_humidity_col,
runoff_col,
geometry_col,
spi_col = NULL,
ndvi_col = NULL,
max_lag = 2,
nk = 2,
basis_matrices_choices,
inla_param,
param_term,
level,
param_threshold = 1,
filter_year = NULL,
family = "nbinomial",
group_by_year = FALSE,
cumulative = FALSE,
config = FALSE,
save_csv = FALSE,
save_model = FALSE,
save_fig = FALSE,
output_dir = NULL
)
Arguments
health_data_path |
Character. Path to the processed health data file. |
climate_data_path |
Character. Path to the processed climate data file. |
map_path |
Character. Path to the spatial data file (e.g., shapefile). |
region_col |
Character. Column name for the region variable. |
district_col |
Character. Column name for the district variable. |
date_col |
Character (optional). Column name for the date variable. Defaults to NULL. |
year_col |
Character. Column name for the year variable. |
month_col |
Character. Column name for the month variable. |
case_col |
Character. Column name for malaria case counts. |
tot_pop_col |
Character. Column name for total population. |
tmin_col |
Character. Column name for minimum temperature. |
tmean_col |
Character. Column name for mean temperature. |
tmax_col |
Character. Column name for maximum temperature. |
rainfall_col |
Character. Column name for cumulative monthly rainfall. |
r_humidity_col |
Character. Column name for relative humidity. |
runoff_col |
Character. Column name for monthly runoff data. |
geometry_col |
Character. Column name of the geometry column in the
shapefile (usually |
spi_col |
Character (optional). Column name for the Standardized Precipitation Index (SPI). Defaults to NULL. |
ndvi_col |
Character (optional). Column name for the Normalized Difference Vegetation Index (NDVI). Defaults to NULL. |
max_lag |
Numeric. Maximum temporal lag to include in the distributed
lag model (e.g., |
nk |
Numeric. Number of internal knots for the natural spline of
each predictor, controlling its flexibility: |
basis_matrices_choices |
Character vector. Specifies which climate variables
to include in the basis matrix (e.g., |
inla_param |
Character vector. Specifies exposure variables included in
the INLA model (e.g., |
param_term |
Character or vector. Exposure variable(s) of primary interest
for relative risk and attribution (e.g., |
level |
Character. Spatial disaggregation level; must be one of
|
param_threshold |
Numeric. Threshold above which exposure is considered "attributable." Defaults to 1. |
filter_year |
Integer or vector (optional). Year(s) to filter the data by. Defaults to NULL. |
family |
Character. Probability distribution for the outcome variable.
Options include |
group_by_year |
Logical. Whether to group attributable metrics by year. Defaults to FALSE. |
cumulative |
Boolean. If TRUE, plot and save the cumulative risk across all years for the specific exposure at region and district level. Defaults to FALSE. |
config |
Logical. Whether to enable additional INLA model configurations. Defaults to FALSE. |
save_csv |
Logical. If |
save_model |
Logical. If |
save_fig |
Logical. If |
output_dir |
Character. Directory where output files (plots, datasets, maps) are saved. Defaults to NULL. |
Value
A named list containing:
- inla_result: Fitted INLA model object and summaries.
- plot_malaria, plot_tmax, plot_rainfall: Exploratory time-series plots.
- reff_plot_monthly: Monthly random effects plot.
- reff_plot_yearly: Yearly spatial random effects plot.
- contour_plot: Exposure-response contour plot.
- rr_map_plot: Spatial relative risk map.
- rr_plot, rr_df: Relative risk plot and associated data.
- attr_frac_num: Attributable risk summary table.
- plot_AR_num, plot_AR_frac, plot_AR_per_100k: Plots of attributable number, fraction, and rate.
Process data for national analysis
Description
Aggregates the regional data to national level and builds the corresponding cross-basis.
Usage
mh_add_national_data(
df_list,
pop_list,
var_fun = "bs",
var_per = c(25, 50, 75),
var_degree = 2,
lag_fun = "strata",
lag_breaks = 1,
lag_days = 2,
country = "National",
cb_list,
mm,
minpercreg
)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region. |
pop_list |
List of population totals by year and region. |
var_fun |
Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'. |
var_per |
Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(25,50,75). |
var_degree |
Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic). |
lag_fun |
Character. Exposure function for arglag (see dlnm::crossbasis). Defaults to 'strata'. |
lag_breaks |
Integer. Internal cut-off point defining the strata for arglag (see dlnm::crossbasis). Defaults to 1. |
lag_days |
Integer. Maximum lag. Defaults to 2. (see dlnm::crossbasis). |
country |
Character. Name of country for national level estimates. |
cb_list |
A list of cross-basis matrices by region. |
mm |
A model object. A multivariate meta-analysis model. |
minpercreg |
Vector. Percentile of minimum suicide temperature for each region. |
Value
- df_list: List. A list of data frames for each region and nation.
- cb_list: List. A list of cross-basis matrices by region and nation.
- minpercreg: Vector. Percentile of minimum suicide temperature for each region and nation.
- mmpredall: List. A list of national coefficients and covariance matrices.
Estimate attributable numbers
Description
Estimate attributable numbers for each region and confidence intervals using Monte Carlo simulations.
Usage
mh_attr(df_list, cb_list, pred_list, minpercreg, attr_thr = 97.5)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region. |
cb_list |
A list of cross-basis matrices by region. |
pred_list |
A list containing predictions from the model by region. |
minpercreg |
Vector. Percentile of minimum suicide temperature for each region. |
attr_thr |
Numeric. Percentile at which to define the temperature threshold for calculating attributable risk. Defaults to 97.5. |
Value
A list containing attributable numbers per region
Create attributable estimates tables
Description
Aggregates tables of attributable numbers, rates and fractions (total, yearly and by calendar month) by region and nation.
Usage
mh_attr_tables(attr_list, country = "National", meta_analysis = FALSE)
Arguments
attr_list |
A list containing attributable numbers per region. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
meta_analysis |
Boolean. Whether to perform a meta-analysis. Defaults to FALSE. |
Value
- res_attr_tot: Dataframe. Total attributable fractions, numbers and rates for each area over the whole time series.
- attr_yr_list: List. Dataframes containing yearly estimates of attributable fractions, numbers and rates by area.
- attr_mth_list: List. Dataframes containing total attributable fractions, numbers and rates by calendar month and area.
Quasi-Poisson Case-Crossover model with DLNM
Description
Fits a quasi-Poisson case-crossover model with a distributed lag non-linear model (DLNM).
Usage
mh_casecrossover_dlnm(df_list, control_cols = NULL, cb_list)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region. |
control_cols |
A list of confounders to include in the final model adjustment. Defaults to NULL if none. |
cb_list |
List of cross-basis matrices from create_crossbasis function. |
Value
List containing models by region
Create cross-basis matrix
Description
Creates a cross-basis matrix for each region
Usage
mh_create_crossbasis(
df_list,
var_fun = "bs",
var_degree = 2,
var_per = c(25, 50, 75),
lag_fun = "strata",
lag_breaks = 1,
lag_days = 2
)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region. |
var_fun |
Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'. |
var_degree |
Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic). |
var_per |
Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(25,50,75). |
lag_fun |
Character. Exposure function for arglag (see dlnm::crossbasis). Defaults to 'strata'. |
lag_breaks |
Integer. Internal cut-off point defining the strata for arglag (see dlnm::crossbasis). Defaults to 1. |
lag_days |
Integer. Maximum lag. Defaults to 2. (see dlnm::crossbasis). |
Value
A list of cross-basis matrices by region
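A minimal sketch of how the default arguments above map onto a single call to dlnm::crossbasis for one region. The simulated temperature series is hypothetical, and converting var_per percentiles to knots with quantile() is an assumption about how the package uses them.

```r
library(dlnm)

set.seed(1)
temp <- rnorm(365, mean = 15, sd = 6)  # hypothetical daily temperature series

var_per  <- c(25, 50, 75)  # percentiles used as internal knot positions
lag_days <- 2              # maximum lag

cb <- crossbasis(
  temp,
  lag    = lag_days,
  # Quadratic B-spline over the exposure dimension.
  argvar = list(fun = "bs",
                knots = quantile(temp, var_per / 100, na.rm = TRUE),
                degree = 2),
  # Step function over the lag dimension with one cut-off at lag 1.
  arglag = list(fun = "strata", breaks = 1)
)

dim(cb)
```

The first lag_days rows of the cross-basis are NA because their lag history is incomplete, so those days drop out of any model fitted with it.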
Produce check results of model combinations
Description
Fits a model for every combination of the user-selected additional independent variables and returns model diagnostic checks for each.
Usage
mh_model_combo_res(df_list, cb_list, independent_cols = NULL)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region. |
cb_list |
List of cross-basis matrices from create_crossbasis function. |
independent_cols |
Additional independent variables to test in model validation as confounders. |
Value
- qaic_results: A dataframe of QAIC and dispersion metrics for each model combination.
- residuals_list: A list. Residuals for each model combination.
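For orientation, one common formulation of QAIC for a quasi-Poisson GLM is sketched below. qaic_sketch is a hypothetical helper, and whether the package uses this exact form (rather than a rescaled variant) is an assumption; the simulated data are for illustration only.

```r
# QAIC: Poisson log-likelihood scaled by the estimated dispersion,
# plus a penalty of 2 per estimated coefficient.
qaic_sketch <- function(model) {
  phi    <- summary(model)$dispersion
  loglik <- sum(dpois(model$y, lambda = model$fitted.values, log = TRUE))
  k      <- length(coef(model))
  -2 * loglik / phi + 2 * k
}

set.seed(42)
n <- 200
x <- rnorm(n)
z <- rnorm(n)                       # unrelated candidate confounder
y <- rpois(n, lambda = exp(0.5 + 0.3 * x))

m1 <- glm(y ~ x,     family = quasipoisson())
m2 <- glm(y ~ x + z, family = quasipoisson())

qaic_values <- c(m1 = qaic_sketch(m1), m2 = qaic_sketch(m2))
qaic_values
```

Lower values indicate a better trade-off between fit and complexity, which is how the candidate model combinations would typically be compared.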
Model Validation Assessment
Description
Produces results on QAIC for each model combination, variance inflation factor for each independent variable, and plots for residuals to assess the models
Usage
mh_model_validation(
df_list,
cb_list,
independent_cols = NULL,
save_fig = FALSE,
save_csv = FALSE,
output_folder_path = NULL,
seed = NULL
)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region. |
cb_list |
List of cross-basis matrices from create_crossbasis function. |
independent_cols |
Additional independent variables to test in model validation as confounders. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
save_csv |
Boolean. Whether to save the results as a CSV. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
Value
- qaic_results: A dataframe of QAIC and dispersion metrics for each model combination and geography.
- qaic_summary: A dataframe with the mean QAIC and dispersion metrics for each model combination.
- vif_results: A dataframe. Variance inflation factors for each independent variable by region.
- vif_summary: A dataframe with the mean variance inflation factors for each independent variable.
Plot attributable fractions by calendar month
Description
Plot attributable fractions grouped over the whole time series by calendar month to explore seasonality.
Usage
mh_plot_af_monthly(
attr_mth_list,
df_list,
country = "National",
attr_thr = 97.5,
save_fig = FALSE,
output_folder_path = NULL
)
Arguments
attr_mth_list |
A list of data frames containing total attributable fractions, numbers and rates by calendar month and area. |
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
attr_thr |
Numeric. Percentile at which to define the temperature threshold for calculating attributable risk. Defaults to 97.5. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
Value
Plots of attributable fractions by calendar month per area
Plot attributable fractions by year
Description
Plot attributable fractions by year and area with confidence intervals
Usage
mh_plot_af_yearly(
attr_yr_list,
save_fig = FALSE,
output_folder_path = NULL,
country = "National"
)
Arguments
attr_yr_list |
A list of matrices containing yearly estimates of attributable fractions, numbers and rates by area. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
Value
Plots of yearly attributable fractions per area
Plot attributable rates by calendar month
Description
Plot attributable rates grouped over the whole time series by calendar month to explore seasonality.
Usage
mh_plot_ar_monthly(
attr_mth_list,
df_list,
country = "National",
attr_thr = 97.5,
save_fig = FALSE,
output_folder_path = NULL
)
Arguments
attr_mth_list |
A list of data frames containing total attributable fractions, numbers and rates by calendar month and area. |
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
attr_thr |
Numeric. Percentile at which to define the temperature threshold for calculating attributable risk. Defaults to 97.5. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
Value
Plots of attributable rates by calendar month per area
Plot attributable rates by year
Description
Plot attributable rates by year and area with confidence intervals
Usage
mh_plot_ar_yearly(
attr_yr_list,
save_fig = FALSE,
output_folder_path = NULL,
country = "National"
)
Arguments
attr_yr_list |
A list of matrices containing yearly estimates of attributable fractions, numbers and rates by area. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
Value
Plots of yearly attributable rates per area
Plot total attributable fractions and rates
Description
Plot total attributable fractions and rates over the whole time series by area.
Usage
mh_plot_attr_totals(
df_list,
res_attr_tot,
save_fig = FALSE,
output_folder_path = NULL,
country = "National"
)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region. |
res_attr_tot |
Matrix containing total attributable fractions, numbers and rates for each area over the whole time series. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
Value
Plots of total attributable fractions and rates by area
Plot power
Description
Plots the power statistic for each reference temperature at and above the attributable risk threshold for each area.
Usage
mh_plot_power(
power_list,
save_fig = FALSE,
output_folder_path = NULL,
country = "National"
)
Arguments
power_list |
A list containing power information by area. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
Value
Plots of power by temperature for the attributable threshold and above for each area.
Plot results of relative risk analysis - Mental Health
Description
Plots cumulative lag exposure-response function with histogram of temperature distribution for each region
Usage
mh_plot_rr(
df_list,
pred_list,
attr_thr = 97.5,
minpercreg,
country = "National",
save_fig = FALSE,
output_folder_path = NULL
)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region. |
pred_list |
A list containing predictions from the model by region. |
attr_thr |
Numeric. Percentile at which to define the temperature threshold for calculating attributable risk. Defaults to 97.5. |
minpercreg |
Vector. Percentile of minimum suicide temperature for each area. |
country |
Character. Name of country for national level estimates. Defaults to 'National'. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
Value
Plots of cumulative lag exposure-response function with histogram of temperature distribution for each region
Run regional predictions from model
Description
Use model to run regional predictions
Usage
mh_predict_reg(
df_list,
var_fun = "bs",
var_per = c(25, 50, 75),
var_degree = 2,
minpercreg,
blup,
coef_,
vcov_,
meta_analysis = FALSE
)
Arguments
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region. |
var_fun |
Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'. |
var_per |
Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(25,50,75). |
var_degree |
Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic). |
minpercreg |
Vector. Percentile of minimum suicide temperature for each region. |
blup |
A list. BLUP (best linear unbiased predictions) from the meta-analysis model for each region. |
coef_ |
A matrix of coefficients for the reduced model. |
vcov_ |
A list. Covariance matrices for each region for the reduced model. |
meta_analysis |
Boolean. Whether to perform a meta-analysis. |
Value
A list containing predictions by region
Read in and format data - Mental Health
Description
Reads in a CSV file containing a daily time series of health and climate data, renames columns and creates strata for case-crossover analysis.
Usage
mh_read_and_format_data(
data_path,
date_col,
region_col = NULL,
temperature_col,
health_outcome_col,
population_col
)
Arguments
data_path |
Path to a csv file containing a daily time series of data for a particular health outcome and climate variables, which may be disaggregated by region. |
date_col |
Character. Name of the column in the dataframe that contains the date. |
region_col |
Character. Name of the column in the dataframe that contains the region names. Defaults to NULL. |
temperature_col |
Character. Name of the column in the dataframe that contains the temperature column. |
health_outcome_col |
Character. Name of the column in the dataframe that contains the health outcome count column (e.g. number of deaths, hospital admissions). |
population_col |
Character. Name of the column in the dataframe that contains the population estimate column. |
Value
A list of dataframes with formatted and renamed columns.
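A minimal sketch of a time-stratified design, in which each stratum groups days that share the same region, year, month and day of week. This is a common choice for case-crossover analysis; that the package defines its strata in exactly this way, and the column names used here, are assumptions.

```r
# Hypothetical daily series for one region.
df <- data.frame(
  date   = seq(as.Date("2020-01-01"), as.Date("2020-02-29"), by = "day"),
  region = "North"
)

df$year  <- as.integer(format(df$date, "%Y"))
df$month <- as.integer(format(df$date, "%m"))
df$dow   <- as.integer(format(df$date, "%u"))  # 1 = Monday ... 7 = Sunday

# Days within a stratum act as each other's referents.
df$stratum <- factor(paste(df$region, df$year, df$month, df$dow, sep = ":"))

table(df$stratum)
```

Each stratum here contains four or five days, so every case day is compared with the same weekday in the same month.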
Produce cumulative relative risk results of analysis
Description
Produces cumulative relative risk and confidence intervals from analysis.
Usage
mh_rr_results(pred_list, df_list, attr_thr = 97.5, minpercreg)
Arguments
pred_list |
A list containing predictions from the model by region. |
df_list |
A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region. |
attr_thr |
Numeric. Percentile at which to define the temperature threshold for calculating attributable risk. Defaults to 97.5. |
minpercreg |
Vector. Percentile of minimum suicide temperature for each area. |
Value
Dataframe containing cumulative relative risk and confidence intervals from analysis.
Save results of analysis - Mental Health
Description
Saves a CSV file of cumulative relative risk and confidence intervals.
Usage
mh_save_results(
rr_results,
res_attr_tot,
attr_yr_list,
attr_mth_list,
power_list,
output_folder_path = NULL
)
Arguments
rr_results |
Dataframe containing cumulative relative risk and confidence intervals from analysis. |
res_attr_tot |
Matrix containing total attributable fractions, numbers and rates for each area over the whole time series. |
attr_yr_list |
A list of matrices containing yearly estimates of attributable fractions, numbers and rates by area. |
attr_mth_list |
A list of data frames containing total attributable fractions, numbers and rates by calendar month and area. |
power_list |
A list containing power information by area. |
output_folder_path |
Path to folder where results should be saved. Defaults to NULL. |
Plot relative risk results by region (if available).
Description
Plots relative risk and confidence intervals for each lag value of wildfire-related PM2.5
Usage
plot_RR(
rr_data,
wildfire_lag,
by_region = FALSE,
save_fig = FALSE,
output_folder_path = NULL
)
Arguments
rr_data |
Dataframe of relative risk and confidence intervals for each lag of wildfire-related PM2.5 |
wildfire_lag |
Integer. The maximum number of days for which to plot the lags for wildfire PM2.5. Defaults to 3. |
by_region |
Boolean. Whether to plot relative risk (RR) by region. Defaults to FALSE. |
save_fig |
Boolean. Whether to save the generated plot. Defaults to FALSE. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
Value
Plot of relative risk and confidence intervals for each lag of wildfire-related PM2.5
Core functionality for plotting results of relative risk analysis.
Description
Plots relative risk and confidence intervals for each lag value of wildfire-related PM2.5.
Usage
plot_RR_core(
rr_data,
save_fig = FALSE,
wildfire_lag,
output_folder_path = NULL,
region_name = "All regions",
ylims = NULL
)
Arguments
rr_data |
Dataframe of relative risk and confidence intervals for each lag of wildfire-related PM2.5. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
wildfire_lag |
Integer. The maximum number of days for which to plot the lags for wildfire PM2.5. Defaults to 3. |
output_folder_path |
Path to folder where plots should be saved. Defaults to NULL. |
region_name |
Character. The name of the region. Defaults to 'All regions'. |
Value
Plot of relative risk and confidence intervals for each lag of wildfire-related PM2.5.
Plots attributable fractions and CI across years by regions
Description
Generates a PDF containing one or more plots of average attributable fractions over time. If by_region is TRUE, the function creates separate plots for each region. All plots are saved to a single PDF file named "aggregated_AF_by_region.pdf" in the specified output_dir.
Usage
plot_aggregated_AF(data, by_region = FALSE, output_dir = ".")
Arguments
data |
A data frame containing annual attributable fraction estimates. Must include columns: year, average_attributable_fraction, lower_ci_attributable_fraction, upper_ci_attributable_fraction. If by_region is TRUE, must also include region. |
by_region |
Logical. If TRUE, plots are generated per region using region. Defaults to FALSE. |
output_dir |
Character. Directory path where the PDF file will be saved. Must exist. Defaults to ".". |
Value
No return value. A PDF file is created.
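Examples
A minimal sketch of an input table carrying the columns listed under Arguments. All values are synthetic. The call to plot_aggregated_AF() is guarded because it assumes the package is attached in the session; otherwise it is skipped.

```r
# Synthetic annual attributable fractions for two regions (illustrative values only).
set.seed(1)
af <- expand.grid(year = 2015:2019, region = c("North", "South"),
                  stringsAsFactors = FALSE)
af$average_attributable_fraction <- runif(nrow(af), 0.01, 0.05)
# Symmetric interval around the central estimate, purely for illustration.
af$lower_ci_attributable_fraction <- af$average_attributable_fraction - 0.005
af$upper_ci_attributable_fraction <- af$average_attributable_fraction + 0.005

# Check that every documented column is present before plotting.
required <- c("year", "region", "average_attributable_fraction",
              "lower_ci_attributable_fraction", "upper_ci_attributable_fraction")
missing_cols <- setdiff(required, names(af))

# Only call the package function when it is available in the session.
if (length(missing_cols) == 0 && exists("plot_aggregated_AF")) {
  plot_aggregated_AF(af, by_region = TRUE, output_dir = tempdir())
}
```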
Create a plot of aggregated annual attributable fractions with CI
Description
Aggregates annual average attributable fraction estimates and generates a ggplot showing the central estimate and CI.
Usage
plot_aggregated_AF_core(data, region_name = NULL)
Arguments
data |
A data frame with columns: year, average_attributable_fraction, lower_ci_attributable_fraction, and upper_ci_attributable_fraction. |
region_name |
Optional character string used to label the plot title with a region name. Defaults to NULL. |
Value
A ggplot object showing annual attributable fractions with confidence intervals.
Combined AN and AR plots by region
Description
Creates both Attributable Number (AN) and Attributable Rate (AR) bar charts by region in a single function call.
Usage
plot_air_pollution_an_ar_by_region(
analysis_results,
max_lag = 14L,
include_national = TRUE,
output_dir = NULL,
save_plot = FALSE
)
Arguments
analysis_results |
Results from analyze_air_pollution_daily |
max_lag |
Integer. Maximum lag. Defaults to 14. |
include_national |
Logical. Whether to include national results. Default TRUE. |
output_dir |
Character. Directory to save plot |
save_plot |
Logical. Whether to save |
Value
List with two ggplot objects: an_plot and ar_plot
Plot the AN and AR by year
Description
Creates both Attributable Number (AN) and Attributable Rate (AR) plots aggregated by year in a single function call.
Usage
plot_air_pollution_an_ar_by_year(
analysis_results,
max_lag = 14L,
include_national = TRUE,
output_dir = NULL,
save_plot = FALSE
)
Arguments
analysis_results |
Results from analyze_air_pollution_daily |
max_lag |
Integer. Maximum lag. Defaults to 14. |
include_national |
Logical. Whether to include national results. Default TRUE. |
output_dir |
Character. Directory to save plot |
save_plot |
Logical. Whether to save |
Value
List with two ggplot objects: an_plot and ar_plot
Combined Monthly Time Series Plots of AN and AR
Description
Creates both Attributable Number (AN) and Attributable Rate (AR) monthly time series plots in a single function call.
Usage
plot_air_pollution_an_ar_monthly(
analysis_results,
max_lag = 14L,
include_national = TRUE,
output_dir = NULL,
save_plot = FALSE
)
Arguments
analysis_results |
Results from analyze_air_pollution_daily |
max_lag |
Integer. Maximum lag used in analysis. Defaults to 14. |
include_national |
Logical. Whether to include national results. Default TRUE. |
output_dir |
Character. Directory to save plot |
save_plot |
Logical. Whether to save the plot |
Value
List with two ggplot objects: an_plot and ar_plot
Plot exposure-response relationship with confidence intervals by region
Description
Creates faceted exposure-response plots showing relative risk (RR) with confidence intervals across PM2.5 concentrations for each region.
Usage
plot_air_pollution_exposure_response(
analysis_results,
max_lag = 14L,
include_national = TRUE,
ref_pm25 = 15,
output_dir = NULL,
save_plot = FALSE
)
Arguments
analysis_results |
Processed results with RR/AF/AN/AR with lag variables |
include_national |
Logical. Whether to include national results. Default TRUE. |
ref_pm25 |
Numeric. Reference PM2.5 value to highlight. |
output_dir |
Character. Directory to save plot. |
save_plot |
Logical. Whether to save the plot. |
Value
ggplot object
Plot Relative Risk (RR) by lag
Description
Plot Relative Risk (RR) by lag
Usage
plot_air_pollution_forest_by_lag(
analysis_results,
max_lag = 14L,
output_dir = NULL,
save_plot = FALSE
)
Arguments
analysis_results |
Processed results with RR/AF/AN/AR with lag variables |
max_lag |
Integer. Maximum lag days. Defaults to 14. |
output_dir |
Character. Directory to save plot. Defaults to NULL. |
save_plot |
Logical. Whether to save the plot. Defaults to FALSE. |
Value
ggplot object
Plot forest plot for PM2.5 effects by region
Description
Plot forest plot for PM2.5 effects by region
Usage
plot_air_pollution_forest_by_region(
analysis_results,
max_lag = 14L,
include_national = TRUE,
output_dir = NULL,
save_plot = FALSE
)
Arguments
analysis_results |
Processed results with RR/AF/AN/AR with lag variables |
max_lag |
Integer. The maximum lag days for outdoor PM2.5. Defaults to 14. |
include_national |
Logical. Whether to include national results. Default TRUE. |
output_dir |
Character. Directory to save plot. Defaults to NULL. |
save_plot |
Logical. Whether to save the plot. Defaults to FALSE. |
Value
ggplot object
Plot histograms for AN and AR by month
Description
Creates histogram plots for Attributable Number (AN) and Attributable Rate (AR) aggregated by month, with connecting lines.
Usage
plot_air_pollution_monthly_histograms(
analysis_results,
max_lag = 14L,
include_national = TRUE,
output_dir = NULL,
save_plot = FALSE
)
Arguments
analysis_results |
Processed results with RR/AF/AN/AR with lag variables |
include_national |
Logical. Whether to include national results. Default TRUE. |
output_dir |
Character. Directory to save plots. |
save_plot |
Logical. Whether to save the plots. |
Value
List with ggplot objects
Plot Power vs PM2.5 Concentration
Description
Plots the power statistic for each reference PM2.5 at and above the attributable risk threshold for each region.
Usage
plot_air_pollution_power(
power_list,
output_dir = NULL,
save_plot = FALSE,
ref_name = "WHO",
include_national = TRUE
)
Arguments
power_list |
A list containing power information by region. |
output_dir |
Character. Directory to save plot. Defaults to NULL. |
save_plot |
Logical. Whether to save the plot. Defaults to FALSE. |
ref_name |
Character. Reference standard name for plot title. |
include_national |
Logical. Whether to include national level in the plot. Defaults to TRUE. |
Value
Invisible list of plot information
Plot Total Attributable Number by Region
Description
Aggregates wildfire smoke-related PM2.5 attributable numbers by region and creates a bar plot showing the total attributable number of deaths per region.
Usage
plot_an_by_region(data, output_dir = ".")
Arguments
data |
A data frame containing columns:
|
output_dir |
A character string specifying the directory where the plot will be saved.
Defaults to the current working directory ( |
Value
A ggplot object representing the bar plot.
Plot Attributable Risk by Region
Description
Aggregates wildfire-specific PM2.5 attributable risk (deaths per 100k) by region and creates a bar plot showing the mean attributable risk per region.
Usage
plot_ar_by_region(data, output_dir = ".")
Arguments
data |
A data frame containing columns:
|
output_dir |
A character string specifying the directory where the plot will be saved.
Defaults to the current working directory ( |
Value
A ggplot object representing the bar plot.
Plot monthly deaths and PM2.5 concentrations with dual y-axes
Description
Aggregates data by month and creates a dual-axis plot showing average deaths per 100,000 and mean PM2.5 concentrations.
Usage
plot_ar_pm_monthly(data, save_outputs = FALSE, output_dir = NULL)
Arguments
data |
A data frame with columns: month, deaths_per_100k, and monthly_avg_pm25. Month names must match month.abb. |
save_outputs |
Logical. If TRUE, saves the plot as PNG and the aggregated data as CSV. Defaults to FALSE. |
output_dir |
Character. Directory path where outputs are saved if save_outputs is TRUE. Must exist. Defaults to NULL. |
Value
No return value. Generates a plot and optionally saves files.
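Examples
A sketch of one way to prepare a table with the three documented columns from daily records, using base R only. The daily values and the intermediate column name pm25 are hypothetical. Months are labelled through month.abb so that the names match what the function expects.

```r
# Synthetic daily records (illustrative values only).
set.seed(42)
dates <- seq(as.Date("2022-01-01"), as.Date("2022-12-31"), by = "day")
daily <- data.frame(
  date = dates,
  deaths_per_100k = runif(length(dates), 0, 0.5),
  pm25 = runif(length(dates), 2, 40)
)

# Label months with month.abb, independent of the session locale.
daily$month <- factor(month.abb[as.integer(format(daily$date, "%m"))],
                      levels = month.abb)

# One row per month: mean rate and mean PM2.5.
monthly <- aggregate(cbind(deaths_per_100k, pm25) ~ month, data = daily, FUN = mean)
names(monthly)[names(monthly) == "pm25"] <- "monthly_avg_pm25"
monthly$month <- as.character(monthly$month)

# Only call the package function when it is available in the session.
if (exists("plot_ar_pm_monthly")) {
  plot_ar_pm_monthly(monthly, save_outputs = FALSE)
}
```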
Plot Attributable Health Metrics Across Spatial and Temporal Levels
Description
Visualizes attributable health metrics (e.g., attributable number, fraction, or rate)
derived from attribution_calculation() across different spatial scales and time periods.
The function automatically adapts plots to the selected spatial level (country, region,
or district) and handles both single- and multi-year visualizations.
It supports faceted, grouped, or aggregated visualizations and can optionally
save output plots as PDF files.
Usage
plot_attribution_metric(
attr_data,
level = c("country", "region", "district"),
metrics = c("AR_Number", "AR_Fraction", "AR_per_100k"),
filter_year = NULL,
param_term,
case_type,
save_fig = FALSE,
output_dir = NULL
)
Arguments
attr_data |
A data frame or tibble containing attribution results, typically
generated by the |
level |
Character. The spatial level for plotting. One of |
metrics |
Character vector specifying which metrics to plot.
Options include |
filter_year |
Optional integer or vector of integers to restrict the plots
to specific years. Defaults to |
param_term |
Character. The exposure variable term to evaluate (e.g., |
case_type |
Character. The type of disease that the case column refers to
(e.g., |
save_fig |
Logical. If |
output_dir |
Optional string. Directory path where output PDF files will be saved
when |
Details
This function produces publication-ready plots of attributable metrics:
-
Country level: Time series line plots with 95% confidence ribbons.
-
Region/District level (no filter): Horizontal bar plots showing aggregated metrics, grouped by administrative unit.
-
Region/District level (multi-year): Grouped bar plots comparing metrics across years.
The function automatically adjusts y-axis limits, formats numeric labels with commas,
and includes optional text annotations (e.g., showing both attributable numbers and fractions).
When save_fig = TRUE, one PDF file is created per metric and spatial level, and each file
may contain multiple pages if many regions or districts are present.
Value
A named list of ggplot or patchwork plot objects, grouped by metric.
Each element corresponds to one metric ("AR_Number", "AR_Fraction", "AR_per_100k")
and may include one or more plots, depending on the level and year filters.
Plot Average Monthly Attributable Health Metrics with Climate Overlays
Description
Visualizes average monthly attributable health metrics (e.g., attributable number,
fraction, or rate) derived from attribution analyses across different spatial scales.
The function automatically adapts plots to the selected spatial level (country,
region, or district) and summarizes seasonal patterns using monthly aggregation.
Optionally, corresponding monthly climate variables can be overlaid on a secondary
axis to support joint interpretation of health impacts and climate seasonality.
Usage
plot_avg_monthly(
attr_data,
level = c("country", "region", "district"),
metrics = c("AR_Number", "AR_per_100k", "AR_Fraction"),
c_data,
param_term,
case_type,
filter_year = NULL,
save_fig = FALSE,
output_dir = NULL
)
Arguments
attr_data |
A data frame or tibble containing attributable health metrics, typically
generated by an attribution workflow. Must include at least |
level |
Character. The spatial level for plotting. One of |
metrics |
Character. The attributable metrics to visualize. One or more of
|
c_data |
A data frame containing monthly climate variables corresponding
to the same spatial and temporal resolution as |
param_term |
Character string specifying the climate exposure variable
(e.g., |
filter_year |
Optional integer or vector of integers to restrict the analysis
to specific years prior to monthly aggregation. Defaults to |
save_fig |
Logical. If |
output_dir |
Optional character string specifying the directory where output
PDF files will be saved when |
Details
This function produces publication-ready visualizations of average monthly attributable health metrics:
-
Country level: A single bar plot summarizing national average monthly attribution patterns.
-
Region/District level: One bar plot per administrative unit, showing average monthly attribution, with automatic pagination when many units are present.
-
Climate overlay (optional): Monthly climate exposure plotted as a line on a secondary y-axis to facilitate comparison with seasonal health impacts.
Metric-specific aggregation rules (sum or mean) and numeric formatting are applied
automatically. Axis limits and breaks are dynamically adjusted to improve readability.
When save_fig = TRUE, a single PDF file is created per metric and spatial level,
with multiple pages used for region- or district-level outputs when necessary.
Value
A named list of ggplot objects. Each element corresponds to the country or an
individual region or district and contains a monthly attribution plot. The list
is returned invisibly when plots are saved to file.
Plot a grid of box plots for multiple numeric variables
Description
Plot a grid of box plots for multiple numeric variables
Usage
plot_boxplots(
df,
columns = NULL,
select_numeric = FALSE,
title = "Boxplots",
ylabs = NULL,
save_plot = FALSE,
output_path = NULL
)
Arguments
df |
The dataframe containing the data |
columns |
A character vector of numeric column names to plot |
select_numeric |
If TRUE, all numeric columns in |
title |
The overall title for the plot |
ylabs |
A character vector of y-axis labels (e.g., with units) corresponding to the columns. |
save_plot |
Whether to save the plot as a PDF |
output_path |
The file path to save the PDF (if save_plot is TRUE) |
Plot a correlation matrix as a heatmap.
Description
Plot a correlation matrix as a heatmap.
Usage
plot_correlation_matrix(matrix_, title, output_path)
Arguments
matrix_ |
The matrix to plot. |
title |
The title for the correlation matrix. |
output_path |
The path to output the plot to. |
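Examples
A sketch of how a matrix suitable for the matrix_ argument might be computed with stats::cor(). The data frame and its column names are synthetic, and the assumption that matrix_ is a square correlation matrix of this kind is ours.

```r
# Synthetic data: outcome depends on temp, humidity is independent noise.
set.seed(7)
df <- data.frame(temp = rnorm(50, 25, 3))
df$outcome <- 2 * df$temp + rnorm(50)
df$humidity <- runif(50, 30, 90)

# Rank-based correlation; "pearson" and "kendall" are the other options of cor().
corr <- cor(df, method = "spearman")
print(round(corr, 2))
```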
Plot histograms of column distributions.
Description
Plot histograms of column distributions.
Usage
plot_distributions(
df,
columns,
title,
xlabs = NULL,
save_hists = FALSE,
output_path = NULL
)
Arguments
df |
The dataframe containing the data. |
columns |
The columns to plot distributions for. |
title |
The title of your plot. |
xlabs |
A character vector of x-axis labels (e.g., with units) corresponding to the columns. |
save_hists |
Whether to save the histograms to file. |
output_path |
The path to save your distributions to. |
Plot Time Series of Health and Climate Variables
Description
Generate time series plots for combined health and climate data prepared for spatiotemporal and DLNM analysis. Supports aggregation at the country, region, or district level.
Usage
plot_health_climate_timeseries(
data,
param_term,
level = "country",
filter_year = NULL,
case_type,
save_fig = FALSE,
output_dir = NULL
)
Arguments
data |
A data frame containing the combined health and climate data. |
param_term |
Character. The variable to plot (e.g., tmax, tmean, tmin, Malaria). Use "all" to include all available variables. |
level |
Character. Aggregation level: one of "country", "region", or "district". Defaults to "country". |
filter_year |
Optional numeric vector to filter data by year(s). Defaults to NULL. |
case_type |
Character. The type of disease that the case column refers to. Must be one of 'diarrhea' or 'malaria'. |
save_fig |
Boolean. Whether to save the figure as a PDF. Defaults to FALSE. |
output_dir |
Character. Directory path to save the figure. Defaults to NULL. |
Value
A ggplot object.
Visualise monthly random effects for selected INLA model
Description
Generates and saves a plot of monthly random effects for different regions, visualizing their contribution to Malaria Incidence Rate.
Usage
plot_monthly_random_effects(
combined_data,
model,
save_fig = FALSE,
output_dir = NULL
)
Arguments
combined_data |
Data list from combine_health_climate_data() function. |
model |
The fitted model object. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_dir |
Character. The path to save the visualisation to. Defaults to NULL. |
Value
The monthly random effects plot.
Plot the moving average of a column.
Description
Plot the moving average of a column.
Usage
plot_moving_average(
df,
time_col,
value_col,
ma_days,
ma_sides,
title,
save_plot = FALSE,
output_path = "",
units = NULL
)
Arguments
df |
The dataframe containing the raw data. |
time_col |
The column name of the column containing the timeseries. |
value_col |
The column name of the column containing the value. |
ma_days |
The number of days to use for MA calculations. |
ma_sides |
The number of sides to use for MA calculations (1 or 2). |
title |
The title for your plot. |
save_plot |
Whether or not to save the plot. |
output_path |
The path to output the plot to. |
units |
A named character vector of units for each variable. |
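Examples
A sketch of the difference between a one-sided and a two-sided moving average, using stats::filter(). Whether plot_moving_average() relies on stats::filter() internally is an assumption; here ma_days is mapped to the window length k and ma_sides to the sides argument.

```r
x <- c(2, 4, 6, 8, 10, 12, 14)
k <- 3  # window length, playing the role of ma_days

# sides = 1: trailing average of the current and the k - 1 previous values.
trailing <- as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))

# sides = 2: centred average of the current value and its neighbours.
centred <- as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))

print(trailing)  # NA NA  4  6  8 10 12
print(centred)   # NA  4  6  8 10 12 NA
```

A trailing window leaves the first k - 1 values undefined, while a centred window loses values at both ends.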
Plot the rate of a dependent variable per 100,000 population per year.
Description
Plot the rate of a dependent variable per 100,000 population per year.
Usage
plot_rate_overall(
df,
dependent_col,
population_col,
date_col,
title,
save_rate = FALSE,
output_path = NULL
)
Arguments
df |
The dataframe containing the data. |
dependent_col |
The name of the column representing the dependent variable. |
population_col |
The name of the column representing the population. |
date_col |
The name of the column containing date values. |
title |
Character. The specific title for the subset of data being used. |
save_rate |
Whether to save the plot as a PDF. |
output_path |
The file path to save the plot if save_rate is TRUE. |
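Examples
A worked sketch of an annual rate per 100,000 population in base R. The column names and values are synthetic, and using the within-year mean as the population denominator is an assumption rather than a description of the function's internals.

```r
df <- data.frame(
  date = as.Date(c("2020-03-01", "2020-09-01", "2021-03-01", "2021-09-01")),
  deaths = c(40, 60, 55, 65),
  population = c(200000, 200000, 220000, 220000)
)
df$year <- as.integer(format(df$date, "%Y"))

# Total outcome and average population per year.
totals <- aggregate(deaths ~ year, data = df, FUN = sum)
pop <- aggregate(population ~ year, data = df, FUN = mean)

# Rate per 100,000 = yearly total / population * 100,000.
rates <- merge(totals, pop, by = "year")
rates$rate_per_100k <- rates$deaths / rates$population * 1e5
print(rates)
```

For 2020 this gives 100 / 200,000 * 100,000 = 50 per 100,000.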
Plot regional trends of climate and health outcomes.
Description
Plot regional trends of climate and health outcomes.
Usage
plot_regional_trends(
df,
region_col,
outcome_cols,
title = "Regional Averages",
ylabs = NULL,
save_plot = FALSE,
output_path = ""
)
Arguments
df |
The dataframe containing the raw data. |
region_col |
The name of the column containing regions. |
outcome_cols |
Character Vector. The names of the outcome columns to analyse. |
title |
The title of your plot. |
ylabs |
A character vector of y-axis labels (e.g., with units) corresponding to the columns. |
save_plot |
Whether or not to save the plot. |
output_path |
The path to output the plot to. |
Plot relative risk at country, region, and district level
Description
Plots the relative risk of malaria cases by maximum temperature and cumulative rainfall at country, region, and district level.
Usage
plot_relative_risk(
data,
model,
param_term,
max_lag,
nk,
level,
case_type,
filter_year = NULL,
output_dir = NULL,
save_csv = FALSE,
save_fig = FALSE
)
Arguments
data |
Data list from combine_health_climate_data() function. |
model |
The fitted model from run_inla_models() function. |
param_term |
A character vector or list containing parameter terms such
as |
level |
A character vector specifying the geographical disaggregation.
Can take one of the following values: |
case_type |
Character. The type of disease that the case column refers
to. Must be one of |
filter_year |
Integer. The year to filter the data to, which allows a plot for a specific year. Defaults to NULL, in which case the plot groups all years in the dataset. |
output_dir |
Character. The path where the PDF file will be saved. Defaults to NULL. |
save_csv |
Boolean. If TRUE, saves the RR data to the specified directory. Defaults to FALSE. |
save_fig |
Boolean. If TRUE, saves the plot to the specified directory. Defaults to FALSE. |
Value
Relative risk plot at country, region, and district levels.
Plot relative risk by PM2.5 levels for all regions and individually
Description
Generates one or more plots showing relative risk estimates across PM2.5 levels. If multiple regions are present, plots are created per region and for all regions combined. Optionally saves the output as a PDF.
Usage
plot_rr_by_pm(data, save_fig = FALSE, output_dir = NULL)
Arguments
data |
A data frame with columns: pm_levels, relative_risk, ci_lower, ci_upper, and region. |
save_fig |
Logical. If TRUE, saves the plot(s) as a PDF file in output_dir. Defaults to FALSE. |
output_dir |
Character. Directory path where the PDF file will be saved if save_fig is TRUE. Must exist. Defaults to NULL. |
Value
No return value. Generates one or more plots and optionally saves them to disk.
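Examples
A sketch of an input table with the five documented columns. The log-linear exposure-response curve, the coefficient beta, and its standard error se are hypothetical and serve only to produce plausible values. The call is guarded because it assumes the package is attached.

```r
pm_levels <- seq(0, 50, by = 5)
beta <- 0.004  # hypothetical log relative risk per unit of PM2.5 (ug/m3)
se <- 0.001    # hypothetical standard error of beta

rr <- data.frame(
  pm_levels = pm_levels,
  relative_risk = exp(beta * pm_levels),
  ci_lower = exp((beta - 1.96 * se) * pm_levels),
  ci_upper = exp((beta + 1.96 * se) * pm_levels),
  region = "North"
)

# Only call the package function when it is available in the session.
if (exists("plot_rr_by_pm")) {
  plot_rr_by_pm(rr, save_fig = FALSE)
}
```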
Create a relative risk plot across PM2.5 levels for a single region
Description
Generates a ggplot showing relative risk estimates and confidence intervals across PM2.5 levels for a given region.
Usage
plot_rr_by_pm_core(data, region_name = "All Regions", ylims = c(-2, 2))
Arguments
data |
A data frame with columns: pm_levels, relative_risk, ci_lower, and ci_upper. |
region_name |
Optional character string used to label the plot title with a region name. Defaults to "All Regions". |
ylims |
Numeric vector of length 2 specifying y-axis limits. Defaults to c(-2, 2). |
Value
A ggplot object showing relative risk and CI.
Plot Relative Risk Map at sub-national Level
Description
Generates a map of the relative risk of disease cases associated with climate hazards, including extreme temperature and cumulative rainfall, at a specified geographical level (district or region).
Usage
plot_rr_map(
combined_data,
model,
param_term = "tmax",
max_lag,
nk,
level = "district",
case_type,
filter_year = NULL,
output_dir = NULL,
save_fig = FALSE,
save_csv = FALSE,
cumulative = FALSE
)
Arguments
combined_data |
A list returned from the |
model |
The fitted model object returned from the |
param_term |
A character vector or list specifying the climate parameters
(e.g., |
level |
A character string indicating the spatial aggregation level.
Options are |
case_type |
Character. The type of disease that the case column refers
to. Must be one of |
filter_year |
Integer. The year to filter the data to. Defaults to NULL. |
output_dir |
Character. The directory path where the output PDF file should be saved. Defaults to NULL. |
save_fig |
Boolean. If TRUE, saves the plot to the specified directory. Defaults to FALSE. |
cumulative |
Boolean. If TRUE, plots and saves the cumulative risk across all years for the specified exposure at region and district level. Defaults to FALSE. |
Value
Relative risk map at the chosen level.
Plot a grid of scatter graphs comparing one column to various others.
Description
Plot a grid of scatter graphs comparing one column to various others.
Usage
plot_scatter_grid(
df,
main_col,
comparison_cols,
title,
save_scatters = FALSE,
output_path = "",
units = NULL
)
Arguments
df |
The dataframe containing the raw data. |
main_col |
The main column to compare with all other columns. |
comparison_cols |
The columns to compare with. |
title |
The title of your plot. |
save_scatters |
Whether or not to save the plot. |
output_path |
The path to output the plot to. |
units |
A named character vector of units for each variable. |
Plot seasonal trends of a health outcome and climate by month.
Description
Plot seasonal trends of a health outcome and climate by month.
Usage
plot_seasonal_trends(
df,
date_col,
outcome_cols,
title = "Seasonal Averages",
ylabs = NULL,
save_plot = FALSE,
output_path = ""
)
Arguments
df |
The dataframe containing the raw data. |
date_col |
The name of the column containing date values. |
outcome_cols |
Character Vector. The names of the outcome columns to analyse. |
title |
The title of your plot. |
ylabs |
A character vector of y-axis labels (e.g., with units) corresponding to the columns. |
save_plot |
Whether or not to save the plot. |
output_path |
The path to output the plot to. |
Plot the total of selected variables per year.
Description
Plot the total of selected variables per year.
Usage
plot_total_variables_by_year(
df,
date_col,
variables,
title,
save_total = FALSE,
output_path = ""
)
Arguments
df |
A dataframe containing the data. |
date_col |
The name of the column containing date values. |
variables |
Column names to be summed and plotted. |
title |
Character. The specific title for the subset of data being used. |
save_total |
If TRUE, saves each plot as a PDF. |
output_path |
The file path for saving plots. |
Value
Plots are printed to the console or saved as PDF files.
Visualize the yearly spatial random effect of the disease incidence rate (DIR).
Description
Generates and saves plots of the yearly spatial random effect of the disease incidence rate at district level.
Usage
plot_yearly_spatial_random_effect(
combined_data,
model,
case_type,
save_fig = FALSE,
output_dir = NULL
)
Arguments
combined_data |
Data list |
model |
The fitted model from |
case_type |
Character. The type of disease that the case column refers
to. Must be one of |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
output_dir |
Character. The path to save the fitted model results to. Defaults to NULL. |
Value
A plot of the yearly spatial random effect for the disease incidence rate.
Normalise descriptive stats data input to combined and regional dataframes.
Description
Normalise descriptive stats data input to combined and regional dataframes.
Usage
prepare_descriptive_input(
data,
aggregation_column = NULL,
timeseries_col = NULL
)
Arguments
data |
Dataframe or list of dataframes. |
aggregation_column |
Character. Region column for splitting dataframes. |
timeseries_col |
Character. Date column for timeseries analysis. |
Value
A list with combined_df and region_df_list.
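Examples
A sketch of the documented return shape, built with base R split(). The element names combined_df and region_df_list come from the Value section; how the function constructs them internally is an assumption, and the data are synthetic.

```r
df <- data.frame(
  region = c("A", "A", "B"),
  outcome = c(3, 5, 2),
  stringsAsFactors = FALSE
)

# One combined data frame plus one data frame per region.
prepared <- list(
  combined_df = df,
  region_df_list = split(df, df$region)
)
print(names(prepared$region_df_list))
```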
Validate and prepare base output directory for descriptive stats.
Description
Validate and prepare base output directory for descriptive stats.
Usage
prepare_descriptive_output_dir(output_path, create_base_dir = FALSE)
Arguments
output_path |
Character. Base output path. |
create_base_dir |
Logical. Whether to create a missing base directory. |
Value
Character. Validated output path.
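Examples
A minimal sketch of the validate-or-create behaviour described above. prepare_dir_sketch() is a hypothetical stand-in, not the package implementation, and the error message wording is ours.

```r
prepare_dir_sketch <- function(output_path, create_base_dir = FALSE) {
  if (!dir.exists(output_path)) {
    if (!create_base_dir) {
      stop("Output path does not exist: ", output_path, call. = FALSE)
    }
    # Create the missing directory, including any missing parents.
    dir.create(output_path, recursive = TRUE)
  }
  normalizePath(output_path, mustWork = TRUE)
}

target <- file.path(tempdir(), "descriptive_stats_demo")
unlink(target, recursive = TRUE)  # start from a clean state

# Without create_base_dir the missing directory is an error.
err <- tryCatch(prepare_dir_sketch(target), error = function(e) conditionMessage(e))

# With create_base_dir = TRUE the directory is created and its path returned.
created <- prepare_dir_sketch(target, create_base_dir = TRUE)
```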
Raise an Error if a Parameter's Value is NULL
Description
Raise an Error if a Parameter's Value is NULL
Usage
raise_if_null(param_nm, value)
Arguments
param_nm |
Character. The parameter name. |
value |
Any. The value of the parameter. |
Value
None. Stops execution if value is NULL.
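Examples
A minimal sketch of a NULL guard with the same signature. raise_if_null_sketch() is a hypothetical stand-in, and the error message text is an assumption rather than the package's own wording.

```r
raise_if_null_sketch <- function(param_nm, value) {
  if (is.null(value)) {
    stop(sprintf("Parameter '%s' must not be NULL.", param_nm), call. = FALSE)
  }
  invisible(NULL)
}

# A non-NULL value passes silently.
ok <- raise_if_null_sketch("output_dir", tempdir())

# A NULL value stops execution; the message is captured here for inspection.
msg <- tryCatch(
  raise_if_null_sketch("output_dir", NULL),
  error = function(e) conditionMessage(e)
)
print(msg)
```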
Read in and format health data
Description
Part of the code for analysing the health effects of extreme weather events (wildfires). Reads in a CSV file containing a daily time series of health and climate data and renames columns to standard names. Creates day-of-week, month, and year columns derived from the date.
Usage
read_and_format_data(
health_path,
date_col,
mean_temperature_col,
health_outcome_col,
population_col = NULL,
region_col = NULL,
rh_col = NULL,
wind_speed_col = NULL
)
Arguments
health_path |
Path to a CSV file containing a daily time series of data for a particular health outcome and climate variables, which may be disaggregated by region. |
date_col |
Character. Name of the column in the dataframe that contains the date. Date column should be in YYYY-MM-DD or YYYY/MM/DD format. |
mean_temperature_col |
Character. Name of the column in the dataframe that contains the daily mean temperature. |
health_outcome_col |
Character. Name of the column in the dataframe that contains the daily health outcome count (e.g. number of deaths, hospital admissions). |
population_col |
Character. Name of the column in the dataframe that
contains the population data. Defaults to NULL. If omitted, a |
region_col |
Character. Name of the column in the dataframe that contains the region names. Defaults to NULL. |
rh_col |
Character. Name of the column in the dataframe that contains daily relative humidity values. Defaults to NULL. |
wind_speed_col |
Character. Name of the column in the dataframe that contains daily wind speed. Defaults to NULL. |
Value
Dataframe with formatted and renamed columns
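Examples
A sketch of the renaming and date-derivation steps described above, in base R. The input column names and the standard names in col_map are hypothetical, since the names the function actually assigns are not listed here.

```r
raw <- data.frame(
  Date = c("2023-07-01", "2023-07-02"),
  Tmean = c(24.1, 26.3),
  Deaths = c(12, 15),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from user column names to standard names.
col_map <- c(Date = "date", Tmean = "tmean", Deaths = "health_outcome")
names(raw) <- unname(col_map[names(raw)])

# Derive calendar columns from the date.
raw$date <- as.Date(raw$date)
raw$dow <- as.integer(format(raw$date, "%u"))   # ISO weekday: 1 = Monday, 7 = Sunday
raw$month <- as.integer(format(raw$date, "%m"))
raw$year <- as.integer(format(raw$date, "%Y"))
print(raw)
```

Numeric weekday and month codes are used here because names from weekdays() and format(x, "%b") depend on the session locale.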
Read a csv file into memory as a data frame.
Description
Read a csv file into memory as a data frame.
Usage
read_input_data(input_csv_path)
Arguments
input_csv_path |
The path to the csv to read as a dataframe. |
Value
A dataframe containing the data from the csv.
Examples
input_csv_path <- "directory/file_name.csv"
Reformat a dataframe using various different cleaning techniques.
Description
Take a dataframe, and apply various different cleaning methods to it in order to prepare the data for use with a climate indicator.
Usage
reformat_data(df, reformat_date = TRUE, fill_na = c(), year_from_date = TRUE)
Arguments
df |
The dataframe to apply cleaning/reformatting to. |
reformat_date |
Whether or not to reformat the data to the Date datatype. |
fill_na |
A vector of column names to fill NA values in (fills with 0). |
year_from_date |
Derive a new column 'year' from the date column. |
Value
The cleaned/reformatted data frame.
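Examples
A sketch of the three cleaning steps named in the arguments: date conversion, NA filling with 0, and year derivation. reformat_sketch() and its date_col argument are hypothetical, since reformat_data() does not expose the date column name in its usage.

```r
reformat_sketch <- function(df, date_col = "date", reformat_date = TRUE,
                            fill_na = c(), year_from_date = TRUE) {
  # Convert the date column to the Date class.
  if (reformat_date) df[[date_col]] <- as.Date(df[[date_col]])
  # Replace NA with 0 in the requested columns.
  for (col in fill_na) {
    df[[col]][is.na(df[[col]])] <- 0
  }
  # Derive an integer year column from the date.
  if (year_from_date) {
    df$year <- as.integer(format(as.Date(df[[date_col]]), "%Y"))
  }
  df
}

raw <- data.frame(
  date = c("2021-12-31", "2022-01-01"),
  deaths = c(NA, 4),
  stringsAsFactors = FALSE
)
clean <- reformat_sketch(raw, fill_na = "deaths")
print(clean)
```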
Run generic descriptive statistics and EDA outputs for indicator datasets.
Description
Run generic descriptive statistics and EDA outputs for indicator datasets.
Usage
run_descriptive_stats(
data,
output_path,
aggregation_column = NULL,
population_col = NULL,
plot_corr_matrix = FALSE,
correlation_method = "pearson",
plot_dist = FALSE,
plot_ma = FALSE,
ma_days = 100,
ma_sides = 1,
timeseries_col = NULL,
dependent_col,
independent_cols,
units = NULL,
plot_na_counts = FALSE,
plot_scatter = FALSE,
plot_box = FALSE,
plot_seasonal = FALSE,
plot_regional = FALSE,
plot_total = FALSE,
detect_outliers = FALSE,
calculate_rate = FALSE,
run_id = NULL,
create_base_dir = FALSE
)
Arguments
data |
Dataframe or named list of dataframes. If a dataframe is provided and |
output_path |
Character. Base output directory. |
aggregation_column |
Character. Column used to aggregate/split data by region. |
population_col |
Character. The column containing population data. |
plot_corr_matrix |
Logical. Whether to plot correlation matrix. |
correlation_method |
Character. Correlation method. One of 'pearson', 'spearman', 'kendall'. |
plot_dist |
Logical. Whether to plot distribution histograms. |
plot_ma |
Logical. Whether to plot moving averages over a timeseries. |
ma_days |
Integer. Number of days to use for moving average. |
ma_sides |
Integer. Sides to use for moving average (1 or 2). |
timeseries_col |
Character. Timeseries column used for moving averages and time-based plots. |
dependent_col |
Character. Dependent variable column. |
independent_cols |
Character vector. Independent variable columns. |
units |
Named character vector. Units for variables. |
plot_na_counts |
Logical. Whether to plot NA counts. |
plot_scatter |
Logical. Whether to plot scatter plots. |
plot_box |
Logical. Whether to plot box plots. |
plot_seasonal |
Logical. Whether to plot seasonal trends. |
plot_regional |
Logical. Whether to plot regional trends. |
plot_total |
Logical. Whether to plot total health outcomes by year. |
detect_outliers |
Logical. Whether to output an outlier table. |
calculate_rate |
Logical. Whether to plot annual rates per 100k. |
run_id |
Character. Optional run id. If |
create_base_dir |
Logical. Whether to create |
Value
A list with base_output_path, run_id, run_output_path, and region_output_paths.
Examples
df <- data.frame(
date = as.Date("2024-01-01") + 0:29,
region = rep(c("A", "B"), each = 15),
outcome = sample(1:20, 30, replace = TRUE),
temp = rnorm(30, 25, 3)
)
run_descriptive_stats(
data = df,
output_path = tempdir(),
aggregation_column = "region",
dependent_col = "outcome",
independent_cols = c("temp"),
timeseries_col = "date",
run_id = NULL
)
Create descriptive statistics via API-friendly inputs.
Description
Create descriptive statistics via API-friendly inputs.
Usage
run_descriptive_stats_api(
data,
output_path,
aggregation_column = NULL,
population_col = NULL,
dependent_col,
independent_cols,
units = NULL,
plot_corr_matrix = FALSE,
plot_dist = FALSE,
plot_ma = FALSE,
plot_na_counts = FALSE,
plot_scatter = FALSE,
plot_box = FALSE,
plot_seasonal = FALSE,
plot_regional = FALSE,
plot_total = FALSE,
correlation_method = "pearson",
ma_days = 100,
ma_sides = 1,
timeseries_col = NULL,
detect_outliers = FALSE,
calculate_rate = FALSE,
run_id = NULL,
create_base_dir = TRUE
)
Arguments
data |
The dataset for descriptive stats (list-like object or CSV path). |
output_path |
Character. Base output directory. |
aggregation_column |
Character. Column used to aggregate/split data by region. |
population_col |
Character. The column containing the population. |
dependent_col |
Character. The dependent column. |
independent_cols |
Character vector. The independent columns. |
units |
Named character vector. Units for each variable. |
plot_corr_matrix |
Logical. Whether to plot a correlation matrix. |
plot_dist |
Logical. Whether to plot histograms. |
plot_ma |
Logical. Whether to plot moving averages over a timeseries. |
plot_na_counts |
Logical. Whether to plot counts of NAs in each column. |
plot_scatter |
Logical. Whether to plot dependent vs independent columns. |
plot_box |
Logical. Whether to generate box plots for selected columns. |
plot_seasonal |
Logical. Whether to plot seasonal trends. |
plot_regional |
Logical. Whether to plot regional trends. |
plot_total |
Logical. Whether to plot total dependent values per year. |
correlation_method |
Character. Correlation method. One of 'pearson', 'spearman', 'kendall'. |
ma_days |
Integer. Number of days used in moving average calculations. |
ma_sides |
Integer. Number of sides used in moving average calculations (1 or 2). |
timeseries_col |
Character. Timeseries column. |
detect_outliers |
Logical. Whether to output an outlier table. |
calculate_rate |
Logical. Whether to plot annual rates per 100k. |
run_id |
Character. Optional run id. |
create_base_dir |
Logical. Whether to create |
Value
A list with base_output_path, run_id, run_output_path, and region_output_paths.
Examples
run_descriptive_stats_api(
data = list(
date = as.character(as.Date("2024-01-01") + 0:29),
region = rep(c("A", "B"), each = 15),
outcome = sample(1:20, 30, replace = TRUE),
temp = rnorm(30, 25, 3)
),
output_path = tempdir(),
aggregation_column = "region",
dependent_col = "outcome",
independent_cols = c("temp"),
timeseries_col = "date",
plot_corr_matrix = TRUE
)
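The ma_days and ma_sides arguments describe a moving-average window. How the package computes it internally is not shown in this manual; the following is a minimal base R sketch, assuming stats::filter semantics (sides = 1 for a trailing window, sides = 2 for a centred window) and a short illustrative window of 3 days.

```r
# Hypothetical daily series; values are illustrative only.
outcome <- c(4, 6, 5, 9, 7, 8, 10)
ma_days <- 3

# sides = 1: average of the current day and the previous ma_days - 1 days.
ma_trailing <- stats::filter(outcome, rep(1 / ma_days, ma_days), sides = 1)

# sides = 2: window centred on the current day.
ma_centred <- stats::filter(outcome, rep(1 / ma_days, ma_days), sides = 2)

as.numeric(ma_trailing)  # the first ma_days - 1 values are NA
as.numeric(ma_centred)   # the first and last values are NA
```

With a trailing window the smoothed series lags behind the raw data, which is worth bearing in mind when comparing the two settings on a plot.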
Run models of increasing complexity in INLA: Fit a baseline model including spatiotemporal random effects.
Description
Creates and fits multiple INLA (Integrated Nested Laplace Approximation) models to the dataset, evaluates them using the DIC (Deviance Information Criterion), and identifies the best-fitting model.
Usage
run_inla_models(
combined_data,
basis_matrices_choices,
inla_param,
max_lag,
nk,
case_type,
output_dir = NULL,
save_model = FALSE,
family = "nbinomial",
config = FALSE
)
Arguments
combined_data |
A dataframe resulting from |
basis_matrices_choices |
A character vector specifying the basis matrix
parameters to be included in the model. Possible values are |
inla_param |
A character vector specifying the confounding exposures to
be included in the model. Possible values are |
case_type |
Character. The type of disease that the case column refers
to. Must be one of |
output_dir |
Character. The path to save model output to. Defaults to NULL. |
save_model |
Boolean. Whether to save the results as a CSV. Defaults to FALSE. |
family |
Character. The probability distribution for the response
variable. The user may also have the possibility to choose |
config |
Boolean. Enable additional model configurations. Defaults to FALSE. |
Value
A list containing the model, baseline_model, and the dic_table.
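Selecting a model by DIC amounts to picking the candidate with the lowest value. The sketch below assumes a dic_table with hypothetical columns model and dic; the column names, model labels and DIC values are invented for illustration and are not taken from the package.

```r
# Hypothetical DIC table; names and values are illustrative only.
dic_table <- data.frame(
  model = c("baseline", "baseline + tmax", "baseline + tmax + rainfall"),
  dic = c(1520.4, 1498.7, 1501.2)
)

# Lower DIC indicates a better trade-off between fit and complexity.
dic_table <- dic_table[order(dic_table$dic), ]
dic_table$delta_dic <- dic_table$dic - min(dic_table$dic)

best_model <- dic_table$model[1]
best_model
```

Reporting the difference from the best DIC alongside the raw values makes it easier to see whether the more complex models are meaningfully better than the baseline.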
Save air pollution plot with standardized dimensions
Description
Save air pollution plot with standardized dimensions
Usage
save_air_pollution_plot(plot_object, output_dir, filename)
Arguments
plot_object |
ggplot or grob object to save |
output_dir |
Character. Directory to save plot. |
filename |
Character. Name of the file (without or with .png extension). |
Value
Invisibly returns the output path
Save results of wildfire related analysis
Description
Saves a CSV file of relative risk and confidence intervals for each lag value of wildfire-related PM2.5. Optionally also saves results of attributable numbers/fractions.
Usage
save_wildfire_results(
rr_results,
an_ar_results = NULL,
annual_af_an_results = NULL,
output_folder_path
)
Arguments
rr_results |
Dataframe of relative risk and confidence intervals for each lag of wildfire-related PM2.5. |
an_ar_results |
Dataframe containing attributable number/fraction results. Defaults to NULL. |
output_folder_path |
Path to folder where results should be saved. |
Create a cross-basis matrix set for DLNM analysis
Description
Generates cross-basis matrices for lagged climate variables in a dataset, for use in Distributed Lag Nonlinear Models (DLNM).
Usage
set_cross_basis(data, max_lag = 2, nk = 2)
Arguments
data |
A dataset returned from |
max_lag |
Numeric. Maximum lag to consider for the delayed effect. Should be between 2 and 4. Defaults to 2. |
nk |
Numeric. Number of internal knots for the natural spline of
each predictor, controlling its flexibility: |
Value
A list of cross-basis matrices, including the basis matrices for maximum temperature, minimum temperature, cumulative rainfall, and relative humidity.
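A cross-basis combines a basis for the exposure-response dimension with a basis for the lag dimension. In practice dlnm::crossbasis builds this; the sketch below constructs a small one by hand in base R purely to show the structure. It assumes a natural spline with nk internal knots at equally spaced quantiles for the exposure and a simple linear-with-intercept basis over the lags. The variable tmax and all values are simulated, and none of this is confirmed as what set_cross_basis does internally.

```r
set.seed(123)
n <- 100
tmax <- rnorm(n, mean = 30, sd = 3)  # simulated exposure series
max_lag <- 2
nk <- 2

# n x (max_lag + 1) matrix: column l + 1 holds the exposure lagged by l steps.
lag_matrix <- sapply(0:max_lag, function(l) {
  c(rep(NA_real_, l), tmax[seq_len(n - l)])
})

# Exposure basis: natural spline with nk internal knots (nk + 1 columns).
internal_knots <- stats::quantile(tmax, probs = seq_len(nk) / (nk + 1))
var_basis <- function(v) {
  splines::ns(v, knots = internal_knots, Boundary.knots = range(tmax))
}

# Lag basis: intercept plus linear term over lags 0..max_lag (2 columns).
lag_basis <- cbind(1, 0:max_lag)

# Tensor product: for each exposure basis column, weight its lagged values
# by each lag basis column and sum over lags.
cross_basis <- do.call(cbind, lapply(seq_len(nk + 1), function(j) {
  basis_j_by_lag <- sapply(seq_len(max_lag + 1), function(l) {
    var_basis(lag_matrix[, l])[, j]
  })
  basis_j_by_lag %*% lag_basis
}))

dim(cross_basis)  # n rows, (nk + 1) * 2 columns
```

The first max_lag rows are NA because the lagged exposure history is incomplete there, which is why models fitted on a cross-basis drop the start of each series.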
Suggest a column name based on fuzzy matching
Description
Uses Jaro-Winkler distance to find the closest match to a misspelled or incorrect column name.
Usage
suggest_column_match(input, available, threshold = 0.3)
Arguments
input |
The column name that was not found |
available |
Character vector of available column names |
threshold |
Maximum distance threshold (0-1). Lower = stricter matching. |
Value
The best matching column name, or NULL if no good match found.
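The matching logic can be sketched in base R. Note the swap: base R has no Jaro-Winkler function, so this sketch uses a normalised Levenshtein distance from utils::adist instead, and the function name suggest_match_sketch is hypothetical. Distances therefore differ from those the package computes, but the thresholding idea is the same.

```r
# Sketch only: normalised Levenshtein distance stands in for Jaro-Winkler.
suggest_match_sketch <- function(input, available, threshold = 0.3) {
  if (length(available) == 0) {
    return(NULL)
  }
  # Edit distance between the input and each candidate, ignoring case.
  d <- as.vector(utils::adist(tolower(input), tolower(available)))
  # Scale to 0-1 by the longer of the two strings.
  d_norm <- d / pmax(nchar(input), nchar(available))
  best <- which.min(d_norm)
  if (d_norm[best] <= threshold) available[best] else NULL
}

suggest_match_sketch("temprature", c("date", "temperature", "region"))
suggest_match_sketch("xyz", c("date", "region"))
```

A lower threshold only accepts near-identical names, which reduces the chance of suggesting an unrelated column.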
Full analysis pipeline for the suicides and extreme heat indicator
Description
Runs the full pipeline to analyse the impact of extreme heat on suicides using a time-stratified case-crossover approach with a distributed lag non-linear model. This function generates the relative risk of the suicide-temperature association as well as numbers, rates and fractions of suicides attributable to temperatures above a specified threshold. Model validation statistics are also provided.
Usage
suicides_heat_do_analysis(
data_path,
date_col,
region_col = NULL,
temperature_col,
health_outcome_col,
population_col,
country = "National",
meta_analysis = FALSE,
var_fun = "bs",
var_degree = 2,
var_per = c(25, 50, 75),
lag_fun = "strata",
lag_breaks = 1,
lag_days = 2,
independent_cols = NULL,
control_cols = NULL,
cenper = 50,
attr_thr = 97.5,
save_fig = FALSE,
save_csv = FALSE,
output_folder_path = NULL,
seed = NULL
)
Arguments
data_path |
Path to a csv file containing a daily time series of data for a particular health outcome and climate variables, which may be disaggregated by region. |
date_col |
Character. Name of the column in the dataframe that contains the date. |
region_col |
Character. Name of the column in the dataframe that contains the region names. Defaults to NULL. |
temperature_col |
Character. Name of the column in the dataframe that contains the temperature. |
health_outcome_col |
Character. Name of the column in the dataframe that contains the health outcome count (e.g. number of deaths, hospital admissions). |
population_col |
Character. Name of the column in the dataframe that contains the population estimate. |
country |
Character. Name of country for national level estimates. |
meta_analysis |
Boolean. Whether to perform a meta-analysis. |
var_fun |
Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'. |
var_degree |
Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic). |
var_per |
Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(25,50,75). |
lag_fun |
Character. Exposure function for arglag (see dlnm::crossbasis). Defaults to 'strata'. |
lag_breaks |
Integer. Internal cut-off point defining the strata for arglag (see dlnm::crossbasis). Defaults to 1. |
lag_days |
Integer. Maximum lag (see dlnm::crossbasis). Defaults to 2. |
independent_cols |
Additional independent variables to test in model validation |
control_cols |
A list of confounders to include in the final model adjustment. Defaults to NULL if none. |
cenper |
Integer. Percentile (0-100) used to calculate the centering value. Defaults to 50. |
attr_thr |
Numeric. Percentile at which to define the temperature threshold for calculating attributable risk. Defaults to 97.5. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
save_csv |
Boolean. Whether to save the results as a CSV. Defaults to FALSE. |
output_folder_path |
Path to folder where plots and/or CSV should be saved. Defaults to NULL. |
seed |
Optional integer random seed used when sampling residuals for model validation plots. Defaults to NULL. |
Details
This analysis pipeline requires, as a minimum, a daily time series of temperature and suicide deaths with population values. This is then processed using a conditional Poisson case-crossover analysis with a distributed lag non-linear model and optional meta-analysis. Meta-analysis is recommended if the input data is disaggregated by area.
The model parameters have default values based on existing studies, and keeping them is recommended. However, they can be adjusted for sensitivity analysis if desired.
Model validation testing is provided as a standard output from the pipeline so
a user can assess the quality of the model. If a user has additional independent
variables these can be specified as independent_cols and assessed within
different model combinations in the outputs of this testing. These can be added
in the final model via control_cols.
For attributable deaths the default is to use extreme heat as a threshold,
defined as the 97.5th percentile of temperature over the corresponding time
period for each geography. This can be adjusted if desired, following review of
the relative risk association between temperature and suicides, using attr_thr.
Further details on the input data requirements, methodology, quality information and guidance on interpreting outputs can be found in the accompanying publication, doi:10.5281/zenodo.14050224.
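The percentile-based threshold described above can be sketched in base R. This assumes the threshold is the attr_thr percentile of each geography's own temperature series; the data are simulated and the column names region and tmean are hypothetical.

```r
set.seed(1)
temps <- data.frame(
  region = rep(c("A", "B"), each = 1000),
  tmean = c(rnorm(1000, mean = 15, sd = 5), rnorm(1000, mean = 25, sd = 4))
)
attr_thr <- 97.5

# One threshold per region: the attr_thr-th percentile of that region's series.
thr <- tapply(temps$tmean, temps$region, stats::quantile,
              probs = attr_thr / 100, na.rm = TRUE)

# Flag days above the region-specific threshold.
temps$extreme_heat <- as.vector(temps$tmean > thr[temps$region])

thr
mean(temps$extreme_heat)  # share of days flagged as extreme heat
```

Because the threshold is relative to each region's own distribution, a warmer region gets a higher absolute threshold while the share of flagged days stays about the same.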
Value
- qaic_results: A dataframe of QAIC and dispersion metrics for each model combination and geography.
- qaic_summary: A dataframe with the mean QAIC and dispersion metrics for each model combination.
- vif_results: A dataframe of variance inflation factors for each independent variable by region.
- vif_summary: A dataframe with the mean variance inflation factors for each independent variable.
- meta_test_res: A dataframe of results from statistical tests on the meta model.
- power_list: A list containing power information by area.
- rr_results: A dataframe containing cumulative relative risk and confidence intervals from the analysis.
- res_attr_tot: A dataframe of total attributable fractions, numbers and rates for each area over the whole time series.
- attr_yr_list: A list of dataframes containing yearly estimates of attributable fractions, numbers and rates by area.
- attr_mth_list: A list of dataframes containing total attributable fractions, numbers and rates by calendar month and area.
References
Pearce M, Watkins E, Glickman M, Lewis B, Ingole V. Standards for Official Statistics on Climate-Health Interactions (SOSCHI): Suicides attributed to extreme heat: methodology. Zenodo; 2024. Available from: doi:10.5281/zenodo.14050224
Gasparrini A, Guo Y, Hashizume M, Lavigne E, Zanobetti A, Schwartz J, et al. Mortality risk attributable to high and low ambient temperature: a multicountry observational study. Lancet. 2015 Jul;386(9991):369-75. Available from: https://linkinghub.elsevier.com/retrieve/pii/S0140673614621140
Kim Y, Kim H, Gasparrini A, Armstrong B, Honda Y, Chung Y, et al. Suicide and Ambient Temperature: A Multi-Country Multi-City Study. Environ Health Perspect. 2019 Nov;127(11):1-10. Available from: https://pubmed.ncbi.nlm.nih.gov/31769300/
Gasparrini A, Armstrong B. Reducing and meta-analysing estimates from distributed lag non-linear models. BMC Med Res Methodol. 2013 Jan 9;13:1. Available from: doi:10.1186/1471-2288-13-1
Gasparrini A, Armstrong B, Kenward MG. Multivariate meta-analysis for non-linear and other multi-parameter associations. Stat Med. 2012 Dec 20;31(29):3821-39. Available from: doi:10.1002/sim.5471
Sera F, Armstrong B, Blangiardo M, Gasparrini A. An extended mixed-effects framework for meta-analysis. Stat Med. 2019 Dec 20;38(29):5429-44. Available from: doi:10.1002/sim.8362
Gasparrini A, Leone M. Attributable risk from distributed lag models. BMC Med Res Methodol. 2014 Dec 23;14(1):55. Available from: https://link.springer.com/article/10.1186/1471-2288-14-55
Examples
example_data <- data.frame(
date = seq.Date(as.Date("2020-01-01"), by = "day", length.out = 365),
region = "Example Region",
tmean = stats::runif(365, 5, 30),
suicides = stats::rpois(365, lambda = 2),
pop = 250000
)
example_path <- tempfile(fileext = ".csv")
utils::write.csv(example_data, example_path, row.names = FALSE)
suicides_heat_do_analysis(
data_path = example_path,
date_col = "date",
region_col = "region",
temperature_col = "tmean",
health_outcome_col = "suicides",
population_col = "pop",
country = "Example Region",
meta_analysis = FALSE,
var_fun = "bs",
var_degree = 2,
var_per = c(25, 50, 75),
lag_fun = "strata",
lag_breaks = 1,
lag_days = 2,
independent_cols = NULL,
control_cols = NULL,
cenper = 50,
attr_thr = 97.5,
save_fig = FALSE,
save_csv = FALSE,
output_folder_path = tempdir()
)
Summarise AF and AN numbers by region and year
Description
Takes daily data with attributable fraction and attributable number and summarises by year and region.
Usage
summarise_AF_AN(data, monthly = TRUE)
Arguments
data |
Dataframe containing daily data including calculated AF and AN. |
monthly |
Bool. Whether to summarise by month as well as year and region. Defaults to TRUE. |
Value
Dataframe containing summarised AF and AN data, by year, region and optionally month (if monthly == TRUE).
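A summary of this kind can be sketched with base R aggregate. The sketch assumes that attributable numbers are summed within each group and that the attributable fraction is then recomputed as total AN over total deaths; whether summarise_AF_AN does exactly this is not confirmed here. The column names an, deaths, year and month are hypothetical and the data are invented.

```r
# Hypothetical daily data: 60 days in one region, constant for easy checking.
daily <- data.frame(
  date = as.Date("2020-01-01") + 0:59,
  region = "A",
  deaths = rep(10, 60),
  an = rep(1, 60)
)
daily$year <- as.integer(format(daily$date, "%Y"))
daily$month <- as.integer(format(daily$date, "%m"))

# Sum attributable numbers and deaths by region, year and month.
summ <- aggregate(cbind(an, deaths) ~ region + year + month,
                  data = daily, FUN = sum)

# Recompute the attributable fraction from the aggregated totals.
summ$af <- summ$an / summ$deaths
summ
```

Recomputing the fraction from totals, rather than averaging daily fractions, weights each day by its number of deaths.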
Full analysis for the 'mortality attributable to high and low temperatures' indicator
Description
Runs the full methodology to analyse the impact of high and low temperatures on mortality using a quasi-Poisson time series approach with a distributed lag non-linear model. This function generates the relative risk of the temperature-mortality association as well as numbers, rates and fractions of deaths attributable to temperatures beyond specified high and low thresholds. Model validation statistics are also provided.
Usage
temp_mortality_do_analysis(
data_path,
date_col,
region_col,
temperature_col,
dependent_col,
population_col,
country = "National",
independent_cols = NULL,
control_cols = NULL,
var_fun = "bs",
var_degree = 2,
var_per = c(10, 75, 90),
lagn = 21,
lagnk = 3,
dfseas = 8,
meta_analysis = FALSE,
attr_thr_high = 97.5,
attr_thr_low = 2.5,
save_fig = FALSE,
save_csv = FALSE,
output_folder_path = NULL,
seed = NULL
)
Arguments
data_path |
Path to a csv file containing a daily time series of data for a particular health outcome and climate variables, which may be disaggregated by geography. |
date_col |
Character. Name of the column in the dataframe containing the date. |
region_col |
Character. Name of the column in the dataframe that contains the geography name(s). |
temperature_col |
Character. Name of the column in the dataframe that contains the temperature. |
dependent_col |
Character. Name of the column in the dataframe containing the dependent health outcome variable e.g. deaths. |
population_col |
Character. Name of the column in the dataframe that contains the population estimate per geography. |
country |
Character. Name of country for national-level estimates. Defaults to 'National'. |
independent_cols |
List. Additional independent variables to test in model validation as confounders. Defaults to NULL. |
control_cols |
List. Confounders to include in the final model adjustment. Defaults to NULL. |
var_fun |
Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'. |
var_degree |
Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic). |
var_per |
Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(10, 75, 90). |
lagn |
Integer. Number of days in the lag period (see dlnm::crossbasis). Defaults to 21. |
lagnk |
Integer. Number of knots in the lag function (see dlnm::logknots). Defaults to 3. |
dfseas |
Integer. Degrees of freedom for seasonality. Defaults to 8. |
meta_analysis |
Boolean. Whether to perform a meta-analysis. Defaults to FALSE. |
attr_thr_high |
Numeric. Percentile at which to define the high temperature threshold for calculating attributable risk. Defaults to 97.5. |
attr_thr_low |
Numeric. Percentile at which to define the low temperature threshold for calculating attributable risk. Defaults to 2.5. |
save_fig |
Boolean. Whether to save the plot as an output. Defaults to FALSE. |
save_csv |
Boolean. Whether to save the results as a CSV. Defaults to FALSE. |
output_folder_path |
Path to folder where plots and/or CSV should be saved. Defaults to NULL. |
seed |
Optional integer random seed used when sampling residuals for model validation plots. Defaults to NULL. |
Details
This analysis requires a daily time series of temperature and death counts with population values as a minimum. This is then processed using a quasi-Poisson time series regression analysis with a distributed lag non-linear model and optional meta-analysis. Meta-analysis is recommended if the input data is disaggregated by area.
The model parameters have default values based on existing studies, and keeping them is recommended. However, they can be adjusted if appropriate for the user's context.
Model validation testing is provided as a standard output from the pipeline
so a user can assess the quality of the model. If a user has additional
independent variables these can be specified as independent_cols and
assessed within different model combinations in the outputs of this testing.
These can be added in the final model via control_cols. Note, a user
should include variables if contextually relevant, and not simply
based on model optimisation.
For attributable deaths the default is to use a high temperature threshold,
defined as the 97.5th percentile of the temperature distribution over the
full time period for each geography. The low temperature threshold is
similarly defined as the 2.5th percentile. These can be adjusted if desired,
following review of the relative risk association between temperature and
mortality, using attr_thr_high or attr_thr_low.
Further details on the input data requirements, methodology, quality information and guidance on interpreting outputs can be found in the accompanying publication, doi:10.5281/zenodo.14865904.
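A quasi-Poisson time series regression of this kind usually controls for season and long-term trend with a spline of time. The sketch below assumes dfseas is applied per year of data, a common convention in the temperature-mortality literature that is not confirmed here for this package. It uses simulated data and a plain linear temperature term in place of the cross-basis, so it illustrates the seasonal control only.

```r
set.seed(42)
n <- 730
d <- data.frame(
  date = as.Date("2020-01-01") + 0:(n - 1),
  tmean = runif(n, -2, 32),
  deaths = rpois(n, lambda = 8)
)
d$time <- seq_len(n)
d$dow <- factor(as.POSIXlt(d$date)$wday)  # day of week, 0 = Sunday

dfseas <- 8
n_years <- length(unique(format(d$date, "%Y")))

# Natural spline of time with dfseas degrees of freedom per year,
# plus day-of-week and a simplified linear temperature term.
fit <- glm(
  deaths ~ splines::ns(time, df = dfseas * n_years) + dow + tmean,
  family = quasipoisson(),
  data = d
)

dispersion <- summary(fit)$dispersion
dispersion
```

A dispersion estimate well above 1 indicates overdispersion relative to a Poisson model, which is the reason for using the quasi-Poisson family.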
Value
- qaic_results: Dataframe. QAIC and dispersion metrics for each model combination and geography.
- qaic_summary: Dataframe. Mean QAIC and dispersion metrics for each model combination.
- vif_results: Dataframe. Variance inflation factors for each independent variable by geography.
- vif_summary: Dataframe. Mean variance inflation factors for each independent variable.
- adf_results: Dataframe. ADF test results for each geography.
- power_list: List. Power information by area.
- rr_results: Dataframe. Cumulative relative risk and confidence intervals from the analysis.
- res_attr_tot: Dataframe. Total attributable fractions, numbers and rates for each area over the whole time series.
- attr_yr_list: List. Dataframes containing yearly estimates of attributable fractions, numbers and rates by area.
- attr_mth_list: List. Dataframes containing total attributable fractions, numbers and rates by calendar month and area.
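R's AIC() is not defined for quasi-likelihood families, so QAIC has to be computed by hand. The sketch below uses one common definition, QAIC = -2 logL / phi + 2k, where logL is the Poisson log-likelihood at the fitted values, phi is the estimated dispersion and k is the number of coefficients. The function name qaic_sketch is hypothetical, the data are simulated, and the package may use a different but related form.

```r
# Sketch of QAIC for a quasi-Poisson glm (one common definition).
qaic_sketch <- function(model) {
  phi <- summary(model)$dispersion
  loglik <- sum(stats::dpois(model$y, model$fitted.values, log = TRUE))
  k <- length(stats::coef(model))
  -2 * loglik / phi + 2 * k
}

set.seed(10)
n <- 365
tmean <- runif(n, -2, 32)
deaths <- rpois(n, lambda = exp(2 + 0.01 * tmean))

fit_null <- glm(deaths ~ 1, family = quasipoisson())
fit_temp <- glm(deaths ~ tmean, family = quasipoisson())

qaic_values <- c(
  null = qaic_sketch(fit_null),
  temperature = qaic_sketch(fit_temp)
)
qaic_values
```

As with AIC, only differences between models fitted to the same data are meaningful; the absolute value has no interpretation.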
References
Watkins E, Hunt C, Lewis B, Ingole V, Glickman M. Standards for Official Statistics on Climate-Health Interactions (SOSCHI): Mortality attributed to high and low temperatures: methodology. Zenodo; 2026. Available from: doi:10.5281/zenodo.14865904
Gasparrini A, Guo Y, Hashizume M, Lavigne E, Zanobetti A, Schwartz J, et al. Mortality risk attributable to high and low ambient temperature: a multicountry observational study. Lancet. 2015 Jul;386(9991):369-75. Available from: https://linkinghub.elsevier.com/retrieve/pii/S0140673614621140
Gasparrini A, Armstrong B. Reducing and meta-analysing estimates from distributed lag non-linear models. BMC Medical Research Methodology. 2013 Jan 9;13:1. Available from: doi:10.1186/1471-2288-13-1
Gasparrini A, Armstrong B, Kenward MG. Multivariate meta-analysis for non-linear and other multi-parameter associations. Statistics in Medicine. 2012 Dec 20;31(29):3821-39. Available from: doi:10.1002/sim.5471
Examples
example_data <- data.frame(
date = seq.Date(as.Date("2020-01-01"), by = "day", length.out = 365),
region = "Example Region",
tmean = stats::runif(365, -2, 32),
deaths = stats::rpois(365, lambda = 8),
pop = 500000
)
example_path <- tempfile(fileext = ".csv")
utils::write.csv(example_data, example_path, row.names = FALSE)
temp_mortality_do_analysis(
data_path = example_path,
date_col = "date",
temperature_col = "tmean",
dependent_col = "deaths",
population_col = "pop",
region_col = "region",
country = "Example Region",
meta_analysis = FALSE,
independent_cols = NULL,
control_cols = NULL,
var_fun = "bs",
var_degree = 2,
var_per = c(10, 75, 90),
lagn = 7,
lagnk = 2,
dfseas = 4,
attr_thr_high = 97.5,
attr_thr_low = 2.5,
save_fig = FALSE,
save_csv = FALSE,
output_folder_path = tempdir()
)
Stratify data by time period
Description
Adds columns for strata for each region:year:month:dayofweek and for the total counts of a health outcome across days in each stratum.
Usage
time_stratify(data)
Arguments
data |
Dataframe containing a daily time series of climate and health data. Assumes that 'data' has a 'month', 'year', 'dow' and 'region' column. |
Value
Dataframe with additional columns for stratum (region:year:month:dayofweek) and for the total counts of a health outcome across days in each stratum.
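The stratification can be sketched in base R. The column names outcome, stratum and stratum_total are hypothetical, and the data are invented so that the totals are easy to verify by hand.

```r
# Four full weeks in one region; outcome is simply the day index.
daily <- data.frame(
  date = as.Date("2024-01-01") + 0:27,
  region = "A",
  outcome = 1:28
)
daily$year <- format(daily$date, "%Y")
daily$month <- format(daily$date, "%m")
daily$dow <- as.POSIXlt(daily$date)$wday  # 0 = Sunday, 1 = Monday, ...

# One stratum per region:year:month:dayofweek combination.
daily$stratum <- paste(daily$region, daily$year, daily$month, daily$dow,
                       sep = ":")

# Total outcome count across all days in the same stratum.
daily$stratum_total <- ave(daily$outcome, daily$stratum, FUN = sum)

head(daily[, c("date", "stratum", "stratum_total")])
```

In a time-stratified case-crossover design each day is compared only with other days in its own stratum, so strata with a total of zero carry no information for a conditional model.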
Ensure that the case_type parameter is valid
Description
Ensures that the case_type parameter is either malaria or diarrhea to comply with supported indicators.
Usage
validate_case_type(case_type)
Arguments
case_type |
Character. The value of the case_type parameter. |
Value
Character. The case_type converted to lower case.
Preflight validation for descriptive statistics columns based on enabled features.
Description
Preflight validation for descriptive statistics columns based on enabled features.
Usage
validate_descriptive_columns(
df,
context = "dataset",
dependent_col,
independent_cols,
aggregation_column = NULL,
population_col = NULL,
timeseries_col = NULL,
plot_corr_matrix = FALSE,
plot_dist = FALSE,
plot_ma = FALSE,
plot_scatter = FALSE,
plot_box = FALSE,
plot_seasonal = FALSE,
plot_regional = FALSE,
plot_total = FALSE,
write_outlier_table = FALSE,
calculate_rate = FALSE,
is_full_dataset = FALSE
)
Arguments
df |
Dataframe. Dataset to validate. |
context |
Character. Context label for error messages. |
dependent_col |
Character. Dependent column. |
independent_cols |
Character vector. Independent columns. |
aggregation_column |
Character. Region aggregation column. |
population_col |
Character. Population column. |
timeseries_col |
Character. Timeseries column. |
plot_corr_matrix |
Logical. Correlation matrix toggle. |
plot_dist |
Logical. Distribution plot toggle. |
plot_ma |
Logical. Moving average toggle. |
plot_scatter |
Logical. Scatter plot toggle. |
plot_box |
Logical. Boxplot toggle. |
plot_seasonal |
Logical. Seasonal plot toggle. |
plot_regional |
Logical. Regional plot toggle. |
plot_total |
Logical. Total-by-year plot toggle. |
write_outlier_table |
Logical. Outlier table toggle. |
calculate_rate |
Logical. Rate plot toggle. |
is_full_dataset |
Logical. Whether this dataset is the full combined dataset. |
Value
None. Stops execution if required columns/params are missing.
Full analysis pipeline for the impact of wildfire-related PM2.5 on a health outcome
Description
Runs the full analysis pipeline for the impact of wildfire-related PM2.5 on a health outcome using a time-stratified case-crossover approach with a conditional quasi-Poisson regression model. This function generates the relative risk of the health outcome associated with wildfire-related PM2.5 as well as attributable numbers, rates and fractions of the health outcome. Model validation statistics are also provided.
Usage
wildfire_do_analysis(
health_path,
join_wildfire_data = FALSE,
ncdf_path = NULL,
shp_path = NULL,
date_col,
region_col,
shape_region_col = NULL,
mean_temperature_col,
health_outcome_col,
population_col = NULL,
rh_col = NULL,
wind_speed_col = NULL,
pm_2_5_col = NULL,
wildfire_lag = 3,
temperature_lag = 1,
spline_temperature_lag = 0,
spline_temperature_degrees_freedom = 6,
predictors_vif = NULL,
calc_relative_risk_by_region = FALSE,
scale_factor_wildfire_pm = 10,
save_fig = FALSE,
save_csv = FALSE,
output_folder_path = NULL,
create_run_subdir = FALSE,
print_vif = FALSE,
print_model_summaries = FALSE
)
Arguments
health_path |
Path to a CSV file containing a daily time series of data for a particular health outcome, which may be disaggregated by region. If this does not include a column with wildfire-related PM2.5, use join_wildfire_data = TRUE to join these data. |
join_wildfire_data |
Boolean. If TRUE, a daily time series of wildfire-related PM2.5 concentration is joined to the health data. If FALSE, the data set is loaded without any additional joins. Defaults to FALSE. |
ncdf_path |
Path to a NetCDF file containing a daily time series of gridded wildfire-related PM2.5 concentration data. |
shp_path |
Path to a shapefile .shp of the geographical boundaries for which to extract mean values of wildfire-related PM2.5 |
date_col |
Character. Name of the column in the dataframe that contains the date. |
region_col |
Character. Name of the column in the dataframe that contains the region names. |
shape_region_col |
Character. Name of the column in the shapefile dataframe that contains the region names. |
mean_temperature_col |
Character. Name of the column in the dataframe that contains the mean temperature. |
health_outcome_col |
Character. Name of the column in the dataframe that contains the health outcome count (e.g. number of deaths, hospital admissions). |
population_col |
Character. Name of the column in the dataframe that
contains the population data. Defaults to NULL. This is only required when
requesting region-level AF/AN outputs and no |
rh_col |
Character. Name of the column containing relative humidity values. Defaults to NULL. |
wind_speed_col |
Character. Name of the column containing wind speed. Defaults to NULL. |
pm_2_5_col |
Character. The name of the column containing PM2.5 values in micrograms. This is only required if the wildfire data is not joined (join_wildfire_data = FALSE). Defaults to NULL. |
wildfire_lag |
Integer. The number of days for which to calculate the lags for wildfire PM2.5. Default is 3. |
temperature_lag |
Integer. The number of days for which to calculate the lags for temperature. Default is 1. |
spline_temperature_lag |
Integer. The number of days of lag in the temperature variable from which to generate splines. Default is 0 (unlagged temperature variable). |
spline_temperature_degrees_freedom |
Integer. Degrees of freedom for the spline(s). |
predictors_vif |
Character vector with each of the predictors to include in the model. Must contain at least 2 variables. Defaults to NULL. |
calc_relative_risk_by_region |
Boolean. Whether to calculate relative risk by region. Defaults to FALSE. |
scale_factor_wildfire_pm |
Numeric. The value to divide the wildfire PM2.5 concentration variables by for alternative interpretation of outputs. Corresponds to the unit increase in wildfire PM2.5 to give the model estimates and relative risks (e.g. scale_factor = 10 corresponds to estimates and relative risks representing impacts of a 10 unit increase in wildfire PM2.5). Setting this parameter to 0 or 1 leaves the variable unscaled. |
save_fig |
Boolean. Whether to save the plot as an output. |
save_csv |
Boolean. Whether to save the results as a CSV |
output_folder_path |
Path. Path to folder where plots and/or CSV should be saved. |
create_run_subdir |
Boolean. If TRUE, create a timestamped subdirectory
under |
print_vif |
Boolean. Whether to print the VIF (variance inflation factor) for each predictor. Defaults to FALSE. |
print_model_summaries |
Bool. Whether to print the model summaries to console. Defaults to FALSE. |
Details
This analysis pipeline requires, as a minimum, a daily time series with mean wildfire PM2.5, mean temperature and a health outcome (all-cause mortality, respiratory, cardiovascular, hospital admissions etc.) with population values. This is then processed using a time-stratified case-crossover approach with conditional Poisson regression and optional meta-analysis. Meta-analysis is recommended if the input data is disaggregated by area.
The model parameters have default values based on existing studies, and keeping them is recommended. However, they can be adjusted for sensitivity analysis if desired.
Model validation testing is provided as a standard output from the pipeline so a user can assess the quality of the model. Additionally, users can incorporate extra independent variables, such as relative humidity or wind speed, directly into the model for enhanced analysis.
Further details on the input data requirements, methodology, quality information and guidance on interpreting outputs can be found in the accompanying publication, doi:10.5281/zenodo.14052184.
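The effect of scale_factor_wildfire_pm on interpretation can be illustrated with a small simulation. Dividing the exposure by 10 before fitting yields a relative risk per 10-unit increase, which equals exp(10 * beta) from the unscaled fit. The sketch uses an ordinary quasi-Poisson glm on simulated data rather than the package's conditional case-crossover model, so it shows the rescaling only.

```r
set.seed(7)
n <- 500
pm <- runif(n, 0, 50)  # simulated wildfire PM2.5
deaths <- rpois(n, lambda = exp(1 + 0.004 * pm))
scale_factor <- 10

# Unscaled exposure: coefficient is the log relative risk per 1 unit.
fit_raw <- glm(deaths ~ pm, family = quasipoisson())

# Scaled exposure: coefficient is the log relative risk per 10 units.
pm_scaled <- pm / scale_factor
fit_scaled <- glm(deaths ~ pm_scaled, family = quasipoisson())

rr_per_10_from_raw <- exp(scale_factor * coef(fit_raw)[["pm"]])
rr_per_10_scaled <- exp(coef(fit_scaled)[["pm_scaled"]])

c(from_raw = rr_per_10_from_raw, scaled = rr_per_10_scaled)
```

The two values agree up to numerical tolerance, so the scale factor changes only the unit in which results are reported, not the fitted association.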
Value
- rr_results: A dataframe with relative risk estimates and confidence intervals for each region.
- rr_pm: A dataframe of relative risk estimates for wildfire-specific PM2.5 exposure across regions as PM values change.
- af_an_results: A dataframe containing attributable fractions, attributable numbers and deaths per 100k population for each region.
- annual_af_an_results: A dataframe containing annual attributable numbers and fractions for each region.
- calculate_qaic: A dataframe of QAIC and dispersion metrics for each model combination and geography.
- check_wildfire_vif: A dataframe containing variance inflation factors for each independent variable by region.
References
Brown A, Soutter E, Ingole V. Standards for Official Statistics on Climate-Health Interactions (SOSCHI): Wildfires: introduction. Zenodo; 2024. Available from: https://zenodo.org/records/14052184
Hänninen R, Sofiev M, Uppstu A, Kouznetsov R. Daily surface concentration of fire related PM2.5 for 2003-2023, modelled by SILAM CTM when using the MODIS satellite data for the fire radiative power. Finnish Meteorological Institute; 2024. Available from: doi:10.57707/fmi-b2share.d1cac971b3224d438d5304e945e9f16c
GADM. Database for Global Administrative Areas. Available from: https://gadm.org/download_country.html
Tobias A, Kim Y, Madaniyazi L. Time-stratified case-crossover studies for aggregated data in environmental epidemiology: a tutorial. Int J Epidemiol. 2024;53(2). Available from: doi:10.1093/ije/dyae020
Wu Y, Li S, Guo Y. Space-Time-Stratified Case-Crossover Design in Environmental Epidemiology Study. Heal Data Sci. 2021; Available from: doi:10.34133/2021/9870798
Examples
example_data <- data.frame(
date = seq.Date(as.Date("2020-01-01"), by = "day", length.out = 180),
region = "Example Region",
death = stats::rpois(180, lambda = 4),
population = 400000,
tmean = stats::runif(180, 10, 35),
mean_PM = stats::runif(180, 0, 25)
)
example_path <- tempfile(fileext = ".csv")
utils::write.csv(example_data, example_path, row.names = FALSE)
wildfire_do_analysis(
health_path = example_path,
join_wildfire_data = FALSE,
ncdf_path = NULL,
shp_path = NULL,
date_col = "date",
region_col = "region",
shape_region_col = NULL,
mean_temperature_col = "tmean",
health_outcome_col = "death",
population_col = "population",
rh_col = NULL,
wind_speed_col = NULL,
pm_2_5_col = "mean_PM",
wildfire_lag = 3,
temperature_lag = 1,
spline_temperature_lag = 0,
spline_temperature_degrees_freedom = 4,
predictors_vif = NULL,
calc_relative_risk_by_region = FALSE,
scale_factor_wildfire_pm = 10,
save_fig = FALSE,
save_csv = FALSE,
output_folder_path = tempdir(),
create_run_subdir = FALSE,
print_vif = FALSE,
print_model_summaries = FALSE)
Run plotting code inside a safely managed PDF device.
Description
Run plotting code inside a safely managed PDF device.
Usage
with_pdf_device(output_path, width = 14, height = 8, context = "plot", plot_fn)
Arguments
output_path |
Character. Output path for the PDF file. |
width |
Numeric. PDF width in inches. |
height |
Numeric. PDF height in inches. |
context |
Character. Context label used in error messages. |
plot_fn |
Function. Plotting function to execute. |
Value
None. Writes a PDF and closes device safely.
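The pattern of a safely managed device can be sketched with on.exit, which closes the PDF device even if the plotting function throws an error. The function name with_pdf_sketch is hypothetical and omits the context argument; it is not the package's implementation.

```r
# Sketch: open a PDF device, run plotting code, always close the device.
with_pdf_sketch <- function(output_path, width = 14, height = 8, plot_fn) {
  grDevices::pdf(output_path, width = width, height = height)
  dev_id <- grDevices::dev.cur()
  # Registered immediately so the device is closed on success or error.
  on.exit(
    if (dev_id %in% grDevices::dev.list()) grDevices::dev.off(dev_id),
    add = TRUE
  )
  plot_fn()
  invisible(output_path)
}

out <- tempfile(fileext = ".pdf")
with_pdf_sketch(out, plot_fn = function() graphics::plot(1:10))
file.exists(out)
```

Closing the specific device by its id, rather than whichever device is current, avoids shutting a device that the caller opened for another purpose.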