Welcome to leakR, an R package designed to help researchers, data scientists, and machine learning practitioners rigorously detect and diagnose data leakage in their workflows.
Data leakage is a pervasive yet often overlooked issue that undermines the integrity and reproducibility of predictive models by allowing unintended information to “leak” between training and testing phases. leakR provides a modular, extensible toolkit for detecting the most common and impactful forms of leakage, starting with tabular data contamination, target leakage, and temporal misalignments, while laying the foundation for a universal leakage detection framework across diverse data domains.
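
To make two of the leakage types named above concrete, here is a minimal base-R sketch: train/test contamination (identical rows in both splits) and target leakage (a feature that is effectively a copy of the target). It is an illustration only and does not show how leakR detects these cases. The helper names `count_shared_rows()` and `flag_suspicious_features()`, the 0.99 threshold, and the simulated data are all assumptions made for this example.

```r
# Illustrative base-R checks; NOT leakR's implementation.
# Both helper functions are hypothetical names invented for this sketch.

# Train/test contamination: count test rows that also appear verbatim in train.
# Assumes both data frames have the same columns in the same order.
count_shared_rows <- function(train, test) {
  # Collapse each row into one string key so rows can be compared across frames.
  key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))
  sum(key(test) %in% key(train))
}

# Target leakage: flag numeric features whose absolute correlation with a
# numeric target exceeds a threshold (0.99 is an arbitrary choice).
flag_suspicious_features <- function(data, target, threshold = 0.99) {
  features <- setdiff(names(data), target)
  numeric_features <- features[vapply(data[features], is.numeric, logical(1))]
  cors <- vapply(
    numeric_features,
    function(f) abs(cor(data[[f]], data[[target]])),
    numeric(1)
  )
  names(cors)[cors > threshold]
}

# Simulated data with one planted leak of each kind.
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- d$x1 + rnorm(100)
d$y_scaled <- 2 * d$y + 3   # derived directly from the target

train <- d[1:80, ]
test  <- d[71:100, ]        # rows 71 to 80 appear in both splits

count_shared_rows(train, test)
flag_suspicious_features(d, "y")
```

A correlation screen like this only catches linear copies of a numeric target, and exact row matching misses near-duplicates, so both checks are best read as a picture of the problem rather than a detection method.
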

```r
install.packages("leakr")
```

For the latest features and bug fixes:

```r
# Install devtools if you don't have it
install.packages("devtools")

# Install leakR from GitHub
devtools::install_github("cherylisabella/leakR")
```

Then load the package and run a basic audit:

```r
library(leakr)

# Basic audit of your dataset
report <- leakr_audit(iris, target = "Species")

# View summary of issues found
leakr_summarise(report)

# Generate diagnostic visualizations
leakr_plot(report)

# Access detailed results
print(report)
```

| Function | Purpose |
|---|---|
| `leakr_audit()` | Main auditing function - detects leakage across your dataset |
| `leakr_summarise()` | Generate human-readable summaries of detected issues |
| `leakr_plot()` | Create diagnostic visualizations highlighting problems |
| `leakr_from_caret()` | Import and audit caret workflow objects |
| `leakr_from_tidymodels()` | Import and audit tidymodels workflow objects |
| `leakr_from_mlr3()` | Import and audit mlr3 workflow objects |

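
One way to get a feel for the audit is to plant an obvious leak in a copy of `iris` and run the same quick-start calls on it. The sketch below uses only `leakr_audit()` and `leakr_summarise()` with the arguments shown in this README. The column name `species_code` is a hypothetical choice, and whether this particular column is flagged depends on the detectors the package runs, which is not confirmed here.

```r
# Copy iris and add a column that is just an integer encoding of the target.
# `species_code` is a made-up name used only for this illustration.
iris_leaky <- iris
iris_leaky$species_code <- as.integer(iris_leaky$Species)

# Run the audit only if leakr is installed, so the script is safe to source.
if (requireNamespace("leakr", quietly = TRUE)) {
  report <- leakr::leakr_audit(iris_leaky, target = "Species")
  leakr::leakr_summarise(report)
}
```

Comparing this report with the one for the unmodified `iris` data is a quick sanity check on what the audit does and does not pick up.
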
Get started with the comprehensive vignettes:

```r
# Getting started guide
vignette("getting-started", package = "leakr")

# Advanced detection techniques
vignette("advanced-detection", package = "leakr")

# Framework integration examples
vignette("framework-integration", package = "leakr")
```

If you use leakR in your research, please cite:

```bibtex
@Manual{leakr2025,
  title  = {leakR: Data Leakage Detection Tools for Machine Learning},
  author = {Cheryl Isabella Lim},
  year   = {2025},
  note   = {R package version 0.1.0},
  url    = {https://github.com/cherylisabella/leakR},
}
```

This project is licensed under the MIT License; see the LICENSE file for details.

leakR is currently under development. Feedback and contributions from the community are welcome!