ggvariant provides a simple, ggplot2-native toolkit for
visualising genomic variant data. Whether you are a wet-lab biologist
working from an Excel export or an experienced bioinformatician loading
VCF files directly, ggvariant gets you to a
publication-ready plot in just a few lines of R.
This vignette walks through a complete workflow:
# Install from CRAN
install.packages("ggvariant")
# Or install the development version from GitHub
# remotes::install_github("yourname/ggvariant")read_vcf() parses standard VCF v4.x files — including
gzipped files and multi-sample VCFs — and returns a tidy data frame
called a gvf object. Functional annotations from SnpEff
(ANN) or VEP (CSQ) INFO fields are extracted
automatically.
vcf_file <- system.file("extdata", "example.vcf", package = "ggvariant")
variants <- read_vcf(vcf_file)
#> ℹ Reading VCF: 'example.vcf'✔ Reading VCF: 'example.vcf' [33ms]
#> Loaded 30 variant records across 7 chromosomes.
head(variants)
#> chrom pos ref alt qual filter consequence gene sample
#> 1 chr17 7577120 C T 250 PASS missense_variant TP53 TUMOR_S1
#> 2 chr17 7577538 G A 180 PASS missense_variant TP53 TUMOR_S1
#> 3 chr17 7578210 G T 95 PASS stop_gained TP53 TUMOR_S1
#> 4 chr17 7579472 A G 310 PASS synonymous_variant TP53 TUMOR_S1
#> 5 chr13 32907420 A T 200 PASS missense_variant BRCA2 TUMOR_S1
#> 6 chr13 32914437 G A 175 PASS frameshift_variant BRCA2 TUMOR_S1The result is a plain data frame with one row per variant per sample, with columns for chromosome, position, alleles, consequence, gene, and sample name. Because it is a standard data frame, you can filter, subset, and manipulate it with any R tools you already know.
If your variants are in a spreadsheet or the output of another tool,
use coerce_variants() to map your column names onto the
format ggvariant expects. You only need to specify the
columns that differ from the defaults.
# Example: data exported from a custom pipeline or Excel
my_df <- read.csv("my_variants.csv")
variants <- coerce_variants(my_df,
chrom = "Chr",
pos = "Position",
ref = "Ref_Allele",
alt = "Alt_Allele",
consequence = "Variant_Class",
gene = "Hugo_Symbol",
sample = "Tumor_Sample"
)Any extra columns in your data frame are carried over automatically, so you never lose information.
The lollipop plot shows where variants fall along a gene, coloured by consequence. It is particularly useful for identifying mutational hotspots — positions that are recurrently mutated across samples.
Overlaying known protein domains helps interpret where
variants fall functionally. Provide a data frame with name,
start, and end columns (in amino acid
coordinates):
tp53_domains <- data.frame(
name = c("Transactivation", "DNA-binding", "Tetramerization"),
start = c(1, 102, 323),
end = c(67, 292, 356)
)
# Scale genomic positions to protein coordinates
tp53 <- variants[variants$gene == "TP53", ]
tp53$pos <- round(
(tp53$pos - min(tp53$pos)) /
(max(tp53$pos) - min(tp53$pos)) * 393
) + 1
plot_lollipop(tp53, gene = "TP53",
domains = tp53_domains,
protein_length = 393)To see which sample each variant comes from instead of its
consequence, change color_by:
plot_lollipop(variants, gene = "TP53", color_by = "sample")
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's colour values.
#> No shared levels found between `names(values)` of the manual scale and the
#> data's colour values.plot_consequence_summary() gives an overview of what
types of variants are present — missense, frameshift,
synonymous, and so on — broken down by sample or gene.
Each bar represents one sample, stacked by consequence type. This immediately reveals whether two samples have similar or very different mutational profiles.
To compare samples with different total variant counts fairly, use
position = "fill":
The mutational spectrum shows the relative frequency of each of the six single-base substitution (SBS) classes — C>A, C>G, C>T, T>A, T>C, T>G — normalised to the pyrimidine base (so A>G is represented as T>C, matching COSMIC convention).
A dominant C>T signature, as seen here, is characteristic of UV damage or age-related deamination — common in many tumour types.
All plot functions support interactive = TRUE, which
wraps the output in a plotly interactive plot. This is
ideal for sharing with collaborators who don’t use R — simply save as an
HTML file and send it.
# Requires the plotly package
# install.packages("plotly")
p <- plot_lollipop(variants, gene = "TP53", interactive = TRUE)
p # opens in RStudio viewer or browser
# Save as a standalone HTML file
htmlwidgets::saveWidget(p, "TP53_lollipop.html")# See the consequence colour palette
gv_palette("consequence")
#> missense_variant missense Missense_Mutation
#> "#FD8D3C" "#FD8D3C" "#FD8D3C"
#> synonymous_variant synonymous Silent
#> "#74C476" "#74C476" "#74C476"
#> frameshift_variant frameshift_deletion frameshift_insertion
#> "#E31A1C" "#E31A1C" "#FC4E2A"
#> stop_gained Nonsense_Mutation stop_lost
#> "#800026" "#800026" "#BD0026"
#> splice_site_variant Splice_Site inframe_insertion
#> "#9E9AC8" "#9E9AC8" "#6BAED6"
#> inframe_deletion In_Frame_Del In_Frame_Ins
#> "#2171B5" "#2171B5" "#6BAED6"
#> 5_prime_UTR_variant 3_prime_UTR_variant intron_variant
#> "#BCBDDC" "#DADAEB" "#D9D9D9"
#> SNV deletion insertion
#> "#BDBDBD" "#41AB5D" "#A1D99B"
#> MNV
#> "#F768A1"
# See the COSMIC SBS spectrum palette
gv_palette("spectrum")
#> C>A C>G C>T T>A T>C T>G
#> "#03BDEE" "#010101" "#E52926" "#CAC9C9" "#A0CE63" "#ECC7C5"| Function | Input | Output |
|---|---|---|
read_vcf() |
VCF file path | gvf data frame |
coerce_variants() |
Any data frame | gvf data frame |
plot_lollipop() |
gvf + gene name |
Lollipop ggplot |
plot_consequence_summary() |
gvf |
Stacked bar ggplot |
plot_variant_spectrum() |
gvf |
SBS spectrum ggplot |
gv_palette() |
palette type | Named colour vector |
theme_ggvariant() |
— | ggplot2 theme |
All plot functions return a ggplot object — extend them
freely with standard ggplot2 syntax, and use
interactive = TRUE with any of them to get a
plotly interactive version.
sessionInfo()
#> R version 4.5.2 (2025-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 22000)
#>
#> Matrix products: default
#> LAPACK version 3.12.1
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=English_United Kingdom.utf8
#> [3] LC_MONETARY=English_United Kingdom.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United Kingdom.utf8
#>
#> time zone: Africa/Kampala
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_4.0.2 ggvariant_0.1.0
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.7.1 cli_3.6.5 knitr_1.51 rlang_1.1.7
#> [5] xfun_0.56 otel_0.2.0 S7_0.2.1 jsonlite_2.0.0
#> [9] labeling_0.4.3 glue_1.8.0 htmltools_0.5.9 sass_0.4.10
#> [13] scales_1.4.0 rmarkdown_2.30 grid_4.5.2 evaluate_1.0.5
#> [17] jquerylib_0.1.4 fastmap_1.2.0 yaml_2.3.12 lifecycle_1.0.5
#> [21] compiler_4.5.2 RColorBrewer_1.1-3 rstudioapi_0.18.0 farver_2.1.2
#> [25] digest_0.6.39 R6_2.6.1 bslib_0.10.0 withr_3.0.2
#> [29] tools_4.5.2 gtable_0.3.6 cachem_1.1.0