roam objects in packages

roam is an R package designed to make it easy for
package developers to include “regular looking” R objects in their packages
that are actually active bindings downloading data from remote sources. This
vignette demonstrates how to use roam in a
package.
A demo package roam.demo created with roam
is available at FinYang/roam.demo. This
vignette will use code from roam.demo as examples.
A “regular looking” remote dataset in roam is called an
(activated) roam object. It is created with the new_roam()
function. Before it is activated, a roam object is a function with
class roam_object.
After it is activated, it is an active binding: an object whose value is computed by calling its defining function each time it is accessed. It looks like a regular object but behaves like a function.
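For readers new to active bindings, base R's makeActiveBinding() shows the idea in isolation. This is a plain base-R illustration, not roam's implementation; the names env, counter and n_calls are made up for the example.

```r
# A plain base-R active binding (illustration only; not roam's internals)
env <- new.env()
n_calls <- 0

# The function runs every time `counter` is read from `env`
makeActiveBinding("counter", function() {
  n_calls <<- n_calls + 1
  n_calls
}, env)

env$counter                      # looks like a variable, but runs the function: 1
env$counter                      # runs it again: 2
bindingIsActive("counter", env)  # TRUE
```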
We set options(roam.autodownload = TRUE) in this
vignette so that the data is downloaded automatically (the function is executed) the
first time the roam object is accessed.
Creating a roam object in a package therefore involves two steps:
definition and activation.
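The two steps can be mimicked in a few lines of base R. The sketch below is hypothetical and heavily simplified: toy_new_roam(), toy_activate() and the in-memory cache are invented for illustration and are not roam's API or implementation. It only shows how a classed function becomes an active binding, and how a cache makes the obtainer run once.

```r
# Hypothetical toy versions of the two steps (not roam's implementation)

# Step 1, definition: wrap the obtainer in a function with a class
toy_new_roam <- function(package, name, obtainer) {
  cache <- new.env()  # toy in-memory cache holding a single "data" entry
  structure(
    function() {
      if (!exists("data", envir = cache, inherits = FALSE)) {
        # First access: run the obtainer (the "download") and cache it
        assign("data", obtainer(version = NA_character_), envir = cache)
      }
      get("data", envir = cache, inherits = FALSE)
    },
    class = "toy_roam_object", package = package, name = name
  )
}

# Step 2, activation: replace the plain binding with an active binding
toy_activate <- function(name, env) {
  fun <- get(name, envir = env)
  rm(list = name, envir = env)  # makeActiveBinding() refuses regular bindings
  makeActiveBinding(name, fun, env)
}

n_downloads <- 0
pkg_env <- new.env()
pkg_env$bee <- toy_new_roam("toy.pkg", "bee", function(version) {
  n_downloads <<- n_downloads + 1  # stands in for read.csv(<url>)
  data.frame(x = 1:3)
})

class(pkg_env$bee)            # "toy_roam_object": still a function
toy_activate("bee", pkg_env)
nrow(pkg_env$bee)             # 3; the obtainer runs on first access
nrow(pkg_env$bee)             # 3; served from the cache
n_downloads                   # 1
```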
In the following, “developer” refers to someone who
uses roam in their package, and “user” refers to someone who uses
a package built by such a developer.
To explicitly download a dataset or delete its local cache, use roam_install() or roam_delete(), respectively.
To define a roam object, we need three pieces of information:

- package: the name of the package, as a string.
- name: the name of the roam object. It should be the same as the name to which the roam object is assigned.
- obtainer: the function the developer defines to retrieve the dataset from the remote source.

Take the following definition from the roam.demo
package as an example.
```r
bee <- new_roam(
  "roam.demo",
  "bee",
  function(version) {
    # `version` is required but unused here; see the Versioning section
    read.csv(
      "https://raw.githubusercontent.com/finyang/roam/master/demo/bee_colonies.csv"
    )
  }
)
```

In the function arguments, the package name is
"roam.demo", the roam object is called "bee",
which matches the object name bee, and the
obtainer function simply reads a CSV file from a remote
source.
The obtainer function needs to have (at least) an
argument called version. It is used for versioning
and is covered in the Versioning section
below.
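A quick way to check that an obtainer meets this requirement is to inspect its formal arguments. The helper has_version_arg() below is hypothetical and not part of roam; whether roam performs a similar check itself is not covered here.

```r
# Hypothetical helper (not part of roam): does an obtainer accept `version`?
has_version_arg <- function(obtainer) {
  is.function(obtainer) && "version" %in% names(formals(obtainer))
}

# Like the `bee` obtainer: takes `version` but does not use it
obtainer_ok <- function(version) {
  data.frame(x = 1:3)  # stands in for read.csv(<url>)
}

# Missing the required argument
obtainer_missing <- function() data.frame(x = 1:3)

has_version_arg(obtainer_ok)       # TRUE
has_version_arg(obtainer_missing)  # FALSE
```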
Active bindings are not preserved during package installation, so roam objects need to be activated when the package is loaded.
The .onLoad function is called during package loading;
roam.demo uses it to call roam_activate_all(), which activates all roam
objects in the package "roam.demo".
roam_activate_all() looks through every object in the
package to find roam objects. If the package has many objects, use
roam_activate() to activate roam objects individually and
improve performance.
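A minimal .onLoad consistent with this description might look as follows. It is a sketch that assumes roam_activate_all() accepts the package name as a string, as the text above suggests, and that the function is imported from roam; check the roam reference for the exact signature, and for the arguments of roam_activate().

```r
# R/zzz.R (a common file name for load hooks; any file in R/ works)
# Sketch only: assumes roam_activate_all() takes the package name as a
# string and is imported from roam in NAMESPACE.
.onLoad <- function(libname, pkgname) {
  roam_activate_all(pkgname)
}
```

Because R supplies pkgname when the namespace is loaded, the hook does not need to hard-code "roam.demo".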
If the package depends on roam, as roam.demo does, don’t forget to
also import roam, or the specific functions used, in NAMESPACE. A
package can be loaded without being attached, in which case packages
listed under Depends are not attached either, so functions called in
.onLoad must be imported to be found. R CMD check prompts the
developer to add this import when roam is listed under Depends. If
roam is listed under Imports in DESCRIPTION instead, its functions
should already be imported, as for any other imported package.
The roam.demo package “depends” on roam,
but this is not necessary. If the developer prefers not to depend on
roam, it is recommended to at least re-export helper functions such as
roam_delete() and roam_install() from
roam, or to write wrappers around them,
so that users can properly manage the cache of roam objects.
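Since roam.demo documents itself with roxygen2, both the import and the re-exports can be declared with roxygen tags. The block below is a sketch of the usual roxygen2 re-export idiom applied to roam; the file name R/reexports.R is only a convention, and this is a package source file rather than a standalone script (it needs roam installed).

```r
# R/reexports.R (sketch; package source, not a standalone script)

# Import roam's activation function so .onLoad works even when roam
# is not attached
#' @importFrom roam roam_activate_all
NULL

# Re-export cache helpers so users can call them after library(yourpkg)
#' @importFrom roam roam_install
#' @export
roam::roam_install

#' @importFrom roam roam_delete
#' @export
roam::roam_delete
```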
The documentation in roam.demo is generated using
roxygen2. For each roam object, the @format tag
is specified explicitly.
When no @format tag is given, roxygen2 generates the Format section by
evaluating the object and recording the structure of the result.
If a roam object is not cached locally, this evaluation returns
NULL, and the documented format incorrectly describes a
NULL object. As a result, the generated documentation may
vary across machines depending on their local cache state. To avoid this
inconsistency, always specify the @format tag explicitly
for roam objects.
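For example, documentation for bee could state its format by hand so that roxygen2 never evaluates the object. This is a sketch: the title and wording are placeholders written for illustration, not roam.demo's actual documentation. Documenting by the object's name as a string is the usual roxygen2 pattern for datasets.

```r
#' Bee colonies (remote data)
#'
#' Downloaded from a remote CSV file on first use and cached locally.
#'
#' @format A data frame read with read.csv(). Written by hand so that
#'   roxygen2 does not evaluate the roam object to infer it.
"bee"
```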
Versioning

roam allows users to specify the version of a dataset
they want to download using roam_install().
```r
# tidytuesday2026Jan is another example dataset in `roam.demo`
roam_install(tidytuesday2026Jan, version = "latest")

# roam_update() is a wrapper of roam_install()
# with the version set to "latest"
roam_update(tidytuesday2026Jan)
```

The version specified by the user is passed to the
obtainer function, where the developer decides how to
download the data. This is why the obtainer function needs
an argument named version.
Again, here is an
example from roam.demo.

```r
tidytuesday2026Jan <- new_roam(
  "roam.demo",
  "tidytuesday2026Jan",
  function(version) {
    # Validate the requested version
    if ((!is.character(version)) || length(version) != 1) {
      stop("version must be a character string of length one")
    }
    if (
      !is.na(version) && (!version %in% c("latest", "2026-01-20", "2026-01-13"))
    ) {
      stop("invalid version number")
    }
    # Download the data for the requested version
    if (is.na(version) || version %in% c("latest", "2026-01-20")) {
      roam_set_version("2026-01-20")
      read.csv(
        "https://raw.githubusercontent.com/rfordatascience/tidytuesday/refs/heads/main/data/2026/2026-01-20/apod.csv"
      )
    } else {
      roam_set_version("2026-01-13")
      read.csv(
        "https://raw.githubusercontent.com/rfordatascience/tidytuesday/refs/heads/main/data/2026/2026-01-13/africa.csv"
      )
    }
  }
)
```

Let’s look at the obtainer function. First, the
developer checks that the version the user specified has the correct
format.

```r
if ((!is.character(version)) || length(version) != 1) {
  stop("version must be a character string of length one")
}
if (
  !is.na(version) && (!version %in% c("latest", "2026-01-20", "2026-01-13"))
) {
  stop("invalid version number")
}
```

When the user accesses the roam object for the first time without
specifying a version, version is NA. When the user updates the
roam object with roam_update(), version is
"latest". Apart from these, this obtainer function accepts
two specific version numbers, "2026-01-20" and
"2026-01-13". The validation raises
an error if the input version is not one of these four possibilities.
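The validation step can be tried on its own, outside the package. The function check_version() below is hypothetical: it lifts the checks out of the obtainer so they can be exercised directly. It assumes the version for a first, unversioned access arrives as NA_character_; a plain logical NA would fail the is.character() check.

```r
# Hypothetical stand-alone copy of the obtainer's checks (not roam code)
check_version <- function(version) {
  if (!is.character(version) || length(version) != 1) {
    stop("version must be a character string of length one")
  }
  if (!is.na(version) && !version %in% c("latest", "2026-01-20", "2026-01-13")) {
    stop("invalid version number")
  }
  invisible(TRUE)
}

# TRUE if check_version() accepts `v`, FALSE if it raises an error
accepts <- function(v) {
  isTRUE(tryCatch(check_version(v), error = function(e) FALSE))
}

accepts(NA_character_)              # TRUE: first access, no version given
accepts("latest")                   # TRUE: roam_update()
accepts("2026-01-13")               # TRUE
accepts("2025-12-30")               # FALSE: not a known version
accepts(c("latest", "2026-01-20"))  # FALSE: more than one value
accepts(character(0))               # FALSE: empty
```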
The format of the version is entirely decided by the developer. Nor
do the valid versions need to be hard-coded inside the obtainer
function: the developer can instead retrieve the list of valid version
numbers inside the obtainer.
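As a sketch of that alternative, the list of valid versions can come from a function rather than a literal vector. Here available_versions() is hypothetical and returns a fixed vector so the example runs offline; in a real package it might query the remote source. The value returned by resolve_version() (also hypothetical) is the kind of concrete version one would pass to roam_set_version().

```r
# Hypothetical: a real package might download an index of available
# releases here instead of returning a fixed vector
available_versions <- function() {
  c("2026-01-13", "2026-01-20")
}

# Map the user's request to a concrete version, validating it against
# the list retrieved at run time rather than a hard-coded one
resolve_version <- function(version) {
  if (!is.character(version) || length(version) != 1) {
    stop("version must be a character string of length one")
  }
  versions <- available_versions()
  if (is.na(version) || version == "latest") {
    # ISO 8601 dates sort correctly as strings, so max() is the newest
    return(max(versions))
  }
  if (!version %in% versions) {
    stop("invalid version number")
  }
  version
}

resolve_version(NA_character_)  # "2026-01-20"
resolve_version("latest")       # "2026-01-20"
resolve_version("2026-01-13")   # "2026-01-13"
```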
Next, based on the input version, the obtainer downloads
the corresponding data. Again, this does not need to be hard-coded;
it can be a call to an API with the version number. This allows datasets
to be updated without updating the package.
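For comparison with the hard-coded branches that roam.demo uses, one alternative is to build the download location from the resolved version. The lookup table tt_files and the function tt_url() are hypothetical; they reuse the two tidytuesday files from the example, whose file names differ from week to week (hence the table). The function only constructs the URL and leaves the read.csv() call to the obtainer.

```r
# Hypothetical lookup table: tidytuesday file name for each week
tt_files <- c(
  "2026-01-20" = "apod.csv",
  "2026-01-13" = "africa.csv"
)

# Build the raw GitHub URL for a concrete, already-resolved version
tt_url <- function(version) {
  if (!version %in% names(tt_files)) stop("invalid version number")
  sprintf(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/refs/heads/main/data/%s/%s/%s",
    substr(version, 1, 4), version, tt_files[[version]]
  )
}

tt_url("2026-01-13")
# An obtainer could then call read.csv(tt_url(resolved_version))
```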
```r
if (is.na(version) || version %in% c("latest", "2026-01-20")) {
  roam_set_version("2026-01-20")
  read.csv(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/refs/heads/main/data/2026/2026-01-20/apod.csv"
  )
} else {
  roam_set_version("2026-01-13")
  read.csv(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/refs/heads/main/data/2026/2026-01-13/africa.csv"
  )
}
```

Note the use of roam_set_version().
The developer should use roam_set_version() to associate
a version number with the local cache. This should be a concrete
version number in a valid format, which may differ from the version
the user specified. In this example, even if the user specifies the
version "latest", the roam object
tidytuesday2026Jan stores its cache under the
version number the developer specifies, which is
"2026-01-20".
If the developer does not specify a version number with
roam_set_version() inside the obtainer
function, the cache will be stored with version NA. The
roam_set_version() function should also be called before
the obtainer function returns the data. The output value of
the obtainer function should always be the data itself.
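The ordering matters because an R function returns the value of its last evaluated expression. The sketch below uses a hypothetical stand-in, fake_set_version(), in place of roam_set_version() so it runs without roam; it assumes only that the real function's return value is not the data.

```r
# Hypothetical stand-in for roam_set_version(): just records the version
recorded_version <- NA_character_
fake_set_version <- function(version) {
  recorded_version <<- version
  invisible(version)
}

# Correct: set the version, then return the data as the last expression
obtainer_good <- function(version) {
  dat <- data.frame(x = 1:3)  # stands in for read.csv(<url>)
  fake_set_version("2026-01-20")
  dat
}

# Wrong: the version call comes last, so its value is what gets returned
obtainer_bad <- function(version) {
  dat <- data.frame(x = 1:3)
  fake_set_version("2026-01-20")
}

is.data.frame(obtainer_good(NA_character_))  # TRUE
class(obtainer_bad(NA_character_))           # "character", not a data frame
```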
The user can use the roam_version() function to check
which version of the data is cached locally.