knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(chiOpenData)
library(ggplot2)
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.5.2
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union

Welcome to the chiOpenData package, an R package
dedicated to helping R users connect to the Chicago Open Data Portal!
The chiOpenData package provides a streamlined interface
for accessing Chicago’s vast open data resources. It connects directly
to the Chicago Open Data Portal, helping users bridge the gap between
raw city APIs and tidy data analysis. This package is part of a broader
ecosystem of open data tools designed to provide a consistent interface
across cities. It does this in two ways: the chi_pull_dataset() function
and the chi_any_dataset() function.

chi_pull_dataset() function

The primary way to pull data in this package is the
chi_pull_dataset() function, which works in tandem with
chi_list_datasets(). You do not need to know anything about
API keys or authentication.
The first step is to call chi_list_datasets() to see which datasets
are available to chi_pull_dataset(). It returns metadata for
thousands of datasets on the portal.
chi_list_datasets() |> head()
#> # A tibble: 6 × 27
#> key id name attribution attributionLink category createdAt dataUpdatedAt
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 dela… fuz6… Dela… City of Ch… https://www.ci… Sanitat… 2026-03-… 2026-03-31T1…
#> 2 covi… e464… COVI… City of Ch… <NA> Health … 2026-03-… 2026-03-26T2…
#> 3 stre… u5ai… Stre… City of Ch… https://www.ch… Sanitat… 2026-03-… 2026-03-27T1…
#> 4 chan… dw8r… Chan… City of Ch… https://www.ci… Transpo… 2026-03-… 2026-03-24T1…
#> 5 buil… vxcc… Buil… City of Ch… https://www.ci… Buildin… 2026-03-… 2026-03-19T1…
#> 6 buil… a9ab… Buil… City of Ch… https://www.ci… Buildin… 2026-03-… 2026-03-19T1…
#> # ℹ 19 more variables: dataUri <chr>, description <chr>, domain <chr>,
#> # externalId <lgl>, hideFromCatalog <lgl>, hideFromDataJson <lgl>,
#> # license <chr>, metadataUpdatedAt <chr>, provenance <chr>, updatedAt <chr>,
#> # webUri <chr>, approvals <list>, tags <list>,
#> # `customFields.Metadata.Data Owner` <chr>,
#> # `customFields.Metadata.Time Period` <chr>,
#> # `customFields.Metadata.Changes and Other Historical Information Useful to Understanding This Dataset` <chr>, …

The output includes columns such as the dataset title, description,
and link to the source. The most important fields are the dataset
key and id. You need one of them
to use chi_pull_dataset(): pass either the key value or the id value
as the dataset argument.
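One way to look up the key and id for a dataset is to filter the catalog by name. The sketch below is only an illustration: find_datasets() is a hypothetical helper, not part of chiOpenData, and catalog_stub is a small stand-in for the tibble returned by chi_list_datasets() so that the code runs without a network connection. The second stub row is a placeholder, not a real dataset.

```r
library(dplyr)

# Hypothetical helper (not part of chiOpenData): keep the catalog rows
# whose name matches a pattern, and return only the identifying columns.
find_datasets <- function(catalog, pattern) {
  catalog |>
    filter(grepl(pattern, name, ignore.case = TRUE)) |>
    select(key, id, name)
}

# Stand-in for chi_list_datasets(); the second row is a placeholder.
catalog_stub <- tibble(
  key  = c("crimes_2001_to_present", "placeholder_key"),
  id   = c("ijzp-q8t2", "xxxx-xxxx"),
  name = c("Crimes - 2001 to Present", "Placeholder Dataset")
)

matches <- find_datasets(catalog_stub, "crimes")
matches
```

With the real catalog, the same helper could be called as find_datasets(chi_list_datasets(), "crimes"), assuming the catalog keeps the key, id, and name columns shown in the output above.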
For instance, if we want to pull the dataset
Crimes - 2001 to Present, we can use either of the methods
below:
chi_crimes_data <- chi_pull_dataset(
  dataset = "ijzp-q8t2", limit = 2, timeout_sec = 90)
chi_crimes_data <- chi_pull_dataset(
  dataset = "crimes_2001_to_present", limit = 2, timeout_sec = 90)

Whether we pass the id or the key as the dataset argument, the call returns the data.

chi_any_dataset() function

The easiest workflow is to use chi_list_datasets()
together with chi_pull_dataset().
If the dataset you want to use in R is not in the list, you can use
chi_any_dataset().
The only requirement is the dataset’s API endpoint (a URL provided by
the Chicago Open Data portal). Here are the steps to get it:
Below is an example of how to use chi_any_dataset() once you have
the API endpoint. It pulls the same data as the chi_pull_dataset()
example:
chi_crimes_data <- chi_any_dataset(
  json_link = "https://data.cityofchicago.org/resource/ijzp-q8t2.json", limit = 2)
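The endpoint in this example follows the pattern https://data.cityofchicago.org/resource/<id>.json. As a minimal sketch, assuming other datasets on the portal use the same pattern and that ids are two groups of four lowercase letters or digits, the URL can be assembled from an id. endpoint_from_id() is a hypothetical helper, not a chiOpenData function.

```r
# Hypothetical helper (not part of chiOpenData): build a JSON endpoint
# from a dataset id, assuming the /resource/<id>.json pattern shown above.
endpoint_from_id <- function(id) {
  # Assumed id shape: four characters, a hyphen, four characters.
  stopifnot(grepl("^[a-z0-9]{4}-[a-z0-9]{4}$", id))
  paste0("https://data.cityofchicago.org/resource/", id, ".json")
}

crimes_endpoint <- endpoint_from_id("ijzp-q8t2")
crimes_endpoint
#> [1] "https://data.cityofchicago.org/resource/ijzp-q8t2.json"
```

The resulting string could then be passed as json_link to chi_any_dataset().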

While both functions provide access to Chicago Open Data, they serve slightly different purposes.
In general:
- Use chi_pull_dataset() when the dataset is available in chi_list_datasets()
- Use chi_any_dataset() when working with datasets outside the catalog

Together, these functions let users either pull cataloged datasets quickly or flexibly query any dataset available on the Chicago Open Data portal.
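The rule of thumb above can be written down as a small check. This is only a sketch: choose_loader() is a hypothetical helper, and catalog_stub stands in for the result of chi_list_datasets() so the code runs offline.

```r
# Hypothetical helper (not part of chiOpenData): return the name of the
# function to use, depending on whether the dataset is in the catalog.
choose_loader <- function(dataset, catalog) {
  in_catalog <- dataset %in% catalog$key || dataset %in% catalog$id
  if (in_catalog) "chi_pull_dataset" else "chi_any_dataset"
}

# Stand-in for chi_list_datasets(), holding only the crimes dataset.
catalog_stub <- data.frame(
  key = "crimes_2001_to_present",
  id  = "ijzp-q8t2"
)

choose_loader("ijzp-q8t2", catalog_stub)  # id found in the stub
choose_loader("abcd-1234", catalog_stub)  # made-up id, not in the stub
```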
Chicago has a population of about 2.7 million people and,
unfortunately, a higher-than-average crime rate. Reported crimes are
recorded in the Crimes - 2001 to Present dataset (id ijzp-q8t2) on the
Chicago Open Data Portal. In R, the chiOpenData package can be used to pull
this data directly.
By using the chi_pull_dataset() function, we can pull crime records
for Chicago and filter on any column in the dataset.
Let’s take an example of 3 crimes that occur on the street. To filter,
we add filters = list() and put the conditions we want inside. The
dataset has a column called “location_description”, which we can use
to accomplish this.
chicago_crimes_street <- chi_pull_dataset(
  dataset = "ijzp-q8t2", limit = 3, timeout_sec = 90,
  filters = list(location_description = "STREET"))
chicago_crimes_street
#> # A tibble: 3 × 24
#> id case_number date block iucr primary_type description
#> <dbl> <chr> <dttm> <chr> <dbl> <chr> <chr>
#> 1 12131221 JD327000 2020-08-10 09:45:00 015XX… 326 ROBBERY AGGRAVATED…
#> 2 13061203 JG246126 2023-05-03 08:10:00 073XX… 486 BATTERY DOMESTIC B…
#> 3 13116982 JG312117 2023-06-23 04:44:00 040XX… 142 HOMICIDE RECKLESS H…
#> # ℹ 17 more variables: location_description <chr>, arrest <lgl>,
#> # domestic <lgl>, beat <dbl>, district <dbl>, ward <dbl>,
#> # community_area <dbl>, fbi_code <chr>, x_coordinate <dbl>,
#> # y_coordinate <dbl>, year <dbl>, updated_on <dttm>, latitude <dbl>,
#> # longitude <dbl>, location_latitude <dbl>, location_longitude <dbl>,
#> # location_human_address <chr>
# Checking to see the filtering worked
chicago_crimes_street |>
distinct(location_description)
#> # A tibble: 1 × 1
#> location_description
#> <chr>
#> 1 STREET

Success! Printing chicago_crimes_street shows
only 3 rows of data, and the distinct()
call shows that the only location in our dataset is STREET.
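For intuition, equality filters like this one can be expressed as query parameters on a Socrata-style endpoint, where column=value pairs and $limit are appended to the URL. The sketch below shows that mapping under stated assumptions: build_query_url() is a hypothetical helper, and whether chiOpenData builds its requests this way internally is not confirmed by this vignette.

```r
# Hypothetical helper (not part of chiOpenData): turn a named list of
# equality filters and an optional limit into a query URL.
build_query_url <- function(endpoint, filters = list(), limit = NULL) {
  parts <- character(0)
  if (length(filters) > 0) {
    # Logical values are written in lowercase (assumed API convention).
    values <- vapply(filters, function(v) {
      if (is.logical(v)) tolower(as.character(v)) else as.character(v)
    }, character(1))
    encoded <- vapply(values, utils::URLencode, character(1),
                      reserved = TRUE, USE.NAMES = FALSE)
    parts <- paste0(names(filters), "=", encoded)
  }
  if (!is.null(limit)) {
    parts <- c(parts, paste0("$limit=", as.integer(limit)))
  }
  if (length(parts) == 0) return(endpoint)
  paste0(endpoint, "?", paste(parts, collapse = "&"))
}

street_url <- build_query_url(
  "https://data.cityofchicago.org/resource/ijzp-q8t2.json",
  filters = list(location_description = "STREET"),
  limit = 3
)
street_url
#> [1] "https://data.cityofchicago.org/resource/ijzp-q8t2.json?location_description=STREET&$limit=3"
```

Adding more entries to the filters list appends more column=value pairs joined by &, so every condition must hold at once.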
One of this function's strengths is that it can filter on multiple columns at once. Let’s put everything together and get a dataset of 50 non-domestic crimes that occurred on the STREET.
# Creating the dataset
chicago_crimes <- chi_pull_dataset(
  dataset = "ijzp-q8t2", limit = 50, timeout_sec = 90,
  filters = list(location_description = "STREET", domestic = FALSE))
# Calling head of our new dataset
chicago_crimes |>
slice_head(n = 6)
#> # A tibble: 6 × 24
#> id case_number date block iucr primary_type description
#> <dbl> <chr> <dttm> <chr> <chr> <chr> <chr>
#> 1 12131221 JD327000 2020-08-10 09:45:00 015XX… 0326 ROBBERY AGGRAVATED…
#> 2 13116982 JG312117 2023-06-23 04:44:00 040XX… 0142 HOMICIDE RECKLESS H…
#> 3 12888104 JF469015 2022-11-10 03:47:00 072XX… 1477 WEAPONS VIO… RECKLESS F…
#> 4 13129199 JG327622 2023-07-04 17:30:00 047XX… 0910 MOTOR VEHIC… AUTOMOBILE
#> 5 13179344 JG386917 2023-08-17 19:25:00 030XX… 0860 THEFT RETAIL THE…
#> 6 13183567 JG392073 2023-08-21 14:37:00 022XX… 1220 DECEPTIVE P… THEFT OF L…
#> # ℹ 17 more variables: location_description <chr>, arrest <lgl>,
#> # domestic <lgl>, beat <dbl>, district <dbl>, ward <dbl>,
#> # community_area <dbl>, fbi_code <chr>, x_coordinate <dbl>,
#> # y_coordinate <dbl>, year <dbl>, updated_on <dttm>, latitude <dbl>,
#> # longitude <dbl>, location_latitude <dbl>, location_longitude <dbl>,
#> # location_human_address <chr>
# Quick check to make sure our filtering worked
chicago_crimes |>
summarize(rows = n())
#> # A tibble: 1 × 1
#> rows
#> <int>
#> 1 50
chicago_crimes |>
distinct(location_description)
#> # A tibble: 1 × 1
#> location_description
#> <chr>
#> 1 STREET
chicago_crimes |>
distinct(domestic)
#> # A tibble: 1 × 1
#> domestic
#> <lgl>
#> 1 FALSE

We successfully created a dataset of 50 crimes that are not domestic and that happened on the street.
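The separate distinct() checks can also be folded into one test that reuses the same named list passed as filters. This is a minimal sketch: filters_hold() is a hypothetical helper, and sample_rows is a small stand-in for the pulled data so the code runs offline.

```r
# Hypothetical helper (not part of chiOpenData): TRUE when every column
# named in `filters` contains only the requested value.
filters_hold <- function(data, filters) {
  all(vapply(names(filters), function(col) {
    all(data[[col]] == filters[[col]])
  }, logical(1)))
}

# Stand-in for a filtered pull with three rows.
sample_rows <- data.frame(
  location_description = c("STREET", "STREET", "STREET"),
  domestic = c(FALSE, FALSE, FALSE)
)

filters_hold(sample_rows, list(location_description = "STREET", domestic = FALSE))
#> [1] TRUE
filters_hold(sample_rows, list(domestic = TRUE))
#> [1] FALSE
```

On the real data this could be called as filters_hold(chicago_crimes, list(location_description = "STREET", domestic = FALSE)). Note that missing values in a checked column would make the result NA rather than TRUE or FALSE.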
Now that we have successfully pulled the data and have it in R, let’s
do a mini analysis using the primary_type column to
figure out the main types of crime.
To do this, we will create a bar graph of the crime types.
# Visualizing the distribution, ordered by frequency
chicago_crimes |>
count(primary_type) |>
ggplot(aes(
x = n,
y = reorder(primary_type, n)
)) +
geom_col(fill = "steelblue") +
theme_minimal() +
labs(
title = "Crime Types Among 50 Non-Domestic Street Crimes",
x = "Number of Crimes",
y = "Primary Crime Type"
)

Bar chart showing the frequency of crime types happening on the street that are not domestic.
This graph shows which crime types appear in the sample and how many times each occurred. It suggests that theft is the most common crime type among the 50 non-domestic street incidents pulled here.
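The counts behind a chart like this can also be inspected as a table, with each type's share of the sample. The sketch below uses toy values so it runs offline; toy_crimes is made up for illustration and is not data from the portal.

```r
library(dplyr)

# Toy values for illustration only; these are not results from the portal.
toy_crimes <- tibble(
  primary_type = c("THEFT", "ROBBERY", "THEFT", "BATTERY", "THEFT", "ROBBERY")
)

# Count each type, most frequent first, and add its share of all rows.
type_summary <- toy_crimes |>
  count(primary_type, sort = TRUE) |>
  mutate(share = n / sum(n))

type_summary
```

Replacing toy_crimes with chicago_crimes would give the same summary for the 50 rows pulled earlier.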
The chiOpenData package serves as a robust interface for
the Chicago Open Data portal, streamlining the path from raw city APIs
to actionable insights. By abstracting the complexities of data
acquisition—such as pagination, type-casting, and complex filtering—it
allows users to focus on analysis rather than data engineering.
As demonstrated in this vignette, the package provides a seamless workflow for targeted data retrieval, automated filtering, and rapid visualization.
If you use this package for research or educational purposes, please cite it as follows:
Martinez C (2026). chiOpenData: Convenient Access to Chicago Open Data API Endpoints. R package version 0.1.0, https://martinezc1.github.io/chiOpenData/.