Getting Started with chiOpenData

Introduction

Welcome to chiOpenData, an R package that helps R users connect to the Chicago Open Data Portal!

The chiOpenData package provides a streamlined interface for accessing Chicago’s vast open data resources. It connects directly to the Chicago Open Data Portal, helping users bridge the gap between raw city APIs and tidy data analysis. This package is part of a broader ecosystem of open data tools designed to provide a consistent interface across cities. It offers two ways to pull data, chi_pull_dataset() and chi_any_dataset(), each described in its own section.

The `chi_pull_dataset()` function

The primary way to pull data in this package is the chi_pull_dataset() function, which works in tandem with chi_list_datasets(). You do not need to know anything about API keys or authentication.

The first step is to call chi_list_datasets() to see which datasets are available to chi_pull_dataset(). It returns metadata for the thousands of datasets found on the portal.

library(chiOpenData)

chi_list_datasets() |> head()
#> # A tibble: 6 × 27
#>   key   id    name  attribution attributionLink category createdAt dataUpdatedAt
#>   <chr> <chr> <chr> <chr>       <chr>           <chr>    <chr>     <chr>        
#> 1 dela… fuz6… Dela… City of Ch… https://www.ci… Sanitat… 2026-03-… 2026-03-31T1…
#> 2 covi… e464… COVI… City of Ch… <NA>            Health … 2026-03-… 2026-03-26T2…
#> 3 stre… u5ai… Stre… City of Ch… https://www.ch… Sanitat… 2026-03-… 2026-03-27T1…
#> 4 chan… dw8r… Chan… City of Ch… https://www.ci… Transpo… 2026-03-… 2026-03-24T1…
#> 5 buil… vxcc… Buil… City of Ch… https://www.ci… Buildin… 2026-03-… 2026-03-19T1…
#> 6 buil… a9ab… Buil… City of Ch… https://www.ci… Buildin… 2026-03-… 2026-03-19T1…
#> # ℹ 19 more variables: dataUri <chr>, description <chr>, domain <chr>,
#> #   externalId <lgl>, hideFromCatalog <lgl>, hideFromDataJson <lgl>,
#> #   license <chr>, metadataUpdatedAt <chr>, provenance <chr>, updatedAt <chr>,
#> #   webUri <chr>, approvals <list>, tags <list>,
#> #   `customFields.Metadata.Data Owner` <chr>,
#> #   `customFields.Metadata.Time Period` <chr>,
#> #   `customFields.Metadata.Changes and Other Historical Information Useful to Understanding This Dataset` <chr>, …

The output includes columns such as the dataset name, description, and a link to the source. The most important fields are key and id: you need one of them to use chi_pull_dataset(), and you can pass either value to its dataset = argument.
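To make the key and id lookup concrete, here is a minimal base R sketch of searching a catalog by name. The `find_dataset()` helper is hypothetical (it is not part of chiOpenData), and the small data frame is only a stand-in for the output of chi_list_datasets(); its second row is made up for illustration.

```r
# Stand-in for the catalog returned by chi_list_datasets(). Only the three
# columns used here are included, and the second row is invented.
catalog <- data.frame(
  key  = c("crimes_2001_to_present", "example_dataset"),
  id   = c("ijzp-q8t2", "xxxx-xxxx"),
  name = c("Crimes - 2001 to Present", "Example Dataset"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive search on the dataset name,
# returning only the columns needed for chi_pull_dataset()
find_dataset <- function(catalog, pattern) {
  hits <- grepl(pattern, catalog$name, ignore.case = TRUE)
  catalog[hits, c("key", "id", "name")]
}

crime_rows <- find_dataset(catalog, "crimes")
crime_rows
```

With the package loaded, the same helper should work on the real catalog, for example find_dataset(chi_list_datasets(), "crimes"), since the output above shows key, id, and name columns.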

For instance, if we want to pull the dataset Crimes - 2001 to Present, we can use either of the methods below:

# Using the dataset id
chi_crimes_data <- chi_pull_dataset(
  dataset = "ijzp-q8t2", limit = 2, timeout_sec = 90)

# Using the dataset key
chi_crimes_data <- chi_pull_dataset(
  dataset = "crimes_2001_to_present", limit = 2, timeout_sec = 90)

Whether we pass the id or the key to dataset =, we successfully get the data!
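The two kinds of value are easy to tell apart by shape. Socrata dataset ids follow a "four-by-four" pattern (four lowercase alphanumeric characters, a hyphen, four more), while the keys shown in the catalog are longer descriptive strings. The sketch below uses a hypothetical `is_dataset_id()` helper to illustrate the distinction; it is an assumption for illustration, not a description of how chi_pull_dataset() resolves its argument internally.

```r
# Hypothetical helper: TRUE when the value looks like a four-by-four id
is_dataset_id <- function(x) {
  grepl("^[a-z0-9]{4}-[a-z0-9]{4}$", x)
}

# The two values used for the Crimes - 2001 to Present dataset
looks_like_id <- is_dataset_id(c("ijzp-q8t2", "crimes_2001_to_present"))
looks_like_id
```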

The `chi_any_dataset()` function

The easiest workflow is to use chi_list_datasets() together with chi_pull_dataset().

If the dataset you want to use in R is not in the list, you can use chi_any_dataset() instead. The only requirement is the dataset’s API endpoint (a URL provided by the Chicago Open Data Portal). Here are the steps to get it:

On the Chicago Open Data Portal, go to the dataset you want to work with.
Click on “Export” (next to the actions button on the right-hand side).
Click on “API Endpoint”.
Click on “SODA2” for “Version”.
Copy the API Endpoint.

Below is an example of how to use chi_any_dataset() once you have the API endpoint. It pulls the same data as the chi_pull_dataset() example:

# The URL must be a single string with no leading whitespace or line breaks
chi_crimes_data <- chi_any_dataset(
  json_link = "https://data.cityofchicago.org/resource/ijzp-q8t2.json",
  limit = 2)
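The endpoint in this example follows a visible pattern: the portal domain, then /resource/, then the dataset id, then .json. As a rough sketch, a hypothetical `chi_endpoint()` helper (not part of chiOpenData) could assemble it from an id. This assumes other datasets on the portal follow the same SODA2 pattern, so copy the endpoint from the portal when in doubt.

```r
# Hypothetical helper: build a SODA2 JSON endpoint from a dataset id.
# trimws() guards against stray whitespace picked up when copying the id.
chi_endpoint <- function(id) {
  id <- trimws(id)
  sprintf("https://data.cityofchicago.org/resource/%s.json", id)
}

crimes_endpoint <- chi_endpoint("ijzp-q8t2")
crimes_endpoint
```

The resulting string is what would be passed as json_link = to chi_any_dataset().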

Rule of Thumb

While both functions provide access to Chicago Open Data, they serve slightly different purposes.

In general:

Use chi_pull_dataset() when the dataset is available in chi_list_datasets()
Use chi_any_dataset() when working with datasets outside the catalog

Together, these functions let users either quickly access cataloged datasets or flexibly query any dataset available on the Chicago Open Data Portal.

Real World Example

Chicago has a population of about 2.7 million people and, unfortunately, a higher than average crime rate. Reported crimes are recorded in the Crimes - 2001 to Present dataset on the Chicago Open Data Portal. In R, the chiOpenData package can be used to pull this data directly.

By using the chi_pull_dataset() function, we can gather the most recent crime cases in Chicago, and filter based upon any of the columns inside the dataset.

Let’s take an example of 3 crimes that occurred on the street. The chi_pull_dataset() function can filter on any of the columns in the dataset. To filter, we add filters = list() and put whatever filters we would like inside. The dataset has a column called “location_description”, which we can use to accomplish this.


chicago_crimes_street <- chi_pull_dataset(
  dataset = "ijzp-q8t2",
  limit = 3,
  timeout_sec = 90,
  filters = list(location_description = "STREET")
)
chicago_crimes_street
#> # A tibble: 3 × 24
#>         id case_number date                block   iucr primary_type description
#>      <dbl> <chr>       <dttm>              <chr>  <dbl> <chr>        <chr>      
#> 1 12131221 JD327000    2020-08-10 09:45:00 015XX…   326 ROBBERY      AGGRAVATED…
#> 2 13061203 JG246126    2023-05-03 08:10:00 073XX…   486 BATTERY      DOMESTIC B…
#> 3 13116982 JG312117    2023-06-23 04:44:00 040XX…   142 HOMICIDE     RECKLESS H…
#> # ℹ 17 more variables: location_description <chr>, arrest <lgl>,
#> #   domestic <lgl>, beat <dbl>, district <dbl>, ward <dbl>,
#> #   community_area <dbl>, fbi_code <chr>, x_coordinate <dbl>,
#> #   y_coordinate <dbl>, year <dbl>, updated_on <dttm>, latitude <dbl>,
#> #   longitude <dbl>, location_latitude <dbl>, location_longitude <dbl>,
#> #   location_human_address <chr>

# Checking to see the filtering worked
library(dplyr)  # provides distinct()

chicago_crimes_street |>
  distinct(location_description)
#> # A tibble: 1 × 1
#>   location_description
#>   <chr>               
#> 1 STREET

Success! Printing chicago_crimes_street shows only 3 rows of data, and the distinct() call shows that the only location in our dataset is STREET.
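For readers curious about what a filtered request can look like at the API level, here is a hedged base R sketch. The Socrata SODA API accepts simple equality filters as column=value query parameters, and $limit for the row cap. The `build_soda_url()` helper below is hypothetical and only illustrates that mapping; it is not necessarily how chi_pull_dataset() builds its requests.

```r
# Hypothetical helper: turn a filters list and a limit into a SODA query URL
build_soda_url <- function(id, filters = list(), limit = NULL) {
  base <- sprintf("https://data.cityofchicago.org/resource/%s.json", id)
  params <- filters
  if (!is.null(limit)) params[["$limit"]] <- limit
  if (length(params) == 0) return(base)

  values <- vapply(params, function(v) {
    if (is.logical(v)) {
      v <- tolower(as.character(v))      # TRUE/FALSE become true/false
    } else if (is.numeric(v)) {
      v <- format(v, scientific = FALSE) # avoid 1e+05 style numbers
    }
    utils::URLencode(as.character(v), reserved = TRUE)
  }, character(1))

  paste0(base, "?", paste0(names(params), "=", values, collapse = "&"))
}

# Same filter and limit as the example above
street_url <- build_soda_url(
  "ijzp-q8t2",
  filters = list(location_description = "STREET"),
  limit = 3
)
street_url
```

Additional entries in the filters list simply become additional column=value pairs joined with &, which is why filtering on several columns at once needs no special syntax.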

One of the strongest qualities of this function is its ability to filter on multiple columns. Let’s put everything together and get a dataset of 50 crimes that occurred on the STREET and are not domestic.

# Creating the dataset
chicago_crimes <- chi_pull_dataset(
  dataset = "ijzp-q8t2",
  limit = 50,
  timeout_sec = 90,
  filters = list(location_description = "STREET", domestic = FALSE)
)

# Calling head of our new dataset
chicago_crimes |>
  slice_head(n = 6)
#> # A tibble: 6 × 24
#>         id case_number date                block  iucr  primary_type description
#>      <dbl> <chr>       <dttm>              <chr>  <chr> <chr>        <chr>      
#> 1 12131221 JD327000    2020-08-10 09:45:00 015XX… 0326  ROBBERY      AGGRAVATED…
#> 2 13116982 JG312117    2023-06-23 04:44:00 040XX… 0142  HOMICIDE     RECKLESS H…
#> 3 12888104 JF469015    2022-11-10 03:47:00 072XX… 1477  WEAPONS VIO… RECKLESS F…
#> 4 13129199 JG327622    2023-07-04 17:30:00 047XX… 0910  MOTOR VEHIC… AUTOMOBILE 
#> 5 13179344 JG386917    2023-08-17 19:25:00 030XX… 0860  THEFT        RETAIL THE…
#> 6 13183567 JG392073    2023-08-21 14:37:00 022XX… 1220  DECEPTIVE P… THEFT OF L…
#> # ℹ 17 more variables: location_description <chr>, arrest <lgl>,
#> #   domestic <lgl>, beat <dbl>, district <dbl>, ward <dbl>,
#> #   community_area <dbl>, fbi_code <chr>, x_coordinate <dbl>,
#> #   y_coordinate <dbl>, year <dbl>, updated_on <dttm>, latitude <dbl>,
#> #   longitude <dbl>, location_latitude <dbl>, location_longitude <dbl>,
#> #   location_human_address <chr>

# Quick check to make sure our filtering worked
chicago_crimes |>
  summarize(rows = n())
#> # A tibble: 1 × 1
#>    rows
#>   <int>
#> 1    50

chicago_crimes |>
  distinct(location_description)
#> # A tibble: 1 × 1
#>   location_description
#>   <chr>               
#> 1 STREET

chicago_crimes |>
  distinct(domestic)
#> # A tibble: 1 × 1
#>   domestic
#>   <lgl>   
#> 1 FALSE

We successfully created a dataset of 50 crimes that are not domestic and that happened on the street.
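The three checks above can also be folded into a single reusable test. The sketch below uses a hypothetical `matches_filters()` helper written in base R, and a tiny hand-made data frame standing in for the pulled data; neither is part of chiOpenData.

```r
# Hypothetical helper: TRUE only if every filtered column exists and
# contains nothing but the requested value
matches_filters <- function(data, filters) {
  checks <- vapply(names(filters), function(col) {
    col %in% names(data) && isTRUE(all(data[[col]] == filters[[col]]))
  }, logical(1))
  all(checks)
}

# Stand-in for a pulled dataset (values invented for illustration)
sample_crimes <- data.frame(
  location_description = c("STREET", "STREET", "STREET"),
  domestic = c(FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

filters_ok <- matches_filters(
  sample_crimes,
  list(location_description = "STREET", domestic = FALSE)
)
filters_ok
```

Applied to the real pull, the call would be matches_filters(chicago_crimes, list(location_description = "STREET", domestic = FALSE)), combined with nrow(chicago_crimes) to confirm the row count.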

Mini analysis

Now that we have successfully pulled the data and have it in R, let’s do a mini analysis using the primary_type column to find the main types of crime.

To do this, we will create a bar graph of the crime types.

library(ggplot2)  # provides ggplot(), geom_col(), labs()

# Visualizing the distribution, ordered by frequency
chicago_crimes |>
  count(primary_type) |>
  ggplot(aes(
    x = n,
    y = reorder(primary_type, n)
  )) +
  geom_col(fill = "steelblue") +
  theme_minimal() +
  labs(
    title = "Crime Types Among 50 Non-Domestic Street Crimes",
    x = "Number of Crimes",
    y = "Primary Crime Type"
  )

Bar chart showing the frequency of crime types happening on the street that are not domestic.

This graph shows us not only which crimes were committed, but how many of each occurred. In this sample of 50 records, theft is the most common crime type among non-domestic street incidents.
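The numbers behind a chart like this can also be tabulated directly. The sketch below uses base R on a short invented vector standing in for chicago_crimes$primary_type, so the counts shown are illustrative only and are not results from the portal.

```r
# Invented stand-in for chicago_crimes$primary_type
primary_type <- c("THEFT", "BATTERY", "THEFT", "ROBBERY", "THEFT", "BATTERY")

# Count each type and order from most to least frequent
type_counts <- sort(table(primary_type), decreasing = TRUE)

# Express each count as a percentage of all records
type_share <- round(100 * prop.table(type_counts), 1)

type_counts
type_share
```

On the real data, replacing the stand-in vector with chicago_crimes$primary_type gives the same counts that the bar chart displays.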

Christian Martinez
