---
title: "Getting started with tesouror"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with tesouror}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Overview

The `tesouror` package provides a unified R interface to the Brazilian National
Treasury (Tesouro Nacional) open data APIs. It covers six major data sources:

| API | Data | Prefix |
|:---|:---|:---|
| **SICONFI** | Fiscal reports (RREO, RGF, DCA, MSC), entities | `get_` |
| **CUSTOS** | Federal government costs | `get_custos_` / `get_costs_` |
| **SADIPEM** | Public debt & credit operations | `get_pvl` / `get_opc_` / `get_res_` |
| **SIORG** | Federal organizational structure (dictionary for CUSTOS) | `get_siorg_` |
| **Transferências** | Constitutional transfers to states/municipalities | `get_tc_` |
| **SIOPE** | Education spending (FNDE/MEC) | `get_siope_` |

All functions return tidy tibbles, use in-memory caching, and have both
Portuguese-named (matching the API) and English-named versions.

## Installation

```{r}
# From CRAN (when available):
install.packages("tesouror")

# Development version from GitHub:
# remotes::install_github("StrategicProjects/tesouror")
```

## Quick examples

```{r}
library(tesouror)

# List all government entities
entes <- get_entes()

# Get RREO data for Tocantins (IBGE code 17)
rreo <- get_rreo(
  an_exercicio = 2022, nr_periodo = 6,
  co_tipo_demonstrativo = "RREO",
  no_anexo = "RREO-Anexo 01",
  co_esfera = "E", id_ente = 17
)

# Same query using English aliases
rreo <- get_budget_report(
  fiscal_year = 2022, period = 6,
  report_type = "RREO",
  appendix = "RREO-Anexo 01",
  sphere = "E", entity_id = 17
)

# Federal government costs (always filter by org to avoid slow queries!)
custos <- get_custos_pessoal_ativo(
  ano = 2023, mes = 6,
  organizacao_n1 = 244,  # MEC (auto-padded)
  organizacao_n2 = 249   # INEP
)

# Constitutional transfers (codes are Treasury-internal, NOT IBGE!)
estados <- get_tc_estados()
pe_code <- estados$codigo[estados$nome == "Pernambuco"]
tc <- get_tc_por_estados(p_estado = pe_code, p_ano = 2023)

# Education spending data from SIOPE
indicadores <- get_siope_indicators(year = 2023, period = 6, state = "PE")
```

## Caching

All functions cache responses in memory for the duration of your R session,
so repeated calls with the same parameters return immediately without
contacting the API again. To clear the cache:

```{r}
tesouror_clear_cache()
```
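
For intuition, session-level caching of this kind can be sketched with a plain
environment keyed by the request URL. This is only an illustration, not the
package's actual implementation: `cached_fetch()`, `clear_cache()`,
`fake_fetch()`, and `.cache` are hypothetical names, and the URL is a
placeholder.

```{r}
# Hypothetical sketch of per-session, in-memory caching (not tesouror's internals).
.cache <- new.env(parent = emptyenv())

cached_fetch <- function(url, fetch) {
  # Return the stored result when this exact URL was already requested.
  if (exists(url, envir = .cache, inherits = FALSE)) {
    return(get(url, envir = .cache))
  }
  result <- fetch(url)
  assign(url, result, envir = .cache)
  result
}

clear_cache <- function() {
  # Drop every stored response so the next call fetches again.
  rm(list = ls(envir = .cache, all.names = TRUE), envir = .cache)
  invisible(NULL)
}

# Stand-in for a real HTTP request that counts how often it is called.
n_calls <- 0
fake_fetch <- function(url) {
  n_calls <<- n_calls + 1
  data.frame(url = url)
}

first  <- cached_fetch("https://example.org/entes", fake_fetch)
second <- cached_fetch("https://example.org/entes", fake_fetch)  # served from cache
```

In this sketch the second call never reaches `fake_fetch()`, which is the
behaviour the cache is meant to provide within a session.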

## Bilingual interface

Every function has two versions. The Portuguese version uses the exact API
parameter names, while the English version uses descriptive English names:
 
```{r}
# Portuguese (API-native)
get_dca(an_exercicio = 2022, id_ente = 17)

# English
get_annual_accounts(fiscal_year = 2022, entity_id = 17)
```

Both call the same endpoint and return the same data. See the
`vignette("siconfi")`, `vignette("custos")`, `vignette("sadipem")`,
`vignette("siorg")`, `vignette("transferencias")`, and
`vignette("siope")` articles for API-specific details.

## Debugging with `verbose`

Every function has a `verbose` parameter that prints the full API URL
being called. This is useful for debugging, or for testing the same request
in a browser or with curl:

```{r}
# Per call:
get_costs_active_staff(year = 2023, month = 6, org_level1 = 244, verbose = TRUE)
#> ℹ API call: https://apidatalake.tesouro.gov.br/ords/custos/tt/pessoal_ativo?ano=2023&mes=6&organizacao_n1=000244&limit=1000

# Or globally for the session:
options(tesouror.verbose = TRUE)
get_entes()  # will print the URL
options(tesouror.verbose = FALSE)  # turn off
```

## Controlling page size

ORDS-based APIs (SICONFI, CUSTOS, SADIPEM) return paginated results.
The `page_size` parameter controls how many rows are requested per page:

```{r}
# CUSTOS defaults to 1000 rows/page (server default is only 250)
custos <- get_costs_active_staff(
  year = 2023, org_level1 = 244, org_level2 = 249
)

# Lower for quick tests:
custos_sample <- get_costs_active_staff(
  year = 2023, org_level1 = 244, org_level2 = 249,
  page_size = 100, max_rows = 200
)

# SICONFI/SADIPEM default to server's 5000 rows/page (fast)
entes <- get_entes()
```
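
The interaction between `page_size` and `max_rows` can be pictured as an
offset-based loop that keeps requesting pages while the server reports more
data. The following is a minimal sketch under stated assumptions, not the
package's code: `fetch_all_pages()` and `fake_page()` are hypothetical, and the
"server" is an in-memory data frame.

```{r}
# Hypothetical sketch of offset-based pagination (not tesouror's internals).
# `fetch_page(offset, limit)` must return list(items = <data.frame>, hasMore = <logical>).
fetch_all_pages <- function(fetch_page, page_size = 1000, max_rows = Inf) {
  pages <- list()
  offset <- 0
  repeat {
    # Never request more rows than are still needed.
    limit <- min(page_size, max_rows - offset)
    page <- fetch_page(offset = offset, limit = limit)
    pages[[length(pages) + 1]] <- page$items
    offset <- offset + nrow(page$items)
    done <- !isTRUE(page$hasMore) || offset >= max_rows || nrow(page$items) == 0
    if (done) break
  }
  do.call(rbind, pages)
}

# Stand-in for a server holding 2,500 rows.
fake_server <- data.frame(id = seq_len(2500))
fake_page <- function(offset, limit) {
  idx <- seq_len(nrow(fake_server))
  keep <- idx > offset & idx <= offset + limit
  list(
    items = fake_server[keep, , drop = FALSE],
    hasMore = offset + limit < nrow(fake_server)
  )
}

all_rows    <- fetch_all_pages(fake_page, page_size = 1000)                  # 3 requests
sample_rows <- fetch_all_pages(fake_page, page_size = 100, max_rows = 200)   # 2 requests
```

A larger page size means fewer round trips, while `max_rows` stops the loop
early once enough rows have been collected.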

## Column names

All API responses are cleaned with `janitor::clean_names()` to ensure
consistent snake_case column names (e.g., `CO_IBGE` becomes `co_ibge`).
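
As a small illustration of the renaming, the snippet below applies
`janitor::clean_names()` to a made-up data frame. Only `CO_IBGE` is taken from
the text above; the other column names and all values are assumptions chosen
for the example.

```{r}
library(janitor)

# Illustrative data frame with API-style uppercase column names.
raw <- data.frame(CO_IBGE = 17L, NO_ENTE = "Tocantins", AN_EXERCICIO = 2022L)

names(clean_names(raw))
#> [1] "co_ibge"      "no_ente"      "an_exercicio"
```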

## API Reference

### SICONFI — Fiscal Reports

Base URL: `https://apidatalake.tesouro.gov.br/ords/siconfi/tt/`

Fiscal reports (RREO, RGF, DCA), accounting matrices (MSC), and entity
registry. Maintained by the National Treasury Secretariat (STN). ORDS
pagination (`hasMore`/`offset`) with a server default of 5,000 rows/page.
18 functions (9 PT + 9 EN).

### CUSTOS — Federal Government Costs

Base URL: `https://apidatalake.tesouro.gov.br/ords/custos/tt/`

Cost data for active/retired staff, pensioners, depreciation, transfers,
and other costs. ORDS pagination; the package requests **1,000 rows/page**
by default (the server default of 250 is too slow; 5,000 causes timeouts).
SIORG codes are auto-padded (`244` → `"000244"`). 12 functions (6 PT + 6 EN).

> **Warning**: Always filter by organization level (`organizacao_n1` +
> `organizacao_n2`) to avoid downloading hundreds of thousands of rows.
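
The auto-padding of SIORG codes amounts to left-padding the number with zeros
to six digits. A minimal sketch of that transformation, assuming a hypothetical
helper named `pad_siorg()` (the package's internal function may differ):

```{r}
# Hypothetical helper: left-pad SIORG codes with zeros to six digits.
pad_siorg <- function(code) {
  formatC(as.integer(code), width = 6, flag = "0", format = "d")
}

pad_siorg(244)
#> [1] "000244"
pad_siorg(c(244, 249))
#> [1] "000244" "000249"
```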

### SADIPEM — Public Debt

Base URL: `https://apidatalake.tesouro.gov.br/ords/sadipem/tt/`

PVL (requests for verification of limits and conditions), credit
operations, payment schedules, exchange rates, and debt capacity. ORDS
pagination with a server default of 5,000 rows/page. 14 functions
(7 PT + 7 EN).

### Transferências Constitucionais

Base URL: `https://apiapex.tesouro.gov.br/aria/v1/transferencias_constitucionais/custom/`

Constitutional transfers (FPE, FPM, FUNDEB, etc.). No pagination (single
response). Accepts vectors (`c(1,2)`) or colon-separated strings
(`"1:2"`). Uses **Treasury-internal codes**, not IBGE. 14 functions
(7 PT + 7 EN).
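
The two accepted input forms are equivalent: a vector of codes can be collapsed
into the colon-separated string. A sketch of that conversion, assuming a
hypothetical helper `collapse_codes()` that is not part of the package:

```{r}
# Hypothetical helper: turn a vector of codes into a colon-separated string.
collapse_codes <- function(x) {
  # A single string such as "1:2" is passed through unchanged.
  if (is.character(x) && length(x) == 1) return(x)
  paste(x, collapse = ":")
}

collapse_codes(c(1, 2))
#> [1] "1:2"
collapse_codes("1:2")
#> [1] "1:2"
```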

### SIORG — Organizational Structure

Base URL: `https://estruturaorganizacional.dados.gov.br/`

Federal organizational structure: ministries, autarchies, foundations.
Used as a dictionary for CUSTOS organization codes. No pagination.
6 functions (3 PT + 3 EN).

### SIOPE — Education Spending

Base URL: `https://www.fnde.gov.br/olinda-ide/servico/DADOS_ABERTOS_SIOPE/versao/v1/odata/`

Education spending data from FNDE/MEC: revenues, expenses, indicators,
staff compensation. OData pagination (`$top`/`$skip`) with default of
**1,000 rows/page**. Supports server-side `filter` (OData `$filter`),
`orderby`, and `select`. 16 functions (8 PT + 8 EN).

> **Tip**: Use `filter = "NOM_MUNI eq 'Recife'"` to narrow results on
> the server. Column names in `filter`/`select`/`orderby` must use the
> original API names (uppercase). Run `toupper(names(result))` on a
> `max_rows = 1` query to discover them.
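
To show how `$top`, `$skip`, and `$filter` fit together in an OData request,
the sketch below builds a query string by hand. It is illustrative only:
`odata_query()` is a hypothetical function, the base URL is a placeholder
rather than a real SIOPE endpoint, and the package may construct its requests
differently.

```{r}
# Hypothetical sketch of OData paging parameters (not tesouror's internals).
odata_query <- function(base_url, filter = NULL, page_size = 1000, page = 1) {
  # Integers avoid scientific notation (e.g. "1e+05") in the query string.
  params <- c(
    "$top"  = as.character(as.integer(page_size)),
    "$skip" = as.character(as.integer((page - 1) * page_size))
  )
  if (!is.null(filter)) {
    # Percent-encode spaces and quotes in the filter expression.
    params <- c(params, "$filter" = utils::URLencode(filter, reserved = TRUE))
  }
  paste0(base_url, "?", paste0(names(params), "=", params, collapse = "&"))
}

# Third page of 1,000 rows, restricted on the server to one municipality.
url <- odata_query(
  "https://example.org/odata/Resource",
  filter = "NOM_MUNI eq 'Recife'",
  page = 3
)
```

Because the filter is evaluated by the server, only matching rows are paged
through, which is why the column name has to be written in its original
uppercase form.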

### Common features (all APIs)

All 80 functions share these features:

- **Retries**: 5 attempts with progressive backoff (3s, 6s, 9s, 12s,
  15s) on HTTP 500/502/503/504/429 and connection failures. HTTP
  400/404 are not retried.
- **Caching**: In-memory per session. Clear with
  `tesouror_clear_cache()`.
- **`verbose` mode**: Per call (`verbose = TRUE`) or globally
  (`options(tesouror.verbose = TRUE)`).
- **`max_rows`**: Cap the number of rows returned (adjusts `limit`
  or `$top` automatically).
- **Column cleaning**: `janitor::clean_names()` applied to all
  responses.
- **Bilingual**: Portuguese (API-native) and English aliases for every
  function.
- **Error messages**: Friendly, actionable messages with URL and hints.
  HTTP 400 errors suggest checking column names in `filter`/`select`.
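
The retry policy in the list above can be sketched as a loop that waits a
little longer after each failed attempt. This is one possible reading of the
schedule (a wait of 3 seconds times the retry number), not the package's
implementation: `with_retries()` and `flaky()` are hypothetical, connection
failures are not modelled, and the `wait` argument exists only so the example
can run without actually sleeping.

```{r}
# Hypothetical sketch of retrying with progressive backoff (not tesouror's internals).
# `request()` must return list(status = <integer>, body = <anything>).
retry_statuses <- c(429, 500, 502, 503, 504)

with_retries <- function(request, max_retries = 5, base_wait = 3, wait = Sys.sleep) {
  attempt <- 0
  repeat {
    resp <- request()
    # Success, or a non-retryable status such as 400/404: return immediately.
    if (!resp$status %in% retry_statuses) return(resp)
    attempt <- attempt + 1
    if (attempt > max_retries) {
      stop("Giving up after ", max_retries, " retries (HTTP ", resp$status, ")")
    }
    wait(base_wait * attempt)  # 3s, 6s, 9s, 12s, 15s
  }
}

# Stand-in request that fails twice with HTTP 503, then succeeds.
calls <- 0
waits <- numeric(0)
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) list(status = 503L, body = NULL) else list(status = 200L, body = "ok")
}

# Record the waits instead of sleeping so the example finishes instantly.
resp <- with_retries(flaky, wait = function(seconds) waits <<- c(waits, seconds))
```

Here the request succeeds on the third call after recorded waits of 3 and 6
seconds; a 400 or 404 response would have been returned on the first call
without any retry.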
