This is a new version of the genderBR package that includes a new
function: get_gender_nn(), which uses a character-level
neural network to predict gender from Brazilian first names. This model
can generalise to names not present in the IBGE census dataset, so it
can be used as a complement to the existing functionality in the
package. The release also includes some improvements, tests, and
documentation updates.
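Among the changes in this release, accent stripping during name cleaning moves from iconv() to chartr(). The following is a minimal sketch of that idea in base R, not the package's actual implementation: strip_accents() is a hypothetical helper, and the set of accented characters it covers is an assumption chosen for illustration (lowercase letters only).

```r
# Hypothetical helper for illustration; not the package's actual code.
strip_accents <- function(x) {
  # Accented lowercase letters common in Portuguese names, written as
  # Unicode escapes so the mapping does not depend on the file encoding:
  # á à â ã ä é è ê ë í ì î ï ó ò ô õ ö ú ù û ü ç ñ
  accented <- paste0(
    "\u00e1\u00e0\u00e2\u00e3\u00e4",
    "\u00e9\u00e8\u00ea\u00eb",
    "\u00ed\u00ec\u00ee\u00ef",
    "\u00f3\u00f2\u00f4\u00f5\u00f6",
    "\u00fa\u00f9\u00fb\u00fc",
    "\u00e7\u00f1"
  )
  plain <- "aaaaaeeeeiiiiooooouuuucn"
  # chartr() maps the i-th character of `accented` to the i-th of `plain`,
  # so both strings must have the same number of characters.
  chartr(accented, plain, x)
}

# "joão", "josé", "conceição" written with escapes for portability
strip_accents(c("jo\u00e3o", "jos\u00e9", "concei\u00e7\u00e3o"))
```

Because the mapping is an explicit character table, the result does not depend on the platform's iconv transliteration support, which is the behaviour difference this release addresses.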
- get_gender_nn() is a new exported function that uses a
character-level neural network to predict gender from Brazilian first
names. Unlike get_gender(), this function can generalise to
names not present in the IBGE census dataset.
- Added clear_nn_cache() to manage the in-memory model cache.
- Added download_gender_model(), an internal function
that handles downloading and caching the neural network model weights
and vocabulary from Hugging Face.
- Replaced iconv() with chartr() for
stripping accents in name cleaning. The previous approach relied on
iconv(name, to = "ASCII//TRANSLIT"), which is
platform-dependent and returns NA on macOS for accented
names (e.g., “joão”). The encoding argument in
get_gender, get_gender_nn, and
map_gender is now deprecated and will be removed in a
future version.
- Added torch to Imports; luz
and httr2 to Suggests.
- get_gender.
- nomes now includes probabilities for
2010 and 2022 (prob_fem10, prob_fem22) and is
used when internal = TRUE. This data covers 141,742 unique
Brazilian first names.
- Replaced %>% with the base
|> operator, thus removing the magrittr
dependency (requires R 4.1.0 or higher).
- Adopted data.table for
faster joins and removed dplyr/tibble
dependencies.

In this version, a few improvements and bug fixes were introduced. Most importantly, connection errors now return informative messages to users.
- map_gender and get_gender now return
informative error messages when a request times out.
- get_gender function better handles non-ASCII
characters.

In this minor release, the genderBR package was improved in two ways. First, bugs and some minor issues were fixed, making the package’s functions more stable. Second, the package now contains an internal dataset with all the names reported by the IBGE’s Census, which the get_gender function uses to predict gender from Brazilian first names. As a result, classifying a vector with more than 1,000 names now takes no more than a few seconds. Overall, these are the improvements:
- Added a NEWS.md file to track changes to the
package.
- get_gender function.
- round_guess function.
- Updated the get_gender function to work with internal
data.