GGO states codebook

James Hollway

2025-09-19

Release 1.0

This document provides a brief overview of the coding rationale for key variables in the list of episodes of independent states and state-like entities in the international system provided in manystates::states$GGO.

Note that this dataset was constructed as a complement to datasets such as the Gleditsch and Ward Revised List of Independent States (manystates::states$GW) and Butcher and Griffiths’ International System(s) Dataset (manystates::states$ISD). As such, it is complete in neither observations nor variables, yet it offers greater specificity and some additional entries compared to those datasets.

Work on this dataset was supported by the Swiss National Science Foundation (SNSF) Grant Number 188976: “Power and Networks and the Rate of Change in Institutional Complexes” (PANARCHIC).

Please direct all comments and suggestions to:

James Hollway

International Relations/Political Science Department

Graduate Institute of International and Development Studies

Geneva, Switzerland

States

StateName, StateNameAlt

This is the name or names of the state or state-like entity. The dataset includes entities (or dates placing these entities) that predate the modern interstate system. Although the definition of a state has changed since then, we include them here for comprehensiveness. Where there are alternative or longer forms of the state name, or names in other languages, these are included in the StateNameAlt variable. The shorter or more common name is preferred for the StateName variable, so long as it is unambiguous.

stateID

This is the three-letter code associated with the state or state-like entity. These codes are based on the ISO 3166-1 alpha-3 list, and all codes are consistent with it. However, additional codes have been added to cover historical and other states that are not covered by the ISO’s own list. Where possible, we use the Correlates of War three-letter codes for this purpose, or those used in the GW or ISD datasets. In some cases we must select new codes; in such situations, we aim to use recognisable, unique codes relying on significant consonants or vowels.

Note that we endeavour to use existing codes where possible for state episodes that are substantially similar in territory and involve some inheritance of the international legal obligations, rights, and recognitions of the predecessor states. For this reason there is a series of episodes associated with “RUS”, for example, ranging from the Russian Empire, through the USSR, to the Russian Federation. However, where the state is not considered the legal successor state (for example, Serbia is not considered the legal successor of Yugoslavia), we use different stateID codes (in this case “SRB” and “YUG”). In cases of dissolution (see below), the old stateID code should cease, whereas in cases of secession, the old stateID code should continue for the rump state.

Dates

Begin, End

These are the dates when an episode of state independence is deemed to have begun or ended. Dates are coded using the messydates system, which implements the ISO 8601 extended date/time format. As such, some dates are entered only as a year, and some are annotated with a question mark if the source is uncertain. For more details see {messydates}.
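As a rough illustration of the two cases mentioned above (year-only dates and dates annotated with a question mark), the following Python sketch parses a small subset of such strings. It is not the {messydates} implementation, which supports a much richer syntax; the names `ParsedDate` and `parse_date` and the sample values are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ParsedDate(NamedTuple):
    """Hypothetical container for a partially specified date."""
    year: int
    month: Optional[int]   # None when only the year (or no month) is given
    day: Optional[int]     # None when the day is not given
    uncertain: bool        # True when the string ends with "?"


# Accepts "YYYY", "YYYY-MM" or "YYYY-MM-DD", optionally followed by "?".
# This is an assumed subset, not the full extended date/time syntax.
_DATE_PATTERN = re.compile(r"^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?(\?)?$")


def parse_date(text: str) -> ParsedDate:
    """Split a date string into its components and an uncertainty flag."""
    match = _DATE_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported date string: {text!r}")
    year, month, day, flag = match.groups()
    return ParsedDate(
        year=int(year),
        month=int(month) if month else None,
        day=int(day) if day else None,
        uncertain=(flag == "?"),
    )
```

Keeping the precision (missing month or day) and the uncertainty flag separate avoids silently turning a year-only entry into a false 1 January date.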

States that are currently independent have an end date of 9999-12-31. This distinguishes them from missing data, which is always coded NA.
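The distinction between an ongoing episode and a missing end date can be made explicit when working with the data. The Python sketch below assumes the End values have been exported as strings, with NA represented as `None`; the function name `episode_status` is hypothetical.

```python
from typing import Optional

# Sentinel end date used for currently independent states (from the codebook).
ONGOING_END = "9999-12-31"


def episode_status(end: Optional[str]) -> str:
    """Classify an End value as 'missing', 'ongoing' or 'ended'.

    Assumes NA has been mapped to None on export (an assumption of this
    sketch, not something the dataset guarantees).
    """
    if end is None:
        return "missing"
    if end == ONGOING_END:
        return "ongoing"
    return "ended"
```

Checking for the sentinel before doing any date arithmetic prevents ongoing episodes from being treated as lasting until the year 9999.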

Basis

Basis records how the episode of state independence began. We adopt many of the categories offered in the ISD dataset, but add further categories to improve specificity:

Where the code is followed by a ? annotation, this indicates uncertainty about the coding.

Grounds

Grounds records how the state ended. We use the categories offered in the ISD dataset:

Where the code is followed by a ? annotation, this indicates uncertainty about the coding.
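For both Basis and Grounds, the trailing ? can be separated from the category before tabulating or filtering. The Python sketch below shows one way to do this; the function name `split_annotation` is hypothetical, and the labels in the usage example are placeholders rather than the dataset's actual category list.

```python
from typing import Optional, Tuple


def split_annotation(value: Optional[str]) -> Tuple[Optional[str], bool]:
    """Return (code, uncertain) for a Basis or Grounds entry.

    A trailing "?" marks the coding as uncertain; missing values (None)
    are passed through with uncertain=False.
    """
    if value is None:
        return None, False
    text = value.strip()
    if text.endswith("?"):
        return text[:-1].rstrip(), True
    return text, False


# Placeholder labels for illustration only.
examples = ["Secession", "Dissolution?", None]
parsed = [split_annotation(v) for v in examples]
```

Storing the uncertainty as a separate flag keeps "Dissolution" and "Dissolution?" in the same category while still allowing uncertain codings to be excluded.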

Places

Capital, CapitalAlt

This is the name of the capital city. For the most part, this is fairly straightforward. In some cases, however, there is a second capital city, in which case this will appear in the CapitalAlt variable.

Latitude, Longitude

Latitude and longitude are given in decimal degrees. If possible, we code the location of the capital city. If this is not possible, we attempt to identify the latitude and longitude of the barycentre of the territory.

Region

We code the region more specifically than in some other datasets. We code the region descriptively and as a character string, which makes it possible to search with a regular expression such as “America” to retrieve “Northern America”, “Southern America”, “Central America”, and “Caribbean America”. Note that we use the adjectival form, e.g. “Southern Africa”, to distinguish the region from the country “South Africa”. We use “Central” to describe areas in the middle of the continent, if applicable. The data includes the following regions:
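The regular-expression search described in this section can be sketched as follows in Python. The region strings are those named in the text; the helper `match_regions` is hypothetical, and the list is only a small sample, not the full set of regions.

```python
import re
from typing import List

# Sample of region strings mentioned in the codebook (not the full list).
regions = [
    "Northern America",
    "Southern America",
    "Central America",
    "Caribbean America",
    "Southern Africa",
]


def match_regions(pattern: str, values: List[str]) -> List[str]:
    """Return the values in which the regular expression is found."""
    rx = re.compile(pattern)
    return [v for v in values if rx.search(v)]


americas = match_regions("America", regions)    # the four American regions
southern = match_regions("^Southern", regions)  # regions starting with "Southern"
```

Because the adjectival form is used throughout, a pattern such as `^Southern` selects regions by position without accidentally matching a country name like “South Africa”.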

Coder, Comments, Source

The Coder variable is a comma-separated vector of the surnames of those who have added or verified data for each entry/observation. Where special conditions arise, the Comments variable offers a free-text area for explanations or for recording how the coding has changed from version to version. The Source variable should contain only links or bibliographic information for the sources used to add or verify information.
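Since Coder holds several surnames in one string, it usually needs splitting before coders can be counted or filtered. A minimal Python sketch follows; the function name `split_coders` is hypothetical, and the second surname in the usage example is invented for illustration.

```python
from typing import List, Optional


def split_coders(coder: Optional[str]) -> List[str]:
    """Split a comma-separated Coder entry into a list of surnames.

    Missing values (None) yield an empty list; surrounding whitespace
    and empty fragments are dropped.
    """
    if coder is None:
        return []
    return [name.strip() for name in coder.split(",") if name.strip()]


# "Smith" is an invented surname used only for illustration.
coders = split_coders("Hollway, Smith")
```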