This document provides a brief overview of the coding rationale for
key variables in the list of episodes of independent states and
state-like entities in the international system provided in
`manystates::states$GGO`.
Note that this dataset was constructed as a complement to datasets
such as the Gleditsch and Ward Revised List of Independent States
(`manystates::states$GW`) and Butcher and Griffiths’
International System(s) Dataset (`manystates::states$ISD`).
As such, it is incomplete in both observations and variables, but it
offers greater specificity and some additional entries compared to
these other datasets.
Work on this dataset was supported by the Swiss National Science Foundation (SNSF) Grant Number 188976: “Power and Networks and the Rate of Change in Institutional Complexes” (PANARCHIC).
Please direct all comments and suggestions to:
James Hollway
International Relations/Political Science Department
Graduate Institute of International and Development Studies
Geneva, Switzerland
james.hollway@graduateinstitute.ch

The `StateName` variable records the name or names of the state or
state-like entity. The dataset includes entities (or dates for these
entities) that predate the modern interstate system. Although the
definition of a state has changed over time, we include these entities
for comprehensiveness. Where the state's name has alternative or longer
forms, or forms in other languages, these are recorded in the
`StateNameAlt` variable. The shorter or more common name is
preferred for the `StateName` variable, provided it is
unambiguous.
The `stateID` variable is the three-letter code associated with the
state or state-like entity. These codes are based on the ISO 3166-1
alpha-3 list, and all codes are consistent with it; however, additional
codes have been added to cover historical and other states that are not
included in the ISO's own list. Where possible, we use the Correlates of
War three-letter codes for this purpose, or those used in the
GW or ISD datasets. In some cases, however, we must select new codes;
in these situations, we aim to use recognisable, unique codes built from
the significant consonants or vowels of the state's name.
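The code-selection order described above can be sketched as a simple fallback chain. This is a minimal Python illustration, not part of the R package: it assumes hypothetical lookup tables (plain dicts mapping state names to codes) that stand in for the ISO, Correlates of War, GW, and ISD lists, checked in that order. The function name `choose_state_code` and its parameters are illustrative assumptions.

```python
import re
from typing import Mapping, Optional, Sequence

# Three uppercase letters, as in ISO 3166-1 alpha-3.
CODE_RE = re.compile(r"[A-Z]{3}")


def _check(code: str) -> str:
    """Raise if `code` is not a three-letter uppercase code."""
    if not CODE_RE.fullmatch(code):
        raise ValueError(f"not a three-letter code: {code!r}")
    return code


def choose_state_code(
    name: str,
    sources: Sequence[Mapping[str, str]],
    new_code: Optional[str] = None,
) -> str:
    """Return the first existing code for `name`, else a validated new code.

    `sources` is a hypothetical, ordered list of lookups, e.g.
    [iso_codes, cow_codes, gw_codes, isd_codes].
    """
    for lookup in sources:
        code = lookup.get(name)
        if code is not None:
            return _check(code)
    if new_code is None:
        raise LookupError(f"no existing code for {name!r}; choose a new one")
    return _check(new_code)


# Hypothetical lookup tables for illustration only.
iso_codes = {"France": "FRA"}
cow_codes = {"Hypothetical Kingdom": "HYK"}

print(choose_state_code("France", [iso_codes, cow_codes]))  # FRA
print(choose_state_code("Hypothetical Kingdom", [iso_codes, cow_codes]))  # HYK
print(choose_state_code("Imaginaria", [iso_codes, cow_codes], new_code="IMG"))  # IMG
```

Raising `LookupError` rather than inventing a code reflects the text: new codes are a deliberate coder decision, so the sketch only validates the shape of a code that the coder supplies.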
Note that we endeavour to use existing codes, where possible, for state episodes that are substantially similar in territory and that inherit some of the international legal obligations, rights, and recognitions of their predecessor states. For this reason, there is a series of episodes associated with “RUS”, for example, ranging from the Russian Empire, through the USSR, to the Russian Federation. Where a state is not considered the legal successor, however, we use different `stateID` codes: Serbia, for example, is not considered the legal successor of Yugoslavia, so the two are coded “SRB” and “YUG” respectively. In cases of dissolution (see below), the old `stateID` code should cease, whereas in cases of secession, the old `stateID` code should continue for the rump state.
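The dissolution/secession rule for codes can be expressed as a small function. This is a hypothetical Python sketch: the labels `"dissolution"` and `"secession"` and the function name `codes_after_breakup` are assumptions made for illustration, and the inputs below are illustrative rather than a statement of how any particular case is coded in the dataset.

```python
from typing import Iterable, List


def codes_after_breakup(old_code: str, new_codes: Iterable[str], ending: str) -> List[str]:
    """Return the stateID codes in use after a state breaks apart.

    `ending` is a hypothetical label: "dissolution" or "secession".
    """
    remaining = set(new_codes)
    if ending == "dissolution":
        # The predecessor's code ceases; only successor codes remain.
        remaining.discard(old_code)
    elif ending == "secession":
        # The rump state keeps the predecessor's code.
        remaining.add(old_code)
    else:
        raise ValueError(f"unknown ending: {ending!r}")
    return sorted(remaining)


# Illustrative inputs only.
print(codes_after_breakup("YUG", ["SRB", "HRV"], "dissolution"))  # ['HRV', 'SRB']
print(codes_after_breakup("AAA", ["BBB"], "secession"))  # ['AAA', 'BBB']
```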
The beginning and end date variables record when an episode of state
independence is deemed to have begun or ended. Dates are coded using the
messydates system, which implements ISO's extended date/time format
(ISO 8601-2). As a result, some dates are entered only as a year, and
dates are annotated with a question mark where the source is uncertain.
For more details, see the {messydates} package.
States that are currently independent have an end date of
`9999-12-31`. This distinguishes them from missing data,
which is always coded `NA`.
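The distinction between ongoing, missing, and dated end dates can be sketched as follows. This is a hand-rolled Python illustration, not the {messydates} API: it assumes end dates arrive as strings, handles only a small subset of the extended date/time format (year, year-month, or year-month-day, with an optional trailing `?`), and the names `EndDate` and `classify_end_date` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

ONGOING = "9999-12-31"  # end date for currently independent states
# Year, optional month, optional day, optional trailing "?" for uncertainty.
DATE_RE = re.compile(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?(\?)?")


@dataclass
class EndDate:
    status: str  # "ongoing", "missing", or "dated"
    value: Optional[str] = None
    precision: Optional[str] = None  # "year", "month", or "day"
    uncertain: bool = False


def classify_end_date(raw: Optional[str]) -> EndDate:
    """Classify a raw end-date string (hypothetical helper)."""
    if raw is None or raw == "NA":
        return EndDate("missing")
    if raw == ONGOING:
        return EndDate("ongoing", raw, "day")
    m = DATE_RE.fullmatch(raw)
    if m is None:
        raise ValueError(f"unsupported date string: {raw!r}")
    _year, month, day, mark = m.groups()
    precision = "day" if day else "month" if month else "year"
    return EndDate("dated", raw.rstrip("?"), precision, mark is not None)


print(classify_end_date("9999-12-31").status)  # ongoing
print(classify_end_date("NA").status)  # missing
print(classify_end_date("1821?"))
# EndDate(status='dated', value='1821', precision='year', uncertain=True)
```

Checking the `9999-12-31` sentinel before parsing keeps ongoing episodes from being treated as ordinary dates, which is the point of using a sentinel distinct from `NA`.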
The basis variable records how the episode of state independence began. We adopt many of the categories offered in the ISD dataset, but add some additional categories to improve specificity:
Where the code is followed by a `?` annotation, this
indicates uncertainty about the coding.
The grounds variable records how the state episode ended. We use the categories offered in the ISD dataset:
Where the code is followed by a `?` annotation, this
indicates uncertainty about the coding.
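For both the basis and grounds variables, the trailing `?` can be separated from the category so that uncertain codings can be counted or filtered. This minimal Python sketch assumes codes are plain strings; the helper name `split_uncertain` is hypothetical, and the category labels below are placeholders rather than the dataset's actual categories.

```python
from typing import Tuple


def split_uncertain(code: str) -> Tuple[str, bool]:
    """Split a category code into (code, is_uncertain) based on a trailing '?'."""
    code = code.strip()
    if code.endswith("?"):
        return code[:-1].rstrip(), True
    return code, False


# Hypothetical category labels; the actual basis and grounds categories may differ.
codes = ["dissolution", "secession?", "unification"]
parsed = [split_uncertain(c) for c in codes]
print(parsed)
# [('dissolution', False), ('secession', True), ('unification', False)]
print(sum(flag for _, flag in parsed))  # 1
```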
The capital variable records the name of the capital city. This is
usually straightforward; however, where a state has a second capital
city, it appears in the `CapitalAlt` variable.
Latitude and longitude are recorded in decimal degrees. Where possible, we code the location of the capital city; where this is not possible, we attempt to identify the latitude and longitude of the barycentre of the territory.
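Two operations are implied here: converting degrees/minutes/seconds into decimal form, and estimating a territory's barycentre. The Python sketch below shows one way to do each. The barycentre part is a swapped-in technique (the area-weighted centroid of a polygon via the shoelace formula, treating longitude and latitude as planar coordinates), not necessarily how the dataset's coordinates were derived. It is only a rough approximation for large territories and breaks across the antimeridian. Both function names are hypothetical.

```python
from typing import Sequence, Tuple


def dms_to_decimal(degrees: float, minutes: float = 0.0, seconds: float = 0.0,
                   hemisphere: str = "N") -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees (S and W negative)."""
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


def planar_centroid(vertices: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Area-weighted centroid of a simple polygon of (lon, lat) vertices.

    Treats coordinates as planar (shoelace formula): a rough approximation only.
    """
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3 * area2), cy / (3 * area2)


print(round(dms_to_decimal(46, 12, 0, "N"), 4))  # 46.2
print(round(dms_to_decimal(6, 9, 0, "W"), 4))  # -6.15
print(planar_centroid([(0, 0), (2, 0), (2, 2), (0, 2)]))  # (1.0, 1.0)
```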
We code the region more specifically than some other datasets do. Regions are coded descriptively as character strings, which makes it possible to search by regular expression: for example, “America” matches “Northern America”, “Southern America”, “Central America”, and “Caribbean America”. Note that we use the adjectival form, e.g. “Southern Africa”, to distinguish the region from the country “South Africa”. Where applicable, we use “Central” to describe areas in the middle of a continent. The data includes the following regions:
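Because regions are plain character strings, the regular-expression search mentioned above amounts to a simple filter. This Python sketch uses only the region names quoted in the text as sample data (not the full list), and the helper name `regions_matching` is hypothetical. In R, the same filter would typically use `grepl()`.

```python
import re
from typing import Iterable, List


def regions_matching(regions: Iterable[str], pattern: str) -> List[str]:
    """Return the distinct region strings that match a regular expression."""
    rx = re.compile(pattern)
    return sorted({r for r in regions if rx.search(r)})


# Sample region strings taken from the description above (not the full list).
sample = ["Northern America", "Southern America", "Central America",
          "Caribbean America", "Southern Africa"]

print(regions_matching(sample, "America"))
# ['Caribbean America', 'Central America', 'Northern America', 'Southern America']
print(regions_matching(sample, "^Southern"))
# ['Southern Africa', 'Southern America']
print(regions_matching(sample, "South Africa"))  # [] (the country name matches no region)
```

The last call shows why the adjectival form helps: a search for the country “South Africa” does not accidentally return the region “Southern Africa”.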
The `Coder` variable is a comma-separated vector of the
surnames of those who have added or verified data for each
entry/observation. Where special conditions arise, the
`Comments` variable offers a free-text field for explanations
or for recording how the coding has changed from version to version. The
`Source` variable should contain only links or bibliographic
information for the sources used to add or verify information.
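Because `Coder` is a comma-separated list of surnames, adding a new coder means splitting the field, appending the surname if it is absent, and joining the result again. This Python sketch assumes a `", "` separator and uses a hypothetical surname (“Doe”); the helper names `parse_coders` and `add_coder` are illustrative, not part of the package.

```python
from typing import List, Optional


def parse_coders(field: Optional[str]) -> List[str]:
    """Split a comma-separated Coder field into a list of surnames."""
    if field is None:
        return []
    return [name.strip() for name in field.split(",") if name.strip()]


def add_coder(field: Optional[str], surname: str) -> str:
    """Append a surname to a Coder field unless it is already recorded."""
    coders = parse_coders(field)
    if surname not in coders:
        coders.append(surname)
    return ", ".join(coders)


print(parse_coders("Hollway, Doe"))  # ['Hollway', 'Doe']
print(add_coder("Hollway", "Doe"))  # Hollway, Doe
print(add_coder("Hollway, Doe", "Hollway"))  # Hollway, Doe
```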