pyglottolog.Glottolog
Most of Glottolog’s data can be accessed through an instance of pyglottolog.Glottolog.
- class pyglottolog.Glottolog(repos='.', *, cache=False)[source]
API to access Glottolog data
This class provides read and write access to a local copy of the Glottolog data, which can be obtained as explained in the README.
- Parameters:
repos – Path to a copy of https://github.com/glottolog/glottolog
cache (bool) – Indicate whether to cache Languoid objects or not. If True, the API must be used read-only.
- repos: pathlib.Path
Absolute path to the copy of the data repository.
- tree: pathlib.Path
Absolute path to the tree directory in the repos.
- references_path(*comps)[source]
Path within the references directory of the repos.
- Parameters:
comps (str) – Path components relative to the references directory.
- property iso: ISO
- Returns:
clldutils.iso_639_3.ISO instance, fed with the data of the latest ISO code table zip found in the build directory.
- property ftsindex: Path
Directory within build where the FullTextSearch index is created.
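The attributes above can be inspected with a small helper. This is only a sketch: the helper name describe_paths is hypothetical, and the 'bibtex' path component is an assumed subdirectory name used for illustration. It accepts any object exposing repos, tree and references_path, such as an instance created with pyglottolog.Glottolog('path/to/glottolog').

```python
from pathlib import Path


def describe_paths(api):
    """Return the main locations of a Glottolog API instance as a dict."""
    return {
        "repos": Path(api.repos),
        "tree": Path(api.tree),
        # 'bibtex' is an assumed subdirectory name, used for illustration.
        "references": Path(api.references_path("bibtex")),
    }
```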
Accessing configuration data
Configuration data in https://github.com/glottolog/glottolog/tree/master/config can be accessed conveniently via the following properties of pyglottolog.Glottolog:
- property Glottolog.aes_status: Dict[str, AES]
- Return type:
mapping with config.AES values
- property Glottolog.aes_sources: Dict[str, AESSource]
- Return type:
mapping with config.AESSource values
- property Glottolog.document_types: Dict[str, DocumentType]
- Return type:
mapping with config.DocumentType values
- property Glottolog.med_types: Dict[str, MEDType]
- Return type:
mapping with config.MEDType values
- property Glottolog.macroareas: Dict[str, Macroarea]
- Return type:
mapping with config.Macroarea values
- property Glottolog.language_types: Dict[str, LanguageType]
- Return type:
mapping with config.LanguageType values
- property Glottolog.languoid_levels: Dict[str, LanguoidLevel]
- Return type:
mapping with config.LanguoidLevel values
- property Glottolog.editors: Dict[str, Generic]
Metadata about editors of Glottolog
- Return type:
mapping with config.Generic values
- property Glottolog.publication: Dict[str, Generic]
Metadata about the Glottolog publication
- Return type:
mapping with config.Generic values
See configuration data for details about the returned objects.
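Since each of these properties is documented as a mapping with string keys, they can be surveyed uniformly. The following sketch assumes only that mapping behaviour. The names CONFIG_PROPERTIES and config_summary are hypothetical helpers, not part of pyglottolog.

```python
# Property names as listed in this section.
CONFIG_PROPERTIES = (
    "aes_status", "aes_sources", "document_types", "med_types",
    "macroareas", "language_types", "languoid_levels",
    "editors", "publication",
)


def config_summary(api):
    """Map each configuration property name to the sorted keys of its mapping."""
    return {name: sorted(getattr(api, name)) for name in CONFIG_PROPERTIES}
```

Calling config_summary(glottolog) on an API instance would give a quick overview of which identifiers each configuration file defines.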
Accessing languoid data
- property Glottolog.glottocodes: Glottocodes
Registry of Glottocodes.
- Glottolog.languoid(id_)[source]
Retrieve a languoid specified by language code.
- Parameters:
id_ (typing.Union[str, pyglottolog.languoids.languoid.Languoid]) – Glottocode or ISO code.
- Return type:
- Glottolog.languoids(ids=None, maxlevel=None, exclude_pseudo_families=False)[source]
Yields languoid objects.
- Parameters:
ids (set) – Set of Glottocodes to limit the result to. This is useful to increase performance, since INI file reading can be skipped for languoids not listed.
maxlevel (typing.Union[int, pyglottolog.config.LanguoidLevel, str]) – Numeric maximal nesting depth of languoids, or Languoid.level.
exclude_pseudo_families (bool) – Flag signaling whether to exclude pseudo families, i.e. languoids from non-genealogical trees.
- Return type:
typing.Generator[pyglottolog.languoids.languoid.Languoid, None, None]
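When only a handful of languoids are needed, the ids parameter can keep the read small. A minimal sketch, assuming an API instance is passed in; the helper name names_for is hypothetical:

```python
def names_for(api, glottocodes):
    """Return {glottocode: name} for the requested languoids only."""
    wanted = set(glottocodes)  # the `ids` parameter is documented as a set
    return {lang.id: lang.name for lang in api.languoids(ids=wanted)}
```

Because Glottolog.languoids is a generator, the dict comprehension consumes it in a single pass.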
- Glottolog.languoids_by_code(nodes=None)[source]
Returns a dict mapping the three major language code schemes (Glottocode, ISO code, and Harald’s NOCODE_s) to Languoid objects.
- Return type:
typing.Dict[str, pyglottolog.languoids.languoid.Languoid]
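To resolve many codes at once, it may be cheaper to build this dict once and reuse it than to call Glottolog.languoid repeatedly. The sketch below illustrates the idea; resolve_codes is a hypothetical helper name.

```python
def resolve_codes(api, codes):
    """Resolve many codes with a single call to languoids_by_code()."""
    by_code = api.languoids_by_code()  # built once, then reused for every lookup
    found = {code: by_code[code] for code in codes if code in by_code}
    missing = [code for code in codes if code not in by_code]
    return found, missing
```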
The classification can be accessed via a pyglottolog.languoids.Languoid’s attributes. In addition, it can be visualized via
- Glottolog.ascii_tree(start, maxlevel=None)[source]
Prints an ASCII representation of the languoid tree starting at start to stdout.
- Parameters:
start (typing.Union[str, pyglottolog.languoids.languoid.Languoid]) –
and serialized as a Newick string via
- Glottolog.newick_tree(start=None, template=None, nodes=None, maxlevel=None)[source]
Returns the Newick representation of a (set of) Glottolog classification tree(s).
- Parameters:
start (typing.Union[None, str, pyglottolog.languoids.languoid.Languoid]) – Root languoid of the tree (or None to return the complete classification).
template (str) – Python format string accepting the Languoid instance as single variable named l, used to format node labels.
maxlevel (typing.Union[int, pyglottolog.config.LanguoidLevel]) –
- Return type:
str
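As an illustration of the template parameter, the sketch below labels each node with its Glottocode only. The helper name newick_with_ids is hypothetical. The format string relies on the variable l described above and on the Languoid.id attribute.

```python
def newick_with_ids(api, start=None):
    """Return a Newick string whose node labels are Glottocodes."""
    # The template is formatted with the Languoid bound to the name `l`.
    return api.newick_tree(start, template="{l.id}")
```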
Accessing reference data
Performance considerations
Reading the data for Glottolog’s more than 25,000 languoids from the same number of files in individual directories is not particularly quick. On an average computer, running
>>> list(glottolog.languoids())
takes around 15 seconds.
Due to this, care should be taken not to read languoid data from disk repeatedly. In particular “N+1”-type problems should be avoided, where one would read all languoids into memory and then look up attributes on each languoid, thereby triggering new reads from disk. This may easily happen, since attributes such as Languoid.family are implemented as properties, which traverse the directory tree and read information from disk at access time.
To make it possible to avoid such problems, many of these properties can be substituted with a call to a similar method of Languoid, which accepts a “node map” (i.e. a dict mapping Languoid.id to Languoid objects) as parameter, e.g. Languoid.ancestors_from_nodemap or Languoid.descendants_from_nodemap. Typical usage would look as follows:
>>> languoids = {l.id: l for l in glottolog.languoids()}
>>> for l in languoids.values():
...     if not l.ancestors_from_nodemap(languoids):
...         print('top-level {0}: {1}'.format(l.level, l.name))
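The same node map can be reused for descendant lookups. The following sketch counts the descendants of every top-level languoid without further reads from disk. It assumes that descendants_from_nodemap accepts the node map as its first argument and returns an iterable. The helper name family_sizes is hypothetical.

```python
def family_sizes(nodes):
    """Count descendants of every top-level languoid in a node map."""
    sizes = {}
    for lang in nodes.values():
        # Top-level languoids are those without ancestors.
        if not lang.ancestors_from_nodemap(nodes):
            sizes[lang.id] = sum(1 for _ in lang.descendants_from_nodemap(nodes))
    return sizes
```

Here nodes would be built once, as in the example above, and passed to every lookup.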
Alternatively, if you only want to read Glottolog data, you may enable caching (cache=True) when instantiating pyglottolog.Glottolog.
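A read-only instance with caching enabled could be created along these lines. This is a sketch: the wrapper name open_read_only is hypothetical, and the import is deferred only so that the snippet can be loaded without pyglottolog installed.

```python
def open_read_only(repos="."):
    """Open a local Glottolog clone with Languoid caching enabled."""
    # Deferred import: keeps this sketch loadable without pyglottolog.
    from pyglottolog import Glottolog
    # `cache` is keyword-only; with cache=True the API must be used read-only.
    return Glottolog(repos, cache=True)
```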