pyglottolog.Glottolog

Most of Glottolog’s data can be accessed through an instance of pyglottolog.Glottolog.

class pyglottolog.Glottolog(repos='.', *, cache=False)[source]

API to access Glottolog data

This class provides (read and write) access to a local copy of the Glottolog data, which can be obtained as explained in the README.

Parameters:
repos: pathlib.Path

Absolute path to the copy of the data repository.

tree: pathlib.Path

Absolute path to the tree directory in the repos.
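A minimal sketch of instantiating the API, assuming pyglottolog is installed and a copy of the data repository exists locally. The wrapper name `open_glottolog` is hypothetical and not part of pyglottolog; the import is deferred so the helper can be defined even where the package is missing.

```python
from pathlib import Path


def open_glottolog(repos=".", cache=False):
    """Return a Glottolog API object for a local copy of the data repository.

    Hypothetical convenience wrapper, not part of pyglottolog itself.
    """
    # Deferred import: pyglottolog is only required once the function is called.
    from pyglottolog import Glottolog

    # `cache` is keyword-only in the documented signature.
    return Glottolog(Path(repos), cache=cache)
```

Calling `open_glottolog("path/to/glottolog")` (a placeholder path) should then give an object whose `repos` and `tree` attributes point into that copy.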

references_path(*comps)[source]

Path within the references directory of the repos.

Parameters:

comps (str) –

languoids_path(*comps)[source]

Path within the languoids directory of the repos.
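The two path helpers above can be pictured as plain path joins below a fixed subdirectory. The following is a sketch under that assumption, not the library's code; `path_within` and the file names in the usage note are hypothetical, and `PurePosixPath` is used only to keep the result platform-independent.

```python
from pathlib import PurePosixPath


def path_within(repos, subdir, *comps):
    """Join path components below a named subdirectory of the repository.

    Illustrative stand-in for references_path / languoids_path.
    """
    return PurePosixPath(repos).joinpath(subdir, *comps)
```

Under this assumption, `path_within("/data/glottolog", "references", "bibtex", "example.bib")` yields `/data/glottolog/references/bibtex/example.bib`, and calling it without components yields the subdirectory itself.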

property iso: ISO
Returns:

clldutils.iso_639_3.ISO instance, fed with the data of the latest ISO code table zip found in the build directory.

property ftsindex: Path

Directory within build where the FullTextSearch index is created.

Accessing configuration data

Configuration data in https://github.com/glottolog/glottolog/tree/master/config can be accessed conveniently via the following properties of pyglottolog.Glottolog:

property Glottolog.aes_status: Dict[str, AES]
Return type:

mapping with config.AES values.

property Glottolog.aes_sources: Dict[str, AESSource]
Return type:

mapping with config.AESSource values

property Glottolog.document_types: Dict[str, DocumentType]
Return type:

mapping with config.DocumentType values

property Glottolog.med_types: Dict[str, MEDType]
Return type:

mapping with config.MEDType values

property Glottolog.macroareas: Dict[str, Macroarea]
Return type:

mapping with config.Macroarea values

property Glottolog.language_types: Dict[str, LanguageType]
Return type:

mapping with config.LanguageType values

property Glottolog.languoid_levels: Dict[str, LanguoidLevel]
Return type:

mapping with config.LanguoidLevel values

property Glottolog.editors: Dict[str, Generic]

Metadata about editors of Glottolog

Return type:

mapping with config.Generic values

property Glottolog.publication: Dict[str, Generic]

Metadata about the Glottolog publication

Return type:

mapping with config.Generic values

See configuration data for details about the returned objects.
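Since each of these properties is documented as a mapping with string keys, a quick overview of the available configuration can be had by listing the keys. The helper below is a sketch; `config_keys` is a hypothetical name, and it assumes only that the properties behave like ordinary dicts when iterated.

```python
def config_keys(glottolog, names=("macroareas", "languoid_levels", "language_types")):
    """Map each configuration property name to the sorted keys of its mapping."""
    # getattr() looks up the property by name; iterating a mapping yields its keys.
    return {name: sorted(getattr(glottolog, name)) for name in names}
```

Passing a different `names` tuple (for example `("aes_status", "med_types")`) inspects other properties from the list above.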

Accessing languoid data

property Glottolog.glottocodes: Glottocodes

Registry of Glottocodes.

Glottolog.languoid(id_)[source]

Retrieve a languoid specified by language code.

Parameters:

id_ – language code identifying the languoid.

Return type:

pyglottolog.languoids.languoid.Languoid

Glottolog.languoids(ids=None, maxlevel=None, exclude_pseudo_families=False)[source]

Yields languoid objects.

Parameters:
  • ids (set) – set of Glottocodes to limit the result to. This is useful to increase performance, since INI file reading can be skipped for languoids not listed.

  • maxlevel (typing.Union[int, pyglottolog.config.LanguoidLevel, str]) – Numeric maximal nesting depth of languoids, or Languoid.level.

  • exclude_pseudo_families (bool) – Flag signaling whether to exclude pseudo-families, i.e. languoids from non-genealogical trees.

Return type:

typing.Generator[pyglottolog.languoids.languoid.Languoid, None, None]
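As a sketch of the `ids` parameter described above, the helper below restricts iteration to a known set of Glottocodes so that other languoids need not be read. `names_by_glottocode` is a hypothetical name; it relies only on the `id` and `name` attributes of the yielded languoids.

```python
def names_by_glottocode(glottolog, glottocodes):
    """Return {Glottocode: name} for just the requested languoids."""
    # The documented parameter expects a set, so normalise the input first.
    wanted = set(glottocodes)
    return {l.id: l.name for l in glottolog.languoids(ids=wanted)}
```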

Glottolog.languoids_by_code(nodes=None)[source]

Returns a dict mapping the three major language code schemes (Glottocode, ISO code, and Harald’s NOCODE_s) to Languoid objects.

Return type:

typing.Dict[str, pyglottolog.languoids.languoid.Languoid]
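When several codes from mixed schemes have to be resolved, it is usually cheaper to build this mapping once and reuse it. A minimal sketch, with `lookup_codes` as a hypothetical helper name:

```python
def lookup_codes(glottolog, codes):
    """Resolve codes of any supported scheme; unknown codes map to None."""
    # Build the code -> Languoid mapping a single time, then reuse it.
    by_code = glottolog.languoids_by_code()
    return {code: by_code.get(code) for code in codes}
```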

The classification can be accessed via a pyglottolog.languoids.Languoid’s attributes. In addition, it can be visualized via

Glottolog.ascii_tree(start, maxlevel=None)[source]

Prints an ASCII representation of the languoid tree starting at start to stdout.

Parameters:

start (typing.Union[str, pyglottolog.languoids.languoid.Languoid]) –

and serialized as Newick string via

Glottolog.newick_tree(start=None, template=None, nodes=None, maxlevel=None)[source]

Returns the Newick representation of a (set of) Glottolog classification tree(s).

Parameters:
Return type:

str
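Because ascii_tree prints rather than returns its result, capturing the output as a string needs a small detour. The sketch below assumes the method writes through `sys.stdout` at call time; `ascii_tree_text` is a hypothetical helper name.

```python
import contextlib
import io


def ascii_tree_text(glottolog, start, maxlevel=None):
    """Return the text that ascii_tree would otherwise print to stdout."""
    buffer = io.StringIO()
    # Temporarily route anything written to sys.stdout into the buffer.
    with contextlib.redirect_stdout(buffer):
        glottolog.ascii_tree(start, maxlevel=maxlevel)
    return buffer.getvalue()
```

newick_tree needs no such wrapper, since it already returns a str.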

Accessing reference data

property Glottolog.bibfiles: BibFiles

Access reference data by BibFile.

Return type:

references.BibFiles

Performance considerations

Reading the data for Glottolog’s more than 25,000 languoids from the same number of files in individual directories isn’t particularly quick. On an average computer, running

>>> list(glottolog.languoids())

takes around 15 seconds.
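To check what a full read costs on a particular machine, the call can be timed directly. This is a small sketch; `time_full_read` is a hypothetical helper and the result depends on hardware and disk cache.

```python
import time


def time_full_read(glottolog):
    """Return (number of languoids, elapsed seconds) for one full read."""
    started = time.perf_counter()
    # list() forces the generator to run to completion.
    languoids = list(glottolog.languoids())
    return len(languoids), time.perf_counter() - started
```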

Due to this, care should be taken not to read languoid data from disk repeatedly. In particular, “N+1”-type problems should be avoided, where one would read all languoids into memory and then look up attributes on each languoid, thereby triggering new reads from disk. This may easily happen, since attributes such as Languoid.family are implemented as properties, which traverse the directory tree and read information from disk at access time.

To make it possible to avoid such problems, many of these properties can be substituted with a call to a similar method of Languoid, which accepts a “node map” (i.e. a dict mapping Languoid.id to Languoid objects) as parameter, e.g. Languoid.ancestors_from_nodemap or Languoid.descendants_from_nodemap. Typical usage would look as follows:

>>> languoids = {l.id: l for l in glottolog.languoids()}
>>> for l in languoids.values():
...    if not l.ancestors_from_nodemap(languoids):
...        print('top-level {0}: {1}'.format(l.level, l.name))
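The same node map can serve other aggregate queries without further disk access. As a sketch, the helper below counts how many languoids sit below each node; `descendant_counts` is a hypothetical name, and the code assumes that ancestors_from_nodemap returns a collection of Languoid objects.

```python
from collections import Counter


def descendant_counts(glottolog):
    """Return a Counter mapping each languoid id to its number of descendants."""
    # Single pass over the data on disk; everything below works from memory.
    nodes = {l.id: l for l in glottolog.languoids()}
    counts = Counter()
    for l in nodes.values():
        # Each languoid contributes one descendant to every one of its ancestors.
        for ancestor in l.ancestors_from_nodemap(nodes):
            counts[ancestor.id] += 1
    return counts
```

Because the result does not depend on the order in which ancestors are returned, the sketch makes no assumption about whether the top-level node comes first or last.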

Alternatively, if you only want to read Glottolog data, you may enable caching by passing cache=True when instantiating pyglottolog.Glottolog.