pyglottolog.Glottolog

Most of Glottolog’s data can be accessed through an instance of pyglottolog.Glottolog.

class pyglottolog.Glottolog(repos='.', *, cache=False)[source]

API to access Glottolog data

This class provides read and write access to a local copy of the Glottolog data, which can be obtained as explained in the README.

Parameters
repos: Path

Absolute path to the copy of the data repository.

tree: Path

Absolute path to the tree directory in the repos.

references_path(*comps)[source]

Path within the references directory of the repos.

Parameters

comps (str) –

languoids_path(*comps)[source]

Path within the languoids directory of the repos.

iso[source]
Returns

clldutils.iso_639_3.ISO instance, fed with the data of the latest ISO code table zip found in the build directory.

property ftsindex: Path

Directory within build where the FullTextSearch index is created.

Return type

pathlib.Path
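The path-related attributes and methods above can be combined as in the following minimal sketch. The helper name repo_layout is hypothetical and not part of pyglottolog; it only calls the members documented above and assumes that the path methods may be called without any components.

```python
def repo_layout(glottolog):
    """Collect the main directories of a Glottolog repository clone.

    `glottolog` is expected to be a pyglottolog.Glottolog instance.
    This helper is illustrative only and not part of pyglottolog.
    """
    return {
        'repos': glottolog.repos,                    # root of the data repository
        'tree': glottolog.tree,                      # the tree directory in the repos
        'languoids': glottolog.languoids_path(),     # languoids directory (no components)
        'references': glottolog.references_path(),   # references directory (no components)
        'ftsindex': glottolog.ftsindex,              # full-text search index in build
    }


# Usage, assuming pyglottolog is installed and ./glottolog is a local clone:
#   from pyglottolog import Glottolog
#   for name, path in repo_layout(Glottolog('./glottolog')).items():
#       print(name, path)
```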

Accessing configuration data

Configuration data in https://github.com/glottolog/glottolog/tree/master/config can be accessed conveniently via the following properties of pyglottolog.Glottolog:

See configuration data for details about the returned objects.

Accessing languoid data

property Glottolog.glottocodes: Glottocodes

Registry of Glottocodes.

Return type

pyglottolog.languoids.models.Glottocodes

Glottolog.languoid(id_)[source]

Retrieve a languoid specified by language code.

Parameters
Return type

pyglottolog.languoids.languoid.Languoid

Glottolog.languoids(ids=None, maxlevel=None, exclude_pseudo_families=False)[source]

Yields languoid objects.

Parameters
  • ids (typing.Optional[set]) – set of Glottocodes to limit the result to. This is useful to increase performance, since INI file reading can be skipped for languoids not listed.

  • maxlevel (typing.Union[int, pyglottolog.config.LanguoidLevel, str, None]) – Numeric maximal nesting depth of languoids, or Languoid.level.

  • exclude_pseudo_families (bool) – Flag signaling whether to exclude pseudo families, i.e. languoids from non-genealogical trees.

Return type

typing.Generator[pyglottolog.languoids.languoid.Languoid, None, None]
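As a sketch of the ids parameter described above, the following hypothetical helper (names_by_glottocode is not part of pyglottolog) restricts the generator to a handful of Glottocodes, so that the INI files of all other languoids need not be read. It assumes that Languoid objects expose id and name attributes, as used elsewhere in this documentation.

```python
def names_by_glottocode(glottolog, glottocodes):
    """Map each requested Glottocode to the name of its languoid.

    `glottolog` is expected to be a pyglottolog.Glottolog instance.
    """
    wanted = set(glottocodes)  # the ids parameter is documented as a set
    # Passing ids lets the API skip INI file reading for unlisted languoids.
    return {lang.id: lang.name for lang in glottolog.languoids(ids=wanted)}


# Usage, assuming `glottolog` is a Glottolog instance and the Glottocodes exist:
#   print(names_by_glottocode(glottolog, ['stan1295', 'stan1293']))
```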

Glottolog.languoids_by_code(nodes=None)[source]

Returns a dict mapping the three major language code schemes (Glottocode, ISO code, and Harald’s NOCODE_s) to Languoid objects.

Return type

typing.Dict[str, pyglottolog.languoids.languoid.Languoid]
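Because the returned dict is keyed by codes from all three schemes, a single call can serve lookups by Glottocode and by ISO code alike. A minimal sketch, with a hypothetical helper name (lookup_codes), might look as follows; unknown codes are mapped to None rather than raising.

```python
def lookup_codes(glottolog, codes):
    """Resolve a mixed list of Glottocodes and ISO codes to languoids.

    `glottolog` is expected to be a pyglottolog.Glottolog instance.
    Codes that are not found are mapped to None.
    """
    by_code = glottolog.languoids_by_code()  # build the mapping once
    return {code: by_code.get(code) for code in codes}


# Usage, assuming `glottolog` is a Glottolog instance:
#   for code, lang in lookup_codes(glottolog, ['deu', 'stan1295']).items():
#       print(code, lang.name if lang else 'not found')
```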

The classification can be accessed via a pyglottolog.languoids.Languoid’s attributes. In addition, it can be visualized via

Glottolog.ascii_tree(start, maxlevel=None)[source]

Prints an ASCII representation of the languoid tree starting at start to stdout.

Parameters

start (typing.Union[str, pyglottolog.languoids.languoid.Languoid]) –

and serialized as a Newick string via

Glottolog.newick_tree(start=None, template=None, nodes=None, maxlevel=None)[source]

Returns the Newick representation of a (set of) Glottolog classification tree(s).

Parameters
  • start (typing.Union[None, str, pyglottolog.languoids.languoid.Languoid]) – Root languoid of the tree (or None to return the complete classification).

  • template (typing.Optional[str]) – Python format string accepting the Languoid instance as single variable named l, used to format node labels.

  • maxlevel (typing.Union[int, pyglottolog.config.LanguoidLevel, None]) –

Return type

str
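The template parameter can be illustrated with a short sketch. Both helper names below (preview_label, labelled_newick) are hypothetical; preview_label merely mimics the documented convention that the template is a format string receiving the languoid as variable l, which makes it easy to try out a template on a single languoid before serializing a whole tree.

```python
DEFAULT_TEMPLATE = '{l.name} [{l.id}]'  # illustrative template, not a pyglottolog default


def preview_label(template, languoid):
    """Format one node label the way the template convention describes."""
    return template.format(l=languoid)


def labelled_newick(glottolog, start, template=DEFAULT_TEMPLATE, maxlevel=None):
    """Return the Newick string for the tree rooted at `start`.

    `glottolog` is expected to be a pyglottolog.Glottolog instance.
    """
    return glottolog.newick_tree(start=start, template=template, maxlevel=maxlevel)


# Usage, assuming `glottolog` is a Glottolog instance and the Glottocode exists:
#   print(labelled_newick(glottolog, 'germ1287', maxlevel=2))
```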

Accessing reference data

Performance considerations

Reading the data for Glottolog’s more than 25,000 languoids from the same number of files in individual directories isn’t particularly quick. On an average computer, running

>>> list(glottolog.languoids())

would take around 15 seconds.

Because of this, care should be taken not to read languoid data from disk repeatedly. In particular, “N+1”-type problems should be avoided, where one reads all languoids into memory and then looks up attributes on each languoid, thereby triggering new reads from disk. This can easily happen, since attributes such as Languoid.family are implemented as properties, which traverse the directory tree and read information from disk at access time.

To make it possible to avoid such problems, many of these properties can be substituted with a call to a similar method of Languoid, which accepts a “node map” (i.e. a dict mapping Languoid.id to Languoid objects) as parameter, e.g. Languoid.ancestors_from_nodemap or Languoid.descendants_from_nodemap. Typical usage would look as follows:

>>> languoids = {l.id: l for l in glottolog.languoids()}
>>> for l in languoids.values():
...    if not l.ancestors_from_nodemap(languoids):
...        print('top-level {0}: {1}'.format(l.level, l.name))
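The same node map can be reused for other questions without further disk access. The following sketch groups all languoids under their top-level ancestor; the helper name group_by_top_level is hypothetical, and the code assumes that ancestors_from_nodemap returns the ancestors ordered from the root downwards, which should be checked against the installed pyglottolog version.

```python
from collections import defaultdict


def group_by_top_level(glottolog):
    """Group Glottocodes by the Glottocode of their top-level ancestor.

    `glottolog` is expected to be a pyglottolog.Glottolog instance.
    """
    # Read every languoid exactly once and keep it in a node map.
    nodes = {lang.id: lang for lang in glottolog.languoids()}
    groups = defaultdict(list)
    for lang in nodes.values():
        ancestors = lang.ancestors_from_nodemap(nodes)  # no additional disk reads
        # Assumption: the list is ordered root-first. A languoid without
        # ancestors is itself a top-level node.
        top = ancestors[0] if ancestors else lang
        groups[top.id].append(lang.id)
    return dict(groups)
```

Building the node map once and passing it to every lookup keeps the cost at a single pass over the languoid files, instead of one traversal per languoid.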

Alternatively, if you only want to read Glottolog data, you may enable caching when instantiating pyglottolog.Glottolog.
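A minimal sketch of the caching option, based on the cache keyword in the class signature above, is given below. The helper name open_readonly is hypothetical; the import is done lazily inside the function so that the module can be loaded even where pyglottolog is not installed.

```python
from pathlib import Path


def open_readonly(repos):
    """Open a local Glottolog clone with caching enabled.

    Intended for read-only use, as suggested above for caching.
    """
    # Imported lazily; requires pyglottolog to be installed.
    from pyglottolog import Glottolog

    return Glottolog(Path(repos), cache=True)


# Usage, assuming ./glottolog is a local clone of the data repository:
#   glottolog = open_readonly('./glottolog')
#   languoids = list(glottolog.languoids())
```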