Languoid data

All metadata related to a languoid (i.e. the content of the languoid’s INI file and the classification - its relation to other languoids) is available from a pyglottolog.languoids.Languoid instance.

class pyglottolog.languoids.Languoid(cfg, lineage=None, id_=None, directory=None, tree=None, _api=None)[source]

Info on languoids is encoded in the INI files and in the directory hierarchy of pyglottolog.Glottolog.tree. This class provides access to all of it.

Languoid formatting:

Variables:

_format_specs – A dict mapping custom format specifiers to conversion functions. Usage:

>>> l = Languoid.from_name_id_level(pathlib.Path('.'), 'N(a,m)e', 'abcd1234', 'language')
>>> '{0:newick_name}'.format(l)
'N{a/m}e'

Refer to the factory methods for typical use cases of instantiating a Languoid:

Parameters:
  • cfg (clldutils.inifile.INI) – INI instance storing the languoid’s metadata.

  • lineage (typing.Optional[list[tuple[str, str, str]]]) – List of ancestors (from root to this languoid).

  • id_ (typing.Optional[str]) – Glottocode for the languoid (or None, if directory is passed).

  • directory (typing.Optional[pathlib.Path]) –

  • tree (typing.Optional[pathlib.Path]) –

  • _api – Some properties require access to config data, which is accessed through a Glottolog API instance.

classmethod from_dir(directory, nodes=None, _api=None, **kw)[source]

Create a Languoid from a directory, named with the Glottocode and containing md.ini.

This method is used by pyglottolog.Glottolog to read Languoids from the repository’s languoids/tree directory.
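The directory convention described above can be illustrated with a minimal sketch that does not use pyglottolog at all. The helper read_languoid_dir, the [core] section and its name and level keys are assumptions made for illustration; only the file name md.ini and the Glottocode-named directory come from the description above.

```python
import configparser
import pathlib
import tempfile


def read_languoid_dir(directory: pathlib.Path) -> dict:
    """Hypothetical helper: read md.ini from a Glottocode-named directory."""
    cfg = configparser.ConfigParser(interpolation=None)
    with (directory / 'md.ini').open(encoding='utf-8') as f:
        cfg.read_file(f)
    return {
        'id': directory.name,  # the Glottocode is taken from the directory name
        'name': cfg.get('core', 'name'),    # assumed section and key
        'level': cfg.get('core', 'level'),  # assumed section and key
    }


# Build a throwaway directory that mimics one node of the tree.
with tempfile.TemporaryDirectory() as tmp:
    d = pathlib.Path(tmp) / 'abcd1234'
    d.mkdir()
    (d / 'md.ini').write_text(
        '[core]\nname = Example\nlevel = language\n', encoding='utf-8')
    info = read_languoid_dir(d)
```

Because the identifier comes from the directory name, no separate id needs to be supplied when a directory is given, which matches the note on id_ in the constructor parameters.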

classmethod from_name_id_level(tree, name, id_, level, **kw)[source]

This method is used in pyglottolog.lff to instantiate Languoids for new nodes encountered in “lff”-format trees.

newick_node(nodes=None, template=None, maxlevel=None, level=0)[source]

Return a newick.Node representing the subtree of the Glottolog classification starting at the languoid.

Parameters:
  • template – Python format string accepting the Languoid instance as single variable named l, used to format node labels.

  • nodes (typing.Optional[dict[str, pyglottolog.languoids.languoid.Languoid]]) –

Return type:

newick.Node
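How such a template is applied can be sketched with plain str.format. FakeLanguoid and format_label are hypothetical stand-ins, not part of pyglottolog; the only detail taken from the description above is that the languoid is passed as a single variable named l.

```python
from dataclasses import dataclass


@dataclass
class FakeLanguoid:
    """Hypothetical stand-in exposing only the attributes used below."""
    name: str
    id: str
    level: str


def format_label(template: str, languoid: FakeLanguoid) -> str:
    # The languoid is supplied as the single keyword argument named "l".
    return template.format(l=languoid)


lang = FakeLanguoid(name='Example', id='abcd1234', level='language')
label = format_label('{l.id}_{l.level}', lang)
```

Any attribute reachable from l can be referenced in the template, so label content can be changed without touching the tree-building code.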

write_info(outdir=None)[source]

Write Languoid metadata as an INI file to outdir/<INFO_FILENAME>.

Parameters:

outdir (typing.Optional[pathlib.Path]) –

Return type:

pathlib.Path

property glottocode: Glottocode

Alias for id

property category: str | None

Languoid category.

  • Category name from pyglottolog.config.LanguageType for languoids of level “language”,

  • “Family” or “Pseudo Family” for families,

  • “Dialect” for dialects.

property isolate: bool

Flag signaling whether the languoid is an isolate, i.e. has level “language” and is not a member of a family.

children_from_nodemap(nodes)[source]

A faster alternative to children when the relevant languoids have already been read from disk.

Parameters:

nodes (dict[str, pyglottolog.languoids.languoid.Languoid]) –

Return type:

list[pyglottolog.languoids.languoid.Languoid]

descendants_from_nodemap(nodes, level=None)[source]

A faster alternative to descendants when the relevant languoids have already been read from disk.

Parameters:

nodes (dict[str, pyglottolog.languoids.languoid.Languoid]) –

Return type:

list[pyglottolog.languoids.languoid.Languoid]

property children: list[pyglottolog.languoids.languoid.Languoid]

List of direct descendants of the languoid in the classification tree.

Note

Using this on many languoids can be slow, because the directory tree may be traversed and INI files read multiple times. To circumvent this problem, you may use a read-only pyglottolog.Glottolog instance, by passing cache=True at initialization.

ancestors_from_nodemap(nodes)[source]

A faster alternative to ancestors when the relevant languoids have already been read from disk.

Parameters:

nodes (dict[str, pyglottolog.languoids.languoid.Languoid]) –

Return type:

list[pyglottolog.languoids.languoid.Languoid]
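The idea behind the *_from_nodemap methods can be sketched with stand-in objects: once every node is held in a dict keyed by Glottocode, relatives are resolved by lookup instead of by re-reading the directory tree. The Node class and the two functions below are illustrative assumptions, not the library's implementation, and the lineage is simplified to a list of ancestor ids rather than the three-string tuples named in the constructor signature.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a languoid that was already read."""
    id: str
    lineage: list = field(default_factory=list)  # ancestor ids, root first


def ancestors_from_nodemap(node, nodes):
    # Each ancestor is a dict lookup; no file system access is needed.
    return [nodes[gc] for gc in node.lineage]


def children_from_nodemap(node, nodes):
    # A direct child is a node whose lineage ends with this node's id.
    return [n for n in nodes.values()
            if n.lineage and n.lineage[-1] == node.id]


family = Node('fami1234')
branch = Node('bran1234', ['fami1234'])
lang = Node('lang1234', ['fami1234', 'bran1234'])
nodes = {n.id: n for n in (family, branch, lang)}

ancestor_ids = [n.id for n in ancestors_from_nodemap(lang, nodes)]
child_ids = [n.id for n in children_from_nodemap(family, nodes)]
```

The node map only pays off when it is built once and reused across many languoids.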

iter_ancestors()[source]

Yield ancestors going up the directory tree.

Return type:

collections.abc.Generator[pyglottolog.languoids.languoid.Languoid, None, None]

property ancestors: list[pyglottolog.languoids.languoid.Languoid]

List of ancestors of the languoid in the classification tree, from root (i.e. top-level family) to parent node.

Note

Using this on many languoids can be slow, because the directory tree may be traversed and INI files read multiple times. To circumvent this problem, you may use a read-only pyglottolog.Glottolog instance, by passing cache=True at initialization.

property parent: Languoid | None

Parent languoid or None.

Note

Using this on many languoids can be slow, because the directory tree may be traversed and INI files read multiple times. To circumvent this problem, you may use a read-only pyglottolog.Glottolog instance, by passing cache=True at initialization.

property family: Languoid | None

Top-level family the languoid belongs to or None.

Note

Using this on many languoids can be slow, because the directory tree may be traversed and INI files read multiple times. To circumvent this problem, you may use a read-only pyglottolog.Glottolog instance, by passing cache=True at initialization.

property names: dict[str, list]

A dict mapping alternative name providers to lists of alternative names for the languoid by the given provider.

add_name(name, type_='glottolog')[source]

Add an alternative name.

Parameters:
  • name (str) –

  • type_ (str) –

update_names(names, type_='glottolog')[source]

Update alternative names of a specific type.

Parameters:
  • names (collections.abc.Iterable[str]) –

  • type_ (str) –

Return type:

bool

property identifier: dict | SectionProxy

Alternative identifiers of the languoid.

property sources: list[pyglottolog.languoids.models.Reference]

List of Glottolog references linked to the languoid

Return type:

pyglottolog.references.Reference

property endangerment: Endangerment | None

Endangerment information about the languoid.

Return type:

Endangerment

property classification_comment: ClassificationComment | None

Classification information about the languoid.

Return type:

ClassificationComment

property ethnologue_comment: EthnologueComment | None

Commentary about the classification of the languoid in Ethnologue.

Return type:

EthnologueComment

property macroareas: list[pyglottolog.config.Macroarea]

Return type:

list of config.Macroarea

property timespan: tuple[int, int] | None

Extinct languages are associated with a timespan

Links to web resources related to the languoid

Update the links section of the languoid for a particular domain.

Parameters:
  • domain (str) –

  • urls (collections.abc.Iterable[str]) –

Return type:

bool

property countries: list[pyglottolog.languoids.models.Country]

Countries a language is spoken in.

property name: str

The Glottolog name of the languoid

property latitude: float | None

The geographic latitude of the point chosen as representative coordinate of the languoid

property longitude: float | None

The geographic longitude of the point chosen as representative coordinate of the languoid

property hid: str | None

The languoid’s “H(arald)ID”, aka a “NOCODE” code.

closest_iso(api=None, nodes=None)[source]

ISO 639-3 code assigned to the languoid or one of its ancestors in the classification (in case of dialects) or None.

Return type:

typing.Optional[str]
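The lookup described for closest_iso amounts to walking up the classification until a node with an ISO code is found. The sketch below uses a hypothetical IsoNode class with an explicit parent link; it illustrates the behavior as described and is not the library's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IsoNode:
    """Hypothetical stand-in carrying an optional ISO 639-3 code."""
    id: str
    iso: Optional[str] = None
    parent: Optional['IsoNode'] = None


def closest_iso(node: IsoNode) -> Optional[str]:
    # Check the node itself first, then each ancestor in turn.
    current = node
    while current is not None:
        if current.iso:
            return current.iso
        current = current.parent
    return None


language = IsoNode('lang1234', iso='abc')
dialect = IsoNode('dial1234', parent=language)
code_for_dialect = closest_iso(dialect)
code_for_family = closest_iso(IsoNode('fami1234'))
```

A dialect without its own code thus inherits the code of the nearest ancestor that has one, while a node with no coded ancestor yields None.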

property iso_retirement: ISORetirement | None

Information about a retired ISO code related to the languoid.

property fname: Path

The location of the languoid’s info file in the Glottolog tree directory.

class pyglottolog.languoids.Glottocodes(fname)[source]

Registry keeping track of glottocodes that have been dealt out.

new(name, dry_run=False)[source]

Mint a new Glottocode

Return type:

pyglottolog.languoids.models.Glottocode

Some of the data available for languoids has enough internal structure to merit separate classes, simplifying access.

class pyglottolog.languoids.Reference(key, pages=None, trigger=None, endtag='**', pattern=re.compile('\\*\\*(?P<key>[a-z0-9\\-_]+:[a-zA-Z.?\\-;*\'/()\\[\\]!_:0-9\\u2014]+?)(?P<endtag>\\*\\*|\\(\\*\\*\\))(:(?P<pages>[0-9\\-f]+))?(<trigger "(?P<trigger>[^\\"]+)">)?'), old_pattern=re.compile('[^\\[]+\\[(?P<pages>[^]]*)]\\s*\\([0-9]+\\s+(?P<key>[^)]+)\\)'))[source]

A reference of a bibliographical record in Glottolog.

Parameters:
  • key (str) –

  • pages (typing.Optional[str]) –

  • trigger (typing.Optional[str]) –

  • endtag (str) –

  • pattern (re.Pattern) –

  • old_pattern (re.Pattern) –

get_source(api)[source]

Retrieve the referenced bibliographical record.

Return type:

pyglottolog.references.bibfiles.Entry

property provider: str

The provider id.

property bibname: str

The name of the bibtex file.

property bibkey: str

The local BibTeX key within the bib file.

classmethod from_match(match)[source]

Instantiate a reference from a regex match.

Parameters:

match (re.Match) –

Return type:

pyglottolog.languoids.models.Reference

classmethod from_string(string, pattern=None)[source]

Parse a reference from a string.

Parameters:
  • string (str) –

  • pattern (typing.Optional[re.Pattern]) –

Return type:

pyglottolog.languoids.models.Reference
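What parsing a reference string involves can be sketched with the re module alone. The pattern below is transcribed from the default pattern argument in the class signature, assuming that signature shows ordinary repr-style escapes; the helper parse_reference and the sample string are invented for illustration and are not part of pyglottolog.

```python
import re

# Transcription of the default "pattern" from the Reference signature.
PATTERN = re.compile(
    r'\*\*(?P<key>[a-z0-9\-_]+:[a-zA-Z.?\-;*\'/()\[\]!_:0-9\u2014]+?)'
    r'(?P<endtag>\*\*|\(\*\*\))'
    r'(:(?P<pages>[0-9\-f]+))?'
    r'(<trigger "(?P<trigger>[^\"]+)">)?'
)


def parse_reference(string: str) -> dict:
    """Hypothetical helper returning the named groups of a match."""
    match = PATTERN.match(string)
    if match is None:
        raise ValueError('not a reference: {!r}'.format(string))
    return match.groupdict()


# Invented sample input: key, end tag, page range and trigger.
full = parse_reference('**hh:g:Smith:Example**:10-20<trigger "grammar">')
# Pages and trigger are optional and come back as None when absent.
bare = parse_reference('**hh:g:Smith:Example**')
```

The named groups correspond to the constructor arguments key, endtag, pages and trigger, which is presumably what from_match consumes.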

classmethod from_list(list_, pattern=None)[source]

Turn list of strings into list of Reference instances.

Return type:

list[pyglottolog.languoids.models.Reference]

class pyglottolog.languoids.Endangerment(status, source, comment, date)[source]

Info about the endangerment status of the languoid

date: datetime.datetime

Date when the endangerment status was assessed

check(lang, keys, log)[source]

Check formatting of endangerment info.

class pyglottolog.languoids.EthnologueComment(isohid, comment_type, ethnologue_versions=<factory>, comment=None)[source]

Commentary about the classification of the languoid according to Ethnologue

Parameters:
  • isohid (str) –

  • comment_type (typing.Literal['spurious', 'missing']) –

  • ethnologue_versions (list[str]) –

  • comment (str) –

comment_type: typing.Literal['spurious', 'missing']

Either

  • “spurious”, meaning the comment explains why the languoid in question is spurious and in which Ethnologue versions (see below) that is or was the case

  • “missing”, meaning the comment explains why the languoid in question is missing (as a language entry) and in which Ethnologue versions (see below) that is or was the case

ethnologue_versions: list[str]

The Ethnologue version(s), from E16 to E19, that the comment pertains to, joined by slashes, e.g. E16/E17. With comment_type=spurious, E16/E17 in the version field means that the code was spurious in E16/E17 but is no longer spurious in E18/E19. With comment_type=missing, E16/E17 means that the code was missing from E16/E17 but is present in E18/E19. If the comment concerns a language where versions would be the empty string, the string ISO 639-3 appears instead.
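The reading of the version field described above can be sketched as a small helper that separates the listed versions from the remaining ones. ALL_VERSIONS and split_versions are hypothetical names; treating the special value ISO 639-3 as "no versions" is an assumption of this sketch.

```python
ALL_VERSIONS = ['E16', 'E17', 'E18', 'E19']


def split_versions(value: str):
    """Hypothetical helper: return (listed, remaining) Ethnologue versions."""
    if value == 'ISO 639-3':
        # Assumed handling: the special value names no Ethnologue version.
        return [], []
    listed = value.split('/')
    remaining = [v for v in ALL_VERSIONS if v not in listed]
    return listed, remaining


listed, remaining = split_versions('E16/E17')
```

For a comment of type spurious, listed holds the versions in which the code was spurious and remaining those in which it no longer is; for type missing, the same split applies to absence and presence.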

check(lang, keys, log)[source]

Check formatting of the comment

class pyglottolog.languoids.ISORetirement(code=None, name=None, change_request=None, effective=None, reason=None, change_to=<factory>, remedy=None, comment=None)[source]

Information extracted from accepted ISO 639-3 change requests about retired ISO codes associated with the languoid.

Parameters:
  • code (typing.Optional[str]) –

  • name (typing.Optional[str]) –

  • change_request (typing.Optional[str]) –

  • effective (typing.Optional[str]) –

  • reason (typing.Optional[str]) –

  • change_to (list[str]) –

  • remedy (typing.Optional[str]) –

  • comment (typing.Optional[str]) –

code: typing.Optional[str] = None

Retired ISO 639-3 code

name: typing.Optional[str] = None

Name of the retired ISO language

change_request: typing.Optional[str] = None

Number of the ISO change request

effective: typing.Optional[str] = None

Date of acceptance of the change request

reason: typing.Optional[str] = None

Reason to retire the ISO code

change_to: list[str]

List of ISO codes replacing the retired code

remedy: typing.Optional[str] = None

What to do about the retired code

class pyglottolog.languoids.ClassificationComment(sub=None, subrefs=<factory>, family=None, familyrefs=<factory>)[source]

Commentary on the classification of the languoid

sub: typing.Optional[str] = None

Commentary on the internal classification of the descendants of the languoid

subrefs: list[pyglottolog.languoids.models.Reference]

References for the internal classification

family: typing.Optional[str] = None

Commentary on the classification of the languoid within its family

familyrefs: list[pyglottolog.languoids.models.Reference]

References for the family classification

merged_refs(type_)[source]

Get unique sources referenced for the classification type, with accumulated page ranges.

Parameters:

type_ (typing.Literal['sub', 'family']) –

Return type:

list[pyglottolog.languoids.models.Reference]

check(lang, keys, log)[source]

Check formatting and content.
