Archi's Verb Has 1.5 Million Forms. The Language Has 1,000 Speakers. That Math Is the Crisis.


There is a village in southern Dagestan, perched at roughly 2,000 meters in the Caucasus mountains, where approximately 1,000 people speak a language whose verbs carry more complexity than most linguists thought a human language could hold. The village is Archib. The language is Archi. And its verb system, documented painstakingly by the Surrey Morphology Group over two decades, can in principle generate up to 1,502,839 distinct inflected forms from a single verb stem.

That number is not a typo. It is a structural fact about how Archi organizes reality. And the system behind it is at risk of disappearing within a generation.


The Verb Is Not Complicated — It Is Complete

The instinct, when confronted with a figure like 1.5 million verb forms, is to assume the system is baroque — a linguistic accident, an evolutionary overgrowth. That instinct is wrong.

Archi's verbal morphology is not complex in the way a bureaucratic form is complex. It is complete in the way a coordinate system is complete: every dimension of an event gets encoded, and the grammar refuses to let you be vague about any of them. Each verb stem yields up to 12,405 basic tense-aspect-mood forms, a count that includes gerunds, participles, and masdars (verbal nouns), while simultaneously agreeing with the absolutive argument in gender and number across a four-gender system. Stack on further derivations such as commentatives and case-inflected masdars, and the totals multiply rather than add, which is how the count climbs past 1.5 million.
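
To see why stacking categories inflates the count so quickly, consider a toy paradigm. The sketch below is an illustration only: the dimension names and sizes in `toy_dimensions` are hypothetical placeholders, not Archi's actual inventory, and it assumes every value combines freely with every other, which real paradigms rarely allow. The only figures taken from the text are the 1,502,839 maximum and the 1,000 speakers.

```python
from itertools import product
from math import prod

# Figures quoted in the article.
MAX_FORMS_PER_VERB = 1_502_839
SPEAKERS = 1_000

# Hypothetical feature inventory. The labels and sizes are placeholders chosen
# to illustrate multiplication; they are NOT Archi's real category list.
toy_dimensions = {
    "tense_aspect_mood": ["tam_a", "tam_b", "tam_c"],
    "gender": ["I", "II", "III", "IV"],
    "number": ["sg", "pl"],
    "polarity": ["affirmative", "negative"],
}


def paradigm_size(dimensions):
    """Count the cells when every value combines freely with every other."""
    return prod(len(values) for values in dimensions.values())


def enumerate_cells(dimensions):
    """Yield each paradigm cell as a dict of dimension name -> chosen value."""
    names = list(dimensions)
    for combo in product(*dimensions.values()):
        yield dict(zip(names, combo))


toy_size = paradigm_size(toy_dimensions)        # 3 * 4 * 2 * 2 = 48
cells = list(enumerate_cells(toy_dimensions))   # one dict per cell
forms_per_speaker = MAX_FORMS_PER_VERB / SPEAKERS

print(f"toy paradigm cells: {toy_size}")
print(f"first cell: {cells[0]}")
print(f"possible forms of one verb per living speaker: {forms_per_speaker:,.0f}")
```

Four small dimensions already give 48 cells, and each added dimension multiplies the total rather than adding to it. Dividing the quoted maximum by the speaker count gives roughly 1,500 possible forms of a single verb for every living speaker.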

What this means in practice: an Archi speaker does not simply say that something happened. They encode when it happened relative to other events, what kind of happening it was, who was involved and in what grammatical role, and what gender class that participant belongs to — all within the verb itself. The verb is not a predicate that points at an event. It is a compressed ethnographic record of the event's structure.

Archi has around 170 simple verbs and over 1,000 complex predicates, formed by combining lexical elements with light verbs meaning roughly "do" or "become." The vocabulary of action is relatively small. The grammar of action is almost incomprehensibly rich. That inversion, few roots and enormous precision, tells you something about what Archi speakers found worth encoding.


A Four-Gender System That Reaches Into Everything

The verb complexity does not exist in isolation. It is downstream of a broader grammatical architecture that treats gender as a pervasive organizing principle. Archi's four-gender system influences agreement across verbs, adjectives, and adverbs — not just nouns. When a verb agrees with its absolutive argument, it is tracking that argument's gender class, which in turn reflects a set of categorical distinctions the community has encoded into the structure of the language itself.

The phonological system is equally uncompromising. Archi has 100 or more phonemes: 74 to 82 consonants depending on the analysis (uvulars, pharyngeals, ejectives, and laterals among them) and a six-vowel system that yields 26 distinct vowel sounds through length and pharyngealization contrasts. This is a language that demands precision at every level, from the shape of a consonant to the structure of a verb paradigm. It is not a language that rounds off.

What gets lost when a language like this disappears is not just vocabulary. It is a set of perceptual and categorical commitments — the specific distinctions a community decided, over centuries, were worth making mandatory.


The Demographic Tension the Grammar Cannot Solve

As of 2022, Archi has approximately 1,000 speakers, and UNESCO classifies the language as definitely endangered. The speaker population skews toward two poles: monolingual children under ten, and bilingual adults proficient in Russian. The vulnerable layer sits between them: working-age adults who might transmit the language in daily economic and social life.

This is the structural trap of many minority languages in the post-Soviet Caucasus. Russian functions as the language of education, employment, and upward mobility. Archi functions as the language of the village, the household, the community's internal life. When young people leave Archib for Makhachkala or Moscow, they do not necessarily stop being Archi. But they stop needing 1.5 million verb forms.

The Surrey Morphology Group's documentation work — a trilingual dictionary, a grammar, a digital corpus, developed in collaboration with native speakers and Russian linguists since the early 2000s — represents the most significant preservation effort to date. That work is real and valuable. But documentation is not transmission. A language preserved in a corpus is an archive, not a living system.

The question worth sitting with is this: what does it mean that one of the most morphologically complex verb systems ever documented belongs to a community of 1,000 people in a highland village, and that the tools we have built to help are still, by most accounts, not meeting the actual needs of documentary linguists in the field? The grammar encodes everything. The infrastructure to save it is still catching up.