There is a village in the Daghestanian highlands where roughly 1,200 people speak a language whose verbs alone could fill a dictionary the size of a city phone book. The village is Archib. The language is Archi. And the verb system — documented painstakingly by linguists who have spent careers inside its morphological architecture — is, by any reasonable measure, one of the most elaborate grammatical structures ever recorded in a human language.
The Architecture of Complexity Is Not Chaos
The first thing to understand about Archi's verbal morphology is that the roughly 1.5 million forms a single verb can in principle take are not the product of irregularity or historical accident. They emerge from a highly systematic, almost crystalline, interaction of grammatical features. Research by Marina Chumakina and Greville Corbett documents how Archi organizes agreement through a feature system that includes gender, number, and, in a genuinely unusual twist, a person feature that operates without producing phonologically distinct forms. The verb agrees with its arguments, but the agreement is encoded in ways that don't map cleanly onto the categories linguists typically import from European languages.
This is the deeper point: Archi doesn't have a bloated verb system because it evolved carelessly. It has one because it encodes relational information — who is doing what to whom, under what conditions, in what aspect — at a granularity that other languages distribute across entire sentences or leave to context. The verb is not just an action. It is a compressed social and spatial situation.
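The arithmetic behind a figure like 1.5 million is easier to see in miniature. The sketch below is a toy illustration, not a model of Archi: the feature names and values in FEATURES are invented for the example. Only the principle carries over, namely that a fully regular paradigm has as many cells as the product of the number of values each feature can take.

```python
from itertools import product
from math import prod

# Toy feature inventory. Hypothetical: these are NOT Archi's actual
# categories or counts, just a small stand-in to show the arithmetic.
FEATURES = {
    "gender": ["I", "II", "III", "IV"],
    "number": ["sg", "pl"],
    "aspect": ["pfv", "ipfv"],
    "mood": ["ind", "imp", "cond"],
    "polarity": ["aff", "neg"],
}


def paradigm_size(features):
    """Number of cells if every feature combines freely with every other."""
    return prod(len(values) for values in features.values())


def paradigm_cells(features):
    """Yield each cell of the paradigm as a {feature: value} dict."""
    names = list(features)
    for combo in product(*features.values()):
        yield dict(zip(names, combo))


size = paradigm_size(FEATURES)
cells = list(paradigm_cells(FEATURES))
print(size)  # 96
```

Five small features already give 96 cells. Each further category multiplies the total again, which is how a system with no irregularity at all can still grow very large.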
The same Chumakina and Corbett research places Archi in the Lezgic branch of the Northeast Caucasian family; the language is spoken across six settlements in the highlands of Daghestan. The geographic isolation is not incidental. A recent macroevolutionary study published in PNAS found that morphological complexity of this kind (what linguists call polysynthesis, or the packing of multiple grammatical categories into single word forms) is more likely to evolve in small, isolated populations. Archi is, in this sense, a living experiment in what language does when it develops for centuries inside a tight community where everyone shares context and the grammar can afford to be dense.
The Documentation Gap Is a Design Problem
Here is where the story turns uncomfortable. Linguists have known for decades that Archi's morphology is extraordinary. What they have struggled to build are tools that can actually help document it at scale — tools that speakers and fieldworkers can use without a computational linguistics PhD.
A position paper from researchers working on computational morphology for language documentation makes the problem explicit: despite over two decades of interest in applying NLP to endangered languages, the field still lacks broadly usable tools that fit real documentation workflows. The paper presents a case study of GlossLM, a state-of-the-art multilingual model for generating Interlinear Glossed Text — the annotation format linguists use to break down and label each morpheme in an utterance. In a small user study with three documentary linguists, GlossLM performed well on standard metrics but failed to meet core usability needs in actual fieldwork contexts. The model couldn't be constrained to a specific language's morphological rules, labels weren't standardized across projects, and personalization was essentially impossible.
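For readers who have never seen Interlinear Glossed Text, a minimal sketch may help. It assumes the common convention that hyphens separate morphemes and that each morpheme lines up with exactly one gloss. The example sentence is English, chosen so that no Archi data is invented, and the names GlossedWord and parse_igt are hypothetical; this is not the data format GlossLM uses.

```python
from dataclasses import dataclass


@dataclass
class GlossedWord:
    morphemes: list[str]  # segmented pieces of the word, e.g. ["dog", "s"]
    glosses: list[str]    # one label per morpheme, e.g. ["dog", "PL"]


def parse_igt(segmented: str, gloss: str) -> list[GlossedWord]:
    """Align a hyphen-segmented line with its gloss line, word by word."""
    seg_words = segmented.split()
    gloss_words = gloss.split()
    if len(seg_words) != len(gloss_words):
        raise ValueError("segmented line and gloss line differ in word count")

    words = []
    for seg, gl in zip(seg_words, gloss_words):
        morphemes = seg.split("-")
        glosses = gl.split("-")
        if len(morphemes) != len(glosses):
            raise ValueError(f"morpheme/gloss mismatch in {seg!r} vs {gl!r}")
        words.append(GlossedWord(morphemes, glosses))
    return words


# Invented English example: "dogs barked"
words = parse_igt("dog-s bark-ed", "dog-PL bark-PST")
for w in words:
    print(list(zip(w.morphemes, w.glosses)))
```

The alignment check is the part that matters in practice: a gloss line that drifts out of step with its morphemes is the kind of error a fieldworker otherwise has to catch by eye.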
For a language like Archi, where the verb paradigm is so large that even experienced linguists require computational support to generate and verify forms, this gap is not a minor inconvenience. It is a structural obstacle to documentation. The researchers argue that the field needs User-Centered Design — sustained, iterative engagement with the linguists and communities who actually use these tools — rather than systems optimized for benchmark performance that then sit unused in the field.
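What a language-specific constraint might look like at its simplest can be sketched as a check run on a model's output after the fact. The example assumes a project keeps an agreed list of grammatical labels and follows the convention that grammatical labels are written in capitals while lexical glosses are lowercase. ALLOWED_LABELS and unknown_labels are hypothetical names for illustration, and nothing here constrains a model during generation or reflects GlossLM's interface.

```python
# Hypothetical label inventory for one documentation project.
ALLOWED_LABELS = {"SG", "PL", "PST", "PRS", "NEG", "ERG", "ABS"}


def unknown_labels(gloss: str, allowed: set[str]) -> list[str]:
    """Return grammatical labels in a gloss line that are not in `allowed`.

    Assumes hyphens separate morphemes, periods join several labels on one
    morpheme, and grammatical labels are upper-case.
    """
    unknown = []
    for word in gloss.split():
        for piece in word.replace(".", "-").split("-"):
            if piece.isupper() and piece not in allowed:
                unknown.append(piece)
    return unknown


print(unknown_labels("dog-PL bark-PST", ALLOWED_LABELS))   # []
print(unknown_labels("dog-PL bark-PAST", ALLOWED_LABELS))  # ['PAST']
```

A check like this only flags nonstandard labels once they appear. Building the inventory into the tool itself, so that such labels cannot be produced in the first place, is the harder design problem.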
What the Verb Knows That We Don't
A typological study on morpheme ordering published in Morphology offers a useful frame for why Archi's system is so resistant to computational modeling. Across the world's languages, morpheme order inside the verb is shaped by competing principles — how relevant a category is to the verbal core, whether grammar or lexicon comes first, where pronominal affixes prefer to anchor. Archi's system sits at an extreme of this typological space, stacking categories in ways that reflect deep cultural priorities about what information must be grammatically obligatory.
That's the part that haunts me. When a language requires its speakers to encode, every time they use a verb, the precise social and spatial configuration of an action — not as optional context but as grammatical necessity — it is training attention. It is building a cognitive habit of noticing. When Archi goes, that habit goes with it. Not just the words. The orientation.
What to Watch
The computational morphology community's next test will be whether User-Centered Design principles can be operationalized in time to matter for languages like Archi. The researchers behind the GlossLM critique are calling for structural changes in how interdisciplinary tools are built — but structural changes move slowly, and Archi's speaker community is not getting larger. Watch for whether the next generation of IGT tools builds in language-specific morphological constraints from the start, or whether the field repeats the cycle of building for benchmarks and discovering, too late, that the tools don't fit the work.
