Ask a leading AI chatbot a question in fluent isiXhosa, Hausa or Luganda and the quality of its answers falls off a cliff. The systems that feel almost magical in English are, for most of Africa’s languages, somewhere between clumsy and useless. A growing community of African researchers and startups is trying to change that, and the effort is one of the more consequential technical projects on the continent. This explainer covers who is building, why it is hard, and what is at stake.
The scale of the gap
Africa is home to more than 2,000 living languages. Mainstream large language models handle only a tiny fraction well. One 2025 survey of the field found that just a handful of African languages, among them Amharic, Swahili and Afrikaans, are reliably supported, while the vast majority remain effectively unsupported. The reason these systems perform so well in English and so poorly elsewhere is simple: they learn from data, and the internet is overwhelmingly English.
Who is building
The response has been a homegrown ecosystem rather than a single hero project. The Masakhane initiative, a pan-African research collective, has spent years building open-source datasets, benchmarks and translation tools across dozens of languages. The South African lab Lelapa AI released InkubaLM, a compact multilingual model covering Swahili, Yoruba, isiXhosa, Hausa and isiZulu, languages that together are spoken by hundreds of millions of people, alongside a commercial language tool, VulaVula. Nairobi’s Jacaranda Health built UlizaLlama to support maternal-health guidance in several African languages. And researchers at Uganda’s Sunbird AI have argued for a regionally focused approach, building deep coverage country by country rather than chasing global breadth.
The strategy these groups share is telling. Rather than trying to out-build global giants, they favour smaller, efficient, open models tuned for specific languages and tasks, and they pool data and benchmarks so each project strengthens the others.
Why it is so hard
The core obstacle is data. Most African languages are what researchers call low-resource: there simply is not enough written text to train a model the way English-language models are trained, on much of the internet. Many languages are primarily spoken, their digital footprints thin, their spelling not always standardised.
That scarcity forces hard, careful work: gathering and cleaning datasets, and sometimes recording speech from scratch, all of which is slow and expensive. It also raises ethical questions that the field takes seriously: who owns language data, whether the communities that provide it consent and are compensated, and how to avoid a new extraction in which African languages are mined to train models that those communities never benefit from.
What is at stake
The payoff is not abstract. Language is the interface to almost everything digital, and AI that understands local languages can widen access dramatically. The concrete applications driving the work include a farmer getting agricultural guidance by voice in their own language, a mother receiving health information she can actually read, a citizen accessing government services without first mastering English or French, and call-centre and education tools that meet people where they are.
There is also a sovereignty dimension. If the AI systems that increasingly mediate information, commerce and services understand only a few global languages, vast populations are pushed to assimilate to a foreign tongue just to participate, as Lelapa’s founders have pointedly argued. Building local-language AI is partly about making sure African voices are represented in the technology, not just served by it.
Why it matters
This is the kind of work that rarely attracts headline funding rounds but quietly determines whether the AI era includes most Africans or sidelines them. It is unglamorous, data-heavy and slow, and it depends on collaboration more than on any single breakthrough. But if the continent’s languages are going to live fully in the digital world rather than fade from it, the labs doing this patient, foundational building are the ones laying the groundwork.







