The Future of Language Models: Embracing a Multilingual Approach
Learn how we are pioneering a paradigm shift in the development of multilingual language models.
TWO Team
In recent years, the development of large language models (LLMs) has accelerated dramatically, largely spearheaded by OpenAI's introduction and continuous evolution of the GPT series. As GPT-5 looms on the horizon, this trend of creating increasingly larger models persists, significantly influencing global AI research strategies. Countries and companies worldwide are replicating this monolithic model approach, creating language-specific AIs like Airavata for Hindi, Rakuten-7B for Japanese, Jais for Arabic, and HyperClova for Korean, just to name a few.
Questioning the Sustainability of Current Trends
However, this race toward ever-larger language-specific models raises a critical question: Are we merely retracing the steps of previous models, locking ourselves into a cycle of growing data needs, greater computational demands, higher energy consumption, and, ultimately, resource-heavy monolithic language models?
The implications are far-reaching, concerning not only resource utilization but also the applicability of these models in a rapidly globalizing world where linguistic intermixing, speaking multiple languages within the same conversation or even the same sentence, is commonplace. LLMs also hold tremendous potential to level the playing field for developing countries in education, health, social good, and many other areas, but that potential is gated by the limited options available outside of English.
SUTRA: A Paradigm Shift in Language Learning
Enter SUTRA, which challenges the prevailing norms by rethinking how language models learn and function. SUTRA differentiates itself by separating concept learning from language learning. This separation reduces the reliance on vast single-language datasets and diminishes the need for extensive training in multiple languages from scratch.
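To make that separation concrete, here is a minimal sketch in PyTorch of how a shared, language-agnostic concept core could sit between language-specific encoding and decoding layers. The module names, dimensions, and layer choices are illustrative assumptions for exposition only, not SUTRA's actual architecture.

```python
# Illustrative sketch of separating concept learning from language learning.
# NOT SUTRA's real architecture: sizes, layer counts, and names are assumptions.
import torch
import torch.nn as nn

class ConceptLanguageModel(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_core_layers=6):
        super().__init__()
        # Language side: map tokens of a given language into a shared concept space.
        self.embed = nn.Embedding(vocab_size, d_model)
        self.language_encoder = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        # Concept core: language-agnostic layers shared across all languages.
        self.concept_core = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=n_core_layers,
        )
        # Language side again: project concepts back to a target-language vocabulary.
        self.language_decoder = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        x = self.embed(token_ids)          # (batch, seq_len, d_model)
        x = self.language_encoder(x)       # language understanding
        x = self.concept_core(x)           # language-agnostic processing
        return self.language_decoder(x)    # language generation (logits over vocab)

model = ConceptLanguageModel()
logits = model(torch.randint(0, 32000, (1, 16)))
print(logits.shape)  # torch.Size([1, 16, 32000])
```

The intended benefit of such a split, as described above, is that the shared core does not have to be retrained from scratch for each new language; only the comparatively lightweight language-facing layers need to adapt.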
The results speak for themselves. SUTRA not only outperforms HyperClova on English MMLU (Massive Multitask Language Understanding) scores but, crucially, achieves comparable results on Korean MMLU. It does so without the expansive datasets that HyperClova relies on and without specific training in the Korean language. SUTRA delivers similar MMLU improvements over language-specific models like Airavata in Hindi, Jais in Arabic, and Rakuten-7B in Japanese, as seen in the table below.
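For context on what a per-language comparison of this kind measures, here is a small, self-contained sketch of scoring MMLU-style multiple-choice questions grouped by language. The sample questions and the placeholder model are invented for illustration; this is not the benchmark harness or the data behind the reported results.

```python
# Sketch of per-language multiple-choice accuracy (MMLU-style scoring).
# Data and the model_answer callable are placeholders, not real benchmark assets.
from collections import defaultdict

def score_by_language(model_answer, questions):
    """questions: dicts with 'language', 'prompt', 'choices', and 'answer' (correct index)."""
    correct, total = defaultdict(int), defaultdict(int)
    for q in questions:
        predicted = model_answer(q["prompt"], q["choices"])  # returns a choice index
        total[q["language"]] += 1
        if predicted == q["answer"]:
            correct[q["language"]] += 1
    return {lang: correct[lang] / total[lang] for lang in total}

sample = [
    {"language": "en", "prompt": "2 + 2 = ?", "choices": ["4", "5"], "answer": 0},
    {"language": "ko", "prompt": "2 + 2 = ?", "choices": ["5", "4"], "answer": 1},
]
# A dummy model that always picks the first choice, to show the scoring mechanics.
print(score_by_language(lambda prompt, choices: 0, sample))  # {'en': 1.0, 'ko': 0.0}
```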
Implications and the Future of Language Processing
A shift toward more efficient, versatile, and globally applicable language models is required. SUTRA is already proficient in 50+ languages and can understand and participate in the nuanced, mixed-language dialogues common in today's multilingual societies, a capability that single-language models struggle with.
Moreover, SUTRA’s approach could lead to more sustainable AI development paths, focusing on smarter, rather than simply larger, data and energy utilization strategies. This aligns not only with environmental concerns but also with the practical realities of processing increasingly complex and varied global datasets.
As language models continue to evolve, the focus must shift from creating larger, more isolated models to developing more integrated, efficient, and contextually aware systems. Technologies like SUTRA not only pave the way for more inclusive and sustainable growth in AI but also ensure that the future of technology mirrors the diversity and dynamism of human languages. As we move forward, embracing multilinguality and multicultural understanding will be key—not just for improving model performance, but for building systems that truly understand and interact with a diverse global populace.