Updated March 5, 2026
0:00 Welcome to the Colaberry AI podcast, brought to you by Colaberry AI Research Labs and the Carl Foundation. Today, we're gonna take a deep dive into something pretty cool: Project Indus. It's an initiative by Tech Mahindra right here in India, and they're trying to build their very own large language model. We've got a lot of info straight from Tech Mahindra themselves, so we can really get into the nitty-gritty. Basically, what we wanna do today is figure out, you know, what the big idea behind it is, what this thing can actually do, and what kind of impact it could have. 0:28 Okay. So building India's own, like, big language brain, that sounds super ambitious. What's the main reason behind Project Indus? Why are they doing this? You know, it's not just about the technology itself. 0:38 Right? What's really fascinating about Project Indus is how deeply it's connected to India's incredibly rich language history. Mhmm. The driving force here is to really celebrate and empower all those languages that come from the Indus Valley civilization. Mhmm. 0:55 So you've got this really cool mix: building a super advanced language model, but one that's also truly Indian at its core. And they want it to be just as good as any other top language model out there. So it's not just about, like, making a new piece of tech just because they can. There's a bigger purpose. Exactly. 1:12 If you zoom out and look at the big picture, it's about creating this, like, foundational language model specifically for Indian languages. Think of it as a building block, right? Something that makes communication easier for the whole country. And a really important part of this is that it could help preserve those languages, even dialects that might be fading away. 1:31 Wow. That's powerful. Using AI to, like, keep languages alive. How are they planning on doing all this? Where are they starting? 1:38 So they're taking a step-by-step approach.
The first focus is on Hindi, along with a whole bunch of its dialects, like, over 37 of them. The idea is to build a really strong base with those. And then from there, they'll branch out and include more languages from all across India. Okay. 1:52 So Hindi plus 37 dialects, that's a lot. And we know the why now: supporting their heritage and giving power back to these languages. But what about the what? What can this first language model actually do? Right. 2:06 So the way they've designed it, it's got 539,000,000 parameters. Mhmm. And they've trained it on a huge amount of text data. We're talking about 10,000,000,000 tokens. Mhmm. 2:14 Just pure Hindi and those dialects. Think of tokens as the basic pieces of language it learns from, you know, words or parts of words. And those parameters, well, they're like the settings inside that help it learn and remember things. So more parameters usually means it can handle more complex stuff. But here's something else that's super cool. 2:31 They want to make this tech available to everyone. They were planning on releasing it as open source back in February 2024. Open source from the get-go. That's a big deal. What's the thinking behind that? 2:42 It shows how much they wanna see this technology get used widely, and they wanna encourage other people to build on it. By going open source, they're basically saying: hey, anyone, whether it's tech companies, researchers, or even individual developers, you can take what we made, make it your own, and help make it even better. That's smart. So this first model is focused on, like, dealing with text. But what's next? 3:04 How do they plan to develop this even further? Well, they've got these phases mapped out. Phase one was all about creating what they call a decoder-only model. That kind of model is good at predicting, in a sense, what word comes next, which is perfect for, you know, creating text. Then in phase two, they're gonna add something called reinforcement learning from human feedback, or RLHF.
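To make tokens, parameters, and decoder-only next-word prediction a bit more concrete, here is a minimal toy sketch in Python. It is not Tech Mahindra's code or architecture. It assumes a whitespace tokenizer (real models use learned subword vocabularies) and a bigram count table standing in for a transformer's learned parameters; the romanized Hindi sample text and every function name are hypothetical. What it does share with a decoder-only model is the generation loop: predict the most likely next token from what came before, append it, and repeat.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Toy tokenizer: split on whitespace. Real LLMs use learned subword
    # tokens, so one word may become several tokens.
    return text.split()


def train_bigram(tokens: list[str]) -> dict[str, Counter]:
    # For each token, count which tokens followed it in the training text.
    # These counts stand in for the learned parameters of a real model.
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts: dict[str, Counter], prev: str) -> str | None:
    # Greedy next-token prediction: return the most frequent follower of
    # `prev`, or None if `prev` never appeared in training.
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]


def generate(counts: dict[str, Counter], start: str, max_new_tokens: int = 5) -> list[str]:
    # Decoder-style autoregressive loop: each predicted token is appended
    # and becomes the context for the next prediction.
    output = [start]
    for _ in range(max_new_tokens):
        nxt = predict_next(counts, output[-1])
        if nxt is None:
            break
        output.append(nxt)
    return output


# Hypothetical romanized Hindi training text
corpus = "main ghar ja raha hoon main ghar ja raha hoon main bazaar ja rahi hoon"
model = train_bigram(tokenize(corpus))
print(generate(model, "main", max_new_tokens=4))  # ['main', 'ghar', 'ja', 'raha', 'hoon']
```

Where this toy stores a handful of bigram counts, Project Indus is described as having 539,000,000 parameters trained on 10,000,000,000 tokens, and a transformer conditions on the whole preceding context rather than only the previous token.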
3:25 Basically, humans look at the model's responses and give feedback, and that feedback is used to train the model so its conversations sound way more natural. And then phase three is all about giving it voice capabilities. Got it. So first just text, then more interactive chats, and eventually it'll even have a voice. Sounds like a pretty solid plan. 3:43 But now for the really exciting part: what kind of impact could this have in the real world? Tech Mahindra has talked about a bunch of potential uses for this. Yeah. Absolutely. When you start thinking about how this technology could be used, the possibilities are huge. 3:58 And they could really make a difference in India. Take their farmers network idea, for example. They imagine this model working almost like a digital buddy for over 140,000,000 farmers, giving them info on stuff like loans and the right pesticides, all in their local language or dialect. That could be a game changer for those farmers, getting the info they need. What else are they looking at? 4:18 Another really cool application is something called Jamstack Connect for India. Jamstack is a modern way to build websites that makes them faster and more secure. So they wanna use their language model to build that kind of web infrastructure across the whole country. Okay. So better websites for everyone. 4:34 What else? They also mentioned education enablement. Right. The model could help kids learn better, you know, by explaining school subjects in their own language and giving them learning materials that make sense to them. Right. 4:47 Learning in your own language can make a huge difference. It takes away a big barrier for a lot of students. Exactly. And they're thinking about specific industries too. Industry foundation models, they call them. 4:58 These are basically specialized versions of the model designed for, like, media, telecom, and health care. Imagine a health care chatbot that can understand and answer a patient's questions in their local dialect.
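RLHF is usually described as two stages: first train a reward model on human judgments comparing pairs of responses, then fine-tune the language model (for example with PPO) toward responses that score highly. The minimal Python sketch below covers only the first stage, using a linear reward model and a Bradley-Terry style pairwise loss. Tech Mahindra has not said how phase two will be implemented, so the three response features and the preference pairs here are hypothetical stand-ins for real human feedback.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def reward(weights, features) -> float:
    # Linear reward model: score = w . x
    return sum(w * x for w, x in zip(weights, features))


def train_reward_model(pairs, n_features, lr=0.5, epochs=200):
    """Fit weights so preferred responses score higher than rejected ones.

    Per-pair loss (Bradley-Terry / logistic): -log(sigmoid(r(chosen) - r(rejected))).
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # dLoss/dw = -(1 - sigmoid(margin)) * (chosen - rejected),
            # so a gradient-descent step adds step * (chosen - rejected).
            step = lr * (1.0 - sigmoid(margin))
            for i in range(n_features):
                weights[i] += step * (chosen[i] - rejected[i])
    return weights


# Hypothetical features per response:
# [answers_in_user_dialect, answers_the_question, is_verbose]
# Hypothetical human judgments: (preferred response, rejected response)
preference_pairs = [
    ([1, 1, 0], [0, 1, 0]),  # same answer, but the dialect-matching one was preferred
    ([1, 1, 1], [1, 0, 1]),  # the response that actually answers was preferred
    ([0, 1, 0], [0, 0, 1]),  # a concise answer beat a verbose non-answer
]

weights = train_reward_model(preference_pairs, n_features=3)
print([round(w, 2) for w in weights])
```

Because every toy judgment favors the dialect-matching, on-topic response, those two weights end up positive and the verbosity weight negative. In a real system the reward model is itself a neural network that scores full text, and the second stage then updates the language model against it.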
That could really improve access to health care information, especially in more remote areas. That's a great example. And didn't they say something about preserving dialects? 5:20 Oh, yeah. Dialect preservation is a big part of their plan. They wanna help digitize and preserve dialects that people speak but that haven't been written down or, like, digitally recorded. It's crucial for protecting India's linguistic diversity. I see. 5:32 So it's not just about technology. It's about culture too. Definitely. And they also talked about rural finance. Mhmm. 5:38 Little kiosks in rural areas could use the model to understand the local dialects, help people with money problems, and even make it cheaper for banks to operate in those areas. So more access to financial services, even in remote communities. It seems like they're really thinking about how this tech can impact people at a very local level. Exactly. They even talked about putting these systems that understand local dialects into different machines and equipment, making them easier for people to use. 6:04 And in public health care, they think it could help make sure everyone has equal and clear access to important health info in the language they understand best. Wow. So many potential uses, touching so many parts of life in India. It's really interesting how they're focusing on India's diverse languages and how this could help people connect on a deeper level. If we think about the bigger picture here... It's huge. 6:26 Building a language model that's so connected to Indian culture and languages, instead of just using those general global models, could really lead to new and innovative solutions specifically designed for India. And it could boost the economy across different sectors just by making communication and information sharing way more effective in local languages.
So it's about creating solutions that really fit the needs and unique qualities of India instead of trying to use a one-size-fits-all approach. Exactly. 6:54 And you can see how committed Tech Mahindra is. They've got a team of 15 people just for Project Indus, and they've gathered a ton of data, 1.2 terabytes, just for Hindi and those dialects. We even saw some news articles about how they wanna build a 7,000,000,000-parameter language model to really push the boundaries of how computers understand and process language. It's a long-term vision for how this tech can be used in the Indian context. Okay. 7:19 So for our listeners, for anyone in India listening to this, what does all this actually mean for you? It might sound like a bunch of technical stuff, but how could Project Indus actually impact your life, even if you're not a tech expert? Well, the big potential is in making technology, information, and all those everyday services much easier for you to access in your own language or dialect. Imagine talking to government services, using educational resources, even using the apps on your phone, and it all feels natural because it actually understands how you communicate. It's a big step towards making India a more digitally inclusive place where language isn't a barrier anymore. 7:56 That could really open things up for a lot of people. Absolutely. But it's important to remember that it's still under development, and releasing it as open-source software is a big step, one that could lead to even more innovation and more people using it. The real impact will become clearer once developers and companies start using this tech to build new apps and services. So it's kinda like the start of a new chapter. 8:17 You could say that. Yeah. And it brings up a really interesting question for everyone to think about: how could these language models that are so deeply rooted in local languages change how you use tech and access information in the future?
That's a great question to consider. 8:30 Okay. Let's quickly sum up what we talked about today. Project Indus is a major effort by Tech Mahindra to create a language model made for all the different languages spoken in India, and they're really focused on making it culturally relevant. Making it open source is a big part of their plan. And the potential uses for it? Wow. 8:47 They're everywhere: helping farmers, improving education, preserving dialects, making health care better. Thanks for joining us for this deep dive into Project Indus. It was my pleasure. Thank you for listening in. Subscribe and follow Colaberry on social media. 8:59 The links are in the description. And check out our website, www.colaberry.ai, for more insights like this.