AI and Privacy: How Indian Companies Use Machine Learning on Your Data
Most people assume AI in India is still experimental, mostly chatbots and Netflix suggestions. They're wrong. Indian companies are already using machine learning to decide your loan eligibility, set the prices you see online, and scan your face in public. Here's what that means for your data.

I keep hearing people say that artificial intelligence in India is "still in its early stages." That it's mostly being used for chatbots and product recommendations and maybe some backend automation that nobody notices. I disagree. Not because the technology itself is more advanced here than elsewhere — it's probably not — but because the deployment of AI on Indian users' data has already gone far beyond what most people realize. And the privacy implications are playing out right now, not in some hypothetical future.
ML-Driven Credit Scoring and Alternative Data
A friend of mine applied for a personal loan through a fintech app last September. He got rejected. The rejection message said something generic about not meeting eligibility criteria. His CIBIL score was 740, which is solid. He had a stable salary, no outstanding debts. When he called customer support, they couldn't explain the specific reason. What he didn't know — what the app's own support team probably didn't know — was that a machine learning model had analyzed somewhere between 50 and 200 data points about him to make that decision, and the model's reasoning wasn't interpretable even to the people who built it. That's where we are.
The credit scoring problem is bigger than most people think. Traditional credit scoring in India runs through CIBIL (now TransUnion CIBIL), which looks at your loan repayment history, credit card usage, outstanding balances, and similar financial data. It's imperfect but at least transparent — you can check your CIBIL report and understand roughly why your score is what it is. The new generation of lenders doesn't stop there. Companies like CreditVidya, ZestMoney (before its troubles), and dozens of NBFC-backed lending apps use what the industry calls "alternative data" to build ML-driven credit models. Alternative data means your phone's metadata. Which apps you have installed. How often you recharge your prepaid plan. Your UPI transaction patterns. In some cases, your social media activity.
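If you want to see how little it takes to build one of these scorers, here is a deliberately simplified Python sketch. The signals, weights, and approval threshold are all invented for illustration; no actual lender's model looks like this, but the basic shape (a weighted score over behavioural features, squashed into a probability) is what most of these systems reduce to.

```python
import math

# Hypothetical behavioural signals a lending app might extract from a phone.
# None of these names, values, or weights come from a real lender's model.
applicant = {
    "recharge_gap_days": 9,        # average days between prepaid recharges
    "upi_txns_per_month": 42,      # UPI transaction count
    "loan_apps_installed": 3,      # other lending apps found on the device
    "night_usage_ratio": 0.35,     # share of phone use between midnight and 5 am
    "contacts_with_defaults": 1,   # contacts flagged in the lender's own records
}

# Illustrative weights: positive pushes towards approval, negative away from it.
weights = {
    "recharge_gap_days": -0.05,
    "upi_txns_per_month": 0.02,
    "loan_apps_installed": -0.40,
    "night_usage_ratio": -1.20,
    "contacts_with_defaults": -0.80,
}
bias = 1.0

def score(features: dict) -> float:
    """Logistic score in [0, 1]: the form most credit models boil down to."""
    z = bias + sum(weights[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

p = score(applicant)
print(f"predicted repayment probability: {p:.2f}")
print("decision:", "approve" if p >= 0.55 else "reject")  # threshold is arbitrary
```

With these made-up numbers the applicant lands around 0.26 and gets rejected, and nothing in the output says which signal did the damage. That opacity is the point. Now scale the feature list from five entries to ten thousand and hand the weights to a model nobody can read.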
One lending company's pitch deck, leaked and circulated among fintech journalists in mid-2025, bragged about using "over 10,000 data points per applicant" to assess creditworthiness. Ten thousand. That includes things like the time of day you typically use your phone, whether your contacts list contains names associated with certain professions, and how frequently you update your apps. The pitch deck didn't call it surveillance. It called it "holistic borrower assessment." The line between the two is thinner than the deck acknowledged.
CRED deserves a specific mention because of the sheer scale of data it collects from Indian users. The app started as a credit card payment platform but has expanded into lending, commerce, and financial services. It encourages users to link all their financial accounts, share credit card statements, and enable transaction monitoring. The company has stated that it uses AI/ML models to provide personalized offers and services. What "personalized" means in practice is that CRED's models are building a detailed financial profile of each user — income patterns, spending habits, debt levels, consumption preferences — and using that profile for decisions that affect what products the user sees and at what terms. Whether users understand the full scope of what they're consenting to when they link their accounts is... debatable.
Dynamic Pricing and Personalized Discrimination
Dynamic pricing is the one that should make you angry. Indian e-commerce platforms use machine learning to adjust the prices you see based on your profile. This isn't speculation. It's been documented by researchers and confirmed indirectly by the platforms themselves through euphemisms like "personalized pricing" and "demand-responsive algorithms." The factors that influence what price Flipkart or Amazon shows you for a given product may include your browsing history (did you look at this product three times already, suggesting high intent?), your device (iPhone users sometimes see higher prices than Android users), your location (metro vs. tier-2 city), and the time of day. The gap isn't always large — maybe 50 to 200 rupees on a 5,000-rupee product — but it's real, and it's driven by ML models analyzing your behavior in ways you can't see.
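Here is a toy version of what a "demand-responsive" pricing rule can look like. The signals and multipliers are invented; I'm not quoting any platform's actual logic, only showing how a handful of behavioural flags turn into a different number on your screen.

```python
# Hypothetical personalised-pricing rule: a base price nudged up or down by
# behavioural signals. All multipliers are invented for illustration only.
BASE_PRICE = 5000  # rupees

def personalised_price(profile: dict) -> int:
    multiplier = 1.0
    if profile.get("views_of_this_product", 0) >= 3:   # repeat views = high intent
        multiplier += 0.04
    if profile.get("device") == "ios":                 # proxy for willingness to pay
        multiplier += 0.02
    if profile.get("city_tier") == 1:                  # metro buyer
        multiplier += 0.02
    if profile.get("cart_abandonments", 0) >= 2:       # price-sensitive shopper
        multiplier -= 0.03
    return round(BASE_PRICE * multiplier)

returning_metro_iphone_user = {"views_of_this_product": 4, "device": "ios", "city_tier": 1}
fresh_account = {}

print(personalised_price(returning_metro_iphone_user))  # 5400
print(personalised_price(fresh_account))                # 5000
```

Same product, same moment, 400 rupees apart, and the person paying more is the one the system judged least likely to walk away.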
A consumer rights group in Bangalore ran an experiment in late 2025 where ten volunteers searched for the same product on the same platform at the same time, using ten different phones with different browsing histories and account profiles. The price varied by up to 12% across the group. Twelve percent. The volunteer who'd been browsing that product category for days saw the highest price. The one using a fresh account with no browsing history saw the lowest. The ML model had learned who was most likely to pay more — and charged accordingly.
Recommendation Engines and Algorithmic Bias
Recommendation engines seem harmless but aren't neutral. When Swiggy shows you restaurants, when Spotify India queues your next song, when Amazon suggests products — those are all ML models making decisions about what you see. The privacy angle isn't just that these models need your data to function (they do). It's that the models create feedback loops. You watch certain types of content, the algorithm shows you more of that type, you watch more of it, and the model becomes increasingly confident about what kind of person you are. Over time, the algorithm's model of you becomes a self-fulfilling prophecy. Your "preferences" are partly genuine and partly manufactured by the recommendation system itself.
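A small simulation makes the feedback loop concrete. Everything in it is made up: a model with a slightly skewed first guess, a user whose taste drifts a little toward whatever they're shown, and a greedy recommender in between.

```python
import random

random.seed(7)

# Toy feedback loop. The platform keeps an estimate of how much you like each
# category and greedily shows the one it rates higher. Two effects interact:
#   1. the model only learns about what it chooses to show you, and
#   2. repeated exposure nudges your actual taste toward what you are shown.
# Every number here is invented for illustration.
true_interest = {"news": 0.50, "cricket": 0.50}   # how you'd genuinely click today
estimate      = {"news": 0.52, "cricket": 0.48}   # the model's slightly skewed prior

MODEL_LR = 0.05   # how fast the model updates on each click / no-click
EXPOSURE = 0.01   # how much being shown a category shifts your real taste

for _ in range(500):
    shown = max(estimate, key=estimate.get)
    other = "cricket" if shown == "news" else "news"

    clicked = random.random() < true_interest[shown]
    estimate[shown] += MODEL_LR * ((1.0 if clicked else 0.0) - estimate[shown])

    # Exposure effect: repeated recommendation slowly reshapes the preference itself.
    true_interest[shown] = min(1.0, true_interest[shown] + EXPOSURE)
    true_interest[other] = max(0.0, true_interest[other] - EXPOSURE)

print("model's estimate :", {k: round(v, 2) for k, v in estimate.items()})
print("your 'preference':", {k: round(v, 2) for k, v in true_interest.items()})
```

In most runs, the 0.52 versus 0.48 starting guess snowballs into a strong preference for one category, even though the true starting interest was an even split. The model ends up confident, and technically accurate, about a preference it helped manufacture.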
In India, this has cultural dimensions that are uncomfortable to discuss. If a job platform's recommendation engine learns that certain employers prefer candidates from specific educational institutions — which in Indian context often correlates with caste and economic background — the algorithm will start preferentially showing those candidates to those employers. Nobody programmed caste bias into the model. The model learned it from the data, which reflects the biases of the humans who generated it. The algorithm didn't create the discrimination. It automated it. And because it operates at scale, it amplified it too.
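The mechanism is easy to demonstrate on synthetic data. In the sketch below (all numbers invented), the shortlisting rule never sees anyone's background, only which institution they attended, and ability is distributed identically across groups. The proxy feature does the discriminating on its own.

```python
import random
random.seed(1)

# Toy illustration of proxy bias. The scoring rule never sees caste or family
# income, only an "institution tier" flag. But in this synthetic population,
# access to tier-1 institutions correlates with background, so background
# leaks through anyway. All numbers are invented for illustration.

def synthetic_candidate():
    privileged = random.random() < 0.5
    # Correlated feature: the privileged group is far more likely to hold a
    # tier-1 degree, regardless of individual ability.
    tier1 = random.random() < (0.7 if privileged else 0.1)
    skill = random.gauss(60, 10)  # actual ability, same distribution for both groups
    return privileged, tier1, skill

def shortlist_score(tier1: bool, skill: float) -> float:
    # A weight "learned" from past hiring outcomes that favoured tier-1 degrees.
    return skill + (15 if tier1 else 0)

candidates = [synthetic_candidate() for _ in range(10_000)]
cutoff = sorted(shortlist_score(t, s) for _, t, s in candidates)[-1000]  # top 10%

shortlisted = [(p, t, s) for p, t, s in candidates if shortlist_score(t, s) >= cutoff]
share_privileged = sum(p for p, _, _ in shortlisted) / len(shortlisted)
print(f"privileged share of shortlist: {share_privileged:.0%}")
# Ability is identical across groups, yet the shortlist skews heavily towards
# the privileged group, because the proxy feature carries the historical bias.
```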
Facial Recognition and Public Surveillance
Facial recognition and surveillance are already deployed. This isn't future tense. The Delhi Police began using automated facial recognition technology during the 2020 protests and have expanded its use since then. The Hyderabad Police operate one of the largest facial recognition systems in India, with cameras across the city feeding into a central database. The Telangana state government's TSCOP app uses facial recognition for identity verification. Several airports, including Bangalore's Kempegowda International, have implemented DigiYatra — a facial recognition-based boarding system that ties your face to your Aadhaar data.
The training data for these systems is where the privacy concern gets acute. Building a facial recognition model that works well on Indian faces requires large datasets of Indian faces. Where do those datasets come from? In some cases, from publicly available photos scraped from social media. In other cases, from government ID databases. In at least one documented case, from photos taken at a public event without informed consent. The people whose faces trained these models didn't agree to be part of a surveillance system. Many of them don't even know their photos were used.
There's also the accuracy problem. Facial recognition systems have documented biases — they perform worse on darker skin tones and on women. In the Indian context, where skin tones vary enormously, a system trained predominantly on lighter-skinned faces from North India may have higher error rates when deployed in South India or among Adivasi populations. A false match in a surveillance context doesn't just mean you see the wrong movie recommendation. It could mean you're flagged as a suspect, detained, or worse. The error rate might be low — maybe 1-2% — but when you're running that system against millions of faces per day, 1% means tens of thousands of false matches.
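The arithmetic is worth spelling out, using assumed numbers rather than figures from any deployed system:

```python
# Back-of-the-envelope numbers for the false-match point above.
# Both inputs are assumptions for illustration, not figures from any real system.
comparisons_per_day = 2_000_000   # faces checked against a watchlist each day
false_match_rate = 0.01           # 1% of comparisons wrongly flagged as a match

false_matches_per_day = comparisons_per_day * false_match_rate
print(f"{false_matches_per_day:,.0f} wrong flags per day")    # 20,000
print(f"{false_matches_per_day * 365:,.0f} wrong flags per year")
```

A system that is "99% accurate" in the marketing brochure still generates twenty thousand wrong flags a day at that volume, and every one of those flags attaches to a real person.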
Ad Targeting and Behavioral Profiling
Ad targeting is ML's most profitable application, and India is its fastest-growing market. Google and Meta (Facebook/Instagram) between them control roughly 70% of India's digital advertising market. Both companies run enormously sophisticated ML models that analyze your behavior to predict what ads you're most likely to respond to. The data inputs include your search queries, your social media posts and interactions, your location history, your purchase behavior (if you've ever clicked a shopping ad), your email content (in Google's case, though they say they stopped scanning Gmail for ad targeting), and your browsing activity across millions of websites via tracking pixels and cookies.
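For the "tracking pixels and cookies" part, here is a minimal sketch of the mechanism. The domains and IDs are made up; the point is how a single cookie ties together visits to sites that otherwise have nothing to do with each other.

```python
from collections import defaultdict

# Minimal sketch of cross-site tracking. Every publisher embeds a 1x1 image
# (a "pixel") served from the same ad company's domain; the browser attaches
# that company's cookie to each request, so one profile quietly accumulates
# visits from unrelated sites. Domains and the cookie ID are made up.

profiles = defaultdict(list)  # the ad network's view: cookie ID -> pages seen

def record_pixel_hit(cookie_id: str, page_url: str) -> None:
    """What the ad network's server logs when the browser fetches the pixel."""
    profiles[cookie_id].append(page_url)

# The same browser (same third-party cookie) visiting three unrelated sites:
for page in ["https://news-site.example/budget-2026",
             "https://travel-site.example/goa-packages",
             "https://pharmacy-site.example/diabetes-test-strips"]:
    record_pixel_hit(cookie_id="abc123", page_url=page)

print(profiles["abc123"])
# Three "separate" visits, one behavioural profile. Profiles like this are the
# raw material the targeting models described above are trained on.
```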
Indian companies are building their own ad-targeting ML systems too. Jio's advertising platform uses data from its 400+ million telecom subscribers — call patterns, data usage, app activity — to target ads. Flipkart Ads uses purchase and browse behavior from its e-commerce platform. Hotstar (now JioStar) uses viewing habits. The amount of behavioral data being fed into these models is staggering, and the consent mechanisms are, to put it charitably, weak. Most users "consented" by accepting a terms of service they didn't read when they signed up for a phone plan or downloaded an app.
Training Data Persistence and Consent Problems
Training data persistence is a problem nobody's solved. Here's something that should bother you. When you delete your account from a platform, the company is supposed to delete your data. Under the DPDPA, they're legally required to, subject to certain exceptions. But what about the ML models that were trained on your data? A model that learned patterns from your behavior doesn't store your data in a way that's easily separable. Your individual contribution to the model is diffused across millions of mathematical parameters. You can't just pull your data out of a trained model the way you can delete a row from a database. Some researchers call this the "right to be forgotten" problem for machine learning. The DPDPA doesn't address it. Neither does the GDPR, really. Nobody's figured out a practical solution, and most companies are quietly pretending the problem doesn't exist.
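A toy example shows why erasure requests and trained models don't line up. The "model" below is just an average, which is as simple as a trained parameter gets, and the figures are invented:

```python
# Toy illustration of the gap between deleting data and deleting its influence.
# We "train" a one-parameter model (a mean spending estimate) on users' records,
# then delete one user. The database forgets them; the parameter does not.
# All figures are invented for illustration.

monthly_spend = {"user_1": 12000, "user_2": 8000, "user_3": 30000, "user_4": 10000}

def train(records: dict) -> float:
    # "Training" here is just averaging, a stand-in for fitting model parameters.
    return sum(records.values()) / len(records)

model_parameter = train(monthly_spend)      # 15000.0, shaped partly by user_3
del monthly_spend["user_3"]                 # the DPDPA-style erasure request

print(model_parameter)                      # still 15000.0
print(train(monthly_spend))                 # retraining from scratch gives 10000.0
```

Honouring erasure inside the model means either retraining from scratch, which is expensive at real scale, or "machine unlearning" techniques that are still research-grade. That is why the problem stays quietly unsolved.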
Consent under Indian law is supposed to mean something. The DPDPA requires that consent for data processing be "free, specific, informed, unconditional, and unambiguous." For AI/ML applications, this raises serious questions. When you consent to a fintech app processing your data for "providing services and improving user experience," does that consent cover training a credit scoring model on your transaction patterns? Does it cover sharing insights derived from your data with a partner insurance company? Does it cover using your anonymized data as part of a training dataset that might be sold or licensed to third parties? The answer, under a strict reading of the DPDPA, is probably no — each of those uses requires separate, specific consent. In practice, companies are bundling these uses under broad consent language and hoping regulators don't crack down.
The DPDPA does include provisions on automated decision-making. Section 11 gives data principals the right to information about their data processing, which should include information about automated decisions. But the Act doesn't directly grant a right to contest automated decisions or to demand human review of an AI-made determination. The EU's GDPR, by contrast, includes a right not to be subject to decisions based solely on automated processing that produce legal or significant effects. India's law is softer on this point, and until the Data Protection Board starts issuing guidance or rulings on AI-specific complaints, the practical rights of Indian citizens facing algorithmic decisions remain uncertain.
Algorithmic bias in India has specific local dimensions. Beyond caste (which I mentioned earlier), there's gender bias in lending models — women in India have historically had lower access to formal credit, which means models trained on historical data will assess women as higher risk, perpetuating the disparity. There's regional bias — someone from a tier-3 city may be scored differently than someone from Bangalore based on patterns in the training data, even if their individual financial behavior is identical. There's language bias — NLP models trained primarily on English and Hindi content perform worse on content in Tamil, Telugu, Kannada, or Odia, which affects everything from content moderation on social media to voice assistants to customer service chatbots.
The Indian government's proposed AI governance framework, published as a discussion paper by MeitY in 2024, acknowledged these issues but stopped short of mandatory regulation. The framework recommends voluntary adoption of "responsible AI" principles by companies, which is a bit like recommending that foxes voluntarily adopt responsible henhouse management principles. The upcoming Digital India Act is expected to include more specific AI provisions, but as of March 2026, it hasn't been introduced in Parliament.
What can you actually do about any of this? Honestly, less than I'd like. You can minimize the data you share — don't grant unnecessary permissions, don't link accounts across services, don't fill in optional profile fields. Every data point you withhold is a data point that can't be fed into a model. You can opt out of personalization features where platforms offer the option. Google allows you to disable ad personalization. Some e-commerce apps let you turn off recommendations. These opt-outs are partial — the company still has your data, they just aren't using it for that specific purpose — but they're better than nothing.
You can exercise your DPDPA rights. If an automated decision affects you — a loan rejection, an insurance premium increase, a service denial — you can request information about how the decision was made. The company may not give you a satisfying answer (they might not have one), but the request itself creates a record and puts the company on notice that someone is paying attention. If enough people make these requests, it creates regulatory pressure.
You can support organizations pushing for stronger AI regulation in India. The Internet Freedom Foundation has been vocal about algorithmic accountability. The IT for Change think tank publishes research on AI governance. These aren't just advocacy organizations — they're the groups that will eventually shape the rules that constrain how companies use your data for AI.
Indian AI startups face a particular accountability gap. India has over 3,000 AI startups, according to NASSCOM's 2025 count. Many of them are too small to have a dedicated DPO, too new to have established data governance practices, and too focused on growth to prioritize privacy. A startup building a hiring algorithm doesn't necessarily think about whether its training data reflects caste or gender bias. A healthtech startup using ML to predict disease risk from lifestyle data doesn't necessarily think about what happens when that risk prediction is shared with an insurer. The DPDPA applies to these companies just as it does to Infosys or Reliance, but the practical reality is that a five-person startup in a co-working space in HSR Layout isn't going to conduct a Data Protection Impact Assessment before training a new model. Enforcement that's scaled to the size of the offender — rather than one-size-fits-all penalties — would be more effective, but the Data Protection Board hasn't signaled how it intends to approach the startup ecosystem.
There's also the question of government use of AI, which I've barely touched on because it deserves its own treatment. But briefly: when the government deploys AI for tax enforcement, welfare distribution, border security, or law enforcement, the privacy stakes are different and arguably higher than when a private company does it. You can choose not to use Flipkart. You can't choose not to interact with the tax department. Government AI systems that make decisions about citizens' rights and entitlements need higher standards of transparency and accountability than commercial systems. Whether India's AI governance framework will recognize that distinction remains to be seen.
Where all of this ends up — whether India develops a meaningful AI governance framework or whether the current Wild West approach persists — is genuinely unclear. The technology is advancing faster than the law, faster than public understanding, and possibly faster than the companies deploying it fully appreciate. Five years from now, the ML models making decisions about Indian citizens will be more powerful, more pervasive, and more opaque than they are today. Whether they'll also be more accountable is the open question. I'd like to say I'm optimistic. I'm probably closer to uncertain.
Written by
Rajesh Kumar
Founder & Chief Editor
Rajesh Kumar is a cybersecurity expert with over 12 years of experience in digital privacy and data protection. He has worked with CERT-In and various Indian enterprises to strengthen their data security practices. He founded PrivacyTechIndia to make privacy awareness accessible to every Indian.


