How the lack of the right data affects the promise of big data in India

Analytics | Published June 15, 2015

Big data is the big buzzword these days. The term refers to a collection of data sets too large and complex to be processed by standard tools. Big data analytics is the art and science of combining enterprise data, social data, and machine data to derive new insights that would otherwise be impossible to obtain. It is also about combining past data with real-time data to predict or suggest outcomes in a current or future context.

The digital footprint is progressively expanding the world over, into fragmented mediums (blogs, tweets, reviews etc.) and technologies (mobile, web, cloud/SaaS etc.).

Digital landscape in India

India’s digital landscape may be evolving quickly too, but overall penetration remains low: only 1 in 5 Indians was using the Internet in July 2014.

In India, enterprises and businesses have access to a veritable wealth of information. And though some of the larger organizations have made a start in harnessing this information, most Indian companies are still learning how to collect and store big data.

Telecom providers, online travel agencies and online retail stores are some of the industries that are using big data analytics to engage customers in some way or another.

However, big data analytics is still in its infancy in India. Beyond storage, the collection of the data sets themselves poses several challenges. Both past and current data are required to make big data analytics really useful, and both are scarce in India's public and private sectors. Some of the reasons for the lack of enough data are:

Yet to be fully computerized

Healthcare, economic, and statistical data, in both the private and public sectors in India, are yet to be fully computerized. The main reason for this is the late adoption of IT in India. Unlike in the West, most industries in India made the transition from manual records to computerized information systems only during the last decade.

Over the years, the state and central ministries have made moves towards e-governance.  Efforts to deliver public services, and to make access to these services easier, are being made as well. This is still a work in progress; huge amounts of data across many government sectors are yet to be digitized.

Quality of data

In big data analytics, data sufficiency plays a critical role when samples are run across different dimensions. Enough data points are needed to make informed analyses. The quality of the data being crunched, not just its quantity, also influences the quality of insights. If the signal-to-noise ratio is low, the accuracy of results suffers, and more so when the data samples are less than optimal. In a country like India, there is very little information about individuals, because Indians are not overly expressive, especially on public forums.

Public social media information that is available for most individuals from India lacks quality information about users themselves. Random facts and figures in individual profiles, sharing of spam content, and fake social media accounts that are created for bots are very common in India.


Social media sites are becoming increasingly vulnerable to spam attacks. Time spent by a captive audience on social media sites opens up windows of opportunities for online threats and spammers.

Again, social media spam lowers the signal-to-noise ratio that defines the quality of big data. This takes away from the accuracy of results.
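To make the signal-to-noise idea concrete, here is a minimal Python sketch. It assumes every record already carries a hypothetical `is_spam` flag (in practice, producing that flag is the hard part) and simply compares the count of genuine records with the count of spam or bot records.

```python
def signal_to_noise(records):
    """Ratio of genuine records to spam/bot records in a sample."""
    signal = sum(1 for r in records if not r["is_spam"])
    noise = len(records) - signal
    # With no noise at all the ratio is unbounded.
    return float("inf") if noise == 0 else signal / noise


# Hypothetical sample: 8 genuine posts and 2 spam posts.
posts = [{"text": f"post {i}", "is_spam": i >= 8} for i in range(10)]
ratio_before = signal_to_noise(posts)

# The same sample after 6 more spam posts arrive.
spam_wave = [{"text": f"spam {i}", "is_spam": True} for i in range(6)]
ratio_after = signal_to_noise(posts + spam_wave)
```

The genuine posts are unchanged between the two samples, yet the ratio falls from 4.0 to 1.0. That is the sense in which spam degrades a data set without removing any real information from it.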

Cultural and Social influences

In most western markets, insights generated through big data can be applied across the whole consumer base. However, given the extensive cultural and linguistic variation across India, any insight generated for a consumer based out of Chandigarh, for example, will not be directly applicable to a consumer based in Chennai. This problem is made worse by the fact that a lot of local data lives in regional publications, in different languages, and has very limited online visibility.

Unstructured data leads to mapping issues

Big data in India is largely unstructured. Most transactional data in the healthcare and retail segments is stored purely for book-keeping purposes. It contains very little of the kind of information that would help big data analytics map enterprise-generated transactional data to public information.

In developed countries, user data is rich enough to provide demographic or group-level markers that can be used to generate customized insights while maintaining individual privacy. The lack of these standard identifiers in Indian consumer data is one of the biggest bottlenecks in mapping transactional and social records in India.
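As a rough illustration of why such markers matter, the Python sketch below joins transactional and social records at group level only. The field names (`age_band`, `city`, `amount`, `topic`) and the sample values are hypothetical and do not come from any real data set. Records that lack a marker cannot be mapped at all.

```python
from collections import defaultdict

# Hypothetical records; no individual identifiers are used.
transactions = [
    {"age_band": "18-25", "city": "Chennai", "amount": 1200},
    {"age_band": "18-25", "city": "Chennai", "amount": 800},
    {"age_band": "26-35", "city": "Chandigarh", "amount": 500},
]
social = [
    {"age_band": "18-25", "city": "Chennai", "topic": "mobiles"},
    {"age_band": "26-35", "city": "Chandigarh", "topic": "travel"},
    {"age_band": None, "city": "Chennai", "topic": "cricket"},  # marker missing
]


def group_key(record):
    """Return the (age_band, city) marker, or None if either part is missing."""
    if record["age_band"] is None or record["city"] is None:
        return None
    return (record["age_band"], record["city"])


def map_by_group(transactions, social):
    """Combine spend and topics per group; count social records left unmapped."""
    spend = defaultdict(int)
    for t in transactions:
        key = group_key(t)
        if key is not None:
            spend[key] += t["amount"]

    topics = defaultdict(set)
    unmapped = 0
    for s in social:
        key = group_key(s)
        if key is None:
            unmapped += 1
        else:
            topics[key].add(s["topic"])

    merged = {
        key: {"total_spend": spend[key], "topics": sorted(topics[key])}
        for key in spend
        if key in topics
    }
    return merged, unmapped


merged, unmapped = map_by_group(transactions, social)
```

Here the two groups with complete markers are mapped, while the one social record without an age band is dropped. When most records look like that third one, very little of the data can be linked.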

Handsets and internet connectivity

Even though smartphones are driving the new handset market in India, feature phones still dominate everyday usage. Most connections in India are pre-paid and fewer than 10% of users have access to 3G networks. To add to it, internet connection speeds are amongst the lowest in Asia. As a result, consumer data, especially retail enterprise data, is limited.

As more people in India move to smartphones and internet connectivity improves, the amount of usable data generated will increase. Because big data analytics is in its infancy in India today, huge efforts will be needed to improve the quality of data stored by organizations and enterprises. However, key contributors to the promise of big data analytics in India are steadily gaining ground. An increase in social media users, together with efforts by both public and private enterprises to collect and store transactional data well, will produce better-quality data sets and make big data analytics more useful.

This article originally appeared here. Republished with permission from the author.