Demystifying Big Data: Your guide to data sources and privacy

Published November 9, 2021 | Team Crayon Data

From zero-party to third-party data and everything in between – find out how data sources can influence privacy issues

Homo sapiens aren’t done evolving. They continue to show changes in physiology and behavior. One key difference from our early ancestors is our increased need for privacy. They lived in large groups, sharing food, shelter, and responsibilities. Living their entire lives as a collective meant they had little privacy.

Privacy was also dangerous. A person who wandered too far from the herd could end up as a jaguar’s mid-morning snack! But that changed as human communities and societies expanded beyond the barriers of geography and culture.

Still, only aristocrats and the wealthy could afford privacy. Invading someone’s privacy was a political move or a personal vendetta, so they built thicker walls. But technology kept up. Wireless signals such as Bluetooth and Wi-Fi pass through these walls unnoticed, with ease. This is why many countries now treat privacy as an individual right protected by law.

As society extended into the virtual world, we became hyper-social. Ironically, this led to us needing more privacy. We needed to draw a line between our public and private lives.


Social media gave birth to two important behavioral changes:

  1. Individuals volunteer to disperse their personal data across the metaverse [pun intended].
  2. Invasion of privacy is now a commercial, rather than personal, venture.

Breaking down data sources

The average internet user leaves a trove of data trails. By one widely cited estimate, about 1.7 MB of data was created every second for every person in 2020. Enterprises use this information to drive marketing and communication efforts. But collecting data at this scale creates a ‘data organization’ problem, so data is classified by factors such as source and utility. By source, data can be classified as follows:


Zero-party data

Zero-party data is data that a user voluntarily provides or declares to an enterprise or institution in order to use its services. This may include:

  1. Personally identifiable information (PII): name, email address, gender, etc.
  2. Preference information: product, service, or communication preferences.

Zero-party data is usually collected through gated content, surveys, questionnaires, and subscriptions. Sales teams use this data to engage prospects, build stronger relationships with them, and turn them into customers. Because customers handed over the data willingly, they are likely to expect communication from the organization.


First-party data

First-party data comes from tracking user interactions and activities in the company’s physical store, website, or app. Marketers use this data to analyze consumer behavior patterns. It supplements their audience segmentation and marketing efforts beyond the demographic information provided in sign-up forms (i.e., zero-party data). Large retail and grocery stores, such as Target, track customer habits to predict purchase behaviors or buying patterns.

Second-party data

Second-party data is data sourced or bought from another company. Essentially, one company’s first-party data becomes another’s second-party data upon purchase. Companies with a smaller customer pool buy relevant data to scale up and reach a new audience. However, second-party data can be hit-or-miss: users may wonder how these enterprises got hold of their data, which creates mistrust.


Third-party data

Third-party data is data acquired from a data aggregator. Instead of collecting data directly, aggregators compile massive volumes of data from multiple sources into a unified dataset. The gray area here is that the sources, the methods of extraction, or the data itself may not be reliable.
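The four categories above can be sketched as a small decision rule. This is only an illustration of the taxonomy described here: the `DataRecord` fields and the `classify` function are hypothetical names, not part of any real product or library.

```python
from dataclasses import dataclass
from enum import Enum


class DataSource(Enum):
    ZERO_PARTY = "zero-party"
    FIRST_PARTY = "first-party"
    SECOND_PARTY = "second-party"
    THIRD_PARTY = "third-party"


@dataclass
class DataRecord:
    # All field names are hypothetical, chosen for illustration only.
    collected_by_us: bool          # did our own company gather this data?
    volunteered_by_user: bool      # did the user declare it (form, survey)?
    from_aggregator: bool = False  # was it compiled from many sources?


def classify(record: DataRecord) -> DataSource:
    """Map a record to one of the four source categories."""
    if record.collected_by_us:
        # Our own data: declared by the user, or observed from behavior.
        if record.volunteered_by_user:
            return DataSource.ZERO_PARTY
        return DataSource.FIRST_PARTY
    # Someone else's data: from an aggregator, or from a single company.
    if record.from_aggregator:
        return DataSource.THIRD_PARTY
    return DataSource.SECOND_PARTY


survey_answer = DataRecord(collected_by_us=True, volunteered_by_user=True)
print(classify(survey_answer).value)  # prints "zero-party"
```

In practice the boundaries are less tidy than two or three flags suggest, but the sketch shows that the distinction turns on who collected the data and whether the user knowingly provided it.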


How data sources can influence privacy issues


There’s a fine line between interpretation and intrusion.

Target’s experimentation gone wrong is a prime example. In the early 2000s, Target was collecting terabytes of customer data from the millions who shopped at its stores. It used this data to experiment with pattern analysis to understand human behavior, and with predictive analytics to recommend products.

In one such experiment, the company was targeting pregnant shoppers with flyer ads. One angry father demanded to know why such offers were being sent to his teenage daughter. A few days later, it came to light that the daughter was, in fact, pregnant.

The major flaw here was that the store was oblivious to context. Such behavior also raises ethical questions about sensitive medical information, even when that information is inferred from shopping patterns. The underage customer’s cart behavior was unusual for that demographic, and the AI should have flagged it as such.

Many customers would be uncomfortable with companies learning about them in such detail. It’s a sure way to break their trust. This prompts them to look for a “safe” business instead.

With a GDPR-compliant approach, enterprises can avoid such situations. They can personalize recommendations without using PII, simply by understanding:

  • User preferences through context and behavior, and
  • Opportunities within the portfolio
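As a rough illustration of the first point, the sketch below drops PII fields from a profile and infers preferences from behavior alone. It is a minimal example under stated assumptions, not a description of any vendor's method: `PII_FIELDS`, `strip_pii`, and `top_preferences` are hypothetical names, and removing a few named fields is not the same as GDPR compliance.

```python
from collections import Counter

# Hypothetical list of PII field names; a real system needs a reviewed list.
PII_FIELDS = {"name", "email", "gender"}


def strip_pii(profile: dict) -> dict:
    """Return a copy of the profile without the fields in PII_FIELDS."""
    return {k: v for k, v in profile.items() if k not in PII_FIELDS}


def top_preferences(events: list[dict], n: int = 2) -> list[str]:
    """Rank product categories by how often they appear in the events."""
    counts = Counter(event["category"] for event in events)
    return [category for category, _ in counts.most_common(n)]


profile = {"name": "Ada", "email": "ada@example.com", "segment": "s1"}
events = [
    {"category": "books"},
    {"category": "coffee"},
    {"category": "books"},
    {"category": "travel"},
    {"category": "coffee"},
    {"category": "books"},
]

print(strip_pii(profile))       # prints {'segment': 's1'}
print(top_preferences(events))  # prints ['books', 'coffee']
```

The recommendation logic only ever sees category counts, so it never needs to know who the shopper is.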

Learn how Taste Studio adds the missing ingredient to your customer data.
