Apache Hadoop has emerged as the leading technology for helping companies mine big data. And while every organization is different, their big data demands are often similar. Hadoop enables companies to collect and process massive amounts of data that were once thought too expensive or unwieldy to store and analyze. Companies have learned that new data types, such as clickstream and sentiment data, are valuable sources of insight and business advantage. Let’s take a look at how Hadoop is used to mine value from these new types of data.
* Clickstream Data. Analyzing the clickstream (a succession of mouse clicks) can reveal how users research products and, more importantly, how they complete their online purchases. Online marketers can then optimize product web pages and promotional content to improve the likelihood that a visitor will learn more about certain products and then click the buy button. There are tools that help web teams analyze clickstreams, but Hadoop adds three key benefits:
  - Hadoop can join clickstream data with other data sources like CRM data on customer demographics, sales data from brick-and-mortar stores, or information on advertising campaigns. This additional data provides a more comprehensive view of customer patterns than an isolated analysis of clickstream alone.
  - Hadoop scales easily so you can store years of data without much incremental cost, allowing you to perform temporal or year-over-year analysis on clickstream data. You can save years of data on commodity machines and find deeper patterns that your competitors may miss.
  - Hadoop makes website analysis easier. Without Hadoop, clickstream data is typically very difficult to process and structure. With Hadoop, even a beginning web business analyst can organize clickstream data by user session and then refine it and feed it to analytics or visualization tools.
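To make "organize clickstream data by user session" concrete, here is a minimal single-machine sketch in Python. It is not how Hadoop itself is invoked: on a cluster the same logic would typically run as a distributed job that groups clicks by user and then splits each user's clicks into sessions. The event layout `(user_id, timestamp, url)`, the function name `sessionize`, and the 30-minute inactivity gap are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity gap that ends a session (a common convention, not from the article).
SESSION_GAP = timedelta(minutes=30)


def sessionize(events, gap=SESSION_GAP):
    """Group (user_id, timestamp, url) events into per-user sessions.

    Returns a dict mapping user_id to a list of sessions; each session
    is a list of (timestamp, url) pairs in time order.
    """
    # Step 1: group clicks by user (the "map and shuffle" part of a distributed job).
    by_user = defaultdict(list)
    for user_id, ts, url in events:
        by_user[user_id].append((ts, url))

    # Step 2: per user, sort by time and start a new session after a long pause.
    sessions = {}
    for user_id, clicks in by_user.items():
        clicks.sort(key=lambda click: click[0])
        user_sessions = [[clicks[0]]]
        for prev, cur in zip(clicks, clicks[1:]):
            if cur[0] - prev[0] > gap:
                user_sessions.append([cur])
            else:
                user_sessions[-1].append(cur)
        sessions[user_id] = user_sessions
    return sessions


# Hypothetical sample events.
events = [
    ("u1", datetime(2014, 1, 1, 9, 0), "/home"),
    ("u1", datetime(2014, 1, 1, 9, 5), "/product/42"),
    ("u1", datetime(2014, 1, 1, 11, 0), "/cart"),
    ("u2", datetime(2014, 1, 1, 9, 2), "/home"),
]
result = sessionize(events)
```

In this sample, user `u1` ends up with two sessions because almost two hours pass between the product view and the cart visit. The per-session lists can then be refined and handed to an analytics or visualization tool.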
* Sentiment Data. Sentiment data is unstructured data on opinions, emotions and attitudes contained in social media posts, blogs, online product reviews and customer support interactions. Enterprises use sentiment analysis to understand how the public feels about something and track how those opinions change over time.
Sentiment analysis quantifies the qualitative views expressed in social media. Researchers need large volumes of data to do this reliably. With Hadoop, social media posts can be organized and scored for sentiment with advanced machine learning methods. Here’s how it works: Each word or phrase is assigned a polarity score of positive, neutral or negative. By scoring and aggregating millions of interactions, analysts can judge candid sentiment at scale, in real time.
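The scoring step can be illustrated with a small Python sketch. Note the substitution: instead of a machine learning model, this uses a simple lexicon lookup, which is the most basic way to assign word-level polarity. The tiny `POLARITY` lexicon, the function names, and the sample posts are hypothetical and exist only to show the score-then-aggregate pattern.

```python
import re
from collections import Counter

# Hypothetical toy lexicon; a real system would use a much larger
# lexicon or a trained model.
POLARITY = {
    "love": 1, "great": 1, "fast": 1,
    "hate": -1, "broken": -1, "slow": -1,
}


def score_post(text):
    """Sum the polarity of each known word and map the total to a label."""
    words = re.findall(r"[a-z']+", text.lower())
    total = sum(POLARITY.get(word, 0) for word in words)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


def aggregate(posts):
    """Count how many posts fall under each polarity label."""
    return Counter(score_post(post) for post in posts)


# Hypothetical sample posts.
posts = [
    "I love this phone, great battery",
    "Screen arrived broken and support is slow",
    "It shipped on Tuesday",
]
summary = aggregate(posts)
```

Because each post is scored independently, the work splits naturally across many machines, and only the per-label counts need to be combined at the end.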
After scoring sentiment, it’s important to join the social data with other sources of data. Hadoop makes that easy and reproducible. CRM, ERP and clickstream data can be used to attribute what was previously anonymous or semi-anonymous sentiment to a particular customer or segment of customers. The results can be visualized with business intelligence tools like Microsoft Excel, Platfora, Splunk or Tableau.
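As a rough illustration of that join, the Python sketch below matches scored posts to CRM records and averages polarity per customer segment. On a Hadoop cluster this kind of join would more likely be written as a query in a tool such as Hive or Pig; the in-memory version here only shows the logic. The field names (`handle`, `polarity`, `segment`, `customer_id`) and all sample records are assumptions, not part of any real schema.

```python
from collections import defaultdict

# Hypothetical scored posts, keyed by social media handle.
scored_posts = [
    {"handle": "@ana", "polarity": 1},
    {"handle": "@ben", "polarity": -1},
    {"handle": "@ana", "polarity": 1},
    {"handle": "@unknown", "polarity": -1},
]

# Hypothetical CRM extract, indexed by the same handle.
crm = {
    "@ana": {"customer_id": 101, "segment": "premium"},
    "@ben": {"customer_id": 102, "segment": "basic"},
}


def sentiment_by_segment(posts, crm_by_handle):
    """Inner-join posts to CRM records on handle, then average polarity per segment."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [polarity sum, post count]
    for post in posts:
        customer = crm_by_handle.get(post["handle"])
        if customer is None:
            continue  # no CRM match, so the post stays anonymous
        bucket = totals[customer["segment"]]
        bucket[0] += post["polarity"]
        bucket[1] += 1
    return {segment: total / count for segment, (total, count) in totals.items()}


averages = sentiment_by_segment(scored_posts, crm)
```

The resulting per-segment averages are the kind of small, tabular output that can be exported to a business intelligence tool for charting.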