For years, big data has been one of the hottest buzzwords across all industries.
The term covers the analysis of large, complex data sets to uncover previously unknown patterns and information that helps people make better decisions.
At Crayon, we have a framework that cleans, transforms, and processes big data using technologies like Amazon's servers, HDFS, Hive, Pig, and Spark. We call this framework the Data Factory.
The Data Factory
Crawling on the web pages
Stripping the data down
Packing them in Avro or JSON fabric
Taking them to Hadoop’s town
Diving in Amazon’s server waters
Each has a different name
Boiling them to required temperatures
Setting up the factory to start the game
Now starts the journey of data
To take a shape; to get some life
They first get addresses in the config
Then get eaten by Pig
Pig then grunts and says aloud
“Data are now clean; stand proud”
Then data get some new clothes
For the tough journey they strive
They get the home of attractive rows and columns
We call it, in general, the Hive
Adding to it some more data
From other sources; manually curated
Cleaning data once more and making them shine
Getting data into shape – long awaited!
Then a lot of queries are asked
We enrich the data; adorn them with N-grams
All our data then get abode in one place
Thus we drive the iterations, creating a new database
Finally, we bid adieu to data, and present them a new gown
We walk with them to the corner of Hive’s town
Thus, we welcome, greet and solve data’s every mystery
And they call us with love, “The Data Factory”
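
For readers who prefer code to verse, the stages in the poem (strip the crawled page, clean the text, enrich it with N-grams, pack it as JSON) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Data Factory itself: the real pipeline runs on Pig and Hive, and every function name, the sample URL, and the sample page below are hypothetical.

```python
import json
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML page and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(page):
    """'Stripping the data down': keep only the text between tags."""
    parser = _TextExtractor()
    parser.feed(page)
    parser.close()
    return " ".join(parser.parts)


def clean(text):
    """Cleaning pass: lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def ngrams(tokens, n):
    """Enrichment: every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def to_record(url, page, n=2):
    """Build one JSON-ready record for a crawled page (hypothetical schema)."""
    text = clean(strip_html(page))
    return {"url": url, "text": text, "ngrams": ngrams(text.split(), n)}


# Hypothetical crawled page, used only to exercise the sketch.
sample_page = "<html><body><h1>Big Data</h1><p>Clean, shiny data!</p></body></html>"
record = to_record("http://example.com", sample_page)
print(json.dumps(record))
```

In the pipeline the poem describes, the cleaning step would be a Pig script and the rows-and-columns home would be a Hive table; the sketch only mirrors the order of the steps.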