Over the years it has become commonly accepted that the concept of Big Data can be fully explained by its three major components, or three V’s: Velocity, Variety and Volume. After all, without the volume, Big Data would not be so big, would it?
However, while it’s undeniable that the three V’s provide deep insight into the nature of the data, additional components such as Veracity, Variability and Value are needed to define its purpose. With Volume being the most identifiable component of Big Data, how do the five other V’s relate to the ever-increasing size of the data? [1]
What is Volume?
Over 100 terabytes of data are uploaded to Facebook daily. Akamai analyses 75 million events a day to target online ads. Over 90% of today’s data universe has been created in the last two years. Globally, 2.5 quintillion bytes of data are created every day.
From now on, the amount of data in the world is expected to double every two years. The increasing volume of available data has opened up growth opportunities to businesses in every sector. With the Internet of Things (IoT), social media, the decreasing cost of storage, and the growing hunger for data, it has become difficult for existing RDBMS solutions to handle and manage the data efficiently enough to produce the expected business value. To address that challenge, storage solutions such as Hadoop and complex machine learning algorithms were designed. [2]
What is Velocity?
Every minute, over 100 hours of video are uploaded to YouTube, 200 million emails are sent and 30 thousand photos are viewed on Flickr. While volume represents the size of the data created, velocity is the rate at which that data is created.
In essence, velocity is the derivative of data volume with respect to time. This high rate of data creation and consumption has created the need to scale processing resources and analytics algorithms to new standards that meet the needs of the growing population of data consumers. [3]
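As a rough back-of-the-envelope formalization (using the global figure cited above and assuming a steady rate across the day), velocity can be written as the rate of change of accumulated volume:

\[
\text{velocity} = \frac{dV}{dt} \approx \frac{2.5 \times 10^{18}\ \text{bytes}}{86{,}400\ \text{s}} \approx 2.9 \times 10^{13}\ \text{bytes per second} \;(\approx 29\ \text{TB/s})
\]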
What is Variety?
Social networks, blogs, IoT devices and corporate systems produce data in different shapes and forms. Over 90% of the data generated by organizations is unstructured. These data are designed to capture emotions and other socially engineered metrics that are a poor fit for row-and-column RDBMS systems. Allowing users to express themselves through media other than words can be seen as a major catalyst of the growing volume of data. [4]
What is Veracity?
What do we gain by harvesting high volumes of data at the speed of light if the data collected is incorrect?
False or incorrect data will most likely lead to inaccurate predictions that could negatively impact decision-making. In that sense, the value of any Big Data initiative lies not only in the size, velocity and variety of the data but also in its reliability. [5]
What is Variability?
One of the main objectives of machine learning is to train a computer system to independently behave and react as a human operator would if they had access to the same information.
That implies the ability to identify variations of an object within a given context. A human operator, for instance, can recognize different breeds of cats and differentiate cats from dogs, even though both are animals with four legs. For a computer operating on bits and bytes, each cat represents a different class altogether unless it meets all the requirements that defined the previous cat.
Variability provides the framework needed to feed the machine with numerous variations of the same object and train the system to identify the common denominator. Deep Learning is one example of such a machine learning approach. [6]
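As a minimal illustrative sketch (not the article’s method, and using entirely hypothetical features such as weight, height and ear length), the idea of training on many variations of the same class so the model learns the common denominator might look like this:

```python
# Toy sketch: feed the model many variations of each class so it learns
# the shared pattern instead of treating each animal as its own class.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)

# Hypothetical measurements (weight kg, height cm, ear length cm)
# drawn with plenty of variation within each class.
cats = rng.normal(loc=[4.0, 25.0, 6.0], scale=[1.0, 3.0, 1.0], size=(200, 3))
dogs = rng.normal(loc=[20.0, 50.0, 10.0], scale=[6.0, 10.0, 2.0], size=(200, 3))

X = np.vstack([cats, dogs])
y = np.array([0] * 200 + [1] * 200)  # 0 = cat, 1 = dog

# A small neural network generalises across the variations.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X, y)

# An unseen, cat-like measurement is still recognised as a cat.
print(clf.predict([[4.5, 27.0, 6.5]]))  # expected output: [0]
```

The point of the sketch is only the training setup: because the model sees many variants of “cat”, a new cat that differs from every previous example still lands in the same class.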
What is Value?
What’s in it for me? Why should an organization invest in Big Data? The potential annual value of Big Data is estimated at $300 billion for US health care and €250 billion for the European public sector. [1]
It’s undeniable that data has no value in itself. The value lies in the intelligence drawn from it: the insight it provides and the ability to confidently predict what to produce and when, and which market and strategy are most likely to work. All of that would be impossible without an extremely large volume of reliable data, received at a high rate, that covers all potential variables. [1]
Today more than ever, the use of Big Data is shaping the world around us, offering richer, more qualitative insights into our everyday lives. [2]