A Big Data story of sex, relationships, and true love
Big Data is like teen sex: everybody is talking about it, everyone thinks everyone else is doing it, so everybody claims they are doing it.
Dan Ariely, Duke University
But what is ‘it’ exactly? Even the experts can’t agree on what Big Data is. Take these four different definitions, from four different perspectives:
- The Date Perspective: “Big Data is the new and massive data types that have appeared over the last decade or so” – Tom Davenport – Academic and Business Analyst
- The Characteristics Perspective: “‘Big data’ is high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” – Gartner Inc, 2011. Gartner goes on to define another nine dimensions, or characteristics: complexity, technology, pervasive use, classification, contracts, validation, linking, fidelity, and perishability.
- The Results Perspective: “The ability of society to harness information in novel ways to produce useful insights or goods and services of significant value. … Big data refers to things one can do at a large scale that cannot be done at a smaller one” – Mayer-Schönberger and Cukier, 2013
- The Process Perspective: “Big Data is about the analysis of data that’s really messy or where you don’t know the right questions to ask – where you look for patterns or anomalies” – Philip Ashlock, Chief Architect, Data.Gov, 2014
In case you need more proof that there is some confusion about this matter, datascience@berkeley offers 43 more definitions, collected in 2014 from a survey of experts in the field.
Unfortunately, none of the perspectives given above, and certainly not Gartner’s ubiquitous “3-V” definition, helps much in understanding why Big Data is such a Big Deal.
Big Data is a Big Deal not because it is data, and not because it is necessarily big. As a branch of Artificial Intelligence (AI), Big Data has the ability to transform our lives because it is Semantically Aware. Big Data, like true love, contains meaning and understanding. It is so powerful because with meaning and understanding comes the possibility of creating true relationships. These ‘true relationships’ manifest themselves in new insights and in answers to questions that no one thought to ask when data was just data.
Big Data is therefore a branch of AI whose technologies, when applied to a dataset, create semantic context. The kinds of semantic context that can be applied include, but are not limited to:
- Language Awareness: for example, a machine-based analysis of emails in legal fraud cases detected that a reversion to formal language, followed later by an invitation to “call me”, indicated guilt. Machines can now detect irony and emotions – a hotel chain used online reviews to determine that the first 20 minutes of a guest’s experience were the most important in framing customer satisfaction.
- Pattern Awareness: analyzing the pattern of basketball players moving around a court gives predictive insight into scoring capabilities.
- Existential Awareness: in the world of the Internet of Things, sensors in everything from automobiles to refrigerators to parts in railway locomotives provide self-knowledge of how the parts are performing, and when they are about to fail.
- Temporal Awareness: understanding time – its passing and its cyclicality – provides context that can predict when your favorite coffee shop is going to be too noisy to have a conversation in, or when to start irrigating.
- Geographic Awareness: the ubiquity of Google’s map and ‘earth’ technologies has made geographic awareness commonplace. Google can track the spread of seasonal flu ten times faster than the CDC by looking at searches for flu-like symptoms and locating them by IP address.
- Facial/Emotional Awareness: by tracking 16,000 data points on a face, Microsoft can distinguish even between identical twins when authenticating users. Facial awareness technologies can also detect emotions, and even physical anomalies that might indicate a medical condition.
- Spatial Awareness: Tesla recently released a software upgrade for its Model S that enables the car to detect an impending collision.
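To make one of these concrete, here is a minimal Python sketch of temporal awareness, using the coffee shop from the list above. Everything in it is an assumption made for illustration: the readings are invented, and the names `READINGS`, `average_noise_by_hour`, and `noisy_hours`, as well as the 65-decibel threshold, are hypothetical. The point is only that a plain list of numbers starts answering a useful question once the timestamps are treated as time rather than as text.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Invented sample data: (ISO timestamp, noise level in decibels).
READINGS = [
    ("2015-03-02T08:15", 72),
    ("2015-03-02T14:30", 58),
    ("2015-03-03T08:40", 75),
    ("2015-03-03T14:10", 55),
    ("2015-03-04T08:05", 70),
    ("2015-03-04T20:00", 49),
]


def average_noise_by_hour(readings):
    """Group readings by hour of day and return the mean level per hour."""
    by_hour = defaultdict(list)
    for stamp, decibels in readings:
        # Parsing the timestamp is the "awareness" step: the string
        # becomes a point in a daily cycle, not just a label.
        hour = datetime.fromisoformat(stamp).hour
        by_hour[hour].append(decibels)
    return {hour: mean(levels) for hour, levels in by_hour.items()}


def noisy_hours(readings, threshold=65):
    """Return the hours of day whose average noise exceeds the threshold."""
    averages = average_noise_by_hour(readings)
    return sorted(hour for hour, level in averages.items() if level > threshold)


if __name__ == "__main__":
    print(noisy_hours(READINGS))
```

With real data you would want many more readings, and probably day-of-week as well as hour, but the shape of the idea is the same.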
Often, in order to answer so-called “Big Data” questions, data with different semantic contexts are combined. So, existential awareness might be combined with geographic awareness, for example, to determine under what climatic conditions engine parts perform best, or language awareness with facial/emotional awareness to determine the most compelling advertising copy. Such context-rich information very often requires lots of data, and very often the technology requires that data to come in a variety of forms, not all of it in the neatly ordered, relational way that we are used to dealing with. Hence the term “Big Data”. But any form of data, even traditional ‘small data’, can be made Big just by the application of awareness. Even teen sex can evolve into true love.
Although the opinions are mine, this article was inspired by, and draws heavily from, a recent talk at SeaSPIN (seaspin.org) by Michael Kauffman (@bignimble).