Sir, an equation has no meaning for me unless it expresses a thought of God — Srinivasa Ramanujan
At one point in human history, MS Excel, followed by MS Access and later Relational Database Management Systems (RDBMS) like Oracle, were enough to crunch data. That’s because we were not searching for unknown unknowns. We did not have sensors everywhere and a computer in our pockets. Today, machine learning algorithms need ever more computing power, and perhaps eventually quantum computing, to handle the level of data being generated. Our current output of data is roughly 2.5 quintillion bytes a day.
There isn’t a single industry in the economy that won’t be completely transformed by Artificial Intelligence (AI). Therefore, it follows that data will define the economy and the jobs of the future. This article provides an overview and a history of the data economy. It ends by forecasting what the future may look like. In any case, to be a data scientist, today, is to be at the right place at the right time.
A Brief History of Data Science
The earliest known history of data dates back to around 18,000 BCE, when tally sticks were used to record and analyze data. The Ishango bone, found in 1950 near the border of what are now Uganda and the Democratic Republic of the Congo, is evidence of that ancient history. The World Economic Forum provides a detailed history of data science on its website.
Like any other technology, data science had humble beginnings. It came to be fully developed and known as such because of enabling technologies such as storage mechanisms (data centers), cloud computing, the internet and machine learning algorithms. Most technologies are combinations of existing technologies.
In November 1997, C. F. Jeff Wu gave an inaugural lecture titled “Statistics=Data Science?” for his appointment to the H. C. Carver Professorship at the University of Michigan. This lecture marked the first modern-day, non-computer-science usage of the words “Data Science”.
The Data Economy
A distinction needs to be drawn between data, information and intelligence. Data is the raw record of observations, without any modification. Information is data arranged to represent facts. Finally, intelligence is understanding the pattern that information provides. Think of intelligence as a higher order of data interpretation. Humans, with roughly 100 billion neurons in their brains, are highly sophisticated machines with superior cognitive abilities.
Life generates data. Every change in our body from the time we wake up to the time we go to sleep is a data point. Each activity that fills our day, including the quality of our sleep, is a data point. Beyond that, the Internet of Things (IoT) enables machines to create and share data by monitoring and reacting to the world around us using sensors and actuators. The Data Economy is an economy that has come into being because of:
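To make the distinction concrete, here is a minimal Python sketch. The temperature readings and the names `raw_data`, `information` and `interpret` are hypothetical, made up purely for illustration. Raw readings are the data, a summary of them is the information, and a rule that reads a pattern out of the summary stands in, very loosely, for intelligence.

```python
from statistics import mean

# Data: hypothetical raw readings, unmodified (hour of day, temperature in deg C).
raw_data = [(6, 14.0), (9, 17.5), (12, 22.0), (15, 23.5), (18, 19.0), (21, 15.5)]

# Information: the same data arranged to represent facts.
information = {
    "mean_temp": mean(temp for _, temp in raw_data),
    "warmest_hour": max(raw_data, key=lambda reading: reading[1])[0],
    "coolest_hour": min(raw_data, key=lambda reading: reading[1])[0],
}

def interpret(info):
    """Intelligence (a toy stand-in): read a pattern out of the information."""
    if info["coolest_hour"] < info["warmest_hour"]:
        return "The day warms up from the early morning to an afternoon peak."
    return "No clear daytime warming pattern."

print(information)
print(interpret(information))
```

A real system would of course learn such patterns from far more data rather than rely on a hand-written rule. The sketch only shows the three layers stacked on top of one another.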
1. A proliferation of communication technologies such as 4G and Wi-Fi, and of data gathering instruments such as sensors and actuators, that have spawned an Internet of Things (IoT)
2. A data explosion created by the continuous stream of data collected through the IoT
3. Advancements in hardware such as Graphics Processing Units (GPUs) and field-programmable gate arrays (FPGAs)
4. A confluence of technologies such as the blockchain, and significant investments in Artificial Intelligence
Data can be used in a million ways. Common uses include:
- Business: To monetize through advertising and subscription revenue à la Facebook and Netflix, or to recommend products and services, e.g. Amazon
- Governance: To create digital IDs for banking and delivering financial services, taxation, licensing, voting and other civic functions
- Healthcare: To solve medical problems through crowdsourcing of data
- Artificial Intelligence (AI): To power machine learning algorithms that solve problems beyond human computing abilities across public and private domains
As you can imagine, it is hard to put a monetary value on the data economy because in the future data will be the economy. However, as per a recent report by the McKinsey Global Institute, seven sectors alone can unlock $3-$5 trillion in economic value using open data.
Then there is the hidden data economy, a dark underbelly where stolen data is traded. As per a McAfee Labs report, every piece of data has a price in this dark market.
Who Is A Data Scientist?
As per Wikipedia, data science is a “concept to unify statistics, data analysis, machine learning and their related methods” in order to “understand and analyze actual phenomena” with data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, information science, and computer science.
In the simplest terms, a data scientist’s job goes beyond statistics because it requires an ability to understand business, an ability to understand programming and an ability to handle modern tools (programming languages and design software), so that all these ingredients can be combined to mimic human decision making.
If you were ever part of a train-the-trainer program as a trainer or trainee, you can understand what data scientists do. They are essentially training computers to draw inferences on their own in the future. Let’s say, sometime in the future, a member of a company’s management team issues a voice command instructing a machine to identify trends in sales over a five-year period and report the key results. Today, that job is a part of what data scientists do. They work with data engineers (who do the coding and supply the data in a form the scientists can understand), product or business managers and modern tools (programming languages such as Python and R, design tools and so on) to create decision making systems that mimic human intelligence.
If you are familiar with investment banking, the closest analogy is an investment associate. The associate sits between the analyst, i.e. the person churning out MS Excel models (in our case the data engineer), and a director, i.e. the person managing the client relationship (in our case the management). The associate transmits instructions from the director to the analyst and sends the results back to the director, who in turn parses the findings and provides a summary to the client (in our case senior management) for making an executive decision.
Ultimately, the goal of a data scientist is to train the Machine Learning (ML) algorithm to discover things, i.e. unknown unknowns, that could not be discovered previously because of a lack of data, machine learning algorithms and computing power.
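As a rough illustration of the kind of request described above, the following Python sketch reports the trend in sales over a five-year period. The sales figures and the function names (`yearly_growth`, `trend_slope`, `report`) are hypothetical and chosen only for this example. A real system would pull the numbers from a company’s own data and would likely use far richer models.

```python
# Hypothetical yearly sales totals (in thousands); not real figures.
sales_by_year = {2014: 120.0, 2015: 132.0, 2016: 150.0, 2017: 165.0, 2018: 198.0}

def yearly_growth(sales):
    """Return year-over-year growth as a fraction, keyed by the later year."""
    years = sorted(sales)
    return {
        year: (sales[year] - sales[prev]) / sales[prev]
        for prev, year in zip(years, years[1:])
    }

def trend_slope(sales):
    """Least-squares slope of sales against year (sales units per year)."""
    years = sorted(sales)
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(sales[y] for y in years) / n
    num = sum((y - mean_x) * (sales[y] - mean_y) for y in years)
    den = sum((y - mean_x) ** 2 for y in years)
    return num / den

def report(sales):
    """Summarize the key results: direction, slope and strongest growth year."""
    growth = yearly_growth(sales)
    slope = trend_slope(sales)
    return {
        "direction": "upward" if slope > 0 else "downward or flat",
        "slope_per_year": slope,
        "best_growth_year": max(growth, key=growth.get),
    }

print(report(sales_by_year))
```

The straight-line fit is a deliberately simple choice. It answers "are sales going up or down, and how fast?" in one number, which is roughly the level of summary a manager would ask for.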
The App Economy
A recent report on the State of the App Economy released by the App Association puts the app economy in the US at $950 billion, roughly the size of Apple. According to the report, the app economy employs more than 4.7 million Americans as developers, software engineers, systems managers, and teachers with an average salary of $86,000 (twice the US national average). Globally, the app economy rides on the coattails of significant demographic growth. For instance, in China alone, the app economy is expected to triple between 2016 and 2021. Naturally, that means more jobs created in the app development ecosystem.
The Privacy Question
Given the scale of Facebook’s Cambridge Analytica snafu, privacy has become a buzzword that has suddenly occupied everyone’s imagination. Hacking has a history as old as computing. Therefore, the default assumption is that there is nothing private on the internet. However, a mass breach of data that is used for political manipulation or financial gain leaves a lasting reminder in people’s minds. The Equifax data breach is another prime example of the events that trigger the privacy debate. As per the World Economic Forum:
“Equifax, a US based consumer credit reporting agency that collects and aggregates information on over 800 million individual consumers and more than 88 million businesses worldwide, suffered a data breach of 143 million users. As a result, they’re facing a class action lawsuit of up to US$ 70 billion”
The data economy, in a way, is based on various layers of privacy. In the extreme scenario, a human can be tracked every second of his life. On the other hand, going off the grid is like going back to the stone age.
On May 25, 2018, a set of stringent rules took effect in Europe under the General Data Protection Regulation (GDPR). The rules are designed to protect the privacy of data belonging to people in the EU, wherever that data is handled, while laying out onerous requirements for tech companies and businesses dealing with that data.
Europe and Argentina have also enacted laws under the “Right To Be Forgotten” category, under which individuals can ask to have their past data purged from the internet.
The Ship of Theseus
If you are a parent, you will relate to this analogy. Think of a child growing up and the miracles in its life as it reaches its teenage years and then adulthood. The first word, the first step, the first sentence and the first fall are nothing short of miracles. Can such natural miracles be reduced to data? I might have used the phrase ‘converted to data’ instead of ‘reduced to data’. However, I am not a fan of technology for technology’s sake unless it benefits mankind without severe side effects. If data science exacerbates social and civil strife by leading to greater inequality, is it still life changing? Perhaps not to the ones left behind.
As per Oxfam International, 82 percent of the wealth generated last year went to the richest one percent of the global population, while the 3.7 billion people who make up the poorest half of the world saw no increase in their wealth.
Think about art. Can art be reduced to data? Many think so. Bloomberg Businessweek recently ran a cover that had a painting created by AI algorithms. Is it art if it is created by a machine? Yes, if you take music as an example, where you can literally create a song with your bare hands using a software program loaded on your smartphone. However, is that music the same as the classics created using old school instruments? I don’t have an answer to that conundrum.
One thing is clear: there is no fun in homogeneity. Old buildings become heritage sites with age. Wine ages better with time. There are certain things that get better with time. We must preserve them even if they are the last traces of our unknown origins and of a mysterious superpower. A life without any mystery is like an equation without any meaning. Wouldn’t you say so?