Data Science is probably one of the fastest-growing fields of all time, and it is still changing and evolving rapidly. It is still regarded as the hottest industry around, and it is not slowing down anytime soon. But why is Data Science such a huge thing? Why had no one heard of it 10 years ago? And why is everyone talking about it now?
I recently gave a 5-minute presentation to a group of non-technical people, trying to convince them that data science is our future. In this post, I will describe the presentation, and perhaps this read will convince you that data science is indeed a game-changer. Before I start, let’s dive deep into what Data Science is.
What is Data Science?
Data Science is a concept to unify statistics, data analysis and their related methods. That said, it is nothing more than this unification idea. People use many other definitions, but it is really simple if you think about it from the perspective of the diagram below.
Some of the things we can deduce from the above diagram:
- If you have domain expertise with good intuition and you know how to automate that domain knowledge, then you are a traditional software developer.
- If you have domain expertise in a research topic and you know the statistical ways to validate your work using univariate and multivariate tools, then you are a traditional researcher.
- If you have expertise in validating results with statistical tools, along with knowledge of automating and designing models, then you have the underlying proficiency of machine learning.
Consequently, to be a data scientist, you need to know how to automate and validate your intuition. This is where Data Science truly resides. It is an amalgamation of software development with statistical knowledge.
Data Science is an interdisciplinary field focused on extracting knowledge from large data sets, which combines the concepts of statistics, data analysis and their related methods.
Okay. So, now we know what everyone means when they use the term ‘Data Science’. But why is data science so popular?
Why Data Science?
The easiest way to answer this question is to look at how much data you are currently generating each day. To give you some perspective, every time you create a post on Facebook, tweet a new quote, publish a story on Instagram or submit your assignment, that is all data. And that is just the data produced by you. Now, imagine billions of people doing the same, every day, every year, forever.
Based on the projected growth of data published by Data Age, it can be deduced that data is growing at an almost exponential pace.
At this speed, we will have a gigantic 175 zettabytes of data in the next 5 years. One zettabyte is equal to one sextillion bytes, or $10^{21}$ (1,000,000,000,000,000,000,000) bytes. Put another way, one zettabyte is equal to a trillion gigabytes.
To put into perspective how big a zettabyte (ZB) is, consider that “if each terabyte in a zettabyte were a kilometre, it would be equivalent to 1,300 round trips to the moon and back (768,800 kilometres)”.
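If you want to check these numbers yourself, here is a minimal sketch of the arithmetic. It assumes decimal (SI) units, where a gigabyte is 10^9 bytes and a terabyte is 10^12 bytes, and it takes the 768,800 km round-trip distance from the quote as given. The variable names are my own and are only for illustration.

```python
# Decimal (SI) byte units, assumed here to match the figures in the text.
GIGABYTE = 10**9
TERABYTE = 10**12
ZETTABYTE = 10**21

# Round-trip distance to the moon, taken from the quote (kilometres).
MOON_ROUND_TRIP_KM = 768_800

gigabytes_per_zb = ZETTABYTE // GIGABYTE
terabytes_per_zb = ZETTABYTE // TERABYTE

# If each terabyte were one kilometre, how many round trips is that?
round_trips = terabytes_per_zb / MOON_ROUND_TRIP_KM

print(f"1 ZB = {gigabytes_per_zb:,} GB")
print(f"1 ZB = {terabytes_per_zb:,} TB")
print(f"Round trips to the moon: {round_trips:,.1f}")
```

The gigabyte count comes out to a trillion, and the number of round trips lands just above 1,300, which matches the quote.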
Now, if your manager asks you to find some useful insights in such an immeasurable amount of data, how will you even start? The process of finding even a basic pattern will take ages before it starts giving you any useful results. This is the why of Data Science. This is why we need more sophisticated and advanced tools and techniques to deal with such prodigious growth.
As a consequence of this awareness, Data Science provides tremendous opportunities for traditional software engineers. If they can learn how to automate their intuition with the underlying principles of statistical modelling, then they can earn really well. This is the second biggest reason why data science is getting so popular. It is also supported by LinkedIn, whose list of the top 20 emerging jobs shows Machine Learning Engineer and Data Scientist roles growing 9.8x and 6.5x respectively.
Currently, there are around 1,829 open Machine Learning Engineering positions on LinkedIn - showing its popularity and need around the world.
So, this brings us to our last burning questions - what is data science doing? Why are there so many jobs? Why are companies looking for data scientists?
What is Data Science doing?
One of the most powerful aspects of Data Science is its limitless potential to provide benefits in almost every sector. This is beautifully summarized by Data Flair, which shows data science applications in the 6 most important sectors, such as Banking and Finance, as shown below.
The revenue generated by applying data science in these fields is extremely high, which is spreading a positive message around the world regarding the usefulness of these tools.
Data-driven businesses are worth $1.2 trillion collectively in 2020, an increase from $333 billion in the year 2015.
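To get a feel for how fast that is, here is a rough sketch that turns the two figures quoted above into an implied yearly growth rate. It assumes steady compound growth between 2015 and 2020, which is my simplification and not something the source claims, and the variable names are only illustrative.

```python
# Figures quoted above, in US dollars.
value_2015 = 333e9
value_2020 = 1.2e12
years = 2020 - 2015

# Overall growth multiple across the whole period.
growth_multiple = value_2020 / value_2015

# Compound annual growth rate: the constant yearly rate that would
# turn value_2015 into value_2020 over `years` years.
cagr = growth_multiple ** (1 / years) - 1

print(f"Overall growth: {growth_multiple:.1f}x")
print(f"Implied annual growth: {cagr:.1%}")
```

Under that assumption, the market grows about 3.6x overall, or a little over 29% per year.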
To see what we can do with these tools, let’s look at some cool Data Science projects. First, let’s see how we can use some powerful data visualization and analysis tools to get deeper insights from data. This example is the ultimate song analysis for 50 mainstream artists using Spotify’s “This Is” playlists, by James Le. You can read the full story using the link below.
Spotify’s “This Is” playlists: the ultimate song analysis for 50 mainstream artists
One of the fascinating things I found about this project is how accurate data analysis can sometimes be. As an example, I usually find Taylor Swift’s songs danceable and The Chainsmokers’ songs less danceable but usually more energetic. Turns out, that’s what the analysis shows too 🤯 Isn’t it crazy how useful such insights can be?
Similarly, I did a data science project for a client who was looking for space optimization opportunities inside the Parkville campus of the University of Melbourne. I created a prediction model that can help him identify the closest buildings for booking a meeting room, based on supply and demand constraints. What came as a surprise to all of us was that a good supply of meeting rooms is generally not close to some of the most popular buildings. Clearly, we couldn’t have derived this insight by looking at the dataset manually, or even after doing some basic exploratory data analysis.
So, here, the Doug McDonell building is a popular hotspot for booking a meeting room, and buildings around 400-500 metres away are the only possible candidates for the next best supply of meeting rooms. This exercise gave our client some insight into where space optimization was necessary and should be implemented to provide more efficient space allocation. As you can guess, this will help improve revenue for the university, which can allocate money only where it is necessary.
And, this is hardly 0.01% of what data science can offer.
I have barely scratched the surface in trying to convince you why there is so much buzz around the need for Data Science. Hopefully it all makes sense, and maybe you are now looking at this term in a more dignified manner. Lastly, I will leave you with a famous quote by the CEO of VMware, who recognised that the world was changing long before the rest of us did.
Data is the new Science. Big Data holds the answers - Pat Gelsinger
Cheers,
A.