Data Science — A quick overview

Data science is gaining wide recognition. But what does a data scientist do? Data scientists are the people who make sense of big data and determine what can be done with it to increase the productivity of a business.

The big data ETL (extract, transform, load) process is the key to engineering any data pipeline. In general, an end-to-end big data pipeline consists of four main blocks:

  1. Data preparation and collection
  2. Data cleaning, transformation, and loading (ETL)
  3. Data analytics (statistics or data mining)
  4. Predictive analytics (Machine learning / Artificial Intelligence)
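
The four blocks above can be sketched as a chain of small functions. This is only a minimal illustration in Python using the standard library; the record fields (`store`, `units`), the sample values, and the function names are hypothetical, and the "prediction" step is a deliberately naive stand-in for a real model.

```python
from statistics import mean

# Hypothetical raw records, standing in for data collected from a source system.
RAW_RECORDS = [
    {"store": "north", "units": "12"},
    {"store": "north", "units": "14"},
    {"store": "south", "units": ""},   # missing value, removed during cleaning
    {"store": "south", "units": "9"},
    {"store": "south", "units": "11"},
]


def collect():
    """Step 1: data collection (here, just return the in-memory sample)."""
    return list(RAW_RECORDS)


def clean_and_load(records):
    """Step 2: ETL. Drop rows with missing units and convert text to integers."""
    return [
        {"store": r["store"], "units": int(r["units"])}
        for r in records
        if r["units"].strip()
    ]


def analyze(rows):
    """Step 3: analytics. Average units sold per store."""
    by_store = {}
    for row in rows:
        by_store.setdefault(row["store"], []).append(row["units"])
    return {store: mean(values) for store, values in by_store.items()}


def predict(averages):
    """Step 4: naive forecast. Assume the next period equals the average."""
    return dict(averages)


def run_pipeline():
    """Run the four stages in order and return the per-store forecast."""
    return predict(analyze(clean_and_load(collect())))


print(run_pipeline())
```

Each stage takes the previous stage's output as its input, which is what makes it possible to revisit one stage (for example, the cleaning rules) without rewriting the others.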

Over the last decade, there has been massive growth in both the data generated and the data retained. Companies and organizations retain this data to drive their business; we call it “Big Data”.


Let’s understand this with an example:

Consider eating candy. Most people pick only the candies they like. Data scientists, in contrast, are the people who gather every flavor and analyze them all, because they need to know what each one tastes like. In short, the title “Data Scientist” encompasses different flavors of work. In my view, that is the major difference between a “Data Scientist” and a “Statistician”, “Analyst”, or “Engineer”: a data scientist does a little of each of the tasks done by a statistician, an analyst, and an engineer.

To be more specific, a data scientist performs the following primary tasks:

  1. Data Analysis (Statistics)
  2. Predictive analytics (Machine Learning)
  3. Visualization

Let’s have a look at each of the tasks in brief:

  • Data Analysis (Statistics):

In this task, many plots of the data are made in order to understand its patterns. Through this process, theories about the data’s behavior are framed in a way that is easy to communicate and easy to act on. A data scientist builds models from the patterns uncovered during analysis and develops strategies based on the resulting statistics. The most challenging aspect of this task is that the models and statistics cannot act as a permanent solution to the defined problem. A lot of time therefore goes into evaluating and revising existing models, and into going back to the data to bring out new features that help make better models.
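
As a rough illustration of this kind of exploratory analysis, the sketch below computes summary statistics and a Pearson correlation using only the Python standard library. The two series (`ad_spend`, `sales`) are made-up numbers chosen for the example, and the helper names are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical observations: weekly ad spend and weekly sales (made-up numbers).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]


def summarize(values):
    """Basic descriptive statistics for one variable."""
    return {
        "mean": mean(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


print(summarize(ad_spend))
print(summarize(sales))
print(pearson(ad_spend, sales))
```

A correlation close to 1 would suggest a linear model is worth trying first; when new data arrives, the same summary can be rerun to check whether that pattern still holds.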

  • Predictive Analytics (Machine Learning):

Another important task of a data scientist is developing predictive models that forecast possible outcomes or patterns in the data. Such predictive analytics can help businesses make valuable decisions. An important challenge in building a predictive model is its reliability: a data scientist must make sure that the model passes several validation tests before it is used for making business decisions. Some interesting use cases of predictive analytics are weather forecasting, stock predictions, and recommender systems based on buyer purchase patterns.
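
One common validation test is k-fold cross-validation, where the model is repeatedly fitted on part of the data and scored on the part it has not seen. The sketch below is a minimal standard-library illustration, assuming a simple straight-line model and made-up data; the function names `fit_line` and `k_fold_mae` are hypothetical, and real projects would typically rely on a machine learning library instead.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (
        sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / sum((x - mx) ** 2 for x in xs)
    )
    return slope, my - slope * mx


def k_fold_mae(xs, ys, k=3):
    """Mean absolute error of the line model, averaged over k held-out folds."""
    fold_errors = []
    for fold in range(k):
        # Every k-th point (offset by the fold number) is held out for testing.
        test_idx = set(range(fold, len(xs), k))
        pairs = list(enumerate(zip(xs, ys)))
        train = [(x, y) for i, (x, y) in pairs if i not in test_idx]
        test = [(x, y) for i, (x, y) in pairs if i in test_idx]
        slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
        errors = [abs(y - (slope * x + intercept)) for x, y in test]
        fold_errors.append(mean(errors))
    return mean(fold_errors)


# Hypothetical data: roughly y = 2x + 1 with a little noise.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8, 19.1]

print(k_fold_mae(xs, ys))
```

A low error on the held-out folds is evidence, not proof, that the model will generalize; a model that scores well on its training data but poorly here should not yet drive business decisions.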

  • Visualization:

The tasks discussed above are just the tip of the iceberg. Even state-of-the-art data models for different applications do little good if their insights are not delivered to customers or users, and delivered consistently. This means building a data product that can be used by people who are not data scientists. It can take many forms, such as chart visualizations, metrics on a dashboard, or an application. Well-known examples of such tools are Tableau and Alteryx, which are widely used for data analytics and dashboarding.
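
To make the idea of a simple data product concrete, here is a toy sketch that turns a handful of metrics into a plain-text bar chart. It uses only the Python standard library; the metric names and values are hypothetical, the function assumes at least one positive value, and a real dashboard would normally be built with a dedicated visualization tool.

```python
def render_bar_chart(metrics, width=20):
    """Return a plain-text bar chart, one line per metric, scaled to the largest value."""
    peak = max(metrics.values())
    label_width = max(len(name) for name in metrics)
    lines = []
    for name, value in metrics.items():
        bar = "#" * round(width * value / peak)
        lines.append(f"{name.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)


# Hypothetical metric: sign-ups per day.
weekly_signups = {"Mon": 10, "Tue": 20, "Wed": 5}
print(render_bar_chart(weekly_signups))
```

The point is less the chart itself than the packaging: someone who is not a data scientist can read the output without knowing how the numbers were produced.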

Taken together, these tasks show that the long-term life cycle of a data science project may involve going back and re-analyzing the data and the models whenever a new source of data comes in and needs to be incorporated.

From these traits and tasks, it is clear how important data science and data scientists can be to the growth of any organization, in an era of intense competition and a constant need to improve services.