8 things you are MOST probably doing wrong as a beginner data scientist

Shivam Agarwal
3 min read · Nov 16, 2022

Data science is a fast-growing field, and a lot of new people are interested in it. However, there are some common mistakes that any beginner data scientist can make.

Photo by ThisisEngineering RAEng on Unsplash
1. Jumping to the Solution:- In the real world, data science is roughly 20% about the solution and 80% about the data. Take time to understand the data first. Ask every question you have about it, and move on to the next step only once you understand it well enough.
Photo by Walker Fenton on Unsplash
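
As a rough illustration of the "understand the data first" advice in point 1, here is a minimal sketch that profiles a small table before any modelling. The sample data, the column names and the list of missing-value markers are all made up for this example; only the Python standard library is used.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; in practice this would come from your real file.
RAW = """customer_id,age,city,spend
1,34,Delhi,120.5
2,,Mumbai,80
3,29,delhi,
4,41,Pune,n/a
"""

# Assumed set of strings that should count as "missing" in this dataset.
MISSING = {"", "na", "n/a", "null", "none"}


def profile(rows):
    """Return, per column, the number of missing and distinct values."""
    summary = {}
    for row in rows:
        for column, value in row.items():
            stats = summary.setdefault(column, {"missing": 0, "values": Counter()})
            cleaned = (value or "").strip()
            if cleaned.lower() in MISSING:
                stats["missing"] += 1
            else:
                stats["values"][cleaned] += 1
    return {
        column: {"missing": s["missing"], "distinct": len(s["values"])}
        for column, s in summary.items()
    }


rows = list(csv.DictReader(io.StringIO(RAW)))
report = profile(rows)
for column, stats in report.items():
    print(column, stats)
```

Even on this toy table the report raises questions worth asking the data owner: why is spend missing in half of the rows, and are "Delhi" and "delhi" really two different cities?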

2. Not Cleaning the Data:- Data handed out in a classroom is mostly clean; real-world data usually is not. Assume that the data you are given is dirty, and clean it before applying any machine learning algorithm.

Photo by No Revisions on Unsplash
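
To make point 2 concrete, here is a small sketch of a few typical cleaning steps: trimming whitespace, mapping placeholder strings to a real missing value, fixing types and dropping duplicates. The records, the field names (`name`, `age`, `city`) and the list of missing-value markers are assumptions for illustration; a real dataset will need its own rules.

```python
# Assumed placeholder strings that mean "no value" in this hypothetical dataset.
MISSING_MARKERS = {"", "na", "n/a", "null", "none"}


def clean_records(records):
    """Trim text, normalise missing values, fix types and drop duplicates."""
    cleaned = []
    seen = set()
    for record in records:
        row = {}
        for key, value in record.items():
            text = str(value).strip() if value is not None else ""
            row[key] = None if text.lower() in MISSING_MARKERS else text

        # Normalise capitalisation so "delhi" and "Delhi" are the same city.
        if row["city"] is not None:
            row["city"] = row["city"].title()

        # Convert age to an integer; unparseable values become missing.
        if row["age"] is not None:
            try:
                row["age"] = int(row["age"])
            except ValueError:
                row["age"] = None

        # Skip rows that are exact duplicates after cleaning.
        fingerprint = tuple(sorted(row.items(), key=lambda item: item[0]))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


records = [
    {"name": " Asha ", "age": "34", "city": "delhi"},
    {"name": "Asha", "age": "34", "city": "Delhi"},  # duplicate once cleaned
    {"name": "Ravi", "age": "n/a", "city": " MUMBAI"},
    {"name": "Meera", "age": "forty", "city": ""},
]

cleaned = clean_records(records)
for row in cleaned:
    print(row)
```

Note that the duplicate only becomes visible after the whitespace and capitalisation are fixed, which is why deduplication runs last.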

3. Not Understanding the Business:- Understanding the business requirement is most important. Even a model that is 100% accurate is of no use if it does not fulfil the business requirement.

Photo by Zuzana Ruttkay on Unsplash

4. Creating Complex Visualizations:- There is a difference between exploratory and explanatory analysis. Complex visualizations are fine while you are exploring the data yourself. When you are explaining results to a general audience, keep the visualizations very simple.

Photo by Isaac Smith on Unsplash

5. Explainable vs Accurate Model:- Businesses generally prefer an explainable model with decent accuracy over a highly accurate model that cannot be explained.

Photo by National Cancer Institute on Unsplash

6. Not Thinking about Laws:- Always remember that the law of the land is supreme. A model built on past data carries the biases of history, for example racial bias (Black vs White). Remember to check your model for such bias and remove it.

Photo by Tingey Injury Law Firm on Unsplash
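
Before you can remove bias (point 6), you have to measure it. One simple check, sketched below, compares how often the model gives a favourable outcome to each group. The decisions and group labels are invented, and the 0.8 threshold is only a common rule of thumb (often called the four-fifths rule), not a legal test; which check actually applies depends on your jurisdiction and use case.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Return the share of favourable outcomes for each group."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, outcome in decisions:
        totals[group] += 1
        if outcome:
            approved[group] += 1
    return {group: approved[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Ratio of the lowest selection rate to the highest one."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected, so no group is favoured
    return min(rates.values()) / highest


# Hypothetical model decisions as (group, approved?) pairs.
decisions = (
    [("group_a", True)] * 6 + [("group_a", False)] * 4
    + [("group_b", True)] * 3 + [("group_b", False)] * 7
)

rates = selection_rates(decisions)
ratio = disparate_impact_ratio(rates)
THRESHOLD = 0.8  # assumed rule-of-thumb cut-off, not a legal standard

print(rates)
print(f"ratio = {ratio:.2f}, needs review = {ratio < THRESHOLD}")
```

A low ratio does not prove the model is unfair on its own, but it is a clear signal to investigate the training data and features before the model goes anywhere near production.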

7. Too Big a Model:- The model a data scientist builds has to go to production. Production environments usually have limited compute, and any delay in returning results can degrade the end user's experience.

Photo by Nick Fewings on Unsplash
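
One way to act on point 7 is to time your model's predictions before handing it over. The sketch below measures median and 95th-percentile latency for a single prediction call. The `predict` function and its weights are hypothetical stand-ins for a real model, and numbers measured on a laptop will differ from those in production, so treat the result as a rough comparison between candidate models.

```python
import statistics
import time

# Hypothetical model: a tiny linear scorer standing in for a real one.
WEIGHTS = [0.4, -1.2, 0.7, 2.1]


def predict(features):
    """Stand-in for a real model's prediction call."""
    return sum(w * x for w, x in zip(WEIGHTS, features))


def measure_latency_ms(fn, sample, runs=200):
    """Call fn(sample) repeatedly and summarise the latency in milliseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(sample)
        timings.append((time.perf_counter() - start) * 1000)
    timings.sort()
    # Nearest-rank estimate of the 95th percentile.
    p95_index = max(0, int(round(0.95 * len(timings))) - 1)
    return {
        "median_ms": statistics.median(timings),
        "p95_ms": timings[p95_index],
        "max_ms": timings[-1],
    }


result = measure_latency_ms(predict, [1.0, 2.0, 3.0, 4.0])
print(result)
```

Looking at the 95th percentile rather than only the average matters because the slowest requests are the ones end users notice.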

8. Working in Silos:- Communication is very important for a data scientist. You have to communicate with data engineers, software engineers, business analysts and end consumers. It plays a key role in any real-world data science project.

Photo by Noah Silliman on Unsplash


Shivam Agarwal

Shivam is an accomplished analytics professional and algo trader, sharing expertise in algo trading, data science, and AI through insightful publications.