8 things you are MOST probably doing wrong as a beginner data scientist
Data science is a fast-growing field, and a lot of new people are interested in it. However, there are some common mistakes that almost any beginner data scientist makes.
1. Jumping to the Solution:- In the real world, data science is 20% about the solution and 80% about the data. First, you must take time to understand the data. Ask every question you have about the data, and proceed only once you understand it well enough.
2. Not Cleaning the Data:- The data given in a classroom environment is mostly clean, which is not the case in the real world. You should assume that the data provided to you is not clean. You must clean the data before applying any machine learning algorithm.
3. Not Understanding the Business:- Understanding the business requirement is most important. You can create a model that is 100% accurate, but if it does not fulfil the business requirement, the model is of no use.
4. Creating Complex Visualizations:- There is a difference between exploratory and explanatory analysis. You can create complex visualizations while doing exploratory analysis, because you are the only audience. However, you should create very simple visualizations when you want to explain your findings to a general audience.
5. Explainable vs Accurate Model:- The business usually prefers an explainable model with decent accuracy. A model with high accuracy but no explanation is less preferred than a model with high explainability and decent accuracy.
6. Not Thinking about Laws:- Always remember that the law of the land is supreme. A model created from past data carries the biases of history (for example, racial bias). You must remember to remove such bias from your model.
7. Too Big a Model:- The model created by the data scientist has to go to production. The production environment usually has limited compute, and a very large model takes longer to return a result. Any delay in getting the result may degrade the experience of the end user.
8. Working in Silos:- As a data scientist, communication is very important. You have to communicate with data engineers, software engineers, business analysts and end consumers. Communication plays a key role in any real-world data science project.
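
To make points 1 and 2 concrete, here is a minimal sketch of looking at the data first and cleaning it second. It uses only the Python standard library. The dataset, column names and helper functions (`load_rows`, `profile`, `clean`) are all made up for illustration, and a real project would likely need more checks than these.

```python
import csv
import io
from collections import Counter

# Hypothetical raw data with typical real-world problems:
# stray whitespace, inconsistent casing, missing values, a duplicate row.
RAW_CSV = """customer_id,city,age,monthly_spend
1, Delhi ,34,1200
2,mumbai,,850
3,Delhi,29,not available
1, Delhi ,34,1200
4,MUMBAI,41,990
"""

NUMERIC_COLUMNS = ("age", "monthly_spend")


def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def profile(rows):
    """Understand first: count missing or non-numeric values per column."""
    problems = Counter()
    for row in rows:
        for column in NUMERIC_COLUMNS:
            if not row[column].strip().isdigit():
                problems[column] += 1
    return dict(problems)


def to_int_or_none(value):
    """Convert a clean digit string to int; anything else becomes None."""
    value = value.strip()
    return int(value) if value.isdigit() else None


def clean(rows):
    """Clean second: normalise text, convert numbers, drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        record = {
            "customer_id": int(row["customer_id"]),
            "city": row["city"].strip().title(),
            "age": to_int_or_none(row["age"]),
            "monthly_spend": to_int_or_none(row["monthly_spend"]),
        }
        key = tuple(record.values())
        if key not in seen:
            seen.add(key)
            cleaned.append(record)
    return cleaned


rows = load_rows(RAW_CSV)
report = profile(rows)   # which columns have problems, and how many
cleaned = clean(rows)    # rows that are safe to pass to later steps

print("Problem counts per column:", report)
print("Rows before:", len(rows), "| rows after:", len(cleaned))
```

The profiling step runs before any cleaning on purpose: the counts it produces are exactly the kind of thing to take back to whoever owns the data and ask about, rather than silently guessing what a blank age or "not available" means.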
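
For point 6, one simple first check is to compare how often the model gives a positive outcome to each group. The sketch below uses invented predictions and hypothetical group labels (`group_a`, `group_b`). It shows only one basic measure, the gap in approval rates between groups, and is not a complete fairness or legal review.

```python
from collections import defaultdict

# Hypothetical model outputs: (group label, predicted approval as 0 or 1).
predictions = [
    ("group_a", 1), ("group_a", 1), ("group_a", 0), ("group_a", 1),
    ("group_b", 0), ("group_b", 1), ("group_b", 0), ("group_b", 0),
]


def approval_rate_by_group(pairs):
    """Return the share of positive predictions for each group."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, outcome in pairs:
        totals[group] += 1
        approved[group] += outcome
    return {group: approved[group] / totals[group] for group in totals}


rates = approval_rate_by_group(predictions)

# A large gap does not prove the model is unfair, but it is a signal
# that the model and its training data deserve a closer look.
gap = max(rates.values()) - min(rates.values())

print("Approval rate per group:", rates)
print("Gap between highest and lowest rate:", gap)
```

What counts as an acceptable gap depends on the use case and the applicable law, so treat the number as a prompt for discussion with the business and legal teams, not as a pass or fail result.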