It’s wild to think it’s been a year since I first became a data scientist, and I wanted to share some of the lessons I’ve learned so far.
The Data Science Title is Meaningless
I still have no idea what a “typical” data scientist is, and many companies don’t either. What a data science role involves depends heavily on the company and the maturity of its data infrastructure. Instead of focusing on the title, focus on which business problems a particular company has and how your data skills can solve them. Want to build data products? Then chase those business problems! Interested in using deep learning? Find companies with the infrastructure and problems that warrant such methods. Chasing data problems instead of titles will put you in a better place.
Ask More Questions Before Coding
I’ve been burned a few times before learning that most non-data people have no idea what data solution they need. Jumping straight into coding after getting a request sets you up for failure. Take a step back and ask probing questions to clarify what is actually needed. Often someone will ask for “ABC”, but after a few more questions it turns out they actually need “XYZ”. Getting clarity and consensus among stakeholders about data problems and solutions is one of the most important parts of being an effective data scientist.
Prototype to Build Buy-In
Start with a simple example, get feedback, implement the feedback, then repeat. This process saves you time and makes your stakeholders feel heard and valued. For example, I recently had to create an algorithm to classify our product’s users. Rather than jumping straight into Python, I created a slide deck describing the algorithm’s logic visually and an Excel spreadsheet of different use cases. I presented these prototypes to stakeholders and then incorporated their feedback into the prototypes. By the end of this process, it was clear what I needed to code, and the stakeholders understood what value my data solution would bring them.
Talk to Domain Experts
You end up making A LOT of assumptions about the data. Talking to experts on your data’s subject area and/or your product will help you make better ones. Go talk to Sales or Customer Success teams to learn about customer pain points. Talk to engineers to learn why certain product decisions were made. If you work in a specialized domain, talk to a subject matter expert to learn whether something unexpected in the data reflects an important nuance or a data quality issue.
Learn Software Engineering Best Practices
Notebooks are awesome for experimentation and data exploration, but they can only take you so far. Learn how to build scripts for your data science workflow instead of relying only on notebooks. Use Git to keep track of changes to your code. Write unit tests to make sure your code works as expected. Put effort into how you structure your code (e.g. functions, separate scripts, etc.). This will help you stand out as a data scientist and make it way easier to put your data solutions into production.
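As a rough illustration of the "functions in a script, plus unit tests" habit, here is a minimal sketch in Python. The module name `clean_data.py`, the two helper functions, and the choice of pytest as the test runner are all hypothetical, picked just for this example rather than taken from any particular project.

```python
"""clean_data.py: hypothetical helpers pulled out of an exploration notebook."""

from typing import List, Optional, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to the range [0, 1].

    If every value is identical, return all zeros instead of dividing by zero.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def drop_missing(values: Sequence[Optional[float]]) -> List[float]:
    """Remove None entries while keeping the original order."""
    return [v for v in values if v is not None]


# Unit tests: pytest collects any function whose name starts with "test_".
def test_min_max_scale_basic():
    assert min_max_scale([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]


def test_min_max_scale_constant_input():
    # Edge case a notebook cell might never hit: no spread in the data.
    assert min_max_scale([3.0, 3.0]) == [0.0, 0.0]


def test_min_max_scale_empty():
    assert min_max_scale([]) == []


def test_drop_missing_keeps_order():
    assert drop_missing([1.0, None, 3.0]) == [1.0, 3.0]
```

Saved as `clean_data.py` and tracked in Git, running `pytest clean_data.py` would collect and run the four `test_` functions. In a larger project the tests would usually live in their own file, such as `test_clean_data.py`, which also fits the advice about splitting code into separate scripts.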
There is probably more, but these are the topics top of mind for me right now! I’d love to hear what other data scientists have learned as well!