It’s common to see job descriptions along the lines of the following.
Looking for a data scientist.
- Building big data databases, data warehouses, data lakes, and pipelines.
- Training and deploying deep learning models and neural networks for prediction.
- Creating dashboards, analyses, visualisations and reports.
- Supporting marketing and other teams with A/B test analyses, data deep-dives, etc.
- 10 years of experience with Hadoop, Spark, TensorFlow, Keras, PyTorch, Python, R,
I obviously exaggerate, but there is still significant confusion about what is reasonable to expect from a single data scientist.
Airbnb has a nice blog post about how they distinguish three core types of data science work: Analytics, Algorithms, and Inference. In their own words (emphasis mine):
The Analytics track is ideal for those who are skilled at asking a great question, exploring cuts of the data in a revealing way, automating analysis through dashboards and visualizations, and driving changes in the business as a result of recommendations. The Algorithms track would be the home for those with expertise in machine learning, passionate about creating business value by infusing data in our product and processes. And the Inference track would be perfect for our statisticians, economists, and social scientists using statistics to improve our decision making and measure the impact of our work.
I wish this distinction were more widely used, especially among people hiring their first data scientists. In the fictional list of responsibilities above, each of the first four bullets corresponds to a different role: 1. Data Engineer, 2. Data Scientist (Algorithms) or Machine Learning Engineer, 3. Data Scientist (Analytics), and 4. Data Scientist (Inference).
There are generalists who are at least somewhat versed in each of these, but you’re unlikely to find candidates who can take on more than two of the roles. Your first data scientist should be a generalist, or grow into one quickly, but you should still understand your requirements well and write a job description that doesn’t consist simply of the top 10 buzzwords of the day.
So, my algorithm for writing a job posting:
- read the Airbnb blog post;
- decide which kind of data scientist you need (or possibly a mixture of two);
- job title: Data Scientist – <type>;
- job description: the responsibilities for the type.