Data Analysts vs Data Scientists: What's the Difference?
Data Science and AI technologies are already having a major impact on business and society – from creating more personalised user experiences, to building cost-effective operations, to providing more accurate fraud detection. These are just three examples: new AI-powered applications and Data Science use cases are emerging across industries, and Big Data and Business Analytics revenues are forecast to reach $210 billion in 2020, according to the 2017 International Data Corporation Big Data and Analytics Spending Guide.
These rapid technological advances and business benefits have led to growing demand for Data Analyst and Data Science skills, yet the number of Data Analyst and Data Scientist positions to fill far exceeds the current supply of qualified individuals.
In response, companies can make use of internal Data Science training schemes and career development programmes to both upskill their workforce and develop new talent. The 2017 Global Talent Competitiveness Index advises that “companies must offer work-based training opportunities to allow youngsters to develop their employability and gain the required skills, and upskill their existing workforces.” However, if you are new to the field it can be difficult to know the difference between the two roles. The good news is that there are clear industry standards published by the Institute for Apprenticeships (IfA) outlining the required skills for a Data Analyst and a Data Scientist.
In this article, we draw upon the IfA standards to highlight the difference between the two roles and explain how you can make use of training programmes to address critical skill gaps in your organisation.
What are Data Analysts and Data Scientists?
The Data Analyst
In essence, the primary role of a Data Analyst is to collect, organise and study data to provide business insight. As stated in the IfA Data Analyst Standards, “Data Analysts are typically involved with managing, cleansing, abstracting and aggregating data, and conducting a range of analytical studies on that data.”
Managing, Cleansing, Abstracting and Aggregating Data: A Definition
- Managing: involves planning, executing and maintaining data processes for the secure storage of data and information assets.
- Cleansing: the process of checking data quality and accuracy by recognising and then removing incorrect or biased data from a database.
- Abstracting: the process of removing characteristics from a dataset to reduce it to a set of essential characteristics for more efficient data processing.
- Aggregating: the process of compiling information from multiple data sources to prepare combined datasets for data processing.
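As a rough illustration of the last three steps, here is a minimal Python sketch. The records, field names and helper functions (cleanse, abstract, aggregate) are all hypothetical and invented for this example, and the cleansing rule (drop rows with a missing region or a non-positive amount) is just one assumed quality check. It uses only the standard library so it runs anywhere; in practice a data analysis library would typically do this work. Managing (secure storage) is not shown.

```python
from collections import defaultdict

# Hypothetical records from two made-up sources (online and in-store sales).
online_sales = [
    {"order_id": 1, "region": "North", "amount": 120.0, "customer_email": "a@example.com"},
    {"order_id": 2, "region": "South", "amount": -5.0, "customer_email": "b@example.com"},  # invalid amount
    {"order_id": 3, "region": "North", "amount": 80.0, "customer_email": "c@example.com"},
]
store_sales = [
    {"order_id": 4, "region": "South", "amount": 200.0, "customer_email": "d@example.com"},
    {"order_id": 5, "region": None, "amount": 50.0, "customer_email": "e@example.com"},  # missing region
]


def cleanse(records):
    """Cleansing: keep only records with a known region and a positive amount."""
    return [r for r in records if r["region"] is not None and r["amount"] > 0]


def abstract(records, keep=("region", "amount")):
    """Abstracting: reduce each record to the fields the analysis actually needs."""
    return [{field: r[field] for field in keep} for r in records]


def aggregate(*sources):
    """Aggregating: combine several sources and total the amount per region."""
    totals = defaultdict(float)
    for source in sources:
        for r in source:
            totals[r["region"]] += r["amount"]
    return dict(totals)


prepared = [abstract(cleanse(source)) for source in (online_sales, store_sales)]
totals_by_region = aggregate(*prepared)
print(totals_by_region)
```

Keeping each step in its own small function makes it easy to check the steps separately and to swap in stricter cleansing rules later.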
The Data Scientist
Data Scientists build upon the core competencies of a Data Analyst with additional Machine Learning and Software Engineering skills. The IfA Data Scientist Standards state: “Data Scientists are dynamic and adaptable, addressing varied problems with varied techniques. They actively explore innovative ways to use existing and new statistical, algorithmic, predictive, machine learning and artificial intelligence tools and techniques, to find significant and valuable patterns in data and transform these into information for their organisation.”
What are the typical problems a Data Analyst and a Data Scientist might work on?
Different types of analytics can be categorised into “The Four Analytic Capabilities” – a widely used framework put forward by Gartner Research. These approaches increase in complexity, from Descriptive and Diagnostic (more traditional techniques), to Predictive and Prescriptive (more sophisticated techniques), providing a useful way to demonstrate the progression from Data Analytics to Data Science.
The Four Analytic Capabilities: A Definition
- Descriptive: What happened? Example: What is the turnover this month?
- Diagnostic: Why did it happen? Example: In your monthly report, you can see that last month’s sales performance declined. What caused this?
- Predictive: What will happen? Example: Imagine you are a retailer and you want to maximise product sales while minimising waste. How can you accurately forecast how much stock you need?
- Prescriptive: What should I do? Example: Based on your traffic predictions, what are the best marketing initiatives you can put in place to maximise the prospect-to-lead ratio?
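To make the four capabilities concrete, the sketch below answers one question of each type for a toy retail dataset. Everything in it is an assumption made for illustration: the sales figures are invented, the "forecast" is a naive three-month average rather than a real predictive model, and the 10% safety margin is an arbitrary stocking rule.

```python
from statistics import mean

# Hypothetical monthly unit sales per region (illustrative numbers only).
sales = {
    "North": [100, 110, 120, 90],
    "South": [80, 85, 90, 95],
}

# Descriptive: what happened? Total units sold in the latest month.
latest_total = sum(units[-1] for units in sales.values())

# Diagnostic: why did it happen? Month-on-month change broken down by region.
change_by_region = {name: units[-1] - units[-2] for name, units in sales.items()}

# Predictive: what will happen? A naive forecast: the mean of the last three months.
forecast = {name: mean(units[-3:]) for name, units in sales.items()}

# Prescriptive: what should I do? Order the forecast plus an assumed 10% safety margin.
SAFETY_MARGIN = 0.10
order_quantity = {name: round(f * (1 + SAFETY_MARGIN)) for name, f in forecast.items()}

print(latest_total, change_by_region, forecast, order_quantity)
```

The first two steps only summarise data that already exists, which is why they sit closer to Data Analytics. The last two involve a model of the future and a decision rule, which is where Data Science techniques would replace the naive placeholders used here.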
While each company will face different data challenges and business problems, a Data Analyst will often be tasked with performing descriptive and diagnostic analysis to provide business insights. In contrast, a Data Scientist would be expected to apply predictive and prescriptive analytics to develop business solutions. Their work requires strong programming skills and deep theoretical knowledge, combined with the dedication and curiosity to keep up with the latest tools and techniques.
“Our industry is moving forward fast, so we need to stay ahead. I recommend practising with real data, experimenting with different methods and evaluating them as you go. Collaboration and teamwork are key too… just as important as understanding complex problems quickly and working on code.”
Sebastian Kaltwang, Research Scientist, FiveAI
Core Skills - Domain Expertise, Mathematics, and Programming
At the core, Data Analysts and Data Scientists have skills in three broad areas: Domain Expertise, Mathematics and Statistics, and Programming. In this section we break down these three areas to identify the foundational skills expected of a Data Analyst, then describe the additional expectations for a Data Scientist.
Domain expertise is a fundamental part of Data Analytics and Data Science that can be taught on the job, reinforcing the importance of graduate training schemes that quickly get individuals up to speed. Individuals draw upon domain expertise to:
- Understand and identify business problems that can benefit from Data Science.
- Apply relevant tools and techniques to solve the problem.
- Convert that solution into actionable insights to help the business.
- Communicate the findings in a way that wider business units can understand and act on.
Additional skills for Data Science:
Both roles require the ability to present results to a range of stakeholders and reason about their methods. However, Data Scientists need to have a strong understanding of industry best practices for interpreting complex machine learning models. For example, LIME and SHAP are recognised techniques for model explainability.
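The idea behind SHAP can be illustrated without the library itself. SHAP builds on Shapley values, which split a prediction's difference from a baseline fairly across the input features. The sketch below is not the SHAP or LIME API: it computes exact Shapley values by brute force for a tiny, hypothetical linear model, with made-up feature names, weights and baseline values. Brute force is only feasible for a handful of features, which is why practical tools rely on approximations.

```python
from itertools import permutations


def model(features):
    """Hypothetical fitted model: a simple linear scorer used for illustration."""
    weights = {"age": 0.5, "income": 2.0, "tenure": -1.0}
    return sum(weights[name] * value for name, value in features.items())


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every possible order in which features are switched on."""
    names = list(instance)
    contributions = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)  # start from the baseline input
        previous = predict(current)
        for name in ordering:
            current[name] = instance[name]  # switch one feature to its actual value
            updated = predict(current)
            contributions[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in contributions.items()}


# Assumed example: one customer compared against an assumed "average" customer.
instance = {"age": 40.0, "income": 3.0, "tenure": 5.0}
baseline = {"age": 30.0, "income": 2.0, "tenure": 2.0}
attributions = shapley_values(model, instance, baseline)
print(attributions)
```

A useful property to check is that the attributions add up to the difference between the prediction for the instance and the prediction for the baseline, so a stakeholder can see exactly how much each feature moved the result.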
Mathematics and Statistics
Statistical foundations are important to ensure individuals grasp machine learning’s underlying mathematics, and understand how a model works and when it works well. For example, to train a basic prediction model, an Analyst will need an intuition for linear regression and gradient descent – these methods draw upon an understanding of linear algebra, optimisation and probability.
- Linear Regression is a statistical learning method used to model a linear relationship between a dependent output variable (y) and one or more independent input variables (x), and to use the fitted line to predict future values of the output variable.
- Gradient Descent is a basic optimisation algorithm for finding the minimum of a function by repeatedly stepping in the direction of steepest descent. In model training, the function being minimised is the error in the model’s predictions, and the values being adjusted are the model’s parameters: the properties that are tuned so the model fits the data as accurately as possible.
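The two ideas fit together as follows: linear regression defines the model (y ≈ w·x + b) and gradient descent is one way to find the parameters w and b that minimise its error. The sketch below is a minimal, assumption-laden illustration: the data points are invented (generated from y = 2x + 1 with no noise), the error measure is mean squared error, and the learning rate and step count are hand-picked values that happen to work for this data.

```python
# Hypothetical data generated from y = 2x + 1 (no noise), for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def mean_squared_error(w, b):
    """Average squared difference between predictions w*x + b and observed y."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_by_gradient_descent(learning_rate=0.05, steps=2000):
    """Fit the parameters w (slope) and b (intercept) by gradient descent."""
    w, b = 0.0, 0.0  # arbitrary starting point
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient, i.e. downhill on the error surface.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit_by_gradient_descent()
print(f"slope={w:.4f}, intercept={b:.4f}, error={mean_squared_error(w, b):.2e}")
print("prediction for x=5:", w * 5 + b)
```

For a simple linear model like this the best parameters can also be computed directly with least squares. Gradient descent matters because the same loop still works for models where no such closed-form solution exists.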
A great reference for further information on mathematical foundations is: Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares. By Stephen Boyd and Lieven Vandenberghe, 2018.
Additional skills for Data Science:
A Data Scientist’s mathematics skills will be more advanced than those of a Data Analyst. Machine Learning models become less intuitive as they get more complex in design. In turn, their implementation requires more rigorous mathematical knowledge beyond basic concepts, not only to train the model, but also to decompose what the algorithm has done in order to explain its decisions to stakeholders.
Programming and Database knowledge is the third core component of Data Analyst and Data Scientist roles. To gather, process and analyse large amounts of data, individuals must have the skills to:
- Bring data together from multiple sources.
- Clean, transform and explore data to deliver practical insights quickly.
- Work in accordance with software development standards, including security, code quality and version control.
As a Data Analyst, this means going beyond Excel to make use of Python, a widespread programming language with an abundance of ready-made libraries for data analysis, data visualisation and modelling. Data Analysts use their intermediate Python programming skills to apply techniques such as Exploratory Data Analysis, Supervised Learning and associated methodologies to maintain and tune models. In addition, Data Analysts must be comfortable collecting and storing various forms of data (relational, document-oriented and graph) utilising Big Data Systems and technologies such as Spark and Parquet.
Additional skills for Data Science:
A Data Scientist’s programming skills are well beyond those of a Data Analyst. Combining their advanced mathematics and programming skills, Data Scientists create more complex Machine Learning solutions using techniques such as Ensemble Models, Time Series Forecasting, Natural Language Processing, Deep Learning, and Recommender Systems. Data Scientists must bring an engineering mindset to data projects.
“I expect a Data Scientist to have an engineering mindset about functionality that is destined to production – everything needs to be measured and tested. Additionally, Data Scientists need to be able to write their own feature engineering code (either in Python / Scala / Java) with light touch guidance if needed.”
David Illes, Vice President, Morgan Stanley
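One way to picture the engineering mindset described above is feature engineering code that ships with its own test. The sketch below is purely illustrative: the transaction schema, the feature names and the engineer_features function are hypothetical, and a real project would normally run such tests through a test framework rather than calling them inline.

```python
from datetime import date


def engineer_features(transaction):
    """Derive model inputs from a raw transaction record (hypothetical schema)."""
    purchase_date = transaction["purchase_date"]
    signup_date = transaction["signup_date"]
    return {
        # Saturday and Sunday have weekday() values 5 and 6.
        "is_weekend": purchase_date.weekday() >= 5,
        "days_since_signup": (purchase_date - signup_date).days,
        # Guard against division by zero when no items are recorded.
        "amount_per_item": transaction["amount"] / max(transaction["items"], 1),
    }


def test_engineer_features():
    """Pin down the expected behaviour so later changes cannot silently break it."""
    raw = {
        "purchase_date": date(2024, 1, 6),  # a Saturday
        "signup_date": date(2024, 1, 1),
        "amount": 90.0,
        "items": 3,
    }
    assert engineer_features(raw) == {
        "is_weekend": True,
        "days_since_signup": 5,
        "amount_per_item": 30.0,
    }


test_engineer_features()
print("feature engineering test passed")
```

Writing the test alongside the feature code keeps the behaviour measurable, which matters once the same function runs in production rather than in a one-off analysis.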
The Need for Data Analysts and Data Scientists
To improve the supply of data talent, leading businesses are investing in continuous learning opportunities, such as graduate training schemes and apprenticeship programmes that attract high-calibre students and skill them up quickly. This is a promising movement to address the UK’s long-standing data skills shortage. Numerous government publications have highlighted the problem; for example, Nesta’s 2014 report titled ‘Mind the data skills gap’ stated, “urgent action is needed to deal with this data skills crunch, and ensure that ‘data talent’ coming out of UK universities is able to transform data into insights in industry.”
Training schemes can be used to support your talent management and recruiting strategies. The 2017 Global Talent Competitiveness Index reports that “in this new environment, training, investment in professional development, and continuous learning and upskilling are becoming ever-more desirable values for a new generation of workers.” With this in mind, there are some real opportunities for companies to leverage their training programmes to provide competitive offers that attract and hire the best candidates.
Cambridge Spark offers an efficient way to find, hire and equip individuals with the right skills for your business. We provide adaptive Data Analysis and Data Science training programmes using blended learning. This approach includes live-coded online lectures and learning activities, in-person diagnosis and group presentation sessions, and practical projects with instant feedback using K.A.T.E.® – our Data Science training and assessment platform.
The K.A.T.E. platform assists your talent management and learning and development initiatives by developing personalised training programmes based on each individual’s strengths, weaknesses and learning objectives, getting graduates up to speed and ready to deliver value to your organisation.
Ready to optimise your graduate training?
Get in touch using the form below. We’ll give you a call to discuss your objectives and how we can support your Data Science initiatives.