Core Data Science

Exploratory data analysis and interactive visualisation, unsupervised learning, dimensionality reduction and feature extraction, supervised learning and more.


  • Level: beginner
  • Duration: 3-day course
  • Delivered: in-house

What you will learn

The course is highly interactive and hands-on: you will learn by working through concrete problems with a real dataset. You will be taught by academic and industry experts who have a wealth of experience and knowledge to share.

  • Preprocessing (scaling, log transformations, imputation, one-hot encoding)
  • Exploratory data analysis and interactive visualisation
  • Unsupervised learning (k-means clustering, hierarchical clustering)
  • Dimensionality reduction and feature extraction
  • Supervised learning (KNN, decision trees, random forests, SVMs)
  • Model evaluation and tuning
  • Logistic regression
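The preprocessing steps listed above can be chained together in scikit-learn. The sketch below is illustrative only: the tiny DataFrame and its column names (`income`, `age`, `city`) are made up rather than taken from the course dataset, and the specific choices (median imputation, `np.log1p` for the log transform) are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Hypothetical toy data: "income" is skewed and has a missing value,
# "city" is categorical.
df = pd.DataFrame({
    "income": [25_000, 48_000, np.nan, 310_000, 52_000],
    "age": [23, 35, 41, 58, 30],
    "city": ["London", "Cambridge", "London", "Leeds", "Cambridge"],
})

income_steps = make_pipeline(
    SimpleImputer(strategy="median"),   # imputation: fill the gap with the median
    FunctionTransformer(np.log1p),      # log transformation to reduce skew
    StandardScaler(),                   # scaling to zero mean, unit variance
)

preprocess = ColumnTransformer(
    [
        ("income", income_steps, ["income"]),
        ("age", StandardScaler(), ["age"]),
        ("city", OneHotEncoder(handle_unknown="ignore"), ["city"]),  # one-hot encoding
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

X = preprocess.fit_transform(df)
print(X.shape)  # 1 income + 1 age + 3 one-hot city columns
print(X.round(2))
```

Each transformer in the `ColumnTransformer` only sees the columns listed next to it, and the output columns appear in the order the transformers are declared.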

Languages and libraries

  • Python programming language
  • NumPy and pandas for data manipulation
  • Scikit-learn for machine learning algorithms
  • Matplotlib and seaborn for data visualisation

OUTLINE

Day One

Data Science Essentials

Session 1

Introduction to Data Science

  • Overview of Data Science and Machine Learning
  • Supervised vs. Unsupervised Learning
  • Working with the Jupyter notebook
  • The NumPy library for array manipulation

Session 2

Working with real-world data

  • The pandas library for data manipulation
  • Data cleaning and pre-processing
  • Data visualisation with Matplotlib and Seaborn

Session 3

Principal Component Analysis (PCA)

  • What PCA is and why you need it
  • Applying PCA in Python with scikit-learn
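As a minimal sketch of this session's topic, the snippet below standardises the features and projects them onto two principal components with scikit-learn. The Iris dataset bundled with scikit-learn is an assumption used purely for illustration, not the course's dataset.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# Standardise first so that no feature dominates just because of its units.
pca_pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
X_2d = pca_pipeline.fit_transform(X)

pca = pca_pipeline.named_steps["pca"]
print(X_2d.shape)                    # 150 samples, now 2 features
print(pca.explained_variance_ratio_) # share of the variance kept by each component
```

The two-column `X_2d` can then be plotted directly with Matplotlib or seaborn, which is a common use of PCA during exploratory analysis.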

Day Two

Unsupervised and supervised learning

Session 1

Unsupervised learning

  • The scikit-learn library for Machine Learning and scikit-learn pipelines
  • k-means clustering
  • Hierarchical cluster analysis
  • Density-based clustering (DBSCAN)
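The three clustering methods above can be compared side by side inside scikit-learn pipelines. This sketch assumes synthetic data from `make_blobs`; the parameter values (`n_clusters=3`, `eps=0.3`, `min_samples=5`) are illustrative guesses rather than recommended settings, and DBSCAN's result in particular depends heavily on `eps`.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with three well-separated groups (labels are ignored).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

models = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "hierarchical (Ward)": AgglomerativeClustering(n_clusters=3, linkage="ward"),
    "DBSCAN": DBSCAN(eps=0.3, min_samples=5),
}

results = {}
for name, model in models.items():
    # Each pipeline scales the data, then clusters it.
    pipe = make_pipeline(StandardScaler(), model)
    labels = pipe.fit_predict(X)
    results[name] = labels
    n_clusters = len(set(labels) - {-1})  # DBSCAN marks noise points with -1
    print(f"{name}: {n_clusters} clusters")
```

k-means and hierarchical clustering need the number of clusters up front, whereas DBSCAN infers it from point density and can leave outliers unassigned.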

Session 2

Supervised Learning

  • The k-nearest neighbours algorithm
  • Overfitting, underfitting and the bias-variance trade-off
  • Cross-validation and hyperparameter tuning
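One way to see the bias-variance trade-off for k-nearest neighbours is to let cross-validation choose k. The sketch below assumes scikit-learn's bundled breast cancer dataset and an arbitrary grid of k values; both are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling matters for k-NN because it relies on distances between points.
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])

# Small k: flexible model that may overfit (high variance).
# Large k: smoother model that may underfit (high bias).
param_grid = {"knn__n_neighbors": [1, 3, 5, 11, 21, 51]}

search = GridSearchCV(pipe, param_grid, cv=5)  # 5-fold CV on the training set only
search.fit(X_train, y_train)

print(search.best_params_)
print(f"CV accuracy:   {search.best_score_:.3f}")
print(f"Test accuracy: {search.score(X_test, y_test):.3f}")
```

Keeping the test set out of the grid search gives an honest estimate of how the tuned model performs on unseen data.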

Day Three

Machine Learning

Session 1

Random Forests

  • Decision Trees
  • Ensemble models and Random Forests
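A quick way to compare a single decision tree with a random forest (an ensemble of trees trained on bootstrap samples) is cross-validated accuracy. The dataset, `n_estimators=200` and the fixed `random_state` are assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X, y = data.data, data.target

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

# 5-fold cross-validated accuracy for each model.
tree_scores = cross_val_score(tree, X, y, cv=5)
forest_scores = cross_val_score(forest, X, y, cv=5)

print(f"Single tree:   {tree_scores.mean():.3f}")
print(f"Random forest: {forest_scores.mean():.3f}")

# Fit on all data to inspect which features the forest relies on most.
forest.fit(X, y)
top = forest.feature_importances_.argsort()[::-1][:3]
print("Most important features:", list(data.feature_names[top]))
```

Averaging many decorrelated trees usually reduces variance compared with a single deep tree, which is the main motivation for ensembles.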

Session 2

Logistic Regression

  • Logistic Regression
  • Regularisation: Ridge and Lasso
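In scikit-learn, ridge-style (L2) and lasso-style (L1) penalties for logistic regression can be selected through the `penalty` and `C` arguments of `LogisticRegression`. This sketch assumes the bundled breast cancer dataset and illustrative values of `C`; argument names can change between scikit-learn releases, so check the documentation for your installed version.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# C is the inverse of the regularisation strength: smaller C means a stronger penalty.
ridge_like = make_pipeline(
    StandardScaler(), LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
)
lasso_like = make_pipeline(
    StandardScaler(), LogisticRegression(penalty="l1", C=0.1, solver="liblinear")
)

results = {}
for name, model in [("L2 (ridge)", ridge_like), ("L1 (lasso)", lasso_like)]:
    model.fit(X_train, y_train)
    coef = model[-1].coef_               # coefficients of the final pipeline step
    n_zero = int(np.sum(coef == 0))      # L1 tends to set some exactly to zero
    acc = model.score(X_test, y_test)
    results[name] = (acc, n_zero)
    print(f"{name}: test accuracy {acc:.3f}, zero coefficients {n_zero}/{coef.size}")
```

The L2 penalty shrinks all coefficients towards zero, while the L1 penalty can remove features entirely, which doubles as a simple form of feature selection.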

Session 3

Support Vector Classifiers

  • Linear Support Vector Classifiers (SVC)
  • The kernel trick and non-linear SVCs
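The effect of the kernel trick is easy to see on data that is not linearly separable. This sketch assumes the synthetic two-moons dataset from `make_moons` and illustrative hyperparameters; it fits a linear SVC and an RBF-kernel SVC and compares their test accuracy.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two interleaving half-circles: no straight line separates them well.
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svc = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
rbf_svc = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))

scores = {}
for name, model in [("linear", linear_svc), ("RBF kernel", rbf_svc)]:
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy {scores[name]:.3f}")
```

The RBF kernel lets the SVC fit a curved decision boundary without explicitly computing new features, which is the essence of the kernel trick.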

Prerequisites

  • Elementary Python programming and use of the command line. You can acquire these skills at our Python bootcamp.
  • Basic probability and linear algebra.

Audience

Individuals who want to master new technical skills and learn the latest techniques and industry best practices to work effectively with Data Science teams.


Get in touch

Get in touch to discuss team size, pricing and your tech requirements. Send an email to training@cambridgespark.com or fill in our contact form. We’ll be sure to get back to you soon.
