Introduction to Data Science

Exploratory data analysis and interactive visualisation, unsupervised learning, dimensionality reduction and feature extraction, supervised learning and more.

Next event: 29 Jul - 30 Jul 2017 in London







Two-day course

What you will learn

The course is extremely interactive and hands-on. You will learn by working through concrete problems with a real dataset. You will be taught by academic and industry experts in the field, who have a wealth of experience and knowledge to share.

  • Preprocessing (scaling, log transformations, imputation, one-hot encoding)
  • Exploratory data analysis and interactive visualisation
  • Unsupervised learning (k-means clustering, hierarchical clustering)
  • Dimensionality reduction and feature extraction (PCA, t-SNE)
  • Supervised learning (decision trees)
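
As a rough illustration of the preprocessing steps listed above, the sketch below applies imputation, a log transformation, scaling and one-hot encoding to a tiny table. The column names and values are invented for this example and are not taken from the course dataset.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: column names and values are made up for illustration.
df = pd.DataFrame({
    "income": [20000.0, 35000.0, np.nan, 120000.0],
    "age": [25.0, 32.0, 47.0, 51.0],
    "city": ["London", "Leeds", "London", "Bristol"],
})
numeric_cols = ["income", "age"]

# Imputation: replace missing numeric values with the column median.
imputer = SimpleImputer(strategy="median")
df[numeric_cols] = imputer.fit_transform(df[numeric_cols])

# Log transformation: compress the long right tail of income.
df["log_income"] = np.log1p(df["income"])

# Scaling: standardise each column to zero mean and unit variance.
scaled = StandardScaler().fit_transform(df[["log_income", "age"]])

# One-hot encoding: one indicator column per distinct city.
encoded = pd.get_dummies(df["city"], prefix="city")
```

The median is used for imputation here only because it is robust to the single large income value; other strategies may suit other data better.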

Languages and libraries

  • Python programming language
  • Numpy and pandas for data manipulation
  • Scikit-learn for machine learning algorithms
  • Matplotlib and seaborn for data visualisation

Progression paths

Cement your skills by working through a follow-up project with our feedback and attaining the Data Science Foundation certificate.

Learn state-of-the-art machine learning techniques at our Machine Learning Techniques using Python bootcamp.

Acquire specialised Natural Language Processing skills at our Text Mining and Natural Language Processing with Python bootcamp.

Learn how to make quantitative predictions with our Forecasting and Regression course.


Audience: All aspiring data scientists, students, researchers and professionals who are curious about this exciting and rapidly growing field.

Prerequisites: basic statistics and probability theory; basic Python.

You can learn Python at our Python bootcamp.

Day 1

Pre-processing, exploratory data analysis (EDA), visualisation, principal component analysis.

Session 1

Introduction to Data Science

  • Overview of Data Science and Machine Learning
  • Supervised vs. Unsupervised Learning
  • Industrial Applications

Session 2

Working with real-world data

  • Loading and manipulating data in Python with pandas
  • Data cleaning and pre-processing
  • Exploratory data analysis
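
To give a flavour of this session, here is a minimal sketch of loading, cleaning and summarising a table with pandas. The CSV contents are hypothetical and inlined so the example runs on its own; a real dataset would normally be read from a file.

```python
import io
import pandas as pd

# Hypothetical CSV contents, inlined so the example is self-contained.
raw = io.StringIO(
    "product,region,units,price\n"
    "A,North,10,2.5\n"
    "B,South,,4.0\n"
    "A,South,7,2.5\n"
    "A,South,7,2.5\n"
    "B,North,3,4.0\n"
)
df = pd.read_csv(raw)

# Cleaning: drop exact duplicate rows, then rows with a missing unit count.
clean = df.drop_duplicates().dropna(subset=["units"]).copy()
clean["revenue"] = clean["units"] * clean["price"]

# Exploratory summaries: overall statistics and total revenue per region.
summary = clean.describe()
revenue_by_region = clean.groupby("region")["revenue"].sum()
```

Dropping incomplete rows is only one option; imputing the missing value is often preferable when data is scarce.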

Session 3

Principal Component Analysis (PCA)

  • What PCA is and why you need it
  • Applying PCA in Python with scikit-learn
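
The following sketch shows one way PCA might be applied with scikit-learn. The data is synthetic: five features are generated from two hidden factors, with a mixing matrix chosen arbitrarily for this example, so two principal components should capture nearly all of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic data: 200 samples of 5 features driven by 2 latent factors.
# The mixing matrix is an arbitrary choice made for this illustration.
latent = rng.normal(size=(200, 2))
mixing = np.array([
    [1.0, 0.5, -1.0, 0.8, 0.3],
    [0.2, -1.0, 0.6, 1.0, -0.9],
])
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

# PCA is sensitive to feature scale, so standardise first.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the two directions of greatest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

# Fraction of total variance explained by each retained component.
explained = pca.explained_variance_ratio_
```

Inspecting `explained` is a common way to decide how many components to keep.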



  • Drinks with fellow participants and lecturers

Day 2

Unsupervised learning and supervised learning.

Session 1

Unsupervised learning

  • k-means clustering
  • Hierarchical cluster analysis
  • Density-based clustering
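
The three clustering approaches above can be compared on the same data. This is a minimal sketch using scikit-learn on synthetic, well-separated blobs; the blob centres, `eps` and `min_samples` values are assumptions picked for this toy example, and DBSCAN is used as one representative density-based method.

```python
from sklearn.cluster import AgglomerativeClustering, DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Synthetic data: three compact, well-separated groups of 50 points each.
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [6, 6], [-6, 6]],
    cluster_std=0.5,
    random_state=0,
)

# k-means: the number of clusters must be chosen in advance.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hierarchical (agglomerative) clustering with Ward linkage.
hier_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# Density-based clustering: no cluster count is given; label -1 marks noise.
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)
n_dbscan_clusters = len(set(dbscan_labels) - {-1})
```

On data this clean the three methods should agree; differences tend to appear with uneven densities or non-spherical clusters.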

Session 2

Supervised Learning

  • The k-nearest neighbours algorithm
  • Decision Tree classifier
  • Overfitting and Validation
  • Hyperparameter tuning
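
The topics in this session fit together roughly as sketched below: hold out a test set, fit a k-nearest neighbours baseline, then tune a decision tree's depth by cross-validation. The Iris dataset bundled with scikit-learn is used as a stand-in for the course dataset, and the parameter grid is an arbitrary choice for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset; the course itself may use different data.
X, y = load_iris(return_X_y=True)

# Validation: keep 30% of the data unseen until the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Baseline: k-nearest neighbours with k = 5.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
knn_accuracy = knn.score(X_test, y_test)

# Hyperparameter tuning: pick the tree depth by 5-fold cross-validation
# on the training set only, to limit overfitting to the test set.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 3, 5, None]},
    cv=5,
)
search.fit(X_train, y_train)
best_depth = search.best_params_["max_depth"]
tree_accuracy = search.score(X_test, y_test)
```

Comparing training accuracy with `tree_accuracy` is a quick check for overfitting: a large gap suggests the tree is too deep.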

Continuous learning project

Our continuous learning project is a real-world problem and data set that you complete in your own time, giving you practice with the material and techniques covered during the bootcamp. The package includes model answer notebooks, with a detailed explanation of the solution and the problem-solving process.

Price: £100 extra


Check out video highlights, photos and interviews from our previous bootcamps.

In-house Training

Get in touch to discuss your requirements by email or by completing our contact form.

We can deliver this course as private training at your office on weekdays.

We can also design a bespoke curriculum matching your specific training objectives.

Book your ticket

29–30 Jul, London
THECUBE - Studio 5, 155 Commercial Street, E1 6BJ, London
Ticket includes course materials and code resources.