Introduction to Data Science using Python

Exploratory data analysis and interactive visualisation, unsupervised learning, dimensionality reduction and feature extraction, supervised learning and more.

LEVEL: BEGINNER
DURATION: 2-DAY COURSE
DELIVERED: AT YOUR OFFICE

What you will learn

Introduction to Data Science

The course is extremely interactive and hands-on. You will learn by working through concrete problems with a real dataset. You will be taught by academic and industry experts in the field, who have a wealth of experience and knowledge to share.

Preprocessing (scaling, log transformations, imputation, one-hot encoding)
Exploratory data analysis and interactive visualisation
Unsupervised learning (k-means clustering, hierarchical clustering)
Dimensionality reduction and feature extraction
Supervised learning (KNN, decision trees, random forests, SVMs)
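To give a flavour of the preprocessing topics listed above, here is a minimal sketch with scikit-learn that chains imputation, a log transformation and scaling for a numeric column and applies one-hot encoding to a categorical column. The table, its column names and its values are made up for illustration; the course's own dataset and exercises may differ.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Hypothetical toy table: a skewed numeric column with a missing value
# and a categorical column.
df = pd.DataFrame({
    "income": [25_000, 48_000, np.nan, 310_000],
    "city": ["London", "Paris", "London", "Berlin"],
})

numeric = make_pipeline(
    SimpleImputer(strategy="median"),  # imputation: fill NaN with the column median
    FunctionTransformer(np.log1p),     # log transformation to reduce skew
    StandardScaler(),                  # scaling to zero mean and unit variance
)

preprocess = ColumnTransformer([
    ("num", numeric, ["income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),  # one-hot encoding
])

X = preprocess.fit_transform(df)
print(X.shape)  # (4, 4): 1 scaled numeric column + 3 one-hot columns
```

Wrapping the steps in a `ColumnTransformer` means the same transformations learned on training data can later be reapplied unchanged to new data.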

Languages and libraries:

Python 3
NumPy and Pandas for data manipulation
Scikit-learn for machine learning and statsmodels for linear and time-series models
Matplotlib for visualisation

PREREQUISITES

Elementary Python programming and use of the command line. You can acquire these skills at our Python bootcamp.

Basic probability and linear algebra.

AUDIENCE

Individuals who want to master new technical skills and learn the latest techniques and industry best practices to work effectively with Data Science teams.

Get in touch to learn about the course

DAY ONE

DATA SCIENCE ESSENTIALS

Session 1

Introduction to Data Science

  • Overview of Data Science and Machine Learning
  • Supervised vs. Unsupervised Learning
  • Working with Jupyter notebooks
  • The NumPy library for array manipulation

Session 2

Working with real-world data

  • The Pandas library for data manipulation
  • Data cleaning and pre-processing
  • Data visualisation with Matplotlib and Seaborn

Session 3

Principal Component Analysis (PCA)

  • What is PCA and why you need it
  • Applying PCA in Python with scikit-learn
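As a preview of applying PCA with scikit-learn, the sketch below standardises the features and projects them onto two principal components. It uses the Iris dataset bundled with scikit-learn purely as a stand-in; the course may work with a different dataset.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # 150 samples, 4 numeric features

# Standardise first so that features measured on larger scales
# do not dominate the principal components.
pca_pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
X_2d = pca_pipeline.fit_transform(X)

pca = pca_pipeline.named_steps["pca"]
print(X_2d.shape)                     # (150, 2)
print(pca.explained_variance_ratio_)  # share of total variance kept by each component
```

The explained variance ratio is one common way to judge how many components are worth keeping.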

DAY TWO

UNSUPERVISED LEARNING AND SUPERVISED LEARNING

Session 1

Unsupervised learning

  • The scikit-learn library for Machine Learning, including pipelines
  • k-means clustering
  • Hierarchical cluster analysis
  • Density-based clustering (DBSCAN)

Session 2
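The three clustering methods in this session can be compared side by side inside scikit-learn pipelines. This is a minimal sketch on synthetic data generated with `make_blobs`; the number of clusters and the DBSCAN settings (`eps=0.3`, `min_samples=5`) are illustrative assumptions, not values taken from the course.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 300 points drawn around 3 centres (illustrative only).
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

models = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "hierarchical (Ward)": AgglomerativeClustering(n_clusters=3, linkage="ward"),
    "DBSCAN": DBSCAN(eps=0.3, min_samples=5),
}

results = {}
for name, model in models.items():
    # Scale features, then cluster; each pipeline returns one label per point.
    pipe = make_pipeline(StandardScaler(), model)
    results[name] = pipe.fit_predict(X)
    n_clusters = len(set(results[name]) - {-1})  # DBSCAN marks noise points as -1
    print(f"{name}: {n_clusters} clusters")
```

Note that k-means and hierarchical clustering need the number of clusters up front, while DBSCAN infers it from point density and can leave some points unassigned as noise.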

Supervised Learning

  • The k-Nearest Neighbour algorithm
  • Overfitting, underfitting, bias-variance tradeoff
  • Cross-validation and hyperparameter tuning
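The supervised learning topics fit together in a short example: tuning the number of neighbours k of a k-nearest-neighbour classifier with cross-validation. This sketch assumes the Iris dataset bundled with scikit-learn and an arbitrary grid of k values; both are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Small k gives a flexible model (risk of overfitting, high variance);
# large k gives a smoother model (risk of underfitting, high bias).
param_grid = {"kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9, 15, 25]}

# 5-fold cross-validation on the training set picks k;
# the held-out test set is used only once, at the end.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print("best k:", search.best_params_["kneighborsclassifier__n_neighbors"])
print("cross-validated accuracy:", round(search.best_score_, 3))
print("held-out test accuracy:", round(search.score(X_test, y_test), 3))
```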

Get in Touch

CONTACT US

We will email you within the next 24 hours to arrange a quick call to help with any questions about the programme and recommend pre-course materials.

We look forward to speaking with you.

Dr. Raoul-Gabriel Urma

Director