Data Science Webinar Series

Join free Data Science training webinars from experts

Free to join
Learn from experts
40min + Q&A

Upcoming Webinars

Interested in learning about the latest data science libraries in Python? Join Cambridge Spark’s free webinar series, where you will learn through live-coded examples from experts in the field. These webinars feature Spark for big data processing; pandas, matplotlib and seaborn for exploratory data analysis; and more.

  • Date: 18 April 2018 - 1700-1745 BST
  • Free webinar and resources

Walking the Random Forest and boosting the trees

Deep Learning is all the rage, but ensemble models are still in the game. With libraries such as the recent and performant LightGBM, the Kaggle superstar XGBoost or the classic Random Forest from scikit-learn, ensemble models are a must-have in a data scientist’s toolbox. They have proven to perform well on a wide range of problems, and are usually simpler to tune and interpret. This talk focuses on two of the most popular tree-based ensemble models: Random Forest, which relies on bagging, and Gradient Boosting, which relies on boosting. It will demonstrate how to apply these techniques to a real-world business problem in a live-coding session using the latest implementations available in the Python ecosystem.
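To make the difference between bagging and boosting concrete, here is a minimal from-scratch sketch in plain Python. It is not the webinar's material and not how scikit-learn, XGBoost or LightGBM are implemented: the one-split "stumps", the toy data and every function name are assumptions chosen for illustration. A real Random Forest also samples features at each split, which this sketch leaves out.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree; return (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # every candidate split leaves both sides non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # all x values identical: fall back to a constant prediction
        mean = sum(ys) / len(ys)
        return (xs[0], mean, mean)
    return best[1:]

def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

def fit_bagging(xs, ys, n_trees, rng):
    """Bagging: fit each stump independently on a bootstrap resample."""
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def predict_bagging(stumps, x):
    # The ensemble prediction is the plain average of the individual stumps.
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)

def fit_boosting(xs, ys, n_rounds, learning_rate):
    """Boosting (squared loss): fit each stump to the residuals of the model so far."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, stumps

def predict_boosting(model, x, learning_rate):
    base, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Hypothetical toy data: y = x^2 on 20 points.
xs = [float(i) for i in range(20)]
ys = [x * x for x in xs]

bagged = fit_bagging(xs, ys, n_trees=25, rng=random.Random(0))
boosted = fit_boosting(xs, ys, n_rounds=50, learning_rate=0.5)

base_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))  # constant-mean baseline
bag_mse = mse(ys, [predict_bagging(bagged, x) for x in xs])
boost_mse = mse(ys, [predict_boosting(boosted, x, 0.5) for x in xs])
print(base_mse, bag_mse, boost_mse)
```

The bagged stumps are trained independently and averaged, whereas each boosted stump depends on the errors left by the previous ones; that sequential dependence is the key design difference between the two families.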

About the speaker: Kevin Lemagnen

Kevin has led development of data products for the energy sector and worked for the telecommunications industry at Qualcomm. He was also a visiting researcher at Stanford University. Kevin has delivered data science and machine learning training courses to various clients from industries that include finance, engineering and research, helping individuals leverage the latest techniques.

Receive the recording of this past webinar

Past Webinars

  • Date: 02 November 2017 - 1700-1745 BST
  • Free webinar and resources

Making Sense of Big Data File Formats

Modern applications generate and manipulate a lot of data, and the growth rate of that data is staggering. Unfortunately, large datasets can be expensive to store and slow to process. In fact, memory speed has been improving at a much lower rate than CPU speed. Thankfully, there are various file formats suited to big data systems that can help. In this webinar, you will learn about popular file formats for big data systems, with a focus on Parquet. Through live-coded examples in Python, you will learn the good, the bad, the ugly, and how you can make use of Parquet in practice.
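Parquet is a column-oriented format, and the idea behind that layout can be sketched in plain Python. The snippet below is only an illustration of row-oriented versus column-oriented storage: the records and the `to_columns` helper are hypothetical, and real Parquet files add row groups, encodings and compression, none of which is modelled here.

```python
# Hypothetical records, stored row by row (as in CSV or JSON lines).
rows = [
    {"id": 1, "city": "Cambridge", "temp_c": 11.0},
    {"id": 2, "city": "London", "temp_c": 13.5},
    {"id": 3, "city": "Cambridge", "temp_c": 9.5},
]

def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# An aggregate over one field now touches a single contiguous list,
# instead of visiting every field of every row.
temps = columns["temp_c"]
mean_temp = sum(temps) / len(temps)
print(columns["city"], mean_temp)
```

Keeping the values of one column together is also what tends to make columnar data compress well, since neighbouring values share a type and often repeat.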

About the speaker: Dr Raoul-Gabriel Urma

Raoul-Gabriel Urma is CEO of Cambridge Spark, a leading learning community for data scientists and developers. Raoul is the author of the bestselling programming book "Java 8 in Action", which sold over 20,000 copies globally. He completed a PhD in Computer Science at the University of Cambridge. In addition, he holds an MEng in Computer Science from Imperial College London and graduated with first class honours, having won several prizes for technical innovation. Raoul has delivered over 100 technical talks at international conferences. He has worked for Google, eBay, Oracle, and Goldman Sachs, and is a Fellow of the Royal Society of Arts.

Receive the recording of this past webinar
  • Date: 21 November 2017 - 1700-1745 BST
  • Free webinar and resources

Managing Spatial Data Using Geopandas in Python

The pandas library has made data manipulation and analysis far easier in Python. But what happens when you need to analyse and manipulate geospatial data? This webinar will talk you through using GeoPandas to create and manipulate data frames with geometric data types. It covers importing, manipulating and outputting spatial data in Python, converting text/CSV files to spatial data, and plotting spatial data using matplotlib.

About the speaker: Tim Hillel

Tim is a PhD student at Cambridge University researching multi-modal passenger behaviour at city scale. His research interests are in transport modelling, machine learning and big data.

Receive the recording of this past webinar
  • Date: 11 October 2017 - 1700-1745 BST
  • Free webinar and resources

Exploratory Data Analysis in Python

In machine learning and data science, fancy new algorithms get all the press, but the real battle is won in the trenches. You are only as good as your data. Garbage in, garbage out. Exploratory data analysis, pre-processing, and feature engineering are the unsung heroes of data science and machine learning. This webinar will use Python to illustrate a number of tools and strategies for taking an unprocessed data set (such as one that you might find online or receive from a colleague), exploring it, cleaning it, and engineering features to feed to machine learning algorithms.
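The explore, clean and engineer steps can be sketched with nothing but the standard library. The data, column names and the `is_london` feature below are all hypothetical, and this is not the webinar's own code; in practice a library such as pandas would usually do this work.

```python
import csv
import io
import statistics

# Hypothetical raw data with a missing age and inconsistent text casing.
raw = """name,age,city
Ana,34,cambridge
Ben,,London
Cho,29,LONDON
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Explore: count empty values per column.
missing = {col: sum(1 for r in rows if r[col] == "") for col in rows[0]}

# Clean: normalise city casing and impute missing ages with the median.
known_ages = [int(r["age"]) for r in rows if r["age"]]
median_age = statistics.median(known_ages)
for r in rows:
    r["city"] = r["city"].strip().title()
    r["age"] = int(r["age"]) if r["age"] else median_age

# Engineer: derive a binary feature from the cleaned column.
for r in rows:
    r["is_london"] = int(r["city"] == "London")

print(missing)
print(rows)
```

Median imputation is only one possible choice here; whether it is appropriate depends on why the values are missing.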

About the speaker: Patrick Short

Patrick is a teaching fellow at Cambridge Spark and is pursuing his PhD at Cambridge University at the Wellcome Trust Sanger Institute. Patrick's work focuses on analyzing the role of non-coding mutations in developmental disorders. He is integrating a variety of data sets, including epigenetic marks, transcription factor binding predictions, and detailed clinical phenotypes, with genome sequence data. Patrick completed his undergraduate education at the University of North Carolina at Chapel Hill (UNC) with a double major in Applied Mathematics and Quantitative Biology and a minor in Chinese.

Receive the recording of this past webinar
  • Date: 06 September 2017 - 1700-1745 BST
  • Free webinar and resources

Getting Started with Spark and Zeppelin in Python

You may hear a lot of buzz about Spark in the big data space. What is it all about and why should you care? In this interactive webinar, you will get familiar with the Spark RDD API, which lets you process data using functional-style patterns. Through live-coded examples in Python, you will explore a real-world dataset made of JSON entries. In addition, you will discover how simple it is to scale the data processing over a cluster of computers using AWS EMR. At the same time, you will learn about the new cool interactive notebook on the block, which supports common data visualisation and filtering out of the box: Zeppelin.
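The functional style that the RDD API encourages can be previewed in plain Python, with no cluster required. The sketch below does not use Spark at all: it applies the built-in `map`, `filter` and `functools.reduce` to a few hypothetical JSON lines, which is roughly analogous to chaining `map`, `filter` and `reduce` on an RDD. The records and field names are assumptions made for illustration.

```python
import json
from functools import reduce

# Hypothetical JSON entries, one per line, as they might appear in a log file.
raw_lines = [
    '{"user": "ana", "country": "UK", "amount": 12.5}',
    '{"user": "ben", "country": "FR", "amount": 7.0}',
    '{"user": "cho", "country": "UK", "amount": 3.5}',
]

# Parse each line into a dict (the "map" step).
records = map(json.loads, raw_lines)

# Keep only the UK records (the "filter" step).
uk_only = filter(lambda r: r["country"] == "UK", records)

# Project out the field of interest, then combine the values (the "reduce" step).
amounts = map(lambda r: r["amount"], uk_only)
total_uk = reduce(lambda a, b: a + b, amounts, 0.0)

print(total_uk)  # 16.0
```

Each step is a pure transformation of the previous one, which is the property that lets a framework such as Spark split the same pipeline across many machines.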

About the speaker: Dr Raoul-Gabriel Urma

Raoul-Gabriel Urma is CEO of Cambridge Spark, a leading learning community for data scientists and developers. Raoul is the author of the bestselling programming book "Java 8 in Action", which sold over 20,000 copies globally. He completed a PhD in Computer Science at the University of Cambridge. In addition, he holds an MEng in Computer Science from Imperial College London and graduated with first class honours, having won several prizes for technical innovation. Raoul has delivered over 100 technical talks at international conferences. He has worked for Google, eBay, Oracle, and Goldman Sachs, and is a Fellow of the Royal Society of Arts.

Receive the recording of this past webinar