Data Science Webinar Series

Join free Data Science training webinars from experts


  • Technical Talks
  • Deep Dive Workshops
  • Networking Opportunities

Upcoming Webinars

Interested in learning about the latest data science libraries in Python? Join Cambridge Spark’s free webinar series, where you will learn through live-coded examples from experts in the field. These webinars feature Spark for big data processing; pandas, matplotlib and seaborn for exploratory data analysis; and more.


  • Date: 02 November 2017 - 1700-1745 BST
  • Free webinar and resources

Making Sense of Big Data File Formats

Modern applications generate and manipulate a lot of data, and the growth rate of that data is staggering. Unfortunately, large datasets can be expensive to store and slow to process. In fact, memory speed has been improving at a much slower rate than CPU speed. Thankfully, there are various file formats designed for big data systems that can help. In this webinar, you will learn about popular file formats suitable for big data systems, with a focus on Parquet. Through live-coded examples in Python, you will learn the good, the bad, the ugly, and how you can make use of Parquet in practice.

About the speaker: Dr Raoul-Gabriel Urma

Raoul-Gabriel Urma is CEO of Cambridge Spark, a leading learning community for data scientists and developers. Raoul is the author of the bestselling programming book "Java 8 in Action", which sold over 20,000 copies globally. He completed a PhD in Computer Science at the University of Cambridge. In addition, he holds an MEng in Computer Science from Imperial College London and graduated with first class honours, having won several prizes for technical innovation. Raoul has delivered over 100 technical talks at international conferences. He has worked for Google, eBay, Oracle, and Goldman Sachs, and is a Fellow of the Royal Society of Arts.


  • Date: 21 November 2017 - 1700-1745 BST
  • Free webinar and resources

Managing Spatial Data Using GeoPandas in Python

The pandas library has made data manipulation and analysis far easier in Python. But what happens when you need to analyse and manipulate geospatial data? This webinar will walk you through using GeoPandas to create and manipulate data frames with geometric data types. It covers importing, manipulating and outputting spatial data in Python, converting text/CSV files to spatial data, and plotting spatial data using matplotlib.

About the speaker: Tim Hillel

Tim is a PhD student at Cambridge University researching multi-modal passenger behaviour at city scale. His research interests are in transport modelling, machine learning and big data.


Past Webinars

  • Date: 11 October 2017 - 1700-1745 BST
  • Free webinar and resources

Exploratory Data Analysis in Python

In machine learning and data science, fancy new algorithms get all the press, but the real battle is won in the trenches. You are only as good as your data. Garbage in, garbage out. Exploratory data analysis, pre-processing, and feature engineering are the unsung heroes of data science and machine learning. This webinar will use Python to illustrate a number of tools and strategies for taking an unprocessed data set (such as one that you might find online or receive from a colleague), exploring it, cleaning it, and engineering features to feed to machine learning algorithms.

About the speaker: Patrick Short

Patrick is a teaching fellow at Cambridge Spark and is conducting his PhD at Cambridge University at the Wellcome Trust Sanger Institute. Patrick's work focuses on analyzing the role of non-coding mutations in developmental disorders. He is integrating a variety of data sets, including epigenetic marks, transcription factor binding predictions, and detailed clinical phenotypes, with genome sequence data. Patrick completed his undergraduate education at the University of North Carolina at Chapel Hill (UNC) with a double major in Applied Mathematics and Quantitative Biology and a minor in Chinese.

Receive the recording of this past webinar

  • Date: 06 September 2017 - 1700-1745 BST
  • Free webinar and resources

Getting Started with Spark and Zeppelin in Python

You may hear a lot of buzz about Spark in the big data space. What is it all about and why should you care? In this interactive webinar, you will get familiar with the Spark RDD API, which lets you process data using functional-style patterns. Through live-coded examples in Python, you will explore a real-world dataset made of JSON entries. In addition, you will discover how simple it is to scale the data processing over a cluster of computers using AWS EMR. At the same time, you will learn about Zeppelin, the cool new interactive notebook on the block, which supports common data visualisation and filtering out of the box.

About the speaker: Dr Raoul-Gabriel Urma

Raoul-Gabriel Urma is CEO of Cambridge Spark, a leading learning community for data scientists and developers. Raoul is the author of the bestselling programming book "Java 8 in Action", which sold over 20,000 copies globally. He completed a PhD in Computer Science at the University of Cambridge. In addition, he holds an MEng in Computer Science from Imperial College London and graduated with first class honours, having won several prizes for technical innovation. Raoul has delivered over 100 technical talks at international conferences. He has worked for Google, eBay, Oracle, and Goldman Sachs, and is a Fellow of the Royal Society of Arts.

Receive the recording of this past webinar

Get in touch

Send an email to events@cambridgespark.com or fill in our contact form. We’ll be sure to get back to you soon.

Contact our team