Data Science Webinar Series


  • Date: 6 September 2017 - 1700-1745 BST
  • Free webinar and resources

Getting Started with Spark and Zeppelin in Python

You may hear a lot of buzz about Spark in the big data space. What is it all about and why should you care? In this interactive webinar, you will get familiar with the Spark RDD API, which lets you process data using functional-style patterns. Through live-coded examples in Python, you will explore a real-world dataset made of JSON entries. In addition, you will discover how simple it is to scale the data processing over a cluster of computers using AWS EMR. At the same time, you will learn about Zeppelin, the cool new interactive notebook on the block, which supports common data visualisation and filtering out of the box.

About the speaker: Dr Raoul-Gabriel Urma

Raoul-Gabriel Urma is CEO of Cambridge Spark, a leading learning community for data scientists and developers. Raoul is the author of the bestselling programming book "Java 8 in Action", which has sold over 20,000 copies globally. He completed a PhD in Computer Science at the University of Cambridge. In addition, he holds an MEng in Computer Science from Imperial College London, where he graduated with first class honours and won several prizes for technical innovation. Raoul has delivered over 100 technical talks at international conferences. He has worked for Google, eBay, Oracle, and Goldman Sachs, and is a Fellow of the Royal Society of Arts.

Register for this free webinar series

  • Date: 11 October 2017 - 1700-1745 BST
  • Free webinar and resources

Exploratory Data Analysis in Python

In machine learning and data science, fancy new algorithms get all the press, but the real battle is won in the trenches. You are only as good as your data. Garbage in, garbage out. Exploratory data analysis, pre-processing, and feature engineering are the unsung heroes of data science and machine learning. This webinar will use Python to illustrate a number of tools and strategies for taking an unprocessed data set (such as one that you might find online or receive from a colleague), exploring it, cleaning it, and engineering features to feed to machine learning algorithms.

About the speaker: Patrick Short

Patrick is a teaching fellow at Cambridge Spark and is conducting his PhD at Cambridge University at the Wellcome Trust Sanger Institute. Patrick's work focuses on analyzing the role of non-coding mutations in developmental disorders. He is integrating a variety of data sets, including epigenetic marks, transcription factor binding predictions, and detailed clinical phenotypes, with genome sequence data. Patrick completed his undergraduate education at the University of North Carolina at Chapel Hill (UNC) with a double major in Applied Mathematics and Quantitative Biology and a minor in Chinese.

Register for this free webinar series

  • Date: 02 November 2017 - 1700-1745 BST
  • Free webinar and resources

Making Sense of Big Data File Formats

Modern applications generate and manipulate a lot of data, and that data is growing at a staggering rate. Unfortunately, large datasets can be expensive to store and slow to process. In fact, memory speed has been improving much more slowly than CPU speed. Thankfully, there are various file formats designed for big data systems that can help. In this webinar, you will learn about two popular ones: Avro and Parquet. Through live-coded examples in Python, you will learn the good, the bad, and the ugly, and how you can make use of Avro and Parquet in practice.

About the speaker: Dr Raoul-Gabriel Urma

Raoul-Gabriel Urma is CEO of Cambridge Spark, a leading learning community for data scientists and developers. Raoul is the author of the bestselling programming book "Java 8 in Action", which has sold over 20,000 copies globally. He completed a PhD in Computer Science at the University of Cambridge. In addition, he holds an MEng in Computer Science from Imperial College London, where he graduated with first class honours and won several prizes for technical innovation. Raoul has delivered over 100 technical talks at international conferences. He has worked for Google, eBay, Oracle, and Goldman Sachs, and is a Fellow of the Royal Society of Arts.

Register for this free webinar series

Get in touch

Send an email to events@cambridgespark.com or fill in our contact form. We’ll be sure to get back to you soon.

Contact our team