Introduction to Big Data Analytics and Engineering

3Vs of big data, Lambda architecture, MapReduce, Spark, pipeline tuning and more.

Next Date: 22 Apr 2017


Level

Intermediate

Location

London and Cambridge

Duration

One-day course



What you will learn

You will learn about Big Data, the Hadoop ecosystem and Spark in practice. After taking this class you will be able to:

  • Understand the challenges in the Big Data ecosystem
  • Describe the fundamentals of the Hadoop ecosystem
  • Implement your own MapReduce applications
  • Use the core Spark APIs to express data processing queries
  • Optimise data pipelines

Languages and libraries

  • Python programming language
  • Hadoop
  • Spark

Progression paths

Learn state-of-the-art machine learning techniques at our Machine Learning Techniques using Python bootcamp.

Acquire specialised Natural Language Processing skills at our Text Mining and Natural Language Processing with Python bootcamp.


Prerequisites

Audience: Those who are curious about the Big Data space and who want to feel comfortable getting their hands dirty with high-volume, high-velocity, diverse real-world datasets.

Prerequisites: Good knowledge of Python, some familiarity with matrices and a basic understanding of machine learning practice (as taught in Introduction to Data Science).

There are five modules in this course. After completing the course, you will be able to understand the challenges in the Big Data ecosystem, describe the fundamentals of the Hadoop ecosystem, implement your own MapReduce applications, use the core Spark APIs to express data processing queries and optimise data pipelines.

Session 1

Introduction to "Big Data"

  • Volume, Velocity, Variety
  • Scaling horizontally
  • Batch vs Streaming
  • NoSQL landscape
  • Lambda architecture

Session 2

Hadoop Ecosystem

  • Architecture overview
  • HDFS
  • The MapReduce pattern
  • Using Pydoop
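
To give a flavour of the MapReduce pattern covered in this session, here is a minimal single-machine sketch in plain Python. It is an illustration only and is not taken from the course materials: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, and it does not use the Hadoop or Pydoop APIs. On a real cluster the framework distributes the map and reduce steps across machines and performs the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, mimicking the step between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data pipelines"]))
```

Because each call to map_phase depends only on its own line, and each call to reduce_phase depends only on its own key, both steps can in principle run in parallel, which is what makes the pattern suitable for scaling horizontally.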

Session 3

Spark

  • Architecture overview
  • Resilient Distributed Datasets (RDDs)
  • Transformations, actions and the DAG
  • RDD programming API

Session 4

Tuning

  • RDD caching
  • Broadcast variables
  • Accumulators
  • Pipeline tuning

Evening

Social

  • Drinks with fellow participants and lecturers

Highlights

Check out video highlights, photos and interviews from our previous bootcamps.


Book your ticket

Event:
22 Apr, London
Location:
THECUBE, Studio 5, 155 Commercial Street, London E1 6BJ
Ticket:
Ticket includes course materials, code resources, lunch and networking drinks.

In-house Training

Get in touch to discuss your requirements by emailing contact@cambridgespark.com or by completing our contact form.

We can deliver this course as private training at your office on weekdays.

We can also design a bespoke curriculum matching your specific training objectives.