Introduction to Big Data and Spark

Learn about Big Data Architectures, Hadoop and Spark

LEVEL: BEGINNER
DURATION: 1-DAY COURSE
DELIVERED: AT YOUR OFFICE

What you will learn

This one-day course provides a hands-on introduction to the Big Data ecosystem, Hadoop, and Apache Spark.

Understand the challenges in the Big Data ecosystem
Describe the fundamentals of the Hadoop ecosystem
Use the core Spark RDD API to express data processing queries
Monitor and tune Spark data pipelines

Languages and libraries:

Python 3
Spark

PREREQUISITES

Elementary Python programming and use of the command line. You can acquire these skills at our Python bootcamp.

AUDIENCE

Those who are curious about the Big Data space and who want to feel comfortable getting their hands dirty with high-volume, high-velocity, diverse real-world datasets.

Get in touch with us to learn about the course

Session 1

Introduction to “Big Data”

  • Volume, Velocity, Variety
  • Scaling horizontally
  • Batch vs Streaming
  • NoSQL landscape
  • Lambda architecture

 

Session 2

Hadoop Ecosystem

  • Architecture overview
  • HDFS
  • The MapReduce pattern
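
As a taste of the MapReduce pattern covered in this session, here is a minimal sketch in plain Python rather than Hadoop itself. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are illustrative assumptions, not part of any Hadoop API; a real cluster would run the same three phases in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big ideas", "data at scale"]  # made-up input
    print(word_count(sample))
```

Run on the two sample lines, this counts "big" and "data" twice each and every other word once.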

Session 3

Spark

  • Architecture overview
  • Resilient Distributed Datasets (RDDs)
  • Transformations, actions, and the DAG
  • RDD programming API
  • Using Amazon EMR and Spark

 

Session 4

Tuning

  • RDD caching
  • Broadcast variables
  • Accumulators
  • Pipeline tuning

Get in Touch

CONTACT US

We will email you within the next 24 hours to arrange a quick call to help with any questions about the programme and recommend pre-course materials.

We look forward to speaking with you.

Dr. Raoul-Gabriel Urma

Director