Introduction to Big Data and PySpark

 Upskill data scientists in the Big Data technology landscape and in PySpark as a distributed processing engine

LEVEL: BEGINNER
DURATION: 2-DAY COURSE
DELIVERED: AT YOUR OFFICE

What you will learn


This two-day course provides a hands-on introduction to the Big Data ecosystem, Hadoop and Apache Spark in practice.

Understand the challenges in the Big Data ecosystem
Describe the fundamentals of the Hadoop ecosystem
Use the core Spark RDD API to express data processing queries (see the sketch below)
Monitor and tune Spark applications
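
As a taster, the snippet below is a minimal sketch of the kind of RDD query covered on the course: a word count over a text file using the core RDD API. It assumes a local Spark installation; the application name and input path are purely illustrative.

    from pyspark import SparkConf, SparkContext

    # Minimal word-count sketch with the core RDD API (illustrative only).
    # Assumes Spark running locally; the input path is hypothetical.
    conf = SparkConf().setAppName("wordcount-sketch").setMaster("local[*]")
    sc = SparkContext(conf=conf)

    lines = sc.textFile("data/sample.txt")
    counts = (lines.flatMap(lambda line: line.split())   # split lines into words
                   .map(lambda word: (word, 1))          # pair each word with a count of 1
                   .reduceByKey(lambda a, b: a + b))     # sum the counts per word

    print(counts.take(10))  # first ten (word, count) pairs
    sc.stop()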

Languages and libraries:

Python 3
Spark

PREREQUISITES

Fundamentals of Python programming and use of the command line. You can acquire these skills at our Python bootcamp.

AUDIENCE

Those who are curious about the Big Data space and who want to feel comfortable getting their hands dirty with high volume, high velocity, diverse real-world datasets.

DAY ONE

Introduction to Big Data and Spark

DAY TWO

Spark in Practice

Case Studies

Learning Outcomes:

Get familiar with the concepts behind Big Data
Learn the theory behind Spark and how to process data in practice
Use Spark DataFrames
Learn the fundamentals of Spark tuning and partitioning
Be able to test Spark code
Get familiar with common big data formats and how to work with the Parquet format (see the sketch below)
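
The sketch below illustrates two of these outcomes together: loading data into a Spark DataFrame and writing the aggregated result out as Parquet, a columnar format that Spark reads and writes natively. The file paths and the customer_id column are hypothetical placeholders.

    from pyspark.sql import SparkSession

    # Illustrative sketch: CSV in, aggregated DataFrame out as Parquet.
    # Paths and column names are hypothetical placeholders.
    spark = SparkSession.builder.appName("dataframe-parquet-sketch").getOrCreate()

    df = spark.read.csv("data/transactions.csv", header=True, inferSchema=True)
    summary = df.groupBy("customer_id").count()   # number of rows per customer

    # Parquet stores data by column, which suits analytical Spark workloads.
    summary.write.mode("overwrite").parquet("data/transactions_summary.parquet")
    spark.stop()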

KATE Projects:

  • Big Data: Spark RDD API for data processing

Get in Touch 

Interested in learning more?

If you’re interested in what the ‘Introduction to Big Data and PySpark’ module could do for your team or department, please complete the form to the right of this text and we’ll get back to you within two working days with more information.

Get in touch now

Please complete all of the required fields to get in touch with us. Alternatively, call +44(0)7816 419378 or email contact@cambridgespark.com now