Apache Spark is the next-generation successor to MapReduce. It is a powerful, open-source processing engine for data in a Hadoop cluster, optimized for speed, ease of use, and sophisticated analytics. Spark supports streaming data processing and complex, iterative algorithms, and it can run some in-memory workloads up to 100x faster than traditional Hadoop MapReduce programs.

This course gives you a kick start in the fundamentals of developing big data solutions on the Apache Spark platform. It balances theory with hands-on labs (more than 10 lab exercises) built around real-world use cases.

What Participants Will Learn

Attendees will learn the topics listed in the detailed course outline through lectures and hands-on exercises.


Duration

3 Days

Intended Audience

Architects, developers, and data scientists who wish to write, build, and maintain Apache Spark jobs.


Prerequisites

All programming is done in Python, so participants should have basic Python programming knowledge. Refreshing these skills beforehand is advised to get the most benefit from the workshop.

Detailed Course Outline

Big Data & Spark Overview

Spark Architecture – Deep Dive

Spark APIs & Usage

Working with Advanced Spark Features

Writing Spark Streaming Applications

Using Spark Machine Learning Algorithms

Optimizing and Tuning Spark Applications

End-to-End Use Case Implementation

Write to manaranjan@enablecloud.com or manaranjan@gmail.com if you would like a workshop conducted for your organization or group.