
Apache Spark - Hands on with Big Data

Apache Spark tutorial with 15+ hours of content and several hands-on examples of analyzing large data sets!

Rated 5.0 (1 rating)

Zep Admin

₹999

50 lessons
38 students enrolled
English
Beginner

About this course

Are you interested in learning Apache Spark from scratch? Do you want to learn how to work with the Spark Core, Spark SQL, Spark Streaming, Spark GraphX, and Spark MLlib modules? Then this is the perfect course for you!

Our comprehensive 15+ hour course is designed to provide you with a detailed introduction to Apache Spark and its various modules. We start by introducing you to the basics of Apache Spark, its architecture, and how to set up a Spark cluster. You'll learn how to write Spark programs using the Scala programming language and how to use various APIs to interact with Spark.

The Spark Core module covers the essential building blocks of Spark, such as Resilient Distributed Datasets (RDDs), transformations, and actions. We'll teach you how to work with RDDs, perform transformations on them, and apply actions to trigger the computation and return the results you need.

In the Spark SQL module, you'll learn how to run SQL queries against Spark data sources, create DataFrames and Datasets, and manipulate them using DataFrame operations. The Spark Streaming module covers processing real-time data streams, enabling you to build scalable real-time data processing and analytics applications.

In the Spark GraphX module, you will learn how to work with graph data models, manipulate graphs, apply graph algorithms, and work with graph data sources. We'll teach you how to build graph-based processing pipelines that are optimized for performance and scalability.

Finally, we'll cover the Spark MLlib module, which provides a wide array of machine learning algorithms and data types. You'll learn how to apply various machine learning techniques in Spark, ranging from simple models such as linear regression to more advanced techniques such as decision trees and random forests.

Throughout the course, you'll find multiple quiz questions to test your knowledge of the material. We'll provide detailed explanations for each question and give you feedback on how you can improve.

By the end of this course, you'll be confident in your ability to work with Apache Spark and its various modules. You'll possess the skills and knowledge needed to build scalable, high-performance data processing and analytics applications with Apache Spark. Enroll in our course today and start your journey towards mastering Apache Spark!

What you'll learn

Gain a comprehensive understanding of Apache Spark and its key features
Learn how to set up a Spark cluster and write Spark programs using the Scala programming language
Master the Spark Core module, including Resilient Distributed Datasets (RDDs), transformations, and actions
Explore the Spark SQL module and learn how to run SQL queries against Spark data sources, create DataFrames and Datasets, and manipulate them using DataFrame operations
Dive into real-time data streams with Spark Streaming, enabling you to build scalable real-time data processing and analytics applications
Work with graph data models, manipulate graphs, apply graph algorithms, and work with graph data sources in the Spark GraphX module
Apply various machine learning techniques on Spark with the Spark MLlib module, including simple models such as linear regression and advanced techniques such as decision trees and random forests
Test your knowledge with multiple quiz questions provided at the end of the course
Build scalable, high-performance data processing and analytics applications with Apache Spark

This course includes:

16 hours of on-demand video
100 quiz questions
2 projects
Certificate of completion

Curriculum

1 section (Spark Core) · 50 lessons

Introduction
Spark Configurations - Spark Versions
Spark Configurations - Spark context and sessions
Spark Configurations - Operations
Spark application execution continued
Spark official documentation
Spark and MapReduce (MR) differences
Spark Introduction Continued
Spark Introduction
Spark Features
Spark configurations
Apache Spark Availability
Spark Architecture
Spark User Interface
Shared Variables - Broadcast
Spark code - Shared Variables
Spark code - Shared Variables continued
Shared Variables - Accumulator
Word Count example execution
Output - Spark Application
RDD examples - parallelize method
Transformation & Action - RDD
RDD - Spark data objects
Quick Start VM - Vendors
Quick Start VM - Cloudera Practice
Spark Shell
Persist and Unpersist
Analysis of output
Hadoop Stack
Google Colab - Practice
ETL and how Spark fits in
ETL pipeline
Spark Core Closure
Directed Acyclic Graph continued
Directed Acyclic Graph
Tools comparison
Cluster setup - Storage
Cluster Resources
Cluster Setup
Spark Installation and configs
Cluster Architecture
Cluster - Application Execution Modes
Big Data V's
Big Data continued
Big Data Introduction
Big Data Storage
Big Data Capabilities
Big Data Problems
Big Data Solutions to the problems
Spark Application Execution
