Course · Data Science · Beginner

Apache Spark - Hands on with Big Data

An Apache Spark tutorial with 15+ hours of content and several hands-on examples of analyzing large data sets.

Zep Admin

₹999

  • 50 lessons
  • 0 students enrolled
  • English
  • Beginner

About this course

Are you interested in learning Apache Spark from scratch? Do you want to learn how to work with the Spark Core, Spark SQL, Spark Streaming, Spark GraphX, and Spark MLlib modules? Then this is the course for you!

Our comprehensive 15+ hour course is designed to provide you with a detailed introduction to Apache Spark and its various modules. We start by introducing you to the basics of Apache Spark, its architecture, and how to set up a Spark cluster. You'll learn how to write Spark programs using the Scala programming language and how to use various APIs to interact with Spark.

The Spark Core module covers the essential building blocks of Spark, such as Resilient Distributed Datasets (RDDs), transformations, and actions. We'll teach you how to work with RDDs, perform transformations on them, and apply actions to obtain the desired results.

In the Spark SQL module, you'll learn how to run SQL queries against Spark data sources, create DataFrames and Datasets, and manipulate them using DataFrame operations. The Spark Streaming module covers working with real-time data streams in Spark, enabling you to build scalable real-time data processing and analytics applications.

In the Spark GraphX module, you will learn how to work with graph data models, manipulate graphs, apply graph algorithms, and work with graph data sources. We'll teach you how to build graph-based processing pipelines that are optimized for performance and scalability.

Finally, we'll cover the Spark MLlib module, which provides a wide array of machine learning algorithms and data types. You'll learn how to apply various machine learning techniques on Spark, ranging from simple models such as linear regression to advanced techniques such as decision trees and random forests.

Throughout the course, you'll find multiple quiz questions to test your knowledge of the material. We'll provide detailed explanations for each question and give you feedback on how you can improve.

By the end of this course, you'll be confident in your ability to work with Apache Spark and its various modules. You'll possess the skills and knowledge needed to build scalable, high-performance data processing and analytics applications with Apache Spark. Enroll in our course today and start your journey towards mastering Apache Spark!
 

What you'll learn

  • Gain a comprehensive understanding of Apache Spark and its key features
  • Learn how to set up a Spark cluster and write Spark programs using the Scala programming language
  • Master the Spark Core module, including Resilient Distributed Datasets (RDDs), transformations, and actions
  • Explore the Spark SQL module and learn how to run SQL queries against Spark data sources, create DataFrames and Datasets, and manipulate them using DataFrame operations
  • Dive into real-time data streams with Spark Streaming, enabling you to build scalable real-time data processing and analytics applications
  • Work with graph data models, manipulate graphs, apply graph algorithms, and work with graph data sources in the Spark GraphX module
  • Apply various machine learning techniques on Spark with the Spark MLlib module, including simple models such as linear regression and advanced techniques such as decision trees and random forests
  • Test your knowledge with quiz questions provided throughout the course
  • Build scalable, high-performance data processing and analytics applications with Apache Spark

This course includes:

  • 16 hours of on-demand video
  • 100 quiz questions
  • 2 projects
  • Certificate of completion

Curriculum

1 section · 50 lessons

  • Introduction
  • Spark Configurations - Spark Versions
  • Spark Configurations - Spark context and sessions
  • Spark Configurations - Operations
  • Spark application execution continued
  • Spark official documentation
  • Spark vs. MapReduce (MR) difference
  • Spark Introduction Continued
  • Spark Introduction
  • Spark Features
  • Spark configurations
  • Apache Spark Availability
  • Spark Architecture
  • Spark User Interface
  • Shared Variables - Broadcast
  • Spark code - Shared Variables
  • Spark code - Shared Variables continued
  • Shared Variables - Accumulator
  • Word Count example execution
  • Output - Spark Application
  • RDD examples - parallelize method
  • Transformation & Action - RDD
  • RDD - Spark data objects
  • Quick Start VM - Vendors
  • Quick Start VM - Cloudera Practice
  • Spark Shell
  • Persist and Unpersist
  • Analysis on output
  • Hadoop Stack
  • Google Colab - Practice
  • ETL and how Spark fits in
  • ETL pipeline
  • Spark Core Closure
  • Directed Acyclic Graph continued
  • Directed Acyclic Graph
  • Tools comparisons
  • Cluster setup - Storage
  • Cluster Resources
  • Cluster Setup
  • Spark Installation and configs
  • Cluster Architecture
  • Cluster - Application Execution Modes
  • Big Data V's
  • Big Data continued
  • Big Data Introduction
  • Big Data Storage
  • Big Data Capabilities
  • Big Data Problems
  • Big Data Solutions to the problems
  • Spark Application Execution
Apache Spark - Hands on with Big Data | ZepAnalytics