A Big Data Hadoop and Spark project for absolute beginners

Hadoop, Spark, Python, PySpark, Scala, Hive, coding framework, testing, IntelliJ, Maven, PyCharm, Glue, AWS, Streaming

What you’ll learn

  • Big Data, Hadoop and Spark from scratch by solving a real-world use case using Python and Scala
  • A real-world coding framework for Spark Scala and PySpark
  • Real-world coding best practices, logging, error handling and configuration management using both Scala and Python
  • A serverless big data solution using AWS Glue, Athena and S3



Get started with Big Data quickly by using a free cloud cluster and solving a real-world use case! Learn Hadoop, Hive and Spark (both Python and Scala) from scratch!

Learn to code Spark Scala and PySpark like a real-world developer. Understand real-world coding best practices, logging, error handling and configuration management using both Scala and Python.


A bank is launching a new credit card and wants to identify prospects it can target in its marketing campaign.

It has received prospect data from various internal and third-party sources. The data has several issues, such as missing or unknown values in certain fields, and needs to be cleansed before any kind of analysis can be done.

Because the data set is huge, with billions of records, the bank has asked you to use Big Data Hadoop and Spark technology to cleanse, transform and analyze it.
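To make the cleansing step concrete, here is a minimal plain-Python sketch of the kind of rule described above: treat empty or "unknown" values as missing, then drop records that lack a required field. The field names (`age`, `salary`, `country`), the list of missing-value markers and the helper names are hypothetical and not taken from the course data set. In the course itself this logic would be applied with Spark DataFrames rather than Python lists.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical markers that count as "missing"; the real data may use others.
MISSING_MARKERS = {"", "unknown", "na", "null"}

Record = Dict[str, Optional[str]]


def normalize(value: Optional[str]) -> Optional[str]:
    """Return None for missing or unknown values, else the trimmed value."""
    if value is None:
        return None
    cleaned = value.strip()
    if cleaned.lower() in MISSING_MARKERS:
        return None
    return cleaned


def cleanse(records: Iterable[Record], required: List[str]) -> List[Record]:
    """Normalize every field and drop records missing a required field."""
    result = []
    for record in records:
        normalized = {key: normalize(val) for key, val in record.items()}
        if all(normalized.get(field) is not None for field in required):
            result.append(normalized)
    return result


# Hypothetical sample of prospect records.
prospects = [
    {"age": "35", "salary": "52000", "country": "USA"},
    {"age": "unknown", "salary": "61000", "country": "UK"},
    {"age": "41", "salary": "", "country": " India "},
    {"age": "29", "salary": "48000", "country": None},
]

cleaned = cleanse(prospects, required=["age", "salary"])
print(cleaned)
```

Only `age` and `salary` are treated as required here, so a record with a missing `country` is kept with that field set to `None`. Which fields are mandatory is a business decision that this sketch simply assumes.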

What you will learn:

  • Big Data, Hadoop concepts
  • How to create a free Hadoop and Spark cluster using Google Dataproc
  • Hadoop hands-on – HDFS, Hive
  • Python basics
  • PySpark RDD – hands-on
  • PySpark SQL, DataFrame – hands-on
  • Project work using PySpark and Hive
  • Scala basics
  • Spark Scala DataFrame
  • Project work using Spark Scala
  • Spark Scala real-world coding framework and development using winutils, Maven and IntelliJ
  • Python Spark Hadoop Hive coding framework and development using PyCharm
  • Building a data pipeline using Hive, PostgreSQL and Spark
  • Logging, error handling and unit testing of PySpark and Spark Scala applications
  • Spark Scala Structured Streaming
  • Applying Spark transformations to data stored in AWS S3 using Glue and viewing the data using Athena
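As a taste of the logging, error handling and configuration topics in the list above, here is a minimal sketch using only the Python standard library. The section name `DB_CONFIGS`, the keys, the default value and the helper names are hypothetical and are not the course's actual framework. It reads settings in INI style with `configparser`, logs through the `logging` module, and falls back to a default when a setting is missing or invalid.

```python
import configparser
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("pipeline")

# Hypothetical configuration; a real project would keep this in a file.
SAMPLE_CONFIG = """
[DB_CONFIGS]
TARGET_PG_TABLE = prospects_clean
BATCH_SIZE = 500
"""


def load_config(text: str) -> configparser.ConfigParser:
    """Parse INI-style configuration text into a ConfigParser."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def get_batch_size(config: configparser.ConfigParser, default: int = 100) -> int:
    """Read BATCH_SIZE, logging and falling back to a default on failure."""
    try:
        return config.getint("DB_CONFIGS", "BATCH_SIZE")
    except (configparser.Error, ValueError) as exc:
        logger.error("Invalid or missing BATCH_SIZE, using %d: %s", default, exc)
        return default


config = load_config(SAMPLE_CONFIG)
batch_size = get_batch_size(config)
logger.info("Writing to %s in batches of %d",
            config.get("DB_CONFIGS", "TARGET_PG_TABLE"), batch_size)
```

Catching only configuration and conversion errors, rather than every exception, keeps unrelated bugs visible while still letting the job continue with a sensible default.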

Prerequisites:

Who this course is for:

  • Beginners who want to learn Big Data or experienced people who want to transition to a Big Data role
  • Big data beginners who want to learn how to code in the real world

Course content

19 sections • 95 lectures • 8h 41m total length
  • Introduction
  • Big Data Hadoop concepts
  • Hadoop – Hands-On
  • Spark concepts and hands-on
  • Project – Bank prospects marketing data cleansing using Spark
  • Running the project in Scala
  • Bank prospects data transformation using AWS S3, Glue and Athena
  • Advanced Hive
  • Advanced Spark
  • Spark Scala real world coding framework and best practices
  • A Data Pipeline with Spark Scala Hadoop PostgreSQL
  • Spark Scala Unit Testing using ScalaTest
  • Running Spark and Hive on a Cloudera QuickStart VM on GCP
  • Spark Scala – Structured Streaming
  • Creating a PySpark real world coding framework
  • PySpark Logging and Error Handling
  • Creating a Data Pipeline with Hadoop PySpark and PostgreSQL
  • PySpark – Reading Configuration from properties file
  • Unit testing PySpark application and spark-submit

Created by: FutureX Skill (Big Data, Cloud and AI Solution Architects)

Last updated 12/2020
3.5 GB (Direct Download Available)

(100 ratings)
4,649 students
