PySpark – Python Spark Hadoop coding framework & testing

Big data with Python Spark (PySpark): coding framework, logging, error handling, unit testing, PyCharm, PostgreSQL, Hive and data pipelines

What you’ll learn

  • Industry-standard Python Spark (PySpark) coding practices: logging, error handling, reading configuration and unit testing
  • Building a data pipeline using Hive, Spark and PostgreSQL
  • Python Spark Hadoop development using PyCharm
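The logging and error-handling practices listed above can be illustrated with a minimal sketch that uses only Python's standard `logging` module, so it runs without a Spark installation. The logger name `pyspark_pipeline`, the `PipelineError` exception and the `run_stage` helper are hypothetical and are not taken from the course. In a PySpark job, the stage functions would wrap Spark reads, transformations and writes.

```python
import logging
import sys

# Hypothetical logger name; a real framework might read it from configuration.
logger = logging.getLogger("pyspark_pipeline")


def setup_logging(level=logging.INFO):
    """Send log records to stderr with a timestamped format."""
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(level)


class PipelineError(Exception):
    """Hypothetical exception raised when a pipeline stage fails."""


def run_stage(name, func, *args):
    """Run one pipeline stage, logging start/finish and wrapping any failure."""
    logger.info("Starting stage %s", name)
    try:
        result = func(*args)
    except Exception as exc:
        # logger.exception logs at ERROR level and includes the traceback
        logger.exception("Stage %s failed", name)
        # Chain the original error so the root cause is not lost
        raise PipelineError(f"stage {name!r} failed") from exc
    logger.info("Finished stage %s", name)
    return result


if __name__ == "__main__":
    setup_logging()
    rows = run_stage("ingest", lambda: [1, 2, 3])
    total = run_stage("transform", sum, rows)
    logger.info("Total = %d", total)
```

Wrapping every failure in a single pipeline-level exception lets a caller, such as a driver script, catch one type. The `from exc` chaining keeps the original traceback available for debugging.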

This course bridges the gap between academic and real-world knowledge and prepares you for an entry-level Big Data Python Spark developer role. You will learn the following:

  • Python Spark coding best practices
  • Logging
  • Error Handling
  • Reading configuration from a properties file
  • Developing in PyCharm
  • Using your local environment as a Hadoop Hive environment
  • Reading and writing to a Postgres database using Spark
  • Python unit testing framework
  • Building a data pipeline using Hadoop, Spark and Postgres
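Reading configuration from a properties file can be sketched with Python's standard `configparser` module. This sketch assumes an INI-style file with `[section]` headers, because `configparser` rejects files that have none. The section names, keys and JDBC URL below are illustrative assumptions and are not the course's actual file.

```python
import configparser

# Hypothetical properties-file contents; section names, keys and the
# JDBC URL are illustrative assumptions, not the course's actual file.
SAMPLE_PROPERTIES = """
[postgres]
url = jdbc:postgresql://localhost:5432/mydb
dbtable = public.customers
user = etl_user

[spark]
app_name = pyspark_pipeline
shuffle_partitions = 8
"""


def load_config_text(text):
    """Parse INI-style properties text into a ConfigParser."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def load_config_file(path):
    """Parse an INI-style properties file from disk."""
    parser = configparser.ConfigParser()
    with open(path, encoding="utf-8") as fh:
        parser.read_file(fh)
    return parser


config = load_config_text(SAMPLE_PROPERTIES)
jdbc_url = config.get("postgres", "url")
table = config.get("postgres", "dbtable")
partitions = config.getint("spark", "shuffle_partitions")  # converted to int
```

Values such as `jdbc_url` and `table` could then be passed to Spark's JDBC data source through options like `url`, `dbtable` and `user`. This keeps connection details out of the pipeline code.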

Prerequisites:

  • Basic programming skills
  • Basic database knowledge
  • Entry-level Hadoop knowledge

Who this course is for:

  • Students looking to move from an academic Big Data Spark background to a real-world developer role

Course content

8 sections • 33 lectures • 2h 3m total length
  • Introduction
  • Setting up Hadoop Spark development environment
  • Creating a PySpark coding framework
  • Logging and Error Handling
  • Creating a Data Pipeline with Hadoop, Spark and PostgreSQL
  • Reading configuration from a properties file
  • Unit testing PySpark application
  • spark-submit
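A minimal sketch of a unit test with Python's built-in `unittest` framework follows. To keep it runnable without Spark, the function under test, `clean_records`, is a hypothetical plain-Python transformation over a list of dicts. A PySpark test would typically build a local `SparkSession` in `setUpClass`, for example with `SparkSession.builder.master("local[1]").getOrCreate()`. It would stop the session in `tearDownClass` and compare DataFrame contents instead.

```python
import unittest


def clean_records(rows):
    """Hypothetical transformation: drop rows without an id and trim names."""
    cleaned = []
    for row in rows:
        if row.get("id") is None:
            continue  # skip incomplete records
        cleaned.append({"id": row["id"], "name": row.get("name", "").strip()})
    return cleaned


class CleanRecordsTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Shared fixture built once per test class; a PySpark test would
        # typically create a local SparkSession here instead.
        cls.rows = [
            {"id": 1, "name": "  Alice "},
            {"id": None, "name": "Ghost"},
            {"id": 2, "name": "Bob"},
        ]

    def test_drops_rows_without_id(self):
        result = clean_records(self.rows)
        self.assertEqual([r["id"] for r in result], [1, 2])

    def test_trims_names(self):
        result = clean_records(self.rows)
        self.assertEqual(result[0]["name"], "Alice")
```

If the file is saved as, say, `test_clean_records.py`, the tests can be run with `python -m unittest test_clean_records`.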

Created by: FutureX Skill (Big Data, Cloud and AI Solution Architects)

Last updated 12/2020
Language: English
Rating: 4.3 out of 5 (15 ratings), 4,086 students