Spark is a popular framework for processing big data, widely used in industry. This course gives you a short introduction to PySpark, the Python API for Spark.

The goal of the course is to get participants started with PySpark through interactive lessons covering the parts of PySpark that data scientists use most.

Participant profile

You have programming experience from your daily work with data or from your education, and a basic understanding of mathematics. You are looking for an introduction to PySpark so that you can start using it in your daily work.

Content

  1. Introduction: Introduction to the course.
  2. Azure Databricks Notebooks: Introduction to Azure Databricks Notebooks.
  3. Spark: Introduction to Spark.
  4. Spark SQL: The DataFrame API. Spark SQL. Data aggregation. Window functions. User-defined functions.
  5. Spark ML: Machine learning with pyspark.ml.