PySpark – What is SparkSession?

Since Spark 2.0, SparkSession has been the entry point to PySpark for working with RDDs and DataFrames.

 

What is SparkSession

SparkSession was introduced in Spark 2.0 as the entry point to the underlying PySpark functionality, used to programmatically create PySpark RDDs and DataFrames. Its object spark is available by default in the pyspark shell, and a session can also be created programmatically using the SparkSession builder.

 

1. SparkSession

With Spark 2.0, a new class SparkSession (pyspark.sql.SparkSession) was introduced. SparkSession is a combined class for all the different contexts we used to have prior to the 2.0 release (SQLContext, HiveContext, etc.). Since 2.0, SparkSession can be used in place of SQLContext, HiveContext, and the other contexts defined before 2.0.

SparkSession internally creates a SparkConf and a SparkContext from the configuration provided to its builder.

SparkSession also includes all the APIs available in different contexts –

  • SparkContext,
  • SQLContext,
  • StreamingContext,
  • HiveContext.

How many SparkSessions can you create in a PySpark application?

You can create as many SparkSession objects as you want in a PySpark application, using either SparkSession.builder or SparkSession.newSession(). Multiple session objects are useful when you want to keep PySpark tables (relational entities) logically separated.

 

2. SparkSession in PySpark shell

By default, the PySpark shell provides a "spark" object, which is an instance of the SparkSession class. We can use this object directly wherever required in the shell. Start the PySpark shell from the $SPARK_HOME/bin folder by entering the pyspark command.

 

Create SparkSession

To create a SparkSession programmatically (in a .py file), use the builder pattern via SparkSession.builder, as explained below. The getOrCreate() method returns an already existing SparkSession; if one does not exist, it creates a new SparkSession.

# Create SparkSession from builder
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("local[1]") \
    .appName("Sparktutorial.com") \
    .getOrCreate()
