Since Spark 2.0, SparkSession has been the entry point to PySpark for working with RDDs and DataFrames.
What is SparkSession
SparkSession was introduced in version 2.0. It is the entry point to the underlying PySpark functionality for programmatically creating PySpark RDDs and DataFrames. A SparkSession object named spark is available by default in the pyspark shell, and you can create one programmatically using SparkSession.
1. SparkSession
With Spark 2.0, a new class, SparkSession (from pyspark.sql import SparkSession), was introduced. SparkSession combines all the different contexts we used to have before the 2.0 release (SQLContext, HiveContext, etc.), so since 2.0 it can be used in place of SQLContext, HiveContext, and the other contexts defined before 2.0.
SparkSession internally creates a SparkConf and a SparkContext from the configuration provided to the SparkSession.
SparkSession also provides the APIs available in the different contexts:
- SparkContext,
- SQLContext,
- StreamingContext,
- HiveContext.
How many SparkSessions can you create in a PySpark application?
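You can create as many SparkSession objects as you want in a PySpark application. Note that SparkSession.builder.getOrCreate() returns the already running session if there is one, so additional sessions are usually created by calling newSession() on an existing session; the new session shares the same SparkContext. Multiple SparkSession objects are useful when you want to keep PySpark tables (relational entities) logically separated.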
2. SparkSession in PySpark shell
By default, the PySpark shell provides a spark object, which is an instance of the SparkSession class. You can use this object directly wherever it is needed in the shell. To start the shell, go to the $SPARK_HOME\bin folder and run the pyspark command.
Create SparkSession
To create a SparkSession programmatically (in a .py file) in PySpark, use the builder pattern through SparkSession.builder, as shown below. The getOrCreate() method returns the existing SparkSession if there is one; otherwise, it creates a new SparkSession.
```python
# Create SparkSession from builder
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("local[1]") \
    .appName("Sparktutorial.com") \
    .getOrCreate()
```