Spark Magic

BeakerX has a Spark magic that provides deeper integration with Spark. It provides a GUI dialog for connecting to a cluster, a progress meter that reports on your job and links to the regular Spark UI, forwards kernel interrupts to the cluster so you can stop a job without leaving the notebook, and automatically displays Datasets with an interactive widget. Finally, it automatically closes the Spark session when the notebook is closed.

First, add Spark to the kernel's classpath with the %%classpath magic:

In [ ]:
%%classpath add mvn
org.apache.spark spark-sql_2.11 2.2.1

The %%spark cell magic can be run by itself in a cell. It produces a GUI dialog that you fill out to connect to your cluster.

In [ ]:
%%spark

Optionally, the contents of the cell can build a SparkSession to fill in default values for the GUI. Only one spark magic can be active at a time.

In [ ]:
%%spark
SparkSession.builder()
      .appName("BeakerX Demo")
      .master("local[4]")

You can also provide a --connect (or -c) option to automatically connect to the cluster.

In [ ]:
%%spark --connect
SparkSession.builder().master("local[100]")

In [ ]:
// Estimate Pi by Monte Carlo: sample random points in the unit square
// and count how many land inside the quarter circle.
val NUM_SAMPLES = 10000000

val count = spark.sparkContext.parallelize(1 to NUM_SAMPLES).map { i =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)

println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)
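The sampling step above can also be checked locally without a cluster. A minimal plain-Scala sketch (the seed and sample count here are arbitrary choices for illustration, not part of the demo):

```scala
// Plain-Scala version of the Monte Carlo step above; no cluster needed.
val localSamples = 100000
val rng = new scala.util.Random(42)  // fixed seed so the run is repeatable
val hits = (1 to localSamples).count { _ =>
  val x = rng.nextDouble()
  val y = rng.nextDouble()
  x * x + y * y < 1  // point falls inside the quarter circle
}
println("Pi is roughly " + 4.0 * hits / localSamples)
```

With Spark, the same per-sample work is simply distributed across the cluster by parallelize and combined with reduce.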

By default, the first 1000 rows of a Dataset are materialized for the preview.

In [ ]:
val tornadoesPath = java.nio.file.Paths.get("../resources/data/tornadoes_2014.csv").toAbsolutePath()

val ds = spark.read.format("csv").option("header", "true").load("file://" + tornadoesPath)
ds

Or you can use the display method to specify any number of rows.

In [ ]:
ds.display(1)