In [1]:
%%help
info
    Example: %%info
    Outputs session information for the current Livy endpoint.

cleanup
    Example: %%cleanup -f
    Deletes all sessions for the current Livy endpoint, including this notebook's session. The force flag is mandatory.

delete
    Example: %%delete -f -s 0
    Deletes a session by number for the current Livy endpoint. Cannot delete this kernel's session.

logs
    Example: %%logs
    Outputs the current session's Livy logs.

configure
    Example: %%configure -f
             {"executorMemory": "1000M", "executorCores": 4}
    Configures the session-creation parameters. The force flag is mandatory if a session has already been created, in which case the session will be dropped and recreated. See the Request Body of Livy's POST /sessions API for the list of valid parameters. Parameters must be passed in as a JSON string.

sql
    Example: %%sql -o tables -q
             SHOW TABLES
    Executes a SQL query against the sqlContext. Parameters:
      • -o VAR_NAME: The query result is made available in the %%local Python context as a Pandas dataframe.
      • -q: The magic returns None instead of the dataframe (no visualization).
      • -m METHOD: Sampling method, either take or sample.
      • -n MAXROWS: The maximum number of rows pulled from Livy to Jupyter. If this number is negative, the number of rows is unlimited.
      • -r FRACTION: Fraction used for sampling.

local
    Example: %%local
             a = 1
    All code in subsequent lines is executed locally. Code must be valid Python.
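As a quick illustration of %%configure: a cell like the following would drop any existing session and recreate it with the new settings (the -f flag is required once a session exists; the specific values below are placeholders, not recommendations):

%%configure -f
{"executorMemory": "2G", "executorCores": 4, "numExecutors": 2}

Any key accepted by the Request Body of Livy's POST /sessions API can appear in the JSON string.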
In [2]:
%%info
Current session configs: {'kind': 'pyspark', 'driverMemory': '1000M', 'executorCores': 2}
No active sessions.
In [3]:
%%logs
No logs yet.
In [4]:
sc.parallelize(range(1000)).count()
Creating SparkContext as 'sc'
ID  YARN Application ID  Kind     State  Spark UI  Driver log  Current session?
4   None                 pyspark  idle
Creating HiveContext as 'sqlContext'
SparkContext and HiveContext created. Executing user code ...
1000
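Any valid PySpark code can be submitted the same way. As a minimal sketch (assuming the session created above is still active), a map/reduce over the same kind of RDD:

rdd = sc.parallelize(range(1000))
rdd.map(lambda x: x * x).reduce(lambda a, b: a + b)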
In [5]:
import os
print(os.environ.get('SPARK_HOME', None))
print(os.environ.get('HADOOP_CONF_DIR', None))
/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark
/etc/hadoop/conf:/etc/hive/conf:/etc/hive/conf
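Those paths belong to the cluster node hosting the Livy session (a CDH 5.8 parcel layout), not to the machine running the notebook. To see the contrast, the same lookup can be run on the notebook machine with %%local; the output naturally depends on the local environment:

%%local
import os
print(os.environ.get('SPARK_HOME', None))
print(os.environ.get('HADOOP_CONF_DIR', None))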
In [6]:
%%info
Current session configs: {'kind': 'pyspark', 'driverMemory': '1000M', 'executorCores': 2}
ID  YARN Application ID  Kind     State  Spark UI  Driver log  Current session?
4   None                 pyspark  idle
In [7]:
sc.parallelize(range(1000)).count()
1000
In [8]:
sc.parallelize(range(2000)).count()
2000
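Because the session persists between cells, state defined in one cell remains available in later ones. A small sketch: define an RDD in one cell,

rdd = sc.parallelize(range(2000))

and use it in a later cell:

rdd.sum()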
In [9]:
%%sql 
show tables
In [10]:
%%sql
select * from movies_pq_s3 limit 100
In [11]:
%%sql -o ratings
select movieid, rating from ratings_pq_s3
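For large tables, the sampling flags listed under %%sql in the help output can limit how much data crosses from Livy to the notebook. A sketch (flag values are illustrative only) that pulls roughly a 10% sample of the same query, capped at 500 rows, into a local dataframe named ratings_sample:

%%sql -o ratings_sample -m sample -r 0.1 -n 500
select movieid, rating from ratings_pq_s3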
In [12]:
%%local
%matplotlib inline
import matplotlib
import seaborn as sns
import matplotlib.pyplot as plt
sns.distplot(ratings.rating, kde=False, rug=True)
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x115f7d0b8>
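Since -o exposed the query result as a Pandas dataframe, any local analysis applies to it as well; for instance, a quick sketch of summary statistics alongside the plot above:

%%local
print(ratings.shape)
print(ratings.rating.describe())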