In [1]:
%%help
info
    Example: %%info
    Outputs session information for the current Livy endpoint.

cleanup
    Example: %%cleanup -f
    Deletes all sessions for the current Livy endpoint, including this notebook's session. The force flag is mandatory.

delete
    Example: %%delete -f -s 0
    Deletes a session by number for the current Livy endpoint. Cannot delete this kernel's session.

logs
    Example: %%logs
    Outputs the current session's Livy logs.

configure
    Example: %%configure -f
             {"executorMemory": "1000M", "executorCores": 4}
    Configures the session-creation parameters. The force flag is mandatory if a session has already been created, in which case the session will be dropped and recreated. See the Request Body of Livy's POST /sessions API for the list of valid parameters. Parameters must be passed in as a JSON string.

sql
    Example: %%sql -o tables -q
             SHOW TABLES
    Executes a SQL query against the sqlContext. Parameters:
      • -o VAR_NAME: The query result is made available in the %%local Python context as a Pandas dataframe.
      • -q: The magic returns None instead of the dataframe (no visualization).
      • -m METHOD: Sampling method, either take or sample.
      • -n MAXROWS: The maximum number of rows pulled from Livy to Jupyter. If this number is negative, the number of rows is unlimited.
      • -r FRACTION: Fraction used for sampling.

local
    Example: %%local
             a = 1
    All code in subsequent lines is executed locally. Code must be valid Python.
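As a quick illustration of %%configure: a cell like the following would drop any existing session and recreate it with the new settings (the -f flag is required once a session exists; the specific values below are placeholders, not recommendations):

%%configure -f
{"executorMemory": "2G", "executorCores": 4, "numExecutors": 2}

Any key accepted by the Request Body of Livy's POST /sessions API can appear in the JSON string.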
In [2]:
%%info
Current session configs: {'kind': 'pyspark', 'driverMemory': '1000M', 'executorCores': 2}
No active sessions.
In [3]:
%%logs
No logs yet.
In [4]:
sc.parallelize(range(1000)).count()
Creating SparkContext as 'sc'
ID  YARN Application ID  Kind     State  Spark UI  Driver log  Current session?
4   None                 pyspark  idle
Creating HiveContext as 'sqlContext'
SparkContext and HiveContext created. Executing user code ...
1000
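Any valid PySpark code can be submitted the same way. As a minimal sketch (assuming the session created above is still active), a map/reduce over the same kind of RDD:

rdd = sc.parallelize(range(1000))
rdd.map(lambda x: x * x).reduce(lambda a, b: a + b)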
In [5]:
import os
print(os.environ.get('SPARK_HOME', None))
print(os.environ.get('HADOOP_CONF_DIR', None))
/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark
/etc/hadoop/conf:/etc/hive/conf:/etc/hive/conf
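Those paths belong to the cluster node hosting the Livy session (a CDH 5.8 parcel layout), not to the machine running the notebook. To see the contrast, the same lookup can be run on the notebook machine with %%local; the output naturally depends on the local environment:

%%local
import os
print(os.environ.get('SPARK_HOME', None))
print(os.environ.get('HADOOP_CONF_DIR', None))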
In [6]:
%%info
Current session configs: {'kind': 'pyspark', 'driverMemory': '1000M', 'executorCores': 2}
ID  YARN Application ID  Kind     State  Spark UI  Driver log  Current session?
4   None                 pyspark  idle
In [7]:
sc.parallelize(range(1000)).count()
1000
In [8]:
sc.parallelize(range(2000)).count()
2000
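Because the session persists between cells, state defined in one cell remains available in later ones. A small sketch: define an RDD in one cell,

rdd = sc.parallelize(range(2000))

and use it in a later cell:

rdd.sum()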
In [9]:
%%sql 
show tables
In [10]:
%%sql
select * from movies_pq_s3 limit 100
In [11]:
%%sql -o ratings
select movieid, rating from ratings_pq_s3
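For large tables, the sampling flags listed under %%sql in the help output can limit how much data crosses from Livy to the notebook. A sketch (flag values are illustrative only) that pulls roughly a 10% sample of the same query, capped at 500 rows, into a local dataframe named ratings_sample:

%%sql -o ratings_sample -m sample -r 0.1 -n 500
select movieid, rating from ratings_pq_s3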
In [12]:
%%local
%matplotlib inline
import matplotlib
import seaborn as sns
import matplotlib.pyplot as plt
sns.distplot(ratings.rating, kde=False, rug=True)
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x115f7d0b8>
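Since -o exposed the query result as a Pandas dataframe, any local analysis applies to it as well; for instance, a quick sketch of summary statistics alongside the plot above:

%%local
print(ratings.shape)
print(ratings.rating.describe())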