%autosave 10
Idea: store the data in a DB, then load a post-processed subset into pandas. It'd be nice if we could just do everything on a regular laptop.
!!AI isn't this why HDF5 was invented? Or is that only suitable for numeric data?
import pandas as pd
import pandasql
# Useful shim, saves typing
pysqldf = lambda q: pandasql.sqldf(q, globals())
# !!AI maybe use examples from Intro to Data Science course,
# it's identical to this.
The most interesting part of SQLAlchemy isn't that it's an Object Relational Mapper (ORM).
It's that it executes queries in layers, and the ORM layer is optional.
!!AI the speaker gives an SQLAlchemy tutorial via IPython Notebook.
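To make the "layers" point concrete, here is a minimal sketch (not the speaker's code) that uses SQLAlchemy without any ORM classes. It assumes SQLAlchemy is installed; the in-memory SQLite engine and the `posts` table with its columns are hypothetical stand-ins.

```python
# Sketch only: SQLAlchemy Core with no ORM classes involved.
# The "posts" table and its columns are hypothetical.
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")  # in-memory SQLite stand-in
metadata = sa.MetaData()
posts = sa.Table(
    "posts",
    metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("title", sa.String),
    sa.Column("score", sa.Integer),
)
metadata.create_all(engine)

with engine.begin() as conn:
    # Expression layer: a Python object that compiles to an INSERT.
    conn.execute(
        posts.insert(),
        [
            {"id": 1, "title": "Limits", "score": 5},
            {"id": 2, "title": "Primes", "score": 12},
        ],
    )
    # Expression layer again: a SELECT built from table and column objects.
    query = posts.select().where(posts.c.score > 10)
    high_scoring = [row[1] for row in conn.execute(query)]
    # Lowest layer: plain SQL text sent through the same connection.
    total = conn.execute(sa.text("SELECT COUNT(*) FROM posts")).scalar()
```

Nothing here defines a mapped class or a session; the ORM would sit on top of these same pieces if it were wanted.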
math.stackexchange.com
Posts.xml
Use etree.iterparse because the XML file is massive; don't load it all into memory.
# !!AI won't run, just the gist
import pandas.io.sql
import psycopg2
connection = psycopg2.connect() # !!AI TODO fill in
math_by_date = pandas.io.sql.read_sql("""\
SELECT ...
FROM...
WHERE ...
AND .
AND ...
GROUP BY ...
""", connection)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-cbea362ebd4b> in <module>()
     11 AND ...
     12 GROUP BY ...
---> 13 """, connection)

/usr/local/lib/python2.7/site-packages/pandas/io/sql.pyc in read_frame(sql, con, index_col, coerce_float, params)
    158         List of parameters to pass to execute method.
    159     """
--> 160     cur = execute(sql, con, params=params)
    161     rows = _safe_fetch(cur)
    162     columns = [col_desc[0] for col_desc in cur.description]

/usr/local/lib/python2.7/site-packages/pandas/io/sql.pyc in execute(sql, con, retry, cur, params)
     51     except Exception:
     52         try:
---> 53             con.rollback()
     54         except Exception:  # pragma: no cover
     55             pass

AttributeError: 'NoneType' object has no attribute 'rollback'
Error on sql SELECT ... FROM... WHERE ... AND .. AND ... GROUP BY ...
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 0))
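The earlier steps in these notes (stream Posts.xml with iterparse, store the rows in a DB, then aggregate by date) can be sketched with the standard library alone. This is a rough illustration, not the speaker's code: the inline sample, the attribute names (`Id`, `PostTypeId`, `CreationDate`), the `posts` table, and the use of SQLite instead of PostgreSQL are all assumptions.

```python
# Sketch only: stream a Posts.xml-style file into a database row by row.
import io
import sqlite3
import xml.etree.ElementTree as ET

# Tiny stand-in for the real dump; the attribute names are assumed.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" CreationDate="2010-07-20T19:09:27.200" />
  <row Id="2" PostTypeId="2" CreationDate="2010-07-20T19:12:14.353" />
  <row Id="3" PostTypeId="1" CreationDate="2010-07-21T08:01:02.000" />
</posts>
"""


def iter_rows(source):
    """Yield (id, post_type, created) tuples without keeping the whole tree."""
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            yield (
                int(elem.get("Id")),
                int(elem.get("PostTypeId")),
                elem.get("CreationDate"),
            )
            # Drop already-processed children so memory use stays flat.
            root.clear()


conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, post_type INTEGER, created TEXT)"
)
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", iter_rows(io.BytesIO(SAMPLE)))
conn.commit()

# Count rows with post_type = 1 per calendar day.
questions_by_date = conn.execute(
    "SELECT substr(created, 1, 10) AS day, COUNT(*) FROM posts "
    "WHERE post_type = 1 GROUP BY day ORDER BY day"
).fetchall()
```

For the real file, `iter_rows` would take an open file or a path in place of the `io.BytesIO` wrapper; the generator feeds `executemany` lazily, so only one row is in memory at a time.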
# More work with pandas.io.sql
pandas.io.sql.read_sql knows about HSTORE, obviously, because it executes queries directly using a psycopg2 connection.
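As a rough illustration of "executes queries directly", the sketch below passes a raw DBAPI connection to `pandas.io.sql.read_sql`. It assumes pandas is installed and swaps in the standard-library `sqlite3` for psycopg2 so it runs without a PostgreSQL server, which means it cannot show HSTORE itself. The `posts` table and its rows are made up.

```python
# Sketch only: read_sql sends the SQL string through the DBAPI connection.
import sqlite3

import pandas.io.sql

conn = sqlite3.connect(":memory:")  # stand-in for psycopg2.connect(...)
conn.execute("CREATE TABLE posts (id INTEGER, created TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [(1, "2010-07-20"), (2, "2010-07-20"), (3, "2010-07-21")],
)

# The query runs in the database; only the aggregated rows reach pandas.
posts_by_date = pandas.io.sql.read_sql(
    """
    SELECT created, COUNT(*) AS n_posts
    FROM posts
    GROUP BY created
    ORDER BY created
    """,
    conn,
)
```

The result is an ordinary DataFrame with one row per date, which matches the idea at the top of these notes: aggregate in the DB, then hand a small subset to pandas.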