Rethink your database
RethinkDB is the open-source, scalable database that makes building realtime apps dramatically easier.
The query language is called ReQL
RethinkDB is NOT a good choice if you need full ACID support or strong schema enforcement
Open a terminal within the notebook server
From bash just run the command: rethinkdb
import rethinkdb
# establish a connection with the database
connection = rethinkdb.connect()
# execute the very first query
rethinkdb.db_list().run(connection)
['rethinkdb', 'test']
db = 'library'
collection = 'book'
# create db if not existing already
if db not in rethinkdb.db_list().run(connection):
rethinkdb.db_create(db).run(connection)
print("Created db", db)
Created db library
# create table if not existing already
query_db = rethinkdb.db(db)
if collection not in query_db.table_list().run(connection):
query_db.table_create(collection).run(connection)
print("Created collection", collection)
Created collection book
RethinkDB uses a special language/syntax (ReQL) to interact with the data.
ReQL is chainable. You start with a database, chain to table, and chain to other API to get what you want, in a way very natural.
ReQL is a particular binding to your language (!)
query_collection = rethinkdb.db(db).table(collection)
query_collection.insert({
'ISBN': 9780992461225,
'title': "JavaScript: Novice to Ninja",
'author': "Darren Jones",
'format': "ebook",
'price': 29.00,
'publisher': {
'name': "SitePoint",
'country': "Australia",
'email': "feedback@sitepoint.com"
}
}).run(connection)
{'deleted': 0, 'errors': 0, 'generated_keys': ['99ed3de1-1f41-4725-a7e2-8a5f9f311df6'], 'inserted': 1, 'replaced': 0, 'skipped': 0, 'unchanged': 0}
query_collection.run(connection)
<rethinkdb.net.DefaultCursor object at 0x7fcbbc43ff98 (done streaming): [{'title': 'JavaScript: Novice to Ninja', 'format': 'ebook', 'author': 'Darren Jones', 'ISBN': 9780992461225, 'publisher': {'country': 'Australia', 'email': 'feedback@sitepoint.com', 'name': 'SitePoint'}, 'price': 29, 'id': '99ed3de1-1f41-4725-a7e2-8a5f9f311df6'}]>
from beeprint import pp
pp(list(query_collection.run(connection)))
[ { 'ISBN': 9780992461225, 'author': 'Darren Jones', 'format': 'ebook', 'id': '99ed3de1-1f41-4725-a7e2-8a5f9f311df6', 'price': 29, 'publisher': { 'country': 'Australia', 'email': 'feedback@sitepoint.com', 'name': 'SitePoint', }, 'title': 'JavaScript: Novice to Ninja', }, ]
def print_rquery_result(query):
json_output = list(query.run(connection))
pp(json_output)
return
Why did we cast the result of the query to a list?
Default, no admin password
Default, no connection from outside the current host
more info: https://www.rethinkdb.com/docs/permissions-and-accounts/
JSON is the de facto standard when it comes to (un)serialising and exchanging data in web and mobile programming.
But how well do you really know JSON?
query_collection.insert({
'ISBN': 9780992463328,
'title': "Python: from Novice to Zen",
'subtitle': "If only I had read this book before",
'author': "Just Kidding",
'format': [
"illustrations",
"paper cover",
"ebook"
],
'price': 0.00,
'publisher': {
'name': "No Editor",
'country': "Wonderland",
'email': "feedback@wonderland.universe"
}
}).run(connection)
{'deleted': 0, 'errors': 0, 'generated_keys': ['854fd662-2b6a-4aaa-b2a2-3377764fcf77'], 'inserted': 1, 'replaced': 0, 'skipped': 0, 'unchanged': 0}
print_rquery_result(query_collection)
[ { 'ISBN': 9780992461225, 'author': 'Darren Jones', 'format': 'ebook', 'id': '99ed3de1-1f41-4725-a7e2-8a5f9f311df6', 'price': 29, 'publisher': { 'country': 'Australia', 'email': 'feedback@sitepoint.com', 'name': 'SitePoint', }, 'title': 'JavaScript: Novice to Ninja', }, { 'ISBN': 9780992463328, 'author': 'Just Kidding', 'format': ['illustrations', 'paper cover', 'ebook'], 'id': '854fd662-2b6a-4aaa-b2a2-3377764fcf77', 'price': 0, 'publisher': { 'country': 'Wonderland', 'email': 'feedback@wonderland.universe', 'name': 'No Editor', }, 'subtitle': 'If only I had read this book before', 'title': 'Python: from Novice to Zen', }, ]
import json
json.dump(list(query_collection.run(connection)), fp=open('test.json', 'w'))
%cat test.json
[{"title": "JavaScript: Novice to Ninja", "format": "ebook", "author": "Darren Jones", "ISBN": 9780992461225, "publisher": {"country": "Australia", "email": "feedback@sitepoint.com", "name": "SitePoint"}, "price": 29, "id": "99ed3de1-1f41-4725-a7e2-8a5f9f311df6"}, {"subtitle": "If only I had read this book before", "format": ["illustrations", "paper cover", "ebook"], "author": "Just Kidding", "title": "Python: from Novice to Zen", "ISBN": 9780992463328, "publisher": {"country": "Wonderland", "email": "feedback@wonderland.universe", "name": "No Editor"}, "price": 0, "id": "854fd662-2b6a-4aaa-b2a2-3377764fcf77"}]
import pandas as pd
pd.read_json('test.json')
ISBN | author | format | id | price | publisher | subtitle | title | |
---|---|---|---|---|---|---|---|---|
0 | 9780992461225 | Darren Jones | ebook | 99ed3de1-1f41-4725-a7e2-8a5f9f311df6 | 29 | {'country': 'Australia', 'email': 'feedback@si... | NaN | JavaScript: Novice to Ninja |
1 | 9780992463328 | Just Kidding | [illustrations, paper cover, ebook] | 854fd662-2b6a-4aaa-b2a2-3377764fcf77 | 0 | {'country': 'Wonderland', 'email': 'feedback@w... | If only I had read this book before | Python: from Novice to Zen |
About pandas&json, see also: https://www.dataquest.io/blog/python-json-tutorial/
Some JSON resources
Accepts regular expression.
Based on re2 syntax
list(
query_collection
.filter({'author': 'Just Kidding'})
.run(connection))
[{'ISBN': 9780992463328, 'author': 'Just Kidding', 'format': ['illustrations', 'paper cover', 'ebook'], 'id': '854fd662-2b6a-4aaa-b2a2-3377764fcf77', 'price': 0, 'publisher': {'country': 'Wonderland', 'email': 'feedback@wonderland.universe', 'name': 'No Editor'}, 'subtitle': 'If only I had read this book before', 'title': 'Python: from Novice to Zen'}]
list(query_collection.filter({'author': 'kidding'}).run(connection))
[]
list(
query_collection.filter(
rethinkdb.row['author'].match("(?i)kidding", )
).run(connection)
)
[{'ISBN': 9780992463328, 'author': 'Just Kidding', 'format': ['illustrations', 'paper cover', 'ebook'], 'id': '854fd662-2b6a-4aaa-b2a2-3377764fcf77', 'price': 0, 'publisher': {'country': 'Wonderland', 'email': 'feedback@wonderland.universe', 'name': 'No Editor'}, 'subtitle': 'If only I had read this book before', 'title': 'Python: from Novice to Zen'}]
Compare SQL to ReQL:
Like many traditional database systems, RethinkDB supports JOIN commands to combine data from multiple tables.
In RethinkDB joins are automatically distributed—a join command is automatically sent to the appropriate nodes across the cluster, the relevant data is combined, and the final result is presented to the user.
coljoin1 = "book_sql"
coljoin2 = "publisher_sql"
if coljoin1 not in query_db.table_list().run(connection):
query_db.table_create(coljoin1).run(connection)
if coljoin2 not in query_db.table_list().run(connection):
query_db.table_create(coljoin2).run(connection)
result = query_db.table(coljoin1).insert({
'name': "SitePoint",
'country': "Australia",
'email': "feedback@sitepoint.com"
}).run(connection)
id = result['generated_keys'].pop()
id
'bbd7694a-3be6-42ec-94dd-a999bee21588'
query_db.table(coljoin2).insert({
'ISBN': 9780992461225,
'title': "JavaScript: Novice to Ninja",
'author': "Darren Jones",
'format': "ebook",
'price': 29.00,
'publisher': id
}).run(connection)
{'deleted': 0, 'errors': 0, 'generated_keys': ['0b85761b-a8ca-40a9-902d-0f3136a484db'], 'inserted': 1, 'replaced': 0, 'skipped': 0, 'unchanged': 0}
rquery = query_db.table(coljoin2) \
.eq_join('publisher', query_db.table(coljoin1))
print_rquery_result(rquery)
[ { 'left': { 'ISBN': 9780992461225, 'author': 'Darren Jones', 'format': 'ebook', 'id': '0b85761b-a8ca-40a9-902d-0f3136a484db', 'price': 29, 'publisher': 'bbd7694a-3be6-42ec-94dd-a999bee21588', 'title': 'JavaScript: Novice to Ninja', }, 'right': { 'country': 'Australia', 'email': 'feedback@sitepoint.com', 'id': 'bbd7694a-3be6-42ec-94dd-a999bee21588', 'name': 'SitePoint', }, }, ]
rquery = query_db.table(coljoin2) \
.eq_join('publisher', query_db.table(coljoin1)) \
.zip()
print_rquery_result(rquery)
[ { 'ISBN': 9780992461225, 'author': 'Darren Jones', 'country': 'Australia', 'email': 'feedback@sitepoint.com', 'format': 'ebook', 'id': 'bbd7694a-3be6-42ec-94dd-a999bee21588', 'name': 'SitePoint', 'price': 29, 'publisher': 'bbd7694a-3be6-42ec-94dd-a999bee21588', 'title': 'JavaScript: Novice to Ninja', }, ]
How does it compare in performances to RDMS joins?
Note about relations and joins:
consider using a graphdb (e.g. neo4j).
import attr
@attr.s
class MyClass(object):
x = attr.ib()
y = attr.ib()
test = MyClass(x=1, y=2)
test
MyClass(x=1, y=2)
attr.asdict(test)
{'x': 1, 'y': 2}
if collection in query_db.table_list().run(connection):
query_db.table_drop(collection).run(connection)
query_db.table_create(collection).run(connection)
query_collection.insert(
attr.asdict(test)
).run(connection)
{'deleted': 0, 'errors': 0, 'generated_keys': ['e21cd210-2c21-4a45-9973-fabdcc5c9cb1'], 'inserted': 1, 'replaced': 0, 'skipped': 0, 'unchanged': 0}
print_rquery_result(query_collection)
[ { 'id': 'e21cd210-2c21-4a45-9973-fabdcc5c9cb1', 'x': 1, 'y': 2, }, ]
args = list(query_collection.run(connection)).pop()
args.pop('id')
test2 = MyClass(**args)
pp(test2)
instance(MyClass): MyClass(x=1, y=2)
test == test2
True
rethinkdb.db(db).table(collection).insert([
{
'id': 1,
'city': 'San Francisco',
'location': rethinkdb.point(-122.423246,37.779388)
},
{
'id': 2,
'city': 'San Diego',
'location': rethinkdb.point(-117.220406,32.719464)
}
]).run(connection)
{'deleted': 0, 'errors': 0, 'inserted': 2, 'replaced': 0, 'skipped': 0, 'unchanged': 0}
rethinkdb.db(db).table(collection).get(1)['location'].distance(
rethinkdb.db(db).table(collection).get(2)['location']).run(connection)
734125.2496021875
Coordinate system
Coordinates of points on the sphere’s surface are addressed by a pair of floating point numbers that denote longitude and latitude.
The range of longitude is −180 through 180, which wraps around the whole of the sphere: −180 and 180 denote the same line. The range of latitude is −90 (the south pole) through 90 (the north pole).
rethinkdb.db(db).table(collection).index_create('location', geo=True).run(connection)
{'created': 1}
point = rethinkdb.point(-121.422876,37.777128)
rethinkdb.db(db).table(collection).get_nearest(point, index='location').run(connection)
[{'dist': 88128.8029078556, 'doc': {'city': 'San Francisco', 'id': 1, 'location': {'$reql_type$': 'GEOMETRY', 'coordinates': [-122.423246, 37.779388], 'type': 'Point'}}}]
By default distances are specified in meters, but you can pass an optional argument to distance functions to specify kilometers, miles, nautical miles, and feet.
How is RethinkDB different from MongoDB?
RethinkDB is based on a fundamentally different architecture from MongoDB. Instead of polling for changes, the developer can tell RethinkDB to continuously push updated query results in realtime.
You can also write applications on top of RethinkDB using traditional query-response paradigm, and subscribe to realtime feeds later as you start adding realtime functionality to your app.
# register this notebook to modifications
query = rethinkdb.db(db).table(coljoin1)
try:
for change in query.changes().run(connection):
print("Subscription update!")
pp(change)
except Exception as e:
print("Table removed or connection aborted")
Where NoSQL is better than SQL:
1. It supports semi-structured data and volatile data
2. It does not have schema
3. Read/Write throughput is very high
4. Horizontal scalability can be achieved easily
5. Will support Bigdata in volumes of Terra Bytes & Peta Bytes
6. Provides good support for Analytic tools on top of Bigdata
7. Can be hosted in cheaper hardware machines
8. In-memory caching option is available to increase the performance of queries
9. Faster development life cycles for developers
Where RDBMS is better than NoSQL:
1. Transactions with ACID properties - Atomicity, Consistency, Isolation & Durability
2. Adherence to Strong Schema of data being written/read
3. Real time query management ( in case of data size < 1 0 10 Tera bytes )
4. Execution of complex queries involving join & group by clauses
A final list of db types
Final advices:
%reload_ext version_information
%version_information rethinkdb
Software | Version |
---|---|
Python | 3.5.2 64bit [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] |
IPython | 5.1.0 |
OS | Linux 4.4.24 moby x86_64 with glibc2.2.5 |
rethinkdb | 2.3.0.post6 |
Thu Oct 27 11:44:10 2016 UTC |