Content of talk given at PyCarolinas 2012.
This material is on my GitHub page: https://github.com/tdhopper/Pickle-and-Redis.
import pickle
Dump the text string "abcdefg" to a file called "pickle_test."
pickle.dump("abcdefg", open("pickle_test", "wb"))
Pickle dumps are not designed to be read as text, although Python 2's default protocol (protocol 0) happens to produce printable ASCII:
print open("pickle_test", "r").read()
S'abcdefg'
p0
.
data1 = {'a': [1, 2.0, 3, 4+6j],
'b': ('string', u'Unicode string'),
'c': None}
pickle.dump(data1, open('data.pkl', 'wb'))
data2 = pickle.load(open('data.pkl', 'rb'))
data1 == data2
True
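The cells above leave it to the garbage collector to close the file handles. Here is a minimal sketch of the same round trip using `with` blocks, plus the in-memory `dumps`/`loads` pair. It is written for Python 3 (the talk's own cells are Python 2), and the temporary path is only an illustration.

```python
import os
import pickle
import tempfile

data1 = {'a': [1, 2.0, 3, 4 + 6j],
         'b': ('string', u'Unicode string'),
         'c': None}

# Hypothetical scratch location so the example does not clutter the working directory.
path = os.path.join(tempfile.mkdtemp(), 'data.pkl')

# Each with-block closes its file as soon as the dump or load finishes.
with open(path, 'wb') as out:
    pickle.dump(data1, out)
with open(path, 'rb') as src:
    data2 = pickle.load(src)

# dumps/loads perform the same round trip without touching the disk.
data3 = pickle.loads(pickle.dumps(data1))

print(data1 == data2 == data3)
```

Opening the file in binary mode (`'wb'` / `'rb'`) for both directions keeps the example portable across platforms and pickle protocols.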
What can be pickled?
(From the official documentation)
Pickle can handle much more than built-in types:
class PicklePerson(object):
    def __init__(self, name, age, location):
        self.name = name
        self.age = age
        self.location = location
    def __repr__(self):
        return "name: " + self.name + "\n" + "age: " + self.age + \
               "\n" + "location: " + self.location
todd = PicklePerson("Todd", "30", "Raleigh")
print todd
name: Todd
age: 30
location: Raleigh
pickle.dump(todd, open("pickle_todd", "wb"))
recovered_todd = pickle.load(open("pickle_todd", "rb"))
recovered_todd
name: Todd
age: 30
location: Raleigh
def f(x): return x+1
pickle.dump(f, open("pickle_good","wb"))
try:
    # Use a distinct name for the file handle so it does not shadow the function f above.
    with open("pickle_bad", "wb") as out:
        pickle.dump(lambda x: x+1, out)
except pickle.PicklingError:
    print "Can't pickle :-("
Can't pickle :-(
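The difference between `f` and the lambda comes from how pickle stores functions: it records a reference (module and name), not the code itself. The sketch below illustrates this for Python 3, using `math.sqrt` as a stand-in for any importable function. The exception raised for a lambda has varied between Python versions, so the example catches both likely types.

```python
import math
import pickle

# A named, importable function is stored as a reference (module + name),
# so unpickling hands back the very same function object.
payload = pickle.dumps(math.sqrt)
same_function = pickle.loads(payload)

# A lambda has no importable name, so pickling it fails. The exact exception
# type depends on the Python version and on where the lambda is defined.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False

print(same_function is math.sqrt, lambda_pickled)
```

A consequence is that a pickled function can only be loaded where the same module and name are importable.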
Also, from http://stackoverflow.com/a/11685634/982745:
class NotPickable(object):
    def __init__(self, x):
        self.attr = x
o = NotPickable(open('Pickle and Redis.ipynb', 'r+w'))
try:
    with open("pickle_bad", "wb") as out:
        # The open file stored on o.attr is what makes this fail.
        pickle.dump(o, out)
except TypeError:
    print "Can't pickle :-("
Can't pickle :-(
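One common way around an unpicklable attribute is to define `__getstate__` and `__setstate__`, so the object drops the file handle when pickled and reopens it when loaded. A minimal sketch for Python 3 follows; the `LogWriter` class and its file path are hypothetical and not part of the talk.

```python
import os
import pickle
import tempfile


class LogWriter(object):
    """Hypothetical class that owns an open file, like NotPickable above."""

    def __init__(self, path):
        self.path = path
        self.handle = open(path, 'a')

    def __getstate__(self):
        # Copy the instance dict and leave out the unpicklable file object.
        state = self.__dict__.copy()
        del state['handle']
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then reopen the file from its path.
        self.__dict__.update(state)
        self.handle = open(self.path, 'a')


path = os.path.join(tempfile.mkdtemp(), 'log.txt')
writer = LogWriter(path)
clone = pickle.loads(pickle.dumps(writer))
print(clone.path == writer.path)
writer.handle.close()
clone.handle.close()
```

Whether reopening the file on load is the right behaviour depends on the application; the point is only that the class decides what goes into the pickle.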
"cPickle can be up to 1000 times faster than pickle because the former is implemented in C. "
import cPickle, os
%timeit pickle.dump([data1 for x in range(1000)], open("pickle_todd", "wb"))
100 loops, best of 3: 7.8 ms per loop
%timeit cPickle.dump([data1 for x in range(1000)], open("pickle_todd", "wb"))
100 loops, best of 3: 2.76 ms per loop
For reference, the size of the pickle, in bytes, is:
os.path.getsize('/Users/tdhopper/Dropbox/PyCarolinas 2012/pickle_todd')
4112
However, "in the cPickle module the callables Pickler() and Unpickler() are functions, not classes. This means that you cannot use them to derive custom pickling and unpickling subclasses."
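In Python 3 there is no separate `cPickle` module; `pickle` uses its C implementation automatically when available. A frequently used compatibility import is sketched below, together with a size comparison between the ASCII protocol 0 and the highest binary protocol. The sample data is an assumption chosen only to make the size difference visible.

```python
try:
    import cPickle as pickle   # Python 2: the C implementation
except ImportError:
    import pickle              # Python 3: C-accelerated where available

data = list(range(1000))

# Protocol 0 is the ASCII format; HIGHEST_PROTOCOL selects the newest binary format.
ascii_dump = pickle.dumps(data, 0)
binary_dump = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

print(len(ascii_dump) > len(binary_dump))
print(pickle.loads(binary_dump) == data)
```

Choosing a binary protocol is often an easier win than switching modules, since it reduces both the bytes written and the parsing work on load.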
"Redis is an open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets." (http://redis.io/)
Some advantages:
Redis is easy to install on *nix systems:
$ wget http://redis.googlecode.com/files/redis-2.4.17.tar.gz
$ tar xzf redis-2.4.17.tar.gz
$ cd redis-2.4.17
$ make
(There's an unofficial Windows port.)
Start a redis server with:
$ redis-server
The Redis server can be accessed directly from the Redis Command Line Interface (CLI):
$ redis-cli
Setting and getting keys is easy:
redis> set foo bar
OK
redis> get foo
"bar"
redis> SET mykey "10"
OK
redis> INCR mykey
(integer) 11
redis> GET mykey
"11"
("Note: this is a string operation because Redis does not have a dedicated integer type.")
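APPEND adds to the end of a string and creates the key if it does not exist (the example below starts with no mykey):
redis> EXISTS mykey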
(integer) 0
redis> APPEND mykey "Hello"
(integer) 5
redis> APPEND mykey " World"
(integer) 11
redis> GET mykey
"Hello World"
APPEND gives a fast way to store a time series.
For counters like the INCR example above, also see DECR and INCRBY.
Slicing strings is easy:
redis> SET mykey "This is a string"
OK
redis> GETRANGE mykey 0 3
"This"
redis> GETRANGE mykey -3 -1
"ing"
redis> GETRANGE mykey 0 -1
"This is a string"
redis> RPUSH mylist "hello"
(integer) 1
redis> RPUSH mylist "world"
(integer) 2
redis> RPUSH mylist "HELLO" "PyCarolinas"
(integer) 4
redis> LRANGE mylist 0 -1
1) "hello"
2) "world"
3) "HELLO"
4) "PyCarolinas"
The maximum list size is $2^{32}-1 \approx 4\text{ billion}$.
Combine RPUSH, LPUSH, RPOP, and LPOP to create your favorite queue!
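As a rough sketch of the queue idea, RPUSH plus LPOP gives first-in, first-out behaviour. The helpers below call only `rpush` and `lpop`, which redis-py provides under those names. `FakeListClient` is a hypothetical in-memory stand-in so the example runs without a server; it is not part of redis-py.

```python
import collections


class FakeListClient(object):
    """Hypothetical in-memory stand-in exposing only rpush and lpop."""

    def __init__(self):
        self.lists = collections.defaultdict(collections.deque)

    def rpush(self, name, *values):
        self.lists[name].extend(values)
        return len(self.lists[name])

    def lpop(self, name):
        items = self.lists[name]
        return items.popleft() if items else None


def enqueue(client, queue_name, item):
    """Add an item at the tail of the queue (RPUSH)."""
    return client.rpush(queue_name, item)


def dequeue(client, queue_name):
    """Remove and return the item at the head of the queue (LPOP), or None if empty."""
    return client.lpop(queue_name)


client = FakeListClient()  # swap in redis.StrictRedis(...) to talk to a real server
for job in ["hello", "world", "PyCarolinas"]:
    enqueue(client, "jobs", job)
print(dequeue(client, "jobs"))
```

Using RPUSH with RPOP instead would give a stack. With a real server on Python 3, values come back as bytes unless the client is created with `decode_responses=True`.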
redis> SADD myset "Hello"
(integer) 1
redis> SADD myset "World"
(integer) 1
redis> SADD myset "World"
(integer) 0
redis> SMEMBERS myset
1) "World"
2) "Hello"
Get a random set item with SPOP or SRANDMEMBER.
"Redis Hashes are maps between string fields and string values."
redis> HMSET myhash field1 "Hello" field2 "World"
OK
redis> HGET myhash field1
"Hello"
redis> HGET myhash field2
"World"
"...every member of a Sorted Set is associated with score, that is used in order to take the sorted set ordered, from the smallest to the greatest score."
redis> ZADD myzset 1 "one"
(integer) 1
redis> ZADD myzset 1 "uno"
(integer) 1
redis> ZADD myzset 2 "two"
(integer) 1
redis> ZADD myzset 3 "two"
(integer) 0
redis> ZRANGE myzset 0 -1 WITHSCORES
1) "one"
2) "1"
3) "uno"
4) "1"
5) "two"
6) "3"
redis> ZRANGE myzset 0 -1
1) "one"
2) "uno"
3) "two"
Notice that "two" only appears once. When ZADD myzset 3 "two" is called, the score of "two" is updated from 2 to 3.
"While members are unique, scores may be repeated."
A Python interface to Redis is available at https://github.com/andymccurdy/redis-py
$ sudo pip install redis
Using redis from Python is as easy as importing the package and connecting to a server:
import redis
r = redis.StrictRedis(host='localhost', port=6379, db=0)
Setting and getting keys is easy:
r.set('foo', 'bar')
True
r.get('foo')
'bar'
%timeit r.set('foo', 'bar')
10000 loops, best of 3: 146 us per loop
%timeit r.get('foo')
10000 loops, best of 3: 160 us per loop
In general, the StrictRedis class implements commands identically to the redis-cli commands.
Let's create a set of words from a paragraph in the Wikipedia page on Redis:
import string
text = """
Redis typically holds the whole dataset in RAM. Versions up to 2.4 could be configured
to use virtual memory but this is now deprecated. Persistence is reached in two different
ways: One is called snapshotting, and is a semi-persistent durability mode where the dataset
is asynchronously transferred from memory to disk from time to time, written in RDB dump format.
Since version 1.1 the safer alternative is AOF, an append-only file (a journal) that is written
as operations modifying the dataset in memory are processed. Redis is able to rewrite the
append-only file in the background in order to avoid an indefinite growth of the journal."""
# Strip punctuation: http://stackoverflow.com/a/2402306/982745
text_list = [word.translate(None, string.punctuation) for word in text.split()]
# In case these words are already in redis, delete them.
for word in text_list:
    r.delete(word)
Create a set of the words:
for word in text_list:
    r.sadd("persistence", word)
print [r.srandmember('persistence') for i in range(10)] # Get ten random words
print [r.srandmember('persistence') for i in range(10)] # Get ten more random words
['of', 'avoid', 'semipersistent', 'from', 'rewrite', 'RAM', 'Persistence', 'memory', 'file', 'Since']
['a', 'virtual', 'alternative', 'where', 'semipersistent', 'modifying', 'in', 'an', 'asynchronously', 'rewrite']
Count the frequency of each word in this text:
for word in text_list:
    r.incr(word)
# Print the most-used words in this document:
for word in set(text_list):
    if int(r.get(word)) > 2:
        print word
        print "\t\t", r.get(word)
dataset 3 in 6 to 6 memory 3 is 8 the 7
The best part is that all this data will persist across your Python sessions!
bob = pickle.dumps(PicklePerson("bob","50","durham"))
print bob
ccopy_reg _reconstructor p0 (c__main__ PicklePerson p1 c__builtin__ object p2 Ntp3 Rp4 (dp5 S'age' p6 S'50' p7 sS'name' p8 S'bob' p9 sS'location' p10 S'durham' p11 sb.
r.set("bob", bob)
True
pickle.loads(r.get("bob"))
name: bob age: 50 location: durham
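The set/get pair above can be wrapped in two small helpers. The sketch below assumes a client object with redis-py style `set` and `get` methods. `FakeRedis` is a hypothetical dict-backed stand-in so the example runs without a server, and the key name is made up for illustration.

```python
import pickle


class FakeRedis(object):
    """Hypothetical dict-backed stand-in exposing only set and get."""

    def __init__(self):
        self.store = {}

    def set(self, key, value):
        self.store[key] = value
        return True

    def get(self, key):
        return self.store.get(key)


def save_object(client, key, obj):
    """Pickle obj and store the resulting bytes under key."""
    return client.set(key, pickle.dumps(obj))


def load_object(client, key):
    """Fetch the bytes stored under key and unpickle them; None if the key is missing."""
    raw = client.get(key)
    return None if raw is None else pickle.loads(raw)


client = FakeRedis()  # swap in redis.StrictRedis(...) to talk to a real server
save_object(client, "person:bob", {"name": "bob", "age": "50", "location": "durham"})
print(load_object(client, "person:bob"))
```

Two cautions apply when pointing this at a real server: unpickling executes whatever the stored bytes describe, so only load keys written by code you trust, and the client should return raw bytes (no `decode_responses=True`), since pickles are binary data.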
Redisco is a library built on redis-py that allows you to store objects in Redis.
import redisco
from redisco import connection_setup, models
redisco.connection_setup(host='localhost', port=6379, db=0)
class Person(models.Model):
    name = models.Attribute(required=True)
    age = models.Attribute(required=False)
    location = models.Attribute(required=False)
for x in Person.objects.filter(name="Tim"):
    x.delete()
tim_hopper = Person(name="Tim",age="26",location="Morrisville")
tim_smith = Person(name="Tim",age="75",location="Chapel Hill")
tim_hopper.save()
tim_smith.save()
True
Person.objects.filter(name="Tim")
[<Person:12 {'age': u'26', 'name': u'Tim', 'location': u'Morrisville'}>, <Person:13 {'age': u'75', 'name': u'Tim', 'location': u'Chapel Hill'}>]
Person.objects.filter(name="Tim", age="26")[0] == tim_hopper
True
Redisco is in version 0.1.4 and hasn't been updated recently. Nevertheless, it gives you an idea of what redis-py is capable of.