In [1]:
from brightway2 import *
from time import time

Disable the cache to get fair timings of the different approaches.

In [2]:
config.p["use_cache"] = False

Copy ecoinvent 2.2 to a new database using the JSON backend.

In [3]:
ei = Database("ecoinvent 2.2")

# Equivalently:
# from bw2data.backends import JSONDatabase
# db = JSONDatabase("ecoinvent 2.2 json")
# Note that the backend can't be changed after the Database object is created
db = Database("ecoinvent 2.2 json", backend="json")
db.register()

# Writing ~4000 files takes a little while
start = time()
db.write(ei.relabel_data(ei.load(), "ecoinvent 2.2 json"))
print time() - start

# So does loading everything again to process it
db.process()
47.1706380844
/Users/cmutel/local/bw2dev/lib/python2.7/site-packages/bw2data/data_store.py:46: UserWarning: 
	Brightway2 JSONDatabase: ecoinvent 2.2 json is not registered
  warnings.warn(u"\n\t%s is not registered" % self, UserWarning)

There are two main benefits to JSON databases.

First, each dataset is stored in a separate text file with indentation and line breaks, so JSON databases work well with version control.

Second, random access to individual datasets is faster.
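To make the version-control benefit concrete, here is a small standard-library sketch. Two versions of a made-up dataset are pretty-printed with json.dumps (one key per line, sorted keys) and compared with difflib. We assume, rather than know, that the JSON backend writes its files in a similar one-key-per-line style; the point is only that a one-field edit shows up as a one-line change.

```python
import difflib
import json

# Two versions of a made-up dataset that differ in a single field
old = {"name": "electricity production", "unit": "kilowatt hour", "location": "CH"}
new = dict(old, location="DE")


def to_lines(ds):
    # One key per line with a stable key order, as a VCS-friendly file might be written
    text = json.dumps(ds, indent=2, sort_keys=True, separators=(",", ": "))
    return text.splitlines()


# Keep only the added/removed lines, dropping the "---"/"+++" file headers
diff = [
    line
    for line in difflib.unified_diff(to_lines(old), to_lines(new), lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(diff))
```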

When a Database object is created, the correct backend class is chosen from the "backend" key in the database metadata. If that key is absent, the default backend, a single-file database, is used.
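A rough sketch of that dispatch, assuming a simple lookup table from backend names to classes. The two classes, the BACKENDS table and this Database function are hypothetical stand-ins, not bw2data's actual code; the metadata dictionary mirrors what the next cell prints.

```python
# Hypothetical stand-ins for the real backend classes
class SingleFileDatabase(object):
    def __init__(self, name):
        self.name = name


class JSONDatabase(object):
    def __init__(self, name):
        self.name = name


# Hypothetical mapping from the "backend" metadata value to a class
BACKENDS = {"default": SingleFileDatabase, "json": JSONDatabase}

# Stand-in for the databases metadata store
databases = {
    "ecoinvent 2.2 json": {"depends": ["biosphere"], "backend": "json"},
    "ecoinvent 2.2": {"depends": ["biosphere"], "version": 9},
}


def Database(name, backend=None):
    """Use an explicit backend if given, else the metadata key, else the default."""
    if backend is None:
        backend = databases.get(name, {}).get("backend", "default")
    return BACKENDS[backend](name)


print(type(Database("ecoinvent 2.2 json")).__name__)  # JSONDatabase
print(type(Database("ecoinvent 2.2")).__name__)  # SingleFileDatabase
```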

In [7]:
print databases["ecoinvent 2.2 json"]
print databases["ecoinvent 2.2"]
{u'depends': [u'biosphere'], u'backend': u'json'}
{u'depends': [u'biosphere'], u'version': 9}
In [6]:
db = Database("ecoinvent 2.2 json")
type(db)
Out[6]:
bw2data.backends.json.database.JSONDatabase
In [8]:
ei = Database("ecoinvent 2.2")
type(ei)
Out[8]:
bw2data.backends.default.database.SingleFileDatabase

Getting a single dataset, or even a single dataset key, from ecoinvent 2.2 requires loading the whole database. On the other hand, once it is loaded it is stored in the cache, so further access is fast.

In [9]:
config.p["use_cache"] = True
In [10]:
start = time()
print "First time slow"
print ei.random()
print time() - start

start = time()
print "Next time fast"
print ei.random()
print time() - start
First time slow
(u'ecoinvent 2.2', u'42bd150f219fde7cabe2b2541dc29548')
2.67773008347
Next time fast
(u'ecoinvent 2.2', u'8d0bc3cfb14fe7a5747eacd38cc4cb17')
0.000772953033447

The JSON database can list all keys without reading any of the dataset files.
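One way a per-dataset backend could list keys without opening any dataset is to derive them from file names. We don't know that bw2data's JSON backend does exactly this (it may keep a separate mapping of keys to files); write_dataset, list_keys and the code + ".json" naming scheme are hypothetical.

```python
import json
import os
import tempfile


def write_dataset(directory, code, dataset):
    # Hypothetical layout: one file per dataset, named after its code
    with open(os.path.join(directory, code + ".json"), "w") as f:
        json.dump(dataset, f, indent=2)


def list_keys(db_name, directory):
    """Build (database, code) keys from file names alone; no file is opened."""
    return sorted(
        (db_name, fn[:-len(".json")])
        for fn in os.listdir(directory)
        if fn.endswith(".json")
    )


directory = tempfile.mkdtemp()
for code in ("a1", "b2", "c3"):
    write_dataset(directory, code, {"name": "dataset " + code})

keys = list_keys("example json", directory)
print(keys)  # [('example json', 'a1'), ('example json', 'b2'), ('example json', 'c3')]
```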

In [11]:
start = time()
print "First time fast"
print db.random()
print time() - start

start = time()
print "Next time fast"
print db.random()
print time() - start
First time fast
('ecoinvent 2.2 json', u'bf5d6693ffb27e88dc021b3d17285a2c')
0.00043511390686
Next time fast
('ecoinvent 2.2 json', u'9474e77916f97e16425ecf2e34133990')
0.000362157821655

Getting a single dataset is also fast:

In [12]:
start = time()
ds = db.load()[db.random()]
print time() - start
0.00414896011353
In [14]:
ds.keys()
Out[14]:
[u'code',
 u'name',
 u'unit',
 u'key',
 u'exchanges',
 u'type',
 u'categories',
 u'location']

However, calling db.load() doesn't return the whole database (see https://bitbucket.org/cmutel/brightway2-data/src/default/bw2data/backends/json/sync_json_dict.py for the dirty details):

In [15]:
type(db.load())
Out[15]:
bw2data.backends.json.sync_json_dict.SynchronousJSONDict

Still, it does a good job of pretending to be a database loaded into memory:

In [18]:
print len(db.load())
print db.random() in db.load()
for index, x in enumerate(db.load()):
    print x
    if index > 5:
        break
4087
True
('ecoinvent 2.2 json', u'31be4cd55f5f227d0cb087637d32f14f')
('ecoinvent 2.2 json', u'a143e52d90b9eba20e43a96a2c89193a')
('ecoinvent 2.2 json', u'ab6f4f2d9765cd42d4e041c5596a580d')
('ecoinvent 2.2 json', u'0363effc128adc4042b0dd249a1ac32e')
('ecoinvent 2.2 json', u'1e81d6df5644b1c07cb902c0515358f6')
('ecoinvent 2.2 json', u'4741952ad13b80b4ecb1519b28481f59')
('ecoinvent 2.2 json', u'0ba6d475a1cc836e64010b1c3c8a2741')
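The behavior above (len, in, iteration and per-key loading) can be imitated with a MutableMapping backed by a directory of JSON files. This is a simplified, hypothetical sketch, not SynchronousJSONDict itself: keys are plain dataset codes rather than (database, code) tuples, each lookup reads and parses one file, and every assignment is written to disk immediately.

```python
import json
import os
import tempfile

try:
    from collections.abc import MutableMapping
except ImportError:  # Python 2
    from collections import MutableMapping


class LazyJSONDict(MutableMapping):
    """Hypothetical dict-like view of a directory holding one JSON file per dataset."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, code):
        return os.path.join(self.directory, code + ".json")

    def __getitem__(self, code):
        # Only the requested dataset is read from disk
        try:
            with open(self._path(code)) as f:
                return json.load(f)
        except IOError:
            raise KeyError(code)

    def __setitem__(self, code, value):
        # Write-through: the change is on disk as soon as it is assigned
        with open(self._path(code), "w") as f:
            json.dump(value, f, indent=2, sort_keys=True)

    def __delitem__(self, code):
        try:
            os.remove(self._path(code))
        except OSError:
            raise KeyError(code)

    def __iter__(self):
        # Keys come from file names; no dataset is parsed
        for fn in sorted(os.listdir(self.directory)):
            if fn.endswith(".json"):
                yield fn[:-len(".json")]

    def __len__(self):
        return sum(1 for _ in self)

    def __contains__(self, code):
        # Cheap membership test: check the file exists instead of parsing it
        return os.path.exists(self._path(code))


d = LazyJSONDict(tempfile.mkdtemp())
d["a1"] = {"name": "dataset a1"}
d["b2"] = {"name": "dataset b2"}
print(len(d))           # 2
print("a1" in d)        # True
print(list(d))          # ['a1', 'b2']
print(d["b2"]["name"])  # dataset b2
```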

One important difference is in how you update data. A dataset returned from a JSON database is not a normal dictionary, and you can't change it:

In [19]:
ds = db.load()[db.random()]
print type(ds)
ds['foo'] = 'bar'
<class 'bw2data.backends.json.sync_json_dict.frozendict'>
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-19-2b99ba6ce1ff> in <module>()
      1 ds = db.load()[db.random()]
      2 print type(ds)
----> 3 ds['foo'] = 'bar'

/Users/cmutel/local/bw2dev/lib/python2.7/site-packages/bw2data/backends/json/sync_json_dict.pyc in _blocked_attribute(obj)
     12     From http://code.activestate.com/recipes/414283-frozen-dictionaries/"""
     13     def _blocked_attribute(obj):
---> 14         raise AttributeError("A frozendict cannot be modified")
     15     _blocked_attribute = property(_blocked_attribute)
     16 

AttributeError: A frozendict cannot be modified
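A minimal frozendict can be written as a dict subclass whose mutating methods all raise AttributeError. This sketch only mimics the error message in the traceback; it is not the ActiveState recipe that bw2data links to, which blocks attributes differently. Note that dict(ds) still returns an ordinary, mutable copy, which is exactly what the next cell relies on.

```python
class frozendict(dict):
    """Read-only dict sketch: every mutating method raises AttributeError."""

    def _blocked(self, *args, **kwargs):
        raise AttributeError("A frozendict cannot be modified")

    __setitem__ = __delitem__ = __ior__ = _blocked
    clear = pop = popitem = setdefault = update = _blocked


ds = frozendict({"name": "dataset a1", "unit": "kg"})

try:
    ds["foo"] = "bar"
except AttributeError as err:
    message = str(err)
print(message)  # A frozendict cannot be modified

new_ds = dict(ds)       # a plain, mutable copy
new_ds["foo"] = "bar"
print(type(new_ds) is dict)  # True
```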

Instead, you must make a copy, make your changes, and then write the changed dictionary to the database:

In [20]:
key = db.random()
db_data = db.load()
ds = db_data[key]
new_ds = dict(ds)
new_ds['foo'] = 'bar'
db_data[key] = new_ds

This limitation exists because changes are saved to disk immediately, and Brightway2 cannot detect a change made inside a nested dictionary; modifying datasets in place is therefore not allowed at all.
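The reasoning can be demonstrated with a hypothetical WriteThroughDict whose __setitem__ stands in for "save this dataset to disk" (here it only increments a counter). Assigning a whole dataset triggers a save; appending to a list nested inside a dataset does not, because the outer mapping never sees that change.

```python
class WriteThroughDict(dict):
    """Hypothetical dict that 'saves' on every top-level assignment."""

    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self.saves = 0

    def __setitem__(self, key, value):
        dict.__setitem__(self, key, value)
        self.saves += 1  # a real backend would write the dataset file here


db_data = WriteThroughDict()
db_data["a1"] = {"name": "dataset a1", "exchanges": []}
print(db_data.saves)  # 1

# Mutating the nested dataset never calls db_data.__setitem__ ...
db_data["a1"]["exchanges"].append({"amount": 1.0})
print(db_data.saves)  # 1: in a file-backed store this change would be lost

# ... whereas copying, changing and reassigning the whole dataset does
new_ds = dict(db_data["a1"])
new_ds["foo"] = "bar"
db_data["a1"] = new_ds
print(db_data.saves)  # 2
```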

Instant syncing means you no longer need to call .write(); indeed, it doesn't do anything (https://bitbucket.org/cmutel/brightway2-data/src/default/bw2data/backends/json/database.py?at=default#cl-65). Note, however, that you still need to call .process() once you have finished making changes.