In [1]:
from brightway2 import *
In [2]:
projects.set_current("US LCI")
In [3]:
bw2setup()
Creating default biosphere

Applying strategy: normalize_units
Writing activities to SQLite3 database:
0%                          100%
[##############################] | ETA[sec]: 0.000 
Total time elapsed: 0.408 sec
Applying strategy: drop_unspecified_subcategories
Applied 2 strategies in 0.01 seconds
Title: Writing activities to SQLite3 database:
  Started: 09/29/2015 14:14:27
  Finished: 09/29/2015 14:14:28
  Total time elapsed: 0.408 sec
  CPU %: 98.300000
  Memory %: 0.273383
Created database: biosphere3
Creating default LCIA methods

Applying strategy: normalize_units
Applying strategy: set_biosphere_type
Applying strategy: drop_unspecified_subcategories
Applying strategy: link_iterable_by_fields
Applied 4 strategies in 1.12 seconds
Wrote 692 LCIA methods with 170915 characterization factors
Creating core data migrations

In [3]:
sp = SingleOutputEcospold1Importer(
    "/Users/cmutel/Documents/LCA Documents/US LCI database/2014", 
    "US LCI"
)
Extracting ecospold1 files:
0%                          100%
[##############################] | ETA[sec]: 0.000 | Item ID: /Users/cmutel/D
Total time elapsed: 1.018 sec
Title: Extracting ecospold1 files:
  Started: 09/29/2015 14:57:05
  Finished: 09/29/2015 14:57:06
  Total time elapsed: 1.018 sec
  CPU %: 100.000000
  Memory %: 0.300372
Extracted 702 datasets in 1.02 seconds
In [4]:
sp.apply_strategies()
Applying strategy: normalize_units
Applying strategy: assign_only_product_as_production
Applying strategy: clean_integer_codes
Applying strategy: drop_unspecified_subcategories
Applying strategy: normalize_biosphere_categories
Applying strategy: normalize_biosphere_names
Applying strategy: strip_biosphere_exc_locations
Applying strategy: set_code_by_activity_hash
Applying strategy: link_iterable_by_fields
Applying strategy: link_technosphere_by_activity_hash
Couldn't apply strategy link_technosphere_by_activity_hash:
	Not each object in database to be linked is unique with given fields. The following appear at least twice:
[{'categories': ['Crop Production', 'Wheat Farming'],
  'code': '341fb9c00b29da237263a75537cc5d76',
  'comment': '1 metric ton of wheat straw, dried to 12%\n'
             'moisture.\n'
             'Location:  North America\n'
             'Technology:  future\n'
             'Time period:  2022',
  'database': 'US LCI',
  'exchanges': [{'amount': 1.1723e-07,
                 'categories': (),
                 'comment': '30 year lifetime for storage from INL '
                            'feedstock design report.',
                 'loc': 1.1723e-07,
                 'location': 'RNA',
                 'name': 'Dummy_dried roughage store, non ventilated',
                 'type': 'technosphere',
                 'uncertainty type': 0,
                 'unit': 'cubic meter'},
                {'amount': 1.33,
                 'categories': (),
                 'loc': 1.33,
                 'location': 'RNA',
                 'name': 'Spring wheat straw, carted, 2022',
                 'type': 'technosphere',
                 'uncertainty type': 0,
                 'unit': 'ton'},
                {'amount': 3.4747e-05,
                 'categories': (),
                 'comment': 'conveyor for grinder in-feed system',
                 'loc': 3.4747e-05,
                 'location': 'RNA',
                 'name': 'Dummy_conveyor belt, at plant',
                 'type': 'technosphere',
                 'uncertainty type': 0,
                 'unit': 'meter'},
                {'amount': 2.7751,
                 'categories': (),
                 'loc': 2.7751,
                 'location': 'RNA',
                 'name': 'Dummy_fodder loading, by self-loading trailer',
                 'type': 'technosphere',
                 'uncertainty type': 0,
                 'unit': 'cubic meter'},
                {'amount': 33.3,
                 'categories': (),
                 'loc': 33.3,
                 'location': 'RNA',
                 'name': 'Dummy_maize drying',
                 'type': 'technosphere',
                 'uncertainty type': 0,
                 'unit': 'kilogram'},
                {'amount': 0.97003,
                 'categories': (),
                 'loc': 0.97003,
                 'location': 'RNA',
                 'name': 'Grinding',
                 'type': 'technosphere',
                 'uncertainty type': 0,
                 'unit': 'tn.sh'},
                {'amount': 1.43,
                 'categories': (),
                 'comment': 'loading bales for grinder. Calculation of '
                            'number of bales comes from 2000 lbs of corn '
                            'stover divided by weight per bale. Trailer '
                            'volume is 2511 ft^3 (INL table 4-5), density '
                            'is 12 lbs/ft^3 dry (=20 wet at 40% moisture).',
                 'loc': 1.43,
                 'location': 'RNA',
                 'name': 'Dummy_loading bales',
                 'type': 'technosphere',
                 'uncertainty type': 0,
                 'unit': 'unit'},
                {'amount': 0.9,
                 'categories': (),
                 'loc': 0.9,
                 'location': 'RNA',
                 'name': 'Dummy_agricultural machinery, general, '
                         'production',
                 'type': 'technosphere',
                 'uncertainty type': 0,
                 'unit': 'kilogram'},
                {'amount': 61.289,
                 'categories': (),
                 'comment': 'proxy: Electricity, medium voltage, at '
                            'grid/US with US electricity. Electricity '
                            'needed for dust collection.',
                 'loc': 61.289,
                 'location': 'RNA',
                 'name': 'Electricity, at grid, US, 2008',
                 'type': 'technosphere',
                 'uncertainty type': 0,
                 'unit': 'kilowatt hour'},
                {'amount': 0.333,
                 'categories': ('air',),
                 'comment': 'water dried per ton of final moisture corn '
                            'stover. drying from 34% moisture to 12% '
                            'moisture.',
                 'input': ('biosphere3',
                           '075e433b-4be4-448e-9510-9a5029c1ce94'),
                 'loc': 0.333,
                 'name': 'Water',
                 'type': 'biosphere',
                 'uncertainty type': 0,
                 'unit': 'cubic meter'},
                {'amount': 1.0,
                 'categories': (),
                 'loc': 1.0,
                 'location': 'RNA',
                 'name': 'Spring wheat straw, ground and stored, 2022',
                 'type': 'production',
                 'uncertainty type': 0,
                 'unit': 'ton'}],
  'filename': '/Users/cmutel/Documents/LCA Documents/US LCI '
              'database/2014/Spring wheat straw, production, average, US, '
              '2022.xml',
  'location': 'RNA',
  'name': 'Spring wheat straw, ground and stored, 2022',
  'production amount': 1.0,
  'products': [{'amount': 1.0,
                'categories': (),
                'loc': 1.0,
                'location': 'RNA',
                'name': 'Spring wheat straw, ground and stored, 2022',
                'type': 'production',
                'uncertainty type': 0,
                'unit': 'ton'}],
  'type': 'process',
  'unit': 'ton'}]
Applied 10 strategies in 0.66 seconds

OK, our first error. Two process datasets have the same process name. In this case it looks like one was a first draft and the other is the final dataset: one file is called "Spring wheat straw, production, average, US, 2022.xml", and the other is called "Spring wheat straw, ground and stored, 2022.xml". We will drop the dataset from the average production file.
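Duplicates like this can also be found before any linking strategy fails, by grouping datasets on the fields used for matching. The sketch below is a hypothetical helper, not part of bw2io, and it assumes the relevant fields are name, unit and location. The strategy that failed here uses its own activity hash, whose exact fields may differ. The toy data mirrors the error above, and the third entry is a made-up non-duplicate.

```python
from collections import defaultdict


def find_duplicate_datasets(datasets, fields=("name", "unit", "location")):
    """Group datasets by `fields` and return only the groups with more than one member.

    Hypothetical helper for illustration; not a bw2io function.
    """
    groups = defaultdict(list)
    for ds in datasets:
        key = tuple(ds.get(field) for field in fields)
        groups[key].append(ds.get("filename"))
    return {key: files for key, files in groups.items() if len(files) > 1}


# Toy data modelled on the error above (paths shortened); the last entry is made up.
datasets = [
    {"name": "Spring wheat straw, ground and stored, 2022", "unit": "ton", "location": "RNA",
     "filename": "Spring wheat straw, production, average, US, 2022.xml"},
    {"name": "Spring wheat straw, ground and stored, 2022", "unit": "ton", "location": "RNA",
     "filename": "Spring wheat straw, ground and stored, 2022.xml"},
    {"name": "Example non-duplicate", "unit": "kilogram", "location": "RNA",
     "filename": "Example non-duplicate.xml"},
]

for key, files in find_duplicate_datasets(datasets).items():
    print(key, files)
```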

In [5]:
bad_file = ('/Users/cmutel/Documents/LCA Documents/US LCI database/2014/'
            'Spring wheat straw, production, average, US, 2022.xml')
sp.data = [obj for obj in sp.data if obj.get('filename') != bad_file]

Now apply the last two strategies again; the error stopped the program before the linking was completed.

In [6]:
sp.apply_strategies(sp.strategies[-2:])
Applying strategy: link_iterable_by_fields
Applying strategy: link_technosphere_by_activity_hash
Applied 2 strategies in 0.34 seconds

The US LCI has "dummy" processes: links to activities which are real inputs, but which aren't modeled in the database. We need to add these dummy processes as real activities, even if they don't have any inputs themselves.
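The idea behind this step can be sketched in plain Python. This is a hypothetical illustration, not the bw2io implementation of `add_dummy_processes_and_rename_exchanges`. It assumes dummy exchanges are recognized by the `Dummy_` name prefix, and it builds the placeholder code by dropping the prefix and lowercasing, which matches the linked `Dummy_Disposal, ...` exchange printed later in this notebook.

```python
def add_placeholder_processes(datasets, database="US LCI", prefix="Dummy_"):
    """Add an input-free placeholder activity for every technosphere exchange whose
    name starts with `prefix`, and link those exchanges to it.

    Hypothetical sketch; not the bw2io implementation.
    """
    placeholders = {}
    for ds in datasets:
        for exc in ds.get("exchanges", []):
            name = exc.get("name", "")
            if exc.get("type") != "technosphere" or not name.startswith(prefix):
                continue
            clean_name = name[len(prefix):]
            # Assumed code convention: name without the prefix, lowercased.
            code = clean_name.lower()
            if code not in placeholders:
                # The first exchange seen defines the placeholder's unit and location.
                placeholders[code] = {
                    "database": database,
                    "code": code,
                    "name": clean_name,
                    "unit": exc.get("unit"),
                    "location": exc.get("location"),
                    "type": "process",
                    # Only a production exchange: the placeholder has no inputs.
                    "exchanges": [{
                        "name": clean_name,
                        "amount": 1.0,
                        "unit": exc.get("unit"),
                        "type": "production",
                        "input": (database, code),
                    }],
                }
            exc["input"] = (database, code)
    return datasets + list(placeholders.values())


example = [{
    "name": "Harvesting, fresh fruit bunch, at farm",
    "exchanges": [{
        "name": "Dummy_Disposal, solid waste, unspecified, to unspecified landfill",
        "type": "technosphere", "unit": "kilogram", "location": "US", "amount": 1.2e-05,
    }],
}]
result = add_placeholder_processes(example)
print(result[-1]["code"])
# → disposal, solid waste, unspecified, to unspecified landfill
```

Keying placeholders on the code alone means the same dummy name used with two different units would silently share one placeholder; a real implementation would need to decide how to handle that case.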

In [7]:
from bw2io.strategies import *
In [8]:
sp.apply_strategy(special.add_dummy_processes_and_rename_exchanges)
Applying strategy: add_dummy_processes_and_rename_exchanges

Let's see how things look. In an ideal dataset, everything would already be linked, but we know that this is not yet true for the US LCI.

In [9]:
sp.statistics()
1205 datasets
31272 exchanges
14076 unlinked exchanges
  Type biosphere: 1234 unique unlinked exchanges
  Type production: 931 unique unlinked exchanges
  Type substitution: 1 unique unlinked exchanges
  Type technosphere: 534 unique unlinked exchanges
Out[9]:
(1205, 31272, 14076)
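To make these numbers concrete, here is a rough re-creation of the kind of counts `sp.statistics()` reports. It is a hypothetical sketch: it assumes an exchange counts as linked when it has an `input` key, and that two unlinked exchanges are "the same" when they share name, categories, unit and location. The bw2io definition may differ.

```python
from collections import defaultdict


def summarize_links(datasets):
    """Return (datasets, exchanges, unlinked exchanges, unique unlinked per type).

    Hypothetical approximation of the counts printed by sp.statistics().
    """
    n_exchanges = 0
    n_unlinked = 0
    unique_by_type = defaultdict(set)
    for ds in datasets:
        for exc in ds.get("exchanges", []):
            n_exchanges += 1
            if exc.get("input"):
                continue  # already linked
            n_unlinked += 1
            key = (exc.get("name"), tuple(exc.get("categories") or ()),
                   exc.get("unit"), exc.get("location"))
            unique_by_type[exc.get("type")].add(key)
    unique_counts = {kind: len(keys) for kind, keys in unique_by_type.items()}
    return len(datasets), n_exchanges, n_unlinked, unique_counts


example = [{"exchanges": [
    {"name": "Carbon dioxide", "categories": ("air",), "unit": "kilogram", "type": "biosphere"},
    {"name": "Carbon dioxide", "categories": ("air",), "unit": "kilogram", "type": "biosphere"},
    {"name": "Nitrogen oxides", "categories": ("air",), "unit": "kilogram", "type": "biosphere",
     "input": ("biosphere3", "c1b91234-6f24-417b-8309-46111d09c457")},
]}]
print(summarize_links(example))
# → (1, 3, 2, {'biosphere': 1})
```

Two identical unlinked exchanges count twice toward the unlinked total but only once as a unique unlinked exchange, which is why the total (14076) is much larger than the sum of the per-type unique counts.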

We are now ready to start internally linking the database.

First, we migrate the names and categories of biosphere flows to their biosphere3 equivalents, and apply the default units migration.

In [10]:
sp.migrate("biosphere-2-3-names")
sp.migrate("biosphere-2-3-categories")
sp.migrate('default-units')
Applying strategy: migrate_datasets
Applied 1 strategies in 0.01 seconds
Applying strategy: migrate_exchanges
Applied 1 strategies in 0.15 seconds
Applying strategy: migrate_datasets
Applied 1 strategies in 0.01 seconds
Applying strategy: migrate_exchanges
Applied 1 strategies in 0.11 seconds
Applying strategy: migrate_datasets
Applied 1 strategies in 0.00 seconds
Applying strategy: migrate_exchanges
Applied 1 strategies in 0.10 seconds
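Conceptually, a migration is a lookup table: if an exchange's values for some fields match a row, the listed fields are overwritten with new values. The sketch below illustrates only that idea. It does not reproduce the bw2io migration format or its matching rules, and the one-row table is made up.

```python
def _hashable(value):
    # Categories may be stored as lists or tuples; lists cannot be dict keys.
    return tuple(value) if isinstance(value, list) else value


def apply_exchange_migration(datasets, fields, table):
    """Rewrite exchanges whose values for `fields` match a row in `table`.

    `table` is a list of (old_values, replacements) pairs: old_values is aligned
    with `fields`, and replacements is a dict of new field values.
    Hypothetical sketch of the idea behind migrations; not the bw2io code.
    """
    lookup = {tuple(_hashable(v) for v in old): new for old, new in table}
    for ds in datasets:
        for exc in ds.get("exchanges", []):
            key = tuple(_hashable(exc.get(field)) for field in fields)
            if key in lookup:
                exc.update(lookup[key])
    return datasets


# A made-up one-row table: rename a flow, but only when it is emitted to air.
table = [(("Old flow name", ("air",)), {"name": "New flow name"})]
data = [{"exchanges": [
    {"name": "Old flow name", "categories": ["air"], "unit": "kilogram"},
    {"name": "Old flow name", "categories": ["water"], "unit": "kilogram"},
]}]
apply_exchange_migration(data, ("name", "categories"), table)
print([exc["name"] for exc in data[0]["exchanges"]])
# → ['New flow name', 'Old flow name']
```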

Then we try to internally link the database by calling the match_database method. The first argument is None, i.e. we are not linking against another database, only doing internal linking. Because the US LCI doesn't use categories consistently in exchange definitions, we also pass ignore_categories=True.
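Judging by the strategy name printed below, `link_technosphere_based_on_name_unit_location`, internal linking matches on name, unit and location, with categories left out. Here is a hypothetical sketch of that logic, not the bw2io code; for simplicity it links only `technosphere` exchanges. Note that it has to refuse duplicate keys, since an exchange cannot be linked unambiguously to two datasets. This is the same kind of error we hit next.

```python
def link_internally(datasets, fields=("name", "unit", "location")):
    """Link unlinked technosphere exchanges to datasets in the same list.

    Hypothetical sketch; raises ValueError when two datasets share a key,
    analogous to the "Not each object ... is unique" error in this notebook.
    """
    index = {}
    for ds in datasets:
        key = tuple(ds.get(field) for field in fields)
        if key in index:
            raise ValueError(f"Duplicate dataset for fields {fields}: {key}")
        index[key] = (ds["database"], ds["code"])
    for ds in datasets:
        for exc in ds.get("exchanges", []):
            if exc.get("type") == "technosphere" and not exc.get("input"):
                key = tuple(exc.get(field) for field in fields)
                if key in index:
                    exc["input"] = index[key]
    return datasets


# Toy datasets with made-up codes.
grinding = {"database": "US LCI", "code": "a", "name": "Grinding", "unit": "tn.sh",
            "location": "RNA", "exchanges": []}
straw = {"database": "US LCI", "code": "b", "name": "Spring wheat straw, ground and stored, 2022",
         "unit": "ton", "location": "RNA",
         "exchanges": [{"name": "Grinding", "unit": "tn.sh", "location": "RNA",
                        "type": "technosphere"}]}
link_internally([grinding, straw])
print(straw["exchanges"][0]["input"])
# → ('US LCI', 'a')
```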

In [11]:
sp.match_database(None, ignore_categories=True)
Applying strategy: link_technosphere_based_on_name_unit_location
Couldn't apply strategy link_technosphere_based_on_name_unit_location:
	Not each object in database to be linked is unique with given fields. The following appear at least twice:
[{'categories': ['Crop Production', 'Other Noncitrus Fruit Farming'],
  'code': 'bc7fb2fb585b3565ca215412d8871cd3',
  'comment': ' Important note: although most of the data in the US LCI '
             'database has  undergone some sort of review, the database as '
             'a whole has not yet  undergone a formal validation process. '
             'Please email comments to [email protected]\n'
             'unspecified\n'
             'Location:  North America (US and Canada)\n'
             'Technology:  Harvesting of palm trees in Malaysia\n'
             'Production volume:  0',
  'database': 'US LCI',
  'exchanges': [{'amount': 1.6357e-05,
                 'categories': (),
                 'loc': 1.6357e-05,
                 'location': 'US',
                 'name': 'Diesel, combusted in industrial equipment',
                 'type': 'technosphere',
                 'uncertainty type': 0,
                 'unit': 'cubic meter'},
                {'amount': 1.2e-05,
                 'categories': (),
                 'input': ('US LCI',
                           'disposal, solid waste, unspecified, to '
                           'unspecified landfill'),
                 'loc': 1.2e-05,
                 'location': 'US',
                 'name': 'Dummy_Disposal, solid waste, unspecified, to '
                         'unspecified landfill',
                 'type': 'technosphere',
                 'uncertainty type': 0,
                 'unit': 'kilogram'},
                {'amount': 1.0,
                 'categories': ['natural resource', 'in ground'],
                 'loc': 1.0,
                 'name': 'Fresh fruit bunches',
                 'type': 'biosphere',
                 'uncertainty type': 0,
                 'unit': 'kilogram'},
                {'amount': 0.00856,
                 'categories': ('air',),
                 'loc': 0.00856,
                 'name': 'Carbon dioxide',
                 'type': 'biosphere',
                 'uncertainty type': 0,
                 'unit': 'kilogram'},
                {'amount': 2.3e-06,
                 'categories': ('air',),
                 'loc': 2.3e-06,
                 'name': 'Carbon monoxide',
                 'type': 'biosphere',
                 'uncertainty type': 0,
                 'unit': 'kilogram'},
                {'amount': 1e-06,
                 'categories': ('air',),
                 'input': ('biosphere3',
                           'd3260d0e-8203-4cbb-a45a-6a13131a5108'),
                 'loc': 1e-06,
                 'name': 'NMVOC, non-methane volatile organic compounds, '
                         'unspecified origin',
                 'type': 'biosphere',
                 'uncertainty type': 0,
                 'unit': 'kilogram'},
                {'amount': 5.5e-06,
                 'categories': ('air',),
                 'input': ('biosphere3',
                           'c1b91234-6f24-417b-8309-46111d09c457'),
                 'loc': 5.5e-06,
                 'name': 'Nitrogen oxides',
                 'type': 'biosphere',
                 'uncertainty type': 0,
                 'unit': 'kilogram'},
                {'amount': 6.6e-05,
                 'categories': ('air',),
                 'loc': 6.6e-05,
                 'name': 'Particulates, unspecified',
                 'type': 'biosphere',
                 'uncertainty type': 0,
                 'unit': 'kilogram'},
                {'amount': 2.3e-06,
                 'categories': ('air',),
                 'input': ('biosphere3',
                           'ba5fc0b6-770b-4da1-9b3f-e3b5087f07cd'),
                 'loc': 2.3e-06,
                 'name': 'Sulfur oxides',
                 'type': 'biosphere',
                 'uncertainty type': 0,
                 'unit': 'kilogram'},
                {'amount': 1.0,
                 'categories': (),
                 'loc': 1.0,
                 'location': 'RNA',
                 'name': 'Harvesting, fresh fruit bunch, at farm',
                 'type': 'production',
                 'uncertainty type': 0,
                 'unit': 'kilogram'}],
  'filename': '/Users/cmutel/Documents/LCA Documents/US LCI '
              'database/2014/Harvesting, fresh fruit bunch, at farm.xml',
  'location': 'RNA',
  'name': 'Harvesting, fresh fruit bunch, at farm',
  'production amount': 1.0,
  'products': [{'amount': 1.0,
                'categories': (),
                'loc': 1.0,
                'location': 'RNA',
                'name': 'Harvesting, fresh fruit bunch, at farm',
                'type': 'production',
                'uncertainty type': 0,
                'unit': 'kilogram'}],
  'type': 'process',
  'unit': 'kilogram'}]
Applied 1 strategies in 0.01 seconds

We find another error like before: the same process dataset appears under two different filenames.

In [12]:
[x['filename'] for x in sp.data if x['name'] == 'Harvesting, fresh fruit bunch, at farm']
Out[12]:
['/Users/cmutel/Documents/LCA Documents/US LCI database/2014/Fresh fruit bunches.xml',
 '/Users/cmutel/Documents/LCA Documents/US LCI database/2014/Harvesting, fresh fruit bunch, at farm.xml']

The "Harvesting..." dataset is older; presumably the "Fresh fruit..." dataset is the updated version. We can delete the older dataset and continue.

In [13]:
bad_file = '/Users/cmutel/Documents/LCA Documents/US LCI database/2014/Harvesting, fresh fruit bunch, at farm.xml'
sp.data = [obj for obj in sp.data if obj.get('filename') != bad_file]
In [14]:
sp.match_database(None, ignore_categories=True)
Applying strategy: link_technosphere_based_on_name_unit_location
Applied 1 strategies in 0.04 seconds

We have done what internal linking we can; now we need to link the biosphere flows. The code looks complicated, but it just links the biosphere flows by their names, units, and categories.

In [15]:
import functools
f = functools.partial(link_iterable_by_fields,
    other=Database(config.biosphere),
    kind='biosphere'
)
sp.apply_strategy(f)
Applying strategy: link_iterable_by_fields
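Why `functools.partial`? A strategy is presumably called with the imported data as its only argument, so extra parameters such as `other` and `kind` have to be bound in advance. The toy below shows the pattern. `apply_strategy` and `link_flows` here are simplified stand-ins written for this illustration, not the bw2io functions. The biosphere flow is the "Nitrogen oxides" flow linked in the output earlier.

```python
import functools


def apply_strategy(data, strategy):
    # Stand-in: the strategy receives only the data; configuration must already be bound.
    return strategy(data)


def link_flows(unlinked, other, kind, fields=("name", "categories", "unit")):
    """Toy linker: set `input` on unlinked exchanges of type `kind` that match a flow in `other`."""
    def key(obj):
        # Treat list and tuple categories the same way.
        return tuple(tuple(v) if isinstance(v, list) else v for v in (obj.get(f) for f in fields))

    index = {key(flow): (flow["database"], flow["code"]) for flow in other}
    for ds in unlinked:
        for exc in ds.get("exchanges", []):
            if exc.get("type") == kind and not exc.get("input") and key(exc) in index:
                exc["input"] = index[key(exc)]
    return unlinked


biosphere = [{"database": "biosphere3", "code": "c1b91234-6f24-417b-8309-46111d09c457",
              "name": "Nitrogen oxides", "categories": ("air",), "unit": "kilogram"}]
data = [{"exchanges": [{"name": "Nitrogen oxides", "categories": ["air"],
                        "unit": "kilogram", "type": "biosphere"}]}]

# Bind everything except the data, just as in the cell above.
f = functools.partial(link_flows, other=biosphere, kind="biosphere")
apply_strategy(data, f)
print(data[0]["exchanges"][0]["input"])
# → ('biosphere3', 'c1b91234-6f24-417b-8309-46111d09c457')
```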

Let's see how far we have got:

In [16]:
sp.statistics()
1204 datasets
31262 exchanges
10040 unlinked exchanges
  Type biosphere: 1154 unique unlinked exchanges
  Type production: 278 unique unlinked exchanges
  Type substitution: 1 unique unlinked exchanges
  Type technosphere: 243 unique unlinked exchanges
Out[16]:
(1204, 31262, 10040)

Not great.

Some of these unlinked exchanges are links to ecoinvent 2.2 datasets, which aren't part of this database, so we shouldn't expect them to link.

Let's export lists of what we have so far.

In [17]:
sp.write_excel(only_unlinked=True)
Wrote matching file to:
/Users/cmutel/Library/Application Support/Brightway3/US-LCI.dc95923157ce5b74345603ecff24cb4d/export/db-matching-US-LCI-unlinked.xlsx
In [18]:
sp.write_excel(only_names=True)
Wrote matching file to:
/Users/cmutel/Library/Application Support/Brightway3/US-LCI.dc95923157ce5b74345603ecff24cb4d/export/db-matching-US-LCI-names.xlsx
In [19]:
sp.write_excel()
Wrote matching file to:
/Users/cmutel/Library/Application Support/Brightway3/US-LCI.dc95923157ce5b74345603ecff24cb4d/export/db-matching-US-LCI.xlsx

Files

The Excel output files are available for download at https://bitbucket.org/cmutel/brightway2/src/tip/notebooks/files/?at=2.0. Click on "view raw" for each file to download it.

We can search the biosphere database to find out why some biosphere flows weren't linked. Take "Carbon dioxide", for example: it seems strange that this flow didn't link. Why not?

In [20]:
db = Database("biosphere3")
db.search("Carbon dioxide")
Out[20]:
['Carbon dioxide, in air' (kilogram, None, ('natural resource', 'in air')),
 'Carbon dioxide, fossil' (kilogram, None, ('air', 'urban air close to ground')),
 'Carbon dioxide, fossil' (kilogram, None, ('air', 'non-urban air or from high stacks')),
 'Carbon dioxide, fossil' (kilogram, None, ('air', 'lower stratosphere + upper troposphere')),
 'Carbon dioxide, fossil' (kilogram, None, ('air',)),
 'Carbon dioxide, fossil' (kilogram, None, ('air', 'low population density, long-term')),
 'Carbon dioxide, non-fossil' (kilogram, None, ('air', 'non-urban air or from high stacks')),
 'Carbon dioxide, non-fossil' (kilogram, None, ('air',)),
 'Carbon dioxide, non-fossil' (kilogram, None, ('air', 'urban air close to ground')),
 'Carbon dioxide, non-fossil' (kilogram, None, ('air', 'lower stratosphere + upper troposphere')),
 'Carbon dioxide, non-fossil' (kilogram, None, ('air', 'low population density, long-term')),
 'Carbon dioxide, to soil or biomass stock' (kilogram, None, ('soil', 'industrial')),
 'Carbon dioxide, from soil or biomass stock' (kilogram, None, ('air', 'indoor')),
 'Carbon dioxide, to soil or biomass stock' (kilogram, None, ('soil',)),
 'Carbon dioxide, from soil or biomass stock' (kilogram, None, ('air',)),
 'Carbon dioxide, to soil or biomass stock' (kilogram, None, ('soil', 'agricultural')),
 'Carbon dioxide, from soil or biomass stock' (kilogram, None, ('air', 'urban air close to ground')),
 'Carbon dioxide, from soil or biomass stock' (kilogram, None, ('air', 'non-urban air or from high stacks')),
 'Carbon dioxide, from soil or biomass stock' (kilogram, None, ('air', 'lower stratosphere + upper troposphere')),
 'Carbon dioxide, to soil or biomass stock' (kilogram, None, ('soil', 'forestry')),
 'Carbon dioxide, from soil or biomass stock' (kilogram, None, ('air', 'low population density, long-term'))]

There is no biosphere3 flow called just "Carbon dioxide". We would need to specify whether it is fossil or non-fossil, as these are handled differently in GWP calculations.
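The same diagnosis can be written as a small check. Exact matching needs a flow with this name and categories, and the search shows only qualified names. The helper is hypothetical, and the `(name, categories)` pairs are copied from part of the search output above.

```python
def candidate_flows(name, categories, flows):
    """Hypothetical helper to explain why an exchange did not link.

    Returns the flows with exactly this name and categories (what exact matching
    needs), and the flows with the same categories whose name adds a qualifier.
    """
    exact = [f for f in flows if f[0] == name and f[1] == categories]
    qualified = [f for f in flows if f[1] == categories and f[0].startswith(name + ", ")]
    return exact, qualified


# (name, categories) pairs taken from part of the search output above.
flows = [
    ("Carbon dioxide, in air", ("natural resource", "in air")),
    ("Carbon dioxide, fossil", ("air", "urban air close to ground")),
    ("Carbon dioxide, fossil", ("air",)),
    ("Carbon dioxide, non-fossil", ("air",)),
    ("Carbon dioxide, from soil or biomass stock", ("air",)),
    ("Carbon dioxide, to soil or biomass stock", ("soil",)),
]
exact, qualified = candidate_flows("Carbon dioxide", ("air",), flows)
print(exact)  # → []
print([n for n, _ in qualified])
# → ['Carbon dioxide, fossil', 'Carbon dioxide, non-fossil', 'Carbon dioxide, from soil or biomass stock']
```

Choosing among the qualified candidates is a modelling decision, not something the code can settle.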

For every unmatched exchange, there is a reason the computer couldn't match it exactly. The next step is to figure out the problem for each exchange, and then write a migration to fix the input data to match what is expected.
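One way to prioritize this work is to count the unlinked biosphere exchanges by name, categories and unit, and start with the most frequent ones. This is a hypothetical helper, assuming, as above, that linked exchanges carry an `input` key. The toy data is made up from flow names that appear in this notebook.

```python
from collections import Counter


def unlinked_report(datasets, kind="biosphere"):
    """Count unlinked exchanges of one type by (name, categories, unit), most frequent first.

    Hypothetical helper for deciding which migrations to write first.
    """
    counts = Counter()
    for ds in datasets:
        for exc in ds.get("exchanges", []):
            if exc.get("type") == kind and not exc.get("input"):
                key = (exc.get("name"), tuple(exc.get("categories") or ()), exc.get("unit"))
                counts[key] += 1
    # Ties keep the order in which keys were first seen.
    return counts.most_common()


data = [
    {"exchanges": [
        {"name": "Carbon dioxide", "categories": ("air",), "unit": "kilogram", "type": "biosphere"},
        {"name": "Particulates, unspecified", "categories": ("air",), "unit": "kilogram",
         "type": "biosphere"},
    ]},
    {"exchanges": [
        {"name": "Carbon dioxide", "categories": ("air",), "unit": "kilogram", "type": "biosphere"},
    ]},
]
for key, count in unlinked_report(data):
    print(count, key)
```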