Importing a SimaPro CSV that includes ecoinvent

This notebook shows how to import a SimaPro CSV export that includes ecoinvent 2.2. We will use a custom strategy to remove the ecoinvent processes, and then link to the ecoinvent already present on our machine.

In [1]:
from brightway2 import *

Basic setup

Start a new project, and install base data

In [2]:
projects.set_current("simapro-ecoinvent-import")
In [3]:
bw2setup()
Creating default biosphere

Applying strategy: drop_unspecified_subcategories
Writing activities to SQLite3 database:
0%                          100%
[##############################] | ETA[sec]: 0.000 
Total time elapsed: 0.666 sec
Title: Writing activities to SQLite3 database:
  Started: 05/19/2015 08:46:17
  Finished: 05/19/2015 08:46:17
  Total time elapsed: 0.666 sec
  CPU %: 91.200000
  Memory %: 1.037788
Created database: biosphere3
Creating default LCIA methods

Applying strategy: set_biosphere_type
Applying strategy: drop_unspecified_subcategories
Applying strategy: link_iterable_by_fields
Wrote 692 LCIA methods with 170915 characterization factors
Creating core data migrations

Import ecoinvent 2.2

In [4]:
ei22 = SingleOutputEcospold1Importer(
    "/Users/cmutel/Documents/LCA Documents/Ecoinvent/2.2/processes",
    "ecoinvent 2.2"
)
ei22.apply_strategies()
ei22.write_database()
Extracting ecospold1 files:
0%                          100%
[##############################] | ETA[sec]: 0.000 | Item ID: /Users/cmutel/D
Total time elapsed: 12.012 sec
Title: Extracting ecospold1 files:
  Started: 05/19/2015 08:47:38
  Finished: 05/19/2015 08:47:50
  Total time elapsed: 12.012 sec
  CPU %: 87.100000
  Memory %: 2.893019
Extracted 4087 datasets in 12.18 seconds
Applying strategy: assign_only_product_as_production
Applying strategy: clean_integer_codes
Applying strategy: drop_unspecified_subcategories
Applying strategy: normalize_biosphere_categories
Applying strategy: normalize_biosphere_names
Applying strategy: strip_biosphere_exc_locations
Applying strategy: set_code_by_activity_hash
Applying strategy: link_iterable_by_fields
Applying strategy: link_technosphere_by_activity_hash
Writing activities to SQLite3 database:
0%                          100%
[##############################] | ETA[sec]: 0.000 
Total time elapsed: 20.534 sec
Title: Writing activities to SQLite3 database:
  Started: 05/19/2015 08:47:55
  Finished: 05/19/2015 08:48:15
  Total time elapsed: 20.534 sec
  CPU %: 86.500000
  Memory %: 3.013086
Created database: ecoinvent 2.2

Load SimaPro CSV export

In [4]:
sp = SimaProCSVImporter("/Users/cmutel/Downloads/Furniture particle board and hardboard.CSV", "particle-board")
sp.statistics()
NameError
   1000/_NCV/(1+_Percent_humidity_u)*_C_ratio*_CO2_conversion
                                      ^^^
name '_C_ratio' is not defined
NameError
   _Furniture_lifetime_equivalence*_furniture_elasticity
name '_Furniture_lifetime_equivalence' is not defined
NameError
   _Furniture_weight_ppbox
name '_Furniture_weight_ppbox' is not defined
NameError
   _Furniture_weight_metalbox
name '_Furniture_weight_metalbox' is not defined
NameError
   (_Furniture_PPbox_PP)/0.9
     ^^^
name '_Furniture_PPbox_PP' is not defined
NameError
   (_Furniture_PPbox_Isorel)/0.9
     ^^^
name '_Furniture_PPbox_Isorel' is not defined
NameError
   _Furniture_PPbox_Melamine
name '_Furniture_PPbox_Melamine' is not defined
NameError
   _Furniture_PPbox_ABS
name '_Furniture_PPbox_ABS' is not defined
NameError
   _Furniture_PPbox_steel
name '_Furniture_PPbox_steel' is not defined
NameError
   _Furniture_PPbox_steel
name '_Furniture_PPbox_steel' is not defined
NameError
   (_Furniture_PPbox_PP*_G_PP_Density+_Furniture_PPbox_Isorel*_G_Hardboard_density)/0.9-(_Furniture_PPbox_PP*_G_PP_Density+_Furniture_PPbox_Isorel*_G_Hardboard_density)
     ^^^
name '_Furniture_PPbox_PP' is not defined
NameError
   _EU_Steel_REcycling
name '_EU_Steel_REcycling' is not defined
NameError
   1/(1+0.2)*(0.6-0.2)*1.4*3.6/0.9*_Eu_Per_Gas
                                    ^^^
name '_Eu_Per_Gas' is not defined
---------------------------------------------------------------------------
CapitalizationError                       Traceback (most recent call last)
<ipython-input-4-75468c1fb3f4> in <module>()
----> 1 sp = SimaProCSVImporter("/Users/cmutel/Downloads/Furniture particle board and hardboard.CSV", "particle-board")
      2 sp.statistics()

/Users/cmutel/local34/bw3dev/lib/python3.4/site-packages/bw2io/importers/simapro_csv.py in __init__(self, filepath, name, delimiter, encoding, normalize_biosphere, biosphere_db)
     37                 delimiter=delimiter,
     38                 name=name,
---> 39                 encoding=encoding,
     40             )
     41         print(u"Extracted {} unallocated datasets in {:.2f} seconds".format(

/Users/cmutel/local34/bw3dev/lib/python3.4/site-packages/bw2io/extractors/simapro_csv.py in extract(cls, filepath, delimiter, name, encoding)
    102                     filepath,
    103                     global_parameters,
--> 104                     project_metadata
    105                 )
    106                 datasets.append(ds)

/Users/cmutel/local34/bw3dev/lib/python3.4/site-packages/bw2io/extractors/simapro_csv.py in read_data_set(cls, data, index, db_name, filepath, gp, pm)
    512         ps = ParameterSet(
    513             ds['parameters'],
--> 514             {key: value['amount'] for key, value in gp.items()}
    515         )
    516         # Changes in-place

/Users/cmutel/local34/bw3dev/lib/python3.4/site-packages/bw2parameters/parameter_set.py in __init__(self, params, global_params)
     21                 )
     22 
---> 23         self.order = self.get_order()
     24 
     25     def get_order(self):

/Users/cmutel/local34/bw3dev/lib/python3.4/site-packages/bw2parameters/parameter_set.py in get_order(self)
     48                         u"\n".join([
     49                             u"'{}'' not found; could be '{}'".format(a, b)
---> 50                             for a, b in wrong_case
     51                         ])
     52                     )

CapitalizationError: Possible errors in upper/lower case letters for some parameters.
'_per_load'' not found; could be '_per_load'
'_EU_conso_semi_min_l_km'' not found; could be '_EU_Conso_semi_min_l_km'
'_EU_conso_semi_max_l_km'' not found; could be '_EU_Conso_semi_max_l_km'

OK, the SimaPro export file has several cases where the parameter names were incorrect - SimaPro appears to ignore the difference between upper and lower case for parameter names, so for SimaPro "Foo" and "foo" are the same thing, which raises errors for our parameter parser. These have to be fixed manually in the CSV file. Basically, fix the errors that come up until the code runs without any errors.

In [3]:
sp = SimaProCSVImporter("/Users/cmutel/Downloads/Furniture particle board and hardboard-fixed.CSV", "particle-board")
sp.statistics()
Extracted 2045 unallocated datasets in 13.11 seconds
2045 datasets
85048 exchanges
85048 unlinked exchanges
  Type biosphere: 1798 unique unlinked exchanges
  Type production: 2045 unique unlinked exchanges
  Type substitution: 8 unique unlinked exchanges
  Type technosphere: 2075 unique unlinked exchanges
Out[3]:
(2045, 85048, 85048)

Unit conversions

To match against ecoinvent using the unit field, we need to have everything in the right units. We accomplish this by applying a migration that converts units to ecoinvent defaults.

In [4]:
sp.migrate('default-units')
Applying strategy: migrate_datasets
Applying strategy: migrate_exchanges

Strip ecoinvent processes

We are lucky in this case, as we know that the version of ecoinvent was not modified. If there were modifications, we would have to include the whole modified database. No software that I know of, including Brightway2, will keep a log of the modified processes after a database import, though this is certainly on our radar. But at least in this simple case, we can just delete the ecoinvent processes.

To do this, we use the most common technique in the IO library - matching based on a certain number of attributes. In this case, because we are importing from SimaPro, the categories will not match ecoinvent categories. However, at least for version 2.2, the names, units, and locations will (after we normalize things, and ignoring SimaPro changes to capitalization).

First, however, we need to apply some basic strategies.

In [5]:
sp.strategies[:5]
Out[5]:
[<function bw2io.strategies.generic.assign_only_product_as_production>,
 <function bw2io.strategies.biosphere.drop_unspecified_subcategories>,
 <function bw2io.strategies.simapro.sp_allocate_products>,
 <function bw2io.strategies.simapro.split_simapro_name_geo>,
 <function bw2io.strategies.biosphere.strip_biosphere_exc_locations>]
In [6]:
sp.apply_strategies(sp.strategies[:5])
Applying strategy: assign_only_product_as_production
Applying strategy: drop_unspecified_subcategories
Applying strategy: sp_allocate_products
Applying strategy: split_simapro_name_geo
Applying strategy: strip_biosphere_exc_locations

Now we can try to find Ecoinvent datasets, and remove them:

In [7]:
from bw2io.utils import activity_hash

def match_by_activity_hash(data):
    ei22 = Database("ecoinvent 2.2")
    fields = ["name", "unit", "location"]
    ei22_hashes = {activity_hash(ds, fields=fields) for ds in ei22}
    assert len(ei22) == len(ei22_hashes)
    return [obj for obj in data if activity_hash(obj, fields=fields) not in ei22_hashes]
In [8]:
sp.apply_strategy(match_by_activity_hash)
sp.statistics()
Applying strategy: match_by_activity_hash
110 datasets
14992 exchanges
14992 unlinked exchanges
  Type biosphere: 1500 unique unlinked exchanges
  Type production: 110 unique unlinked exchanges
  Type substitution: 8 unique unlinked exchanges
  Type technosphere: 232 unique unlinked exchanges
Out[8]:
(110, 14992, 14992)

Internal and external linking

First, construct internal links within the database:

In [9]:
sp.apply_strategy(sp.strategies[5])
Applying strategy: link_technosphere_based_on_name_unit_location

Next, link biosphere flows:

In [10]:
sp.apply_strategies(sp.strategies[6:])
Applying strategy: normalize_biosphere_categories
Applying strategy: normalize_simapro_biosphere_categories
Applying strategy: normalize_biosphere_names
Applying strategy: normalize_simapro_biosphere_names
Applying strategy: link_iterable_by_fields

Finally, link to ecoinvent 2.2:

In [11]:
sp.match_database("ecoinvent 2.2", ignore_categories=True)
sp.statistics()
Applying strategy: link_technosphere_based_on_name_unit_location
110 datasets
14992 exchanges
238 unlinked exchanges
  Type biosphere: 139 unique unlinked exchanges
  Type technosphere: 28 unique unlinked exchanges
Out[11]:
(110, 14992, 238)

This is fairly typical - there are some flows specific to SimaPro that we can't match perfectly yet. Let's look at what we still have to do:

In [13]:
for i, e in enumerate(sp.unlinked):
    print(e['name'], e['unit'], e['categories'])
    if i > 20:
        break
Clay kilogram ('natural resource', 'in ground')
Water, unspecified natural origin/m3 cubic meter ('natural resource', 'in water')
Mineral oil kilogram ('soil',)
Mineral oil kilogram ('soil', 'forestry')
Particulates kilogram ('air',)
_Exported electricity kilowatt hour ('Final waste flows',)
_Exported heat megajoule ('Final waste flows',)
Barite, 15% in crude ore, in ground kilogram ('natural resource',)
Bauxite kilogram ('natural resource',)
Clay, bentonite, in ground kilogram ('natural resource',)
Anhydrite, in ground kilogram ('natural resource',)
Chromium, 25.5% in chromite, 11.6% in crude ore, in ground kilogram ('natural resource',)
Clay kilogram ('natural resource',)
Coal, hard, unspecified, in ground kilogram ('natural resource',)
Copper, 0.99% in sulfide, Cu 0.36% and Mo 8.2E-3% in crude ore, in ground kilogram ('natural resource',)
Gravel, in ground kilogram ('natural resource',)
Iron, 46% in ore, 25% in crude ore, in ground kilogram ('natural resource',)
TEAM_(r) Iron Sulphate (FeSO4, ore) kilogram ('natural resource',)
Lead, 5.0% in sulfide, Pb 3.0%, Zn, Ag, Cd, In, in ground kilogram ('natural resource',)
Coal, brown, in ground kilogram ('natural resource',)
Calcite, in ground kilogram ('natural resource',)
Manganese, 35.7% in sedimentary deposit, 14.2% in crude ore, in ground kilogram ('natural resource',)

Write excel sheet of what is left

Let's write an excel sheet and get a broader picture of what is left to do.

In [15]:
sp.write_excel()
Wrote matching file to:
/Users/cmutel/Library/Application Support/Brightway3/simapro-ecoinvent-import.9cd6edad52b89dcde4f02f36f9e9e106/export/db-matching-particle-board.xlsx