Importing Ecoinvent

This notebook shows you how to import the various flavors and versions of ecoinvent, and how to resolve any problems that occur.

In [1]:
from brightway2 import *

Basic setup

Start a new project, and install base data.

In [2]:
projects.set_current("ecoinvent-import")

In [5]:
bw2setup()
Creating default biosphere

Applying strategy: drop_unspecified_subcategories
Writing activities to SQLite3 database:
0%                          100%
[##############################] | ETA[sec]: 0.000 
Total time elapsed: 0.341 sec
Title: Writing activities to SQLite3 database:
  Started: 05/21/2015 12:30:27
  Finished: 05/21/2015 12:30:28
  Total time elapsed: 0.341 sec
  CPU %: 96.900000
  Memory %: 0.798309
Created database: biosphere3
Creating default LCIA methods

Applying strategy: set_biosphere_type
Applying strategy: drop_unspecified_subcategories
Applying strategy: link_iterable_by_fields
Wrote 692 LCIA methods with 170915 characterization factors
Creating core data migrations

Ecoinvent 2.2

Importing ecoinvent 2.2 is easy and relatively fast.

In [ ]:
ei22 = SingleOutputEcospold1Importer(
    "/Users/cmutel/Documents/LCA Documents/Ecoinvent/2.2/processes",
    "ecoinvent 2.2"
)
ei22.apply_strategies()
ei22.statistics()

All exchanges are linked, so we can write the database:

In [ ]:
ei22.write_database()

To reduce memory use, we remove the stored copy by setting the importer to None:

In [8]:
ei22 = None

Ecoinvent 3.1 cutoff

The cutoff system model is similar to the ecoinvent 1.x and 2.x system model, and its easy linking algorithm means that there are no problems when importing it.

When downloading the files, take one of the following (from the 'Files' tab):

  • current_Version_3.1_allocation_default_ecoSpold02.7z
  • current_Version_3.1_consequential_longterm_ecoSpold02.7z
  • current_Version_3.1_cutoff_ecoSpold02.7z
In [ ]:
ei31cutoff = SingleOutputEcospold2Importer(
    "/Users/cmutel/Documents/LCA Documents/Ecoinvent/3.1/cutoff/datasets",
    "ecoinvent 3.1 cutoff"
)
ei31cutoff.apply_strategies()
ei31cutoff.statistics()

Again, no linking errors, so we can write the database:

In [ ]:
ei31cutoff.write_database()

Ecoinvent 3.1 default

We can do the same thing with the default system model:

In [11]:
ei31default = SingleOutputEcospold2Importer(
    "/Users/cmutel/Documents/LCA Documents/Ecoinvent/3.1/default/datasets",
    "ecoinvent 3.1 default"
)
ei31default.apply_strategies()
ei31default.statistics()
Extracting ecospold2 files:
0%                          100%
[##############################] | ETA[sec]: 0.000 | Item ID: ffed8e5b-8ecb-4
Total time elapsed: 104.858 sec
Title: Extracting ecospold2 files:
  Started: 05/18/2015 11:15:39
  Finished: 05/18/2015 11:17:24
  Total time elapsed: 104.858 sec
  CPU %: 87.300000
  Memory %: 7.371628
Extracted 11329 datasets in 105.83 seconds
Applying strategy: remove_zero_amount_coproducts
Applying strategy: remove_zero_amount_inputs_with_no_activity
Applying strategy: es2_assign_only_product_with_amount_as_reference_product
Applying strategy: assign_single_product_as_activity
Applying strategy: create_composite_code
Applying strategy: drop_unspecified_subcategories
Applying strategy: link_biosphere_by_flow_uuid
Applying strategy: link_internal_technosphere_by_composite_code
Applying strategy: delete_exchanges_missing_activity
Applying strategy: delete_ghost_exchanges
116 exchanges couldn't be linked and were deleted. See the logfile for details:
	/Users/cmutel/Library/Application Support/Brightway3/ecoinvent-import.99d620c13d8ce5ab521afdd7a1c6ab75/logs/Ecospold2-import-error.bVBLjP.log
11329 datasets
806149 exchanges
0 unlinked exchanges
(11329, 806149, 0)

So, we have no linking errors after applying the default strategies, but we do have some complaints: 116 exchanges couldn't be linked and were deleted. Let's look at the logfile:

In [26]:
fp = "/Users/cmutel/Library/Application Support/Brightway3/ecoinvent-import.99d620c13d8ce5ab521afdd7a1c6ab75/logs/Ecospold2-import-error.bVBLjP.log"

with open(fp) as f:
    lines = [x.strip() for x in f]

for line in lines[:26]:
    print(line)
Purging unlinked exchange:
Filename: 064b8ab0-3be5-4461-aeda-fa8cdc91d150_96f36f31-8f6a-48f8-921f-1d781b7a545f.spold
{ 'activity': '22265c57-a1cf-4aab-add0-3bd6c4ca0110',
'amount': 0.00790582069651522,
'comment': 'Calculated value, acording to paper mass ratio at factory. '
'Biogas (from sanitary landfill) burning provides most of the '
'heat on site. A direct link to the dataset "heat and power '
'co-generation, biogas, gas engine, Quebec" is made to ensure '
'this fuel specificity. This dataset does not exactly reflect '
'this specificity, as there is no co-generation but only heat '
'production, but it is the most suitable is the current version '
'of the database. ',
'flow': 'dd80f0f2-f4d5-40f0-9035-09c1a7f3f07b',
'loc': 9.07,
'name': 'heat, central or small-scale, other than natural gas',
'pedigree': { 'completeness': 1,
'further technological correlation': 1,
'geographical correlation': 5,
'reliability': 1,
'temporal correlation': 3},
'production volume': 0.0,
'scale': 0.0046,
'scale without pedigree': 0.0006,
'type': 'technosphere',
'uncertainty type': 2,
'unit': 'megajoule'}
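
Reading 116 of these records by eye gets tedious, so it can help to parse the logfile into structured records. The following is a minimal sketch, not part of Brightway2: it assumes every record has the shape shown above (a `Purging unlinked exchange:` line, a `Filename:` line, then a pretty-printed Python dictionary). The function name `parse_purge_log` and the shortened sample record are made up for illustration.

```python
import ast

def parse_purge_log(lines):
    """Group log lines into (filename, exchange dict) records.

    Assumes each record starts with 'Purging unlinked exchange:',
    followed by a 'Filename:' line and a pretty-printed dict.
    """
    records = []
    filename, buffer = None, []
    # A trailing sentinel line flushes the last record in the file.
    for line in list(lines) + ["Purging unlinked exchange:"]:
        line = line.strip()
        if line.startswith("Purging unlinked exchange"):
            if filename is not None and buffer:
                # Adjacent string literals (the wrapped comment) are
                # concatenated by the Python parser, so literal_eval works.
                records.append((filename, ast.literal_eval(" ".join(buffer))))
            filename, buffer = None, []
        elif line.startswith("Filename:"):
            filename = line.split(":", 1)[1].strip()
        elif line:
            buffer.append(line)
    return records

# Shortened version of the record shown above (some keys dropped for brevity).
sample_log = """\
Purging unlinked exchange:
Filename: 064b8ab0-3be5-4461-aeda-fa8cdc91d150_96f36f31-8f6a-48f8-921f-1d781b7a545f.spold
{ 'activity': '22265c57-a1cf-4aab-add0-3bd6c4ca0110',
'amount': 0.00790582069651522,
'comment': 'Calculated value, acording to paper mass ratio at factory. '
'Biogas (from sanitary landfill) burning provides most of the '
'heat on site. ',
'flow': 'dd80f0f2-f4d5-40f0-9035-09c1a7f3f07b',
'name': 'heat, central or small-scale, other than natural gas',
'pedigree': { 'completeness': 1,
'reliability': 1},
'type': 'technosphere',
'unit': 'megajoule'}
""".splitlines()

records = parse_purge_log(sample_log)
for filename, exc in records:
    print(filename, exc["activity"], exc["flow"])
```

With the real logfile, passing the `lines` list from the cell above to `parse_purge_log` should give one record per deleted exchange, provided the format assumption holds.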

OK, so in file 064b8ab0-3be5-4461-aeda-fa8cdc91d150_96f36f31-8f6a-48f8-921f-1d781b7a545f.spold there is a link to a flow dd80f0f2-f4d5-40f0-9035-09c1a7f3f07b from activity 22265c57-a1cf-4aab-add0-3bd6c4ca0110. The problem is that activity 22265c57-a1cf-4aab-add0-3bd6c4ca0110 doesn't produce this flow. We can see the flows it produces from the filenames, which are {activity}_{flow}.spold. Here are the available flows:

In [27]:
import os

for filename in os.listdir("/Users/cmutel/Documents/LCA Documents/Ecoinvent/3.1/default/datasets"):
    if "22265c57-a1cf-4aab-add0-3bd6c4ca0110" in filename:
        print(filename)

So activity 22265c57-a1cf-4aab-add0-3bd6c4ca0110 only produces flow ad551fe0-84c5-471a-9acf-b7d2204fdb65.
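
The same filename check can be generalized to every activity at once. This is a rough sketch, assuming all dataset filenames join the activity and flow UUIDs with an underscore, as in the filename shown in the log excerpt. The helper name `flows_by_activity` is hypothetical, and the second sample filename is constructed from the statement above rather than copied from a directory listing.

```python
import os
from collections import defaultdict

def flows_by_activity(filenames):
    """Map each activity UUID to the set of flow UUIDs it produces."""
    produced = defaultdict(set)
    for filename in filenames:
        stem, ext = os.path.splitext(filename)
        # Skip anything that does not look like '{activity}_{flow}.spold'.
        if ext != ".spold" or "_" not in stem:
            continue
        activity, flow = stem.split("_", 1)
        produced[activity].add(flow)
    return produced

# In real use this list would come from os.listdir() on the datasets directory.
sample_filenames = [
    "064b8ab0-3be5-4461-aeda-fa8cdc91d150_96f36f31-8f6a-48f8-921f-1d781b7a545f.spold",
    "22265c57-a1cf-4aab-add0-3bd6c4ca0110_ad551fe0-84c5-471a-9acf-b7d2204fdb65.spold",
    "README.txt",
]

produced = flows_by_activity(sample_filenames)
activity = "22265c57-a1cf-4aab-add0-3bd6c4ca0110"
wanted_flow = "dd80f0f2-f4d5-40f0-9035-09c1a7f3f07b"
print(wanted_flow in produced[activity])
```

Checking each deleted exchange against such a mapping would show whether all 116 failures follow the same pattern, or whether several different problems are hiding in the log.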

If we look at the spold file, we can see details on this activity:

    <activityName xml:lang="en">treatment of manure and biowaste by anaerobic
    <includedActivitiesStart xml:lang="en">Input of livestock manure (cattle slurry, pig slurry,
      cattle manure) and biowaste (agroindustrial waste, municipal biowaste), used edible oil
      and glycerine) to incoming storage at the biogas plant. </includedActivitiesStart>
    <includedActivitiesEnd xml:lang="en">The treatment includes storage (and 10% of the total
      pre-treatment storage emissions) of the livestock manure and cosubstrates, anerobic
      fermentation, as well as the storage after fermentation. The activity ends with the biogas
      and digestate being available at the biogas plant. Due to the absence of reliable
      references and pertinent data, H2S emissions during substrate storage are not taken into
      account in the present study.</includedActivitiesEnd>
      <text xml:lang="en" index="1">This multi-output activity produces biogas and digestate
        from manure and biowaste. The methane content of the biogas is calculated depending on
        the substrate input.</text>
    <tag>combined production</tag>
    <tag>combined production</tag>

Combined production is a tricky business, and treatment datasets are also known to be difficult to link correctly.

We also have two interesting exchanges:

First, the reference product:

    amount="0"  (This can't be correct - you can't produce zero of your reference product)
    sourceFirstAuthor="Dauriat A." 
    <name xml:lang="en">manure, liquid, swine</name>
    <unitName xml:lang="en">kg</unitName>
    <comment xml:lang="en">Shares of substrates as in Switzerland 2009</comment>
    (... deleted properties)
    <productionVolumeComment xml:lang="en">Calculated from production volume of biogas using the
      relative outputs.</productionVolumeComment>
    <classification classificationId="a20a9e2f-b3f0-40fd-8358-f8adf7f419c4">
      <classificationSystem xml:lang="en">By-product classification</classificationSystem>
      <classificationValue xml:lang="en">Recyclable</classificationValue>
    <outputGroup>0</outputGroup>  (outputGroup 0 is "Reference product")

Second, our missing flow (dd80f0f2-f4d5-40f0-9035-09c1a7f3f07b) is in the file:

        mathematicalRelation="(biogas*-1)* factor_process_heat"
        sourceFirstAuthor="Dauriat A." 
        intermediateExchangeId="dd80f0f2-f4d5-40f0-9035-09c1a7f3f07b"  (Our flow!)
        activityLinkId="61e83ed0-8b91-426c-af94-62b189e8c098">         (Wrong activity)
    <name xml:lang="en">heat, central or small-scale, other than natural gas</name>
    <unitName xml:lang="en">MJ</unitName>
    <comment xml:lang="en">Calculated with a fixed factor per m3 biogas produced</comment>
    <classification classificationId="39b0f0ab-1a2f-401b-9f4d-6e39400760a4">
      <classificationSystem xml:lang="en">By-product classification</classificationSystem>
      <classificationValue xml:lang="en">allocatable product</classificationValue>
    <inputGroup>5</inputGroup>  (inputGroup 5 is "FromTechnosphere")

In this case, the original activity requests an input that the linked activity does not produce; the linked activity actually consumes it (it is an input there too!). Here, the Brightway2-io library gives up and just deletes the exchange.
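
To make the distinction concrete, here is a small sketch that classifies each exchange in a dataset as an input or an output, based on whether it carries an `inputGroup` or an `outputGroup` element. It is an illustration only, not how Brightway2-io works internally. The XML is a simplified fragment based on the excerpt above: the `flowData` wrapper is an assumption, most attributes are dropped, and the XML namespace that real ecospold2 files are assumed to declare is omitted. The function name `flow_roles` is made up.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment modelled on the excerpt above.
LINKED_ACTIVITY_SNIPPET = """
<flowData>
  <intermediateExchange
      intermediateExchangeId="dd80f0f2-f4d5-40f0-9035-09c1a7f3f07b"
      activityLinkId="61e83ed0-8b91-426c-af94-62b189e8c098">
    <name xml:lang="en">heat, central or small-scale, other than natural gas</name>
    <unitName xml:lang="en">MJ</unitName>
    <inputGroup>5</inputGroup>
  </intermediateExchange>
</flowData>
"""

def flow_roles(xml_text):
    """Return {flow id: 'input' or 'output'} for each exchange in the text."""
    roles = {}
    root = ET.fromstring(xml_text)
    for exc in root.iter("intermediateExchange"):
        flow = exc.get("intermediateExchangeId")
        if exc.find("outputGroup") is not None:
            roles[flow] = "output"
        elif exc.find("inputGroup") is not None:
            roles[flow] = "input"
    return roles

roles = flow_roles(LINKED_ACTIVITY_SNIPPET)
wanted_flow = "dd80f0f2-f4d5-40f0-9035-09c1a7f3f07b"
# An exchange can only be linked to this activity if the role is 'output'.
print(roles.get(wanted_flow))
```

A flow that shows up only with the role `input` in the linked activity cannot be supplied by it, which is exactly the situation described above.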

If you have ideas on how to handle this better, please get in touch.

We don't want to make manual fixes to individual files. It is not our job, and I consider the files produced by the ecoinvent center to be the definitive versions; the last thing we need is for each software developer to develop their own database versions (SimaPro making changes left and right is already frustrating enough). But if we could define a strategy that would apply cleanly to all datasets, that would be great.