Defining a New Matrix

This is a tutorial for Brightway2, an open source framework for Life Cycle Assessment. This tutorial will cover defining new matrices to be used in the LCA calculation - specifically, new matrices for weighting and normalization.

The code from this example has already been included in Brightway2, and therefore for this tutorial you don't need to download this notebook, but can just read through it at your leisure.

Adding Weighting and Normalization to Brightway2

The default impact assessment methods used in Brightway2 are processed by the ecoinvent centre, and already include weighting and normalization. That is why we have method names like ('eco-indicator 99, (I,I)', 'ecosystem quality', 'total').

We want to define weighting and normalization separately from the characterization factors themselves so we can try using different weighting or normalization scenarios, and so we can apply uncertainty distributions.

Brightway2 is designed to be easily extended, and we can add some weighting and normalization without much trouble. Lets start by defining some metadata classes for our weighting and normalization, similar to the methods class. The metadata classes store a list of all available weightings and normalizations, plus some additional data about each weighting and normalization.

In [2]:
from bw2data.meta import Methods

class WeightingMeta(Methods):
    """A dictionary for weighting metadata. File data is saved in ``methods.json``."""
    _filename = "tutorial-weightings.json"

class NormalizationMeta(Methods):
    """A dictionary for normalization metadata. File data is saved in ``methods.json``."""
    _filename = "tutorial-normalizations.json"

weightings = WeightingMeta()
normalizations = NormalizationMeta()

Now we define the classes that will store the actual weighting and normalization data itself. These are based on the Impact Assessment data store class.

Weighting

Weighting is used to combine different impact categories - each category is weighted relative to the others. Here is a good review of weighting methods. As such, a weighting is just a number with an uncertainty distribution. The data format is simple:

{
    "amount": float,  
    "uncertainty_type": integer uncertainty type from stats_toolkit.uncertainty_choices (optional),
    # .. plus some fields specific to the uncertainty distribution
}

If you paid attention reading the documentation, you will recognize that this is simply an uncertainty dictionary.

We define the data storage class:

In [3]:
from bw2data.ia_data_store import ImpactAssessmentDataStore
from bw2data.meta import weightings, mapping, normalizations
import numpy as np


class Weighting(ImpactAssessmentDataStore):
    """LCIA weighting data - used to combine or compare different impact categories."""
    metadata = weightings
    dtype_fields = []

    def write(self, data):
        """Because of DataStore assumptions, need a one-element list"""
        if not isinstance(data, list) or not len(data) == 1:
            raise ValueError("Weighting data must be one-element list")
        super(Weighting, self).write(data)

    def process_data(self, row):
        """Return a tuple of length two:
        
          * additional values for `dtype_fields` (we have none, so this is empty)
          * a fixed number *or* an uncertainty dictionary, which is the input, so return it unchanged
        
        """
        return (
            (), # don't know much,
            row # but I know I love you
        )

This looks confusingly simple, because the underlying DataStore and ImpactAssessmentDataStore classes are doing much of the work for us.

First, we specify the metadata: this tells us where to register each instance of Weighting. We defined the weightings metadata store above.

Next, we specify what additional fields need to be added to our processed array. Usually we would have somehting like a biosphere flow, which would then get mapped to rows in the biosphere matrix. But as weightings are a single number, we don't need to know where they go - this will be a static value, not a matrix at all. Aside from our uncertainty dictionary, we don't need anything else, and the values for the uncertainty field are added automatically.

Similary, the definition of .process_data(row) doesn't seem to do anything, because in this case we don't need to do anything. We will have a more complicated process_data method below when defining normalization.

Example weighting

Let's make a simple weighting - the API is the same as for normal LCIA method datasets. For the distribution-specific uncertainty fields, see the stats_arrays documentation.

In [4]:
from stats_arrays import NormalUncertainty

really_important_data = [{
    "uncertainty_type": NormalUncertainty.id,
    "amount": 100,
    "sigma": 10,
    "minimum": 0
}]

ri = Weighting(("really important",))
if ("really important",) not in weightings.list:
    ri.register(description="Something we care about")
ri.write(really_important_data)
ri.process()
In [5]:
print "Weightings", weightings.list
print "Metadata", weightings[('really important',)]
print "Data", ri.load()
Weightings [(u'really important',)]
Metadata {u'abbreviation': u'reallyi-Wa2r00Hn', u'description': u'Something we care about'}
Data {'amount': 100, 'minimum': 0, 'sigma': 10, 'uncertainty_type': 3}

Normalization

Normalization is trickier, and to be perfectly honest, I don't always understand the motivation some types of normalization. The ISO 14042 says that normalization is calculating the "magnitude of indicator results relative to reference information". A lot of normalization steps are country-specific, e.g Development of the U.S. Normalization Database. In any case, normalization is different from weighting, in that the values are biosphere flow-specific.

The data format is therefore a list of flows, plus their (potentially uncertain) normalization factors,

[
    [flow tuple, e.g. ("biosphere", "cadmium"), uncertainty dictionary],
]

As opposed to weighting, with normalization we will have a list with many different elements, one for each normalized flow.

We define the data storage class:

In [7]:
from bw2data.utils import MAX_INT_32


class Normalization(ImpactAssessmentDataStore):
    """
    LCIA normalization data - used to transform meaningful units, like mass or damage, into "person-equivalents" or some such thing.

    The data schema for IA normalization is:

    .. code-block:: python

            Schema([
                [valid_tuple, maybe_uncertainty]
            ])

    where:
        * ``valid_tuple`` is a dataset identifier, like ``("biosphere", "CO2")``
        * ``maybe_uncertainty`` is either a number or an uncertainty dictionary

    """
    metadata = normalizations
    dtype_fields = [
        ('flow', np.uint32),  # 32 bit unsigned integer
        ('index', np.uint32),
    ]

    def add_mappings(self, data):
        """Add each normalization flow (should be biosphere flows) to global mapping"""
        mapping.add({obj[0] for obj in data})

    def process_data(self, row):
        """Return values that match `dtype_fields`, as well as number or uncertainty dictionary"""
        return (
            mapping[row[0]],  # Integer number corresponding to biosphere flow
            MAX_INT_32,       # Will be replaced with matrix row number
            ), row[1]         # Actual certain/uncertain value

Most of these fields are similar to weighting, and indeed most new matrices will be similar, as we are doing the same thing over and over - storing data, and then putting it into matrices in a reasonable fashion. Let's look at the new elements:

First, dtype_fields is no longer empty. Instead, there are two fields, flow and index. Flow in this case means biosphere flows. Because of the way structured arrays work, this needs to be a integer, so we can't insert something like ("biosphere", "CO2"). Instead, we keep a global mapping of biosphere and technosphere flows to integers - this is what mapping is, a dictionary from flows to integers, which is in a sense just a giant counter. Each new flow inserted into mapping gets the next biggest integer. index will be the index in the matrix, e.g. row 0 or row 42.

You can now guess what add_mappings does - you could define normalization factors for completely new biosphere flows. mapping needs to know about all flows, so we add all flows in each normalization method (flows already existing are ignored by mapping).

For each dtype_field, we need to define what kind of number to use - in this case, we choose a 32 bit unsigned (i.e. implicitly positive) integer.

In process_data, we take as an input a single row from the normalization dataset, and return the values that go into the dtype_fields we define above, as well as the actual data value. In this case, for flow we in the integer number given by looking our biosphere flow up in mapping. For index, we insert a dummy value, as this will be determined dynamically as we build the matrix (we don't know how big the normalization matrix is yet).

Details of structured arrays, and how matrices are built, are covered more in the brightway2-calc documentation.

Example normalization

Again, we make a simple normalization dataset, with made up numbers.

In [8]:
greenhouse_gases = [
    {"amount": 4000, "flow": (u'biosphere', u'6382dd23b5ac86860bdc9951ab449777')},  # Carbon dioxide, fossil
    {"amount": 0.25, "flow": (u'biosphere', u'34abd8a0c832e8bc96ef5e560b574a05')}   # Dinitrogen monoxide
]

gg = Normalization(("some greenhouse gases",))  # Names have to be a tuple, just like IA methods
if ("some greenhouse gases",) not in normalizations.list:
    gg.register(description="Some like it hot")
gg.write(greenhouse_gases)
gg.process()
In [9]:
print "Normalizations", normalizations.list
print "Metadata", normalizations[('some greenhouse gases',)]
print "Data:"
gg.load()
Normalizations [(u'some gases',), (u'some greenhouse gases',)]
Metadata {u'abbreviation': u'somegg-o83vbiDK', u'description': u'Some like it hot'}
Data [{'amount': 4000, 'flow': (u'biosphere', u'6382dd23b5ac86860bdc9951ab449777')}, {'amount': 0.25, 'flow': (u'biosphere', u'34abd8a0c832e8bc96ef5e560b574a05')}]

Applying to LCA

Now we need to use all the data managers we just defined in an actual LCA. We do this by creating a subclass of LCA:

In [10]:
from bw2calc.lca import LCA
from bw2calc.matrices import MatrixBuilder
from bw2calc.utils import load_arrays

class ComplicatedLCA(LCA):
    def __init__(self, demand, method=None, weighting=None,
                 normalization=None, config=None):
        super(ComplicatedLCA, self).__init__(  # Do all the other initialization stuff
            demand, 
            method=method, 
            config=config
        )
        self.weighting = weighting             # Our weighting method
        self.normalization = normalization     # Our normalization method

    def load_normalization_data(self, builder=MatrixBuilder):
        """Load normalization data."""
        self.normalization_params, _, _, self.normalization_matrix = \  # _ is common python shorthand for an ignored value
            builder.build(
                self.dirpath,
                [Normalization(self.normalization).filename],  # Filename of processed array
                "amount",
                "flow",
                "index",
                row_dict=self.biosphere_dict,
                one_d=True
            )

    def load_weighting_data(self):
        """Load weighting data, a 1-element array."""
        self.weighting_params = load_arrays(
            self.dirpath,
            [Weighting(self.weighting).filename]
        )
        self.weighting_value = self.weighting_params['amount']

    def normalize(self):
        assert hasattr(self, "characterized_inventory"), \
            "Must do LCI and LCIA before normalization"
        self.load_normalization_data()
        self.normalized_inventory = \
            self.normalization_matrix * self.characterized_inventory

    def weight(self):
        assert hasattr(self, "normalized_inventory"), \
            "Must do LCI, LCIA, and normalization before weighting"
        self.load_weighting_data()
        self.weighted_inventory = \
            self.weighting_data['amount'][0] * self.normalized_inventory  # Just multiplying by a weighting value

Hopefully the comments in the source code are self-explanatory. See the above mention brightway2-calc documentation, and especially MatrixBuilder, for details on how to turn processed arrays into matrices.

We can then apply our weighting and normalization to a inventory dataset from ecoinvent 2.2:

In [11]:
lca = ComplicatedLCA(
    demand={Database('ecoinvent 2.2').random(): 1}, 
    method=(u'IPCC 2001', u'climate change', u'GWP 500a'),
    weighting=("really important",),
    normalization=("some greenhouse gases",)
)
lca.lci()
lca.lcia()
lca.normalize()
lca.weight()
print "Normalized and weighted score:", lca.weighted_inventory.sum()
Normalized and weighted score: 400025.0

Of course, this result doesn't mean anything, because our normalization and weighting figures were made up.

Conclusions

In a relatively few lines of code, we added the normalization and weighting steps, including the ability to give these values uncertainty distributions. Because Brightway2 is open source and flexible, you can add similar functionality in ways that meet your specific calculation needs - you have the freedom to add or remove any functionality you want.

  • Brightway2 makes it relatively easy to add extra steps like weighting and normalization to impact assessment.
  • Brightway2 base classes like ImpactAssessmentDataStore and MatrixBuilder are made to be easily extended.
  • You can add whatever additional steps or transformations you want - you have the freedom to define and use new data stores in creative ways.