Benchmarking FHWA I - Background Linking
Benchmarking the FHWA Framework Study I
Part I - Background Linking
Antelope is a complete (though not yet fully comprehensive) LCA computing environment, but the first task it was created for is benchmarking existing datasets. Today we are going to look at a Federal LCA Commons dataset that is the product of years of multi-stakeholder work across industry, academia, and state and federal government agencies: the Hot Mix Asphalt LCA Framework model prepared under the Federal Highway Administration’s (FHWA) Sustainable Pavement Project. Read all about the Asphalt Framework here (if you prefer 15-MB government project reports) or here (if you prefer paywalled academic articles).
You can follow along with this article using the walkthrough (Jupyter notebook) in the User Support repository.
Overview of the data source
The Asphalt Framework is a guidance document describing how LCA could be used effectively for pavement procurement. The study, on the other hand, is an actual LCA that was conducted in support of a PCR for hot mix asphalt on behalf of the National Asphalt and Pavement Association (NAPA). Read the LCA study report.
Obtaining the Repository
The study model was submitted to the US Federal LCA Commons. So, we go there to download it.
We download the full dataset as JSON-LD 2. Save it on a computer in a location you remember.
Accessing the Repository
We do our work in a catalog. The catalog doesn’t depend on any resources on our computer to start, other than the default flow content that comes bundled with antelope_core.
from antelope_core import LcCatalog
cat = LcCatalog()
We add a new resource that connects to the study data and assigns it an origin. A resource has four required components: an origin, which you use to refer to it; a source, which points to the data; a data source type, which tells Antelope what kind of data provider to use to access the data; and a list of interfaces the data supports.
In Antelope, resources provide different types of information about the LCA data via different interfaces. Here’s what each interface provides:
- basic: documentary information (properties)
- exchange: data about process inventories and exchange values
- quantity: data about flow properties and characterization
Generally OpenLCA datasets include all three.
[!TIP] Idea! Maybe different data source types should have sensible default interface values.
cat.new_resource('my.fhwa', '/path/to/Federal_Highway_Administration-mtu_pavement.zip', 'OpenLcaJsonLdArchive',
('basic', 'exchange', 'quantity'))
cat.show_interfaces()
my.fhwa [basic, exchange, quantity]
local.qdb [basic, index, quantity]
Then we can interact with the resource by querying it.
q = cat.query('my.fhwa')
At this point we can retrieve specific datasets if we know their IDs, but we can’t do search or discovery because the data source is not yet indexed. So, we can search on the Federal LCA Commons website and identify datasets to retrieve:
p = q.get('72d5a381-8cae-4e1d-b0a3-26cc43b69867')
p.show()
ProcessRef catalog reference (72d5a381-8cae-4e1d-b0a3-26cc43b69867)
origin: my.fhwa
UUID: 72d5a381-8cae-4e1d-b0a3-26cc43b69867
Name: Asphalt binder, 8% ground rubber tire (GRT), consumption mix, at terminal, from crude oil, 8% ground rubber tire
Comment:
Exchange ID 629 (e10e86a1-fd6a-3ac5-a822-55762e8ae99d): Conversion Error from unit kBq to kg
Exchange ID 1053 (bdff254b-ff47-305e-99ee-ff2dddd46876): Conversion Error from unit kBq to kg
==Local Fields==
Classifications: ['42: Wholesale Trade', '4247: Petroleum and Petroleum Products Merchant Wholesalers']
SpatialScope: Northern America
TemporalScope: {'begin': '2015-12-31-05:00', 'end': '2022-12-31-05:00'}
@type: Process
description: This cradle-to-gate dataset covers all relevant process steps and technologies for production of asphalt binder with high overall data quality. The inventory is based on primary data from twelve refineries and eleven terminals in North America. The product boundary....
Moreover, the data source by itself is not sufficient to perform LCI or LCIA operations because those require a linked technology matrix. In Antelope, this is delivered via the background interface, which depends on an index of the data.
Add an index interface
First, we index the data source, which means loading all its data. An index includes a list of all processes, flows, quantities, and contexts referenced in the data, as well as the reference exchanges for each process.
cat.index_ref('my.fhwa')
Loading /data/LCI/FedCommons/Federal_Highway_Administration-mtu_pavement.zip
fedcommons.fhwa.index.20250805: None
fedcommons.fhwa.index.20250805: Setting NSUUID (None) 6f9c793c-a94a-4b1b-b7c4-39de328ef486
Ignoring ns_uuid specification
fedcommons.fhwa: /data/LCI/FedCommons/Federal_Highway_Administration-mtu_pavement.zip
'my.fhwa.index.20250805'
Now we can see that an index interface was added to the catalog:
cat.show_interfaces()
my.fhwa [basic, exchange, quantity]
my.fhwa.index.20250805 [basic, index]
local.qdb [basic, index, quantity]
Now we can do things like counting and searching. We can also investigate linking the database for LCI computation.
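To make concrete what an index buys us, here is a minimal sketch in plain Python. It is not Antelope code: the record layout, the process names, and the `build_index` and `search` helpers are all made up for illustration. The point is only that once every process and its reference exchanges are loaded, counting, searching, and "who provides this flow?" become simple lookups.

```python
from collections import defaultdict

# Hypothetical, simplified records; real datasets carry many more fields.
processes = [
    {"id": "p1", "name": "Asphalt binder, at terminal",
     "references": [("Asphalt binder", "Output")]},
    {"id": "p2", "name": "Electricity, at grid (east)",
     "references": [("Electricity", "Output")]},
    {"id": "p3", "name": "Electricity, at grid (west)",
     "references": [("Electricity", "Output")]},
]


def build_index(records):
    """Map each reference flow to the ids of the processes that provide it."""
    providers = defaultdict(list)
    for proc in records:
        for flow, _direction in proc["references"]:
            providers[flow].append(proc["id"])
    return dict(providers)


def search(records, term):
    """Case-insensitive substring search over process names."""
    term = term.lower()
    return [p["id"] for p in records if term in p["name"].lower()]


index = build_index(processes)
print(len(processes))              # counting
print(search(processes, "grid"))   # searching
print(index["Electricity"])        # candidate providers for a flow
```

The last lookup is the one that matters for linking: a flow with two entries in the index is a flow with two viable targets.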
Investigate background linking
The Antelope Background engine performs Tarjan ordering of Exchange data to detect strongly-connected components (i.e. collections of processes where everything depends on everything). Building this network relies on every dependent exchange being linked to a reference exchange of another process. The linking algorithm has a few ways to determine an appropriate provider.
- In most data sources, an exchange can have a “Preferred provider” (OpenLCA) or “ActivityLink” (ecospold v2) made explicit. These are always followed, as long as they are valid.
- Some flows may have only one viable target, i.e. there is precisely one database process that provides the given flow as a reference.
- Some flows may have no viable targets, in which case they become cutoffs (like emissions, but into the modeling environment instead of the natural environment)
- When flows have more than one viable target, the user must specify a preferred provider for each ambiguous flow, or else specify an algorithmic approach to pick one. At present, the only algorithmic choices available are “first” and “last” (alphabetically), “cutoff”, or “abort” (the default).
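For readers who have not met strongly-connected components before, here is a textbook recursive version of Tarjan's algorithm on a toy dependency graph. This is a sketch for intuition only, not the Antelope Background engine's implementation, and the four process names are invented. Each key depends on the processes in its list, which is exactly the information that a fully linked set of exchanges provides.

```python
def strongly_connected_components(graph):
    """Textbook recursive Tarjan.

    graph maps each node to the list of nodes it depends on.
    Returns components in the order they are completed, so providers
    come out before the processes that depend on them.
    """
    counter = 0
    index = {}       # discovery order of each node
    lowlink = {}     # smallest index reachable from the node's subtree
    stack = []
    on_stack = set()
    components = []

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for dep in graph.get(node, ()):
            if dep not in index:
                visit(dep)
                lowlink[node] = min(lowlink[node], lowlink[dep])
            elif dep in on_stack:
                lowlink[node] = min(lowlink[node], index[dep])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop its members off the stack
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    for node in graph:
        if node not in index:
            visit(node)
    return components


# Invented example: refinery and electricity feed each other (a cycle).
toy = {
    "asphalt mix": ["binder", "electricity"],
    "binder": ["refinery"],
    "refinery": ["electricity"],
    "electricity": ["refinery"],
}
print(strongly_connected_components(toy))
```

The refinery and electricity processes land in one component because each depends on the other; the binder and the asphalt mix each form a component of their own. A production implementation would likely avoid recursion for databases with deep supply chains.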
We can learn a lot about a dataset by inspecting its linking characteristics. There’s a tool for this called CheckTerms. The tool requires a query with the index and exchange interfaces, and inspects each exchange for valid terminations. For our FHWA database it tells us the following:
from antelope_core import CheckTerms
check = CheckTerms(q)
1298 processes
1485 reference exchanges
260476 dependent exchanges
anchored: 6952 exchanges
cutoff: 4832 exchanges
elementary: 248663 exchanges
self: 16 exchanges
broken: 11 exchanges
ambiguous: 2 exchanges
We see that most of the exchanges are proper, but 11 are “broken” and 2 are “ambiguous”. A broken exchange is one in which the specified provider does not provide the linked flow. Antelope is strict about this relationship: an exchange can only be linked to a process that produces the exact same flow.
The main reason for this is to prevent errors. However, this is a design decision and could be revisited. As we will see later in this post, other LCA software does permit squishing providers into exchanges even if the flows don’t match.
Maybe we should consider permitting squishy linking in the background?
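As a quick sanity check on the CheckTerms tallies reported above, the six categories can be summed and compared against the reported number of dependent exchanges. The numbers below are copied from that output; the variable names are just for this sketch.

```python
# Tallies copied from the CheckTerms output for my.fhwa.
tallies = {
    "anchored": 6952,
    "cutoff": 4832,
    "elementary": 248663,
    "self": 16,
    "broken": 11,
    "ambiguous": 2,
}
reported_dependent_exchanges = 260476

total = sum(tallies.values())
needs_attention = tallies["broken"] + tallies["ambiguous"]

# Share of each category among all dependent exchanges.
for name, count in tallies.items():
    print(f"{name:>10}: {count:>7} ({count / total:.3%})")

print("categories account for every dependent exchange:",
      total == reported_dependent_exchanges)
print("exchanges needing a decision:", needs_attention)
```

The categories add up exactly to the reported total, so every dependent exchange falls into exactly one of them, and only 13 exchanges out of more than a quarter million need any attention.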
Review broken exchanges
Let’s take a look at the broken exchanges:
check.show_broken()
Process: [my.fhwa] Metal composite material (MCM) sheet, at plant [Northern America]
Disposal, solid waste, unspecified, to inert material landfill <--# ! Petroleum refining, at refinery [Northern America] (0)
Process: [my.fhwa] Natural soda ash (Sodium carbonate), at plant [United States]
Hazardous waste, DK ==># ! Natural gas, processed, for energy use, at plant [GLO] (0)
Process: [my.fhwa] Calcium carbonate, ground, screened grade, at plant [United States]
Disposal, mineral waste, underground deposit <--# ! Petroleum refined, for material use, at plant [GLO] (0)
Process: [my.fhwa] Calcium carbonate, ground, fine treated, 3 micron, at plant [United States]
Lubricant feedstock, at refinery <--# ! Petroleum refined, for material use, at plant [GLO] (1)
Process: [my.fhwa] Calcium carbonate, ground, fine slurry, 3 micron, at plant [United States]
Transport, combination truck, diesel powered <--# ! Petroleum refined, for material use, at plant [GLO] (1)
Process: [my.fhwa] Asphalt mix 2 - 15% RAP, 3% RAS liquid asphalt binder with SBS [GLO]
Electricity, AC, 2300-7650 V <--# ! Electricity, at Grid, US, 2010 [Northern America] (472)
Process: [my.fhwa] EPS insulation board, at plant [Northern America]
Disposal, solid waste to incineration with energy recovery <--# ! Petroleum refined, for material use, at plant [GLO] (1)
Process: [my.fhwa] Containerboard, average production, at mill [United States]
Unspecified polymer <--# ! Petroleum refining, at refinery [Northern America] (0)
Disposal, wastewater treatment plant residuals, to uns. beneficial use <--# ! Soda powder, at plant [Northern America] (0)
Process: [my.fhwa] Calcium carbonate, ground, 30 micron, at plant [United States]
Transport, combination truck, diesel powered <--# ! Petroleum refined, for material use, at plant [GLO] (1)
Liquefied petroleum gas, combusted in industrial boiler <--# ! Sulfuric acid, at plant [Northern America] (1)
The way to read this output is as follows: The broken exchanges are grouped by process. Within each process, the broken flow is shown with its direction, with the non-matching target indicated with a !. Then in the parentheses at the end of each line is the number of valid targets for the flow in the database.
So the first entry:
Process: [my.fhwa] Metal composite material (MCM) sheet, at plant [Northern America]
Disposal, solid waste, unspecified, to inert material landfill <--# ! Petroleum refining, at refinery [Northern America] (0)
Shows us that the process named ‘Metal composite material (MCM) sheet, at plant’ has an inflow of “Disposal, solid waste, unspecified, to inert material landfill”, and it’s supposedly being provided by “Petroleum refining, at refinery”. That is so screwy it must be a mistake. Let’s check the source…
Indeed, that is a broken link. Several Federal Commons datasets (including USLCI) had this problem for a while. Most of them have been fixed, but USLCI was duplicated into the FHWA repository, so the problem persists here.
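The line format described above is regular enough to pull apart with a regular expression, which is handy if you want to tabulate the broken links yourself. The sketch below works on the printed text only; the `BrokenLink` tuple and `parse_line` helper are names made up for this example and are not part of Antelope. The mapping of `<--#` to Input and `==>#` to Output follows the directions reported for the same flows in the background log later in this post.

```python
import re
from typing import NamedTuple, Optional

# flow  <arrow>  !  stated target  (number of valid targets)
LINE = re.compile(
    r"^\s*(?:\*\s+)?(?P<flow>.+?)\s+(?P<arrow><--#|==>#)\s+!\s+"
    r"(?P<target>.+?)\s+\((?P<count>\d+)\)\s*$"
)
DIRECTIONS = {"<--#": "Input", "==>#": "Output"}


class BrokenLink(NamedTuple):
    flow: str
    direction: str
    stated_target: str
    valid_targets: int


def parse_line(line: str) -> Optional[BrokenLink]:
    """Parse one exchange line; return None for 'Process:' header lines."""
    match = LINE.match(line)
    if match is None:
        return None
    return BrokenLink(
        flow=match["flow"],
        direction=DIRECTIONS[match["arrow"]],
        stated_target=match["target"],
        valid_targets=int(match["count"]),
    )


first = parse_line(
    "Disposal, solid waste, unspecified, to inert material landfill "
    "<--# ! Petroleum refining, at refinery [Northern America] (0)"
)
print(first)
```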
The art of ignoring errors
Fortunately for us, most of these errors are trivial, and the rest are not too important because they originate in the model foreground so we can solve them with modeling later on.
First, the trivial errors: those are the ones with trailing (0) or (1). See, the default behavior of the linker when encountering a broken exchange is to ignore the faulty target and attempt to find a valid target. For the flows with “0” valid targets, those cannot be linked in the current database and will simply become cutoffs. The ones with “1” valid target will simply be linked to the valid target. So both those sets of errors are trivial.
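The fallback just described, combined with the strategies for ambiguous flows listed earlier (“first”, “last”, “cutoff”, or “abort”), can be sketched as a small function. This is an illustration of the decision logic as this post describes it, not Antelope's actual code: the `resolve` function, the `AmbiguousAnchor` exception, and the provider names in the example dictionary are all hypothetical.

```python
class AmbiguousAnchor(Exception):
    """Raised when a flow has several valid providers and the strategy is 'abort'."""


def resolve(flow, stated_provider, providers_by_flow, multi_term="abort"):
    """Return the provider to link to, or None if the flow becomes a cutoff."""
    candidates = sorted(providers_by_flow.get(flow, []))
    if stated_provider in candidates:
        return stated_provider        # valid explicit link: always followed
    # Stated provider is absent or broken: ignore it and look for valid targets.
    if not candidates:
        return None                   # no valid target: cutoff
    if len(candidates) == 1:
        return candidates[0]          # exactly one valid target: link it
    if multi_term == "first":
        return candidates[0]          # alphabetically first
    if multi_term == "last":
        return candidates[-1]         # alphabetically last
    if multi_term == "cutoff":
        return None
    raise AmbiguousAnchor(f"{flow}: {len(candidates)} valid targets")


# Hypothetical provider names; only the flow names come from the output above.
providers = {
    "Lubricant feedstock, at refinery": ["hypothetical refinery process"],
    "Electricity, AC, 2300-7650 V": ["hypothetical grid mix A",
                                     "hypothetical grid mix B"],
}

print(resolve("Hazardous waste, DK",
              "Natural gas, processed, for energy use, at plant", providers))
print(resolve("Lubricant feedstock, at refinery",
              "Petroleum refined, for material use, at plant", providers))
print(resolve("Electricity, AC, 2300-7650 V",
              "Electricity, at Grid, US, 2010", providers, multi_term="cutoff"))
```

With the default `multi_term="abort"`, the electricity example raises instead of returning, which mirrors why ambiguous flows have to be dealt with before the background can be built.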
The only nontrivial errors are the ones with ambiguous matches. These include the broken exchanges with more than one valid target (marked with a * above), along with the flows which do not have a target provider specified and have more than one valid target.
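from antelope import enum  # assumption: enum is the numbered-listing helper exported by the antelope package
amb = enum(check.ambiguous_flows)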
[00] [my.fhwa] Waste, industrial [kg]
[01] [my.fhwa] Aggregate [kg]
[02] [my.fhwa] Electricity, AC, 2300-7650 V [MJ]
check.show_ambiguous()
Process: [my.fhwa] Corn steep liquor [Northern America]
* Waste, industrial ==># ! [5622: Waste Treatment and Disposal] (2)
Process: [my.fhwa] Asphalt mix 1 - virgin mix - with EPD Aggregate [GLO]
* Aggregate <--# ! [Asphalt Mixture] (5)
The ‘Waste, industrial’ flow is also an error, in fact. Look at the targets for that flow:
_ = enum(amb[0].targets())
[00] [my.fhwa] Metal panel, insulated, at plant [Northern America]
[01] [my.fhwa] Coil, coating, m2, at plant [Northern America]
Neither of those is an actual provider of industrial waste management. That flow should be designated as “cutoff” in any case.
The other two flows both belong to the model foreground. We will want to specify those anchors manually, which we will do during modeling.
We do an end-run around the linking problem by telling the linking algorithm to simply set ambiguous link targets to “cutoff”.
First, we create a background interface, then we link the background. If we try using the default settings, the attempt will fail, because the default for flows with multiple targets is “abort”. So, we tell the linker to simply “cutoff” those flows:
cat.background_for_origin('my.fhwa')
cat.query('my.fhwa').check_bg(multi_term='cutoff')
Creating flat background
c245a252-0860-41d2-9789-802bab7984ab: Disposal, wastewater treatment plant residuals, to uns. beneficial use [Input]: Target 4cb0c558-b0ac-3656-bc1f-95b477c14921 MISSING REFERENCE
c245a252-0860-41d2-9789-802bab7984ab: Unspecified polymer [Input]: Target 0aaf1e13-5d80-37f9-b7bb-81a6b8965c71 MISSING REFERENCE
08e766d4-ec5f-3426-a0d4-9533e55f9081: Disposal, solid waste, unspecified, to inert material landfill [Input]: Target 0aaf1e13-5d80-37f9-b7bb-81a6b8965c71 MISSING REFERENCE
0d95cc8b-a9a0-3630-a760-1ab4d88257d8: Hazardous waste, DK [Output]: Target d3db6ab2-33de-453b-ac74-0567ba7fa95b MISSING REFERENCE
247aa74a-8368-37b1-a090-96a1428ef30f: Disposal, mineral waste, underground deposit [Input]: Target 70d2004a-9f80-48ce-a86c-50e85dd6a637 MISSING REFERENCE
34487c89-62dd-3d4e-98dd-06f0c382cc17: Lubricant feedstock, at refinery [Input]: Target 70d2004a-9f80-48ce-a86c-50e85dd6a637 MISSING REFERENCE
48be22f1-e30c-3e4a-8832-a9c5ed32350b: Transport, combination truck, diesel powered [Input]: Target 70d2004a-9f80-48ce-a86c-50e85dd6a637 MISSING REFERENCE
4bc4086f-29ef-44e6-af32-845b505fd869: Electricity, AC, 2300-7650 V [Input]: Target 89389d98-1ba6-30c5-9c33-92443694936b MISSING REFERENCE
66691548-5be9-3c87-bf87-2f38136bc7bf: Disposal, solid waste to incineration with energy recovery [Input]: Target 70d2004a-9f80-48ce-a86c-50e85dd6a637 MISSING REFERENCE
d48fa04d-9401-3158-9ad1-8f37b902e397: Liquefied petroleum gas, combusted in industrial boiler [Input]: Target 1b6afe73-a064-33a3-87b0-265ae10851e9 MISSING REFERENCE
d48fa04d-9401-3158-9ad1-8f37b902e397: Transport, combination truck, diesel powered [Input]: Target 70d2004a-9f80-48ce-a86c-50e85dd6a637 MISSING REFERENCE
Missing reference (term:4cb0c558-b0ac-3656-bc1f-95b477c14921;flow:444bea1c-306c-361e-975e-68663d7a25e7)
Missing reference (term:0aaf1e13-5d80-37f9-b7bb-81a6b8965c71;flow:66f981ad-439c-3f19-a925-11eb2d347545)
Missing reference (term:0aaf1e13-5d80-37f9-b7bb-81a6b8965c71;flow:60c44220-4e65-3d83-8769-6534f8a4c284)
Missing reference (term:d3db6ab2-33de-453b-ac74-0567ba7fa95b;flow:e20ae2e6-868f-3926-93c5-37cbdf6ba0ee)
Missing reference (term:70d2004a-9f80-48ce-a86c-50e85dd6a637;flow:5facd189-c993-3221-95dd-2c56ea7777f4)
Missing reference (term:70d2004a-9f80-48ce-a86c-50e85dd6a637;flow:99479e91-ab64-3c73-86cb-d96373f37939)
Missing reference (term:70d2004a-9f80-48ce-a86c-50e85dd6a637;flow:628c07ec-0802-39c1-ab88-1c62848ef436)
Missing reference (term:89389d98-1ba6-30c5-9c33-92443694936b;flow:fc406690-160c-37d5-bf36-added9542164)
Missing reference (term:70d2004a-9f80-48ce-a86c-50e85dd6a637;flow:c747a0a0-bd5d-3a78-af02-58a5027d6cf5)
Missing reference (term:1b6afe73-a064-33a3-87b0-265ae10851e9;flow:f77da622-e84d-3639-8917-fcf792aaee13)
my.fhwa.index.20250807: <source removed>
my.fhwa: my.fhwa.index.20250807_background.mat
Completed in 13 sec
We see a lot of messages there about missing references. None of those are unexpected, and they all become cutoffs.
At this point, we have a working background interface, and we can perform LCIA.
cat.show_interfaces()
local.qdb [basic, index, quantity]
my.fhwa [basic, exchange, quantity]
my.fhwa.index.20250807 [background, basic, index]
Stay tuned for post #2 to do the LCIA walkthrough.