term_manager
Edelen and Ingwersen et al 2017: Recommendations: "... separating flowable names from context and unit information ..."
Term Manager - used to handle string references to entities that are known by several synonyms. Specifically:
- contexts, which are drawn from flows' specified 'Compartment' or other classifying properties
- flowables, which are drawn from flows' uuid and external link, and Name, CasNumber, Synonyms properties
- quantities, which should be recognizable by external ref or uuid
The main objective of the TermManager is to enable the use of generalized synonym matching in performing LCIA. It collects two kinds of mapping information: mapping string terms to flowables and contexts; and mapping (quantity, flowable, context) tuples to [regionalized] characterization objects.
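The two kinds of mapping can be pictured with a pair of plain dictionaries. This is only a minimal sketch: the names `synonyms`, `factors`, `add_terms`, `canonical` and `lookup` are hypothetical, the quantity label and factor value are placeholders, and the real TermManager keeps flowables and contexts in separate synonym sets.

```python
# Hypothetical stand-ins for the two mappings; not the library's own classes.
synonyms = {}   # lowercased term -> canonical name (flowables and contexts alike)
factors = {}    # (quantity, flowable, context) -> characterization value


def add_terms(canonical_name, *terms):
    """Register a canonical name and any number of synonyms for it."""
    for term in (canonical_name,) + terms:
        synonyms[term.strip().lower()] = canonical_name


def canonical(term):
    """Resolve any known synonym to its canonical name (KeyError if unknown)."""
    return synonyms[term.strip().lower()]


def lookup(quantity, flowable_term, context_term):
    """Find a factor by first canonicalizing the flowable and context terms."""
    return factors[(quantity, canonical(flowable_term), canonical(context_term))]


add_terms("carbon dioxide", "CO2", "124-38-9")
add_terms("to air", "emissions to air", "air")
factors[("GWP100", canonical("co2"), canonical("Air"))] = 1.0  # placeholder value

# The factor is found through synonyms that differ from those used to store it.
result = lookup("GWP100", "124-38-9", "emissions to air")
```

The point of the sketch is that the factor is stored once under canonical names, while queries may arrive under any synonym.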
LciaEngine is designed to handle information from multiple origins to operate as a qdb in a catalog context. The subclass adds canonical lists of flowables and contexts that would be redundant if loaded into individual archives, introduces "smart" hierarchical context lookup (CLookup), and adds the ability to quell biogenic CO2 emissions. Someday, it might make sense to expose it as a massive, central graph db.
INTERFACE
The TermManager is assumed to implement the following interface:
Required for external operability:
- is_lcia_engine: [bool] whether the term manager performs flow and context matching
- is_context(obj): [bool] whether the supplied object maps to a known Context (hmm, this was implemented twice)
- __getitem__(obj): retrieve a context or None (note: this is because archives all return None for a failed __getitem__, which I know is bad but I haven't been moved to change it yet)
- get_canonical(qty): return the best-fit quantity, or raise EntityNotFou
- synonyms(): currently required by BasicImplementation
Required by the default implementation:
Post data:
- add_quantity()
- add_context()
- add_flow()
- add_characterization()
- add_from_json()
Retrieve data:
- serialize()
- flows_for_flowable()
- factors_for_flowable()
- factors_for_quantity()
- get_flowable()
- flowables()
- quantities()
- contexts()
FlowableConflict
Bases: Exception
A list of synonyms indicates two or more flowables
NoFQEntry
Bases: Exception
This exception has the specific meaning that there is no lookup for the named flow-quantity pair
TermManager
Bases: object
A TermManager is an archive-specific mapping of string terms to flowables and contexts. During normal operation it is captive to an archive and automatically harvests flowable and context terms from flows when added to the archive. It also hosts a set of CLookups for the archive's quantities, which enable fuzzy traversal of context hierarchies to find best-fit characterization factors.
When a new entity is added to the archive, there are two connections to make: combine uuids and human-readable descriptors into a common database of synonyms, and connect those sets of synonyms to a set of entities known to the local archive.
When harvesting terms from a new flow, the general approach is to merge the new entry with an existing entry if the existing one uniquely matches. If there is more than one match, there are three general strategies:
'graft': follow the incoming flow's link or name; add new terms to the existing flowable and discard conflicting terms
'prune': trim off the parts that match existing entries and make a new entry with only the distinct terms. this creates a larger more diffuse
'merge': merge all the terms from the new entry and any existing entries together
The mapping is prolific in both cases: it adds a flowable-to-flow mapping for every flowable that matches a given flow (merge_strategy='merge' ensures that this is exactly one flowable), and adds the CF to every flowable that matches.
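The conflict strategies named in this documentation ('graft', 'prune', 'merge') can be illustrated on plain sets of terms. The sketch below is an assumption-laden toy: `harvest`, its arguments, and the sample terms are hypothetical, flowables are bare Python sets, and the real implementation may resolve conflicts differently in detail.

```python
def harvest(flowables, new_terms, name, strategy="graft"):
    """Add new_terms to a list of term sets; return the set that received them."""
    new_terms = set(new_terms)
    matches = [f for f in flowables if f & new_terms]
    if not matches:                      # nothing known: start a new flowable
        flowables.append(new_terms)
        return new_terms
    if len(matches) == 1:                # unique match: merge into it
        matches[0] |= new_terms
        return matches[0]
    taken = set().union(*matches)        # every term already claimed by a match
    if strategy == "merge":              # collapse all matches into one flowable
        for f in matches:
            flowables.remove(f)
        flowables.append(taken | new_terms)
        return flowables[-1]
    if strategy == "prune":              # new flowable with only the unclaimed terms
        flowables.append(new_terms - taken)
        return flowables[-1]
    if strategy == "graft":              # follow the name; drop conflicting terms
        target = next((f for f in matches if name in f), matches[0])
        target |= new_terms - taken
        return target
    raise ValueError(f"unknown strategy: {strategy}")


def fresh():
    """Two existing flowables; the incoming terms below overlap both of them."""
    return [{"nitrogen dioxide", "10102-44-0"}, {"nitrogen oxides", "nox"}]


incoming = {"nitrogen oxides", "10102-44-0", "nox as no2"}

grafted, pruned, merged = fresh(), fresh(), fresh()
harvest(grafted, incoming, "nitrogen oxides", "graft")
harvest(pruned, incoming, "nitrogen oxides", "prune")
harvest(merged, incoming, "nitrogen oxides", "merge")
```

In this toy, 'graft' leaves two flowables and attaches only the unclaimed term to the one matching the name, 'prune' ends with three flowables, and 'merge' ends with one.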
USAGE MODEL: TODO The Term Manager is basically a giant pile of Characterization objects, grouped by canonical query quantity, flowable, and context. The basic TermManager enforces a restriction of one CF per qq | fb | cx combination. The LciaEngine subclass uses a CLookup that can either be strict or non-strict, but nonetheless enforces one CF per qq | fb | cx | origin combination.
There are thus four things the Term Manager must be able to do:
- store names of flowables and contexts from source data
- retrieve canonical flowable and context names from the store
- store characterizations, properly indexed by canonical names
- retrieve characterizations according to a query
The components used to accomplish this are:
_cm and _fm: the context and flowable synonym sets
_flow_map: reverse-maps flowable terms to flows
_q_dict: a 3-level nested dictionary: defaultdict: quantity uuid -> defaultdict: flowable -> CLookup: context -> {CFs}
- first level defaultdict maps quantity uuid to second level
- second level defaultdict maps flowable canonical name to CLookup / subclass
- third level CLookup maps context to a set of CFs
_fq_map: reverse-maps flowable canonical name to a set of quantities that characterize it
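The shape of the three-level dictionary and its reverse map can be sketched with `collections.defaultdict`. This is an illustration only: a plain `defaultdict(set)` stands in for CLookup (which in the library also handles hierarchical context lookup), `store` is a hypothetical helper, and the identifiers and values are placeholders.

```python
from collections import defaultdict

# quantity uuid -> flowable canonical name -> context -> set of CFs
# Assumption: a defaultdict(set) stands in for the CLookup at the third level.
q_dict = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))

# flowable canonical name -> set of quantity uuids that characterize it
fq_map = defaultdict(set)


def store(quantity_uuid, flowable, context, cf):
    """Index one factor and keep the reverse map in step."""
    q_dict[quantity_uuid][flowable][context].add(cf)
    fq_map[flowable].add(quantity_uuid)


air = ("Emissions", "to air")
store("q-gwp", "carbon dioxide", air, 1.0)    # placeholder values
store("q-gwp", "methane", air, 28.0)
store("q-acid", "sulfur dioxide", air, 1.0)

methane_cfs = q_dict["q-gwp"]["methane"][air]
quantities_for_co2 = fq_map["carbon dioxide"]
```

The reverse map makes it cheap to answer "which quantities characterize this flowable" without scanning every quantity.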
__getitem__(item)
TermManager.__getitem__ retrieves a context known to the TermManager, or None if one is not found. __getitem__ exposes only the contexts, since flow external_refs are used as flowable synonyms.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
item | | | required |
Returns:
Type | Description |
---|---|
|
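The "return None on a miss" convention can be sketched as follows. `ContextStore` is a hypothetical, minimal class written for this example; it is not the library's context manager, and the case-insensitive matching is an assumption.

```python
class ContextStore:
    """Hypothetical, minimal context store used only for illustration."""

    def __init__(self):
        self._by_term = {}

    def add(self, name, *terms):
        """Register a context name together with its synonyms."""
        for term in (name,) + terms:
            self._by_term[term.lower()] = name

    def __getitem__(self, item):
        # Return None on a miss instead of raising KeyError.
        if not isinstance(item, str):
            return None
        return self._by_term.get(item.lower())

    def is_context(self, item):
        """True if the item maps to a known context."""
        return self[item] is not None


store = ContextStore()
store.add("air", "to air", "emissions to air")
hit = store["Emissions to air"]
miss = store["an unknown term"]
```

Callers therefore test the result against None rather than wrapping the lookup in try/except.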
__init__(contexts=None, flowables=None, quantities=None, merge_strategy='graft', quiet=True)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
contexts | | optional filename to initialize CompartmentManager | None |
flowables | | optional filename to initialize FlowablesDict | None |
merge_strategy | | (can also be specified at add_flow()) 'graft': on conflict, follow the incoming flow's link or name; add new terms to the existing flowable and discard conflicting terms. This solves many "shared CAS number" problems by depriving the new flowable of the CAS identifier. 'prune' or 'distinct': on conflict, create a new flowable containing only new terms and discard all conflicting terms. This is the default for flows added with no context, so that distinct intermediate flows with the same name (e.g. "bolt 10mm") can have different characterizations; to prevent this, give your flow any nonempty context. 'merge': aggressively merge all co-synonymous flowables. Not tested. | 'graft' |
quiet | | | True |
add_characterization(flowable, ref_quantity, query_quantity, value, context=None, origin=None, location=None, overwrite=False)
Replacement for flow-based add_characterization. THE ONLY place to create Characterization objects. Add them to all flowables that match the supplied flow.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
flowable | | if not known to the flowables dict, added as a new flowable | required |
ref_quantity | | [entity or ref] | required |
query_quantity | | [entity or ref] | required |
value | | mandatory. either a numeric value or a dict mapping locations to values | required |
context | | the context for which the characterization applies. Should be a string or tuple. None means the factor applies to all contexts. | None |
overwrite | | whether to overwrite an existing value if it already exists (ignored if value is a dict) | False |
location | | (ignored if value is a dict) 'GLO' used if no location is provided | None |
origin | | (optional; origin of value; defaults to quantity.origin) | None |
Returns:
Type | Description |
---|---|
| | created or updated characterization |
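How the `value`, `location` and `overwrite` parameters interact can be sketched on a plain dict of location to value. `set_value` is a hypothetical helper, not a library function. Two behaviors here are assumptions, since the documentation above does not specify them: raising `ValueError` on a duplicate, and letting a dict input replace existing entries.

```python
def set_value(cf, value, location=None, overwrite=False):
    """cf is a dict mapping location -> value (a stand-in for a Characterization)."""
    if isinstance(value, dict):
        # location and overwrite are ignored for dict input
        cf.update(value)
        return cf
    loc = location if location is not None else "GLO"  # 'GLO' when none is given
    if loc in cf and not overwrite:
        raise ValueError(f"value already set for {loc}")  # assumed behavior
    cf[loc] = value
    return cf


cf = {}
set_value(cf, 1.0)                        # stored under 'GLO'
set_value(cf, {"US": 1.2, "DE": 0.9})     # regionalized values from a dict
set_value(cf, 1.1, overwrite=True)        # replaces the 'GLO' value
```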
add_context(context, *terms, origin=None)
We are using conflict='attach' by default so that 'emissions' and 'resources' can be attached to 'Elementary flows' if it exists. This may cause undesirable side effects if different intermediate categories use similar terms: e.g. ('buildings', 'heat', 'steam') and ('chemical', 'heat', 'process') could result in InconsistentLineage (or else something crazy like ('chemical', 'buildings', 'heat', ('process', 'steam'))).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
context | | | required |
origin | | apply origin to context if it has none | None |
Returns:
Type | Description |
---|---|
|
add_flow(flow, merge_strategy=None)
We take a flow from outside and add its known terms. That means:
- adding the flow's reference quantity;
- merging any context with the local context tree;
- adding flowable terms to the flowables list;
- mapping the flow to all flowables (assigning a flowable if none is found).
Note that all flows having null context are forced to be distinct from one another. If you want two flowables to share properties, you must add them with non-null contexts. Not sure how I feel about this, but it does work.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
flow | | | required |
merge_strategy | | overrule default merge strategy | None |
Returns:
Type | Description |
---|---|
| | the Flowable object to which the flow's terms have been added |
add_flow_terms(flow, merge_strategy=None)
This process takes in an inbound FlowInterface instance, identifies the flowable(s) that match its terms, and adds new terms to the existing or new flowable. May update a flow's name in case of conflict.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
flow | | | required |
merge_strategy | | | None |
Returns:
Type | Description |
---|---|
|
add_from_json(j, q_map, origin=None)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
j | | | required |
q_map | | a dict whose keys are external_refs and uuids, and whose values are quantities | required |
origin | | | None |
Returns:
Type | Description |
---|---|
|
factors_for_flowable(flowable, quantity=None, context=None, **kwargs)
This is the method that actually performs the lookup. Other methods are wrappers for this.
If context is None, this is "unspecified". It matches all CFs regardless of context.
Core to this is getting a canonical context, which is done by __getitem__. In TermManager, this returns None for contexts that are not known.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
flowable | | a string | required |
quantity | | a quantity known to the quantity manager | None |
context | | [None] by default, provide all contexts; must explicitly provide 'none' to filter by null context | None |
Returns:
Type | Description |
---|---|
|
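The "unspecified context matches everything" rule can be sketched as below. `factors_for` and its data layout are hypothetical and ignore the quantity dimension. The explicit 'none' filter for null contexts is not modeled here.

```python
def factors_for(cfs_by_context, context=None):
    """cfs_by_context: dict mapping a canonical context to a list of factors."""
    if context is None:
        # Unspecified: return the factors from every context.
        return [cf for cfs in cfs_by_context.values() for cf in cfs]
    # Specified: return only exact matches (empty list if the context is unknown).
    return list(cfs_by_context.get(context, []))


cfs = {"air": [1.0], "water": [0.5]}    # placeholder factors
all_cfs = factors_for(cfs)
air_only = factors_for(cfs, "air")
unknown = factors_for(cfs, "soil")
```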
factors_for_quantity(quantity, flowable=None, context=None, **kwargs)
dist: [0] only used if compartment is specified. By default, report only exact matches.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
quantity | | | required |
flowable | | | None |
context | | | None |
Returns:
Type | Description |
---|---|
|
flowables(search=None, origin=None, quantity=None)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
origin | | used in subclass | None |
search | | | None |
quantity | | | None |
Returns:
Type | Description |
---|---|
|
get_flowable(term, strict=True)
Input is a Flow or str
Parameters:
Name | Type | Description | Default |
---|---|---|---|
term | | | required |
strict | | [True] if the input corresponds to more than one flowable and strict is True, raise FlowableConflict | True |
Returns:
Type | Description |
---|---|
|
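The effect of `strict` can be sketched with a term index that maps each term to the flowables claiming it. Everything here is hypothetical: the `FlowableConflict` class is a local stand-in, `get_flowable` takes an explicit index argument, and the non-strict choice (first name in sorted order) is an assumption made only so the sketch returns something.

```python
class FlowableConflict(Exception):
    """Local stand-in for the exception of the same name."""


def get_flowable(term_index, term, strict=True):
    """term_index maps a term to the set of flowable names that claim it."""
    found = sorted(term_index[term])
    if len(found) > 1 and strict:
        raise FlowableConflict(found)
    return found[0]   # assumption: the non-strict choice here is arbitrary


index = {"7440-50-8": {"copper", "copper, ion"}, "cu": {"copper"}}

try:
    get_flowable(index, "7440-50-8")
    conflict = False
except FlowableConflict:
    conflict = True

lenient = get_flowable(index, "7440-50-8", strict=False)
unique = get_flowable(index, "cu")
```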
serialize(origin, *quantities, values=False)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
origin | | CFs are limited to: (for quantities belonging to origin) all CFs; (for other quantities) only CFs that match origin. If origin is None, all quantities are serialized with all CFs | required |
quantities | | | () |
values | | | False |
Returns:
Type | Description |
---|---|
| | 3-tuple: serialized term manager as dict; set of query quantity external refs; set of reference quantity uuids (not sure WHY UUIDs rather than external refs, but that is what characterizations.py gives us) |
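The origin rule for `serialize` can be read as a simple filter. In this sketch each CF is a dict with hypothetical keys `quantity_origin` and `origin`, and `select_cfs` is an illustrative helper rather than part of the library.

```python
def select_cfs(cfs, origin):
    """Keep the CFs that the origin rule would serialize."""
    if origin is None:
        return list(cfs)                       # everything is serialized
    return [
        cf for cf in cfs
        if cf["quantity_origin"] == origin     # quantity belongs to origin: all its CFs
        or cf["origin"] == origin              # other quantities: only CFs from origin
    ]


cfs = [
    {"id": 1, "quantity_origin": "local", "origin": "local"},
    {"id": 2, "quantity_origin": "local", "origin": "other"},
    {"id": 3, "quantity_origin": "other", "origin": "local"},
    {"id": 4, "quantity_origin": "other", "origin": "other"},
]

kept_local = [cf["id"] for cf in select_cfs(cfs, "local")]
kept_all = [cf["id"] for cf in select_cfs(cfs, None)]
```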
synonyms(term)
Search for synonyms, first in contexts, then flowables, then quantities. The somewhat awkward structure here is because of the dynamics of returning generators: when using try: return self._cm.synonyms(term) except KeyError: ..., the KeyError was not getting caught because the generator was already returned before iterating.
contexts are searched first, then quantities, then flowables
Parameters:
Name | Type | Description | Default |
---|---|---|---|
term |
|
required |
Returns:
Type | Description |
---|---|
|
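The generator pitfall described above is general Python behavior and can be reproduced in isolation. In the sketch below, `lookup`, `synonyms_broken` and `synonyms_fixed` are hypothetical functions over plain dicts. They do not reproduce the library's code, only the reason a try/except around a returned generator does not catch the KeyError.

```python
def lookup(store, term):
    """A generator function: its body, including store[term], runs only when iterated."""
    for synonym in store[term]:
        yield synonym


def synonyms_broken(cm, fm, term):
    try:
        return lookup(cm, term)    # returns a generator at once; no KeyError yet
    except KeyError:
        return lookup(fm, term)    # never reached


def synonyms_fixed(cm, fm, term):
    # Test membership eagerly, then hand back the generator for the right store.
    for store in (cm, fm):
        if term in store:
            return lookup(store, term)
    raise KeyError(term)


cm = {"air": ["air", "to air"]}
fm = {"co2": ["co2", "carbon dioxide"]}

try:
    list(synonyms_broken(cm, fm, "co2"))   # KeyError surfaces here, outside the try
    broken_raised = False
except KeyError:
    broken_raised = True

fixed = list(synonyms_fixed(cm, fm, "co2"))
```

The fix is to do the failing lookup eagerly, before any generator is returned.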
unmatched_flowables(flowables)
Given an iterable of flowable strings, return a list of entries that were not recognized as synonyms to known flowables
Parameters:
Name | Type | Description | Default |
---|---|---|---|
flowables |
|
required |
Returns:
Type | Description |
---|---|
|
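A minimal sketch of the idea, assuming that known terms are available as a flat collection of strings and that matching is case-insensitive (both are assumptions; the function here takes the known terms as an explicit argument, unlike the method):

```python
def unmatched_flowables(known_terms, flowables):
    """Return the entries of flowables that match no known term."""
    known = {term.strip().lower() for term in known_terms}
    return [f for f in flowables if f.strip().lower() not in known]


known_terms = ["carbon dioxide", "CO2", "methane"]
missing = unmatched_flowables(known_terms, ["co2", "Methane", "krypton-85"])
```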