entity_store
A repository of typed entities, retrievable by their external reference
Entity object API:
- entity.entity_type --> string used for grouping
- entity.external_ref --> lookup name
- entity.origin --> one-time settable parameter, set by the entity store
- entity.validate() --> must return True for valid entities and False for invalid ones
- entity.name --> printable name

Optional:
- entity.uuid --> used for entity retrieval
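As a minimal sketch of the entity contract, the class below provides the listed attributes. `MinimalEntity` is a hypothetical name, and its validity rule (non-empty string type, external ref and origin) is an assumption made for illustration; the real entity classes are not shown in this section.

```python
class MinimalEntity:
    """Hypothetical entity exposing the attributes an entity store expects."""

    def __init__(self, entity_type, external_ref, name, uuid=None):
        self.entity_type = entity_type    # string used for grouping
        self.external_ref = external_ref  # lookup name
        self.name = name                  # printable name
        self.uuid = uuid                  # optional, used for retrieval
        self._origin = None

    @property
    def origin(self):
        return self._origin

    @origin.setter
    def origin(self, value):
        # origin may be assigned only once (by the entity store)
        if self._origin is not None:
            raise AttributeError('origin is already set to %r' % self._origin)
        self._origin = value

    def validate(self):
        # Assumed rule: type, external ref and origin are non-empty strings
        required = (self.entity_type, self.external_ref, self.origin)
        return all(isinstance(x, str) and x for x in required)
```

With this sketch, an entity is invalid until a store assigns its origin, and a second assignment to `origin` raises `AttributeError`.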
EntityStore
Bases: object
names
property
Return a mapping of data source to semantic reference, based on the catalog_names property. This is used by a catalog interface to convert entity origins from physical to semantic.
If a single data source has multiple semantic origins, only the most-downstream one will be kept. If there are multiple semantic origins for the same data source in the same archive, one will be kept at random. This should be avoided and I should probably test for it when setting catalog_names.
source
property
The catalog's original source is the "master descriptor" of the catalog's content. This is required for subclass methods to work properly, in the event that the original source is called upon.
__getitem__(item)
Client-facing entity retrieval. item is a key that can be converted to a valid UUID by self._ref_to_key(): either a literal UUID, or a string containing something that matches a naive UUID regex.
Checks upstream first, then local.
Returns None if nothing is found.
NOTE: IT IS REALLY PATHOLOGICALLY BROKEN TO RETURN None INSTEAD OF RAISING KeyError #FounderCode
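The key conversion can be sketched as follows. This is an illustration only: `ref_to_uuid` is a hypothetical stand-in for `self._ref_to_key()`, whose actual implementation is not shown here, and the regex is one plausible reading of "naive UUID regex". The upstream-then-local lookup order is not modeled. The two lookup helpers contrast the current None-returning behavior with the KeyError behavior the note argues for.

```python
import re
import uuid

# Naive pattern: 8-4-4-4-12 hex digits, found anywhere in the string (assumed)
_UUID_RE = re.compile(r'[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}',
                      re.IGNORECASE)


def ref_to_uuid(item):
    """Return a lowercase UUID string extracted from item, or None."""
    if isinstance(item, uuid.UUID):
        return str(item)
    match = _UUID_RE.search(str(item))
    return match.group(0).lower() if match else None


def get_or_none(entities, item):
    """Mirrors the documented behavior: None when nothing is found."""
    return entities.get(ref_to_uuid(item))


def get_strict(entities, item):
    """Alternative suggested by the note: raise KeyError instead."""
    key = ref_to_uuid(item)
    if key is None or key not in entities:
        raise KeyError(item)
    return entities[key]
```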
Parameters:
Name | Type | Description | Default |
---|---|---|---|
item | | | required |
__init__(source, ref=None, quiet=True, static=False, dataReference=None, ns_uuid=None, no_validate=None, **kwargs)
An EntityStore is a provenance structure for a collection of entities. Ostensibly, an EntityStore has a single source from which entities are collected. The source is a resolvable URI that indicates a data resource from which data describing the entities can be extracted. The exact manner of extracting data from resources is subclass-dependent.
The desired key is specified during the call to _add(entity, key). Internally, if the entity has a 'uuid' attribute and it is set (validity not checked), then the uuid is . If the external references do not contain UUIDs, it is recommended to derive a UUID3 using an archive-specific, stable namespace ID. The class-level _ns_uuid_required attribute governs this option:
- if True, an ns_uuid argument must be provided when the class is instantiated. This is consistent with a use case in which it is desirable to have predictable, fixed UUIDs (i.e. to interface with a data system that requires stable UUIDs)
- if False, a random ns_uuid is generated, and used to create a UUID anytime an entity is given a non-UUID external_ref
- if None, UUID3 are not used and any supplied ns_uuid argument is ignored. external_refs must always be UUIDs.
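The UUID3 derivation can be illustrated with the standard library's `uuid.uuid3`, which hashes a namespace UUID together with a name. The helper name `key_for` and its error handling are assumptions for this sketch, not the class's actual code.

```python
import uuid


def key_for(external_ref, ns_uuid=None):
    """Return a UUID string for external_ref (hypothetical helper).

    A valid UUID is passed through; any other string is hashed into a
    UUID3 under ns_uuid. With ns_uuid=None, non-UUID refs are rejected.
    """
    try:
        return str(uuid.UUID(external_ref))
    except ValueError:
        pass  # not a UUID; fall through to the name-based derivation
    if ns_uuid is None:
        raise ValueError('external_ref must be a UUID: %r' % external_ref)
    return str(uuid.uuid3(ns_uuid, external_ref))


# A fixed namespace gives the same key on every run (the "True" case);
# a random one, e.g. uuid.uuid4(), gives keys stable only within a session.
fixed_ns = uuid.UUID('12345678-1234-5678-1234-567812345678')
stable_key = key_for('Acetic acid, at plant', fixed_ns)
```

Because UUID3 is deterministic, two archives that share a namespace ID derive the same key for the same external_ref.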
There is still some refactoring to be done, to try to eliminate the need for externally visible UUIDs anywhere.
An archive has a single semantic reference that describes the data context from which its native entities were gathered. The reference is given using dot-separated hierarchical terms in order of decreasing semantic significance from left to right. The leftmost specifier should describe the maintainer of the resource (which defaults to 'local' when a reference argument is not provided), followed by arbitrarily more precise specifications. Some examples are:
local.lcia.traci.2.1.spreadsheet
ecoinvent.3.2.undefined
The purpose for the source / reference distinction is that in principle many different sources can all provide the same semantic content: for instance, ecoinvent can be accessed from the website or from a file on the user's computer. In principle, if the semantic reference for two archives is the same, the archives should contain excerpts of the same data, even if drawn from different sources.
An entity is uniquely identified by its link property, which is made from concatenating the semantic origin and a stable reference known as an 'external_ref', as 'origin/external_ref'. The first slash is the delimiter between origin and reference. Examples:
elcd.3.2/processes/00043bd2-4563-4d73-8df8-b84b5d8902fc
uslci.ecospold/Acetic acid, at plant
Note that the inclusion of embedded whitespace, commas, and other characters indicates that these links are not proper URIs.
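Since only the first slash separates origin from external_ref, a link can be taken apart with `str.partition`. The function names below are hypothetical; the sketch only illustrates the 'origin/external_ref' convention described above.

```python
def make_link(origin, external_ref):
    """Join origin and external_ref as 'origin/external_ref'."""
    return '%s/%s' % (origin, external_ref)


def split_link(link):
    """Split a link at its FIRST slash into (origin, external_ref)."""
    origin, sep, external_ref = link.partition('/')
    if not sep:
        raise ValueError('not a link (no slash): %r' % link)
    return origin, external_ref


# Later slashes belong to the external_ref:
origin, ref = split_link('elcd.3.2/processes/00043bd2-4563-4d73-8df8-b84b5d8902fc')
# origin == 'elcd.3.2'
# ref == 'processes/00043bd2-4563-4d73-8df8-b84b5d8902fc'
```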
It is hoped that the user community will help develop and maintain a consistent and easily interpreted namespace for semantic origins. If this is done, it should be possible to identify any published entity with a concise reference.
When an entity is first added to an archive, it is assigned that archive's reference as its origin, following the expectation that data about the same reference from different sources is the same data.
When an entity with a different origin is added to an archive, it is good practice to add a mapping from that origin to its source in the receiving archive's "catalog_names" dictionary. However, since the entity itself does not know its archive's source, this cannot be done automatically.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source | | physical data source: where the information is being drawn from | required |
ref | | optional semantic reference for the data source; gets added to catalog_names | None |
quiet | | | True |
static | | [False] whether archive is expected to be unchanging | False |
dataReference | | alternative to ref | None |
ns_uuid | | required to store entities by common name; used to generate uuid3 from string inputs | None |
no_validate | | if True, skip validation on entity add | None |
kwargs | | any other information that should be serialized with the archive | {} |
create_descendant(archive_path, signifier=None, force=False)
Saves the archive to a new source with a new semantic reference. The new semantic ref is derived by:
(a) first removing any trailing ref that matches [0-9]{8+}
(b) appending the descendant signifier
(c) appending the current date in YYYYMMDD format

After that:
1. the new semantic ref is added to catalog_names,
2. the source is set to archive_path/semantic.ref.json.gz,
3. load_all() is executed,
4. the archive is saved to the new source.
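The derivation of the new semantic ref and source can be sketched as below. Several points are assumptions: `[0-9]{8+}` is read as "eight or more digits" (`[0-9]{8,}` in Python regex syntax), "trailing ref" is read as the last dot-separated term, and the file name is taken to be the new ref plus `.json.gz`. The function names are hypothetical.

```python
import os
import re
from datetime import date


def descendant_ref(ref, signifier=None, today=None):
    """Derive a descendant semantic ref following steps (a)-(c)."""
    today = today or date.today()
    terms = ref.split('.')
    # (a) drop a trailing date-like term (assumed: 8 or more digits)
    if len(terms) > 1 and re.fullmatch(r'[0-9]{8,}', terms[-1]):
        terms.pop()
    # (b) append the signifier, if one was supplied
    if signifier is not None:
        if not re.fullmatch(r'[A-Za-z0-9_-]+', signifier):
            raise ValueError('invalid signifier: %r' % signifier)
        terms.append(signifier)
    # (c) append the current date as YYYYMMDD
    terms.append(today.strftime('%Y%m%d'))
    return '.'.join(terms)


def descendant_source(archive_path, new_ref):
    """Assumed file layout: archive_path/<new_ref>.json.gz"""
    return os.path.join(archive_path, new_ref + '.json.gz')
```

Passing `today` explicitly keeps the function deterministic, which is convenient for testing.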
Parameters:
Name | Type | Description | Default |
---|---|---|---|
archive_path | | where to store the archive | required |
signifier | | A nonzero-length string matching [A-Za-z0-9_-]+. If not supplied, then the semantic ref is unchanged except for the date tag. | None |
force | | overwrite if file exists | False |
Returns:
Type | Description |
---|---|
new semantic ref. |
find_partial_id(uid, startswith=True)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
uid | | a fragmentary (or complete) uuid string | required |
startswith | | [True] use .startswith instead of full regex | True |
Returns:
Type | Description |
---|---|
result set |
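The two matching modes can be sketched as follows. This is an illustration under assumptions: `match_partial_id` is a hypothetical stand-in that filters a plain iterable of UUID strings, and the regex mode is assumed to search anywhere in the key, ignoring case.

```python
import re


def match_partial_id(keys, uid, startswith=True):
    """Return the keys matching a fragmentary uuid string."""
    if startswith:
        # Cheap prefix test, case-insensitive
        prefix = uid.lower()
        return [k for k in keys if k.lower().startswith(prefix)]
    # Otherwise treat uid as a regular expression
    pattern = re.compile(uid, re.IGNORECASE)
    return [k for k in keys if pattern.search(k)]


keys = ['00043bd2-4563-4d73-8df8-b84b5d8902fc',
        '0a1b2c3d-0000-4000-8000-000000000000']
prefix_hits = match_partial_id(keys, '0004')
regex_hits = match_partial_id(keys, '4d73', startswith=False)
```

In this sketch a fragment from the middle of a UUID is only found with `startswith=False`.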
get_uuid(key)
Deprecated.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | | | required |
retrieve_or_fetch_entity(key, **kwargs)
Client-facing function to retrieve entity by ID, first checking in the archive, then from the source.
Input is flexible: it could be a UUID or a key (a partial UUID is not useful here).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | | the identifying string (uuid or external ref) | required |
kwargs | | used to pass provider-specific information | {} |
serialize(apply_changes=True, **kwargs)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
apply_changes | | [True] any properties assigned to catalog refs are applied to the entities prior to saving | True |
kwargs | | | {} |
validate_entity_list()
This whole thing is crufty and untested and never used and should be abandoned
write_to_file(filename, gzip=False, complete=False, **kwargs)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filename | | | required |
gzip | | | False |
complete | | | False |
kwargs | | whatever is required by the subclass's serialize method | {} |