disease.database.dynamodb

Provide DynamoDB client.

class disease.database.dynamodb.DynamoDbDatabase(db_url=None, **db_args)

Disease Normalizer database client for DynamoDB.

__init__(db_url=None, **db_args)

Initialize Database class.

Parameters:

db_url (str) – URL endpoint for DynamoDB source

Keyword Arguments:
  • region_name: AWS region (defaults to “us-east-2”)
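
A minimal sketch of constructing the client with an explicit endpoint and region. The helper name `connect_local` and the endpoint `http://localhost:8000` (a common port for a locally hosted DynamoDB) are assumptions for illustration. The import is deferred so the snippet can be loaded without the package installed.

```python
def connect_local(db_url="http://localhost:8000", region_name="us-east-2"):
    """Return a DynamoDbDatabase client pointed at the given endpoint."""
    # Deferred import: calling this function requires the disease package.
    from disease.database.dynamodb import DynamoDbDatabase

    # region_name travels through **db_args, as listed under Keyword Arguments.
    return DynamoDbDatabase(db_url=db_url, region_name=region_name)
```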

add_merged_record(record)

Add merged record to database.

Parameters:

record (Dict) – merged record to add

Return type:

None

add_record(record, src_name)

Add new record to database.

Parameters:
  • record (Dict) – record to upload

  • src_name (SourceName) – name of source for record

Return type:

None

add_source_metadata(src_name, meta)

Add new source metadata entry.

Parameters:
  • src_name (SourceName) – name of source

  • meta (SourceMeta) – metadata for the source

Raises:

DatabaseWriteException – if write fails

Return type:

None

check_schema_initialized()

Check if database schema is properly initialized.

Return type:

bool

Returns:

True if DB appears to be fully initialized, False otherwise

check_tables_populated()

Perform rudimentary checks to see if tables are populated.

Emphasis is on rudimentary – if some fiendish element has deleted half of the disease aliases, this method won’t pick it up. It just wants to see if a few critical tables have at least a small number of records.

Return type:

bool

Returns:

True if queries are successful, False if DB appears empty

close_connection()

Perform any manual connection closure procedures if necessary.

Return type:

None

complete_write_transaction()

Conclude transaction or batch writing if relevant.

Return type:

None
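
A sketch of how `add_record` and `complete_write_transaction` might be used together when loading one source. The helper `load_records`, the stub class, the record keys, and the source name string are all hypothetical. In real use `src_name` would be a `SourceName` member, and `db` a `DynamoDbDatabase` instance.

```python
def load_records(db, records, src_name):
    """Add each record for one source, then conclude the write batch."""
    count = 0
    for record in records:
        db.add_record(record, src_name)
        count += 1
    # Conclude the batch so that any buffered writes are sent.
    db.complete_write_transaction()
    return count


class _WriteStub:
    """Hypothetical stand-in that buffers writes until the batch is concluded."""

    def __init__(self):
        self.pending = []
        self.committed = []

    def add_record(self, record, src_name):
        self.pending.append((src_name, record))

    def complete_write_transaction(self):
        self.committed.extend(self.pending)
        self.pending.clear()


write_stub = _WriteStub()
loaded = load_records(
    write_stub,
    [{"concept_id": "example:1"}, {"concept_id": "example:2"}],  # made-up records
    "example_source",  # placeholder for a SourceName member
)
print(loaded, len(write_stub.committed), len(write_stub.pending))
```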

delete_normalized_concepts()

Remove merged records from the database. Use when performing a new update of normalized data.

Raises:
Return type:

None

delete_source(src_name)

Delete all data for a source. Use when updating source data.

Parameters:

src_name (SourceName) – name of source to delete

Raises:
Return type:

None

drop_db()

Delete all tables from database. Requires manual confirmation.

Raises:

DatabaseWriteException – if called in a protected setting with confirmation silenced.

Return type:

None

export_db(export_location)

Dump DB to specified location. Not available for DynamoDB database backend.

Parameters:

export_location (Path) – path to save DB dump at

Return type:

None

get_all_concept_ids(source=None)

Retrieve concept IDs for use in generating normalized records.

Parameters:

source (Optional[SourceName]) – optionally, just get all IDs for a specific source

Return type:

Set[str]

Returns:

Set of concept IDs as strings.

get_all_records(record_type)

Retrieve all source or normalized records. Returns either all source records or all records that qualify as “normalized” (i.e., merged groups plus source records that are otherwise ungrouped). For example:

>>> from disease.database import create_db
>>> from disease.schemas import RecordType
>>> db = create_db()
>>> for record in db.get_all_records(RecordType.MERGER):
...     pass  # do something

Parameters:

record_type (RecordType) – type of result to return

Return type:

Generator[Dict, None, None]

Returns:

Generator that lazily provides records as they are retrieved

get_record_by_id(concept_id, case_sensitive=True, merge=False)

Fetch record corresponding to provided concept ID.

Parameters:
  • concept_id (str) – concept ID for disease record

  • case_sensitive (bool) – if true, performs exact lookup, which is more efficient. Otherwise, performs filter operation, which doesn’t require correct casing.

  • merge (bool) – if true, look for merged record; look for identity record otherwise.

Return type:

Optional[Dict]

Returns:

complete record, if match is found; None otherwise
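
Since the exact lookup is described as more efficient, one possible pattern is to try it first and fall back to the case-insensitive filter only on a miss. This is a sketch, not the library's own behavior: `find_record`, the stub, and the concept ID shown are hypothetical.

```python
def find_record(db, concept_id, merge=False):
    """Try the exact lookup first, then the case-insensitive filter."""
    record = db.get_record_by_id(concept_id, case_sensitive=True, merge=merge)
    if record is None:
        record = db.get_record_by_id(concept_id, case_sensitive=False, merge=merge)
    return record


class _LookupStub:
    """Hypothetical stand-in; it ignores `merge` and stores records in a dict."""

    def __init__(self, records):
        self.records = records

    def get_record_by_id(self, concept_id, case_sensitive=True, merge=False):
        if case_sensitive:
            return self.records.get(concept_id)
        wanted = concept_id.lower()
        for key, value in self.records.items():
            if key.lower() == wanted:
                return value
        return None


lookup_stub = _LookupStub({"example:ABC1": {"label": "made-up disease"}})
print(find_record(lookup_stub, "EXAMPLE:abc1"))
```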

get_refs_by_type(search_term, ref_type)

Retrieve concept IDs for records matching the user’s query. Other methods are responsible for actually retrieving full records.

Parameters:
  • search_term (str) – string to match against

  • ref_type (RefType) – type of match to look for.

Return type:

List[str]

Returns:

list of associated concept IDs. Empty if lookup fails.
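
The two-step pattern described above (look up concept IDs, then fetch full records with another method) can be sketched as follows. The helper `search`, the stub, and the string `"label"` used in place of a `RefType` member are assumptions for illustration.

```python
def search(db, term, ref_type):
    """Resolve a search term to full records in two steps."""
    records = []
    # Step 1: find the concept IDs that match the term.
    for concept_id in db.get_refs_by_type(term, ref_type):
        # Step 2: fetch the complete record for each ID.
        record = db.get_record_by_id(concept_id)
        if record is not None:
            records.append(record)
    return records


class _SearchStub:
    """Hypothetical stand-in backed by two dictionaries."""

    def __init__(self, refs, records):
        self.refs = refs
        self.records = records

    def get_refs_by_type(self, search_term, ref_type):
        # Mirrors the documented contract: empty list if lookup fails.
        return self.refs.get((search_term, ref_type), [])

    def get_record_by_id(self, concept_id, case_sensitive=True, merge=False):
        return self.records.get(concept_id)


search_stub = _SearchStub(
    refs={("made-up disease", "label"): ["example:1"]},
    records={"example:1": {"concept_id": "example:1"}},
)
print(search(search_stub, "made-up disease", "label"))
```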

get_source_metadata(src_name)

Get license, versioning, data lookup, and other metadata for a source.

Parameters:

src_name (Union[str, SourceName]) – name of the source to get data for

Return type:

Optional[SourceMeta]

Returns:

source metadata, if lookup is successful
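
Because the return type is Optional, callers need to handle a missing entry. A small sketch, assuming a hypothetical helper `require_source_metadata` and a stub that returns plain dictionaries where the real client would return a `SourceMeta` object:

```python
def require_source_metadata(db, src_name):
    """Return metadata for a source, or raise if none is stored."""
    meta = db.get_source_metadata(src_name)
    if meta is None:
        raise LookupError(f"No metadata stored for source {src_name!r}")
    return meta


class _MetaStub:
    """Hypothetical stand-in keyed by source name."""

    def __init__(self, metadata):
        self.metadata = metadata

    def get_source_metadata(self, src_name):
        return self.metadata.get(src_name)


meta_stub = _MetaStub({"example_source": {"version": "hypothetical-version"}})
print(require_source_metadata(meta_stub, "example_source"))
```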

initialize_db()

Create disease_normalizer table if needed.

Return type:

None

list_tables()

Return names of tables in database.

Return type:

List[str]

Returns:

Table names in DynamoDB

load_from_remote(url=None)

Load DB from remote dump. Not available for DynamoDB database backend.

Parameters:

url (Optional[str]) – remote location to retrieve gzipped dump file from

Return type:

None

update_merge_ref(concept_id, merge_ref)

Update the merged record reference of an individual record to a new value.

Parameters:
  • concept_id (str) – record to update

  • merge_ref (Any) – new ref value

Raises:

DatabaseWriteException – if attempting to update non-existent record

Return type:

None
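
A sketch of pointing several source records at one merged record while collecting the IDs that fail. The helper `link_members`, the stub, and the IDs are hypothetical. The `DatabaseWriteException` class defined here is a local stand-in so the example runs on its own. Real code would import the library's exception instead.

```python
class DatabaseWriteException(Exception):
    """Local stand-in for the library exception of the same name."""


def link_members(db, member_ids, merge_ref):
    """Update each member's merge ref; return the IDs that could not be updated."""
    failed = []
    for concept_id in member_ids:
        try:
            db.update_merge_ref(concept_id, merge_ref)
        except DatabaseWriteException:
            # Documented failure mode: the record does not exist.
            failed.append(concept_id)
    return failed


class _MergeStub:
    """Hypothetical stand-in storing merge refs in a dict."""

    def __init__(self, concept_ids):
        self.merge_refs = {concept_id: None for concept_id in concept_ids}

    def update_merge_ref(self, concept_id, merge_ref):
        if concept_id not in self.merge_refs:
            raise DatabaseWriteException(f"No such record: {concept_id}")
        self.merge_refs[concept_id] = merge_ref


merge_stub = _MergeStub(["example:1", "example:2"])
failed_ids = link_members(merge_stub, ["example:1", "example:2", "example:3"], "example:1")
print(failed_ids)
```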