disease.database.database

Provide core database classes and helper methods.

Users shouldn’t need to interact with the base class directly; create_db() is the recommended way to create a database connection.

class disease.database.database.AbstractDatabase(db_url=None, **db_args)

Define a database interface.

abstract __init__(db_url=None, **db_args)

Initialize database instance.

Generally, implementing classes should be able to construct a connection from something like a libpq URL. Any additional arguments or DB-specific parameters can be passed as keywords.

Parameters:
  • db_url (Optional[str]) – address/connection description for database

  • db_args – any DB implementation-specific parameters

Raises:

DatabaseInitializationException – if initial setup fails
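As a rough illustration of the constructor contract described above, the following sketch uses a reduced stand-in rather than the real AbstractDatabase. The names SketchDatabase and InMemoryDatabase, the memory:// scheme, and the timeout keyword are hypothetical; only the (db_url=None, **db_args) shape and the exception name come from this page.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class DatabaseInitializationException(Exception):
    """Stand-in for the exception of the same name documented on this page."""


class SketchDatabase(ABC):
    """Hypothetical, heavily reduced stand-in for AbstractDatabase."""

    @abstractmethod
    def __init__(self, db_url: Optional[str] = None, **db_args: Any) -> None:
        """Set up a connection from a URL plus implementation-specific keywords."""


class InMemoryDatabase(SketchDatabase):
    """Toy implementation that only records its connection settings."""

    def __init__(self, db_url: Optional[str] = None, **db_args: Any) -> None:
        if db_url is not None and "://" not in db_url:
            # Report setup problems with the dedicated exception type.
            raise DatabaseInitializationException(f"Unrecognized URL: {db_url!r}")
        self.db_url = db_url or "memory://"
        # Keep any implementation-specific keyword arguments for later use.
        self.options = dict(db_args)


db = InMemoryDatabase("memory://scratch", timeout=5)
```

Accepting arbitrary keywords keeps the base signature stable while letting each backend define its own options.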

abstract add_merged_record(record)

Add merged record to database.

Parameters:

record (Dict) – merged record to add

Return type:

None

abstract add_record(record, src_name)

Add new record to database.

Parameters:
  • record (Dict) – record to upload

  • src_name (SourceName) – name of source for record.

Return type:

None

abstract add_source_metadata(src_name, meta)

Add new source metadata entry.

Parameters:
  • src_name – name of the source

  • meta – metadata entry for the source

Raises:

DatabaseWriteException – if write fails

Return type:

None

abstract check_schema_initialized()

Check if database schema is properly initialized.

Return type:

bool

Returns:

True if DB appears to be fully initialized, False otherwise

abstract check_tables_populated()

Perform rudimentary checks to see if tables are populated. Emphasis is on rudimentary – if some fiendish element has deleted half of the disease aliases, this method won’t pick it up. It just wants to see if a few critical tables have at least a small number of records.

Return type:

bool

Returns:

True if queries are successful, False if DB appears empty

abstract close_connection()

Perform any manual connection closure procedures if necessary.

Return type:

None

abstract complete_write_transaction()

Conclude transaction or batch writing if relevant.

Return type:

None

abstract delete_normalized_concepts()

Remove merged records from the database. Use when performing a new update of normalized data.

Raises:
Return type:

None

abstract delete_source(src_name)

Delete all data for a source. Use when updating source data.

Parameters:

src_name (SourceName) – name of source to delete

Raises:
Return type:

None

abstract drop_db()

Initiate total teardown of DB. Useful for quickly resetting the entirety of the data. Requires manual confirmation.

Raises:

DatabaseWriteException – if called in a protected setting with confirmation silenced.

Return type:

None

abstract export_db(export_location)

Dump DB to specified location.

Parameters:

export_location (Path) – path to save DB dump at

Raises:

NotImplementedError – if not supported by DB

Return type:

None

abstract get_all_concept_ids(source=None)

Retrieve all available concept IDs for use in generating normalized records.

Parameters:

source (Optional[SourceName]) – optionally, just get all IDs for a specific source

Return type:

Set[str]

Returns:

Set of concept IDs as strings.

abstract get_all_records(record_type)

Retrieve all source or normalized records. Either return all source records, or all records that qualify as “normalized” (i.e., merged groups + source records that are otherwise ungrouped). For example:

>>> from disease.database import create_db
>>> from disease.schemas import RecordType
>>> db = create_db()
>>> for record in db.get_all_records(RecordType.MERGER):
...     pass  # do something

Parameters:

record_type (RecordType) – type of result to return

Return type:

Generator[Dict, None, None]

Returns:

Generator that lazily provides records as they are retrieved
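The distinction between source records and “normalized” records can be sketched with plain Python data. This is only an illustration under stated assumptions: the RecordType stand-in below (including its IDENTITY member and string values), the merge_ref key, the iter_records helper, and the concept IDs are hypothetical and do not reflect how any real backend stores records.

```python
from enum import Enum
from typing import Dict, Generator, List


class RecordType(str, Enum):
    """Stand-in enum; members and values here are assumptions."""

    IDENTITY = "identity"
    MERGER = "merger"


def iter_records(
    source_records: List[Dict],
    merged_records: List[Dict],
    record_type: RecordType,
) -> Generator[Dict, None, None]:
    """Lazily yield either all source records or all normalized records."""
    if record_type == RecordType.MERGER:
        # Normalized output: every merged group...
        yield from merged_records
        # ...plus each source record that belongs to no group.
        for record in source_records:
            if "merge_ref" not in record:
                yield record
    else:
        yield from source_records


sources = [
    {"concept_id": "src:1", "merge_ref": "group:1"},
    {"concept_id": "src:2", "merge_ref": "group:1"},
    {"concept_id": "src:3"},
]
merged = [{"concept_id": "group:1"}]

normalized_ids = [r["concept_id"] for r in iter_records(sources, merged, RecordType.MERGER)]
```

Because the function is a generator, callers can start processing records before the full set has been retrieved.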

abstract get_record_by_id(concept_id, case_sensitive=True, merge=False)

Fetch record corresponding to provided concept ID.

Parameters:
  • concept_id (str) – concept ID for record

  • case_sensitive (bool) – if true, performs exact lookup, which may be quicker. Otherwise, performs filter operation, which doesn’t require correct casing.

  • merge (bool) – if true, look for merged record; look for identity record otherwise.

Return type:

Optional[Dict]

Returns:

complete record, if match is found; None otherwise
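The trade-off behind case_sensitive can be sketched with a dictionary-backed store: an exact lookup is a single key access, while a case-insensitive lookup has to compare against every key. The lookup_record helper, the store layout, and the concept ID below are hypothetical and only illustrate the idea.

```python
from typing import Dict, Optional


def lookup_record(
    records: Dict[str, Dict], concept_id: str, case_sensitive: bool = True
) -> Optional[Dict]:
    """Return the record stored under concept_id, or None if there is no match."""
    if case_sensitive:
        # Exact lookup: one dictionary access.
        return records.get(concept_id)
    # Filter operation: compare lowercased keys one by one.
    wanted = concept_id.lower()
    for key, record in records.items():
        if key.lower() == wanted:
            return record
    return None


store = {"Src:0001": {"concept_id": "Src:0001", "label": "example disease"}}

exact = lookup_record(store, "Src:0001")
wrong_case = lookup_record(store, "src:0001")
relaxed = lookup_record(store, "src:0001", case_sensitive=False)
```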

abstract get_refs_by_type(search_term, ref_type)

Retrieve concept IDs for records matching the user’s query. Other methods are responsible for actually retrieving full records.

Parameters:
  • search_term (str) – string to match against

  • ref_type (RefType) – type of match to look for.

Return type:

List[str]

Returns:

list of associated concept IDs. Empty if lookup fails.

abstract get_source_metadata(src_name)

Get license, versioning, data lookup, and other information for a source.

Parameters:

src_name (Union[str, SourceName]) – name of the source to get data for

Return type:

Optional[SourceMeta]

Returns:

source metadata, if lookup is successful

abstract initialize_db()

Perform all necessary parts of database setup. Should be tolerant of existing content – i.e., this method is also responsible for checking whether the DB is already set up.

Raises:

DatabaseInitializationException – if initialization fails

Return type:

None

abstract list_tables()

Return names of tables in database.

Return type:

List[str]

Returns:

Table names in database

abstract load_from_remote(url=None)

Load DB from remote dump. Warning: Deletes all existing data.

Parameters:

url (Optional[str]) – remote location to retrieve gzipped dump file from

Raises:

NotImplementedError – if not supported by DB

Return type:

None

abstract update_merge_ref(concept_id, merge_ref)

Update the merged record reference of an individual record to a new value.

Parameters:
  • concept_id (str) – record to update

  • merge_ref (Any) – new ref value

Raises:

DatabaseWriteException – if attempting to update non-existent record

Return type:

None
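The failure mode described for update_merge_ref can be sketched as follows. This is an illustration only: the exception is a local stand-in, and the set_merge_ref helper, the dictionary store, the merge_ref key, and the IDs are hypothetical.

```python
from typing import Any, Dict


class DatabaseWriteException(Exception):
    """Stand-in for the exception of the same name documented on this page."""


def set_merge_ref(records: Dict[str, Dict], concept_id: str, merge_ref: Any) -> None:
    """Point an existing record at a new merged-record reference."""
    if concept_id not in records:
        # Refuse to create a record implicitly; updating requires an existing one.
        raise DatabaseWriteException(f"No record found for {concept_id}")
    records[concept_id]["merge_ref"] = merge_ref


store = {"src:1": {"concept_id": "src:1"}}
set_merge_ref(store, "src:1", "group:1")
```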

class disease.database.database.AwsEnvName(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

AWS environment name that is being used.

DEVELOPMENT = 'Dev'
PRODUCTION = 'Prod'
STAGING = 'Staging'
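Since the signature above is the standard Enum constructor, a member can be looked up from its value. The sketch below recreates the three members locally as a stand-in instead of importing the real class; whether the real class mixes in str is not stated here, so a plain Enum is assumed.

```python
from enum import Enum


class AwsEnvName(Enum):
    """Local stand-in with the member values listed on this page."""

    DEVELOPMENT = "Dev"
    PRODUCTION = "Prod"
    STAGING = "Staging"


# Look up a member by its value, e.g. one read from configuration.
env = AwsEnvName("Prod")

# Values are matched exactly, so a different casing is rejected.
try:
    AwsEnvName("prod")
    lookup_is_case_sensitive = False
except ValueError:
    lookup_is_case_sensitive = True
```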

exception disease.database.database.DatabaseException

Create custom class for handling database exceptions.

exception disease.database.database.DatabaseInitializationException

Create custom exception for errors during DB connection initialization.

exception disease.database.database.DatabaseReadException

Create custom exception for lookup/read errors.

exception disease.database.database.DatabaseWriteException

Create custom exception for write errors.
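One way calling code might handle these exceptions is sketched below. It assumes that the three specific exceptions derive from DatabaseException, which this page does not state; the classes are local stand-ins, and run_safely and the failing callables are hypothetical.

```python
from typing import Callable


class DatabaseException(Exception):
    """Stand-in base class (the inheritance shown here is an assumption)."""


class DatabaseInitializationException(DatabaseException):
    """Stand-in for errors during connection initialization."""


class DatabaseReadException(DatabaseException):
    """Stand-in for lookup/read errors."""


class DatabaseWriteException(DatabaseException):
    """Stand-in for write errors."""


def run_safely(operation: Callable[[], None]) -> str:
    """Run a database operation and summarize the outcome as text."""
    try:
        operation()
    except DatabaseWriteException as exc:
        # Handle the specific case first...
        return f"write failed: {exc}"
    except DatabaseException as exc:
        # ...then fall back to the shared base class.
        return f"database error: {exc}"
    return "ok"


def failing_write() -> None:
    raise DatabaseWriteException("record rejected")


def failing_read() -> None:
    raise DatabaseReadException("lookup failed")
```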

disease.database.database.confirm_aws_db_use(env_name)

Check to ensure that AWS instance should actually be used.

Return type:

None

disease.database.database.create_db(db_url=None, aws_instance=False)

Database factory method. Checks environment variables and provided parameters and creates a DB instance.

Generally prefers to return a DynamoDB instance, unless all DDB-relevant environment variables are unset and a libpq-compliant URI is passed to db_url.

Parameters:
  • db_url (Optional[str]) – address to database instance

  • aws_instance (bool) – use hosted DynamoDB instance, not local DB

Return type:

AbstractDatabase

Returns:

constructed Database instance
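The selection rule described for create_db can be approximated as a small pure function. This is a sketch of the documented preference only, not the library's implementation: choose_backend is hypothetical, the environment variable names are invented for illustration, the returned strings are labels rather than database instances, and treating a libpq URI as a PostgreSQL backend is an assumption.

```python
from typing import Mapping, Optional

# Hypothetical variable names, used only for this illustration.
DDB_ENV_VARS = ("EXAMPLE_DDB_ENDPOINT", "EXAMPLE_DDB_REGION")


def choose_backend(
    db_url: Optional[str], aws_instance: bool, env: Mapping[str, str]
) -> str:
    """Return a label for the backend the documented rule would prefer."""
    if aws_instance:
        # An explicit request for the hosted instance always wins.
        return "dynamodb"
    ddb_configured = any(env.get(name) for name in DDB_ENV_VARS)
    # libpq connection URIs start with postgresql:// or postgres://.
    looks_like_libpq = db_url is not None and db_url.startswith(
        ("postgresql://", "postgres://")
    )
    if looks_like_libpq and not ddb_configured:
        return "postgresql"
    # Everything else falls back to the generally preferred backend.
    return "dynamodb"


default_choice = choose_backend(None, False, {})
postgres_choice = choose_backend("postgresql://localhost:5432/example", False, {})
```

Passing the environment in as a mapping, rather than reading os.environ inside the function, keeps the rule easy to exercise with different configurations.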