Loading and updating data
Note
See the ETL API documentation for information on programmatic access to the data loader classes.
Full load/reload
Calling the Disease Normalizer update command with the --update_all and --update_merged flags deletes all existing data, fetches new source data if available, and then performs a complete reload of the database (including merged records):
disease_norm_update --update_all --update_merged
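If the full reload is driven from a Python script or scheduled job, the same command can be invoked through subprocess. This is a minimal sketch, assuming disease_norm_update is on PATH. run_full_reload and its run parameter are hypothetical names, and the parameter exists only so the call can be swapped out in tests. Because a full reload deletes all existing data first, the sketch fails loudly on a non-zero exit code instead of continuing silently.

```python
import subprocess
from typing import Callable

# Same flags as the documented full-reload command
FULL_RELOAD = ("disease_norm_update", "--update_all", "--update_merged")


def run_full_reload(run: Callable[..., object] = subprocess.run) -> None:
    """Run a complete reload (hypothetical helper, not part of the package)."""
    # check=True makes a non-zero exit code raise subprocess.CalledProcessError,
    # so a failed reload cannot go unnoticed in a script or scheduled job.
    run(list(FULL_RELOAD), check=True)
```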
Reload individual source
To update specific sources, use the --sources option with one or more source names, separated by spaces and enclosed in quotes so that the shell passes them as a single argument. Although individual source data can be updated without also updating the normalized record data, doing so may leave the normalized records out of sync with the updated sources and affect the proper function of the normalized query endpoints. It is therefore recommended to include the --update_merged flag as well.
disease_norm_update --sources="NCIt Mondo" --update_merged
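The quotes in the shell command exist only so the shell keeps all source names in one argument. When the command is built in Python for subprocess.run, no shell is involved, so the same argument becomes a single list element with no literal quote characters. This is a minimal sketch: source_update_args is a hypothetical helper, and KNOWN_SOURCES is an assumed list of Disease Normalizer source names that should be checked against the sources your installed version supports.

```python
import shlex
from typing import Iterable, List

# Assumption: source names accepted by the CLI; verify against your installed version.
KNOWN_SOURCES = {"NCIt", "Mondo", "DO", "OncoTree", "OMIM"}


def source_update_args(sources: Iterable[str], update_merged: bool = True) -> List[str]:
    """Build argv for updating selected sources (hypothetical helper)."""
    names = list(sources)
    unknown = [name for name in names if name not in KNOWN_SOURCES]
    if unknown:
        raise ValueError(f"unknown source(s): {', '.join(unknown)}")
    # One argv element holds every name; no shell quoting is needed here
    args = ["disease_norm_update", f"--sources={' '.join(names)}"]
    if update_merged:
        # Keep normalized records in sync with the updated sources
        args.append("--update_merged")
    return args


args = source_update_args(["NCIt", "Mondo"])
# shlex.join renders the equivalent, correctly quoted shell command
print(shlex.join(args))
# disease_norm_update '--sources=NCIt Mondo' --update_merged
```

The resulting list can be passed directly to subprocess.run(args, check=True).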
Use local data
By default, the Disease Normalizer fetches the latest available data from all sources if local data is out of date. To suppress this and force the use of local files, use the --use_existing flag:
disease_norm_update --update_all --use_existing
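A script that sometimes runs without network access, or that needs to rebuild from the same files every time, might toggle the flag instead of hard-coding it. This is a minimal sketch; update_all_args and its use_local_data parameter are hypothetical names.

```python
from typing import List


def update_all_args(use_local_data: bool) -> List[str]:
    """Build argv for a full source update (hypothetical helper)."""
    args = ["disease_norm_update", "--update_all"]
    if use_local_data:
        # Reuse files already on disk instead of fetching newer releases,
        # e.g. for offline machines or reproducible rebuilds.
        args.append("--use_existing")
    return args
```

Passing the returned list to subprocess.run(..., check=True) executes the update.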
Check DB health
The shell command disease_norm_check_db performs a basic check on the database status. It first confirms that the database’s schema exists, then checks whether metadata is available for each source and whether the disease record and normalized concept tables are non-empty. The result is reported through the process’s exit code: per the UNIX convention, 0 means success and any other return code means failure.
$ disease_norm_check_db
$ echo $?
1 # indicates failure
This command is equivalent to combining the database classes’ check_schema_initialized and check_tables_populated methods:
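from disease.database import create_db

# Connect using the default database configuration
db = create_db()
# Healthy only if the schema exists and the tables are populated
db_is_healthy = db.check_schema_initialized() and db.check_tables_populated()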
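Because the health check reports its result only through the exit code, a Python script that should not import the disease package (for example, a deployment check running in a separate environment) can call the shell command and test the return code instead. This is a minimal sketch, assuming disease_norm_check_db is on PATH. is_db_healthy and main are hypothetical names, and treating a missing command as unhealthy is a design choice of this sketch, not documented behavior.

```python
import subprocess
import sys
from typing import Sequence


def is_db_healthy(cmd: Sequence[str] = ("disease_norm_check_db",)) -> bool:
    """Return True only if the health-check command exits with status 0."""
    try:
        # check=False: a non-zero exit is an expected outcome here, not an error
        result = subprocess.run(list(cmd), check=False)
    except FileNotFoundError:
        # Design choice (assumption): a missing command counts as unhealthy
        return False
    return result.returncode == 0


def main() -> None:
    # Propagate the outcome as this script's own exit code (0 = healthy)
    sys.exit(0 if is_db_healthy() else 1)
```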