PostgreSQL#

The Disease Normalizer can store and retrieve disease records from a PostgreSQL database. See the “Getting Started” section of the PostgreSQL documentation for basic installation instructions.

Note

See the PostgreSQL handler API reference for information on programmatic access.

Local setup#

To populate the Disease Normalizer, a connection must be established to an existing PostgreSQL database, so one must be created manually when performing Disease Normalizer setup. Most PostgreSQL distributions include the createdb utility for this purpose. For example, to create a database named disease_normalizer in a local database listening on port 5432 using the PostgreSQL user named postgres, run the following shell command:

createdb -h localhost -p 5432 -U postgres disease_normalizer

Once created, set the environment variable DISEASE_NORM_DB_URL to a connection description for that database. The following command provides a connection to a database named disease_normalizer in a local PostgreSQL instance, using port 5432, under the username postgres with no required password. See the PostgreSQL connection string documentation for more information.

export DISEASE_NORM_DB_URL=postgres://postgres@localhost:5432/disease_normalizer

Load from remote source#

The Disease Normalizer’s PostgreSQL class provides the disease_norm_update_remote shell command to refresh its data directly from a remotely-stored SQL dump, instead of acquiring, transforming, and loading source data. This enables data loading on the order of seconds rather than hours. See the command description at disease_norm_update_remote --help for more information.

By default, this command will fetch the latest data dump provided by the VICC. Alternative URLs can be set with the --data_url option:

disease_norm_update_remote --data_url=https://vicc-normalizers.s3.us-east-2.amazonaws.com/disease_normalization/postgresql/disease_norm_20230322163523.sql.tar.gz

Create SQL dump from database#

The Disease Normalizer’s PostgreSQL class also provides the disease_norm_dump shell command to create a SQL dump of current data into a file. This command will create a file named disease_norm_YYYYMMDDHHmmss.sql in the current directory; the -o option can be used to specify an alternate location, like so:

disease_norm_dump -o ~/.disease_data/

See disease_norm_dump --help for more information.