Contributing

Bug reports and feature requests

Bugs and new feature requests can be submitted to the issue tracker on GitHub. See this StackOverflow post for tips on how to craft a helpful bug report.

Development setup

Clone the repository:

git clone https://github.com/cancervariants/disease-normalization
cd disease-normalization

Then initialize a virtual environment:

python3 -m venv venv
source venv/bin/activate
python3 -m pip install -e '.[dev,tests,docs]'
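
To confirm that the editable install went into the virtual environment and not the system Python, you can run a quick check. This is a minimal sketch. venv_active is a hypothetical helper name, and it relies only on the standard-library fact that sys.prefix differs from sys.base_prefix inside a virtual environment.

```shell
# Hypothetical helper: prints "yes" when python3 runs inside a virtual
# environment. Inside a venv, sys.prefix points at the environment while
# sys.base_prefix still points at the base interpreter.
venv_active() {
    python3 -c 'import sys; print("yes" if sys.prefix != sys.base_prefix else "no")'
}
```

With the environment from the previous step activated, venv_active should print yes.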

We use pre-commit to run conformance checks before each commit. The hooks check for:

  • Code format and style
  • Added large files
  • AWS credentials
  • Private keys

Before your first commit, run:

pre-commit install
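
The installed hooks normally run only on the files staged for a commit. To check the whole tree once, for example right after installing the hooks, you can use pre-commit's run --all-files mode. The wrapper below is a hypothetical convenience and is not part of the repository.

```shell
# Hypothetical wrapper: run every configured pre-commit hook against all
# files in the repository, not just the staged ones.
check_all_files() {
    if ! command -v pre-commit >/dev/null 2>&1; then
        echo "pre-commit not found on PATH; see Development setup" >&2
        return 127
    fi
    pre-commit run --all-files
}
```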

Style

Code style is managed by Ruff and should be checked with the pre-commit hook before each commit. GitHub Actions then applies a final check to every pull request.
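
If Ruff is also installed in the active environment, you can run it directly to reproduce the style checks. This is an assumption: Ruff may only be available inside pre-commit's own hook environment. ruff check reports lint violations, and ruff format --check exits non-zero if any file would be reformatted. Both commands read settings from the project's configuration. ruff_all is a hypothetical helper name.

```shell
# Hypothetical helper: lint, then verify formatting, without modifying files.
# Assumes `ruff` is on PATH in the active environment.
ruff_all() {
    ruff check . || return 1
    ruff format --check .
}
```

Running ruff format . without --check rewrites the files in place instead of only reporting them.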

Tests

Tests are executed with pytest:

pytest

To use the test data (e.g. in CI), first point the app configuration at the test environment:

export DISEASE_NORM_ENV=test
pytest
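
Because export leaves DISEASE_NORM_ENV set for the rest of the shell session, any later pytest run in the same terminal will also use the test environment. Prefixing the variable to a single command limits it to that one invocation. The function name below is hypothetical.

```shell
# Hypothetical helper: run pytest with DISEASE_NORM_ENV=test for this
# invocation only; the variable is not left set in the calling shell.
# Extra arguments are passed through to pytest.
pytest_with_test_data() {
    DISEASE_NORM_ENV=test pytest "$@"
}
```

For a one-off run, the plain form DISEASE_NORM_ENV=test pytest does the same thing without defining a function.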

Documentation

The documentation is built with Sphinx, which is included as part of the docs dependency group. Navigate to the docs/ subdirectory and use make to build the HTML version:

cd docs
make html

See the Sphinx documentation for more information.
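
To preview the built site locally, Python's standard-library http.server can serve the output directory. This sketch assumes the Makefile writes HTML to _build/html, which is the sphinx-quickstart default when source and build directories are not separated. If this repository uses a different build directory, set DOCS_HTML_DIR to match. serve_docs and DOCS_HTML_DIR are hypothetical names.

```shell
# Hypothetical helper: serve built docs at http://localhost:8000.
# Run from docs/. DOCS_HTML_DIR defaults to _build/html (an assumption).
serve_docs() {
    html_dir="${DOCS_HTML_DIR:-_build/html}"
    if [ ! -f "$html_dir/index.html" ]; then
        echo "no index.html in $html_dir; run 'make html' first" >&2
        return 1
    fi
    python3 -m http.server 8000 --directory "$html_dir"
}
```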

Creating and publishing Docker images

Note

This section assumes you have push permissions for the DockerHub organization.

It also assumes that OMIM data is located at $WAGS_TAILS_DIR/omim; see Wags-TAILS for more details.

Important

All commands in this section must be run from the root of the repository.

These instructions assume a fresh local DynamoDB setup. The local DynamoDB data is stored in a bind-mounted Docker volume and must be reset before loading new data. Reusing an existing local DynamoDB volume is not supported.
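
The commands below use relative paths: compose-dev.yaml, Dockerfile.ddb, dynamodb_local_latest, and $(pwd) in the volume definition. A small guard can therefore catch running them from the wrong directory. This is a hypothetical helper that only checks for the two files named in this section.

```shell
# Hypothetical guard: succeed only if the current directory contains the
# files this section relies on, i.e. it looks like the repository root.
require_repo_root() {
    for f in compose-dev.yaml Dockerfile.ddb; do
        if [ ! -f "$f" ]; then
            echo "missing $f: run this from the repository root" >&2
            return 1
        fi
    done
}
```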

Configure environment

Set your DockerHub organization.

export DOCKERHUB_ORG=your-org

Set the WAGS_TAILS_DIR environment variable to the location of your Wags-TAILS data directory.

export WAGS_TAILS_DIR="$HOME/.local/share/wags_tails"

Set the image version from the most recent Git tag (used for the API image).

export VERSION=$(git describe --tags --abbrev=0)
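
git describe --tags --abbrev=0 returns the most recent tag reachable from HEAD, even when HEAD has moved past that tag. In that case the API image would carry a version that does not match the code being built. One hedged pre-build check is to confirm that HEAD is exactly the tagged commit. The helper name below is hypothetical.

```shell
# Hypothetical check: succeed only if HEAD is exactly the commit that the
# $VERSION tag points to, so the image tag matches the code being built.
# Uncommitted changes in the working tree are not detected.
head_matches_version() {
    if [ -z "${VERSION:-}" ]; then
        echo "VERSION is empty; does the repository have any tags?" >&2
        return 1
    fi
    # ^{commit} peels an annotated tag down to the commit it references.
    tag_commit=$(git rev-parse --verify --quiet "$VERSION^{commit}") || return 1
    [ "$tag_commit" = "$(git rev-parse HEAD)" ]
}
```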

Set the image date tag (used for the DynamoDB image).

export DATE=$(date +%F)
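
If the repository has no tags, git describe fails and VERSION ends up empty. The later docker build would then get an empty tag. A hypothetical preflight helper can catch this. It checks every variable set in this section, plus the OMIM directory named in the note at the start of this section.

```shell
# Hypothetical preflight: report every empty variable from this section and
# a missing $WAGS_TAILS_DIR/omim directory; return non-zero if any check fails.
check_publish_env() {
    status=0
    for name in DOCKERHUB_ORG WAGS_TAILS_DIR VERSION DATE; do
        # Portable indirect expansion: read the variable whose name is $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        fi
    done
    if [ -n "${WAGS_TAILS_DIR:-}" ] && [ ! -d "$WAGS_TAILS_DIR/omim" ]; then
        echo "no OMIM data at $WAGS_TAILS_DIR/omim" >&2
        status=1
    fi
    return "$status"
}
```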

Reset local DynamoDB data

The local DynamoDB volume (disease_norm_ddb_vol) is configured as a bind-mounted Docker volume that maps to the local dynamodb_local_latest directory. Because of this, both the Docker volume and the local directory must be removed to ensure a completely clean database state.
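
To see this mapping on an existing volume, docker volume inspect can print the bind source. For a local-driver volume, the options given to docker volume create (device, o, type) appear under .Options in the inspect output. The helper name is hypothetical.

```shell
# Hypothetical helper: print the host directory that backs the volume,
# read from the "device" option recorded when the volume was created.
ddb_volume_device() {
    docker volume inspect --format '{{ .Options.device }}' disease_norm_ddb_vol
}
```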

Remove the existing Docker volume.

docker volume rm disease_norm_ddb_vol

Remove the local DynamoDB data directory.

rm -rf dynamodb_local_latest

Recreate the local DynamoDB data directory.

mkdir dynamodb_local_latest

Recreate the Docker volume (bind-mounted to a local directory).

docker volume create --driver local --opt type=none --opt device="$(pwd)/dynamodb_local_latest" --opt o=bind disease_norm_ddb_vol
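
docker volume rm exits with an error when the volume does not exist yet, for example on a first run. It also refuses to remove a volume that a container is still using. You can wrap the four reset steps in a single hypothetical helper. It skips the removal when there is nothing to remove, and it stops before deleting the directory if the removal fails.

```shell
# Hypothetical helper: reset the local DynamoDB volume and its bind-mount
# directory. Run from the repository root.
reset_ddb_volume() {
    # Remove the volume only if it exists, so a first run does not fail.
    if docker volume inspect disease_norm_ddb_vol >/dev/null 2>&1; then
        docker volume rm disease_norm_ddb_vol || return 1
    fi
    rm -rf dynamodb_local_latest
    mkdir dynamodb_local_latest
    # Recreate the volume as a bind mount of the fresh directory.
    docker volume create --driver local \
        --opt type=none \
        --opt device="$(pwd)/dynamodb_local_latest" \
        --opt o=bind \
        disease_norm_ddb_vol
}
```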

Build and run services locally

To start the services and load DynamoDB:

docker compose -f compose-dev.yaml up --build
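
Other docker compose subcommands take the same -f compose-dev.yaml flag. For example, ps lists the service containers, logs -f follows their output, and down stops and removes the containers. By default, down leaves named volumes in place and removes them only with -v. A hypothetical wrapper saves retyping the file flag.

```shell
# Hypothetical wrapper: run any docker compose subcommand against
# compose-dev.yaml. Run from the repository root.
compose_dev() {
    docker compose -f compose-dev.yaml "$@"
}

# Examples:
#   compose_dev ps        # list service containers and their state
#   compose_dev logs -f   # follow service logs
#   compose_dev down      # stop and remove containers (named volumes kept)
```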

Build and publish API images

To build, tag, and push the API images:

docker build --build-arg VERSION=$VERSION -t $DOCKERHUB_ORG/disease-normalizer-api:$VERSION -t $DOCKERHUB_ORG/disease-normalizer-api:latest .
docker push $DOCKERHUB_ORG/disease-normalizer-api:$VERSION
docker push $DOCKERHUB_ORG/disease-normalizer-api:latest
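
The two docker push commands differ only in the tag, so you can express them as a loop. The helper below is a hypothetical sketch that stops at the first failed push. It works for the DynamoDB image as well.

```shell
# Hypothetical helper: push one repository under several tags, stopping at
# the first failure. Usage: push_tags REPOSITORY TAG [TAG...]
push_tags() {
    if [ "$#" -lt 2 ]; then
        echo "usage: push_tags REPOSITORY TAG [TAG...]" >&2
        return 2
    fi
    repo=$1
    shift
    for tag in "$@"; do
        docker push "$repo:$tag" || return 1
    done
}

# Equivalent to the two API push commands:
#   push_tags "$DOCKERHUB_ORG/disease-normalizer-api" "$VERSION" latest
```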

Archive local DynamoDB data

To archive disease_norm_ddb_vol into ./disease_norm_ddb.tar.gz:

docker run --rm \
    -v disease_norm_ddb_vol:/volume \
    -v "$(pwd)":/backup \
    alpine:3.23 \
    sh -c "cd /volume && tar czf /backup/disease_norm_ddb.tar.gz ."
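
Before moving on, you can sanity-check the archive. A truncated or empty archive would otherwise only surface later. If the services are still running, it may be safer to stop them first (for example with docker compose -f compose-dev.yaml down) so that the database files are not archived mid-write. The helper below is hypothetical. It checks that the file is non-empty, that the gzip stream is intact, and that the archive holds at least one regular file.

```shell
# Hypothetical check for the archive created above.
# Usage: check_ddb_archive [ARCHIVE]  (defaults to disease_norm_ddb.tar.gz)
check_ddb_archive() {
    archive="${1:-disease_norm_ddb.tar.gz}"
    [ -s "$archive" ] || { echo "missing or empty: $archive" >&2; return 1; }
    # Verify the gzip stream is intact before trusting the listing.
    gzip -t "$archive" || return 1
    # Require at least one non-directory member (entries not ending in "/").
    tar tzf "$archive" | grep -q -v '/$'
}
```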

Build and publish DynamoDB images

To build, tag, and push the DynamoDB images:

docker build -f Dockerfile.ddb -t $DOCKERHUB_ORG/disease-normalizer-ddb:$DATE -t $DOCKERHUB_ORG/disease-normalizer-ddb:latest .
docker push $DOCKERHUB_ORG/disease-normalizer-ddb:$DATE
docker push $DOCKERHUB_ORG/disease-normalizer-ddb:latest
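
To confirm that every pushed tag now resolves on the registry, docker manifest inspect can query each reference without pulling the image. It exits non-zero when a tag is missing. The loop below is a hypothetical helper over the references published in this section.

```shell
# Hypothetical check: confirm each image reference resolves on the
# registry. Prints any missing reference and returns non-zero if so.
verify_pushed() {
    status=0
    for ref in "$@"; do
        if ! docker manifest inspect "$ref" >/dev/null 2>&1; then
            echo "not found on registry: $ref" >&2
            status=1
        fi
    done
    return "$status"
}

# Example for the images published in this section:
#   verify_pushed \
#       "$DOCKERHUB_ORG/disease-normalizer-api:$VERSION" \
#       "$DOCKERHUB_ORG/disease-normalizer-api:latest" \
#       "$DOCKERHUB_ORG/disease-normalizer-ddb:$DATE" \
#       "$DOCKERHUB_ORG/disease-normalizer-ddb:latest"
```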