Disease Normalizer 0.4.0.dev3

The Disease Normalizer resolves ambiguous references and descriptions of human diseases to consistently structured, normalized terms. For concepts drawn from NCIt, the Mondo Disease Ontology, the Human Disease Ontology, OMIM, and OncoTree, it designates a CURIE and provides additional metadata such as aliases and cross-references.
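
A CURIE such as ncit:C2926 is a compact identifier: a namespace prefix and a local ID separated by the first colon. The sketch below shows how a caller might split one into its parts. The split_curie helper is hypothetical and is not part of the Disease Normalizer package.

```python
from __future__ import annotations


def split_curie(curie: str) -> tuple[str, str]:
    """Split a CURIE into (prefix, local_id) at the first colon.

    Hypothetical helper for illustration; not part of the package API.
    """
    prefix, sep, local_id = curie.partition(":")
    # Reject strings with no colon or with an empty prefix/local part.
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id
```

For example, split_curie("ncit:C2926") returns ("ncit", "C2926"), so the prefix can be used to tell which source a normalized concept came from.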

A public REST instance of the service is available for programmatic queries:

>>> import requests
>>> result = requests.get("https://normalize.cancervariants.org/disease/normalize?q=nsclc").json()
>>> result["normalized_id"]
'ncit:C2926'
>>> result["disease"]["aliases"][:5]
['Non-Small Cell Carcinoma of Lung', 'NSCLC - non-small cell lung cancer', 'Non-small cell lung cancer', 'Non-Small Cell Carcinoma of the Lung', 'non-small cell cancer of the lung']
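
The public instance takes the search term in the q query parameter, so multi-word terms must be URL-encoded. As a minimal sketch, the example below uses only the Python standard library instead of requests; the helper names build_normalize_url and normalize_remote are hypothetical, and only the base URL comes from the example above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the public instance shown in the example above.
BASE_URL = "https://normalize.cancervariants.org/disease/normalize"


def build_normalize_url(query: str) -> str:
    """Return the normalize URL for a free-text query, percent-encoding the term.

    Hypothetical helper for illustration.
    """
    return f"{BASE_URL}?{urlencode({'q': query})}"


def normalize_remote(query: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON response for query (requires network access).

    Hypothetical helper for illustration.
    """
    with urlopen(build_normalize_url(query), timeout=timeout) as response:
        return json.load(response)
```

With requests, passing params={"q": term} to requests.get handles the same encoding, so either approach avoids hand-building query strings for terms like "non-small cell lung cancer".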

The Disease Normalizer can also be installed locally as a Python package for fast access:

>>> from disease.query import QueryHandler
>>> from disease.database import create_db
>>> q = QueryHandler(create_db())
>>> result = q.normalize("nsclc")
>>> result.normalized_id
'ncit:C2926'
>>> result.disease.aliases[:5]
['Non-Small Cell Carcinoma of Lung', 'NSCLC - non-small cell lung cancer', 'Non-small cell lung cancer', 'Non-Small Cell Carcinoma of the Lung', 'non-small cell cancer of the lung']
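
When normalizing many terms locally, it is likely sensible to construct the QueryHandler (and its database connection) once and reuse it. The sketch below assumes only the normalize() method and normalized_id attribute shown above; the normalize_many helper is hypothetical, and treating normalized_id as None when no match is found is an assumption.

```python
from typing import Dict, Iterable, Optional


def normalize_many(handler, terms: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each term to its normalized ID using one already-built handler.

    Hypothetical helper. handler is assumed to expose normalize(term),
    returning an object with a normalized_id attribute (assumed None
    when the term cannot be normalized).
    """
    return {term: handler.normalize(term).normalized_id for term in terms}
```

With the handler q from the example above, normalize_many(q, ["nsclc", "non-small cell lung cancer"]) returns a dict keyed by the input terms, reusing a single database connection for all of them.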

The Disease Normalizer was created to support the Knowledgebase Integration Project of the Variant Interpretation for Cancer Consortium (VICC). It is developed primarily by the Wagner Lab. Full source code is available on GitHub.