Phenotype Ontology Library: Phenol¶
Phenol is a modern Java library (Java 11 or later as of v2.0.0; earlier releases targeted Java 8) for working with phenotype and other attribute ontologies (including Gene Ontology). Phenol provides full support for working with the Human Phenotype Ontology and HPO-based disease annotations.
Authors¶
- Sebastian Bauer
- Peter N. Robinson
- Sebastian Koehler
- Manuel Holtgrewe
- HyeongSik Kim
- Michael Gargano
- Jules Jacobsen
Contributing¶
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
You can contribute in many ways:
Types of Contributions¶
Report Bugs¶
Report bugs at https://github.com/monarch-initiative/phenol/issues
If you are reporting a bug, please include:
- Your operating system name and version.
- Any details about your local setup that might be helpful in troubleshooting.
- Detailed steps to reproduce the bug.
Fix Bugs¶
Look through the GitHub issues for bugs. If you want to start working on a bug, please write a short message on the issue tracker to prevent duplicate work.
Implement Features¶
Look through the GitHub issues for features. If you want to start working on an issue, please write a short message on the issue tracker to prevent duplicate work.
Write Documentation¶
Phenol could always use more documentation, whether as part of the official Phenol docs, in docstrings, or even on the web in blog posts, articles, and such.
Phenol uses Sphinx for the user manual (that you are currently reading). See doc_guidelines for how reStructuredText is used in the documentation. See doc_setup for creating a local setup for building the documentation.
Submit Feedback¶
The best way to send feedback is to file an issue at https://github.com/monarch-initiative/phenol/issues
If you are proposing a feature:
- Explain in detail how it would work.
- Keep the scope as narrow as possible, to make it easier to implement.
- Remember that this is a volunteer-driven project, and that contributions are welcome :)
Documentation Guidelines¶
For the documentation, please adhere to the following guidelines:
- Put each sentence on its own line; this makes tracking changes through Git easier.
- Provide hyperlink targets, at least for the first two section levels.
- Use the section structure from below.
.. _heading_1:

=========
Heading 1
=========

.. _heading_2:

---------
Heading 2
---------

.. _heading_3:

Heading 3
=========

.. _heading_4:

Heading 4
---------

.. _heading_5:

Heading 5
~~~~~~~~~

.. _heading_6:

Heading 6
:::::::::
Documentation Setup¶
For building the documentation, you have to install the Python program Sphinx. This is best done in a virtual environment. The following assumes you have a working Python 3 setup.
Use the following steps for installing Sphinx and the dependencies for building the phenol documentation:
$ cd phenol/manual
$ virtualenv -p python3 .venv
$ source .venv/bin/activate
$ pip install --upgrade -r requirements.txt
Use the following for building the documentation.
The first two lines are only required for activating the virtualenv.
Afterwards, you can always use make html for building.
$ cd phenol/manual
$ source .venv/bin/activate
$ make html # rebuild for changed files only
$ make clean && make html # force rebuild
Get Started!¶
Ready to contribute? First, create your Java/Documentation development setup as described in install_from_source/doc_setup.
Fork the phenol repo on GitHub.
Clone your fork locally:
$ git clone git@github.com:your_name_here/phenol.git
Create a branch for local development:
$ git checkout -b name-of-your-bugfix-or-feature
Now you can make your changes locally.
When you’re done making your changes, make sure that the build runs through. For Java:
$ mvn package
For documentation:
$ make clean && make html
Commit your changes and push your branch to GitHub:
$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature
Submit a pull request through the GitHub website.
Pull Request Guidelines¶
Before you submit a pull request, check that it meets these guidelines:
- The pull request should include tests.
- If the pull request adds functionality, the docs should be updated.
- Describe your changes in the CHANGELOG.md file.
Changelog¶
latest¶
v2.0.1¶
Minor changes¶
- Use Jackson annotations to configure serialization of TermId and Identified
- update dependencies
v2.0.0¶
- Upgrade to Java 11+, add module-info files
- Support for GO GAF 2.2 files
- Speed up build by adding a new build profile
phenol-core
- MinimalOntology has a version
- do not use non-propagating relationships during ontology traversals
phenol-io
- Dropping support for reading OBO/OWL ontologies
- drop non-modular curie-util dependency
phenol-annotations
- Remodel HpoDisease, HpoAnnotation, HpoAssociationData, GeneIdentifiers, etc.
- HpoDiseases has a version
- Model temporal elements
- Implement HGNCGeneIdentifierLoader for reading GeneIdentifiers from HGNC complete set archive.
- Add new HpoOnset terms.
- Consolidate hardcoded HPO constants (TermIds) into org.monarchinitiative.phenol.annotations.constants.hpo package
- Standardize HPO annotations header, ensure the parsers can read the older releases.
- Deprecate the code for parsing small files and move to hpoannotQC
v1.6.3¶
- added class for efficient precalculation of Resnik scores for HPO
- various bug fixes
v1.6.1¶
- Fixed issue with parsing Orphanet en_product4.xml
v1.6.0¶
- Simplified interface to GO overrepresentation analysis classes.
- Parent-Child Gene Ontology overrepresentation analysis with unit tests
- MGSA bugfix
v1.5.0¶
- improved functions for display of upper level HPO categories
v1.4.3¶
- fixing bug in parsing of MGI Genetic Marker file
- fixing bug in parsing of orphanet genes file, en_product6.xml
v1.4.2¶
- Prototype ingest of JSON ontology files
- flexible handling of relation types
- bug fix of previously incorrect handling of tRNA genes
v1.4.1¶
- Added workaround for duplicated lines in Homo_sapiens_gene_info file
- Added phenotype to gene extraction
v1.4.0¶
- Added Orphanet inheritance parser
- Added several demonstration programs
- refactored files for constructing the phenotype.hpoa file
v1.3.3¶
- Various bug fixes
- Orphanet inheritance XML file ingest
- Adding additional demo app to show how to access Term information (hpdemo).
v1.3.0¶
- refactored TermAnnotation interface to use TermId instead of String to identify the objects being annotated
- refactored GoGaf21Annotation class to use TermId internally instead of Strings for db and dbObjectId
- refactored to use junit 5 (allowing legacy use of junit 4, will migrate completely in coming releases)
v1.2.5-SNAPSHOT¶
- moving to SNAPSHOT version names to conform with maven standards
- fixed bug in initialized association lists for Gene Ontology analysis.
v1.1.4¶
- Adding parsing of onset, modifier, PMID/source to HpoAnnotation class
- Adding all relation types relevant to MONDO
v1.1.3¶
- Adding parsing of relations other than IS_A for Gene Ontology
- Fixing calculation of frequency (double) from frequency category
- allowing any valid curie as cross-ref
v1.1.2¶
- Adding MP annotation parser for MGI_GenePheno.rpt and MGI_Pheno_Sex.rpt
v1.1.1¶
- HPO Annotation parser now indexes diseases as a TermId representing the disease CURIE, e.g., MONDO:0000042.
- HPO Annotation parser now uses new ‘big-file’ format (with updated treatment of biocuration field)
v1.0.3¶
- refactored MP and GO parsing to use new OWLAPI-based parser
- adding support for adding artificial root to ontologies such as GO with multiple root terms.
- upgraded to obographs v0.1.1
v1.0.2¶
- refactored TermId to remove superfluous interface and renamed ImmutableTermId to TermId
- refactored TermSynonym to remove superfluous interface
- adding support for alt term ids to Owl2OboTermFactory (class renamed from GenericOwlFactory)
- adding support for database_cross_reference (usually PMID, ISBN, HPO, or MGI; added to term definitions)
v1.0.0¶
- completed refactoring to use single Term/Relationship. The API is not backwards compatible with versions prior to v0.1.9.
v0.1.9¶
- refactored to use just a single Term and Relationship instead of having separate types for each ontology. Simplified
classes that were templated to allow e.g., MpoTerm, MpoRelationship by hardcoding Term,Relationship and removing template.
v0.1.8¶
- refactored HpoAnnotation from HpoTermId
v0.1.7¶
- refactored phenol to use JGraphT library
- Adding OWLAPI based parser
- Refactoring HPO Disease annotation parser
v0.1.6¶
- refactored HPO disease annotation parser (changed API)
v0.1.5¶
- changed package and project name to phenol - Phenotype Ontology Library
v0.1.4¶
- fix to GOA parser
- added HPODiseaseWithMetaData parser
- added functions to calculate Term relationships (sibling, subclass, related, not-related)
v0.1.2¶
- refactored HpoFrequency class to return frequencies (i.e., a number in [0,1]) rather than percentages
- Added HpoOnset classes
- Added HpoDiseaseWithMetadata class to encompass frequency and onset data
v0.4/v0.1.1¶
- forked from ontolib
- fixed mp.obo parse error
- fixed subontology creation error (TermMap, TermRelation)
- Adding class OntologyAlgorithm with test class OntologyAlgorithmTest. Implements functions to get children, parents, descendants and ancestors.
v0.3¶
- xref tags are now parsed and their content is available in Term. Added appropriate classes for representation.
- Added Ontology.getParent().
- Removed JaccardIcWeightedSimilarity, JiangSimilarity, LinSimilarity, supporting code and tests.
- Refactoring the code for object score I/O into ontolib-io package.
- Adding support for score distribution reading and writing to H2 database files.
- Ontology.getAncestorTermIds() now also resolves alternative term IDs.
- Fixing dependency on slf4j components in ontolib-core and ontolib-io.
- Adding getPrimaryTermId() in Ontology.
v0.2¶
- Making date parser for HPO annotation files more robust. It works now for positive and negative associations.
- Small bug fix in HPO OBO parser.
- Adding ontolib-cli package that allows score distribution precomputation from the command line.
- Removed some dead code.
- Added various tests, minor internal refactoring.
- Moved OntologyTerms into ontology.algo package.
v0.1¶
- Everything is new.
Working with HPO Annotations¶
Please see Working with the Term class for information about how to access individual HPO terms. This tutorial explains how to access the HPO annotation data contained in the phenotype.hpoa file.
Parsing Annotation Files¶
You can parse the phenotype-to-disease annotation files as follows.
import java.util.Map;

import org.monarchinitiative.phenol.ontology.data.TermId;
import org.monarchinitiative.phenol.formats.hpo.HpoDisease;
import org.monarchinitiative.phenol.io.obo.hpo.HpoDiseaseAnnotationParser;

HpoDiseaseAnnotationParser annotationParser =
    new HpoDiseaseAnnotationParser(phenotypeAnnotationPath, ontology);
try {
    Map<TermId, HpoDisease> diseaseMap = annotationParser.parse();
    if (!annotationParser.validParse()) {
        // report how many problems the parser recorded
        int n = annotationParser.getErrors().size();
        logger.warn("Parse problems encountered with the annotation file at {}. Got {} errors",
            this.phenotypeAnnotationPath, n);
    }
    return diseaseMap; // or do something else with the data
} catch (PhenolException e) {
    e.printStackTrace(); // or do something else
}
Parsing Annotation Files for Specific Sources¶
To limit the import to data representing diseases in the DECIPHER database, use the following code (the rest is identical). Currently, DECIPHER, OMIM, and ORPHA are available.
List<String> desiredDatabasePrefixes=ImmutableList.of("DECIPHER");
HpoDiseaseAnnotationParser annotationParser =
new HpoDiseaseAnnotationParser(phenotypeAnnotationPath,ontology,desiredDatabasePrefixes);
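The idea behind the prefix list can be illustrated without phenol itself. The following is a minimal sketch in plain Java, not the parser's actual implementation: it assumes disease identifiers are CURIE-like strings of the form PREFIX:ID and keeps only those whose prefix is in the desired set. The class name, the method names, and the identifiers are hypothetical and chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PrefixFilterSketch {

    /** Returns the part of a CURIE before the first colon, or "" if there is no colon. */
    static String prefixOf(String curie) {
        int colon = curie.indexOf(':');
        return colon < 0 ? "" : curie.substring(0, colon);
    }

    /** Keeps only the identifiers whose prefix is in the desired set, preserving input order. */
    static List<String> keepPrefixes(List<String> curies, Set<String> desiredPrefixes) {
        List<String> kept = new ArrayList<>();
        for (String curie : curies) {
            if (desiredPrefixes.contains(prefixOf(curie))) {
                kept.add(curie);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up identifiers, for illustration only
        List<String> diseaseIds =
            Arrays.asList("OMIM:100001", "DECIPHER:1", "ORPHA:100002", "DECIPHER:2");
        Set<String> desired = new HashSet<>(Arrays.asList("DECIPHER"));
        System.out.println(keepPrefixes(diseaseIds, desired)); // prints [DECIPHER:1, DECIPHER:2]
    }
}
```

Passing several prefixes (for example DECIPHER and OMIM) in the set keeps identifiers from either source.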
Using phenol for Gene Ontology¶
phenol supports working with the GO in several ways, including some GO term enrichment analysis approaches.
Loading the data¶
To perform enrichment analysis, we require the GO ontology file, the annotation file, as well as a population set (e.g., all genes in a genome) and a study set (e.g., some set of genes determined to be differentially expressed).
Ontology gontology = OntologyLoader.loadOntology(new File(pathGoObo), "GO");
final GoGeneAnnotationParser annotparser = new GoGeneAnnotationParser(pathGoGaf);
List<TermAnnotation> goAnnots = annotparser.getTermAnnotations();
AssociationContainer associationContainer = new AssociationContainer(goAnnots);
Set<TermId> populationGenes = getPopulationSet(goAnnots);
StudySet populationSet = new StudySet(populationGenes,"population",associationContainer,gontology);
Set<TermId> studyGenes = ... // get list of genes from study set
StudySet studySet = new StudySet(studyGenes,"study",associationContainer,gontology);
Perform testing for overrepresentation¶
In this example, we show how to use the exact Fisher test to assess term overrepresentation.
See the implementation in GoEnrichmentDemo.java for more details.
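For readers unfamiliar with the test itself, the calculation for a single GO term can be sketched in plain Java. This is a minimal, self-contained illustration of the one-sided exact Fisher (hypergeometric) test and not phenol's own implementation; the class and method names are hypothetical. It assumes four counts are known: the population size, the number of population genes annotated to the term, the study set size, and the number of study genes annotated to the term.

```java
class FisherExactSketch {

    /** Natural log of the binomial coefficient C(n, k), computed by summing logs. */
    static double logChoose(int n, int k) {
        if (k < 0 || k > n) {
            return Double.NEGATIVE_INFINITY; // C(n, k) is zero outside 0..n
        }
        k = Math.min(k, n - k); // use the symmetry C(n, k) = C(n, n - k)
        double sum = 0.0;
        for (int i = 1; i <= k; i++) {
            sum += Math.log(n - k + i) - Math.log(i);
        }
        return sum;
    }

    /**
     * Probability of seeing at least studyAnnotated annotated genes when studySize genes
     * are drawn without replacement from popSize genes, popAnnotated of which carry the term.
     */
    static double upperTail(int popSize, int popAnnotated, int studySize, int studyAnnotated) {
        double logDenominator = logChoose(popSize, studySize);
        int maxOverlap = Math.min(popAnnotated, studySize);
        double p = 0.0;
        for (int k = studyAnnotated; k <= maxOverlap; k++) {
            p += Math.exp(logChoose(popAnnotated, k)
                + logChoose(popSize - popAnnotated, studySize - k)
                - logDenominator);
        }
        return Math.min(1.0, p); // guard against rounding slightly above 1
    }

    public static void main(String[] args) {
        // Toy numbers: 20 genes in the population, 5 annotated to the term,
        // 5 genes in the study set, 3 of them annotated to the term.
        double p = upperTail(20, 5, 5, 3);
        System.out.printf("p = %.5f%n", p); // 1126 / 15504, about 0.07263
    }
}
```

In a real analysis this p-value is computed for every annotated term, so a multiple-testing correction is normally applied afterwards.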
Input¶
The phenol library is mainly intended to support working with the Human Phenotype Ontology, the Mammalian Phenotype Ontology, the Gene Ontology, MONDO, and ECTO, but has also been tested with the OBO version of NCIT.
Human Phenotype Ontology¶
To load the Human Phenotype Ontology (HPO), use the following code. The HPO is in the default curie map and only contains known relationships (is-a) and HP terms.
String hpoPath="/some/path/hp.obo";
Ontology hpoOntology = OntologyLoader.loadOntology(new File(hpoPath));
Mammalian Phenotype Ontology¶
The Mammalian Phenotype Ontology (MP) can be loaded using the same command.
String mpPath="/some/path/mp.obo";
Ontology mpOntology = OntologyLoader.loadOntology(new File(mpPath));
Gene Ontology¶
The Gene Ontology (GO) is in the default curie map, but it also contains BFO and RO terms with unknown relationships. We want to ignore these, so here we specify the term prefixes we want to use. GO has three possible root nodes (biological_process, cellular_component, molecular_function), so an artificial GO:0000000 root is added.
String goPath="/some/path/go.obo";
Ontology goOntology = OntologyLoader.loadOntology(new File(goPath), "GO");
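To make the artificial root concrete, here is a minimal sketch in plain Java of how a graph with several parentless terms can be given a single root. It is an illustration under stated assumptions, not phenol's loader code: the ontology is reduced to a map from each term to its is-a parents, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ArtificialRootSketch {

    /** Returns every term that has no parent among the is-a edges. */
    static Set<String> findRoots(Map<String, Set<String>> parentsByTerm) {
        Set<String> roots = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : parentsByTerm.entrySet()) {
            if (entry.getValue().isEmpty()) {
                roots.add(entry.getKey());
            }
        }
        return roots;
    }

    /** If there is not exactly one root, links all roots to a new artificial root and returns it. */
    static String ensureSingleRoot(Map<String, Set<String>> parentsByTerm, String artificialRootId) {
        Set<String> roots = findRoots(parentsByTerm);
        if (roots.size() == 1) {
            return roots.iterator().next();
        }
        for (String root : roots) {
            parentsByTerm.get(root).add(artificialRootId);
        }
        parentsByTerm.put(artificialRootId, new HashSet<>());
        return artificialRootId;
    }

    public static void main(String[] args) {
        // Hand-made fragment of GO, for illustration only
        Map<String, Set<String>> parents = new HashMap<>();
        parents.put("GO:0008150", new HashSet<>()); // biological_process
        parents.put("GO:0005575", new HashSet<>()); // cellular_component
        parents.put("GO:0003674", new HashSet<>()); // molecular_function
        parents.put("GO:0009987", new HashSet<>(Collections.singleton("GO:0008150")));
        System.out.println(ensureSingleRoot(parents, "GO:0000000")); // prints GO:0000000
    }
}
```

With a single root in place, every term has a common ancestor, which simplifies traversals that start from the root.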
Environmental conditions, treatments and exposures ontology¶
The Environmental conditions, treatments and exposures ontology (ECTO) contains multiple relationships, so we simplify the graph by loading only ECTO nodes. This ignores the true root term XCO:0000000 as well as nodes from CHEBI, BFO and UBERON, among others.
CurieUtil curieUtil = CurieUtilBuilder.withDefaultsAnd(ImmutableMap.of("ECTO", "http://purl.obolibrary.org/obo/ECTO_"));
Ontology ecto = OntologyLoader.loadOntology(ectoFile, curieUtil, "ECTO");
ecto.getRelationMap().values().forEach(relationship -> assertEquals(RelationshipType.IS_A, relationship.getRelationshipType()));
// test if you like..
assertEquals(TermId.of("owl:Thing"), ecto.getRootTermId());
assertEquals(2272, ecto.countNonObsoleteTerms());
assertEquals(0, ecto.countObsoleteTerms());
Installation¶
Use Maven Central Binaries¶
Note
This is the recommended way of installing for normal users.
Add the following snippet to your pom.xml to use phenol modules in your Maven project.
In your own project, replace ${project.version} with the phenol release you want to use (for example, 2.0.1).
<dependencies>
  <dependency>
    <groupId>org.monarchinitiative.phenol</groupId>
    <artifactId>phenol-core</artifactId>
    <version>${project.version}</version>
  </dependency>
  <dependency>
    <groupId>org.monarchinitiative.phenol</groupId>
    <artifactId>phenol-io</artifactId>
    <version>${project.version}</version>
  </dependency>
</dependencies>
Install from Source¶
Note
You only need to install from source if you want to develop Phenol in Java yourself.
Prerequisites¶
For building Phenol, you will need
- Java JDK 11 or later for compiling phenol (releases before v2.0.0 were built with JDK 8),
- Maven 3 for building phenol, and
- Git for getting the sources.
Git Checkout and maven build¶
The following code snippet downloads the phenol sources and builds them.
$ git clone https://github.com/monarch-initiative/phenol
$ cd phenol
$ mvn package
Maven Proxy Settings¶
If you are behind a proxy, Maven may have problems downloading dependencies.
If you run into problems, make sure to also delete ~/.m2/repository.
Then, execute the following commands to fill ~/.m2/settings.xml.
$ mkdir -p ~/.m2
$ test -f ~/.m2/settings.xml || cat >~/.m2/settings.xml <<END
<settings>
<proxies>
<proxy>
<active>true</active>
<protocol>http</protocol>
<host>proxy.example.com</host>
<port>8080</port>
<nonProxyHosts>*.example.com</nonProxyHosts>
</proxy>
</proxies>
</settings>
END
License¶
Phenol is licensed under the BSD Clear 3-Clause License.
How-To: Release on Maven Central¶
This page describes the steps to release Phenol on Maven Central.
Read the following first¶
Update the README.rst file¶
Change the version in the README.rst.
Update the CHANGELOG.rst file¶
- Update the CHANGELOG.rst file to reflect the new version.
- Create a new commit with this version.
- Do not create a git tag as this will be done by Maven below.
Prepare the Release using Maven¶
mvn release:prepare
Answer with the default everywhere, but use “vMAJOR.MINOR” for the tag name, e.g. “v0.15”. This updates the versions, creates a tag for the version, and pushes the tag to GitHub.
Update README CHANGELOG¶
Open README.md and CHANGELOG.md and adjust the files to include the header for the next SNAPSHOT version.
Maven comments¶
mvn versions:set is useful for bumping versions.
Quick Example¶
The following snippet will load the hp.obo file (which can be downloaded from the HPO website) into an Ontology object.
The HPO has multiple subontologies, and the following code extracts all of the terms of the Phenotypic Abnormality subontology.
String hpoOboFilePath = ...; // initialize to path of hp.obo file
Ontology ontology = OntologyLoader.loadOntology(new File(hpoOboFilePath));
final TermId PHENOTYPIC_ABNORMALITY = TermId.of("HP:0000118");
for (TermId tid : getDescendents(ontology, PHENOTYPIC_ABNORMALITY)) {
    Term hpoTerm = ontology.getTermMap().get(tid);
    // ... do something with the Term
}
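What "all descendants of a term" means can be sketched without the library. The following plain-Java example is an illustration only and does not reproduce phenol's implementation: it assumes the ontology is given as a map from each term to its direct children and collects everything reachable from a start term by breadth-first search. The class name, the method name, and the tiny hand-made edge set are hypothetical, and including the start term in the result is a choice made for this sketch.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DescendantsSketch {

    /** Collects the start term and everything below it, in breadth-first order. */
    static Set<String> descendants(Map<String, List<String>> childrenByTerm, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            List<String> children =
                childrenByTerm.getOrDefault(current, Collections.emptyList());
            for (String child : children) {
                if (seen.add(child)) { // add() returns false for terms already visited
                    queue.add(child);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hand-made fragment, for illustration only; not the real HPO edge set
        Map<String, List<String>> children = new HashMap<>();
        children.put("HP:0000001", Arrays.asList("HP:0000118", "HP:0000005"));
        children.put("HP:0000118", Arrays.asList("HP:0000707", "HP:0001626"));
        children.put("HP:0000707", Arrays.asList("HP:0012638"));
        // prints [HP:0000118, HP:0000707, HP:0001626, HP:0012638]
        System.out.println(descendants(children, "HP:0000118"));
    }
}
```

The visited set matters because ontologies are directed acyclic graphs rather than trees, so a term with several parents would otherwise be reported more than once.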
History¶
Phenolib was forked from Ontolib in April 2018 and has since been extensively refactored and extended to provide additional functionality for working with phenotype annotations. Several classes from the Ontologizer, which we initially programmed in Java 1.5, have been refactored to support Gene Ontology overrepresentation analysis (see Bauer et al., 2008).
Feedback¶
The best place to leave feedback, ask questions, and report bugs is the Phenol Issue Tracker.