Human-Disease Phenotype Map

This phenotype connectivity map comes from one of the largest phenome-wide association studies (PheWAS) of electronic health record (EHR)-derived phenotypes, covering 38,682 unrelated samples from Geisinger’s MyCode Community Health Initiative that were genotyped through the DiscovEHR project. Click a disease node to highlight the other diseases found to be associated with it via SNPs.

Human-disease phenotype map derived from PheWAS across 38,682 individuals, American Journal of Human Genetics, 2019 [PubMed]

MildInt: Deep learning-based multimodal longitudinal data integration framework

The Python package MildInt (Deep learning-based Multimodal longitudinal data integration framework) provides a pre-constructed deep learning architecture for classification tasks. MildInt has two learning phases: learning a feature representation from each data modality, and training a classifier for the final decision. Using a deep architecture in the first phase yields more task-relevant feature representations than a linear model would. In the second phase, a linear regression classifier is used to detect and investigate biomarkers in the multimodal data. Combining the linear model with the deep learning model can thus achieve both higher accuracy and better interpretability. MildInt can integrate multiple forms of numerical data, including time series and non-time series data, to extract complementary features from a multimodal dataset.

Frontiers in Genetics, 2019 [PubMed]
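
The two-phase data flow described for MildInt can be sketched in plain Python. This is a toy illustration under stated assumptions, not MildInt's API: the function names, the modality names, and the toy data are all hypothetical; the recurrent encoder uses hand-fixed weights where the real first phase would learn them with a deep network; and a logistic regression stands in for the linear classifier of the second phase.

```python
import math


def encode_series(series, unit_weights):
    """Fold a variable-length series into one fixed-length hidden state.

    Each hidden unit follows h_t = tanh(w_in * x_t + w_rec * h_{t-1}).
    The weights are fixed by hand here (assumption); a real encoder learns them.
    """
    hidden = [0.0] * len(unit_weights)
    for x in series:
        hidden = [
            math.tanh(w_in * x + w_rec * h)
            for (w_in, w_rec), h in zip(unit_weights, hidden)
        ]
    return hidden


def integrate(sample, encoders):
    """Phase 1: encode each time-series modality, then append static features."""
    features = []
    for modality, unit_weights in encoders.items():
        features.extend(encode_series(sample[modality], unit_weights))
    features.extend(sample["static"])
    return features


def train_linear_classifier(rows, labels, epochs=200, learning_rate=0.5):
    """Phase 2: fit a logistic regression by stochastic gradient descent."""
    coefficients = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for row, label in zip(rows, labels):
            score = bias + sum(c * x for c, x in zip(coefficients, row))
            error = 1.0 / (1.0 + math.exp(-score)) - label
            coefficients = [
                c - learning_rate * error * x for c, x in zip(coefficients, row)
            ]
            bias -= learning_rate * error
    return coefficients, bias


def predict(row, coefficients, bias):
    """Return class 1 when the linear score is non-negative, else class 0."""
    score = bias + sum(c * x for c, x in zip(coefficients, row))
    return 1 if score >= 0 else 0


# Hypothetical encoders: two hidden units for one modality, one for the other.
encoders = {
    "cognitive_scores": [(1.0, 0.5), (-0.5, 0.3)],
    "biomarker": [(0.8, 0.2)],
}

# Made-up samples; series lengths differ per sample and per modality.
samples = [
    {"cognitive_scores": [0.5, 0.8, 1.0], "biomarker": [0.9, 1.1], "static": [0.1]},
    {"cognitive_scores": [0.7, 0.9], "biomarker": [1.0, 0.8, 1.2], "static": [0.3]},
    {"cognitive_scores": [-0.6, -0.9, -0.4], "biomarker": [-1.0], "static": [0.2]},
    {"cognitive_scores": [-0.8, -0.5], "biomarker": [-0.7, -1.1], "static": [0.4]},
]
labels = [1, 1, 0, 0]

rows = [integrate(sample, encoders) for sample in samples]
coefficients, bias = train_linear_classifier(rows, labels)
predictions = [predict(row, coefficients, bias) for row in rows]
```

Because the final classifier is linear, each entry of `coefficients` maps to one integrated feature, which is the kind of interpretability the description attributes to the second phase.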

iDRW: integrative directed random walk-based pathway activity inference method

We propose a general framework for integrative pathway activity inference on a multi-omics network and investigate multiple scenarios of multi-layered gene-gene graph construction that can be applied to various datasets. To reflect the interaction effects between multi-omics data, we designed a directed gene-gene graph using pathway information, assigning interactions between genes across multiple network layers. The proposed method selects cooperative driver pathways and predicts overall survival (OS) or metastasis. As a proof of concept, it was evaluated on three genomic profiles of urologic cancer patients. iDRW is implemented as an R software package.
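
As a rough illustration of a random walk on a directed gene-gene graph, the sketch below runs a generic random walk with restart in Python. It is not the iDRW R package and not necessarily the exact update iDRW uses: the function name, the restart probability of 0.7, the node labels (an omics layer prefix plus a placeholder gene), and the equal initial weights are all assumptions made for the example.

```python
def random_walk_with_restart(edges, initial, restart=0.7, tol=1e-10, max_iter=1000):
    """Propagate node weights along directed edges until they stop changing.

    edges:   list of (source, target) pairs in the directed gene-gene graph
    initial: dict mapping every node to a non-negative starting weight
    Nodes without outgoing edges pass nothing on, so total weight may shrink.
    """
    nodes = sorted(initial)
    out_degree = {node: 0 for node in nodes}
    for source, _ in edges:
        out_degree[source] += 1

    total = sum(initial.values())
    start = {node: initial[node] / total for node in nodes}
    weights = dict(start)

    for _ in range(max_iter):
        # Every node gets its restart share of the starting weight...
        updated = {node: restart * start[node] for node in nodes}
        # ...plus what flows in from upstream nodes, split over their out-edges.
        for source, target in edges:
            updated[target] += (1 - restart) * weights[source] / out_degree[source]
        change = sum(abs(updated[node] - weights[node]) for node in nodes)
        weights = updated
        if change < tol:
            break
    return weights


# Hypothetical two-layer graph: a methylation-layer node feeds expression-layer nodes.
edges = [
    ("meth:G1", "rna:G1"),
    ("meth:G1", "rna:G2"),
    ("rna:G1", "rna:G2"),
]
initial = {"meth:G1": 1.0, "rna:G1": 1.0, "rna:G2": 1.0}

weights = random_walk_with_restart(edges, initial)
```

In this toy graph the node with the most incoming paths, `rna:G2`, ends with the largest weight. Per-gene weights of this kind could then be aggregated within each pathway to score pathway activity, though how iDRW does that aggregation is not covered here.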

HiG2Vec: Hierarchical representations of gene ontology and gene

Knowledge from the Gene Ontology (GO) and its annotations can be put to work in versatile applications, such as deep learning approaches, mainly through vector representations of GO terms and genes. We propose hierarchical representations of gene ontology and gene (HiG2Vec), which applies Poincaré embedding, a method specialized for representing hierarchies, through a two-step procedure: GO embedding and gene embedding. Experimental results indicate that HiG2Vec is superior at capturing GO and gene semantics and at making use of the data, and that it is robust enough to be applied to various kinds of biological knowledge.
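
Poincaré embeddings place points inside the unit ball and compare them with the hyperbolic distance d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). The Python sketch below only evaluates that standard formula; the function name and the example coordinates are hypothetical and are not taken from HiG2Vec.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit ball."""
    squared_norm_u = sum(x * x for x in u)
    squared_norm_v = sum(x * x for x in v)
    if squared_norm_u >= 1.0 or squared_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    squared_gap = sum((a - b) ** 2 for a, b in zip(u, v))
    ratio = 2.0 * squared_gap / ((1.0 - squared_norm_u) * (1.0 - squared_norm_v))
    return math.acosh(1.0 + ratio)


# Made-up coordinates: a general term near the origin, two specific terms near the rim.
root_term = [0.0, 0.0]
specific_term_a = [0.9, 0.0]
specific_term_b = [0.0, 0.9]

to_root = poincare_distance(root_term, specific_term_a)
between_specific = poincare_distance(specific_term_a, specific_term_b)
```

Distances grow quickly toward the boundary of the ball, so two specific terms near the rim are farther from each other than either is from a general term near the origin. That tree-like geometry is what makes this embedding space suited to hierarchies such as GO.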


NETMAGE: humaN-disEase phenoType MAp GEnerator

To make visualizations of shared genetic components across phenotypes more accessible, we developed the humaN-disEase phenoType MAp GEnerator (NETMAGE), a web-based tool that produces interactive phenotype network visualizations from summarized PheWAS results. Users can search the map by a variety of attributes and select nodes to view information such as related phenotypes, associated SNPs, and other network statistics. As a test case, we constructed a network using UK Biobank PheWAS summary data. Examining the associations between phenotypes in the map can potentially reveal novel instances of pleiotropy, where loci influence multiple phenotypic traits. The tool thus gives researchers a means to identify prospective genetic targets for drug design, contributing to the exploration of personalized medicine.
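
The core idea of a phenotype network built from PheWAS summary results, where two phenotypes are linked when they share an associated SNP, can be sketched in Python as follows. This is an illustration only and not NETMAGE's implementation: the function name, the (phenotype, SNP, p-value) input layout, the 5e-8 significance threshold, and the toy records are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_phenotype_network(associations, p_threshold=5e-8):
    """Link phenotypes that share at least one significantly associated SNP.

    associations: iterable of (phenotype, snp, p_value) summary records
    Returns a dict mapping each phenotype pair to the set of SNPs they share.
    """
    phenotypes_by_snp = defaultdict(set)
    for phenotype, snp, p_value in associations:
        if p_value <= p_threshold:
            phenotypes_by_snp[snp].add(phenotype)

    shared_snps = defaultdict(set)
    for snp, phenotypes in phenotypes_by_snp.items():
        # Sorting keeps each pair in one canonical order, e.g. (A, B), never (B, A).
        for first, second in combinations(sorted(phenotypes), 2):
            shared_snps[(first, second)].add(snp)
    return dict(shared_snps)


# Made-up summary records; labels and p-values are placeholders, not real results.
associations = [
    ("phenotype A", "snp1", 1e-12),
    ("phenotype B", "snp1", 3e-9),
    ("phenotype B", "snp2", 2e-10),
    ("phenotype C", "snp2", 4e-9),
    ("phenotype C", "snp1", 0.02),  # not significant, so no A-C or extra B-C link
]

network = build_phenotype_network(associations)
```

Each key of `network` is an edge of the map and each value lists the SNPs behind it, which is the information a node click would need in order to show related phenotypes and their shared SNPs.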


Coming soon!

Integrative Omics & Biomedical Informatics Laboratory

Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA


© 2016 Dokyoon Kim, All Rights Reserved