Our research develops and applies data integration approaches to improve the ability to diagnose, treat, and prevent complex diseases. Our primary focus is integrating multi-omics data and biological knowledge to better translate genomic and phenotypic data derived from electronic health records (EHR) into clinical products. Our past projects have been both theoretical and applied. They include 1) developing novel data integration methods that combine multi-omics data using machine learning and deep learning approaches; 2) predicting clinical outcomes based on interactions between multi-omic features; 3) integrating multi-modal imaging and multi-omics data; 4) identifying gene-by-environment (GxE) interactions in several phenotypes and diseases; and 5) constructing a disease-disease network using EHR-linked biobank data. We plan to continue our work in these areas, focusing primarily on providing actionable clinical products based on the interplay within and between different dimensions of genomic data. In particular, our long-term research goal is to develop and evaluate sophisticated data integration methods that simultaneously combine individual variation in genomic ('omic) data, imaging data, phenotype data from EHR, and environment/lifelog data to advance precision medicine. We also expect artificial intelligence (AI)-driven approaches to help provide better healthcare.

Lecture - Translational Research via Data Integration

Why Data Integration?

A central problem in translational research is that it’s difficult to understand the genetic architecture of a complex disease. Many genome-wide association studies (GWAS) exist, but most of the genetic loci they have identified have negligible effects, and in many cases the heritability remains unaccounted for. That’s why we need alternative strategies to find underlying disease causes, said Assistant Professor of Informatics Dokyoon Kim, PhD, and a systems genomics approach, using data integration, can get the job done.

One Math Model to Explain Disease Phenotype

With the traditional approach, you can analyze a beautiful multi-omics data set, Dr. Kim continued, but you must handle the associations one pair at a time. So he asked himself: Can I integrate everything into one mathematical model to predict outcome or explain disease phenotype?
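
The contrast between the two approaches can be written down roughly as follows. This is an illustrative sketch only: the notation is assumed here and does not come from the lecture. It takes y to be the outcome or phenotype, x^{(m)}_j to be feature j of omics layer m (out of M layers with p_m features each), g to be a link function, and f to be an unspecified joint model.

```latex
% Pairwise approach: one association model per feature j of layer m
g\big(\mathbb{E}[\,y \mid x^{(m)}_{j}\,]\big)
  = \alpha^{(m)}_{j} + \beta^{(m)}_{j}\, x^{(m)}_{j},
  \qquad m = 1,\dots,M,\quad j = 1,\dots,p_m

% Integrated approach: a single model over all M layers jointly
g\big(\mathbb{E}[\,y \mid x^{(1)},\dots,x^{(M)}\,]\big)
  = f\big(x^{(1)}, x^{(2)}, \dots, x^{(M)}\big)
```

In the pairwise form, each coefficient \beta^{(m)}_{j} is estimated in isolation. In the joint form, f can in principle include terms that depend on features from more than one layer at once, which is the kind of between-layer interaction that one-pair-at-a-time testing cannot represent.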

Integrative Omics & Biomedical Informatics Laboratory

Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA


© 2016 Dokyoon Kim, All Rights Reserved