Genome Informatics
- Construction of an AI system based on machine learning
- Construction of a database carrying genome, medical and clinical information
- Technology for the visualization and verbalization of complex data
- High performance computing for biomedical research
Statistical genetics software based on machine learning for the visualization and verbalization of data
We are building a genome information database that stores genomic data collected from high-throughput sequencers (next-generation sequencing, NGS) and high-density arrays. This database will be linked with medical and clinical information to realize a new information processing system that integrates these different types of data.
Figure 1
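As a rough illustration of what linking genomic records to clinical records can look like, the sketch below joins two tables on a shared sample identifier using an in-memory SQLite database. The table names, column names, and data are hypothetical and are not taken from the lab's actual schema; it is a minimal sketch assuming each sample has one clinical record and any number of genotype records.

```python
import sqlite3

# Hypothetical schema: one row per (sample, variant) and one row per sample.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE genotype (sample_id TEXT, variant_id TEXT, allele_count INTEGER);
    CREATE TABLE clinical (sample_id TEXT PRIMARY KEY, diagnosis TEXT, age INTEGER);
    """
)

# Made-up toy records, for illustration only.
conn.executemany(
    "INSERT INTO genotype VALUES (?, ?, ?)",
    [("S1", "var_A", 2), ("S2", "var_A", 0), ("S3", "var_A", 1), ("S1", "var_B", 1)],
)
conn.executemany(
    "INSERT INTO clinical VALUES (?, ?, ?)",
    [("S1", "case", 54), ("S2", "control", 61), ("S3", "case", 47)],
)


def integrated_view(connection, variant_id):
    """Return clinical fields joined with the allele count of one variant."""
    query = """
        SELECT c.sample_id, c.diagnosis, c.age, g.allele_count
        FROM clinical AS c
        JOIN genotype AS g ON g.sample_id = c.sample_id
        WHERE g.variant_id = ?
        ORDER BY c.sample_id
    """
    return connection.execute(query, (variant_id,)).fetchall()


rows = integrated_view(conn, "var_A")
for row in rows:
    print(row)
```

Keying both tables on a sample identifier keeps the genomic and clinical sides independent, so either can grow without changes to the other.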
We will apply machine-learning-based AI (artificial intelligence) together with HPC (high-performance computing) to study these integrated data and learn about the patient state, with the aim of better diagnostics and treatment. This work will describe relationships among different factors as mathematical models in order to reveal previously unknown factors that contribute to the patient state. These efforts require high-performance information processing technology to handle the large volume of data, which is another project of the lab.
Figure 2
At present, we can process clinical data from thousands of samples. We have developed software to statistically analyze the relevance of genomic components, such as haplotypes and transcripts, to patient groups. The findings are being shared with clinical researchers to study a wide range of diseases.
Figure 3
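To make the idea of testing the relevance of a genomic component to a patient group concrete, the sketch below runs a standard Pearson chi-square test (one degree of freedom, no continuity correction) on a 2x2 table of haplotype carriers versus non-carriers in cases and controls. This is a textbook technique chosen for illustration, not necessarily the method implemented in the lab's software; the function name `association_2x2` and the counts in the usage example are hypothetical.

```python
import math


def association_2x2(case_carrier, case_noncarrier, control_carrier, control_noncarrier):
    """Pearson chi-square test and odds ratio for a 2x2 carrier/status table.

    Returns (chi2, p_value, odds_ratio). Assumes simple counts and no
    continuity correction; small counts would call for an exact test instead.
    """
    a, b, c, d = case_carrier, case_noncarrier, control_carrier, control_noncarrier
    n = a + b + c + d
    row_case, row_control = a + b, c + d
    col_carrier, col_noncarrier = a + c, b + d
    if 0 in (row_case, row_control, col_carrier, col_noncarrier):
        raise ValueError("every row and column of the table must be non-empty")

    # Closed-form chi-square statistic for a 2x2 table.
    chi2 = n * (a * d - b * c) ** 2 / (row_case * row_control * col_carrier * col_noncarrier)

    # For 1 degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))

    # Odds of carrying the haplotype in cases relative to controls.
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return chi2, p_value, odds_ratio


# Made-up counts: 30 of 100 cases and 15 of 100 controls carry the haplotype.
chi2, p_value, odds_ratio = association_2x2(30, 70, 15, 85)
print(f"chi2={chi2:.3f}  p={p_value:.4f}  OR={odds_ratio:.3f}")
```

In practice a test like this would be repeated over many haplotypes or transcripts, so the resulting p-values would need correction for multiple testing.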
Much of this work depends on machine learning and AI, which will form the basis for new mathematical models and allow us to comprehensively examine and predict relationships in the data. Data visualization will be used to describe these relationships quantitatively.