Research

Bioinformatics and Computational Biology

Big data analysis and biomedical research meet in our lab: We develop novel data mining algorithms for detecting patterns and statistical dependencies in large datasets from the life sciences.

Our ultimate aim is to contribute to two major goals of 21st-century science: enabling the automatic generation of new knowledge from big data through machine learning, and helping to understand the relationship between diseases and the molecular properties of patients, thereby enabling precision medicine.

Below you can find further information on some of our projects:

Machine Learning: Comparing Structured Data

We develop methods for comparing and classifying high-dimensional objects. One prominent example is graph kernels, i.e. efficient similarity functions between graphs.

Graph Kernels (Code and Data)
A Confounder-Corrected Support Vector Machine Classifier (ccSVM)
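
To make the idea of a graph kernel concrete, here is a minimal sketch of a Weisfeiler-Lehman-style subtree kernel, one well-known family of graph kernels. It is an illustration only and not necessarily the code behind the links above. The graph representation (an adjacency dictionary plus a dictionary of string node labels) and the names `wl_features`, `wl_kernel`, `chain_a` and `chain_b` are assumptions made for this example.

```python
from collections import Counter


def wl_features(adjacency, labels, iterations=2):
    """Count node labels over several rounds of Weisfeiler-Lehman relabeling.

    adjacency: dict mapping each node to a list of its neighbours.
    labels:    dict mapping each node to an initial (string) label.
    """
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        refined = {}
        for node, neighbours in adjacency.items():
            # New label = own label plus the sorted multiset of neighbour labels.
            neighbour_labels = tuple(sorted(current[n] for n in neighbours))
            refined[node] = (current[node], neighbour_labels)
        current = refined
        features.update(current.values())
    return features


def wl_kernel(graph_a, graph_b, iterations=2):
    """Kernel value = inner product of the two label-count vectors."""
    features_a = wl_features(*graph_a, iterations)
    features_b = wl_features(*graph_b, iterations)
    return sum(count * features_b[label] for label, count in features_a.items())


# Hypothetical toy molecules: a three-atom chain with two different labelings.
chain_a = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "O"})
chain_b = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "O", 2: "C"})
similarity = wl_kernel(chain_a, chain_b, iterations=1)
```

Because the kernel is an inner product of count vectors, it is symmetric and can be plugged into any kernel method, for instance a support vector machine, to classify graphs.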

Machine Learning: High-Dimensional Correlations

We develop methods for measuring statistical dependence between high-dimensional variables, two-sample tests that determine whether two samples were drawn from the same distribution, outlier detection algorithms that find "unusual" observations in a given dataset, and approaches that detect non-linear dependence between variables.

Kernel Method for the Two Sample Problem
(Gretton, A., Borgwardt, K., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A Kernel Two-Sample Test. Journal of Machine Learning Research, 13(25), 723-773.)
Detecting Non-Linear Correlations via the Mutual Information Dimension
(Sugiyama, M., & Borgwardt, K. (2013). Measuring Statistical Dependence via the Mutual Information Dimension. Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI 2013), 1692-1698.)
Rapid Outlier Detection via Sampling
(Sugiyama, M., & Borgwardt, K. (2013). Rapid Distance-Based Outlier Detection via Sampling. Advances in Neural Information Processing Systems 26 (NIPS 2013), 467-475.)
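
As an illustration of the two-sample problem mentioned above, the following minimal sketch estimates the squared maximum mean discrepancy (MMD) between two samples with a Gaussian kernel and turns it into a p-value by permutation. The permutation null, the fixed bandwidth, and the names `gaussian_kernel`, `mmd2_biased` and `permutation_p_value` are assumptions chosen for simplicity; the cited paper discusses several ways of approximating the null distribution.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples of floats."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD: mean k(x,x') + mean k(y,y') - 2 mean k(x,y)."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def permutation_p_value(xs, ys, permutations=200, bandwidth=1.0, seed=0):
    """Share of random re-splits of the pooled data with MMD >= the observed MMD."""
    rng = random.Random(seed)
    observed = mmd2_biased(xs, ys, bandwidth)
    pooled = list(xs) + list(ys)
    exceed = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        if mmd2_biased(pooled[:len(xs)], pooled[len(xs):], bandwidth) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (permutations + 1)


# Hypothetical one-dimensional samples: the second is shifted by 3.
sample_x = [(0.1 * i,) for i in range(10)]
sample_y = [(0.1 * i + 3.0,) for i in range(10)]
p_value = permutation_p_value(sample_x, sample_y)
```

A small p-value suggests that the two samples were not drawn from the same distribution. In practice the bandwidth matters a great deal and is often set from the data, for example with a median-distance heuristic.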

Machine Learning: Significant Pattern Mining

We develop methods that discover significant patterns in high-dimensional datasets while being runtime-efficient and statistically sound. Our algorithms can be applied to graphs or collections of sequences, and they make it possible to account for dependencies between objects, to control the family-wise error rate, and to correct for categorical covariates.

Overview page on Significant Pattern Mining
Significant Pattern Mining (Westfall-Young Light)
(Llinares-López, F., Sugiyama, M., Papaxanthos, L., & Borgwardt, K. (2015). Fast and Memory-Efficient Significant Pattern Mining via Permutation Testing. KDD '15: The 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 725-734. doi:10.1145/2783258.2783363.)
Finding Genomic Intervals of Genetic Heterogeneity (FAIS)
(Llinares-López, F., Grimm, D. G., Bodenham, D. A., Gieraths, U., Sugiyama, M., Rowan, B., et al. (2015). Genome-wide detection of intervals of genetic heterogeneity associated with complex traits. Bioinformatics, 31(12), i240-i249. doi:10.1093/bioinformatics/btv263.)
Significant Pattern Mining with Covariates (FACS)
(Papaxanthos, L., Llinares-López, F., Bodenham, D., & Borgwardt, K. (2016). Finding significant combinations of features in the presence of categorical covariates. Advances in Neural Information Processing Systems 29 (NIPS 2016), 2271-2279.)
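
To illustrate how the family-wise error rate can be controlled without paying for every candidate pattern, here is a minimal sketch of Tarone's testability idea, a standard building block in the significant pattern mining literature. It assumes a one-sided Fisher's exact test and a hypothetical list of pattern supports; the names `min_p_value` and `tarone_threshold` are invented for this example, and the sketch is not the implementation of the tools listed above.

```python
from math import comb


def min_p_value(support, n_pos, n_total):
    """Smallest p-value a pattern can reach under a one-sided Fisher's exact test.

    support: number of samples containing the pattern.
    n_pos:   number of samples in the positive class.
    n_total: total number of samples.
    The minimum is attained when as many occurrences as possible are positive.
    """
    a = min(support, n_pos)
    return comb(n_pos, a) * comb(n_total - n_pos, support - a) / comb(n_total, support)


def tarone_threshold(supports, n_pos, n_total, alpha=0.05):
    """Return (threshold, k): the smallest k with at most k testable patterns at alpha / k."""
    min_ps = [min_p_value(s, n_pos, n_total) for s in supports]
    k = 1
    # Patterns whose minimal p-value exceeds alpha / k can never be significant,
    # so they do not need to be counted in the correction factor.
    while sum(p <= alpha / k for p in min_ps) > k:
        k += 1
    return alpha / k, k


# Hypothetical supports of seven candidate patterns in 20 samples, 10 of them positive.
supports = [1, 1, 2, 2, 5, 8, 10]
threshold, factor = tarone_threshold(supports, n_pos=10, n_total=20)
```

In this toy setting the correction factor is smaller than the Bonferroni factor of seven, because rare patterns cannot reach significance and are therefore excluded from the count.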

Computational Biology: Genome-Wide Association Studies

We develop efficient multivariate approaches for the genome-wide discovery of genetic loci that are associated with a phenotype, thereby trying to elucidate the multicausal basis of complex traits.

Computational Biology: Genome Annotation

We have developed methods for detecting genomic insertions and deletions using next-generation sequencing, and thoroughly assessed the difficulty of comparing the performance of variant pathogenicity prediction tools.

Computational Biology: Molecular Graph Classification via Graph Kernels

We developed new, fast and scalable similarity measures on graphs, so-called graph kernels. Their prime purpose is to compare molecular graphs or protein structures and to classify them into functional categories.

Personalized Medicine

We have coordinated several national and international networks on personalized medicine: