Ph.D. Defense: “Kernel-based empirical Bayesian classification method with applications to protein phosphorylation and non-coding RNA”, Mark Menor


Monday, June 2, 10:00am, POST 302

Abstract: With the advancement of high-throughput sequencing technologies, a new era of ‘big data’ biological research has dawned. However, the abundance of biological data poses many analytical challenges, and extracting important information from the data has proven very difficult. One approach to this problem is to use machine learning methods.

In this dissertation, we describe novel probabilistic kernel-based learning methods and demonstrate their practical applicability by solving major bioinformatics problems at the transcriptome and proteome levels. The resulting tools are expected to help biologists further elucidate the important information contained in their data.

The proposed binary classification method, the Classification Relevance Units Machine (CRUM), employs the theory of kernel and empirical Bayesian methods to achieve non-linear classification and high generalization. We demonstrate the practical applicability of CRUM by applying it to the prediction of protein phosphorylation sites, which helps explain the mechanisms that control many biochemical processes.

We then develop an extension of CRUM to solve multiclass problems, called the Multiclass Relevance Units Machine (McRUM). McRUM uses the error-correcting output codes framework to decompose a multiclass problem into a set of binary problems. We devise a linear-time algorithm to aggregate the binary results into the final probabilistic multiclass prediction, which allows for predictions in large-scale applications. We demonstrate the practical applicability of McRUM through a solution to the identification of mature microRNA (miRNA) and piwi-interacting RNA (piRNA) in small RNA sequencing datasets. This provides biologists with a tool to help discover novel miRNA and piRNA and to further understand the molecular processes of the organisms they study.
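
For readers unfamiliar with error-correcting output codes, the following is a minimal sketch of the general idea only, not the McRUM implementation. It assumes a simple one-vs-rest code matrix and a naive product-rule aggregation; the function names are hypothetical, and the aggregation shown is not the linear-time algorithm developed in the dissertation.

```python
def one_vs_rest_code(n_classes):
    """Build a code matrix: row k is the codeword for class k.

    Each column defines one binary problem (+1 vs. -1).
    One-vs-rest is only one possible choice of code (an assumption here).
    """
    return [[1 if j == k else -1 for j in range(n_classes)]
            for k in range(n_classes)]


def binary_labels(code, y, column):
    """Relabel multiclass labels y for the binary problem of one column."""
    return [code[label][column] for label in y]


def decode(code, binary_probs):
    """Combine per-column P(bit = +1) values into class probabilities.

    Naive product rule followed by normalization; an illustrative
    stand-in, not the aggregation algorithm from the dissertation.
    """
    scores = []
    for row in code:
        score = 1.0
        for bit, p in zip(row, binary_probs):
            score *= p if bit == 1 else 1.0 - p
        scores.append(score)
    total = sum(scores)
    if total == 0.0:
        # Degenerate case: fall back to a uniform distribution.
        return [1.0 / len(code)] * len(code)
    return [s / total for s in scores]


code = one_vs_rest_code(3)
# Binary training labels for the first binary problem (class 0 vs. rest).
first_problem = binary_labels(code, [0, 2, 1, 0], 0)
# Hypothetical outputs of three trained binary classifiers for one sample.
probs = decode(code, [0.8, 0.1, 0.2])
predicted_class = probs.index(max(probs))
```

In this sketch each binary classifier would be trained on its relabeled data, and any probabilistic binary classifier could fill that role.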

Committee: Kyungim Baek (Chairperson), Guylaine Poisson (Chairperson), Henri Casanova, Scott Robertson, and Gernot Presting