Fundamental algorithmic research
Feature selection in machine learning
Selecting a subset of relevant features from a potentially huge initial feature set is an important topic in
machine learning. Sometimes the choice of feature subset matters even more for achieving the best results than the
choice of learning model. My research focuses on selection methods that can deal with both
(i) large feature sets and (ii) feature dependencies. A more recent topic of investigation is the use of feature
selection for clustering, a challenging problem that is gaining increasing attention from
the scientific community.
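
As a minimal sketch of what it means for a selection method to take feature dependencies into account, the greedy
forward selection below scores each candidate by its relevance to the target minus its average redundancy with the
features already chosen. The function names, the use of Pearson correlation as the relevance and redundancy measure,
and the toy data are all illustrative assumptions, not the methods developed in this research.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def greedy_select(features, target, k):
    """Pick up to k feature names, trading relevance against redundancy.

    features: dict mapping feature name -> list of values (one per sample)
    target:   list of numeric labels (one per sample)
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    for _ in range(min(k, len(features))):
        best_name, best_score = None, float("-inf")
        for name, col in features.items():
            if name in selected:
                continue
            # Redundancy: mean absolute correlation with the features chosen so far
            redundancy = 0.0
            if selected:
                redundancy = sum(abs(pearson(col, features[s])) for s in selected) / len(selected)
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Hypothetical toy data: f2 duplicates f1 (a dependency), f3 is weaker but less redundant
features = {
    "f1": [1, 2, 3, 4, 5, 6],
    "f2": [1, 2, 3, 4, 5, 6],
    "f3": [0, 0, 1, 1, 1, 0],
}
target = [0, 0, 0, 1, 1, 1]
print(greedy_select(features, target, 2))
```

A purely univariate ranking would pick f1 and f2 here, although f2 adds no information once f1 is selected. The
redundancy term is what makes the procedure prefer f3 as the second feature.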
Modelling gene networks using different sources of data
Modelling the interactions between genes remains a difficult research topic, as the starting data is often quite
noisy and the obtained results are difficult to evaluate. To minimise the error in the results and
obtain more reliable models, different sources of data need to be combined: sequence data (motifs), expression data,
interaction data, and so on. However, rigorous mathematical techniques to combine and reason with these different types
of data are lacking, which presents a great opportunity for research.
Mathematical models for gene splicing
Gene splicing is an intricate and tightly regulated process in the cell. However, computational models for
recognizing splice sites are still far from perfect. A particularly difficult issue from a machine learning
point of view is the large number of negative examples that occur in genomes. Therefore, additional submodels
(e.g. a branch point model) should be designed and evaluated to increase overall performance.
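
To illustrate where the negative examples come from, the sketch below lists every GT dinucleotide in a sequence as a
candidate donor site. Canonical introns begin with GT, but most GT occurrences in a genome are not splice sites. The
toy sequence, the position marked as the true donor, and the helper name candidate_sites are made up for
illustration only.

```python
def candidate_sites(seq, dinucleotide):
    """Return the 0-based positions of every occurrence of a dinucleotide."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]


# Hypothetical toy gene fragment: exon (upper case), intron (lower case), exon (upper case)
toy = "ATGGTCAAG" + "gtaagtctgtcatttcag" + "GTTGAGTAA"
seq = toy.upper()
true_donor = 9  # first base of the intron in this made-up example

donors = candidate_sites(seq, "GT")
negatives = [p for p in donors if p != true_donor]

print("candidate donor sites:", donors)
print("negatives per true site:", len(negatives))
```

Even in this short fragment the decoys outnumber the real site, and on a genomic scale the imbalance is far larger.
This is why a classifier that looks only at the dinucleotide and its immediate context tends to produce many false
positives.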
Applied research
Feature selection for classification of nucleic acid sequences
Applying feature selection to different recognition problems in gene recognition and genome annotation
can provide new biological insights into how some processes work. In addition, looking for a core set of relevant
features can improve model robustness and increase classification performance.
Feature selection for promoter prediction
The computational identification of promoter regions on a genomic scale is still in its infancy. To improve the models
that are used to locate promoters, one should first have some knowledge about which characteristics differentiate promoter
regions from other genomic regions. The application of feature selection techniques can aid in finding new features that
are important for promoter modelling.
Gene and genome annotation
Our team is involved in the genome annotation of several organisms. To do this job properly, advanced modelling techniques
are needed to find and combine the different signals in the gene. We are developing software for the recognition of the
most important gene features (start/stop codons and splice sites).
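
As a deliberately naive baseline for the start and stop codon signals mentioned above, the sketch below scans the three
forward reading frames for an ATG followed by an in-frame stop codon. It ignores splicing, the reverse strand, and any
statistical modelling of the signal context, so it is an illustration under simplifying assumptions and not the
software described here. The function name find_orfs and the min_codons parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) pairs of forward-strand open reading frames.

    start is the 0-based index of the ATG, end is exclusive and includes the stop codon.
    min_codons is the minimum number of codons before the stop (start codon included).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon seen in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


for start, end in find_orfs("CCATGAAATTTTAGGG"):
    print(start, end, "CCATGAAATTTTAGGG"[start:end])
```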
Hardware-based speed up of bioinformatics algorithms
As bioinformatics databases are growing at an exponential rate, there is a need for fast implementations of very common
algorithms (such as alignment). In this research project, we are experimenting with parallel implementations of several
common bioinformatics algorithms on specialised hardware (FPGAs).
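
The sketch below shows why alignment is a natural candidate for this kind of hardware. It computes a global alignment
score (Needleman-Wunsch) by visiting the dynamic programming matrix one anti-diagonal at a time. Every cell on an
anti-diagonal depends only on cells from the two previous anti-diagonals, so those cells could in principle be computed
simultaneously by parallel processing elements. The code itself is sequential Python, the scoring values are arbitrary
defaults, and the mapping to hardware is an assumption for illustration rather than a description of the project's
implementation.

```python
def nw_score_wavefront(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of a and b, filled in anti-diagonal (wavefront) order."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    # Boundary: aligning a prefix against the empty string costs one gap per symbol
    for i in range(1, n + 1):
        H[i][0] = i * gap
    for j in range(1, m + 1):
        H[0][j] = j * gap
    # Cells with the same i + j = d form one anti-diagonal
    for d in range(2, n + m + 1):
        # The iterations of this inner loop are independent of each other
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
    return H[n][m]


print(nw_score_wavefront("ACGT", "AGT"))
```

With sequences of length n and m the matrix has n * m cells but only n + m - 1 anti-diagonals, which is the source of
the potential speed-up when enough parallel units are available.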