Yvan Saeys

Fundamental algorithmic research

  • Feature selection in machine learning

    The selection of a subset of relevant features from a potentially huge initial feature set is an important topic in machine learning. Sometimes the choice of feature subset matters even more to the final results than the choice of learning model. My research focuses on selection methods that are able to deal with both (i) large feature sets, and (ii) feature dependencies. A more recent topic of investigation is the use of feature selection for clustering, a non-trivial and challenging topic that is gaining more and more attention from the scientific community.

  • Modelling gene networks using different sources of data

    Modelling the interactions between genes remains a difficult research topic, as the starting data is often quite noisy and the obtained results are difficult to evaluate. To minimise the amount of error in the results and arrive at more reliable models, different sources of data need to be combined (sequence data such as motifs, expression data, interaction data, ...). However, rigorous mathematical techniques to combine and reason with these different types of data are still lacking, which presents a great opportunity for research.

  • Mathematical models for gene splicing

    Gene splicing is a very intricate and tightly regulated process in the cell. However, computational models for recognizing splice sites are still far from perfect. A particularly difficult issue from a machine learning point of view is the large number of negative examples that occur in genomes. Therefore, additional submodels (e.g. a branch point model) should be designed and evaluated to increase overall performance.
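The imbalance between negative and positive splice site examples mentioned above can be illustrated with a minimal Python sketch. It assumes the canonical rule that donor sites begin with the dinucleotide GT, and simply counts how many GT occurrences in a sequence are not annotated as true sites. The function names, the toy sequence and the annotated position are hypothetical and serve only as an illustration; they are not taken from any real genome or from the models described here.

```python
def candidate_sites(sequence, dinucleotide="GT"):
    """Return the 0-based start positions of every occurrence of the dinucleotide."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]


def imbalance_ratio(sequence, true_sites, dinucleotide="GT"):
    """Number of negative candidates per annotated (positive) site.

    `true_sites` is a hypothetical list of annotated site positions.
    """
    annotated = set(true_sites)
    candidates = candidate_sites(sequence, dinucleotide)
    positives = sum(1 for pos in candidates if pos in annotated)
    if positives == 0:
        raise ValueError("no annotated site matches a candidate position")
    negatives = len(candidates) - positives
    return negatives / positives


# Toy example (invented sequence): three GT candidates, one annotated as real.
toy_sequence = "AAGTAAGTCCGTAA"
toy_annotation = [6]
print(candidate_sites(toy_sequence))
print(imbalance_ratio(toy_sequence, toy_annotation))
```

On genome-scale sequences the same count grows very large, which is why a classifier for splice sites has to cope with far more negative than positive examples.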


Applied research

  • Feature selection for classification of nucleic acid sequences

    Applying feature selection to recognition problems in gene recognition and genome annotation can provide new biological insights into how some processes work. In addition, looking for a core set of relevant features can improve model robustness and increase classification performance.

  • Feature selection for promoter prediction

    The computational identification of promoter regions on a genomic scale is still in its infancy. To improve the models that are used to locate promoters, one should first have some knowledge about which characteristics differentiate promoter regions from other genomic regions. The application of feature selection techniques can aid in finding new features that are important for promoter modelling.

  • Gene and genome annotation

    Our team is involved in the genome annotation of several organisms. To do this job properly, advanced modelling techniques are needed to find and combine the different signals in the gene. We are developing software for the recognition of the most important gene features (start/stop codons and splice sites).

  • Hardware-based speed up of bioinformatics algorithms

    As bioinformatics databases are growing at an exponential rate, there is a need for fast implementations of very common algorithms (such as alignment). In this research project, we are experimenting with the implementation of several common bioinformatics algorithms in parallel, using specialised hardware (FPGA).
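To make concrete why alignment lends itself to parallel hardware, the following is a minimal Python sketch of the Smith-Waterman local alignment score. It is not the FPGA implementation from this project. The scoring parameters (`match`, `mismatch`, `gap`) and the function name are assumptions chosen for illustration. The matrix is filled one anti-diagonal at a time: every cell on an anti-diagonal depends only on cells from the two previous anti-diagonals, so in principle all of them could be computed simultaneously.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    # Visit the cells one anti-diagonal (i + j == d) at a time.
    for d in range(2, rows + cols - 1):
        first_i = max(1, d - (cols - 1))
        last_i = min(rows - 1, d - 1)
        # The cells in this inner loop are independent of one another,
        # which is the property that parallel hardware can exploit.
        for i in range(first_i, last_i + 1):
            j = d - i
            score = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + score,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,        # gap in b
                H[i][j - 1] + gap,        # gap in a
            )
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("ACGT", "TTACGTTT"))
```

In this sequential version the inner loop still runs one cell at a time; the anti-diagonal ordering only makes the available parallelism visible.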