Knowledge discovery and data mining (KDD) is a multidisciplinary effort to extract nuggets of knowledge from data.
Researchers in data mining and knowledge discovery recognize the need to mine such knowledge from massive data, finding "patterns" in the data for purposes such as classification, prediction, summarization, and planning.
Reducing the number of dimensions by selecting features has proven to be efficient and effective in dealing with high-dimensional data, and constitutes an important preprocessing step in many areas such as statistics, pattern recognition, machine learning, and data mining. Important advantages of feature selection include the ability to construct better models, to design faster and more cost-effective models, and to gain more insight into the processes described by the data.
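To make "reducing the number of dimensions by selecting features" concrete, here is a minimal sketch of one simple filter method, chosen purely for illustration (the text does not prescribe any particular technique): score each feature by the absolute Pearson correlation between that feature and a numeric target, then keep the k highest-scoring features. The function names `pearson`, `select_top_k`, and `project` are hypothetical, and the sketch assumes a dense table of numeric features.

```python
from typing import List, Sequence

# Illustrative filter-style feature selection (an assumed example method,
# not one prescribed by the text). All function names are hypothetical.


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of x and y; returns 0.0 if either is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature carries no linear signal
    return cov / (vx * vy) ** 0.5


def select_top_k(X: List[List[float]], y: Sequence[float], k: int) -> List[int]:
    """Return the (sorted) indices of the k columns of X whose absolute
    correlation with y is largest."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        scores.append((abs(pearson(column, y)), j))
    # Highest score first; ties broken by the lower column index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return sorted(j for _, j in scores[:k])


def project(X: List[List[float]], selected: List[int]) -> List[List[float]]:
    """Keep only the selected columns, i.e. the reduced-dimension data."""
    return [[row[j] for j in selected] for row in X]


if __name__ == "__main__":
    # Column 0 tracks y closely, column 1 is constant, column 2 is mostly noise.
    X = [[1.0, 5.0, 0.3], [2.0, 5.0, 0.1], [3.0, 5.0, 0.4], [4.0, 5.0, 0.2]]
    y = [2.1, 3.9, 6.2, 8.0]
    print(select_top_k(X, y, 2))  # → [0, 2]
    print(project(X, select_top_k(X, y, 1)))  # → [[1.0], [2.0], [3.0], [4.0]]
```

Filter methods like this one score each feature independently of any learning algorithm, which makes them cheap, but they ignore interactions between features; wrapper and embedded approaches trade extra computation for taking such interactions into account.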
In recent years, an increasing number of high-dimensional datasets have emerged, in some cases consisting of only a limited number of samples, while in other cases giving rise to very large datasets. Furthermore, new mining challenges such as web and text mining, social and biological networks, and many other tasks give rise to datasets with more complex structures (e.g., graphs, strings, documents) than the traditional vectorial representations. To deal with the specific characteristics of these new data structures, existing feature selection methodologies need to be adapted or new methodologies need to be developed.