TY - GEN
T1 - A novel ensemble method for high-dimensional genomic data classification
AU - Espichan, Alexandra
AU - Villanueva, Edwin
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2019/1/21
Y1 - 2019/1/21
AB - Classifier ensembles have been shown to be an attractive approach for dealing with the curse of dimensionality in genomic data. The common idea of this approach is to integrate diverse and accurate base predictors in order to obtain a classification system that performs better than its individual members. Many methods pursue this goal by introducing perturbations into some aspect of the learning process (examples, features, base learners, etc.). However, many existing methodologies do so in a completely random way, without control over the perturbation process; this can generate unhelpful base predictors that degrade the final performance or make a pruning strategy necessary. In this paper we introduce tEnsemble, a new and simple approach that seeks an adequate balance between diversity and accuracy. It does so by using a previously optimized template feature set, which guides the perturbation process on the feature space in a controlled manner. Experiments carried out on 39 public gene expression data sets showed that this methodology has the potential to produce effective classifier ensemble systems, frequently outperforming Random Forest, a well-established methodology in the area.
KW - cancer classification
KW - Ensemble learning
KW - gene expression data classification
KW - high-dimensional genomic data
UR - http://www.scopus.com/inward/record.url?scp=85062519039&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2018.8621386
DO - 10.1109/BIBM.2018.8621386
M3 - Conference contribution
AN - SCOPUS:85062519039
T3 - Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
SP - 2229
EP - 2236
BT - Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
A2 - Schmidt, Harald
A2 - Griol, David
A2 - Wang, Haiying
A2 - Baumbach, Jan
A2 - Zheng, Huiru
A2 - Callejas, Zoraida
A2 - Hu, Xiaohua
A2 - Dickerson, Julie
A2 - Zhang, Le
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
Y2 - 3 December 2018 through 6 December 2018
ER -