TY - GEN
T1 - Classification algorithms for big data analysis, a map reduce approach
AU - Ayma, V. A.
AU - Ferreira, R. S.
AU - Happ, P.
AU - Oliveira, D.
AU - Feitosa, R.
AU - Costa, G.
AU - Plaza, A.
AU - Gamba, P.
PY - 2015
Y1 - 2015
AB - For many years, the scientific community has worked to increase the accuracy of classification methods, and major achievements have been made. Beyond accuracy, the increasing amount of data generated every day by remote sensors raises further challenges. This work presents a tool within the scope of the InterIMAGE Cloud Platform (ICP), an open-source, distributed framework for automatic image interpretation. The tool, named ICP: Data Mining Package, performs supervised classification on huge amounts of data, usually referred to as big data, on a distributed infrastructure using Hadoop MapReduce. It implements four classification algorithms taken from WEKA's machine learning library: Decision Trees, Naïve Bayes, Random Forest and Support Vector Machines (SVM). The results of an experimental analysis using an SVM classifier on data sets of different sizes and for different cluster configurations demonstrate the potential of the tool, as well as aspects that affect its performance.
KW - Big data
KW - Classification algorithms
KW - Cloud computing
KW - Hadoop
KW - MapReduce framework
UR - http://www.scopus.com/inward/record.url?scp=84925352144&partnerID=8YFLogxK
U2 - 10.5194/isprsarchives-XL-3-W2-17-2015
DO - 10.5194/isprsarchives-XL-3-W2-17-2015
M3 - Conference contribution
AN - SCOPUS:84925352144
T3 - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences - ISPRS Archives
SP - 17
EP - 21
BT - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences - ISPRS Archives
A2 - Stilla, Uwe
A2 - Heipke, Christian
PB - International Society for Photogrammetry and Remote Sensing
T2 - Joint ISPRS Conference on Photogrammetric Image Analysis, PIA 2015 and High Resolution Earth Imaging for Geospatial Information, HRIGI 2015
Y2 - 25 March 2015 through 27 March 2015
ER -