TY - GEN
T1 - Quality data extraction methodology based on the labeling of coffee leaves with nutritional deficiencies
AU - Jungbluth, Adolfo
AU - Yeng, Jon Li
AU - Vives, Luis
N1 - Publisher Copyright: © 2018 Association for Computing Machinery.
PY - 2018/4/9
Y1 - 2018/4/9
AB - Detection of nutritional deficiencies in coffee leaves is a task often performed manually by field experts known as agronomists. They carry out this task by observing the different characteristics of the coffee leaves and relying on their own experience. Visual fatigue and human error in this empirical approach cause leaves to be labeled incorrectly, thus degrading the quality of the data obtained. In this context, different crowdsourcing approaches can be applied to enhance the quality of the extracted data. These approaches separately propose the use of voting systems, association rule filters, and evolutive learning. In this paper, we extend the use of association rule filters and the evolutive approach by combining them in a methodology that enhances data quality while guiding users through the main stages of data extraction tasks. Moreover, our methodology includes a reward component to engage users and keep them motivated during crowdsourcing tasks. In a case study on Peruvian coffee leaves, the dataset extracted by applying our proposed methodology reached 93.33% accuracy, with 30 instances collected by 8 experts and evaluated by 2 agronomic engineers with a background in coffee leaves. This accuracy was higher than that obtained by independently applying the evolutive feedback strategy (86.67%) or an empirical approach (70%) under the same conditions.
KW - Data extraction
KW - Data quality assessment
KW - Quality data extraction methodology
UR - http://www.scopus.com/inward/record.url?scp=85050084409&partnerID=8YFLogxK
U2 - 10.1145/3206098.3206102
DO - 10.1145/3206098.3206102
M3 - Conference contribution
AN - SCOPUS:85050084409
T3 - ACM International Conference Proceeding Series
SP - 59
EP - 64
BT - ICISDM 2018 - 2nd International Conference on Information System and Data Mining
PB - Association for Computing Machinery
T2 - 2nd International Conference on Information System and Data Mining, ICISDM 2018
Y2 - 9 April 2018 through 11 April 2018
ER -