TY - GEN
T1 - Clustering and topic modeling over tweets
T2 - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
AU - Lossio-Ventura, Juan Antonio
AU - Morzan, Juandiego
AU - Alatrista-Salas, Hugo
AU - Hernandez-Boussard, Tina
AU - Bian, Jiang
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
AB - Twitter has become the most popular form of social interaction in the healthcare domain. Various teams have therefore evaluated Twitter as an additional source in which patients share information about their healthcare, with the potential goal of improving their outcomes. Several existing topic modeling and document clustering applications have been adapted to assess tweets, showing that the performance of these applications is negatively affected by the nature and characteristics of tweets. Moreover, Twitter health research has become difficult to measure because comparisons between the existing applications are lacking. In this paper, we evaluate different topic modeling and document clustering applications on two health-related Twitter datasets, using internal indexes. Our results show that Online Twitter LDA and Gibbs LDA perform better at extracting topics and grouping tweets. We provide this comparison so that health practitioners can select the most suitable application for their tasks.
KW - Twitter
KW - clustering
KW - internal cluster indexes
KW - natural language processing
KW - topic modeling
UR - http://www.scopus.com/inward/record.url?scp=85084332074&partnerID=8YFLogxK
U2 - 10.1109/BIBM47256.2019.8983167
DO - 10.1109/BIBM47256.2019.8983167
M3 - Conference contribution
AN - SCOPUS:85084332074
T3 - Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
SP - 1544
EP - 1547
BT - Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
A2 - Yoo, Illhoi
A2 - Bi, Jinbo
A2 - Hu, Xiaohua Tony
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 November 2019 through 21 November 2019
ER -