Clustering and topic modeling over tweets: A comparison over a health dataset

Juan Antonio Lossio-Ventura, Juandiego Morzan, Hugo Alatrista-Salas, Tina Hernandez-Boussard, Jiang Bian

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

15 Scopus citations

Abstract

Twitter became the most popular form of social interactions in the healthcare domain. Thus, various teams have evaluated Twitter as an additional source where patients share information about their healthcare with the potential goal to improve their outcomes. Several existing topic modeling and document clustering applications have been adapted to assess tweets showing that the performances of the applications are negatively affected due to the nature and characteristics of tweets. Moreover, Twitter health research has become difficult to measure because of the absence of comparisons between the existing applications. In this paper, we perform an evaluation based on internal indexes of different topic modeling and document clustering applications over two Twitter health-related datasets. Our results show that Online Twitter LDA and Gibbs LDA get a better performance for extracting topics and grouping tweets. We want to provide health practitioners this comparison to select the most suitable application for their tasks.

Original languageEnglish
Title of host publicationProceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
EditorsIllhoi Yoo, Jinbo Bi, Xiaohua Tony Hu
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1544-1547
Number of pages4
ISBN (Electronic)9781728118673
DOIs
StatePublished - Nov 2019
Externally publishedYes
Event2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019 - San Diego, United States
Duration: 18 Nov 201921 Nov 2019

Publication series

NameProceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019

Conference

Conference2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
Country/TerritoryUnited States
CitySan Diego
Period18/11/1921/11/19

Keywords

  • Twitter
  • clustering
  • internal cluster indexes
  • natural language processing
  • topic modeling

Fingerprint

Dive into the research topics of 'Clustering and topic modeling over tweets: A comparison over a health dataset'. Together they form a unique fingerprint.

Cite this