Exploring New Horizons in Word Sense Disambiguation and Topic Modeling: Potential of Deep Learning Based Transformers Models

Date

2024

Publisher

Springer Nature

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Research Problem: The chapter discusses the limitations of traditional topic modeling algorithms in capturing complex relationships and the contextual meaning of words, and the need for a more precise understanding of individual word senses in Word Sense Disambiguation (WSD). It explores approaches such as using lexical resources like WordNet and leveraging word embeddings and pre-trained language models such as Bidirectional Encoder Representations from Transformers (BERT) to incorporate semantic knowledge into topic modeling, and explains the advantages of a phrase-embeddings-based approach to WSD in topic modeling.

Methodology: The methodology uses Phrase-BERT, which fine-tunes BERT for phrase-level representation learning. A dataset of phrasal paraphrase pairs and phrases in context is created and used to fine-tune the model. The resulting Phrase-BERT embeddings are incorporated into a Phrase-based Neural Topic Model (PNTM), which interprets topics as mixtures of words and phrases. PNTM is compared against other topic model baselines and shows superior performance in both topic coherence and topic-to-document relatedness. The experiments use an airline reviews dataset, a collection of customer reviews and feedback about various airlines.

Findings: The first finding shows that Phrase-BERT achieves good precision but poor recall for WSD on the airline reviews dataset; further research is needed to improve its recall. The second finding shows the potential of the Phrase-BERT PNTM for generating coherent and interpretable topics, which were evaluated using coherence scores and human evaluation through card-sorting exercises. The Phrase-BERT PNTM outperformed classical LDA but was slightly outperformed by BERTopic on coherence scores; however, human evaluations showed good consistency and accuracy for the topics it generated.

Significance: Phrase-BERT has great potential for improving the performance of WSD and topic modeling. © The Editor(s) (if applicable) and The Author(s), under exclusive licence to Springer Nature Switzerland AG 2024.
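The embedding-based WSD idea described above reduces to a nearest-sense lookup: each candidate sense and the phrase in context are mapped to vectors, and the sense whose vector is most similar (by cosine similarity) wins. The following is a minimal illustrative sketch with hand-made toy vectors standing in for Phrase-BERT embeddings; the sense labels, vectors, and helper names are invented for illustration, not taken from the chapter.

```python
import math

# Toy 3-dimensional vectors standing in for Phrase-BERT embeddings.
# Real embeddings would come from a fine-tuned phrase encoder.
sense_vectors = {
    "terminal (airport building)": [0.9, 0.1, 0.0],
    "terminal (computer console)": [0.1, 0.9, 0.2],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def disambiguate(context_vec, senses):
    """Return the sense whose embedding is closest to the phrase-in-context vector."""
    return max(senses, key=lambda s: cosine(context_vec, senses[s]))

# A context vector for "waiting at the terminal" in an airline review,
# leaning toward the airport sense.
context = [0.8, 0.2, 0.1]
print(disambiguate(context, sense_vectors))  # prints "terminal (airport building)"
```

In the chapter's setting, the same comparison would be performed in the high-dimensional Phrase-BERT space, where the good-precision/poor-recall result on the airline reviews suggests the nearest sense is usually right when a confident match exists, but many occurrences fall below a usable similarity threshold.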

Source

Digital Humanities Looking at the World: Exploring Innovative Approaches and Contributions to Society

Scopus Q Value

N/A
