Comparative analysis of LDA, LSA and NMF topic modelling for web data

Bibliographic details
Main authors: Shastry, Pooja; Prakash, C. O.
Format: Conference proceedings
Language: English
Description
Abstract: Whenever a user visits a website, the URL, timestamp, client IP address and other information are stored in the web log file. This information can be analysed further to extract useful knowledge. Processing entire web log files, however, is difficult and degrades performance. For example, if there are 2000 documents and each document contains 500 words, processing the entire set requires 500 × 2000 = 1,000,000 threads. If the documents are instead grouped by topic, for example into 3 topics, processing requires only 3 × 500 words = 1500 threads. Hence, this work presents a comparative analysis that employs the topic modelling methods Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA) and Non-negative Matrix Factorization (NMF) to extract hidden features from web log data.
ISSN: 0094-243X, 1551-7616
DOI: 10.1063/5.0178761
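The abstract names the three methods but not how they were applied to the logs. What follows is a minimal sketch, not the authors' method. It assumes numpy is available and treats the URL path tokens requested by each client IP as one "document". That grouping, the sample log lines and all function names (`sessions_by_ip`, `doc_term_matrix`, `lsa`, `nmf`, `lda_gibbs`) are hypothetical. From Common Log Format lines the sketch builds a document-term matrix and extracts topics three ways: LSA as a truncated SVD, NMF via Lee-Seung multiplicative updates, and LDA via collapsed Gibbs sampling. The paper may instead have used library implementations such as scikit-learn or gensim.

```python
import re
from collections import defaultdict

import numpy as np

# Hypothetical sample requests (client IP, URL path); not data from the paper.
RAW = [
    ("10.0.0.1", "/products/laptops"), ("10.0.0.1", "/products/laptops/cart"),
    ("10.0.0.1", "/cart/checkout"), ("10.0.0.2", "/products/phones"),
    ("10.0.0.2", "/cart/checkout"), ("10.0.0.3", "/support/faq"),
    ("10.0.0.3", "/support/contact"), ("10.0.0.4", "/support/faq/returns"),
    ("10.0.0.4", "/help/contact"),
]
# Render as Common Log Format: IP, identity, user, [timestamp], "request", status, size.
LOG_LINES = [f'{ip} - - [10/Oct/2023:13:55:36 +0000] "GET {url} HTTP/1.1" 200 512' for ip, url in RAW]
LOG_RE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|POST) (\S+)[^"]*"')


def sessions_by_ip(lines):
    """Assumption: each client IP is one 'document' made of its URL path tokens."""
    docs = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            ip, _timestamp, url = m.groups()  # timestamp is parsed but unused here
            docs[ip].extend(t for t in re.split(r"[/?=&.]+", url.lower()) if t)
    return dict(docs)


def doc_term_matrix(docs):
    """Count matrix X: one row per document, one column per vocabulary term."""
    vocab = sorted({t for toks in docs.values() for t in toks})
    col = {t: j for j, t in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(docs.values()):
        for t in toks:
            X[i, col[t]] += 1
    return X, vocab


def lsa(X, k):
    """LSA: rank-k truncated SVD, X ~ U_k S_k V_k^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k]  # document-topic, topic-term


def nmf(X, k, iters=500, seed=0, eps=1e-10):
    """NMF: X ~ W H with W, H >= 0, via Lee-Seung multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + 1e-3
    H = rng.random((k, X.shape[1])) + 1e-3
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def lda_gibbs(X, k, iters=200, alpha=0.1, beta=0.01, seed=0):
    """LDA fitted by collapsed Gibbs sampling over per-token topic assignments."""
    rng = np.random.default_rng(seed)
    D, V = X.shape
    tokens = [(d, w) for d in range(D) for w in range(V) for _ in range(int(X[d, w]))]
    z = rng.integers(k, size=len(tokens))  # random initial topic per token
    ndk, nkw, nk = np.zeros((D, k)), np.zeros((k, V)), np.zeros(k)

    def bump(d, w, t, delta):
        # Keep document-topic, topic-term and topic-total counts in sync.
        ndk[d, t] += delta
        nkw[t, w] += delta
        nk[t] += delta

    for (d, w), t in zip(tokens, z):
        bump(d, w, t, 1)
    for _ in range(iters):
        for i, (d, w) in enumerate(tokens):
            bump(d, w, z[i], -1)  # remove token i before resampling its topic
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            z[i] = rng.choice(k, p=p / p.sum())
            bump(d, w, z[i], 1)
    theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)  # document-topic
    phi = (nkw + beta) / (nkw + beta).sum(axis=1, keepdims=True)  # topic-term
    return theta, phi


docs = sessions_by_ip(LOG_LINES)
X, vocab = doc_term_matrix(docs)
for name, topic_term in [("LSA", lsa(X, 2)[1]), ("NMF", nmf(X, 2)[1]), ("LDA", lda_gibbs(X, 2)[1])]:
    for j, row in enumerate(topic_term):
        print(name, j, [vocab[i] for i in np.argsort(row)[::-1][:3]])
```

Running the script prints the three highest-weighted terms of each topic for each method. NMF and LDA return non-negative topic-term weights, so their top terms read directly as topics. LSA components can carry negative weights and have an arbitrary sign, which makes their top terms harder to interpret. That difference is one of the practical points a comparison of the three methods would weigh.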