Comparative analysis of LDA, LSA and NMF topic modelling for web data
Main authors: | , |
---|---|
Format: | Conference proceeding |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Whenever a user visits a website, the URL, timestamp, client IP address and other information are stored in the web log file. This information can be analysed further to extract useful insights. Processing entire web log files is difficult and hinders performance. For example, if there are 2000 documents and each document has 500 words, processing the entire set requires 500 * 2000 = 1,000,000 threads. To avoid this, the documents can be grouped by topic: with, for example, 3 topics, processing requires just 3 * 500 = 1,500 threads. Hence, this work proposes a comparative analysis that employs topic modelling methods (LDA, LSA and NMF) to extract hidden features from web log data. |
---|---|
ISSN: | 0094-243X 1551-7616 |
DOI: | 10.1063/5.0178761 |
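
The comparison described in the summary can be illustrated with a minimal sketch. It assumes scikit-learn is available and uses its `LatentDirichletAllocation` (LDA), `TruncatedSVD` (a common way to compute LSA) and `NMF` estimators. The toy `sessions` corpus, the choice of 3 topics and the helper name `fit_models` are all hypothetical and are not taken from the paper, whose actual data and pipeline are not given in this record.

```python
from sklearn.decomposition import NMF, LatentDirichletAllocation, TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical data: each "document" is one session, written as the
# URL path tokens a visitor requested (invented for illustration).
sessions = [
    "home products shoes cart checkout",
    "home products shoes reviews",
    "home blog python tutorial comments",
    "blog python tutorial search",
    "home account login password reset",
    "account login settings logout",
]

N_TOPICS = 3  # assumption: mirrors the "number of topics = 3" example

# LDA is usually fitted on raw term counts; LSA and NMF on TF-IDF weights.
counts = CountVectorizer().fit_transform(sessions)
tfidf = TfidfVectorizer().fit_transform(sessions)


def fit_models(counts, tfidf, n_topics=N_TOPICS, seed=0):
    """Fit LDA, LSA and NMF; return one document-topic matrix per method."""
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    lsa = TruncatedSVD(n_components=n_topics, random_state=seed)
    nmf = NMF(n_components=n_topics, init="nndsvda",
              random_state=seed, max_iter=500)
    return {
        "LDA": lda.fit_transform(counts),
        "LSA": lsa.fit_transform(tfidf),
        "NMF": nmf.fit_transform(tfidf),
    }


doc_topics = fit_models(counts, tfidf)
for name, matrix in doc_topics.items():
    # Each matrix has one row per session and one column per topic.
    print(name, matrix.shape)
```

Each method maps a session to a vector of 3 topic weights instead of a vector over the full vocabulary, which is the kind of reduction the summary refers to. A real comparison would also need an evaluation measure, such as topic coherence, which this sketch leaves out.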