An Efficient Framework for Web Content Mining Systems Using Improved CD-PAM Clustering and the A-CNN Technique
Published in: | SN Computer Science 2023-09, Vol.4 (5), p.692, Article 692 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Abstract: | The expansion of the World Wide Web (WWW) has made finding appropriate information difficult, and web classification has emerged as an alternative approach to support effective information retrieval. The main problem addressed in this research is the need for an efficient and accurate web content mining system. This research proposes a new framework that combines cosine distance-based Partitioning Around Medoids (CD-PAM) clustering with an ANOVA-Convolutional Neural Network (A-CNN). CD-PAM clustering groups similar websites based on the content of their web pages, while the A-CNN model extracts the pertinent features from those pages. The combined approach is expected to increase the precision and effectiveness of web content mining by minimising the quantity of irrelevant web pages. The novelty of the proposed framework lies in the integration of CD-PAM clustering and A-CNN, which have not been combined before in the context of web content mining. The authors claim that the framework achieves better accuracy and efficiency than existing methods. Overall, the framework aims to address the limitations of existing web content mining systems by extracting valuable information from web data efficiently and accurately. |
---|---|
ISSN: | 2661-8907, 2662-995X |
DOI: | 10.1007/s42979-023-02137-w |
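
The abstract names cosine distance-based Partitioning Around Medoids but does not spell out the procedure. The sketch below illustrates the general idea only and is not the authors' implementation: it assumes each web page is already represented as a term-count vector, uses a naive initialisation (the first k pages), and applies a first-improvement swap rule. The function names `cosine_distance`, `total_cost`, and `pam_cosine` and the toy `pages` data are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def total_cost(dist, medoids):
    """Sum of every point's distance to its nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam_cosine(vectors, k):
    """Cluster vectors around k medoids using cosine distance.

    Assumption: medoids start as the first k points; the original
    PAM algorithm uses a greedy BUILD phase instead.
    """
    n = len(vectors)
    # Precompute the full pairwise distance matrix.
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    medoids = list(range(k))
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        # Try replacing each medoid with each non-medoid point and keep
        # any swap that strictly lowers the total cost.
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    # Assign each point to the position of its nearest medoid.
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]])
              for i in range(n)]
    return medoids, labels


# Toy term-count vectors (made-up data): pages 0, 2, 4 share one
# vocabulary, pages 1 and 3 share another.
pages = [
    [3, 1, 0, 0],
    [0, 0, 2, 3],
    [4, 2, 0, 0],
    [0, 0, 1, 2],
    [2, 1, 0, 1],
]
medoids, labels = pam_cosine(pages, 2)
print("medoid indices:", medoids)
print("cluster labels:", labels)
```

Cosine distance ignores vector length, so a long page and a short page with the same term proportions land in the same cluster, which is one plausible reason to prefer it over Euclidean distance for text. Because medoids are actual pages rather than averaged centroids, each cluster can be summarised by a real, representative document.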