Classification of News on "Radar" Tarakan Online Using K-Nearest Neighbor Method with N-Gram Features
Published in: | IOP Conference Series: Materials Science and Engineering, 2019-11, Vol. 676 (1), p. 12008 |
---|---|
Main authors: | |
Format: | Article |
Language: | English |
Online access: | Full text |
Abstract: | Classification, or text categorization, is one of the most common themes in analysing complex data. The classification in this research aims to determine the class of an object whose class is not yet known. The classification process consists of a learning phase and a testing phase on a dataset of objects with known classes. As an online news outlet, Radar Tarakan classifies its news to help readers find the news they want and to ease the administrators' work through automatic classification. The Radar Tarakan daily news website (www.radartarakan.co.id) publishes approximately 480 local news articles every day. Local news covers the areas surrounding Tarakan in North Borneo. The website sorts local news into eight categories, covering the areas surrounding the town of Tarakan. This research uses 720 news articles to build an automatic news classification system. The text classification method used is K-Nearest Neighbor (KNN) with N-gram features. KNN with N-grams achieves an accuracy of 85.65% in matching classes (on 180 test articles) and 70.35% in matching sub-classes. Classification takes about 2.7 seconds per article. |
---|---|
ISSN: | 1757-8981 1757-899X |
DOI: | 10.1088/1757-899X/676/1/012008 |
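The abstract names the method (KNN over N-gram features) but not its parameters. The following is a minimal sketch, not the authors' implementation, assuming character trigrams as the N-gram features, cosine similarity between n-gram count vectors, and a majority vote among the k = 3 most similar training articles. The function names `char_ngrams`, `cosine`, and `knn_classify` and the toy training data are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count the character n-grams of a lower-cased, whitespace-normalised text.

    Assumption: character n-grams with n = 3; the abstract does not say
    whether the paper used character or word n-grams, or which n.
    """
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(text, train, k=3, n=3):
    """Label `text` by majority vote among its k most similar training documents.

    `train` is a list of (document_text, label) pairs (hypothetical input format).
    """
    query = char_ngrams(text, n)
    # Rank every training document by similarity to the query, most similar first.
    ranked = sorted(
        ((cosine(query, char_ngrams(doc, n)), label) for doc, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Count labels among the k nearest neighbours. Counter keeps insertion order
    # and most_common() sorts stably, so among tied labels the one with the
    # nearest neighbour wins.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data; not taken from the Radar Tarakan dataset.
    train = [
        ("the football match ended in a draw", "sport"),
        ("the team won the football league", "sport"),
        ("the mayor announced a new city budget", "politics"),
        ("council members debated the budget", "politics"),
    ]
    print(knn_classify("football team wins match", train))  # prints: sport
```

With k = 3, the two sport articles share many trigrams with the query and outvote the single politics article that fills the third slot. Swapping `char_ngrams` for a word n-gram counter, or the cosine measure for another distance, changes only the feature and similarity steps; the KNN voting logic stays the same.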