Bayesian theorem-based short text classification data set correction method and system

Bibliographic details
Main authors: GUO HAOLIANG, LIU KAI
Format: Patent
Language: chi; eng
Description
Summary: The invention provides a method and system, based on Bayes' theorem, for correcting short-text classification data sets. The method comprises the following steps: obtaining a data set to be corrected; encoding the text content of that data set; obtaining sample smoothing parameters for the plurality of sample categories; smoothing the word frequencies of the encoded data set according to the sample smoothing parameters; obtaining, from Bayes' theorem and the smoothed word frequencies, the log-likelihoods of the sample data belonging to the different sample categories; setting a predetermined label modification condition; and modifying the sample data that meet the condition. The technical problems that, in the prior art, labeled information is not fully mined, labeling results still depend entirely on people and rely heavily on manual participation, and internet data which is inaccurately and incorrectly labeled is difficult
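
The steps listed in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the whitespace bag-of-words encoding, the additive (Laplace-style) smoothing with one parameter per category, the log-score margin used as the label modification condition, and all names (`encode`, `fit_counts`, `log_scores`, `correct_labels`, `SAMPLES`) are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

# Toy data set (hypothetical); the last sample is deliberately mislabelled.
SAMPLES = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("cheap offer buy now", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project meeting notes monday", "ham"),
    ("monday schedule project update", "ham"),
    ("buy cheap pills now", "ham"),
]

def encode(text):
    """Encode a text as lowercase whitespace tokens (assumed representation)."""
    return text.lower().split()

def fit_counts(samples):
    """Collect per-category word frequencies, category sizes and the vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in samples:
        tokens = encode(text)
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def log_scores(text, word_counts, class_counts, vocab, alphas):
    """Return log prior + log likelihood of the text for every category.

    alphas maps each category to its smoothing parameter (assumed additive).
    """
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        alpha = alphas[label]
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        score = math.log(n / total)
        for tok in encode(text):
            score += math.log((word_counts[label][tok] + alpha) / denom)
        scores[label] = score
    return scores

def correct_labels(samples, alphas, margin=1.0):
    """Relabel a sample when another category beats its label by `margin`.

    The margin test stands in for the 'predetermined label modification
    condition'; the actual condition is not given in the summary.
    """
    word_counts, class_counts, vocab = fit_counts(samples)
    corrected = []
    for text, label in samples:
        scores = log_scores(text, word_counts, class_counts, vocab, alphas)
        best = max(scores, key=scores.get)
        if best != label and scores[best] - scores[label] >= margin:
            label = best
        corrected.append((text, label))
    return corrected

corrected = correct_labels(SAMPLES, alphas={"spam": 1.0, "ham": 1.0})
```

In this sketch each sample is scored against counts that include the sample itself, which biases the score toward the current label; a held-out or leave-one-out estimate would be a natural refinement. Raising `margin` makes the correction more conservative.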