Bayesian theorem-based short text classification data set correction method and system

Bibliographic details
Main authors:	GUO HAOLIANG, LIU KAI
Format:	Patent
Language:	chi ; eng
Subjects:	CALCULATING; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

Description
Abstract:	The invention provides a method and system, based on Bayes' theorem, for correcting short text classification data sets. The method comprises the steps of: obtaining a data set to be corrected; encoding the text content of the data set; obtaining sample smoothing parameters for the plurality of sample categories; smoothing the word frequencies of the encoded data set according to the sample smoothing parameters; obtaining, according to Bayes' theorem and the smoothed word frequencies, the log-likelihoods of the sample data belonging to the different sample categories; setting a predetermined label modification condition; and modifying the labels of the sample data that meet the condition. The technical problems of the prior art, namely that labeled information is not fully mined, that labeling results still depend entirely on people and rely heavily on manual participation, and that inaccurately or incorrectly labeled internet data is difficult
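
The steps listed in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the abstract does not say how the text is encoded, how the smoothing parameters are obtained, or what the label modification condition is. The sketch therefore assumes whitespace bag-of-words encoding, additive (Laplace-style) smoothing with a per-category parameter `alpha`, and a condition that relabels a sample only when another category's log score exceeds that of its current label by more than `margin`. The names `tokenize`, `correct_labels`, `alpha` and `margin` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed encoding step: lowercase, whitespace-separated bag of words.
    return text.lower().split()


def correct_labels(samples, alpha=1.0, margin=1.0):
    """Relabel samples whose current label looks wrong under naive Bayes.

    samples: list of (text, label) pairs (the data set to be corrected).
    alpha:   assumed smoothing parameter, a float or a dict label -> float.
    margin:  assumed modification condition, a log-score gap threshold.
    """
    docs = [(tokenize(text), label) for text, label in samples]
    labels = sorted({label for _, label in docs})
    vocab = {word for tokens, _ in docs for word in tokens}
    alphas = alpha if isinstance(alpha, dict) else {c: alpha for c in labels}

    # Word frequencies and sample counts per category.
    word_counts = {c: Counter() for c in labels}
    doc_counts = Counter()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        doc_counts[label] += 1

    def log_score(tokens, c):
        # log P(c) + sum over words of log P(word | c), smoothed word frequencies.
        total = sum(word_counts[c].values()) + alphas[c] * len(vocab)
        score = math.log(doc_counts[c] / len(docs))
        for word in tokens:
            score += math.log((word_counts[c][word] + alphas[c]) / total)
        return score

    corrected = []
    for (text, label), (tokens, _) in zip(samples, docs):
        scores = {c: log_score(tokens, c) for c in labels}
        best = max(labels, key=lambda c: scores[c])
        # Assumed condition: relabel only if the best category wins clearly.
        if best != label and scores[best] - scores[label] > margin:
            corrected.append((text, best))
        else:
            corrected.append((text, label))
    return corrected
```

Calling `correct_labels(samples, alpha=1.0, margin=1.0)` on a list of `(text, label)` pairs returns a list of the same length in the same order. Each sample's own words count toward its current label, which biases the sketch toward keeping existing labels; a larger `margin` makes relabeling more conservative still.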