Bayesian theorem-based short text classification data set correction method and system
Main authors: | , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Abstract: | The invention provides a method and system, based on Bayes' theorem, for correcting short text classification data sets. The method comprises the steps of: obtaining a data set to be corrected; encoding the text content of the data set; obtaining sample smoothing parameters for the plurality of sample categories; smoothing the word frequencies of the encoded data set according to the sample smoothing parameters; obtaining, according to Bayes' theorem and the smoothed word frequencies, the log-likelihoods of the sample data belonging to the different sample categories; setting a predetermined label modification condition; and modifying the sample data that meets the condition. The technical problems that in the prior art, labeled information is not fully mined, labeling results still depend entirely on people and require substantial manual participation, and internet data that is inaccurately or incorrectly labeled is difficult ... |
---|
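
The steps listed in the abstract can be sketched roughly as follows. This is only an illustration, not the patented method. It assumes a whitespace bag-of-words encoding, a multinomial naive Bayes model with additive smoothing, and a simple log-score margin as the label modification condition. The record does not specify any of these. The names `correct_labels`, `alpha`, and `margin` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def correct_labels(texts, labels, alpha=None, margin=1.0):
    """Return a corrected copy of `labels` (illustrative sketch only).

    alpha  -- hypothetical per-class smoothing parameters, {class: float};
              defaults to 1.0 for every class (assumption).
    margin -- hypothetical label modification condition: relabel a sample
              only if the best class beats its current label by at least
              this much in log score (assumption).
    """
    # Encode each text as a list of tokens (assumed whitespace tokenisation).
    docs = [t.split() for t in texts]
    vocab = {w for d in docs for w in d}
    classes = sorted(set(labels))
    if alpha is None:
        alpha = {c: 1.0 for c in classes}

    # Per-class word frequencies and class frequencies from the noisy labels.
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for d, y in zip(docs, labels):
        word_counts[y].update(d)

    def log_score(doc, c):
        # Log prior plus smoothed log-likelihood of the words under class c.
        denom = sum(word_counts[c].values()) + alpha[c] * len(vocab)
        score = math.log(class_counts[c] / len(labels))
        for w in doc:
            score += math.log((word_counts[c][w] + alpha[c]) / denom)
        return score

    corrected = []
    for d, y in zip(docs, labels):
        scores = {c: log_score(d, c) for c in classes}
        best = max(scores, key=scores.get)
        # Modify the label only when the condition is met.
        corrected.append(best if scores[best] - scores[y] >= margin else y)
    return corrected
```

In this sketch the counts include the possibly mislabeled sample itself. A leave-one-out variant, or a different modification condition, may be closer to what the patent describes.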