Automatic classification of RDoC positive valence severity with a neural network

Full description

Bibliographic details
Published in: Journal of biomedical informatics 2017-11, Vol.75, p.S120-S128
Main authors: Clark, Cheryl; Wellner, Ben; Davis, Rachel; Aberdeen, John; Hirschman, Lynette
Format: Article
Language: English
Description
Abstract:
• We trained a machine learning-based system to determine psychiatric symptom severity.
• Regularization and feature selection via mutual information reduced overfitting.
• Increasing the amount of annotated data increased accuracy by several percent.

Our objective was to develop a machine learning-based system to determine the severity of Positive Valence symptoms for a patient, based on information included in their initial psychiatric evaluation. Experts rated severity on an ordinal scale of 0–3: 0 (absent = no symptoms), 1 (mild = modest significance), 2 (moderate = requires treatment), 3 (severe = causes substantial impairment).

We treated the task of assigning Positive Valence severity as a text classification problem. During development, we experimented with regularized multinomial logistic regression classifiers, gradient boosted trees, and feedforward, fully connected neural networks. We found both regularization and feature selection via mutual information to be very important in preventing models from overfitting the data. Our best configuration was a neural network with three fully connected hidden layers with rectified linear unit activations.

Our best performing system achieved a score of 77.86%. The evaluation metric is an inverse normalization of the Mean Absolute Error, presented as a percentage between 0 and 100, where 100 means the highest performance. Error analysis showed that 90% of the system errors involved neighboring severity categories.

Machine learning text classification techniques with feature selection can be trained to recognize broad differences in Positive Valence symptom severity with a modest amount of training data (in this case 600 documents, 167 of which were unannotated). An increase in the amount of annotated data can increase accuracy of symptom severity classification by several percentage points. Additional features and/or a larger training corpus may further improve accuracy.
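Note on the evaluation metric: the abstract describes the score as an inverse normalization of the Mean Absolute Error on the 0–3 severity scale. The sketch below shows one plausible reading of that description, not the article's actual scoring code. It assumes the error is normalized by the largest possible error on the scale (3); the real metric may normalize or average differently (for example per severity class). The function names `inverse_normalized_mae` and `neighbor_error_share` are hypothetical.

```python
def inverse_normalized_mae(gold, predicted, max_severity=3):
    """Return a score in [0, 100]; 100 means every prediction equals the gold label.

    Assumption: MAE is divided by max_severity, the largest possible
    absolute error on a 0..max_severity ordinal scale.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    mae = sum(abs(g - p) for g, p in zip(gold, predicted)) / len(gold)
    return 100.0 * (1.0 - mae / max_severity)


def neighbor_error_share(gold, predicted):
    """Return the fraction of errors that are off by exactly one severity level."""
    errors = [abs(g - p) for g, p in zip(gold, predicted) if g != p]
    if not errors:
        return 0.0
    return sum(1 for e in errors if e == 1) / len(errors)


if __name__ == "__main__":
    # Illustrative labels only; these are not data from the article.
    gold = [0, 1, 2, 3]
    predicted = [1, 1, 2, 3]
    print(round(inverse_normalized_mae(gold, predicted), 2))
    print(neighbor_error_share(gold, predicted))
```

Under this reading, a system whose mistakes mostly fall in neighboring categories (as the abstract reports for 90% of errors) loses relatively little score per mistake, since each such error contributes only one third of the maximum penalty.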
ISSN: 1532-0464, 1532-0480
DOI: 10.1016/j.jbi.2017.07.005