CSR-SVM: Compositional semantic representation for intelligent identification of engineering change documents based on SVM
Published in: | Advanced engineering informatics 2023-08, Vol.57, p.102050, Article 102050 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | • A compositional semantic representation (CSR) combined with SVM is proposed. • CSR provides sufficient semantic representation for SVM in an efficient way. • CSR can maintain prominent semantic features and retain global semantic features. • A custom domain dictionary for recognizing unregistered words is established.
Timely and accurate identification of engineering change documents plays an important role in the document management systems of the Architecture, Engineering, and Construction (AEC) industry, because it facilitates decision making. The current practice of manual review and analysis can delay or omit information, which may severely affect project schedule and cost. This paper uses text classification to explore the intelligent identification of unstructured engineering change documents. In practice, however, the scarcity of available corpora and the limited number of engineering change texts are challenges. In addition, the semantic information of the texts is difficult to exploit, in particular because engineering change texts contain unregistered words that interfere with semantics. To tackle these problems, we propose a compositional semantic representation (CSR) and develop an SVM-based method named CSR-SVM. We use a language model to produce word embeddings, and a domain dictionary is established for unregistered words. The embeddings are then used in CSR so that it incorporates both key and global semantic representations. The former is obtained from dependency parsing and word embeddings, and the latter from all the word representations of a text. CSR thus provides sufficient semantic representation for SVM in an efficient way. The advantages of CSR-SVM have been validated by experiments on a real-world dataset. |
---|---|
ISSN: | 1474-0346 1873-5320 |
DOI: | 10.1016/j.aei.2023.102050 |
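
The abstract describes CSR as combining a key representation (from dependency parsing and word embeddings) with a global representation (from all word representations of a text). The record does not say how the two are composed, so the following is only a minimal sketch under stated assumptions: both parts are taken as the mean of word embeddings and then concatenated. The embedding values, the names `EMBEDDINGS`, `mean_vector`, and `compose_csr`, and the hand-picked key words standing in for a dependency parser's output are all hypothetical.

```python
from statistics import fmean

# Toy 3-dimensional word embeddings (hypothetical values, for illustration only).
EMBEDDINGS = {
    "replace": [0.9, 0.1, 0.0],
    "steel": [0.2, 0.8, 0.1],
    "beam": [0.1, 0.7, 0.3],
    "the": [0.0, 0.0, 0.1],
}
DIM = 3


def mean_vector(words):
    """Average the embeddings of the known words; zero vector if none are known."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [fmean(column) for column in zip(*vectors)]


def compose_csr(tokens, key_tokens):
    """Concatenate a key representation with a global representation.

    Assumption: both parts are plain means; the paper may use a different operator.
    """
    key_repr = mean_vector(key_tokens)   # representation of the key words only
    global_repr = mean_vector(tokens)    # representation of every word in the text
    return key_repr + global_repr        # list concatenation, length 2 * DIM


tokens = ["replace", "the", "steel", "beam"]
key_tokens = ["replace", "beam"]  # hypothetical stand-in for dependency-parsing output
csr = compose_csr(tokens, key_tokens)
print(len(csr))
```

A vector built this way could then be passed to an SVM classifier, for example scikit-learn's `sklearn.svm.SVC`; whether the authors used that library is not stated in this record.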