GSAM: A deep neural network model for extracting computational representations of Chinese addresses fused with geospatial feature


Bibliographic details
Published in: Computers, Environment and Urban Systems, 2020-05, Vol.81, p.101473-12, Article 101473
Main authors: Xu, Liuchang; Du, Zhenhong; Mao, Ruichen; Zhang, Feng; Liu, Renyi
Format: Article
Language: English
Description
Abstract: Addresses are one of the most important geographical reference systems in natural languages. In China, relatively underdeveloped address planning has produced a large number of non-standard addresses. This kind of unstructured text makes the management and application of Chinese addresses much more difficult. However, by extracting computational representations of addresses, the addresses can be structured and their related applications extended more conveniently. This paper therefore uses a deep neural language model from natural language processing (NLP) to extract computational representations automatically through an address language model (ALM), which is trained in an unsupervised way and is suitable for a large-scale address corpus. We propose a solution to fuse address semantics with geospatial features and construct a geospatial-semantic address model (GSAM) that supports a variety of downstream tasks. The proposed GSAM construction process consists of three phases. First, we build an ALM using bidirectional encoder representations from Transformers (BERT) to learn the addresses' semantic representations. Then, the fusion clustering results of the semantic and geospatial information are obtained by a high-dimensional clustering algorithm. Finally, we construct the GSAM based on the fused clustering results using novel fine-tuning techniques. Furthermore, we apply the computational representations extracted from the GSAM to the address location prediction task. The experimental results indicate that the target task accuracy of the ALM is 90.79%, and that the result of semantic-geospatial fusion clustering strongly correlates with fine-grained urban neighbourhood area division. The GSAM can accurately identify clustering labels, and the values of its evaluation metrics are all above 0.96. We also demonstrate that our model outperforms purely ALM-based and word2vec-based models on the address location prediction task.
• An unsupervised language modelling method is used to extract computational representations of massive Chinese address data
• A feature fusion solution is proposed for the semantic and geospatial features of addresses
• The semantic-geospatial fusion clustering result strongly correlates with fine-grained urban neighbourhood area division
• The constructed GSAM can extract the computational representations of addresses and use them to predict address locations
• The GSAM-based model outperforms purely ALM-based and word2vec-based models on address location prediction
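
The second phase described in the abstract, clustering on fused semantic and geospatial features, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm or the fusion rule, so the weighted concatenation, the `geo_weight` parameter, the plain k-means routine, and the toy vectors below are all assumptions chosen for illustration. In the paper the semantic vectors would come from the BERT-based ALM; here they are hand-written stand-ins.

```python
import math
import random


def min_max_scale(rows):
    """Scale each column of a list of equal-length vectors to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    # Guard against constant columns, whose span would be zero.
    spans = [(max(c) - lo) or 1.0 for c, lo in zip(cols, lows)]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def fuse(semantic, coords, geo_weight=0.5):
    """Concatenate scaled semantic vectors with scaled coordinates.

    geo_weight is a hypothetical knob in [0, 1]: larger values make
    spatial proximity count more in the distance used for clustering.
    """
    sem = min_max_scale(semantic)
    geo = min_max_scale(coords)
    return [
        [(1.0 - geo_weight) * v for v in s] + [geo_weight * v for v in g]
        for s, g in zip(sem, geo)
    ]


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means (Lloyd's algorithm) returning one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy stand-ins for ALM sentence vectors (2-D here; real ones are much larger).
semantic = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
# Illustrative (longitude, latitude) pairs for the same four addresses.
coords = [[120.10, 30.20], [120.11, 30.21], [120.30, 30.40], [120.31, 30.41]]

fused = fuse(semantic, coords, geo_weight=0.5)
labels = kmeans(fused, k=2)
print(labels)
```

On this toy input the first two addresses fall into one cluster and the last two into another, because they agree both semantically and spatially. Scaling both feature groups before concatenation keeps the high-dimensional semantic part from drowning out the two coordinate dimensions, which is the reason for the explicit weight.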
ISSN: 0198-9715, 1873-7587
DOI: 10.1016/j.compenvurbsys.2020.101473