An Approach for Buyer Name Normalization in Pharmacy Sales Data
Published in: | IEEE Access 2021, Vol. 9, p. 93990-93997 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Pharmaceutical enterprises rely on statistics and analyses of the massive sales data submitted by distributors at different levels when making market plans and other decisions. However, buyer names in sales data appear in various forms due to aliases, abbreviations, typos, and other causes, which severely degrades subsequent data mining performance. To tackle this problem, we propose a novel approach called BuyerNorm, which identifies the different forms in which a given buyer name is expressed. The model takes pairs of variable-length buyer names as input and combines Bidirectional Encoder Representations from Transformers (BERT) with Bidirectional Long Short-Term Memory (BiLSTM) to obtain string representations and compute the similarity between the two names, which indicates whether they refer to the same buyer. For this task, we first build a data set of more than 80,000 pairs of buyer names. Extensive experiments on this data set show that BuyerNorm outperforms the state-of-the-art baseline and achieves an average AUC of 99.84%. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2021.3093028 |
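
The abstract describes scoring pairs of buyer names for similarity and reporting AUC. The following is a minimal sketch of that evaluation loop only, not the BuyerNorm model: the learned BERT and BiLSTM encoder is replaced here by a plain character-bigram cosine similarity, and the function names (`char_bigrams`, `similarity`, `auc`) and the example name pairs are hypothetical, chosen purely for illustration.

```python
from collections import Counter
from math import sqrt


def char_bigrams(name):
    """Count character bigrams of a lowercased, whitespace-normalized name."""
    s = " ".join(name.lower().split())
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def similarity(a, b):
    """Cosine similarity of bigram counts (assumed stand-in for a learned encoder)."""
    va, vb = char_bigrams(a), char_bigrams(b)
    dot = sum(va[g] * vb[g] for g in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical name pairs; label 1 means "same buyer", 0 means "different buyers".
pairs = [
    ("City Central Pharmacy", "city central pharmcy", 1),
    ("Green Cross Drugstore Ltd.", "Green Cross Drugstore", 1),
    ("City Central Pharmacy", "Harbor Medical Supply", 0),
    ("Green Cross Drugstore Ltd.", "Northside Clinic", 0),
]
scores = [similarity(a, b) for a, b, _ in pairs]
labels = [y for _, _, y in pairs]
print(auc(scores, labels))
```

A surface-level measure like this handles typos and dropped suffixes but not aliases that share few characters, which is the kind of variation a learned string representation is meant to capture.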