An Approach for Buyer Name Normalization in Pharmacy Sales Data

It is fundamental for pharmaceutical enterprises to make statistics and analyses on the massive sales data that are submitted by distributors at different levels when making market plans and other decisions. However, buyer names in sales data may have various forms due to aliases, abbreviations, typ...
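
The task described in this record is to score pairs of buyer names for similarity and to evaluate those scores with AUC. The sketch below illustrates only that evaluation setup. It is not the BuyerNorm model, which combines BERT and BiLSTM; a simple character-bigram Jaccard overlap is swapped in as a stand-in similarity. The name pairs, labels, and function names are hypothetical and chosen purely for illustration.

```python
from itertools import product


def char_bigrams(name: str) -> set[str]:
    """Lower-case the name, drop whitespace, and return its set of character bigrams."""
    s = "".join(name.lower().split())
    # Fall back to the whole string when it is too short to contain a bigram.
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of character bigrams (a stand-in, not a learned similarity)."""
    ga, gb = char_bigrams(a), char_bigrams(b)
    return len(ga & gb) / len(ga | gb)


def auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the chance that a positive pair outscores a negative pair (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Hypothetical labelled pairs: 1 = same buyer, 0 = different buyers.
pairs = [
    ("City Central Hospital", "city central hosp.", 1),
    ("Sunrise Pharmacy Co., Ltd.", "Sunrise Pharmacy", 1),
    ("City Central Hospital", "Riverside Clinic", 0),
    ("Sunrise Pharmacy", "Moonlight Drugstore", 0),
]
scores = [similarity(a, b) for a, b, _ in pairs]
labels = [y for _, _, y in pairs]
print(f"AUC on toy pairs: {auc(scores, labels):.2f}")
```

A surface-level measure like this handles abbreviations and suffixes that share most characters with the full name, but it would likely miss aliases with little character overlap, which is the kind of case a learned representation is meant to address.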

Bibliographic details
Published in: IEEE access 2021, Vol.9, p.93990-93997
Main authors: Li, Jiajing; Jia, Wang; Nie, Fuhui; You, Hongyan; Hao, Yaxin
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 93997
container_issue
container_start_page 93990
container_title IEEE access
container_volume 9
creator Li, Jiajing
Jia, Wang
Nie, Fuhui
You, Hongyan
Hao, Yaxin
description It is fundamental for pharmaceutical enterprises to make statistics and analyses on the massive sales data that are submitted by distributors at different levels when making market plans and other decisions. However, buyer names in sales data may have various forms due to aliases, abbreviations, typos, and other reasons, which severely affect the subsequent mining performance. To tackle this problem, in this paper, we propose a novel approach called BuyerNorm which can identify different expression forms for a given buyer name. The proposed model takes pairs of variable length buyer names as input and combines Bidirectional Encoder Representations from Transformers (BERT) and Bi-directional Long Short-Term Memory (BiLSTM) to obtain string representations and calculate the similarity between the two names, which indicates whether they represent the same buyer. For this task, we first build a data set including more than 80,000 pairs of buyer names. Then, extensive experiments on the data set show that BuyerNorm performs better than the state-of-the-art baseline and can obtain an average AUC of 99.84%.
doi_str_mv 10.1109/ACCESS.2021.3093028
format Article
fullrecord <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_proquest_journals_2548988613</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9466847</ieee_id><doaj_id>oai_doaj_org_article_077de524827042959d82e59979a83800</doaj_id><sourcerecordid>2548988613</sourcerecordid><originalsourceid>FETCH-LOGICAL-c358t-44301448b64cba3d3932be75cd804241162492efc78f9b5da422c0f020a80b9f3</originalsourceid><addsrcrecordid>eNpNUE1Lw0AQDaJgqf0FvSx4Tt3PZPckMVYtlCpUz8tks7EpabZu0kP99W5NEecyw-O9NzMviqYEzwjB6i7L8_l6PaOYkhnDimEqL6IRJYmKmWDJ5b_5Opp03RaHkgES6Si6z1qU7ffegdmgynn0cDhaj1aws2jl_A6a-hv62rWobtHbBgJijmgNje3QI_RwE11V0HR2cu7j6ONp_p6_xMvX50WeLWPDhOxjzhkmnMsi4aYAVjLFaGFTYUqJOeWEJJQraiuTykoVogROqcEVphgkLlTFxtFi8C0dbPXe1zvwR-2g1r-A858afF-bxmqcpqUVlEuaBm8lVCmpFUqlCiSTGAev28ErvP11sF2vt-7g23C-poJLJWVCWGCxgWW86zpvq7-tBOtT8HoIXp-C1-fgg2o6qGpr7Z9C8SSRPGU_ukl60Q</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2548988613</pqid></control><display><type>article</type><title>An Approach for Buyer Name Normalization in Pharmacy Sales Data</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Li, Jiajing ; Jia, Wang ; Nie, Fuhui ; You, Hongyan ; Hao, Yaxin</creator><creatorcontrib>Li, Jiajing ; Jia, Wang ; Nie, Fuhui ; You, Hongyan ; Hao, Yaxin</creatorcontrib><description>It is fundamental for pharmaceutical enterprises to make statistics and analyses on the massive sales data that are submitted by distributors at different levels when making market plans and other decisions. However, buyer names in sales data may have various forms due to aliases, abbreviations, typos, and other reasons, which severely affect the subsequent mining performance. 
To tackle this problem, in this paper, we propose a novel approach called BuyerNorm which can identify different expression forms for a given buyer name. The proposed model takes pairs of variable length buyer names as input and combines Bidirectional Encoder Representations from Transformers (BERT) and Bi-directional Long Short-Term Memory (BiLSTM) to obtain string representations and calculate the similarity between the two names, which indicates whether they represent the same buyer. For this task, we first build a data set including more than 80,000 pairs of buyer names. Then, extensive experiments on the data set show that BuyerNorm performs better than the state-of-the-art baseline and can obtain an average AUC of 99.84%.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2021.3093028</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Abbreviations ; BERT ; BiLSTM ; Bit error rate ; Coders ; Context modeling ; Data mining ; Datasets ; Distributors ; Drugs ; Hospitals ; named entities normalization ; Names ; representation learning ; Representations ; Sales ; Semantic similarity evaluation ; Semantics ; sentence classification ; Task analysis</subject><ispartof>IEEE access, 2021, Vol.9, p.93990-93997</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c358t-44301448b64cba3d3932be75cd804241162492efc78f9b5da422c0f020a80b9f3</cites><orcidid>0000-0003-1804-4258 ; 0000-0002-4077-8875</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9466847$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,864,2102,4024,27633,27923,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Li, Jiajing</creatorcontrib><creatorcontrib>Jia, Wang</creatorcontrib><creatorcontrib>Nie, Fuhui</creatorcontrib><creatorcontrib>You, Hongyan</creatorcontrib><creatorcontrib>Hao, Yaxin</creatorcontrib><title>An Approach for Buyer Name Normalization in Pharmacy Sales Data</title><title>IEEE access</title><addtitle>Access</addtitle><description>It is fundamental for pharmaceutical enterprises to make statistics and analyses on the massive sales data that are submitted by distributors at different levels when making market plans and other decisions. However, buyer names in sales data may have various forms due to aliases, abbreviations, typos, and other reasons, which severely affect the subsequent mining performance. To tackle this problem, in this paper, we propose a novel approach called BuyerNorm which can identify different expression forms for a given buyer name. The proposed model takes pairs of variable length buyer names as input and combines Bidirectional Encoder Representations from Transformers (BERT) and Bi-directional Long Short-Term Memory (BiLSTM) to obtain string representations and calculate the similarity between the two names, which indicates whether they represent the same buyer. For this task, we first build a data set including more than 80,000 pairs of buyer names. 
Then, extensive experiments on the data set show that BuyerNorm performs better than the state-of-the-art baseline and can obtain an average AUC of 99.84%.</description><subject>Abbreviations</subject><subject>BERT</subject><subject>BiLSTM</subject><subject>Bit error rate</subject><subject>Coders</subject><subject>Context modeling</subject><subject>Data mining</subject><subject>Datasets</subject><subject>Distributors</subject><subject>Drugs</subject><subject>Hospitals</subject><subject>named entities normalization</subject><subject>Names</subject><subject>representation learning</subject><subject>Representations</subject><subject>Sales</subject><subject>Semantic similarity evaluation</subject><subject>Semantics</subject><subject>sentence classification</subject><subject>Task analysis</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUE1Lw0AQDaJgqf0FvSx4Tt3PZPckMVYtlCpUz8tks7EpabZu0kP99W5NEecyw-O9NzMviqYEzwjB6i7L8_l6PaOYkhnDimEqL6IRJYmKmWDJ5b_5Opp03RaHkgES6Si6z1qU7ffegdmgynn0cDhaj1aws2jl_A6a-hv62rWobtHbBgJijmgNje3QI_RwE11V0HR2cu7j6ONp_p6_xMvX50WeLWPDhOxjzhkmnMsi4aYAVjLFaGFTYUqJOeWEJJQraiuTykoVogROqcEVphgkLlTFxtFi8C0dbPXe1zvwR-2g1r-A858afF-bxmqcpqUVlEuaBm8lVCmpFUqlCiSTGAev28ErvP11sF2vt-7g23C-poJLJWVCWGCxgWW86zpvq7-tBOtT8HoIXp-C1-fgg2o6qGpr7Z9C8SSRPGU_ukl60Q</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>Li, Jiajing</creator><creator>Jia, Wang</creator><creator>Nie, Fuhui</creator><creator>You, Hongyan</creator><creator>Hao, Yaxin</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0003-1804-4258</orcidid><orcidid>https://orcid.org/0000-0002-4077-8875</orcidid></search><sort><creationdate>2021</creationdate><title>An Approach for Buyer Name Normalization in Pharmacy Sales Data</title><author>Li, Jiajing ; Jia, Wang ; Nie, Fuhui ; You, Hongyan ; Hao, Yaxin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c358t-44301448b64cba3d3932be75cd804241162492efc78f9b5da422c0f020a80b9f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Abbreviations</topic><topic>BERT</topic><topic>BiLSTM</topic><topic>Bit error rate</topic><topic>Coders</topic><topic>Context modeling</topic><topic>Data mining</topic><topic>Datasets</topic><topic>Distributors</topic><topic>Drugs</topic><topic>Hospitals</topic><topic>named entities normalization</topic><topic>Names</topic><topic>representation learning</topic><topic>Representations</topic><topic>Sales</topic><topic>Semantic similarity evaluation</topic><topic>Semantics</topic><topic>sentence classification</topic><topic>Task analysis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, Jiajing</creatorcontrib><creatorcontrib>Jia, Wang</creatorcontrib><creatorcontrib>Nie, Fuhui</creatorcontrib><creatorcontrib>You, Hongyan</creatorcontrib><creatorcontrib>Hao, Yaxin</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE 
Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Li, Jiajing</au><au>Jia, Wang</au><au>Nie, Fuhui</au><au>You, Hongyan</au><au>Hao, Yaxin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>An Approach for Buyer Name Normalization in Pharmacy Sales Data</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2021</date><risdate>2021</risdate><volume>9</volume><spage>93990</spage><epage>93997</epage><pages>93990-93997</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>It is fundamental for pharmaceutical enterprises to make statistics and analyses on the massive sales data that are submitted by distributors at different levels when making market plans and other decisions. However, buyer names in sales data may have various forms due to aliases, abbreviations, typos, and other reasons, which severely affect the subsequent mining performance. To tackle this problem, in this paper, we propose a novel approach called BuyerNorm which can identify different expression forms for a given buyer name. 
The proposed model takes pairs of variable length buyer names as input and combines Bidirectional Encoder Representations from Transformers (BERT) and Bi-directional Long Short-Term Memory (BiLSTM) to obtain string representations and calculate the similarity between the two names, which indicates whether they represent the same buyer. For this task, we first build a data set including more than 80,000 pairs of buyer names. Then, extensive experiments on the data set show that BuyerNorm performs better than the state-of-the-art baseline and can obtain an average AUC of 99.84%.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2021.3093028</doi><tpages>8</tpages><orcidid>https://orcid.org/0000-0003-1804-4258</orcidid><orcidid>https://orcid.org/0000-0002-4077-8875</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2021, Vol.9, p.93990-93997
issn 2169-3536
2169-3536
language eng
recordid cdi_proquest_journals_2548988613
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Abbreviations
BERT
BiLSTM
Bit error rate
Coders
Context modeling
Data mining
Datasets
Distributors
Drugs
Hospitals
named entities normalization
Names
representation learning
Representations
Sales
Semantic similarity evaluation
Semantics
sentence classification
Task analysis
title An Approach for Buyer Name Normalization in Pharmacy Sales Data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T21%3A45%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20Approach%20for%20Buyer%20Name%20Normalization%20in%20Pharmacy%20Sales%20Data&rft.jtitle=IEEE%20access&rft.au=Li,%20Jiajing&rft.date=2021&rft.volume=9&rft.spage=93990&rft.epage=93997&rft.pages=93990-93997&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3093028&rft_dat=%3Cproquest_doaj_%3E2548988613%3C/proquest_doaj_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2548988613&rft_id=info:pmid/&rft_ieee_id=9466847&rft_doaj_id=oai_doaj_org_article_077de524827042959d82e59979a83800&rfr_iscdi=true