An Approach for Buyer Name Normalization in Pharmacy Sales Data
It is fundamental for pharmaceutical enterprises to make statistics and analyses on the massive sales data that are submitted by distributors at different levels when making market plans and other decisions. However, buyer names in sales data may have various forms due to aliases, abbreviations, typ...
Published in: | IEEE access 2021, Vol.9, p.93990-93997 |
---|---|
Main authors: | Li, Jiajing; Jia, Wang; Nie, Fuhui; You, Hongyan; Hao, Yaxin |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 93997 |
---|---|
container_issue | |
container_start_page | 93990 |
container_title | IEEE access |
container_volume | 9 |
creator | Li, Jiajing; Jia, Wang; Nie, Fuhui; You, Hongyan; Hao, Yaxin |
description | It is fundamental for pharmaceutical enterprises to make statistics and analyses on the massive sales data that are submitted by distributors at different levels when making market plans and other decisions. However, buyer names in sales data may have various forms due to aliases, abbreviations, typos, and other reasons, which severely affect the subsequent mining performance. To tackle this problem, in this paper, we propose a novel approach called BuyerNorm which can identify different expression forms for a given buyer name. The proposed model takes pairs of variable length buyer names as input and combines Bidirectional Encoder Representations from Transformers (BERT) and Bi-directional Long Short-Term Memory (BiLSTM) to obtain string representations and calculate the similarity between the two names, which indicates whether they represent the same buyer. For this task, we first build a data set including more than 80,000 pairs of buyer names. Then, extensive experiments on the data set show that BuyerNorm performs better than the state-of-the-art baseline and can obtain an average AUC of 99.84%. |
doi_str_mv | 10.1109/ACCESS.2021.3093028 |
format | Article |
fullrecord | <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_proquest_journals_2548988613</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9466847</ieee_id><doaj_id>oai_doaj_org_article_077de524827042959d82e59979a83800</doaj_id><sourcerecordid>2548988613</sourcerecordid><originalsourceid>FETCH-LOGICAL-c358t-44301448b64cba3d3932be75cd804241162492efc78f9b5da422c0f020a80b9f3</originalsourceid><addsrcrecordid>eNpNUE1Lw0AQDaJgqf0FvSx4Tt3PZPckMVYtlCpUz8tks7EpabZu0kP99W5NEecyw-O9NzMviqYEzwjB6i7L8_l6PaOYkhnDimEqL6IRJYmKmWDJ5b_5Opp03RaHkgES6Si6z1qU7ffegdmgynn0cDhaj1aws2jl_A6a-hv62rWobtHbBgJijmgNje3QI_RwE11V0HR2cu7j6ONp_p6_xMvX50WeLWPDhOxjzhkmnMsi4aYAVjLFaGFTYUqJOeWEJJQraiuTykoVogROqcEVphgkLlTFxtFi8C0dbPXe1zvwR-2g1r-A858afF-bxmqcpqUVlEuaBm8lVCmpFUqlCiSTGAev28ErvP11sF2vt-7g23C-poJLJWVCWGCxgWW86zpvq7-tBOtT8HoIXp-C1-fgg2o6qGpr7Z9C8SSRPGU_ukl60Q</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2548988613</pqid></control><display><type>article</type><title>An Approach for Buyer Name Normalization in Pharmacy Sales Data</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Li, Jiajing ; Jia, Wang ; Nie, Fuhui ; You, Hongyan ; Hao, Yaxin</creator><creatorcontrib>Li, Jiajing ; Jia, Wang ; Nie, Fuhui ; You, Hongyan ; Hao, Yaxin</creatorcontrib><description>It is fundamental for pharmaceutical enterprises to make statistics and analyses on the massive sales data that are submitted by distributors at different levels when making market plans and other decisions. However, buyer names in sales data may have various forms due to aliases, abbreviations, typos, and other reasons, which severely affect the subsequent mining performance. 
To tackle this problem, in this paper, we propose a novel approach called BuyerNorm which can identify different expression forms for a given buyer name. The proposed model takes pairs of variable length buyer names as input and combines Bidirectional Encoder Representations from Transformers (BERT) and Bi-directional Long Short-Term Memory (BiLSTM) to obtain string representations and calculate the similarity between the two names, which indicates whether they represent the same buyer. For this task, we first build a data set including more than 80,000 pairs of buyer names. Then, extensive experiments on the data set show that BuyerNorm performs better than the state-of-the-art baseline and can obtain an average AUC of 99.84%.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2021.3093028</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Abbreviations ; BERT ; BiLSTM ; Bit error rate ; Coders ; Context modeling ; Data mining ; Datasets ; Distributors ; Drugs ; Hospitals ; named entities normalization ; Names ; representation learning ; Representations ; Sales ; Semantic similarity evaluation ; Semantics ; sentence classification ; Task analysis</subject><ispartof>IEEE access, 2021, Vol.9, p.93990-93997</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c358t-44301448b64cba3d3932be75cd804241162492efc78f9b5da422c0f020a80b9f3</cites><orcidid>0000-0003-1804-4258 ; 0000-0002-4077-8875</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9466847$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,864,2102,4024,27633,27923,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Li, Jiajing</creatorcontrib><creatorcontrib>Jia, Wang</creatorcontrib><creatorcontrib>Nie, Fuhui</creatorcontrib><creatorcontrib>You, Hongyan</creatorcontrib><creatorcontrib>Hao, Yaxin</creatorcontrib><title>An Approach for Buyer Name Normalization in Pharmacy Sales Data</title><title>IEEE access</title><addtitle>Access</addtitle><description>It is fundamental for pharmaceutical enterprises to make statistics and analyses on the massive sales data that are submitted by distributors at different levels when making market plans and other decisions. However, buyer names in sales data may have various forms due to aliases, abbreviations, typos, and other reasons, which severely affect the subsequent mining performance. To tackle this problem, in this paper, we propose a novel approach called BuyerNorm which can identify different expression forms for a given buyer name. The proposed model takes pairs of variable length buyer names as input and combines Bidirectional Encoder Representations from Transformers (BERT) and Bi-directional Long Short-Term Memory (BiLSTM) to obtain string representations and calculate the similarity between the two names, which indicates whether they represent the same buyer. For this task, we first build a data set including more than 80,000 pairs of buyer names. 
Then, extensive experiments on the data set show that BuyerNorm performs better than the state-of-the-art baseline and can obtain an average AUC of 99.84%.</description><subject>Abbreviations</subject><subject>BERT</subject><subject>BiLSTM</subject><subject>Bit error rate</subject><subject>Coders</subject><subject>Context modeling</subject><subject>Data mining</subject><subject>Datasets</subject><subject>Distributors</subject><subject>Drugs</subject><subject>Hospitals</subject><subject>named entities normalization</subject><subject>Names</subject><subject>representation learning</subject><subject>Representations</subject><subject>Sales</subject><subject>Semantic similarity evaluation</subject><subject>Semantics</subject><subject>sentence classification</subject><subject>Task analysis</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUE1Lw0AQDaJgqf0FvSx4Tt3PZPckMVYtlCpUz8tks7EpabZu0kP99W5NEecyw-O9NzMviqYEzwjB6i7L8_l6PaOYkhnDimEqL6IRJYmKmWDJ5b_5Opp03RaHkgES6Si6z1qU7ffegdmgynn0cDhaj1aws2jl_A6a-hv62rWobtHbBgJijmgNje3QI_RwE11V0HR2cu7j6ONp_p6_xMvX50WeLWPDhOxjzhkmnMsi4aYAVjLFaGFTYUqJOeWEJJQraiuTykoVogROqcEVphgkLlTFxtFi8C0dbPXe1zvwR-2g1r-A858afF-bxmqcpqUVlEuaBm8lVCmpFUqlCiSTGAev28ErvP11sF2vt-7g23C-poJLJWVCWGCxgWW86zpvq7-tBOtT8HoIXp-C1-fgg2o6qGpr7Z9C8SSRPGU_ukl60Q</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>Li, Jiajing</creator><creator>Jia, Wang</creator><creator>Nie, Fuhui</creator><creator>You, Hongyan</creator><creator>Hao, Yaxin</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0003-1804-4258</orcidid><orcidid>https://orcid.org/0000-0002-4077-8875</orcidid></search><sort><creationdate>2021</creationdate><title>An Approach for Buyer Name Normalization in Pharmacy Sales Data</title><author>Li, Jiajing ; Jia, Wang ; Nie, Fuhui ; You, Hongyan ; Hao, Yaxin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c358t-44301448b64cba3d3932be75cd804241162492efc78f9b5da422c0f020a80b9f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Abbreviations</topic><topic>BERT</topic><topic>BiLSTM</topic><topic>Bit error rate</topic><topic>Coders</topic><topic>Context modeling</topic><topic>Data mining</topic><topic>Datasets</topic><topic>Distributors</topic><topic>Drugs</topic><topic>Hospitals</topic><topic>named entities normalization</topic><topic>Names</topic><topic>representation learning</topic><topic>Representations</topic><topic>Sales</topic><topic>Semantic similarity evaluation</topic><topic>Semantics</topic><topic>sentence classification</topic><topic>Task analysis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, Jiajing</creatorcontrib><creatorcontrib>Jia, Wang</creatorcontrib><creatorcontrib>Nie, Fuhui</creatorcontrib><creatorcontrib>You, Hongyan</creatorcontrib><creatorcontrib>Hao, Yaxin</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE 
Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Li, Jiajing</au><au>Jia, Wang</au><au>Nie, Fuhui</au><au>You, Hongyan</au><au>Hao, Yaxin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>An Approach for Buyer Name Normalization in Pharmacy Sales Data</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2021</date><risdate>2021</risdate><volume>9</volume><spage>93990</spage><epage>93997</epage><pages>93990-93997</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>It is fundamental for pharmaceutical enterprises to make statistics and analyses on the massive sales data that are submitted by distributors at different levels when making market plans and other decisions. However, buyer names in sales data may have various forms due to aliases, abbreviations, typos, and other reasons, which severely affect the subsequent mining performance. To tackle this problem, in this paper, we propose a novel approach called BuyerNorm which can identify different expression forms for a given buyer name. 
The proposed model takes pairs of variable length buyer names as input and combines Bidirectional Encoder Representations from Transformers (BERT) and Bi-directional Long Short-Term Memory (BiLSTM) to obtain string representations and calculate the similarity between the two names, which indicates whether they represent the same buyer. For this task, we first build a data set including more than 80,000 pairs of buyer names. Then, extensive experiments on the data set show that BuyerNorm performs better than the state-of-the-art baseline and can obtain an average AUC of 99.84%.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2021.3093028</doi><tpages>8</tpages><orcidid>https://orcid.org/0000-0003-1804-4258</orcidid><orcidid>https://orcid.org/0000-0002-4077-8875</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2021, Vol.9, p.93990-93997 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_proquest_journals_2548988613 |
source | IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Abbreviations; BERT; BiLSTM; Bit error rate; Coders; Context modeling; Data mining; Datasets; Distributors; Drugs; Hospitals; named entities normalization; Names; representation learning; Representations; Sales; Semantic similarity evaluation; Semantics; sentence classification; Task analysis |
title | An Approach for Buyer Name Normalization in Pharmacy Sales Data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T21%3A45%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20Approach%20for%20Buyer%20Name%20Normalization%20in%20Pharmacy%20Sales%20Data&rft.jtitle=IEEE%20access&rft.au=Li,%20Jiajing&rft.date=2021&rft.volume=9&rft.spage=93990&rft.epage=93997&rft.pages=93990-93997&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3093028&rft_dat=%3Cproquest_doaj_%3E2548988613%3C/proquest_doaj_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2548988613&rft_id=info:pmid/&rft_ieee_id=9466847&rft_doaj_id=oai_doaj_org_article_077de524827042959d82e59979a83800&rfr_iscdi=true |