Transfer of Supervision for Improved Address Standardization


Detailed description

Bibliographic details
Main authors: Kothari, Govind; Faruquie, Tanveer A; Subramaniam, L Venkata; Prasad, K Hima; Mohania, Mukesh K
Format: Conference proceeding
Language: English
container_end_page 2181
container_start_page 2178
creator Kothari, Govind
Faruquie, Tanveer A
Subramaniam, L Venkata
Prasad, K Hima
Mohania, Mukesh K
description Address cleansing is very challenging, particularly for geographies with variability in how addresses are written. Supervised learners can be easily trained for different data sources. However, training requires labeling large corpora for each data source, which is time-consuming and labor-intensive. We propose a method to automatically transfer supervision from a given labeled source to a target unlabeled source using a hierarchical Dirichlet process. Each Dirichlet process models data from one source. The shared component distribution across these Dirichlet processes captures the semantic relation between data sources. A feature projection on the component distributions from multiple sources is used to transfer supervision.
doi_str_mv 10.1109/ICPR.2010.533
format Conference Proceeding
isbn 1424475422
9781424475421
eisbn 9781424475414
9780769541099
1424475414
0769541097
publisher IEEE
date 2010-08
identifier ISSN: 1051-4651
ispartof 2010 20th International Conference on Pattern Recognition, 2010, p.2178-2181
issn 1051-4651
2831-7475
language eng
recordid cdi_ieee_primary_5595945
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Adaptation model
address cleansing
address standardization
Buildings
Clustering algorithms
Data models
HDP
Roads
Semantics
Training
transfer learning
title Transfer of Supervision for Improved Address Standardization
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T00%3A02%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Transfer%20of%20Supervision%20for%20Improved%20Address%20Standardization&rft.btitle=2010%2020th%20International%20Conference%20on%20Pattern%20Recognition&rft.au=Kothari,%20Govind&rft.date=2010-08&rft.spage=2178&rft.epage=2181&rft.pages=2178-2181&rft.issn=1051-4651&rft.eissn=2831-7475&rft.isbn=1424475422&rft.isbn_list=9781424475421&rft_id=info:doi/10.1109/ICPR.2010.533&rft_dat=%3Cieee_6IE%3E5595945%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424475414&rft.eisbn_list=9780769541099&rft.eisbn_list=1424475414&rft.eisbn_list=0769541097&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5595945&rfr_iscdi=true
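The abstract only outlines the transfer step, so the following is a minimal sketch of one plausible reading of it, not the authors' implementation. It assumes the hierarchical Dirichlet process has already been fitted, so the shared component distributions `phi` and each source's mixture weights `pi` over those components are given; here they are hand-set toy values, not results from the paper. "Feature projection" is taken to mean mapping each token to its posterior over the shared components, and a nearest-centroid classifier stands in for whatever supervised learner the paper uses. The functions `project`, `fit_centroids`, and `predict`, the vocabulary, and the labels are all hypothetical.

```python
import numpy as np


def project(word_ids, pi_j, phi):
    """Map each token of source j to its posterior over the shared components.

    pi_j: (K,) mixture weights of source j over the shared components (assumed given).
    phi:  (K, V) shared component distributions over the vocabulary (assumed given).
    Returns an (n_tokens, K) matrix whose rows sum to 1.
    """
    scores = pi_j[:, None] * phi[:, word_ids]  # (K, n_tokens): pi_jk * phi_k(w)
    return (scores / scores.sum(axis=0, keepdims=True)).T


def fit_centroids(features, labels):
    """Average the component-space features of the labeled source tokens per label."""
    classes = sorted(set(labels))
    labels = np.asarray(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(features, classes, centroids):
    """Give each target token the label of the nearest centroid (squared Euclidean)."""
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return [classes[i] for i in dists.argmin(axis=1)]


# Hypothetical toy model: two shared components over a four-word vocabulary.
vocab = ["road", "marg", "mumbai", "pune"]
phi = np.array([[0.45, 0.45, 0.05, 0.05],   # component 0: street-type words
                [0.05, 0.05, 0.45, 0.45]])  # component 1: city names
pi_source = np.array([0.5, 0.5])  # labeled source's weights over the components
pi_target = np.array([0.6, 0.4])  # unlabeled target's weights over the components

# The labeled source uses "road"/"mumbai"; the unlabeled target uses "marg"/"pune".
src_ids = [vocab.index(w) for w in ["road", "mumbai"]]
src_labels = ["STREET", "CITY"]
tgt_ids = [vocab.index(w) for w in ["marg", "pune"]]

classes, centroids = fit_centroids(project(src_ids, pi_source, phi), src_labels)
predicted = predict(project(tgt_ids, pi_target, phi), classes, centroids)
print(predicted)  # ['STREET', 'CITY']
```

Because both sources are projected into the same component space, a label learned for "road" in the source carries over to "marg" in the target whenever the model places both words in the same shared component. This illustrates why sharing components across the per-source Dirichlet processes can support transfer; the projection actually used in the paper may differ.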