Using Public Data to Generate Industrial Classification Codes

Statistical agencies face increasing costs, lower response rates, and increased demands for timely and accurate statistical data. These demands on agency resources reveal the need for alternative data sources, ideally data that are cheaper than current surveys and available within a short time frame...

Bibliographic details
Main authors:	Basdeo, Nevada; Burbank, Nathaniel; Bhattacharjee, Sudip; Etudo, Ugochukwu; Cuffe, John; Smith, Justin C; Roberts, Shawn R
Format:	Book chapter
Language:	English
Keywords:	big data; Econometrics and Mathematical Economics; industrial classification; machine learning; NAICS; text analysis
Online access:	Full text

creator	Basdeo, Nevada; Burbank, Nathaniel; Bhattacharjee, Sudip; Etudo, Ugochukwu; Cuffe, John; Smith, Justin C; Roberts, Shawn R
description	Statistical agencies face increasing costs, lower response rates, and increased demands for timely and accurate statistical data. These demands on agency resources reveal the need for alternative data sources, ideally data that are cheaper than current surveys and available within a short time frame. Textual data on public-facing websites present an ideal data source for certain US Census Bureau statistical products. We identify such data sources and argue that they may be well suited for classification tasks such as industrial or occupational coding. Using these data sources allows statistical agencies to provide more accurate and more timely data at lower cost and with lower respondent burden than traditional survey methods, while opening the door to new and innovative statistical products. We explore how public data can improve the production of federal statistics through a specific case: using website text and user reviews, gathered from the Google Places API, to generate North American Industry Classification System (NAICS) codes for approximately 120,000 single-unit employer establishments. Our approach shows that public data are a useful tool for generating NAICS codes. We also identify challenges and provide suggestions for agencies implementing such a system for production purposes.
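The record does not say which model the chapter uses. As a minimal sketch of the general task it describes (mapping free text such as website copy and user reviews to a NAICS code), the block below trains a multinomial naive Bayes classifier with add-one smoothing in plain Python. The class name `NaiveBayesCoder`, the regex tokenizer, and the six training snippets are hypothetical and invented purely for illustration; they are not the authors' method or data. The labels are two-digit NAICS sectors: 72 (Accommodation and Food Services), 62 (Health Care and Social Assistance), and 23 (Construction).

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCoder:
    """Hypothetical multinomial naive Bayes classifier with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.doc_counts: Counter[str] = Counter()  # training documents per code
        self.word_counts: dict[str, Counter[str]] = defaultdict(Counter)  # token counts per code
        self.vocab: set[str] = set()

    def fit(self, docs: list[tuple[str, str]]) -> NaiveBayesCoder:
        for text, code in docs:
            tokens = tokenize(text)
            self.doc_counts[code] += 1
            self.word_counts[code].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_code, best_score = "", -math.inf
        for code, n_docs in self.doc_counts.items():
            counts = self.word_counts[code]
            total_words = sum(counts.values())
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(n_docs / total_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                score += math.log((counts[tok] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_code, best_score = code, score
        return best_code


# Invented toy examples: website text or review snippets paired with a NAICS sector.
train = [
    ("Fresh pasta, pizza and wine. Great dinner, friendly waiter.", "72"),
    ("Best burgers in town, quick lunch menu and tasty fries.", "72"),
    ("Family dentist offering cleanings, fillings and braces.", "62"),
    ("Walk-in clinic, the doctor was kind and the nurse was fast.", "62"),
    ("Licensed roofing contractor, new roofs and gutter repair.", "23"),
    ("General contractor for kitchen remodeling and home additions.", "23"),
]
coder = NaiveBayesCoder().fit(train)

if __name__ == "__main__":
    print(coder.predict("Our dentist and the nurse were great"))  # 62
    print(coder.predict("roof repair by a local contractor"))  # 23
```

Naive Bayes is used here only because it fits in a few dozen lines with no dependencies; a production coder would need far more training data, finer (six-digit) codes, and evaluation against known classifications.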
doi_str_mv	10.7208/chicago/9780226801391.003.0008
format	Book Chapter
publisher	University of Chicago Press
date	2022-03-11
isbn	022680125X; 9780226801254
eisbn	9780226801391; 022680139X
fulltext	fulltext
identifier	ISBN: 022680125X
ispartof	Big Data for Twenty-First-Century Economic Statistics, 2022
language	eng
recordid	cdi_oup_upso_upso_9780226801254_chapter_008
source	De Gruyter eBooks
subjects	big data; Econometrics and Mathematical Economics; industrial classification; machine learning; NAICS; text analysis
title	Using Public Data to Generate Industrial Classification Codes
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T22%3A01%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-oup&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=bookitem&rft.atitle=Using%20Public%20Data%20to%20Generate%20Industrial%20Classification%20Codes&rft.btitle=Big%20Data%20for%20Twenty-First-Century%20Economic%20Statistics&rft.au=Basdeo,%20Nevada&rft.date=2022-03-11&rft.isbn=022680125X&rft.isbn_list=9780226801254&rft_id=info:doi/10.7208/chicago/9780226801391.003.0008&rft_dat=%3Coup%3Eupso_9780226801254_chapter_008%3C/oup%3E%3Curl%3E%3C/url%3E&rft.eisbn=9780226801391&rft.eisbn_list=022680139X&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_oup_id=upso_9780226801254_chapter_008&rfr_iscdi=true