Large-Scale Bayesian Logistic Regression for Text Categorization

Logistic regression analysis of high-dimensional data, such as natural language text, poses computational and statistical challenges. Maximum likelihood estimation often fails in these applications. We present a simple Bayesian logistic regression approach that uses a Laplace prior to avoid overfitt...

Bibliographic Details
Published in:	Technometrics 2007-08, Vol.49 (3), p.291-304
Main Authors:	Genkin, Alexander; Lewis, David D; Madigan, David
Format:	Article
Language:	eng
Subjects:	Algorithms; Data with Complex Structure; Datasets; Information retrieval; Lasso; Logistic regression; Logistics; Machine learning; Parametric models; Penalization; Regression analysis; Ridge regression; Statistical discrepancies; Support vector classifier; Variable selection
Online Access:	Full text

container_end_page	304
container_issue	3
container_start_page	291
container_title	Technometrics
container_volume	49
creator	Genkin, Alexander; Lewis, David D; Madigan, David
description	Logistic regression analysis of high-dimensional data, such as natural language text, poses computational and statistical challenges. Maximum likelihood estimation often fails in these applications. We present a simple Bayesian logistic regression approach that uses a Laplace prior to avoid overfitting and produces sparse predictive models for text data. We apply this approach to a range of document classification problems and show that it produces compact predictive models at least as effective as those produced by support vector machine classifiers or ridge logistic regression combined with feature selection. We describe our model fitting algorithm, our open source implementations (BBR and BMR), and experimental results.
doi_str_mv	10.1198/004017007000000245
format	Article
coden	TCMTA2
publisher	Alexandria: Taylor & Francis
rights	American Statistical Association and the American Society for Quality 2007
tpages	14
oa	free_for_read
fulltext	fulltext
identifier	ISSN: 0040-1706
ispartof	Technometrics, 2007-08, Vol.49 (3), p.291-304
issn	0040-1706 1537-2723
language	eng
recordid	cdi_proquest_journals_213672224
source	Jstor Complete Legacy; JSTOR Mathematics & Statistics
subjects	Algorithms; Data with Complex Structure; Datasets; Information retrieval; Lasso; Logistic regression; Logistics; Machine learning; Parametric models; Penalization; Regression analysis; Ridge regression; Statistical discrepancies; Support vector classifier; Variable selection
title	Large-Scale Bayesian Logistic Regression for Text Categorization
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T11%3A00%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Large-Scale%20Bayesian%20Logistic%20Regression%20for%20Text%20Categorization&rft.jtitle=Technometrics&rft.au=Genkin,%20Alexander&rft.date=2007-08-01&rft.volume=49&rft.issue=3&rft.spage=291&rft.epage=304&rft.pages=291-304&rft.issn=0040-1706&rft.eissn=1537-2723&rft.coden=TCMTA2&rft_id=info:doi/10.1198/004017007000000245&rft_dat=%3Cjstor_proqu%3E25471349%3C/jstor_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=213672224&rft_id=info:pmid/&rft_jstor_id=25471349&rfr_iscdi=true