Large-Scale Bayesian Logistic Regression for Text Categorization
Logistic regression analysis of high-dimensional data, such as natural language text, poses computational and statistical challenges. Maximum likelihood estimation often fails in these applications. We present a simple Bayesian logistic regression approach that uses a Laplace prior to avoid overfitt...
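Published in: | Technometrics 2007-08, Vol.49 (3), p.291-304 |
---|---|
Main authors: | Genkin, Alexander; Lewis, David D; Madigan, David |
Format: | Article |
Language: | eng |
Subjects: | Algorithms; Data with Complex Structure; Datasets; Information retrieval; Lasso; Logistic regression; Logistics; Machine learning; Parametric models; Penalization; Regression analysis; Ridge regression; Statistical discrepancies; Support vector classifier; Variable selection |
Online access: | Full text |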
container_end_page | 304 |
---|---|
container_issue | 3 |
container_start_page | 291 |
container_title | Technometrics |
container_volume | 49 |
creator | Genkin, Alexander; Lewis, David D; Madigan, David |
description | Logistic regression analysis of high-dimensional data, such as natural language text, poses computational and statistical challenges. Maximum likelihood estimation often fails in these applications. We present a simple Bayesian logistic regression approach that uses a Laplace prior to avoid overfitting and produces sparse predictive models for text data. We apply this approach to a range of document classification problems and show that it produces compact predictive models at least as effective as those produced by support vector machine classifiers or ridge logistic regression combined with feature selection. We describe our model fitting algorithm, our open source implementations (BBR and BMR), and experimental results. |
doi_str_mv | 10.1198/004017007000000245 |
format | Article |
publisher | Taylor & Francis (Alexandria) |
date | 2007-08-01 |
coden | TCMTA2 |
jstor_id | 25471349 |
rights | Copyright 2007 The American Statistical Association and The American Society for Quality |
peer_reviewed | yes |
fulltext | fulltext |
identifier | ISSN: 0040-1706 |
ispartof | Technometrics, 2007-08, Vol.49 (3), p.291-304 |
issn | 0040-1706 1537-2723 |
language | eng |
recordid | cdi_proquest_journals_213672224 |
source | Jstor Complete Legacy; JSTOR Mathematics & Statistics |
subjects | Algorithms; Data with Complex Structure; Datasets; Information retrieval; Lasso; Logistic regression; Logistics; Machine learning; Parametric models; Penalization; Regression analysis; Ridge regression; Statistical discrepancies; Support vector classifier; Variable selection |
title | Large-Scale Bayesian Logistic Regression for Text Categorization |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T11%3A00%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Large-Scale%20Bayesian%20Logistic%20Regression%20for%20Text%20Categorization&rft.jtitle=Technometrics&rft.au=Genkin,%20Alexander&rft.date=2007-08-01&rft.volume=49&rft.issue=3&rft.spage=291&rft.epage=304&rft.pages=291-304&rft.issn=0040-1706&rft.eissn=1537-2723&rft.coden=TCMTA2&rft_id=info:doi/10.1198/004017007000000245&rft_dat=%3Cjstor_proqu%3E25471349%3C/jstor_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=213672224&rft_id=info:pmid/&rft_jstor_id=25471349&rfr_iscdi=true |
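The abstract's central idea can be illustrated briefly. With independent Laplace priors on the coefficients, the posterior mode of logistic regression equals the L1-penalised (lasso) estimate, and the kink of the prior at zero is what sets many weights exactly to zero, yielding the sparse models the abstract describes. The sketch below illustrates this under stated assumptions and is not the BBR/BMR implementation. It uses proximal gradient descent (ISTA) rather than the fitting algorithm described in the article, fits no intercept, assumes labels coded -1/+1, and applies one hypothetical rate parameter `lam` to every coefficient. The function names `soft_threshold`, `sigma_neg` and `fit_laplace_logistic` are likewise hypothetical.

```python
import math


def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward 0, returning exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigma_neg(m):
    """Numerically stable 1 / (1 + exp(m))."""
    if m >= 0:
        e = math.exp(-m)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(m))


def fit_laplace_logistic(X, y, lam, n_iter=2000):
    """Posterior mode of logistic regression weights under i.i.d. Laplace priors
    with rate lam, i.e. the minimiser of
        sum_i log(1 + exp(-y_i * beta . x_i)) + lam * sum_j |beta_j|,
    found by proximal gradient descent (ISTA). Hypothetical helper, not BBR/BMR code.
    X is a list of feature rows, y a list of labels in {-1, +1}; no intercept is fitted.
    """
    n_feat = len(X[0])
    # 0.25 * ||X||_F^2 bounds the Lipschitz constant of the loss gradient, so 1/L is a safe step.
    lipschitz = 0.25 * sum(v * v for row in X for v in row)
    step = 1.0 / lipschitz
    beta = [0.0] * n_feat
    for _ in range(n_iter):
        grad = [0.0] * n_feat
        for xi, yi in zip(X, y):
            margin = yi * sum(b * v for b, v in zip(beta, xi))
            # d/d(beta_j) of log(1 + exp(-margin)) is -y_i * x_ij / (1 + exp(margin))
            coef = -yi * sigma_neg(margin)
            for j, v in enumerate(xi):
                grad[j] += coef * v
        # Gradient step on the log-likelihood term, then soft-thresholding for the Laplace prior term
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta
```

Consider the four-document toy data `X = [[1, 1], [1, 1], [-1, -1], [-1, 1]]`, `y = [1, 1, -1, -1]` with `lam = 1.0`. The second feature, which agrees with the label on only three of the four documents, receives a weight of exactly `0.0`, while the first converges to log 3 ≈ 1.0986. A Gaussian (ridge) prior would instead leave a small nonzero weight on the second feature, which is why ridge is typically paired with a separate feature-selection step.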