Learning to classify documents according to genre

Bibliographic details
Published in: Journal of the American Society for Information Science and Technology, 2006-09, Vol. 57 (11), p. 1506-1518
Main authors: Finn, Aidan; Kushmerick, Nicholas
Format: Article
Language: English
container_end_page 1518
container_issue 11
container_start_page 1506
container_title Journal of the American Society for Information Science and Technology
container_volume 57
creator Finn, Aidan
Kushmerick, Nicholas
description Current document‐retrieval tools succeed in locating large numbers of documents relevant to a given query. While search results may be relevant according to the topic of the documents, it is more difficult to identify which of the relevant documents are most suitable for a particular user. Automatic genre analysis (i.e., the ability to distinguish documents according to style) would be a useful tool for identifying documents that are most suitable for a particular user. We investigate the use of machine learning for automatic genre classification. We introduce the idea of domain transfer—genre classifiers should be reusable across multiple topics—which does not arise in standard text classification. We investigate different features for building genre classifiers and their ability to transfer across multiple‐topic domains. We also show how different feature‐sets can be used in conjunction with each other to improve performance and reduce the number of documents that need to be labeled.
doi_str_mv 10.1002/asi.20427
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_35241177</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1153594561</sourcerecordid><originalsourceid>FETCH-LOGICAL-c4967-a0fdaa04536e4e41f13fe82fdb7a670264e544651f4f405d325407767d129a333</originalsourceid><addsrcrecordid>eNp1kF1LwzAUhoMoOKcX_oMiKHjR7eR7vZxDt8FQ8PMyxDQZnV07kxbdv7ezc4LgVQ7keR_OeRE6xdDDAKSvQ9YjwIjcQx3MKYnJIIH93Twgh-gohAUAxhxDB-GZ1b7IinlUlZHJdQiZW0dpaeqlLaoQaWNKn27_57bw9hgdOJ0He7J9u-jp5vpxNIlnd-PpaDiLDUuEjDW4VGtgnArLLMMOU2cHxKWvUgsJRDDLGRMcO-YY8JQSzkBKIVNMEk0p7aKL1rvy5XttQ6WWWTA2z3VhyzooygnDWMoGPPsDLsraF81uitDm7gRI0kCXLWR8GYK3Tq18ttR-rTCoTXOqaU59N9ew51uhDkbnzuvCZOE3MAACidg4-y33keV2_b9QDR-mP-a4TWShsp-7hPZvSkgquXq5HSv2DILcT66UpF_0uIiS</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>231539029</pqid></control><display><type>article</type><title>Learning to classify documents according to genre</title><source>Wiley Online Library Journals Frontfile Complete</source><source>EBSCOhost Business Source Complete</source><creator>Finn, Aidan ; Kushmerick, Nicholas</creator><creatorcontrib>Finn, Aidan ; Kushmerick, Nicholas</creatorcontrib><description>Current document‐retrieval tools succeed in locating large numbers of documents relevant to a given query. While search results may be relevant according to the topic of the documents, it is more difficult to identify which of the relevant documents are most suitable for a particular user. Automatic genre analysis (i.e., the ability to distinguish documents according to style) would be a useful tool for identifying documents that are most suitable for a particular user. We investigate the use of machine learning for automatic genre classification. We introduce the idea of domain transfer—genre classifiers should be reusable across multiple topics—which does not arise in standard text classification. 
We investigate different features for building genre classifiers and their ability to transfer across multiple‐topic domains. We also show how different feature‐sets can be used in conjunction with each other to improve performance and reduce the number of documents that need to be labeled.</description><identifier>ISSN: 1532-2882</identifier><identifier>ISSN: 2330-1635</identifier><identifier>EISSN: 1532-2890</identifier><identifier>EISSN: 2330-1643</identifier><identifier>DOI: 10.1002/asi.20427</identifier><language>eng</language><publisher>Hoboken: Wiley Subscription Services, Inc., A Wiley Company</publisher><subject>Artificial intelligence ; Classification ; Classifiers ; Documents ; Domains ; Exact sciences and technology ; Genre ; Genre analysis ; Information and communication sciences ; Information retrieval ; Information retrieval systems ; Information retrieval systems. Information and document management system ; Information science. Documentation ; Machine learning ; Performance enhancement ; Sciences and techniques of general use ; Studies ; Topics</subject><ispartof>Journal of the American Society for Information Science and Technology, 2006-09, Vol.57 (11), p.1506-1518</ispartof><rights>Copyright © 2006 Wiley Periodicals, Inc., A Wiley Company</rights><rights>2007 INIST-CNRS</rights><rights>Copyright Wiley Periodicals Inc. 
Sep 2006</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c4967-a0fdaa04536e4e41f13fe82fdb7a670264e544651f4f405d325407767d129a333</citedby><cites>FETCH-LOGICAL-c4967-a0fdaa04536e4e41f13fe82fdb7a670264e544651f4f405d325407767d129a333</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fasi.20427$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Fasi.20427$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,776,780,1411,27901,27902,45550,45551</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=18020969$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Finn, Aidan</creatorcontrib><creatorcontrib>Kushmerick, Nicholas</creatorcontrib><title>Learning to classify documents according to genre</title><title>Journal of the American Society for Information Science and Technology</title><addtitle>J. Am. Soc. Inf. Sci</addtitle><description>Current document‐retrieval tools succeed in locating large numbers of documents relevant to a given query. While search results may be relevant according to the topic of the documents, it is more difficult to identify which of the relevant documents are most suitable for a particular user. Automatic genre analysis (i.e., the ability to distinguish documents according to style) would be a useful tool for identifying documents that are most suitable for a particular user. We investigate the use of machine learning for automatic genre classification. We introduce the idea of domain transfer—genre classifiers should be reusable across multiple topics—which does not arise in standard text classification. 
We investigate different features for building genre classifiers and their ability to transfer across multiple‐topic domains. We also show how different feature‐sets can be used in conjunction with each other to improve performance and reduce the number of documents that need to be labeled.</description><subject>Artificial intelligence</subject><subject>Classification</subject><subject>Classifiers</subject><subject>Documents</subject><subject>Domains</subject><subject>Exact sciences and technology</subject><subject>Genre</subject><subject>Genre analysis</subject><subject>Information and communication sciences</subject><subject>Information retrieval</subject><subject>Information retrieval systems</subject><subject>Information retrieval systems. Information and document management system</subject><subject>Information science. Documentation</subject><subject>Machine learning</subject><subject>Performance enhancement</subject><subject>Sciences and techniques of general use</subject><subject>Studies</subject><subject>Topics</subject><issn>1532-2882</issn><issn>2330-1635</issn><issn>1532-2890</issn><issn>2330-1643</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2006</creationdate><recordtype>article</recordtype><recordid>eNp1kF1LwzAUhoMoOKcX_oMiKHjR7eR7vZxDt8FQ8PMyxDQZnV07kxbdv7ezc4LgVQ7keR_OeRE6xdDDAKSvQ9YjwIjcQx3MKYnJIIH93Twgh-gohAUAxhxDB-GZ1b7IinlUlZHJdQiZW0dpaeqlLaoQaWNKn27_57bw9hgdOJ0He7J9u-jp5vpxNIlnd-PpaDiLDUuEjDW4VGtgnArLLMMOU2cHxKWvUgsJRDDLGRMcO-YY8JQSzkBKIVNMEk0p7aKL1rvy5XttQ6WWWTA2z3VhyzooygnDWMoGPPsDLsraF81uitDm7gRI0kCXLWR8GYK3Tq18ttR-rTCoTXOqaU59N9ew51uhDkbnzuvCZOE3MAACidg4-y33keV2_b9QDR-mP-a4TWShsp-7hPZvSkgquXq5HSv2DILcT66UpF_0uIiS</recordid><startdate>200609</startdate><enddate>200609</enddate><creator>Finn, Aidan</creator><creator>Kushmerick, Nicholas</creator><general>Wiley Subscription Services, Inc., A Wiley Company</general><general>Wiley</general><general>Wiley Periodicals 
Inc</general><scope>BSCLL</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>E3H</scope><scope>F2A</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>200609</creationdate><title>Learning to classify documents according to genre</title><author>Finn, Aidan ; Kushmerick, Nicholas</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c4967-a0fdaa04536e4e41f13fe82fdb7a670264e544651f4f405d325407767d129a333</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Artificial intelligence</topic><topic>Classification</topic><topic>Classifiers</topic><topic>Documents</topic><topic>Domains</topic><topic>Exact sciences and technology</topic><topic>Genre</topic><topic>Genre analysis</topic><topic>Information and communication sciences</topic><topic>Information retrieval</topic><topic>Information retrieval systems</topic><topic>Information retrieval systems. Information and document management system</topic><topic>Information science. 
Documentation</topic><topic>Machine learning</topic><topic>Performance enhancement</topic><topic>Sciences and techniques of general use</topic><topic>Studies</topic><topic>Topics</topic><toplevel>online_resources</toplevel><creatorcontrib>Finn, Aidan</creatorcontrib><creatorcontrib>Kushmerick, Nicholas</creatorcontrib><collection>Istex</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Library &amp; Information Sciences Abstracts (LISA)</collection><collection>Library &amp; Information Science Abstracts (LISA)</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Journal of the American Society for Information Science and Technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Finn, Aidan</au><au>Kushmerick, Nicholas</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Learning to classify documents according to genre</atitle><jtitle>Journal of the American Society for Information Science and Technology</jtitle><addtitle>J. Am. Soc. Inf. Sci</addtitle><date>2006-09</date><risdate>2006</risdate><volume>57</volume><issue>11</issue><spage>1506</spage><epage>1518</epage><pages>1506-1518</pages><issn>1532-2882</issn><issn>2330-1635</issn><eissn>1532-2890</eissn><eissn>2330-1643</eissn><abstract>Current document‐retrieval tools succeed in locating large numbers of documents relevant to a given query. 
While search results may be relevant according to the topic of the documents, it is more difficult to identify which of the relevant documents are most suitable for a particular user. Automatic genre analysis (i.e., the ability to distinguish documents according to style) would be a useful tool for identifying documents that are most suitable for a particular user. We investigate the use of machine learning for automatic genre classification. We introduce the idea of domain transfer—genre classifiers should be reusable across multiple topics—which does not arise in standard text classification. We investigate different features for building genre classifiers and their ability to transfer across multiple‐topic domains. We also show how different feature‐sets can be used in conjunction with each other to improve performance and reduce the number of documents that need to be labeled.</abstract><cop>Hoboken</cop><pub>Wiley Subscription Services, Inc., A Wiley Company</pub><doi>10.1002/asi.20427</doi><tpages>13</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1532-2882
ispartof Journal of the American Society for Information Science and Technology, 2006-09, Vol.57 (11), p.1506-1518
issn 1532-2882
2330-1635
1532-2890
2330-1643
language eng
recordid cdi_proquest_miscellaneous_35241177
source Wiley Online Library Journals Frontfile Complete; EBSCOhost Business Source Complete
subjects Artificial intelligence
Classification
Classifiers
Documents
Domains
Exact sciences and technology
Genre
Genre analysis
Information and communication sciences
Information retrieval
Information retrieval systems
Information retrieval systems. Information and document management system
Information science. Documentation
Machine learning
Performance enhancement
Sciences and techniques of general use
Studies
Topics
title Learning to classify documents according to genre
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T11%3A43%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Learning%20to%20classify%20documents%20according%20to%20genre&rft.jtitle=Journal%20of%20the%20American%20Society%20for%20Information%20Science%20and%20Technology&rft.au=Finn,%20Aidan&rft.date=2006-09&rft.volume=57&rft.issue=11&rft.spage=1506&rft.epage=1518&rft.pages=1506-1518&rft.issn=1532-2882&rft.eissn=1532-2890&rft_id=info:doi/10.1002/asi.20427&rft_dat=%3Cproquest_cross%3E1153594561%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=231539029&rft_id=info:pmid/&rfr_iscdi=true