Learning to classify documents according to genre
Current document‐retrieval tools succeed in locating large numbers of documents relevant to a given query. While search results may be relevant according to the topic of the documents, it is more difficult to identify which of the relevant documents are most suitable for a particular user. Automatic...
Published in: | Journal of the American Society for Information Science and Technology 2006-09, Vol.57 (11), p.1506-1518 |
---|---|
Main authors: | Finn, Aidan ; Kushmerick, Nicholas |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 1518 |
---|---|
container_issue | 11 |
container_start_page | 1506 |
container_title | Journal of the American Society for Information Science and Technology |
container_volume | 57 |
creator | Finn, Aidan ; Kushmerick, Nicholas |
description | Current document‐retrieval tools succeed in locating large numbers of documents relevant to a given query. While search results may be relevant according to the topic of the documents, it is more difficult to identify which of the relevant documents are most suitable for a particular user. Automatic genre analysis (i.e., the ability to distinguish documents according to style) would be a useful tool for identifying documents that are most suitable for a particular user. We investigate the use of machine learning for automatic genre classification. We introduce the idea of domain transfer—genre classifiers should be reusable across multiple topics—which does not arise in standard text classification. We investigate different features for building genre classifiers and their ability to transfer across multiple‐topic domains. We also show how different feature‐sets can be used in conjunction with each other to improve performance and reduce the number of documents that need to be labeled. |
doi_str_mv | 10.1002/asi.20427 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_35241177</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1153594561</sourcerecordid><originalsourceid>FETCH-LOGICAL-c4967-a0fdaa04536e4e41f13fe82fdb7a670264e544651f4f405d325407767d129a333</originalsourceid><addsrcrecordid>eNp1kF1LwzAUhoMoOKcX_oMiKHjR7eR7vZxDt8FQ8PMyxDQZnV07kxbdv7ezc4LgVQ7keR_OeRE6xdDDAKSvQ9YjwIjcQx3MKYnJIIH93Twgh-gohAUAxhxDB-GZ1b7IinlUlZHJdQiZW0dpaeqlLaoQaWNKn27_57bw9hgdOJ0He7J9u-jp5vpxNIlnd-PpaDiLDUuEjDW4VGtgnArLLMMOU2cHxKWvUgsJRDDLGRMcO-YY8JQSzkBKIVNMEk0p7aKL1rvy5XttQ6WWWTA2z3VhyzooygnDWMoGPPsDLsraF81uitDm7gRI0kCXLWR8GYK3Tq18ttR-rTCoTXOqaU59N9ew51uhDkbnzuvCZOE3MAACidg4-y33keV2_b9QDR-mP-a4TWShsp-7hPZvSkgquXq5HSv2DILcT66UpF_0uIiS</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>231539029</pqid></control><display><type>article</type><title>Learning to classify documents according to genre</title><source>Wiley Online Library Journals Frontfile Complete</source><source>EBSCOhost Business Source Complete</source><creator>Finn, Aidan ; Kushmerick, Nicholas</creator><creatorcontrib>Finn, Aidan ; Kushmerick, Nicholas</creatorcontrib><description>Current document‐retrieval tools succeed in locating large numbers of documents relevant to a given query. While search results may be relevant according to the topic of the documents, it is more difficult to identify which of the relevant documents are most suitable for a particular user. Automatic genre analysis (i.e., the ability to distinguish documents according to style) would be a useful tool for identifying documents that are most suitable for a particular user. We investigate the use of machine learning for automatic genre classification. We introduce the idea of domain transfer—genre classifiers should be reusable across multiple topics—which does not arise in standard text classification. 
We investigate different features for building genre classifiers and their ability to transfer across multiple‐topic domains. We also show how different feature‐sets can be used in conjunction with each other to improve performance and reduce the number of documents that need to be labeled.</description><identifier>ISSN: 1532-2882</identifier><identifier>ISSN: 2330-1635</identifier><identifier>EISSN: 1532-2890</identifier><identifier>EISSN: 2330-1643</identifier><identifier>DOI: 10.1002/asi.20427</identifier><language>eng</language><publisher>Hoboken: Wiley Subscription Services, Inc., A Wiley Company</publisher><subject>Artificial intelligence ; Classification ; Classifiers ; Documents ; Domains ; Exact sciences and technology ; Genre ; Genre analysis ; Information and communication sciences ; Information retrieval ; Information retrieval systems ; Information retrieval systems. Information and document management system ; Information science. Documentation ; Machine learning ; Performance enhancement ; Sciences and techniques of general use ; Studies ; Topics</subject><ispartof>Journal of the American Society for Information Science and Technology, 2006-09, Vol.57 (11), p.1506-1518</ispartof><rights>Copyright © 2006 Wiley Periodicals, Inc., A Wiley Company</rights><rights>2007 INIST-CNRS</rights><rights>Copyright Wiley Periodicals Inc. 
Sep 2006</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c4967-a0fdaa04536e4e41f13fe82fdb7a670264e544651f4f405d325407767d129a333</citedby><cites>FETCH-LOGICAL-c4967-a0fdaa04536e4e41f13fe82fdb7a670264e544651f4f405d325407767d129a333</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fasi.20427$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Fasi.20427$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,776,780,1411,27901,27902,45550,45551</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=18020969$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Finn, Aidan</creatorcontrib><creatorcontrib>Kushmerick, Nicholas</creatorcontrib><title>Learning to classify documents according to genre</title><title>Journal of the American Society for Information Science and Technology</title><addtitle>J. Am. Soc. Inf. Sci</addtitle><description>Current document‐retrieval tools succeed in locating large numbers of documents relevant to a given query. While search results may be relevant according to the topic of the documents, it is more difficult to identify which of the relevant documents are most suitable for a particular user. Automatic genre analysis (i.e., the ability to distinguish documents according to style) would be a useful tool for identifying documents that are most suitable for a particular user. We investigate the use of machine learning for automatic genre classification. We introduce the idea of domain transfer—genre classifiers should be reusable across multiple topics—which does not arise in standard text classification. 
We investigate different features for building genre classifiers and their ability to transfer across multiple‐topic domains. We also show how different feature‐sets can be used in conjunction with each other to improve performance and reduce the number of documents that need to be labeled.</description><subject>Artificial intelligence</subject><subject>Classification</subject><subject>Classifiers</subject><subject>Documents</subject><subject>Domains</subject><subject>Exact sciences and technology</subject><subject>Genre</subject><subject>Genre analysis</subject><subject>Information and communication sciences</subject><subject>Information retrieval</subject><subject>Information retrieval systems</subject><subject>Information retrieval systems. Information and document management system</subject><subject>Information science. Documentation</subject><subject>Machine learning</subject><subject>Performance enhancement</subject><subject>Sciences and techniques of general use</subject><subject>Studies</subject><subject>Topics</subject><issn>1532-2882</issn><issn>2330-1635</issn><issn>1532-2890</issn><issn>2330-1643</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2006</creationdate><recordtype>article</recordtype><recordid>eNp1kF1LwzAUhoMoOKcX_oMiKHjR7eR7vZxDt8FQ8PMyxDQZnV07kxbdv7ezc4LgVQ7keR_OeRE6xdDDAKSvQ9YjwIjcQx3MKYnJIIH93Twgh-gohAUAxhxDB-GZ1b7IinlUlZHJdQiZW0dpaeqlLaoQaWNKn27_57bw9hgdOJ0He7J9u-jp5vpxNIlnd-PpaDiLDUuEjDW4VGtgnArLLMMOU2cHxKWvUgsJRDDLGRMcO-YY8JQSzkBKIVNMEk0p7aKL1rvy5XttQ6WWWTA2z3VhyzooygnDWMoGPPsDLsraF81uitDm7gRI0kCXLWR8GYK3Tq18ttR-rTCoTXOqaU59N9ew51uhDkbnzuvCZOE3MAACidg4-y33keV2_b9QDR-mP-a4TWShsp-7hPZvSkgquXq5HSv2DILcT66UpF_0uIiS</recordid><startdate>200609</startdate><enddate>200609</enddate><creator>Finn, Aidan</creator><creator>Kushmerick, Nicholas</creator><general>Wiley Subscription Services, Inc., A Wiley Company</general><general>Wiley</general><general>Wiley Periodicals 
Inc</general><scope>BSCLL</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>E3H</scope><scope>F2A</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>200609</creationdate><title>Learning to classify documents according to genre</title><author>Finn, Aidan ; Kushmerick, Nicholas</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c4967-a0fdaa04536e4e41f13fe82fdb7a670264e544651f4f405d325407767d129a333</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Artificial intelligence</topic><topic>Classification</topic><topic>Classifiers</topic><topic>Documents</topic><topic>Domains</topic><topic>Exact sciences and technology</topic><topic>Genre</topic><topic>Genre analysis</topic><topic>Information and communication sciences</topic><topic>Information retrieval</topic><topic>Information retrieval systems</topic><topic>Information retrieval systems. Information and document management system</topic><topic>Information science. 
Documentation</topic><topic>Machine learning</topic><topic>Performance enhancement</topic><topic>Sciences and techniques of general use</topic><topic>Studies</topic><topic>Topics</topic><toplevel>online_resources</toplevel><creatorcontrib>Finn, Aidan</creatorcontrib><creatorcontrib>Kushmerick, Nicholas</creatorcontrib><collection>Istex</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Library & Information Sciences Abstracts (LISA)</collection><collection>Library & Information Science Abstracts (LISA)</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Journal of the American Society for Information Science and Technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Finn, Aidan</au><au>Kushmerick, Nicholas</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Learning to classify documents according to genre</atitle><jtitle>Journal of the American Society for Information Science and Technology</jtitle><addtitle>J. Am. Soc. Inf. Sci</addtitle><date>2006-09</date><risdate>2006</risdate><volume>57</volume><issue>11</issue><spage>1506</spage><epage>1518</epage><pages>1506-1518</pages><issn>1532-2882</issn><issn>2330-1635</issn><eissn>1532-2890</eissn><eissn>2330-1643</eissn><abstract>Current document‐retrieval tools succeed in locating large numbers of documents relevant to a given query. 
While search results may be relevant according to the topic of the documents, it is more difficult to identify which of the relevant documents are most suitable for a particular user. Automatic genre analysis (i.e., the ability to distinguish documents according to style) would be a useful tool for identifying documents that are most suitable for a particular user. We investigate the use of machine learning for automatic genre classification. We introduce the idea of domain transfer—genre classifiers should be reusable across multiple topics—which does not arise in standard text classification. We investigate different features for building genre classifiers and their ability to transfer across multiple‐topic domains. We also show how different feature‐sets can be used in conjunction with each other to improve performance and reduce the number of documents that need to be labeled.</abstract><cop>Hoboken</cop><pub>Wiley Subscription Services, Inc., A Wiley Company</pub><doi>10.1002/asi.20427</doi><tpages>13</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1532-2882 |
ispartof | Journal of the American Society for Information Science and Technology, 2006-09, Vol.57 (11), p.1506-1518 |
issn | 1532-2882 ; 2330-1635 ; 1532-2890 ; 2330-1643 |
language | eng |
recordid | cdi_proquest_miscellaneous_35241177 |
source | Wiley Online Library Journals Frontfile Complete; EBSCOhost Business Source Complete |
subjects | Artificial intelligence ; Classification ; Classifiers ; Documents ; Domains ; Exact sciences and technology ; Genre ; Genre analysis ; Information and communication sciences ; Information retrieval ; Information retrieval systems ; Information retrieval systems. Information and document management system ; Information science. Documentation ; Machine learning ; Performance enhancement ; Sciences and techniques of general use ; Studies ; Topics |
title | Learning to classify documents according to genre |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T11%3A43%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Learning%20to%20classify%20documents%20according%20to%20genre&rft.jtitle=Journal%20of%20the%20American%20Society%20for%20Information%20Science%20and%20Technology&rft.au=Finn,%20Aidan&rft.date=2006-09&rft.volume=57&rft.issue=11&rft.spage=1506&rft.epage=1518&rft.pages=1506-1518&rft.issn=1532-2882&rft.eissn=1532-2890&rft_id=info:doi/10.1002/asi.20427&rft_dat=%3Cproquest_cross%3E1153594561%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=231539029&rft_id=info:pmid/&rfr_iscdi=true |