Automatic Genre Determination of Web Content

A mechanism is provided for automatic genre determination of web content. For each type of web content genre, a set of relevant feature types are extracted from collected training material, where genre features and non-genre features are represented by tokens and an integer counts represents a frequ...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: HARZ DIRK, IFFERT RALF, USHER MARK, KEINHOERSTER MARK
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
container_end_page
container_issue
container_start_page
container_title
container_volume
creator HARZ DIRK
IFFERT RALF
USHER MARK
KEINHOERSTER MARK
description A mechanism is provided for automatic genre determination of web content. For each type of web content genre, a set of relevant feature types are extracted from collected training material, where genre features and non-genre features are represented by tokens and an integer counts represents a frequency of appearance of the token in both a first type of training material and a second type of training material. In a classification process, fixed length tokens are extracted for relevant features types from different text and structural elements of web content. For each relevant feature type, a corresponding feature probability is calculated. The feature probabilities are combined to an overall genre probability that the web content belongs to a specific trained web content genre. A genre classification result is then output comprising at least one specific trained web content genre to which the web content belongs together with a corresponding genre probability.
format Patent
fullrecord <record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_US2015264107A1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>US2015264107A1</sourcerecordid><originalsourceid>FETCH-epo_espacenet_US2015264107A13</originalsourceid><addsrcrecordid>eNrjZNBxLC3Jz00syUxWcE_NK0pVcEktSS3KzcwDCuXnKeSnKYSnJik45-eVpOaV8DCwpiXmFKfyQmluBmU31xBnD93Ugvz41OKCxOTUvNSS-NBgIwNDUyMzE0MDc0dDY-JUAQAV4ipY</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>Automatic Genre Determination of Web Content</title><source>esp@cenet</source><creator>HARZ DIRK ; IFFERT RALF ; USHER MARK ; KEINHOERSTER MARK</creator><creatorcontrib>HARZ DIRK ; IFFERT RALF ; USHER MARK ; KEINHOERSTER MARK</creatorcontrib><description>A mechanism is provided for automatic genre determination of web content. For each type of web content genre, a set of relevant feature types are extracted from collected training material, where genre features and non-genre features are represented by tokens and an integer counts represents a frequency of appearance of the token in both a first type of training material and a second type of training material. In a classification process, fixed length tokens are extracted for relevant features types from different text and structural elements of web content. For each relevant feature type, a corresponding feature probability is calculated. The feature probabilities are combined to an overall genre probability that the web content belongs to a specific trained web content genre. A genre classification result is then output comprising at least one specific trained web content genre to which the web content belongs together with a corresponding genre probability.</description><language>eng</language><subject>CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; ELECTRIC COMMUNICATION TECHNIQUE ; ELECTRICITY ; PHYSICS ; TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHICCOMMUNICATION</subject><creationdate>2015</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20150917&amp;DB=EPODOC&amp;CC=US&amp;NR=2015264107A1$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,776,881,25542,76289</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20150917&amp;DB=EPODOC&amp;CC=US&amp;NR=2015264107A1$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>HARZ DIRK</creatorcontrib><creatorcontrib>IFFERT RALF</creatorcontrib><creatorcontrib>USHER MARK</creatorcontrib><creatorcontrib>KEINHOERSTER MARK</creatorcontrib><title>Automatic Genre Determination of Web Content</title><description>A mechanism is provided for automatic genre determination of web content. For each type of web content genre, a set of relevant feature types are extracted from collected training material, where genre features and non-genre features are represented by tokens and an integer counts represents a frequency of appearance of the token in both a first type of training material and a second type of training material. In a classification process, fixed length tokens are extracted for relevant features types from different text and structural elements of web content. For each relevant feature type, a corresponding feature probability is calculated. The feature probabilities are combined to an overall genre probability that the web content belongs to a specific trained web content genre. A genre classification result is then output comprising at least one specific trained web content genre to which the web content belongs together with a corresponding genre probability.</description><subject>CALCULATING</subject><subject>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>ELECTRIC COMMUNICATION TECHNIQUE</subject><subject>ELECTRICITY</subject><subject>PHYSICS</subject><subject>TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHICCOMMUNICATION</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2015</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZNBxLC3Jz00syUxWcE_NK0pVcEktSS3KzcwDCuXnKeSnKYSnJik45-eVpOaV8DCwpiXmFKfyQmluBmU31xBnD93Ugvz41OKCxOTUvNSS-NBgIwNDUyMzE0MDc0dDY-JUAQAV4ipY</recordid><startdate>20150917</startdate><enddate>20150917</enddate><creator>HARZ DIRK</creator><creator>IFFERT RALF</creator><creator>USHER MARK</creator><creator>KEINHOERSTER MARK</creator><scope>EVB</scope></search><sort><creationdate>20150917</creationdate><title>Automatic Genre Determination of Web Content</title><author>HARZ DIRK ; IFFERT RALF ; USHER MARK ; KEINHOERSTER MARK</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_US2015264107A13</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng</language><creationdate>2015</creationdate><topic>CALCULATING</topic><topic>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>ELECTRIC COMMUNICATION TECHNIQUE</topic><topic>ELECTRICITY</topic><topic>PHYSICS</topic><topic>TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHICCOMMUNICATION</topic><toplevel>online_resources</toplevel><creatorcontrib>HARZ DIRK</creatorcontrib><creatorcontrib>IFFERT RALF</creatorcontrib><creatorcontrib>USHER MARK</creatorcontrib><creatorcontrib>KEINHOERSTER MARK</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>HARZ DIRK</au><au>IFFERT RALF</au><au>USHER MARK</au><au>KEINHOERSTER MARK</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>Automatic Genre Determination of Web Content</title><date>2015-09-17</date><risdate>2015</risdate><abstract>A mechanism is provided for automatic genre determination of web content. For each type of web content genre, a set of relevant feature types are extracted from collected training material, where genre features and non-genre features are represented by tokens and an integer counts represents a frequency of appearance of the token in both a first type of training material and a second type of training material. In a classification process, fixed length tokens are extracted for relevant features types from different text and structural elements of web content. For each relevant feature type, a corresponding feature probability is calculated. The feature probabilities are combined to an overall genre probability that the web content belongs to a specific trained web content genre. A genre classification result is then output comprising at least one specific trained web content genre to which the web content belongs together with a corresponding genre probability.</abstract><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier
ispartof
issn
language eng
recordid cdi_epo_espacenet_US2015264107A1
source esp@cenet
subjects CALCULATING
COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
COMPUTING
COUNTING
ELECTRIC COMMUNICATION TECHNIQUE
ELECTRICITY
PHYSICS
TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHICCOMMUNICATION
title Automatic Genre Determination of Web Content
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T10%3A47%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=HARZ%20DIRK&rft.date=2015-09-17&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS2015264107A1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true