Language trees and zipping
In this Letter we present a very general method for extracting information from a generic string of characters, e.g., a text, a DNA sequence, or a time series. Based on data-compression techniques, its key point is the computation of a suitable measure of the remoteness of two bodies of knowledge. W...
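The abstract does not give the formula behind its "measure of the remoteness", so the following is only an illustrative sketch of the general idea of comparing strings with a compressor. It uses the normalized compression distance, a different and widely known compression-based measure, not the authors' own definition. The helper names `compressed_size` and `ncd` and the toy inputs are assumptions made for this example.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of data."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Swapped-in technique for illustration; not the measure defined in the Letter.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # size of the two strings compressed together
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Toy inputs invented for this sketch; they are not data from the Letter.
    english = b"the quick brown fox jumps over the lazy dog " * 20
    italian = b"la volpe veloce salta sopra il cane pigro " * 20
    print("english vs english:", ncd(english, english))
    print("english vs italian:", ncd(english, italian))
```

A smaller value suggests that the compressor finds more shared structure between the two inputs. Real use would need much longer texts than these toy strings.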
Published in: | Physical review letters 2002-01, Vol.88 (4), p.048702-048702, Article 048702 |
---|---|
Main authors: | Benedetto, Dario ; Caglioti, Emanuele ; Loreto, Vittorio |
Format: | Article |
Language: | eng |
Subjects: | Algorithms ; DNA - genetics ; Language ; Models, Theoretical ; Pattern Recognition, Automated |
Online access: | Full text |
container_end_page | 048702 |
---|---|
container_issue | 4 |
container_start_page | 048702 |
container_title | Physical review letters |
container_volume | 88 |
creator | Benedetto, Dario ; Caglioti, Emanuele ; Loreto, Vittorio |
description | In this Letter we present a very general method for extracting information from a generic string of characters, e.g., a text, a DNA sequence, or a time series. Based on data-compression techniques, its key point is the computation of a suitable measure of the remoteness of two bodies of knowledge. We present the implementation of the method to linguistic motivated problems, featuring highly accurate results for language recognition, authorship attribution, and language classification. |
doi_str_mv | 10.1103/PhysRevLett.88.048702 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_71409252</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>71409252</sourcerecordid><originalsourceid>FETCH-LOGICAL-c307t-89881b45a67dc63f604c1d90a1231b15d5149250cbafae217aa61a935d127e4a3</originalsourceid><addsrcrecordid>eNpNkE1Lw0AURQdRbK3-AUHpyl3qezOTzGQpxS8IKKLr4SV5iZEkjZlEqL_eSgq6upt7z4UjxAXCChHU9fP71r_wV8LDsLJ2BdoakAdijmDiwCDqQzEHUBjEAGYmTrz_AACUkT0WM0QLiMbOxXlCbTlSycuhZ_ZLavPld9V1VVueiqOCas9n-1yIt7vb1_VDkDzdP65vkiBTYIbAxtZiqkOKTJ5FqohAZ5jHQCgVphjmIepYhpClVBBLNEQRUqzCHKVhTWohriZu128-R_aDayqfcV1Ty5vRO4MadgC5K4ZTMes33vdcuK6vGuq3DsH9SnH_pDhr3SRlt7vcH4xpw_nfam9B_QDi615M</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>71409252</pqid></control><display><type>article</type><title>Language trees and zipping</title><source>MEDLINE</source><source>American Physical Society Journals</source><creator>Benedetto, Dario ; Caglioti, Emanuele ; Loreto, Vittorio</creator><creatorcontrib>Benedetto, Dario ; Caglioti, Emanuele ; Loreto, Vittorio</creatorcontrib><description>In this Letter we present a very general method for extracting information from a generic string of characters, e.g., a text, a DNA sequence, or a time series. Based on data-compression techniques, its key point is the computation of a suitable measure of the remoteness of two bodies of knowledge. 
We present the implementation of the method to linguistic motivated problems, featuring highly accurate results for language recognition, authorship attribution, and language classification.</description><identifier>ISSN: 0031-9007</identifier><identifier>EISSN: 1079-7114</identifier><identifier>DOI: 10.1103/PhysRevLett.88.048702</identifier><identifier>PMID: 11801178</identifier><language>eng</language><publisher>United States</publisher><subject>Algorithms ; DNA - genetics ; Language ; Models, Theoretical ; Pattern Recognition, Automated</subject><ispartof>Physical review letters, 2002-01, Vol.88 (4), p.048702-048702, Article 048702</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c307t-89881b45a67dc63f604c1d90a1231b15d5149250cbafae217aa61a935d127e4a3</citedby><cites>FETCH-LOGICAL-c307t-89881b45a67dc63f604c1d90a1231b15d5149250cbafae217aa61a935d127e4a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,2876,2877,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/11801178$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Benedetto, Dario</creatorcontrib><creatorcontrib>Caglioti, Emanuele</creatorcontrib><creatorcontrib>Loreto, Vittorio</creatorcontrib><title>Language trees and zipping</title><title>Physical review letters</title><addtitle>Phys Rev Lett</addtitle><description>In this Letter we present a very general method for extracting information from a generic string of characters, e.g., a text, a DNA sequence, or a time series. Based on data-compression techniques, its key point is the computation of a suitable measure of the remoteness of two bodies of knowledge. 
We present the implementation of the method to linguistic motivated problems, featuring highly accurate results for language recognition, authorship attribution, and language classification.</description><subject>Algorithms</subject><subject>DNA - genetics</subject><subject>Language</subject><subject>Models, Theoretical</subject><subject>Pattern Recognition, Automated</subject><issn>0031-9007</issn><issn>1079-7114</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2002</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpNkE1Lw0AURQdRbK3-AUHpyl3qezOTzGQpxS8IKKLr4SV5iZEkjZlEqL_eSgq6upt7z4UjxAXCChHU9fP71r_wV8LDsLJ2BdoakAdijmDiwCDqQzEHUBjEAGYmTrz_AACUkT0WM0QLiMbOxXlCbTlSycuhZ_ZLavPld9V1VVueiqOCas9n-1yIt7vb1_VDkDzdP65vkiBTYIbAxtZiqkOKTJ5FqohAZ5jHQCgVphjmIepYhpClVBBLNEQRUqzCHKVhTWohriZu128-R_aDayqfcV1Ty5vRO4MadgC5K4ZTMes33vdcuK6vGuq3DsH9SnH_pDhr3SRlt7vcH4xpw_nfam9B_QDi615M</recordid><startdate>20020128</startdate><enddate>20020128</enddate><creator>Benedetto, Dario</creator><creator>Caglioti, Emanuele</creator><creator>Loreto, Vittorio</creator><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>20020128</creationdate><title>Language trees and zipping</title><author>Benedetto, Dario ; Caglioti, Emanuele ; Loreto, Vittorio</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c307t-89881b45a67dc63f604c1d90a1231b15d5149250cbafae217aa61a935d127e4a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2002</creationdate><topic>Algorithms</topic><topic>DNA - genetics</topic><topic>Language</topic><topic>Models, Theoretical</topic><topic>Pattern Recognition, Automated</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Benedetto, 
Dario</creatorcontrib><creatorcontrib>Caglioti, Emanuele</creatorcontrib><creatorcontrib>Loreto, Vittorio</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Physical review letters</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Benedetto, Dario</au><au>Caglioti, Emanuele</au><au>Loreto, Vittorio</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Language trees and zipping</atitle><jtitle>Physical review letters</jtitle><addtitle>Phys Rev Lett</addtitle><date>2002-01-28</date><risdate>2002</risdate><volume>88</volume><issue>4</issue><spage>048702</spage><epage>048702</epage><pages>048702-048702</pages><artnum>048702</artnum><issn>0031-9007</issn><eissn>1079-7114</eissn><abstract>In this Letter we present a very general method for extracting information from a generic string of characters, e.g., a text, a DNA sequence, or a time series. Based on data-compression techniques, its key point is the computation of a suitable measure of the remoteness of two bodies of knowledge. We present the implementation of the method to linguistic motivated problems, featuring highly accurate results for language recognition, authorship attribution, and language classification.</abstract><cop>United States</cop><pmid>11801178</pmid><doi>10.1103/PhysRevLett.88.048702</doi><tpages>1</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0031-9007 |
ispartof | Physical review letters, 2002-01, Vol.88 (4), p.048702-048702, Article 048702 |
issn | 0031-9007 ; 1079-7114 |
language | eng |
recordid | cdi_proquest_miscellaneous_71409252 |
source | MEDLINE; American Physical Society Journals |
subjects | Algorithms ; DNA - genetics ; Language ; Models, Theoretical ; Pattern Recognition, Automated |
title | Language trees and zipping |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T21%3A10%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Language%20trees%20and%20zipping&rft.jtitle=Physical%20review%20letters&rft.au=Benedetto,%20Dario&rft.date=2002-01-28&rft.volume=88&rft.issue=4&rft.spage=048702&rft.epage=048702&rft.pages=048702-048702&rft.artnum=048702&rft.issn=0031-9007&rft.eissn=1079-7114&rft_id=info:doi/10.1103/PhysRevLett.88.048702&rft_dat=%3Cproquest_cross%3E71409252%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=71409252&rft_id=info:pmid/11801178&rfr_iscdi=true |