Language trees and zipping

In this Letter we present a very general method for extracting information from a generic string of characters, e.g., a text, a DNA sequence, or a time series. Based on data-compression techniques, its key point is the computation of a suitable measure of the remoteness of two bodies of knowledge. W...

Detailed description

Bibliographic details
Published in: Physical review letters 2002-01, Vol.88 (4), p.048702-048702, Article 048702
Main authors: Benedetto, Dario; Caglioti, Emanuele; Loreto, Vittorio
Format: Article
Language: eng
Online access: Full text
container_end_page 048702
container_issue 4
container_start_page 048702
container_title Physical review letters
container_volume 88
creator Benedetto, Dario
Caglioti, Emanuele
Loreto, Vittorio
description In this Letter we present a very general method for extracting information from a generic string of characters, e.g., a text, a DNA sequence, or a time series. Based on data-compression techniques, its key point is the computation of a suitable measure of the remoteness of two bodies of knowledge. We present the implementation of the method to linguistic motivated problems, featuring highly accurate results for language recognition, authorship attribution, and language classification.
doi_str_mv 10.1103/PhysRevLett.88.048702
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_71409252</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>71409252</sourcerecordid><originalsourceid>FETCH-LOGICAL-c307t-89881b45a67dc63f604c1d90a1231b15d5149250cbafae217aa61a935d127e4a3</originalsourceid><addsrcrecordid>eNpNkE1Lw0AURQdRbK3-AUHpyl3qezOTzGQpxS8IKKLr4SV5iZEkjZlEqL_eSgq6upt7z4UjxAXCChHU9fP71r_wV8LDsLJ2BdoakAdijmDiwCDqQzEHUBjEAGYmTrz_AACUkT0WM0QLiMbOxXlCbTlSycuhZ_ZLavPld9V1VVueiqOCas9n-1yIt7vb1_VDkDzdP65vkiBTYIbAxtZiqkOKTJ5FqohAZ5jHQCgVphjmIepYhpClVBBLNEQRUqzCHKVhTWohriZu128-R_aDayqfcV1Ty5vRO4MadgC5K4ZTMes33vdcuK6vGuq3DsH9SnH_pDhr3SRlt7vcH4xpw_nfam9B_QDi615M</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>71409252</pqid></control><display><type>article</type><title>Language trees and zipping</title><source>MEDLINE</source><source>American Physical Society Journals</source><creator>Benedetto, Dario ; Caglioti, Emanuele ; Loreto, Vittorio</creator><creatorcontrib>Benedetto, Dario ; Caglioti, Emanuele ; Loreto, Vittorio</creatorcontrib><description>In this Letter we present a very general method for extracting information from a generic string of characters, e.g., a text, a DNA sequence, or a time series. Based on data-compression techniques, its key point is the computation of a suitable measure of the remoteness of two bodies of knowledge. 
We present the implementation of the method to linguistic motivated problems, featuring highly accurate results for language recognition, authorship attribution, and language classification.</description><identifier>ISSN: 0031-9007</identifier><identifier>EISSN: 1079-7114</identifier><identifier>DOI: 10.1103/PhysRevLett.88.048702</identifier><identifier>PMID: 11801178</identifier><language>eng</language><publisher>United States</publisher><subject>Algorithms ; DNA - genetics ; Language ; Models, Theoretical ; Pattern Recognition, Automated</subject><ispartof>Physical review letters, 2002-01, Vol.88 (4), p.048702-048702, Article 048702</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c307t-89881b45a67dc63f604c1d90a1231b15d5149250cbafae217aa61a935d127e4a3</citedby><cites>FETCH-LOGICAL-c307t-89881b45a67dc63f604c1d90a1231b15d5149250cbafae217aa61a935d127e4a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,2876,2877,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/11801178$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Benedetto, Dario</creatorcontrib><creatorcontrib>Caglioti, Emanuele</creatorcontrib><creatorcontrib>Loreto, Vittorio</creatorcontrib><title>Language trees and zipping</title><title>Physical review letters</title><addtitle>Phys Rev Lett</addtitle><description>In this Letter we present a very general method for extracting information from a generic string of characters, e.g., a text, a DNA sequence, or a time series. Based on data-compression techniques, its key point is the computation of a suitable measure of the remoteness of two bodies of knowledge. 
We present the implementation of the method to linguistic motivated problems, featuring highly accurate results for language recognition, authorship attribution, and language classification.</description><subject>Algorithms</subject><subject>DNA - genetics</subject><subject>Language</subject><subject>Models, Theoretical</subject><subject>Pattern Recognition, Automated</subject><issn>0031-9007</issn><issn>1079-7114</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2002</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpNkE1Lw0AURQdRbK3-AUHpyl3qezOTzGQpxS8IKKLr4SV5iZEkjZlEqL_eSgq6upt7z4UjxAXCChHU9fP71r_wV8LDsLJ2BdoakAdijmDiwCDqQzEHUBjEAGYmTrz_AACUkT0WM0QLiMbOxXlCbTlSycuhZ_ZLavPld9V1VVueiqOCas9n-1yIt7vb1_VDkDzdP65vkiBTYIbAxtZiqkOKTJ5FqohAZ5jHQCgVphjmIepYhpClVBBLNEQRUqzCHKVhTWohriZu128-R_aDayqfcV1Ty5vRO4MadgC5K4ZTMes33vdcuK6vGuq3DsH9SnH_pDhr3SRlt7vcH4xpw_nfam9B_QDi615M</recordid><startdate>20020128</startdate><enddate>20020128</enddate><creator>Benedetto, Dario</creator><creator>Caglioti, Emanuele</creator><creator>Loreto, Vittorio</creator><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>20020128</creationdate><title>Language trees and zipping</title><author>Benedetto, Dario ; Caglioti, Emanuele ; Loreto, Vittorio</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c307t-89881b45a67dc63f604c1d90a1231b15d5149250cbafae217aa61a935d127e4a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2002</creationdate><topic>Algorithms</topic><topic>DNA - genetics</topic><topic>Language</topic><topic>Models, Theoretical</topic><topic>Pattern Recognition, Automated</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Benedetto, 
Dario</creatorcontrib><creatorcontrib>Caglioti, Emanuele</creatorcontrib><creatorcontrib>Loreto, Vittorio</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Physical review letters</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Benedetto, Dario</au><au>Caglioti, Emanuele</au><au>Loreto, Vittorio</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Language trees and zipping</atitle><jtitle>Physical review letters</jtitle><addtitle>Phys Rev Lett</addtitle><date>2002-01-28</date><risdate>2002</risdate><volume>88</volume><issue>4</issue><spage>048702</spage><epage>048702</epage><pages>048702-048702</pages><artnum>048702</artnum><issn>0031-9007</issn><eissn>1079-7114</eissn><abstract>In this Letter we present a very general method for extracting information from a generic string of characters, e.g., a text, a DNA sequence, or a time series. Based on data-compression techniques, its key point is the computation of a suitable measure of the remoteness of two bodies of knowledge. We present the implementation of the method to linguistic motivated problems, featuring highly accurate results for language recognition, authorship attribution, and language classification.</abstract><cop>United States</cop><pmid>11801178</pmid><doi>10.1103/PhysRevLett.88.048702</doi><tpages>1</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0031-9007
ispartof Physical review letters, 2002-01, Vol.88 (4), p.048702-048702, Article 048702
issn 0031-9007
1079-7114
language eng
recordid cdi_proquest_miscellaneous_71409252
source MEDLINE; American Physical Society Journals
subjects Algorithms
DNA - genetics
Language
Models, Theoretical
Pattern Recognition, Automated
title Language trees and zipping
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T21%3A10%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Language%20trees%20and%20zipping&rft.jtitle=Physical%20review%20letters&rft.au=Benedetto,%20Dario&rft.date=2002-01-28&rft.volume=88&rft.issue=4&rft.spage=048702&rft.epage=048702&rft.pages=048702-048702&rft.artnum=048702&rft.issn=0031-9007&rft.eissn=1079-7114&rft_id=info:doi/10.1103/PhysRevLett.88.048702&rft_dat=%3Cproquest_cross%3E71409252%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=71409252&rft_id=info:pmid/11801178&rfr_iscdi=true
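
The description field in this record speaks of a compression-based "measure of the remoteness" between two bodies of text, but the record itself gives neither the formula nor the compressor. The following is only a minimal sketch of one way such a measure could be built, assuming zlib as the compressor and assuming the measure is the growth in compressed size when a short probe text is appended to a longer reference text. The function names (compressed_size, extra_cost, closest_reference) are hypothetical and are not taken from the record.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of data (assumed compressor)."""
    return len(zlib.compress(data, 9))


def extra_cost(reference: bytes, probe: bytes) -> int:
    """Growth in compressed size when probe is appended to reference.

    Assumption for this sketch: a probe that resembles the reference can
    reuse patterns the compressor has already seen, so it adds fewer bytes.
    """
    return compressed_size(reference + probe) - compressed_size(reference)


def closest_reference(probe: bytes, references: Mapping[str, bytes]) -> str:
    """Name of the reference text under which the probe is cheapest to encode."""
    return min(references, key=lambda name: extra_cost(references[name], probe))


if __name__ == "__main__":
    # Toy reference texts, purely illustrative and far too short for real use.
    sentence = b"the quick brown fox jumps over the lazy dog. "
    references = {
        "fox": sentence * 20,
        "digits": b"0123456789 " * 80,
    }
    print(closest_reference(sentence, references))
```

Under these assumptions, recognizing the language or author of an unknown text would amount to calling closest_reference with one reference corpus per candidate. Note that zlib only looks back over a 32 KiB window, so for long references only the most recent part influences the result.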