MARIA: Multiple-alignment $r$-index with aggregation

There now exist compact indexes that can efficiently list all the occurrences of a pattern in a dataset consisting of thousands of genomes, or even all the occurrences of all the pattern's maximal exact matches (MEMs) with respect to the dataset. Unless we are lucky and the pattern is specific...

Bibliographic details
Main authors: Goga, Adrián; Baláž, Andrej; Petescia, Alessia; Gagie, Travis
Format: Article
Language: eng
Subjects:
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Goga, Adrián
Baláž, Andrej
Petescia, Alessia
Gagie, Travis
description There now exist compact indexes that can efficiently list all the occurrences of a pattern in a dataset consisting of thousands of genomes, or even all the occurrences of all the pattern's maximal exact matches (MEMs) with respect to the dataset. Unless we are lucky and the pattern is specific to only a few genomes, however, we could be swamped by hundreds of matches -- or even hundreds per MEM -- only to discover that most or all of the matches are to substrings that occupy the same few columns in a multiple alignment. To address this issue, in this paper we present a simple and compact data index MARIA that stores a multiple alignment such that, given the position of one match of a pattern (or a MEM or other substring of a pattern) and its length, we can quickly list all the distinct columns of the multiple alignment where matches start.
doi_str_mv 10.48550/arxiv.2209.09218
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2209_09218</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2209_09218</sourcerecordid><originalsourceid>FETCH-LOGICAL-a678-7576d5875228081cac9ee1cead4114b5d82fbe53373a55899251f565ca00aa223</originalsourceid><addsrcrecordid>eNotzrFuwjAQgGEvDBX0ATo1A6uDfc7FNluEaEECISH26EguqaUQkJu29O2rAtO__fqEeNEqzRyimlG8hu8UQPlUedDuSWTbYr8u5sn2qxvCpWNJXWj7E_dDMo1TGfqar8lPGD4SatvILQ3h3E_EqKHuk58fHYvD2_KwWMnN7n29KDaScuukRZvX6CwCOOV0RZVn1hVTnWmdHbF20BwZjbGGEJ33gLrBHCtSigjAjMXrfXtjl5cYThR_y39-eeObP4uJPc8</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>MARIA: Multiple-alignment $r$-index with aggregation</title><source>arXiv.org</source><creator>Goga, Adrián ; Baláž, Andrej ; Petescia, Alessia ; Gagie, Travis</creator><creatorcontrib>Goga, Adrián ; Baláž, Andrej ; Petescia, Alessia ; Gagie, Travis</creatorcontrib><description>There now exist compact indexes that can efficiently list all the occurrences of a pattern in a dataset consisting of thousands of genomes, or even all the occurrences of all the pattern's maximal exact matches (MEMs) with respect to the dataset. Unless we are lucky and the pattern is specific to only a few genomes, however, we could be swamped by hundreds of matches -- or even hundreds per MEM -- only to discover that most or all of the matches are to substrings that occupy the same few columns in a multiple alignment. 
To address this issue, in this paper we present a simple and compact data index MARIA that stores a multiple alignment such that, given the position of one match of a pattern (or a MEM or other substring of a pattern) and its length, we can quickly list all the distinct columns of the multiple alignment where matches start.</description><identifier>DOI: 10.48550/arxiv.2209.09218</identifier><language>eng</language><subject>Computer Science - Data Structures and Algorithms</subject><creationdate>2022-09</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2209.09218$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2209.09218$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Goga, Adrián</creatorcontrib><creatorcontrib>Baláž, Andrej</creatorcontrib><creatorcontrib>Petescia, Alessia</creatorcontrib><creatorcontrib>Gagie, Travis</creatorcontrib><title>MARIA: Multiple-alignment $r$-index with aggregation</title><description>There now exist compact indexes that can efficiently list all the occurrences of a pattern in a dataset consisting of thousands of genomes, or even all the occurrences of all the pattern's maximal exact matches (MEMs) with respect to the dataset. Unless we are lucky and the pattern is specific to only a few genomes, however, we could be swamped by hundreds of matches -- or even hundreds per MEM -- only to discover that most or all of the matches are to substrings that occupy the same few columns in a multiple alignment. 
To address this issue, in this paper we present a simple and compact data index MARIA that stores a multiple alignment such that, given the position of one match of a pattern (or a MEM or other substring of a pattern) and its length, we can quickly list all the distinct columns of the multiple alignment where matches start.</description><subject>Computer Science - Data Structures and Algorithms</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotzrFuwjAQgGEvDBX0ATo1A6uDfc7FNluEaEECISH26EguqaUQkJu29O2rAtO__fqEeNEqzRyimlG8hu8UQPlUedDuSWTbYr8u5sn2qxvCpWNJXWj7E_dDMo1TGfqar8lPGD4SatvILQ3h3E_EqKHuk58fHYvD2_KwWMnN7n29KDaScuukRZvX6CwCOOV0RZVn1hVTnWmdHbF20BwZjbGGEJ33gLrBHCtSigjAjMXrfXtjl5cYThR_y39-eeObP4uJPc8</recordid><startdate>20220919</startdate><enddate>20220919</enddate><creator>Goga, Adrián</creator><creator>Baláž, Andrej</creator><creator>Petescia, Alessia</creator><creator>Gagie, Travis</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20220919</creationdate><title>MARIA: Multiple-alignment $r$-index with aggregation</title><author>Goga, Adrián ; Baláž, Andrej ; Petescia, Alessia ; Gagie, Travis</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a678-7576d5875228081cac9ee1cead4114b5d82fbe53373a55899251f565ca00aa223</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Computer Science - Data Structures and Algorithms</topic><toplevel>online_resources</toplevel><creatorcontrib>Goga, Adrián</creatorcontrib><creatorcontrib>Baláž, Andrej</creatorcontrib><creatorcontrib>Petescia, Alessia</creatorcontrib><creatorcontrib>Gagie, Travis</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Goga, Adrián</au><au>Baláž, Andrej</au><au>Petescia, Alessia</au><au>Gagie, Travis</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>MARIA: Multiple-alignment $r$-index with aggregation</atitle><date>2022-09-19</date><risdate>2022</risdate><abstract>There now exist compact indexes that can efficiently list all the occurrences of a pattern in a dataset consisting of thousands of genomes, or even all the occurrences of all the pattern's maximal exact matches (MEMs) with respect to the dataset. Unless we are lucky and the pattern is specific to only a few genomes, however, we could be swamped by hundreds of matches -- or even hundreds per MEM -- only to discover that most or all of the matches are to substrings that occupy the same few columns in a multiple alignment. To address this issue, in this paper we present a simple and compact data index MARIA that stores a multiple alignment such that, given the position of one match of a pattern (or a MEM or other substring of a pattern) and its length, we can quickly list all the distinct columns of the multiple alignment where matches start.</abstract><doi>10.48550/arxiv.2209.09218</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2209.09218
ispartof
issn
language eng
recordid cdi_arxiv_primary_2209_09218
source arXiv.org
subjects Computer Science - Data Structures and Algorithms
title MARIA: Multiple-alignment $r$-index with aggregation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T07%3A22%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MARIA:%20Multiple-alignment%20$r$-index%20with%20aggregation&rft.au=Goga,%20Adri%C3%A1n&rft.date=2022-09-19&rft_id=info:doi/10.48550/arxiv.2209.09218&rft_dat=%3Carxiv_GOX%3E2209_09218%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
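
The query stated in the description (given the position and length of one match, list the distinct multiple-alignment columns where matches start) can be illustrated with a brute-force sketch. This is not the MARIA index and makes no claim about how the authors implement it: it rescans every row, whereas the paper describes a compact index. The function names, the `-` gap character, and the toy alignment are all assumptions made for illustration.

```python
from typing import List

GAP = "-"  # assumed gap symbol; not specified in the record


def column_maps(alignment: List[str]) -> List[List[int]]:
    """For each row, map every ungapped position to its alignment column."""
    return [[col for col, ch in enumerate(row) if ch != GAP] for row in alignment]


def distinct_start_columns(alignment: List[str], row: int, pos: int, length: int) -> List[int]:
    """Naive reference: take the substring identified by one match
    (row, ungapped position, length), find all of its occurrences in all
    rows, and return the distinct alignment columns where they start."""
    if length <= 0:
        raise ValueError("length must be positive")
    maps = column_maps(alignment)
    seqs = [r.replace(GAP, "") for r in alignment]
    pattern = seqs[row][pos:pos + length]
    columns = set()
    for seq, cmap in zip(seqs, maps):
        start = seq.find(pattern)
        while start != -1:
            columns.add(cmap[start])  # translate text position to column
            start = seq.find(pattern, start + 1)
    return sorted(columns)


# Hypothetical toy alignment of three sequences.
alignment = [
    "ACGT-ACGA",
    "ACGTTACGA",
    "AC-T-ACGA",
]

# The match is "ACG", found at ungapped position 4 of row 0.
print(distinct_start_columns(alignment, row=0, pos=4, length=3))
```

In this toy input the substring occurs five times across the three rows, but those occurrences start in only two columns, which is the kind of aggregation the description motivates.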