MARIA: Multiple-alignment $r$-index with aggregation

Bibliographic details
Main authors:	Goga, Adrián; Baláž, Andrej; Petescia, Alessia; Gagie, Travis
Format:	Article
Language:	English
Subjects:	Computer Science - Data Structures and Algorithms

creator	Goga, Adrián; Baláž, Andrej; Petescia, Alessia; Gagie, Travis
description	There now exist compact indexes that can efficiently list all the occurrences of a pattern in a dataset consisting of thousands of genomes, or even all the occurrences of all the pattern's maximal exact matches (MEMs) with respect to the dataset. Unless we are lucky and the pattern is specific to only a few genomes, however, we could be swamped by hundreds of matches -- or even hundreds per MEM -- only to discover that most or all of the matches are to substrings that occupy the same few columns in a multiple alignment. To address this issue, in this paper we present MARIA, a simple and compact index that stores a multiple alignment such that, given the position of one match of a pattern (or of a MEM or other substring of a pattern) and its length, we can quickly list all the distinct columns of the multiple alignment where matches start.
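To make the query in the abstract concrete, here is a brute-force sketch of its input and output, assuming the multiple alignment is given as equal-length rows over a gapped alphabet with `-` as the gap symbol. Given one match (a row, an ungapped offset in that row, and a length), it lists every occurrence of the matched substring in all rows, maps each occurrence to the alignment column where it starts, and reports only the distinct columns. All function names are hypothetical. This naive scan only illustrates what the query returns and does not reflect how MARIA answers it with an $r$-index.

```python
from typing import List, Tuple

GAP = "-"  # assumed gap symbol in the alignment rows


def ungapped(alignment: List[str]) -> List[str]:
    """Recover the original sequences by deleting gap symbols."""
    return [row.replace(GAP, "") for row in alignment]


def column_maps(alignment: List[str]) -> List[List[int]]:
    """For each row, map every ungapped offset to its alignment column."""
    return [[col for col, ch in enumerate(row) if ch != GAP] for row in alignment]


def occurrences(seqs: List[str], pattern: str) -> List[Tuple[int, int]]:
    """Naively list every (row, offset) where pattern occurs, overlaps included."""
    occ = []
    for r, s in enumerate(seqs):
        start = s.find(pattern)
        while start != -1:
            occ.append((r, start))
            start = s.find(pattern, start + 1)
    return occ


def distinct_start_columns(alignment: List[str], row: int, offset: int, length: int) -> List[int]:
    """Given one match (row, offset, length), list the distinct alignment
    columns where occurrences of the matched substring start."""
    if length < 1:
        raise ValueError("length must be positive")
    seqs = ungapped(alignment)
    pattern = seqs[row][offset:offset + length]
    maps = column_maps(alignment)
    # Aggregation step: many occurrences collapse onto few columns.
    cols = {maps[r][off] for r, off in occurrences(seqs, pattern)}
    return sorted(cols)
```

For example, with the rows `ACGT-ACGT`, `ACGTTACGT` and `AC-TTACGT`, the match of length 3 at offset 0 of the first row (`ACG`) has five occurrences across the three sequences, but `distinct_start_columns` reports only the two columns `[0, 5]`, which is the kind of reduction the abstract motivates.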
doi_str_mv	10.48550/arxiv.2209.09218
format	Article
date	2022-09-19
rights	http://arxiv.org/licenses/nonexclusive-distrib/1.0
link	https://arxiv.org/abs/2209.09218
source	arXiv.org