MARIA: Multiple-alignment $r$-index with aggregation
There now exist compact indexes that can efficiently list all the occurrences of a pattern in a dataset consisting of thousands of genomes, or even all the occurrences of all the pattern's maximal exact matches (MEMs) with respect to the dataset. Unless we are lucky and the pattern is specific...
Gespeichert in:
Main authors: | Goga, Adrián; Baláž, Andrej; Petescia, Alessia; Gagie, Travis |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Data Structures and Algorithms |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Goga, Adrián; Baláž, Andrej; Petescia, Alessia; Gagie, Travis |
description | There now exist compact indexes that can efficiently list all the occurrences
of a pattern in a dataset consisting of thousands of genomes, or even all the
occurrences of all the pattern's maximal exact matches (MEMs) with respect to
the dataset. Unless we are lucky and the pattern is specific to only a few
genomes, however, we could be swamped by hundreds of matches (or even
hundreds per MEM), only to discover that most or all of the matches are to
substrings that occupy the same few columns in a multiple alignment. To address
this issue, in this paper we present MARIA, a simple and compact index that
stores a multiple alignment such that, given the position of one match of a
pattern (or of a MEM or other substring of a pattern) and its length, we can
quickly list all the distinct columns of the multiple alignment where matches
start. |
doi_str_mv | 10.48550/arxiv.2209.09218 |
format | Article |
date | 2022-09-19 |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2209.09218 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2209_09218 |
source | arXiv.org |
subjects | Computer Science - Data Structures and Algorithms |
title | MARIA: Multiple-alignment $r$-index with aggregation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T07%3A22%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MARIA:%20Multiple-alignment%20$r$-index%20with%20aggregation&rft.au=Goga,%20Adri%C3%A1n&rft.date=2022-09-19&rft_id=info:doi/10.48550/arxiv.2209.09218&rft_dat=%3Carxiv_GOX%3E2209_09218%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
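The description above defines a query: given one match (its position and length), list the distinct alignment columns where all matches of that substring start. The Python sketch below only shows what that query returns. It uses brute force, so it is not MARIA's compact index and does not reflect how the paper answers the query. It assumes that the multiple alignment is given as equal-length rows, that '-' is the gap symbol, and that a match is identified by a row index, an ungapped start offset and a length. The names `column_map` and `distinct_start_columns` are hypothetical.

```python
from typing import List, Set

GAP = "-"  # assumed gap symbol in the aligned rows


def column_map(row: str) -> List[int]:
    """Map each ungapped character offset in an aligned row to its alignment column."""
    return [col for col, ch in enumerate(row) if ch != GAP]


def distinct_start_columns(alignment: List[str], seq: int, pos: int, length: int) -> Set[int]:
    """Brute-force version of the query: given one match (row `seq`, ungapped
    offset `pos`, length `length`), find every occurrence of the same substring
    in every row and return the distinct columns where those occurrences start."""
    ungapped = [row.replace(GAP, "") for row in alignment]
    pattern = ungapped[seq][pos:pos + length]
    if length <= 0 or len(pattern) != length:
        raise ValueError("match must be non-empty and lie inside the sequence")
    columns: Set[int] = set()
    for row, text in zip(alignment, ungapped):
        cols = column_map(row)
        start = text.find(pattern)
        while start != -1:  # overlapping occurrences are included
            columns.add(cols[start])
            start = text.find(pattern, start + 1)
    return columns


alignment = [
    "GATTACA",
    "GA-TACA",
    "TACAACA",
]
# The match TACA starting at ungapped offset 3 of row 0.
print(sorted(distinct_start_columns(alignment, seq=0, pos=3, length=4)))  # [0, 3]
```

In this toy example, TACA occurs three times but starts in only two distinct columns (0 and 3). The abstract describes this kind of aggregation. The scan reads every row, so it illustrates only the output of the query and says nothing about how to answer it efficiently.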