Assessing the reliability of point mutation as data augmentation for deep learning with genomic data

Background: Deep neural networks (DNNs) have the potential to revolutionize our understanding and treatment of genetic diseases. An inherent limitation of deep neural networks, however, is their high demand for data during training. To overcome this challenge, other fields, such as computer vision,...

Bibliographic details
Main authors: Lee, Hyunjung; Özbulak, Utku; Park, Homin; Depuydt, Stephen; De Neve, Wesley; Vankerschaver, Joris
Format: Article
Language: eng
Online access: Full text
creator Lee, Hyunjung
Özbulak, Utku
Park, Homin
Depuydt, Stephen
De Neve, Wesley
Vankerschaver, Joris
description Background: Deep neural networks (DNNs) have the potential to revolutionize our understanding and treatment of genetic diseases. An inherent limitation of deep neural networks, however, is their high demand for data during training. To overcome this challenge, other fields, such as computer vision, use various data augmentation techniques to artificially increase the available training data for DNNs. Unfortunately, most data augmentation techniques used in other domains do not transfer well to genomic data. Results: Most genomic data possesses peculiar properties and data augmentations may significantly alter the intrinsic properties of the data. In this work, we propose a novel data augmentation technique for genomic data inspired by biology: point mutations. By employing point mutations as substitutes for codons, we demonstrate that our newly proposed data augmentation technique enhances the performance of DNNs across various genomic tasks that involve coding regions, such as translation initiation and splice site detection. Conclusion: Silent and missense mutations are found to positively influence effectiveness, while nonsense mutations and random mutations in non-coding regions generally lead to degradation. Overall, point mutation-based augmentations in genomic datasets present valuable opportunities for improving the accuracy and reliability of predictive models for DNA sequences.
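The abstract distinguishes silent, missense, and nonsense point mutations applied at the codon level. The following is a minimal sketch of that idea, not the authors' implementation: it classifies a single-base codon substitution using the standard genetic code and applies randomly chosen substitutions of the allowed classes to an in-frame coding sequence. The function names, the `rate` and `allowed` parameters, and the assumptions that the input is uppercase A/C/G/T starting in reading frame 0 are all hypothetical choices made for illustration.

```python
import random

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))  # "*" marks a stop codon


def translate(seq):
    """Translate an in-frame DNA string; trailing partial codons are ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def classify_mutation(codon, mutant):
    """Label a codon substitution as 'silent', 'nonsense', or 'missense'."""
    ref, alt = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref == alt:
        return "silent"
    if alt == "*":
        return "nonsense"
    return "missense"


def single_base_variants(codon):
    """Return the 9 codons that differ from `codon` at exactly one position."""
    return [codon[:i] + b + codon[i + 1:]
            for i in range(3) for b in BASES if b != codon[i]]


def augment(seq, rate=0.1, allowed=("silent",), rng=None):
    """Mutate each non-stop codon with probability `rate` (hypothetical
    parameter), choosing only among single-base variants whose class is
    listed in `allowed`. Codons with no allowed variant are kept."""
    rng = rng or random.Random()
    usable = len(seq) - len(seq) % 3
    out = []
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        choices = [m for m in single_base_variants(codon)
                   if classify_mutation(codon, m) in allowed]
        if choices and CODON_TABLE[codon] != "*" and rng.random() < rate:
            codon = rng.choice(choices)
        out.append(codon)
    out.append(seq[usable:])  # keep any trailing bases unchanged
    return "".join(out)
```

With `allowed=("silent",)` the augmented sequence encodes the same protein as the input, which makes the behaviour easy to check; passing `allowed=("silent", "missense")` would also permit amino-acid changes while still excluding premature stop codons.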
format Article
creationdate 2024
rights Creative Commons Attribution 4.0 International Public License (CC-BY 4.0) info:eu-repo/semantics/openAccess
oa free_for_read
toplevel peer_reviewed; online_resources
recordtype article
sourcetype Institutional Repository
fulltext fulltext
identifier ISSN: 1471-2105
issn 1471-2105
1471-2105
language eng
recordid cdi_ghent_librecat_oai_archive_ugent_be_01HWW1743VJ0Z63MTTAVY0PJQV
source DOAJ Directory of Open Access Journals; SpringerNature Journals; PubMed Central Open Access; Springer Nature OA Free Journals; Ghent University Academic Bibliography; EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects Biology and Life Sciences
Data augmentation
Deep learning
Mathematics and Statistics
Point mutations
Splicing
Technology and Engineering
Translation initiation
title Assessing the reliability of point mutation as data augmentation for deep learning with genomic data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-04T01%3A04%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ghent&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Assessing%20the%20reliability%20of%20point%20mutation%20as%20data%20augmentation%20for%20deep%20learning%20with%20genomic%20data&rft.au=Lee,%20Hyunjung&rft.date=2024&rft.issn=1471-2105&rft.eissn=1471-2105&rft_id=info:doi/&rft_dat=%3Cghent%3Eoai_archive_ugent_be_01HWW1743VJ0Z63MTTAVY0PJQV%3C/ghent%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true