Mapping the Constrained Coding Regions in the Human Genome to Their Corresponding Proteins

•CCRs are based on human conservation and complement inter-species conservation.
•CCRs assist in variant interpretation; here we mapped them onto protein sites.
•The most constrained coding sites correspond to protein sites in interactions.
•These interactions include those with DNA/...

Full description

Bibliographic details
Published in: Journal of molecular biology 2023-01, Vol.435 (2), p.167892-167892, Article 167892
Main authors: Hasenahuer, Marcia A., Sanchis-Juan, Alba, Laskowski, Roman A., Baker, James A., Stephenson, James D., Orengo, Christine A., Raymond, F. Lucy, Thornton, Janet M.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 167892
container_issue 2
container_start_page 167892
container_title Journal of molecular biology
container_volume 435
creator Hasenahuer, Marcia A.
Sanchis-Juan, Alba
Laskowski, Roman A.
Baker, James A.
Stephenson, James D.
Orengo, Christine A.
Raymond, F. Lucy
Thornton, Janet M.
description •CCRs are based on human conservation and complement inter-species conservation.
•CCRs assist in variant interpretation; here we mapped them onto protein sites.
•The most constrained coding sites correspond to protein sites in interactions.
•These interactions include those with DNA/RNA, proteins and in catalytic active sites.
•Those driving LLPS, in LIPs and in disorder–order transitions are also highly constrained.
Constrained Coding Regions (CCRs) in the human genome have been derived from DNA sequencing data of large cohorts of healthy control populations, available in the Genome Aggregation Database (gnomAD) [1]. They identify regions depleted of protein-changing variants and thus identify segments of the genome that have been constrained during human evolution. By mapping these DNA-defined regions from genomic coordinates onto the corresponding protein positions and combining this information with protein annotations, we have explored the distribution of CCRs and compared their co-occurrence with different protein functional features, previously annotated at the amino acid level in public databases. As expected, our results reveal that functional amino acids involved in interactions with DNA/RNA, protein–protein contacts and catalytic sites are the protein features most likely to be highly constrained for variation in the control population. More surprisingly, we also found that linear motifs, linear interacting peptides (LIPs), disorder–order transitions upon binding with other protein partners and liquid–liquid phase separating (LLPS) regions are also strongly associated with high constraint for variability. We also compared intra-species constraints in the human CCRs with inter-species conservation and functional residues to explore how such CCRs may contribute to the analysis of protein variants. As has been previously observed, CCRs are only weakly correlated with conservation, suggesting that intraspecies constraints complement interspecies conservation and can provide more information to interpret variant effects.
doi_str_mv 10.1016/j.jmb.2022.167892
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9875310</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0022283622005125</els_id><sourcerecordid>2739069267</sourcerecordid><originalsourceid>FETCH-LOGICAL-c403t-b34cfb0eb06dc7779725f1949393002b13840dcf9153afddb50ceeb93888f2bf3</originalsourceid><addsrcrecordid>eNp9kUFv1DAQhS0EokvhB3BBOXLJdmwnsS0kJLSCtlJRK1QuXKzYnux6tbGDna3Ev8fbLRW99GRr5ntvPH6EvKewpEC7s-1yO5olA8aWtBNSsRdkQUGqWnZcviQLKJ2aSd6dkDc5bwGg5Y18TU5411BoRLMgv7730-TDupo3WK1iyHPqfUBX7u5Q_oFrX6qVD_fExX7sQ3WOIY5YzbG63aBPhU0J8xTDveQmxRl9yG_Jq6HfZXz3cJ6Sn9--3q4u6qvr88vVl6vaNsDn2vDGDgbQQOesEEIJ1g5UNYorXhYwlMsGnB0UbXk_OGdasIhGcSnlwMzAT8nno--0NyM6i6HssNNT8mOf_ujYe_20E_xGr-OdVlK0nEIx-PhgkOLvPeZZjz5b3O36gHGfNRNcQadYJwpKj6hNMeeEw-MYCvqQid7qkok-ZKKPmRTNh__f96j4F0IBPh0BLL905zHpbD0Gi84ntLN20T9j_xe7jp37</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2739069267</pqid></control><display><type>article</type><title>Mapping the Constrained Coding Regions in the Human Genome to Their Corresponding Proteins</title><source>MEDLINE</source><source>Elsevier ScienceDirect Journals Complete</source><creator>Hasenahuer, Marcia A. ; Sanchis-Juan, Alba ; Laskowski, Roman A. ; Baker, James A. ; Stephenson, James D. ; Orengo, Christine A. ; Raymond, F. Lucy ; Thornton, Janet M.</creator><creatorcontrib>Hasenahuer, Marcia A. ; Sanchis-Juan, Alba ; Laskowski, Roman A. ; Baker, James A. ; Stephenson, James D. ; Orengo, Christine A. ; Raymond, F. 
Lucy ; Thornton, Janet M.</creatorcontrib><description>[Display omitted] •CCRs are based on human conservation and complement inter-species conservation.•CCRs assist in variant interpretation, here we mapped them onto proteins sites.•The most constrained coding sites correspond to protein sites in interactions.•These interactions include those with DNA/RNA, proteins and in catalytic active sites.•Those driving LLPS, in LIPs and in disorder–order transitions are also highly constrained. Constrained Coding Regions (CCRs) in the human genome have been derived from DNA sequencing data of large cohorts of healthy control populations, available in the Genome Aggregation Database (gnomAD) [1]. They identify regions depleted of protein-changing variants and thus identify segments of the genome that have been constrained during human evolution. By mapping these DNA-defined regions from genomic coordinates onto the corresponding protein positions and combining this information with protein annotations, we have explored the distribution of CCRs and compared their co-occurrence with different protein functional features, previously annotated at the amino acid level in public databases. As expected, our results reveal that functional amino acids involved in interactions with DNA/RNA, protein–protein contacts and catalytic sites are the protein features most likely to be highly constrained for variation in the control population. More surprisingly, we also found that linear motifs, linear interacting peptides (LIPs), disorder–order transitions upon binding with other protein partners and liquid–liquid phase separating (LLPS) regions are also strongly associated with high constraint for variability. We also compared intra-species constraints in the human CCRs with inter-species conservation and functional residues to explore how such CCRs may contribute to the analysis of protein variants. 
As has been previously observed, CCRs are only weakly correlated with conservation, suggesting that intraspecies constraints complement interspecies conservation and can provide more information to interpret variant effects.</description><identifier>ISSN: 0022-2836</identifier><identifier>EISSN: 1089-8638</identifier><identifier>DOI: 10.1016/j.jmb.2022.167892</identifier><identifier>PMID: 36410474</identifier><language>eng</language><publisher>Netherlands: Elsevier Ltd</publisher><subject>Base Sequence ; Chromosome Mapping ; constrained coding regions ; disease-related variants ; Genome, Human - genetics ; Genomics ; Humans ; liquid-liquid phase separation ; Open Reading Frames ; population variability ; protein functional features ; Proteins - genetics</subject><ispartof>Journal of molecular biology, 2023-01, Vol.435 (2), p.167892-167892, Article 167892</ispartof><rights>2022 The Authors</rights><rights>Copyright © 2022 The Authors. Published by Elsevier Ltd.. All rights reserved.</rights><rights>2022 The Authors 2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c403t-b34cfb0eb06dc7779725f1949393002b13840dcf9153afddb50ceeb93888f2bf3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.jmb.2022.167892$$EHTML$$P50$$Gelsevier$$Hfree_for_read</linktohtml><link.rule.ids>230,314,780,784,885,3550,27924,27925,45995</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/36410474$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Hasenahuer, Marcia A.</creatorcontrib><creatorcontrib>Sanchis-Juan, Alba</creatorcontrib><creatorcontrib>Laskowski, Roman A.</creatorcontrib><creatorcontrib>Baker, James A.</creatorcontrib><creatorcontrib>Stephenson, James 
D.</creatorcontrib><creatorcontrib>Orengo, Christine A.</creatorcontrib><creatorcontrib>Raymond, F. Lucy</creatorcontrib><creatorcontrib>Thornton, Janet M.</creatorcontrib><title>Mapping the Constrained Coding Regions in the Human Genome to Their Corresponding Proteins</title><title>Journal of molecular biology</title><addtitle>J Mol Biol</addtitle><description>[Display omitted] •CCRs are based on human conservation and complement inter-species conservation.•CCRs assist in variant interpretation, here we mapped them onto proteins sites.•The most constrained coding sites correspond to protein sites in interactions.•These interactions include those with DNA/RNA, proteins and in catalytic active sites.•Those driving LLPS, in LIPs and in disorder–order transitions are also highly constrained. Constrained Coding Regions (CCRs) in the human genome have been derived from DNA sequencing data of large cohorts of healthy control populations, available in the Genome Aggregation Database (gnomAD) [1]. They identify regions depleted of protein-changing variants and thus identify segments of the genome that have been constrained during human evolution. By mapping these DNA-defined regions from genomic coordinates onto the corresponding protein positions and combining this information with protein annotations, we have explored the distribution of CCRs and compared their co-occurrence with different protein functional features, previously annotated at the amino acid level in public databases. As expected, our results reveal that functional amino acids involved in interactions with DNA/RNA, protein–protein contacts and catalytic sites are the protein features most likely to be highly constrained for variation in the control population. 
More surprisingly, we also found that linear motifs, linear interacting peptides (LIPs), disorder–order transitions upon binding with other protein partners and liquid–liquid phase separating (LLPS) regions are also strongly associated with high constraint for variability. We also compared intra-species constraints in the human CCRs with inter-species conservation and functional residues to explore how such CCRs may contribute to the analysis of protein variants. As has been previously observed, CCRs are only weakly correlated with conservation, suggesting that intraspecies constraints complement interspecies conservation and can provide more information to interpret variant effects.</description><subject>Base Sequence</subject><subject>Chromosome Mapping</subject><subject>constrained coding regions</subject><subject>disease-related variants</subject><subject>Genome, Human - genetics</subject><subject>Genomics</subject><subject>Humans</subject><subject>liquid-liquid phase separation</subject><subject>Open Reading Frames</subject><subject>population variability</subject><subject>protein functional features</subject><subject>Proteins - genetics</subject><issn>0022-2836</issn><issn>1089-8638</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kUFv1DAQhS0EokvhB3BBOXLJdmwnsS0kJLSCtlJRK1QuXKzYnux6tbGDna3Ev8fbLRW99GRr5ntvPH6EvKewpEC7s-1yO5olA8aWtBNSsRdkQUGqWnZcviQLKJ2aSd6dkDc5bwGg5Y18TU5411BoRLMgv7730-TDupo3WK1iyHPqfUBX7u5Q_oFrX6qVD_fExX7sQ3WOIY5YzbG63aBPhU0J8xTDveQmxRl9yG_Jq6HfZXz3cJ6Sn9--3q4u6qvr88vVl6vaNsDn2vDGDgbQQOesEEIJ1g5UNYorXhYwlMsGnB0UbXk_OGdasIhGcSnlwMzAT8nno--0NyM6i6HssNNT8mOf_ujYe_20E_xGr-OdVlK0nEIx-PhgkOLvPeZZjz5b3O36gHGfNRNcQadYJwpKj6hNMeeEw-MYCvqQid7qkok-ZKKPmRTNh__f96j4F0IBPh0BLL905zHpbD0Gi84ntLN20T9j_xe7jp37</recordid><startdate>20230130</startdate><enddate>20230130</enddate><creator>Hasenahuer, Marcia A.</creator><creator>Sanchis-Juan, 
Alba</creator><creator>Laskowski, Roman A.</creator><creator>Baker, James A.</creator><creator>Stephenson, James D.</creator><creator>Orengo, Christine A.</creator><creator>Raymond, F. Lucy</creator><creator>Thornton, Janet M.</creator><general>Elsevier Ltd</general><general>Elsevier</general><scope>6I.</scope><scope>AAFTH</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20230130</creationdate><title>Mapping the Constrained Coding Regions in the Human Genome to Their Corresponding Proteins</title><author>Hasenahuer, Marcia A. ; Sanchis-Juan, Alba ; Laskowski, Roman A. ; Baker, James A. ; Stephenson, James D. ; Orengo, Christine A. ; Raymond, F. Lucy ; Thornton, Janet M.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c403t-b34cfb0eb06dc7779725f1949393002b13840dcf9153afddb50ceeb93888f2bf3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Base Sequence</topic><topic>Chromosome Mapping</topic><topic>constrained coding regions</topic><topic>disease-related variants</topic><topic>Genome, Human - genetics</topic><topic>Genomics</topic><topic>Humans</topic><topic>liquid-liquid phase separation</topic><topic>Open Reading Frames</topic><topic>population variability</topic><topic>protein functional features</topic><topic>Proteins - genetics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Hasenahuer, Marcia A.</creatorcontrib><creatorcontrib>Sanchis-Juan, Alba</creatorcontrib><creatorcontrib>Laskowski, Roman A.</creatorcontrib><creatorcontrib>Baker, James A.</creatorcontrib><creatorcontrib>Stephenson, James D.</creatorcontrib><creatorcontrib>Orengo, Christine A.</creatorcontrib><creatorcontrib>Raymond, F. 
Lucy</creatorcontrib><creatorcontrib>Thornton, Janet M.</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Journal of molecular biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Hasenahuer, Marcia A.</au><au>Sanchis-Juan, Alba</au><au>Laskowski, Roman A.</au><au>Baker, James A.</au><au>Stephenson, James D.</au><au>Orengo, Christine A.</au><au>Raymond, F. Lucy</au><au>Thornton, Janet M.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Mapping the Constrained Coding Regions in the Human Genome to Their Corresponding Proteins</atitle><jtitle>Journal of molecular biology</jtitle><addtitle>J Mol Biol</addtitle><date>2023-01-30</date><risdate>2023</risdate><volume>435</volume><issue>2</issue><spage>167892</spage><epage>167892</epage><pages>167892-167892</pages><artnum>167892</artnum><issn>0022-2836</issn><eissn>1089-8638</eissn><abstract>[Display omitted] •CCRs are based on human conservation and complement inter-species conservation.•CCRs assist in variant interpretation, here we mapped them onto proteins sites.•The most constrained coding sites correspond to protein sites in interactions.•These interactions include those with DNA/RNA, proteins and in catalytic active sites.•Those driving LLPS, in LIPs and in disorder–order transitions are also highly constrained. 
Constrained Coding Regions (CCRs) in the human genome have been derived from DNA sequencing data of large cohorts of healthy control populations, available in the Genome Aggregation Database (gnomAD) [1]. They identify regions depleted of protein-changing variants and thus identify segments of the genome that have been constrained during human evolution. By mapping these DNA-defined regions from genomic coordinates onto the corresponding protein positions and combining this information with protein annotations, we have explored the distribution of CCRs and compared their co-occurrence with different protein functional features, previously annotated at the amino acid level in public databases. As expected, our results reveal that functional amino acids involved in interactions with DNA/RNA, protein–protein contacts and catalytic sites are the protein features most likely to be highly constrained for variation in the control population. More surprisingly, we also found that linear motifs, linear interacting peptides (LIPs), disorder–order transitions upon binding with other protein partners and liquid–liquid phase separating (LLPS) regions are also strongly associated with high constraint for variability. We also compared intra-species constraints in the human CCRs with inter-species conservation and functional residues to explore how such CCRs may contribute to the analysis of protein variants. As has been previously observed, CCRs are only weakly correlated with conservation, suggesting that intraspecies constraints complement interspecies conservation and can provide more information to interpret variant effects.</abstract><cop>Netherlands</cop><pub>Elsevier Ltd</pub><pmid>36410474</pmid><doi>10.1016/j.jmb.2022.167892</doi><tpages>1</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0022-2836
ispartof Journal of molecular biology, 2023-01, Vol.435 (2), p.167892-167892, Article 167892
issn 0022-2836
1089-8638
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9875310
source MEDLINE; Elsevier ScienceDirect Journals Complete
subjects Base Sequence
Chromosome Mapping
constrained coding regions
disease-related variants
Genome, Human - genetics
Genomics
Humans
liquid-liquid phase separation
Open Reading Frames
population variability
protein functional features
Proteins - genetics
title Mapping the Constrained Coding Regions in the Human Genome to Their Corresponding Proteins
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T06%3A49%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mapping%20the%20Constrained%20Coding%20Regions%20in%20the%20Human%20Genome%20to%20Their%20Corresponding%20Proteins&rft.jtitle=Journal%20of%20molecular%20biology&rft.au=Hasenahuer,%20Marcia%20A.&rft.date=2023-01-30&rft.volume=435&rft.issue=2&rft.spage=167892&rft.epage=167892&rft.pages=167892-167892&rft.artnum=167892&rft.issn=0022-2836&rft.eissn=1089-8638&rft_id=info:doi/10.1016/j.jmb.2022.167892&rft_dat=%3Cproquest_pubme%3E2739069267%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2739069267&rft_id=info:pmid/36410474&rft_els_id=S0022283622005125&rfr_iscdi=true