ZINC15 for Drug Similarity Search

Bibliographic details
Main author: Al Abir, Fuad
Format: Dataset
Language: eng
Online access: Order full text
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Al Abir, Fuad
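description This dataset is a subset of the ZINC15 database, filtered and processed for molecular similarity search using MegaMolBART embeddings. The subset focuses on drug-like molecules that meet specific physicochemical and purchasability criteria.
Keywords: ZINC15, Molecular Similarity Search, MegaMolBART, Drug Discovery, Cheminformatics.
Background: The ZINC15 database is a comprehensive collection of commercially available compounds for virtual screening. This subset was created to facilitate the development of machine learning models for drug discovery, particularly those based on molecular embeddings.
Methodology: The ZINC15 database was queried using the following criteria:
  Molecular weight <= 500 Daltons
  LogP <= 5
  Reactivity level = "reactive"
  Purchasability = "annotated"
The resulting dataset was then processed to extract MegaMolBART embeddings for each molecule.
Data Description: The dataset is organized into three folders:
  /data/project/ubrite/drg-depot/zinc15-similarity-search/raw-data/ (66 GB): the raw data files obtained from the ZINC15 database after applying the filtering criteria.
  /data/project/ubrite/drg-depot/zinc15-similarity-search/processed-data/ (13 GB): the processed data, including the extracted MegaMolBART embeddings for each molecule.
  /data/project/ubrite/drg-depot/zinc15-similarity-search/query/: sample SMILES strings and their corresponding embeddings for performing similarity searches.
Technical Specifications:
  Format: SMILES strings, numerical data (embeddings)
  Size: 79 GB (total)
License: This dataset is derived from the ZINC15 database and processed using MegaMolBART. It is subject to the licenses of both the ZINC15 database and the MegaMolBART model.
  ZINC15 Database: ZINC15 data is made available under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. For more information, please visit the ZINC15 website.
  MegaMolBART: The MegaMolBART model and its associated data are copyrighted by AstraZeneca and NVIDIA. Use of MegaMolBART is subject to the terms and conditions specified by the copyright holders.
By using this dataset, you agree to comply with the licenses and conditions imposed by the ZINC15 database and MegaMolBART.
Access and Usage: The dataset is available for download through Zenodo. Users are encouraged to acknowledge this dataset and the corresponding Zenodo entry in any publications or research projects that use the data.
Contact: Fuad Al Abir, fuad021@uab.edu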
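The query criteria named in this record (molecular weight <= 500 Daltons, LogP <= 5, reactivity level "reactive", purchasability "annotated") can be sketched as a simple predicate. This is a minimal illustration, not the author's pipeline: the `Molecule` class, its field names, and the sample property values are hypothetical, and the properties are assumed to be already available per molecule rather than computed here.

```python
from dataclasses import dataclass

# Thresholds taken from the dataset description.
MAX_MOL_WEIGHT = 500.0  # Daltons
MAX_LOGP = 5.0
REACTIVITY = "reactive"
PURCHASABILITY = "annotated"


@dataclass
class Molecule:
    # Hypothetical record layout; the real ZINC15 export may use other field names.
    smiles: str
    mol_weight: float
    logp: float
    reactivity: str
    purchasability: str


def passes_filter(m: Molecule) -> bool:
    """Return True if the molecule meets all four criteria from the description."""
    return (
        m.mol_weight <= MAX_MOL_WEIGHT
        and m.logp <= MAX_LOGP
        and m.reactivity == REACTIVITY
        and m.purchasability == PURCHASABILITY
    )


# Placeholder values for illustration only; they are not taken from the dataset.
candidates = [
    Molecule("CCO", 46.07, -0.31, "reactive", "annotated"),
    Molecule("CCCCCCCCCCCCCCCCCCCC", 282.55, 10.0, "reactive", "annotated"),  # LogP above 5
    Molecule("c1ccccc1", 78.11, 2.13, "reactive", "in-stock"),  # different purchasability label
]
kept = [m.smiles for m in candidates if passes_filter(m)]
print(kept)
```

Keeping the thresholds as named constants makes it easy to compare the filter against the criteria stated in the record.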
doi_str_mv 10.5281/zenodo.11090228
format Dataset
fullrecord <record><control><sourceid>datacite_PQ8</sourceid><recordid>TN_cdi_datacite_primary_10_5281_zenodo_11090228</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_5281_zenodo_11090228</sourcerecordid><originalsourceid>FETCH-datacite_primary_10_5281_zenodo_110902283</originalsourceid><addsrcrecordid>eNpjYBA3NNAzNbIw1K9KzctPydczNDSwNDAysuBkUIzy9HM2NFVIyy9ScCkqTVcIzszNzEksyiypVAhOTSxKzuBhYE1LzClO5YXS3Az6bq4hzh66KYklicmZJanxBUWZuYlFlfGGBvEgS-IhlsTDLDEmXQcAu400vw</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>dataset</recordtype></control><display><type>dataset</type><title>ZINC15 for Drug Similarity Search</title><source>DataCite</source><creator>Al Abir, Fuad</creator><creatorcontrib>Al Abir, Fuad</creatorcontrib><description>This dataset is a subset of the ZINC15 database, specifically filtered and processed for molecular similarity search applications using MegaMolBART embeddings. The subset focuses on drug-like molecules with specific physicochemical and purchasability properties. Keywords: ZINC15, Molecular Similarity Search, MegaMolBART, Drug Discovery, Cheminformatics. Background: The ZINC15 database is a comprehensive collection of commercially available compounds for virtual screening. This subset was created to facilitate the development of machine learning models for drug discovery, particularly those based on molecular embeddings. Methodology: The ZINC15 database was queried using the following criteria: Molecular weight &lt;= 500 Daltons LogP &lt;= 5 Reactivity level = "reactive" Purchasability = "annotated"  The resulting dataset was then processed to extract MegaMolBART embeddings for each molecule. Data Description:  The dataset is organized into three folders: /data/project/ubrite/drg-depot/zinc15-similarity-search/raw-data/ (66 GB): This folder contains the raw data files obtained from the ZINC15 database after applying the filtering criteria. 
/data/project/ubrite/drg-depot/zinc15-similarity-search/processed-data/ (13 GB): This folder contains the processed data, including the extracted MegaMolBART embeddings for each molecule.  /data/project/ubrite/drg-depot/zinc15-similarity-search/query/: This folder contains sample SMILES strings and their corresponding embeddings for performing similarity searches. Technical Specifications:  Format: SMILES strings, numerical data (embeddings) Size: 79 GB (total) License: This dataset is derived from the ZINC15 database and processed using MegaMolBART. It is subject to the licenses of both the ZINC15 database and the MegaMolBART model. ZINC15 Database: ZINC15 data is made available under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. For more information, please visit the ZINC15 website. MegaMolBART: The MegaMolBART model and its associated data are copyrighted by AstraZeneca and NVIDIA. The usage of MegaMolBART is subject to the terms and conditions specified by the copyright holders. By using this dataset, you agree to comply with the licenses and conditions imposed by the ZINC15 database and MegaMolBART. Access and Usage:  The dataset is available for download through Zenodo. Users are encouraged to acknowledge this dataset and the corresponding Zenodo entry in any publications or research projects that utilize the data.  
Contact: Fuad Al Abir, fuad021@uab.edu</description><identifier>DOI: 10.5281/zenodo.11090228</identifier><language>eng</language><publisher>Zenodo</publisher><creationdate>2024</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0002-9091-3078</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,1888</link.rule.ids><linktorsrc>$$Uhttps://commons.datacite.org/doi.org/10.5281/zenodo.11090228$$EView_record_in_DataCite.org$$FView_record_in_$$GDataCite.org$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Al Abir, Fuad</creatorcontrib><title>ZINC15 for Drug Similarity Search</title><description>This dataset is a subset of the ZINC15 database, specifically filtered and processed for molecular similarity search applications using MegaMolBART embeddings. The subset focuses on drug-like molecules with specific physicochemical and purchasability properties. Keywords: ZINC15, Molecular Similarity Search, MegaMolBART, Drug Discovery, Cheminformatics. Background: The ZINC15 database is a comprehensive collection of commercially available compounds for virtual screening. This subset was created to facilitate the development of machine learning models for drug discovery, particularly those based on molecular embeddings. Methodology: The ZINC15 database was queried using the following criteria: Molecular weight &lt;= 500 Daltons LogP &lt;= 5 Reactivity level = "reactive" Purchasability = "annotated"  The resulting dataset was then processed to extract MegaMolBART embeddings for each molecule. Data Description:  The dataset is organized into three folders: /data/project/ubrite/drg-depot/zinc15-similarity-search/raw-data/ (66 GB): This folder contains the raw data files obtained from the ZINC15 database after applying the filtering criteria. 
/data/project/ubrite/drg-depot/zinc15-similarity-search/processed-data/ (13 GB): This folder contains the processed data, including the extracted MegaMolBART embeddings for each molecule.  /data/project/ubrite/drg-depot/zinc15-similarity-search/query/: This folder contains sample SMILES strings and their corresponding embeddings for performing similarity searches. Technical Specifications:  Format: SMILES strings, numerical data (embeddings) Size: 79 GB (total) License: This dataset is derived from the ZINC15 database and processed using MegaMolBART. It is subject to the licenses of both the ZINC15 database and the MegaMolBART model. ZINC15 Database: ZINC15 data is made available under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. For more information, please visit the ZINC15 website. MegaMolBART: The MegaMolBART model and its associated data are copyrighted by AstraZeneca and NVIDIA. The usage of MegaMolBART is subject to the terms and conditions specified by the copyright holders. By using this dataset, you agree to comply with the licenses and conditions imposed by the ZINC15 database and MegaMolBART. Access and Usage:  The dataset is available for download through Zenodo. Users are encouraged to acknowledge this dataset and the corresponding Zenodo entry in any publications or research projects that utilize the data.  
Contact: Fuad Al Abir, fuad021@uab.edu</description><fulltext>true</fulltext><rsrctype>dataset</rsrctype><creationdate>2024</creationdate><recordtype>dataset</recordtype><sourceid>PQ8</sourceid><recordid>eNpjYBA3NNAzNbIw1K9KzctPydczNDSwNDAysuBkUIzy9HM2NFVIyy9ScCkqTVcIzszNzEksyiypVAhOTSxKzuBhYE1LzClO5YXS3Az6bq4hzh66KYklicmZJanxBUWZuYlFlfGGBvEgS-IhlsTDLDEmXQcAu400vw</recordid><startdate>20240429</startdate><enddate>20240429</enddate><creator>Al Abir, Fuad</creator><general>Zenodo</general><scope>DYCCY</scope><scope>PQ8</scope><orcidid>https://orcid.org/0000-0002-9091-3078</orcidid></search><sort><creationdate>20240429</creationdate><title>ZINC15 for Drug Similarity Search</title><author>Al Abir, Fuad</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-datacite_primary_10_5281_zenodo_110902283</frbrgroupid><rsrctype>datasets</rsrctype><prefilter>datasets</prefilter><language>eng</language><creationdate>2024</creationdate><toplevel>online_resources</toplevel><creatorcontrib>Al Abir, Fuad</creatorcontrib><collection>DataCite (Open Access)</collection><collection>DataCite</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Al Abir, Fuad</au><format>book</format><genre>unknown</genre><ristype>DATA</ristype><title>ZINC15 for Drug Similarity Search</title><date>2024-04-29</date><risdate>2024</risdate><abstract>This dataset is a subset of the ZINC15 database, specifically filtered and processed for molecular similarity search applications using MegaMolBART embeddings. The subset focuses on drug-like molecules with specific physicochemical and purchasability properties. Keywords: ZINC15, Molecular Similarity Search, MegaMolBART, Drug Discovery, Cheminformatics. Background: The ZINC15 database is a comprehensive collection of commercially available compounds for virtual screening. 
This subset was created to facilitate the development of machine learning models for drug discovery, particularly those based on molecular embeddings. Methodology: The ZINC15 database was queried using the following criteria: Molecular weight &lt;= 500 Daltons LogP &lt;= 5 Reactivity level = "reactive" Purchasability = "annotated"  The resulting dataset was then processed to extract MegaMolBART embeddings for each molecule. Data Description:  The dataset is organized into three folders: /data/project/ubrite/drg-depot/zinc15-similarity-search/raw-data/ (66 GB): This folder contains the raw data files obtained from the ZINC15 database after applying the filtering criteria. /data/project/ubrite/drg-depot/zinc15-similarity-search/processed-data/ (13 GB): This folder contains the processed data, including the extracted MegaMolBART embeddings for each molecule.  /data/project/ubrite/drg-depot/zinc15-similarity-search/query/: This folder contains sample SMILES strings and their corresponding embeddings for performing similarity searches. Technical Specifications:  Format: SMILES strings, numerical data (embeddings) Size: 79 GB (total) License: This dataset is derived from the ZINC15 database and processed using MegaMolBART. It is subject to the licenses of both the ZINC15 database and the MegaMolBART model. ZINC15 Database: ZINC15 data is made available under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. For more information, please visit the ZINC15 website. MegaMolBART: The MegaMolBART model and its associated data are copyrighted by AstraZeneca and NVIDIA. The usage of MegaMolBART is subject to the terms and conditions specified by the copyright holders. By using this dataset, you agree to comply with the licenses and conditions imposed by the ZINC15 database and MegaMolBART. Access and Usage:  The dataset is available for download through Zenodo. 
Users are encouraged to acknowledge this dataset and the corresponding Zenodo entry in any publications or research projects that utilize the data.  Contact: Fuad Al Abir, fuad021@uab.edu</abstract><pub>Zenodo</pub><doi>10.5281/zenodo.11090228</doi><orcidid>https://orcid.org/0000-0002-9091-3078</orcidid><oa>free_for_read</oa></addata></record>
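The record describes a query folder of sample SMILES strings with embeddings for similarity search, but it does not state the similarity metric or the file layout. As a minimal sketch, assuming the embeddings are plain float vectors held in memory and that cosine similarity is the metric (both are assumptions), a top-k search could look like the following. The function names and the toy 3-dimensional vectors are hypothetical; real MegaMolBART embeddings have far more dimensions, and at this dataset's scale a vectorized or approximate nearest-neighbor library would likely be more practical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float],
    library: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k library entries most similar to the query, best first."""
    scored = ((cosine_similarity(query, emb), smiles) for smiles, emb in library.items())
    return [(smiles, score) for score, smiles in heapq.nlargest(k, scored)]


# Toy embeddings for illustration only; they are not taken from the dataset.
library = {
    "CCO": [1.0, 0.0, 0.0],
    "CCN": [0.9, 0.1, 0.0],
    "c1ccccc1": [0.0, 1.0, 0.0],
}
query_embedding = [1.0, 0.0, 0.0]  # hypothetical embedding of a query molecule
results = top_k_similar(query_embedding, library, k=2)
for smiles, score in results:
    print(f"{smiles}\t{score:.4f}")
```

The brute-force scan is linear in the library size, which keeps the sketch easy to follow but would be slow for a library of this size.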
fulltext fulltext_linktorsrc
identifier DOI: 10.5281/zenodo.11090228
ispartof
issn
language eng
recordid cdi_datacite_primary_10_5281_zenodo_11090228
source DataCite
title ZINC15 for Drug Similarity Search
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T18%3A45%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Al%20Abir,%20Fuad&rft.date=2024-04-29&rft_id=info:doi/10.5281/zenodo.11090228&rft_dat=%3Cdatacite_PQ8%3E10_5281_zenodo_11090228%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true