CCU-Llama: A Knowledge Extraction LLM for Carbon Capture and Utilization by Mining Scientific Literature Data


Bibliographic details
Published in: Industrial & Engineering Chemistry Research 2024-10, Vol. 63 (41), p. 17585-17598
Main authors: Jami, Harshitha Chandra, Singh, Pushp Raj, Kumar, Avan, Bakshi, Bhavik R., Ramteke, Manojkumar, Kodamana, Hariprasad
Format: Article
Language: English
Online access: Full text
Description
Abstract: As the rate of carbon dioxide emissions directly contributes to global warming, the research community has made various attempts to develop novel pathways that mitigate and control this impact. The outcomes of this research are primarily documented in articles, and finding effective solutions requires the ability to scan the scientific literature and extract relevant information. In this study, we propose a large language model (LLM), CCU-Llama, for extracting knowledge about carbon capture and utilization (CCU). To create CCU-Llama, we employ the Llama-2 LLM and apply pretraining and transfer-learning techniques using CCU research articles sourced from the Elsevier database via API. Thorough preprocessing eliminates irrelevant content from the extracted article text. The proposed LLM performs two major tasks: (i) a CCU-technology potential knowledge extraction task, which provides information about a technology and its impact using sentence pair extraction and sentence pair classification, with accuracies of 0.835 and 0.779, respectively, and (ii) a chatbot-like interface with a visualization task that can respond to any query related to CCU. CCU-Llama outperforms ChatGPT in F1-score and IFnumeric, with scores of 0.7366 and 0.301, respectively, compared to ChatGPT’s scores of 0.6822 and 0.003. This work would help to rapidly analyze the number of carbon source capture and utilization applications. Ultimately, the extracted knowledge about CCU technologies can be used to guide the transition toward the goal of net-zero greenhouse gas emissions.
ISSN: 0888-5885, 1520-5045
DOI: 10.1021/acs.iecr.4c01656