Augmenting Large Language Models via Vector Embeddings to Improve Domain-specific Responsiveness

Bibliographic Details
Published in: Journal of Visualized Experiments 2024-12 (214)
Main Authors: Wolfrath, Nathan M; Verhagen, Nathaniel B; Crotty, Bradley H; Somai, Melek; Kothari, Anai N
Format: Article
Language: English
Description
Abstract: Large language models (LLMs) have emerged as a popular resource for generating information relevant to a user query. Such models are created through a resource-intensive training process utilizing an extensive, static corpus of textual data. This static nature results in limitations for adoption in domains with rapidly changing knowledge, proprietary information, and sensitive data. In this work, methods are outlined for augmenting general-purpose LLMs, known as foundation models, with domain-specific information using an embeddings-based approach for incorporating up-to-date, peer-reviewed scientific manuscripts. This is achieved through open-source tools such as Llama-Index and publicly available models such as Llama-2 to maximize transparency, user privacy and control, and replicability. While scientific manuscripts are used as an example use case, this approach can be extended to any text data source. Additionally, methods for evaluating model performance following this enhancement are discussed. These methods enable the rapid development of LLM systems for highly specialized domains regardless of the comprehensiveness of information in the training corpus.
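The following is a minimal sketch of the kind of embeddings-based augmentation the abstract describes, not the protocol from the article itself. It assumes LlamaIndex (version 0.10 or later) with its HuggingFace embedding and Ollama integrations installed, a Llama 2 model served locally through Ollama, and manuscripts stored as files in a ./manuscripts directory; the directory path, embedding model name, retrieval depth, and query are illustrative assumptions, not details taken from the article.

# Sketch: embeddings-based augmentation of a local Llama 2 model with LlamaIndex.
# Assumptions (not specified in the abstract): LlamaIndex >= 0.10, the
# llama-index-embeddings-huggingface and llama-index-llms-ollama packages,
# a Llama 2 model served locally by Ollama, and manuscripts in ./manuscripts.
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# Keep all computation local: an open-weight embedding model plus a local Llama 2.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.llm = Ollama(model="llama2", request_timeout=120.0)

# Load the domain-specific documents (e.g., peer-reviewed manuscripts as PDF or text).
documents = SimpleDirectoryReader("./manuscripts").load_data()

# Embed the documents into a vector index and expose it as a query engine;
# at query time the most similar chunks are retrieved and passed to the LLM
# as context alongside the user question.
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(similarity_top_k=3)

response = query_engine.query("Summarize recent evidence on <topic> from these manuscripts.")
print(response)

Running both the embedding model and the LLM locally, as in this sketch, reflects the abstract's stated emphasis on transparency, user privacy and control, and replicability; any hosted embedding or generation service could be substituted where those constraints do not apply.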
ISSN: 1940-087X
DOI: 10.3791/66796