Enhancing classroom teaching with LLMs and RAG

Large Language Models have become a valuable source of information for our daily inquiries. However, after training, their data quickly becomes out-of-date, making RAG a useful tool for providing more recent or pertinent data. In this work, we investigate how RAG pipelines, with the course...

Bibliographic details
Main authors:	Mullins, Elizabeth A; Portillo, Adrian; Ruiz-Rohena, Kristalys; Piplai, Aritran
Format:	Article
Language:	eng
Subjects:	Computer Science - Learning

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Mullins, Elizabeth A; Portillo, Adrian; Ruiz-Rohena, Kristalys; Piplai, Aritran
description	Large Language Models have become a valuable source of information for our daily inquiries. However, after training, their data quickly becomes out-of-date, making RAG a useful tool for providing more recent or pertinent data. In this work, we investigate how RAG pipelines, with the course materials serving as a data source, might help students in K-12 education. The initial research uses Reddit as a data source for up-to-date cybersecurity information. Chunk size is varied to determine the amount of context needed to generate accurate answers. After running the experiment for different chunk sizes, answer correctness was evaluated using RAGAs; average answer correctness did not exceed 50 percent for any chunk size. This suggests that Reddit is not a good source to mine for data for questions about cybersecurity threats. The methodology was successful in evaluating the data source, which has implications for using it to evaluate the effectiveness of educational resources.
doi_str_mv	10.48550/arxiv.2411.04341
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2411_04341</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2411_04341</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2411_043413</originalsourceid><addsrcrecordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjE01DMwMTYx5GTQc83LSMxLzsxLV0jOSSwuLsrPz1UoSU1MzgAJlWeWZCj4-PgWKyTmpSgEObrzMLCmJeYUp_JCaW4GeTfXEGcPXbDJ8QVFmbmJRZXxIBviwTYYE1YBAJL4LyA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Enhancing classroom teaching with LLMs and RAG</title><source>arXiv.org</source><creator>Mullins, Elizabeth A ; Portillo, Adrian ; Ruiz-Rohena, Kristalys ; Piplai, Aritran</creator><creatorcontrib>Mullins, Elizabeth A ; Portillo, Adrian ; Ruiz-Rohena, Kristalys ; Piplai, Aritran</creatorcontrib><description>Large Language Models have become a valuable source of information for our daily inquiries. However, after training, its data source quickly becomes out-of-date, making RAG a useful tool for providing even more recent or pertinent data. In this work, we investigate how RAG pipelines, with the course materials serving as a data source, might help students in K-12 education. The initial research utilizes Reddit as a data source for up-to-date cybersecurity information. Chunk size is evaluated to determine the optimal amount of context needed to generate accurate answers. After running the experiment for different chunk sizes, answer correctness was evaluated using RAGAs with average answer correctness not exceeding 50 percent for any chunk size. This suggests that Reddit is not a good source to mine for data for questions about cybersecurity threats. 
The methodology was successful in evaluating the data source, which has implications for its use to evaluate educational resources for effectiveness.</description><identifier>DOI: 10.48550/arxiv.2411.04341</identifier><language>eng</language><subject>Computer Science - Learning</subject><creationdate>2024-11</creationdate><rights>http://creativecommons.org/licenses/by-nc-sa/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2411.04341$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2411.04341$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Mullins, Elizabeth A</creatorcontrib><creatorcontrib>Portillo, Adrian</creatorcontrib><creatorcontrib>Ruiz-Rohena, Kristalys</creatorcontrib><creatorcontrib>Piplai, Aritran</creatorcontrib><title>Enhancing classroom teaching with LLMs and RAG</title><description>Large Language Models have become a valuable source of information for our daily inquiries. However, after training, its data source quickly becomes out-of-date, making RAG a useful tool for providing even more recent or pertinent data. In this work, we investigate how RAG pipelines, with the course materials serving as a data source, might help students in K-12 education. The initial research utilizes Reddit as a data source for up-to-date cybersecurity information. Chunk size is evaluated to determine the optimal amount of context needed to generate accurate answers. After running the experiment for different chunk sizes, answer correctness was evaluated using RAGAs with average answer correctness not exceeding 50 percent for any chunk size. 
This suggests that Reddit is not a good source to mine for data for questions about cybersecurity threats. The methodology was successful in evaluating the data source, which has implications for its use to evaluate educational resources for effectiveness.</description><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjE01DMwMTYx5GTQc83LSMxLzsxLV0jOSSwuLsrPz1UoSU1MzgAJlWeWZCj4-PgWKyTmpSgEObrzMLCmJeYUp_JCaW4GeTfXEGcPXbDJ8QVFmbmJRZXxIBviwTYYE1YBAJL4LyA</recordid><startdate>20241106</startdate><enddate>20241106</enddate><creator>Mullins, Elizabeth A</creator><creator>Portillo, Adrian</creator><creator>Ruiz-Rohena, Kristalys</creator><creator>Piplai, Aritran</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20241106</creationdate><title>Enhancing classroom teaching with LLMs and RAG</title><author>Mullins, Elizabeth A ; Portillo, Adrian ; Ruiz-Rohena, Kristalys ; Piplai, Aritran</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2411_043413</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Mullins, Elizabeth A</creatorcontrib><creatorcontrib>Portillo, Adrian</creatorcontrib><creatorcontrib>Ruiz-Rohena, Kristalys</creatorcontrib><creatorcontrib>Piplai, Aritran</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Mullins, Elizabeth A</au><au>Portillo, Adrian</au><au>Ruiz-Rohena, Kristalys</au><au>Piplai, 
Aritran</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Enhancing classroom teaching with LLMs and RAG</atitle><date>2024-11-06</date><risdate>2024</risdate><abstract>Large Language Models have become a valuable source of information for our daily inquiries. However, after training, its data source quickly becomes out-of-date, making RAG a useful tool for providing even more recent or pertinent data. In this work, we investigate how RAG pipelines, with the course materials serving as a data source, might help students in K-12 education. The initial research utilizes Reddit as a data source for up-to-date cybersecurity information. Chunk size is evaluated to determine the optimal amount of context needed to generate accurate answers. After running the experiment for different chunk sizes, answer correctness was evaluated using RAGAs with average answer correctness not exceeding 50 percent for any chunk size. This suggests that Reddit is not a good source to mine for data for questions about cybersecurity threats. The methodology was successful in evaluating the data source, which has implications for its use to evaluate educational resources for effectiveness.</abstract><doi>10.48550/arxiv.2411.04341</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2411.04341
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2411_04341
source	arXiv.org
subjects	Computer Science - Learning
title	Enhancing classroom teaching with LLMs and RAG
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T03%3A18%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Enhancing%20classroom%20teaching%20with%20LLMs%20and%20RAG&rft.au=Mullins,%20Elizabeth%20A&rft.date=2024-11-06&rft_id=info:doi/10.48550/arxiv.2411.04341&rft_dat=%3Carxiv_GOX%3E2411_04341%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true