Optimizing Alignment with Less: Leveraging Data Augmentation for Personalized Evaluation

Automatic evaluation by large language models (LLMs) is a prominent topic today; however, judgment and evaluation tasks are often subjective and influenced by various factors, making adaptation challenging. While many studies demonstrate the capabilities of state-of-the-art proprietary LLMs in comparison to human evaluators, they often struggle to adapt to reference evaluators over time, a requirement for achieving personalized judgment. Additionally, numerous works have attempted to apply open LLMs as judges or evaluators, but these efforts frequently overlook the limitations of working with scarce data. Personalized judgment is inherently associated with limited-data scenarios, which are common in many real-world problems. Our work presents a data augmentation technique to select a more effective sample from limited data in order to align an open LLM with human preference. Our work achieves approximately 7% improvement in Pearson correlation with a reference judge over the baseline, and 30% improvement over the base model (Llama3.1-8B-Instruct) in the mathematical reasoning evaluation task, demonstrating that selecting more effective preference data through augmentation enables our approach to surpass baseline methods.

Full Description

Saved in:
Bibliographic Details
Main Authors: Seraj, Javad; Mohajeri, Mohammad Mahdi; Dousti, Mohammad Javad; Ahmadabadi, Majid Nili
Format: Article
Language: eng
Subjects:
Online Access: Order full text
creator Seraj, Javad
Mohajeri, Mohammad Mahdi
Dousti, Mohammad Javad
Ahmadabadi, Majid Nili
description Automatic evaluation by large language models (LLMs) is a prominent topic today; however, judgment and evaluation tasks are often subjective and influenced by various factors, making adaptation challenging. While many studies demonstrate the capabilities of state-of-the-art proprietary LLMs in comparison to human evaluators, they often struggle to adapt to reference evaluators over time, a requirement for achieving personalized judgment. Additionally, numerous works have attempted to apply open LLMs as judges or evaluators, but these efforts frequently overlook the limitations of working with scarce data. Personalized judgment is inherently associated with limited-data scenarios, which are common in many real-world problems. Our work aims to present a data augmentation technique to select a more effective sample from limited data in order to align an open LLM with human preference. Our work achieves approximately 7% improvement in Pearson correlation with a reference judge over the baseline, and 30% improvement over the base model (Llama3.1-8B-Instruct) in the mathematical reasoning evaluation task, demonstrating that selecting more effective preference data through augmentation enables our approach to surpass baseline methods.
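The abstract reports alignment as Pearson correlation between a model judge and a reference (human) judge. As a minimal sketch of how such a metric is computed, the snippet below correlates two hypothetical score lists; the scores and the `pearson` helper are illustrative, not taken from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical 1-5 ratings from a reference judge and a model judge on the same items
reference_scores = [5, 3, 4, 2, 5, 1]
model_scores     = [4, 3, 5, 2, 4, 2]
print(round(pearson(reference_scores, model_scores), 3))  # → 0.843
```

A correlation near 1.0 would indicate the model judge tracks the reference judge closely; the paper's reported gains are relative improvements in this quantity after fine-tuning on augmented preference data.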
doi_str_mv 10.48550/arxiv.2412.07429
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2412.07429
language eng
recordid cdi_arxiv_primary_2412_07429
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computation and Language
title Optimizing Alignment with Less: Leveraging Data Augmentation for Personalized Evaluation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T18%3A27%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Optimizing%20Alignment%20with%20Less:%20Leveraging%20Data%20Augmentation%20for%20Personalized%20Evaluation&rft.au=Seraj,%20Javad&rft.date=2024-12-10&rft_id=info:doi/10.48550/arxiv.2412.07429&rft_dat=%3Carxiv_GOX%3E2412_07429%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true