Evaluating Morphological Compositional Generalization in Large Language Models
Large language models (LLMs) have demonstrated significant progress in various natural language generation and understanding tasks. However, their linguistic generalization capabilities remain questionable, raising doubts about whether these models learn language similarly to humans. While humans exhibit compositional generalization and linguistic creativity in language use, the extent to which LLMs replicate these abilities, particularly in morphology, is under-explored. In this work, we systematically investigate the morphological generalization abilities of LLMs through the lens of compositionality. We define morphemes as compositional primitives and design a novel suite of generative and discriminative tasks to assess morphological productivity and systematicity. Focusing on agglutinative languages such as Turkish and Finnish, we evaluate several state-of-the-art instruction-finetuned multilingual models, including GPT-4 and Gemini. Our analysis shows that LLMs struggle with morphological compositional generalization, particularly when applied to novel word roots, with performance declining sharply as morphological complexity increases. While models can identify individual morphological combinations better than chance, their performance lacks systematicity, leading to significant accuracy gaps compared to humans.
Saved in:
Main authors: | Ismayilzada, Mete; Circi, Defne; Sälevä, Jonne; Sirin, Hale; Köksal, Abdullatif; Dhingra, Bhuwan; Bosselut, Antoine; van der Plas, Lonneke; Ataman, Duygu |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Artificial Intelligence; Computer Science - Computation and Language |
Online access: | Order full text |
creator | Ismayilzada, Mete; Circi, Defne; Sälevä, Jonne; Sirin, Hale; Köksal, Abdullatif; Dhingra, Bhuwan; Bosselut, Antoine; van der Plas, Lonneke; Ataman, Duygu |
description | Large language models (LLMs) have demonstrated significant progress in
various natural language generation and understanding tasks. However, their
linguistic generalization capabilities remain questionable, raising doubts
about whether these models learn language similarly to humans. While humans
exhibit compositional generalization and linguistic creativity in language use,
the extent to which LLMs replicate these abilities, particularly in morphology,
is under-explored. In this work, we systematically investigate the
morphological generalization abilities of LLMs through the lens of
compositionality. We define morphemes as compositional primitives and design a
novel suite of generative and discriminative tasks to assess morphological
productivity and systematicity. Focusing on agglutinative languages such as
Turkish and Finnish, we evaluate several state-of-the-art instruction-finetuned
multilingual models, including GPT-4 and Gemini. Our analysis shows that LLMs
struggle with morphological compositional generalization particularly when
applied to novel word roots, with performance declining sharply as
morphological complexity increases. While models can identify individual
morphological combinations better than chance, their performance lacks
systematicity, leading to significant accuracy gaps compared to humans. |
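As a toy illustration of what the abstract means by morphemes acting as compositional primitives (this is a hedged sketch, not the paper's actual evaluation code): in an agglutinative language like Turkish, surface forms are built by stacking suffix morphemes onto a root, and a productive speaker can apply the same suffix chain to a root they have never seen before. The snippet below deliberately ignores vowel harmony and consonant alternations, which real Turkish morphology requires.

```python
# Illustrative sketch only: compose surface forms by concatenating
# morphemes treated as compositional primitives. Real Turkish morphology
# also involves vowel harmony and consonant alternations, omitted here.
def compose(root, suffixes):
    """Concatenate a root with an ordered sequence of suffix morphemes."""
    return root + "".join(suffixes)

# "ev" (house) + "-ler" (plural) + "-de" (locative) -> "evlerde" ("in the houses")
print(compose("ev", ["ler", "de"]))   # evlerde

# Productivity probe: the same affix chain applied to a nonce root,
# analogous to testing models on unseen word roots.
print(compose("zep", ["ler", "de"]))  # zeplerde
```

The point of such a probe is that correctness on the nonce root cannot come from memorized word forms; it requires systematically recombining known morphemes, which is the generalization ability the paper's tasks measure.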
doi_str_mv | 10.48550/arxiv.2410.12656 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2410.12656 |
language | eng |
recordid | cdi_arxiv_primary_2410_12656 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence; Computer Science - Computation and Language |
title | Evaluating Morphological Compositional Generalization in Large Language Models |