Chumor 2.0: Towards Benchmarking Chinese Humor Understanding
Existing humor datasets and evaluations predominantly focus on English, leaving limited resources for culturally nuanced humor in non-English languages like Chinese. To address this gap, we construct Chumor, the first Chinese humor explanation dataset that exceeds the size of existing humor datasets. Chumor is sourced from Ruo Zhi Ba, a Chinese Reddit-like platform known for sharing intellectually challenging and culturally specific jokes. We test ten LLMs through direct and chain-of-thought prompting, revealing that Chumor poses significant challenges to existing LLMs, with their accuracy slightly above random and far below human. In addition, our analysis highlights that human-annotated humor explanations are significantly better than those generated by GPT-4o and ERNIE-4-turbo. We release Chumor at https://huggingface.co/datasets/dnaihao/Chumor, our project page is at https://dnaihao.github.io/Chumor-dataset/, our leaderboard is at https://huggingface.co/spaces/dnaihao/Chumor, and our codebase is at https://github.com/dnaihao/Chumor-dataset.
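The evaluation setup described in the abstract (direct vs. chain-of-thought prompting on joke explanations, scored by accuracy) can be sketched as follows. This is a minimal illustration only: the prompt wording, the binary yes/no framing, and the function names are assumptions for the sketch, not the authors' exact templates.

```python
# Illustrative sketch of the two prompting conditions from the abstract:
# direct prompting vs. chain-of-thought (CoT) prompting, plus the accuracy
# metric used to compare LLMs against the human baseline. All prompt text
# here is hypothetical, not the paper's actual template.

def build_prompt(joke: str, explanation: str, chain_of_thought: bool) -> str:
    """Format a judgment prompt for one joke/explanation pair."""
    prompt = (
        f"Joke: {joke}\n"
        f"Candidate explanation: {explanation}\n"
        "Is this explanation correct? Answer 'yes' or 'no'."
    )
    if chain_of_thought:
        # CoT condition: elicit step-by-step reasoning before the answer.
        prompt += "\nLet's think step by step before answering."
    return prompt

def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of model judgments that match the gold labels."""
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)
```

Each LLM would be queried once per pair under each condition; the paper reports that the resulting accuracies are only slightly above random and far below human performance.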
Saved in:
Main authors: | He, Ruiqi; He, Yushu; Bai, Longju; Liu, Jiarui; Sun, Zhenjie; Tang, Zenghao; Wang, He; Xia, Hanchen; Mihalcea, Rada; Deng, Naihao |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Artificial Intelligence; Computer Science - Computation and Language |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | He, Ruiqi; He, Yushu; Bai, Longju; Liu, Jiarui; Sun, Zhenjie; Tang, Zenghao; Wang, He; Xia, Hanchen; Mihalcea, Rada; Deng, Naihao |
description | Existing humor datasets and evaluations predominantly focus on English,
leaving limited resources for culturally nuanced humor in non-English languages
like Chinese. To address this gap, we construct Chumor, the first Chinese humor
explanation dataset that exceeds the size of existing humor datasets. Chumor is
sourced from Ruo Zhi Ba, a Chinese Reddit-like platform known for sharing
intellectually challenging and culturally specific jokes. We test ten LLMs
through direct and chain-of-thought prompting, revealing that Chumor poses
significant challenges to existing LLMs, with their accuracy slightly above
random and far below human. In addition, our analysis highlights that
human-annotated humor explanations are significantly better than those
generated by GPT-4o and ERNIE-4-turbo. We release Chumor at
https://huggingface.co/datasets/dnaihao/Chumor, our project page is at
https://dnaihao.github.io/Chumor-dataset/, our leaderboard is at
https://huggingface.co/spaces/dnaihao/Chumor, and our codebase is at
https://github.com/dnaihao/Chumor-dataset. |
doi_str_mv | 10.48550/arxiv.2412.17729 |
format | Article |
creationdate | 2024-12-23 |
rights | http://creativecommons.org/licenses/by-nc-sa/4.0 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2412.17729 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2412_17729 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence; Computer Science - Computation and Language |
title | Chumor 2.0: Towards Benchmarking Chinese Humor Understanding |