CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark

Artificial Intelligence (AI), along with the recent progress in biomedical language understanding, is gradually changing medical practice. With the development of biomedical language understanding benchmarks, AI applications are widely used in the medical field. However, most benchmarks are limited to English, which makes it challenging to replicate many of the successes in English for other languages. To facilitate research in this direction, we collect real-world biomedical data and present the first Chinese Biomedical Language Understanding Evaluation (CBLUE) benchmark: a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, and single-sentence/sentence-pair classification, together with an associated online platform for model evaluation, comparison, and analysis. To establish evaluation on these tasks, we report empirical results for 11 current pre-trained Chinese models; the results show that state-of-the-art neural models still perform far worse than the human ceiling. Our benchmark is released at https://tianchi.aliyun.com/dataset/dataDetail?dataId=95414&lang=en-us.
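
As an illustration only (not taken from the paper), the short Python sketch below shows how one of the CBLUE-style tasks, sentence-pair classification, could be approached with a pre-trained Chinese model via the Hugging Face transformers library. The model choice, example sentences, and label count are assumptions for demonstration, and the classification head is untrained, so the output scores are meaningless until the model is fine-tuned on the actual benchmark data.

    # Hypothetical sketch: scoring a Chinese sentence pair with a pre-trained model.
    # "bert-base-chinese", the example texts, and num_labels=2 are illustrative
    # assumptions, not the authors' experimental setup.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    model_name = "bert-base-chinese"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # Toy sentence pair in the style of a medical question-matching task.
    text_a = "糖尿病患者可以吃西瓜吗"
    text_b = "血糖高的人能不能吃西瓜"
    inputs = tokenizer(text_a, text_b, return_tensors="pt", truncation=True, max_length=128)

    with torch.no_grad():
        logits = model(**inputs).logits  # shape (1, 2), from an untrained head

    print(logits.softmax(dim=-1))  # fine-tuning on benchmark data is needed for real scores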

Bibliographic Details
Main Authors: Zhang, Ningyu; Chen, Mosha; Bi, Zhen; Liang, Xiaozhuan; Li, Lei; Shang, Xin; Yin, Kangping; Tan, Chuanqi; Xu, Jian; Huang, Fei; Si, Luo; Ni, Yuan; Xie, Guotong; Sui, Zhifang; Chang, Baobao; Zong, Hui; Yuan, Zheng; Li, Linfeng; Yan, Jun; Zan, Hongying; Zhang, Kunli; Tang, Buzhou; Chen, Qingcai
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Information Retrieval; Computer Science - Learning
DOI: 10.48550/arxiv.2106.08087
Published: 2021-06-15 (arXiv)
Source: arXiv.org
Online Access: https://arxiv.org/abs/2106.08087