Binding Language Models in Symbolic Languages

Though end-to-end neural approaches have recently been dominating NLP tasks in both performance and ease-of-use, they lack interpretability and robustness. We propose Binder, a training-free neural-symbolic framework that maps the task input to a program, which (1) allows binding a unified API of language model (LM) functionalities to a programming language (e.g., SQL, Python) to extend its grammar coverage and thus tackle more diverse questions, (2) adopts an LM as both the program parser and the underlying model called by the API during execution, and (3) requires only a few in-context exemplar annotations. Specifically, we employ GPT-3 Codex as the LM. In the parsing stage, with only a few in-context exemplars, Codex is able to identify the part of the task input that cannot be answerable by the original programming language, correctly generate API calls to prompt Codex to solve the unanswerable part, and identify where to place the API calls while being compatible with the original grammar. In the execution stage, Codex can perform versatile functionalities (e.g., commonsense QA, information extraction) given proper prompts in the API calls. Binder achieves state-of-the-art results on WikiTableQuestions and TabFact datasets, with explicit output programs that benefit human debugging. Note that previous best systems are all finetuned on tens of thousands of task-specific samples, while Binder only uses dozens of annotations as in-context exemplars without any training. Our code is available at https://github.com/HKUNLP/Binder .
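The two-stage pipeline the abstract describes can be sketched in miniature: a parsed Binder-SQL program embeds an LM call of the form `f("prompt"; column)` inside otherwise ordinary SQL, and execution resolves each such call per row before running the remaining pure SQL. The sketch below is illustrative, not the paper's implementation: `toy_lm` is a hypothetical lookup standing in for Codex, and the materialize-as-extra-column rewrite is a simplification of Binder's actual execution strategy.

```python
import re
import sqlite3

# Hypothetical stand-in for the LM called by the Binder API during
# execution (the real framework prompts Codex; a lookup keeps this
# sketch self-contained and runnable).
def toy_lm(prompt: str, value: str) -> str:
    europe = {"France", "Germany", "Spain"}
    return "yes" if value in europe else "no"

def execute_binder_sql(program: str, rows: list) -> list:
    """Resolve f("..."; col) calls by querying the LM per row, then run
    the remaining pure SQL over an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (name TEXT, country TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

    # Find the embedded LM call, e.g. f("Is this country in Europe?"; country)
    call = re.search(r'f\("([^"]+)";\s*(\w+)\)', program)
    if call:
        prompt, col = call.groups()
        # Materialize the LM's per-row answers as an extra column...
        conn.execute("ALTER TABLE t ADD COLUMN lm_out TEXT")
        for rowid, val in conn.execute(f"SELECT rowid, {col} FROM t").fetchall():
            conn.execute("UPDATE t SET lm_out = ? WHERE rowid = ?",
                         (toy_lm(prompt, val), rowid))
        # ...then rewrite the program to reference that column instead.
        program = program.replace(call.group(0), "lm_out")
    return conn.execute(program).fetchall()

rows = [("Alice", "France"), ("Bob", "Japan"), ("Carol", "Spain")]
prog = 'SELECT name FROM t WHERE f("Is this country in Europe?"; country) = \'yes\''
print(execute_binder_sql(prog, rows))  # [('Alice',), ('Carol',)]
```

The `f("…"; col)` syntax follows the Binder-SQL examples in the paper; everything else (the table schema, the regex-based rewriting, the toy LM) is an assumption made for the sake of a runnable illustration.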

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Cheng, Zhoujun, Xie, Tianbao, Shi, Peng, Li, Chengzu, Nadkarni, Rahul, Hu, Yushi, Xiong, Caiming, Radev, Dragomir, Ostendorf, Mari, Zettlemoyer, Luke, Smith, Noah A, Yu, Tao
Format: Article
Published: 2022-10-06
Language: English
Subjects: Computer Science - Computation and Language
Rights: CC BY 4.0 (http://creativecommons.org/licenses/by/4.0)
Online Access: Order full text
DOI: 10.48550/arxiv.2210.02875
Source: arXiv.org