TASTEset -- Recipe Dataset and Food Entities Recognition Benchmark

Food Computing is currently a fast-growing field of research. Natural language processing (NLP) is also increasingly essential in this field, especially for recognising food entities. However, there are still only a few well-defined tasks that serve as benchmarks for solutions in this area. We intro...

Bibliographic details
Main authors:	Wróblewska, Ania; Kaliska, Agnieszka; Pawłowski, Maciej; Wiśniewski, Dawid; Sosnowski, Witold; Ławrynowicz, Agnieszka
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Wróblewska, Ania; Kaliska, Agnieszka; Pawłowski, Maciej; Wiśniewski, Dawid; Sosnowski, Witold; Ławrynowicz, Agnieszka
description	Food Computing is currently a fast-growing field of research. Natural language processing (NLP) is also increasingly essential in this field, especially for recognising food entities. However, there are still only a few well-defined tasks that serve as benchmarks for solutions in this area. We introduce a new dataset, called TASTEset, to bridge this gap. In this dataset, Named Entity Recognition (NER) models are expected to find or infer various types of entities helpful in processing recipes, e.g. food products, quantities and their units, names of cooking processes, physical quality of ingredients, their purpose, and taste. The dataset consists of 700 recipes with more than 13,000 entities to extract. We provide a few state-of-the-art baselines of named entity recognition models, which show that our dataset poses a solid challenge to existing models. The best model achieved an average $F_1$ score of 0.95, with per-entity-type scores ranging from 0.781 to 0.982. We share the dataset and the task to encourage progress on more in-depth and complex information extraction from recipes.
doi_str_mv	10.48550/arxiv.2204.07775
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2204_07775</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2204_07775</sourcerecordid><originalsourceid>FETCH-LOGICAL-a675-3b97d73328eb94c91057adc1e62b3f37247d365d757ffff11a531db9060368123</originalsourceid><addsrcrecordid>eNotj8FOwzAQRH3hgFo-gBP-AQfbG3uTY1tSQKpUqeQebWKntQpOlUQI_p6kdC4zGo129Rh7VDJJM2PkM_U_4TvRWqaJRERzz9bl6qMsBj9yIfjBN-Hi-QuNNDcUHd92neNFHMMY_DAPumOcchf52sfm9EX9ecnuWvoc_MPNF6zcFuXmTez2r--b1U6QRSOgztEhgM58nadNrqRBco3yVtfQAuoUHVjj0GA7SSkyoFydSyvBZkrDgj39n71CVJc-TM9_qxmmusLAH4OYQmg</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>TASTEset -- Recipe Dataset and Food Entities Recognition Benchmark</title><source>arXiv.org</source><creator>Wróblewska, Ania ; Kaliska, Agnieszka ; Pawłowski, Maciej ; Wiśniewski, Dawid ; Sosnowski, Witold ; Ławrynowicz, Agnieszka</creator><creatorcontrib>Wróblewska, Ania ; Kaliska, Agnieszka ; Pawłowski, Maciej ; Wiśniewski, Dawid ; Sosnowski, Witold ; Ławrynowicz, Agnieszka</creatorcontrib><description>Food Computing is currently a fast-growing field of research. Natural language processing (NLP) is also increasingly essential in this field, especially for recognising food entities. However, there are still only a few well-defined tasks that serve as benchmarks for solutions in this area. We introduce a new dataset -- called \textit{TASTEset} -- to bridge this gap. In this dataset, Named Entity Recognition (NER) models are expected to find or infer various types of entities helpful in processing recipes, e.g.~food products, quantities and their units, names of cooking processes, physical quality of ingredients, their purpose, taste. The dataset consists of 700 recipes with more than 13,000 entities to extract. 
We provide a few state-of-the-art baselines of named entity recognition models, which show that our dataset poses a solid challenge to existing models. The best model achieved, on average, 0.95 $F_1$ score, depending on the entity type -- from 0.781 to 0.982. We share the dataset and the task to encourage progress on more in-depth and complex information extraction from recipes.</description><identifier>DOI: 10.48550/arxiv.2204.07775</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Computer Science - Learning</subject><creationdate>2022-04</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2204.07775$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2204.07775$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Wróblewska, Ania</creatorcontrib><creatorcontrib>Kaliska, Agnieszka</creatorcontrib><creatorcontrib>Pawłowski, Maciej</creatorcontrib><creatorcontrib>Wiśniewski, Dawid</creatorcontrib><creatorcontrib>Sosnowski, Witold</creatorcontrib><creatorcontrib>Ławrynowicz, Agnieszka</creatorcontrib><title>TASTEset -- Recipe Dataset and Food Entities Recognition Benchmark</title><description>Food Computing is currently a fast-growing field of research. Natural language processing (NLP) is also increasingly essential in this field, especially for recognising food entities. However, there are still only a few well-defined tasks that serve as benchmarks for solutions in this area. 
We introduce a new dataset -- called \textit{TASTEset} -- to bridge this gap. In this dataset, Named Entity Recognition (NER) models are expected to find or infer various types of entities helpful in processing recipes, e.g.~food products, quantities and their units, names of cooking processes, physical quality of ingredients, their purpose, taste. The dataset consists of 700 recipes with more than 13,000 entities to extract. We provide a few state-of-the-art baselines of named entity recognition models, which show that our dataset poses a solid challenge to existing models. The best model achieved, on average, 0.95 $F_1$ score, depending on the entity type -- from 0.781 to 0.982. We share the dataset and the task to encourage progress on more in-depth and complex information extraction from recipes.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Computation and Language</subject><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj8FOwzAQRH3hgFo-gBP-AQfbG3uTY1tSQKpUqeQebWKntQpOlUQI_p6kdC4zGo129Rh7VDJJM2PkM_U_4TvRWqaJRERzz9bl6qMsBj9yIfjBN-Hi-QuNNDcUHd92neNFHMMY_DAPumOcchf52sfm9EX9ecnuWvoc_MPNF6zcFuXmTez2r--b1U6QRSOgztEhgM58nadNrqRBco3yVtfQAuoUHVjj0GA7SSkyoFydSyvBZkrDgj39n71CVJc-TM9_qxmmusLAH4OYQmg</recordid><startdate>20220416</startdate><enddate>20220416</enddate><creator>Wróblewska, Ania</creator><creator>Kaliska, Agnieszka</creator><creator>Pawłowski, Maciej</creator><creator>Wiśniewski, Dawid</creator><creator>Sosnowski, Witold</creator><creator>Ławrynowicz, Agnieszka</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20220416</creationdate><title>TASTEset -- Recipe Dataset and Food Entities Recognition Benchmark</title><author>Wróblewska, Ania ; Kaliska, Agnieszka ; Pawłowski, Maciej ; Wiśniewski, Dawid ; Sosnowski, Witold ; 
Ławrynowicz, Agnieszka</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a675-3b97d73328eb94c91057adc1e62b3f37247d365d757ffff11a531db9060368123</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Computation and Language</topic><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Wróblewska, Ania</creatorcontrib><creatorcontrib>Kaliska, Agnieszka</creatorcontrib><creatorcontrib>Pawłowski, Maciej</creatorcontrib><creatorcontrib>Wiśniewski, Dawid</creatorcontrib><creatorcontrib>Sosnowski, Witold</creatorcontrib><creatorcontrib>Ławrynowicz, Agnieszka</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Wróblewska, Ania</au><au>Kaliska, Agnieszka</au><au>Pawłowski, Maciej</au><au>Wiśniewski, Dawid</au><au>Sosnowski, Witold</au><au>Ławrynowicz, Agnieszka</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>TASTEset -- Recipe Dataset and Food Entities Recognition Benchmark</atitle><date>2022-04-16</date><risdate>2022</risdate><abstract>Food Computing is currently a fast-growing field of research. Natural language processing (NLP) is also increasingly essential in this field, especially for recognising food entities. However, there are still only a few well-defined tasks that serve as benchmarks for solutions in this area. We introduce a new dataset -- called \textit{TASTEset} -- to bridge this gap. 
In this dataset, Named Entity Recognition (NER) models are expected to find or infer various types of entities helpful in processing recipes, e.g.~food products, quantities and their units, names of cooking processes, physical quality of ingredients, their purpose, taste. The dataset consists of 700 recipes with more than 13,000 entities to extract. We provide a few state-of-the-art baselines of named entity recognition models, which show that our dataset poses a solid challenge to existing models. The best model achieved, on average, 0.95 $F_1$ score, depending on the entity type -- from 0.781 to 0.982. We share the dataset and the task to encourage progress on more in-depth and complex information extraction from recipes.</abstract><doi>10.48550/arxiv.2204.07775</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2204.07775
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2204_07775
source	arXiv.org
subjects	Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning
title	TASTEset -- Recipe Dataset and Food Entities Recognition Benchmark
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T21%3A10%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=TASTEset%20--%20Recipe%20Dataset%20and%20Food%20Entities%20Recognition%20Benchmark&rft.au=Wr%C3%B3blewska,%20Ania&rft.date=2022-04-16&rft_id=info:doi/10.48550/arxiv.2204.07775&rft_dat=%3Carxiv_GOX%3E2204_07775%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true