TASTEset -- Recipe Dataset and Food Entities Recognition Benchmark

Food Computing is currently a fast-growing field of research. Natural language processing (NLP) is also increasingly essential in this field, especially for recognising food entities. However, there are still only a few well-defined tasks that serve as benchmarks for solutions in this area. We intro...

Detailed description

Bibliographic details
Main authors: Wróblewska, Ania; Kaliska, Agnieszka; Pawłowski, Maciej; Wiśniewski, Dawid; Sosnowski, Witold; Ławrynowicz, Agnieszka
Format: Article
Language: English
creator Wróblewska, Ania
Kaliska, Agnieszka
Pawłowski, Maciej
Wiśniewski, Dawid
Sosnowski, Witold
Ławrynowicz, Agnieszka
description Food computing is a fast-growing field of research, and natural language processing (NLP) is increasingly essential to it, especially for recognising food entities. However, there are still only a few well-defined tasks that serve as benchmarks for solutions in this area. We introduce a new dataset, TASTEset, to bridge this gap. In this dataset, Named Entity Recognition (NER) models are expected to find or infer various types of entities helpful in processing recipes, e.g. food products, quantities and their units, names of cooking processes, physical qualities of ingredients, their purpose, and taste. The dataset consists of 700 recipes with more than 13,000 entities to extract. We provide a few state-of-the-art NER baselines, which show that our dataset poses a solid challenge to existing models. The best model achieved an average F1 score of 0.95; per-entity-type scores range from 0.781 to 0.982. We share the dataset and the task to encourage progress on more in-depth and complex information extraction from recipes.
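To make the reported per-entity-type and average F1 figures more concrete, the following is a minimal sketch of how entity-level F1 could be computed for one annotated ingredient line. Everything in it is an assumption for illustration: the example text, the label names (`QUANTITY`, `UNIT`, `PROCESS`, `FOOD`), exact-span matching, and micro-averaging are not taken from the record and may differ from the dataset's actual tag set and the authors' evaluation protocol.

```python
from collections import defaultdict

# Hypothetical example: entities are (start_char, end_char, label) spans.
# The labels are illustrative and not confirmed as the dataset's tag set.
text = "2 cups chopped onions"
gold = {(0, 1, "QUANTITY"), (2, 6, "UNIT"), (7, 14, "PROCESS"), (15, 21, "FOOD")}
pred = {(0, 1, "QUANTITY"), (2, 6, "UNIT"), (7, 21, "FOOD")}


def f1(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_type_f1(gold, pred):
    """F1 per label; a prediction counts only on an exact span and label match."""
    counts = defaultdict(lambda: [0, 0, 0])  # label -> [tp, fp, fn]
    for ent in pred:
        counts[ent[2]][0 if ent in gold else 1] += 1
    for ent in gold - pred:
        counts[ent[2]][2] += 1
    return {label: f1(*c) for label, c in counts.items()}


def micro_f1(gold, pred):
    """Micro-averaged F1 over all entities (an assumed averaging scheme)."""
    return f1(len(gold & pred), len(pred - gold), len(gold - pred))


print(per_type_f1(gold, pred))
print(round(micro_f1(gold, pred), 3))
```

Here the hypothetical prediction merges "chopped onions" into a single `FOOD` span, so both `PROCESS` and `FOOD` score zero under exact matching while `QUANTITY` and `UNIT` score one. This is how per-type scores can differ widely from the overall average.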
doi_str_mv 10.48550/arxiv.2204.07775
format Article
recordtype article
sourcetype Open Access Repository
creationdate 2022-04-16
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
oa free_for_read
collection arXiv Computer Science; arXiv.org
linktorsrc https://arxiv.org/abs/2204.07775
backlink https://doi.org/10.48550/arXiv.2204.07775
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2204.07775
language eng
recordid cdi_arxiv_primary_2204_07775
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Computer Science - Learning
title TASTEset -- Recipe Dataset and Food Entities Recognition Benchmark
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T21%3A10%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=TASTEset%20--%20Recipe%20Dataset%20and%20Food%20Entities%20Recognition%20Benchmark&rft.au=Wr%C3%B3blewska,%20Ania&rft.date=2022-04-16&rft_id=info:doi/10.48550/arxiv.2204.07775&rft_dat=%3Carxiv_GOX%3E2204_07775%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true