From Bytes to Bites: Advancing Data Collection Methodologies for Enhanced Branded Food Insights
Published in: | Proceedings of the Nutrition Society 2024-11, Vol. 83 (OCE4) |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Nutrition research relies on food databases, which are used extensively in dietary surveys, clinical practice, research, and policy development (1). Online data volume is expected to reach up to 180 zettabytes by 2025, driven by the proliferation of internet-connected devices, the growth of social media platforms, and the digital transformation of industries (2). Webscraping, a method of extracting data from websites, has previously been used in Ireland to evaluate online retailer information as a potential source for monitoring food reformulation efforts in the Irish retail market (3). This study aims to outline a process for, and evaluate the use of, webscraping on online supermarket websites to increase data availability to researchers. An online supermarket website was selected to trial the new process, and Octoparse software version 8 was downloaded. Twelve data fields of interest were identified: cost, lifestyle, net weight, directions for use, storage instructions, nutrition information, front-of-pack information, legal name, brand name, manufacturer, ingredients, and allergy advice. The webscraping process was defined in four main steps: 1) collection of category-level URLs, 2) collection of product-level URLs, 3) collection of data at product level within the defined fields, and 4) data cleaning and restructuring. A workflow was created in Octoparse for steps 1 to 3, and step 4 was completed using Excel version 16.69.1. In total, 83 category-level page links were generated and entered into Octoparse. Webscraping was completed on 3,095 product-level URLs. Data on 1,450 products (47%) were successfully scraped, as these products had data within the 12 defined data fields. A new dataset was created for the 1,450 products, with data fields including information on nutrition (energy, fat, of which saturates, carbohydrate, of which sugars, fibre, protein, and salt), cost per serving and per kg, lifestyle factors (e.g. whether a product was vegetarian or vegan), ingredient lists, and allergy advice. Of these, 637 products (44%) were found to have vegetarian/vegan claims. Micronutrient-level data were limited. The increased availability of online data presents an opportunity to develop new and more systematically updated datasets, and may increase the availability of information on branded products. Webscraping enables researchers to create new databases, and to systematically update datasets, with fewer resources. This study enhances the availability of data and may enable researc |
---|---|
ISSN: | 0029-6651, 1475-2719 |
DOI: | 10.1017/S0029665124007249 |
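
The fourth step in the summary (data cleaning and restructuring) was carried out in Excel, and the abstract gives no further detail. Purely as an illustration, the Python sketch below shows what such a step might look like for scraped product records. The field keys (`cost`, `net_weight`, `nutrition`, `legal_name`), the price and weight text formats, and the rule that a record is dropped when any required field is empty are all assumptions made for this example, not details from the study. The final helper reproduces the percentage arithmetic reported in the summary.

```python
import re
from typing import Optional

# Hypothetical keys: the abstract names 12 fields but not how they were stored.
REQUIRED_FIELDS = ("cost", "net_weight", "nutrition")


def parse_price(text: str) -> Optional[float]:
    """Extract the first number from a price string such as '€2.50'."""
    match = re.search(r"(\d+(?:\.\d+)?)", text or "")
    return float(match.group(1)) if match else None


def parse_weight_kg(text: str) -> Optional[float]:
    """Convert a net weight such as '500g' or '1.5 kg' to kilograms."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*(kg|g)\b", (text or "").lower())
    if not match:
        return None
    value, unit = float(match.group(1)), match.group(2)
    return value if unit == "kg" else value / 1000.0


def clean_record(raw: dict) -> Optional[dict]:
    """Return a restructured record, or None if required data are missing."""
    # Assumed rule: drop the product when any required field is empty.
    if any(not (raw.get(field) or "").strip() for field in REQUIRED_FIELDS):
        return None
    price = parse_price(raw["cost"])
    weight_kg = parse_weight_kg(raw["net_weight"])
    if price is None or not weight_kg:
        return None
    return {
        "name": raw.get("legal_name", ""),
        "price": price,
        "weight_kg": weight_kg,
        "cost_per_kg": round(price / weight_kg, 2),
    }


def share_percent(part: int, whole: int) -> int:
    """Express part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


if __name__ == "__main__":
    example = {  # invented record, for illustration only
        "legal_name": "Oat Biscuits",
        "cost": "€2.50",
        "net_weight": "500g",
        "nutrition": "Energy 450kcal",
    }
    print(clean_record(example))
    print(share_percent(1450, 3095))  # products scraped out of product URLs -> 47
    print(share_percent(637, 1450))   # vegetarian/vegan claims -> 44
```

A real cleaning step would also need to handle multipacks, volumes in ml or l, and per-serving nutrition values, none of which this sketch attempts.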