Automatic food recognition systems have been receiving increasing attention in the research community with the advancements in inductive learning (e.g., classification in computer vision) due to their applicability in the healthcare and hospitality industries. However, food recognition is challenging due to its fine-grained nature and its high correlation with culture, geo-location, and language. To make food recognition systems feasible for the Middle Eastern region, we present a large-scale dataset (MEFood) of commonly consumed food items in the Middle East, thereby providing a dataset for current development and establishing a benchmark for future research. We have also thoroughly examined the MEFood dataset, highlighting its challenging aspects and its real-world nature. Additionally, we have conducted a thorough experimental study benchmarking mainstream computer vision and mobile networks on classification, runtime, and resource utilization metrics. Our results highlight that EfficientNet-V2 achieves performance close to that of the best-performing individual model on the MEFood dataset while having the lowest resource utilization and the shortest inference times. Finally, we have performed a thorough error analysis to glean additional insights about the networks and the MEFood dataset.
               