Abstract
It is well known that people love food; however, an unhealthy diet can cause serious health problems. Since health is strictly linked to diet, advanced computer vision tools that recognize food images (e.g., acquired with mobile or wearable cameras) and their properties (e.g., calories) can support diet monitoring by providing useful information to experts (e.g., nutritionists) who assess the food intake of patients (e.g., to combat obesity). Food recognition is a challenging task, since food is intrinsically deformable and highly variable in appearance, so image representation plays a fundamental role. To properly study the peculiarities of image representation in the food application context, a benchmark dataset is needed. These facts motivate the work presented in this paper. We introduce the UNICT-FD889 dataset, the first food image dataset composed of over \(800\) distinct plates of food, which can be used as a benchmark to design and compare representation models of food images. We exploit the UNICT-FD889 dataset for Near Duplicate Image Retrieval (NDIR) by comparing three standard state-of-the-art image descriptors: Bag of Textons, PRICoLBP, and SIFT. Results confirm that both texture and color are fundamental properties in food representation. Moreover, the experiments point out that the Bag of Textons representation computed in the color domain is more accurate than the other two approaches for NDIR.
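The Bag of Textons pipeline the abstract refers to can be sketched in a few steps: compute per-pixel filter-bank responses, cluster them into a texton vocabulary, describe each image as a histogram of texton assignments, and rank database images by histogram distance for NDIR. The following is a minimal grayscale sketch under assumed parameters (a small Gaussian/gradient filter bank, k-means vocabulary, L1 distance); the paper's actual setup operates in the color domain and its exact filter bank and distance may differ, and all function names here are illustrative.

```python
# Minimal Bag-of-Textons sketch for near-duplicate image retrieval.
# Filter bank, vocabulary size, and distance are illustrative assumptions,
# not the paper's exact configuration (which works in the color domain).
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.cluster.vq import kmeans2, vq

def filter_responses(img, sigmas=(1.0, 2.0, 4.0)):
    """Per-pixel features: smoothed intensity and gradient magnitude per scale."""
    feats = []
    for s in sigmas:
        feats.append(gaussian_filter(img, s))           # blurred intensity
        gx = gaussian_filter(img, s, order=(0, 1))      # derivative along x
        gy = gaussian_filter(img, s, order=(1, 0))      # derivative along y
        feats.append(np.hypot(gx, gy))                  # gradient magnitude
    # One row per pixel, one column per filter response.
    return np.stack(feats, axis=-1).reshape(-1, 2 * len(sigmas))

def learn_textons(images, k=32):
    """Cluster pooled per-pixel responses into a texton vocabulary."""
    pooled = np.vstack([filter_responses(im) for im in images])
    textons, _ = kmeans2(pooled, k, minit='++')
    return textons

def bag_of_textons(img, textons):
    """L1-normalized histogram of nearest-texton assignments."""
    labels, _ = vq(filter_responses(img), textons)
    hist = np.bincount(labels, minlength=len(textons)).astype(float)
    return hist / hist.sum()

def retrieve(query_hist, db_hists):
    """Rank database images by L1 distance to the query histogram."""
    return np.argsort(np.abs(db_hists - query_hist).sum(axis=1))
```

In an NDIR evaluation, each query plate is represented by its texton histogram and the database images are ranked by distance; a near-duplicate retrieval counts as correct when the matching plate is ranked first.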
© 2015 Springer International Publishing Switzerland
Cite this paper
Farinella, G.M., Allegra, D., Stanco, F. (2015). A Benchmark Dataset to Study the Representation of Food Images. In: Agapito, L., Bronstein, M., Rother, C. (eds) Computer Vision - ECCV 2014 Workshops. ECCV 2014. Lecture Notes in Computer Science(), vol 8927. Springer, Cham. https://doi.org/10.1007/978-3-319-16199-0_41
DOI: https://doi.org/10.1007/978-3-319-16199-0_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16198-3
Online ISBN: 978-3-319-16199-0
eBook Packages: Computer Science, Computer Science (R0)