SBI – Department of Systems Biology and Bioinformatics
Faculty of Computer Science and Electrical Engineering
University of Rostock
Ulmenstrasse 69 | 18057 Rostock
Germany
+49 381 498-7571
olaf.wolkenhauer@uni-rostock.de
The risk of chronic diseases is directly related to one’s diet. There has been a growing need for a smart dietary assessment system that can help people keep track of their food consumption. The first step towards developing such a system is to recognize the food item and estimate its volume from images. State-of-the-art techniques for finding the volume of a food item from its images require a reference calibration object to be placed next to the food item. Other techniques create a 3D reconstruction of the food item from multi-view images to find its volume, and some approaches use complex camera systems such as multiple cameras, stereo cameras or depth cameras. In this thesis, we adopt a different approach in which no calibration object is required and the food item is not explicitly reconstructed in 3D. Instead, the factors that influence 3D reconstruction, namely the multi-view images and the change in camera position, are taken as inputs to a supervised deep learning model. The image sequence is captured with a simple smartphone camera, and the change in the smartphone’s position during capture is obtained from its IMU sensors. Objects in the images are classified using state-of-the-art deep neural networks for image classification. An end-to-end deep neural network takes the multi-view images and the IMU sensor measurements as inputs and fuses them to estimate the volume. LSTMs are used to learn the sequential nature of the inputs.
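The fusion idea described above can be illustrated with a minimal NumPy sketch of a forward pass. It is not the thesis model; everything here is an assumption for illustration: each view is taken to be already encoded into a feature vector by some image CNN, each frame carries one 6-value IMU reading (3 accelerometer plus 3 gyroscope axes), the two sequences go through separate LSTM branches whose final hidden states are concatenated (late fusion), and a softplus output keeps the volume positive. The names `LSTMCell`, `run_lstm` and `VolumeRegressor` are hypothetical, and the weights are random and untrained.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """Standard LSTM cell (input, forget, candidate, output gates)."""

    def __init__(self, input_dim, hidden_dim, rng):
        scale = 1.0 / np.sqrt(input_dim + hidden_dim)
        # One weight matrix for all four gates, stacked along the output axis.
        self.W = rng.normal(0.0, scale, size=(4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        H = self.hidden_dim
        z = self.W @ np.concatenate([x, h]) + self.b
        i = sigmoid(z[0:H])          # input gate
        f = sigmoid(z[H:2 * H])      # forget gate
        g = np.tanh(z[2 * H:3 * H])  # candidate cell state
        o = sigmoid(z[3 * H:4 * H])  # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        return h, c


def run_lstm(cell, sequence):
    """Feed a (T, input_dim) sequence through the cell; return the last hidden state."""
    h = np.zeros(cell.hidden_dim)
    c = np.zeros(cell.hidden_dim)
    for x in sequence:
        h, c = cell.step(x, h, c)
    return h


class VolumeRegressor:
    """Hypothetical two-branch model: image-feature LSTM + IMU LSTM, late fusion."""

    def __init__(self, img_feat_dim, imu_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.img_lstm = LSTMCell(img_feat_dim, hidden_dim, rng)
        self.imu_lstm = LSTMCell(imu_dim, hidden_dim, rng)
        self.w_out = rng.normal(0.0, 0.1, size=2 * hidden_dim)
        self.b_out = 0.0

    def predict(self, img_features, imu_readings):
        h_img = run_lstm(self.img_lstm, img_features)   # summary of the views
        h_imu = run_lstm(self.imu_lstm, imu_readings)   # summary of camera motion
        fused = np.concatenate([h_img, h_imu])          # fusion by concatenation
        # Numerically stable softplus, log(1 + e^x), keeps the estimate positive.
        return float(np.logaddexp(0.0, self.w_out @ fused + self.b_out))
```

A usage example with random stand-in data for an 8-frame capture: `VolumeRegressor(img_feat_dim=128, imu_dim=6, hidden_dim=32).predict(np.random.default_rng(1).normal(size=(8, 128)), np.random.default_rng(2).normal(size=(8, 6)))` returns one positive float. It only becomes a volume estimate after supervised training on frames with known volumes. A single LSTM over per-frame concatenated image and IMU vectors (early fusion) would be an equally plausible design.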
Experimental results show an image classification accuracy of 99.23%. The error analysis of the volume predictions shows that the image features and the size of the dataset played an important role in model performance. The volume predictions improved when the image features were better represented in the deep learning model.
Location: Ulmencampus - Building 3 - Room 410