DCNN:
We proposed applying DCNN features, extracted from a DCNN pre-trained on the ILSVRC 1000-class dataset, to food photo recognition. In the experimental results, we achieved the best classification accuracy, 72.26%, on the UEC-FOOD100 dataset, which demonstrated that DCNN features can improve classification performance when integrated with conventional features. For future work, we will implement the proposed framework on mobile phones. To do that, it is necessary to reduce the size of the pre-trained DCNN parameters, which consist of around 60 million floating-point values.
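As a hedged sketch of this idea (not the paper's exact pipeline), the snippet below uses Keras to load an ILSVRC-pretrained VGG16 as a frozen feature extractor and adds a small 100-way head for UEC-FOOD100. The choice of VGG16, the head sizes, and the omission of the conventional hand-crafted features are all assumptions made for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

# DCNN pre-trained on the ILSVRC 1000-class dataset; include_top=False drops the
# original 1000-way classifier, pooling="avg" returns one 512-D feature per image.
backbone = VGG16(weights="imagenet", include_top=False, pooling="avg")
backbone.trainable = False  # keep the tens of millions of pre-trained weights fixed

# Assumed small head mapping the 512-D DCNN features to 100 UEC-FOOD100 categories.
food_model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(100, activation="softmax"),
])
food_model.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])

# Dummy batch standing in for 224x224 RGB food photographs.
batch = preprocess_input(np.random.rand(2, 224, 224, 3) * 255.0)
print(food_model(batch).shape)  # (2, 100) class probabilities
```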
In this paper, we investigate the effectiveness of similarity learning for food image retrieval. We tested three kinds of CNN and found that the Triplet Network was the most powerful architecture compared with the others. We also showed that the performance of the Triplet Network can be improved by combining it with a classification task.
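A minimal sketch of the triplet objective behind this kind of similarity learning, assuming a toy PyTorch embedding network and a standard margin of 1.0 (the paper's actual backbone and settings are not given in these notes): an anchor food image is pulled towards a positive image of the same dish and pushed away from a negative image of a different dish.

```python
import torch
import torch.nn as nn

# Stand-in embedding CNN; any backbone that outputs a fixed-size vector works here.
embed = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 64),
)
triplet_loss = nn.TripletMarginLoss(margin=1.0)

anchor   = torch.randn(8, 3, 64, 64)  # images of some dish
positive = torch.randn(8, 3, 64, 64)  # other images of the same dish
negative = torch.randn(8, 3, 64, 64)  # images of different dishes

loss = triplet_loss(embed(anchor), embed(positive), embed(negative))
loss.backward()  # gradients pull same-dish embeddings together, push others apart
```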
CNN stands for convolutional neural network. It mainly uses pooling to reduce the processing time and power consumption required to process the data by reducing its dimensionality [8]. In addition, a CNN works in multiple layers, where the output of the first layer serves as the input to the second layer, the output of the second layer serves as the input to the third layer, and so on. After making the images suitable for a multilayer perceptron, the CNN flattens them into column vectors. These flattened vectors are fed into a feed-forward neural network, and back propagation is then used to trace back through the network to analyse and find errors; this process is repeated in every iteration during training [8], [10]. After calculating the error, back propagation traverses back to the hidden (inner) layers to adjust the weights so as to decrease the error, and it keeps repeating this process until the algorithm reaches the desired output [12].
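A minimal PyTorch sketch of that loop, with illustrative layer sizes and random data standing in for real images: convolution and pooling shrink the input, the feature maps are flattened and passed through a feed-forward head, and back propagation adjusts the weights at each iteration.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # first layer
            nn.ReLU(),
            nn.MaxPool2d(2),                              # pooling halves H and W
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # second layer consumes the first's output
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),               # flatten feature maps into a column vector
            nn.Linear(32 * 8 * 8, 64),  # feed-forward head (assumes 32x32 inputs)
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = TinyCNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# One illustrative training iteration on random data standing in for food images.
images = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))
loss = criterion(model(images), labels)
loss.backward()        # back propagation: traverse back and compute the error gradients
optimizer.step()       # adjust the weights to decrease the error
optimizer.zero_grad()  # repeat this loop every iteration while training
```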
Paper name: A Comprehensive Survey of Image-Based Food Recognition and Volume Estimation Methods for Dietary Assessment (food classification definition)
Recognition of food type and calorie estimation using neural network (classification algorithm)
The basic algorithm for classification is given as follows (a rough code sketch of this pipeline follows the list):
1. Acquiring multiple images of each fruit / vegetable / food item for training and testing
2. Pre-processing and segmentation of the image for obtaining the ROI
3. Based on the boundaries, extracting the appropriate features
4. Measuring the actual volume of the fruits
5. Training the network by using the real volume as target and obtaining training data sets.
6. Performing the volume estimation on other images and calculating the calories
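A rough sketch of this pipeline, assuming made-up segmentation/feature helpers, random stand-in images, and an assumed calorie density; none of these details come from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def extract_features(image: np.ndarray) -> np.ndarray:
    """Stand-in for steps 2-3: crude "segmentation" of the ROI plus simple
    boundary/shape features (area, mean intensity, bounding-box extents)."""
    mask = image > image.mean()
    ys, xs = np.nonzero(mask)
    return np.array([mask.sum(), image[mask].mean(),
                     np.ptp(ys) + 1, np.ptp(xs) + 1], dtype=float)

# Steps 1 and 4: training images plus their measured (real) volumes in cm^3.
rng = np.random.default_rng(0)
train_images = rng.random((50, 64, 64))
train_volumes = rng.uniform(50, 300, size=50)  # ground-truth volumes as targets

# Step 5: train the network using the real volume as the target.
X = np.stack([extract_features(im) for im in train_images])
volume_model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
volume_model.fit(X, train_volumes)

# Step 6: estimate the volume of a new image and convert it to calories.
test_image = rng.random((64, 64))
est_volume = volume_model.predict(extract_features(test_image)[None, :])[0]
CAL_PER_CM3 = 0.5  # assumed calorie density, for illustration only
print(f"estimated volume: {est_volume:.1f} cm^3, "
      f"calories: {est_volume * CAL_PER_CM3:.0f} kcal")
```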
Fine-Grained Food Classification Methods on the UEC FOOD-100 Database
ALGORITHMS FOR FOOD CLASSIFICATION
The recent developments in machine learning attest to the superior performance of deep learning approaches over classical methods. DCNNs make it possible to analyze various kinds of images and videos. Currently, DCNNs are successfully used for image classification [5], [19], image segmentation [20], [21], and object detection and localization problems [19], [20], [22]. A review of deep learning applied specifically to food is given in [23]. That article compares the performance of several deep learning methods on food images from the UEC FOOD100 database; it reaches the state-of-the-art performance of 90.02% best-shot accuracy using an ensemble method with a bagging strategy on DenseNet-161 and ResNeXt-101.
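A minimal sketch of that ensemble idea, averaging the softmax outputs of several members; tiny stand-in models are used here instead of DenseNet-161 and ResNeXt-101, and no actual bagging resampling or training is shown.

```python
import numpy as np
import tensorflow as tf

def tiny_classifier(num_classes=100, seed=0):
    # Stand-in member model; in practice each member would be a large trained backbone.
    tf.keras.utils.set_random_seed(seed)
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(64, 64, 3)),
        tf.keras.layers.Conv2D(8, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

members = [tiny_classifier(seed=s) for s in (0, 1, 2)]  # one member per (bootstrap) run
images = np.random.rand(4, 64, 64, 3).astype("float32")

# Ensemble prediction: average the members' class probabilities, then take the argmax.
probs = np.mean([m(images).numpy() for m in members], axis=0)
predicted_class = probs.argmax(axis=1)
print(predicted_class)
```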
Deep Convolutional Neural Network Models:
The convolutional neural network (CNN) provides a feature-extraction stage, built from convolution and pooling layers, and a classification stage built from fully connected layers. The convolution layer uses sliding windows, called filters, to extract hierarchical features from the images. The number of convolutional layers (CLs), pooling and fully connected layers, the coefficients, and the size and number of the filters are all critical hyperparameters of a deep learning system. There is also a final layer, called the softmax or classification layer. The DCNN became famous in 2012, when Krizhevsky et al. presented the AlexNet architecture [5], which won the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC). This model is made up of five CLs followed by three fully connected layers; the first CL uses filters of size 11 × 11, the second CL filters of size 5 × 5, and the other layers filters of size 3 × 3. AlexNet tackles the vanishing-gradient problem by using the rectified linear unit (ReLU) activation function instead of the Sigmoid or Tanh functions. Another important change was to reduce overfitting by adding a dropout layer after every fully connected layer. The dropout layer randomly associates a probability value with every neuron and switches off those neurons that do not
reach a prefixed threshold.

The ConvNet or VGGNet architecture proposed by
the VGG group at Oxford [24] improved the AlexNet model by replacing the large filters of the first and second CL with multiple small filters of size 3 × 3. This trick keeps the same receptive field, i.e., the size of the area of the input image on which the output depends, while decreasing the total number of parameters, e.g., from (7 × 7) = 49 weights per channel pair for a single 7 × 7 filter to 3 × (3 × 3) = 27 for a stack of three 3 × 3 filters. The VGG16 CLs are followed by three fully connected layers, for a total of 16 weight layers. The VGG architecture achieved a top-five accuracy of 92.9% on the ILSVRC 2012 dataset and 93.2% on the ILSVRC 2014 dataset [24]. The complexity of the VGG architecture is still high because the model is fully connected, i.e., in a layer having input and output channels equal to C,
e.g., C = 512, every input channel is connected to every output channel.

GoogLeNet introduced a sparsely connected architecture, based on the idea that most of the activations in a deep network are unnecessary because they are redundant, due to the correlations between them. Since kernels for sparse matrix multiplication were not optimized, GoogLeNet instead introduced the Inception module [25], which approximates a sparse CNN with a normal dense construction. Furthermore, the Inception module decreases the number and size of the convolutional filters and keeps the network narrow, which reduces the total complexity of the model. Additionally, it uses convolutional filters of different sizes, i.e., 5, 3, and 1, to capture details at different resolutions. Other important improvements proposed by this model are 1) the bottleneck layer, implemented with filters of size 1 × 1, and 2) the substitution of the fully connected layers with a global average pooling layer. The Inception architecture is much faster than VGG; Inception V3 achieved a top-five accuracy of 94.4% on the ILSVRC 2012 dataset [8].
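A quick way to see the complexity gap discussed above is to instantiate the Keras reference implementations of VGG16 and Inception V3 and compare their parameter counts (roughly 138M vs. 24M); AlexNet is not shipped with Keras, so it is omitted here.

```python
import tensorflow as tf

# weights=None builds the standard architectures without downloading pretrained weights.
vgg16 = tf.keras.applications.VGG16(weights=None)            # 13 conv + 3 fully connected layers
inception_v3 = tf.keras.applications.InceptionV3(weights=None)

print(f"VGG16 parameters:        {vgg16.count_params():,}")         # roughly 138 million
print(f"Inception V3 parameters: {inception_v3.count_params():,}")  # roughly 24 million
```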
A Framework for Food recognition and predicting its Nutritional value through Convolution neural network
Convolutional neural networks are a sort of artificial neural network (ANN) that uses perceptrons, a machine learning unit, to perform supervised learning on data. CNNs support image processing and other analytical tasks. A CNN analyses the image piece by piece, looking for what are known as features. Thus, a CNN is far better at detecting similarity than whole-image matching systems, since it finds matching characteristics between the images. These features are thumbnails of the larger image. We take a feature from the trained dataset and apply it to the input image; if the two match, the image is correctly identified.

Figure 2: Proposed CNN Model

The suggested CNN model's first layer is a Convolutional-2D layer, which is followed by another Convolutional-2D layer. The next layer is the max-pooling layer, which shrinks the convolved feature map to save computation time and cost; it extracts the maximum value from the matrix produced by the convolutional layers. The next step is the dropout layer, which breaks connections between one Convolutional-2D layer and the next; this prevents the proposed model from overfitting. The value obtained by the max-pooling layer is passed on to the Convolutional-2D, max-pooling, and dropout layers in the following hidden layer. The flatten layer is placed after that; it converts the multi-dimensional array into a single-dimensional array, i.e., it turns a matrix into a vector. This vector is then used by the dense layer to classify an image into one of the predefined classes.
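A minimal Keras sketch of the layer sequence described above; the filter counts, dropout rates, input size, and class count are illustrative assumptions, since the exact hyperparameters are not given in these notes.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_food_cnn(num_classes: int, input_shape=(128, 128, 3)) -> tf.keras.Model:
    model = tf.keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),    # first Convolutional-2D layer
        layers.Conv2D(32, (3, 3), activation="relu"),    # second Conv2D fed by the first
        layers.MaxPooling2D((2, 2)),                     # shrink the convolved feature map
        layers.Dropout(0.25),                            # cut connections to limit overfitting
        layers.Conv2D(64, (3, 3), activation="relu"),    # next hidden block repeats the pattern
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),                                # multi-dimensional maps -> 1-D vector
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"), # dense layer assigns the class
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_food_cnn(num_classes=20)
model.summary()
```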
RPN algorithm:
https://towardsmachinelearning.org/region-proposal-network/
https://www.listendata.com/2022/06/region-proposal-network.html
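The links above cover the Region Proposal Network (RPN). As a hedged sketch of just its anchor-generation step (typical illustrative stride, scales, and ratios, not values taken from either link): boxes of several scales and aspect ratios are placed at every feature-map cell, and the network then scores each one as object vs. background.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride  # anchor centre in image coords
            for s in scales:
                for r in ratios:
                    w, h = s * np.sqrt(r), s / np.sqrt(r)    # box of area ~s^2, aspect w/h = r
                    anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)

anchors = generate_anchors(feat_h=4, feat_w=4)
print(anchors.shape)  # (4*4*9, 4) = (144, 4) candidate boxes for the RPN to score
```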