
Deepening Stereotype Recognition for Detecting Anomalies
Abstract: At present, much anomaly detection research focuses on two problems: first, anomalies cannot be accurately located at the pixel level; second, the training data cannot include the anomalies. We introduce the "Stereotype Deepening" algorithm to address these challenging problems. It uses transitive learning while training a tree-like teacher-student network structure to deepen the "Stereotype", so that in abnormal areas the descriptors given by the student deviate from the descriptors given by the teacher. Additionally, peer bias is taken into account as an abnormal-score term. Experiments on different types of datasets prove the effectiveness of this algorithm for anomaly detection and anomaly localization; by comparison, the proposed method has significant advantages on texture data.

Keywords: Stereotype deepening; Transitive learning; Knowledge distillation; Anomaly detection; Anomaly localization

1 Introduction

In the real world, a common requirement is to determine which instances differ from the other instances; such a process is called anomaly detection [1]. Anomaly detection remains a challenging task, and a variety of algorithms have appeared one after another to address its many problems. Deep neural networks (DNNs) have developed rapidly in recent years, and anomaly detection algorithms based on deep learning have shown excellent performance. Because much of the data used for anomaly detection is difficult to label, anomaly detection methods based on unsupervised learning have been widely studied. Most early work focused on image reconstruction, commonly using general models such as generative adversarial networks (GANs) [2] and autoencoders [3, 4]. Some researchers have found that pre-trained DNNs have powerful representational abilities. They used a small stack of autoencoders and a convolutional neural network (CNN) to form a cascade classifier for cubic-patch anomaly detection [5]. In [6], the student-teacher structure was applied to unsupervised anomaly detection using a pre-trained residual neural network (ResNet), and anomaly detection and localization were completed through multiscale anomaly segmentation. Afterward, Salehi et al. [7] used a visual geometry group (VGG) network as the pre-trained network and distilled its knowledge into a cloning network. They used the distance between activation values and the directional similarity of activation vectors at several key layers for anomaly detection, and used the gradient of the overall loss to find the anomaly regions that caused the loss to increase, thereby localizing anomalies. Compared with [6], Salehi et al. [7] approached anomaly detection and localization from different angles.

Inspired by this previous work, this paper proposes an anomaly detection method based on "Stereotype Deepening". A tree-like teacher-student network with transitive learning characteristics is introduced to complete regional anomaly detection and localization. Figure 1 shows detection results represented as anomaly maps. It has been confirmed in [6] that a cognitive bias arises between students and teachers, which is called a "Stereotype". As shown in Figure 2, our intuition is that the network deepens this cognitive bias in the process of transitive learning, so abnormal areas can be distinguished by the deepened "Stereotype".

Figure 1: Comprehensive assessment results. The evaluation types include textures and objects. There are apparent color differences between the abnormal area and the surrounding area.

The main contributions are as follows:

• We presented a tree-like teacher-student anomaly detection structure based on "stereotype deepening", which associates anomalies with pixels and locates anomaly areas.
• We proposed two loss functions: a new compactness loss that is not affected by batch size, and the regression error between descriptors.
• We integrated inference bias, delivery bias, and peer bias to evaluate anomaly detection and localization, making the results more distinct. Anomaly maps are used to express the localization results intuitively.
• Experiments on three datasets proved the effectiveness of the proposed method. Our algorithm shows satisfactory results on all datasets, especially in the textures category.
2 Related Work

2.1 Supervised Anomaly Detection

Many supervised anomaly detection methods take the form of binary classification. Because labels are used, they can produce highly accurate results. Some studies [8, 9, 10] use methods based on active learning. Gaddam et al. [11] proposed a novel anomaly detection method based on K-Means and ID3, which first obtains k different clusters and then constructs an ID3 decision tree in each cluster; this approach avoids both forced assignment and class dominance. Jumutc and Suykens [12] extended supervised novelty detection by introducing a new coupling term between classes, which helps find a reasonable decision boundary.

Figure 2: Prior distribution. Coordinate system (1) shows the cognitive deviation between one doctor and two masters, and coordinate system (2) shows the cognitive deviation between one master and two bachelors. Their distributions are relatively similar in normal areas, but differ markedly in abnormal areas due to the deepened cognitive bias.

Although supervised anomaly detection has high accuracy, it generalizes poorly because anomalies are uncertain and data labels are scarce. Previous works have tried to address these problems from various angles, but supervised anomaly detection still has limitations.

2.2 Semi-supervised Anomaly Detection

Labels for normal data are easier to obtain than labels for abnormal data, so many researchers chose semi-supervised methods for anomaly detection. Gu et al. [13] proposed a corrupted GAN (CorGAN) for outlier detection: assuming the generator generates outliers of the negative class, the discriminator was trained to distinguish the training dataset from the data generated by the generator. To avoid reaching a Nash equilibrium during training, they also proposed several techniques to break convergence and establish robust outlier identifiers. Similarly, influenced by GANs, Sabokrou et al. [14] were the first to add one-class classification to an end-to-end architecture, introducing an R+D anomaly detection structure in which R consists of an encoder and a decoder, while D is a CNN used to classify the data regenerated by R. Perera and Patel [15] proposed deep one-class classification (DOC) to solve the one-class classification problem, introducing a joint loss based on compactness loss and descriptiveness loss to train the network; it was verified by experiments on anomaly detection, novelty detection, and mobile active authentication datasets. Semi-supervised anomaly detection has also been widely studied in many application areas. For detecting abnormal climate, Racah et al. [16] proposed a multi-channel spatial-temporal CNN architecture for semi-supervised bounding-box prediction and exploratory data analysis to address the challenge of incompletely labeled extreme weather data; the method uses temporal information and unlabeled data to improve the localization of extreme weather events. Remote sensing applications face a similar challenge in collecting labeled data: Wu and Prasad [17] provided a semi-supervised method for hyperspectral image classification that trains the neural network on unlabeled data with pseudo-labels generated by a C-DPMM-based clustering algorithm.

Semi-supervised anomaly detection supports end-to-end learning and alleviates the shortage of data labels. However, it takes a long time to train, and its feature extraction is often weak.

2.3 Unsupervised Anomaly Detection

Compared with supervised and semi-supervised anomaly detection, an obvious advantage of unsupervised anomaly detection is that it can distinguish normal from abnormal by learning from an unlabeled dataset. Zong et al. [18] proposed a deep autoencoding Gaussian mixture model (DAGMM) that is easy to train end-to-end for anomaly detection. DAGMM consists of a compression network and an estimation network: a deep autoencoder generates a low-dimensional representation and reconstruction errors for each input, which are then fed into the Gaussian mixture model. Continuous anomaly detection in application fields such as image analysis and video surveillance remains a challenge. Lu et al. [19] used an autoencoder model to capture the inherent difference in density between outliers and normal instances and integrated the model into a recurrent neural network (RNN), making it convenient to capture context information; the network was finally updated through hierarchical training. Unlike [19], Leveau and Joly [20] used an adversarial autoencoder for anomaly detection and further improved its performance by introducing explicit rejection classes in the prior distribution and adding random input images to the autoencoder. Some scholars proposed a deep structured energy-based model (DSEBM), which extended the energy-based model to a deep architecture with three types of structures and solved the anomaly detection problem by directly modeling the data distribution with the deep architecture [21]; they also provided two decision criteria for training, namely the energy score and the reconstruction error. Mishra et al. [22] used CVAEs to solve anomaly detection under zero-shot learning, treating it as a missing-data problem: samples are generated from a given attribute and used to classify unseen classes. Others introduced a GAN-based anomaly detection method for a mobile autonomous robot, building a GAN from images collected by remotely operating the robot in a given environment [23]; a shifted grid divides all images into patches for training the GAN, and the bottleneck feature of each generated patch is compared with that of the actual patch.

Although many unsupervised anomaly detection methods have been proposed, many existing methods are based on data reconstruction, so the detection results are affected by reconstruction errors. Considering this problem, this paper utilizes a tree-like teacher-student structure to deepen the "Stereotype" generated in the process of transitive learning and uses a batch-size-independent compactness loss together with the regression error to optimize the network. Finally, inference bias, delivery bias, and peer bias are used as anomaly evaluation indicators.

3 Preliminary Work

3.1 Knowledge Distillation

For better performance, many models are trained as ensembles of one or more large neural networks. However, this method consumes a lot of computing resources and is difficult to deploy. To solve this problem, Hinton et al. [24] used knowledge distillation to transfer knowledge from bulky models into small models that are more suitable for deployment to a large number of users. The knowledge distillation model consists of two parts: a teacher network and a student network. The teacher network has a complicated structure and numerous parameters, while the student network has a simple structure and few parameters. During training, the student network learns the knowledge extracted by the teacher network.

The teacher network generates classification results through the softmax layer. The results contain probability information for each category, but only one category is the positive label; the rest are negative labels. The probability of each negative label is usually much smaller than that of the positive label, so the information carried by the negative labels is frequently ignored. To avoid this problem, the distillation method changes the outputs of the softmax by controlling the temperature T so that the output probability distribution becomes smoother; the result is recorded as soft targets. The smoother output distribution amplifies the information carried by the negative labels. The softmax function is defined as follows:

q_i = exp(a_i / T) / Σ_{j=1}^{n} exp(a_j / T)    (1)

where T represents the distillation temperature. When training the teacher network, the temperature T is set to 1, and training minimizes the cross-entropy between the softmax output and the target. After the teacher network is trained, a higher temperature T greater than 1 is set and used to train the student network. The difference between the output of the student network and the soft target is taken as the distillation loss. When the temperature T in the student network is set to 1, the difference between the output and the ground truth is taken as another loss. Both losses are used to evaluate the performance of the student network. The results showed that the distilled student network had performance comparable to the teacher network and was easier to deploy.
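As a minimal illustration, Eq. (1) can be implemented as follows (a NumPy sketch; the logit values and temperatures are illustrative, not from the paper):

```python
import numpy as np

def softened_softmax(a, T=1.0):
    """Temperature-scaled softmax of Eq. (1): q_i = exp(a_i/T) / sum_j exp(a_j/T)."""
    z = a / T
    z = z - z.max()          # subtract the max for numerical stability; result is unchanged
    e = np.exp(z)
    return e / e.sum()

logits = np.array([6.0, 2.0, 1.0])      # illustrative teacher logits
print(softened_softmax(logits, T=1.0))  # sharp distribution: negative labels near zero
print(softened_softmax(logits, T=4.0))  # smoother "soft targets" that expose negative-label information
```

Raising T spreads probability mass onto the negative labels, which is exactly the information the student is meant to absorb.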
3.2 Descriptor Compactness

The neural network model is prone to over-fitting due to excessive sample noise, high model complexity, and too many training iterations. Over-fitting easily biases the results, so addressing it is essential. In addition to the common causes of over-fitting, Tian et al. [25] found that its severity is directly related to the correlation between descriptor dimensions. Therefore, in their experiment, an error term was introduced to describe the compactness of descriptors, and the redundancy between descriptor dimensions was reduced through training so that each dimension carried as much information as possible. The correlation coefficient between two dimensions is expressed as:

r_ij = (b_i − b̄_i)ᵀ(b_j − b̄_j) / ( √((b_i − b̄_i)ᵀ(b_i − b̄_i)) · √((b_j − b̄_j)ᵀ(b_j − b̄_j)) )    (2)

where b̄_i and b̄_j represent the mean values of the i-th column and the j-th column, respectively. The correlation matrix [r_ij] is denoted as R.
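A short NumPy sketch of Eq. (2), computing the correlation matrix R over a batch of descriptors (the batch size and descriptor dimension below are illustrative):

```python
import numpy as np

def correlation_matrix(B):
    """Eq. (2): pairwise correlation r_ij between descriptor dimensions.
    B has shape (n, d): n descriptors (one per patch), each d-dimensional,
    so column i holds the values of dimension i across the batch."""
    centered = B - B.mean(axis=0, keepdims=True)   # subtract the per-column means b̄_i
    cov = centered.T @ centered                    # (b_i − b̄_i)ᵀ(b_j − b̄_j) for all i, j
    norms = np.sqrt(np.diag(cov))
    return cov / np.outer(norms, norms)            # normalize to correlation coefficients

B = np.random.randn(32, 8)         # illustrative batch: 32 patches, 8-dimensional descriptors
R = correlation_matrix(B)
assert np.allclose(np.diag(R), 1)  # each dimension is perfectly correlated with itself
```

Driving the off-diagonal entries of R toward zero is what makes each descriptor dimension carry independent information.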

4 Algorithm

In this section, the proposed "Stereotype Deepening" algorithm is described in detail. In the process of training the network, transitive learning is used to deepen the "Stereotype", so that the discrepancy between descriptors is enlarged. The student networks are updated by minimizing the mixed loss. In the evaluation, inference bias, delivery bias, and peer bias are used to measure the effect of anomaly detection and localization.

4.1 Network Structure

As shown in Figure 3, masters and bachelors are set up as students in the tree-like teacher-student structure, and the network is trained layer by layer through transitive learning. For convenience, we give the students of each layer different names: the students obtained after the first round of transitive learning are called masters, and after the second round of transitive learning, bachelors are obtained. Bachelors are obtained by studying the knowledge of masters, and we divide the bachelors into classes according to which master is their teacher; masters thus act as both teachers and students. We train all the students on the given training data U = {u1, u2, ..., un}, which contains only anomaly-free images. Each network except the doctor takes the descriptor of the previous network as its regression target; for example, the regression targets for bachelors are the feature descriptors output by masters. After training, we use both abnormal and non-abnormal images as test data, and the inference bias, delivery bias, and peer bias caused by the "Stereotype" are used as indicators for anomaly evaluation.

Figure 3: Bachelor1 and Bachelor2 belong to Class1; Bachelor3 and Bachelor4 belong to Class2. CostInference represents the inference bias, CostDelivery represents the delivery bias, and CostPeer represents the peer bias.
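As a rough sketch of the hierarchy in Figure 3 (the class names and the PatchDescriptorNet placeholder are ours, not from the paper), the tree can be set up as follows:

```python
# Minimal sketch of the Figure 3 hierarchy: one doctor, two masters,
# and two bachelors per master. PatchDescriptorNet stands in for the
# descriptor network of Table 1; the counts follow Figure 3.
class PatchDescriptorNet:
    def __init__(self, name):
        self.name = name

doctor = PatchDescriptorNet("doctor")                        # pre-trained, then frozen
masters = [PatchDescriptorNet(f"master{i}") for i in (1, 2)]
classes = {  # each class of bachelors regresses the descriptors of its own master
    f"class{i}": {
        "teacher": master,
        "bachelors": [PatchDescriptorNet(f"bachelor{2 * (i - 1) + j}") for j in (1, 2)],
    }
    for i, master in enumerate(masters, start=1)
}
# Training order (transitive learning): doctor -> masters -> bachelors,
# each layer regressing the descriptors produced by the layer above.
```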

4.2 The Process of Training

This section introduces network training in detail. The process is divided into three stages; the training structure is shown in Figure 4.

Figure 4: b denotes the batch size, and D(I), M(I), and B(I) are descriptors. From right to left are the three stages of training: the masters are trained according to the doctor, and then the masters are used to train the bachelors.

4.2.1 Training of the Doctor Network

The input image G is randomly cut into patch-sized image regions I, and the doctor network D outputs a d-dimensional descriptor for each patch I. Because a pre-trained deep neural network has strong representational ability and performs well in classification, we use the pre-trained network D as the base network of a classification network T. The loss of the classification network can be expressed as:

Lk = −Σ y log(T(I))    (3)

where T is the classification network and y is the classification label.
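A minimal PyTorch sketch of this stage, assuming the 64-pixel patch size of Table 1 (the helper names are ours): patches are cut at random positions, and Eq. (3) is the standard cross-entropy of the classification head T.

```python
import torch
import torch.nn.functional as F

def random_patch(image, patch_size=64):
    """Randomly cut a patch-sized region I from an input image G (Sec. 4.2.1).
    `image` is a CxHxW tensor; the 64-pixel patch size follows Table 1."""
    _, h, w = image.shape
    top = torch.randint(0, h - patch_size + 1, (1,)).item()
    left = torch.randint(0, w - patch_size + 1, (1,)).item()
    return image[:, top:top + patch_size, left:left + patch_size]

def classification_loss(logits, labels):
    """Eq. (3): F.cross_entropy combines log-softmax with the -Σ y log(T(I)) sum."""
    return F.cross_entropy(logits, labels)
```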
4.2.2 Training of Masters and Bachelors

In this part, we use the mean square error and the improved descriptor compactness loss as a mixed loss. The network is trained to obtain the masters first, and then the bachelors are obtained through the masters; the current network always fits the descriptors of the previous network. Specifically, D first extracts patch-based descriptors for each image in the dataset U, and the masters are trained by regressing the descriptors output by D. After the masters are trained on the dataset U, the same method is used to train the bachelors with the outputs of the masters. In particular, before training the next layer of students, we normalize the output provided by the current layer. Masters have dual identities in the whole network structure: they are students for the layer above and teachers for the layer below.

Mean Square Error: As in the training of the doctor network, the knowledge of the previous network must be extracted into the current network when training a student network. The distance is used to measure the difference between M(I) and D(I), where M(I) is the d-dimensional descriptor given by the master network:

Ld = ||M(I) − D(I)||₂²    (4)

Descriptor Compactness Error: For a set of input patches I, to eliminate redundancy and minimize correlation between descriptors, we made some improvements to the method used in [25]; the improved method ensures the accuracy of the calculation. Each patch I passing through the network is transformed into a d-dimensional descriptor within a batch, and we calculate the correlation between every two descriptors. We found that simply summing these correlation coefficients cannot accurately express the overall correlation, and the sum is affected by the batch size when minimizing the descriptor correlation: the batch size is determined by the number of patches, which determines the number of possible pairs of descriptors and thus the magnitude of the summed correlation coefficients. Therefore, to eliminate the influence of batch size on descriptor compactness, the improved loss averages the correlation coefficients over the n(n − 1)/2 descriptor pairs:

Lp = ( Σ_{i<j} r_ij ) / ( n(n − 1)/2 )    (5)

where n is the batch size.

Mixed Loss: The training loss for masters and bachelors is the weighted sum of these two terms:

Lt = µ · Ld + (1 − µ) · Lp    (6)

where µ represents the weighting factor.
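A PyTorch sketch of Eqs. (4)–(6); the pair-averaged form of L_p is our reading of Eq. (5), and the weighting factor value is illustrative.

```python
import torch

def mixed_loss(student_desc, teacher_desc, mu=0.5):
    """Sketch of Eqs. (4)-(6). student_desc / teacher_desc: (n, d) batches of
    descriptors for the same patches; mu is the weighting factor of Eq. (6)."""
    # Eq. (4): regression (mean square) error between student and teacher descriptors.
    l_d = ((student_desc - teacher_desc) ** 2).sum(dim=1).mean()

    # Eq. (5): correlation between every two descriptors in the batch,
    # averaged over the n(n-1)/2 pairs so batch size does not affect the loss.
    n = student_desc.shape[0]
    centered = student_desc - student_desc.mean(dim=1, keepdim=True)
    normed = centered / centered.norm(dim=1, keepdim=True).clamp_min(1e-8)
    corr = normed @ normed.t()                        # r_ij between descriptors i and j
    off_diag = corr - torch.eye(n, device=corr.device)
    l_p = off_diag.sum() / (n * (n - 1))              # i != j counts each pair twice

    # Eq. (6): mixed loss.
    return mu * l_d + (1 - mu) * l_p
```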
4.3 Anomaly Evaluation

For the test set W = {w1, w2, ..., wn}, which contains both normal and abnormal data, the abnormal area is determined by the degree of difference between the descriptors output by each network. Since the doctor module has been trained with abnormal data, when abnormal data is input to the doctor, the descriptor output by the doctor conforms to the feature distribution of the abnormal area. However, the masters only learned the distribution of normal data during training, so their descriptors deviate from the doctor's description when they encounter abnormal areas, resulting in inference bias. Because only normal data was used to train the masters, the weights obtained after the masters' training lack the induction of comprehensive information. Bachelors are affected by this one-sided factor when training according to the masters, so there is delivery bias when bachelors learn the descriptors of masters. Additionally, there are peer biases between masters and peer biases among bachelors. Figure 3 identifies the three types of biases.

We take the degree of deviation between the descriptors given by the masters and the descriptor given by the doctor as the first score. D(x) represents the descriptor of the doctor, and Mi(x) is the descriptor given by the i-th master. The first anomaly score is expressed as:

CostInference = Σ_i √((D(x) − M_i(x))²)    (7)

As mentioned before, bachelors are distilled from masters, so the difference between bachelors and masters is taken as the second score, which is expressed as:

CostTransitivity = Σ_i Σ_j √((M_i(x) − B_j(x))²)    (8)

where Bj(x) is the descriptor given by the j-th bachelor. (CostTransitivity corresponds to the delivery bias, written CostDelivery in Figure 3.)

The deviations between students of the same level are combined as the third anomaly score, represented by CostPeer:

CostPeer = Σ_i Σ_{kM} √((M_i(x) − M_{kM}(x))²) + Σ_j Σ_{kB} √((B_j(x) − B_{kB}(x))²)    (9)

where M_{kM}(x) represents the descriptor given by any master other than the i-th master, and B_{kB}(x) represents the descriptor given by any bachelor other than the j-th bachelor. The total anomaly score is expressed as:

CostTotal = CostInference + CostTransitivity + CostPeer    (10)
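A compact sketch of Eqs. (7)–(10), assuming each network has already produced its descriptor for the same patch x (each term reads as an L2 distance between descriptors):

```python
import torch

def total_anomaly_score(d, masters, bachelors):
    """Sketch of Eqs. (7)-(10). d: doctor descriptor, shape (dim,);
    masters / bachelors: lists of descriptors of the same shape,
    all computed for the same patch x."""
    dist = lambda a, b: torch.norm(a - b)  # √((a − b)²) summed over descriptor dimensions

    cost_inference = sum(dist(d, m) for m in masters)                        # Eq. (7)
    cost_transitivity = sum(dist(m, b) for m in masters for b in bachelors)  # Eq. (8)
    cost_peer = (sum(dist(mi, mk) for i, mi in enumerate(masters)            # Eq. (9)
                     for k, mk in enumerate(masters) if k != i)
                 + sum(dist(bj, bk) for j, bj in enumerate(bachelors)
                       for k, bk in enumerate(bachelors) if k != j))
    return cost_inference + cost_transitivity + cost_peer                    # Eq. (10)
```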

5 Experiment

In this part, the "Stereotype Deepening" algorithm proposed in this paper is verified from two aspects: anomaly detection and anomaly localization. All experiments were conducted on an Intel(R) Core(TM) i7-8700 CPU and an NVIDIA GeForce GTX 1660. The code has been released at: https://github.com/zhmhbest/StudentTeacherAnomalyDetection.

5.1 Datasets

We tested the proposed method on three datasets: MNIST, CIFAR-10, and MVTec.

MNIST: It contains 70,000 handwritten digits, of which 60,000 belong to the training set and the rest to the test set [26].

CIFAR-10: This dataset consists of 10 categories of color images, with 6,000 images per category; 50,000 images are used for training and the remaining 10,000 as the test set [27].

MVTec: It consists of more than 5,000 high-resolution images covering 10 object categories and 5 texture categories. The images in the training set are anomaly-free, while the test set contains some abnormal images [28].

5.2 Experimental Results

The network structure used to train the doctor, masters, and bachelors is given in Table 1.

5.2.1 Comparisons based on the MNIST and CIFAR-10 Datasets

The experiment first verifies the performance of the entire network. Only one category of the dataset is regarded as normal data, and all other categories are regarded as abnormal. For example, if 0 is regarded as normal data on the MNIST dataset, the remaining digits are considered abnormal when testing network performance. We use the area under the ROC curve (AUROC) to evaluate the performance of our method and other related works.

The "Stereotype Deepening" algorithm shows high accuracy on both the MNIST and CIFAR-10 datasets; on CIFAR-10 in particular, our average AUROC is 0.1805 higher than that of the LSA algorithm. Table 2 shows the comparison results.
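The one-class protocol above reduces to a binary ROC computation; a minimal sketch with scikit-learn (the scores and labels below are fabricated for illustration):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def one_class_auroc(scores, labels, normal_class):
    """One-class protocol of Sec. 5.2.1: the chosen class is 'normal',
    every other class is 'abnormal'. `scores` are anomaly scores such as
    CostTotal of Eq. (10) (higher = more anomalous); `labels` are dataset classes."""
    y_true = (labels != normal_class).astype(int)   # 1 = anomaly
    return roc_auc_score(y_true, scores)

# Illustrative usage, e.g. with digit 0 as the normal class:
labels = np.array([0, 0, 3, 7, 0, 5])
scores = np.array([0.1, 0.2, 0.9, 0.8, 0.15, 0.7])
print(one_class_auroc(scores, labels, normal_class=0))
```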

Table 1: The network structure when the patch size is 64. Leaky rectified linear units with slope 5 × 10⁻³ are applied as activation functions after each convolution layer.

Layer       Output Size   Kernel   Stride
Input       64×64×3       -        -
Conv2d      61×61×64      4×4      1
MaxPool2d   30×30×64      2×2      2
Conv2d      27×27×32      4×4      1
MaxPool2d   13×13×32      2×2      2
Conv2d      10×10×16      4×4      1
MaxPool2d   5×5×16        2×2      2
Conv2d      2×2×8         4×4      1
Conv2d      1×1×4         2×2      1
Linear      1×1×1         -        -
Flatten     1×1           -        -
Linear      1×512         -        -
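A PyTorch sketch of the Table 1 architecture; we read the two final Linear rows literally as a 4 → 1 → 512 projection, but if the intermediate 1×1×1 row is a typesetting artifact, a single nn.Linear(4, 512) would replace the last two layers.

```python
import torch
import torch.nn as nn

class PatchDescriptorNet(nn.Module):
    """Sketch of the Table 1 network for 64x64x3 patches; the output is
    the d-dimensional patch descriptor (d = 512 per the table)."""
    def __init__(self, d=512, slope=5e-3):
        super().__init__()
        act = nn.LeakyReLU(slope)  # leaky ReLU with slope 5e-3 after each convolution
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=1), act,  # 64x64x3 -> 61x61x64
            nn.MaxPool2d(2, stride=2),                       # -> 30x30x64
            nn.Conv2d(64, 32, kernel_size=4), act,           # -> 27x27x32
            nn.MaxPool2d(2, stride=2),                       # -> 13x13x32
            nn.Conv2d(32, 16, kernel_size=4), act,           # -> 10x10x16
            nn.MaxPool2d(2, stride=2),                       # -> 5x5x16
            nn.Conv2d(16, 8, kernel_size=4), act,            # -> 2x2x8
            nn.Conv2d(8, 4, kernel_size=2), act,             # -> 1x1x4
            nn.Flatten(),                                    # -> 4
            nn.Linear(4, 1),                                 # the table's "Linear 1x1x1" row
            nn.Linear(1, d),                                 # -> 512-dimensional descriptor
        )

    def forward(self, x):
        return self.features(x)

net = PatchDescriptorNet()
desc = net(torch.randn(8, 3, 64, 64))  # a batch of 8 patches -> descriptors
assert desc.shape == (8, 512)
```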

Table 2: Anomaly detection results. The table gives the anomaly detection accuracy (AUROC) of different algorithms in one-class classification; the Mean column reflects overall performance.

Dataset    Method          0     1     2     3     4     5     6     7     8     9     Mean
MNIST      DSVDD [29]      0.98  0.997 0.917 0.919 0.949 0.885 0.983 0.946 0.939 0.965 0.948
           OCGAN [30]      0.998 0.999 0.942 0.963 0.975 0.98  0.991 0.981 0.939 0.981 0.975
           CAVGA Du [31]   0.994 0.997 0.989 0.983 0.997 0.968 0.988 0.986 0.988 0.991 0.986
           LSA [32]        0.993 0.999 0.959 0.966 0.956 0.964 0.994 0.98  0.953 0.981 0.975
           Ours            0.991 0.995 0.994 0.996 0.995 0.995 0.994 0.991 0.992 0.989 0.9932
CIFAR-10   DSVDD [29]      0.617 0.659 0.508 0.591 0.609 0.657 0.677 0.673 0.759 0.731 0.648
           OCGAN [30]      0.757 0.531 0.64  0.62  0.723 0.62  0.723 0.575 0.82  0.554 0.6566
           CAVGA Du [31]   0.653 0.784 0.761 0.747 0.775 0.552 0.813 0.745 0.801 0.741 0.737
           LSA [32]        0.735 0.58  0.69  0.542 0.761 0.546 0.751 0.535 0.717 0.548 0.641
           Ours            0.834 0.852 0.748 0.761 0.801 0.762 0.901 0.841 0.887 0.828 0.8215
5.2.2 Comparisons based on the MVTec Dataset

The MVTec dataset provides anomalies based on different entities. In addition to verifying the anomaly detection capabilities of the network, we also verified the effect of anomaly localization through experiments.

Table 3: Anomaly localization results in terms of AUROC. The table shows per-category results together with the averages over Textures and Objects, expressed as Textures mean and Objects mean, respectively.

Textures:
Method         Carpet  Grid   Leather  Tile   Wood   Textures mean
STAD [6]       0.695   0.819  0.819    0.921  0.725  0.7958
CAVGA Du [31]  0.73    0.75   0.71     0.7    0.85   0.748
CAVGA Ru [31]  0.78    0.78   0.75     0.72   0.88   0.782
CAVGA Dw [31]  0.8     0.79   0.8      0.81   0.89   0.818
Ours           0.958   0.955  0.9633   0.922  0.925  0.9447

Objects (and overall Mean):
Method         Bottle  Cable  Capsule  Hazelnut  Metal nut  Pill    Screw  Toothbrush  Transistor  Zipper  Objects mean  Mean
STAD [6]       0.918   0.865  0.916    0.937     0.895      0.935   0.928  0.863       0.701       0.933   0.889         0.858
CAVGA Du [31]  0.89    0.63   0.83     0.84      0.67       0.88    0.77   0.91        0.73        0.87    0.802         0.784
CAVGA Ru [31]  0.91    0.67   0.87     0.87      0.71       0.91    0.78   0.97        0.75        0.94    0.838         0.819
CAVGA Dw [31]  0.93    0.86   0.89     0.9       0.81       0.93    0.79   0.96        0.8         0.95    0.882         0.861
Ours           0.9533  0.876  0.906    0.972     0.842      0.8086  0.906  0.74        0.67        0.8714  0.8731        0.885

We considered the inference bias of each module in the anomaly region. Beyond the inference bias, we also considered the delivery bias between masters and bachelors, which makes the descriptors given by the masters and bachelors differ in the same abnormal area. Students of the same grade received different initial network settings and did not use abnormal images for training; therefore, different students' expressions also differ significantly when encountering abnormal areas. Figure 5 shows anomaly maps, including the anomaly maps given by the comprehensive evaluation; the difference in color reflects the anomaly degree. According to Figure 5, the three biases behave differently in abnormal areas, and the comprehensive evaluation result is the most distinct. As Figure 5 also shows, anomaly detection and localization using only the delivery bias is more effective than using the inference bias alone, which shows that the "Stereotype" is reasonable and usable for anomaly detection and localization.

Figure 5: Anomaly map and anomaly score at all levels. The various biases produced in the process of transitive learning are evident in the abnormal areas.

Table 3 shows the anomaly localization results on the MVTec dataset. Experiments showed that our method has outstanding advantages in the textures category: every single AUROC on textures is above 0.92, and for Textures, "Stereotype Deepening" is superior to the other baselines. The average results in the Objects category reach a level comparable to other methods. In general, this method is better than most of the algorithms we compared.

6 Conclusion

In this paper, an algorithm based on a tree-like teacher-student structure, called "Stereotype Deepening", was proposed for anomaly detection and localization. A descriptor compactness loss that is independent of batch size was used in the training of the teacher network, and transitive learning was applied to train the student networks. During the experiments, the anomaly detection performance of the network was tested on the MNIST and CIFAR-10 datasets. Afterward, the "Stereotype" generated by the network during training was used to complete anomaly localization. Peer bias and delivery bias verified the effectiveness of the "Stereotype" from two dimensions. The experiments proved that our method has a significant effect on texture-type data.
References

[1] Raghavendra Chalapathy and Sanjay Chawla. Deep learning for anomaly detection: A survey. arXiv:1901.03407, 2019.
[2] Thomas Schlegl, Philipp Seeböck, Sebastian M. Waldstein, Ursula Schmidt-Erfurth, and Georg Langs. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. arXiv:1703.05921, pages 146–157, 2017.
[3] Christoph Baur, Benedikt Wiestler, Shadi Albarqouni, and Nassir Navab. Deep autoencoding models for unsupervised anomaly segmentation in brain MR images. arXiv:1804.04488, pages 161–169, 2018.
[4] Haowen Xu, Wenxiao Chen, Nengwen Zhao, Zeyan Li, Jiahao Bu, Zhihan Li, Ying Liu, Youjian Zhao, Dan Pei, Yang Feng, Jie Chen, Zhaogang Wang, and Honglin Qiao. Unsupervised anomaly detection via variational auto-encoder for seasonal KPIs in web applications. In Proceedings of the 2018 World Wide Web Conference (WWW '18), pages 187–196, Republic and Canton of Geneva, CHE, 2018. International World Wide Web Conferences Steering Committee.
[5] Mohammad Sabokrou, Mohsen Fayyaz, Mahmood Fathy, and Reinhard Klette. Deep-cascade: Cascading 3D deep neural networks for fast anomaly detection and localization in crowded scenes. IEEE Transactions on Image Processing, 26(4):1992–2004, 2017.
[6] Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger. Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4182–4191, June 2020.
[7] Mohammadreza Salehi, Niousha Sadjadi, Soroosh Baselizadeh, Mohammad Hossein Rohban, and Hamid R. Rabiee. Multiresolution knowledge distillation for anomaly detection. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14902–14912, June 2021.
[8] M. Almgren and E. Jonsson. Using active learning in intrusion detection. In Proceedings of the 17th IEEE Computer Security Foundations Workshop, pages 88–98, Los Alamitos, CA, USA, June 2004. IEEE Computer Society.
[9] Yang Li and Li Guo. An active learning based TCM-KNN algorithm for supervised network intrusion detection. Computers & Security, 26(7):459–467, 2007.
[10] Jay Stokes, John Platt, Joseph Kravis, and Michael Shilman. ALADIN: Active learning of anomalies to detect intrusions. Technical Report MSR-TR-2008-24, March 2008.
[11] Shekhar R. Gaddam, Vir V. Phoha, and Kiran S. Balagani. K-Means+ID3: A novel method for supervised anomaly detection by cascading K-Means clustering and ID3 decision tree learning methods. IEEE Transactions on Knowledge and Data Engineering, 19(3):345–354, 2007.
[12] Vilen Jumutc and Johan A. K. Suykens. Multi-class supervised novelty detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(12):2510–2523, 2014.
[13] Jindong Gu, Matthias Schubert, and Volker Tresp. Semi-supervised outlier detection using generative and adversary framework, 2018.
[14] Mohammad Sabokrou, Mohammad Khalooei, Mahmood Fathy, and Ehsan Adeli. Adversarially learned one-class classifier for novelty detection. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3379–3388, June 2018.
[15] Pramuditha Perera and Vishal M. Patel. Learning deep features for one-class classification. IEEE Transactions on Image Processing, 28(11):5450–5463, 2019.
[16] Evan Racah, Christopher Beckham, Tegan Maharaj, Samira Ebrahimi Kahou, Mr. Prabhat, and Chris Pal. Extreme weather: A large-scale climate dataset for semi-supervised detection, localization, and understanding of extreme weather events. In Advances in Neural Information Processing Systems, volume 30, 2017.
[17] Hao Wu and Saurabh Prasad. Semi-supervised deep learning using pseudo labels for hyperspectral image classification. IEEE Transactions on Image Processing, 27(3):1259–1270, 2018.
[18] Bo Zong, Qi Song, Martin Renqiang Min, Wei Cheng, Cristian Lumezanu, Daeki Cho, and Haifeng Chen. Deep autoencoding gaussian mixture model for unsupervised anomaly detection. In International Conference on Learning Representations, 2018.
[19] Weining Lu, Yu Cheng, Cao Xiao, Shiyu Chang, Shuai Huang, Bin Liang, and Thomas Huang. Unsupervised sequential outlier detection with deep architectures. IEEE Transactions on Image Processing, 26(9):4321–4330, 2017.
[20] Valentin Leveau and Alexis Joly. Adversarial autoencoders for novelty detection. Research report, Inria – Sophia Antipolis, 2017.
[21] Shuangfei Zhai, Yu Cheng, Weining Lu, and Zhongfei Zhang. Deep structured energy based models for anomaly detection. In Proceedings of the 33rd International Conference on Machine Learning (ICML '16), pages 1100–1109. JMLR.org, 2016.
[22] Ashish Mishra, Shiva Krishna Reddy, Anurag Mittal, and Hema A. Murthy. A generative model for zero shot learning using conditional variational autoencoders. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 2269–22698, June 2018.
[23] Wallace Lawson, Esube Bekele, and Keith Sullivan. Finding anomalies with generative adversarial networks for a patrolbot. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 484–485, 2017.
[24] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network, 2015.
[25] Yurun Tian, Bin Fan, and Fuchao Wu. L2-Net: Deep learning of discriminative patch descriptor in Euclidean space. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6128–6136, 2017.
[26] Yann LeCun and Corinna Cortes. MNIST handwritten digit database. 2010.
[27] A. Krizhevsky. Learning multiple layers of features from tiny images. 2009.
[28] Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger. MVTec AD – A comprehensive real-world dataset for unsupervised anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[29] Lukas Ruff, Robert Vandermeulen, Nico Goernitz, Lucas Deecke, Shoaib Ahmed Siddiqui, Alexander Binder, Emmanuel Müller, and Marius Kloft. Deep one-class classification. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 4393–4402, July 2018.
[30] Pramuditha Perera, Ramesh Nallapati, and Bing Xiang. OCGAN: One-class novelty detection using GANs with constrained latent representations. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2893–2901, 2019.
[31] Shashanka Venkataramanan, Kuan-Chuan Peng, Rajat Vikram Singh, and Abhijit Mahalanobis. Attention guided anomaly localization in images. In Computer Vision – ECCV 2020, pages 485–503. Springer International Publishing, 2020.
[32] Davide Abati, Angelo Porrello, Simone Calderara, and Rita Cucchiara. Latent space autoregression for novelty detection. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 481–490, 2019.
