2014 DeepCaptcha
Perception
ABSTRACT
Over the past decade, text-based CAPTCHAs (TBCs) have become popular in preventing adversarial attacks and spam in many websites and applications, including email services, social platforms, web-based marketplaces, and recommendation systems. However, in addition to several other problems with TBCs, they have become increasingly difficult for humans to solve in recent years, as they try to keep up with OCR technologies. Image-based CAPTCHA (IBC), on the other hand, is a relatively new concept that promises to overcome key limitations of TBC. In this paper we present an innovative IBC, DeepCAPTCHA, based on design guidelines, psychological theory and empirical experiments. DeepCAPTCHA exploits the human ability of depth perception. In our IBC, users arrange 3D objects in terms of size (or depth). In our framework for DeepCAPTCHA, we automatically mine 3D models, and use a human-machine Merge Sort algorithm to order these unknown objects. We then create new appearances for these objects at a multiplication factor of 200, and present these new images to the end-users for sorting (as CAPTCHA tasks). Humans are able to apply their rapid and reliable object recognition and comparison (arising from years of experience with the physical environment) to solve DeepCAPTCHA, while machines are still unable to complete these tasks. Experimental results show that humans can solve DeepCAPTCHA with high accuracy (~84%) and ease, while machines perform dismally.
Figure 1: DeepCAPTCHA is an image-based CAPTCHA that addresses the shortcomings of text-based CAPTCHA by exploiting humans’ depth-perception abilities (compared with a text-based CAPTCHA from Google).
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
MMSys ’14, March 19 - 21 2014, Singapore, Singapore
Copyright 2014 ACM 978-1-4503-2705-3/14/03 ...$15.00
http://dx.doi.org/10.1145/2557642.2557653.
1. INTRODUCTION
CAPTCHA, standing for “Completely Automated Public Turing test to tell Computers and Humans Apart”, is an automatic challenge-response test to distinguish between humans and machines. These tests require tasks that are easy for most humans to solve, while being almost intractable for state-of-the-art algorithms and heuristics. Today’s CAPTCHAs have two main benefits: (1) they are web-based means to avoid the enrollment of automatic programs in places where only humans are allowed (e.g. email registrations, on-line polls and reviewing systems, social networks, etc.); and (2) they also reveal gaps between human and machine performance in a particular task. Efforts by artificial intelligence (AI) researchers to close these gaps make CAPTCHA an evolving battlefield between designers and researchers.
The most popular type of CAPTCHA is text-based CAPTCHA (TBC), which has been in use for over a decade [1, 2]. The use of TBC has not only helped user security (e.g. in email services) and spam removal (e.g. in review and recommendation systems), but has also indirectly helped the improvement of optical character recognition (OCR) technology [2], and
even digitizing a large number of books (in the reCAPTCHA project [3, 4]). However, in order to stay ahead of optical character recognition technologies, TBCs have become increasingly distorted and complex, thus becoming very difficult to solve for increasing numbers of human users. Common human errors include mistaking a consecutive ’c’ and ’l’ for a ’d’, or a ’d’ or ’o’ for an ’a’ or ’b’, etc. [5]. It is noted that these mistakes come from users highly proficient in the Roman alphabet using high-resolution displays [5]. This suggests that TBCs are at the limit of human abilities. In addition, TBCs are inherently restrictive: users need to be literate in the language or the alphabet used in the TBCs. This violates important universal design guidelines, excluding a large proportion of potential users [6].
In recent years, image-based CAPTCHAs (IBCs) have been introduced to address the shortcomings of TBCs. These new types of CAPTCHA capitalize on the innate human ability to interpret visual representations, such as distinguishing images of animals (e.g. Animal Pix [1]), scenes (e.g. Pix¹ and Bongo²), or face images (e.g. Avatar CAPTCHA [7]). Therefore, IBCs can address a significantly wider age range and education level (than TBCs), and are mostly language independent, except for requiring that the instructions for solving the CAPTCHA be conveyed in a language-independent manner, e.g. using diagrams, symbols or video.
IBCs are therefore closer to the universal design ideals, and are also more intuitive for humans than TBCs. On the other hand, any IBC, if not carefully implemented, suffers from a vulnerability that almost never plagues TBCs: that of exhaustive attacks. A TBC is typically generated “on the fly” by creating a text string from a finite set of symbols (e.g. the Roman alphabet), and then increasing the difficulty of identifying the string by applying image distortion filters. Although the alphabet is finite, a virtually infinite set of CAPTCHAs may be generated from the string creation combined with the image distortion operation. This makes it intractable for an attacker to exhaustively pre-solve (e.g. by using human solvers) all possible TBCs. By comparison, IBCs typically start with a finite database of images. If these images are used without distortion, as in the case of CAPTCHA The Dog [8], then an exhaustive attack is possible. If only simple image distortions are applied, then after an exhaustive attack a machine can match a distorted image to a previously solved image, e.g. in [9]. Therefore, for any IBC to be viable, it should follow a series of design guidelines to achieve CAPTCHA goals:
[G1:] Automation and gradability.
[G2:] Easy to solve by a large majority of humans.
[G3:] Hard to solve by automated scripts.
[G4:] Universal (adapted from Center for Universal Design³):
    1. Equitable: diverse age and culture groups
    2. Flexibility: customizable for specific target applications
    3. Intuitive: instructions are easy to understand and follow
    4. Low effort: low cognitive demands
    5. Perceptible: visible under sensory limitations, amenable to scale
    6. Usability: suitable for a variety of platforms
[G5:] Resistance to random attacks.
[G6:] Robustness to brute force (exhaustive) attacks.
[G7:] Situated cognition: exploits experiential and embodied knowledge in humans [10].
¹ http://gs264.sp.cs.cmu.edu/cgi-bin/esp-pix
² http://www.captcha.net/captchas/bongo
³ http://www.ncsu.edu/ncsu/design/cud/
A systematic approach to designing a CAPTCHA that satisfies the above guidelines is to design the CAPTCHA task (Ctask) based on humans’ higher-order cognitive skills, thereby taking advantage of intuitive visual abilities that are hard to pre-solve exhaustively (G7).
In this paper we introduce DeepCAPTCHA, based on design guidelines, psychological theory, and empirical experiments. We exploit humans’ perception of the relative size and depth of complex objects to create an IBC. These depth-related visual cues are intuitively used by the human visual system from an early age [11], and are applicable to all 3D everyday or well-known objects, including representations of animals, man-made objects and characters across multiple scales. Natural objects across scales can range from atoms and molecules to insects, small animals such as mice, human-size animals, the biggest animals such as elephants, whales and dinosaurs, natural elements such as mountains, all the way to planets. Likewise, everyday artificial objects can intuitively be classified across scales from millimeters such as earrings, to centimeters such as keys and coins, to human-size such as bicycles and chairs, to a few meters such as houses, to larger objects such as skyscrapers and airplanes, to super-structures like cities.
DeepCAPTCHA is built on the intuitive human ability to distinguish and sort categories of objects based on their size in a very large collection of everyday objects that are familiar across cultures. The simplicity of this task for humans comes from the rapid and reliable object detection, recognition and comparison which arise from years of embodied experience of interacting with the physical environment (Hoffmann, M., & Pfeifer, R. (2012). The implications of embodiment for behavior and cognition: animal and robotic case studies. In W. Tschacher & C. Bergomi, eds., ’The Implications of Embodiment: Cognition and Communication’, Exeter: Imprint Academic, pp. 31-58). Humans can rapidly recognize representations of objects and, using their prior information about the real-world sizes of these objects, order them in terms of size [12]. We exploit the same principle here: we present n objects (here n = 6) and ask the user to order them in terms of size, but we change the appearance of each object to the extent that humans can still recognize it, but machines cannot. Figure 1 illustrates an example of a DeepCAPTCHA task on the user’s device. This seemingly trivial task for humans can be made very difficult for machines, as we describe in the details of our framework in the next sections.
However, there are several challenges to reach an ideal CAPTCHA task (Ctask). The DeepCAPTCHA should (1) address random attacks by a well-designed interface; (2) be immune to exhaustive attacks, even when the database of objects is leaked; (3) be robust to machine-solver attacks
while keeping the task easy for a wide range of human users. In addition, to have an automatically gradable Ctask, (4) the DeepCAPTCHA system should be able to automatically mine new objects and discover the relative size between them and the rest of the database, and finally remove objects with size ambiguity. In this work we address these challenges by introducing a five-part framework, built around the core of 3D model analysis and manipulation.
Our choice of 3D models as the core of the DeepCAPTCHA framework gives us full control over object manipulation and feature analysis. Using 3D models allows appearance alterations at a multiplication factor of 200, without compromising the ease of recognition by humans. This is remarkable and impossible to achieve with conventional TBCs and almost all other IBCs, where the employed distortions make CAPTCHA tasks more difficult for humans. We therefore start by using web crawlers to mine 3D models from the web. As the crawlers return any sort of model, the second part of the framework automatically filters out models that are simple enough to be distinguished by machine-learning techniques, by analyzing model features. Now that all models have a baseline complexity, the third part of the framework orders database objects based on relative size. This part is a machine-human combination of the Merge Sort algorithm, which enables DeepCAPTCHA to use human object recognition ability via the Amazon Mechanical Turk service. There are considerable benefits of using Merge Sort, including for system time and budget constraints, which we discuss in Section 4.3. Having an ordered database of 3D objects, the fourth part of our framework automatically changes the original object appearances, using on-the-fly translation, rotation, re-illumination, re-coloring, and finally background cluttering, to camouflage objects from the “eyes” of machines. This part can reach a multiplication factor of over 1:200 (200 different images from a single 3D model). This large multiplication factor protects DeepCAPTCHA against both exhaustive and machine-solver attacks (G3 & G6). Firstly, no 2D image is stored to be exhausted in the first place, and for each Ctask, a completely new appearance of each object is created before presenting it to the users. Secondly, even if the database of objects is leaked, machines cannot link the newly generated images to the correct source object, and therefore fail to assign the correct size. We discuss the details of this object appearance alteration part of the framework in Section 4.4, where we see the vital role this part plays in making DeepCAPTCHA successful. Finally, with the freshly generated appearances, the fifth part of the framework is to create a user-friendly CAPTCHA interface, compatible with all smart devices with large and small screens, while being robust against random attacks (G5).
In our experiments we show that humans can solve DeepCAPTCHA tasks with relative ease (average accuracy of 83.7%), most likely due to their ability in object recognition and 3D generalization [13], while machines show constant failure in either recognizing objects or guessing the relative sizes.
In summary, this paper presents the following main contributions:
• A novel image-based CAPTCHA (IBC) based on humans’ high-level cognitive processes of depth perception,
• The first IBC to satisfy all seven CAPTCHA design guidelines,
• A fully automatic framework that collects, maintains, and refreshes its CAPTCHA materials (3D objects),
• An automatic method to label unknown materials in the database, using a human-machine combination in Merge Sort.
In addition to these main contributions, we present smaller novel techniques in each part of the framework to create a practical IBC.
2. RELATED WORKS
Image-based CAPTCHA (IBC) capitalizes on the human ability to detect, recognize, or understand aspects of images, in a task that is (still) hard for machines. A well-designed IBC can be used by people of different ages, nationalities, and literacy levels (universality in CAPTCHA design). In this section we review several IBCs proposed in the literature, each with individual pros and cons.
IronClad⁴ is an IBC based on recognizing images of simple objects such as balls, bars, and keys. The user should enter the number of instances for each class of objects to complete the task. This IBC is based on humans’ object recognition abilities; however, as it uses a non-distorted database of labeled images, IronClad is vulnerable to attacks by computer vision algorithms (G3) and also to exhaustive attacks (G6). In addition, this IBC uses English as its primary language (users read class names in English) and therefore does not satisfy the universality guideline (G4).
Pix CAPTCHA⁵ has a large database of labeled images, and for each Ctask chooses 4 images from the same label, distorts them and presents them to the user. The user should then assign a single label to these four images. Another IBC, Bongo CAPTCHA⁶, is very similar to Pix in its concept. The user should assign a single block in request to one of the two shown classes, based on its visual characteristics. Bongo also uses a pre-labeled database from which it selects the images for a new Ctask. Although both Pix and Bongo use humans’ concept abstraction abilities, both of these CAPTCHAs are vulnerable to exhaustive attacks (G6). Moreover, Bongo also has a very high (50%) chance of random success (G5).
Figure 2: Examples of Pix (left), and Bongo (right) CAPTCHA.
⁴ http://www.securitystronghold.com/products/ironclad-captcha/
⁵ http://www.captcha.net/captchas/pix/
⁶ http://www.captcha.net/captchas/bongo/
[18]), high success rate of current machine algorithms (e.g.
face detection CAPTCHA [6]), and being prone to exhaus-
tive attack (arguably all of previous IBCs).
In this work, we present the first practical IBC, DeepCAPTCHA, that satisfies all CAPTCHA design guidelines, based on the human ability of depth perception. We start with a brief review of depth perception in humans in the next section, and then present the DeepCAPTCHA system design in detail.
6 images of different objects, and the user’s task is to order these images in terms of their relative size (i.e. the size of the object each image represents). In order to automatically create these Ctasks, we devise our framework in five parts, namely, object mining, object filtering, object ordering, appearance altering, and finally presenting the Ctask to the user (user interface). Figure 4 illustrates the DeepCAPTCHA framework. Each of the five components of the framework is proposed with a goal-specific design that makes the entire framework easy to implement and use, while making the resulting Ctasks robust against attacks (automatic solvers, random attacks, as well as exhaustive attacks). We therefore believe that our framework is not only the first framework for an image-based CAPTCHA (IBC) that satisfies all CAPTCHA design guidelines, but also a framework that can be easily scaled and evolved alongside advances in the computer vision field. In the rest of this section we describe the details of each component and the contribution in each of these components, followed by a discussion on the framework’s scaling and evolving ability.
Figure 4: Our proposed framework for DeepCAPTCHA
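Since a Ctask asks the user to order six images by the real-world size of the depicted objects, grading a submission reduces to checking the submitted order against the sizes stored in the database. The following is a minimal sketch under stated assumptions: the `size_rank` mapping, the object names, and the smallest-to-largest answer convention are hypothetical and are not specified in the text.

```python
from typing import Dict, List


def grade_ctask(user_order: List[str], size_rank: Dict[str, int]) -> bool:
    """Return True if the objects are listed from smallest to largest.

    size_rank (hypothetical) maps each presented object to its rank in the
    ordered database; a larger rank means a larger real-world size. Objects
    sharing a rank are treated as equal in size and may appear in either order.
    """
    # The submission must contain exactly the presented objects.
    if sorted(user_order) != sorted(size_rank):
        return False
    ranks = [size_rank[obj] for obj in user_order]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


# Illustrative ranks only; real ranks would come from the ordered database.
presented = {"bee": 0, "key": 2, "chair": 4, "car": 6, "house": 8, "whale": 8}
print(grade_ctask(["bee", "key", "chair", "car", "whale", "house"], presented))
print(grade_ctask(["key", "bee", "chair", "car", "house", "whale"], presented))
```

Keeping the check to a single pass over adjacent ranks means equal-sized objects need no special casing.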
of the classifiers. Figure 5 illustrates the maximum accuracy of these classifiers based on the aforementioned features. As this figure shows, classification for the majority of models is at the random rate; however, some of the models create spikes in this trend, as they are easy to learn from their visual appearances. The objective of the second part of the DeepCAPTCHA framework is to remove these “easy” models from the dataset. We therefore use model features, labeled with the related maximum classification accuracy, to train a new SVM classifier to label each new model as easy or hard.
Figure 5: Filtering “easy” models based on rough classification accuracies on model features.
Now that all database models have a baseline complexity, we need to know their relative size to create a DeepCAPTCHA Ctask, which is the next part of our framework.
4.3 Object Ordering
In a DeepCAPTCHA task we present 6 objects to be sorted in terms of size by the user, and as these objects are automatically mined from the web, we have no intuition about their real-world sizes. We therefore first need to know the relative size between the objects, in order to know whether a CAPTCHA task (Ctask) is solved correctly. The purpose of the third part of the DeepCAPTCHA framework is to use a human-machine sorting algorithm (modified Merge Sort) to continuously order the database of objects, as the stream of new objects continuously refreshes the database.
There are two possible approaches to object size assignment, namely, using absolute size categories, and using relative size relationships. In the absolute size scenario, humans assign a categorical size (e.g. very small, small, normal, big, very big, etc.) to each object. This scenario has the advantage that it requires only n comparisons (n is the number of objects in the database). On the other hand, using categorical size restricts object sizes to a limited set, which in turn restricts the set of possible answers to the final Ctask, thus reducing the robustness of the CAPTCHA against random attacks. Moreover, with a limited set of size categories, there is a high possibility that different users assign different sizes to the same object. In contrast, using relative size relationships increases the space of possible answers to the Ctask, and also addresses the ambiguity problem of assigning different sizes to the same object, but requires a larger number of comparisons ($O(n \log n)$).
We choose the relative size approach as it produces a stronger Ctask. To store object relations, we propose to use a directed graph of linked lists, G = (V, E). Each vertex v is a linked list storing objects with similar sizes, and each edge e represents the relation “larger than” to another set of objects.
The challenge for automatically sorting the object database is that machine-learning approaches fail to perform object recognition (a desirable feature for CAPTCHA). We propose to use a modified Merge Sort algorithm that uses the Amazon Mechanical Turk (AMT) service (actual humans) (similar to [32]) to compare objects and add them to the vertices of G to which they belong. Merge Sort treats the database as an unsorted list of objects, and as it progresses through this list, whenever a comparison between two objects $x_i$ and $x_j$ ($x_i, x_j \notin V$ or $(x_i, x_j) \notin E$) is required, their images are sent to the AMT service for a relative size comparison. After a certain number of AMT responses, the median of the answers is stored in the database.
On occasion, scale thresholds may not be sharp, so the selection of 3D models needs to account for this potential confusion by discarding objects that may be perceived as belonging to more than one size category. Other foreseeable challenges include the likely familiarity of the target user with the object category and instance (groups of users may not be familiar with some car categories or models, unconventional chair models or uncommon animals), as well as closeness to a set of canonical projections of objects, as further explained below. To avoid models with ambiguity (e.g. a car model that also looks like a toy), we discard objects to which more than l (here l = 3) AMT users assign different relations.
4.3.1 Partially Sorted Database
We chose the relative size scenario, which requires more object comparisons and therefore costs more than the absolute size scenario (choosing a stronger CAPTCHA over a smaller number of comparisons). For a completely ordered set we need $n \log n$ comparisons in the worst case, and as we are outsourcing the comparisons, the number of comparisons becomes a very important issue in the system design of DeepCAPTCHA. This is because the system is charged for each comparison by AMT, and the comparison by humans also becomes the running-time bottleneck of the entire system. There are two possible approaches in size assignment, namely, using absolute size categories, and using relative size relationships.
The first approach is to use a partially sorted database to create Ctasks. Considering the database of objects as the list L to be sorted, we can sort disjoint sub-lists $l_1, l_2, \ldots, l_k \subset L$ up to a state in which the final Ctask meets the required robustness against random attacks (having a large enough answer space), while consuming the minimum resources. The size of these sorted sub-lists can be increased when the system constraints are relaxed. The main idea behind using a partially sorted dataset is that we only need to know the relation between a small number of objects for each Ctask; therefore, given a large number of disjoint small sorted sub-lists with $\mathrm{len}(l_i) \ll \mathrm{len}(L)$ that are just long enough to create a Ctask, we can avoid a considerable number of comparisons.
Using Merge Sort brings two advantages in implementing the partially sorted database approach. First, during the sorting process in Merge Sort, at each point one can know the number of disjoint sorted sub-lists, as well as their lengths. Second, Merge Sort can be directly used for parallel sorting, which matches the nature of the AMT service, significantly reduces the required running time, and creates individual sorted sub-lists in the process.
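The partially sorted database idea can be illustrated with a bottom-up Merge Sort that outsources each comparison and stops as soon as every sorted sub-list is long enough. This is only a sketch under stated assumptions, not the authors' implementation: the `Oracle` callback stands in for aggregated AMT answers, `SIZES` and `simulated_human` are made-up stand-ins for human workers, and plain Python lists replace the graph G of linked lists.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical oracle: +1 if a is larger than b, -1 if smaller, 0 if similar.
# In the described system this answer would come from aggregated AMT responses.
Oracle = Callable[[str, str], int]


def merge(left: List[str], right: List[str], ask: Oracle,
          known: Dict[Tuple[str, str], int]) -> List[str]:
    """Merge two ascending runs, asking only about pairs with no stored relation."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        pair = (left[i], right[j])
        if pair not in known:
            known[pair] = ask(*pair)  # one outsourced (paid) comparison
        if known[pair] <= 0:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


def partial_merge_sort(objects: List[str], ask: Oracle,
                       min_run: int) -> Tuple[List[List[str]], int]:
    """Bottom-up Merge Sort that stops once every sorted run has >= min_run objects.

    Returns the disjoint sorted sub-lists and the number of comparisons used.
    """
    runs = [[obj] for obj in objects]
    known: Dict[Tuple[str, str], int] = {}
    while len(runs) > 1 and min(len(run) for run in runs) < min_run:
        # Merges within one pass are independent, so they could run in parallel.
        nxt = [merge(runs[k], runs[k + 1], ask, known)
               for k in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:
            nxt.append(runs[-1])
        runs = nxt
    return runs, len(known)


# Simulated human answers driven by made-up sizes in meters (illustration only).
SIZES = {"coin": 0.02, "key": 0.06, "mouse": 0.1, "chair": 1.0,
         "bicycle": 1.8, "car": 4.5, "house": 10.0, "whale": 25.0}


def simulated_human(a: str, b: str) -> int:
    return (SIZES[a] > SIZES[b]) - (SIZES[a] < SIZES[b])


shuffled = ["car", "coin", "whale", "mouse", "house", "key", "chair", "bicycle"]
sub_lists, cost_partial = partial_merge_sort(shuffled, simulated_human, min_run=4)
full_list, cost_full = partial_merge_sort(shuffled, simulated_human, min_run=8)
print(sub_lists, cost_partial)
print(full_list, cost_full)
```

Stopping at runs of four objects skips the final merge, which is where the saving in outsourced comparisons comes from in this toy example; the run lengths are known at every pass, matching the first advantage named above.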
Figure 7: An example 3D model (top), and its final altered image (bottom) created on the fly, to be sent for Ctask.
significant role in protection against exhaustive and machine-solver attacks.
In order to test the effectiveness of appearance alteration, we perform two sets of experiments. Note that for an algorithm to solve DeepCAPTCHA, it has to either identify the objects and then use prior information to determine the size of the objects (e.g. first identify a bee and a car, and, based on prior information about the relative size of cars and insects, solve the CAPTCHA), or directly estimate the relative size of the presented objects (e.g. into generic size categories). We therefore test machine performance in both of these scenarios. Using 100 objects as the dataset, we create 20 images per object based on the above appearance alterations. We then trained several well-known classifiers (linear, radial-basis, and polynomial SVM, random forest, and K-NN) using a randomly selected 60% of the images, based on the aforementioned 14 features (RGB histogram, opponent color histogram, SIFT, SURF, etc.), and then tested their performance on the remaining 40%. Our experimental results show that, due to the introduced variations in object appearances, machines constantly fail in both object recognition and object size estimation. Figure 8 illustrates machines’ random behavior in assigning size categories and in object recognition.
Figure 8: ROC curves (red) of machine performance on classifying objects into size categories (top) and recognizing objects (bottom). Blue line indicates applied threshold.
4.5 Presentation
The final part of the DeepCAPTCHA framework is the presentation on the user’s device. This part should present an intuitive user interface that is both easy for humans to understand and use, and robust to random attacks. As most internet users migrate to mobile devices, this interface should be compatible with various devices and screen sizes.
We use a simple, light-weight user interface consisting of a 3 × 2 grid of images to be sorted by the user. The user can select images in order of size by clicking (or tapping) on them. As the user selects images, his/her selection appears as image thumbnails above the 3 × 2 grid, and the user can reset the current selection by clicking (or tapping) on these thumbnails. Finally, there is a submit button to be hit when a confident order is reached, and a refresh button for cases in which the user cannot solve the Ctask for any reason.
In each request for a Ctask, we allow up to one “equal” relation between the presented objects (i.e. not more than two objects are equal in size). In addition, to avoid any ambiguity in size, we restrict the objects with no “equal” relation to have at least 2 levels of distance from each other (i.e. objects with a minimum distance of d = 2 in the database graph G).
The random attack success probability in this presentation is $\frac{1}{6!} = 1.3889 \times 10^{-3}$ for cases in which none of the objects have equal sizes, and $\frac{2}{6!} = 2.7778 \times 10^{-3}$ for cases with two objects having the same size. This probability for the success of random attacks falls well within the safe range based on the literature [6, 8, 16].
It should be noted that when using a partially sorted database, the random attack success probability would be higher than above. This is because the space of possible answers shrinks, due to the reduction of the total number of usable objects (sorted lists of objects). Given a sorted dataset of n objects, and c objects in each Ctask, the probability of a successful random attack is
$$\frac{1}{{}^{n}C_{c} \times c!} = \frac{c!\,(n-c)!}{n!\,c!} = \frac{1}{{}^{n}P_{c}} = \frac{(n-c)!}{n!}$$
If the sorting stops just before the final merge (saving $n/2$ comparisons), the success probability increases to $\frac{(n/2-c)!}{(n/2)!}$. However, this reduction does not threaten the final robustness of the Ctask against random attacks. For example, in a dataset of 1000 objects (realistic databases may have more than 100000 objects) with each Ctask requiring 6 objects, stopping at sub-lists with lengths of $n/2$ and $n/4$ increases the success probability of a random attack from $\frac{1}{994{,}010{,}994{,}000}$ to $\frac{1}{61{,}752{,}747{,}000}$ and $\frac{1}{3{,}813{,}186{,}000}$ respectively, which does not have any significant effect on the robustness of Ctasks.
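The random-attack probabilities in this section can be checked numerically. The sketch below uses hypothetical helper names (`grid_success`, `database_success`) and only the Python standard library. One observation from running it: the three denominators quoted in the example (994,010,994,000; 61,752,747,000; 3,813,186,000) equal ${}^{n}P_{c}$ evaluated with c = 4 for n = 1000, 500 and 250, and with c = 6 the probabilities are smaller still.

```python
from fractions import Fraction
from math import factorial, perm


def grid_success(c: int = 6, equal_pair: bool = False) -> Fraction:
    """Chance that a uniformly random ordering of c images is accepted.

    With one pair of equal-sized objects, two orderings are correct.
    """
    return Fraction(2 if equal_pair else 1, factorial(c))


def database_success(n: int, c: int) -> Fraction:
    """1 / nPc = (n - c)! / n! for c objects drawn from a sorted list of n."""
    return Fraction(1, perm(n, c))


print(float(grid_success()), float(grid_success(equal_pair=True)))
for n in (1000, 500, 250):
    # Number of ordered selections of 4 and of 6 objects from n.
    print(n, perm(n, 4), perm(n, 6))
```

Exact fractions are used instead of floats so that the factorial identity can be compared without rounding error.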
EUROCRYPT 2003, pp. 646–646, 2003.
[2] J. Yan and A. S. El Ahmad, “Usability of captchas or usability issues in captcha design,” in SOUPS ’08, 2008, pp. 44–52.
[3] L. von Ahn, M. Blum, and J. Langford, “Telling humans and computers apart automatically,” Commun. ACM, vol. 47, no. 2, pp. 56–60, 2004.
[4] L. von Ahn, B. Maurer, C. McMillen, D. Abraham, and M. Blum, “recaptcha: Human-based character recognition via web security measures,” Science, vol. 321 (5895), pp. 1465–1468, 2008.
[5] B. B. Zhu, J. Yan, Q. Li, C. Yang, J. Liu, N. Xu, M. Yi, and K. Cai, “Attacks and design of image recognition captchas,” in CCS ’10. ACM, 2010, pp. 187–200.
[6] Y. Rui and Z. Liu, “Artifacial: Automated reverse turing test using facial features,” MMSys, vol. 9, no. 6, pp. 493–502, 2004.
[7] D. DSouza, P. C. Polina, and R. V. Yampolskiy, “Avatar captcha: Telling computers and humans apart via face classification,” in EIT. IEEE, 2012, pp. 1–6.
[8] J. Elson, J. R. Douceur, J. Howell, and J. Saul, “Asirra: a captcha that exploits interest-aligned manual image categorization,” in CCS ’07, 2007, pp. 366–374.
[9] D. Misra and K. Gaj, “Face recognition captchas,” AICT, p. 122, 2006.
[10] W. J. Clancey, Situated Cognition: On Human Knowledge and Computer Representations (Learning in Doing: Social, Cognitive and Computational Perspectives). Cambridge University Press, Aug. 1997. [Online]. Available: http://www.worldcat.org/isbn/0521448719
[11] W. Hudson, “Pictorial depth perception in sub-cultural groups in africa,” The Journal of Social Psychology, vol. 52, no. 2, pp. 183–208, 1960.
[12] V. Bruce, P. Green, and M. Georgeson, Visual Perception: Physiology, Psychology and Ecology. 3rd Ed. Psychology Press, Hove, 1996.
[13] T. Palmeri and I. Gauthier, “Visual object understanding,” Nature Reviews Neuroscience, vol. 5 (4), pp. 291–303, 2004.
[14] G. Goswami, B. M. Powell, M. Vatsa, R. Singh, and A. Noore, “Facedcaptcha: Face detection based color image captcha,” FGCS, 2012.
[15] P. Golle, “Machine learning attacks against the asirra captcha,” in CCS ’08, 2008, pp. 535–542.
[16] R. Gossweiler, M. Kamvar, and S. Baluja, “What’s up captcha?: a captcha based on image orientation,” in WWW ’09, 2009, pp. 841–850.
[17] R. Datta, J. Li, and J. Z. Wang, “Imagination: a robust image-based captcha generation system,” in MULTIMEDIA ’05, 2005, pp. 331–334.
[18] J. Kim, S. Kim, J. Yang, J.-h. Ryu, and K. Wohn, “Facecaptcha: a captcha that identifies the gender of face images unrecognized by existing gender classifiers,” Multimedia Tools and Applications, pp. 1–23, 2013.
[19] M. Korayem, A. Mohamed, D. Crandall, and R. Yampolskiy, “Learning visual features for the avatar captcha recognition challenge,” in ICMLA, vol. 2, 2012, pp. 584–587.
[20] T. Yamasaki and T. Chen, “Face recognition challenge: Object recognition approaches for human/avatar classification,” in ICMLA, vol. 2, 2012, pp. 574–579.
[21] P. G. Zimbardo, Psychology and life. HarperCollins, 1992.
[22] S. Coren, C. Porac, and L. M. Ward, Senses and sensation; Perception. Academic Press (New York), 1979.
[23] T. Gevers and H. Stokman, “Robust histogram construction from color invariants for object recognition,” PAMI, vol. 26, pp. 113–118, 2004.
[24] H. Stokman and T. Gevers, “Selection and fusion of color models for image feature detection,” PAMI, vol. 29, no. 3, pp. 371–381, 2007.
[25] T. Liu, H. Guo, and Y. Wang, “A new approach for color-based object recognition with fusion of color models,” in CISP, vol. 3, 2008, pp. 456–460.
[26] M. Fussenegger, P. M. Roth, H. Bischof, and A. Pinz, “On-line, incremental learning of a robust active shape model,” Proc. DAGM-Symp. Pattern Recognit, pp. 122–131, 2006.
[27] A. Toshev, A. Makadia, and K. Daniilidis, “Shape-based object recognition in videos using 3d synthetic object models,” in CVPR, 2009, pp. 288–295.
[28] Y. Zhao and A. Cai, “A novel relative orientation feature for shape-based object recognition,” in IC-NIDC, 2009, pp. 686–689.
[29] A. Diplaros, T. Gevers, and I. Patras, “Combining color and shape information for illumination-viewpoint invariant object recognition,” Image Processing, IEEE Transactions on, vol. 15, pp. 1–11, 2006.
[30] W. Hu, X. Zhou, W. Li, W. Luo, X. Zhang, and S. Maybank, “Active contour-based visual tracking by integrating colors, shapes, and motions,” Image Processing, IEEE Transactions on, vol. 22, no. 5, pp. 1778–1792, 2013.
[31] W. Wang, L. Chen, D. Chen, S. Li, and K. Kuhnlenz, “Fast object recognition and 6d pose estimation using viewpoint oriented color-shape histogram,” in ICME, 2013, pp. 1–6.
[32] G. Little, L. B. Chilton, M. Goldman, and R. C. Miller, “Turkit: human computation algorithms on mechanical turk,” in Proceedings of the 23rd annual ACM symposium on User interface software and technology. New York, NY, USA: ACM, 2010, pp. 57–66.
[33] H. Yu, M. Li, H.-J. Zhang, and J. Feng, “Color texture moments for content-based image retrieval,” in ICIP, vol. 3, 2002, pp. 929–932.
[34] I. Omer and M. Werman, “Color lines: Image specific color representation,” CVPR, pp. 946–953, 2004.
[35] K. Konstantinidis, A. Gasteratos, and I. Andreadis, “Image retrieval based on fuzzy color histogram processing,” Optics Communications, vol. 248, no. 4, pp. 375–386, 2005. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0030401804013069
[36] A. Abdel-Hakim and A. Farag, “Csift: A sift descriptor with color invariant characteristics,” in CVPR, vol. 2, 2006, pp. 1978–1983.
[37] F. Pavel, Z. Wang, and D. Feng, “Reliable object recognition using sift features,” in MMSP, 2009, pp. 1–6.
[38] X.-Y. Wang, J.-F. Wu, and H.-Y. Yang, “Robust image retrieval based on color histogram of local feature regions,” Multimedia Tools Appl., vol. 49, no. 2, pp. 323–345, 2010.
[39] F. D. M. de Souza, E. Valle, G. C. Chavez, and A. de Albuquerque Araujo, “Hue histograms to spatiotemporal local features for action recognition,” CoRR, 2011.
[40] R. Cucchiara, C. Grana, M. Piccardi, A. Prati, and S. Sirotti, “Improving shadow suppression in moving object detection with hsv color information,” in ITS, 2001, pp. 334–339.
[41] X. Hu and W. Hu, “Motion objects detection based on higher order statistics and hsv color space,” in ICM, vol. 3, 2011, pp. 71–74.