Generative Adversarial Ranking Nets

Yinghua Yao
Jing Li [email protected]
Center for Frontier AI Research, A*STAR, Singapore
and
Institute of High Performance Computing, A*STAR, Singapore
Abstract
We propose a new adversarial training framework, generative adversarial ranking networks
(GARNet), to learn from user preferences among a list of samples so as to generate data
meeting user-specific criteria. Concretely, GARNet consists of two modules: a ranker and
a generator. The generator fools the ranker into raising generated samples to the top, while
the ranker learns to rank generated samples at the bottom. Meanwhile, the ranker learns
to rank samples with respect to the property of interest by training on preferences collected
over real samples. The adversarial ranking game between the ranker and the generator
enables an alignment between the generated data distribution and the user-preferred data
distribution, with theoretical guarantees and empirical verification. Specifically, we first
prove that when training with full preferences on a discrete property, the learned distribution
of GARNet rigorously coincides with the distribution specified by the given score
vector based on user preferences. The theoretical results are then extended to partial
preferences on a discrete property and further generalized to preferences on a continuous
property. Meanwhile, numerous experiments show that GARNet can retrieve the distribution
of user-desired data based on full/partial preferences over various properties of
interest (discrete or continuous, single or multiple). Code is available
at https://github.com/EvaFlower/GARNet.
Keywords: Generative Adversarial Network, Controllable Generation, User Preferences,
Adversarial Ranking, Relativistic f-Divergence
1. Introduction
Generative Adversarial Networks (GANs) (Goodfellow et al., 2014; Arjovsky et al., 2017;
Huang et al., 2022) are a popular generative model to approximate a data distribution via
adversarial training between two networks: a generator learns to generate data resembling
real data, whereas a discriminator learns to distinguish real data from synthetic data coming
from the generator. Although unconditional generative modeling presents an intriguing
technical challenge, it lacks the ability to control the generation of data and has limited
practical relevance. In contrast, controllable generative models can generate data to match
some user-desired properties, meeting the needs of many real-world applications (Engel
et al., 2018). Recent successes rely on predefined attributes/classes, or on an evaluator
that scores specific properties of samples, to direct the generation process of GANs (Mirza
and Osindero, 2014; Asokan and Seelamantula, 2020; Engel et al., 2018; Gupta and Zou,
2019; Samanta et al., 2020).
However, in many real-world applications, users may not be able to explicitly articulate
the data they are interested in via the desired class/attribute labels or high values from
the property evaluator (Engel et al., 2018). Instead, more naturally, users would often
implicitly describe their preferences for data by ranking two or more samples in terms
of a certain interested property (Fürnkranz and Hüllermeier, 2017). In particular, GANs
are supposed to generate more (or even exclusively) user-desired data (i.e., data for which
users have high preferences) by learning from preferences, which can have many promising
applications. We outline several motivating examples. (1) In terms of image retrieval, it is
more precise to adopt visual comparisons than conventional textual feedback to match the
user’s mental model of the desired content (Yu and Kovashka, 2020). Thus, data generated
based on preferences is a good representation of user needs, which can be used to retrieve
user-desired images from the database. (2) In terms of drug design, the experts may not be
able to clearly describe the structure of desired drugs for target diseases but can assign their
preferences over candidate drug structures based on their expertise or lab experiments. The
process of drug discovery can be formulated as learning user-desired drug data distribution
with the preferences. (3) In terms of reinforcement learning (RL) without an off-the-shelf
reward function, it is feasible to provide human preferences between pairs of trajectories
of the agent (Christiano et al., 2017; Wirth et al., 2017). Then, the generative model can
produce user-desired demonstrations by learning from the preferences. The RL tasks can
be solved as imitation learning with these generated demonstrations.
A few works (Yu and Kovashka, 2020; Yao et al., 2022) have applied user preferences
to direct the generation of GANs. However, they required heavy interaction with
humans to obtain preferences for generated samples during training so as to generate user-
desired data. Moreover, these studies only considered the simplest pairwise preferences,
failing to handle more general partial preferences. In this study, we consider teaching
GANs to learn a user-preferred data distribution directly from a more general form of user
preferences, i.e., a collection of partial preferences over the training samples, without extra
annotations on generated samples during the training. Incorporating such preferences into
the training of GANs poses several challenges. First, learning a user-preferred data distri-
bution means that the distribution alignment is imposed between a portion of the training
data and generated data, while GANs match the generated data distribution to the whole
training data distribution (Goodfellow et al., 2014; Mao et al., 2017). Therefore, GANs’
original framework cannot achieve the goal of this study. Second, the discriminator in GANs
is designed to distill label information on individual samples (Goodfellow et al., 2014; Mirza
and Osindero, 2014; Odena et al., 2017). So it cannot deal with preference data defined
upon a list of samples, which is usually formulated as listwise ranking problems (Lin, 2010).
Third, the key to eliciting adversarial classification in GANs is to assign the generated sam-
ples with “fake” labels when training the discriminator and “real” labels when training the
generator, respectively (Goodfellow et al., 2014; Dai et al., 2017). Nevertheless, constructing
the supervision for the generated samples adversarially in the context of users’ preferences
(a.k.a, listwise ranking (Cao et al., 2007; Pan et al., 2022)) has not been explored.
To tackle the challenges, we propose a new distribution matching mechanism between
generated data and high-ranked data while considering all data in the training. Meanwhile,
in order to learn from user preferences, GANs need to handle a list of samples. Motivated
by learning to rank (Cao et al., 2007), we tailor-design a ranker to replace the discriminator
in GANs. Therefore, we reform the adversarial learning framework of GAN and propose
Generative Adversarial Ranking Nets (GARNet) to learn from user preferences. Specifically,
our GARNet consists of a ranker and a generator. While a vanilla discriminator performs a
binary classification task by taking an individual sample as input, our proposed ranker learns
to rank samples in the context of partial ranking lists regarding the interested property. In
addition, the ranker assigns the generated samples to the lowest position. As the generator
targets the user-preferred distribution, it is trained to synthesize samples that fool the
ranker into ranking them higher than the real samples. Competition between the ranker
and the generator drives both modules to improve until the generator converges to
the user-preferred data distribution. The main contributions are summarized as follows:
• We propose Generative Adversarial Ranking Nets (GARNet) for data generation guided
by user preferences. In particular, we design an adversarial ranking game between a
ranker and a generator, which enables a distribution alignment between generated data
and a portion of the training data (i.e., user-desired data) (Corollary 7).
• We prove that the objective function of the ranker in GARNet defines a relativistic
f -divergence between the user-preferred data distribution and the generated data distri-
bution, while the generator minimizes this divergence to learn the user-preferred data
distribution. Meanwhile, we prove that the distribution learned by GARNet is rigorously
determined by the score vector given by the users (Proposition 5).
• We empirically show our GARNet can learn a user-preferred data distribution (deter-
mined by the given score vector) from preferences in terms of discrete and continuous
properties. We further show that our GARNet can better retrieve the distribution of
user-desired data from partial preferences than various baselines. The potential of GAR-
Net is also validated in the application of imbalanced learning.
1. Regarding data collection, the preferences S are collected from multiple users following the practice of
rank aggregation (Lin, 2010; Saquil et al., 2018; Pan et al., 2018).
Table 1: Main mathematical notations in this paper. Elements in a set (e.g., $X$) are indexed
using subscripts (e.g., $x_i$), whereas objects ranked at the i-th position in a ranking list $s$
are indexed using superscripts ($s^i$).

| Notation | Explanation |
|---|---|
| $X$ | training samples $X = \{x_n\}_{n=1}^{N}$ |
| $Y$ | class labels $Y = \{y_1, y_2, \dots, y_T\}$ endowed with an order $y_1 > y_2 > \dots > y_T$ |
| $O$ | score space for a continuous property |
| $y(x)$ | function that maps sample $x$ to its class label |
| $o(x)$ | function that maps sample $x$ to its underlying ground-truth ranking score |
| $x_i > x_j$ | sample $x_i$ is preferred over sample $x_j$, $x_i, x_j \in X$ |
| $s$ | a preference, $s := s^1 > s^2 > \dots > s^l$, $s^i \in X$ |
| $S$ | a collection of preferences $S = \{s_m\}_{m=1}^{M}$ over training data $X$ |
| $\pi(s)$ | ground-truth score vector for preference $s$ |
| $s^{(R)}$ | target preference to train the ranker, i.e., $s^{(R)} := s^1 > \dots > s^l > x_g$ |
| $s^{(G)}$ | target preference to train the generator, i.e., $s^{(G)} := x_g > s^1 > \dots > s^l$ |
| $P_u$ | user-preferred data distribution |
| $P_g$ | distribution of the generative model |
| $P_i$ | distribution of $X_i = \{x \mid y(x) = y_i\}$ |
| $D_f$ | relativistic f-divergence |
| $R$ | ranker |
| $G$ | generator |
To be specific, $P_u(x)$ should allocate high density to global high-ranked data and low or
even zero density to global low-ranked data.²
For example, as shown in Fig. 1, we assume that shoe images with different strengths in
terms of the open attribute are distributed evenly. The collected pairwise preferences reveal
that users prefer more open shoes. Accordingly, the generative model is expected to learn
a distribution that assigns a larger density on shoe images with larger open values. Note
that the generative model has the advantage of generating novel images different from the
training data (Song and Ermon, 2019).
2. User preferences help to derive a global ranking over all the data (Cao et al., 2007; Lu and Boutilier,
2011). Global high-ranked data denote those data ranked high in the global ranking. Global low-ranked
data denote those data ranked low in the global ranking.
We ignore the preference lists containing samples with the same labels since such prefer-
ence lists can be simply transformed into the preferences (Eq. (2)) by removing redundant
samples (Pan et al., 2022).
Example. Suppose users prefer small digits on the MNIST dataset. Then for digits 0 to 9,
the order would be 0 > 1 > 2 > . . . > 9, according to which users express their preferences
over MNIST images.
In the following part of this section, we propose a framework called Generative Adver-
sarial Ranking Nets (GARNet) for the distribution learning guided by the above-mentioned
preferences (problem definition in Section 2.2). Our GARNet consists of a ranker R (pa-
rameterized by θR ) and a generator G (parameterized by θG ). In particular, an adversarial
ranking process is defined between the generator and the ranker: they are trained with
opposing ranking goals defined over the real and the generated data (see Fig. 2). The
competition between the generator and the ranker drives the generated distribution
$P_g(x)$ (abbreviated as $P_g$) to align with the user-preferred data distribution $P_u(x)$
(abbreviated as $P_u$).
[Figure 2: The GARNet framework, consisting of a ranker and a generator. The generator maps random noise to a generated sample; target preferences over the real preference list and the generated sample are constructed to train the ranker and the generator, respectively.]
A full preference s would include samples from all ordered classes. Consequently, appending
a generated sample $x_g$ to the preference s, the ranker outputs
a new ranking score vector $R(s, x_g)$. Motivated by vanilla GAN (Goodfellow et al., 2014),
which designs a target real/fake label for each generated sample to trigger the adversarial
game between the discriminator and the generator, we construct a target preference for a
generated list of samples consisting of s and xg , i.e., (s, xg ), as adversarial supervision. To
promote the competition between the ranker R and the generator G, we design the target
preference in two different ways, namely,

$$s^{(R)} := s^1 > s^2 > \dots > s^l > x_g, \tag{6a}$$
$$s^{(G)} := x_g > s^1 > s^2 > \dots > s^l. \tag{6b}$$
The generator fools the ranker to grade the generated samples as the best (Eq. (6b));
while the ranker learns to rank them at the lowest position (Eq. (6a)). Then, we define the
objective for the ranker R and the generator G as follows, respectively,
$$\sup_{R:\mathcal{X}\to\mathbb{R}} \mathbb{E}_{\substack{s\sim S\\ x_g\sim P_g}} \Big[\mathcal{L}_{\mathrm{L2R}}\big(\pi(s^{(R)}),\, R(s^{(R)})\big)\Big], \tag{7a}$$

$$\sup_{G:\mathcal{Z}\to\mathcal{X}} \mathbb{E}_{\substack{s\sim S\\ x_g\sim P_g}} \Big[\mathcal{L}_{\mathrm{L2R}}\big(\pi(s^{(G)}),\, R(s^{(G)})\big)\Big]. \tag{7b}$$
Thus, the training loss for the ranker and the generator with the mini-batch data can be
formulated as follows, respectively:
$$\mathcal{L}^R = \sum_{i=1}^{B} \mathcal{L}_{\mathrm{L2R}}\big(\pi(s_i^{(R)}),\, R(s_i^{(R)})\big), \tag{8a}$$

$$\mathcal{L}^G = \sum_{i=1}^{B} \mathcal{L}_{\mathrm{L2R}}\big(\pi(s_i^{(G)}),\, R(s_i^{(G)})\big), \tag{8b}$$
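To make the training losses concrete, the sketch below implements Eq. (8) with the ListNet-style loss $\mathcal{L}_{\mathrm{L2R}}$, i.e., the cross entropy between the softmax of the target score vector $\pi$ and the softmax of the ranker's scores. This is a minimal PyTorch sketch we added, not the authors' released implementation; the shapes and module interfaces (a `ranker` mapping feature vectors to scalar scores, a `generator` mapping noise to samples) are assumptions.

```python
import torch
import torch.nn.functional as F

def listnet_loss(pi, scores):
    """Cross entropy between softmax(pi) and softmax(scores); minimizing it
    realizes the sup over the ListNet objective L_L2R in Eq. (7)."""
    target = F.softmax(pi, dim=-1)            # top-1 probabilities of target scores
    log_pred = F.log_softmax(scores, dim=-1)  # log top-1 probabilities of the ranker
    return -(target * log_pred).sum(dim=-1).mean()

def garnet_losses(ranker, generator, s_real, z, pi_R, pi_G):
    """Mini-batch losses of Eq. (8). s_real: (B, l, D) real preference lists,
    ordered from most to least preferred; z: (B, nz) noise; pi_R / pi_G:
    target score vectors for s^(R) / s^(G), each of length l + 1."""
    B, l = s_real.shape[:2]
    x_g = generator(z)                                       # (B, D) generated samples
    scores_real = ranker(s_real.reshape(B * l, -1)).view(B, l)
    scores_fake = ranker(x_g).unsqueeze(1)                   # (B, 1)

    # s^(R) := s^1 > ... > s^l > x_g : generated sample ranked at the bottom
    loss_R = listnet_loss(pi_R, torch.cat([scores_real, scores_fake], dim=1))
    # s^(G) := x_g > s^1 > ... > s^l : generated sample ranked at the top
    loss_G = listnet_loss(pi_G, torch.cat([scores_fake, scores_real], dim=1))
    return loss_R, loss_G  # minimized by the ranker / generator, respectively
```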
target data distribution. Thus similarly, we will show our GARNet is a good estimator
of user-preferred data distribution by deriving a new divergence. In terms of most GAN
variants, the objective function of the discriminator at optimum is a divergence (Jolicoeur-
Martineau, 2020; Goodfellow et al., 2014; Arjovsky et al., 2017). In the following, we prove
that training the ranker in GARNet (Eq. (7a)) is equivalent to estimating a divergence be-
tween the user-preferred data distribution Pu (x) and the generated data distribution Pg (x).
The generator in GARNet (Eq. (7b)) is trained to minimize the divergence so as to achieve
Pg (x) = Pu (x).
$$D(P, Q) \ge 0, \qquad D(P, Q) = 0 \iff P = Q. \tag{9}$$

is a relativistic f-divergence.
$$D_f(P_u, P_g) = \sup_{R:\mathcal{X}\to\mathbb{R}} \mathbb{E}_{\substack{s^1\sim P_1,\dots,s^T\sim P_T\\ x_g\sim P_g}}\left[f\left(\sum_{i=1}^{T} q_i R(s^i) - R(x_g)\right)\right], \tag{12}$$
The construction of $\pi(s^{(R)})$ depends only on the first term and the common difference,
making it generally applicable to preferences of various lengths. In particular, we
demonstrate that when choosing the first T terms of $\pi(s^{(R)})$ as an arithmetic
progression, the optimal ranker of our GARNet approximates a relativistic f-divergence
between the user-preferred data distribution and the generated data distribution, as
shown in Theorem 4.
Proposition 5 Given the training dataset $X = \{x_n\}_{n=1}^{N}$ and a collection of full preferences
$S = \{s_m\}_{m=1}^{M}$, where each preference $s := s^1 > s^2 > \dots > s^T \in S$ and $s^i \sim P_i$,
let $P_u = \sum_{i=1}^{T} q_i P_i$ be the mixture distribution of the whole ordered data, where
$q_i = \sigma\big(\pi(s^{(R)})\big)_i$. Let $P_g$ be the distribution of generated data $x_g$. Given
a fixed generator G, the optimal ranker $R^*$ of our GARNet (Eq. (7a)) approximates the
relativistic f-divergence between $P_u$ and $P_g$, i.e., $D_f(P_u, P_g)$, if
$\pi(s^{(R)}) = [a + (T-1)d,\, a + (T-2)d,\, \dots,\, a,\, b]$ and $\frac{1}{e^{a-b}} \to 0$.
$$\lim_{\frac{1}{e^{a-b}}\to 0} q_{T+1} = \lim_{\frac{1}{e^{a-b}}\to 0} \sigma\big(\pi(s^{(R)})\big)_{T+1} = \lim_{\frac{1}{e^{a-b}}\to 0} \frac{e^{b}}{e^{b} + \sum_{j=1}^{T} e^{a+(T-j)d}} \overset{(1)}{=} 0, \tag{13a}$$

$$\lim_{\frac{1}{e^{a-b}}\to 0} \sum_{i=1}^{T} q_i = \lim_{\frac{1}{e^{a-b}}\to 0} \big(1 - q_{T+1}\big) \overset{(2)}{=} 1. \tag{13b}$$
Here (1) and (2) follow the squeeze theorem (Stewart et al., 2020), given $0 < q_{T+1} < \frac{1}{e^{a-b}}$
and $\frac{1}{e^{a-b}} \to 0$. Note that in practice we set $a - b = 10$, so that $\frac{1}{e^{a-b}} \approx 0$.
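As a quick sanity check we added: since the denominator of $q_{T+1}$ contains the summand $e^{a}$ (the $j = T$ term),

$$q_{T+1} = \frac{e^{b}}{e^{b} + \sum_{j=1}^{T} e^{a+(T-j)d}} \le \frac{e^{b}}{e^{a}} = e^{b-a}, \qquad a - b = 10 \;\Rightarrow\; q_{T+1} \le e^{-10} \approx 4.5\times 10^{-5},$$

so the bottom-ranked (generated) slot receives a negligible share of the target top-1 probability.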
$$
\begin{aligned}
\mathcal{L}_{\mathrm{L2R}}\big(\pi(s^{(R)}),\, R(s^{(R)})\big)
&= \sum_{i=1}^{T+1} \sigma\big(\pi(s^{(R)})\big)_i \log \sigma\big(R(s^{(R)})\big)_i \\
&\overset{(1)}{=} \sum_{i=1}^{T} q_i \log \frac{e^{R(s^i)}}{e^{R(x_g)} + \sum_{j=1}^{T} e^{R(s^j)}} + q_{T+1} \log \frac{e^{R(x_g)}}{e^{R(x_g)} + \sum_{j=1}^{T} e^{R(s^j)}} \\
&= \log \frac{e^{\sum_{i=1}^{T} q_i R(s^i)}}{\big(e^{R(x_g)} + \sum_{j=1}^{T} e^{R(s^j)}\big)^{\sum_{i=1}^{T} q_i}} + q_{T+1} \log \frac{e^{R(x_g)}}{e^{R(x_g)} + \sum_{j=1}^{T} e^{R(s^j)}} \\
&\overset{(2)}{=} \log \frac{1}{e^{R(x_g) - \sum_{i=1}^{T} q_i R(s^i)} + \sum_{j=1}^{T} e^{R(s^j) - \sum_{i=1}^{T} q_i R(s^i)}}.
\end{aligned} \tag{14}
$$
Here (1) follows the definition of the target preference $s^{(R)}$ in Eq. (6a), and (2) is valid
due to Eq. (13a) and Eq. (13b).
Meanwhile, when the ranker approaches the optimum $R^*$ for a fixed generator G,
i.e., $\sigma\big(R^*(s^{(R)})\big) = \sigma\big(\pi(s^{(R)})\big)$, we have $R^*(s^{(R)}) = \pi(s^{(R)}) + \delta$ due to the translation
invariance of softmax (Laha et al., 2018), i.e., $\sigma(r + \delta\mathbf{1}) = \sigma(r)$ for any constant $\delta$. Therefore, the first T terms
of $R^*(s^{(R)})$ also form an arithmetic progression with the same common difference $-d$ as that of
$\pi(s^{(R)})$. Then, we have
$$
\begin{aligned}
R^*(s^j) - \sum_{i=1}^{T} q_i R^*(s^i)
&= R^*(s^1) - (j-1)d - \sum_{i=1}^{T} q_i \big(R^*(s^1) - (i-1)d\big) \\
&= \left(R^*(s^1) - \sum_{i=1}^{T} q_i R^*(s^1)\right) - \left(j - 1 + \sum_{i=1}^{T} q_i (1 - i)\right) d
\overset{(1)}{=} \left(\sum_{i=1}^{T} q_i\, i - j\right) d = c_j,
\end{aligned} \tag{15}
$$

where (1) uses $\sum_{i=1}^{T} q_i = 1$ (Eq. (13b)).
$$
\begin{aligned}
\text{Eq. (7a)} &\overset{(1)}{\approx} \sup_{R:\mathcal{X}\to\mathbb{R}} \mathbb{E}_{\substack{s\sim S\\ x_g\sim P_g}}\left[\log \frac{1}{c + e^{-\left(\sum_{i=1}^{T} q_i R(s^i) - R(x_g)\right)}}\right] \\
&\overset{(2)}{=} \sup_{R:\mathcal{X}\to\mathbb{R}} \mathbb{E}_{\substack{s^1\sim P_1,\dots,s^T\sim P_T\\ x_g\sim P_g}}\left[\log \frac{1}{c + e^{-\left(\sum_{i=1}^{T} q_i R(s^i) - R(x_g)\right)}}\right] \\
&\overset{(3)}{=} \sup_{R:\mathcal{X}\to\mathbb{R}} \mathbb{E}_{\substack{s^1\sim P_1,\dots,s^T\sim P_T\\ x_g\sim P_g}}\left[\log \frac{1}{c + e^{-\left(\sum_{i=1}^{T} q_i R(s^i) - R(x_g)\right)}} + \log(c+1)\right] \\
&\overset{(4)}{=} \sup_{R:\mathcal{X}\to\mathbb{R}} \mathbb{E}_{\substack{s^1\sim P_1,\dots,s^T\sim P_T\\ x_g\sim P_g}}\left[f\left(\sum_{i=1}^{T} q_i R(s^i) - R(x_g)\right)\right],
\end{aligned} \tag{16}
$$
Step (1) is reasonable because $\sum_{j=1}^{T} e^{R(s^j) - \sum_{i=1}^{T} q_i R(s^i)}$ in Eq. (14) degenerates
to a constant $c = e^{c_1} + e^{c_2} + \dots + e^{c_T}$ following Eq. (15), being independent of the ranker when the ranker
approaches the optimum. Step (2) is valid as $y(s^i) \equiv y_i$ according to Eq. (4). Step (3) is valid because
adding a constant does not change the maximizer. Step (4) is obtained by denoting $f(z) = \log\frac{1}{c + e^{-z}} + \log(c+1)$.
Note that f is a concave function; $f(0) = 0$; f is differentiable at 0; $f'(0) \neq 0$; $\sup_x f(x) =
M > 0$; and $\arg\sup_x f(x) > 0$. According to Theorem 4, the optimal ranker of our GARNet
approximately estimates a relativistic f-divergence.
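These properties can be verified directly from the definition of f above (a worked check we added):

$$f(0) = \log\frac{1}{c+1} + \log(c+1) = 0, \qquad f'(z) = \frac{e^{-z}}{c+e^{-z}}, \qquad f'(0) = \frac{1}{c+1} \neq 0,$$

$$\sup_x f(x) = \lim_{z\to\infty} f(z) = \log\frac{c+1}{c} = M > 0, \qquad f''(z) = -\frac{c\,e^{z}}{(c\,e^{z}+1)^2} < 0,$$

so f is strictly concave with its supremum approached as $z \to \infty$, i.e., $\arg\sup_x f(x) > 0$.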
Remark 6 In theory, it seems feasible to directly adopt the following objective to learn Pu
according to Theorem 4.
$$\sup_{R:\mathcal{X}\to\mathbb{R}} \mathbb{E}_{\substack{s^1\sim P_1,\dots,s^T\sim P_T\\ x_g\sim P_g}}\left[f\left(\sum_{i=1}^{T} q_i R(s^i) - R(x_g)\right)\right], \tag{17a}$$

$$\sup_{G:\mathcal{Z}\to\mathcal{X}} \mathbb{E}_{\substack{s^1\sim P_1,\dots,s^T\sim P_T\\ x_g\sim P_g}}\left[f\left(R(x_g) - \sum_{i=1}^{T} q_i R(s^i)\right)\right], \tag{17b}$$
where the concave function f can be defined as $f(z) = \log(\mathrm{sigmoid}(z)) + \log 2$, as in Jolicoeur-Martineau
(2019). However, in practice, the above objective cannot provide sufficient gradient
for the generator G to learn properly. Specifically, in early training the generator G
is poor, and the ranker R can easily assign lower values to the generated sample $x_g$ than
to real samples; $\sum_{i=1}^{T} q_i R(s^i) - R(x_g)$ then becomes large and the gradient vanishes.
which maximizes the ranker's output for the generated samples, $R^*(x_g)$. Then, $D_f(P_u, P_g)$
will be minimized.
Proposition 5 and Proposition 8 demonstrate that the distribution learned by our GAR-
Net (Eq. (7)) is determined by the pre-specified score vector π(s). We present a case study
on MNIST (Lecun et al., 1998) to justify it by setting d to 0.1, 0.5, 1 and 5, respectively. As
shown in Fig. 3, the proportion of generated data from GARNet is almost consistent with
the top-1 probability calculated by the specified score vector, i.e., q = σ(π(s)).
In terms of preferences over a discrete property, the most desirable data is supposed to
belong to the global top-1 class, i.e., $y(x) = y_1$. As clarified in Corollary 7, we can simply
set a sufficiently large d (e.g., 5) to achieve this goal.
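To see why a large common difference suffices, the snippet below computes the target top-1 probabilities $q = \sigma(\pi(s^{(R)}))$ for the score vector of Proposition 5 at several values of d (a standalone check we added; a = 10, b = 0 mirrors the hyperparameters reported in Section 7, and T = 10 matches MNIST):

```python
import numpy as np

def top1_probs(T=10, a=10.0, b=0.0, d=5.0):
    """Target probabilities q = softmax(pi(s^(R))) for
    pi(s^(R)) = [a+(T-1)d, a+(T-2)d, ..., a, b] (Proposition 5)."""
    pi = np.array([a + (T - 1 - i) * d for i in range(T)] + [b])
    e = np.exp(pi - pi.max())          # subtract max for numerical stability
    return e / e.sum()

for d in [0.1, 0.5, 1.0, 5.0]:
    q = top1_probs(d=d)
    print(f"d={d}: q1={q[0]:.3f}, q2={q[1]:.3f}, q_T+1={q[-1]:.2e}")
# As d grows, q1 -> 1: with d=5 virtually all target mass sits on the
# top-ranked class, so GARNet converges to the top-1 data distribution.
```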
Figure 3: GARNet learns the distribution $P_u$ under different pre-specified score vectors $\pi(s)$
from full preferences w.r.t. a discrete property on MNIST (preferring small digits, i.e.,
$0 > 1 > \dots > 9$). d is the common difference of $\pi(s^{(R)})$ (Proposition 5). $q = \sigma(\pi(s))$
gives the ground-truth top-1 probability of each digit class; $\tilde{q}$ gives the proportion of each
digit class among data generated by GARNet, with digit values evaluated by a pretrained
classifier for digit classification.
Similarly, we construct the adversarial ranking process between the ranker and the generator. For a partial preference $s$ of length $l$, the target preferences are

$$s^{(R)} := s^1 > s^2 > \dots > s^l > x_g, \tag{22a}$$
$$s^{(G)} := x_g > s^1 > s^2 > \dots > s^l. \tag{22b}$$
We adopt the objective (Eq. (7)) to learn the user-preferred distribution Pu (x) via the
adversarial ranking (Eq. (22)). Thanks to the consistency of the score function (Lemma 9),
our GARNet can still learn the desired distribution from the partial preferences.
Lemma 9 (Consistency of the score function (Xia et al., 2008)) Given a collection
of partial preferences $S = \{s_m\}_{m=1}^{M}$, where $2 \le |s_m| \le T$, the ListNet loss (Eq. (5)) adopted
in our GARNet can recover the optimal score function for the ranker if S contains sufficient
preferences and the samples X are well covered.³
According to Lemma 9, the optimal ranker can recover $s^1 > s^2 > \dots > s^T > x_g$.
Therefore, GARNet can still learn a user-preferred data distribution from partial preferences,
assigning greater density to higher-ranked data. We present a case study on
MNIST (Lecun et al., 1998) with partial preferences ($2 \le l \le 10$). We set the ground-truth
score vector $\pi(s^{(R)})$ for Eq. (22a) as $[a + (l-1)d',\, a + (l-2)d',\, \dots,\, a,\, b]$, similar to Proposition 5.
As shown in Fig. 4, the proportion of generated data from GARNet is larger when
the digit class is ranked higher, which is consistent with user preferences. In particular, when
$d'$ is sufficiently large, GARNet only generates the highest-ranked data, namely digits zero and one.
3. It doesn’t mean to include every sample into the preferences but requires the samples included to be
approximately uniformly distributed in the score space.
Figure 4: GARNet learns the distribution $P_u$ under different pre-specified score vectors $\pi(s)$
from partial preferences ($2 \le l \le 10$) w.r.t. a discrete property on MNIST (preferring small
digits). $d'$ is the common difference of the ground-truth score vector $\pi(s^{(R)})$ for Eq. (22a).
$\tilde{q}$ gives the proportion of each digit class among data generated by GARNet; $q$ is
copied from Fig. 3 for reference. Digit values for the generated data are evaluated by a
pretrained classifier for digit classification.
We found that, when training on relatively long preferences ($7 \le l \le 10$), GARNet converges
to the top-1 data distribution when setting $d' = 5$. Note that the resultant proportion per digit $\tilde{q}$ is
no longer consistent with the pre-specified values $q$, since $y(s^i) \equiv y_i$, $\forall s \in S$ and $\forall i = 1, 2, \dots, T$,
no longer holds for partial preferences.
Example. Suppose users prefer smiling face images. Then users would assign higher preferences
to images with bigger smiles, i.e., images whose score $o(x)$ in terms of the
"smiling" attribute is larger.
Figure 5: GARNet learns the distribution $P_u$ under different pre-specified score vectors
$\pi(s)$ from full preferences w.r.t. a continuous property on LFW (preferring smiling faces).
$q = \sigma(\pi(s))$ gives the top-1 probabilities of each class; $\tilde{q}$ gives the proportion of
each class among data generated by GARNet. The top-1 class comprises images whose scores fall in
the interval of the real images ranked in the top 20% by the smile attribute; the top-2 class
corresponds to images ranked between the top 20% and top 40%; and so on. Labels for the
generated data are obtained from the score given by a pretrained ranker for face images
w.r.t. the smile attribute.
Then, we can construct a union of non-overlapping intervals to cover the score interval
[A, B]:

$$\mathcal{O} = \bigcup_{i = T', \dots, 2} [o_i - \epsilon_i,\, o_i + \epsilon_i) \;\cup\; [o_1 - \epsilon_1,\, o_1 + \epsilon_1], \tag{26}$$

$$y(x_i) \equiv y_i \quad \text{if } o(x_i) \in \mathcal{O}_i. \tag{28}$$
To sum up, the preferences in terms of a continuous property (Definition 10) approximate
the preferences in terms of $T'$ ordered classes.
Remark 12 The assumption of Eq. (27) is in line with real-world applications. People
usually cannot distinguish a pair of samples when the difference is small and would annotate
them as ties (Zhou et al., 2008), where $\epsilon_i$ is the indistinguishable score difference for each
$\mathcal{O}_i$. Nevertheless, for $o(x_1) \in \mathcal{O}_{i-1}$ and $o(x_2) \in \mathcal{O}_i$, $x_1$ and $x_2$ are indistinguishable when
$o(x_1) - o(x_2)$ is small. Fortunately, assigning them to two adjacent ordered classes $y_{i-1}$ and
$y_i$ is still reasonable because the order gap between $y_{i-1}$ and $y_i$ is not significant when $T'$ is
sufficiently large.
According to Theorem 11, learning from the preferences in terms of a continuous prop-
erty can be discretized as learning from the preferences over a finite number of ordered
classes. Therefore, we can adopt GARNet which defines an adversarial ranking goal (Eq. (6))
to learn the desired distribution from full preferences in terms of the continuous property.
We present a case study on LFW (Huang et al., 2008) w.r.t. the smile attribute in Fig. 5
to justify this. Simply, we discretize the continuous ranking scores w.r.t. the smile attribute
over real images into five ordered classes: images ranked in the top 20% by the
smile attribute are assigned to the top-1 class; images ranked between the top 20% and top 40% are
assigned to the top-2 class; and so on. To label a generated sample, we first apply a
pretrained ranker for ranking face images w.r.t. the smile attribute to predict the sample’s
score and then determine the label as the corresponding class according to the score interval
in which the score is located. As shown in Fig. 5, the proportion of generated data from
GARNet is almost consistent with the top-1 probability calculated by the specified score
vector, i.e., q = σ(π(s)).
Similarly, we can adopt GARNet (Eq. (22)) to learn the desired distribution from partial
preferences in terms of the continuous property (see Section 7.2 for more empirical studies).
In practice, we do not explicitly discretize the continuous ranking scores to a finite number
of classes and define a so-called top-1 class as the most desirable data. Instead, we suppose
the generative model is more consistent with user preferences if the generated data has
higher ranking scores.
For simplicity, we extend the scalar output of the ranker R to a U-dimensional vector so as
to learn from $\{S_u\}_{u=1}^{U}$ simultaneously, where $R_u$ denotes the u-th output of the ranker.
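As an illustration, consider the following sketch we added (not the released implementation): a multi-property ranker sharing a backbone and emitting one scalar score per preference group, to which the loss of Eq. (8) can then be applied per group.

```python
import torch
import torch.nn as nn

class MultiPropertyRanker(nn.Module):
    """Ranker with a shared backbone and U scalar outputs,
    one per group of preferences {S_u}, u = 1..U."""
    def __init__(self, in_dim: int, num_properties: int, hidden: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.Linear(hidden, num_properties)  # R_1(x), ..., R_U(x)

    def forward(self, x: torch.Tensor, u: int) -> torch.Tensor:
        # Return the u-th ranking score R_u(x) for the u-th property.
        return self.heads(self.backbone(x))[..., u]
```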
two separate GARNet models or extending our GARNet to a conditional version, denoted
CGARNet, conditioned on each group of preferences. The extension is similar to extending
GAN to conditional GAN (Mirza and Osindero, 2014): both the ranker and the
generator can be conditioned on an attribute label.
samples, while the other requires the generation of only part of the data, i.e., samples with
high ranking scores. Thus, the desired data distribution cannot be learned well.

Generator with Naive Ranker. Another naive approach is to have only a ranker that
learns from user preferences on real data to guide a generator, which we call GR. However,
such an approach suffers from poor sample quality, since nothing in this model accounts for
data authenticity.
7. Experiments
We empirically verify that our GARNet can learn a data distribution that is best consistent
with user preferences. Furthermore, we demonstrate the potential of GARNet in improving
imbalanced class learning.
Dataset: (1) The MNIST dataset (Lecun et al., 1998) consists of 28 × 28 images of digits
zero to nine. We use its training set (50K images) for our experiments. We suppose that users
prefer smaller digits. An image with a smaller digit value is ranked higher;
for instance, the partial ranking list over the four digits 1, 3, 2, 9 is s := 1 > 2 > 3 > 9. As there
are 10 digits in total, the maximal length of a preference l is 10. In particular, the
length of the preferences included in the training data varies from 7 to 10 so that GARNet
can converge to the top-1 data distribution.
(2) The Labeled Faces in the Wild (LFW) dataset (Huang et al., 2008) consists of 13,143
celebrity face images taken in the wild. LFW-10 is a subset of 2,000 images from
LFW along with about 500 pairwise comparisons for one attribute. We take the pairwise
comparisons in terms of the smile attribute as the user preferences for the training data.
Since LFW-10 has limited pairwise comparisons, we exploit a pretrained ranker to augment
more preferences for the training data. In addition, as LFW-10 has limited training images
(1K images), we also take the 13,143 images from LFW as our training data. Specifically,
we pretrain a ranker to rank images in terms of the smile attribute from the given
pairwise comparisons of LFW-10, use it to output ranking scores for all training samples
of LFW, and construct pairwise preferences based on the scores. If face
image $x_a$ has a bigger smile than face image $x_b$, the preference is $s := x_a > x_b$.
(3) The UT-Zap50K dataset (Yu and Grauman, 2014) contains 50,025 shoe images from
Zappos.com, with pairwise comparisons over several attributes. We use all pairs, i.e.,
UT-Zap50K-1, UT-Zap50K-2, and UT-Zap50K-lexi, as training data. The pairwise comparisons
in terms of the comfort (4,483 pairwise comparisons), open (4,351), and sporty (4,085)
attributes are taken as the user preferences for the training data, respectively. We exploit a
pretrained ranker to augment more preferences: we pretrain a ranker on the given pairwise
comparisons for a specified attribute and construct pairwise preferences based on the scores
it assigns. If shoe image $x_a$ is more comfortable/open/sporty than shoe image $x_b$, the
preference is $s := x_a > x_b$.
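The preference-augmentation step can be summarized as the following sketch we added, where `pretrained_ranker` stands in for the attribute ranker trained on the given pairwise comparisons:

```python
import torch

def build_pairwise_preferences(images, pretrained_ranker, num_pairs=10000):
    """Construct pairwise preferences s := x_a > x_b from ranker scores."""
    with torch.no_grad():
        scores = pretrained_ranker(images)        # one score per training image
    prefs = []
    for _ in range(num_pairs):
        a, b = torch.randint(len(images), (2,)).tolist()
        if scores[a] == scores[b]:
            continue                              # skip ties
        # the higher-scored image is ranked first in the preference pair
        prefs.append((a, b) if scores[a] > scores[b] else (b, a))
    return prefs
```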
Baselines: We consider GAN (Goodfellow et al., 2014), FBGAN (Gupta and Zou,
2019), GAN-RK (Saquil et al., 2018) (GAN plus an additional ranker) and GR (a generator
with a ranker that is only trained with partial preferences) as our baselines. GAN itself
is trained with three kinds of subsets: the entire data, the subset made up of local
high-ranked samples (top-1 samples in the partial preferences), and the subset made up of
global high-ranked samples (e.g., top 10% for MNIST; top 50% for LFW and UT-Zap50K)
selected by a surrogate ranker. These three variants are denoted as GAN-0, GAN-1, and
GAN-2, respectively.

Figure 6: Comparison w.r.t. the generation of user-preferred digits (small digits) on MNIST.
Network & Hyperparameters: All methods are implemented based on the WGAN-GP
architecture (DCGAN version) (Gulrajani et al., 2017) unless specifically mentioned.
For training we use the Adam optimizer (Kingma and Ba, 2015) with learning rate
$2 \times 10^{-4}$ and $\beta_1 = 0.5$, $\beta_2 = 0.999$. According to Proposition 5 and Corollary 7, we set
the ground-truth score vector for $s^{(R)}$ as $\pi(s^{(R)}) = [10 + 5(l-1),\, 10 + 5(l-2),\, \dots,\, 10,\, 0]$
for all datasets, which makes GARNet learn a data distribution that is best consistent
with user preferences, as $q_1 \approx 1$. We simply set the ground-truth score vector for $s^{(G)}$ as
$\pi(s^{(G)}) = [10 + 5(l-2),\, 10 + 5(l-3),\, \dots,\, 10,\, 5,\, 0]$. For MNIST, the batch size used is 50.
The training iteration is set to 100K. For LFW and UT-Zap50K, the batch size is 64. The
training iteration is 200K. The training images are resized to 32 × 32 unless specifically
mentioned. To rank images in terms of a certain attribute, we adopt a pairwise ranking loss
in Burges et al. (2005). The pretrained VGG16 (Simonyan and Zisserman, 2015) is used to
map an image to its ranking score.
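For reference, the two target score vectors above can be constructed programmatically (a small helper we added, reproducing the values a = 10, d = 5, b = 0 stated earlier):

```python
def score_vector_R(l, a=10, d=5, b=0):
    # pi(s^(R)) = [a+(l-1)d, a+(l-2)d, ..., a, b]: real list on top, x_g last
    return [a + (l - 1 - i) * d for i in range(l)] + [b]

def score_vector_G(l, a=10, d=5):
    # pi(s^(G)) = [a+(l-2)d, ..., a, 5, 0]: x_g placed above the real list
    return [a + (l - 2 - i) * d for i in range(l - 1)] + [5, 0]

assert score_vector_R(3) == [20, 15, 10, 0]
assert score_vector_G(3) == [15, 10, 5, 0]
```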
Evaluation Metrics: (1) For MNIST, mean digit (MD) is adopted as a performance
measure for the generation of preferred digits. The digit values for the generated data will
be evaluated by a pretrained classifier for digit classification. For each method, we randomly
generate 50K digits and calculate their MD.
(2) For LFW and UT-Zap50K, mean score (MS) is adopted as a performance measure
for the generation of user-preferred face images. The scores are evaluated by a pretrained
ranker that learns to rank images in terms of the target attribute from the given preferences.
Fréchet Inception Distance (FID) (Heusel et al., 2017) measures the visual quality w.r.t. the
high-ranked real face images. For each method, we randomly select 50K generated images
and conduct the evaluation.
Table 2: Comparison of various methods on MNIST w.r.t. mean digit (MD, ↓). Best results
are in bold. FBGAN suffers from mode collapse (Fig. 6d). GR generates meaningless results
(Fig. 6f), so its MD is not collected.
According to Fig. 6 and Table 2, we highlight that only our GARNet successfully learns
the distribution of the top-ranked digits from user preferences. Table 2 shows that except for
GAN-0, GAN-RK, and GR, other methods have smaller MD compared to the training data
(denoted as real). It means that those generative models can learn from user preferences
to some extent. However, only FBGAN and GARNet can converge to generate top-ranked
digits with MD 0.00. On the other hand, GAN-0 is the vanilla GAN that is trained with
all training data, so its MD is close to that of real images. GAN-RK fails to learn from
user preferences because of the conflict between the discriminator and the ranker. GR cannot
produce any meaningful generations because it does not involve distribution matching between
real and generated samples; therefore, its MD is not applicable.
Fig. 6 shows that: (1) GARNet generates top-ranked digits (digit 0), with high quality
as shown in Fig. 6g. (2) According to Fig. 6b, GAN-1 still generates digits that are not
ranked top, like digit two, since the constructed training subset contains undesired sam-
ples. (3) GAN-2 achieves better performance than GAN-1 but still fails to converge to the
distribution of the top-ranked digit. (4) FBGAN suffers from mode collapse, in which the
generated digit zeros are exactly the same, since its selection of high-ranked generated data
does not consider maintaining sample diversity.
suffers from mode collapse and its generated images have very poor quality (see Appendix),
which is verified by its large FID score. Similar conclusions hold for UT-Zap50K.
For further visual quantification, we place GARNet’s generated samples and training
samples (a.k.a. real samples) w.r.t. their scores on the smile axis in Fig. 8a. GARNet
mainly (over 95%) covers those real images ranked top 50% in terms of the smile attribute.
GARNet even generates images with higher scores than the maximum scores in the LFW
dataset, which shows its potential for generating data beyond existing rankings. We place
GARNet’s generated samples and training samples (a.k.a. real samples) w.r.t. their scores
on the comfort axis in Fig. 8b. Over 75% generated images cover those real images ranked
top 50% in terms of the comfort attribute. Results on the open and sporty attributes can
be seen in Appendix.
Table 3: Comparison on LFW face data and UT-Zap50K shoe data w.r.t. performance
measure (MS, ↑) and quality score (FID, ↓). The best results are highlighted in bold.
Since FBGAN suffers from mode collapse and poor image quality issues (large FID; see its
generated images in Appendix), its MS is not collected.
| Dataset | Real (MS) | GAN-1 (MS) | GAN-2 (MS) | FBGAN (MS) | GARNet (MS) | GAN-1 (FID) | GAN-2 (FID) | FBGAN (FID) | GARNet (FID) |
|---|---|---|---|---|---|---|---|---|---|
| LFW (smile) | 0.59 | 1.41 | 1.14 | - | 2.29 | 47.55 | 37.40 | 89.36 | 22.22 |
| UT-Zap50K (comfort) | -5.06 | -2.66 | -0.91 | - | -0.90 | 49.20 | 77.99 | 253.53 | 39.16 |
| UT-Zap50K (open) | -0.36 | 0.93 | 0.99 | - | 3.75 | 63.92 | 112.93 | 265.49 | 46.78 |
| UT-Zap50K (sporty) | 4.09 | 5.28 | 4.48 | - | 7.07 | 41.02 | 84.59 | 431.44 | 32.87 |
To demonstrate that GARNet can generalize to a mixture of user preferences, i.e., preferences
on multiple attributes, we conduct experiments with preferences on both the comfort and
open attributes as the training data. To achieve better generation, we apply a more advanced
GAN architecture (Karras et al., 2020), and the training images are resized to 64 × 64 for
better resolution.
In Fig. 9, we visualize real and generated images by ordering them with the pretrained
rankers defined over the open and comfort attributes, respectively. We observe
that GARNet generates shoe images that are more comfortable or more open. In addition,
the least comfortable and least open shoe images are not generated by GARNet, as they are
disliked by users.

It is worth mentioning that GARNet achieves a compromise between a pair of moderately
conflicting attributes. Intuitively, open and comfort are conflicting: for example,
sports shoes are considered comfortable but only slightly open. As shown in Figure 9,
when learning from preferences w.r.t. the open and comfort attributes, GARNet converges to
avoid generating samples ranked lowest in either attribute, instead generating
relatively closed but comfortable shoes or relatively uncomfortable but open shoes.
Figure 7: Comparison of GAN-1 (Row 1), GAN-2 (Row 2) and our GARNet (Row 3) in
terms of the score density. The green point denotes the mean score of generated samples,
while the red point denotes that of real samples. Note that the x-axis denotes ranking scores of
samples rather than the ordered classes of Figure 5. Similar observations, namely that a preferred class
has a larger proportion, can be made if we coarsen the ranker scores into a finite set of ordered
classes.
Figure 8: Real images (32 × 32, above the axis) vs. generated images (GARNet, below the
axis) w.r.t. the score axis of the smile (a) and comfort (b) attributes. The percentile rank of a given
score is the percentage of scores in its frequency distribution that are less than that score;
real (generated) percentile ranks are calculated among real (generated) images.
Figure 9: GARNet for multiple attributes. Images (64×64) are placed w.r.t. the “comfort”
and “open” score. The results are obtained by an advanced GAN variant, StyleGAN2-
ADA (Karras et al., 2020), thus having better quality.
Figure 10: The generated samples (32 × 32) and the score density (open ↑, score ↑) for
CGARNet conditioned on open and not open attributes, respectively. The green point
denotes the mean score of generated samples while the red point denotes that of real samples.
Figure 11: (a) GARNet boosts imbalanced classification on MNIST with digit six as the
minority class. The gain matrix (zeros not shown) is obtained by $C' - C$, where $C'$ and
$C$ are the confusion matrices calculated on GARNet-boosted data and original MNIST data,
respectively. The color denotes the confusion matrix (%) on the original MNIST data. (b-c)
Visual results of GARNet and Elastic-InfoGAN on MNIST with extremely imbalanced data.
8. Conclusion
This paper presents a novel adversarial ranking framework, GARNet, to learn from user
preferences. In particular, we prove that GARNet is equivalent to optimizing a divergence
between the user-preferred data distribution (determined by the given score vector) and
the generated data distribution, which theoretically guarantees that GARNet is a good
estimator of the desired data distribution. Meanwhile, we empirically show GARNet can obtain
corresponding distributions when different given score vectors are specified. Numerous
experiments demonstrate GARNet can learn the distribution that best matches user pref-
erences compared to various baselines. A study of imbalanced class learning validates the
advantage of preference-guided GARNet over GAN conditioned on labels.
GARNet is modeled under the homogeneity assumption of rank aggregation that only
one ground-truth full ranking exists (Lin, 2010), which is in line with the actual situation
in many real-world applications: peer grading (Raman and Joachims, 2014), image
rating (Liang and Grauman, 2014), document recommendation (Sellamanickam et al., 2011),
opinion analysis (Kim et al., 2015), and bioinformatics (Chatterjee et al., 2018). It is
possible to break such an assumption when the preferences are sampled from heterogeneous
user groups, i.e., a mixture of preferences: (1) There is one group of users dominating the
ranking. Then, the homogeneity assumption is still valid by simply taking the samples
from the majority of preferences as clean preferences and the samples from the rest as
noisy preferences. Robust rank aggregation (Pan et al., 2022) can be incorporated into our
GARNet to handle these noisy preferences, which is left for future exploration. (2) The
preferences from different groups of users are balanced. The homogeneity assumption on
users is then no longer valid, and it is more reasonable and realistic to assume heterogeneous
users (Zhao et al., 2016), which is called mixture rank aggregation.
We conduct a preliminary exploration for a simple case where the group affiliation of each
preference is known. We leave the general case of mixture ranking for future work.
Acknowledgements
This work was supported by the A*STAR Centre for Frontier AI Research, the A*STAR
Career Development Fund (Grant No. C222812019), the A*STAR GAP project (Grant
No. I23D1AG079), the Program for Guangdong Introducing Innovative and Entrepreneurial
Teams (Grant No. 2017ZT07X386), NSFC (Grant No. 62250710682), and the Program for
Guangdong Provincial Key Laboratory (Grant No. 2020B121201001).
#1 Prove that $D_f(P_u, P_g) \ge 0$.

$$D_f(P_u, P_g) = \mathbb{E}_{\substack{s^1\sim P_1,\dots,s^T\sim P_T\\ x_g\sim P_g}}\left[f\left(\sum_{i=1}^{T} q_i R^*(s^i) - R^*(x_g)\right)\right] \ge \mathbb{E}_{\substack{s^1\sim P_1,\dots,s^T\sim P_T\\ x_g\sim P_g}}\left[f\left(\sum_{i=1}^{T} q_i R^{w}(s^i) - R^{w}(x_g)\right)\right] \overset{(1)}{=} 0, \tag{30}$$

where $R^w$ denotes a constant ranker and (1) is valid because $\sum_{i=1}^{T} q_i = 1$, so the argument of f is 0 and $f(0) = 0$.
#2 Prove that $P_u = \sum_{i=1}^{T} q_i P_i = P_g \implies D_f(P_u, P_g) = 0$.
$$
\begin{aligned}
D_f(P_u, P_g) &= \mathbb{E}_{\substack{s^1\sim P_1,\dots,s^T\sim P_T\\ x_g\sim P_g}}\left[f\left(\sum_{i=1}^{T} q_i R^*(s^i) - R^*(x_g)\right)\right] \\
&\overset{(1)}{\le} \mathbb{E}_{s^1\sim P_1,\dots,s^T\sim P_T}\left[f\left(\sum_{i=1}^{T} q_i R^*(s^i) - \mathbb{E}_{x_g\sim P_g}\big[R^*(x_g)\big]\right)\right] \\
&\overset{(2)}{\le} f\left(\mathbb{E}_{s^1\sim P_1,\dots,s^T\sim P_T}\left[\sum_{i=1}^{T} q_i R^*(s^i)\right] - \mathbb{E}_{x_g\sim P_g}\big[R^*(x_g)\big]\right) \\
&= f\left(\sum_{i=1}^{T} q_i\, \mathbb{E}_{s^i\sim P_i}\big[R^*(s^i)\big] - \mathbb{E}_{x_g\sim P_g}\big[R^*(x_g)\big]\right) \\
&\overset{(3)}{=} f\left(\sum_{i=1}^{T} q_i\, \mathbb{E}_{x\sim P_i}\big[R^*(x)\big] - \mathbb{E}_{x\sim P_g}\big[R^*(x)\big]\right) \\
&\overset{(4)}{=} f\left(\int R^*(x) \sum_{i=1}^{T} q_i P_i\, dx - \int R^*(x) P_g\, dx\right) = f(0) = 0,
\end{aligned} \tag{31}
$$

where (1) and (2) follow Jensen's inequality for the concave function f, (3) is valid due to
a change of variables, and (4) is valid because $P_g = \sum_{i=1}^{T} q_i P_i$. Since $D_f(P_u, P_g) \ge 0$ (Eq. (30)),
we have $D_f(P_u, P_g) = 0$.
#3 Prove that $D_f(P_u, P_g) = 0 \implies P_u = \sum_{i=1}^{T} q_i P_i = P_g$.
Let $R'(x) = \begin{cases}\Delta & \text{if } x \in H \\ 0 & \text{otherwise}\end{cases}$, where $\Delta \neq 0$. Then we have

$$
\begin{aligned}
L(\Delta) &= \mathbb{E}_{\substack{s^1\sim P_1,\dots,s^T\sim P_T\\ x_g\sim P_g}}\left[f\left(\sum_{i=1}^{T} q_i R'(s^i) - R'(x_g)\right)\right]
\overset{(1)}{=} \mathbb{E}_{\substack{s^1\sim P_1,\dots,s^T\sim P_T\\ x_g\sim P_g}}\left[f\left(\sum_{i=1}^{T} q_i \big(R'(s^i) - R'(x_g)\big)\right)\right] \\
&\overset{(2)}{\ge} \mathbb{E}_{\substack{s^1\sim P_1,\dots,s^T\sim P_T\\ x_g\sim P_g}}\left[\sum_{i=1}^{T} q_i f\big(R'(s^i) - R'(x_g)\big)\right]
= \sum_{i=1}^{T} q_i\, \mathbb{E}_{\substack{s^i\sim P_i\\ x_g\sim P_g}}\left[f\big(R'(s^i) - R'(x_g)\big)\right] \\
&\overset{(3)}{=} \sum_{i=1}^{T} q_i\, \mathbb{E}_{\substack{x_u\sim P_i\\ x_g\sim P_g}}\left[f\big(R'(x_u) - R'(x_g)\big)\right]
= \int_{\mathcal{X}}\int_{\mathcal{X}} f\big(R'(x_u) - R'(x_g)\big)\left(\sum_{i=1}^{T} q_i P_i(x_u)\right) P_g(x_g)\, dx_u\, dx_g \\
&\overset{(4)}{=} \int_{\mathcal{X}}\int_{\mathcal{X}} f\big(R'(x_u) - R'(x_g)\big)\, P_u(x_u)\, P_g(x_g)\, dx_u\, dx_g,
\end{aligned} \tag{32}
$$

where (1) uses $\sum_{i=1}^{T} q_i = 1$, (2) follows Jensen's inequality for the concave function f, (3) is a change of variables, and (4) uses $P_u = \sum_{i=1}^{T} q_i P_i$.
Figure 12: Real images (32 × 32, above the axis) vs. generated images (GARNet, below the
axis) w.r.t. the score axis of the open and sporty attributes. The percentile rank of a given
score is the percentage of scores in its frequency distribution that are less than that score;
real (generated) percentile ranks are calculated among real (generated) images.
Figure 13: Visual results (32 × 32, mode collapse) of FBGAN on LFW (smile) and UT-
Zap50K (comfort, open and sporty).
Figure 14: Verification of GARNet generating novel images (32 × 32) on LFW (smile) and
UT-Zap50K (comfort, open, and sporty): generated images are not exactly alike any training
images. Nearest neighbors are measured by the $\ell_2$ distance between images. In each panel
((a) smile, (b) comfort, (c) open, (d) sporty), the first column shows generated samples from
GARNet; the other columns show each generated sample's nearest neighbors in the training
dataset.
References
Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial
networks. In International Conference on Machine Learning, pages 214–223, 2017.
Siddarth Asokan and Chandra Sekhar Seelamantula. Teaching a GAN what not to learn.
In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and
Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems, 2020.
Christopher J. C. Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamil-
ton, and Gregory N. Hullender. Learning to rank using gradient descent. In International
Conference on Machine learning, 2005.
Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. Learning to rank: from
pairwise approach to listwise approach. In International Conference on Machine learning,
pages 129–136, 2007.
Xi Chen, Paul N Bennett, Kevyn Collins-Thompson, and Eric Horvitz. Pairwise ranking
aggregation in a crowdsourced setting. In Proceedings of the sixth ACM international
conference on Web search and data mining, pages 193–202, 2013.
Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei.
Deep reinforcement learning from human preferences. In Advances in Neural Information
Processing Systems, pages 4299–4307, 2017.
Zihang Dai, Zhilin Yang, Fan Yang, William W Cohen, and Russ R Salakhutdinov. Good
semi-supervised learning that requires a bad GAN. Advances in Neural Information
Processing Systems, 30, 2017.
Xin Ding, Yongwei Wang, Zuheng Xu, William J Welch, and Z Jane Wang. Ccgan: Contin-
uous conditional generative adversarial networks for image generation. In International
conference on learning representations, 2020.
Jesse H. Engel, Matthew D. Hoffman, and Adam Roberts. Latent constraints: Learning to
generate conditionally from unconditional generative models. In International Conference
on Learning Representations, 2018.
Johannes Fürnkranz and Eyke Hüllermeier. Preference learning. In Claude Sammut and
Geoffrey I. Webb, editors, Encyclopedia of Machine Learning and Data Mining, pages
1000–1005. Springer, 2017.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil
Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in
Neural Information Processing Systems, 27:2672–2680, 2014.
Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C
Courville. Improved training of Wasserstein GANs. In Advances in Neural Information
Processing Systems, pages 5767–5777, 2017.
Anvita Gupta and James Zou. Feedback GAN for DNA optimizes protein functions. Nature
Machine Intelligence, 1(2):105, 2019.
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp
Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash
equilibrium. Advances in Neural Information Processing Systems, 30, 2017.
Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint
arXiv:2207.12598, 2022.
Gary B Huang, Marwan Mattar, Tamara Berg, and Eric Learned-Miller. Labeled Faces in
the Wild: A Database forStudying Face Recognition in Unconstrained Environments. In
Workshop on Faces in ’Real-Life’ Images: Detection, Alignment, and Recognition, 2008.
Jian Huang, Yuling Jiao, Zhen Li, Shiao Liu, Yang Wang, and Yunfei Yang. An error
analysis of generative adversarial networks for learning distributions. Journal of Machine
Learning Research, 23(116):1–43, 2022.
Harold Jeffreys, Bertha Jeffreys, and Bertha Swirles. Methods of mathematical physics.
Cambridge university press, 1999.
Alexia Jolicoeur-Martineau. The relativistic discriminator: a key element missing from
standard GAN. In International Conference on Learning Representations, 2019.
Alexia Jolicoeur-Martineau. On relativistic f-divergences. In International Conference on
Machine Learning, pages 4931–4939. PMLR, 2020.
Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo
Aila. Training generative adversarial networks with limited data. In Advances in Neural
Information Processing Systems, 2020.
MinJi Kim, Farzad Farnoud, and Olgica Milenkovic. Hydra: gene prioritization via hybrid
distance-score rank aggregation. Bioinform., 31(7):1034–1043, 2015.
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In
International Conference on Learning Representations, 2015.
Anirban Laha, Saneem Ahmed Chemmengath, Priyanka Agrawal, Mitesh Khapra, Karthik
Sankaranarayanan, and Harish G Ramaswamy. On controllable sparse alternatives to
softmax. Advances in neural information processing systems, 31, 2018.
Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to docu-
ment recognition. Proceedings of the IEEE, 86(11):2278–2324, Nov 1998.
Xiao Li, Chenghua Lin, Chaozheng Wang, and Frank Guerin. Latent space factorisation
and manipulation via matrix subspace projection. International Conference on Machine
Learning, 2020.
Lucy Liang and Kristen Grauman. Beyond comparing image pairs: Setwise active learning
for relative attributes. In Proceedings of the IEEE conference on Computer Vision and
Pattern Recognition, pages 208–215, 2014.
Tyler Lu and Craig Boutilier. Learning mallows models with pairwise preferences. In
International Conference on Machine Learning, pages 145–152, 2011.
Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smol-
ley. Least squares generative adversarial networks. In International Conference on Com-
puter Vision, pages 2794–2802, 2017.
Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint
arXiv:1411.1784, 2014.
Takeru Miyato and Masanori Koyama. cGANs with projection discriminator. In International
Conference on Learning Representations, 2018.
Augustus Odena, Christopher Olah, and Jonathon Shlens. Conditional image synthesis
with auxiliary classifier gans. In International conference on machine learning, pages
2642–2651, 2017.
Utkarsh Ojha, Krishna Kumar Singh, Cho-Jui Hsieh, and Yong Jae Lee. Elastic-InfoGAN:
Unsupervised disentangled representation learning in class-imbalanced data. In Advances
in Neural Information Processing Systems, 2020.
Yuangang Pan, Bo Han, and Ivor W Tsang. Stagewise learning for noisy k-ary preferences.
Machine Learning, 107(8):1333–1361, 2018.
Yuangang Pan, Ivor W Tsang, Weijie Chen, Gang Niu, and Masashi Sugiyama. Fast and
robust rank aggregation against model misspecification. J. Mach. Learn. Res., 23:23–1,
2022.
Albert Pumarola, Stefan Popov, Francesc Moreno-Noguer, and Vittorio Ferrari. C-flow:
Conditional generative flow models for images and 3d point clouds. In IEEE Conference
on Computer Vision and Pattern Recognition, pages 7949–7958, 2020.
Karthik Raman and Thorsten Joachims. Methods for ordinal peer grading. In International
Conference on Knowledge Discovery and Data Mining, pages 1037–1046, 2014.
Bidisha Samanta, Abir De, Gourhari Jana, Vicenç Gómez, Pratim Kumar Chattaraj, Niloy
Ganguly, and Manuel Gomez-Rodriguez. Nevae: A deep generative model for molecular
graphs. The Journal of Machine Learning Research, 21(1):4556–4588, 2020.
Yassir Saquil, Kwang In Kim, and Peter M. Hall. Ranking cgans: Subjective control over
semantic image attributes. In British Machine Vision Conference, page 131. BMVA Press,
2018.
Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale
image recognition. In International Conference on Learning Representations, 2015.
Kihyuk Sohn, Honglak Lee, and Xinchen Yan. Learning structured output representation
using deep conditional generative models. Advances in Neural Information Processing
Systems, 28:3483–3491, 2015.
Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data
distribution. In Advances in Neural Information Processing Systems 32: Annual Con-
ference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14,
2019, Vancouver, BC, Canada, pages 11895–11907, 2019.
James Stewart, Daniel K Clegg, and Saleem Watson. Multivariable calculus. Cengage
Learning, 2020.
Larissa T Triess, Andre Bühler, David Peter, Fabian B Flohr, and Marius Zöllner. Point
cloud generation with continuous conditioning. In International Conference on Artificial
Intelligence and Statistics, pages 4462–4481. PMLR, 2022.
Ze Wang, Seunghyun Hwang, Zichen Miao, and Qiang Qiu. Image generation using contin-
uous filter atoms. Advances in Neural Information Processing Systems, 34:17826–17838,
2021.
Christian Wirth, Riad Akrour, Gerhard Neumann, Johannes Fürnkranz, et al. A survey of
preference-based reinforcement learning methods. Journal of Machine Learning Research,
18(136):1–46, 2017.
Fen Xia, Tie-Yan Liu, Jue Wang, Wensheng Zhang, and Hang Li. Listwise approach to
learning to rank: theory and algorithm. In Proceedings of the 25th international confer-
ence on Machine learning, pages 1192–1199, 2008.
Yinghua Yao, Yuangang Pan, Ivor W Tsang, and Xin Yao. Differential-critic gan: Generat-
ing what you want by a cue of preferences. IEEE Transactions on Neural Networks and
Learning Systems, 2022.
Zac Yu and Adriana Kovashka. Syntharch: Interactive image search with attribute-
conditioned synthesis. In Conference on Computer Vision and Pattern Recognition,
CVPR Workshops, pages 645–654, 2020.
Zhibing Zhao, Peter Piech, and Lirong Xia. Learning mixtures of plackett-luce models. In
International Conference on Machine Learning, pages 2906–2914. PMLR, 2016.
Ke Zhou, Gui-Rong Xue, Hongyuan Zha, and Yong Yu. Learning to rank with ties. In
Proceedings of the 31st annual international ACM SIGIR conference on Research and
development in information retrieval, pages 275–282, 2008.
Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image
translation using cycle-consistent adversarial networks. In International Conference on
Computer Vision, pages 2223–2232, 2017.