Journal of Machine Learning Research 25 (2024) 1-35 Submitted 4/23; Revised 3/24; Published 4/24

Generative Adversarial Ranking Nets

Yinghua Yao∗ [email protected]


Center for Frontier AI Research, A*STAR, Singapore
and
Institute of High Performance Computing, A*STAR, Singapore

Yuangang Pan† [email protected]


Center for Frontier AI Research, A*STAR, Singapore
and
Institute of High Performance Computing, A*STAR, Singapore

Jing Li [email protected]
Center for Frontier AI Research, A*STAR, Singapore
and
Institute of High Performance Computing, A*STAR, Singapore

Ivor W. Tsang [email protected]


Center for Frontier AI Research, A*STAR, Singapore
and
Institute of High Performance Computing, A*STAR, Singapore

Xin Yao [email protected]


Department of Computing and Decision Sciences, Lingnan University, Hong Kong

Editor: Matthew Hoffman

Abstract
We propose a new adversarial training framework – generative adversarial ranking networks
(GARNet) to learn from user preferences among a list of samples so as to generate data
meeting user-specific criteria. Concretely, GARNet consists of two modules: a ranker and
a generator. The generator fools the ranker to raise generated samples to the top; while
the ranker learns to rank generated samples at the bottom. Meanwhile, the ranker learns
to rank samples regarding the interested property by training with preferences collected
on real samples. The adversarial ranking game between the ranker and the generator en-
ables an alignment between the generated data distribution and the user-preferred data
distribution with theoretical guarantees and empirical verification. Specifically, we first
prove that when training with full preferences on a discrete property, the learned distri-
bution of GARNet rigorously coincides with the distribution specified by the given score
vector based on user preferences. The theoretical results are then extended to partial
preferences on a discrete property and further generalized to preferences on a continuous
property. Meanwhile, numerous experiments show that GARNet can retrieve the distribution
of user-desired data based on full/partial preferences in terms of various interested
properties (i.e., discrete/continuous property, single/multiple properties). Code is available
at https://github.com/EvaFlower/GARNet.

Keywords: Generative Adversarial Network, Controllable Generation, User Preferences,
Adversarial Ranking, Relativistic f-Divergence

∗. The first version of this work was done when the author was a PhD student at Southern University of
Science and Technology (SUSTech), China and University of Technology Sydney (UTS), Australia.
†. The corresponding author.

©2024 Yinghua Yao, Yuangang Pan, Jing Li, Ivor W. Tsang and Xin Yao.
License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided
at http://jmlr.org/papers/v25/23-0461.html.

1. Introduction
Generative Adversarial Networks (GANs) (Goodfellow et al., 2014; Arjovsky et al., 2017;
Huang et al., 2022) are a popular generative model to approximate a data distribution via
adversarial training between two networks, wherein a generator learns to generate data resembling
real data, whereas a discriminator learns to distinguish real data from synthetic data com-
ing from the generator. Although unconditional generative modeling presents an intriguing
technical challenge, it lacks the ability to control the generation of data and has limited
practical relevance. In contrast, controllable generative models can generate data to match
some user-desired properties, meeting the needs of many real-world applications (Engel
et al., 2018). Recent progress has been driven by using predefined attributes/classes or an
evaluator that can evaluate specific properties of samples to direct
the generation process of GANs (Mirza and Osindero, 2014; Asokan and Seelamantula,
2020; Engel et al., 2018; Gupta and Zou, 2019; Samanta et al., 2020).
However, in many real-world applications, users may not be able to explicitly articulate
the data they are interested in via the desired class/attribute labels or high values from
the property evaluator (Engel et al., 2018). Instead, more naturally, users would often
implicitly describe their preferences for data by ranking two or more samples in terms
of a certain interested property (Fürnkranz and Hüllermeier, 2017). In particular, GANs
are supposed to generate more (or even exclusively) user-desired data (i.e., data for which
users have high preferences) by learning from preferences, which can have many promising
applications. Here are some examples we propose. (1) In terms of image retrieval, it is
more precise to adopt visual comparisons than conventional textual feedback to match the
user’s mental model of the desired content (Yu and Kovashka, 2020). Thus, data generated
based on preferences is a good representation of user needs, which can be used to retrieve
user-desired images from the database. (2) In terms of drug design, the experts may not be
able to clearly describe the structure of desired drugs for target diseases but can assign their
preferences over candidate drug structures based on their expertise or lab experiments. The
process of drug discovery can be formulated as learning user-desired drug data distribution
with the preferences. (3) In terms of reinforcement learning (RL) without an off-the-shelf
reward function, it is feasible to provide human preferences between pairs of trajectories
of the agent (Christiano et al., 2017; Wirth et al., 2017). Then, the generative model can
produce user-desired demonstrations by learning from the preferences. The RL tasks can
be solved as imitation learning with these generated demonstrations.
There were a few works (Yu and Kovashka, 2020; Yao et al., 2022) applying user pref-
erences to direct the generation of GANs. However, they required heavy interactions with
humans to obtain preferences for generated samples during training so as to generate user-
desired data. Moreover, these studies only considered the simplest pairwise preferences,
failing to handle more general partial preferences. In this study, we consider teaching


GANs to learn a user-preferred data distribution directly from a more general form of user
preferences, i.e., a collection of partial preferences over the training samples, without extra
annotations on generated samples during the training. Incorporating such preferences into
the training of GANs poses several challenges. First, learning a user-preferred data distri-
bution means that the distribution alignment is imposed between a portion of the training
data and generated data, while GANs match the generated data distribution to the whole
training data distribution (Goodfellow et al., 2014; Mao et al., 2017). Therefore, GANs’
original framework cannot achieve the goal of this study. Second, the discriminator in GANs
is designed to distill label information on individual samples (Goodfellow et al., 2014; Mirza
and Osindero, 2014; Odena et al., 2017). So it cannot deal with preference data defined
upon a list of samples, which is usually formulated as listwise ranking problems (Lin, 2010).
Third, the key to eliciting adversarial classification in GANs is to assign the generated sam-
ples with “fake” labels when training the discriminator and “real” labels when training the
generator, respectively (Goodfellow et al., 2014; Dai et al., 2017). Nevertheless, constructing
the supervision for the generated samples adversarially in the context of users’ preferences
(a.k.a. listwise ranking (Cao et al., 2007; Pan et al., 2022)) has not been explored.
To tackle the challenges, we propose a new distribution matching mechanism between
generated data and high-ranked data while considering all data in the training. Meanwhile,
in order to learn from user preferences, GANs need to handle a list of samples. Motivated
by learning to rank (Cao et al., 2007), we tailor-design a ranker to replace the discriminator
in GANs. Therefore, we reformulate the adversarial learning framework of GANs and propose
Generative Adversarial Ranking Nets (GARNet) to learn from user preferences. Specifically,
our GARNet consists of a ranker and a generator. While a vanilla discriminator performs a
binary classification task by taking an individual sample as input, our proposed ranker learns
to rank samples in the context of partial ranking lists regarding the interested property. In
addition, the ranker assigns the generated samples to the lowest position. As the generator
targets the user-preferred distribution, it is trained to synthesize samples that confuse the
ranker to be ranked higher than the real samples. Competition between the ranker and the
generator drives both modules to improve themselves until the generator converges to
a user-preferred data distribution. The main contributions are summarized as follows:
• We propose Generative Adversarial Ranking Nets (GARNet) for data generation guided
by user preferences. In particular, we design an adversarial ranking game between a
ranker and a generator, which enables a distribution alignment between generated data
and a portion of the training data (i.e., user-desired data) (Corollary 7).
• We prove that the objective function of the ranker in GARNet defines a relativistic
f -divergence between the user-preferred data distribution and the generated data distri-
bution, while the generator is to minimize the divergence to learn the user-preferred data
distribution. Meanwhile, we prove that the distribution learned by GARNet is rigorously
determined by the score vector given by the users (Proposition 5).
• We empirically show our GARNet can learn a user-preferred data distribution (deter-
mined by the given score vector) from preferences in terms of discrete and continuous
properties. We further show that our GARNet can better retrieve the distribution of
user-desired data from partial preferences than various baselines. The potential of GAR-
Net is also validated in the application of imbalanced learning.


2. Background and Problem Statement


In this section, we first review existing work that applies labels or property evaluators
to control generation with desired properties. Then we formulate our problem setting –
controllable generation by preferences.

2.1 Property-oriented Generation by Labels/Evaluators


Many works (Mirza and Osindero, 2014; Odena et al., 2017; Miyato and Koyama, 2018;
Sohn et al., 2015; Li et al., 2020; Pumarola et al., 2020; Ho and Salimans, 2022; Ding et al.,
2020; Wang et al., 2021; Triess et al., 2022) proposed conditional variants of generative
models that introduce conditioning variables C to enable user control for the generation,
i.e., modeling the conditional distribution p(x|C). However, a condition C involving discrete
class labels or attribute labels (Zhu et al., 2017) imposes restrictive user control
(discussed in Section 2.2.1). Conditions involving continuous labels can provide fine-
grained control, but they require complete knowledge about the properties, implying that
there exists a universal evaluation criterion to annotate data with absolute values in terms
of the properties, which could be hard to access in some real-world applications (Christiano
et al., 2017).
A few works (Engel et al., 2018; Gupta and Zou, 2019; Samanta et al., 2020) proposed to
apply an accessible property evaluator h(x), which can provide evaluations for data samples
w.r.t. a specified property to control the generation of generative models. However, these
evaluators may not exist in real-world applications (Christiano et al., 2017) since they
also require complete knowledge about the properties. In addition, their optimization for
desired properties requires extensive feedback loops where all generated samples are sent
to the evaluator h(x) for evaluation (Gupta and Zou, 2019), which would incur heavy cost
especially when the evaluation process is expensive. Further, an additional problem is that
they need to carefully balance the goal of generation quality and that of producing data
with desired properties (Engel et al., 2018; Samanta et al., 2020).

2.2 Our Problem: Property-oriented Generation by Preferences


Let X = {x_n}_{n=1}^N denote a training dataset with N samples and S = {s_m}_{m=1}^M denote a
collection of M preferences defined on subsets of X.¹ In particular, each preference s ∈ S
is an ordered list, namely,

    s := s^1 > s^2 > ... > s^l,   and   s^i ∈ X, ∀i = 1, 2, ..., l,     (1)

where 2 ≤ l ≪ N denotes the number of samples contained in s. l usually varies across
different preferences. Note that with l ≪ N, annotating preferences relies only on partial
knowledge about the properties (Fürnkranz and Hüllermeier, 2017) because it only involves
comparisons among small subsets of the dataset. Our target is to learn a generative model
P_g(x) that is equal to the user-preferred data distribution P_u(x) from partial preferences S.

1. Regarding data collection, the preferences S are collected among multiple users following the way in
rank aggregation (Lin, 2010; Saquil et al., 2018; Pan et al., 2018).


Table 1: Main mathematical notations in this paper. Elements in a set (e.g., X) are indexed
using subscripts (e.g., x_i), whereas objects ranked at the i-th position in a ranking list s
are indexed using superscripts (e.g., s^i).

Notation    Explanation
X           training samples X = {x_n}_{n=1}^N
Y           class labels Y = {y_1, y_2, ..., y_T} endowed with an order y_1 > y_2 > ... > y_T
O           score space for a continuous property
y(x)        function that maps sample x to its class label
o(x)        function that maps sample x to its underlying ground-truth ranking score
x_i > x_j   sample x_i is preferred over sample x_j, x_i, x_j ∈ X
s           preference, s := s^1 > s^2 > ... > s^l, s^i ∈ X
S           a collection of preferences S = {s_m}_{m=1}^M over training data X
π(s)        ground-truth score vector for preference s
s^(R)       target preference to train the ranker, i.e., s^(R) := s^1 > ... > s^l > x_g
s^(G)       target preference to train the generator, i.e., s^(G) := x_g > s^1 > ... > s^l
P_u         user-preferred data distribution
P_g         distribution of the generative model
P_i         distribution of X_i = {x | y(x) = y_i}
D_f         relativistic f-divergence
R           ranker
G           generator
To be specific, P_u(x) should allocate high density to global high-ranked data while low or
even zero density to global low-ranked data.²
For example, as shown in Fig. 1, we assume that shoe images with different strengths in
terms of the open attribute are distributed evenly. The collected pairwise preferences reveal
that users prefer more open shoes. Accordingly, the generative model is expected to learn
a distribution that assigns a larger density on shoe images with larger open values. Note
that the generative model has the advantage of generating novel images different from the
training data (Song and Ermon, 2019).

[Figure 1: density curves over the open attribute for training data and generated data.]

Figure 1: Illustration of user-preferred data distribution learning guided by user preferences
(on the open attribute).

2. User preferences help to derive a global ranking over all the data (Cao et al., 2007; Lu and Boutilier,
2011). Global high-ranked data denote those data ranked high in the global ranking. Global low-ranked
data denote those data ranked low in the global ranking.


2.2.1 Controlled by Preferences VS. Conditioned on Labels


Class labels or attribute labels are argued to have limited description capacity (Parikh and
Grauman, 2011) since such category-level labels cannot capture intra-category differences.
In contrast, user preferences can be used to describe fine-grained information (Parikh and
Grauman, 2011; Chen et al., 2013; Raman and Joachims, 2014). For example, people struggle
when asked to decide whether a face image is smiling or not, as there always
exist many images that are difficult to categorize, apart from a small number of obviously
smiling and non-smiling images. Instead, people can compare a set of images and
rank them in terms of the strength of the smile attribute.
On the other hand, since generation conditioned on class labels is dominated by those
classes with sufficient training observations, a class with limited samples would be over-
looked by the generative model. Instead, generation controlled by preferences can stress
the modeling of the “preferred class”. We will empirically show that preference-guided distri-
butional generation can serve as a remedy to improve extremely imbalanced class learning,
while generation conditioned on labels fails in this case (see Section 7.4).

3. GARNet for Preferences w.r.t. a Discrete Property


First of all, we consider a scenario in which the ranking relation among training samples is
based on a discrete property. In particular, each training sample belongs to one of a finite
set of classes which possess a natural order.
Definition 1 (Preferences w.r.t. a discrete property) Let X = {x_n}_{n=1}^N be a set of
training instances and Y = {y_1, y_2, ..., y_T} be a set of class labels endowed with an order
y_1 > y_2 > ... > y_T, where T is the number of ordered classes. Assume each training
instance x_i is assigned a label y(x_i) ∈ Y, where y(·) is a function that outputs the label
of a sample. The preference over a group of samples {x_{i_1}, x_{i_2}, ...} ⊆ X is defined as

    x_{i_1} > x_{i_2} > x_{i_3} > ...   if   y(x_{i_1}) > y(x_{i_2}) > y(x_{i_3}) > ... .     (2)

We ignore the preference lists containing samples with the same labels since such prefer-
ence lists can be simply transformed into the preferences (Eq. (2)) by removing redundant
samples (Pan et al., 2022).
Example. Suppose users prefer small digits on the MNIST dataset. Then for digits 0 to 9,
the order would be 0 > 1 > 2 > . . . > 9, according to which users express their preferences
over MNIST images.
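To make Definition 1 concrete, the following minimal Python sketch (hypothetical helper, not part of the released GARNet code) forms one listwise preference from labelled MNIST-style samples by ordering a random subset so that smaller digits rank higher, dropping same-label duplicates as discussed above.

```python
import random

def build_preference(samples, labels, list_len=4, seed=None):
    """Form one listwise preference (Eq. (2)) from labelled data.

    Users prefer smaller digits, so the subset is ordered by ascending label
    (most preferred first); samples sharing a label are dropped to keep the
    order strict, as described above.
    """
    rng = random.Random(seed)
    idx = rng.sample(range(len(samples)), k=list_len)
    idx.sort(key=lambda i: labels[i])            # smallest digit (most preferred) first
    kept, seen = [], set()
    for i in idx:
        if labels[i] not in seen:                # remove redundant same-label samples
            seen.add(labels[i])
            kept.append(samples[i])
    return kept                                  # the ordered list s^1 > s^2 > ... > s^l

# Toy check: the digits 1, 3, 2, 9 yield the preference 1 > 2 > 3 > 9.
digits = [1, 3, 2, 9]
print(build_preference(digits, digits, list_len=4, seed=0))   # [1, 2, 3, 9]
```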
In the following part of this section, we propose a framework called Generative Adver-
sarial Ranking Nets (GARNet) for the distribution learning guided by the above-mentioned
preferences (problem definition in Section 2.2). Our GARNet consists of a ranker R (pa-
rameterized by θR ) and a generator G (parameterized by θG ). In particular, an adversarial
ranking process is defined between the generator and the ranker. Namely, they are trained
against the goal of ranking defined between the real data and the generated data (see
Fig. 2). The competition between the generator and the ranker drives the generated distribution
P_g(x) (simplified as P_g) to align with the user-preferred data distribution P_u(x)
(simplified as P_u).


[Figure 2: diagram in which a preference over a subset of X and a sample generated by the generator from random noise are fed to the ranker; the target preference for the ranker and the target preference for the generator provide the adversarial supervision.]

Figure 2: Framework of GARNet.

3.1 Given Full Preferences


We open the description of our GARNet with the case given a collection of full preferences
S = {s_m}_{m=1}^M. That is, the length l of each preference is equal to the number of
ordered classes T. Namely,

    s := s^1 > s^2 > ... > s^l,   l = T,   where s^i ∈ X, ∀i = 1, 2, ..., l.     (3)

A full preference s would include samples from all ordered classes. Consequently,

    y(s^i) ≡ y_i,   ∀s ∈ S and ∀i = 1, 2, ..., T.     (4)

Denoting P_i as the distribution of data X_i = {x | y(x) = y_i, x ∈ X}, we have s^i ∼ P_i.

3.1.1 Listwise Ranking


The ranker R learns a ranking function from the preferences S. We employ ListNet (Cao
et al., 2007), a cross-entropy based learning-to-rank (L2R) objective that maximizes

    L_{L2R}(π(s), R(s)) = Σ_{i=1}^{l} σ(π(s))_i log σ(R(s))_i.     (5)

With a slight abuse of notation, s = (s^1, s^2, ..., s^l). π(s) denotes the ground-truth score
vector for the preference s, which can be explicitly or implicitly given by humans (Cao
et al., 2007). R(·) denotes the score vector for the preference s predicted by the ranker R.
Specifically, the ranker R acts as a nonlinear feature extractor with a scalar output. Given
the feature matrix w = [w_1, w_2, ..., w_l] of l samples, the ranker R outputs a score vector
R(w) = [R(w_1), R(w_2), ..., R(w_l)], where R(w_i) ∈ ℝ. σ(·) is the softmax function that takes
a list of scalar values r = [r_1, r_2, ..., r_l] as input, i.e., σ(r)_i = e^{r_i} / Σ_{j=1}^{l} e^{r_j}.
σ(π(s))_i calculates the top-1 probability of object s^i, i.e., the probability of being
ranked at the top given the scores π(s) of all the objects. Similarly, σ(R(s))_i gives the
top-1 probability of object s^i given the score vector R(s).
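As an illustration, a minimal PyTorch sketch of the ListNet top-1 objective in Eq. (5) is given below; the negated value is returned so that it can be minimized with a standard optimizer (tensor shapes and names are illustrative assumptions, not the released implementation).

```python
import torch
import torch.nn.functional as F

def listnet_top1_loss(pi_s, r_s):
    """Negative of L_L2R in Eq. (5): cross-entropy between top-1 probabilities.

    pi_s: ground-truth score vector pi(s), shape (batch, l)
    r_s : ranker scores R(s),             shape (batch, l)
    Returns a scalar to minimize (Eq. (5) itself is maximized).
    """
    p_true = F.softmax(pi_s, dim=-1)              # sigma(pi(s))_i
    log_p_pred = F.log_softmax(r_s, dim=-1)       # log sigma(R(s))_i
    return -(p_true * log_p_pred).sum(dim=-1).mean()

# Example with a single list of length 3 (arithmetic-progression scores, d = 5):
pi = torch.tensor([[20.0, 15.0, 10.0]])
r = torch.tensor([[1.2, 0.3, -0.7]])
print(listnet_top1_loss(pi, r))
```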

3.1.2 Generative Adversarial Ranking


Given a preference s and a generated sample xg = G(z) ∼ Pg (x), where z ∼ Z is a random
noise, the ranker R incorporates the generated sample xg into the ranking list s and outputs


Algorithm 1 Generative Adversarial Ranking Nets
1: Input: Training data X = {x_n}_{n=1}^N, user preferences S = {s_m}_{m=1}^M, batch size B, score vector π, ranker R and generator G.
2: Output: Generator G for the user-preferred data distribution, i.e., P_g(x) = P_u(x).
3: repeat
4:   Sample a mini-batch of preferences {s_i}_{i=1}^B from S.
5:   Get fake samples {x_{g_i}}_{i=1}^B from the generator G, i.e., x_{g_i} = G(z_i) where z_i is random noise.
6:   Following Eq. (6a), construct target preferences {s_i^(R)}_{i=1}^B for the ranker R.
7:   Train the ranker R according to Eq. (8a).
8:   Following Eq. (6b), construct target preferences {s_i^(G)}_{i=1}^B for the generator G.
9:   Train the generator G according to Eq. (8b).
10: until convergence

a new ranking score vector as R(s, xg ). Motivated by vanilla GAN (Goodfellow et al., 2014)
which designs a target real/fake label for each generated sample to trigger the adversarial
game between the discriminator and the generator, we construct a target preference for a
generated list of samples consisting of s and xg , i.e., (s, xg ), as adversarial supervision. To
promote the competition between the ranker R and the generator G, we design the target
preference in two different ways, namely,

    target preference for R:   s^(R) := s^1 > ... > s^T > x_g,     (6a)
    target preference for G:   s^(G) := x_g > s^1 > ... > s^T.     (6b)

The generator fools the ranker to grade the generated samples as the best (Eq. (6b));
while the ranker learns to rank them at the lowest position (Eq. (6a)). Then, we define the
objectives for the ranker R and the generator G as follows, respectively:

    sup_{R: X → ℝ}  E_{s∼S, x_g∼P_g} [ L_{L2R}(π(s^(R)), R(s^(R))) ],     (7a)
    sup_{G: Z → X}  E_{s∼S, x_g∼P_g} [ L_{L2R}(π(s^(G)), R(s^(G))) ].     (7b)

Thus, the training losses for the ranker and the generator with mini-batch data can be
formulated as follows, respectively:

    L^R = Σ_{i=1}^{B} L_{L2R}(π(s_i^(R)), R(s_i^(R))),     (8a)
    L^G = Σ_{i=1}^{B} L_{L2R}(π(s_i^(G)), R(s_i^(G))),     (8b)

where B is the batch size. The training algorithm is summarized in Algorithm 1.
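For concreteness, the sketch below illustrates one update of Algorithm 1 under simplifying assumptions: preferences arrive as a tensor of stacked, already-ordered real samples, the ranker returns one scalar per image, and the listnet_top1_loss helper from the sketch in Section 3.1.1 is reused. All names and shapes are illustrative; this is not the authors' released code.

```python
import torch

def garnet_step(ranker, generator, pref_batch, pi_R, pi_G, opt_R, opt_G, z_dim):
    """One adversarial ranking update of Algorithm 1 (ranker step, then generator step).

    pref_batch: real samples of each preference, shape (B, l, C, H, W),
                already ordered s^1 > ... > s^l.
    pi_R, pi_G: ground-truth score vectors for s^(R) and s^(G), each of shape (l + 1,).
    """
    B, l = pref_batch.shape[:2]
    z = torch.randn(B, z_dim, device=pref_batch.device)

    # Ranker update: append the generated sample at the bottom of the list (Eq. (6a)).
    x_g = generator(z).detach()
    scores_real = ranker(pref_batch.flatten(0, 1)).view(B, l)
    scores_fake = ranker(x_g).view(B, 1)
    r_R = torch.cat([scores_real, scores_fake], dim=1)      # scores ordered as s^(R)
    loss_R = listnet_top1_loss(pi_R.expand(B, -1), r_R)     # negative of Eq. (8a)
    opt_R.zero_grad(); loss_R.backward(); opt_R.step()

    # Generator update: place the generated sample at the top of the list (Eq. (6b)).
    x_g = generator(z)
    scores_real = ranker(pref_batch.flatten(0, 1)).view(B, l)
    scores_fake = ranker(x_g).view(B, 1)
    r_G = torch.cat([scores_fake, scores_real], dim=1)      # scores ordered as s^(G)
    loss_G = listnet_top1_loss(pi_G.expand(B, -1), r_G)     # negative of Eq. (8b)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()      # only the generator is stepped here
    return loss_R.item(), loss_G.item()
```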


GANs’ theories usually show that GANs estimate a divergence between the generated data
distribution and the real data distribution, to support that they are good estimators of the


target data distribution. Thus similarly, we will show our GARNet is a good estimator
of the user-preferred data distribution by deriving a new divergence. For most GAN
variants, the objective function of the discriminator at its optimum is a divergence (Jolicoeur-
Martineau, 2020; Goodfellow et al., 2014; Arjovsky et al., 2017). In the following, we prove
that training the ranker in GARNet (Eq. (7a)) is equivalent to estimating a divergence be-
tween the user-preferred data distribution Pu (x) and the generated data distribution Pg (x).
The generator in GARNet (Eq. (7b)) is trained to minimize the divergence so as to achieve
Pg (x) = Pu (x).

Definition 2 (Divergence) Let P ∈ S and Q ∈ S be probability distributions, where S is
the set of all probability distributions with common support. A function D : (S, S) → [0, +∞)
is a divergence if it respects the following two conditions:

    D(P, Q) ≥ 0;   D(P, Q) = 0 ⟺ P = Q.     (9)

Theorem 3 (Relativistic f-divergence (Jolicoeur-Martineau, 2020)) Let f : ℝ → ℝ
be a concave function such that f(0) = 0, f is differentiable at 0, f′(0) ≠ 0, sup_v f(v) =
M > 0, and arg sup_v f(v) > 0. Assume P and Q are two distributions with support X; we
have that

    D_f(P, Q) = sup_{C: X → ℝ}  2 E_{x∼P, x̃∼Q} [ f(C(x) − C(x̃)) ]     (10)

is a relativistic f-divergence.

Theorem 3 demonstrates that the optimal discriminator in relativistic GAN (Jolicoeur-Martineau,
2019) estimates a relativistic f-divergence between the real data distribution and the generated
data distribution. Its discriminator, adapted from the concave function
f(z) = log(sigmoid(z)) + log(2), defines a pairwise ranking loss between a real sample and
a fake sample, and we find that

    f(C(x) − C(x̃)) = log( 1 / (1 + e^{−(C(x)−C(x̃))}) ) + log(2)
                   = log( e^{C(x)} / (e^{C(x)} + e^{C(x̃)}) ) + log(2)     (11)
                   =^{(1)} L_{L2R}(π(s), C(s)) + log(2),

where the equality (1) is valid if π(s) = [0, −∞] and s := x > x̃, with x ∼ P and x̃ ∼ Q.


Inspired by the connection between the relativistic f -divergence (Eq. (10)) and the
tailored loss of our GARNet (Eq. (6a), Eq. (7a)), we argue that the optimal ranker of our
GARNet also implicitly estimates a relativistic f -divergence, but between the user-preferred
data distribution and the generated data distribution, which will be proven in Theorem 4.
Accordingly, we introduce a new relativistic f-divergence between the user-preferred data
distribution P_u and the generated data distribution P_g, which generalizes the targeted data
distribution P(x) in Theorem 3 to the mixture distribution P_u = Σ_{i=1}^{T} q_i P_i with a
user-specified mixing ratio, where q_1 > q_2 > ... > q_T and Σ_i q_i = 1. In particular, P_i is the
distribution of X_i = {x | y(x) = y_i, x ∈ X}. The mixture distribution P_u allocates a larger
density q_i to the higher-ranked class i, consistent with our problem setting in Section 2.2.

Theorem 4 (Relativistic f-divergence between P_u and P_g) Let f : ℝ → ℝ be a concave
function such that f(0) = 0, f is differentiable at 0, f′(0) ≠ 0, sup_v f(v) = M > 0,
and arg sup_v f(v) > 0. Let P_u be the mixture distribution of the whole ordered data, i.e.,
P_u = Σ_{i=1}^{T} q_i P_i, where q_1 > q_2 > ... > q_T and Σ_i q_i = 1. Let P_g be the distribution of
generated data x_g. We have that

    D_f(P_u, P_g) = sup_{R: X → ℝ}  E_{s^1∼P_1, ..., s^T∼P_T, x_g∼P_g} [ f( Σ_{i=1}^{T} q_i R(s^i) − R(x_g) ) ]     (12)

is a relativistic f-divergence between P_u and P_g.

According to Definition 2, we prove that D_f(P_u, P_g) is a divergence between P_u and P_g
following three steps:
#1 Prove that D_f(P_u, P_g) ≥ 0.
#2 Prove that P_u = Σ_{i=1}^{T} q_i P_i = P_g ⟹ D_f(P_u, P_g) = 0.
#3 Prove that D_f(P_u, P_g) = 0 ⟹ P_u = Σ_{i=1}^{T} q_i P_i = P_g.
The details are left to Appendix A.

Instead of targeting general preferences that require T + 1 hyperparameters for the
ground-truth score vector π(s^(R)) in Eq. (7a), we focus our problem setting on preferences
where the first T terms of π(s^(R)) form an arithmetic progression, which depends only
on the first term and the common difference, making it generally applicable to preferences
of various lengths. In particular, we demonstrate that when choosing the first T terms of
π(s^(R)) as an arithmetic progression, the optimal ranker of our GARNet approximates
the relativistic f-divergence between the user-preferred data distribution and the generated
data distribution shown in Theorem 4.
data distribution shown in Theorem 4.

Proposition 5 Given the training dataset X = {x_n}_{n=1}^N and a collection of full preferences
S = {s_m}_{m=1}^M, where each preference s := s^1 > s^2 > ... > s^T ∈ S and s^i ∼ P_i, let
P_u = Σ_{i=1}^{T} q_i P_i be the mixture distribution of the whole ordered data, where
q_i = σ(π(s^(R)))_i. Let P_g be the distribution of generated data x_g. Given a fixed generator G,
the optimal ranker R* of our GARNet (Eq. (7a)) approximates the relativistic f-divergence
between P_u and P_g, i.e., D_f(P_u, P_g), if π(s^(R)) = [a + (T−1)d, a + (T−2)d, ..., a, b] and
1/e^{a−b} → 0.

Proof First of all, given the definition of π(s^(R)), we have

    lim_{1/e^{a−b} → 0} q_{T+1} = lim_{1/e^{a−b} → 0} σ(π(s^(R)))_{T+1}
        = lim_{1/e^{a−b} → 0}  e^b / ( e^b + Σ_{j=1}^{T} e^{a+(T−j)d} )  =^{(1)} 0,     (13a)

    lim_{1/e^{a−b} → 0} Σ_{i=1}^{T} q_i = lim_{1/e^{a−b} → 0} (1 − q_{T+1})  =^{(2)} 1,     (13b)

where (1) and (2) follow the Squeeze theorem (Stewart et al., 2020), given 0 < q_{T+1} < 1/e^{a−b}
and 1/e^{a−b} → 0. Note that in practice we set a − b = 10, so that 1/e^{a−b} ≈ 0.
According to Eq. (5), we have

    L_{L2R}(π(s^(R)), R(s^(R))) = Σ_{i=1}^{T+1} σ(π(s^(R)))_i log σ(R(s^(R)))_i
        =^{(1)} Σ_{i=1}^{T} q_i log[ e^{R(s^i)} / ( e^{R(x_g)} + Σ_{j=1}^{T} e^{R(s^j)} ) ]
                + q_{T+1} log[ e^{R(x_g)} / ( e^{R(x_g)} + Σ_{j=1}^{T} e^{R(s^j)} ) ]
        = log[ e^{Σ_{i=1}^{T} q_i R(s^i)} / ( e^{R(x_g)} + Σ_{j=1}^{T} e^{R(s^j)} )^{Σ_{i=1}^{T} q_i} ]
                + q_{T+1} log[ e^{R(x_g)} / ( e^{R(x_g)} + Σ_{j=1}^{T} e^{R(s^j)} ) ]
        =^{(2)} log[ 1 / ( e^{R(x_g) − Σ_{i=1}^{T} q_i R(s^i)} + Σ_{j=1}^{T} e^{R(s^j) − Σ_{i=1}^{T} q_i R(s^i)} ) ].     (14)

Step (1) follows the definition of the target preference s^(R) in Eq. (6a). Step (2) is valid due to Eq. (13a)
and Eq. (13b).

Meanwhile, when the ranker approaches the optimum R* for a fixed generator G,
i.e., σ(R*(s^(R))) = σ(π(s^(R))), we have R*(s^(R)) = π(s^(R)) + δ due to the translation
invariance of softmax (Laha et al., 2018), i.e., σ(r + δ) = σ(r). Therefore, the first T terms
of R*(s^(R)) also form an arithmetic progression with the same common difference −d as that of
π(s^(R)). Then, we have

    R*(s^j) − Σ_{i=1}^{T} q_i R*(s^i) = [R*(s^1) − (j − 1)d] − Σ_{i=1}^{T} q_i [R*(s^1) − (i − 1)d]
        = ( R*(s^1) − (Σ_{i=1}^{T} q_i) R*(s^1) ) − ( j − 1 + Σ_{i=1}^{T} q_i (1 − i) ) d
        =^{(1)} ( Σ_{i=1}^{T} q_i i − j ) d = c_j,     (15)

where c_j is a constant for all j = 1, 2, ..., T, exclusively determined by the pre-specified score
vector π(s). Step (1) is valid due to Eq. (13b).

Therefore, the objective for the ranker R can be approximated as follows:

    Eq. (7a) ≈^{(1)} sup_{R: X → ℝ}  E_{s∼S, x_g∼P_g} [ log( 1 / ( c + e^{−(Σ_{i=1}^{T} q_i R(s^i) − R(x_g))} ) ) ]
        =^{(2)} sup_{R: X → ℝ}  E_{s^1∼P_1, ..., s^T∼P_T, x_g∼P_g} [ log( 1 / ( c + e^{−(Σ_{i=1}^{T} q_i R(s^i) − R(x_g))} ) ) ]
        =^{(3)} sup_{R: X → ℝ}  E_{s^1∼P_1, ..., s^T∼P_T, x_g∼P_g} [ log( 1 / ( c + e^{−(Σ_{i=1}^{T} q_i R(s^i) − R(x_g))} ) ) + log(c + 1) ]
        =^{(4)} sup_{R: X → ℝ}  E_{s^1∼P_1, ..., s^T∼P_T, x_g∼P_g} [ f( Σ_{i=1}^{T} q_i R(s^i) − R(x_g) ) ],     (16)

where step (1) is reasonable because Σ_{j=1}^{T} e^{R(s^j) − Σ_{i=1}^{T} q_i R(s^i)} in Eq. (14) degenerates to a constant
c = e^{c_1} + e^{c_2} + ... + e^{c_T} following Eq. (15), being independent of the ranker when the ranker is
approaching the optimum; step (2) is valid as y(s^i) ≡ y_i according to Eq. (4); step (3) is valid due to
the addition of a constant; and step (4) is obtained by denoting f(z) = log( 1 / (c + e^{−z}) ) + log(c + 1).
Note that f is a concave function; f(0) = 0; f is differentiable at 0; f′(0) ≠ 0; sup_x f(x) =
M > 0; and arg sup_x f(x) > 0. According to Theorem 4, the optimal ranker of our GARNet
approximately estimates a relativistic f-divergence.
Remark 6 In theory, it seems feasible to directly adopt the following objective to learn P_u
according to Theorem 4:

    sup_{R: X → ℝ}  E_{s^1∼P_1, ..., s^T∼P_T, x_g∼P_g} [ f( Σ_{i=1}^{T} q_i R(s^i) − R(x_g) ) ],     (17a)
    sup_{G: Z → X}  E_{s^1∼P_1, ..., s^T∼P_T, x_g∼P_g} [ f( R(x_g) − Σ_{i=1}^{T} q_i R(s^i) ) ],     (17b)

where the concave function f can be defined as f(z) = log(sigmoid(z)) + log(2) as in Jolicoeur-
Martineau (2019). However, in practice, the above objective cannot provide sufficient gradient
for the generator G to learn properly. Specifically, in early training the generator G
is poor, and the ranker R can easily assign lower values to the generated sample x_g than to
real samples; Σ_{i=1}^{T} q_i R(s^i) − R(x_g) would then be large and the gradient vanishes.
Based on Proposition 5, we set a and b to 10 and 0, respectively. The common
difference d for the arithmetic progression in π(s^(R)) determines the mixture distribution
P_u = Σ_{i=1}^{T} q_i P_i. Namely, the mixture distribution P_u would be close to the distribution
of top-1 data when d is sufficiently large, which refers to our Corollary 7; while P_u would
be close to a distribution where the whole ordered data is uniformly distributed when d is
very small.

Corollary 7 The optimal ranker of GARNet (Eq. (7a)) defines a relativistic f-divergence
between the distribution of top-1 data P_1 and the generated data distribution P_g, i.e.,
D_f(P_1, P_g), when e^{−d} → 0.

Proof According to the definition of q_1 in Proposition 5, we have

    q_1 = e^{a+(T−1)d} / ( Σ_{i=1}^{T} e^{a+(T−i)d} + e^b )
        = e^{a+(T−1)d} / ( e^{a+(T−1)d} ( 1 + e^{−d} + ... + e^{−(T−1)d} + e^{b−(a+(T−1)d)} ) )     (18)
        = 1 / ( 1 + e^{−d} + ... + e^{−(T−1)d} + e^{b−(a+(T−1)d)} )  >  1 / (1 + T e^{−d}).

Since 1/(1 + T e^{−d}) < q_1 < 1, we have q_1 → 1 following the Squeeze theorem (Stewart et al., 2020)
when e^{−d} → 0. The proof is completed.

Proposition 8 Given the optimal ranker R* as demonstrated in Proposition 5, the generator
of GARNet (Eq. (7b)) minimizes the divergence between P_u and P_g, i.e., D_f(P_u, P_g).

Proof Given R*, the objective for the generator can be formulated as

    sup_{G: Z → X}  E_{s∼S, x_g∼P_g} [ L_{L2R}(π(s^(G)), R*(s^(G))) ].     (19)

We update the parameters θ_G of the generator as follows:

    θ_G^{t+1} = θ_G^{t} + ∇_{θ_G} E_{s∼S, x_g∼P_g} [ L_{L2R}(π(s^(G)), R*(s^(G))) ],     (20)

which maximizes the ranker's output R*(x_g) for the generated samples. Then, D_f(P_u, P_g)
will be minimized.

Proposition 5 and Proposition 8 demonstrate that the distribution learned by our GAR-
Net (Eq. (7)) is determined by the pre-specified score vector π(s). We present a case study
on MNIST (Lecun et al., 1998) to justify it by setting d to 0.1, 0.5, 1 and 5, respectively. As
shown in Fig. 3, the proportion of generated data from GARNet is almost consistent with
the top-1 probability calculated by the specified score vector, i.e., q = σ(π(s)).
In terms of preferences over a discrete property, the most desirable data is supposed to
belong to the global top-1 class, i.e., y(x) = y1 . As clarified in Corollary 7, we can simply
set up a sufficiently large d (e.g., 5) to achieve this goal.
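The dependence of the target mixture weights q = σ(π(s)) on the common difference d can be checked numerically; below is a small sketch assuming a = 10 and b = 0 as used in the paper (helper name is illustrative).

```python
import numpy as np

def mixture_weights(T=10, d=5.0, a=10.0, b=0.0):
    """Top-1 probabilities q_i = softmax(pi(s^(R)))_i for the first T entries.

    pi(s^(R)) = [a + (T-1)d, a + (T-2)d, ..., a, b]; the last entry is the slot of
    the generated sample, whose weight q_{T+1} vanishes as 1/e^{a-b} -> 0.
    """
    pi = np.append(a + d * np.arange(T - 1, -1, -1), b)
    q = np.exp(pi - pi.max())
    q /= q.sum()
    return q[:T]

for d in (0.1, 0.5, 1.0, 5.0):
    q = mixture_weights(d=d)
    print(f"d={d}: q1={q[0]:.3f}, q2={q[1]:.3f}, q10={q[-1]:.3g}")
    # For d = 5, q1 is already close to 1, matching Corollary 7.
```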


[Figure 3: four bar charts of the proportion of each digit (0-9), comparing q and q̃, for (a) d = 0.1, (b) d = 0.5, (c) d = 1, (d) d = 5.]

Figure 3: GARNet learns the distribution P_u with different pre-specified score vectors π(s)
from full preferences w.r.t. a discrete property on MNIST (preferring small digits, i.e.,
0 > 1 > ... > 9). d is the common difference for π(s^(R)) (Proposition 5). q = σ(π(s))
calculates the ground-truth top-1 probability of each digit class. q̃ calculates the proportion
of different digit classes for generated data from GARNet. The digit values for the generated
data are evaluated by a pretrained classifier for digit classification.

3.2 Given Partial Preferences


In real-world situations, full preferences are not always available. A more realistic scenario
is to provide partial preferences of various lengths as the training data (Lin, 2010). Namely,

    s := s^1 > s^2 > ... > s^l,   l ≤ T.     (21)

Similarly, we construct the adversarial ranking process between the ranker and the generator:

    ranker:      s^(R) := s^1 > ... > s^l > x_g;     (22a)
    generator:   s^(G) := x_g > s^1 > ... > s^l.     (22b)

We adopt the objective (Eq. (7)) to learn the user-preferred distribution Pu (x) via the
adversarial ranking (Eq. (22)). Thanks to the consistency of the score function (Lemma 9),
our GARNet can still learn the desired distribution from the partial preferences.

Lemma 9 (Consistency of the score function (Xia et al., 2008)) Given a collection
of partial preferences S = {s_m}_{m=1}^M, where 2 ≤ |s_m| ≤ T, the ListNet loss (Eq. (5)) adopted
in our GARNet can recover the optimal score function for the ranker if S contains sufficient
preferences and the samples X are well covered.³

According to Lemma 9, the optimal ranker can recover s^1 > s^2 > ... > s^T > x_g.
Therefore, GARNet can still learn a user-preferred data distribution from partial preferences,
which assigns greater density to higher-ranked data. We present a case study on
MNIST (Lecun et al., 1998) with partial preferences (2 ≤ l ≤ 10). We set the ground-truth
score vector π(s^(R)) for Eq. (22a) to [a + (l − 1)d′, a + (l − 2)d′, ..., a, b], similar to Proposition 5.
As shown in Fig. 4, the proportion of generated data from GARNet is larger when
the digit class is ranked higher, which is consistent with user preferences. In particular, when
d′ is sufficiently large, GARNet only generates high-ranked data, namely digits zero and one.
3. This does not mean that every sample must be included in the preferences, but it requires the included
samples to be approximately uniformly distributed in the score space.


[Figure 4: four bar charts of the proportion of each digit (0-9), comparing q and q̃, for (a) d′ = 0.1, (b) d′ = 0.5, (c) d′ = 1, (d) d′ = 5.]

Figure 4: GARNet learns the distribution P_u with different pre-specified score vectors π(s)
from partial preferences (2 ≤ l ≤ 10) w.r.t. a discrete property on MNIST (preferring small
digits). d′ is the common difference of the ground-truth score vector π(s^(R)) for Eq. (22a).
q̃ calculates the proportion of different digit classes for generated data from GARNet. q is
copied from the results in Fig. 3 for reference. The digit values for the generated data are
evaluated by a pretrained classifier for digit classification.

We found that, when training on relatively long preferences (7 ≤ l ≤ 10), GARNet can converge
to the top-1 data distribution when setting d′ = 5. Note that the resultant proportion per digit q̃ is
no longer consistent with the pre-specified values q, as y(s^i) ≡ y_i, ∀s ∈ S and ∀i = 1, 2, ..., T
no longer holds for partial preferences.

4. GARNet for Preferences w.r.t. a Continuous Property


We consider a scenario in which the ranking relation among training samples is based on a
continuous property. Particularly, each training sample xi is associated with an underlying
score o(xi ) that represents the user’s preference for xi in terms of the property.

Definition 10 (Preferences w.r.t. a continuous property) Given a training dataset
X = {x_n}_{n=1}^N, where each training instance x_n ∈ X has an underlying ground-truth ranking
score o(x_n) ∈ [A, B] ⊆ O in terms of a certain continuous property, the preference among
a list of samples {x_{i_1}, x_{i_2}, ...} ⊆ X is defined as

    x_{i_1} > x_{i_2} > x_{i_3} > ...   if   o(x_{i_1}) > o(x_{i_2}) > o(x_{i_3}) > ... .     (23)

Example. Suppose users prefer smiling face images. Then users would assign higher preferences
to images with bigger smiles, i.e., images whose score o(x) in terms of the smiling
attribute is larger.

Theorem 11 Given a collection of preferences in terms of a continuous property over
[A, B] (see Eq. (23)), it can be transformed equivalently into preferences in terms of finitely
many ordered classes if we consider samples with close ranking scores as ties.

Proof According to the Heine-Borel theorem (Jeffreys et al., 1999), we have

    O = {(o_i − ε_i, o_i + ε_i) | o_i ∈ [A, B], ε_i > 0, i = 1, 2, ..., T′},     (24)


[Figure 5: four bar charts of the proportion of each top-k class (1-5), comparing q and q̃, for (a) d = 0.1, (b) d = 0.5, (c) d = 1, (d) d = 5.]

Figure 5: GARNet learns the distribution Pu with different pre-specified score vectors
π(s) from full preferences w.r.t. a continuous property on LFW (preferring smiling faces).
q = σ(π(s)) calculates the top-1 probabilities of each class. q̃ calculates the proportion of
different classes for generated data from GARNet. Top 1 class represents images locating in
the score interval of those real images ranked top 20% in terms of the smile attribute; top
2 class represents images locating in the score interval of those real images ranked between
top 20% and top 40%; so on and so forth. The labels for the generated data are obtained
based on the score given by a pretrained ranker for face images w.r.t. the smile attribute.

which is a finite subcover of [A, B]. Without loss of generality, we assume

    o_1 > o_2 > ... > o_{T′}.     (25)

Then, we can construct a union of non-overlapping intervals to cover the score interval [A, B]:

    O = ( ⋃_{i=T′,...,2} [o_i − ε_i, o_i + ε_i) ) ∪ [o_1 − ε_1, o_1 + ε_1],     (26)

where o_{T′} − ε_{T′} = A, o_1 + ε_1 = B, and o_i + ε_i = o_{i−1} − ε_{i−1} for i = T′, ..., 2.

Meanwhile, we define a pair of samples within the same interval as a tie following Zhou
et al. (2008), namely

    x_m = x_n   for any o(x_m), o(x_n) ∈ O_i,     (27)

where O_i = [o_i − ε_i, o_i + ε_i), i = T′, T′ − 1, ..., 2, and O_1 = [o_1 − ε_1, o_1 + ε_1].
By assigning the same label y_i to all samples within each interval O_i, we can obtain a
set of ordered class labels {y_i}_{i=1}^{T′} where y_1 > y_2 > ... > y_{T′}, namely

    y(x_i) ≡ y_i   if o(x_i) ∈ O_i.     (28)

To sum up, the preferences in terms of a continuous property (Definition 10) approximate
the preferences in terms of T′ ordered classes.

Remark 12 The assumption of Eq. (27) is in line with real-world applications. People
usually cannot distinguish a pair of samples when the difference is small and would annotate
them as ties (Zhou et al., 2008), where ε_i is the indistinguishable score difference for each


O_i. Nevertheless, for o(x_1) ∈ O_{i−1} and o(x_2) ∈ O_i, x_1 and x_2 are indistinguishable when
o(x_1) − o(x_2) is small. Fortunately, assigning them to two adjacent ordered classes y_{i−1} and
y_i is still reasonable because the order gap between y_{i−1} and y_i is not significant when T′ is
sufficiently large.

According to Theorem 11, learning from the preferences in terms of a continuous prop-
erty can be discretized as learning from the preferences over a finite number of ordered
classes. Therefore, we can adopt GARNet which defines an adversarial ranking goal (Eq. (6))
to learn the desired distribution from full preferences in terms of the continuous property.
We present a case study on LFW (Huang et al., 2008) w.r.t. the smile attribute in Fig. 5
to justify this. Specifically, we discretize the continuous ranking scores w.r.t. the smile attribute
over real images into five ordered classes. That is, images ranked in the top 20% in terms of the
smile attribute are assigned to the top-1 class; images ranked between the top 20% and top 40% are
assigned to the top-2 class; and so on. To label a generated sample, we first apply a
pretrained ranker for ranking face images w.r.t. the smile attribute to predict the sample’s
score and then determine the label as the corresponding class according to the score interval
in which the score is located. As shown in Fig. 5, the proportion of generated data from
GARNet is almost consistent with the top-1 probability calculated by the specified score
vector, i.e., q = σ(π(s)).
Similarly, we can adopt GARNet (Eq. (22)) to learn the desired distribution from partial
preferences in terms of the continuous property (see Section 7.2 for more empirical studies).
In practice, we do not explicitly discretize the continuous ranking scores to a finite number
of classes and define a so-called top-1 class as the most desirable data. Instead, we suppose
the generative model is more consistent with user preferences if the generated data has
higher ranking scores.
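Although training itself does not discretize the scores, the evaluation in Fig. 5 does; below is a sketch of such quantile binning, assuming a pretrained ranker has already produced a 1-D array of scores (helper name is hypothetical).

```python
import numpy as np

def scores_to_ordered_classes(scores, num_classes=5):
    """Bin continuous ranking scores into ordered classes by quantile (cf. Theorem 11).

    Class 1 holds the top (100/num_classes)% of scores (most preferred),
    class 2 the next slice, and so on down to class num_classes.
    """
    scores = np.asarray(scores, dtype=float)
    # Inner quantile edges, from low score to high score.
    edges = np.quantile(scores, np.linspace(0.0, 1.0, num_classes + 1)[1:-1])
    # np.digitize counts how many edges each score exceeds (0 .. num_classes-1).
    rank_from_bottom = np.digitize(scores, edges, right=True)
    return num_classes - rank_from_bottom        # labels in {1, ..., num_classes}

labels = scores_to_ordered_classes(np.random.randn(1000))
print(np.bincount(labels)[1:])                   # roughly 200 samples per class
```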

5. GARNet for a Mixture of Preferences


We extend our GARNet to a more general circumstance where a mixture of user preferences
is collected from distinct groups of users. In particular, we consider a simple situation where
each group of users ranks the samples in terms of a specific attribute, i.e., {S_u}_{u=1}^U. The
objective of GARNet in this context can be formulated as follows:

    sup_{R: X → ℝ}  Σ_{u=1}^{U}  E_{s_u∼S_u, x_g∼P_g} [ L_{L2R}(π(s_u^(R)), R_u(s_u^(R))) ],     (29a)
    sup_{G: Z → X}  Σ_{u=1}^{U}  E_{s_u∼S_u, x_g∼P_g} [ L_{L2R}(π(s_u^(G)), R_u(s_u^(G))) ].     (29b)

For simplicity, we extend the scalar output of the ranker R to a U-dimensional vector so as
to learn from {S_u}_{u=1}^U simultaneously. R_u is the u-th output of the ranker.
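A minimal sketch of a U-output ranker for Eq. (29) is shown below; the backbone, feature dimension, and shapes are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiHeadRanker(nn.Module):
    """Ranker with U scalar outputs, one per user group / attribute (Eq. (29)).

    backbone: any feature extractor mapping images to a flat feature vector.
    R_u(x) is column u of the output; each group's ListNet loss uses its own column.
    """
    def __init__(self, backbone: nn.Module, feat_dim: int, num_groups: int):
        super().__init__()
        self.backbone = backbone
        self.heads = nn.Linear(feat_dim, num_groups)    # U ranking scores per sample

    def forward(self, x):
        return self.heads(self.backbone(x))             # shape (batch, U)

# Toy usage with a flattening backbone for 1x32x32 inputs:
backbone = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 128), nn.ReLU())
ranker = MultiHeadRanker(backbone, feat_dim=128, num_groups=2)
print(ranker(torch.randn(4, 1, 32, 32)).shape)          # torch.Size([4, 2])
```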

Conditional GARNet When provided with a mixture of preferences where one group of


preferences might be conflicting with another, GARNet can achieve a compromise perfor-
mance, which will be empirically verified in the experiment. However, for exactly opposing
attributes, like being both “open” and “not open”, a practical way is to model them using


two separate GARNet models or to extend our GARNet to its conditional version, denoted
as CGARNet, conditioned on each group of preferences. The extension is similar to extending
GAN to conditional GAN (Mirza and Osindero, 2014). Both the ranker and the
generator can be conditioned on an attribute label.

6. Discussion on GANs Extended to Preference-guided Generation


In this section, we discuss how current GAN variants can be extended to preference-guided
generation and their defects. To simplify the discussion, here we aim to learn the distribution
that is most consistent with user preferences, i.e., the distribution of top-1 data P1 .

Preference-guided generative adversarial learning To apply vanilla GAN (denoted


as GAN-0) (Goodfellow et al., 2014) for this application, an intuitive way is to first construct
a dataset consisting of the user-desired samples and then perform regular GAN training on
this dataset to obtain a generator that purely outputs the desired samples. Two possible
strategies can be applied:
(1) Selecting desired samples using partial preferences directly (GAN-1): A new dataset
is constructed by selecting top-1 samples from each partial preference sm . Since the top-
1 samples in partial preferences are not necessarily the global top-ranked samples, this
strategy produces biased training data that inevitably involves undesired samples. As a
result, the derived distribution is not precisely user-desired. On the other hand, it would
suffer from insufficient data issues when the amount of training data is small.
(2) Selecting desired samples with a ranking proxy (GAN-2): Specifically, we train a
global ranking proxy using partial preferences (Cao et al., 2007) and then apply the proxy
to select the global high-ranked samples as a new training dataset. For example, the top
30% of the training samples are selected as the desired dataset. In spite of its feasibility,
this strategy may also incur insufficient data issues especially when the desired data in the
original training dataset is limited.
Feedback GAN (FBGAN) (Gupta and Zou, 2019) is based on the second strategy and
remedies its drawbacks by introducing the high-ranked generated samples into previous
training data. Since the ratio of the high-ranked samples in the training data gradually
increases along with the training epoch, all training data will ideally converge to the user-
desired samples. However, FBGAN may suffer from severe data quality issues since plainly
treating the generated samples as the training data will degrade the generation quality.

GANs plus a ranking module Another idea is to introduce a ranker as an extra


critic (Saquil et al., 2018) to promote the generation of user-desired samples while the
original GAN’s discriminator is kept as a critic to guarantee the generation quality, which
is dubbed as GAN-RK.
Similarly, the ranker learns to rank samples from partial preferences and outputs high
ranking scores for user-desired samples. The discriminator can be a classifier that distin-
guishes real from fake (Goodfellow et al., 2014) or define a distribution discrepancy (Ar-
jovsky et al., 2017). For the generator, apart from the goal of aligning the generated dis-
tribution with the real distribution guided by the discriminator, an extra goal is in pursuit
of generating samples with high-ranking scores judged by the ranker. However, both goals
are conflicting since one requires the generator to synthesize samples resembling the whole data
distribution while the other requires the generation of only a portion of the data, i.e., samples with
high ranking scores. Thus, learning the desired data distribution cannot be well achieved.
Generator with Naive Ranker Another naive approach is to use only a ranker that
learns from user preferences on real data to guide a generator, which is called GR. However,
such an approach would suffer from poor data quality since there is no modeling of
data authenticity in this setup.

7. Experiments
We empirically verify that our GARNet can learn a data distribution that is best consistent
with user preferences. Furthermore, we demonstrate the potential of GARNet in improving
imbalanced class learning.
Dataset: (1) The MNIST dataset (Lecun et al., 1998) consists of 28 × 28 images of digits
zero to nine. We use its training set (50K images) for the experiments. We suppose that users
prefer smaller digits on MNIST. An image with a smaller digit value will be ranked higher.
For instance, the partial ranking list over four digits 1, 3, 2, 9 is s := 1 > 2 > 3 > 9. As there
are 10 digits in total, the maximal length of the preferences l can be 10. In particular, the
length of the preferences included in the training data varies from 7 to 10 so that GARNet
can converge to top-1 data distribution.
(2) The Labeled Faces in the Wild (LFW) dataset (Huang et al., 2008) consists of 13,143
celebrity face images taken in the wild. LFW-10 consists of a subset of 2,000 images from
LFW along with about 500 pairwise comparisons for one attribute. We take the pairwise
comparisons in terms of the smile attribute as the user preferences for the training data.
Since LFW-10 has limited pairwise comparisons, we exploit a pretrained ranker to augment
more preferences for the training data. In addition, as LFW-10 has limited training images,
i.e., 1K images, we also take the 13,143 images from LFW as our training data. Specifically,
we pretrain a ranker to learn to rank images in terms of the smile attribute from given
pairwise comparisons of LFW10. Then we use the ranker to output ranking scores for all
training samples of LFW and construct pairwise preferences based on the scores. If face
image xa has larger smile than face image xb , then the preference is s := xa > xb .
(3) The UT-Zap50K dataset (Yu and Grauman, 2014) contains 50,025 shoe images from
Zappos.com. It contains pairwise comparisons over several attributes. We use all pairs, i.e.,
UT-Zap50K-1, UT-Zap50K-2, and UT-Zap50K-lexi, as training data. The pairwise comparisons
in terms of the comfort (4,483 pairwise comparisons), open (4,351) and sporty (4,085)
attributes are taken as the user preferences for the training data, respectively. We exploit a
pretrained ranker to augment more preferences for the training data. We pretrain a ranker
to rank images in terms of a specified attribute from given pairwise comparisons and then
construct pairwise preferences based on the scores evaluated by the ranker. If shoe image
x_a is more comfortable/open/sporty than shoe image x_b, then the preference is s := x_a > x_b.
Baselines: We consider GAN (Goodfellow et al., 2014), FBGAN (Gupta and Zou,
2019), GAN-RK (Saquil et al., 2018) (GAN plus an additional ranker) and GR (a generator
with a ranker that is only trained with partial preferences) as our baselines. In terms of
GAN, it is trained with three kinds of subsets: the entire data, the subset made up of local
high-ranked samples (top-1 samples in the partial preferences), and the subset made up of
global high-ranked samples (e.g., top 10% for MNIST; top 50% for LFW and UT-Zap50K)


(a) GAN-0 (b) GAN-1 (c) GAN-2 (d) FBGAN

(e) GAN-RK (f) GR (g) GARNet

Figure 6: Comparison w.r.t. the generation of user-preferred digits (small digits) on MNIST.

selected by a surrogate ranker. These three variants are denoted as GAN-0, GAN-1, and
GAN-2, respectively.
Network & Hyperparameters: All methods are implemented based on WGAN-
GP architecture (DCGAN version) (Gulrajani et al., 2017) unless specifically mentioned.
For training we use the Adam optimizer (Kingma and Ba, 2015) with learning rate
2 × 10^{−4} and β_1 = 0.5, β_2 = 0.999. According to Proposition 5 and Corollary 7, we set
the ground-truth score vector for s^(R) as π(s^(R)) = [10 + 5(l − 1), 10 + 5(l − 2), ..., 10, 0]
for all datasets, which makes GARNet learn a data distribution that is best consistent
with user preferences, as q_1 ≈ 1. We simply set the ground-truth score vector for s^(G) as
π(s^(G)) = [10 + 5(l − 2), 10 + 5(l − 3), ..., 10, 5, 0]. For MNIST, the batch size used is 50.
The training iteration is set to 100K. For LFW and UT-Zap50K, the batch size is 64. The
training iteration is 200K. The training images are resized to 32 × 32 unless specifically
mentioned. To rank images in terms of a certain attribute, we adopt a pairwise ranking loss
in Burges et al. (2005). The pretrained VGG16 (Simonyan and Zisserman, 2015) is used to
map an image to its ranking score.
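For reference, the ground-truth score vectors stated above can be written out programmatically; a small sketch reproducing the settings π(s^(R)) = [10 + 5(l−1), ..., 10, 0] and π(s^(G)) = [10 + 5(l−2), ..., 10, 5, 0] (function names are illustrative):

```python
def score_vector_R(l, a=10.0, d=5.0, b=0.0):
    """pi(s^(R)) for s^(R) := s^1 > ... > s^l > x_g: [a + (l-1)d, ..., a, b]."""
    return [a + d * (l - 1 - i) for i in range(l)] + [b]

def score_vector_G(l):
    """pi(s^(G)) for s^(G) := x_g > s^1 > ... > s^l, as set in the experiments:
    [10 + 5(l-2), 10 + 5(l-3), ..., 10, 5, 0]."""
    return [10.0 + 5.0 * (l - 2 - i) for i in range(l - 1)] + [5.0, 0.0]

print(score_vector_R(4))   # [25.0, 20.0, 15.0, 10.0, 0.0]
print(score_vector_G(4))   # [20.0, 15.0, 10.0, 5.0, 0.0]
```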
Evaluation Metrics: (1) For MNIST, mean digit (MD) is adopted as a performance
measure for the generation of preferred digits. The digit values for the generated data will
be evaluated by a pretrained classifier for digit classification. For each method, we randomly
generate 50K digits and calculate their MD.
(2) For LFW and UT-Zap50K, mean score (MS) is adopted as a performance measure
for the generation of user-preferred face images. The scores are evaluated by a pretrained
ranker that learns to rank images in terms of the target attribute from the given preferences.
Fréchet Inception Distance (FID) (Heusel et al., 2017) measures the visual quality w.r.t. the
high-ranked real face images. For each method, we randomly select 50K generated images
and conduct the evaluation.
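As a sketch of how the mean digit (MD) metric can be computed, assuming a generator and a pretrained digit classifier are available as callables (names, shapes, and defaults are illustrative):

```python
import torch

@torch.no_grad()
def mean_digit(generator, classifier, num_samples=50_000, batch_size=500, z_dim=128, device="cpu"):
    """Mean digit (MD): average predicted digit over generated samples.

    classifier is assumed to be a pretrained MNIST digit classifier returning logits.
    A lower MD means the generator favours smaller (more preferred) digits.
    """
    total, count = 0.0, 0
    while count < num_samples:
        z = torch.randn(batch_size, z_dim, device=device)
        digits = classifier(generator(z)).argmax(dim=1)   # predicted digit per image
        total += digits.float().sum().item()
        count += batch_size
    return total / count
```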


7.1 Preferences w.r.t. Discrete Digits


In this section, we apply GARNet to generate the user-preferred digits on MNIST using
the partial preferences.

Table 2: Comparison of various methods on MNIST w.r.t. mean digit (MD, ↓). Best results
are in bold. FBGAN suffers from mode collapse (Fig. 6d). GR generates meaningless results
(Fig. 6f), so its MD is not collected.

Method   Real   GAN-0   GAN-1   GAN-2   GAN-RK   GR   FBGAN   GARNet
MD       4.45   4.49    0.75    0.15    4.63     -    0.00    0.00

According to Fig. 6 and Table 2, we highlight that only our GARNet successfully learns
the distribution of the top-ranked digits from user preferences. Table 2 shows that except for
GAN-0, GAN-RK, and GR, other methods have smaller MD compared to the training data
(denoted as real). It means that those generative models can learn from user preferences
to some extent. However, only FBGAN and GARNet can converge to generate top-ranked
digits with MD 0.00. On the other hand, GAN-0 is the vanilla GAN that is trained with
all training data, so its MD is close to that of real images. GAN-RK fails to learn from
user preferences because of the conflict between the discriminator and ranker. GR cannot
get any meaningful generations because it does not involve a distribution matching between
real and generated samples. Therefore, its MD is not applicable.
Fig. 6 shows that: (1) GARNet generates top-ranked digits (digit 0), with high quality
as shown in Fig. 6g. (2) According to Fig. 6b, GAN-1 still generates digits that are not
ranked top, like digit two, since the constructed training subset contains undesired sam-
ples. (3) GAN-2 achieves better performance than GAN-1 but still fails to converge to the
distribution of the top-ranked digit. (4) FBGAN suffers from mode collapse, in which the
generated digit zeros are exactly the same, since its selection of high-ranked generated data
does not consider maintaining sample diversity.

7.2 Preferences w.r.t. a Continuous Attribute


As described in experiments on MNIST, GAN-1, GAN-2, FBGAN and our GARNet can
learn from the user preferences. Therefore, we only compare GARNet with these baselines
on generating user-preferred facial images on LFW w.r.t. the smile attribute as well as
generating user-preferred shoe images on UT-Zap50K w.r.t. the comfort, open and sporty
attributes, respectively.
We quantitatively evaluate the performance of learning from user preferences by mean
score (MS) and the generation quality by FID in Table 3. Meanwhile, we plot the score
density for generated samples of various methods and training samples in Fig. 7. The
evaluation results on LFW (smile) show that GAN-1, GAN-2, and GARNet all shift toward a
distribution of data with larger scores, which means they can learn from user preferences
and generate face images with large smiles. Specifically: (1) Our GARNet learns the best
user-preferred distribution, indicated by the highest MS, along with the best image quality,
indicated by the lowest FID score. (2) GAN-1 and GAN-2 have poorer quality than GARNet.


(3) FBGAN suffers from mode collapse and its generated images have very poor quality (see
Appendix), which is reflected in its large FID score. Similar conclusions hold on UT-Zap50K.
For further visual inspection, we place GARNet's generated samples and training
samples (a.k.a. real samples) along the smile score axis in Fig. 8a. Over 95% of GARNet's
generated images fall within the score range of the real images ranked in the top 50% for the
smile attribute. GARNet even generates images with scores higher than the maximum score in
the LFW dataset, which shows its potential for generating data beyond existing rankings.
We likewise place generated and real samples along the comfort score axis in Fig. 8b; over
75% of the generated images fall within the range of the real images ranked in the top 50%
for the comfort attribute. Results on the open and sporty attributes can be found in the
Appendix.
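The coverage statistics reported above can be reproduced, up to implementation details, with a computation of the following form; the names are illustrative, and `real_scores` and `gen_scores` are assumed to come from the same pretrained attribute ranker (cf. Fig. 8).

    import numpy as np

    def coverage_of_top_real(real_scores, gen_scores, top_frac=0.5):
        # Fraction of generated samples whose attribute score reaches the
        # score threshold of the top `top_frac` real samples.
        threshold = np.quantile(np.asarray(real_scores), 1.0 - top_frac)
        return float(np.mean(np.asarray(gen_scores) >= threshold))

    # e.g. coverage_of_top_real(smile_scores_real, smile_scores_generated)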

Table 3: Comparison on LFW face data and UT-Zap50K shoe data w.r.t. performance
measure (MS, ↑) and quality score (FID, ↓). The best results are highlighted in bold.
Since FBGAN suffers from mode collapse and poor image quality issues (large FID; see its
generated images in Appendix), its MS is not collected.

Dataset                        MS                                       FID
                      Real    GAN-1   GAN-2   FBGAN   GARNet    GAN-1   GAN-2    FBGAN    GARNet
LFW (smile)           0.59    1.41    1.14    -       2.29      47.55   37.40    89.36    22.22
UT-Zap50K (comfort)   -5.06   -2.66   -0.91   -       -0.90     49.20   77.99    253.53   39.16
UT-Zap50K (open)      -0.36   0.93    0.99    -       3.75      63.92   112.93   265.49   46.78
UT-Zap50K (sporty)    4.09    5.28    4.48    -       7.07      41.02   84.59    431.44   32.87

7.3 User Control on a Mixture of Preferences

To demonstrate that GARNet can generalize to a mixture of user preferences, i.e., preferences
on multiple attributes, we conduct experiments with training preferences on both the comfort
and open attributes. To achieve better generation, we apply a more advanced GAN
architecture (Karras et al., 2020). Meanwhile, the training images are resized to 64 × 64
for better resolution.
In Fig. 9, we visualize real images and generated images by ordering them with the
pretrained rankers defined over the open and comfort attributes, respectively. It is observed
that GARNet generates shoe images that are more comfortable or more open. In addition, the
least comfortable or least open shoe images are not generated by GARNet, as they are
disliked by users.
It is worth mentioning that GARNet achieves a compromise between a pair of moderately
conflicting attributes. Intuitively, open and comfort are conflicting attributes. For example,
sport shoes are thought of as comfortable but are usually only slightly open. As shown in Figure 9,
by learning from preferences w.r.t. the open and comfort attributes, GARNet converges to
avoid generating samples ranked lowest in either of the two attributes, but generates
relatively closed and comfortable shoes or relatively uncomfortable and open shoes.


[Figure 7 plots omitted: (a) smile, (b) comfort, (c) open, (d) sporty — score-density curves of real vs. generated samples, with mean scores marked.]

Figure 7: Comparison of GAN-1 (Row 1), GAN-2 (Row 2) and our GARNet (Row 3) in
terms of the score density. The green point denotes the mean score of generated samples
while the red point denotes that of real samples. Note the x-axis denotes ranking scores of
samples instead of ordered classes in Figure 5. Similar observations that a preferred class
has a larger proportion can be found if we coarsen the ranker scores into limited ordered
classes.

[Figure 8 plots omitted: (a) LFW (smile), (b) UT-Zap50K (comfort) — real and generated images placed at the 5%, 25%, 50%, 75% and 95% percentile ranks along each attribute's score axis.]

Figure 8: Real images (32 × 32, above the axis) vs. generated images (GARNet, below the
axis) w.r.t. the score axis of attributes. The percentile rank of a given score is the percent-
age of scores in its frequency distribution that are less than that score. Real (generated)
percentile rank means calculated among real (generated) images.


[Figure 9 image grids omitted: real samples (left) and generated samples (right), each arranged from min to max along the "comfort" and "open" score axes.]

Figure 9: GARNet for multiple attributes. Images (64×64) are placed w.r.t. the "comfort"
and "open" score. The results are obtained by an advanced GAN variant, StyleGAN2-
ADA (Karras et al., 2020), thus having better quality.

[Figure 10 plots omitted: (a) open, (b) not open — score-density curves of real vs. generated samples, with mean scores marked.]

Figure 10: The generated samples (32 × 32) and the score density (open ↑, score ↑) for
CGARNet conditioned on open and not open attributes, respectively. The green point
denotes the mean score of generated samples while the red point denotes that of real samples.

7.3.1 Conditional GARNet


We apply our conditional GARNet (CGARNet) to exactly opposing attributes, i.e., open
and not open.
Fig. 10 shows that: (1) When conditioned on the open attribute, CGARNet generates
shoe images with large open attribute values, with a score density concentrated on a region
of large open scores. (2) When conditioned on the not open attribute, CGARNet
generates shoe images with small open attribute values, with a score density concentrated
on a region of small open scores.

7.4 GARNet on Preferences vs. GANs on Labels


In this section, we show that preference-guided generation outperforms generation condi-
tioned on labels in extremely imbalanced class learning. Note that conditional generation can
generate samples for the minority class, showing a promising application for imbalanced
data classification by promoting the minority via data augmentation in a pre-processing
manner.

[Figure 11 panels omitted: (a) performance gain matrix over the ten MNIST classes (actual vs. predicted), (b) visual results of GARNet, (c) visual results of Elastic-infoGAN.]

Figure 11: (a) GARNet boosts imbalance classification on MNIST with digit six as the
minority class. The gain matrix (zero is not presented) is obtained by C′ − C, where C′ and
C are the confusion matrix calculated on GARNet boosted data and original MNIST data,
respectively. The color denotes the confusion matrix (%) on original MNIST data. (b-c)
Visual results of GARNet and Elastic-infoGAN on MNIST with extremely imbalanced data.

Experimental setup: On the MNIST dataset (Lecun et al., 1998), we randomly pick
0.5% of the samples of digit six to constitute the minority class. All samples of the remaining
classes are retained. Note that the imbalance ratio reaches minor/major ≈ 1/200. We construct
partial preferences simply by preferring digit six over any other class. We compare the
classification performance of a CNN classifier (CNN) without and with data augmentation
by GARNet.
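A minimal sketch of this preference construction is given below, assuming integer class labels; the function name and sampling scheme are illustrative rather than the exact data pipeline.

    import numpy as np

    def build_partial_preferences(labels, preferred_digit=6, n_pairs=50_000, seed=0):
        # Each returned pair (i, j) encodes "sample i is preferred over sample j",
        # where i comes from the minority class and j from any other class.
        labels = np.asarray(labels)
        rng = np.random.default_rng(seed)
        minority = np.flatnonzero(labels == preferred_digit)
        majority = np.flatnonzero(labels != preferred_digit)
        i = rng.choice(minority, size=n_pairs, replace=True)
        j = rng.choice(majority, size=n_pairs, replace=True)
        return np.stack([i, j], axis=1)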
Baselines: Elastic-infoGAN (Ojha et al., 2020) is a state-of-the-art generative model for
imbalanced data generation. A standard GAN trained only on the minority class is excluded as
a baseline since there are insufficient samples (0.5% × 4,951 ≈ 25) for training a generative
model.
Fig. 11b shows that our GARNet successfully generates the user-desired data, i.e., digit
six. Although there are only limited digit-six images in the training set, we can construct sufficient
partial preferences between the minority class and the majority classes. Therefore, our GARNet
can learn the direction of user preferences and generate the desired data. In contrast, Fig. 11c
shows that Elastic-infoGAN fails to generate digit six, since it relies on class labels and the
samples of digit six are too limited.
Therefore, our GARNet can be used to improve imbalanced class learning via data aug-
mentation, but Elastic-infoGAN cannot. Fig. 11a shows the combined confusion matrix of
two settings: with augmentation by GARNet and without augmentation. The color in-
dicates the confusion matrix of the CNN trained with imbalanced data, where every class has
high purity except digit six, which is somewhat confused with digits four, five, and eight. By
replenishing digit six with GARNet-generated samples, we train the same CNN architecture
(CNN-GARNet). The purity of digit six increases by 8.9%, demonstrating that GARNet can
efficiently augment the minority data in imbalanced scenarios.
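A sketch of this augmentation step, assuming the trained GARNet generator is available as `generator`, is shown below; the dataset-handling details are illustrative only.

    import torch
    from torch.utils.data import ConcatDataset, TensorDataset

    @torch.no_grad()
    def augment_minority(train_set, generator, minority_label=6,
                         n_extra=5_000, batch=500, z_dim=128):
        # Generate extra minority-class images with GARNet and append them
        # (labelled as the minority class) to the original training set.
        images = []
        for _ in range(n_extra // batch):
            z = torch.randn(batch, z_dim)
            images.append(generator(z).cpu())
        fake_x = torch.cat(images)
        fake_y = torch.full((fake_x.size(0),), minority_label, dtype=torch.long)
        return ConcatDataset([train_set, TensorDataset(fake_x, fake_y)])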


8. Conclusion

This paper presents a novel adversarial ranking framework, GARNet, to learn from user
preferences. In particular, we prove that GARNet is equivalent to optimizing a divergence
between the user-preferred data distribution (determined by the given score vector) and
the generated data distribution, which theoretically guarantees that GARNet is a good estima-
tor of the desired data distribution. Meanwhile, we empirically show that GARNet obtains the
corresponding distributions when different score vectors are specified. Numerous
experiments demonstrate that GARNet learns the distribution that best matches user pref-
erences compared to various baselines. A study of imbalanced class learning validates the
advantage of preference-guided GARNet over GANs conditioned on labels.
GARNet is modeled under the homogeneity assumption of rank aggregation that only
one ground-truth full ranking exists (Lin, 2010), which is in line with the actual situation
in many real-world applications, such as peer grading (Raman and Joachims, 2014), image
rating (Liang and Grauman, 2014), document recommendation (Sellamanickam et al., 2011),
opinion analysis (Kim et al., 2015), and bioinformatics (Chatterjee et al., 2018). This
assumption may break when the preferences are sampled from heterogeneous
user groups, i.e., a mixture of preferences: (1) One group of users dominates the
ranking. Then, the homogeneity assumption is still valid by simply taking the preferences
from the majority group as clean preferences and those from the remaining groups as
noisy preferences. Robust rank aggregation (Pan et al., 2022) can be incorporated into our
GARNet to handle these noisy preferences, which is left for future exploration. (2) The
preferences from different groups of users are balanced. The homogeneity assumption on
users is no longer valid, and it is more reasonable and realistic to assume heterogeneous
users rather than the idealized scenario (Zhao et al., 2016), which is called mixture rank aggregation.
We conduct a preliminary exploration for a simple case where the group affiliation of each
preference is known. We leave the general case of mixture ranking for future work.

Acknowledgements

This work was supported by the A*STAR Centre for Frontier AI Research, the A*STAR
Career Development Fund (Grant No. C222812019), the A*STAR GAP project (Grant
No. I23D1AG079), the Program for Guangdong Introducing Innovative and Entrepreneurial
Teams (Grant No. 2017ZT07X386), NSFC (Grant No. 62250710682), and the Program for
Guangdong Provincial Key Laboratory (Grant No. 2020B121201001).

Appendix A. Proof of Theorem 4


Proof  Let
\[
R^* \;=\; \arg\sup_{R:\,\mathcal{X}\to\mathbb{R}} \;
\mathbb{E}_{\substack{s^1\sim P_1,\,\ldots,\,s^T\sim P_T\\ x_g\sim P_g}}
\Big[ f\Big( \sum_{i=1}^{T} q_i R(s^i) - R(x_g) \Big) \Big]
\]
be the best choice of $R$.

#1 Prove that $D_f(P_u, P_g) \ge 0$.

Let $R^w(x) = c_0,\ \forall x \in \mathcal{X}$ (the worst possible choice of $R$), where $c_0$ is a constant.


\begin{equation}
D_f(P_u, P_g)
= \mathbb{E}_{\substack{s^1\sim P_1,\,\ldots,\,s^T\sim P_T\\ x_g\sim P_g}}
\Big[ f\Big( \sum_{i=1}^{T} q_i R^*(s^i) - R^*(x_g) \Big) \Big]
\;\ge\;
\mathbb{E}_{\substack{s^1\sim P_1,\,\ldots,\,s^T\sim P_T\\ x_g\sim P_g}}
\Big[ f\Big( \sum_{i=1}^{T} q_i R^w(s^i) - R^w(x_g) \Big) \Big]
\overset{1}{=} 0,
\tag{30}
\end{equation}
where the inequality holds by the optimality of $R^*$, and step 1 is valid because $\sum_{i=1}^{T} q_i = 1$, so that $\sum_{i=1}^{T} q_i c_0 - c_0 = 0$ and $f(0) = 0$.

#2 Prove that $P_u = \sum_{i=1}^{T} q_i P_i = P_g \Longrightarrow D_f(P_u, P_g) = 0$.

\begin{align}
D_f(P_u, P_g)
&= \mathbb{E}_{\substack{s^1\sim P_1,\,\ldots,\,s^T\sim P_T\\ x_g\sim P_g}}
\Big[ f\Big( \sum_{i=1}^{T} q_i R^*(s^i) - R^*(x_g) \Big) \Big] \nonumber\\
&\overset{1}{\le} \mathbb{E}_{s^1\sim P_1,\,\ldots,\,s^T\sim P_T}
\Big[ f\Big( \sum_{i=1}^{T} q_i R^*(s^i) - \mathbb{E}_{x_g\sim P_g}[R^*(x_g)] \Big) \Big] \nonumber\\
&\overset{2}{\le} f\Big( \mathbb{E}_{s^1\sim P_1,\,\ldots,\,s^T\sim P_T}
\Big[ \sum_{i=1}^{T} q_i R^*(s^i) \Big] - \mathbb{E}_{x_g\sim P_g}[R^*(x_g)] \Big) \tag{31}\\
&= f\Big( \sum_{i=1}^{T} q_i \, \mathbb{E}_{s^i\sim P_i}[R^*(s^i)] - \mathbb{E}_{x_g\sim P_g}[R^*(x_g)] \Big) \nonumber\\
&\overset{3}{=} f\Big( \sum_{i=1}^{T} q_i \, \mathbb{E}_{x\sim P_i}[R^*(x)] - \mathbb{E}_{x\sim P_g}[R^*(x)] \Big) \nonumber\\
&\overset{4}{=} f\Big( \int R^*(x) \sum_{i=1}^{T} q_i P_i \, dx - \int R^*(x) P_g \, dx \Big) = f(0) = 0,\nonumber
\end{align}
where 1 and 2 follow Jensen's inequality for a concave function $f$, 3 is valid due to the change of variables, and 4 is valid because $P_g = \sum_{i=1}^{T} q_i P_i$. Since $D_f(P_u, P_g) \ge 0$ (Eq. (30)), we have $D_f(P_u, P_g) = 0$.

#3 Prove that $D_f(P_u, P_g) = 0 \Longrightarrow P_u = \sum_{i=1}^{T} q_i P_i = P_g$.

We prove this by contraposition. Namely, we prove that $P_u \neq P_g \Longrightarrow D_f(P_u, P_g) \neq 0$.

Let $H = \{x \mid P_u(x) > P_g(x)\}$. Since $P_u \neq P_g$, we have $H \neq \emptyset$. Let $u = \int_{H} P_u(x)\,dx$ and
$v = \int_{H} P_g(x)\,dx$; then $(1-u) = \int_{\mathcal{X}\setminus H} P_u(x)\,dx$ and $(1-v) = \int_{\mathcal{X}\setminus H} P_g(x)\,dx$. According
to the definition, $u > 0$, $v > 0$, and $u > v$.


Let
\[
R'(x) = \begin{cases} \Delta & \text{if } x \in H \\ 0 & \text{otherwise} \end{cases},
\qquad \text{where } \Delta \neq 0,
\]
then we have
\begin{align}
L(\Delta)
&= \mathbb{E}_{\substack{s^1\sim P_1,\,\ldots,\,s^T\sim P_T\\ x_g\sim P_g}}
\Big[ f\Big( \sum_{i=1}^{T} q_i R'(s^i) - R'(x_g) \Big) \Big]
\overset{1}{=} \mathbb{E}_{\substack{s^1\sim P_1,\,\ldots,\,s^T\sim P_T\\ x_g\sim P_g}}
\Big[ f\Big( \sum_{i=1}^{T} q_i \big( R'(s^i) - R'(x_g) \big) \Big) \Big] \nonumber\\
&\overset{2}{\ge} \mathbb{E}_{\substack{s^1\sim P_1,\,\ldots,\,s^T\sim P_T\\ x_g\sim P_g}}
\Big[ \sum_{i=1}^{T} q_i f\big( R'(s^i) - R'(x_g) \big) \Big]
= \sum_{i=1}^{T} q_i \, \mathbb{E}_{\substack{s^i\sim P_i\\ x_g\sim P_g}}
\big[ f\big( R'(s^i) - R'(x_g) \big) \big] \nonumber\\
&\overset{3}{=} \sum_{i=1}^{T} q_i \, \mathbb{E}_{\substack{x_u\sim P_i\\ x_g\sim P_g}}
\big[ f\big( R'(x_u) - R'(x_g) \big) \big] \nonumber\\
&= \int_{\mathcal{X}} \int_{\mathcal{X}} f\big( R'(x_u) - R'(x_g) \big)
\Big( \sum_{i=1}^{T} q_i P_i(x_u) \Big) P_g(x_g) \, dx_u \, dx_g \tag{32}\\
&\overset{4}{=} \int_{\mathcal{X}} \int_{\mathcal{X}} f\big( R'(x_u) - R'(x_g) \big) P_u(x_u) P_g(x_g) \, dx_u \, dx_g \nonumber\\
&= \int_{x_u\in H} \int_{x_g\in H} f\big( R'(x_u) - R'(x_g) \big) P_u(x_u) P_g(x_g) \, dx_u \, dx_g
 + \int_{x_u\in H} \int_{x_g\in \mathcal{X}\setminus H} f\big( R'(x_u) - R'(x_g) \big) P_u(x_u) P_g(x_g) \, dx_u \, dx_g \nonumber\\
&\quad + \int_{x_u\in \mathcal{X}\setminus H} \int_{x_g\in H} f\big( R'(x_u) - R'(x_g) \big) P_u(x_u) P_g(x_g) \, dx_u \, dx_g
 + \int_{x_u\in \mathcal{X}\setminus H} \int_{x_g\in \mathcal{X}\setminus H} f\big( R'(x_u) - R'(x_g) \big) P_u(x_u) P_g(x_g) \, dx_u \, dx_g \nonumber\\
&= f(\Delta)\, u (1-v) + f(-\Delta)\, v (1-u), \nonumber
\end{align}
where 1 is valid because $\sum_{i=1}^{T} q_i = 1$, 2 follows Jensen's inequality for the concave function $f$, 3 is valid due to the change of variables, and 4 is obtained by denoting $P_u = \sum_{i=1}^{T} q_i P_i$.

Since $u(1-v) > v(1-u)$, $\exists \Delta^* > 0$ s.t. $L(\Delta^*) > 0$ according to Lemma A.3 in Jolicoeur-Martineau (2020). Thus, letting $\Delta = \Delta^*$, we have Eq. (32) $> 0$. Therefore,
\[
D_f(P_u, P_g)
= \mathbb{E}_{\substack{s^1\sim P_1,\,\ldots,\,s^T\sim P_T\\ x_g\sim P_g}}
\Big[ f\Big( \sum_{i=1}^{T} q_i R^*(s^i) - R^*(x_g) \Big) \Big]
\ge \mathbb{E}_{\substack{s^1\sim P_1,\,\ldots,\,s^T\sim P_T\\ x_g\sim P_g}}
\Big[ f\Big( \sum_{i=1}^{T} q_i R'(s^i) - R'(x_g) \Big) \Big] > 0.
\]
The proof is completed.

Appendix B. Additional Experimental Results


[Figure 12 plots omitted: (a) UT-Zap50K open, (b) UT-Zap50K sporty — real and generated images placed at the 5%, 25%, 50%, 75% and 95% percentile ranks along each attribute's score axis.]

Figure 12: Real images (32 × 32, above the axis) vs. generated images (GARNet, below the
axis) w.r.t. the score axis of attributes. The percentile rank of a given score is the percent-
age of scores in its frequency distribution that are less than that score. Real (generated)
percentile rank means calculated among real (generated) images.

(a) smile (b) comfort (c) open (d) sporty

Figure 13: Visual results (32 × 32, mode collapse) of FBGAN on LFW (smile) and UT-
Zap50K (comfort, open and sporty).


(a) smile

(b) comfort

(c) open

(d) sporty

Figure 14: Verification that GARNet generates novel images (32 × 32) (generated
images are not identical to any training image) on LFW (smile) and UT-Zap50K (comfort,
open and sporty). Nearest neighbors are measured by the `2 distance between images. The
first column shows generated samples from GARNet. The other columns show the generated
samples' nearest neighbors in the training dataset.


References
Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial
networks. In International Conference on Machine Learning, pages 214–223, 2017.

Siddarth Asokan and Chandra Sekhar Seelamantula. Teaching a GAN what not to learn.
In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and
Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems, 2020.

Christopher J. C. Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamil-
ton, and Gregory N. Hullender. Learning to rank using gradient descent. In International
Conference on Machine learning, 2005.

Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. Learning to rank: from
pairwise approach to listwise approach. In International Conference on Machine learning,
pages 129–136, 2007.

Sujoy Chatterjee, Anirban Mukhopadhyay, and Malay Bhattacharyya. A weighted rank
aggregation approach towards crowd opinion analysis. Knowledge-Based Systems, 149:
47–60, 2018.

Xi Chen, Paul N Bennett, Kevyn Collins-Thompson, and Eric Horvitz. Pairwise ranking
aggregation in a crowdsourced setting. In Proceedings of the sixth ACM international
conference on Web search and data mining, pages 193–202, 2013.

Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei.
Deep reinforcement learning from human preferences. In Advances in Neural Information
Processing Systems, pages 4299–4307, 2017.

Zihang Dai, Zhilin Yang, Fan Yang, William W Cohen, and Russ R Salakhutdinov. Good
semi-supervised learning that requires a bad gan. Advances in neural information pro-
cessing systems, 30, 2017.

Xin Ding, Yongwei Wang, Zuheng Xu, William J Welch, and Z Jane Wang. Ccgan: Contin-
uous conditional generative adversarial networks for image generation. In International
conference on learning representations, 2020.

Jesse H. Engel, Matthew D. Hoffman, and Adam Roberts. Latent constraints: Learning to
generate conditionally from unconditional generative models. In International Conference
on Learning Representations, 2018.

Johannes Fürnkranz and Eyke Hüllermeier. Preference learning. In Claude Sammut and
Geoffrey I. Webb, editors, Encyclopedia of Machine Learning and Data Mining, pages
1000–1005. Springer, 2017.

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil
Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in
Neural Information Processing Systems, 27:2672–2680, 2014.


Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C
Courville. Improved training of wasserstein gans. In Advances in Neural Information
Processing Systems, pages 5767–5777, 2017.
Anvita Gupta and James Zou. Feedback gan for dna optimizes protein functions. Nature
Machine Intelligence, 1(2):105, 2019.
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp
Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equi-
librium. Advances in neural information processing systems, 30, 2017.
Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint
arXiv:2207.12598, 2022.
Gary B Huang, Marwan Mattar, Tamara Berg, and Eric Learned-Miller. Labeled Faces in
the Wild: A Database forStudying Face Recognition in Unconstrained Environments. In
Workshop on Faces in ’Real-Life’ Images: Detection, Alignment, and Recognition, 2008.
Jian Huang, Yuling Jiao, Zhen Li, Shiao Liu, Yang Wang, and Yunfei Yang. An error
analysis of generative adversarial networks for learning distributions. Journal of Machine
Learning Research, 23(116):1–43, 2022.
Harold Jeffreys, Bertha Jeffreys, and Bertha Swirles. Methods of mathematical physics.
Cambridge university press, 1999.
Alexia Jolicoeur-Martineau. The relativistic discriminator: a key element missing from
standard GAN. In International Conference on Learning Representations, 2019.
Alexia Jolicoeur-Martineau. On relativistic f-divergences. In International Conference on
Machine Learning, pages 4931–4939. PMLR, 2020.
Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo
Aila. Training generative adversarial networks with limited data. In Advances in Neural
Information Processing Systems, 2020.
MinJi Kim, Farzad Farnoud, and Olgica Milenkovic. Hydra: gene prioritization via hybrid
distance-score rank aggregation. Bioinform., 31(7):1034–1043, 2015.
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In
International Conference on Learning Representations, 2015.
Anirban Laha, Saneem Ahmed Chemmengath, Priyanka Agrawal, Mitesh Khapra, Karthik
Sankaranarayanan, and Harish G Ramaswamy. On controllable sparse alternatives to
softmax. Advances in neural information processing systems, 31, 2018.
Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to docu-
ment recognition. Proceedings of the IEEE, 86(11):2278–2324, Nov 1998.
Xiao Li, Chenghua Lin, Chaozheng Wang, and Frank Guerin. Latent space factorisation
and manipulation via matrix subspace projection. International Conference on Machine
Learning, 2020.


Lucy Liang and Kristen Grauman. Beyond comparing image pairs: Setwise active learning
for relative attributes. In Proceedings of the IEEE conference on Computer Vision and
Pattern Recognition, pages 208–215, 2014.

Shili Lin. Rank aggregation methods. Wiley Interdisciplinary Reviews: Computational
Statistics, 2(5):555–570, 2010.

Tyler Lu and Craig Boutilier. Learning mallows models with pairwise preferences. In
International Conference on Machine Learning, pages 145–152, 2011.

Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smol-
ley. Least squares generative adversarial networks. In International Conference on Com-
puter Vision, pages 2794–2802, 2017.

Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint
arXiv:1411.1784, 2014.

Takeru Miyato and Masanori Koyama. cgans with projection discriminator. In International
Conference on Learning Representations, 2018.

Augustus Odena, Christopher Olah, and Jonathon Shlens. Conditional image synthesis
with auxiliary classifier gans. In International conference on machine learning, pages
2642–2651, 2017.

Utkarsh Ojha, Krishna Kumar Singh, Cho-Jui Hsieh, and Yong Jae Lee. Elastic-infogan:
Unsupervised disentangled representation learning in class-imbalanced data. In Advances
in Neural Information Processing Systems, 2020.

Yuangang Pan, Bo Han, and Ivor W Tsang. Stagewise learning for noisy k-ary preferences.
Machine Learning, 107(8):1333–1361, 2018.

Yuangang Pan, Ivor W Tsang, Weijie Chen, Gang Niu, and Masashi Sugiyama. Fast and
robust rank aggregation against model misspecification. J. Mach. Learn. Res., 23:23–1,
2022.

Devi Parikh and Kristen Grauman. Relative attributes. In International Conference on
Computer Vision, pages 503–510. IEEE, 2011.

Albert Pumarola, Stefan Popov, Francesc Moreno-Noguer, and Vittorio Ferrari. C-flow:
Conditional generative flow models for images and 3d point clouds. In IEEE Conference
on Computer Vision and Pattern Recognition, pages 7949–7958, 2020.

Karthik Raman and Thorsten Joachims. Methods for ordinal peer grading. In International
Conference on Knowledge Discovery and Data Mining, pages 1037–1046, 2014.

Bidisha Samanta, Abir De, Gourhari Jana, Vicenç Gómez, Pratim Kumar Chattaraj, Niloy
Ganguly, and Manuel Gomez-Rodriguez. Nevae: A deep generative model for molecular
graphs. The Journal of Machine Learning Research, 21(1):4556–4588, 2020.


Yassir Saquil, Kwang In Kim, and Peter M. Hall. Ranking cgans: Subjective control over
semantic image attributes. In British Machine Vision Conference, page 131. BMVA Press,
2018.

Sundararajan Sellamanickam, Priyanka Garg, and Sathiya Keerthi Selvaraj. A pairwise
ranking based approach to learning with positive and unlabeled examples. In Proceedings
of the 20th ACM international conference on Information and knowledge management,
pages 663–672, 2011.

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale
image recognition. In International Conference on Learning Representations, 2015.

Kihyuk Sohn, Honglak Lee, and Xinchen Yan. Learning structured output representation
using deep conditional generative models. Advances in Neural Information Processing
Systems, 28:3483–3491, 2015.

Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data
distribution. In Advances in Neural Information Processing Systems 32: Annual Con-
ference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14,
2019, Vancouver, BC, Canada, pages 11895–11907, 2019.

James Stewart, Daniel K Clegg, and Saleem Watson. Multivariable calculus. Cengage
Learning, 2020.

Larissa T Triess, Andre Bühler, David Peter, Fabian B Flohr, and Marius Zöllner. Point
cloud generation with continuous conditioning. In International Conference on Artificial
Intelligence and Statistics, pages 4462–4481. PMLR, 2022.

Ze Wang, Seunghyun Hwang, Zichen Miao, and Qiang Qiu. Image generation using contin-
uous filter atoms. Advances in Neural Information Processing Systems, 34:17826–17838,
2021.

Christian Wirth, Riad Akrour, Gerhard Neumann, Johannes Fürnkranz, et al. A survey of
preference-based reinforcement learning methods. Journal of Machine Learning Research,
18(136):1–46, 2017.

Fen Xia, Tie-Yan Liu, Jue Wang, Wensheng Zhang, and Hang Li. Listwise approach to
learning to rank: theory and algorithm. In Proceedings of the 25th international confer-
ence on Machine learning, pages 1192–1199, 2008.

Yinghua Yao, Yuangang Pan, Ivor W Tsang, and Xin Yao. Differential-critic gan: Generat-
ing what you want by a cue of preferences. IEEE Transactions on Neural Networks and
Learning Systems, 2022.

A. Yu and K. Grauman. Fine-grained visual comparisons with local learning. In IEEE
Conference on Computer Vision and Pattern Recognition, Jun 2014.

Zac Yu and Adriana Kovashka. Syntharch: Interactive image search with attribute-
conditioned synthesis. In Conference on Computer Vision and Pattern Recognition,
CVPR Workshops, pages 645–654, 2020.


Zhibing Zhao, Peter Piech, and Lirong Xia. Learning mixtures of plackett-luce models. In
International Conference on Machine Learning, pages 2906–2914. PMLR, 2016.

Ke Zhou, Gui-Rong Xue, Hongyuan Zha, and Yong Yu. Learning to rank with ties. In
Proceedings of the 31st annual international ACM SIGIR conference on Research and
development in information retrieval, pages 275–282, 2008.

Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image
translation using cycle-consistent adversarial networks. In International Conference on
Computer Vision, pages 2223–2232, 2017.
