E[π|s] = (2/(n(n−1))) Σ_{(u,v): π(u)<π(v)} g_q(s_v − s_u),   (4.28)

where π(u) denotes the rank position assigned to document u by π, and g_q is any sign-preserving function, such as g_q(x) = α_q x, where α_q is a query-dependent positive constant.
It is clear that E[π|s] represents the lack of compatibility between the relative document orderings given by π and those given by s, with more positive energy indicating less compatibility.
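As an illustration, the energy of a permutation can be computed directly from the pairwise score differences. The following is a minimal Python sketch (not the BoltzRank authors' implementation), assuming the simple sign-preserving choice g_q(x) = x, i.e., α_q = 1; π is represented as a list where pi[u] is the rank position of document u, with 0 being the top:

```python
from itertools import combinations

def boltzrank_energy(pi, s, g=lambda x: x):
    # pi[u]: rank position of document u (0 = top); s[u]: its score.
    # Each pair contributes g(score of lower-ranked doc minus score of
    # higher-ranked doc), so pairs where pi disagrees with s add
    # positive energy and agreeing pairs add negative energy.
    n = len(s)
    total = 0.0
    for u, v in combinations(range(n), 2):
        if pi[u] < pi[v]:            # pi ranks u above v
            total += g(s[v] - s[u])
        else:                        # pi ranks v above u
            total += g(s[u] - s[v])
    return 2.0 * total / (n * (n - 1))
```

For example, with scores s = [3, 2, 1], the identity permutation has energy −4/3 (compatible), while the fully reversed permutation has energy +4/3 (incompatible).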
Using the energy function, we can now define the conditional Boltzmann distribution over document permutations by exponentiating and normalizing, as follows:

P(π|s) = (1/Z) exp(−E[π|s]),   Z = Σ_π exp(−E[π|s]).   (4.29)
Note that it is costly to compute P(π|s) and Z exactly, since both of them contain sums over exponentially many permutations. In practice, however, efficient approximations of these quantities allow inference and learning in the model.
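To see why this matters, the distribution in (4.29) can be written out exactly for a toy-sized document set by enumerating all n! permutations. A hypothetical Python sketch (assuming the illustrative choice g_q(x) = x), which makes the exponential cost of Z explicit:

```python
import math
from itertools import combinations, permutations

def energy(pi, s):
    # Pairwise energy with g(x) = x: negative when pi agrees with s.
    n = len(s)
    e = sum((s[v] - s[u]) if pi[u] < pi[v] else (s[u] - s[v])
            for u, v in combinations(range(n), 2))
    return 2.0 * e / (n * (n - 1))

def boltzmann_distribution(s):
    # P(pi|s) = exp(-E[pi|s]) / Z, with Z summed over all n! permutations.
    # Exact enumeration is only feasible for tiny n, which is exactly why
    # approximations are needed in practice.
    perms = list(permutations(range(len(s))))
    weights = [math.exp(-energy(pi, s)) for pi in perms]
    Z = sum(weights)
    return {pi: w / Z for pi, w in zip(perms, weights)}
```

As expected, the probabilities sum to one, and the most probable permutation is the one that sorts documents by decreasing score.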
After defining the permutation probability as above, one can follow the idea in ListMLE [31] to define the loss function as the negative log likelihood of the ground-truth permutation, or follow the idea in ListNet [4] to define the loss function as the KL divergence between the permutation probability distribution given by the scoring function f and that given by the ground-truth labels.
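As a concrete (hypothetical) sketch of the ListMLE-style option: the negative log likelihood of a ground-truth permutation π* under the Boltzmann model is −log P(π*|s) = E[π*|s] + log Z. Assuming g_q(x) = x and brute-force computation of Z (tiny n only):

```python
import math
from itertools import combinations, permutations

def energy(pi, s):
    # Pairwise energy with g(x) = x, as in the illustrative examples above.
    n = len(s)
    e = sum((s[v] - s[u]) if pi[u] < pi[v] else (s[u] - s[v])
            for u, v in combinations(range(n), 2))
    return 2.0 * e / (n * (n - 1))

def neg_log_likelihood(pi_star, s):
    # -log P(pi*|s) = E[pi*|s] + log Z; lower when the scores s agree
    # with the ground-truth permutation pi_star.
    log_Z = math.log(sum(math.exp(-energy(pi, s))
                         for pi in permutations(range(len(s)))))
    return energy(pi_star, s) + log_Z
```

Scores that agree with π* yield a lower loss than scores that reverse it, which is what gradient-based training would exploit.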
In addition, it is also possible to extend the above idea to optimize evaluation measures [30]. For example, suppose the evaluation measure is NDCG; then its expected value with respect to all the possible permutations can be computed as follows:
E[NDCG] = Σ_π P(π|s) · NDCG(π, y).   (4.30)
Since only P(π|s) in the above formula is a function of the ranking model f, and it is continuous and differentiable, one can use simple gradient-based methods to effectively optimize (4.30). In order to avoid overfitting, one can also use a linear combination of the expected NDCG and the KL-divergence loss to guide the optimization process. Experimental results have shown that in this way, the BoltzRank method can outperform many other ranking methods, such as AdaRank [32] and ListNet [4].
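For a toy-sized document set, the expectation in (4.30) can be evaluated exactly by enumeration. A sketch, assuming the sign-preserving choice g_q(x) = x and the standard NDCG gain 2^y − 1 with discount 1/log2(rank + 2); this is an illustration, not the authors' code:

```python
import math
from itertools import combinations, permutations

def energy(pi, s):
    # Pairwise energy with g(x) = x.
    n = len(s)
    e = sum((s[v] - s[u]) if pi[u] < pi[v] else (s[u] - s[v])
            for u, v in combinations(range(n), 2))
    return 2.0 * e / (n * (n - 1))

def ndcg(pi, y):
    # pi[u]: rank position of document u; y[u]: its relevance grade.
    dcg = sum((2 ** y[u] - 1) / math.log2(pi[u] + 2) for u in range(len(y)))
    ideal = sorted(y, reverse=True)
    idcg = sum((2 ** g - 1) / math.log2(r + 2) for r, g in enumerate(ideal))
    return dcg / idcg

def expected_ndcg(s, y):
    # E[NDCG] = sum over pi of P(pi|s) * NDCG(pi, y), by exact
    # enumeration over all n! permutations (tiny n only).
    perms = list(permutations(range(len(s))))
    w = [math.exp(-energy(pi, s)) for pi in perms]
    Z = sum(w)
    return sum(wi / Z * ndcg(pi, y) for wi, pi in zip(w, perms))
```

Since each weight exp(−E[π|s]) is a smooth function of the scores, this expectation is differentiable with respect to the ranking model, which is what makes the gradient-based optimization described above possible.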
4.4 Summary
As shown in this chapter, different kinds of listwise ranking algorithms have been proposed. Intuitively speaking, they model the ranking problem in a more natural way than the pointwise and pairwise approaches, and thus can address some problems that these two approaches have encountered. As we have discussed in the previous chapters, for the pointwise and pairwise approaches, the positional information is invisible to their loss functions, and they ignore the fact that some documents (or document pairs) are associated with the same query. Comparatively speaking,