Goal:
1. To
solve the problem of efficiently allocating the sponsored links and
simultaneously estimating the CTR.
2. The
task is to strike a balance between exploring, i.e., showing a link
to get a better estimate of its CTR, and exploiting, i.e., showing
links that have the best performance, according to our current estimates
of the CTRs.
3. In the
mechanism, an advertiser (buyer) has an incentive to increase his bid by
some amount, which we call the value of learning.
Model and Notations:
1. We
consider a setting where n bidders (or candidate sponsored links) are
competing to be placed in one of the m slots.
2. Assume
a separable model for CTR, i.e., if the link of advertiser i is
displayed in position j, it will be clicked on with probability γjλi.
Here γj is associated with each slot j=1,…,m (called
the position bias of slot j), and λi is determined by each
advertiser (called the clickability of this advertiser).
3. Assume
that the slots are numbered in decreasing order of their position bias, i.e., γ1≥γ2≥…γm.
The exact clickability of advertisers are not known, and the estimate of the
clickability of advertiser i is denoted by λiestimate.
4. bi:
the bid of the i-th buyer
5. pi:
the price of the i-th buyer pays
6. vi:
the value of the i-th buyer per click
Randomization Step:
1. In
a repeated second-price auction without exploration, when there is uncertainty
about the clickability of the advertiser, it is no longer the buyer’s interest
to bid his value per click. Specifically, the buyer has the incentive to
increase his bid in order to induce the mechanism to explore him.
2. Randomization:
assume the clickability λi is distributed according to a prior
distribution Dλ. Each time before the ranking begins, the i-th
advertiser’s clickability λi is picked from the prior distribution.
It gives the buyers who have low clickability (which may be a result of
insufficient traffic) a chance to prove themselves.
Note:
Different buyers may have different prior distribution Dλ
Value of Learning:
1. How
to estimate the clickability of a buyer: λiestimate=ci/eci,
where ci is total clicks the i-th buyer has receives (for all
positions), and eci (cumulative expected clicks) is the number of
clicks we would expect this buyer to receive, if he had a clickability of 1.
2. Define
the value of learning for this buyer as θiestimatebi, where
θiestimate=C*sqrt(λiestimate/eci)
for a constant C.
3. The
mechanism sorts the advertisers based on their (λiestimate+θiestimate)bi,
allocates the j-th slot to the i-th buyer in this order (which we
call buyer i), and charges him (λi+1estimate+θi+1estimate)bi+1/(λiestimate+θiestimate)
for each click.
Properties:
1. The
mechanism can lead to a higher revenue if the short term.
2. The
mechanism can lead to higher efficiency.
3. The
main assumption is that ordering of the buyers in decreasing order of (λiestimate+θiestimate)vi
is the same as their ordering in decreasing order of λiestimatevi.
Since θi’s are typically small and
higher slots get more clicks, this assumption is often true.
Reference:
[1] Li, S., Mahdian, M., & McAfee, P. (2010). Value of learning in sponsored search auctions. In Proceedings of the 6th international workshop on internet and network economics (pp. 294–305).