Sunday, March 18, 2012

Value of Learning

    Goal:
1. To solve the problem of efficiently allocating the sponsored links and simultaneously estimating the CTR. 
2. The task is to strike a balance between exploring, i.e., showing a link to get a better estimate of its CTR, and exploiting, i.e., showing links that have the best performance, according to our current estimates of the CTRs. 
3. In the mechanism, an advertiser (buyer) has an incentive to increase his bid by some amount, which we call the value of learning.

Model and Notations:
1. We consider a setting where n bidders (or candidate sponsored links) are competing to be placed in one of the m slots.
2. Assume a separable model for CTR, i.e., if the link of advertiser i is displayed in position j, it will be clicked on with probability γjλi. Here γj is associated with each slot j=1,…,m (called the position bias of slot j), and λi is determined by each advertiser (called the clickability of this advertiser).
3. Assume that the slots are numbered in decreasing order of their position bias, i.e., γ1≥γ2≥…γm. The exact clickability of advertisers are not known, and the estimate of the clickability of advertiser i is denoted by λiestimate.
4. bi: the bid of the i-th buyer
5. pi: the price of the i-th buyer pays
6. vi: the value of the i-th buyer per click

    Randomization Step:
1. In a repeated second-price auction without exploration, when there is uncertainty about the clickability of the advertiser, it is no longer the buyer’s interest to bid his value per click. Specifically, the buyer has the incentive to increase his bid in order to induce the mechanism to explore him.
2. Randomization: assume the clickability λi is distributed according to a prior distribution Dλ. Each time before the ranking begins, the i-th advertiser’s clickability λi is picked from the prior distribution. It gives the buyers who have low clickability (which may be a result of insufficient traffic) a chance to prove themselves.
Note: Different buyers may have different prior distribution Dλ

  Value of Learning:
1. How to estimate the clickability of a buyer: λiestimate=ci/eci, where ci is total clicks the i-th buyer has receives (for all positions), and eci (cumulative expected clicks) is the number of clicks we would expect this buyer to receive, if he had a clickability of 1.
    2. Define the value of learning for this buyer as θiestimatebi, where
         θiestimate=C*sqrt(λiestimate/eci) for a constant C.
3. The mechanism sorts the advertisers based on their (λiestimateiestimate)bi, allocates the j-th slot to the i-th buyer in this order (which we call buyer i), and charges him (λi+1estimatei+1estimate)bi+1/(λiestimateiestimate) for each click.

   Properties:
1. The mechanism can lead to a higher revenue if the short term.
2. The mechanism can lead to higher efficiency.
3. The main assumption is that ordering of the buyers in decreasing order of (λiestimateiestimate)vi is the same as their ordering in decreasing order of λiestimatevi. Since θi’s are typically small and higher slots get more clicks, this assumption is often true.

 Reference:
[1] Li, S., Mahdian, M., & McAfee, P. (2010). Value of learning in sponsored search auctions. In Proceedings of the 6th international workshop on internet and network economics (pp. 294–305).



No comments:

Post a Comment