Statistics/Data Mining Applications: March 2012

Sunday, March 18, 2012

Value of Learning

Goal:

1. To solve the problem of efficiently allocating the sponsored links and simultaneously estimating the CTR.

2. The task is to strike a balance between exploring, i.e., showing a link to get a better estimate of its CTR, and exploiting, i.e., showing links that have the best performance, according to our current estimates of the CTRs.

3. In the mechanism, an advertiser (buyer) has an incentive to increase his bid by some amount, which we call the value of learning.

Model and Notations:

1. We consider a setting where n bidders (or candidate sponsored links) are competing to be placed in one of the m slots.

2. Assume a separable model for CTR, i.e., if the link of advertiser i is displayed in position j, it will be clicked on with probability γ_jλ_i. Here γ_jis associated with each slot j=1,…,m (called the position bias of slot j), and λ_i is determined by each advertiser (called the clickability of this advertiser).

3. Assume that the slots are numbered in decreasing order of their position bias, i.e., γ₁≥γ₂≥…γ_m. The exact clickability of advertisers are not known, and the estimate of the clickability of advertiser i is denoted by λ_i^estimate.

4. b_i: the bid of the i-th buyer

5. p_i: the price of the i-th buyer pays

6. v_i: the value of the i-th buyer per click

Randomization Step:

1. In a repeated second-price auction without exploration, when there is uncertainty about the clickability of the advertiser, it is no longer the buyer’s interest to bid his value per click. Specifically, the buyer has the incentive to increase his bid in order to induce the mechanism to explore him.

2. Randomization: assume the clickability λ_iis distributed according to a prior distribution D_λ. Each time before the ranking begins, the i-th advertiser’s clickability λ_iis picked from the prior distribution. It gives the buyers who have low clickability (which may be a result of insufficient traffic) a chance to prove themselves.

Note: Different buyers may have different prior distribution D_λ

Value of Learning:

1. How to estimate the clickability of a buyer: λ_i^estimate=c_i/ec_i, where c_iis total clicks the i-th buyer has receives (for all positions), and ec_i (cumulative expected clicks) is the number of clicks we would expect this buyer to receive, if he had a clickability of 1.

2. Define the value of learning for this buyer as θ_i^estimateb_i, where

θ_i^estimate=C*sqrt(λ_i^estimate/ec_i) for a constant C.

3. The mechanism sorts the advertisers based on their (λ_i^estimate+θ_i^estimate)b_i, allocates the j-th slot to the i-th buyer in this order (which we call buyer i), and charges him (λ_i+1^estimate+θ_i+1^estimate)b_i+1/(λ_i^estimate+θ_i^estimate) for each click.

Properties:

1. The mechanism can lead to a higher revenue if the short term.

2. The mechanism can lead to higher efficiency.

3. The main assumption is that ordering of the buyers in decreasing order of (λ_i^estimate+θ_i^estimate)v_iis the same as their ordering in decreasing order of λ_i^estimatev_i.Sinceθ_i’sare typically small and higher slots get more clicks, this assumption is often true.

Reference:

[1] Li, S., Mahdian, M., & McAfee, P. (2010). Value of learning in sponsored search auctions. In Proceedings of the 6th international workshop on internet and network economics (pp. 294–305).

Wednesday, March 7, 2012

SQL Query: traffic referred from the same resource

CREATE temp TABLE VISITKEY_LOOKUP AS
SELECT VISIT_KEY
FROM TABLE_PAGE_VIEWS
WHERE SITE_KEY = 1
and day_key >= 20080501
AND REFERRER like '%google%'
AND VISITORIZED='true'
group by 1;

--PV
SELECT DAY_KEY || SUBSTRING(Event_DTM from 11 for 3) as DAY_HOUR, count(*) PAGEVIEW
FROM TABLE_PAGE_VIEWS PV
INNER JOIN VISITKEY_LOOKUP
ON PV.VISIT_KEY=VISITKEY_LOOKUP.VISIT_KEY
WHERE SITE_KEY = 1
and day_key >= 20080501
group by 1
order by 1;

Saturday, March 3, 2012

Reserve Prices in Internet Advertising Auctions

Suppose the reserve price is set for each content page (or for each seller, or for each category) at the time point (denoted by t) when a user lands on this page and the auction for showing recommended content pages begins.

1. At this time point t, compute the average number of buyers bidding on this content page (or this seller, or this category), the average bid, and the average standard deviation of the bids, where the average was taken over all biddings. That is, every time the content page is viewed and the auction is conducted, the three statistics were computed, and then the average over all biddings for this given content page was take. The bid of the highest bidder in every auction was excluded from the statistics, because the theory does not allow us to pin it down (every bid of the highest bidder above a certain value results in the same payoffs). It is assumed that bidders’ values are drawn from a lognormal distribution, and its mean and standard deviation can be estimated.

Example: before this auction begins, we have bidding historical data by buyers.

Number of bidders	Mean(bid) excluding the highest bid	Sd(bid) Excluding the highest bid
3	$0.15	0.02
5	$0.17	0.04
4	$0.14	0.03
…	…	…
6	$0.20	0.05
Average of the above historical bidding data
4.3	$0.16	0.04

2. These two statistics can be achieved in step 1. At time point t when the auction begins, we may observe the number of bidders as well as their bids. Denote the number of bidders for this auction as k. Simulate three moments (observed number of bidders (which is equal to k for this auction), average bid in positions 2 and below, and the standard deviation of the bids in positions 2 and below) for various true values of the mean and standard deviation of the lognormal distribution of values. To do that, 1000 draws of the vectors of bidder values (with k bidders) were drawn from the lognormal distribution estimated in step 1. Note that the number of bidders is irrelevant for setting the optimal reserve price, but it needs to be estimated in order to get an accurate estimate of the mean and the standard deviation of the distribution of values.

Example: suppose k=4, then simulate 1000 draws from the estimated lognormal distribution with mean $0.16 and standard deviation 0.04 made up from example in step 1.

Number of bidders	bids	Mean(bid) excluding the highest bid	Sd(bid) Excluding the highest bid
4	$0.25, $0.18, $0.16,$0.14	$0.16	0.02
4	$0.22, $0.16, $0.13,$0.1	$0.13	0.03
4	$0.18, $0.15, $0.12,$0.09	$0.12	0.03
…	…	…	…
Average of the above simulated bidding data
4		$0.14	0.04

3. As long as the lognormal normal distribution is estimated for the case of k bidders, the theoretically optimal reserve price r is computed by solving the equation

r-[1-F(r)]/f(r)=0, (1)

where F is the cumulative distribution function and f is the probability density function estimated at step 2 for the case of k bidders.

Example: as in the made-up example in step 2, for the case of k=4 bidders, the corresponding lognormal distribution has mean $0.14 and standard deviation 0.04. Then the reserve price can be calculated by solving equation (1) for the current auction.

4. The above calculated reserve price is based on uniform CTRs for each buyer, and it needs to be converted into buyer-specific reserve prices that reflected the quality scores of the buyers: buyers with higher CTR face lower per-click reserves, and vice versa.

In the current auction at time point t, suppose the k bidders’ CTRs are CTR_1, CTR_2,…, CTR_k, and denote their average as CTR_average. Then the i-th buyer’s reserve price is r_i=r*CTR_average/CTR_i.

Disadvantages: for some auctions, the reserve price may push out some potential qualified bidders so that the spots may not be fully filled and thus results in a revenue loss.

Reference:

[1] Ostrovsky, M., Schwarz, M., 2010. Reserve Prices in Internet Advertising Auctions: A Field Experiment.