Ad Testing Maths

The Distribution of CTR

Suppose we have an advert A with a click-through rate (CTR) p. Then the probability of getting n clicks from m impressions is:

$$P(n \mid m, p) = \binom{m}{n} p^{n} (1-p)^{m-n}$$
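As a quick illustration, the binomial probability above can be evaluated directly. This is a minimal sketch; the function name `click_probability` and the example numbers are made up for illustration.

```python
from math import comb


def click_probability(n: int, m: int, p: float) -> float:
    """Probability of exactly n clicks from m impressions when the CTR is p."""
    return comb(m, n) * p**n * (1 - p) ** (m - n)


# Illustrative numbers only: 3 clicks from 100 impressions at a 2% CTR.
print(click_probability(3, 100, 0.02))
```

Summing the function over n = 0..m for a fixed m and p should give 1, which is a handy sanity check.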

So using quite simple maths we have the probability of n clicks given m and p. Bayes’ Theorem allows us to turn this around and get a probability distribution for p based on n and m. This is a lot more useful because we know n and m but we don’t know p.

Using Bayes’ Theorem we have:

$$P(P=p \mid m,n) = \frac{\binom{m}{n} p^{n}(1-p)^{m-n} f(p)}{\int_{0}^{1} \binom{m}{n} t^{n}(1-t)^{m-n} f(t)\,dt}$$

Here P is the probability distribution of the CTR p and the function f is the prior distribution, which represents our prior knowledge about p. The integral in the denominator normalises the distribution P.

If we assume no prior knowledge of the CTR (other than that it is between 0 and 1) then f is a uniform distribution, meaning that f(t) does not depend on t. This simplifies the expression to:

$$P(P=p \mid m,n) = \frac{p^{n}(1-p)^{m-n}}{\int_{0}^{1} t^{n}(1-t)^{m-n}\,dt}$$

This is a beta distribution with parameters n+1 and m-n+1.
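The posterior density can be sketched in a few lines. The example below assumes the uniform prior used above and uses the standard result that the normalising integral equals n!(m-n)!/(m+1)!. The names `posterior_density` and `integrate_midpoint` are hypothetical helpers, and the midpoint rule is only there as a rough check that the density integrates to 1.

```python
from math import comb


def posterior_density(p: float, n: int, m: int) -> float:
    """Density of the CTR at p after n clicks from m impressions (uniform prior)."""
    # (m+1)! / (n! (m-n)!) is the same number as (m+1) * C(m, n).
    norm = (m + 1) * comb(m, n)
    return norm * p**n * (1 - p) ** (m - n)


def integrate_midpoint(n: int, m: int, steps: int = 10_000) -> float:
    """Rough numerical integral of the density over [0, 1] (midpoint rule)."""
    h = 1.0 / steps
    return sum(posterior_density((k + 0.5) * h, n, m) for k in range(steps)) * h


# Illustrative numbers only: 3 clicks from 100 impressions.
print(integrate_midpoint(3, 100))
```

With no data at all (n = 0, m = 0) the density is 1 everywhere, which is just the uniform prior again.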

I have abused notation slightly to make it easier to follow. In actual fact P is a continuous distribution so probability density functions should be used. I have also talked only about clicks and impressions rather than successes and trials. This makes things a bit easier to grasp to begin with. It is easy enough to change to clicks and conversions or some other success/trial measure.

The Probability that One Advert is worse than Another

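So to sum up so far, the probability density function for the CTR p of an ad with n clicks from m impressions is:

$$P(P=p \mid m,n) = \frac{p^{n}(1-p)^{m-n}}{\int_{0}^{1} t^{n}(1-t)^{m-n}\,dt} = \frac{(m+1)!}{n!\,(m-n)!}\, p^{n}(1-p)^{m-n}$$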

Now we can calculate the probability that one ad has a higher click-through rate than another.

Firstly we want to know the probability that the CTR for an ad is less than some number x. This is given by the following integral:

$$P(p<x \mid m,n) = \int_{0}^{x} P(P=p \mid m,n)\,dp = \frac{\int_{0}^{x} p^{n}(1-p)^{m-n}\,dp}{\int_{0}^{1} t^{n}(1-t)^{m-n}\,dt}$$

For general m and n we’d have to use the incomplete beta function to calculate the integral, but for this problem m and n are always whole numbers, so we can use repeated integration by parts, which gives:

$$P(p<x \mid m,n) = \sum_{i=n+1}^{m+1} \binom{m+1}{i} x^{i}(1-x)^{m+1-i}$$
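The finite sum above translates directly into code. This is a minimal sketch using only the standard library; the function name `ctr_below` is made up for illustration.

```python
from math import comb


def ctr_below(x: float, n: int, m: int) -> float:
    """Probability that the CTR is below x after n clicks from m impressions."""
    return sum(
        comb(m + 1, i) * x**i * (1 - x) ** (m + 1 - i)
        for i in range(n + 1, m + 2)  # i runs from n+1 to m+1 inclusive
    )


# Illustrative numbers only: chance that the CTR is under 5%
# after 3 clicks from 100 impressions.
print(ctr_below(0.05, 3, 100))
```

Two easy checks: with no data (n = 0, m = 0) the function returns x itself, as expected for a uniform distribution, and it returns 1 at x = 1.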

Now let us introduce some notation. Let A be an advert with CTR distribution P_A arising from n_A clicks from m_A impressions. Define a CTR distribution P_B for another ad B similarly, with n_B clicks from m_B impressions.

If we again abuse notation slightly to make it easier to deal with the continuous probability density functions it is easy to see that:

$$P(p_{A}<p_{B}) = \int_{0}^{1} P(p_{B}=x)\,P(p_{A}<x)\,dx$$
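One way to sanity-check this integral is by simulation rather than algebra. The sketch below swaps in a Monte Carlo estimate: it draws a CTR for each ad from its beta posterior (parameters n+1 and m-n+1, assuming the uniform prior used earlier) and counts how often A's draw falls below B's. The function name, the number of draws and the seed are arbitrary choices for illustration.

```python
import random


def simulate_a_worse_than_b(
    n_a: int, m_a: int, n_b: int, m_b: int, draws: int = 100_000, seed: int = 0
) -> float:
    """Monte Carlo estimate of the probability that ad A's CTR is below ad B's."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    hits = 0
    for _ in range(draws):
        p_a = rng.betavariate(n_a + 1, m_a - n_a + 1)
        p_b = rng.betavariate(n_b + 1, m_b - n_b + 1)
        if p_a < p_b:
            hits += 1
    return hits / draws


# With no data on either ad the estimate should come out close to one half.
print(simulate_a_worse_than_b(0, 0, 0, 0))
```

The estimate is only approximate, so it is best used as a cross-check on an exact calculation.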

Using what we have already calculated this gives that:

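$$P(p_{A}<p_{B}) = \sum_{i=n_{A}+1}^{m_{A}+1} \binom{m_{A}+1}{i} \frac{\int_{0}^{1} x^{n_{B}+i}(1-x)^{m_{A}+m_{B}+1-n_{B}-i}\,dx}{\int_{0}^{1} t^{n_{B}}(1-t)^{m_{B}-n_{B}}\,dt}$$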

Using integration by parts again on both integrals and then moving everything that does not depend on i out of the sum gives that:

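$$P(p_{A}<p_{B}) = \frac{(m_{A}+1)!\,(m_{B}+1)!}{n_{B}!\,(m_{B}-n_{B})!\,(m_{A}+m_{B}+2)!} \sum_{i=n_{A}+1}^{m_{A}+1} \frac{(n_{B}+i)!\,(m_{A}+m_{B}+1-n_{B}-i)!}{i!\,(m_{A}+1-i)!}$$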

This formula is what the Ad Tester uses to determine the chance that one advert is worse than another.
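For readers who want to experiment, the calculation can be sketched with exact rational arithmetic. This is not the Ad Tester's own code, just one possible way to do it: each integral of the form x^a (1-x)^b over [0, 1] is replaced by a! b! / (a+b+1)!, and the terms that do not depend on i are pulled out of the sum. The function name `prob_a_worse_than_b` and the example numbers are made up.

```python
from fractions import Fraction
from math import factorial


def prob_a_worse_than_b(n_a: int, m_a: int, n_b: int, m_b: int) -> Fraction:
    """Exact probability that ad A's CTR is below ad B's (uniform priors)."""
    # Everything that does not depend on the summation index i.
    prefactor = Fraction(
        factorial(m_a + 1) * factorial(m_b + 1),
        factorial(n_b) * factorial(m_b - n_b) * factorial(m_a + m_b + 2),
    )
    total = Fraction(0)
    for i in range(n_a + 1, m_a + 2):  # i runs from n_A+1 to m_A+1 inclusive
        total += Fraction(
            factorial(n_b + i) * factorial(m_a + m_b + 1 - n_b - i),
            factorial(i) * factorial(m_a + 1 - i),
        )
    return prefactor * total


# Illustrative numbers only: A has 10 clicks from 1000 impressions,
# B has 20 clicks from 1000 impressions.
print(float(prob_a_worse_than_b(10, 1000, 20, 1000)))
```

Using `Fraction` avoids overflow and rounding problems with the large factorials, at the cost of speed. Since the two CTRs are equal with probability zero, swapping the ads should give the complementary probability, which makes a convenient check.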