Why You Are Doing PPC Wrong

  • How do you test your adverts?
  • How do you test your conversion rate?
  • Do you use an online tool?
  • Do you use a spreadsheet?
  • What is the algorithm used?

Testing click-through rates and conversion rates for an advert or keyword is a class of problem known as binomial proportion estimation, and if you (or the provider of your tool) haven't read beyond a basic statistics textbook then you are almost certainly doing it wrong.

Interval Estimation for a Binomial Proportion by Brown, Cai and DasGupta (free to view on Project Euclid) discusses the weaknesses of the Wald Interval, which most textbooks present as the solution to this problem, and then discusses several alternatives.

I won't go through the paper in huge detail here; I wouldn't do as good a job as the original authors. Instead, I'll extract the most important points:

Why the Wald Interval is Crap

  • "It is widely recognized that the actual coverage probability of the standard [Wald] interval is poor for p near 0 or 1." The problems we have in AdWords almost always have p near 0; CTRs and conversion rates are rarely above 10%.
  • "The actual coverage probability of the standard interval can differ significantly from the nominal confidence level for moderate and even large sample sizes." i.e. even for large sample sizes, the actual coverage probability of a nominal 95% confidence interval may well be less than 95%.
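To make the coverage problem concrete, here is a minimal Python sketch (my own illustration, not code from the paper) that builds the textbook Wald interval and computes its exact coverage probability: it adds up the binomial probability of every click count whose interval happens to contain the true p. The function names `wald_interval`, `binom_pmf` and `coverage` are hypothetical, and the example numbers (100 impressions, a true CTR of 1%) are made up for illustration.

```python
from math import exp, lgamma, log, log1p, sqrt
from statistics import NormalDist

def wald_interval(x, n, conf=0.95):
    """Textbook Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p), computed in log space (assumes 0 < p < 1)."""
    return exp(lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
               + x * log(p) + (n - x) * log1p(-p))

def coverage(interval, n, p, conf=0.95):
    """Exact probability that interval(X, n, conf) contains the true p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n, conf)
        if lo <= p <= hi:
            total += binom_pmf(x, n, p)
    return total

# 100 impressions, true CTR 1%. Zero clicks gives the degenerate interval [0, 0],
# which can never contain p, so coverage lands far below the nominal 95%.
print(round(coverage(wald_interval, 100, 0.01), 3))
```

In this example the zero-click outcome alone has a probability of roughly 0.37, so the "95%" interval actually covers the true CTR only about 63% of the time.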

After demonstrating the inadequacies of the Wald Interval, the authors go on to discuss three alternative candidates. NB: the published responses and comments on this paper all seem to agree on the crapness of the Wald Interval, but there is no agreement on which interval should replace it.

The Recommended Intervals

The Wilson Interval

  • "Coverage of the Wilson interval fluctuates acceptably near 1 − α, except for p very near 0 or 1." The Wilson Interval outperforms the Wald Interval, but its coverage probability still fluctuates widely near 0. I'd say this makes it unsuitable for use in the content network, where CTRs and conversion rates are often even lower.
  • “Always, the standard interval and the Wilson interval have almost identical average expected length.” This means you can expect to see test results just as quickly with the Wilson Interval as with the Wald Interval. Caveat: I am not sure how this changes for large n and for very small p.
  • The lower bound of this confidence interval is used to rank comments on Reddit.
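For reference, a minimal sketch of the Wilson score interval in Python, using the standard closed form (p̂ + z²/2n ± z·sqrt(p̂(1 − p̂)/n + z²/4n²)) / (1 + z²/n). The function name `wilson_interval` is my own, and treating x as clicks and n as impressions is an assumption about how you would feed it PPC data.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(x, n, conf=0.95):
    """Wilson score interval for x successes (e.g. clicks) in n trials (impressions)."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = x / n
    denom = 1 + z * z / n
    # The interval is centred on a shrunk estimate, not on p_hat itself.
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Zero clicks from 100 impressions: unlike Wald, the interval is not [0, 0].
lo, hi = wilson_interval(0, 100)
print(round(lo, 6), round(hi, 4))
```

Ranking ads (or comments) by the lower bound `lo` is the Reddit-style use: an item only ranks highly once it has enough data to be confidently good.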

The Agresti-Coull Interval

  • "The Agresti-Coull intervals are never shorter than the Wilson intervals." This means that tests using this method will take at least as long to reach a conclusion as tests using the Wilson Interval.
  • “The Agresti-Coull interval has good minimum coverage probability. The coverage probability of the interval is quite conservative for p very close to 0 or 1.” For small p a 95% confidence interval found using this method has a better than 95% chance of containing p. This could be considered a good thing or it could be considered overkill.
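A minimal Python sketch of the Agresti-Coull interval, as I understand its usual definition: add z²/2 successes and z²/2 failures, then apply the ordinary Wald formula to the adjusted counts. The name `agresti_coull_interval` is hypothetical, and clipping the result to [0, 1] is a common convention I have assumed rather than part of the formula.

```python
from math import sqrt
from statistics import NormalDist

def agresti_coull_interval(x, n, conf=0.95):
    """Agresti-Coull interval for x successes in n trials.

    At 95% confidence z^2/2 is about 1.92, so this is roughly
    "add two clicks and two non-clicks, then use the Wald formula".
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n_tilde = n + z * z
    p_tilde = (x + z * z / 2) / n_tilde
    half = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    # Clipping to [0, 1] is an assumed convention, not part of the definition.
    return max(0.0, p_tilde - half), min(1.0, p_tilde + half)

print(agresti_coull_interval(0, 100))
```

Because the centre is pulled away from 0, zero clicks no longer produce an empty interval, and the interval is a little wider than Wilson's, which is where the conservative coverage comes from.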

The Jeffreys Interval

  • No link for this one; the best definition I can find is actually contained in the paper.
  • "The coverage has an unfortunate fairly deep spike near p = 0." Oh dear. A modified version is presented later in the paper to fix this problem.
  • “The average coverage of the Jeffreys interval is really very close to the nominal level even for quite small n. This is quite impressive.”
  • On the difference in length between the Jeffreys Interval and the Wilson Interval: "the difference is not of practical relevance."
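The Jeffreys interval has no closed form, so here is a minimal sketch assuming the definition as I read it in the paper: the equal-tailed 1 − α interval of the Beta(x + 1/2, n − x + 1/2) posterior, with the lower limit set to 0 when x = 0 and the upper limit set to 1 when x = n. To stay within the Python standard library it evaluates the Beta CDF with the classic continued-fraction method (as in Numerical Recipes) and inverts it by bisection; in practice `scipy.stats.beta.ppf` would do the same job. All function names are hypothetical, and the paper's modified Jeffreys interval is not implemented.

```python
import math

def _betacf(a, b, x, max_iter=1000, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def beta_cdf(x, a, b):
    """Regularized incomplete beta I_x(a, b), i.e. the Beta(a, b) CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges quickly, else the symmetry relation.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def beta_ppf(q, a, b):
    """Beta(a, b) quantile by bisection on the CDF (slow but simple)."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def jeffreys_interval(x, n, conf=0.95):
    """Equal-tailed Jeffreys interval with the x = 0 and x = n end conventions."""
    alpha = 1 - conf
    a, b = x + 0.5, n - x + 0.5
    lower = 0.0 if x == 0 else beta_ppf(alpha / 2, a, b)
    upper = 1.0 if x == n else beta_ppf(1 - alpha / 2, a, b)
    return lower, upper

print(jeffreys_interval(0, 100))
```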

My Recommendation

Seeing as we (PPC marketers) generally deal with large sample sizes and small values of p, I recommend using the Agresti-Coull Interval.
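A quick way to sanity-check this recommendation is to compute exact coverage in a PPC-like regime. The sketch below assumes 1,000 clicks and a true conversion rate of 0.2% (made-up illustrative numbers) and compares the Wald and Agresti-Coull intervals at 95% confidence; the function names are hypothetical.

```python
from math import exp, lgamma, log, log1p, sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def wald(x, n, z=Z95):
    p = x / n
    h = z * sqrt(p * (1 - p) / n)
    return p - h, p + h

def agresti_coull(x, n, z=Z95):
    n_t = n + z * z
    p_t = (x + z * z / 2) / n_t
    h = z * sqrt(p_t * (1 - p_t) / n_t)
    return p_t - h, p_t + h

def binom_pmf(x, n, p):
    """Binomial probability in log space so large n does not overflow (0 < p < 1)."""
    return exp(lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
               + x * log(p) + (n - x) * log1p(-p))

def coverage(interval, n, p):
    """Exact probability that the interval built from X ~ Binomial(n, p) contains p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n)
        if lo <= p <= hi:
            total += binom_pmf(x, n, p)
    return total

# Roughly 0.86 for Wald versus 0.98 for Agresti-Coull at these settings.
for name, f in [("Wald", wald), ("Agresti-Coull", agresti_coull)]:
    print(name, round(coverage(f, 1000, 0.002), 3))
```

Here Wald falls short of 95% mainly because zero conversions (probability about 0.135) produce an empty interval, while Agresti-Coull errs on the conservative side, matching the paper's description.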

I haven't changed the algorithm used in my AdWords ad tester, because I am not using the Wald Interval. I'm not using the Agresti-Coull Interval either; check out the ad tester maths page to find out why.