Ad Tester Weaknesses

Mathematical Weaknesses

Both Tools

  • Tests are run pairwise on all adverts in an ad group. If an ad group contains 2 ads then 1 test is run, 3 ads and 3 tests are run, 4 ads and 6 tests are run, etc. (n ads means n(n-1)/2 tests). Every individual test has a chance of 1 - test strength of being wrong. Taken over a whole account, it is almost certain that the tools are returning some bad results. Wikipedia has more information on this sort of problem: http://en.wikipedia.org/wiki/Multiple_testing and http://en.wikipedia.org/wiki/False_discovery_rate
  • I am not using a hypothesis test in the normal way. I have not defined a null hypothesis because I am asking a slightly different question. To form a null hypothesis I would be saying something like “Suppose these two ads actually perform the same. What is the probability of getting the results I have observed through random chance?” Instead I ask “What is the chance that ad A is better/worse than ad B?”
    This makes far more sense to me but it is not the conventional way of doing things.
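
To get a feel for the multiple testing problem, here is a minimal sketch in Python. It counts the pairwise tests for an ad group and estimates the chance that at least one test in an account is wrong. It assumes the tests are independent, which is not strictly true when tests share an advert, so treat the result as a rough guide only. The function names and the example account (200 ad groups of 3 ads, test strength 0.95) are made up for illustration.

```python
from math import comb


def pairwise_test_count(num_ads: int) -> int:
    """Number of pairwise tests in one ad group: num_ads choose 2."""
    return comb(num_ads, 2)


def chance_of_any_wrong(num_tests: int, test_strength: float) -> float:
    """Chance that at least one of num_tests tests is wrong.

    Assumes every test is independent and is wrong with
    probability 1 - test_strength.
    """
    return 1.0 - test_strength ** num_tests


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(n, "ads:", pairwise_test_count(n), "tests")

    # Hypothetical account: 200 ad groups with 3 ads each.
    total_tests = 200 * pairwise_test_count(3)
    print(total_tests, "tests in the account")
    print("chance of at least one bad result:",
          chance_of_any_wrong(total_tests, 0.95))
```

Even with a high test strength, the chance of at least one bad result climbs quickly as the number of tests grows.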

Weighted Ad Tester

  • My demonstration of how the test works relies on the number of trials and number of successes being integers. With the weighted ad tester, the numbers on which the test is run are almost certainly not integers, so the proof as it is written is invalid.
    I think I could modify the proof to work using gamma functions instead, in which case all will be well. The gamma function is not close to the factorial function for x<1, so if you are testing stuff where this sort of value may get passed (e.g. an old advert that had few impressions a long time ago and nothing recently) you might not get the results you expect.
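
As a sketch of what the gamma function idea might look like (this is not the modified proof, and not necessarily how the tools compute anything), a factorial n! can be replaced by gamma(n + 1), which agrees with the factorial at whole numbers and is also defined in between. The function name below is made up for illustration.

```python
from math import comb, exp, lgamma


def generalized_binomial(n: float, k: float) -> float:
    """'n choose k' with each factorial x! replaced by gamma(x + 1).

    Works for non-integer n and k as long as 0 <= k <= n.
    lgamma (log of gamma) is used to avoid overflow for large counts.
    """
    if k < 0 or k > n:
        raise ValueError("need 0 <= k <= n")
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1))


if __name__ == "__main__":
    # Matches the ordinary binomial coefficient for integers
    # (up to floating point rounding).
    print(generalized_binomial(10, 3), comb(10, 3))

    # Also defined for the weighted, non-integer case.
    print(generalized_binomial(10.5, 3.2))

    # Very small weighted counts, the case flagged above.
    print(generalized_binomial(0.4, 0.1))
```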

Program Weaknesses

  • Error checking is very poor. It won’t take much effort for you to crash everything if that is what you want to do.
  • As long as it can be parsed, the .tsv file is not checked for correctness. In some respects this is just an instance of garbage in, garbage out, but you won’t get any sort of warning that garbage is going in.
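
If you want some warning before garbage goes in, a small check along these lines could be run on the .tsv file first. This is a sketch only: the column names (ad_group, ad, impressions, clicks) are assumptions for illustration and may not match the file the tools actually expect, and the checks are the obvious sanity checks rather than anything the tools do themselves.

```python
import csv
import io
import math

# Hypothetical column names; adjust to match your own file.
REQUIRED_COLUMNS = ("ad_group", "ad", "impressions", "clicks")


def check_tsv(text: str) -> list:
    """Return a list of warning strings for suspicious rows in TSV text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    warnings = []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        try:
            impressions = float(row["impressions"])
            clicks = float(row["clicks"])
        except (TypeError, ValueError):
            warnings.append(f"line {line_no}: impressions or clicks is not a number")
            continue
        if not (math.isfinite(impressions) and math.isfinite(clicks)):
            warnings.append(f"line {line_no}: impressions or clicks is not finite")
        elif impressions < 0 or clicks < 0:
            warnings.append(f"line {line_no}: negative impressions or clicks")
        elif clicks > impressions:
            warnings.append(f"line {line_no}: more clicks than impressions")
    return warnings


if __name__ == "__main__":
    sample = (
        "ad_group\tad\timpressions\tclicks\n"
        "G1\tA\t100\t5\n"
        "G1\tB\t10\t50\n"
    )
    for warning in check_tsv(sample):
        print(warning)
```

Floats are accepted rather than integers so that weighted counts pass the check.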