AdWords Competitor Analysis Script

As far as I’m aware, there are no free competitive intelligence tools for AdWords. This script scrapes a search results page and returns a list of all the adverts on it. It isn’t very useful in its current state, but it could be pretty handy if you hooked it up to a database.

I use the TagSoup module for parsing because Google’s search results pages are not well formed. I think they save bandwidth by leaving out a lot of tags.

I am very grateful to Neil Mitchell for the example code in one of his TagSoup blog posts.

> import Text.HTML.TagSoup

I found Text.HTML.Download to be unreliable, so I use Network.Curl instead, even though it is a bit more complicated.

> import Network.Curl (curlGetString)
> import Data.List
> import Data.Maybe

System.Environment is needed to get the arguments provided when calling a program from the command line. System.Posix is used for the sleep function so that you can get results every hour (or similar).

System.Posix comes from the unix package, so this probably won’t compile on Windows.

> import System.Environment
> import System.Posix

It is sometimes possible to get information about the landing page URL without clicking the ad. It is in the q parameter of the URL presented on the search results page, so import Network.URL to help get at it.

> import Network.URL (importParams)

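The time at which the search is done is quite important for working out day parting strategies. Data.List.Utils (from the MissingH package) provides replace, which is used later to build the search URL.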

> import System.Time
> import Data.List.Utils (replace)

Define the Advert data type. adtext2 is a Maybe String because the description lines are not split for the top-ranking ads.

> data Advert = Advert { link    :: String
>                      , title   :: String
>                      , dispurl :: String
>                      , adtext1 :: String
>                      , adtext2 :: Maybe String
>                      } deriving Show

Define a search result as the keyword, date/time of search, advert and ad rank. There might be a more natural way of defining this; it will depend a bit on what data you are trying to get.

> data SearchResult = SearchResult { keyword :: String
>                                  , date    :: CalendarTime
>                                  , advert  :: Advert
>                                  , rank    :: Int
>                                  } deriving Show

Google uses bold text to highlight the words that match the search query. This just confuses things for us, so remove it.

> removeBold ((TagOpen "b" []):xs)= removeBold xs
> removeBold ((TagClose "b"):xs) = removeBold xs
> removeBold (x:xs) = x : removeBold xs
> removeBold [] = []

There are also a lot of &nbsp; entities (non-breaking spaces, which show up as the character '\160' in the parsed text) used for formatting. They are not important for us.

> removeNbs = filter (/='\160')
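
As a quick illustration of what removeNbs does to a piece of ad text, here is a minimal sketch using only the Prelude. The sample string and the nbspToSpace variant are made up for the example; they are not part of the script.

```haskell
-- Same definition as in the script, with an explicit type signature.
removeNbs :: String -> String
removeNbs = filter (/= '\160')

-- Hypothetical alternative: turn each non-breaking space into a normal
-- space instead of deleting it, so neighbouring words stay separated.
nbspToSpace :: String -> String
nbspToSpace = map (\c -> if c == '\160' then ' ' else c)

-- Made-up sample: ad text padded with two non-breaking spaces.
sample :: String
sample = "\160Cheap\160Flights to Rome"
```

Here removeNbs sample gives "CheapFlights to Rome", while nbspToSpace sample gives " Cheap Flights to Rome". Deleting is fine when the non-breaking spaces are only padding, but the second form may be safer if they ever sit between words.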

Once the bold tags have been removed we get things like [TagText "Blah Blah", TagText "This bit was bold"]. We want these to be part of the same TagText.

> concatTags ((TagText x):(TagText y):xs)= concatTags ((TagText $ x++y): xs)
> concatTags (x:xs) = x : concatTags xs
> concatTags []=[]

A simple function composing what we have done so far:

> parsed = concatTags . removeBold . parseTags
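
To see why the bold tags have to go before the text nodes are merged, here is a small sketch of the same two steps. It assumes a made-up stand-in type Tok instead of TagSoup's Tag so that it needs no extra packages; dropBold, mergeText and fragment are illustrative names only.

```haskell
-- Hypothetical stand-in for TagSoup's Tag type (not the real thing).
data Tok = Open String | Close String | Text String
  deriving (Eq, Show)

-- Drop attribute-free bold tags, as removeBold does.
dropBold :: [Tok] -> [Tok]
dropBold = filter (`notElem` [Open "b", Close "b"])

-- Merge runs of adjacent text nodes, as concatTags does.
mergeText :: [Tok] -> [Tok]
mergeText (Text x : Text y : rest) = mergeText (Text (x ++ y) : rest)
mergeText (t : rest)               = t : mergeText rest
mergeText []                       = []

-- Made-up fragment for: <a>Cheap <b>Flights</b> to Rome</a>
fragment :: [Tok]
fragment =
  [ Open "a", Text "Cheap ", Open "b", Text "Flights", Close "b"
  , Text " to Rome", Close "a" ]
```

mergeText (dropBold fragment) gives [Open "a", Text "Cheap Flights to Rome", Close "a"]. Running mergeText on its own changes nothing here, because the bold tags still sit between the text nodes.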

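Extract the URL parameter that corresponds to the destination URL. The code takes the last parameter in the list and falls back to an error message if the parameters cannot be parsed or the list is empty.

> desturl url = case importParams url of
>     Just params@(_:_) -> snd $ last params
>     _                 -> "Could not get URL"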
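
Since the text talks about the q parameter but the code takes whichever parameter comes last, here is a minimal sketch of looking the parameter up by name instead. It uses only the Prelude rather than Network.URL, does no percent-decoding, and the names splitOn, parseQuery and destParam as well as the sample link are made up for illustration.

```haskell
-- Split a string on a delimiter character.
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn d rest

-- Parse "k1=v1&k2=v2" into key/value pairs (no percent-decoding).
parseQuery :: String -> [(String, String)]
parseQuery qs =
  [ (k, drop 1 v)
  | pair <- splitOn '&' qs
  , not (null pair)
  , let (k, v) = break (== '=') pair
  ]

-- Look up the q parameter in everything after the first '?'.
-- Returns Nothing when there is no query string or no q parameter.
destParam :: String -> Maybe String
destParam href = lookup "q" (parseQuery (drop 1 (dropWhile (/= '?') href)))

-- Hypothetical ad link, not a real Google URL.
sampleHref :: String
sampleHref = "/aclk?sa=l&q=http://example.com/landing&adurl=x"
```

With these definitions, destParam sampleHref gives Just "http://example.com/landing", and a link without a query string gives Nothing rather than an error.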

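topAdverts takes the page source as input and extracts the ads that appear above the search results into the Advert format. It walks over every suffix of the parsed tag list (using tails) and keeps the ones that start with the tag sequence of a top ad. It is a very ugly function, but it seems to do the job.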

> topAdverts src = [Advert (desturl x) (fromTagText y) (fromTagText z) 
> 	(removeNbs (fromTagText t)) Nothing| l:TagOpen "h3" []:TagOpen "a" 
> 	(_:("href", x):_):y:TagClose "a":TagClose "h3":TagOpen "cite" []:z:
> 	TagClose "cite":t:_ <- tails $ parsed src, l ~== "<li>"]
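
The trick that makes this work is a list comprehension over tails with a pattern on the left of the arrow: suffixes that do not match the pattern are silently skipped. A minimal sketch of the same idiom on plain strings, assuming a made-up helper afterMarker and made-up tokens:

```haskell
import Data.List (tails)

-- Collect every token that directly follows the given marker.
-- Suffixes that are too short to match the pattern are skipped,
-- not treated as errors.
afterMarker :: String -> [String] -> [String]
afterMarker marker tokens =
  [ value | m : value : _ <- tails tokens, m == marker ]

-- Made-up token list; the final "href" has nothing after it.
tokens :: [String]
tokens = ["a", "href", "/x", "b", "href", "/y", "href"]
```

afterMarker "href" tokens gives ["/x", "/y"]. The script does the same thing with a much longer pattern, which is why a small change in Google's markup makes an ad drop out of the results instead of crashing the program.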

sideAdverts does the same for the ads that appear to the side of the organic results.

> sideAdverts src = [Advert (desturl x) (fromTagText y) (fromTagText v)
> 	(fromTagText t) (Just (fromTagText u))| l:TagOpen "h3" []:TagOpen "a" 
> 	(_:("href", x):_):y:TagClose "a":TagClose "h3":t:TagOpen "br" []:u:
> 	TagOpen "br" []: TagOpen "cite" []:v:_ 
> 	<- tails $ parsed src, l ~== "<li>"]

A simple function to make working with curlGetString a bit easier. It builds the search URL, replacing spaces in the keywords with +, and returns the response body.

> getResults keywords = do
> 	src<-curlGetString ("http://www.google.co.uk/search?q="++
> 		(replace " " "+" keywords)) []
> 	return $! snd src
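
Replacing spaces with + is enough for simple keywords, but a keyword containing something like & or # would change the meaning of the URL. Below is a rough sketch of a fuller encoder using only base; encodeQuery, percent and utf8Bytes are names invented for this example, and it assumes the server expects UTF-8 percent-encoding.

```haskell
import Data.Char (intToDigit, isAlphaNum, isAscii, ord, toUpper)

-- UTF-8 bytes of a single character.
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 + n `div` 64, 0x80 + n `mod` 64]
  | n < 0x10000 = [ 0xE0 + n `div` 4096
                  , 0x80 + (n `div` 64) `mod` 64
                  , 0x80 + n `mod` 64 ]
  | otherwise   = [ 0xF0 + n `div` 262144
                  , 0x80 + (n `div` 4096) `mod` 64
                  , 0x80 + (n `div` 64) `mod` 64
                  , 0x80 + n `mod` 64 ]
  where n = ord c

-- Render one byte as %XX.
percent :: Int -> String
percent b = ['%', hexDigit (b `div` 16), hexDigit (b `mod` 16)]
  where hexDigit = toUpper . intToDigit

-- Spaces become '+', unreserved ASCII characters are kept,
-- everything else is percent-encoded.
encodeQuery :: String -> String
encodeQuery = concatMap encodeChar
  where
    encodeChar ' ' = "+"
    encodeChar c
      | isAscii c && (isAlphaNum c || c `elem` "-_.~") = [c]
      | otherwise = concatMap percent (utf8Bytes c)
```

For example, encodeQuery "cheap flights" gives "cheap+flights", the same as the replace call in the script, while encodeQuery "caf\233 & bar" gives "caf%C3%A9+%26+bar".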

getAndParse does the actual work: it records the time, fetches the page and extracts the ads.

> getAndParse keywords = do
> 	time<-getClockTime
> 	date<-toCalendarTime time
> 	src<-getResults keywords
> 	let ads=(topAdverts src)++(sideAdverts src)
> 	return $ listOfAds ads keywords date

listOfAds is my favourite function in this program. It ranks the ads and puts them into the SearchResult data type along with the date.

> listOfAds list keyword date = zipWith (SearchResult keyword date) list [1..]
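
The reason this works in one line is that a partially applied constructor is just a function, so zipWith can feed it the remaining two fields. A cut-down sketch of the same idea, assuming a made-up record Ranked in place of SearchResult:

```haskell
-- Hypothetical cut-down record: keyword, ad title and rank only.
data Ranked = Ranked
  { rKeyword :: String
  , rTitle   :: String
  , rRank    :: Int
  } deriving (Eq, Show)

-- Ranked kw still needs a title and a rank; zipWith supplies one of
-- each, pairing the titles with 1, 2, 3, ... in page order.
rankAds :: String -> [String] -> [Ranked]
rankAds kw titles = zipWith (Ranked kw) titles [1 ..]
```

rankAds "flights" ["Ad A", "Ad B"] gives two records with ranks 1 and 2. Because zipWith stops at the shorter list, the infinite list [1 ..] is safe to use here.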

The following code will format the results and output them to a TSV file. I use TSV rather than CSV because it makes it easier to deal with adverts that have commas in the ad text.

> showadtext2 text = fromMaybe "" text
>
> searchResultToRow x = intercalate "\t"
>     [ keyword x
>     , show $ ctHour $ date x
>     , show $ ctMin $ date x
>     , show $ ctWDay $ date x
>     , show $ ctDay $ date x
>     , show $ ctMonth $ date x
>     , show $ ctYear $ date x
>     , show $ rank x
>     , title $ advert x
>     , adtext1 $ advert x
>     , showadtext2 $ adtext2 $ advert x
>     , dispurl $ advert x
>     , link $ advert x
>     ]
>
> listToTsv = unlines . map searchResultToRow
> 
> appendToFile keywords filename = do
> 		results <- getAndParse keywords
> 		let output = listToTsv results
> 		appendFile filename output
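
TSV avoids the comma problem, but a tab or line break inside the ad text would still break a row. A minimal sketch of guarding against that, assuming two made-up helpers sanitizeField and toTsvRow that are not part of the script:

```haskell
import Data.List (intercalate)

-- Replace characters that would break a TSV row with a plain space.
sanitizeField :: String -> String
sanitizeField = map (\c -> if c `elem` "\t\r\n" then ' ' else c)

-- Join already-formatted fields into one tab-separated row.
toTsvRow :: [String] -> String
toTsvRow = intercalate "\t" . map sanitizeField
```

For example, toTsvRow ["flights", "Cheap\tFlights", "line1\nline2"] gives "flights\tCheap Flights\tline1 line2", so every row keeps the same number of columns.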

waitAndRepeat is my best effort at making a Haskell function sleep.

> waitAndRepeat function sleeptime = do
>     putStrLn "Doing function"
>     function
>     putStrLn ("Waiting for " ++ show sleeptime)
>     sleep sleeptime
>     putStrLn "Finished Waiting"
>     waitAndRepeat function sleeptime
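
If the dependency on System.Posix is a problem, one possible alternative is threadDelay from Control.Concurrent, which is part of base. This is only a sketch of that swap, not what the script does; repeatEvery and secondsToMicros are names made up for the example.

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (forever)

-- threadDelay takes microseconds, so convert from seconds first.
secondsToMicros :: Int -> Int
secondsToMicros s = s * 1000000

-- Run the action, pause for the given number of seconds, and repeat
-- until the program is interrupted.
repeatEvery :: Int -> IO () -> IO a
repeatEvery seconds action = forever $ do
  action
  threadDelay (secondsToMicros seconds)
```

A call such as repeatEvery 3600 (putStrLn "searching") would print once an hour. One caveat: the delay is an Int number of microseconds, so on a platform where Int is 32 bits an hour would overflow and the wait would need to be split into shorter delays.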

main takes the keyword, the search frequency (in seconds) and the output file as arguments. It won’t like it if any of these are missing: the pattern match on getArgs fails and the program stops with an error.

> main = do
> 	(keywords:frequency:filename:_)<-getArgs
> 	waitAndRepeat (appendToFile keywords filename) (read frequency::Int)
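
To get a friendlier failure than a pattern match error, the arguments could be checked before the loop starts. The following is a minimal sketch using readMaybe from Text.Read; the Settings record, the parseArgs function and the wording of the messages are all invented for this example.

```haskell
import Text.Read (readMaybe)

-- Hypothetical record holding the validated arguments.
data Settings = Settings
  { sKeywords :: String
  , sInterval :: Int
  , sOutput   :: FilePath
  } deriving (Eq, Show)

-- Accept exactly three arguments and a positive number of seconds;
-- otherwise return a message explaining what went wrong.
parseArgs :: [String] -> Either String Settings
parseArgs [kw, freq, out] = case readMaybe freq of
  Just n | n > 0 -> Right (Settings kw n out)
  _              -> Left ("Frequency must be a positive number of seconds, got: " ++ freq)
parseArgs _ = Left "Usage: KEYWORDS FREQUENCY_SECONDS OUTPUT_FILE"
```

main could then call parseArgs on the result of getArgs, print the Left message and exit, or pass the fields of the Right value to waitAndRepeat. Unlike the pattern in main, this sketch rejects extra arguments instead of ignoring them.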