AdWords Competitor Analysis Script
As far as I’m aware there are no free competitive intelligence tools for AdWords. This script scrapes a search results page and returns a list of all the adverts on it. It isn’t very useful in its current state, but it could be pretty handy if you hooked it up to a database.
I use the TagSoup module for parsing because Google’s search results pages are not well formed. I think they save bandwidth by leaving out a lot of tags.
I am very grateful to Neil Mitchell for the example code in one of his TagSoup blog posts.
import Text.HTML.TagSoup
I found Text.HTML.Download to be unreliable, so I use Network.Curl instead, even though it is a bit more complicated.
import Network.Curl (curlGetString)
import Data.List
import Data.Maybe
System.Environment is needed to get the arguments provided when calling a program from the command line. System.Posix is used for the sleep function so that you can get results every hour (or similar).
Using System.Posix might mean that this won’t compile on Windows.
import System.Environment
import System.Posix
It is sometimes possible to get information about the URL of the landing page without clicking the ad. It is in the q parameter of the URL presented on the search results page, so import Network.URL to help get at it.
import Network.URL (importParams)
The time the search is being done is quite important for working out day parting strategies.
import System.Time
import Data.List.Utils (replace)
Define the Advert data type. adtext2 is a Maybe String because the description lines are not split for the top-ranking ads.
data Advert = Advert { link    :: String
                     , title   :: String
                     , dispurl :: String
                     , adtext1 :: String
                     , adtext2 :: Maybe String
                     } deriving Show
Define a search result as the keyword, date/time of search, advert and ad rank. There might be a more natural way of defining this; it will depend a bit on what data you are trying to get.
data SearchResult = SearchResult { keyword :: String
                                 , date    :: CalendarTime
                                 , advert  :: Advert
                                 , rank    :: Int
                                 } deriving Show
Google uses bold text to highlight phrases used in the keyword. This just confuses things for us so remove it.
removeBold ((TagOpen "b" []):xs)= removeBold xs
removeBold ((TagClose "b"):xs) = removeBold xs
removeBold (x:xs) = x : removeBold xs
removeBold [] = []
There are also a lot of non-breaking spaces (&nbsp;, character 160) which are used for formatting. These are not important for us.
removeNbs = filter (/='\160')
Once the bold tags have been removed we get things like [TagText "Blah Blah", TagText "This bit was bold"]. We want these to be part of the same TagText.
concatTags ((TagText x):(TagText y):xs)= concatTags ((TagText $ x++y): xs)
concatTags (x:xs) = x : concatTags xs
concatTags []=[]
A simple function composing what we have done so far
parsed = concatTags . removeBold . parseTags
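To make the clean-up steps concrete, here is a minimal sketch that runs them on a hand-written tag list. It is an illustration only: the Tag type below is a cut-down stand-in for TagSoup's Tag String so that the example needs nothing beyond base, stripBold and mergeText are hypothetical copies of removeBold and concatTags, and the sample ad title and link are made up.

```haskell
-- Stand-in for TagSoup's Tag String, reduced to the three constructors
-- the clean-up functions look at (an assumption, so this runs with base alone).
data Tag = TagOpen String [(String, String)]
         | TagClose String
         | TagText String
         deriving (Eq, Show)

-- Same shape as removeBold: drop opening and closing <b> tags.
stripBold :: [Tag] -> [Tag]
stripBold (TagOpen "b" [] : xs) = stripBold xs
stripBold (TagClose "b" : xs)   = stripBold xs
stripBold (x : xs)              = x : stripBold xs
stripBold []                    = []

-- Same shape as concatTags: merge runs of adjacent text nodes.
mergeText :: [Tag] -> [Tag]
mergeText (TagText x : TagText y : xs) = mergeText (TagText (x ++ y) : xs)
mergeText (x : xs)                     = x : mergeText xs
mergeText []                           = []

-- Hypothetical ad title, tokenised the way a tag parser might return it.
sampleTitle :: [Tag]
sampleTitle =
    [ TagOpen "a" [("href", "/aclk?q=http://example.com")]
    , TagText "Cheap "
    , TagOpen "b" []
    , TagText "flights"
    , TagClose "b"
    , TagText " to Rome"
    , TagClose "a"
    ]

-- The three text fragments collapse into a single TagText.
cleanedTitle :: [Tag]
cleanedTitle = mergeText (stripBold sampleTitle)
```

After both passes the title is a single TagText "Cheap flights to Rome" between the opening and closing a tags, which is what lets the advert patterns match one text node per field.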
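Extract the URL parameter that corresponds to the destination URL. This takes the value of the last parameter in the ad’s link, and falls back to an error string if the link has no parameters.
desturl url = case importParams url of
    Just params@(_:_) -> snd $ last params
    _                 -> "Could not get URL"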
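Picking the last parameter depends on where the destination happens to sit in the link. A possible alternative is to look the parameter up by name. The sketch below does that with a hand-rolled query-string splitter so it needs only base; splitOn, queryParams and destByName are hypothetical helpers, no percent-decoding is attempted, and the example link is invented rather than taken from a real results page.

```haskell
-- Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
    (chunk, [])       -> [chunk]
    (chunk, _ : rest) -> chunk : splitOn sep rest

-- Turn the part after '?' into (name, value) pairs.
-- Simplification: values are returned as-is, without percent-decoding.
queryParams :: String -> [(String, String)]
queryParams url = map toPair (splitOn '&' query)
  where
    query     = drop 1 (dropWhile (/= '?') url)
    toPair kv = let (k, v) = break (== '=') kv in (k, drop 1 v)

-- Look the destination up by name rather than by position.
destByName :: String -> Maybe String
destByName url = lookup "q" (queryParams url)
```

For the made-up link "/aclk?sa=l&q=http://www.example.com/landing&ved=0", destByName returns Just "http://www.example.com/landing", whereas taking the last parameter would return "0". Returning a Maybe also keeps the "could not get URL" case out of the data itself.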
topAdverts takes the page source as input and extracts the ads that appear above the search results into the Advert format. It is a very ugly function but it seems to do the job.
topAdverts src =
    [ Advert (desturl x) (fromTagText y) (fromTagText z)
             (removeNbs (fromTagText t)) Nothing
    | l : TagOpen "h3" [] : TagOpen "a" (_ : ("href", x) : _) : y
        : TagClose "a" : TagClose "h3" : TagOpen "cite" [] : z
        : TagClose "cite" : t : _ <- tails $ parsed src
    , l ~== "<li>"
    ]
sideAdverts does the same but for the ads that appear to the side of the organic results
sideAdverts src =
    [ Advert (desturl x) (fromTagText y) (fromTagText v)
             (fromTagText t) (Just (fromTagText u))
    | l : TagOpen "h3" [] : TagOpen "a" (_ : ("href", x) : _) : y
        : TagClose "a" : TagClose "h3" : t : TagOpen "br" [] : u
        : TagOpen "br" [] : TagOpen "cite" [] : v : _ <- tails $ parsed src
    , l ~== "<li>"
    ]
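Both functions rely on the same trick: tails produces every suffix of the tag list, and the pattern on the left of the arrow keeps only the suffixes that start with the expected sequence. Suffixes that fail to match are skipped without an error. A stripped-down sketch of the idea on plain strings, with assignments as a hypothetical example function:

```haskell
import Data.List (tails)

-- Find every token that is followed by "=" and a value, wherever it occurs.
-- Suffixes that do not start with that shape are silently dropped, so
-- tails plus a pattern behaves like a sliding-window search.
assignments :: [String] -> [(String, String)]
assignments tokens =
    [ (key, value)
    | key : "=" : value : _ <- tails tokens
    , key /= "="          -- extra guard, playing the role of l ~== "<li>"
    ]
```

With the input ["x", "=", "1", ";", "y", "=", "2"] this yields [("x", "1"), ("y", "2")]. The advert functions work the same way, only with a longer pattern, which is also why they stop matching as soon as the page layout changes by a single tag.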
A simple function to make working with curlGetString a bit easier
getResults keywords = do
    src <- curlGetString ("http://www.google.co.uk/search?q=" ++
                          (replace " " "+" keywords)) []
    return $! snd src
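Replacing spaces with "+" is enough for simple keywords, but a keyword containing characters such as & or + would change the meaning of the query string. A rough sketch of a fuller encoder follows, using only base. encodeChar, hexByte and encodeQuery are hypothetical names, and the sketch only handles ASCII: anything else is passed through untouched and would need UTF-8 encoding first.

```haskell
import Data.Char (isAlphaNum, isAscii, ord, toUpper)
import Numeric (showHex)

-- Encode one character for use inside a query-string value.
encodeChar :: Char -> String
encodeChar c
    | c == ' '                                       = "+"
    | isAscii c && (isAlphaNum c || c `elem` "-_.~") = [c]
    | isAscii c                                      = '%' : hexByte (ord c)
    | otherwise                                      = [c]  -- limitation: non-ASCII is not encoded

-- Two-digit upper-case hexadecimal for a value below 256.
hexByte :: Int -> String
hexByte n = map toUpper (pad (showHex n ""))
  where
    pad s = replicate (2 - length s) '0' ++ s

-- Encode a whole keyword.
encodeQuery :: String -> String
encodeQuery = concatMap encodeChar
```

For example, encodeQuery "fish & chips" gives "fish+%26+chips", and encodeQuery "c++" gives "c%2B%2B".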
getAndParse does the actual work
getAndParse keywords = do
    time <- getClockTime
    date <- toCalendarTime time
    src <- getResults keywords
    let ads = (topAdverts src) ++ (sideAdverts src)
    return $ listOfAds ads keywords date
listOfAds is my favourite function in this program. It ranks the ads and puts them into the SearchResult data type along with the date
listOfAds list keyword date = zipWith (SearchResult keyword date) list [1..]
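The reason this works in one line is that a data constructor is an ordinary function, so it can be partially applied and handed to zipWith, and the infinite list [1..] is cut off at the length of the advert list. A small sketch of the same idea, using a hypothetical cut-down record in place of SearchResult:

```haskell
-- Hypothetical cut-down record, standing in for SearchResult.
data Ranked = Ranked
    { rKeyword :: String
    , rTitle   :: String
    , rRank    :: Int
    } deriving (Eq, Show)

-- Ranked kw is a function still waiting for a title and a rank;
-- zipWith feeds it one title and one number at a time.
rankTitles :: String -> [String] -> [Ranked]
rankTitles kw titles = zipWith (Ranked kw) titles [1 ..]
```

Here rankTitles "flights" ["First ad", "Second ad"] produces two Ranked values with ranks 1 and 2, and an empty list of titles produces an empty result.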
The following code will format the results and output them to a tsv file. I use tsv rather than csv because it makes it easier to deal with adverts that have commas in the ad text.
showadtext2 text = case text of
    Nothing -> ""
    Just a -> a
searchResultToRow x = concat $ intersperse "\t" [keyword x,
    show $ ctHour $ date x, show $ ctMin $ date x,
    show $ ctWDay $ date x, show $ ctDay $ date x,
    show $ ctMonth $ date x, show $ ctYear $ date x,
    show $ rank x, title $ advert x, adtext1 $ advert x,
    showadtext2 $ adtext2 $ advert x, dispurl $ advert x,
    link $ advert x]
listToTsv = unlines . (map searchResultToRow)
appendToFile keywords filename = do
    results <- getAndParse keywords
    let output = listToTsv results
    appendFile filename output
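Tabs avoid the comma problem, but a stray tab or line break inside scraped ad text would still shift columns or split a row. One possible guard, sketched with base only, is to blank those characters out before joining the fields. cleanField and toTsvRow are hypothetical helpers, not part of the script above.

```haskell
import Data.List (intercalate)

-- Replace characters that would break the row layout with a plain space.
cleanField :: String -> String
cleanField = map (\c -> if c `elem` "\t\r\n" then ' ' else c)

-- Join already-rendered fields into one tab-separated row.
toTsvRow :: [String] -> String
toTsvRow = intercalate "\t" . map cleanField
```

For example, toTsvRow ["cheap\tflights", "Rome"] gives "cheap flights\tRome", so the row always has exactly one tab per field boundary.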
waitAndRepeat is my best effort at making a Haskell function sleep.
waitAndRepeat function sleeptime = do
    putStrLn "Doing function"
    function
    putStrLn ("Waiting for " ++ show sleeptime)
    sleep sleeptime
    putStrLn ("Finished Waiting")
    waitAndRepeat function sleeptime
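If the dependency on System.Posix is a problem, one possible alternative is threadDelay from Control.Concurrent, which is part of base. It takes microseconds rather than seconds. The sketch below is a variation under that assumption, not a drop-in replacement, and secondsToMicros and repeatEvery are hypothetical names.

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (forever)

-- threadDelay counts in microseconds, so convert from seconds first.
-- Caveat: on a platform with 32-bit Int, long delays would overflow.
secondsToMicros :: Int -> Int
secondsToMicros s = s * 1000000

-- Run the action, wait, and repeat until the program is stopped.
repeatEvery :: Int -> IO () -> IO ()
repeatEvery seconds action = forever $ do
    action
    threadDelay (secondsToMicros seconds)
```

Calling repeatEvery 3600 with an action would run it roughly once an hour, plus however long the action itself takes.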
main takes the keyword, the search frequency (in seconds) and the output file as arguments. It won’t like it if these things are missing.
main = do
    (keywords:frequency:filename:_) <- getArgs
    waitAndRepeat (appendToFile keywords filename) (read frequency :: Int)
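As written, a missing argument fails the pattern match and a non-numeric frequency makes read throw. A gentler approach, sketched here under the assumption that a usage message is preferable to a crash, is to validate the argument list in a pure function first. Options and parseArgs are hypothetical names.

```haskell
import Text.Read (readMaybe)

-- The three settings the script needs, once validated.
data Options = Options
    { optKeywords  :: String
    , optFrequency :: Int       -- seconds between searches
    , optFilename  :: FilePath
    } deriving (Eq, Show)

-- Check the raw argument list; extra arguments are ignored, as in main.
parseArgs :: [String] -> Either String Options
parseArgs (kw : freq : file : _) =
    case readMaybe freq of
        Just n | n > 0 -> Right (Options kw n file)
        _              -> Left ("Frequency must be a positive number of seconds, got: " ++ freq)
parseArgs _ = Left "Usage: <keywords> <frequency-in-seconds> <output-file>"
```

main could then pass the result of getArgs to parseArgs, print the message and exit on Left, and start the loop on Right.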
