Monday, 22 September 2008

Expanded Matching – broadening my skepticism

OK, Google’s expanded match has been around for a while now, but is it just me or are things getting worse? Over the last couple of months I’ve seen broad-matched terms increase their impressions by 150–200%. Google seems to dial it up during slow seasons for individual keywords.

For example, September is a particularly slow month for selling wrist watches, so why, we have to ask, has the broad match term ‘gold watches’ received more impressions than it did last month?

It’ll be interesting to see whether the keyword tool’s broad match and exact match volume estimates for affected terms like this reflect the increased expanded matching. E.g., does it show an increase in broad match impressions where there is actually a decrease in exact match impressions? I’ll look into this over the next couple of weeks.

What can we as advertisers do about this?
Make sure your list of negative keywords is comprehensive. Google introduced the search query report to help us identify more negative keywords, but unfortunately, as impression volume goes up, you can bet that the number of “other unique queries” goes up too! I’ve not yet come across a third-party tool that gives full search query visibility either (several give more visibility, but not total) – if anyone knows of one, please let me know!
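To give a flavour of the workflow, here’s a minimal sketch of mining a search query report export for negative keyword candidates. It assumes a hypothetical CSV export with “Query” and “Impressions” columns (the column names and file format are my own assumptions, not a documented AdWords export): any query containing words that appear in none of your known keywords gets flagged for review.

```python
import csv

def negative_candidates(report_path, known_keywords):
    """Flag search queries containing words that appear in none of our
    known keywords - these are candidates for the negative list.
    Assumes a CSV with 'Query' and 'Impressions' columns (hypothetical
    export format)."""
    known_words = {w for kw in known_keywords for w in kw.lower().split()}
    candidates = []
    with open(report_path, newline="") as f:
        for row in csv.DictReader(f):
            query_words = set(row["Query"].lower().split())
            unknown = query_words - known_words
            if unknown:
                # Record the query and the unfamiliar words in it
                candidates.append((row["Query"], sorted(unknown)))
    return candidates
```

Of course this only sees the queries Google chooses to itemise – the “other unique queries” bucket remains opaque, which is exactly the problem.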

In the meantime, here’s a nice little example of expanded matching taken too far – my ad for the product SnoreWizard was appearing against searches for ‘Harry Potter’ and ‘Roy Wood’. Nice.

Friday, 19 September 2008

How Accurate Are the Google Keyword Tool’s Volume Estimates?

The following is a post I wrote for search engine war back in July. I've republished it as a quick way to get content onto the blog so I can play around with how things look - so apologies if you've read it before!

There was a fair amount of fuss made about Google’s decision to include ‘real’ numerical volume data in the keyword tool update (released 07/07/2008). Many people shouted hoorah at a new age of openness from Google, and twice as many huffed and puffed and dismissed it as inaccurate offhand. But almost no one published the results of their tests with any numerical visibility.

So how do we test it? In theory we just need to find a keyword with a 100% impression share for a search term, and see whether its impressions figure for June matches the figure given in the tool. But it was never going to be that simple...

There is no way within the Google interface (to my knowledge) to establish whether a keyword has 100% impression share, unless that keyword has been placed in its own adgroup. Our accounts only have one-keyword-adgroups in very rare circumstances as, even with high volume exact matches, there are normally a handful of terms that group nicely together, be they plurals or common variations (such as the ever present “keyword UK”, which always gets volume but rarely justifies its own adgroup from a quality score point of view for a UK-targeted site).

So what keywords is it reasonable to assume have a 100% IS and match as closely as possible to Google’s total keyword volume? In theory a keyword has to fulfil the following requirements to be eligible for this test:
It needs to be completely uncapped, its display unrestrained by any budget or ad scheduling.
It needs to be active on BOTH Google and the search network, as the keyword tool’s figures include both.
It needs to have the same language and location settings as the tool, which in our case needs to be English, United Kingdom. The tool as yet cannot be set to include region targeting (and if the woeful inaccuracy of the traffic estimator tool when it comes to UK region targeting is anything to go by, I’m not sure I’d use it if it did!).
It needs to be running in 1st position, at all times. This is because the search network includes ‘search results’ on sites that have limited space to display ads (e.g. eBay) and show only the top few results.
Finally, the keyword needs to be set to Exact Match. In theory this isn’t an absolute necessity – you could achieve a 100% IS on a broad match keyword, and the tool’s results are filterable by all match types – but I think the current ‘Expanded Broad Match’ relevancy lottery that Google seems to be running (perhaps the subject of a future post), as well as variation caused by negative keywords, etc., will only add further inaccuracies to the test.
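The five criteria above amount to a simple filter over keyword stats. Here’s a sketch of that filter using a hypothetical record type whose fields I’ve invented for illustration (this is not a real AdWords API object):

```python
from dataclasses import dataclass

@dataclass
class KeywordStats:
    # Hypothetical summary of a keyword's month; fields are my own
    # invention, not an AdWords API.
    budget_capped: bool       # any budget or ad scheduling restriction?
    on_search_network: bool   # active on Google AND the search network
    language: str
    location: str
    avg_position: float
    match_type: str

def eligible_for_volume_test(kw: KeywordStats) -> bool:
    """Apply the five criteria from the post: uncapped, on both Google
    and the search network, English/UK targeting, always in 1st
    position, and exact match."""
    return (not kw.budget_capped
            and kw.on_search_network
            and kw.language == "en"
            and kw.location == "UK"
            and kw.avg_position == 1.0
            and kw.match_type == "exact")
```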

The type of keywords that immediately spring to mind when we’re talking about ‘always top, always displayed’ terms are those related to brand. Many of our clients need to be shown as the top result when a search on their brand is performed, whether due to competition against an untrademarkable brand, poor positioning within the algorithmic listings or to promote a specific campaign. Of course, brand terms are often low volume, but fortunately a few of our clients have very generic keywords as their brand names, so I was able to proceed with eight keywords of varying volume levels that should have received, as far as possible, a 100% impression share.
I’ve hidden the identity of the keywords, but the figures are real.

Many of these differences are huge, with both “Generic Keyword A” and “Generic Brand A” having more than twice the volume Google estimates. Interestingly, the ‘average search volume’ is closer to the actual June figure in all but one result. I wasn’t going to include the ‘average’ statistic, theorising that it surely takes 12 months of data into account to allow for full seasonality, but the fact that it seems more accurate means it needs to be discussed (more on this below).

Why the inaccuracy? Well, we must take into account Google’s manipulation of the figures. Google admits to only giving ‘approximate’ data, but goes one step further than simply rounding - it actually groups all keywords into sets of volumes. No matter what search query you put into the tool its volume will always be one of a fixed set of results.
For example, with the kind of volumes we’re dealing with (between 1,000 and 100,000 impressions) Google allows only the following results from the keyword tool:
90500, 74000, 60500, 49500, 40500, 33100, 27100, 22200, 18100, 14800, 12100, 9900, 8100, 6600, 5400, 4400, 3600, 2900, 2400, 1900, 1600, 1300, 1000, 880, 720
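The comparison method used below can be reproduced with a few lines of code: snap an actual impression count down to the nearest available bucket. (The bucket list is copied from above; the observation in the trailing comment about the roughly constant ratio between adjacent buckets is my own.)

```python
# The fixed set of volume 'buckets' the keyword tool returns in this
# range, as listed above, in descending order.
BUCKETS = [90500, 74000, 60500, 49500, 40500, 33100, 27100, 22200,
           18100, 14800, 12100, 9900, 8100, 6600, 5400, 4400, 3600,
           2900, 2400, 1900, 1600, 1300, 1000, 880, 720]

def snap_down(actual_impressions):
    """Round an actual impression count DOWN to the closest bucket,
    mirroring the comparison method used in this post."""
    for b in BUCKETS:
        if b <= actual_impressions:
            return b
    return None  # below the smallest bucket in this range

# Adjacent buckets sit at a roughly fixed ratio (~1.22x), so the sets
# grow further apart in absolute terms as volume increases - which is
# why the rounding error matters most on high volume keywords.
```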

Let’s put those estimates against our keywords. I have rounded each actual keyword volume DOWN (as all of the actual figures exceed the estimates) to the closest available result. The table now looks like this for June:

As you can see two keywords, “Generic Brand B” and “Generic Brand C”, are now correct, but all the others were underestimated by more than one category. This isn’t such a big issue at the lower end of the volume spectrum, where the sets are more closely spaced, but for higher volume keywords it makes the data wildly inaccurate. Generic B was only one set out, but that is still 10,000 impressions, potentially hundreds of clicks a month.

The impression data that the tool produced for June is not accurate, even once Google’s rounding is taken into account. Does this make it useless? Well, no, of course not. First of all you’ll notice that, with the exception of “Generic Keyword A”, the keywords’ relative positioning in the volume list was correct. Secondly, the average volume was more accurate, and this is the column most will use for all but seasonally affected campaigns.
This disparity between the accuracy of the average impressions and the June impressions suggests to me that Google has manipulated the results. The way keywords are grouped into sets for volume might suggest they are grouped in other ways too, perhaps at a thematic level, and if June was a poor volume month in certain sectors (which it was) Google might moderate the results in those sectors down. It’s much more dangerous for Google to overestimate volume than to underestimate it – a new advertiser may be put off bidding on a high volume term, or may limit their bid, budget or location targeting because of a high volume estimate – so it makes sense for Google to cautiously underestimate impression volume, as evidenced by the fact that all the June results were lower than the actual impressions.
The final factor is the location/language setting set by the user. How accurate is Google’s location data? Does it identify UK volume in the same way it identifies UK users when ads are displayed, or does it use a more simplistic method? The only way to find out would be to look at data from accounts in other regions, particularly the default US, and see what trends are observed. Additionally I believe it perfectly likely that if modifications are made to the volume figures of keyword groups, as suggested above, then these modifications are probably based on US data and rolled out across other markets (particularly if the language is the same).
I hope this has helped give an insight into the accuracy of the data the tool produces. Please let me know what trends you’ve seen – I’ll update this when the July figures are published and see if my observations carry through. I still think this level of visibility is much better than the old relative volume bars – if nothing else we can use the results to show a more accurate relative volume than the old 1–5 scale. Use it, with caution, and assume more volume than estimated.
An update on this with July's figures can be found here. Actual unique content coming soon!