Now, if you’re anything like me then you’ve got GA installed on your site and have compared the data it produces with the data thrown up by your web server’s stats (something called AWStats, in my case). More often than not, the figures don’t seem to agree – in fact, generally I’d say that GA reports substantially fewer visits than the server stats show.
Why should that be? The answer is that GA relies on a technique called page tagging, whereas logfile analysis depends on the data generated by your server as it records each individual file it serves. Let’s find out what that means in practice.
For one thing, it means that GA records views of cached pages, because the tracking code runs in the visitor’s browser every time the page is displayed. If a page is cached by the user’s own browser or by the user’s ISP for future use, a subsequent visit to that page won’t be recorded in the website server’s logfiles – because the files have been retrieved from the cache and not from the server. So a substantial number of revisits to a site may be missed in stats generated from logfiles.
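To make the caching point concrete, here’s a toy model in Python. It’s only a sketch under some generous assumptions: caches never expire, nobody blocks the tag, and the function name and the sample visitors are made up for illustration. Every page view fires the tag, but only the first view of a page by a given visitor reaches the server.

```python
def simulate(page_views):
    """Count tag hits and server-log hits for a list of (visitor, page) views.

    Assumptions (hypothetical, for illustration only): a browser cache
    never expires, and the page tag is never blocked.
    """
    cache = set()      # (visitor, page) pairs already held in a cache
    tag_hits = 0       # views seen by a page-tagging tool
    server_hits = 0    # requests that actually reach the web server

    for visitor, page in page_views:
        # The tag runs in the browser on every view, cached or not.
        tag_hits += 1
        # Only a cache miss generates a request the server can log.
        if (visitor, page) not in cache:
            server_hits += 1
            cache.add((visitor, page))

    return tag_hits, server_hits


# Made-up sample: visitor "a" views the home page twice, visitor "b" once.
views = [("a", "/"), ("a", "/"), ("b", "/")]
tag_hits, server_hits = simulate(views)
```

In this made-up sample the tag sees three views while the server sees only two requests, because the second view by visitor "a" came from the cache.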
On the other hand, there are some users and visits that GA misses.
Which way does the cookie crumble?
One month on, it’s unclear how far users of websites have taken to blocking GA cookies as a result of all this publicity. It could be an insignificant number, or it could be tens of thousands. But inevitably the effect will be to depress GA visit figures – the only question is, by how much?
“Block ad” tackle
A longer-standing problem is the existence of browser plug-ins that remove advertising from web pages. Some of these plug-ins may block the GATC, whether automatically or as a configurable option.
OK, so we’ve had a look at the human visitors and how GA records them (or doesn’t, in some cases). But what about the logfiles? Are there any peculiarities that affect their figures?
Where logfile analysis wins hands-down is on non-human visitors – the search engine spiders, or “bots”.
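If you want to see the bots for yourself, a few lines of Python will do it. What follows is a minimal sketch, not how AWStats or any other tool actually works: it assumes your server writes the common "combined" log format, the list of bot markers is a short illustrative one I’ve made up, and the two sample log lines are invented.

```python
import re

# Combined log format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Illustrative only: a real list of bot markers would be far longer.
BOT_MARKERS = ("googlebot", "bingbot", "slurp", "spider", "crawler")


def is_bot(user_agent):
    """Guess whether a user-agent string belongs to a bot."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def count_hits(lines):
    """Tally log lines as human, bot, or unparsed."""
    counts = {"human": 0, "bot": 0, "unparsed": 0}
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            counts["unparsed"] += 1
        elif is_bot(match.group("agent")):
            counts["bot"] += 1
        else:
            counts["human"] += 1
    return counts


# Invented sample lines, using documentation-style addresses and dates.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2000:12:00:00 +0000] "GET /index.html HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0 (Windows NT 6.1) Firefox/4.0"',
    '198.51.100.7 - - [01/Jan/2000:12:00:05 +0000] "GET /about.html HTTP/1.1" '
    '200 1840 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]

counts = count_hits(SAMPLE_LOG)
```

Matching on the user-agent string is crude, since a bot can claim to be anything it likes, but it’s enough to show that this information lives in the logfile in the first place.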
The other big advantage that server logfiles have is that the data they record don’t have to travel to a remote location; they’re recorded there and then. With GA and other page-tagging stats analysis tools, the data must be sent from the user’s browser to the tool’s own servers, and so have to negotiate the Internet before being recorded. If there are any problems along the way, then the data may be delayed or even lost.
Don’t sweat the small stuff
Ultimately, though, it’s probably not worth fretting about the discrepancies between page tags and logfiles. As Kay’s said before, avoid paralysis by analysis – it’s the overall trends that are likely to give you the most worthwhile information, not the individual visitors’ actions.