Same As It Ever Was
As you read this essay, keep in mind the importance of web analytics. Most businesses on the web are built on these measurement tools. Publishers with page views. Advertising with impressions. Maybe most important of all, e-commerce with conversion rate. All the apps, sites and stores are built on top of percentage measurements of what happened when a resource was served to a person.
I’m old enough to recall that when web analytics emerged, they weren’t questioned much. People were just happy to have easy measures and reporting that were not log-files. The acceptance of these web measurement standards fueled everything we have today, which is a lot!
But…what if these measures were not the best way to understand the digital behavior of people?
The idea of “direct traffic” is one case in point of the fallacy of traditional web analytics. It’s really “traffic of an unknown origin” or, more technically, traffic with no referrer data in the request. How about the lack of time-based measurements? Or weighting “bounce rate” and other metrics as a percentage? Don’t even get me started about raw page-view and impression metrics. In many respects these metrics raise more questions than they answer.
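To make the “unknown origin” point concrete, here is a minimal sketch of how a collector might label where a visit came from. The function name `classify_origin` and the label strings are hypothetical; the only real conventions assumed are the HTTP referrer and the `utm_source` query parameter commonly used for campaign tagging.

```python
from urllib.parse import urlparse, parse_qs

def classify_origin(landing_url, referrer):
    """Label a visit by the evidence available, not by assumption."""
    # Campaign tagging on the landing URL is the strongest signal.
    params = parse_qs(urlparse(landing_url).query)
    utm_source = params.get("utm_source", [None])[0]
    if utm_source:
        return f"campaign:{utm_source}"

    # Otherwise fall back to the referring host, if one was sent.
    if referrer:
        host = urlparse(referrer).hostname
        if host:
            return f"referral:{host}"

    # No tag and no referrer: we do not know the origin.
    # Traditional reports would call this "direct".
    return "unknown_origin"
```

The last branch is the whole argument: a missing referrer tells us nothing about intent, so labeling it “direct” claims more than the data supports.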
The biggest miss these days is that nothing much has changed in twenty years of web measurement while the web has changed from static to dynamic, from desktop to mobile, from directories to search engines to social. From latent to real-time. From text to video. The question of web analytics modernity is really no question at all. Web analytics are not modern.
Well…How Did I Get Here?
Before we look ahead let’s look back. The history of web analytics is an interesting one for many reasons. There is much to learn. Web analytics were some of the first SaaS products. They were differentiated by user experience, feature sets and pricing. Anyone building a SaaS business now would be wise to learn from this rich and extensive history.
Early web analytics products included WebSideStory’s Hitbox, which exchanged analytics for ad traffic; WebSideStory went public in 2004. Visual Sciences used data visualization as its special sauce and was purchased by WebSideStory in 2006. Early examples of SaaS verticalization with Coremetrics and mobile differentiation with Flurry led to acquisitions (by IBM and Yahoo respectively).
It is, however, Google buying Urchin in 2005 and making it free that was the seminal moment in the history of web analytics. And last but not least, we can’t leave out the amazing story of Omniture. OMTR had the youngest CEO of a public company in 2006, bought almost all its competitors and was itself bought by Adobe in 2009. It remains the dominant player in Enterprise and has allowed Adobe to expand into data-fueled businesses and the marketing cloud.
Little has evolved from these launching points. This is partly because it is extremely hard to innovate on top of these old and active deployments. Case in point, the “UTM” parameter everybody knows in GA stands for “Urchin Tracking Module.” The Omniture “eVar” has been a thorn in the side of conversion measurement for well over a decade.
There have been attempts at advancement on the advertising side, but third-party tagging has a myriad of issues. As someone who was able to validate Moat, IAS, DV and others against my own server data, I can tell you how difficult accurate cross-domain measurement is. Even tech behemoths Google, Facebook and Amazon have ad measurement challenges. Last-click attribution is still used by most marketers because a better, more trusted system has never emerged in the past 15 years!
One evolution worth noting is the connection of data to experience. This is an important evolution fraught with its own challenges, but it is not the same thing as measurement. Dynamic experiences, headless CMS, personalization and customer journey orchestration are applications of data. It is important to note that even here, the same underlying collection and measurement challenges exist as in web analytics, because oftentimes the same systems or technologies are being used to feed data for decisions and/or measure success. There are a myriad of false promises from salespeople around these technologies because the data they use is poor.
Lastly, an area worth noting that has emerged and advanced is infrastructure and application performance measurement. But this is a far cry from measuring the behavior of people.
If we agree that the web has changed dramatically and, more importantly, that the way people use the web and the reasons people use the web have changed dramatically, it is easy to see that analytics have not advanced at anywhere near the same rate. In fact, they have advanced very little at all. The questions then become: what needs to happen in order for web analytics to advance? What do analytics look like for the modern web? How do analytics need to evolve in the future?
Modernizing Web Analytics
The basics of web analytics are not going to change. Event collection & logging will always be necessary. The ability to gain insight about people and their behavior through the data will always remain the most important use case. Activating relevant messages and experiences to people from this learning will always be the way to create value from data and analytics.
There are many blind spots but two glaring needs for modernization: context and prediction.
The value of context might best be represented in what happened to display advertising on the open web over the last decade. As display/banner ads moved away from context-based signals to serve ads (AdSense & Contextual Networks) and moved almost entirely to cookie-based identifiers (RTB & Ad Exchanges), the performance of the ads (as measured by click-thru rates, itself a flawed metric but sadly the only analytic proxy for relevance ever created) fell off a cliff. The explanation was simple. Relevance cannot be delivered without understanding context. So as consumers we’ve gotten a sea of irrelevant display ads over the past decade as we surfed the web.
As it is with banner ads, one of the largest missing pieces of web analytic data is context. We can log an event that I visited a product page looking to buy something. It’s more valuable to be able to add the context of that product to the event or the context of the event to that product. What is the price-band of the product? Was it discounted? Is it a commercial or residential product? What are the main attributes of this product that differentiate it from others? This type of contextual information attached to the page view creates a wealth of knowledge about me as a customer. This contextual data can be used to make sure my experience is the most relevant it can be, ensuring the highest degree of certainty of a great experience that will deliver value to the brand.
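As a rough sketch of what attaching product context to a page view could look like, consider the following. Everything here is an assumption for illustration: the `Product` fields, the `price_band` thresholds, the `enrich_page_view` helper and the sample SKU are hypothetical, not taken from any particular analytics product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Product:
    sku: str
    price: float
    list_price: float
    segment: str            # "commercial" or "residential"
    attributes: tuple = ()  # differentiating attributes

def price_band(price):
    # Thresholds are arbitrary placeholders; a real business would set its own.
    if price < 50:
        return "low"
    if price < 500:
        return "mid"
    return "high"

def enrich_page_view(event, catalog):
    """Return a copy of the event with product context attached, if known."""
    product = catalog.get(event.get("sku"))
    if product is None:
        return dict(event)  # nothing to add; leave the event as collected
    return {
        **event,
        "product_context": {
            "price_band": price_band(product.price),
            "discounted": product.price < product.list_price,
            "segment": product.segment,
            "attributes": list(product.attributes),
        },
    }

# Hypothetical catalog entry and raw page-view event.
catalog = {
    "HVAC-200": Product("HVAC-200", 349.0, 399.0, "commercial",
                        ("energy-efficient", "quiet")),
}
event = {"type": "page_view", "sku": "HVAC-200", "url": "/products/hvac-200"}
enriched = enrich_page_view(event, catalog)
```

The raw event only says a page was viewed. The enriched event says a visitor looked at a discounted, mid-priced commercial product, which is something a business can act on.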
There are lots of contexts about me that have been available to web analytics for a long time but are rarely used. What is my local time when I’m on the site? Am I at work or home? What device am I on? Now imagine being able to stitch contexts together! What geographic areas look at the highest value products? How many of our commercial customers are visiting us on their mobile? What did our best customers look at on our site last week? This can all be known data, but usually it is scattered across reports populated by traditional web analytics data models. This data is not brought together in context, where it becomes meaningful to use in a way that drives business forward beyond simple segmentation and Boolean logic. Boo.
We need to be able to associate metadata with dimensions or use additional dimensions via integrated systems. We need to associate responses with behaviors. We need to take data and key it together to unify it around a context. We need to be able to extract contextual data from web analytics but also have web analytics set contextual data to other systems.
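One way to picture “keying data together” is a simple join of behavioral events to customer records on a shared identifier. This is only a sketch under assumptions: the `customer_id` key, the field names and the sample records are all hypothetical, and a real deployment would do this in a warehouse or CDP rather than in memory.

```python
from collections import Counter

def unify(events, customers):
    """Attach customer context to each event, keyed on customer_id."""
    unified = []
    for event in events:
        customer = customers.get(event.get("customer_id"), {})
        # Prefix customer fields so they cannot collide with event fields.
        context = {f"customer_{key}": value for key, value in customer.items()}
        unified.append({**event, **context})
    return unified

def mobile_visits_by_segment(unified):
    """Count mobile events per customer segment ('unknown' if unmatched)."""
    return Counter(
        row.get("customer_segment", "unknown")
        for row in unified
        if row.get("device") == "mobile"
    )

# Hypothetical data: web events from analytics, customers from a CRM.
events = [
    {"customer_id": "c1", "device": "mobile", "url": "/a"},
    {"customer_id": "c2", "device": "desktop", "url": "/b"},
    {"customer_id": "c1", "device": "mobile", "url": "/c"},
    {"customer_id": "c3", "device": "mobile", "url": "/d"},
]
customers = {
    "c1": {"segment": "commercial", "region": "Northeast"},
    "c2": {"segment": "residential", "region": "West"},
}

unified = unify(events, customers)
counts = mobile_visits_by_segment(unified)
```

Once the two sources share a key, a question like “how many of our commercial customers are visiting us on their mobile?” becomes a one-line count instead of a comparison across separate reports.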
The good news is that much of the plumbing to leverage context in these ways has been built. APIs and Customer Data Platforms (CDPs) have laid the groundwork, and modern analytics systems like Heap and Snowplow allow data ownership, easy custom event creation using JSON and the ability to productize via that enrichment. Help is on the way.
From the dawn of (analytics) time, data collected has been used to create reports to let you know what people did in the past. That is all well and good to know. The question people really want answered from analytics however is ‘what can the past teach me about the present and the future?’ What we need is the ability to use historical data to act in real-time and to predict.
Around 2010 things got a little bit better on the analytics front with advances in tech to enable real-time logging. Real-time streaming of data made real-time analytics possible. Companies like Chartbeat ushered in the real-time analytics wave and it was cool. My old company Yieldbot advanced the use of real-time data to prediction around this time as well. We used real-time data for look-ups vs historical data to make real-time decisions and then used the outcomes of those decisions to get smarter for the next decision.
Nearly 10 years later, advances in big data processing and streaming continue while unit costs for infrastructure keep shrinking. More advanced use of cloud infrastructure enables companies to leverage data warehousing, massively parallel processing and ready-made tooling to build, train and deploy machine learning models. So much is open sourced.
Binary classification, multiclass classification and regression capabilities need to be part of every business process, but like anything else data related, their outputs depend on the quality and volume of the data being input, as well as the acumen of the people who will instrument and leverage these systems.
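For readers who have not seen binary classification applied to web behavior, here is a deliberately tiny sketch: a logistic regression trained by gradient descent to estimate the probability that a session converts. The features (pages viewed, whether a discounted product was viewed), the toy sessions and the hyperparameters are all made up for illustration. In practice you would more likely reach for an established machine learning library and far more data.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x, weights, bias):
    """Estimated probability that a session with features x converts."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Toy sessions: [pages_viewed, viewed_discounted_product]
sessions = [[1, 0], [2, 0], [1, 1], [3, 0], [6, 1], [8, 1], [7, 0], [9, 1]]
converted = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic(sessions, converted)
p_engaged = predict_proba([8, 1], weights, bias)   # long, discount-seeking session
p_bounce = predict_proba([1, 0], weights, bias)    # single page view
```

The output is a probability rather than a report. That is what lets a system decide, during the session, whether to show an offer, route to sales or do nothing.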
Even traditional web analytics has plenty of data for prediction, but a dearth of data scientists, along with competing priorities, has continued to limit the use and application of prediction with web analytics. There are also data issues with sampling, storage and, most of all, data ownership/governance that have gotten in the way of using web data for prediction. According to Gartner, only 11% of companies use any data at all for prediction. If we were to measure the use of web behavioral data for prediction, it is likely 1 or 2% of all companies. This is a giant miss. The platforms built on prediction from web behavior include Google, Amazon and Facebook. Businesses can’t expect to compete without platforming this data for their own use.
It is now more possible than ever to use prediction to make proactive decisions. It’s almost inexcusable not to be using predictive models. It’s also inexcusable not to have your behavioral web data in an owned database where this data exploitation can be accomplished.
While innovation in web analytics has slowed in recent years and almost all of the install base of web analytics lacks modernity, the pace of change in consumer behavior keeps accelerating. It is time for business leaders to evolve too, in how they use technology and serve their customers. No longer should businesses run on descriptive and diagnostic reporting and the KPIs contained therein. The modern business must use first-party customer behavioral data to feed systems and tools where nimble and high-value prediction can take place across contexts and dimensions of behavior. The strategic business conversation needs to be about what will happen and how we get smarter over time.
It is the largest businesses and brands that will benefit the most from this. Rather than be scared of upstart competitors who are digitally focused or “DTC,” incumbent businesses must appreciate and leverage the fact that they possess not only the competitive advantage of high volumes of first-party data but the unique ability to “close-the-loop” on customer behavior. This data can spin the flywheel businesses need to achieve growth from data-led initiatives. Modernizing customer behavioral data collection and event logging is the first step in getting this flywheel spinning.