By Gary Angel
Article Date: 2008-08-11
For many years, marketing professionals have relied on a set of analysis techniques designed to help them understand the demographic and psychographic profiles of their customers and prospects.
These traditional segmentations are usually derived from complex clustering techniques that map rich primary research data (usually survey based) into common groups or profiles. These groups are then given highly descriptive business names and rich descriptions and provide a framework for a wide range of marketing activities. Though such segmentations can be (and are) applied to online customers, companies that have tried to map these segmentations down to the individual level (for targeting or reporting) in the online world have mostly been disappointed.
In Part I of this series, I described the biggest pitfall in extending these segmentations - the near impossibility of mapping demographic and psychographic profiles to visitors about whom we typically know nothing except their online behavior. In Part II, I discussed the advantages and disadvantages of building a behavioral segmentation. In Part III, I covered different strategies for joining survey data to a behavioral segmentation, when each is appropriate, and why the join is necessary at all. In Part IV, I covered basic data transformations for segmentation - focusing on describing visitor-level topic interest. In Part V, I described a Functional approach to building session profiles - and how these session-styles lend a whole new dimension to behavioral segmentation.
In this post, I'm going to talk about time-based attributes and how they can be captured either inside or external to the segmentation.
One of the interesting problems posed by behavioral segmentation is that behavioral profiles introduce an element of time into the analysis in a way that is fundamentally different from that captured in traditional demographic and psychographic analysis. It's true, of course, that age is one of the most powerful demographic variables and it necessarily evolves over time (as can income, zip, attitudes, etc.). But for marketing purposes, all these variables are treated effectively as a snapshot because the rate of change is too slow to impact any marketing decisions or analysis and because the changes are fundamentally exogenous to the business. Neither is true for the variables in a behavioral segmentation.
When you build a behavioral segmentation, you are necessarily relying on a snapshot in time - and the length of that snapshot will have a profound effect on the nature of the segmentation. A visitor segmentation around 1 visit will be fundamentally different than a segmentation around 1 day, 1 week, 1 month, or lifetime tracking. Nor is there a single right answer about the best time-frame for a visitor snapshot to drive a segmentation - the most interesting length is heavily dependent on the shape of the business, the goals of the analysis and the infrastructure for tracking.
For cookie-based visitor segmentation, there is also a finite duration of time in which enough visitor behavior can be consistently tracked to drive a meaningful analysis - and that time period is usually no more than 2-3 months.
What's challenging, from an analytic perspective, is the mix of potential visitor types that any extended time-frame will create in a segmentation. Visitors may be new - arriving at the very end of the segmentation period and having no chance for additional behavior. Visitors may have been new at the beginning of the segmentation period and never have shown additional behavior. They may already have been long-standing visitors at the beginning of the segmentation period and remained constant throughout. Or visitors may have been heavy users at the beginning of the segmentation period but gone inactive as the period progressed. And so on - through an infinite series of possible changes in behavior.
Each of these types of visitors is quite different in nature - and different in ways that can shape the analysis and cause it to miss the mark unless at least some time variables are captured in the mix. In particular, the relationship between # of visits (during the period), visit # (lifetime) and start date turns out to be almost universally important.
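As a rough illustration of those three time variables, here is a minimal Python sketch. The record layout (visitor id, visit date, lifetime visit number) and every name in it are assumptions made for the example, not the output of any particular web analytics tool.

```python
from datetime import date

def time_variables(visits, period_start, period_end):
    """Summarize each visitor's visits inside [period_start, period_end].

    `visits` is an iterable of (visitor_id, visit_date, lifetime_visit_number)
    tuples; this layout is hypothetical and used only for illustration.
    """
    summary = {}
    for visitor_id, visit_date, lifetime_number in visits:
        if not (period_start <= visit_date <= period_end):
            continue  # ignore visits outside the segmentation period
        row = summary.setdefault(visitor_id, {
            "visits_in_period": 0,
            "first_visit_in_period": visit_date,
            "lifetime_visit_number": lifetime_number,
        })
        row["visits_in_period"] += 1
        row["first_visit_in_period"] = min(row["first_visit_in_period"], visit_date)
        row["lifetime_visit_number"] = max(row["lifetime_visit_number"], lifetime_number)
    for row in summary.values():
        # Visits made before the period began: lifetime total minus in-period visits.
        row["prior_visits"] = row["lifetime_visit_number"] - row["visits_in_period"]
        row["new_in_period"] = row["prior_visits"] == 0
        # Days between the first in-period visit and the end of the period.
        row["days_observed"] = (period_end - row["first_visit_in_period"]).days
    return summary
```

A visitor with one visit in the period and a lifetime visit number of 12 is a very different case from a visitor whose single visit is their first, and `days_observed` separates the late arrival who had no chance for additional behavior from the early arrival who never came back.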
The analyst will need to think carefully about the role of time sequencing in the segmentation. For many prospect segmentations, the analysis period will capture several generations of the sales cycle. So a snapshot view can be quite effective in capturing the various permutations of interest and success that may exist.
But for a customer site (trading, operations, banking, support, etc.), a snapshot view that takes little account of time sequence may be a very poor choice. For sites where virtually all behavior is oriented toward long-time repeat visitors, capturing the movement of behavior is all-important.
There are two strategies for capturing behavioral movement: one intrinsic to the segmentation and a second strategy that is external. In the external strategy, the segmentation is typically done on a relatively short time frame and the movement of visitors is captured by recording their flow to and from segment classifications. Using this method, visitors are profiled in terms of what they are doing right now. But one of the most interesting uses of this data is to track the flow of visitors (both individually and as an aggregate) from and to segments.
A simple snapshot segmentation on an operational site would probably capture a combination of interests, tool usages and styles along with some significant measure of the degree of usage. Tracking the flows between these segments can be a powerful reporting tool:
| | Low Usage | Medium Usage | High Usage |
| Low Usage to | 87% | 9% | 4% |
| Medium Usage to | 15% | 80% | 5% |
| High Usage to | 14% | 15% | 71% |
Most high-level traffic numbers are heavily influenced by both acquisition flow (new visitors) and patterns in repeat behavior. Isolating the usage flows of repeat visitors is difficult to do in most web analytic reports - but it's easy and illuminating when done within the context of a broader segmentation scheme.
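A flow table like the one above can be sketched in a few lines of Python. This is only one possible approach: it assumes two consecutive snapshots, each stored as a mapping from visitor id to segment label (a hypothetical structure), and it counts only visitors who appear in both, which is what isolates repeat behavior from acquisition flow.

```python
from collections import Counter, defaultdict

def segment_flows(previous, current):
    """Return, for each previous segment, the percentage that moved to each current segment.

    `previous` and `current` map visitor_id -> segment label for two consecutive
    snapshots. Visitors missing from either snapshot are left out (an assumption).
    """
    counts = defaultdict(Counter)
    for visitor_id, old_segment in previous.items():
        new_segment = current.get(visitor_id)
        if new_segment is not None:
            counts[old_segment][new_segment] += 1
    flows = {}
    for old_segment, row in counts.items():
        total = sum(row.values())
        # Each row is normalized on its own, so its percentages sum to 100.
        flows[old_segment] = {new: 100.0 * n / total for new, n in row.items()}
    return flows
```

Whether visitors who disappear between snapshots should be dropped, as here, or counted in an explicit "inactive" column is a design choice that depends on what the report is meant to show.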
This "external" strategy for segmentation consciously leaves out the role of change from the segmentation scheme, but change can also be baked into the profiles as a fundamental aspect of understanding visitors. In this method, the goal is to create profiles that capture some of the key evolutionary stages of visitors. To do this will typically require either different analytic techniques or - once again - some significant data transformations.
One strategy for capturing the flow of visitor interest and usage over time is to create time-buckets for each user calculated from the visitor start-date (not the beginning of the segmentation). These buckets should replicate all of the key variables (interest/session-styles) by period so that a row of data would look something like this:
Visitor-id, Period 1, Period 1 Interest Variables, Period 1 Session Variables, Period 2, Period 2 Interest Variables, Period 2 Session Variables
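The bucketing step might look something like the following Python sketch. The event layout, the 30-day period length and the variable names are all assumptions for illustration. Note that it treats the first event in the data as the visitor start, which is only as good as the tracking behind it.

```python
from collections import Counter, defaultdict
from datetime import date

def bucket_by_visitor_start(events, period_days=30):
    """Group each visitor's events into periods counted from that visitor's first event.

    `events` is a list of (visitor_id, event_date, variable_name) tuples, where
    variable_name stands in for an interest or session-style variable.
    Returns {visitor_id: {period_number: Counter({variable_name: count})}},
    with period 1 beginning on the visitor's own start date.
    """
    # First pass: find each visitor's earliest event date.
    start_dates = {}
    for visitor_id, event_date, _ in events:
        if visitor_id not in start_dates or event_date < start_dates[visitor_id]:
            start_dates[visitor_id] = event_date
    # Second pass: assign every event to a period relative to that start.
    rows = defaultdict(lambda: defaultdict(Counter))
    for visitor_id, event_date, variable_name in events:
        days_since_start = (event_date - start_dates[visitor_id]).days
        period_number = days_since_start // period_days + 1
        rows[visitor_id][period_number][variable_name] += 1
    return rows
```

Flattening each visitor's nested dictionary into columns, one set per period, gives the row layout described above.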
This foundation captures a certain degree of time - but most analytic techniques will treat each period as essentially equivalent, with no understanding of their fundamental linearity. To capture the element of change, the row can be modified to look more like this:
Visitor ID, Interest Variables, Session Variables, Period 1 Interest Variables, Period 1 Session Variables, Period 2 Rates of Change, Period 3 Rates of Change, etc.
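One simple way to derive the rate-of-change columns is a period-over-period relative change, sketched below in Python. This particular definition is an assumption on my part (differences, ratios or smoothed trends are all defensible alternatives), and the function name is hypothetical.

```python
def rates_of_change(period_values):
    """Convert per-period values into a baseline plus period-over-period rates of change.

    `period_values` is a list of numbers, one per period, ordered from the
    visitor's start. Returns (baseline, rates), where each rate is the relative
    change from one period to the next. A rate is None when the earlier period
    was zero, because the ratio is undefined there.
    """
    baseline = period_values[0]
    rates = []
    for previous, current in zip(period_values, period_values[1:]):
        if previous == 0:
            rates.append(None)
        else:
            rates.append((current - previous) / previous)
    return baseline, rates
```

For a visitor with visit counts of 10, 15, 15, 0 and 5 across five periods, this yields a baseline of 10 and rates of 0.5, 0.0, -1.0 and None. How to treat a return from zero activity (the None case) is something the analyst has to decide before the data goes into a clustering routine.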
With this type of data, an analysis will capture key elements of how behavior evolves over time. In addition to obvious patterns like steady growth, decline, and attrition, patterns like sustained but sporadic usage, seasonal users (the capture of which generally requires additional data transformation), and repeated plateaus may all emerge. It's critical that these periods be relative to the visitor start - not segmentation start.
Mapping the data to visitor start creates much better analysis around patterns of usage but it also introduces some significant complexity. If you are looking for seasonal or event-driven patterns in the data, you will have to specially code for these - since they will not emerge out of any analysis of periodicity. Indeed, where strong seasonal or event patterns dominate behavior, it can be exceedingly complex to handle both over-time usage and the impact of when/how a visitor started.
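To give a sense of what "specially coding" for seasonal or event patterns can mean, here is a minimal Python sketch that adds calendar-based variables alongside the visitor-relative periods. The window names and dates are hypothetical, and this is only one of many ways to carry calendar effects into the data.

```python
from datetime import date

def calendar_flags(visit_dates, event_windows):
    """Count a visitor's visits inside each named calendar window.

    `event_windows` maps a name to a (first_day, last_day) pair; the names and
    dates are supplied by the analyst and are hypothetical here. The result
    also reports which window, if any, contains the visitor's first visit.
    """
    ordered = sorted(visit_dates)
    counts = {name: 0 for name in event_windows}
    for visit_date in ordered:
        for name, (first_day, last_day) in event_windows.items():
            if first_day <= visit_date <= last_day:
                counts[name] += 1
    started_during = None
    if ordered:
        for name, (first_day, last_day) in event_windows.items():
            if first_day <= ordered[0] <= last_day:
                started_during = name
                break
    return {"visits_by_window": counts, "started_during": started_during}
```

The `started_during` value speaks to the when/how of a visitor's start, while the per-window counts capture calendar-driven usage that visitor-relative periods would otherwise blur.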
As with my previous discussions of data transformations around mindshare and session-styles, this post can hardly do more than scratch the surface of the issues and opportunities involved in capturing elements of time and usage in a behavioral segmentation. I hope, mostly, to suggest the potential richness of this type of variable and its potential contribution to segmentation and analysis. For a segmentation of a long-term customer web site, there are few more interesting facts than the nature and degree of visitor change over time. Segments built to include such data can inform a range of tactical and strategic marketing decisions precisely because they represent some of the most salient aspects of your relationship with a web site user!
X Change is just a week away now (thank god!) - so I'll probably be posting about the conference in the next week or two. I know last year's event gave me lots of stuff to talk about - some of which I never quite got around to. I'm sure this year's version will be at least as fruitful and I can always hope that I'll be more productive. But I'm far from finished with a discussion of behavioral segmentation. I've touched on a range of data transformations in these last posts. When I take this series up again, I expect to cover some of the issues surrounding presentation of the data and techniques for making behavioral segmentation more intuitive and accessible.
Gary Angel is the author of the SEMAngel blog - Web Analytics and Search Engine Marketing practices and perspectives from a 10-year experienced guru.

