Platt Perspective on Business and Technology

Connecting into the crowd as a source of insight and market advantage – 11: reconsidering market demographics

Posted in macroeconomics, strategy and planning by Timothy Platt on August 5, 2012

This is my eleventh installment in a series on connecting into the crowd as a source of insight and market advantage, but with an important difference – doing this in ways that explicitly allow and support measurement of costs and of value returned in the interactive online context (see Macroeconomics and Business, postings 77 and loosely scattered following, for Parts 1-10.) I add here that in a fundamental sense this is also a direct continuation of a separate posting that I added to this blog about a month ago: On the Importance of Gathering Marketing Insight from Unexpected Demographic Directions.

So far in this series I have discussed data-testable hypotheses, and the need to seek out the right types of data, in sufficient quantities, so that you can statistically test them. More recently I have been discussing a range of factors and considerations that go into establishing a valid data set and matching the right types of data to the right statistical tests, so as to meet those tests’ functional requirements. Now I turn to a set of issues that have, in a fundamental sense, only really developed since the advent of the internet:

• Data selection, and the potential for data overload given all of the data source options available: internally developed and third party sourced.
• And the challenge of finding and using data that can specifically address the hypotheses – the marketing questions – that you need answered.

And while the basic issues expressed there have always been factors in marketing analysis, two developments have created new traps that businesses can fall into:

• New and emerging channels of online connectivity, and of interactive online and social media, and
• The explosive growth of information, as raw data, coming from them.

I begin this posting with the issues of data overload and of filtering out the right data for consideration, as unbiased random data sets or at least close approximations thereof. And I begin with the problems that can arise from in-house sourced data in general, and from automatically generated web site activity log files in particular – and with a well-known historical example.

Web sites create opportunities to collect seemingly endless amounts of data: on which web pages are visited and for how long, where visitors click from in getting there, where they click to when leaving, what their IP addresses are and more – much more. Some of this is anonymous as to the identity of the site visitor, but with the interactive web, more and more avenues of opportunity are added for connecting visitor identity to online activity, and for assembling increasingly complex profiles, at both an anonymous demographic level and an individualized level. And this all feeds into online and server-based databases, and into web activity log files that can be converted from delimited text format (with separate data points delimited from each other by some distinctive character such as a semicolon) to database file formats.
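As a rough illustration of that conversion, the Python sketch below loads semicolon-delimited log lines into an in-memory SQLite table using only the standard library. The log layout (timestamp, IP address, page, referrer, seconds on page), the sample lines and the load_log function name are all hypothetical, assumed for illustration; real server log formats vary and are usually configured separately from any marketing study.

```python
import csv
import io
import sqlite3

# Hypothetical log layout, assumed for illustration only:
# timestamp;visitor_ip;page;referrer;seconds_on_page
SAMPLE_LOG = """\
2012-08-01T09:15:02;203.0.113.7;/products;/home;42
2012-08-01T09:16:40;198.51.100.23;/checkout;/products;95
2012-08-01T09:20:11;203.0.113.7;/about;/products;12
"""

EXPECTED_FIELDS = 5


def load_log(text, conn, delimiter=";"):
    """Parse delimited log lines into a 'visits' table; return rows loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits "
        "(ts TEXT, ip TEXT, page TEXT, referrer TEXT, seconds INTEGER)"
    )
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(fields) != EXPECTED_FIELDS:
            continue  # skip blank lines or lines with the wrong field count
        ts, ip, page, referrer, seconds = fields
        rows.append((ts, ip, page, referrer, int(seconds)))
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_log(SAMPLE_LOG, conn), "rows loaded")
    query = "SELECT page, COUNT(*), SUM(seconds) FROM visits GROUP BY page"
    for page, visits, seconds in conn.execute(query):
        print(page, visits, seconds)
```

Note that the resulting table can only hold the fields the logging was configured to record, whatever questions you later want to ask of it.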

Seemingly vast and even open-ended-for-scale pools of raw data are available from this. But:

• The content, format and precise source points of data collected from web sites are often determined separately from, and a priori to, any specific marketing study design or other data usage considerations.
• This means that the specific types of data collected might not perfectly match what would, at least ideally, be required for addressing the specific hypotheses – the specific marketing questions – that you seek to answer.
• Here, that is not a matter of how the data fields are labeled, but rather of what is actually collected and aggregated. And that includes any collection biases that might be introduced depending on precisely where and how this data is gathered, and on who actually contributes to it.
• So marketing analysts frequently find themselves using what I think of as surrogate data. That can mean data that is overtly at least somewhat different from the data the analyst would ideally use, or data whose differences are obscured by unconsidered collection bias that skews what is actually gathered in aggregate.

And the specific example I would cite here is the by now very familiar one of web page visits, with data on them used as a surrogate for transaction activity and monetizable value.

• No one is going to make a purchase online, or because of information and motivation gained from an online experience, without first visiting the web pages from which they would make their purchase decisions.
• But as the first big dot-com bubble proved, it is a lot easier to gather page visit counts than it is to gather the data needed to directly measure sales and return on investment – at least in a Web 1.0 context.
• And that bubble built up and burst because so many online businesses and their managers and owners simply treated their page hit surrogate data as an exact equivalent of the sales, sales potential and return on investment data they actually needed to address their hypotheses. And they were wrong.
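A small arithmetic sketch can make the page-hit trap concrete. The Python below uses invented, purely hypothetical figures for two pages and shows that ranking pages by visit counts and ranking them by revenue per visit can give opposite answers. The PageStats class and rank_by helper are illustrative names, not part of any analytics product.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Hypothetical per-page totals for one reporting period."""
    page: str
    visits: int
    orders: int
    revenue: float

    @property
    def conversion_rate(self):
        # Share of visits that ended in an order.
        return self.orders / self.visits if self.visits else 0.0

    @property
    def revenue_per_visit(self):
        return self.revenue / self.visits if self.visits else 0.0


def rank_by(stats, key):
    """Page names ordered from highest to lowest value of key."""
    return [s.page for s in sorted(stats, key=key, reverse=True)]


# Invented figures, for illustration only.
PAGES = [
    PageStats("/landing", visits=10_000, orders=20, revenue=1_000.0),
    PageStats("/catalog", visits=2_000, orders=60, revenue=4_500.0),
]

if __name__ == "__main__":
    print("by visits:", rank_by(PAGES, lambda s: s.visits))
    print("by revenue per visit:", rank_by(PAGES, lambda s: s.revenue_per_visit))
```

With these made-up numbers, /landing leads on visits (10,000 against 2,000) while /catalog brings in 2.25 per visit against 0.10, so treating visits as a stand-in for value would point attention the wrong way.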

That is an obvious example, certainly in retrospect, but marketing analysts and business planners and strategists in general need to always ask whether there are misalignments and unaddressed assumptions in their data, in how it was assembled and from where, and whether they are in danger of repeating this same basic type of mistake for their own businesses too. And up to here I have only addressed in-house developed and accumulated data – the data that a business should know the most about as to source, reliability and range of applicability.

When third party data is added in, this creates whole new potential for misunderstandings and misalignments as to the precise source demographics this data was derived from – both in general and as data collection bias might have been added in. Even if you are basically collecting from the right pool of potential responders and data sources, precisely how you do that might create participation asymmetries and biases for collection within that group. And with third party data sources, you have fewer resources for identifying and evaluating any such issues.
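One basic check for such participation asymmetries, sketched below in Python under stated assumptions, is to compare the segment composition of the data you received against reference shares you trust for the population you want to describe (census figures or your own customer records, for example). The segment labels, reference shares, threshold and function names are hypothetical. The reference-over-sample ratio returned by poststrat_weights is the usual starting point for post-stratification weighting, though no weighting can recover a segment that is missing from the data entirely.

```python
from collections import Counter


def composition_gaps(sample_segments, reference_shares):
    """Sample share minus reference share, for each reference segment."""
    counts = Counter(sample_segments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty sample")
    return {seg: counts[seg] / total - ref for seg, ref in reference_shares.items()}


def flag_segments(gaps, threshold):
    """Segments whose share differs from the reference by more than threshold."""
    return sorted(seg for seg, gap in gaps.items() if abs(gap) > threshold)


def poststrat_weights(sample_segments, reference_shares):
    """Per-segment weight = reference share / sample share (None if absent)."""
    counts = Counter(sample_segments)
    total = sum(counts.values())
    weights = {}
    for seg, ref in reference_shares.items():
        share = counts[seg] / total
        weights[seg] = ref / share if share > 0 else None
    return weights


if __name__ == "__main__":
    # Hypothetical age-band labels for 100 responses, and assumed reference shares.
    sample = ["18-34"] * 52 + ["35-54"] * 38 + ["55+"] * 10
    reference = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}
    gaps = composition_gaps(sample, reference)
    print("over/under-represented:", flag_segments(gaps, 0.05))
    print("weights:", poststrat_weights(sample, reference))
```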

I wrote in On the Importance of Gathering Marketing Insight from Unexpected Demographic Directions about the increasing value and importance of this flood of data, and of the business intelligence and value that can be developed from it. Here I add the importance of knowing precisely what you are collecting and using, so that you can direct it towards addressing hypotheses to which it actually and accurately applies.

And I finish this posting with a final thought. When you need one precise type of data but your closest readily available surrogate to it is different, and in ways that would mislead you if you simply used it as if it were the data you need, this reflects a data correlation issue. The surrogate data you have at hand does not in fact correlate highly enough with the data you need. And if you can find a way to directly measure and record the data you would need for the hypothesis that brought this distinction to your attention, it is likely that you would derive real value from doing correlation analyses of those two types of data. That knowledge would probably help you too, if not directly in your here-and-now business practices, then longer term in how you collect, organize and evaluate the raw business intelligence data that you do collect.
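The correlation analysis suggested here can start very simply. The Python sketch below computes Pearson's correlation coefficient between a surrogate series (daily page visits, say) and the measure actually needed (daily sales, say); the daily figures are invented placeholders, and pearson_r is an illustrative name. Pearson's r measures linear association only, and a rank correlation such as Spearman's is a common alternative. On Python 3.10 or later, statistics.correlation computes the same Pearson coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series of at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical daily figures, invented for illustration only.
    daily_visits = [1200, 1350, 980, 1500, 1420, 1100, 1600]
    daily_sales = [14, 11, 13, 12, 18, 10, 15]
    r = pearson_r(daily_visits, daily_sales)
    print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

Squaring r gives the share of variance in one series that is linearly associated with the other, so a surrogate with a modest r leaves most of the variation in the target unexplained. Correlations computed across days can also be inflated by shared trends such as seasonality, so even a high r deserves a second look before one series is treated as a substitute for the other.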

When I wrote Part 10 of this series, I included at its end a brief foretaste of what I would focus on here in Part 11. I included in that note the phrase “in a fully interactive context where you do not necessarily know precise demographic profile” of the sources of the information that you would start from. I am going to flip that around in my next series installment to consider the limitations of what we do and can know about the target demographics we would market to. Meanwhile, you can find this and related postings at Business Strategy and Operations – 2 (and also see Business Strategy and Operations.) You can also find this at Macroeconomics and Business.
