Platt Perspective on Business and Technology

Connecting into the crowd as a source of insight and market advantage – 7: collecting the right data and analyzing it -2

Posted in macroeconomics, strategy and planning by Timothy Platt on June 26, 2012

This is my seventh installment in a series on connecting into the crowd as a source of insight and market advantage, but with an important difference: doing this in ways that explicitly allow and support measurement of costs and of value returned in the interactive online context (see Macroeconomics and Business, postings 77 and loosely scattered following for Parts 1-6). I began a discussion of data analysis in Part 6, with a focus on how much data is needed in order to make meaningful statistical analyses, and on how the minimum amount of data needed varies depending on what types of tests you would use and what questions you would seek to answer with them. Here, I look at the data itself, and at effectively limiting the confounding influence of bias and skew in the data you do examine and analyze.

I said at the end of Part 6 that I would discuss issues such as outliers and normal data distributions here, and they are important. But I want to start this discussion with the fundamentals: a consideration of data bias per se, with a simplistic real-world example as a starting point.

You decide to conduct a survey to find the most popular summer food that would be served as a main course at a family picnic. As context, you might for example be collecting marketing data for a local supermarket, to help them develop their sales fliers more effectively and optimize their inventory levels for the picnic-related items that they carry. You have two survey takers collecting data for you: A and B. Unfortunately, neither understands data bias or the issues involved in limiting it. So A goes to her favorite restaurant, a steak house, to find survey responders, thinking this is a good place to find people who are interested in food. B does the exact same thing, going to her favorite restaurant as a place where people interested in and involved in food would congregate. But B is a vegetarian, and more than just that a vegan, so she seeks survey responses in a very different type of setting and faces a very different but equally skewed population: skewed, that is, relative to the food preferences of the entire community that this supermarket serves.

The only responses that A comes away with involve red meat. The only responses that B finds involve grilled tofu, wheat gluten (also referred to as seitan) and vegetables. So when they report their findings back for inclusion in the overall survey there is no overlap or even connection in what is reported in their findings.

This is a crude example in many ways, but the basic issue I bring up here is very important, particularly as any survey taker can approach this task with biases and assumptions. They can have biases that stem from what they would prefer their survey to find and confirm. They can have skewing biases and preferences in who they seek out to gather survey data from. Or they can find it easiest to collect data from certain types of potential responder, introducing bias simply through who is available to survey.

• Understand the basic demographics of the population that you would sample from in conducting your surveys, and try to match that, not on an individual selection basis, but on an overall group basis.
• Within these general demographic constraints (e.g. seeking to capture data from approximately equal numbers of men and women, or with proportionate representation across some age range), randomly select participants to collectively match the demographics model that you seek information about.
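As a minimal sketch of what these two points can look like in practice, the following Python draws a random sample whose overall makeup matches a target demographic breakdown. The responder pool, the age bands and the target shares are all hypothetical values made up for illustration, and the function name is my own; this is one simple way to do proportionate sampling under those assumptions, not a complete survey design.

```python
import random
from collections import Counter


def proportionate_stratified_sample(people, stratum_of, target_shares, n, seed=None):
    """Randomly draw about n people so that each group's share of the
    sample matches target_shares (a dict of group -> fraction)."""
    rng = random.Random(seed)

    # Group the available responders by demographic stratum.
    by_stratum = {}
    for person in people:
        by_stratum.setdefault(stratum_of(person), []).append(person)

    sample = []
    for stratum, share in target_shares.items():
        quota = round(n * share)  # how many responders this group should supply
        candidates = by_stratum.get(stratum, [])
        if len(candidates) < quota:
            raise ValueError(f"not enough candidates in stratum {stratum!r}")
        # Selection within each group is random, not left to the surveyor.
        sample.extend(rng.sample(candidates, quota))
    return sample


# Hypothetical pool of potential responders, tagged by age band.
pool = [
    {"id": i, "age_band": band}
    for i, band in enumerate(["18-34"] * 300 + ["35-54"] * 500 + ["55+"] * 200)
]

# Hypothetical demographic model of the supermarket's community.
target = {"18-34": 0.3, "35-54": 0.5, "55+": 0.2}

sample = proportionate_stratified_sample(
    pool, lambda p: p["age_band"], target, n=50, seed=1
)
counts = Counter(p["age_band"] for p in sample)
print(counts)
```

The group-level quotas handle the demographic matching, while the random draw inside each group keeps the surveyor's own preferences out of the choice of individuals.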

In my supermarket survey example, the two data collectors were very clearly making mistakes. If they wanted to capture data relevant to a conventional bricks-and-mortar supermarket, they should have gone to an actual supermarket to seek data and insight from shoppers, and they would be best off going to the supermarket that needed this marketing intelligence. If they were collecting data for an online shopping/truck delivery model supermarket such as Fresh Direct, that option would not be available, at least for capturing data directly from that supermarket’s own customers, as the business does not operate storefronts. And very few supermarkets would be happy to see people questioning their shoppers in support of a competitor’s marketing outreach. So an option such as restaurant-sited surveys might be needed. But you would need to find survey venues that bring in roughly the same demographic spread as would shop at the business you are seeking this data for. And you would have to find venues that were not resistant to allowing surveyors to ask questions, and where the customers themselves would not resent or resist the interruption.

So even with my admittedly cartoonish example, some real challenges come out when you consider actually conducting these surveys. And this gets more complicated when drafting your survey questions so as not to lead or bias responses towards what the marketing business would desire to see.

• So with Part 6 I discussed surveying the right number of people.
• Here I discussed gathering survey data from the right demographics so as to match and represent the types of people you wish to be able to predict behavior for.
• And I have started discussing what you ask and the issues and challenges of not introducing bias there.

For that third bullet point this means asking clear, unambiguous questions that do not prompt responders towards any given answers. But this also, and just as importantly, means asking the right number of questions – enough to gather the information you need, but not so many as to turn people away by imposing too much on them and their patience. Context is very important in that, as the more distracted responders are by their surroundings, or by the pressure to carry on with what they were doing there, the less patience they will have for stopping to give you accurate, thought-out and considered responses to your questions – or even just accurate immediate, first-impression responses. Do not ask survey questions of a parent struggling with a screaming child – unless that is the demographic you seek to capture data from.

• And as previously noted (see Part 6), stick to numerical and Boolean data types for most if not all of the statistical analyses that you plan on conducting.
• And as a final bullet point here to add to this now five point list, ask everyone the same questions and in the same order, and in the context of a standardized orienting and explanatory pitch which you develop and test out in advance, to limit the risk of introducing response preference bias.

And with that I am finally approaching a position where I can meaningfully discuss outliers and normal distributions of data – important considerations for when you have data sets that hopefully meet the types of criteria that I have been discussing up to here. And I explicitly note that those criteria have up to now focused on who you survey as your marketing population sample, what you ask, and the context in which you ask your questions. Outliers, normal distributions of data and related factors involve the responses given, as they fit together to form overall response patterns. And you cannot effectively address issues at that level unless and until you systematically account for potential problems that can arise in what you ask and of whom. This leaves me with just one Who and How issue that would go into setting up your marketing studies: the issues and sometimes myths of random sampling and of capturing representative data. I will add that into this discussion in my next series installment, and will then turn to consider outliers, data distribution patterns and related survey outcomes issues in the installment after that. Meanwhile, you can find this and related postings at Business Strategy and Operations – 2 (and also see Business Strategy and Operations). You can also find this at Macroeconomics and Business.
