Platt Perspective on Business and Technology

Connecting into the crowd as a source of insight and market advantage – 10: collecting the right data and analyzing it -5

Posted in macroeconomics, strategy and planning by Timothy Platt on July 21, 2012

This is my tenth installment in a series on connecting into the crowd as a source of insight and market advantage, but with an important difference – doing this in ways that explicitly allow and support measurement of costs and of value returned in the interactive online context (see Macroeconomics and Business, postings 77 and loosely scattered following for Parts 1-9.) And I want to start this by acknowledging that the title of this posting is accurate, but the lead-in to it that I ended Part 9 with could be seen as misleading – depending on how you view and select statistical and other analytical tools. I said at the end of Part 9 that I would turn to look at statistical tests themselves. But my goal here is not to give a compendium of specific parametric and non-parametric statistical tests.

I will, however, discuss some of the core issues that would go into selecting the right statistical tests to meet your needs, and I begin with how data is distributed, and the issues of skew and outliers (see Part 7.)

• Nonparametric tests do not require or depend upon any particular type of data distribution, and as such tend to be more generally applicable.
• Parametric tests, on the other hand, do assume and depend upon the data analyzed fitting particular types of data distribution, and most commonly that means what is called a normal distribution.
• Essentially every statistical test has some data requirements, if nothing else a minimum sample size requirement, with more data required as more variables are considered and as the questions (hypotheses) statistically tested become more complex.

I write this posting with a particular experience in mind, one that dates back to when I was still doing clinical research and teaching medical residents how to do it too. I was working with a group of intelligent, dedicated professionals who were not seeking to cut corners. They were simply trying to run statistical tests on a body of epidemiological data that they had collected for a research study in order to meet their clinical training research requirements: in this case, a small study on the outcome of a particular patient care follow-up approach in the population they worked with as orthopedic surgeons. They pooled their patient-sourced data, collected anonymously as to individual patient, and entered it all into a desktop computer-based statistical software package. They had enough data by data point count to be able to run any of the tests they could have wanted to use, but data volume is not everything. So they began posing their hypotheses and turning them into possible statistical tests, and they began clicking the menu options to run those tests. And they kept getting error messages to the effect that their data did not fit those tests, asking them if they wanted to run them anyway. I was just observing, watching from behind as this went on. They talked about their sample sizes and how those had to be sufficient, and they kept clicking yes and running those tests.

After they were through with that session on the computer they started really looking at the results obtained and asking what they meant; they began translating their statistical test results into, in this case, surgical and post-surgical outcomes and effects. And they stood there saying that none of the test results made any sense given their hands-on experience working with the patients in their study sample.

They had enough data. The problem was that when they looked at its distributions for the variables measured, their data showed significant skew. And I add a second problem that would make their data deviate from that required normal distribution, kurtosis: a measure of how heavy the tails of a distribution are, often loosely described as how tightly bunched up the data is.
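As a rough illustration of the skew and kurtosis just mentioned, here is a minimal Python sketch that computes both from their moment-based definitions, using only the standard library. The function names and the recovery-time figures are hypothetical and made up for this example; they are not data from the residents' study, and a real analysis would more likely lean on a statistics package and a formal normality check.

```python
from statistics import fmean


def central_moment(values, order):
    """Return the central moment of the given order (population form, divide by n)."""
    mean = fmean(values)
    return sum((x - mean) ** order for x in values) / len(values)


def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5; zero for perfectly symmetric data."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis m4 / m2**2 - 3; zero for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical, made-up recovery times in days: most are short, a few are very long.
recovery_days = [3, 4, 4, 5, 5, 5, 6, 6, 7, 8, 9, 12, 15, 21, 40]

# A symmetric comparison set, also made up.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]

for label, data in (("recovery_days", recovery_days), ("symmetric", symmetric)):
    print(label, "skewness:", round(sample_skewness(data), 2),
          "excess kurtosis:", round(excess_kurtosis(data), 2))
```

Values of skewness or excess kurtosis far from zero are a warning sign that a test assuming a normal distribution may not fit the data. How far is too far depends on the test and the sample size, so treat these numbers as a prompt to look closer and not as a pass/fail rule.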

It is easy to select and run tests on statistical analysis software, and certainly once you have your data uploaded to the computer and entered into it. But you have to use the right tests, and you have to be aware of the need to determine what is and is not appropriate first.

My orthopedic residents retested using a more appropriate mix of statistical tests – primarily non-parametric. They reframed the hypotheses they really needed to be able to address with those tests in mind. And their results turned out to be important, as the patients they and their colleagues were serving in that inner city community were significantly different from the very populations discussed in the clinical literature – and they needed different post-surgical care accordingly. So this turned out to be important and not just a student exercise.
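To make the idea of a non-parametric test more concrete, here is a minimal Python sketch of one widely used rank-based test, the Mann-Whitney U test. I am not claiming that this is the test the residents used. The function names and the outcome scores are hypothetical, and the p-value comes from a plain normal approximation with no tie or continuity correction, which is rough for small samples.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U, two-sided p-value) using a normal approximation.

    Assumption: no tie correction and no continuity correction are applied.
    """
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    u = min(u_a, u_b)
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, p_two_sided


# Hypothetical, made-up outcome scores for two follow-up approaches.
standard_care = [12, 15, 14, 40, 13, 16, 18, 11]
revised_care = [22, 25, 19, 60, 27, 24, 30, 21]

u_stat, p_value = mann_whitney_u(standard_care, revised_care)
print("U =", u_stat, "two-sided p (approx.) =", round(p_value, 4))
```

Because the test works on ranks and not on the raw values, a single extreme score such as the 40 or the 60 above counts only as "the largest in its group," which is why tests of this kind cope better with skewed data and outliers.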

• Most of the time when you do marketing analysis, selecting and performing statistical tests on your data, you are not going to be close enough to individual customers, and in large enough numbers, to be able to see whether the results make sense or not. You will not and cannot expect to have alternative, independent empirical bases for determining the relevance or reliability of the results obtained.
• Using the wrong tests and simply going with the results will yield skewed and misleading findings, and that means you are making bad business decisions that you at least start out thinking are mathematically validated.

Whether you maintain statistical analytical expertise in-house in your marketing team, or bring this set of skills and experience in on an as-needed basis through a consultant or other third party service provider, keep the requirements I write of in this series in mind, and certainly do so when bringing in third parties for help developing or running marketing analysis for you.

I am going to switch away from data analysis in my next series installment and back to data collection, and to the issues and challenges of collecting a valid random sample of data that would represent the target populations you are interested in, in a fully interactive context where you do not necessarily know precise demographic profile information to start from. You can find this and related postings at Business Strategy and Operations – 2 (and also see Business Strategy and Operations.) You can also find this at Macroeconomics and Business.
