Platt Perspective on Business and Technology

Big data 8: redefining the group demographic 1

Posted in business and convergent technologies by Timothy Platt on May 9, 2013

This is my eighth installment in a series on an emerging capability that has become surrounded by hype, even as it has emerged as a powerfully disruptive societal force: big data (see Ubiquitous Computing and Communications – everywhere all the time, postings 177 and following for Parts 1-7.)

Big data has what can be considered a series of Holy Grail goals – purposes and functional objectives that would be reached through creative, thoughtful use of vast amounts of accumulated data, by identifying and characterizing unexpected or unpredictable patterns in it. I have already written in this series about one of these goals: determining a true demographic of one for, at least potentially, each and every individual member of a larger marketplace community, in order to directly personalize marketing and sales to them and their individual needs and preferences (see particularly Part 1: the emergence of the demographic of one for that.)

A second core goal is to better understand and, I add, to define and parameterize the group marketing demographic, and more generally the group demographic per se. After all, businesses and business marketers constitute only one constituency that sees value in big data and its accumulation and availability.

I have recently been posting to a series: Opening Up the Online Business Model for New and Emerging Opportunity, in which I have been discussing how big data can be mined in developing these more nuanced and business objective-defined group demographics (see Startups and Early Stage Businesses, postings 142 and following.) And my initial thought for this posting was to divide this topic of big data defined and driven group demographics between these two series along a simple set of lines. I would focus on how data is assembled into these data sets and warehouses here, and on how demographics are defined from all of that as a matter of process. And I would focus on how knowledge of those demographics would be used, at least in a business context, in my Opening Up series. Even brief consideration, however, reminded me that that breakdown and division cannot work.

• Standard marketing demographics are defined in terms of standard, a priori classification criteria such as age and gender, home or business zip code, family income levels, and perhaps measures of buying interest such as whether an individual or family subscribes to a cable TV service.
• Big data mining allows data analysts and data users to assemble novel and even uniquely conceived demographics models according to essentially any possible set of shared, correlated traits – and which sets of correlated traits are included and mined for is highly hypothesis-driven.
• So what is collected and organized into a demographics model, and how that model would be used, cannot be separated.
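To make that contrast concrete, here is a minimal sketch in Python. Everything in it is an assumption made purely for illustration: the records, the field names (such as buys_organic and cycles_to_work) and the two helper functions are invented, and they are not drawn from any real data set or marketing tool. The first function assigns a person to a fixed, predefined segment; the second gathers whoever shares the traits named by a particular hypothesis.

```python
# Hypothetical records; every field name and value here is invented.
people = [
    {"id": 1, "age": 34, "zip": "10001", "buys_organic": True, "cycles_to_work": True},
    {"id": 2, "age": 52, "zip": "10001", "buys_organic": True, "cycles_to_work": False},
    {"id": 3, "age": 61, "zip": "11201", "buys_organic": True, "cycles_to_work": True},
    {"id": 4, "age": 29, "zip": "11201", "buys_organic": False, "cycles_to_work": True},
]


def a_priori_segment(person):
    """Standard demographic: categories fixed in advance (age band and zip code)."""
    age_band = "under 40" if person["age"] < 40 else "40 and over"
    return (age_band, person["zip"])


def hypothesis_cohort(records, traits):
    """Hypothesis-driven demographic: ids of everyone sharing all the named traits."""
    return [p["id"] for p in records if all(p.get(t) for t in traits)]


# The hypothesis chooses the traits, and so it defines the group.
cohort = hypothesis_cohort(people, ["buys_organic", "cycles_to_work"])
segments = [a_priori_segment(p) for p in people]
```

With these made-up records the cohort holds persons 1 and 3, who fall into different age bands and different zip codes. The hypothesis-defined group cuts across the standard segments, which is why what is collected and how it is used are so difficult to separate.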

I want to take that out of the abstract with a specific real world case in point, coming out of a data mining team that works for the office of the Mayor of the City of New York.

New York City, like most large urban areas, has and supports an infrastructure for collecting and managing waste, with both regular trash pickups and disposal, and an active recycling program. Connected into both is a system for managing disposal of hazardous waste of all sorts, and that can include recyclable environmentally toxic materials such as lead, cadmium and mercury, and less recyclable materials such as old mixed petrochemical solvents. A goal in this is both to collect all of this so it does not contaminate the land or ground water or pose a health hazard, and to find ways to more effectively recycle it where possible.

And no matter how New York or, I add, any other municipal government sets up its waste collection and management systems, there are always going to be both individuals and businesses that cannot be bothered to dispose of even hazardous waste correctly – and that probably holds true particularly for hazardous waste, as it usually requires special packaging and is generally collected separately from regular trash and regular recycling. In New York City, there are both City owned and managed Department of Sanitation services, and private waste hauling businesses that are licensed to pick up, transport and dispose of these varying materials. And with that as background, I come to my working example.

Sanitation Department workers and others see waste and hazardous waste that has simply been illegally dumped. But it can be difficult to identify where it came from and who disposed of it improperly simply from an examination of the waste materials themselves. As a very serious example of the consequences of this, I cite a specific incident that I remember reading of, where someone disposed of bottles of hydrofluoric acid by hiding them in regular trash bins. The Sanitation Department worker who emptied those cans into the back of his trash truck, and pushed the button to compress this new addition to his pick-up and push it deeper into the truck, broke those bottles when doing so – simply following the usual practices for picking up and carting off standard trash. A cloud of highly toxic, corrosive acid mist rushed out of those broken bottles and he breathed some of it in, destroying his lungs. And he died there and then as a result. I remember reading that he had a wife and small children. When that data mining team from the NYC office of the Mayor set out to sift through the evidence they had in their data warehouse to better pinpoint who was illegally dumping, they were looking for people who create irritating problems that add to costs, and also for the ones who cause real and even tremendous risk for others.

• They wanted to find where the businesses most likely to be illegally dumping waste, and particularly hazardous waste, were located, and they wanted to identify the most likely specific culprits responsible for this, for on-site inspection by agents from appropriate city agencies.
• So they searched their data to find where this was being found – knowing that people who do this rarely just toss these materials in their own trash cans or dumpsters. In the hydrofluoric acid incident that I cited above, someone brought these containers from what was most likely a commercial or industrially zoned area to a residential neighborhood and put them in with the household trash that had been left at the curb for pick-up.
• They also searched the records for businesses that did not have contracts with private haulers licensed to handle and dispose of hazardous waste, and that had not been contacting the City Sanitation Department for guidance and information on how to safely, lawfully dispose of this either.
• Actually, following a process similar to criminal detective work as would be carried out by a Police Department, they sifted through and correlated a fairly wide range of data types. And the result was that they had a list of suspects who were very likely to be illegally disposing of these waste materials. And when Sanitation inspectors made surprise inspections of them, virtually all were caught with sufficient evidence that the city could file charges.
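The kind of cross-referencing described in these points can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the City team's actual method or data: the Business fields, the zone labels, the nearby_zones lookup and the scoring rule are all hypothetical. The sketch keeps businesses that generate hazardous waste but have no lawful disposal route on record, then ranks them by how many dump reports were found in the zones near them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Business:
    name: str
    zone: str                        # hypothetical zoning or district label
    generates_hazardous_waste: bool
    has_licensed_hauler: bool        # contract with a licensed private hauler on record
    contacted_sanitation: bool       # asked the Sanitation Department for guidance


def shortlist(businesses, dump_counts_by_zone, nearby_zones):
    """Rank businesses that have no lawful disposal route on record.

    dump_counts_by_zone maps a zone to the number of dump reports found there.
    nearby_zones maps a business's zone to the zones it could easily reach,
    since dumpers rarely use their own trash cans (an assumed lookup table).
    """
    ranked = []
    for b in businesses:
        if not b.generates_hazardous_waste:
            continue
        if b.has_licensed_hauler or b.contacted_sanitation:
            continue
        score = sum(dump_counts_by_zone.get(z, 0) for z in nearby_zones.get(b.zone, ()))
        ranked.append((score, b.name))
    # Highest score first; ties broken alphabetically by name.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return ranked


# Invented example data, for illustration only.
businesses = [
    Business("Plating Shop A", "industrial-1", True, False, False),
    Business("Plating Shop B", "industrial-1", True, True, False),
    Business("Print Works C", "industrial-2", True, False, False),
    Business("Bakery D", "commercial-1", False, False, False),
]
dump_counts_by_zone = {"residential-1": 4, "residential-2": 1}
nearby_zones = {
    "industrial-1": ["residential-1"],
    "industrial-2": ["residential-2"],
}
candidates = shortlist(businesses, dump_counts_by_zone, nearby_zones)
```

Here candidates lists Plating Shop A first and Print Works C second; the other two businesses drop out because one has a licensed hauler and the other generates no hazardous waste. A real analysis would correlate many more data types than this, and a ranking of this kind only suggests where to send inspectors. It proves nothing by itself.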

To finish this example, I would at least briefly cite a second type of waste that is far less dangerous than hydrofluoric acid as a dumped waste problem, but that is far more common too: restaurant grease. Every restaurant that runs deep fat fryers generates significant and even huge volumes of this waste, and if it is simply dumped down the nearest storm drains, it congeals there and with time creates a thick waxy barrier to water flow. Eventually this can even effectively stop up those drain pipes and cause local street flooding. And it turns out that restaurants that dispose of this grease illegally are also much more likely to be following poor safety and cleanliness practices in removing grease build-up in air vents and from behind stoves, greatly increasing the risk of grease fires. The same type of big data mining used to find business and industrial waste dumpers has been used successfully for identifying restaurants that dump their waste too – and as a side benefit that appears to have cut down on the number of restaurant kitchen fires that the Fire Department has to respond to as well, certainly for those at-risk businesses.

This set of examples is all about assembling and systematically using novel big data-enabled demographics models. And the more data is accumulated and the greater its diversity, the wider the range of data-sensitive questions and hypotheses that can be addressed with it.

It can fairly easily be argued that identifying businesses that illegally dump hazardous and toxic waste should be considered a societally positive goal. But these same tools and capabilities can be used for other and less positive purposes too. I am going to continue this discussion of big data-driven social demographic modeling in my next installment in this series, there considering how this can be used to identify and crush political dissent and open public discussion. Meanwhile, you can find this and related postings at Ubiquitous Computing and Communications – everywhere all the time and its continuation page.
