Platt Perspective on Business and Technology

Building a startup for what you want it to become 36: moving past the initial startup phase 22

Posted in startups by Timothy Platt on February 8, 2019

This is my 36th installment in a series on building a business that can become an effective and even a leading participant in its industry and its business sector, and for its targeted marketplaces (see Startups and Early Stage Businesses and its Page 2 continuation, postings 186 and loosely following for Parts 1-35).

I have been successively discussing a brief but important set of issues in this series since Part 31 that deal with business intelligence, and particularly where that is originally sourced from individual people (e.g. individual customers) and from other businesses, and where that increasingly includes more and more types and quantities of sensitive and confidential information. And in the course of that I have at least selectively touched on the issues of how this information is gathered, organized, processed and used, both in-house by an original aggregator business and as a marketable commodity that such a business would sell as a product or service, primarily on a business-to-business basis.

And that has led me to the final complex of issues that I would address here in the context of this series, at least as far as this understanding of raw and processed information is concerned, in a business intelligence context. One of the key tools used in safeguarding the security and confidentiality of initial sources of all of this data and certainly as raw data, is to anonymize it, stripping it of personally identifiable markers that could be used to link it to any particular individual source. And according to that approach, most such data would be pooled demographically for use, with a much smaller amount of this excerpted out as anonymously sourced case in point examples.

That noted as background for what is to follow here, my last to-address point from the above-cited topics list that I have been working my way through here, is:

• “And that will mean addressing the sometimes mirage of data anonymization, where the more comprehensive the range and scale of such data collected, and the more effectively it is organized for practical use, the more likely it becomes that it can be linked to individual sources that it ultimately came from, from the patterns that arise within it.”

The bigger that big data becomes, and the more effectively it is organized into actionable knowledge, the more problematical any effort to mask and anonymize its individual sources becomes. Data anonymization has become a basic standard for managing personal privacy, for limiting individual source exposure, and for limiting the liability that can result when that protection fails. Its loss of effectiveness is going to become compellingly and overtly obvious in the coming years.

Simple data anonymization, as achieved by algorithmically stripping out overtly personally identifying and similarly problematical data fields while preserving and aggregating the rest for use, can no longer be presumed to work as hoped for. That means a loss of privacy and a loss of positive control over most any anonymizing process currently in use, and an increased risk for the businesses that would develop and market, or acquire and use, such information resources.

• And this calls for new understandings of data anonymization that would actively promote the development of demographic and other data resources that can remain effectively anonymized,
• And new information management processes and technologies that would work more effectively in a big data context and regardless of how that scales up.

This is important. Traditionally, hacking with its overt theft and use of data from information storage systems, has been considered the one real threat to the anonymity of ultimate data sources. Loss of control of accumulated and maintained stores of credit card account and related personally identifiable account holder information immediately comes to mind for many in that context, and reasonably so.

But anonymization per se as it is currently more routinely carried out, in the risk management-mandated processing of increasingly comprehensive flows and accumulations of individually sourced data, is at least as big a source of threat now.

Let me take that out of the abstract with a simplistic but nevertheless realistic example. Consider a demographics level database resource that includes in it individually anonymized records, that is offered on a business-to-business basis to other enterprises. And in this example, those records include those individuals’ zip codes and the honorific that they use: Mr., Mrs., Ms., Miss and Dr. If a zip code included there covers a large population as would for example apply in most any large densely populated urban setting, this would likely afford significant anonymity for any individual whose data is included there. But consider a small town and its unique identifier zip code, with one physician living and working there. And she is the only one there who actually uses the title Doctor, and its Dr. abbreviation. In that case, any records associated with “Dr.” as an “anonymous” designator could readily and quickly be linked to that one individual.
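The small town doctor example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the made-up zip codes "11111" (a large urban area) and "99999" (a small town), and the helper name unique_combinations are all hypothetical and are not drawn from any real data resource. The sketch counts how many records share each zip code and honorific pair, and reports the pairs that occur exactly once.

```python
from collections import Counter

# Hypothetical "anonymized" records: names and addresses are already removed.
# Both zip codes are invented for illustration only.
records = [
    {"zip": "11111", "honorific": "Dr."},
    {"zip": "11111", "honorific": "Dr."},
    {"zip": "11111", "honorific": "Ms."},
    {"zip": "11111", "honorific": "Ms."},
    {"zip": "11111", "honorific": "Mr."},
    {"zip": "11111", "honorific": "Mr."},
    {"zip": "99999", "honorific": "Mr."},
    {"zip": "99999", "honorific": "Mr."},
    {"zip": "99999", "honorific": "Mrs."},
    {"zip": "99999", "honorific": "Mrs."},
    {"zip": "99999", "honorific": "Dr."},
]


def unique_combinations(rows, fields):
    """Return the combinations of field values that occur exactly once."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


# Any combination listed here points at a single record, and so
# potentially at a single person.
exposed = unique_combinations(records, ["zip", "honorific"])
print(exposed)
```

In this toy data set the only pair that singles out one record is the small town zip code combined with "Dr.", even though no name or address appears anywhere in it.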

Big data, by its very nature, allows for and supports finer detail mapping and understanding of whatever overall data universe, and its sources, is under consideration. That finer granularity in effect turns even the largest and most densely populated community into readily distinguished and identified small towns and villages, to keep with the terminology of my above-offered example. And that increasingly puts all of us that much closer to the more readily identified position of that small town doctor, regardless of the fact that our individual names, home addresses, etc. are redacted from the data as directly offered.
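The shrinking of those small towns can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the eight records, the field names (zip, honorific, birth_year, vehicle) and the helper smallest_group are hypothetical. The sketch measures the smallest number of records that share the same values as more fields are taken into account, a measure often discussed under the name k-anonymity.

```python
from collections import Counter

# Hypothetical field names and records, all sharing one made-up zip code.
FIELDS = ["zip", "honorific", "birth_year", "vehicle"]
ROWS = [
    ("11111", "Mr.", 1970, "sedan"),
    ("11111", "Mr.", 1970, "truck"),
    ("11111", "Mr.", 1980, "sedan"),
    ("11111", "Mr.", 1980, "sedan"),
    ("11111", "Ms.", 1970, "sedan"),
    ("11111", "Ms.", 1970, "truck"),
    ("11111", "Ms.", 1980, "truck"),
    ("11111", "Ms.", 1980, "truck"),
]


def smallest_group(rows, n_fields):
    """Size of the smallest set of rows that agree on the first n_fields fields."""
    counts = Counter(row[:n_fields] for row in rows)
    return min(counts.values())


# Add one field at a time and watch the smallest group shrink.
sizes = [smallest_group(ROWS, n) for n in range(1, len(FIELDS) + 1)]
for n, size in enumerate(sizes, start=1):
    print(FIELDS[:n], "-> smallest group:", size)
```

With only the zip code, every record hides in a group of eight. By the time all four fields are combined, at least one record stands alone, which is the position of the small town doctor.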

• The bigger and more comprehensive the big data in question and the more carefully and thoroughly it is organized and analyzed, with the accumulation of processed knowledge that comes from that, the smaller the small towns of this become. And in this regard, I offer reference here to a series that I wrote to this blog a few years ago: Big Data (as can be found at Ubiquitous Computing and Communications – everywhere all the time as postings 177 and following for its Parts 1-7). And I make particular note here of one installment in that: Big Data 1: the emergence of the demographic of one. I primarily focused there on the more positive side of this, and turn here to address the negative potential in ever-growing big data too. Both sides to that are very real and both will become increasingly so in the coming years.

To round out this posting and its line of discussion, at least for here and now, I conclude it by offering three news and analysis links from the open online literature:

• Once Again With Feeling: ‘Anonymized’ Data Isn’t Really Anonymous: a tech podcast reference.
• Your Anonymous Data isn’t as Nameless as Companies Would Have You Believe, Researchers Say: from the news and current affairs division of the Global Television Network in Canada.
• And Anonymous Browsing Data Isn’t As Anonymous As You Think: from Forbes Magazine, Feb 17, 2017.

Big data and its impact have become essential parts of our day-to-day lives, certainly as they have come to be shaped by our online experience, but also in our more directly real world experiences too. I write here in this series of businesses and their acquisition and use of market-sourced and, I add, marketable data. But I write just as specifically and directly here about all of us as individual consumers and citizens too, as the ultimate sources of so much of that data.

Anonymized data has become a basic tool for safeguarding our individual privacy and confidentiality in all of that, while still supporting our having progressively more personalized experiences with the businesses and other organizations around us that also enter into and shape our overall communities. I am going to continue this discussion in a next series installment where I will at least offer some thoughts on how to move beyond this current and growing impasse, where this tool has so significantly begun to fail us. Then after addressing that, as at least an initial first step response, I will reconsider the impact that all of this has on:

• Businesses that provide big data as a marketable commodity,
• Businesses that buy access to it (startups included), and
• The ultimate sources of all of this data, with consumers and other individuals prominently included there.

And I will also circle back in this overall discussion to consider opt-in and opt-out options and systems, and the stealthy collection of more and more data from more and more sources, where neither of those choices is meaningfully possible.

Meanwhile, you can find this and related material at my Startups and Early Stage Businesses directory and at its Page 2 continuation.
