Platt Perspective on Business and Technology

Building a startup for what you want it to become 37: moving past the initial startup phase 23

Posted in startups by Timothy Platt on April 12, 2019

This is my 37th installment in a series on building a business that can become an effective and even a leading participant in its industry and its business sector, and for its targeted marketplaces (see Startups and Early Stage Businesses and its Page 2 continuation, postings 186 and loosely following, for Parts 1-36).

I have been discussing a succession of business intelligence-related risk management issues in this series since Part 31, and began discussing the challenges of data anonymization as a part of that in Part 36. My initial goal for this posting is to continue my discussion of that complex topic, at least for purposes of this series.

I began discussing anonymization as a source of risk management concern when handling confidential and personally identifiable information, by pointing out how true, effective anonymization of original data sources is becoming increasingly difficult, and even impossible as an effectively zero risk goal, as big data becomes bigger and bigger and as it is more and more effectively organized into actionable patterns. To briefly reiterate the conclusion that I arrived at in Part 36: the more comprehensive the overall set of data types collected, and the more skillfully and comprehensively they are organized and processed into meaningful actionable patterns, the more likely it becomes that even sets of seemingly anonymous data about some individual source would indicate the values that must have been there in the key individually identifying data fields that were redacted for anonymization purposes.

I then concluded Part 36 by stating that I would offer some thoughts here on how to move beyond this current and growing impasse, where this tool, data anonymization, has so significantly begun to fail us. Then, after addressing that as at least an initial first step response, I said that I would more specifically reconsider the impact that all of this has on:

• Businesses that provide big data as a marketable commodity,
• Businesses that buy access to it (startups included), and
• The ultimate sources of all of this data, with consumers and other individuals prominently included there.

And I added that after addressing those issues, I would circle back in this overall discussion to consider opt-in and opt-out options and systems, and the stealthy collection of more and more data from more and more sources, where neither of those choices is always meaningfully possible. Facebook’s handling of user information comes to mind as a source of cautionary examples there, and I will cite and discuss that business and its practices in this regard when I reach this point in this overall narrative.

• Meanwhile, I begin addressing that new list of topics here, with the question of how data anonymization might at least be made more secure than it is now, as a risk management tool for limiting the liability that a business faces from failures in its security oversight of personally identifiable information.

I begin this by acknowledging what might be the single most important starting point assumption that the developers, managers and users of big data should consider:

• Data anonymization might be important, and even crucially so, for vast numbers of businesses and business models, and ultimately for the consumers whom they would serve too.
• But it can never be made absolutely perfect: absolutely secure from a risk management perspective.
• So any real effort here should be directed towards making this process, and the pools of data assembled from it, as risk-reduced as possible. 0% risk is never going to be possible in the real world for any business or business process, so this type of risk limiting is in fact a realistic goal and one that would meet realistically effective risk management requirements. A realistic and, I add, acceptable goal here should be one of acknowledging that there are specific avoidable and unavoidable risks here, understanding how they arise, and reducing them to an acceptable level where possible, with mechanisms in place for identifying and rapidly remediating any security and confidentiality breakdowns that do occur.

Now, how would I propose actively addressing this challenge? How would I propose carrying out the intentions offered in the above three bullet points and particularly in the third one of that set?

You can only control and minimize the risk faced from anonymizing increasingly comprehensive sets of data, as gathered across larger and larger numbers of individual sources, if you actively test to see if and where it might be possible to infer redacted personally identifiable data field contents from the accumulated patterns of what would still be included as anonymized data. You have to have a team that is dedicated, for at least some significant proportion of their jobs, to actually trying to break the anonymization protections that have been attempted. They would do this by testing to see what they can learn from the data that is included in anonymized, “cleaned” data sets that would breach efforts to protect the identities and other confidential information of that data’s original sources.

• Set up a white hat hacker team for this in-house, or outsource this testing to a reliable third party specialist service provider and preferably one that is bonded and that has insurance coverage included in their consulting agreements, in the event of confidentiality breaches in the data sets that they approve as meeting their due diligence standards.
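As one illustration of what such a testing team might start with, here is a minimal sketch in Python of a uniqueness check on a "cleaned" data set. It assumes that some of the remaining fields (a zip code, a birth year and a gender here) could serve as indirect identifiers when taken together, and it flags every record whose combination of those values is shared by fewer than k records. The field names, the sample records and the function names are all hypothetical and chosen only for illustration; this is a sketch of one simple test, not a complete re-identification audit.

```python
from collections import Counter

def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def records_at_risk(records, quasi_identifiers, k=2):
    """Return records whose quasi-identifier combination appears fewer than k times.

    A record that is unique (or nearly unique) on these fields is a candidate
    for re-identification, even though its name field has been redacted.
    """
    counts = group_sizes(records, quasi_identifiers)
    return [
        r for r in records
        if counts[tuple(r[q] for q in quasi_identifiers)] < k
    ]

# Hypothetical "cleaned" data: names removed, other fields retained.
anonymized = [
    {"zip": "02139", "birth_year": 1975, "gender": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1975, "gender": "F", "diagnosis": "diabetes"},
    {"zip": "02139", "birth_year": 1982, "gender": "M", "diagnosis": "flu"},
    {"zip": "10001", "birth_year": 1990, "gender": "F", "diagnosis": "asthma"},
]

at_risk = records_at_risk(anonymized, ["zip", "birth_year", "gender"], k=2)
for record in at_risk:
    print("Potentially identifiable:", record)
```

In this sample, the first two records share the same combination of values and so hide one another, while the last two are each unique on those three fields and would be flagged. The choice of which fields to test, and of what value of k counts as acceptable, is a risk management decision and not something that the code itself can settle.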

This means looking at older data that is already held in these data repositories as well as looking at new data streams as they come in. In fact, it is the older data, gathered before this issue rose to visible prominence, that might prove to be the most problematical, precisely because it was collected without this risk in mind. That certainly holds where it is mixed with new data and data types as they arrive.

• Ultimately, this is all about looking for, characterizing and understanding, and remediating blind spots in your thinking as to what types of data you actually have and how all of its data fields might connect together to tell a story about its original sources.
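To make the point about older and newer data connecting together more concrete, here is a minimal Python sketch of a simple linkage test. It assumes a hypothetical older anonymized release and a hypothetical newer data stream that still carries names, and it checks whether the fields that the two have in common are enough to tie a name to an anonymized record. All field names, records and function names are invented for illustration, and the exact-match rule used here is a deliberate simplification: a real test would also need to consider approximate matches and cases where several anonymized records point to the same person.

```python
def link_records(anonymized, auxiliary, shared_fields):
    """Pair each anonymized row with an auxiliary row when the match is unique.

    A pair is returned only when exactly one auxiliary row has the same
    values on all of the shared fields. Ambiguous matches are skipped.
    """
    # Index the auxiliary rows by their values on the shared fields.
    index = {}
    for aux in auxiliary:
        key = tuple(aux[f] for f in shared_fields)
        index.setdefault(key, []).append(aux)

    links = []
    for row in anonymized:
        candidates = index.get(tuple(row[f] for f in shared_fields), [])
        if len(candidates) == 1:
            links.append((row, candidates[0]))
    return links

# Hypothetical older release: names redacted, sensitive field retained.
older_release = [
    {"zip": "02139", "birth_year": 1982, "gender": "M", "purchase": "prescription A"},
    {"zip": "10001", "birth_year": 1990, "gender": "F", "purchase": "prescription B"},
    {"zip": "60614", "birth_year": 1968, "gender": "M", "purchase": "prescription C"},
]

# Hypothetical newer data stream that includes names.
newer_stream = [
    {"name": "Person 1", "zip": "02139", "birth_year": 1982, "gender": "M"},
    {"name": "Person 2", "zip": "60614", "birth_year": 1968, "gender": "M"},
    {"name": "Person 3", "zip": "60614", "birth_year": 1968, "gender": "M"},
]

links = link_records(older_release, newer_stream, ["zip", "birth_year", "gender"])
for anon_row, named_row in links:
    print(named_row["name"], "is linked to", anon_row["purchase"])
```

Here only the first older record is linked to a name. The second has no counterpart in the newer data, and the third matches two people and so remains ambiguous. A blind spot of the kind described above would show up as a growing list of unique links as new data types arrive.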

I am going to continue this discussion in the next series installment, where I will explicitly discuss the three participants in any business information-as-commodity transaction: data aggregating, developing and selling businesses; data acquiring and using businesses; and the original sources of all of this data, with that ultimately coming to a large degree from individual consumers and customers. And as noted above, my goal beyond that is to take this line of discussion out of the abstract by citing and at least selectively discussing some real world business examples, Facebook definitely included there.

Meanwhile, you can find this and related material at my Startups and Early Stage Businesses directory and at its Page 2 continuation.
