Platt Perspective on Business and Technology

Usage challenges that drive the evolution and growth of information technology – 1

Posted in business and convergent technologies by Timothy Platt on January 5, 2012

“What came first, the chicken or the egg?” Answering questions can mean identifying and going beyond some automatic assumptions. If you only think in terms of chickens you quickly end up with a quandary. But if you expand your perspective and allow for non-chicken eggs the answer becomes obvious. Dinosaurs laid eggs before there were any birds, fish and amphibians did so before there were any dinosaurs, and fish before there were any land animals or amphibians at all. So eggs per se came well before any first possible chicken did. Why do I start this posting by belaboring the obvious? Because we live and breathe within mazes of assumptions, many and in fact most of which we rarely examine, let alone question. That can, among other things, mean getting caught up in questions that a wider perspective would either easily answer, or reveal to be flawed and without any single valid answer at all.

• Which comes first: new and emergent information technology, or new and emergent applications whose demands force the direction and pace of technology development?

A simple answer to this question, and one that carries a significant measure of truth, is that information technology and its use co-evolve and in effect drive each other’s advancement. The current state of the art for the technology side of this puzzle, and the thrust of its evolution up to now, set expectations as to what can be done now and what is most likely going to be possible soon. Pressure to expand application and usage capabilities to the edge of the technically possible follows. At the same time, information processing needs arise independently of information technology development per se, but as problems that can only be addressed and acted upon by application of robust information technology capabilities. So outside forces step in demanding that next level of technology development too, sometimes calling for revolutionary change as well as evolutionary. Chickens and eggs are simple by comparison.

• Outside, user-provided impetus for new technology always seems to arrive from unexpected and even technically disruptive directions, and the more novel and disruptive it is with regard to previous usage needs, the more profound its impact on this technology overall.

I have written a number of times about the demands and pressures of games in forcing the pace and direction of information technology capabilities. Games have driven the development of easier and more intuitive user interfaces, greater information processing speeds and bandwidth capabilities, and vastly improved graphics capabilities, with all of the technology that involves. This source of technology-driving impetus goes back to the first desktop computers oriented toward the general public, where real-world users immediately started demanding more capable platforms for playing early command line-only games on those seemingly dinosaur-vintage, pre-graphical user interface screens. And it continues today, where the demand is for photo-realistic game screens with ongoing, flexible real-time action, among other advances.

Weather forecasting, real-time market analysis for stock and bond trading, and a range of other demands have also shaped the development and advancement of information technology, with much of that focusing on the development of supercomputers and their computer network equivalents. And of course military and national defense concerns have played a role here too, both for strictly military applications such as the design and simulation of nuclear weapons, and in initiatives that have migrated to the general public sector. The core design and concept of the internet itself began that way, as a national defense-oriented Advanced Research Projects Agency (ARPA) project.

This posting is about a new and emerging outside requirement that is already starting to significantly influence the development of new information technology, both for computer hardware and software and for how networks are organized, optimized and connected together into larger coherent systems: bioinformatics.

Bioinformatics as a whole covers a large range of data types and analytical needs, and this posting is only intended to cite it as a working example of how the dynamics of information technology advancement work. So for this I will only address, and in brief outline, the combined challenges of genomics and proteomics for DNA and protein data analysis respectively, plus the large-scale analysis of mRNA data in transcriptomics. And I begin with some numbers.

• In July, 2007 the cost of completely sequencing all 3 billion base pairs of DNA in a complete human genome totaled some $9 million.
• By July of this year, 2011, that cost had on average dropped some 850-fold, to approximately $10,500, according to the US National Human Genome Research Institute.
• Within the next year and a half or so this cost is expected to drop below $1000 per complete human-size genome.
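To put those figures in perspective, the fold drop is simple arithmetic. A minimal sketch, using only the dollar figures quoted above:

```python
# Illustrative arithmetic only; the dollar figures come from the posting
# itself (US National Human Genome Research Institute estimates).
cost_2007 = 9_000_000  # approximate cost of one full human genome, July 2007 (USD)
cost_2011 = 10_500     # approximate cost, July 2011 (USD)

fold_drop = cost_2007 / cost_2011
print(f"Cost dropped roughly {fold_drop:.0f}-fold in four years")
```

A further drop to $1,000 would represent roughly another 10-fold decline on top of that.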

In 2007 only a few complete human genomes were fully sequenced. Estimates are that by the end of this year the equivalent of some 30,000 will have been completed – just this year alone. That comes to the equivalent of some 82 genomes per day. Equivalent here includes multiple genome sequencings from single individuals where, for example, individual cancer patients are sequenced both for their normal tissue genome and from several sites and tissue samples from their cancers, to look for mutational changes and their roles in their disease processes. In a few years it is likely that a million and more complete genomes will be sequenced annually. And on top of human genome sequencing, genomic analysis is becoming a core tool in systematic biology and in the identification and study of other species – many, many other species.
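The per-day figure above follows directly from the annual estimate, and multiplying it out gives a sense of the finished data volumes involved. A sketch using only the posting's figures (note, as one hedge, that raw sequencing reads typically cover each base many times over, so actual raw data volumes run larger still):

```python
# Back-of-the-envelope scale estimate from the posting's own numbers.
genomes_2011 = 30_000                # estimated genome-equivalents sequenced in 2011
bases_per_genome = 3_000_000_000     # ~3 billion base pairs per human genome

per_day = genomes_2011 / 365
finished_bases = genomes_2011 * bases_per_genome

print(f"~{per_day:.0f} genomes per day")
print(f"~{finished_bases:.1e} base pairs of finished sequence for the year")
```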

Some of genomics, I add, involves volumes of raw sequencing data that dwarf any possible single genome in scale and complexity, and here I cite by way of example the emerging subfield of metagenomics. There, the starting genetic material to be sequenced is drawn from, and represents, a complete if localized and small ecological environment – such as all of the tens of thousands of microbe species in a small sample of sea water, or all of the species and subspecies residing as normal intestinal flora in a human gut. (Note: Escherichia coli is only one of hundreds and even a thousand or more microbial species that live in the normal human body – we are walking biomes.)

And genes in DNA code for mRNA, which in turn codes for proteins; both mRNA and proteins are processed in a complex variety of ways, and all three types of molecule are classified, analyzed and studied coordinately in bioinformatics as a whole. And to add extra complexity, it is true that only a small fraction of any naturally occurring genome actually encodes working genes whose sequences end up represented in mRNA and proteins. But the rest includes an incredibly complex system of regulatory sequences and mechanisms too, that control when and where genes are expressed, at what levels, and in many cases with what splicing when transcribed into functional mRNA.
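The DNA-to-mRNA-to-protein flow just described can be sketched in a few lines of code. This is a deliberately toy illustration, not a bioinformatics tool: the codon table here is truncated to four entries, and real work would use the full genetic code and a library such as Biopython:

```python
# Toy sketch of the central dogma: DNA -> mRNA (transcription),
# then mRNA -> protein (translation). Deliberately truncated codon table.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the start codon
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "UAA": "*",  # one of the three stop codons
}

def transcribe(dna: str) -> str:
    """Transcribe a DNA coding strand to mRNA (replace T with U)."""
    return dna.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Read mRNA three bases at a time, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE.get(mrna[i:i + 3], "X")  # X marks codons missing from our toy table
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

mrna = transcribe("ATGTTTGGCTAA")
print(mrna)              # AUGUUUGGCUAA
print(translate(mrna))   # MFG
```

Even this toy version hints at the real computational problem: genuine analysis must also handle splicing, regulatory regions, and the coordinated comparison of millions of such sequences.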

• In the last few years the cost of genome sequencing has dropped some 200-fold faster than the cost of the computational processing that would go into this bioinformatics data analysis, with the prospect of that discrepancy increasing over the next few.
• That means the cost of the information processing is rapidly becoming the most significant expense in genomics analysis. And I add this same shift in primary cost from data generation to data analysis applies to transcriptomics and proteomics too, and for all of the data-intensive challenges found in bioinformatics.
• Any additional added complexity may seem gratuitous at this point, but consider one crucial practical application: rational drug design – pharmaceutical research and drug development based on precise knowledge and understanding of the biological processes to be modified by those drugs. It calls for precise understanding of how a putative drug would interact with its intended target biological substrate – and with any other systems and processes also present.
• The range of data types that have to be coordinately analyzed keeps growing, just as the volumes of each of these data types do, and some of these computational problems are very important to solve, both for individuals and for society. Just consider the rational drug design initiatives in planning and in progress for curing diseases such as AIDS or cancer.

This posting, up to here, has outlined something of an emerging challenge that I cite as a strong source of impetus for information technology advancement. I am going to follow this up with a second installment in which I will speculate on some of the types of innovation needed for more effectively managing this rapidly growing flood of bioinformatics data.

You can find this and related postings at Ubiquitous Computing and Communications – everywhere all the time.
