Platt Perspective on Business and Technology

Rethinking the dynamics of software development and its economics in businesses 2

Posted in business and convergent technologies by Timothy Platt on December 19, 2018

This is the second installment of a thought piece that I began offering here in early October, 2018. My goal, both there and here, has been to at least attempt to shed some light on the economics and efficiencies of software development as an industry and as a source of marketable products, in this period of explosively disruptive change (see Part 1.) And as noted at the beginning of that posting, my goal here is at least in part to:

• Include artificial intelligence agent-oriented software development in this narrative, as a central orienting example of what is being developed toward now.
• And I will at least selectively write of how we have arrived where we are now, while also offering at least a few selectively considered anticipatory notes as to how the trends and underlying forces that have brought us here might move forward too.

Part 1 can be considered a foundation building exercise, entered into in preparation for addressing those two points. I continue developing that foundation for a more focused, artificial intelligence systems-oriented discussion here, by repeating a point that I made in that first posting and by further building upon it:

• “Lean and agile coding begins to break down and can become all but cosmetic in its own right, as the scale of the software that it would be applied to reaches a shadowed, gray area threshold limit where it becomes essentially impossible to at least cost-effectively trace through all of the possible inefficiencies that might have arisen in a software package as a whole.”

I begin building from that by at least briefly discussing object oriented programming, and five learning curve experiences that software developers have gone through leading up to the development of that approach as a means of implementing a basic lean and agile-directed software development model that can (ideally) be more routinely maintained as such. And since object oriented programming is not in and of itself a perfect, or even a perfectible final step toward lean, agile and readily maintained code, I will at least briefly write in challenge of it too.

I begin addressing that laundry list of topics to come with those five above-mentioned learning curve experiences, offering them here in historical order for when and how they first arose.

1. Machine language programming,
2. its more human-readable and codeable upgrade: assembly language programming,
3. early generation higher level programming languages (here, considering FORTRAN and COBOL as working examples),
4. structured programming as a programming language-defining and programming style-defining paradigm, and
5. object oriented programming itself.

I begin this with Points 1 and 2 of that list. Machine language programming and its more human-readable and codeable upgrade, assembly language programming, represented the first and second learning curve steps that were developed when moving beyond literally, physically rewiring computers for each and every computing problem that they would work upon. The increased flexibility that both brought with them can be seen as early and even seminal steps in what has become a multigenerational effort to develop reliable, easy to program and run, flexibly powerful data manipulation and processing capabilities. These two steps in fact marked the beginning of flexible computer programming per se, setting aside for purposes of this discussion Ada Lovelace's pre-electronic computer age invention of computer programming, from her work with Charles Babbage and his mechanically designed prototype analytical engines, only simple models of which were ever actually built.

Both directly machine readable, binary character limited machine language code and its more human user-friendly upgrade to assembly language were entirely hardware dependent and customized to the specific central processing unit and memory storage system in place, and on a computer-by-computer basis. Programs written for one machine could not be transferred for use on any other computer without rewriting their code, unless the two machines involved happened to be identically constructed for essentially all circuitry that would hold or process information that those programs might in any way deal with. And that identical form requirement would also have to include their using identical operating systems for any application programs under consideration, to avoid for example, conflicts in memory allocation calls where an application program might try using and overwriting memory addresses that the operating system in place was already using too and that were reserved for it.

Note that the above-stated potential for conflict between application programs and operating system code also means that if a given computer's operating system was upgraded, now claiming an at least somewhat different set of memory addresses for its use, and an application program that had run successfully on that machine before now sought to use memory spaces newly reserved by the operating system, then that computer would likely crash when attempts were made to run the program on it. In this, think of simply upgrading the operating system, even with what might ostensibly be just a minor upgrade, as in effect meaning trying to run all of the programs that had run successfully on it before, on a new and at least somewhat different machine.
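That failure mode can be sketched in modern terms with a toy model. The following Python sketch is entirely hypothetical: the 16-cell flat memory, the class names and the reserved-address ranges are all my own invention, standing in for the real, silent memory corruption that such a conflict would have produced on early hardware.

```python
# Toy model of an application that hard-codes memory addresses,
# run before and after an "operating system upgrade" that reserves
# an additional address. All names and numbers here are hypothetical.

class ReservedAddressError(Exception):
    """Raised when a program writes into OS-reserved memory."""

class ToyMachine:
    def __init__(self, os_reserved):
        self.memory = [0] * 16               # a tiny flat memory
        self.os_reserved = set(os_reserved)  # addresses claimed by the OS

    def store(self, address, value):
        if address in self.os_reserved:
            # On real early hardware this would silently overwrite OS
            # state and crash the machine; here we surface it explicitly.
            raise ReservedAddressError(f"address {address} is OS-reserved")
        self.memory[address] = value

def run_app(machine):
    # The application was written assuming addresses 4 and 5 were free.
    machine.store(4, 42)
    machine.store(5, 7)

old_machine = ToyMachine(os_reserved=[0, 1, 2, 3])
run_app(old_machine)  # runs fine: addresses 4 and 5 are unclaimed

new_machine = ToyMachine(os_reserved=[0, 1, 2, 3, 4])  # the "OS upgrade"
try:
    run_app(new_machine)
except ReservedAddressError as e:
    print("crash:", e)
```

The same unmodified program works on the old configuration and fails on the upgraded one, which is exactly the sense in which a minor operating system upgrade amounted to a new and different machine.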

Machine code is as user-unfriendly, and as difficult to read and understand for its instruction flow and for its underlying algorithm-encoded logic, as code can be. Assembly language code is a giant step forward from that in readability, programmability and software maintenance, but it is still awkward and difficult, particularly for its lack of higher level, pre-developed programming capabilities among the functional commands it offers. It still runs as close to the hardware as possible, for how it is organized and for how it expresses the programming steps taken, complete with its requirement that programmers explicitly specify the memory addresses that a program written in it would use. As such, both coding paradigms are difficult to follow and even more difficult to update and maintain.

Yes, in principle at least, even early generation assembly language code should have been runnable on newer hardware. But as a matter of practice, hardware manufacturers tended to offer their own versions of that "higher level" programming protocol, connecting into their own entirely machine design-specific programs for converting it to directly machine-readable and executable binary form.

And this brings me to the next step in the progression that I laid out in my above numbered list: early, first generation higher level user-oriented programming languages, starting for purposes of this discussion with two well known early mainstays: FORTRAN and COBOL.

What made these languages higher level? They incorporated several features that made that designation meaningful, and not just a matter of marketing. First of all, to pick up on a detail that I touched upon when briefly discussing machine and assembly language programming, is portability. Machine language and assembly language code, and particularly machine language code, were and I add still are bound to the specific computer systems that specific versions of them were written for. FORTRAN (Formula Translation) and COBOL (Common Business Oriented Language) were explicitly developed to be portable. A programmer would write their programs in one of them, and when those programs were run through a computer, their code would be translated into machine language code specific to that hardware platform by an internal-to-the-system program designed for that translation purpose, and one that would be updated to run on new computer architectures and builds as needed: a hardware-specific programming language compiler.

Putting that in the terms more usually employed to describe this process: a source code program, written and maintained by human programmers in their higher level programming language of choice and largely platform-independent as written, would be loaded into the computer along with any data that it was supposed to process. That program would be converted to object code: directly machine readable and executable binary formatted code, with that processing carried out behind the scenes as far as those programmers were concerned, and with the results run by the computer as such. The built-in tools (e.g. compilers) that carried this out would be hardware platform-dependent. Then when the computer was through running this program with its data, relevant code in the machine would translate its output from binary back to more human readable and friendly form, reversing this process.
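The division of labor just described, one portable source program and many platform-specific translators, can be sketched in miniature. The Python below is a deliberately crude illustration of my own devising: the three-line "source language", the opcode values and both "machine" names are invented, standing in for real compiler backends.

```python
# Toy illustration of portability via platform-specific compilation:
# one source program, translated differently for two invented machines.

SOURCE = ["LET X = 2", "LET Y = 3", "ADD X Y"]

def compile_for(platform, source):
    """Translate the same source program into platform-specific opcodes."""
    # Each backend maps the same high-level statements onto its own
    # (hypothetical) instruction set.
    opcode_tables = {
        "machine_a": {"LET": 0x01, "ADD": 0x02},
        "machine_b": {"LET": 0xA0, "ADD": 0xB0},
    }
    table = opcode_tables[platform]
    object_code = []
    for line in source:
        op = line.split()[0]          # the statement's command term
        object_code.append(table[op]) # its machine-specific encoding
    return object_code

# The programmer maintains only SOURCE; the "object code" differs per machine:
print(compile_for("machine_a", SOURCE))  # [1, 1, 2]
print(compile_for("machine_b", SOURCE))  # [160, 160, 176]
```

The point of the sketch is that the human-maintained artifact never changes; only the backend table, the stand-in here for a hardware-specific compiler, has to be rewritten for each new machine.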

But this new portability is only one aspect of what made these computer programming languages higher level, even if it was a crucially important one. From a more directly human programming perspective per se, it was the other newly developed features that they offered that really mattered. First of all, and here focusing for comparison on assembly language as a semi-human usage oriented form of programming code, the functional commands that it offered as programming resources were all developed with a goal of carrying out explicitly specified single computer processing steps, and very basic ones at that, as take place one elemental computational operation at a time at the machine code level. So citing a memory address specification example, most every version of assembly language included in it a SHIFT command. That command would direct the computer to place the output of the next step of the program that it was running one or more memory address locations over from where it had just put the output of the step just completed, rather than overwriting what was in that memory address and immediately reusing it. That way the value already there could be retained for further use.

These were all very low-level commands, and parsing a program in their terms meant that it rapidly became difficult and then all but impossible to discern the overall logic flow and algorithm design of any given program from them, certainly as that program became larger and as time passed between when its code was first developed and when it might be reviewed and updated. FORTRAN and COBOL were designed to include as basic command term functions, complex, higher level processing tasks that could be specified with single terms and their appended qualifiers by a programmer (e.g. PRINT the output of this program with double spacing: every other line.) These commands were then run through pre-written, optimized and debugged programming code that was included in their appropriate compilers. And to add to that, a number of basic, previously required tasks such as memory address allocation were now taken care of behind the scenes by the computer and its compiler code, allowing programmers to focus on higher level logic flow and coding considerations. But this inclusion of pre-coded programming resources, and this automation away of key machine design-dependent programming complexities, allowing for easier higher level thinking and resulting programming, is only part of the story of what made these languages higher level.
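The double-spaced PRINT example above can be made concrete with a hedged sketch. Both function names below are mine, not drawn from any real FORTRAN or COBOL compiler; the contrast they illustrate is between spelling out every elemental step yourself and issuing one command whose machinery lives in pre-written, already-debugged library code.

```python
# Low-level style: the programmer spells out every elemental operation,
# much as assembly language forced them to.
def print_double_spaced_low_level(lines):
    index = 0
    while index < len(lines):   # explicit loop control
        print(lines[index])     # emit one line of output
        print()                 # emit the blank spacing line
        index = index + 1       # explicit step-by-step advance

# High-level style: one command term with a qualifier, with the looping
# and spacing handled behind the scenes.
def print_report(lines, spacing=2):
    print(("\n" * spacing).join(lines))

report = ["TOTAL SALES: 100", "TOTAL COSTS: 60", "NET: 40"]
print_double_spaced_low_level(report)
print_report(report)
```

Both produce double-spaced output, but only the second reads at the level of the programmer's intent, which is the essential shift that these languages introduced.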

Beyond that, explicitly higher level languages were developed with a goal of making the programs written in them easier to read. COBOL was in fact developed with a goal of making any program written in it readable as English language text, even if stilted text, and both understandable to a non-programmer (at least ideally) and self-documenting for purposes of program maintenance and updates. And picking up on a core language design detail offered in the above paragraph, both languages, and many of the early higher level programming languages in general, were constructed to allow programmers to create what amounted to their own single term complex commands too, as pre-developed reusable subroutines and even more complex pre-developed and reusable subprograms.

But everything included in a larger overall program that was developed using one of these languages was still run as if all of the code involved were thrown into a single same large box, with all of this code at least potentially connecting together, and at all organizational levels. So for example, consider a variable named XP-17 in an included library-sourced subroutine, where it has a specific functional definition (e.g. positive integer values ranging from 1 through 10.) And that same variable name is inadvertently used again with a different allowed numerical value range, or as a different type of variable entirely (e.g. now as a yes or no, Y or N binary variable, or as a text-accepting data field), elsewhere in a program that this subroutine is run in. The first time that XP-17 is defined in the program would set its supported data value type and range, regardless of which of these two conflicting definitions applied. Then the first time the alternative usage of this variable came up, apparent data type errors would arise that would likely create run-time anomalies and errors, and of types that might or might not show in a run-time diagnostics test, as the program as a whole is run. And this would hold true whether that variable in its subroutine form explicitly offers output that would be used by other portions of the overall program outside of itself, or whether it was only there for carrying out calculations internal to the functioning of the subroutine itself, that other parts of the program would (supposedly) never explicitly see.
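That single-large-box failure can be sketched by forcing everything into one shared namespace, much as those early languages effectively did. XP-17 is the hypothetical name used above; I spell it XP_17 below for legality, and the subroutine and application bodies are invented for illustration.

```python
# Everything shares one global namespace, as in the "single large box" model.
XP_17 = None  # one name, about to be given two conflicting meanings

def library_subroutine():
    """Library code: uses XP_17 as an integer counter from 1 through 10."""
    global XP_17
    XP_17 = 1
    total = 0
    while XP_17 <= 10:
        total += XP_17
        XP_17 += 1
    return total

def application_code():
    """Application code: reuses the very same name as a yes/no text flag."""
    global XP_17
    XP_17 = "Y"

print(library_subroutine())  # works in isolation: sums 1 through 10

application_code()
# Any code that now expects the subroutine's integer meaning fails at
# run time, in just the kind of anomaly described above:
try:
    XP_17 + 1
except TypeError as e:
    print("run-time type error:", e)
```

Modern languages avoid this with lexical scoping and module namespaces; the sketch shows what happens when, as in these early languages, there is nothing to keep a library's working variables and an application's variables apart.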

I am going to continue this narrative in a next series installment, starting with programming language evolutionary Point 4 as listed above: structured programming. But to round out this posting, I will offer at least some preliminary thoughts as to what large and larger mean in this evolving context. And I begin doing so by citing what has become a relic detail, but one that nevertheless still holds historical value as a benchmark for where our current, as of this writing, state of the art programming came from. Large meant anything over a few thousand lines of programming code for most all programmers, even when early higher level programming languages such as FORTRAN and COBOL were first used; just one thousand lines of code in a single program was considered large then. And these programs were considered large because of the escalating complexities of developing and maintaining them with the tools at hand: the cryptically uninformative debugging software then available definitely included.

Both hardware and software management constraints put that one thousand line and larger scale out of reach for most all of the electronic computers in use prior to the initial development of those then newer languages. Available and expected scale here, certainly at the hardware level and for the scale of software and data that can be run, and their ongoing explosive growth, have created an ongoing, always expanding challenge both for writing and for maintaining software. And that represents a crucial consideration that should be kept in mind when reading this posting and the next in this series to come, where larger can now realistically mean millions and even tens of millions of lines of code, even for routinely used office productivity and business process management software, and certainly where suites of such programs are involved that have to be able to interconnect. And the programming that drives modern search engines keeps growing open-endedly.

Reconsider my above-cited example of how a program could drift into difficulties when coded in a language such as FORTRAN, with its overused variable name: XP-17. Smaller programs are usually written, debugged and even longer-term maintained by single programmers. Large and vastly larger programs never are. They are created by teams of programmers and even by what amounts to virtual armies of them, with different groups specializing in different task and skill areas that others involved in the effort might not even know, and certainly not with any real expertise. And these different groups tend to have their own specialized terminologies … and all of this holds potential for making the types of variable name conflict that I cite here inevitable, at least in a FORTRAN-like higher level programming language context. And that is only one possible type of conflict that I could have offered here by way of example.

That very specifically leads me to Point 4 of the above to-address list: structured programming, as a next step forward. In anticipation of that line of discussion to come, this will mean at least briefly and selectively delving into how structured programming sought to address problems that still endured from Point 1 and 2 programming approaches in Point 3's early higher level languages, as well as problems that those new programming languages created themselves, certainly as hardware and software capabilities continued to scale up, making new types of problems possible in them.

Meanwhile, you can find this and related material at Ubiquitous Computing and Communications – everywhere all the time 3, and also see Page 1 and Page 2 of that directory.
