Primordial Programming Practices

Programming needs to change. I strongly believe that computers hold far more potential for advancing human knowledge than we are currently extracting from them. These advances are increasingly necessary in both speed and size to match the pace of problems we are encountering socially, environmentally, and medically. But current programmers are hardly more than coal miners equipped with shovels and pickaxes, working on messy, tedious, and laborious tasks. We cannot expect these methods to be sustainable or powerful enough to grow with our needs. Much like the Industrial Revolution mechanized human labor, I dream of a similar revolution which shall make current programmers obsolete.

StackOverflow et al. Must Die

StackOverflow is a symptom of how much of a mess we are in. StackOverflow is invaluable to programmers because

we develop programs by repeatedly fixing errors
humans asking humans is necessary when the machine lacks information
it is almost always faster to find the answer online than in the source code

but each of these is a sign that something fundamental is broken.

If you have never encountered the feeling of walking through a maze with no eyes, ears, or arms while programming, please contact me because I would like to know your secret. Some programming languages give me this feeling more than others, but regardless of your favorite language, your operating system, web browser, text editor, terminal, etc. have surely thrown you for a frustrating loop which ended either in a magical fix from StackOverflow or in settling for a workaround. In the best case, you learned why something broke, but you are in no better position to prevent this from throwing others (I mean all other users of the program/language) for a loop. And workarounds let you move forward, but any future encounter will lead to the same frustration. Even if you cannot feel the pain after the 10th time, it is still there and likely slowing progress on your real objective, though earning a paycheck by doing so can ease the pain. And so StackOverflow has been vital to the growth of programming in enabling (and I mean that in the drug addiction sense) us to make progress even when our underlying languages and programs are monstrous mazes with unpredictable behaviors.

Humans are particularly good at filling in missing details or at least retrieving those missing details. For someone asking a question, this is paramount because they could otherwise answer their own question (with reasonable expectation). So why can’t a computer answer our questions (assuming we put them in some form it can process)? Because source code does not reflect the total information needed to reason about a program; some of it is trapped in people’s brains and so it must be communicated in English (or another human language). We are fortunate to have a very fast and global communication system to allow people to share some of this extra information, but this is simply not scalable. Machines also lack the information found in comments. And if we ever hope to build more sophisticated programs, I cannot rely on my own single brain to reason about the entire program. Nor can I rely on N-many other brains each holding a piece of the puzzle, because our communications are only available 8 hours a day, we are bad at multitasking so I can only query a few of them at a time, and there can be a lot of social overhead in dealing with people.

Finding the cause of a problem in your own source code can be reasonably effective if you’ve worked with it recently. Your brain probably still has some model of how it works and so you may be able to fix it quickly. However, in dealing with problems from other people’s programs, it can take far more time to read, understand, and reason about the code than it would to just pop a question into Google. Even if it’s not StackOverflow, you will tend to find clues on forums, mailing lists, IRC archives, etc. which can help you in your investigation. Even if you have to search a few of these arcane text havens, I bet you would still win. In fact, finding the right version of the source code your program is using could take much longer than any of the above-mentioned steps. The combination of hiding information from the machine and the insane programming languages we use today makes it extraordinarily difficult to solve anything without the aid of the person who actually wrote the code (or some transitive person who learned something from him/her). Reading new source code is such a slow activity because we are abstractly evaluating the entirety of the program. More on this later.

I only use StackOverflow as an example here because many forms of this exist, some of which have already been mentioned: mailing lists, forums, IRC, Google Groups, GitHub, etc. Whether these tools came to exist because programming languages don’t have proper ways of encoding all the necessary information one needs or because humans provide a much faster summary of a program than we can achieve by reading source code, I sincerely hope these tools die in the near future (for the purposes mentioned; I enjoy other parts of StackExchange, for example).

Not Enough Information

I have been vague on purpose about what extra information is missing from source code that would be so powerful; I am not sure if the source code is actually missing the information, or whether the execution of a program doesn’t know where it has come from, or something else entirely. For example, even if the source code seems fine, perhaps the error is in your language implementation (GCC vs Clang) or the language specification (C11 vs ANSI C) (we shall ignore things like hardware failures, etc. for simplicity). So in order to gain a full picture of a program’s behavior, we must know which program was used to interpret/compile it. But source code does not specify in a verifiable way which compiler/interpreter it is meant for. Without this information, we lose the ability to correctly blame one party or another. The same applies all the way down to the hardware level and which instruction set you are using. Assuming you have a few years to spare, it may be okay to read the source code of all of the programs running on your machine. But even that would not be enough information.
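As a rough illustration of what recording this information could look like, here is a minimal Python sketch that stores a hash of the source alongside the interpreter it ran under. The function names `build_provenance` and `matches` are hypothetical, made up for this example. Note that this only records the environment; it does nothing to verify that the interpreter itself behaves as specified, which is the harder problem.

```python
import hashlib
import platform


def build_provenance(source_text: str) -> dict:
    """Record a hash of the source plus the interpreter and machine in use."""
    return {
        "source_sha256": hashlib.sha256(source_text.encode("utf-8")).hexdigest(),
        "implementation": platform.python_implementation(),  # e.g. CPython or PyPy
        "version": platform.python_version(),
        "machine": platform.machine(),  # architecture name as reported by the OS
    }


def matches(recorded: dict, current: dict) -> bool:
    """True only if the source and the environment are both unchanged."""
    return recorded == current


if __name__ == "__main__":
    record = build_provenance("print('hello')")
    for key, value in record.items():
        print(f"{key}: {value}")
```

With a record like this, a bug report could at least say which source ran under which interpreter, instead of leaving that detail in someone's head.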

There are two places I see as large information gaps on a computer: the first being language specifications and the second being hardware specifications. These are very long English documents describing how a system should behave – which is absurd! I want to write English about how my program should behave, but this provides zero guarantee as to how it will actually behave! It cannot be verified and it is not scalable for people to leave this information in English. Having all of this information available on a machine is necessary (but not sufficient) to give programmers vastly more powerful capabilities.

Too Much Information

Now that we have all this information available (in my dreams), how will we deal with it? I don’t know, but here’s food for thought.

The essence of programming is to describe some useful transformation from input bits to output bits. For a given machine architecture, I claim that every programming language is as powerful as any other, up to some constant, on an information theoretic level (by most powerful, I mean smallest Kolmogorov complexity). There are no shortcuts and no information comes for free. To compare two programs, we must concatenate each program with its compiler/interpreter/runtime etc. This has a direct relation to what it is like to program in a language and maintain the implementation of that language. I don’t care that language A has a feature that language B lacks, because it only has that feature because it is implemented somewhere else. Similarly, one could implement that feature in B and end up with a seemingly longer program; in actuality, they are almost identical. In this way, we are simply moving information from one pile to another, but not actually winning in the end.
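The constant in the claim above corresponds to the invariance theorem from Kolmogorov complexity theory. As a sketch, assuming both languages are Turing-complete and writing $K_A(x)$ for the length of the shortest program in language $A$ that outputs $x$ (this notation is mine, introduced only for illustration):

```latex
\[
  \lvert K_A(x) - K_B(x) \rvert \;\le\; c_{AB} \qquad \text{for all outputs } x,
\]
% c_{AB} depends only on the pair of languages A and B, never on x.
% Intuitively it is bounded by the length of an interpreter for one
% language written in the other.
```

The constant $c_{AB}$ is exactly the "pile" being moved: a feature missing from $B$ can be supplied by shipping an interpreter for $A$, at a fixed cost that does not grow with the program.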

So can we actually ever reduce the information needed to write a program which does what we want? I don’t think so, but here are four ways to work around it.

Employ Computers

If we can’t reduce the information we need to understand a given program (our brains are only so large) using programming languages, we should shift this information burden onto others. Historically, we’ve done this by offloading it onto other people, but if you’ve been reading thus far, you know I don’t think that approach is very good or scalable. Now if only there were something that could follow directions, be mass produced, work 24/7 without breaking labor laws, remember vast quantities of information, and communicate in a few milliseconds… I only get paid for 8 hours of work per day, but I would be happy to pay a computer or a thousand to work the other 16 hours. Current tools like IDEs, refactorers, version control, etc. are still programs we have to understand and fix when they break. I would compare these to wheelbarrows and lanterns in a coal mine; they may help you see what you’re doing and carry more coal, but you’re still just a coal miner.

Loosen Control

The other way, which actually will reduce the information needed, will not be satisfying to many and will be terrifying to others. We must relinquish control over dictating exactly how a machine should execute and instead leave that burden to the machine, with the added constraint that it can prove it does what we wish. I only speak for myself, but I am pretty sure this will ring true with many others: there is something strangely satisfying about writing C or assembly (or anything else, since we established they are almost the same) because of the utter disbelief that it actually works. I imagine this is the same feeling people experienced when they first programmed computers with switches; that flipping switches in the right order made the machine do something complex, just as you expected. But alas, it is time we stop pleasuring ourselves with such activities and commit to finding a more productive way to utilize computers.

Approximation

Many problems we would like a computer to help us answer can be helpfully answered within a certain degree of accuracy, usually with far fewer resources than would be needed to answer them exactly (e.g. a Bloom filter). So if information is the resource we would like to conserve, why not also seek program descriptions which do what we wish up to some error rate? This lessens the informational burden upon ourselves and also allows us to run the same program multiple times to increase our confidence in its output. Again, the control freaks we have historically been as programmers will take some time getting used to this idea.
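For readers who have not met the Bloom filter mentioned above, here is a minimal Python sketch of one. The class name, the default sizes, and the choice of SHA-256 as the hash are my own assumptions for illustration; real implementations pack the bits tightly and use cheaper hash functions.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit, chosen for readability rather than compactness.
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        # Derive num_hashes positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))


if __name__ == "__main__":
    seen = BloomFilter()
    seen.add("gcc")
    seen.add("clang")
    print(seen.might_contain("gcc"))
    print(seen.might_contain("some-other-compiler"))
```

The trade is the one described above: the filter never stores the items themselves, so it needs far less memory than an exact set, and in exchange it is allowed to answer "probably present" for an item that was never added.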

Architecture

If you accept my proposition that all current programming languages on mainstream architectures are roughly equal in information power, then do all computer architectures face this problem? Or can we find a machine design which is more powerful than our current von Neumann babies? I’m undecided on this aspect because on the one hand a computable function requires a minimum amount of information to encode and would end up in the same lot as the programming languages, but on the other I don’t know enough about other architectures. My hunch is that we would simply see the information moved between piles much like above. Here is a good place to mention, though, that even if we can’t lower the total information needed to encode functions, we can win by strategically placing it in the lowest weighted pile, where piles are weighted by how many human-hours are spent dealing with information in that pile. I think most people have this in mind when they talk about the “power” of programming languages and how language X is much more productive than language Y.

Closing

Programming today is messy, dangerous, laborious, and downright frustrating. I have come across articles from the 70s detailing many of the same frustrations: why programming is not more reliable or efficient than it has been, and how increased computing power has only enabled us to stay sloppy and speed up. I see no excuse for that. I see programming as being a critical skill for most people in the future, but it cannot be the same programming we have today. It will not scale to that many programmers, we will waste too much electricity with too many inefficient programs, people will give up after trying to learn the idiosyncrasies we seem to revel in, and people will move far too slowly to attack our biggest problems in a timely manner.