I’ve been having a bit of a mid-programming crisis recently, and it has started to take the form of a tin foil hat. When you run cat foo.txt, how do you know it won’t call unlink("foo.txt") after outputting its contents? Or how about deleting your entire home directory? What is stopping your IRC client from forwarding all your messages to me? Or maybe it will change your text as it sends it? Why do we trust software, and how should we gain trust in it?
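To make the worry concrete, here is a minimal sketch in C of a cat-like function that does exactly that. The name rude_cat is made up for illustration, and this is certainly not how any real cat is implemented; the point is only that nothing about the interface rules out the last line.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical "cat" that misbehaves: copy the file at `path` to `out`,
 * then delete it. Returns 0 on success, -1 on any failure. */
int rude_cat(const char *path, FILE *out)
{
    assert(path != NULL && out != NULL);

    FILE *in = fopen(path, "r");
    if (in == NULL)
        return -1;

    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            fclose(in);
            return -1;
        }
    }
    int read_failed = ferror(in);
    fclose(in);
    if (read_failed)
        return -1;

    /* The surprise: the caller only asked to see the contents. */
    return unlink(path) == 0 ? 0 : -1;
}
```

From the caller's side, rude_cat("foo.txt", stdout) looks just like an honest cat until you go looking for foo.txt afterwards.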
Read the Source Code!!!
I know, I know. I should download the source, read it front to back and make sure it does what I think it should do, and stop writing a stupid blog post. I can do this for cat because:
- its source code is freely available
- it is written in C, a standard language
- I can read and understand C
And this is one of the cornerstones of why open source software is important, regardless of its cost or licensing. But being open source only covers the first of those items, and reading the source is only feasible for programs of a reasonable length. It simply does not scale to expect users to read the entirety of a program’s source code in order to trust it. While this is obvious for programs much larger than cat, it becomes more of an issue when considering all the code that cat depends on: the compiler you use, the syscalls it makes, which version of the kernel it interacts with, etc. Once you consider all transitive dependencies, my reading list becomes a little too long for my liking.
If cat were written in Brainfuck, would reading the source code be of any help in increasing my trust of it? Definitely not; this would be as helpful as reading the compiled binary (probably less helpful actually). What happens if you write a unique language for each program? I then have to learn the language (by reading its source code or otherwise) and then read the program. Even if I can set aside days to disassemble and read machine code or learn new languages, what about the people who don’t even know how to program at all? Do they simply have to trust some programmer when he says, “No sir, this app will not steal all of your money!”?
OSS Falls Short
I don’t think open source software is sufficient for trustworthy computing. Besides it being insufficient, I read enough code in a day and I hardly want to have to page through every program I want to run. The Free in FOSS is even more useless to me for gaining trust; I don’t care about making modifications and paying a license fee or whatever, I just don’t want Skype erasing my pictures. I don’t see machine code as fundamentally not open source (though it may be inconvenient!) and so we need a way of gaining trust from a much more basic level.
Sandboxes, VMs, OSes
Mobile operating systems seem to have taken permissions like these to heart and require all apps to request their permissions upon installation. This provides the user with a list of scary items the app has access to, like GPS, contacts, microphone, etc. But what’s missing is a permission like, “Yes you can use my microphone to do offline speech recognition, but no you cannot record and upload it”. One answer might be to disable access to uploads, but what if a benign feature needs to upload something? Or perhaps the answer should be to split the app into the smallest pieces which are trustworthy. But can we verify that trust composes? It would be easy to exfiltrate the recorded sound as the input to my next “trusted” app for uploading. All paths point towards permissions being too coarse, as the desirable behavior is a small subset of all programs adhering to the given permissions.
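Here is a rough sketch of that coarseness in C. Everything in it is hypothetical (the flag names, the behavior struct, the two example apps) and it is not modelled on any real mobile OS API. It only shows that a check over permission flags gives the same answer for a benign app and a snooping one whenever they request the same set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical install-time permissions, modelled as bit flags. */
enum permission {
    PERM_MICROPHONE = 1 << 0,
    PERM_NETWORK    = 1 << 1
};

/* What an app does with its permissions (illustrative only). */
struct behavior {
    unsigned needs;          /* permissions the behavior uses */
    bool uploads_recording;  /* does microphone data reach the network? */
};

/* Offline speech recognition that also uses the network for something
 * benign, such as checking for updates. */
static const struct behavior offline_dictation = {
    .needs = PERM_MICROPHONE | PERM_NETWORK,
    .uploads_recording = false
};

/* Records you and uploads the result. */
static const struct behavior eavesdropper = {
    .needs = PERM_MICROPHONE | PERM_NETWORK,
    .uploads_recording = true
};

/* The only question a flag-based model can ask:
 * is every needed permission in the granted set? */
bool permitted(unsigned granted, struct behavior b)
{
    return (b.needs & ~granted) == 0;
}

/* Both behaviors pass under the same grant; the flags alone cannot
 * tell them apart, because uploads_recording is never consulted. */
void show_indistinguishable(void)
{
    unsigned granted = PERM_MICROPHONE | PERM_NETWORK;
    assert(permitted(granted, offline_dictation));
    assert(permitted(granted, eavesdropper));
}
```

Revoking PERM_NETWORK blocks the eavesdropper, but it blocks the benign app too, which is the dilemma described above.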
Stepping aside from trustworthiness for a moment, I would like to comment that only allowing a program to do the absolute least amount necessary is highly desirable for programming as a whole. As a programmer, I would sleep easier at night knowing that the code I deployed to EC2 yesterday doesn’t accidentally spin up 1,000 servers and cost me over $9,000.
There are some tidbits in SELinux which are on the right path, like being able to specify which programs can access a file. Do you not shudder at the thought that any program you run may or may not read/modify/delete your SSH keys? SELinux can let you limit which programs can access certain files (among other things in which I’m not an expert). Yes, this seems dandy, but I don’t think it’s a proper solution because you now only have trust at the layer between program and OS. I want trust to start inside a language, between libraries, and between my own functions.
From the Language
All of the techniques above use some form of nanny to keep watch over a running program and slap it on the wrist when it does something bad. I would much rather have decision procedures at the language level which could reason about all possible executions and determine the security properties of a particular program. Now of course, this will likely be a costly analysis to perform, but (a) security is important and (b) you can always fall back to the nanny approach. This idea of punting on the compile-time analysis if it takes too long and instead performing checks at run-time is something I’d like to expand upon in a future post.
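As a very rough sketch of that fallback, assume some analysis has already produced one of three verdicts. The analysis itself is not implemented here, and the names verdict, mode, choose_mode and guarded_unlink are all invented for illustration. The nanny is a naive prefix check in front of unlink(), which a real monitor would need to harden (it does not handle ".." or symlinks, for instance).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Possible outcomes of a (hypothetical) compile-time security analysis. */
enum verdict { PROVED_SAFE, PROVED_UNSAFE, GAVE_UP };

/* How the program is then allowed to run. */
enum mode { RUN_DIRECT, RUN_MONITORED, REFUSE };

enum mode choose_mode(enum verdict v)
{
    switch (v) {
    case PROVED_SAFE:   return RUN_DIRECT;     /* no run-time checks needed */
    case PROVED_UNSAFE: return REFUSE;
    case GAVE_UP:       return RUN_MONITORED;  /* fall back to the nanny */
    }
    return REFUSE;  /* invalid verdict: fail closed */
}

/* The nanny: a run-time check in front of unlink(). In monitored mode,
 * only paths starting with `allowed_prefix` may be deleted.
 * Returns 0 on success, -1 if refused or if unlink() fails. */
int guarded_unlink(enum mode m, const char *allowed_prefix, const char *path)
{
    assert(allowed_prefix != NULL && path != NULL);

    if (m == REFUSE)
        return -1;
    if (m == RUN_MONITORED &&
        strncmp(path, allowed_prefix, strlen(allowed_prefix)) != 0) {
        fprintf(stderr, "denied: unlink(\"%s\")\n", path);  /* wrist slap */
        return -1;
    }
    return unlink(path);
}
```

When the analysis succeeds, the check costs nothing at run-time; only the GAVE_UP case pays for the monitor on every call.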
Ensuring program security by mandating source code transparency is a social approach which is brittle and easily beaten. Furthermore, human processes like these don’t scale and are not fit for the future of computing. We need security by construction.