I recently went to a family gathering where a cousin of mine who was just finishing up some schooling for aviation maintenance and repair was asking me what bearing binary had on programming. He confessed that binary logic, as you'd encode on a circuit board with gates, was a difficult concept to get his head around, and he wondered how much of that was really necessary in "modern" programming.

The tendency for most programmers (or even humans) in this situation is to think for 15 seconds, and then come up with an analogy so that we don't have to speak directly about programming. We do this because it's difficult to discuss the topic at hand without using esoteric jargon whose precise definitions would only be understood by other experts. But I think it's an interesting challenge to attempt to explain the role of binary in computing in a way that's relevant to modern computing and modern programming without the use of metaphor.

There's a major conceptual gap between how computer hardware works and how computer software works, and I'll start by saying that this gap exists for many people with CS degrees and even many professional programmers. Understanding what is going on inside a computer isn't a requirement for doing my job, but the lack of that understanding is a source of ignorance both in the programming community and at large, so even though it isn't my specialty, I'll attempt to start there.

It is true that computers are, at their most basic levels, "zeroes and ones." This description of computing does not really impart any knowledge to anyone; except where it is used as some philosophical tool to claim that complex systems can be built on simple rules (like the Universe, we think!), it doesn't really give you any understanding. In the very early days of computing, the zeroes and ones corresponded to holes in a card which would either allow or prevent dowels from passing through them, allowing complex things to be quickly aggregated and tabulated. This idea was actually borrowed from the textile industry, from a machine called the Jacquard loom, where a card with holes in a certain pattern would allow hooks to go through and grab certain threads to create complex patterns quickly and uniformly. The canonical "computing" case is Hollerith's census machine, which sped up the tabulation of the 1890 census considerably and, through a later consolidation of companies, led to the formation of IBM.

Technology has advanced in the subsequent 120 years; these holes in a card eventually became mechanical switches, then electric switches, and finally levels of voltage running through a circuit. Because the logic of modern computers is still binary in nature (i.e., zeroes and ones, hole or no hole, on or off), binary is still somewhat central to the operation of a computer. At a basic level, computers have various types of hardware components that allow them to take bits (a single zero or one) or collections of bits and determine whether both inputs are on (called AND), or whether either input is on (called OR), etc.
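
As a minimal sketch of those two operations, here is how they look in Python, which exposes the same bitwise AND and OR the hardware provides; the specific numbers are just examples:

    # Python applies & and | to every bit position of a number at once,
    # which is what a processor does to a whole word.
    a = 0b1100
    b = 0b1010

    print(format(a & b, "04b"))  # 1000: on only where both inputs are on
    print(format(a | b, "04b"))  # 1110: on where either input is on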

It's beyond the scope of what I'm trying to do here, but an electrical engineering course will show you how, with very few of these basic operations (in fact, only one is necessary), you can design hardware that will perform all of the basic operations of logic, and how this bit logic turns into mathematical operations like addition and multiplication on groups of bits that are treated as single binary numbers. Modern processors are still designed to load certain-sized collections of bits, called words, into slots called registers, and then perform simple operations on them. The size of these words is important for finding things stored in local memory, where addresses are just a numerical offset in bytes from 0, to the extent that the word size is actually advertised in consumer information: 32 bit vs 64 bit processors today, or 16 bit vs 32 bit back in the early 90s.
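
To give a flavor of that claim without the electrical engineering, here is a small sketch in Python (the function names are mine, and real hardware uses transistors rather than code) of how a single operation, NAND, can be combined into the other basic gates and then into one-bit addition:

    def nand(a, b):
        # NAND: off only when both inputs are on.
        return 0 if (a and b) else 1

    # Every other gate can be built out of NAND alone.
    def not_(a):    return nand(a, a)
    def and_(a, b): return not_(nand(a, b))
    def or_(a, b):  return nand(not_(a), not_(b))
    def xor(a, b):  return and_(or_(a, b), nand(a, b))

    def half_adder(a, b):
        # Add two one-bit numbers, returning (sum bit, carry bit).
        return xor(a, b), and_(a, b)

    print(half_adder(1, 1))  # (0, 1): one plus one is binary 10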

A processor loads words into its registers from memory, finding those words based on the values of words already loaded in its registers. It performs actions on words in various registers in the same way: directed by words that represent instructions, themselves loaded from memory. The hardware our computer interacts with is generally controlled in a similar way; a simple method of hardware control allows you to modify the behavior of a device by writing different values to specific addresses in memory.
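
Here is a toy sketch of that loop in Python. The instruction set is entirely invented, but it shows the idea of words in memory being fetched and interpreted as instructions that move and combine other words:

    # A toy machine: program and data share one memory, and the processor
    # just fetches a word, treats it as an instruction, and acts on it.
    LOAD, ADD, STORE, HALT = 1, 2, 3, 0

    memory = [
        LOAD, 0, 15,    # put the word at address 15 into register 0
        LOAD, 1, 16,    # put the word at address 16 into register 1
        ADD, 0, 1,      # add register 1 into register 0
        STORE, 0, 17,   # write register 0 back to address 17
        HALT, 0, 0,
        40, 2, 0,       # addresses 15-17: two inputs and a slot for the result
    ]

    registers = [0, 0]
    pc = 0              # program counter: the address of the next instruction

    while True:
        op, a, b = memory[pc:pc + 3]   # fetch the next instruction word group
        pc += 3
        if op == LOAD:
            registers[a] = memory[b]
        elif op == ADD:
            registers[a] += registers[b]
        elif op == STORE:
            memory[b] = registers[a]
        elif op == HALT:
            break

    print(memory[17])   # 42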

This might seem chicken-and-egg, but conventions during the booting process let your particular computer control which words are loaded and what is done with them. Memory itself has become quite a complicated hierarchical system as processors have gotten faster, but you can think of any device that stores data as memory. The RAM that is frequently cited is a fast type of memory used as a half-way house for data coming from slower types of memory, or used by running programs as an area to put the results of complex and costly calculations for quick re-retrieval (like making undo/redo fast in Photoshop by storing each state in memory and swapping between them instead of re-running the complex filter). This is the memory that is addressed numerically and constrained by your word size: if you have more bytes of memory than the width of your word can encode, you cannot address that extra memory unless you use some clever tricks which make memory access more complex. You can increasingly think of things like the internet as an extended memory for your computer, with access latency (delay) and transfer speeds limited by your internet provider, network congestion, and the speed of light.
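
The arithmetic behind that word-size limit is simple enough to show directly; an n-bit word can name 2^n distinct byte addresses:

    # How many bytes of memory a word of a given width can address.
    for bits in (16, 32, 64):
        print(f"{bits}-bit words can address {2 ** bits:,} bytes")
    # A 32-bit word tops out at 2**32 bytes (4 GiB) without extra tricks.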

All data that we have and use on a computer are represented as some collection of binary numbers that is then interpreted by specialty hardware and software to produce some kind of output, like setting the color of a batch of pixels on a display panel, or the pitch and volume of a sound in some speakers. Take a simple binary encoding of the alphabet, where 0 is "a", 1 is "b", 10 (2) is "c", 11 (3) is "d", etc. Add capital letters, numbers, punctuation marks, etc, and the characters necessary for the English language still fit comfortably into 8 bits, or 1 byte (which gives us 2^8 = 256 values).
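
As a quick sketch, here is a toy version of that (made-up) alphabet encoding in Python, just to show that "text" is nothing more than an agreed-upon mapping from characters to numbers; real systems use standardized tables like ASCII or Unicode instead:

    # A toy encoding: 0 = "a", 1 = "b", 2 = "c", and so on.
    alphabet = "abcdefghijklmnopqrstuvwxyz"

    def encode(text):
        return [alphabet.index(ch) for ch in text]

    print(encode("cab"))                               # [2, 0, 1]
    print([format(n, "08b") for n in encode("cab")])   # the same values as 8-bit patterns
    print(2 ** 8)                                      # 256: the values one byte can hold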

Photos have also historically advertised bit depth, i.e. "16 bit color" or "24 bit color"; these bits correspond to the amount of data used to encode each pixel (the dots that make up an image). Back in the days of 8-bit color, we had only 256 possible colors for each pixel, which is actually very limiting if you want a smooth gradient from one color to another. These days, photos can even have "alpha channel" values, a level of translucency that can be changed on a per-pixel basis. Movies can be thought of as a series of frames encoded in a similar fashion, with audio encoded as sound frequencies at a particular sample rate.
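
Here is a small, purely illustrative sketch of what those bit depths mean, packing 8 bits each of red, green, blue, and alpha into a single 32-bit number per pixel; the helper and the packing order are mine, not any particular file format's:

    # 8 bits per channel: red, green, blue, plus an alpha (translucency) byte.
    def pack_rgba(r, g, b, a=255):
        return (r << 24) | (g << 16) | (b << 8) | a

    print(2 ** 8)    # 256 possible colors per pixel at 8-bit depth
    print(2 ** 24)   # 16,777,216 possible colors at 24-bit depth
    print(format(pack_rgba(255, 128, 0), "032b"))  # one opaque orange pixel as 32 bits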

Real file formats exploit statistics or even biological limitations (like ditching sounds we can't hear or color differences we cannot perceive) to achieve drastically smaller file sizes than the naive descriptions above, and these techniques are central to modern computing and to many types of modern programming. But at its simplest, programming is just writing a description of what data to load into the processor and what to do with that data. To people unfamiliar with programming, this can represent another conceptual leap that is not appreciably distinguishable from magic. How does typing a URL and hitting enter on one computer start a process which shows me a webpage loaded off the internet? How does this process correspond to zeroes and ones?

This is an extremely common activity these days, and it's complex enough that if you start to understand how this might be accomplished, you might believe me when I say that everything else your computer does is done in a similar fashion.

First, some groundwork on how a programmer uses the internet. Your operating system generally takes care of how to interact with the different hardware on your computer, and presents a simpler, virtual, high-level interface in place of the actual code which interacts with the peripheral. These high-level operations are a collection of commands that operate on certain memory addresses (or interact with other hardware in other ways) to perform specific actions, like opening a connection to another computer.
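
As a concrete sketch, Python's standard socket library exposes exactly this kind of high-level operation; the host and port here are just illustrative:

    import socket

    # Whatever network hardware is installed, the operating system lets a
    # program say "connect to this computer" and hands back a handle.
    connection = socket.create_connection(("example.com", 80), timeout=5)
    print(connection)   # an object standing in for an OS-tracked resource
    connection.close()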

A collection of binary instructions, called a routine, can be run with arbitrary "arguments"; so a simple addition routine could take two number arguments to add together, with the result being the sum. A routine to open a connection to another computer will take the address of that computer, and the result is a number which either corresponds to an error that occurred while connecting or to a resource, tracked by the operating system, that represents the new connection. A bunch of related routines can be collected into a library, which can then be consolidated into other libraries that trade power for simplicity and provide easier interfaces for a programmer.
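
Here is a rough sketch of both conventions using Python's standard library: the low-level connect routine either hands back an OS-tracked resource or signals an error, while a higher-level library rolls connecting, asking, and reading into a single call. The host is again just an example:

    import socket
    import urllib.request

    # Low level: either we get a connection resource back, or an error.
    try:
        conn = socket.create_connection(("example.com", 80), timeout=5)
        conn.close()
    except OSError as err:
        print("connection failed:", err)

    # The same ground, several library layers up: one call does it all.
    with urllib.request.urlopen("http://example.com/", timeout=5) as response:
        print(response.status, len(response.read()), "bytes")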

It's this layering process that allows programmers to get complex things done very quickly with very little code, and it also affords a programmer some ignorance about how things are done, provided they understand what is being done by the library. That sounds like a bad thing, but it's actually very important, as libraries can hide unimportant differences in things like specific peripheral hardware.

Since contacting a computer that could be in Japan can take a very long time compared to the operations your processor performs (which generally happen on the order of nanoseconds), the OS generally goes off and lets some other program run while your program is waiting for that operation to finish. This is one of the main jobs of modern operating systems, and one of the ways our computers appear to be doing many things at once when they can only actually do one thing at a time per processor.
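
Here is a small illustration of that idea; the slow operation is simulated with a sleep rather than a real network call, but while one thread is blocked waiting, other work proceeds on the same processor:

    import threading
    import time

    def slow_network_call():
        time.sleep(1)    # stand-in for waiting on a faraway server
        print("network reply arrived")

    def other_work():
        total = sum(range(1_000_000))
        print("other program finished its sum:", total)

    t = threading.Thread(target=slow_network_call)
    t.start()
    other_work()         # runs and finishes while the first thread is still waiting
    t.join()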

Back to your web browser. Your URL (which is actually some binary encoding of text) is examined by code in the web browser. The "http://" portion names the protocol we will be using to communicate with the server we are contacting. The next portion, before the first slash, is a domain name, which can be translated into a numeric computer address via the Domain Name System (DNS). Generally, your computer will ask another server (configured in your internet options) what numerical address this name corresponds to. Once it gets the real address, it connects to that server and asks it for the remainder of the URL; so "http://google.com/search?q=foo" connects to the server at google.com and asks it (using the specified protocol) for "/search?q=foo", which the server at google.com then interprets and responds to with some document.
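
Here is a condensed sketch of those steps using Python's standard library; I've swapped in example.com to keep the request and response simple, and a real browser does far more than this:

    import socket
    from urllib.parse import urlsplit

    # Split the URL into its parts.
    url = urlsplit("http://example.com/search?q=foo")
    print(url.scheme, url.hostname, url.path, url.query)  # http example.com /search q=foo

    # Ask DNS for the numeric address behind the name.
    address = socket.gethostbyname(url.hostname)
    print(address)

    # Connect to that address and ask for the rest of the URL.
    conn = socket.create_connection((address, 80), timeout=5)
    request = (f"GET {url.path}?{url.query} HTTP/1.1\r\n"
               f"Host: {url.hostname}\r\nConnection: close\r\n\r\n")
    conn.sendall(request.encode("ascii"))
    print(conn.recv(200).decode("ascii", "replace"))  # the start of the server's reply
    conn.close()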

That document is in a particular format that contains all manner of instructions for the structure, layout, related media, and textual content of the document. Your browser at this point loads the many resources that the document references, like images and scripts, and then presents the final collection of resources and documents as a page with links you can click and images you can see. Some of this process is actually extremely complex, but it's all based upon libraries which draw pixels into a buffer to be displayed on the screen.
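
As a small sketch of just one piece of that work, here is how a program might scan a document for the other resources it references, with Python's built-in HTML parser standing in for the much more involved machinery inside a real browser:

    from html.parser import HTMLParser

    class ResourceFinder(HTMLParser):
        # Note the tags and attributes that point at other resources.
        def handle_starttag(self, tag, attrs):
            for name, value in attrs:
                if (tag, name) in {("img", "src"), ("script", "src"), ("link", "href")}:
                    print("would also fetch:", value)

    document = """
    <html><head><link href="style.css"><script src="app.js"></script></head>
    <body><img src="logo.png">Hello.</body></html>
    """
    ResourceFinder().feed(document)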

My job is to write software that turns that URL piece (/search?q=foo) into some document. In modern terms, that "document" can be increasingly complex, to the point where the amount of resources and scripts it loads makes it more like an application running within the browser than a document. The truth about binary logic and the description of computing above is that none of it makes an explicit daily impact on my work. Even though text encoding remains a major issue (as many different text encodings exist and are encountered in web applications), the specifics of each encoding are unimportant (the one I described is fictional), and the method of converting between them is hidden behind multiple levels of libraries.
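
For what it's worth, here is a tiny real-world example of that encoding issue: the same text becomes different bytes under different encodings, and the library calls hide all of the table lookups involved:

    text = "héllo"
    print(text.encode("utf-8"))             # b'h\xc3\xa9llo'  (6 bytes)
    print(text.encode("latin-1"))           # b'h\xe9llo'      (5 bytes)
    print(b"h\xc3\xa9llo".decode("utf-8"))  # back to 'héllo'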

Despite that, I think binary logic, and most of the story I've told above, are things worth knowing for all programmers, and even for most people who want to understand, at least in some small part, the way the technology we've come to rely on actually functions. I hope what limited understanding it may impart can help to philosophically combat the flippancy with which the immense body of work embedded within modern computing is dismissed as "zeroes and ones" by those who fear their own ignorance.

Sep 5 2011