How Computers Work

Jon Jacky

Here are some concepts and vocabulary that every programmer should know.

Data Representation

Computers are built from physical devices that have two easily distinguishable states: current/no current, light/dark, magnet polarized up/down. These two states are used to represent the two binary digits or bits 0 and 1.

Any number can be represented by a sequence of bits, using a positional notation in base two similar to our usual notation in base 10. For example eleven is coded as the four digit binary number 1011:

   +------ Eights position (8 = 2*2*2 = 2^3)
   |+----- Fours position  (4 =   2*2 = 2^2)
   ||+---- Twos position   (2 =     2 = 2^1)
   |||+--- Ones position   (1 =     1 = 2^0)
   ||||
   1011 = 1*8 + 0*4 + 1*2 + 1*1 = 11

With numbers we can encode any other data. For example, the ASCII code assigns a number to each character on the typewriter keyboard. The numbers are chosen so that letters are encoded in alphabetical order: the code for a is 97, b is 98 etc.

ASCII defines codes for 255 characters (including many special symbols and non-English characters). This requires eight bits. An eight bit unit is called a byte.

Computer Memory

Programs and data are stored in computer memory. Computer memory is a sequence of bytes. Each byte stores contents: an ASCII character, for example.

Each byte has an address. The address of each byte in memory is simply its number in the sequence order: the address of the first byte is zero, the address of the next is one etc.

A byte or sequence of bytes in memory is called a memory location or just a location. Each location can be identified by its address and length.

The contents of a memory location is sometimes called a value or an object.

The contents of one address can be another address. A memory location that stores the address of another location is called a pointer, and is said to point to the location whose address it holds.

Computer Programs

Computers are useful because memory contents can be changed under program control.

A computer is can only execute programs that are written in its native machine language. Machine language programs are sequences of encoded instructions.

Machine language programs are stored in memory, just like any other data.

Some typical instructions say, in effect: "Store these contents at this address", "Copy the contents of this address into that address", "Add the contents of this address to the contents of that address and store the result in this other address", and "Execute the sequence of instructions beginning at this address."

Programmers rarely write machine language programs. Instead they write in high-level languages such as Scheme or Java. Then the high-level language program is translated to a machine language program by another program called an interpreter or a compiler.

Data Types

Each object in memory has an intended interpretation: it is intended to represent a number, or a character, or an instruction, etc. This intended interpretation is called its type.

All memory contents look the same --- they are just bits. It is necessary to keep track of the type of each object so that programs can process them correctly. There are two ways to accomplish this. The translator that writes the machine language program can keep a record of the type stored at each location (C compilers usually use this method). Or, the translator can provide each object with a tag that identifies its type (Lisp translators usually use this method).

Example: Binding Symbols and Values

This Scheme definition

   (define x 3)

defines a variable: a symbol named x with the value 3 bound to it. When the Scheme interpreter executes this definition it creates an object of type symbol that contains the name x and a pointer to an object of type number that contains 3. This diagram shows a possible configuration of computer memory after the definition is executed:

    Address       Contents
    -------       --------

          .              .
          .              .
   22378946         Symbol   (tag to indicate this object is a symbol)
   22378947              x   (first character of symbol name)
   22378948              0   (end-of-name marker)
   22378949       39621564   (pointer to value)
          .              .
          .              .
          .              .
   39621564         Number   (tag to indicate this object is a number)
   39621565              3   (the number itself)
          .              .
          .              .

The actual addresses don't matter, both objects might be anywhere in memory. The only important thing is that the pointer in the symbol object points to the number object. That can be expressed by a more abstract diagram:


          x  ------->  3

Example: Lists

(I explained cons cells, dotted pairs, and lists, referring to diagrams like Figs. 3.1 -- 3.3 on pages 32 -- 33 and Figs. 3.10 -- 3.11 on pages 49 -- 50 in Graham's Common Lisp textbook.)

Jon Jacky, jackyj@evergreen.edu