Read one or more of the on-line Unix Tutorials
Another path that has enhanced the development of computing began further back in history. Shortly after people began communicating, they started looking for ways to communicate with some people in a way that most others couldn't understand (secret codes, encryption). Much later, but still long ago, some of those others decided they'd like to know what was being said, and began the process of code breaking. As computers developed, the encrypters realized that computers could be used to generate more secure codes, and the code breakers realized that more computers could be used to break these codes (look at the history of code breaking during World War II). This community has been and is a major driving force behind the development of state-of-the-art computers, if only as a major market for the largest and fastest machines.
I always find it interesting to look at the people behind ideas and inventions. I recommend that you take a look at my list of people in the history of computing, and Lyle Long's very brief History of Computing, or do a little reading on your own at the library.
Memory (Storage)
Two classes of information are stored:
CPU (microprocessor)
It follows stored program instructions to move, add, subtract, multiply, divide, and compare data. It contains the following elements:
Peripheral Processors
These relieve the CPU from handling special Input and Output operations. The video card on your PC is an example of a Peripheral Processor Unit (PPU).
Bus
It moves data and instructions between the above three units.
Clock
This is the basic driver for operation of the above units. The clock pulse triggers interpretation and execution of instructions. Most instructions take more than one clock pulse to complete. A 133 MHz Personal Computer has a CPU clock that pulses 133,000,000 times each second. If such a machine is well designed and programmed, it could in theory do 133,000,000 additions in a second. This seldom happens in practice.
Computers need to deal with information in orderly lumps. The fundamental lump of information is a bit, which contains either the number 0 or 1. This is done with a switch that is either on or off, or an electrical voltage or current that is either high or low. Anything between these on/off, high/low states leads to a potential for ambiguities and detection errors that are not acceptable in computing. This naturally leads to the use of the binary (base 2) number system for arithmetic on computers.
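To make the base 2 idea concrete, here is a small Python sketch that turns a string of bits into the number it represents and back again. The function names are my own, chosen just for this illustration.

```python
def bits_to_int(bits):
    """Interpret a string of '0'/'1' characters as a base 2 number."""
    value = 0
    for bit in bits:
        # Shift the running value one binary place left, then add the new bit.
        value = value * 2 + (1 if bit == "1" else 0)
    return value


def int_to_bits(value, width=8):
    """Return the base 2 digits of a non-negative integer, padded to `width`."""
    return format(value, "b").zfill(width)


print(bits_to_int("1101"))   # 8 + 4 + 0 + 1
print(int_to_bits(13))
```

Each position in the string is worth twice the position to its right, in the same way that each decimal place is worth ten times its neighbor.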
As we will see, the byte (8 bits) is a nice way to store the information needed to represent the characters we need to output text describing our numerical results. Many computers are structured internally to make accessing data in 8-bit lumps easy. This number of bits lets us represent a set of 256 different "characters", although only 128 are used in the standard ASCII character set (a-z, A-Z, 0-9, various common symbols, and "control" characters). Use of bytes has given rise to common application of the hexadecimal (base 16) number system. Four bits (a "nibble") are needed for each hexadecimal digit, so the contents of any given byte can be represented by a 2-digit hexadecimal number.
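The relation between a byte, its two nibbles, and its two hexadecimal digits can be sketched in a few lines of Python. The helper name byte_to_hex is hypothetical, used only for this illustration.

```python
def byte_to_hex(value):
    """Show one byte (0-255) as two hexadecimal digits, one per nibble."""
    if not 0 <= value <= 255:
        raise ValueError("a byte holds values from 0 to 255")
    high_nibble = value >> 4      # upper four bits
    low_nibble = value & 0x0F     # lower four bits
    digits = "0123456789ABCDEF"
    return digits[high_nibble] + digits[low_nibble]


code = ord("A")                   # ASCII code for the character 'A'
print(code, byte_to_hex(code))
print(2 ** 8, 2 ** 7)             # values in a byte, characters in standard ASCII
```

The character 'A' has ASCII code 65, which is 4 sixteens plus 1, so the byte holding it reads 41 in hexadecimal.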
Thirty-two bits (4 bytes) has been the most popular lump for storing numbers. It can store unsigned integers from 0 to 4,294,967,295, signed integers from -2,147,483,648 to +2,147,483,647 (in the usual two's complement form), or a floating point number with about 7 decimal digits of precision and a big enough exponent to handle most numbers (roughly as small as 10 to the -38 power and as large as 10 to the 38 power).
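The limits of a 32 bit lump can be checked with a short Python sketch. It assumes two's complement for signed integers (which gives one more negative value than positive) and the IEEE 754 single precision layout for floating point; both are common but not universal. The constant and function names are my own.

```python
import struct

UNSIGNED_MAX = 2 ** 32 - 1     # all 32 bits used for magnitude
SIGNED_MIN = -(2 ** 31)        # two's complement: one bit carries the sign
SIGNED_MAX = 2 ** 31 - 1


def round_trip_float32(x):
    """Store x in 4 bytes as an IEEE 754 single and read it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


print(UNSIGNED_MAX, SIGNED_MIN, SIGNED_MAX)
# The stored value agrees with 0.1 to about 7 significant digits only.
print(round_trip_float32(0.1))
```

The second print shows digits that differ from 0.1 after roughly the seventh significant figure, which is the precision limit described above.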
Actually there are usually more than 32 bits of physical memory for each 32 bits of data. A parity bit is included to indicate an even or odd number of 1's in the basic 32 bits of data. This allows the computer to detect corrupted data whenever an odd number of bits has been flipped. Additional check bits let the computer correct a single bit error and detect a double bit error. This detection/correction scheme is common in large fast computers and beginning to appear on personal machines.
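Here is a minimal Python sketch of a single even parity bit guarding a 32 bit word. It is only an illustration with names of my own choosing, not a description of any particular memory hardware, and it does not attempt the error correcting codes mentioned above.

```python
def even_parity_bit(word):
    """Return the bit that makes the total count of 1s (data + parity) even."""
    return bin(word & 0xFFFFFFFF).count("1") % 2


def is_corrupted(word, stored_parity):
    """True when the word no longer matches the parity bit stored with it."""
    return even_parity_bit(word) != stored_parity


data = 0b1011                      # three 1 bits, so the parity bit is 1
parity = even_parity_bit(data)
flipped_once = data ^ (1 << 5)     # one bit flipped
flipped_twice = flipped_once ^ 1   # a second bit flipped
print(is_corrupted(data, parity),
      is_corrupted(flipped_once, parity),
      is_corrupted(flipped_twice, parity))
```

The single flip is caught, but the double flip slips through, which is why more check bits are needed to detect double bit errors.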
One function of integers in computers is to provide an "address" to the CPU of the starting byte in memory where an important piece of information (instruction or data) is located. You don't tend to see machines with 4 billion bytes of main (chip) memory, so the 32 bit integer is generally more than enough. However, you now frequently see computers with more than 4 billion bytes of hard disk space. This can be used by the computer as a form of virtual memory. Modern scientific and 3-D graphics applications can involve manipulation of far more than 4 billion numbers. Although this is not an insurmountable problem with 32 bit addressing, it drives a need for integers using more than 32 bits. Many new machines are designed to store and use 64 bit integers, providing addressing to more than enough memory for any foreseen application.
Scientific computing has also required more bits to represent floating point numbers. Frequently 7 decimal digits of precision is inadequate. The most common solution to this problem is to represent real numbers with 64 bits. This gives approximately 15 decimal digits of precision, and covers powers of 10 between -308 and 308. One nice feature of Fortran 90 is that it provides intrinsic functions (TINY and HUGE) to tell you what decimal numbers can be stored with any given data representation, and intrinsic functions to allow you to specify the decimal precision you need in your calculations.
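You can make the same kind of inquiry outside Fortran. The sketch below uses Python, whose float type is a 64 bit IEEE 754 number on essentially all current machines (an assumption, not a guarantee). The values in sys.float_info play roughly the role of TINY and HUGE.

```python
import sys

info = sys.float_info   # properties of Python's float type

print(info.dig)   # decimal digits that are always preserved
print(info.min)   # smallest positive normalized number, in the spirit of TINY
print(info.max)   # largest finite number, in the spirit of HUGE
# Adding half the spacing between 1.0 and its neighbor is lost in rounding.
print(1.0 + info.epsilon / 2 == 1.0)
```

The last line is a reminder that 15 digits is a hard limit: a change smaller than the spacing between neighboring numbers simply disappears.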
You build the bus to handle the size of data lumps you want to move. A 32 bit wide bus is now common on PC's, but they are switching to the 64 bit bus to support both larger integers and real numbers, and higher data flow rates for the 32 bit representations.
Note that many clock cycles are needed to get a given piece of data from memory to the CPU. However, memory can be set up so that requests can be overlapped to get one result out of memory per clock cycle after flow has started. This technique is called "banking" memory. This action is complicated on PC's by the fact that the CPU is usually running at a different (faster) clock rate than the bus.
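A toy model shows why banking helps. The Python sketch below is a deliberate simplification built on my own assumptions: consecutive words live in consecutive banks, each bank is busy for a fixed number of cycles per request, and at most one new request is issued per cycle. The function name and the numbers are hypothetical, not taken from any real machine.

```python
def cycles_to_read(n_words, n_banks, bank_latency):
    """Count cycles needed to read n_words consecutive words from banked memory."""
    bank_free_at = [0] * n_banks   # cycle at which each bank can take a new request
    issue_cycle = 0                # earliest cycle for issuing the next request
    finish = 0
    for address in range(n_words):
        bank = address % n_banks   # consecutive words sit in consecutive banks
        start = max(issue_cycle, bank_free_at[bank])
        finish = start + bank_latency
        bank_free_at[bank] = finish
        issue_cycle = start + 1    # at most one new request per cycle
    return finish


print(cycles_to_read(100, n_banks=1, bank_latency=8))   # every read waits its turn
print(cycles_to_read(100, n_banks=8, bank_latency=8))   # reads overlap
```

With a single bank each request must wait for the previous one to finish. With as many banks as cycles of latency, a new word arrives every cycle once the first has been delivered.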
Each Add, Multiply, and Divide operation generally takes more than one clock cycle to complete. However, the associated arithmetic units are usually "segmented", so that a specific portion of the operation is performed in a given segment of the unit during any clock cycle. These units can be "pipelined" so that each segment is operating on the results of a different calculation at any given clock cycle. As an example, in the first cycle two numbers (say 3.0 and 4.0) are fed to the multiplier unit and partially processed by the first segment of the unit. In the second clock cycle the partial results from multiplying 3.0 and 4.0 are passed to the second segment while the first segment picks up two new numbers (say 5.0 and 6.0) to start processing. This continues through as many pairs of numbers as you have available, with the net result that after an initial startup time, you get one multiplication result out of your multiply unit every clock cycle.
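The flow of number pairs through a segmented multiplier can be imitated with a short Python sketch. The number of segments (4 here) is an assumption for illustration, and the real partial work done in each segment is not modeled. Only the movement of work through the unit is tracked.

```python
def pipelined_multiply(pairs, n_segments=4):
    """Return (products, cycles) for a multiplier split into n_segments stages."""
    segments = [None] * n_segments   # work item currently held by each segment
    pending = list(pairs)
    products = []
    cycles = 0
    while pending or any(item is not None for item in segments):
        # Each cycle every segment hands its item to the next one,
        # and the first segment picks up a new pair if one is waiting.
        segments = [pending.pop(0) if pending else None] + segments[:-1]
        cycles += 1
        # The item in the last segment finishes during this cycle.
        done = segments[-1]
        if done is not None:
            a, b = done
            products.append(a * b)
            segments[-1] = None
    return products, cycles


results, cycles = pipelined_multiply([(3.0, 4.0), (5.0, 6.0), (7.0, 8.0)])
print(results, cycles)
```

Three multiplications finish in 6 cycles rather than the 12 that would be needed if each one had to pass through all four segments before the next could start. In general n pairs take n + (segments - 1) cycles in this model.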
Machines that are designed to take full advantage of pipelined CPU units are called vector processors. They are typically designed with extended register sets to handle long strings of numbers (vector registers) or with special hardware to feed the CPU with vectors directly from main memory. Vector registers originally appeared in the early Cray designs. Direct vector feeds from memory were first seen in the CDC Star computer designed by Neil Lincoln, and later in his ETA series of computers. The ability to perform these fast operations on long strings of numbers results in significant run time reductions for a wide range of scientific calculations.
Another strategy for speed-up of the CPU is to keep various units busy at the same time. A well-designed chip will, for instance, permit the floating point multiply unit to function while the floating point adder is working on two other numbers and the integer adder is processing a pair of integers. This is a simple beginning to parallel processing, but a true parallel machine contains many full CPU chips.
Recognizing what it takes to keep pipelined or parallel CPUs supplied with data can have a significant impact on the way you program scientific applications.
Address (hex)  Instruction
0              put contents of next word into register 1
4              contains the integer 10
8              put contents of next word into register 3
12             contains the integer 1
16 (10)        copy register 1 to register 2
20 (14)        subtract register 3 from register 1, put result in register 1
24 (18)        multiply register 1 by register 2, put result in register 2
28 (1C)        if register 1 greater than register 3, branch to address 20
32 (20)        copy register 2 to address FFFF
36 (24)        branch to address 10000
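To see what this little program does, you can trace it by hand or let a few lines of Python do the bookkeeping. The sketch below is my own transcription, not machine code. It assumes that the branch target "20" is the decimal address of the subtract instruction, and it treats the final branch to address 10000 as simply leaving the program.

```python
def run_program():
    """Step through the listed instructions using Python variables as registers."""
    reg = {1: 0, 2: 0, 3: 0}
    memory = {}
    reg[1] = 10                      # addresses 0 and 4: load the integer 10
    reg[3] = 1                       # addresses 8 and 12: load the integer 1
    reg[2] = reg[1]                  # address 16: copy register 1 to register 2
    while True:
        reg[1] = reg[1] - reg[3]     # address 20: subtract
        reg[2] = reg[1] * reg[2]     # address 24: multiply
        if not reg[1] > reg[3]:      # address 28: branch back while reg 1 > reg 3
            break
    memory[0xFFFF] = reg[2]          # address 32: store the result
    return memory[0xFFFF]            # address 36: branch away (program ends here)


print(run_program())
```

Register 2 accumulates 10 x 9 x 8 x ... x 1, so under these assumptions the program computes 10 factorial and leaves it at address FFFF.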
How do you get all of this started and stopped? The Operating system handles it.
How do you get all of this done without worrying about binary number machine instructions and keeping track of memory addresses?
If you haven't done so already, read the Introduction to the Class for general background information. Read the section on the fundamentals of Fortran and programming principles.
Maintained by John Mahaffy : jhm@cac.psu.edu