Optimizing Code

Assignment:

Read Sections 11.4 and 16.3, and the Web Notes on Least Squares Fitting

New Fortran:

Statement Functions, Internal Functions, CONTAINS

We have already talked about using Fortran 90 intrinsic functions where possible to achieve optimal efficiency. The next step is to activate the optimizing features of the compiler. Just about every compiler these days has options to request various levels of optimization, so always check for this feature. On a Unix-based compiler the level of optimization is typically selected by including the option "-O" followed immediately by an integer (usually in the range 1-3). To request maximum optimization on our workstations for a program "test.f", type:

f77 test.f -O3

You can also let the compiler determine the "best" level of optimization for your program by typing:

f77 test.f -O

I recommend this course when first using compiler optimization.

It is not unusual for compiler optimization to produce an executable file that runs 2 to 4 times faster than an unoptimized executable. So what is the down side? First, it takes quite a bit of extra time for the compiler to figure out the "best" way to implement your program in machine language. While you are developing a code, forget optimization until all of your bugs are shaken out, to minimize the time you spend waiting for compilation. Turn on the optimization option only when you are ready to burn lots of computer time running the program, so that the extra time spent on an optimized compilation is more than offset by savings in execution time. The second negative is related to the frequency of compiler bugs. The portion of the compiler doing the optimization is by its nature very complex, having many odd nooks and crannies under IF tests that are infrequently exercised. With a very large program, it is not unusual to discover that the highest level of optimization produces peculiar results. Always test the optimized version of your program very carefully against results from the unoptimized version.

The next level of effort is to look carefully at your program to see what can be done to minimize the total number of machine operations, and perhaps make it more suitable for vectorization or parallelization. The example lsq2.f shows some simple steps to cut the operation count from lsq.f. One example of trading extra memory use for extra speed is shown in lsq3.f. Are these the optimal implementations? Probably not, but they are steps forward from the original brute force approach.
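As a small illustration of trimming work from a fit, the sketch below accumulates every sum needed for a straight-line least squares fit in a single pass over the data, rather than making a separate pass for each sum. It is not taken from lsq.f, lsq2.f, or lsq3.f; the program name lsq_sketch, the test data, and the variable names are all assumptions made for the example.

```fortran
      program lsq_sketch
      implicit none
      integer, parameter :: n = 5
      integer :: i
      real :: x(n), y(n), sx, sy, sxy, sxx, a, b
!     Test data lying exactly on the line y = 1 + 2x
      do i = 1, n
         x(i) = real(i)
         y(i) = 1.0 + 2.0*x(i)
      end do
!     One pass over the data accumulates all four sums
      sx = 0.0
      sy = 0.0
      sxy = 0.0
      sxx = 0.0
      do i = 1, n
         sx = sx + x(i)
         sy = sy + y(i)
         sxy = sxy + x(i)*y(i)
         sxx = sxx + x(i)*x(i)
      end do
!     Slope b and intercept a of the fitted line y = a + b*x
      b = (n*sxy - sx*sy)/(n*sxx - sx*sx)
      a = (sy - b*sx)/n
      print *, 'a =', a, ' b =', b
      end program lsq_sketch
```

With this test data the fit should recover a = 1 and b = 2. Whether a single loop actually beats several intrinsic SUM calls on a given compiler is something to settle by timing, not by assumption.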

Often you have no way of knowing which approach is the fastest. The first step in learning is to do some timing studies of your program. Fortran 90 has finally introduced a standard intrinsic subroutine, SYSTEM_CLOCK, to assist in this task. However, use it with caution. This subroutine provides elapsed real time on workstations, not your own elapsed computing time. If you are sharing the CPU with others, you will pick up the effects of their CPU usage. Run these timings when the machine is empty, and run them several times to try to filter out the effects of system background jobs. Examples of timing are provided in speed.f, speedf.f, and funspeed.f. I have also created a shell script, timer, that compiles and executes these three programs with and without compiler optimization. The results of speed.f show the advantages of the DOT_PRODUCT intrinsic function, as well as the power of optimization by the compiler to speed the execution of some DO loops. If you look carefully at the results of speedf.f, you will see that use of INTERFACE to create vector-valued functions may not be a good idea. On our compiler, the significant extra time can probably be attributed to allocation of space for the results on each reference to the function. I'll talk more about the results of funspeed.f later, as we cover STATEMENT FUNCTIONS, INTERNAL FUNCTIONS, and COMMON blocks.
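A minimal sketch of a timing study with SYSTEM_CLOCK follows. It is not the content of speed.f; the program name timing_sketch, the array size n, and the repeat count nrep are assumptions chosen only to make the run long enough to measure. The difference between two clock counts, divided by the count rate, gives elapsed wall-clock seconds.

```fortran
      program timing_sketch
      implicit none
      integer, parameter :: n = 100000, nrep = 200
      integer :: i, k, c0, c1, c2, rate
      real :: a(n), b(n), s_loop, s_dot
!     Fill the arrays with simple values
      a = 1.0
      b = 2.0
      s_loop = 0.0
      s_dot = 0.0
!     c0 is the starting clock count, rate is counts per second
      call system_clock(c0, rate)
!     Explicit DO loop version of the dot product
      do k = 1, nrep
         s_loop = 0.0
         do i = 1, n
            s_loop = s_loop + a(i)*b(i)
         end do
      end do
      call system_clock(c1)
!     Intrinsic function version
      do k = 1, nrep
         s_dot = dot_product(a, b)
      end do
      call system_clock(c2)
      if (rate > 0) then
         print *, 'DO loop     (s):', real(c1 - c0)/real(rate)
         print *, 'DOT_PRODUCT (s):', real(c2 - c1)/real(rate)
      else
         print *, 'No system clock available'
      end if
!     Use the sums so the timed work is not trivially discarded
      print *, s_loop, s_dot
      end program timing_sketch
```

The sketch ignores the possibility that the clock count wraps around between calls, and an optimizing compiler may still simplify the repeat loops, so treat the numbers as rough guides and repeat the run several times.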

If you are working on a large program, you can waste a lot of time improving the speed of portions that do not contribute significantly to the overall run time. To use your time most effectively, you should locate a system "bin sampling" utility. This will monitor the execution of your program, and then provide information on the percentage of execution time expended by various portions of the program. The default mode for many of these programs is to provide the percentage of time spent in the main program and in each subprogram. However, every one that I have used has also given options to be more specific, letting me specify blocks of lines within a subprogram for percentage statistics. When the need arises, locate an acknowledged "guru" of your local system and ask for the name and, if possible, the location of documentation for the system's "bin sampler". Besides improving your efficiency, the results of this question might start a rumor that you know what you are doing.



STATEMENT FUNCTIONs are very old and very simple Fortran constructs, designed to improve the speed of code execution at the expense of extra program length. The definition of a STATEMENT FUNCTION looks just like a function definition should. For example:

f(x,y) = x**2 + y**2

g(a) = sin(a)+2*cos(a)

Look at funspeed.f and sifunc.f for more examples of STATEMENT FUNCTION definitions and use.

There are several important things you should remember when defining STATEMENT FUNCTIONS. The first is that they are not executable statements: a definition must appear after the type declarations and before the first executable statement of the program unit, and therefore before its first use. The second important property is that a definition can only occupy one line (with possible continuations). STATEMENT FUNCTIONS look like simple assignment statements in this respect, and they are local to the program unit where they are defined. If the above definitions of f(x,y) and g(a) are made in the main program, they can only be used in the main program. To use them in a subprogram, their definitions must be repeated.
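The placement rules can be seen in a short complete program. This is a sketch, not a copy of funspeed.f or sifunc.f; the program name stfun_sketch and the variable r are assumptions. The definitions of f and g sit right after the type declarations and ahead of the statements that use them.

```fortran
      program stfun_sketch
      implicit none
!     The function names and their dummy arguments are typed here
      real :: f, g, x, y, a, r
!     Statement function definitions follow the declarations
      f(x,y) = x**2 + y**2
      g(a) = sin(a) + 2*cos(a)
!     Ordinary statements that use the functions
      r = f(3.0, 4.0)
      print *, 'f(3,4) =', r
      print *, 'g(0)   =', g(0.0)
      end program stfun_sketch
```

Here f(3,4) is 3**2 + 4**2 = 25 and g(0) is sin(0) + 2*cos(0) = 2. Neither f nor g would be known inside a separate subroutine or function unless the definitions were repeated there.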

STATEMENT FUNCTIONS also have the useful, but potentially dangerous property that they can know more about local variables than is passed through their arguments. If the following lines appear in a program,

f1(x,y) = a*x**2 + b*y

a = 3.0

b = 2.0

z = 1.+ f1(2.,b)

then f1(2.,b) evaluates to 16.0 (3*2**2+2*2), and the result for z is 17.0. In fact something rather interesting should happen when the last line is encountered by a compiler. In effect, before generating machine instructions the compiler inserts the function definition into the assignment statement, with appropriate argument substitution. The line to be compiled becomes:

z = 1. + a*4. + b*b

No bookkeeping is required for arguments in the compiled implementation, and no branches occur to coding that evaluates the function. There really is no single block of programming associated with "f1" in the compiled version of the above code. STATEMENT FUNCTIONS are often referred to as "in line" functions. Their coding is placed in each assignment line where it is needed. Review the results of funspeed.f to see the speed advantage of this approach, but note that some compilers (ours is one) don't do a proper in line substitution unless operating in optimization mode.
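To see the access to local variables in action, the lines above can be wrapped in a small program. This is a sketch under assumed names (the program name inline_sketch is not from the course files). Because f1 uses a and b from the surrounding program unit, changing a changes the value f1 returns on later references.

```fortran
      program inline_sketch
      implicit none
      real :: f1, x, y, a, b, z
!     f1 uses a and b from the surrounding program unit
      f1(x,y) = a*x**2 + b*y
      a = 3.0
      b = 2.0
      z = 1. + f1(2., b)
      print *, 'z =', z
!     Changing a changes what f1 returns from here on
      a = 0.0
      z = 1. + f1(2., b)
      print *, 'z =', z
      end program inline_sketch
```

The first value printed is 1 + 3*4 + 2*2 = 17, and the second is 1 + 0*4 + 2*2 = 5. This is the useful but potentially dangerous part: nothing in the reference f1(2., b) warns the reader that the result depends on a.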

Fortran 90 has a construct to extend the capability of STATEMENT FUNCTIONs to operations requiring more than one line. Internal procedures (internal functions or internal subroutines) are placed just before the END statement of a main program or subprogram, and are preceded by a special statement, CONTAINS. Like STATEMENT FUNCTIONS, they have knowledge of variables within the unit in which they are contained (see sifunc.f), and only that program unit can reference them. However, they also have the power to hide variables from the calling program. Any variable declared by a type statement within the internal procedure is local to that procedure (see the example contains.f). Although I have heard internal functions advertised as candidates for in line replacement, the results of funspeed.f demonstrate that our compiler does not attempt such replacement by default. However, our compiler and many others have special options that do activate "inlining". Try the option if you are desperate for speed, but understand that this process does not always result in faster execution (it can subvert other forms of optimization).
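A minimal sketch of an internal function follows. It is not the contents of contains.f or sifunc.f; the names contains_sketch, f2, and tmp are assumptions. The function f2 reads a and b from the main program, while the tmp declared inside f2 is a separate variable from the tmp in the main program.

```fortran
      program contains_sketch
      implicit none
      real :: a, b, tmp
      a = 3.0
      b = 2.0
      tmp = -1.0
      print *, 'f2(2.,b) =', f2(2., b)
      print *, 'tmp in main is still', tmp
      contains
         function f2(x, y)
         real :: f2
         real, intent(in) :: x, y
!        tmp is declared here, so it is local to f2
         real :: tmp
!        a and b come from the containing main program
         tmp = a*x**2
         f2 = tmp + b*y
         end function f2
      end program contains_sketch
```

The function returns 3*4 + 2*2 = 16, and the main program's tmp keeps its value of -1 because the assignment inside f2 touches only the local tmp. Removing the declaration of tmp inside f2 would make the function overwrite the main program's variable instead.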

When you need the same internal subprogram in several program units, you should consider defining it as an internal procedure within a MODULE. One thing to note about internal module procedures is that their access to variables is more restricted than you might believe. An internal module procedure only has knowledge of those variables declared in the main body of that module, that is, by a type statement before the module's CONTAINS statement. It does not have access to a variable that is used under the same name in another of the module's procedures but is not declared before CONTAINS. Take a look at the behavior of the variable "aa" in the example modproc.f. If necessary, you can take explicit control of external access to module variables and internal procedures through the application of the PUBLIC and PRIVATE statements.
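The sketch below shows one way these pieces can fit together. It is not modproc.f; the module name fit_tools and the procedures set_scale and scaled are assumptions made for the example. The variable aa is declared before CONTAINS, so both procedures share it, while bb belongs to scaled alone. The PRIVATE statement hides everything in the module by default, and PUBLIC then exposes just the two procedures.

```fortran
      module fit_tools
      implicit none
      private
      public :: scaled, set_scale
!     aa is declared before CONTAINS: every procedure in the
!     module can see it, but PRIVATE hides it from users
      real :: aa = 1.0
      contains
         subroutine set_scale(s)
         real, intent(in) :: s
         aa = s
         end subroutine set_scale
         function scaled(x)
         real :: scaled
         real, intent(in) :: x
!        bb is local to scaled; set_scale cannot see it
         real :: bb
         bb = 2.0
         scaled = aa*x + bb
         end function scaled
      end module fit_tools

      program module_sketch
      use fit_tools
      implicit none
      call set_scale(3.0)
      print *, scaled(2.0)
      end program module_sketch
```

The program prints 3*2 + 2 = 8. A reference to aa in module_sketch would be rejected at compile time, because the variable is private to the module and can only be changed through set_scale.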


Written and Maintained by John Mahaffy : jhm@psu.edu