Verification and Validation
The title of this lecture may seem redundant to you. However, in the field of numerical simulation, each of these words has a very distinct definition. Although there isn’t universal agreement on the details of these definitions, by checking other references you’ll see that there is fairly standard agreement on their usage. The simplest definitions that I’ve seen are given in Roache’s book on Verification and Validation (V&V). To paraphrase him, Verification demonstrates that you are solving the equations right. Validation demonstrates that you are solving the right equations.
Remember the steps in problem solution that I introduced at the beginning of the semester:
Validation is the process of determining if the physical approximations introduced in the first step are adequate for the range of situations currently being simulated. An important fine point here is that you validate a simulation code over a specific (and limited) set of conditions. For each new result that you produce, you should document the fact that the physical conditions fall within the range of conditions already validated. If they don’t, you need to qualify your presentation of results clearly noting where you are extrapolating beyond the existing validated region. Ideally when the bounds of existing validation are exceeded, additional experiments are analyzed (and if necessary designed and performed).
Verification covers steps four through seven, but must precede Validation. Don’t expect reasonable conclusions when comparing to experimental results if you’ve got serious errors due to:
1. Approximation of the continuous equations with a discrete system (truncation error);
2. Selection of convergence criteria for iterative equation solution that are too loose;
3. Programming errors in the computer code;
4. Errors in the specification of input for the simulation of the experiment;
5. Errors in understanding of code output.
The presence of such errors is a particular problem if a finite volume, finite difference, or finite element code is used in the process of determining values for coefficients (a.k.a. fudge factors) in engineering correlations used to model specific physical processes (e.g. heat transfer coefficients, turbulence models, …). This process is sometimes called “tuning” or “calibration”. I have seen a number of instances where models were tuned to operate well on a “reasonable” spatial discretization, but later when good mesh convergence studies were done, comparisons to experimental data became unacceptable. In such cases, the code developers have canceled discretization errors with errors in one or more physical models. I have also seen cases where engineering correlations were adapted without change from the literature and produced poor matches to data on a “standard” spatial discretization, but performed very well on an adequately refined mesh.
Some of you who are really alert might wonder about steps two and three. Classification of equations normally feeds into selection of solution algorithms. An error here would be detected during verification. Incorrect transformation of a basic set of equations falls under the Validation question of whether you have selected the right equations. However, from a practical standpoint, an incorrect transformation into other coordinates would be detected under a well-designed verification test set.
Computational simulation of physical systems has been around for a very long time, and Computational Fluid Dynamics is often considered to be a mature field. Why at this stage of the game do I put such a strong emphasis on V&V? Much of the answer is in Mahaffy’s first law of human nature:
Nobody’s perfect, and most people drastically underestimate their distance from that state.
Until relatively recently most computer based simulations have been performed by the authors of the simulation codes. All too often, code developers have been too enthralled by the beauty of their theories, or too vested in the success of large programming projects, to look too deeply at results. I’ve seen the consequences in comments from representatives of one of my sponsors over the years, the U.S. Navy. CFD has frequently been referred to as “Colorful Fluid Dynamics” and by other less flattering terms. Too many CFD predictions made for them were later discovered to have significant errors. As a military organization, they have a first-hand understanding that serious mistakes can result in unintended loss of life, and are very sensitive to the need for adequate accuracy in predictions.
The need for V&V goes deeper than over-inflated self esteem. Another fundamental human trait is that we see what we want to see (Mahaffy’s Fourth Law of Human Nature). I view Science as the collection of procedures and knowledge developed over millennia to overcome this trait. It permits us to see what is really there, rather than what we want to see. At its heart V&V is just good science. However, neither Verification nor Validation is a simple process. These two lectures are meant to give you a brief introduction, and hopefully keep you from getting into too much trouble. If you find yourself seriously involved in either activity, take the time to at least read Roache’s book and Oberkampf’s various reports on the subjects. Oberkampf’s SANDIA reports are free via the DOE Information Bridge. Roache’s book on V&V is available in the PSU Engineering Library or from Hermosa Publishers and worth the price.
As with Verification and Validation, specific meanings are attached to terms used in the discussion of error. Generally the word “error” is only used to describe a source of inaccuracy that can in principle be corrected or limited to any desired level. The five items listed above fall into that category. These errors are frequently subdivided into recognized errors (hopefully just 1 and 2 in the above list) and unrecognized errors (items 3-5 above).
Inaccuracies in physical model implementations are normally due to lack of knowledge of underlying processes, or the fundamentally stochastic nature of those processes (e.g. turbulence). In this instance the term “uncertainty” rather than “error” is used in discussions. If you’ve ever looked at data underlying various engineering correlations, you can appreciate this problem. However, even state quantities such as conductivity have an experimental basis and associated experimental uncertainty. At some point it is important to determine the sensitivity of key outputs from a simulation to these uncertainties. There is a large body of literature on sensitivity analysis applied to simulation codes. This is not my field of expertise, and I won’t cover this particular topic.
I will focus the remainder of the discussion on what can be done about the five sources of error listed above, and those resulting from a specific choice of a physical model. My discussion relies on over 30 years of personal experience, and insights derived from Roache’s V&V book and numerous SANDIA National Laboratory Reports by Oberkampf and his colleagues.
Mesh and time step sensitivity studies lead to an estimate of the error associated with discretization, and are also important in procedures used to detect software errors. Roache and Oberkampf have good discussions of this error analysis based upon Richardson Extrapolation. It basically boils down to fitting a curve to a sequence of results and extrapolating beyond those results to estimate the limiting answer with zero mesh length or time step size. Consider a sequence of three mesh lengths or time step sizes (from smallest to largest) h1, h2, and h3. Normally the sequence is generated with a constant refinement ratio r = h3/h2 = h2/h1. Let f1, f2, and f3 be the computed results at the same point in space and time for the three corresponding values of h. Taking a clue from truncation error analysis, we look for an expression for f as a function of h in the form:

    f_i = f_0 + C h_i^p,   i = 1, 2, 3

where f_0 is the limiting answer at zero mesh length or time step size.
Subtracting the equations in pairs gives

    f_3 - f_2 = C (h_3^p - h_2^p)
    f_2 - f_1 = C (h_2^p - h_1^p)

and for a constant refinement ratio r these combine to

    (f_3 - f_2) / (f_2 - f_1) = r^p,   or   p = ln[(f_3 - f_2)/(f_2 - f_1)] / ln(r).
Note that if the scaling ratio is constant we can solve for p directly; if it is not, p can still be found, but it is much more difficult. Also notice that if the values of f are not monotonic, the formula won’t work (the argument of the logarithm becomes negative). Although it is possible to have non-monotonic convergence, you will need results on more than three grids (or time steps) to convince me of any error estimate in such situations.
Given a value of p, the equations for the two finest meshes can be solved for the remaining unknowns:

    C = (f_2 - f_1) / (h_2^p - h_1^p)
    f_0 = f_1 - (f_2 - f_1) / (r^p - 1)

where f_0 is the extrapolated zero mesh length (or time step size) result.
As a result the error on the finest mesh can be estimated as:

    E_1 = f_1 - f_0 = (f_2 - f_1) / (r^p - 1)
Note that if you have faith in the value of p obtained from a formal truncation error analysis of your numerical method, results on just the two finest meshes are enough to produce this error estimate.
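As a concrete sketch, the order and error formulas above take only a few lines to implement. The function and variable names here are my own, and the check at the bottom uses a fabricated sequence of results, not data from any particular code:

```python
import math

def observed_order(f1, f2, f3, r):
    """Observed order of accuracy p from results on three grids,
    f1 on the finest and f3 on the coarsest, with a constant
    refinement ratio r = h2/h1 = h3/h2."""
    ratio = (f3 - f2) / (f2 - f1)
    if ratio <= 0.0:
        # Non-monotonic results: the log formula breaks down.
        raise ValueError("non-monotonic sequence; more grids needed")
    return math.log(ratio) / math.log(r)

def richardson_error(f1, f2, p, r):
    """Estimated error in f1 and the extrapolated h -> 0 result."""
    e1 = (f2 - f1) / (r**p - 1.0)
    return e1, f1 - e1

# Fabricated check: sample f(h) = 1 + 0.5*h^2 at h = 0.1, 0.2, 0.4.
p = observed_order(1.005, 1.02, 1.08, r=2.0)       # recovers p = 2
e1, f0 = richardson_error(1.005, 1.02, p, r=2.0)   # e1 = 0.005, f0 = 1.0
```

Feeding the routine a fabricated sequence with a known order, as above, is itself a small verification test of the error estimator.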
These formulas for error and order of accuracy are relatively easy to implement for time step sensitivity and finite difference mesh sensitivity studies, where the refined grids contain the points evaluated on the coarser grids. However, for a finite volume method, if I double the number of volumes, the volume centers don’t match between the two levels of refinement. Since f1 and f2 must be compared at the same points in space and time, interpolation is required on one of the grids. Be careful that your interpolation is sufficiently accurate that the calculated value of p tells you about the order of accuracy of your finite volume approximation rather than the order of accuracy of your interpolator.
I’ve already said a bit about this during the discussion of solution procedures for flow equations. Check to see that the convergence criteria are reasonable. For example, on a method with a slow convergence rate, look directly at the equation residuals rather than the change in independent variables.
The simplest study of sensitivity to iteration convergence is to drop all convergence criteria by an order of magnitude and measure changes in key simulation state variables. If a significant change results, the original criteria were too loose, and the study should be repeated with the criteria tightened by another order of magnitude.
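A minimal sketch of such a sensitivity check, using a Jacobi iteration on a small, hypothetical diagonally dominant system (the solver, matrix, and tolerance values are my illustrative choices, standing in for a real flow solver):

```python
import numpy as np

def jacobi(A, b, tol, max_it=10000):
    """Jacobi iteration, stopping when the residual norm drops below tol."""
    x = np.zeros_like(b)
    diag = np.diag(A)
    for _ in range(max_it):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            break
        x = x + r / diag
    return x

# Hypothetical diagonally dominant test system.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])

x_loose = jacobi(A, b, tol=1e-4)
x_tight = jacobi(A, b, tol=1e-5)           # criteria dropped a decade
shift = np.max(np.abs(x_tight - x_loose))  # change in "state variables"
```

If `shift` is negligible compared to the accuracy you need, the looser criteria were adequate; if not, tighten another decade and repeat.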
If you are a code developer, these are the things that give you nightmares. However, there are systematic things that you can do to cut the number of errors and get a good night’s sleep, using review, careful programming practice, and testing.
The first thing to remember about programming errors is that they occur regardless of programming practices. Testing procedures must be in place to minimize the number of bugs that survive for any significant time. Quality Assurance (QA) procedures are one way to control the introduction of bugs and formalize the test procedures used to localize bugs. However, I don’t recommend rigorous adherence to international standards for QA programs. At some point the system becomes rigid enough that the best scientist/programmers leave to find a better work environment, and the project under QA is doomed to mediocrity at best.
The three components of QA are documentation, testing, and review. Written standards for these components should be established at the beginning of a project and accepted by all involved. Documentation of a new simulation capability usually begins with a simple statement of requirements for what must be modeled, what approximations are and are not acceptable, and form of implementation. A complete written description of the underlying mathematical model provides a basis for Verification activities. A clear description of any experimental basis for the model aids Validation. Good validation testing compares against data beyond the set originally used to generate the model. A clear description of any uncertainties in the model can be valuable in later studies of sensitivity of results to model uncertainties.
Basic documentation should also include a clear written description of the model’s software implementation. This aids later review or modifications of the programming. Relatively little effort is normally expended here on the coding implementing the model itself. More time should be spent documenting flow of data, revisions to data structures, and definitions of important variables.
The final piece of the basic documentation is a test plan. Here a careful explanation is provided of a set of test problems that clearly exercise all new and revised programming. The tests should cleanly isolate individual features of the new capability, and demonstrate that the programming correctly implements the underlying model. For revisions to physical models, relevant tests against experiments should also be specified.
This documentation should be generated in two drafts. The first precedes actual implementation of the software, and the second is issued as a final report including the final form implemented and results of all proposed tests. It should be accompanied by two phases of independent review, the first focusing on the viability of the proposed approach, and the second focusing on the completeness of testing. My experience has been that even without review, generation of this documentation significantly cuts the number of programming errors introduced into the final project. The act of describing an implementation in words forces a careful review of the software. More importantly, a systematic written description of a test procedure ensures that very little can slip through the testing process.
Documentation must also exist at a more automated level via a source code configuration management procedure for the simulation software. This starts with a systematic record of all changes, dates of change, and individuals responsible for the change. When under software control this level of code management lets you remove old updates to a program if they are found to be inappropriate, and maintain specialized versions of a base code. These capabilities have been used for a long time on large software projects. The current favorite configuration control tool is CVS, which is GNU open source software, and free. The project under which I do most of my research uses CVS at its heart, but extends capabilities via a web page that provides links to all accepted versions of the software, related documentation, test problems, and supporting scripts for version generation and execution of test problems.
The act of bringing a new simulation capability under configuration control (creating a new code version) should provide the most rigorous review for code errors. However, this is largely a function of the individual appointed to be the configuration control manager. Success of a software project often depends on the quality of the individual doing that job. He or she must have the breadth of technical experience to understand all documentation associated with updates. He or she should also be well versed in testing procedures and basic scientific method in order to judge the completeness of the test sets submitted with each update.
Any new problems submitted with an update should be included in a regression test suite. This is also a major line of defense against introduction of coding errors. In a complicated simulation code, it’s much easier than you might think to introduce your own amazing improvement, and unintentionally cripple another portion of the program. However, if that portion went through the documentation and testing procedure that I’ve described, its specific test problems were embedded into the regression test set. By running the regression test set for each new change, bugs affecting older capabilities are detected very quickly, and corrected before being accepted into the official program. The project that I’m involved in started seven years ago with a regression test set of about 50 problems. It’s now up to about 1300 problems, taking about 3 hours to run on a high end Intel based workstation. The rate of increase of computer speed and adaptation to use of parallel clusters will keep our testing productive and growing through the useful life of the software.
Roache and Oberkampf are both advocates of the method of manufactured solutions as a way to verify coding. I’ve tried it and also consider it to be very valuable. The idea is fairly simple. Start with the basic PDE (or system of PDEs) in the mathematical model for your problem; for example, a 1-D transient conduction problem with a source term q:

    ∂T/∂t = α ∂²T/∂x² + q
The next thing that you do is pick a solution T(x,t) that you like and run it through the differential operators.
So all I’ve got to do is set the source term q(x,t) equal to whatever is left over when my chosen T(x,t) is run through the differential operators, and that T becomes an exact solution of the resulting problem.
Declare initial conditions T(x,0)=300, boundary conditions T(3,t)=300 and T(-3,t)=300, and I’ve got a conduction problem that I can feed to my finite difference or finite volume code with a known answer. This is a nice choice for testing methods that are at least first order accurate in time and second order accurate in space. When such methods are functioning correctly, they will reproduce the solution to machine accuracy. For a sample implementation see my adaptation of Homework 8 to modeling this specific problem.
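As a rough sketch of the idea (not the lecture’s exact problem: the diffusivity α = 1 and the manufactured solution T(x,t) = 300 + t(9 − x²) are my own illustrative choices, picked to satisfy the stated initial and boundary conditions):

```python
import numpy as np

# Manufactured solution for  dT/dt = alpha*d2T/dx2 + q  on [-3, 3].
# T_mms is linear in t and quadratic in x, so a method that is first order
# accurate in time and second order in space should reproduce it to
# round-off.  alpha and the shape of T_mms are illustrative assumptions.
alpha = 1.0
T_mms = lambda x, t: 300.0 + t * (9.0 - x**2)
# q absorbs whatever the operators leave over: dT/dt - alpha*d2T/dx2.
q = lambda x, t: (9.0 - x**2) + 2.0 * alpha * t

nx, dt, t_end = 13, 0.1, 1.0
x = np.linspace(-3.0, 3.0, nx)
dx = x[1] - x[0]
T = T_mms(x, 0.0)                         # initial condition: T(x,0) = 300
t = 0.0
while t < t_end - 1e-12:
    d2T = np.zeros(nx)
    d2T[1:-1] = (T[2:] - 2.0 * T[1:-1] + T[:-2]) / dx**2
    T = T + dt * (alpha * d2T + q(x, t))  # explicit Euler step
    T[0] = T[-1] = 300.0                  # boundary conditions
    t += dt

err = np.max(np.abs(T - T_mms(x, t)))     # should sit at round-off level
```

Because the manufactured T has no time derivatives above first order and no spatial derivatives above second order, the truncation error of this scheme vanishes identically, so `err` should be at machine-accuracy level rather than merely small.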
If you want to test more aggressively with non-zero derivatives at all orders, move away from polynomials.
For a conduction problem, there are simple analytic solutions available. If I go for a solution with q = 0, then it’s going to be a Fourier expansion. I can isolate a single term in the expansion by judicious choice of the boundary conditions and initial conditions.
This particular choice could be considered as testing against an analytic solution. However, I have manipulated initial and boundary conditions to manufacture a solution to the conduction problem.
When checking against solutions like this one, it is important to perform mesh and time step convergence studies as described above. Subtle bugs may be hidden on a coarse mesh or with a time step that is too large.
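For illustration, here is one such single-mode test combined with an observed-order check; α, the particular mode, and the grid sizes are my own illustrative choices, chosen so the mode vanishes at the x = ±3 boundaries:

```python
import numpy as np

# Single Fourier mode solution of dT/dt = alpha*d2T/dx2 (q = 0) on [-3, 3]
# with T(-3,t) = T(3,t) = 300:
#   T(x,t) = 300 + exp(-alpha*(pi/6)**2 * t) * cos(pi*x/6)
alpha = 1.0
k = np.pi / 6.0
T_exact = lambda x, t: 300.0 + np.exp(-alpha * k**2 * t) * np.cos(k * x)

def max_error(nx, t_end=1.0):
    x = np.linspace(-3.0, 3.0, nx)
    dx = x[1] - x[0]
    nsteps = int(round(t_end / (0.2 * dx**2 / alpha)))
    dt = t_end / nsteps                # dt ~ dx^2 keeps the explicit scheme
    T = T_exact(x, 0.0)                # stable and makes the O(dt) time
    for _ in range(nsteps):            # error scale like O(dx^2)
        T[1:-1] += dt * alpha * (T[2:] - 2.0 * T[1:-1] + T[:-2]) / dx**2
        T[0] = T[-1] = 300.0
    return np.max(np.abs(T - T_exact(x, t_end)))

e_coarse = max_error(13)
e_fine = max_error(25)                        # dx halved
p = np.log(e_coarse / e_fine) / np.log(2.0)   # observed order, near 2
```

Unlike the manufactured polynomial case, this solution exercises derivatives at all orders, so the error is finite and should shrink by roughly a factor of four when the mesh length is halved.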
The best way that I know to avoid coding bugs is through the practice of evolutionary programming. There is next to nothing completely new under the sun. Whether you know it or not, any simulation tool that you are likely to write will be an extension of something already in existence. If you can obtain source code that implements a large subset of your goal, I recommend that you either gradually change that software to meet your goals, or start the creation of your new product so that it should in principle match results with the older program for some set of test problems.
If you start with new code, first check to make certain that you can reproduce the results of the older code to within machine round-off error. You should not expect to match results exactly, because your programming will probably implement expressions that are formally identical, but use a different ordering of arithmetic operations to get the result. This generally produces results that differ in the low order bits. To get a quick feel for the level of impact to be expected from this change in round-off error, compile either the old or new program both fully optimized and unoptimized, and compare the two sets of results.
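A quick way to see the size of ordering-induced round-off differences is to sum the same numbers in two different orders and compare against a correctly rounded reference:

```python
import math
import random

# The same numbers summed in different orders agree only to the low-order
# bits; this is the scale of difference to expect between two builds that
# merely reorder arithmetic.  The data set here is arbitrary.
random.seed(1)
vals = [random.uniform(0.0, 1.0) for _ in range(10000)]

forward = sum(vals)
backward = sum(reversed(vals))
exact = math.fsum(vals)                  # correctly rounded reference sum

rel_diff = abs(forward - backward) / exact
```

Relative differences on the order of n times machine epsilon (here around 1e-13 or smaller) are the round-off signature; anything much larger than that between two builds of the “same” code points to a genuine coding difference.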
From this point either the new or adapted code approach follows the same path. Add new features in a way that separates three classes of changes:
1. Changes that should not alter results at all (e.g. restructuring or cleanup of the programming);
2. Changes that should alter results only at the level of round-off error (e.g. reordering of arithmetic operations);
3. Changes that genuinely alter results (new or revised numerical methods and physical models).
The first change class is easy to test and debug, although you may need to suppress compiler optimization to see the exact match. The second is more difficult, in that you really do need to confirm that differences in results can be attributed to differences in round-off error. The third is where you apply techniques described above such as the Method of Manufactured Solutions for changes to numerical methods, and rigorous validation to check changes in physical models.
As you review discrepancies between results of the new and old programs, remember that you may find bugs in the old program. You have no guarantee that it is perfect.
As a final suggestion for detection of programming errors, I want to remind you of my discussion of numerically generated Jacobians. Always create at least one version of your program that generates elements of the Jacobian matrix numerically for comparison against any analytic expressions for the same elements. When creating analytic derivatives, use features available in Mathematica, Maple, or MacSyma to do the necessary symbolic differentiation and to automatically convert the results to Fortran or C expressions. Also consider the use of ADIFOR.
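A sketch of the numerical-versus-analytic Jacobian comparison; the two-equation residual function here is hypothetical, standing in for whatever system your solver linearizes:

```python
import numpy as np

def f(u):
    """Hypothetical two-equation nonlinear residual, for illustration."""
    x, y = u
    return np.array([x**2 + np.sin(y),
                     x * y - 1.0])

def jac_analytic(u):
    """Hand-derived Jacobian of f, the kind symbolic algebra can generate."""
    x, y = u
    return np.array([[2.0 * x, np.cos(y)],
                     [y,       x       ]])

def jac_numerical(func, u, eps=1e-7):
    """Forward-difference Jacobian, built one column at a time."""
    n = len(u)
    J = np.zeros((n, n))
    f0 = func(u)
    for j in range(n):
        du = np.zeros(n)
        du[j] = eps * max(1.0, abs(u[j]))   # scale the perturbation
        J[:, j] = (func(u + du) - f0) / du[j]
    return J

u = np.array([1.5, 0.7])
diff = np.max(np.abs(jac_analytic(u) - jac_numerical(f, u)))
```

Agreement to roughly the square root of machine precision is the best a forward difference can do; a much larger `diff` in any element is a strong hint of a bug in the corresponding analytic derivative.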