Can we trust computational results?


This article was originally published in the CSC News magazine (Vol. 10, No 3, September 1998).

Computers have become a universal scientific instrument. It is almost too easy to simulate reality on a computer. Often the computational approach saves a lot of time and tedious work, but sometimes it gives wrong results. The mistake may lie in the original premises of the scientific theory, or perhaps in the mathematical model, the numerical method, the compiler, the subroutine library, or the computer hardware.

Simulations of reality

We rely more and more on cost-efficient simulations on computers instead of messy and expensive real-world testing. Therefore, we should check all the steps in modeling and computing, and find strategies for testing the results. At least we should make sure that the final results are qualitatively reasonable -- but more stringent criteria should be devised, if at all possible. Perhaps we should devise computational equivalents of double-blind tests. Comparison of two competing algorithms might benefit from this kind of approach.

To err is human

We humans have difficulty spotting even trivial numerical errors, especially when we are talking about "big numbers". For example, the biggest Finnish newspaper recently carried an article about a forest area of 154 000 hectares. The reporter and the consulted expert decided to explain this number by saying that the area would be a strip of land 100 meters wide and 154 kilometers long. A large area! The next day the error was corrected, and the strip was said to be 1 540 kilometers long. Even this was wrong. Fortunately, the subsequent correction was right: the strip would be 15 400 kilometers long and 100 meters wide.
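The conversion is easy to check mechanically. The following is a minimal sketch; the function name strip_length_km is made up for this illustration, and the only fact used beyond the figures above is that one hectare equals 10 000 square meters.

```python
HECTARE_M2 = 10_000  # one hectare is 100 m x 100 m


def strip_length_km(area_hectares, width_m):
    """Length in km of a strip with the given area and width."""
    area_m2 = area_hectares * HECTARE_M2
    length_m = area_m2 / width_m
    return length_m / 1000


print(strip_length_km(154_000, 100))  # 15400.0
```

Writing the units into the variable names (m2, m, km) is a cheap way to make this kind of slip visible.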

Humans err not only in Finland. In 1996, Intel Corporation and the US Department of Energy (DOE) issued a press release about the ASCI Red supercomputer. According to the press release, ASCI Red is so fast that "it would take every man, woman and child in the United States working non-stop with hand calculators over 125 years to equal what this computer can do in one second". This claim is exaggerated by a factor of one million. The mistake lies in the word "trillion", which is used differently in the US and in Europe (10^12 vs. 10^18). In fact, the US population could finish the indicated calculation in about one hour. If the computation included trigonometric functions, which are standard on most calculators, ASCI Red would fare even worse in the comparison. By the way, this press release is still available (uncorrected) on the Web.
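A back-of-the-envelope check shows where the factor of one million comes from. The sketch below rests on two assumptions that are not in the press release: a US population of roughly 265 million, and one calculator operation per person per second. The function name crowd_seconds is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def crowd_seconds(operations, people=265e6, ops_per_person_per_s=1.0):
    """Seconds a crowd needs for the given number of operations.

    Both default values are assumptions made for this illustration.
    """
    return operations / (people * ops_per_person_per_s)


# One second of machine time at one "trillion" operations per second,
# read with the US meaning (10^12) and the European meaning (10^18).
hours_us_trillion = crowd_seconds(1e12) / 3600
years_european_trillion = crowd_seconds(1e18) / SECONDS_PER_YEAR

print(f"10^12 operations: about {hours_us_trillion:.1f} hours")
print(f"10^18 operations: about {years_european_trillion:.0f} years")
```

Under these assumptions the 10^12 reading gives about an hour, while the 10^18 reading gives a figure of the same order as the quoted 125 years.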

Using black boxes

With the increasingly user-friendly interfaces we tend to forget that computers do not think. You plug in numbers and get out numbers. Often the results are "garbage in, garbage out".

Some of the errors are caused by the software packages we use. There are usually no exhaustive independent tests for the reliability and accuracy of software. Should you trust these packages without checking the results when you are designing nuclear reactors or aeroplanes? Unfortunately, the software licenses usually waive any responsibility the manufacturer might have for direct or indirect damages.

Hardware is a second source of errors. A well-known example concerns Intel Corporation, which did not at once acknowledge that the Pentium processors contained a potentially damaging error in the floating-point unit. Intel is not alone in this matter, however -- all CPUs contain some bugs, but few are as harmful as this was. There may still be millions of Pentiums in use today containing faulty floating point units. I hope these machines will not be used for critical applications, or for scientific computations.
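A widely circulated check for this particular flaw divides two specific integers and multiplies back. The sketch below uses that test; the operands and the statement that flawed chips returned a residual of 256 come from public reports of the bug, not from this article, and the function name is hypothetical. On hardware with correct IEEE double-precision division the residual is zero.

```python
def fdiv_residual(x=4195835.0, y=3145727.0):
    """Return x - (x / y) * y, which should be zero for these operands."""
    return x - (x / y) * y


residual = fdiv_residual()
status = "looks correct" if abs(residual) < 1.0 else "is FLAWED"
print(f"residual = {residual}: floating-point division {status}")
```

A test like this only catches one known defect. It says nothing about other bugs the processor may contain.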

Teaching computational science

Of course, the biggest source of mistakes sits in front of the computer. Perhaps the scientific process weeds out the erroneous results sooner or later. However, for some reason, there is a more relaxed attitude towards repeatability and statistical significance in computational science than in, e.g., rat tests in laboratories, or even psychological tests. In these disciplines statistical methods are an ingrained tool which is used by default -- sometimes even misused. In computational science, however, there are cases where the reported results do not contain any statistical analysis. One simulation is deemed to be enough...
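As an illustration of what "more than one simulation" could look like in practice, here is a minimal sketch. The toy simulation (a Monte Carlo estimate of pi), the number of runs, and the function names are all assumptions chosen for this example; the point is only that repeated runs with different random seeds give a mean and a standard error instead of a single bare number.

```python
import random
import statistics


def simulate(seed, samples=10_000):
    """Toy simulation: Monte Carlo estimate of pi from one random seed."""
    rng = random.Random(seed)
    hits = sum(
        1
        for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * hits / samples


def summarize(results):
    """Return the mean and the standard error of the mean."""
    mean = statistics.mean(results)
    std_err = statistics.stdev(results) / len(results) ** 0.5
    return mean, std_err


runs = [simulate(seed) for seed in range(20)]
mean, std_err = summarize(runs)
print(f"single run: {runs[0]:.4f}")
print(f"20 runs:    {mean:.4f} +/- {std_err:.4f} (standard error)")
```

Recording the seeds also makes each run repeatable, which is the computational counterpart of documenting a laboratory protocol.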

The sources of computational errors are taught in the basic courses on numerical analysis. I think there should also be lectures and exercises on scientific methodology in these courses. Students should review several real-world examples to learn from mistakes. When the student loses a ten-million-dollar space probe because of an error in a simulation exercise, he or she will remember to double-check the model and the solution method in the future.

In education, we should not strive for getting perfect answers, but instead we should learn from mistakes. Also, in the real world there usually is no perfect answer: all solutions are to some degree false.

A well-known anecdote tells about a company developing a new software package, which reported the range of values where the solution lies, in addition to the ordinary numeric solution. However, selling this software to a customer was very tough: "Why should we use a software package giving unreliable results, when our current software gives results accurate to six or seven digits?"

Big machines for big problems

The ASCI Red project is an ambitious initiative to test nuclear weapons by computer simulations. However, this project has already generated a lot of methodological criticism.

If a supercomputer contains 10 000 processors running week-long simulations, the question of hardware and software reliability is not just an academic exercise. The mean failure rate of memory and CPU modules can seriously affect the reliability of the simulations. Of course, these errors should be noticed when they are generated, and corrected. However, the connections between components and processors are also a source of errors. Thus, fault-tolerance requires a lot of expensive additional hardware and software.
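A rough estimate shows why the failure rate matters at this scale. The sketch below assumes that modules fail independently at a constant rate (an exponential model) and uses a purely hypothetical mean time between failures of ten years per processor module; neither the model nor the figure comes from the ASCI project.

```python
import math

HOURS_PER_YEAR = 365.25 * 24


def expected_failures(processors, run_hours, mtbf_hours):
    """Expected number of module failures during one run."""
    return processors * run_hours / mtbf_hours


def prob_no_failure(processors, run_hours, mtbf_hours):
    """Probability that no module fails, for independent exponential failures."""
    return math.exp(-expected_failures(processors, run_hours, mtbf_hours))


mtbf = 10 * HOURS_PER_YEAR  # hypothetical: one failure per module per ten years
week = 7 * 24               # run length in hours

print(f"expected failures per run: {expected_failures(10_000, week, mtbf):.1f}")
print(f"chance of a clean run:     {prob_no_failure(10_000, week, mtbf):.1e}")
```

Even with modules this reliable, the model predicts roughly 19 failures during a single week-long run, so a run without any failure is practically impossible unless the system detects and recovers from them.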

However, an even greater source of error lies in the algorithms which were developed for mainframes, supercomputers, and -- more recently -- for PCs. These algorithms do not necessarily scale to bigger problems -- we are talking about problems 10, 100, 1000 or a million times larger than the current ones. Therefore, we need to develop new algorithms as well as new ways of testing the reliability of the results.

The need for practical numerical analysis

The state-of-the-art papers on numerical analysis are usually not accessible to researchers in other fields. Thus, we should have access to practical textbooks discussing the sources of error in numerical algorithms. Also, high-quality software packages should offer facilities for error control, and at least report all cases when the results may not be trustworthy.

Another source of complications arises from the increasing use of parallelism. Finding and implementing algorithms which are efficient on a parallel computer is a challenge in itself -- implementing reliable and error-correcting algorithms is an even greater challenge.

The draft of the Fortran 2000 standard contains a feature called "interval arithmetic". This is one way -- a controversial one -- of approaching the reliability problem. Interval arithmetic has been used in some applications requiring robustness and reliability which is not achievable by other means. However, in lengthy calculations interval arithmetic tends to be quite unusable: the reported result lies "between minus and plus infinity". Therefore, even this approach does not offer a solution for all cases.
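The widening of intervals in long calculations can be seen in a few lines. The following is a simplified sketch in Python, not the Fortran feature itself: the Interval class is hypothetical, and it ignores the outward rounding that a real interval package must perform. The iteration x <- 2*x - x leaves x unchanged in exact arithmetic, but interval arithmetic treats the two occurrences of x as independent, so the width triples at every step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi]; outward rounding is omitted for brevity."""
    lo: float
    hi: float

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    @property
    def width(self):
        return self.hi - self.lo


def iterate(x, steps):
    """Apply x <- 2*x - x, which equals x in exact real arithmetic."""
    two = Interval(2.0, 2.0)
    for _ in range(steps):
        x = two * x - x
    return x


x0 = Interval(0.999, 1.001)
for steps in (0, 1, 10, 40):
    print(steps, iterate(x0, steps).width)
```

The bounds remain valid at every step, since the true value 1.0 always lies inside the interval, but after 40 steps the interval is so wide that it carries no practical information.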

The programmer is to blame?

Programming is another source of error. Researchers often strive for quick results, but checking the reliability of those results often requires more than ten times the work needed to get the first results. Of course, researchers are pressured to publish their results as soon as possible, and this may lead to a relaxed attitude towards exhaustive error analysis.

Nowadays hardware becomes obsolete within a couple of years of purchase because processor speeds increase and memory sizes grow. Because of this, programming requires a great deal of planning and discipline. Portability of codes is essential, but so are scalable algorithms. If you use Cramer's rule to solve linear equations -- as is still suggested in some textbooks -- you will never be able to solve the 100x100 linear system that the researcher in the next room is solving in a fraction of a second using the LAPACK routines.
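The gap between the two approaches can be estimated with simple operation counts. The sketch below assumes that the determinants in Cramer's rule are evaluated by naive cofactor expansion, which costs on the order of n! multiplications each, and compares this with the leading-order cost of LU factorization, (2/3) n^3, the method behind the LAPACK linear solvers. The function names and the machine speed of 10^12 operations per second are assumptions made for the illustration.

```python
import math


def cramer_cofactor_ops(n):
    """Rough multiplication count: n + 1 determinants, about n! each."""
    return (n + 1) * math.factorial(n)


def lu_ops(n):
    """Leading-order operation count for LU factorization: (2/3) n^3."""
    return 2 * n**3 // 3


n = 100
machine_ops_per_s = 1e12  # assumed machine speed

print(f"LU factorization: {lu_ops(n)} operations")
print(f"Cramer, cofactor: {cramer_cofactor_ops(n):.2e} operations")
print(f"Cramer run time:  {cramer_cofactor_ops(n) / machine_ops_per_s:.2e} seconds")
```

Under these assumptions the LU approach needs fewer than a million operations, while the cofactor-based Cramer's rule needs more than 10^159. No increase in hardware speed closes a gap of that size; only a better algorithm does.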

Floating-point numbers are another source of non-portability. Although modern computers tend to use the IEEE floating-point standard, even then the results may differ from machine to machine. And if you are interested in parallel computing in a cluster consisting of different types of computers, you will have a lot of work in formulating algorithms and, for example, convergence criteria, which actually work.
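One reason results differ is that floating-point addition is not associative, so the order of operations matters. The sketch below sums the same four numbers in two different orders, as might happen when a compiler or a parallel reduction regroups the terms. The helper naive_sum is hypothetical; math.fsum is the standard-library routine that returns the correctly rounded sum.

```python
import math


def naive_sum(values):
    """Plain left-to-right summation in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [1e16, -1e16, 1.0, 1.0]  # same numbers, different order

print(naive_sum(order_a))  # 1.0
print(naive_sum(order_b))  # 2.0
print(math.fsum(order_a))  # 2.0
```

A convergence test such as "stop when the sum changes by less than 1.5" would behave differently for the two orderings, which is why criteria that work on one machine may fail on another.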

Further reading

What Is This Thing Called Science? -- An Assessment of the Nature and Status of Science and Its Methods. A. F. Chalmers. Hackett Publishing Company, 1995.

The Number Sense: How the Mind Creates Mathematics. Stanislas Dehaene. Oxford University Press, 1997.

Taking Computers to Task. W. Wayt Gibbs. Scientific American, July 1997.

Computational Verifiability and Feasibility of the ASCI Program. John Gustafson. IEEE Computational Science & Engineering, January-March 1998.

Users Need Practical Numerical Analysis. Juha Haataja, Juhani Käpyaho and Jussi Rahola. CSC News, October 1994.

The T Experiments: Errors in Scientific Software. Les Hatton. IEEE Computational Science & Engineering, April-June 1997.

Teamwork: Computational Science and Applied Mathematics. Dianne P. O'Leary. IEEE Computational Science & Engineering, April-June 1997.

Fortran 90 in CSE: A Case Study. José E. Moreira and Samuel P. Midkiff. IEEE Computational Science & Engineering, April-June 1998.

Innumeracy: Mathematical Illiteracy and Its Consequences. John Allen Paulos. Vintage Books, 1990.

Numerical Recipes: Does This Paradigm Have a Future? William H. Press and Saul A. Teukolsky. Computers in Physics, Vol. 11, No. 5, 1997.

Tutkijan moraali laskennallisessa tieteessä (The moral of the researcher in computational science). Jussi Rahola. In the book Alkuräjähdyksestä kännykkään, CSC, 1998.

Should Computer Scientists Experiment More? Walter F. Tichy. Computer, May 1998.

Experimental Models for Validating Technology. Marvin V. Zelkowitz and Dolores R. Wallace. Computer, May 1998.

DOE's press release "Ultra" Computer Reaches 1 Trillion Operations Per Second Milestone.

Course material from the intensive course at CSC. Web address http://www.csc.fi/math_topics/courses/ic98/.