
The Regularized Singularity

~ The Eyes of a citizen; the voice of the silent


There’s Much More to Code Verification

27 Tuesday May 2025

Posted by Bill Rider in Uncategorized


TL;DR

The standard narrative for code verification is demonstrating correctness and finding bugs. While this is true, it sells the practice wildly short. Code verification has a myriad of other uses, foremost the unambiguous assessment of accuracy. It can also characterize features of a code, such as adherence to invariants in solutions. Perhaps most compellingly, it can define the limits of a code and method, and the research needed to advance capability.

“Details matter, it’s worth waiting to get it right.” ― Steve Jobs

What People Think It Does

There is a standard accepted narrative for code verification. It is a technical process of determining that a code is correct. A lack of correctness is caused by bugs in the code that implements a method. It is supported by two engineering society standards written by AIAA (aerospace engineers) and ASME (mechanical engineers). The DOE-NNSA’s computing program, ASC, has adopted the same definition. It is important for the quality of code, but it is drifting to obscurity and a lack of any priority. (Note the IEEE has a different definition for verification, leading to widespread confusion.)

The definition has several really big issues that I will discuss below. Firstly, the definition is too limited and arguably wrong in emphasis. Secondly, it means that most of the scientific community doesn’t give a shit about it. It is boring and not a priority. It plays a tiny role in research. Thirdly, it sells the entire practice short by a huge degree. Code verification can do many important things that are currently overlooked and valuable. Basically there are a bunch of reasons to give a fuck about it. We need to stop undermining the practice.

The basics of code verification are simple. A method for solving differential equations has an ideal order of accuracy. Code verification compares the solution produced by the code with an analytical solution over a sequence of meshes. If the observed order of accuracy matches the theory, the code is correct. If it does not, there is an error either in the implementation or in the construction of the method. One of the key reasons we solve differential equations with computers is the dearth of analytical solutions. For most circumstances of practical interest, there is no analytical solution, nor do practical circumstances achieve the design order of accuracy.
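
To make the mechanics concrete, here is a minimal sketch (with made-up error values) of the convergence test just described: the observed order is computed from errors on successively refined meshes and compared to the design order.

```python
import math

def observed_order(errors, h):
    """Observed order of accuracy from errors on successively refined meshes.

    For each consecutive mesh pair: p = log(e_coarse/e_fine) / log(h_coarse/h_fine).
    """
    return [math.log(errors[i] / errors[i + 1]) / math.log(h[i] / h[i + 1])
            for i in range(len(errors) - 1)]

# Hypothetical errors from a nominally second-order method on meshes h = 1/20, 1/40, 1/80.
h = [1 / 20, 1 / 40, 1 / 80]
errors = [4.1e-3, 1.0e-3, 2.6e-4]
print(observed_order(errors, h))  # values near 2 match the design order; a drift away flags trouble
```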

One of the answers to the dearth of analytical solutions is the method of manufactured solutions (MMS). The idea is simple in concept: an analytical right-hand side is added to the equations to force a known, ideal solution. Using this practice, the code can be studied. The technique has several practical problems that should be acknowledged. First, the complexity of these right-hand sides is often extreme, and the source term must be added to the code. This makes the verified code different from the code used to solve practical problems in a key way. Second, MMS problems are wildly unrealistic. Generally speaking, solutions with MMS are dramatically unlike any realistic solution.
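
As a toy illustration of how a manufactured source term is built, here is a short sketch using sympy for a one-dimensional heat equation; the manufactured solution and the equation are stand-ins chosen for simplicity, not anything from a production code.

```python
import sympy as sp

x, t, nu = sp.symbols("x t nu")

# A smooth "manufactured" solution; it need not resemble any physical flow.
u = sp.sin(sp.pi * x) * sp.exp(-t)

# For the heat equation u_t = nu * u_xx, the forcing source is the residual
# S = u_t - nu * u_xx.  Adding S to the code makes u the exact solution.
S = sp.simplify(sp.diff(u, t) - nu * sp.diff(u, x, 2))
print(S)
```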

MMS simply expands the distance between code verification and the code’s actual use. All this does is amplify the degree to which code users disdain code verification. The whole practice is almost constructed to destroy the importance of code verification. It’s also pretty much dull as dirt. Unless you just love math (some of us do), MMS isn’t exciting. We need to move toward practices people give a shit about. I’m going to start by naming a few.

“If you thought that science was certain – well, that is just an error on your part.” ― Richard P. Feynman

It Can Measure Accuracy

“I learned very early the difference between knowing the name of something and knowing something.” ― Richard P. Feynman

I’ve already taken a stab at this topic, noting that code verification needs a refresh:

https://williamjrider.wordpress.com/2024/09/21/code-verification-needs-a-refresh/

Here, I will just recap the first and most obvious overlooked benefit: measuring the meaningful accuracy of codes. Code verification’s standard definition concentrates on order of accuracy as the key metric. Practical solutions rarely achieve the design order of accuracy; most give first-order accuracy (or lower). This further undermines the perceived significance of code verification. The second metric from verification is error, and with analytical solutions, you can measure a code’s errors precisely. This connects directly to the second thing to focus on: the efficiency of a code.

A practical measure of code efficiency is accuracy per unit effort. Both accuracy and effort can be measured with code verification. One can get precise errors by solving a problem with an analytical solution. By simultaneously measuring the cost of the solution, the efficiency can be assessed. For practical use, this measurement means far more than finding bugs via standard code verification. Users simply assume codes are bug-free and discount the importance of bug hunting. They don’t actually care much because they can’t see it. Yes, this is dysfunctional, but it is the objective reality.
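
A minimal sketch of such an efficiency study might look like the following; solve_on_mesh and exact are placeholders for whatever solver and analytical solution are actually being verified.

```python
import time
import numpy as np

def efficiency_study(solve_on_mesh, exact, mesh_sizes):
    """Tabulate error and wall-clock cost; efficiency here is accuracy per unit effort."""
    rows = []
    for n in mesh_sizes:
        start = time.perf_counter()
        x, u = solve_on_mesh(n)              # numerical solution on n cells (placeholder)
        cost = time.perf_counter() - start   # effort: wall-clock seconds
        err = np.max(np.abs(u - exact(x)))   # accuracy: max-norm error vs. analytical solution
        rows.append((n, err, cost, 1.0 / (err * cost)))
    return rows
```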

The measurement and study of code accuracy is the most straightforward extension of the nominally dull-as-dirt practice. There’s so much more, as we shall see.

It Can Test Symmetries

“Symmetry is what we see at a glance; based on the fact that there is no reason for any difference…” ― Blaise Pascal

One of the most important aspects of many physical laws is symmetry. Symmetries are often preserved by ideal versions of these laws (like the differential equations codes solve), but many are only satisfied inexactly by the methods in codes. Some of these symmetries are simple, like the preservation of geometric symmetry in cylindrical or spherical flows. This can give rise to simple measures that accompany classical analytical solutions. The symmetry measure can augment the standard verification approach with additional value. In some applications, the symmetry is of extreme importance.

There are many more problems that can be examined for symmetry without having an analytical solution. One can create all sorts of problems with symmetries built into the solution. A good example is a Rayleigh-Taylor instability problem with a symmetry plane where left-right symmetry is desired. The solution can be examined as a function of time. Because this is an instability, the challenge to symmetry grows over time; as the problem evolves, the lack of symmetry becomes harder to control. This makes the test extreme if run for very long times. Symmetry problems also tend to grow as the mesh is refined.
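
One simple way to quantify this is a normalized asymmetry norm computed from the discrete field. The sketch below is illustrative, not taken from any particular code.

```python
import numpy as np

def symmetry_error(field):
    """Left-right asymmetry of a 2D field about its vertical mid-plane.

    Returns ||field - mirror(field)|| / ||field||; zero means perfect symmetry.
    """
    mirrored = field[:, ::-1]
    return np.linalg.norm(field - mirrored) / np.linalg.norm(field)

# A field that is symmetric in x by construction gives ~0; in a Rayleigh-Taylor
# run the same metric, tracked over time, shows how fast symmetry is being lost.
x = np.linspace(-1.0, 1.0, 64)
rho = np.tile(np.exp(-x ** 2), (64, 1))
print(symmetry_error(rho))
```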

It is a problem I used over 30 years ago to learn how to preserve symmetry in an incompressible variable-density code. I found the symmetry could be threatened by two main parts of the code: the details of upwinding in the discretization, and the numerical linear algebra. The pressure solve needed to be symmetric as well. I had to modify each part of the algorithm to get my desired result. The upwinding had to be changed to avoid any asymmetry tied to the sign of the upwind direction. This sort of testing and improvement is the hallmark of high-quality codes and algorithms. Too little of this sort of work is taking place today.

It Can Find “Features”

“A clever person solves a problem. A wise person avoids it.” ― Albert Einstein

The usual mantra for code verification is that a lack of convergence means a bug. This is not true; it is a very naive and limiting perspective. For codes that compute shock waves (and weak solutions generally), correct solutions require conservation and entropy conditions. Methods and codes do not always adhere to these conditions. In those cases, a “perfect” bug-free code will produce incorrect solutions. They will converge, just to the wrong solution. The wrong solution is a feature of the method and code. These wrong solutions are revealed easily by extreme problems with very strong shocks.

These features are easily fixed by using different methods. The problem is that the codes with these features are the product of decades of investment and reflect deeply held cultural norms. Respect for verification is acutely not one of those norms. My experience is that users of these codes make all sorts of excuses for this feature. Mostly, this sounds like a systematic devaluing of the verification work and excuses for ignoring the problem. Usually, they start talking about how important the code’s practical work is. They fail to see how damning the failure to solve these problems is. Frankly, it is a pathetic and unethical stand. I’ve seen it over and over at multiple Labs.

Before I leave this topic, I will get to another example of a code feature. This has a similarity to the symmetry examination. Shock codes can often have a shock instability with a funky name, the carbuncle phenomenon. This is where you have a shock aligned with a grid, and the shock becomes non-aligned and unstable. This feature is a direct result of properly implemented methods. It is subtle and difficult to detect. For a large class of problems, it is a fatal flaw. Fixing the problem requires some relatively simple but detailed changes to the code. It also shows up in strong shock problems like Noh and Sedov. At the symmetry axes, the shocks can lose stability and show anomalous jetting.

This gets to the last category of code verification benefits, determining a code’s limits and a research agenda.

“What I cannot create, I do not understand.” ― Richard P. Feynman

It Can Find Your Limits and Define Research

If you are doing code verification correctly, the results will show you a couple of key things: what the limits of the code are, and where research is needed. My philosophy of code verification is to beat the shit out of a code. Find problems that break the code. The better the code is, the harder it is to break. The way you break the code is to define harder problems with more extreme conditions. One needs to do research to get to the correct convergent solutions.

Where the code breaks is a great place to focus on research. Moving the horizons of capability outward can define an excellent and useful research agenda. In a broad sense, the identification of negative features is a good practice (the previous section). Another example of this is extreme expansion waves approaching vacuum conditions. In the past, I have found that most usual shock methods cannot solve this problem well. Solutions are either non-convergent or so poorly convergent as to render the code useless.

This problem is not altogether surprising given where the emphasis in method development has been. Computing shock waves has been the priority for decades. When a method cannot compute a shock properly, the issues are more obvious: a clear lack of convergence in some cases, or catastrophic instability. Expansion waves are smooth, offering less of an obvious challenge, but they are also dissipation-free and nonlinear. Methods focused on shocks shouldn’t be expected to solve them well (and they don’t when the expansions are strong enough).

I’ll close with another related challenge. The use of methods that are not conservative is driven by the desire to compute adiabatic flows. For some endeavors like fusion, adiabatic mechanisms are essential. Conservative methods necessary for shocks often (or generally) cannot compute adiabatic flows well. A good research agenda might be finding methods that can achieve conservation, preserve adiabatic flows, and capture strong shock waves. A wide range of challenging verification test problems is absolutely essential for success.

“The first principle is that you must not fool yourself and you are the easiest person to fool.” ― Richard P. Feynman

Does Uncertainty Quantification Replace V&V?

19 Monday May 2025

Posted by Bill Rider in Uncategorized


tl;dr

Uncertainty quantification (UQ) is ascending while verification & validation (V&V) is declining. UQ is largely done in silico and offers results that trivially harness modern parallel computing. UQ thus parallels the appeal of AI and easy computational results. It is also easily untethered from objective reality. V&V is a deeply technical and naturally critical practice. Verification is extremely technical. Validation is extremely difficult and time-consuming. Furthermore, V&V can be deeply procedural and regulatory. UQ has few of these difficulties, although its techniques are quite technical. Unfortunately, UQ without V&V is not viable. In fact, UQ without the grounding of V&V is tailor-made for bullshit and hallucinations. The community needs to take a different path that mixes UQ with V&V, or disaster awaits.

“Doubt is an uncomfortable condition, but certainty is a ridiculous one.” ― Voltaire

UQ is Hot

If one looks at the topics of V&V and UQ, it is easy to see that UQ is a hot topic. It is in vogue and garners great attention and support. Conversely, V&V is not. Before I give my sharp critique of UQ, I need to make something clear. UQ is extremely important and valuable. It is necessary. We need better techniques, codes, and methodologies to produce estimates of uncertainty. The study of uncertainty is guiding us to grapple with a key question: how much don’t we know? This is difficult and uncomfortable work. Our emphasis on UQ is welcome. That said, this emphasis needs to be grounded in reality. I’ve spoken in the past about the danger of ignoring uncertainty. When an uncertainty is ignored, it gets the default value of ZERO. In other words, ignored uncertainties are assigned the smallest possible value.

“The quest for certainty blocks the search for meaning. Uncertainty is the very condition to impel man to unfold his powers. ” ― Erich Fromm

As my last post discussed, the focus needs to be rational and balanced. When I observe the current conduct of UQ research, I see neither quality. UQ takes on the mantle of the silver bullet. It is subject to the fallacy of the “free lunch” solution. It seems easy and tailor-made for our computationally rich environment. It produces copious results without many of the complications of V&V. The practice of V&V is doubt based on deep technical analysis. It is uncomfortable and asks hard questions, often questions that don’t have easy or available answers. UQ just gives answers with ease, semi-automatically and in silico. You just need lots of computing power.

“I would rather have questions that can’t be answered than answers that can’t be questioned.” ― Richard Feynman

This ease gets to the heart of the problem. UQ needs validation to connect it to objective reality. UQ needs verification to make sure the code is faithful to the underlying math. Neither practice is so well established that it can be ignored. Yet, increasingly, they are being ignored. Experimental results are needed to challenge our surety. UQ has a natural appetite for computing, so our exascale computers lap it up. It is a natural way to create vast amounts of data. UQ attaches statistics naturally to modeling & simulation, a long-standing gap. Statistics connects to machine learning, which is its algorithmic extension. The mindset parallels the recent euphoria for AI.

For these reasons, UQ is a hot topic attracting funding and attention today. Being in silico, UQ becomes an easy way to get V&V-like results from ML/AI. What is missing? For the most part, UQ is done assuming V&V is done. For AI/ML this is a truly bad assumption. If you’re working in simulation, you know that assumption is suspect. The basics of V&V are largely complete, but its practice is generally haphazard and poor. My observation is that computational work has regressed concerning V&V. Advances made in publishing and research standards have gone backward in recent years. Rather than V&V being completed and simply applied, it is despised.

All of this adds up to a distinct danger for UQ. Without V&V, UQ is simply an invitation for ModSim hallucinations akin to the problem AI has. Worse yet, it is an invitation to bullshit the consumers of simulation results. Answers can be given knowing they have a tenuous connection to reality. It is a recipe for fooling ourselves with false confidence.

“The first principle is that you must not fool yourself and you are the easiest person to fool.” ― Richard P. Feynman

AI/ML Feel Like UQ and Surrogates are Dangerous

One of the big takeaways from the problems with UQ is the appeal of its in silico nature. This paves the way to easy results. Once you have a model and a working simulation, UQ is like falling off a log. You just need lots of computing power and patience. Yes, it can be made more accurate and efficient. Nonetheless, UQ asks no real questions about the results. Turn the crank and the results fall out (unless a run triggers a problem with the code). You can easily get results, although doing this efficiently is a research topic. Nonetheless, being in silico removes most of the barriers. Better yet, it uses the absolute fuck out of supercomputers. You can fill a machine up with calculations and get results galore.

If one is paying attention to the computational landscape, you should be experiencing deja vu. AI is some exciting in-silico shit that uses the fuck out of the hardware. Better yet, UQ is an exciting thing to do with AI (or machine learning, really). Even better, you can use AI/ML to make UQ more efficient. We can use computational models and results to train ML models that cheaply evaluate uncertainty. All those super-fast computers can generate a shitload of data. These trained models are called surrogates, and they are all the rage. Now you don’t have to run the expensive model anymore; you just train the surrogate and evaluate the fuck out of it on the cheap. Generally, machine learning is poor at extrapolating, and in high dimensions (UQ is very high dimensional) you are always extrapolating. You had better understand what you’re doing, and machine learning isn’t well understood.
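
A toy sketch of the danger: fit a surrogate to samples of a cheap stand-in “model” over a limited input range, then evaluate it outside that range. The model and the polynomial surrogate here are purely illustrative, not any production tool.

```python
import numpy as np

# A cheap stand-in for an expensive simulation: one scalar output of one input.
def expensive_model(a):
    return np.exp(-a) * np.sin(3.0 * a)

rng = np.random.default_rng(0)
a_train = rng.uniform(0.0, 1.0, 200)          # training samples inside the studied range
surrogate = np.polynomial.Polynomial.fit(a_train, expensive_model(a_train), deg=6)

for a in (0.5, 1.5, 3.0):                     # 0.5 interpolates; 1.5 and 3.0 extrapolate
    print(a, float(expensive_model(a)), float(surrogate(a)))
# Agreement is good inside the training range and degrades rapidly outside it.
```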

What could possibly go wrong?

If the model you trained the surrogate on has weak V&V, a lot can go wrong. You are basically evaluating bullshit squared. Validation is essential to establishing how good a model is. You should produce a model form error that expresses how well the computational model works. The model also has numerical errors due to the finite representation of the computation. I can honestly say that I’ve never seen either of these fundamental errors associated with a surrogate. Nonetheless, surrogates are being developed to power UQ all over the place. It’s not a bad idea, but these basic V&V steps should be an intrinsic part of it. To me, the state of affairs says more about the rot at the heart of the field. We have lost the ability to seriously question computed results. V&V is a vehicle for asking these questions, and they are too uncomfortable to confront.

V&V is Hard; Too Hard?

I’ve worked at two NNSA Labs (Los Alamos and Sandia) in the NNSA V&V program, so I know where the bodies are buried. I’ve been part of some of the program’s achievements and some of its failures. I was at Los Alamos when V&V arrived. Making progress was like pulling teeth. I still remember the original response to validation as a focus: “Every calculation a designer does is all the validation we need!” The Los Alamos weapons designers wanted hegemonic control over any assessment of simulation quality. Code developers, computer scientists, and any non-designer were deemed incompetent to assess quality. To say it was an uphill battle is an understatement. Nonetheless, progress was made, albeit mild.

V&V fared better at Sandia. In many ways, the original composition of the program had its intellectual base at Sandia, which explains a lot of the foundational resistance from the physics labs. It also gets at much of the problem with V&V today. Its focus on the credibility of simulations makes it very process-oriented and regulatory. As such, it is eye-rollingly boring and generates hate. This character was the focus of the opposition at Los Alamos (Livermore too). V&V has too much of an “I told you so” vibe. No one likes this, and V&V starts to get ignored because it just delivers bad news. Put differently, V&V asks lots of questions but generates few answers.

Since budgets are tight and experiments are scarce, the problems grow. We start to demand that calculations closely agree with available data. Early predictions typically don’t meet that standard. By and large, simulations have a lot of numerical error even on the fastest computers. The cure is an ex post facto calibration of the model to match experiments better. The problem is that this short-circuits validation. Basically, there is little or no actual validation; almost everything is calibrated unless the simulation is extremely easy. The model has a fixed grid, so there’s no verification either. Verification is bad news without a viable current solution. What you can do with such a model is lots of UQ. Thus UQ becomes the results for the entire V&V program.

To really see this clearly we need to look West to Livermore.

What does UQ mean without V&V?

I will say up front that I’m going to give Livermore’s V&V program a hard time, but first, they need big kudos. The practice of computational physics and science at Livermore is truly first-rate. They have eclipsed both Sandia and Los Alamos in these areas. They are exceptional code developers and use modern supercomputers with immense skill. They are a juggernaut in computing.

By almost any objective measure, Livermore’s V&V program produced the most important product of the entire ASC program: common models. Livermore has a track record of telling Washington one thing and doing something a bit different. Even better, this something different is something better. Not just a little better, but a lot better. Common models are the archetypal example of this. Back in the 1990s, there was a metrics project looking at validating codes. Not a terrible idea at all. In a real sense, Livermore did do the metrics in the end, but took a different, smarter path to them.

What Livermore scientists created instead was a library of common models that combined experimental data, computational models, and auxiliary experiments. The data was accumulated across many different experiments and connected into a self-consistent set of models. It is an incredible product. It has been repeated at Livermore and then at Los Alamos too across the application space. I will note that Sandia hasn’t created this, but that’s another story of differences in Lab culture. These suites of common models are utterly transformative to the program. It is a massive achievement. Good thing too because the rest of V&V there is far less stellar.

What Livermore has created instead is lots of UQ tools and practice. The common models are great for UQ too. One of the first things you notice about Livermore’s V&V is the lack of V&V. Verification leads the way in being ignored. The reasons are subtle and cultural. Key to this is an important observation about Livermore’s identity as the fusion lab. Achieving fusion is the core cultural imperative. Recently, Livermore has achieved a breakthrough at the National Ignition Facility (NIF). This was “breakeven” in terms of energy. They got more fusion energy out than laser energy in for some NIF experiments (all depends on where you draw the control volume!).

The NIF program also has the archetypal example of UQ gone wrong. Early in the NIF program, there was a study of fusion capsule design and results. It looked at a large span of uncertainties in the modeling of NIF and NIF capsules. It was an impressive display of the UQ tools developed by Livermore, their codes, and computers. At the end of the study, they created an immensely detailed and carefully studied prediction of outcomes for the upcoming experiments. This was presented as a probability distribution function of capsule yield. It covered an order of magnitude, from 900 kJ to 9 MJ of yield. When they started to conduct experiments, the results were not even in the predicted range, missing on the low side by about a factor of three. The problems and dangers of UQ were laid bare.

“The mistake is thinking that there can be an antidote to the uncertainty.” ― David Levithan

If you want to achieve fusion, the key is really hot and dense matter. The way to get this hot and dense matter is to adiabatically compress the living fuck out of material. To do this, there is a belief that numerical Lagrangian hydrodynamics with numerical viscosity turned off in adiabatic regions gives good results. The methods they use are classical and oppositional to modern shock-capturing methods. Computing shocks properly requires dissipation and conservation. The ugly reality is that hydrodynamic mixing (excessively non-adiabatic, dissipative, and ubiquitous) is anathema to Lagrangian methods. Codes need to leave the Lagrangian frame of reference and remap. Conserving energy is one considerable difficulty.

Thus, the methods favored at Livermore cannot successfully pass verification tests for strong shocks, and so verification results are not shown. For simple verification problems there is a technique to get good answers, but that technique doesn’t work on applied problems. Thus, good verification results for shocks don’t apply to the key cases where the codes are used. They know the results will be bad because they are following the fusion mantra. Failing to recognize the consequences of bad code verification results is magical thinking. Nothing induces magical thinking like a cultural perspective that values one thing over all others. This is a form of the extremism discussed in the last blog post.

There are two unfortunate side effects. The first obvious one is the failure to pursue numerical methods that simultaneously preserve adiabats and compute shocks correctly. This is a serious challenge for computational physics. It should be vigorously pursued and developed by the program. It also represents a question asked by verification without an easy answer. The second side effect is a complete commitment to UQ as the vehicle for V&V where answers are given and questions aren’t asked. At least not really hard questions.

UQ is left and becomes the focus. It is much better for results and for giving answers. If those answers don’t need to be correct, we have an easy “success.”

“It doesn’t matter how beautiful your theory is, it doesn’t matter how smart you are. If it doesn’t agree with experiment, it’s wrong.” ― Richard P. Feynman

A Better Path Forward

Let’s be crystal clear about the UQ work at Livermore: it is very good. Their tools are incredible. In a purely in silico way, the work is absolutely world-class. The problems with the UQ results are all related to gaps in verification and validation. The numerical results are suspect and generally under-resolved. The validation of the models is lacking. This gap stems chiefly from the lack of acknowledgment of calibrations. Calibration of results is essential for useful simulations of challenging systems; NIF capsules are one obvious example, and global climate models are another. We need to choose a better way to focus our work and use UQ properly.

“Science is what we have learned about how to keep from fooling ourselves.” ― Richard Feynman

My first key recommendation is to ground validation with calibrated models. There needs to be a clear separation of what is validated and what is calibrated. One of the big parts of the calibration is the finite mesh resolution of the models. Thus the calibrations mix both model form error and numerical error. All of this needs to be sorted out and clarified. In many cases, these are the dominant uncertainties in simulations. They swallow and neutralize the UQ we spend our attention on. This is the most difficult problem we are NOT solving today. It is one of the questions raised by V&V that we need to answer.

“Maturity, one discovers, has everything to do with the acceptance of ‘not knowing.’” ― Mark Z. Danielewski

The practice of verification is in crisis. The lack of meaningful estimates of numerical error in our most important calculations is appalling. Code verification has become a niche activity without any priority. Code verification for finding bugs is about as important as taking your receipt with you from the grocery store: nice to have, but very rarely checked by anybody, and when it is checked, it’s probably because you’re doing something a little sketchy. Code verification is key to code and model quality. It needs to expand in scope and utility. It is also the recipe for improved codes. Our codes and numerical methods need to progress; they must continue to get better. The challenge of correctly computing strong shocks and adiabats is but one. There are many others that matter. We are still far from meeting our modeling and simulation needs.

“What I cannot create, I do not understand.” ― Richard P. Feynman

Ultimately, we need to recognize how V&V is a partner to science. It asks key questions of our computational science. The goal is then to meet these questions with a genuine search for answers. V&V also provides evidence of how well each question is answered. This question-and-answer cycle is how science must work. Without the cycle, progress stalls, and the hopes of the future are put at risk.

“If you thought that science was certain – well, that is just an error on your part.” ― Richard P. Feynman

In a Time of Extremes, Balance is the Answer

17 Saturday May 2025

Posted by Bill Rider in Uncategorized


tl;dr

Today’s world is ruled by extreme views and movements. These extremes are often a direct reaction to progress and change. People often prefer order and a well-defined direction. The current conservative backlash is such a reaction, much of it based on the discomfort of social progress and economic disorder. In almost all things, balance and moderation are a better path. The middle way leads to progress that sticks. Slow and deliberate change is accepted, while fast and precipitous change invites opposition. I will discuss how this plays out in both politics and science. In science, many elements contribute to the success of any field, and these elements need to be present for success. Without that balance, failure is the ultimate end. Peter Lax’s work was an example of this balance.

“All extremes of feeling are allied with madness.” ― Virginia Woolf

The Extremes Rule Today

It is not controversial to say that today’s world is dominated by extremism. What is a bit different is to point to the world of science for the same trend. The political world’s trends are illustrated vividly on screens across the World. We see the damage these extremes are doing in the USA, Russia, Israel-Gaza and everywhere online. One extreme will likely breed an equal and opposite reaction. Examples abound today such as over-regulation or political correctness in the USA. In each, true excesses are greeted with equal or greater excesses reversing them.

Science is not immune to these trends and excesses. I will detail below a couple of examples where no balance exists. One program is the exascale computing program which focuses on computational hardware. A second example is the support for AI. It is similarly hardware-focused. In both cases, we fail to recognize how the original success was catalyzed. The impacts, the need, and the route to progress are not seen clearly. We have programs that can only see computing hardware and fail to see the depth and nature of software.

In science, software is a concrete instantiation of theory and mathematics. If theory and math are not supported, the software is limited in value. The progress desired by these efforts is effectively short-circuited. In computing, software has the lion’s share of the value. Hardware is necessary, but it is not where the largest advances have occurred. Yet the hardware is the most tangible and visible aspect of technology. In mathematical parlance, the hardware is necessary, but not sufficient. We are advancing science in a clearly insufficient way.

“To put everything in balance is good, to put everything in harmony is better.” ― Victor Hugo

Balance in All Things

Extremism is simple. Reality is complex. Simple answers fall apart upon meeting the real World. This is true for politics and science. When extreme responses to problems are implemented, they invite an opposite extreme reaction. Lasting solutions require a subtle balance of perspectives and sources of progress. The best from each approach is necessary.

Take the issue of over-regulation as an example. Simply removing regulations invites the same forces that created the over-regulation in the first place. A far better approach is to pare back regulation thoughtfully. Careful identification of excess regulation builds support for the project. The same thing applies to freedom of speech and the excesses of “woke” or “cancel culture”. Those movements overstepped and created a powerful backlash, but they were also responding to genuine societal problems of bigotry and oppressive hierarchy. A complete reversal is wrong and will create horrible excesses inviting a reverse backlash. Again, smaller changes that balance progress with genuine freedom would last.

With those examples in mind, let us turn to science. In the computational world, we have seen a decade of excess: first with the pursuit of exascale high-performance computing, and next, in starkly similar fashion, with an obsession over artificial intelligence. Current efforts to advance AI are focused on computational hardware. Other sources of progress are nearly ignored. In each case, there is serious hype along with an appealing simplicity in the sales pitch. In both cases, the simple approach short-changes progress and hampers broader success and long-term progress.

Let’s turn briefly to what a more balanced approach would look like.

At the core of any discussion of science should be the execution of the scientific method. This has two key parts working in harmony, theory and experiment (observation). Making these two approaches harmonize is the route to progress. Theory is usually expressed in mathematics and is most often solved on computers. If this theory describes what can be measured in the real world, we believe that we understand reality. Better yet, we can predict reality leading to engineering and control. Our technology is a direct result of this and much of our societal prosperity.

With this foundation, we can judge other scientific efforts. Take the recent Exascale program, which focused on creating faster supercomputers. The project focused on computing hardware and computer science while not vibrantly supporting mathematics and theory. This is predicated on some poor assumptions: that theory is adequate (it isn’t) and that our math is healthy. Both the theory and the math need sustained attention. Examining history shows that math and theory have been at the core of progress in computing. It is worse than this. The exascale focus came as Moore’s law ended. This was the exponential growth in computing power that held for almost a half-century, starting in 1965. Its end was largely due to encountering physical barriers to progress. The route to increased computing value should shift to theory and math (i.e., algorithms). Yet the focus was on hardware, trying to breathe life into the dying Moore’s law. It is both inefficient and ultimately futile.

Today, we see the same thing happening with AI. The focus is on computing hardware even though the growth in power is incremental. Meanwhile, the massive breakthrough in AI was enabled by algorithms. Limits on trust and correctness of AI are also grounded in the weakness of the underlying math for AI. A more vibrant and successful AI program would reduce hardware focus and increase math and algorithm support. This would serve both progress and societal needs. Yet we see the opposite.

We are failing to learn from our mistakes. The main reason is that we can’t call them mistakes. In every case, we are seeing excess breed more excess. Instead, we should balance and strike a middle way.

“Progress and motion are not synonymous.” ― Tim Fargo

A Couple Examples

The USA today is living through a set of extremes that are closely related. There is a burgeoning authoritarian oligarchy emerging as the central societal power. Much of the blame for this development is out-of-control capitalism without legal boundaries. There is virtually limitless wealth flowing to a small number of individuals, and with this wealth, political power is amassing. A backlash is inevitable. The worst correction to this capitalist overreach would be a suspension of capitalism. Socialism seems to be the obvious cure, but too much socialism would be a mistake; it would throw the baby out with the bathwater. A mixture of capitalism and socialism works best; a subtle balance of the two is needed. The most obvious societal example is health care, where capitalism is a disaster. We have huge costs with worse outcomes.

In science, recent years have seen an overemphasis on computing hardware over all else. My comments apply to exascale and AI nearly equally. Granted, there are differences in the computing approach needed for each, but the commonality is obvious. The value of that hardware is bound to software. Software’s value is bound to algorithms, which in turn are grounded in mathematics. That math can be discrete, information-related, or a theory of the physical world. This pipeline is the route to computing’s ability to transform our world and society. The answer is not to ignore hardware but to moderate it with the other parts of the computing recipe for science. Without that balance, the pipeline empties and becomes stale. That has probably already happened.

As I published this article, news of the death of Peter Lax arrived. I was buoyed by the prominence of his obituary in the New York Times. Peter’s work was essential to my career, and it is good to see him appropriately honored. Peter was the epitome of the value created by the balance I’m discussing here. He was the consummate mathematical genius applying it to solve essential problems. While he contributed to applications of math, his work had the elegance and beauty of pure math. He also recognized the essential role of computing and computers. We would be wise to honor his contributions by following his path more closely. I’ve written about Peter and his work on several occasions here (links below).

“Keep in mind that there is in truth no central core theory of nonlinear partial differential equations, nor can there be. The sources of partial differential equations are so many – physical, probabilistic, geometric etc. – that the subject is a confederation of diverse subareas, each studying different phenomena for different nonlinear partial differential equations by utterly different methods.” – Peter Lax

https://williamjrider.wordpress.com/2015/06/25/peter-laxs-philosophy-about-mathematics/
https://williamjrider.wordpress.com/2016/05/20/the-lax-equivalence-theorem-its-importance-and-limitations/
https://williamjrider.wordpress.com/2013/09/19/classic-papers-lax-wendroff-1960/

“To light a candle is to cast a shadow…” ― Ursula K. Le Guin

Epilog: What if Hallucinations are Really Bullshit?

07 Wednesday May 2025

Posted by Bill Rider in Uncategorized


“It is impossible for someone to lie unless he thinks he knows the truth. Producing bullshit requires no such conviction.” ― Harry G. Frankfurt, On Bullshit

It’s really cool when your blog post generates a lot of feedback. It’s like when you give a talk and get lots of questions. It is a sign that people give a fuck. No questions means no engagement and lots of no fucks given.

One friend sent me an article, “ChatGPT is Bullshit.” It riffs on Harry Frankfurt’s amazing monograph, “On Bullshit.” To put it mildly, bullshit is much worse than a hallucination, even a hallucination produced by drugs. Hallucinations are innocent and morally neutral. Bullshit is unethical. The paper makes the case that LLMs are bullshitting us, not offering innocent hallucinations. We should apply the same standard to computational modeling of the classical sort.

This is, of course, not a case of anthropomorphizing the LLM. The people responsible for designing LLMs want them to provide answers. Providing answers makes the users of LLMs happy. Happy users use the product; unhappy ones don’t. Bullshit is willful deception, deception with a purpose. We should be mindful of willful deception in classical modeling & simulation work too. In a subtle way, the absence of due diligence, such as avoiding V&V, treads close to the line. If V&V is done and then silenced and ignored, you have bullshit. I’ve seen a lot of bullshit in my career. So have you.

Bullshit is a pox. We need to recognize and eliminate bullshit. Bullshit is the enemy of truth. It is vastly worse than hallucinations and demands attention.

“The bullshitter ignores these demands altogether. He does not reject the authority of the truth, as the liar does, and oppose himself to it. He pays no attention to it at all. By virtue of this, bullshit is a greater enemy of the truth than lies are.” ― Harry G. Frankfurt

Hicks, Michael Townsen, James Humphries, and Joe Slater. “ChatGPT is bullshit.” Ethics and Information Technology 26, no. 2 (2024): 1-10.

Does Modeling and Simulation Hallucinate Too?

03 Saturday May 2025

Posted by Bill Rider in Uncategorized


tl;dr

One of the most damning aspects of the amazing results from LLMs is hallucination: garbage answers delivered with complete confidence. Is this purely an artifact of LLMs, or is it more common than believed? I believe it is more common. Classical modeling and simulation using differential equations can deliver confident results without credibility. In the service of prediction, this can be common. It is especially true when the models are crude or heavily calibrated and then used to extrapolate away from data. The key to avoiding hallucination is the scientific method. For modeling and simulation, this means verification and validation. This requires that care and due diligence be applied to all computed results.

“It’s the stupid questions that have some of the most surprising and interesting answers. Most people never think to ask the stupid questions.” ― Cory Doctorow

What are Hallucinations in LLMs?

In the last few years, one of the most stunning technological breakthroughs has been Large Language Models (LLMs, like ChatGPT, Claude, Gemini …). This breakthrough has spurred visions of achieving artificial general intelligence soon. The answers to queries are generally complete and amazing. In many contexts, we see LLMs replacing search as a means of information gathering. It is clearly one of the most important technologies for the future. There is a general view of LLMs broadly driving the economy of our future. But there is a blemish on this forecast: hallucinations! Some of these complete, confident answers are partial to complete bullshit.

“I believe in everything until it’s disproved. So I believe in fairies, the myths, dragons. It all exists, even if it’s in your mind. Who’s to say that dreams and nightmares aren’t as real as the here and now?” ― John Lennon

An LLM answers questions with unwavering confidence. Most of the time, this is well grounded in objective facts. Unfortunately, the same confidence is shown when results are false. Many examples show that LLMs will make up answers that sound great but are lies. I asked ChatGPT to create a bio for me, and it constructed a great-sounding lie. It had me born in 1956 (1963) with a PhD in Math from UC Berkeley (Nuke Engineering, New Mexico). Other times, I’ve asked it to elaborate on experts in fields I know well. More than half the information is spot on, but a few of the experts are fictional.

“It was all completely serious, all completely hallucinated, all completely happy.” ― Jack Kerouac

The question is what would be better?

In my opinion, the correct response is for the LLM to say, “I don’t know” or “That’s not something I can answer.” To a serious extent, this is starting to happen. LLMs will tell you they don’t have the ability to answer a question. You also get responses saying a question violates their rules. We see the LLM community responding to this terrible problem. The current state is imperfect, and hallucinations still happen. The community guidelines for LLMs are tantamount to censorship in many cases. That said, they are moving toward dealing with it.

Do classical computational models have the same issue?

Are they dealing with problems as they should?

Yes and no.

“I don’t paint dreams or nightmares, I paint my own reality.” ― Frida Kahlo

What would Hallucination be in Simulation?

Standard computational modeling is thought to be better because it is based on physical principles. We typically solve well-defined and accepted governing equations. These equations are solved in a manner based on well-known mathematical and computer science methods. This is correct, but it is not bulletproof. The reasons are multiple. One of the principal sources of problems is the properties of the materials in a problem. A second is the physics not included in the governing equations (often called closure). A third major category is the construction of a problem in terms of initial and boundary conditions, or major assumptions. Numerical solutions can be under-resolved or produce spurious solutions; mesh resolution can be suspect or inadequate. Finally, the governing equations themselves include assumptions that may not be true or may not apply to the problem being solved.

“Why should you believe your eyes? You were given eyes to see with, not to believe with. Your eyes can see the mirage, the hallucination as easily as the actual scenery.” ― Ward Moore

The major weakness is the need for closure of the physical models used. This can take the form of constitutive relations for the media in the problem. It also applies to unresolved scales or physics in the problem. Constitutive relations usually abide by well-defined principles, quite often grounded in thermodynamics. They are the product of considering the nature of the material at scales below the simulation’s resolution. Almost always, these scales are represented by averaged/mean values. Thus the variability in the true solution is excluded from the problem. Large or divergent physics can emerge if the variability of materials is great at the scale of the simulation. Simple logic dictates that this variability grows larger as the resolution of a simulation becomes finer.

A second, connected piece of this problem is subscale physics that is not resolved but is dynamic. Turbulence modeling is the classical example. These models have significantly limited applicability and great shortcomings. This gets to the first category of assumption that needs to be taken into account: is the model being used in a manner appropriate for it? Models also interact heavily with the numerical solution. The numerical effects and errors can often mimic the model’s physical effects. Numerical dissipation is the most common version of this: turbulence is a nonlinear dissipative process, and numerical diffusion is often essential for stability. Surrounding all of this is the necessity of identifying unresolved physics to begin with.

Problems are defined by analysts in a truncated version of the universe. The interaction of a problem with that universe is defined by boundary conditions. The analyst also defines a starting point for a problem if it involves time evolution. Usually, the state a problem starts from is a simple, quiescent version of reality. Typically it is far more homogeneous and simple than reality. The same goes for the boundary conditions. Each of these decisions influences the subsequent solution. In general, these selections make the problem less dynamically rich than reality.

Finally, we can choose the wrong governing equations based on assumptions about a problem. This can include choosing equations that leave out major physical effects. A compressible flow problem cannot be described by incompressible equations. Including radiation or multi-material effects is greatly complicating. Radiation transport has a hierarchy of equations ranging from diffusion to full transport. Each level of approximation involves vast assumptions and loss of fidelity. The more physically complete the equations are, the more expensive they are. There is great economy in choosing the proper level of modeling. The wrong choice can produce results that are not meaningful for a problem.

Classical simulations are distinguished by the use of numerical methods to solve equations. This produces solutions to these equations far beyond where they are analytical. These numerical methods are grounded in powerful, proven mathematics. Of course, the proven, powerful math needs to be used and listened to. When it isn’t, the solutions are suspect. Too often corners are cut and the theory is not applied. This can result in poorly resolved or spurious solutions. Marginal stability can threaten solutions. Numerical solutions can be non-converged or poorly resolved. Some of the biggest numerical issues are seen with solutions labeled as direct numerical simulation (DNS). Often the declaration of DNS means all doubt and scrutiny is short-circuited. This is dangerous because DNS is often treated as a substitute for experiments.
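
One standard guard against the non-converged case is solution verification via Richardson extrapolation. The sketch below, with made-up values, estimates the observed order and the numerical error in a finest-grid quantity from three grids, assuming a constant refinement ratio and smooth asymptotic convergence.

```python
import math

def richardson_error(f_coarse, f_medium, f_fine, r=2.0):
    """Observed order and estimated numerical error of the finest-grid value.

    Assumes three solutions of one scalar quantity on grids refined by a constant
    ratio r, with smooth (asymptotic) convergence.
    """
    p = math.log(abs(f_coarse - f_medium) / abs(f_medium - f_fine)) / math.log(r)
    error_fine = abs(f_medium - f_fine) / (r ** p - 1.0)
    return p, error_fine

# Hypothetical peak-pressure values from three grids refined by a factor of two.
print(richardson_error(101.8, 100.6, 100.25))
```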

These five categories of modeling problems should convince the reader that mistakes are likely. These errors can be large and create extremely unphysical results. If the limitations or wrong assumptions are not acknowledged or known, the solutions might reasonably be viewed as hallucinations. A solution may be presented with extreme confidence. It may seem impressive or use vast amounts of computing power. It may be presented as being predictive, an oracle. This may be far from the truth.

The conclusion is that classical modeling can definitely hallucinate too! Holy shit! Houston, we have a problem!

“Software doesn’t eat the world, it enshittifies it” – Cory Doctorow

How to Detect Hallucinations?

To find hallucinations, you need to look for them. There needs to be some active doubt applied to results given by the computer. Period.

The greatest risk is for computational results to be believed a priori. Doubt is a good thing. Much of the problem with hallucinating LLMs is the desire to give the user an answer no matter what. In the process of always giving an answer, bad answers are inevitable. Rather than replying that the answer can’t be given or is unreliable, the answer is given with confidence. In response to this problem, LLMs have started to respond with caution.

“Would you mind repeating that? I’m afraid I might have lost my wits altogether and just hallucinated what I’ve longed to hear.” ― Jeaniene Frost

The same thing happens with classical simulations. In my experience, the users of our codes always want an answer. If the codes succeed at this, some of the answers will be wrong. Solutions do not include any warnings or caveats. There are two routes to providing them. Technically, the harder route is for the code itself to give warnings and caveats when used improperly; this would require the code’s authors to understand its limits. The other route is V&V, which requires the solutions to be examined carefully for credibility using standard techniques. Upon reflection, both routes go through the same steps, applied systematically to the code’s solutions. The caveats are simply applied up front if the knowledge of limits is extensive, and that knowledge can only be achieved through extensive V&V.

Some of these problems are inescapable, but there is a way to minimize these hallucinations systematically. In a nutshell, the scientific method offers the path. Again, verification and validation is the scientific method for computational simulation. It offers specific techniques and steps to guard against the problems outlined above. These steps offer an examination of the elements going into the solution. The suitability of the numerical solutions and of the models themselves is examined critically. Detailed comparisons of solutions are made to experimental results. We see how well models produce solutions that reflect reality. Of course, this is expensive and time-consuming. It is much easier to just accept solutions and confidently forge ahead. That is also horribly irresponsible.

“In the land of the blind, the one-eyed man is a hallucinating idiot…for he sees what no one else does: things that, to everyone else, are not there.” ― Marshall McLuhan

How to Prevent Hallucinations?

To a great extent, the problem of hallucinations cannot be prevented. The question is whether the users of software are subjected to them without caution. An alternative is for users to be presented with a notice of caution alongside results. This points to the need for caution to be exercised for all computational results. This is true for classical simulations and LLM results alike. All results should be treated with doubt and scrutiny.

V&V is the route to this scrutiny. For classical simulation, V&V is a well-developed field, but often not practiced. For LLMs (AI/ML), V&V is nascent and growing, but immature. In both cases, V&V is difficult and time-consuming. The biggest impediment to V&V is a lack of willingness to do it. The users of all computational work would rather just blindly accept results than apply due diligence.

For the users, it is about motivation and culture. Is the pressure on getting answers and moving to the next problem, or on getting the right answer? My observation is that horizons are short-term and little energy goes toward getting the right answer. With fewer experiments and tests to examine the solutions, the answer isn’t even checked. Where I’ve worked, this is about nuclear weapons. You would think that due diligence would be a priority. Sadly, it isn’t. I’m afraid we are fucked. How fucked? Time will tell.

“We’re all living through the enshittocene, a great enshittening, in which the services that matter to us, that we rely on, are turning into giant piles of shit.” – Cory Doctorow

V&V: Too Much Process And Not Enough Science

21 Monday Apr 2025

Posted by Bill Rider in Uncategorized


tl;dr

The narrative of verification and validation (V&V) is mired in process. The process is in the service of V&V as a means of assessment and credibility. This makes V&V as exciting as a trip to the DMV. The V&V community would be far better served by connecting itself to science. Science by its very nature is inspirational, serving as the engine of knowledge and discovery. As I have noted before, V&V is simply the scientific method applied to computational science. This model serves far better than processes for engineering assessment.

“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.” ― Albert Einstein

The Symposium

A couple of weeks ago, I went to one of my favorite meetings of the year, the ASME VVUQ Symposium. It is an area of knowledge and personal expertise; I’ve committed and contributed a lot to it. The topics are important and interesting. The conference also seems to be slowly dying. The number of attendees and papers is dropping. This parallels the drop in interest in V&V as a whole. In published research, the V&V content has dropped off as practice gets worse and editorial standards drop. By the same token, V&V content in research proposals is dropping off. Where it is expected, it is seen only as a duty and is generally half-hearted.

The health of the conference can be measured by the length of it. The third day was canceled. Signs are bad.

In many places, V&V has been displaced by uncertainty quantification (UQ). UQ has the luxury of producing results mostly “in silico,” totally artificially. You get results without having to go outside a code; thus, objective reality can be avoided. The actual universe is a pain in the ass and a harsh mistress. UQ is full of complex mathematics and has inspired great work worldwide. It is an indispensable tool for V&V when used correctly. As computing has become the focus of funding, UQ has become the thing. Part of the reason is the huge appetite for computing UQ provides, along with avoiding reality. The impact is the erosion of V&V.

There is a deeper perspective to consider. V&V has moved into practical use as part of applied engineering using simulation. V&V is a process to determine the quality of results. The key word in that sentence is “process,” and process sucks. Process is something to hate. Process gets defined when practice is missing. Rather than an engine of progress and improvement, V&V has become an impediment to results. The movement to process is an implicit acknowledgment that practice doesn’t exist.

This is truly sad as V&V is equivalent to the scientific method. It absolutely should be standard practice. We should be working to change this.

“Societies in decline have no use for visionaries.” ― Anais Nin

Process is Death

“A process cannot be understood by stopping it. Understanding must move with the flow of the process, must join it and flow with it.” ― Frank Herbert

Process is something that is getting a lot of attention lately. The explosion of process is being recognized as overhead and an impediment to getting shit done. Process is bureaucracy, with all its dullness and boredom. Rather than being interesting and exciting, it is dull and formal. The same thing is true in government. Process is in the way of everything. Things are checked and double-checked, then triple-checked. As Tim Trucano said, “V&V takes the fun out of computational simulation.” Science is fun, and V&V is all engineering and pocket-protectors.

Process, regulation, and procedure are also a reaction to problems. When principles do not guide action, regulations become necessary. The lack of principle is still there, and resistance will occur. We see this all the time across society. We also see process and procedure taking the place of professional practice. If professional practice is in place, the process becomes natural. It can adapt and bend to the circumstance instead of being followed blindly. There is also room for judgment that right-sizes the effort. Sometimes the situation calls for more effort, other times less. A professional practice guided by principles can do this. Regulation cannot.

Fortunately, the process overload has been recognized broadly. Recently, Ezra Klein’s book Abundance has put this into the discourse. It provides examples and evidence of process overreach while suggesting a path to a better future. The example of California’s high-speed rail is compelling: there, regulation and process have led to little progress and huge costs. I’ve seen the same with nuclear power, where process and regulation have stifled progress. Everything costs much more and takes longer than it should. Rather than producing safe and sensible progress, process is a recipe for no progress. V&V feels like the same thing to many practicing engineers and scientists.

It doesn’t need to be. It shouldn’t be.

V&V needs to be an engine of progress and excellence. The right way to orient V&V is as science. This should be easy because V&V is the scientific method for computational science. Here V&V can find its principles. Verification connects directly to modeling, and validation connects directly to experiments. Modeling that provides good results for experiments is the goal. V&V provides proof of the success (or failure). Going through the steps of V&V can be a practice where practitioners of modeling find confidence. It can be a means to produce evidence for doubters. I will note that the practitioner should be the first doubter and needs to convince themselves first.

Science is Inspiration

“We are what we pretend to be, so we must be careful about what we pretend to be.” ― Kurt Vonnegut

Science is the place to look for a different path for V&V. The fact is that V&V should be essential to the conduct of science whenever computing is used. Not as a demand, but because it is simply how the scientific method works. Models are essential for science and drive prediction. These models are then tested and refined by experiments or observations. Models are mathematical and most often applied via numerical methods and codes. This points directly to the practice of verification. Validation is exactly the synthesis of the modeling with comparison to measurements. My proposition is that more V&V should be happening naturally. Science done properly should pave the way.

This is not happening.

If one looks to the scientific literature, V&V practice is receding. Thirty years ago there was a push for V&V in publishing. Editorial standards were introduced to enforce the push. Refereeing was haphazard and uneven, with only modest support from editors. The result was a temporary advance in the practice. As time has proceeded, the advance has blunted and we’ve regressed to the former mean.

To some extent, this is an indictment of the literature. There is a gap between current practice and the proper scientific method. Unfortunately, the progress starting 30 years ago was not sustained. Part of the issue is genuine animosity toward V&V from many quarters. I attribute much of this animosity to the dull, process-heavy aspect of V&V. A worry about V&V as regulation contributed to the pushback. V&V as a process and requirement also challenged the role of editors and referees as the ultimate gatekeepers of science.

“In the long run, we shape our lives, and we shape ourselves. The process never ends until we die. And the choices we make are ultimately our own responsibility.” ― Eleanor Roosevelt

Science has always had an element of magic to it. The ability of models expressed in mathematics to describe the universe is incredible. It does feel almost magical when you first encounter it. Progress is a constant source of wonder. V&V is often a source of doubt. As such it challenges progress and is resisted by many. Instead, V&V should be a source of further focus and inspiration for science. It is an engine for better science and more solid progress. Science should also be the place for V&V to claim its place and legitimacy. V&V provides evidence of where science should focus on progress. Again, this challenges the gatekeepers.

If V&V continues to be a regulatory and bureaucratic process, it will die. It will become part of our modern decline and descent into mediocrity. The path forward for V&V is to be an engine of knowledge and discovery. This means acting on principles and adopting the practices that science depends upon. Good V&V is good science and could flourish as such.

“Whoever fights monsters should see to it that in the process he does not become a monster. And if you gaze long enough into an abyss, the abyss will gaze back into you.” ― Friedrich Nietzsche

https://williamjrider.wordpress.com/2016/12/22/verification-and-validation-with-uncertainty-quantification-is-the-scientific-method/
https://williamjrider.wordpress.com/2016/10/25/science-is-still-the-same-computation-is-just-a-tool-to-do-it/

Stumbling Into Mediocrity

12 Saturday Apr 2025

Posted by Bill Rider in Uncategorized

≈ 2 Comments

tl;dr

My early adult life was marked by a vigorous pursuit of excellence. My time in Los Alamos provided the example, path, and environment for it. There I started to achieve it. At the same time, everything around me was decaying. Excellence is under siege in the USA. Eventually, the loss of excellence was too great and overtook the positive. The societal undertow has grown into a vortex of sprawling disappointment. Mediocrity lurks around every corner, and it’s swallowing excellence everywhere. Make no mistake, expanding mediocrity is the hallmark of our time. Excellence is in the past and receding. How deep into incompetence will we go?

“Some men are born mediocre, some men achieve mediocrity, and some men have mediocrity thrust upon them.” ― Joseph Heller

A Rant About Mediocrity

I’m going to start with a rant including a lot of cursing. So if you’re not down for that, stop reading now! Frankly, if you can’t take some cursing, you’re not my people anyway. The situation we are in should make all of us very fucking angry.

I am really pissed off by this topic. I’m pissed off by the state of my country and the places I work. I’m following the mantra of writing what disturbs you, and this topic fills me with incandescent rage. In summary, we had greatness and excellence once upon a time. We have collectively managed to completely and totally fuck this up. While I will get to the reasons for the descent into incompetence, first and foremost I’m angry. Really deeply fucking angry at how we lost our edge. I’ll just note that I’ve spent a career developing expertise and accumulating knowledge professionally. What does this mean today? Fuck all the fuckwits they’ve put in charge.

I say this knowing many of the leaders at the Labs. Somehow we have a system that takes competent, talented people and turns them into idiots. Great people are brought low instead of lifted up. Sometimes the result is pure incompetence or decisional paralysis. In other cases, they become unethical assholes, or the asshole becomes a monster. Societal forces seem to generate incoherence and destroy rationality. Our collective competence is far less than the sum of the parts. This is why I wrote the rest of this. Something about our system and society today is destroying everything good. It does seem to be the pathos of the current age that social stupidity is scheming to demolish any sense of excellence. Just look at our National leaders. Otherwise talented, smart, and successful people suck up ignorance and stupidity. They completely reject their own competence because the system can’t deal with it.

I see this directly at the labs every single day. You would think the importance of nuclear weapons might matter enough. It doesn’t.

What could possibly go wrong? Oh yeah, we aren’t testing these weapons and instead assert that they work via scientific prowess. Excellence at the Labs matters a lot, or it should. The fact is that it doesn’t really matter today. The approach is that Nuclear Weapons excellence can just be messaged. Except it can’t, and I’m sure our adversaries in Beijing or Moscow can see through the bullshit. They know the truth. Our prowess has been in freefall for decades under the yoke of the same elements seen broadly in Washington today. These elements are the hegemonic power of money, lack of trust, and soul-crushing process. The entirety of politics and society bears responsibility. Politics on the left and the right have eroded excellence. One shouldn’t make the mistake of blaming Trump or Obama, Biden or Bush. The problem is all of us.

The path out of this is similarly society-wide. All of us need to find the way out.

“Mediocrity is contextual.” ― David Foster Wallace

The Pursuit of Excellence

When I step back and look at my personal history, I am so fucking lucky to be where I am. I had a solid middle-class upbringing and a reasonable academic record prior to college. Frankly, I was skating by on my brains and putting little effort into academics. I did just enough so that my parents wouldn’t get wise to my habit of fucking off. I went to college and had an unremarkable record as an undergrad at a third-tier university (New Mexico). Granted, I got married early and worked full-time for most of that time, but my grades were just barely okay. I got my bachelor of science at a shit time to get a job in Nuclear Engineering. So I applied for jobs and didn’t get a single interview.

“A life of mediocrity is a waste of a life.” ― Colleen Hoover

So, I defaulted into grad school at New Mexico. By the end of my first year, I managed to even disappoint myself. I saw my professional dreams dying due to my own self-imposed mediocrity. I made a pact with myself to get my shit together and start living up to my potential. I spent an entire summer relearning all my undergrad knowledge and skills. I entered the next year as a totally different student. From then on, I kicked ass as a student. I was the stud I could have always been. In short order, I realized my major professor was a complete asshole and I needed to escape from him. Note that I had a fully funded PhD project from NASA at that point that I was rejecting.

I broke from the professor in an epic meltdown. I thought of going to another school and found that, between money and my grades, it was impossible. It was time to get a job. This was the best and luckiest decision of my professional life. I was looking for a job at the perfect time. I had everything needed to get one: the right degree, an MS in Nuclear Engineering, USA citizenship, and a pulse. I had six interviews and six job offers. A couple of the jobs were horrible non-starters (the interviews are entertaining and make great stories though). Two were from local Beltway bandits (or Mesa bandits in New Mexico). They were okay. The last two were from National Labs, including Los Alamos. The Los Alamos job was the best by a huge margin. After an urgent call to LANL, I got an offer and I took it.

Los Alamos was perfect; well close to perfect compared to elsewhere. For a student who had gotten their shit together, gaining huge ambition, it was a great environment. It was well beyond what a mediocre student from a third-rate university could hope to expect. I jumped in and immediately felt well out of my depth. I loved it and I was bathed in the excellence that defined Los Alamos. Better yet, the culture of Los Alamos was generous to a fault. I could tap into many people who were smarter than anyone I’d ever known. They would share their knowledge willingly and I grew. My work and colleagues were challenging and brilliant. I got better each and every day.

Los Alamos supported me in getting my PhD. The environment made me grow in ways I’d never anticipated. I finished my degree and continued to grow. Los Alamos was like the greatest grad school imaginable. Gradually, I started to feel that I was in my depth. I began to fit in. I began to meet my actual potential. Suddenly, the imposter syndrome that overcame me at Los Alamos disappeared. I was capable and I was an expert now. The excellence of Los Alamos had rubbed off on me. I had imbibed the culture of this magical place, and it transformed me. I had become a Los Alamos scientist and I belonged.

Little did I know that all of this was going to be destroyed by a tidal wave of idiocy and ignorance. The same idiocy and ignorance laying siege to all of us today. What I’ve come to realize is that these forces had already been destroying Los Alamos and places like it for years before. The difference is that the storm was about to turn itself up to gale force. The terrifying fact is that the storm may be cranking up to catastrophic hurricane force as you read this. Landfall is imminent, if not already upon us. If we aren’t careful it will sweep everything good away. The danger is real. Mediocrity will be our legacy.

“The only sin is mediocrity.” ― Martha Graham

Money over Principles; Regulated to Death

“Ignore the critics… Only mediocrity is safe from ridicule. Dare to be different!” ― Dita Von Teese

How did we get to this point? We need to look back in history to the presidency of Ronald Reagan. The generally acknowledged wisdom at Los Alamos is that the Lab peaked in 1980. That was the year Harold Agnew stepped down as Lab Director. Harold was a key figure in the Manhattan Project and a witness to the major events of that age. Los Alamos then went through forgettable leadership while government stewardship passed from the Atomic Energy Commission to the Department of Energy. This was a pure downgrade. The real corrosive influence was the government’s attitude toward governance. Reagan represented a lack of trust and opposition to all things government. The forces unleashed by Reagan have grown and metastasized into a vile, destructive force.

One of the major things coming from this period is a business principle. Milton Friedman’s approach of maximizing shareholder value has become ever-present. It has become an engine of capitalism run amok. Businesses must always grow, akin to a cancerous tumor. Sustainable business has gone out of fashion. The thing that matters to science and the Labs is the view that business principles became a one-size-fits-all cure for all things. By the mid-2000s this attitude would fully infect the Labs and reap destructive results for these paragons of science. We changed the social contract with the Labs from stewardship for the public good to corporate management. Somehow we decided a guiding principle to serve the Nation was bad and needed to be replaced by a for-profit business. This change has only brought destruction.

The other force at work, in tandem, is the regulation of every risk in sight. This regulation is dual in purpose. On the one hand, it is an attempt to manage every single risk possible and ensure that bad things don’t happen. On the other, it reflects a general lack of trust in each other and in institutions. Both of these desires are extremely expensive. They are also a bizarre way to provide accountability. Rather than leadership being accountable, the blame is projected onto everyone. Ultimately, the regulation ends up standing in the way of accomplishing things while driving up costs. It is inefficient and disempowering. It also speaks to a desire to control outcomes irrationally. The micromanagement of finances is driven by a lack of trust too. It amplifies all our leadership issues. Accomplishment becomes impossible.

The Nation only suffers and the benefits are illusions. The most corrosive influences of shareholder value are two-fold: money as the measure and a short-term focus. The end of the Cold War brought the end of generous and necessary funding to the Labs. Congress then deemed it necessary to micromanage the Labs’ work and research. Over the decades since, the micromanagement has grown to infect every detail of the Labs’ work. Congress defines priorities rather than trusting the experts. The overhead and intrusion have powered a continuous lowering of standards and sapping of intellectual vigor. We now have little flexibility and massive oversight of all activities. The result has been continually lowering standards of work along with risk aversion. All of this is in service of controlling work and deflecting blame.

This influence has been modest in comparison to business-inspired management. The shareholder value-driven management philosophy is completely inappropriate for the Labs’ work. The real core of the problem is the lack of trust in how the Labs are managed. We have seen an explosion of oversight driven by suspicion and scandal avoidance. Technical work is graded by people who effectively have no independence. The management’s bonuses depend on good grades, and the reviewers know it. If you don’t give good grades, you aren’t asked to review again, and that paycheck is gone. In this way, duty fades away and money corrupts the process. The Labs are said to remain excellent, but it’s all smoke and mirrors. The truth on the ground is decline: continuous, profound, sclerotic decline over decades, spreading like a cancer choking the Labs.

The problems with the shareholder value philosophy are becoming obvious at a societal level. In business, the approach can be applied to some significant benefit; within limits, this is where it has some virtue. It also has costs, with supercharging inequality as an acute example. Its problems show up in sustaining businesses where growth isn’t an objective. All of this points to the conclusion that, for managing science for National Security at the Labs, the idea is absolute lunacy. There is no profit to be had. The short-term focus that the stock market thrives on makes no sense. The result of this management is the destruction of any long-term health. Science at the Labs is withering under the yoke.

“The key to pursuing excellence is to embrace an organic, long-term learning process, and not to live in a shell of static, safe mediocrity. Usually, growth comes at the expense of previous comfort or safety.” ― Josh Waitzkin

Excellence is Hard; Mediocrity is Easy

“Caution is the path to mediocrity. Gliding, passionless mediocrity is all that most people think they can achieve.” ― Frank Herbert

There is little doubt that the Labs used to be great. The peak was 1980. Los Alamos has faded; Livermore has faded; Sandia has faded too. The USA and the World are poorer for it. The excellence was supported and needed during the Cold War. The stakes of the work happening at the Labs demanded excellence, and the Nation allowed it. Just as the USA grabbed victory in the Cold War, the support went away. Part of the end of the Cold War was the hubris of “Star Wars”. The whole SDI idea was bullshit, but the excellence of the Labs sold it. The lie was huge and the Soviets believed it. The cost was destroying the Nation’s trust in the process. Relying on a bullshit idea like SDI played into the growing anti-government lack of trust.

Part of the issue is the use of financial incentives in management. Money can easily corrupt peer review. If it becomes clear that the reviewers depend on giving good grades to get a paycheck, the good grades come without the work. This is where the Labs are at. External reviews help determine executive pay, so the reviews are always good. A bad review leads to the reviewers not being invited back. This also plays into the regulatory impulses, where contracts falsely try to control management via regulated peer review. Leaders aren’t simply empowered and then held accountable. The cumulative result is a neutering of feedback. The reviews turn into mutual admiration societies and increasingly have no value at all. For the Lab organizations, the alarm bell never rings, and quality simply degrades year after year.

The root of the issue then becomes the path of least resistance. Excellence is hard to manage and requires focused attention. Mediocrity, on the other hand, is simple, especially when all that really matters is the marketing of the work. It is easier to focus on perceptions of the work. Even this becomes simple once it is clear that standards are effectively non-existent. You end up focusing on the work that is politically hot and leads to funding. You also focus on work that is flashy or presents well. More toxically, you simply know that the work is “world-class by definition” and the review has its results baked in. All of this snowballs into a steady march toward mediocrity. There ends up being very little incentive for excellence to counteract these forces.

“It is difficult to get a man to understand something, when his salary depends on his not understanding it.” ― Upton Sinclair

The Path Out

How do we get out of this?

While I can retire, I care about these topics deeply. The devolution of the Labs has been painful to watch and makes me seethe with anger. I am saddened for the Nation. These institutions are (or were) National treasures that have been squandered. The combination of mismanagement and lack of trust has wreaked havoc with the quality of the work. In many ways, the things that have hurt the Labs simply parallel the broader ills of society. The way out is similar to how the Country must heal from its downward spiral.

“Bullshit is truly the American soundtrack.” ― George Carlin

Appropriate, excellence-focused management is needed. We need to reorient the incentive structure and objectives of the Labs. The big issues are societal. There need to be principles and values that transcend money. Excellence needs to have value for its own sake. There should be explicit empowerment to pursue excellence. Excellence needs to be recognized and bullshit needs to be called out. This will be painful; there is a lot of bullshit out there that managers think is great. We also need to take risks and allow failures. Without big risks and failures, excellence cannot grow. Risk and failure need to come from trying to achieve big things. This needs to be recognized for what it is and not punished or mislabeled as incompetence. This is a difficult thing to do. It is especially difficult in a time when bullshit is so regularly accepted.

The Labs need trust. We all need more trust. Trust is empowering. One of the key aspects of the environment that chokes excellence is an obsession with process. Most of this process is the result of mistrust. Where there is mistrust there cannot be true excellence. The temptation or suspicion of bullshit is always present. Failure isn’t tolerated; it is punished. Rather than fail and learn, we fail and lie. Risks aren’t taken because the downside is too extreme. We exist in an environment where every mistake is punished. The process is there to keep mistakes from happening. The result is no risk. Without risk, there is no progress or innovation. When it is all summed up, we can see that trust is a superpower.

At a deeper level, the difficulty of excellence can be traced to a lack of vulnerability. This is reflected in the humility needed for learning and for taking the proper lessons from failures. Failures require trust and fuel the accumulation of expertise. Accepting all of these mishaps requires the courage of vulnerability. We all see how today’s World chafes against this. Hubris, falsehoods as truths, and outright shameless bullshit are all expected. Vulnerability and failures are met with attacks, punishments, and reprisals. In these vile habits, excellence is snuffed out, and the tumble to mediocrity is catalyzed and becomes inevitable. When bullshit is as respected as truth, knowledge becomes negotiable. Then mediocrity cannot be separated from excellence. This is the state of things today.

“When an honest man speaks, he says only what he believes to be true; and for the liar, it is correspondingly indispensable that he considers his statements to be false. For the bullshitter, however, all these bets are off: he is neither on the side of the truth nor on the side of the false. His eye is not on the facts at all, as the eyes of the honest man and of the liar are, except insofar as they may be pertinent to his interest in getting away with what he says. He does not care whether the things he says describe reality correctly. He just picks them out, or makes them up, to suit his purpose.” ― Harry G. Frankfurt

How V&V Fits Into My Career

05 Saturday Apr 2025

Posted by Bill Rider in Uncategorized

≈ Leave a comment

tl;dr

My career is drawing to a close. In looking back it is obvious that V&V played a crucial role. This was never intended; it was an outgrowth of other goals. The main driver was numerical methods research. V&V assisted my research and became a secondary focus. Along the way, I encountered remarkable resistance to V&V. This is because V&V challenges expert-based gatekeeping; it replaces expert judgment with evidence and metrics. The resistance to V&V should be transformed into support. The route there is the connection to the classical scientific method applied to computation.

How did V&V intersect my Career?

“What’s measured improves” ― Peter Drucker

I am at a point personally where reflection on the past is quite natural. My professional time at revered institutions is drawing to its natural end. At the same time, my father is nearing death in a slow, painful decline. My scientific career seems to be undergoing a parallel decline. It feels like it is crawling to the grave, ushered along by a lack of vision and strategy everywhere. Science and research are under siege. Rather than being repaired, the decline is accelerating. Our science and engineering are in deep decline. Money is the ruling principle while quality is ignored. The result is an expanding mediocrity.

I have seen a host of significant events during my career that shaped and framed the World. The Cold War ended at the beginning of my career, marked by the Berlin Wall coming down in 1989. Working closely within the institutions that oversee nuclear weapons means that politics matter. World events are never far from shaping the work while underscoring our responsibility. The technical work and its quality have always mattered. The stakes are huge. Events today may dwarf anything else from the span of my career. We shall see. I hope this is hyperbolic, but I fear not.

The quality of our work matters. It should matter even more if you are working on nuclear weapons. That is what I believe with all my heart. I’ve always embraced this as a primal responsibility. Verification and validation (V&V) is fundamentally about quality. This is why I got involved with it. The core of V&V is measurement and evidence. It is a way of seeing the details of your work without appealing to expert judgment. It was a reaction to science ruled by expert gatekeepers.

Being an expert gatekeeper is a great gig. Usually the gatekeeper role is earned through accomplishments. But once gatekeepers have made their progress, they often stand in the way of further progress. The gatekeepers then oppose anyone who disagrees with them. The gatekeepers are often journal editors and frequent reviewers. Too often they use this position to resist change and new ideas. These days the gatekeeper role is supercharged by how funding flows. In an era of science by contract, the money has even more power to strangle progress.

“If you thought that science was certain – well, that is just an error on your part.” ― Richard P. Feynman

How V&V Became a Thing for Me?

When I got started in science I wasn’t doing V&V. Or rather, I was doing a little V&V, but didn’t know it. Like most of you, I copied what I saw in the literature. I found ideas that I gravitated towards and then wrote papers like those scientists. Their papers were the roadmap for how I did my work. You adopt the accepted practices of others. Eventually, as you find success, you start to adapt. I was fortunate enough to get to work with some big names on a large research project. The tendency of youth is to listen with rapt attention to the experts. Over time, I grew tired of simply trusting experts; I wanted to see the receipts. I trusted and respected their work and judgment, but I also needed evidence.

I started to see the cracks in their story. We were working with a couple of big names in computational physics and applied math. They were some of the scientists whose work I’d loved early on. Every couple of months they would travel to Los Alamos, or we’d travel to California for a project meeting. At these project meetings, we would be lectured on the “gospel” of the work. The issue was that the “gospel” changed a little bit each time. Eventually, I found that I needed to start doing everything myself. I needed to understand the details of the “gospel”. I needed to see the evidence and verify what I heard.

“In questions of science, the authority of a thousand is not worth the humble reasoning of a single individual.” ― Galileo Galilei

This process was my real transformation into a V&V person. I created an independent implementation of everything including testing. I would reproduce tests done by others and then create my own tests. During this time I documented everything and began to adopt my basic mantra of code testing. This mantra is “always know the limits of your code, and how to break it.” This meant I understood where the code worked well and where it fell apart. It tells you where you can safely use the code.

It also tells you where the code falls apart. This is where you should do work to make things better. This should set the research agenda. I have always seen V&V as a route to progress instead of simply measuring capability. V&V should provide evidence to support expanding capabilities. Today, the route to progress via V&V is weak to non-existent.

One of the lessons I learned was the separation between robustness and accuracy tests. Progress happens through transitioning robustness tests into accuracy tests. A robustness test is basically “Can the code survive this and give any answer?”. The accuracy test is “Can the code give an accurate answer?” This was useful then and continues to be a maxim today. We should always be pushing this boundary outward. This is a mechanism to raise capability and do better.

“We learn wisdom from failure much more than from success. We often discover what will do, by finding out what will not do; and probably he who never made a mistake never made a discovery.” ― Samuel Smiles

The Problem with V&V

“There’s nothing quite as frightening as someone who knows they are right.” ― Michael Faraday

In short order, I moved to the Weapons Physics Division at Los Alamos (the infamous X-Division). X-Division was ramping up development efforts to support Stockpile Stewardship. This was the ASCI program. The initial ASCI program was basically writing codes for brand-new supercomputers. The focus was on the computers first and foremost, but the codes were needed to connect to nuclear weapons. The program was driven by the desire to move from the existing codes, denoted “legacy,” to new codes. New codes were mostly needed because of the change in computers. This was not about writing better codes, but just using better computers.

The rub was that the legacy codes were trusted by the people who designed weapons. They were the simulation tools used to design weapons in the era when we tested them more fully. This trust was essential to accepting the results of the codes. The new codes were not trusted. To replace the legacy codes, this trust needed to be built. One of the mechanisms to build trust was the set of processes known as V&V. The key part of the trust was validation: the comparison of simulations with experimental data. The problem with V&V is a certain emotionless, process-heavy approach to science.

“The measure of intelligence is the ability to change.” ― Albert Einstein

Why is the process a problem?

The trust in and utility of the legacy codes were mostly granted by experts. The people who designed weapons were the experts! “Designers”. They took an adversarial view of V&V and its process. This process is not expert-based, but rational and metric-based. What I have seen over and over in my career is tension between experts and process. V&V is rejected because of its non-expert, rational approach. It is also rather dry and dull compared to the magic of modeling nature on computers. My original love of modeling on the computer was the embrace of its “magic.” It’s fair to say this same magic enchanted others.

I believe that the biggest problem for V&V is the dullness and process. V&V needs to capture more of the magic of modeling. The whole attraction of science is the ability of theory to explain reality. Computation is the way to solve complex models. This is part of the very essence of the scientific method.

“Any sufficiently advanced technology is indistinguishable from magic.” ― Arthur C. Clarke

Seeing V&V Clearly

The V&V program was added to the ASCI program in 1998. It tried to fill the gap of a rational process for adopting the new codes. This rational process was supposed to build trust in those codes. Implicitly this put it into direct conflict with the power of experts. Nonetheless, V&V grew and adapted to the environment, gaining adherents and mindshare. We saw V&V growing in other parts of the computational modeling world as well. In broad terms, V&V grew in importance through the period of 2000-2010. After this, it peaked and has since started to decay in interest and importance. IMHO the reason for this is how dull and process-oriented V&V tends to be.

“Magic’s just science that we don’t understand yet.” ― Arthur C. Clarke

A big part of this decay is the continued resistance by experts to the process aspects of V&V. I experienced it directly with my own work. I had a journal editor tell me to “get that shit out of the paper.” The resistance to V&V at Los Alamos was driven by designer culture. Resistance to V&V was far less at Sandia, but still present. Engineers love processes, but physicists don’t. Still, V&V gets in the middle of processes engineering analysts like. For example, both designers and analysts like to calibrate results.

“The most serious mistakes are not being made as a result of wrong answers. The true dangerous thing is asking the wrong question.” ― Peter Drucker

They like to calibrate to data so that the simulations match experiments well. Worse yet, they like to calibrate in ways that are not physically defensible. I’ve seen it over and over at Los Alamos (Livermore too) and Sandia. V&V stands in opposition to this. The common perspective is that V&V is accepted only so long as the results rubber-stamp the designer-analyst views. If V&V is more critical, the V&V is attacked. The cumulative effect is for V&V to wane. We see V&V get hollowed out as a discipline.

“We may not yet know the right way to go, but we should at least stop going in the wrong direction.” ― Stefan Molyneux

Another program added to V&V’s waning influence. The exascale program spun up around 2015. In many respects, this program was a redux of the original ASCI with a pure focus on supercomputing. Moore’s law was dying and the USA doubled down on supercomputing research. This program was far more computer-focused than ASCI ever was. It also didn’t try to replace the legacy codes, but rather focused on rewriting them. This reduced resistance. It also reduced progress. At least the original ASCI program wrote new codes, which energized modernizing them. The exascale program lacked this virtue almost entirely. Hand-in-hand with the lack of modernization was a lack of V&V. There was no V&V focus in the exascale program. The exascale view was simply that legacy methods are great and just need faster computers. To say this was intellectually shallow is an understatement of extreme degree.

“Management is doing things right; leadership is doing the right things.” ― Peter Drucker

My own theory was that V&V needed to move past its focus on process. V&V needed to be seen differently. My observation was that V&V is really just the scientific method for computational modeling. Verification is confirmation that the theory is solved correctly. Validation is the comparison of theory with experiments (or observations). The real desire here is to connect V&V to the magic of modeling. I wanted to make V&V a more natural part of the things I love about science, the things that attracted me to this career in the first place.

What Can We Learn?

“Men of science have made abundant mistakes of every kind; their knowledge has improved only because of their gradual abandonment of ancient errors, poor approximations, and premature conclusions.” ― George Sarton

If I look back across my career a few things stick out. One is how the programs rhyme with each other. The original ASCI program was much like the Exascale program. We learned how to fund big hardware purchases, but not the science parts. In almost every respect the Exascale program was worse than ASCI: much less science and much more computers. The way this happened reflects the forces undermining science more broadly. Computers get interest from Congress, but science and ideas don’t. That interest creates the funding, and everything runs on money. Money has become the measure of value for everything today.

“People who don’t take risks generally make about two big mistakes a year. People who do take risks generally make about two big mistakes a year.” ― Peter F. Drucker

The biggest lesson is how irrational science is. Emotions matter a lot in how things play out. We would like to think science is rational, but it’s not. Experts are gatekeepers and they like their power. Rational thought and process are the expert’s enemy. V&V is unrelentingly rational and process-based. Thus the expert will fight V&V. Experts also tend to be supremely confident. This is the uphill climb for V&V and the basis of its decline. The other piece of this is money and its power. Money is not terribly rational, and very emotional. It is the opposite of principle and rationality. This combines to sap the support for V&V.

None of this changes the need for V&V. The thing needed more than anything is a devotion to progress. V&V is a tool for measuring progress and targeting where progress is needed. The narrative of V&V as the scientific method also connects better with emotion. In the long run, a better narrative and a devotion to progress will win out, and V&V will play its role.

“The best way to predict your future is to create it” ― Peter Drucker

“The only way of discovering the limits of the possible is to venture a little way past them into the impossible.” ― Arthur C. Clarke

Rider, W. J. “Approximate projection methods for incompressible flow: implementation, variants and robustness.” Los Alamos National Laboratory Unclassified Report LA-UR-94-2000 (1995).

Puckett, Elbridge Gerry, Ann S. Almgren, John B. Bell, Daniel L. Marcus, and William J. Rider. “A high-order projection method for tracking fluid interfaces in variable density incompressible flows.” Journal of Computational Physics 130, no. 2 (1997): 269-282.

Drikakis, Dimitris, and William Rider. High-resolution methods for incompressible and low-speed flows. Springer Science & Business Media, 2005.

Rider, William J., and Douglas B. Kothe. “Reconstructing volume tracking.” Journal of Computational Physics 141, no. 2 (1998): 112-152.

Greenough, J. A., and W. J. Rider. “A quantitative comparison of numerical methods for the compressible Euler equations: fifth-order WENO and piecewise-linear Godunov.” Journal of Computational Physics 196, no. 1 (2004): 259-281.

Rider, William J., Jeffrey A. Greenough, and James R. Kamm. “Accurate monotonicity-and extrema-preserving methods through adaptive nonlinear hybridizations.” Journal of Computational Physics 225, no. 2 (2007): 1827-1848.

Verification Is Essential; Verification is Broken

23 Sunday Mar 2025

Posted by Bill Rider in Uncategorized

≈ 2 Comments

tl;dr

The practice of verification is absolutely essential for modeling and simulation quality. Yet, verification is not a priority; quality is not a priority. It is ignored by scientific research. This is because verification is disconnected from modeling. Also, it is not a part of active research. The true value of verification is far greater than simple code correctness. With verification, you can measure the error in the solution with precision. Given this, the efficiency of simulations can be measured accurately (efficiency being the effort required for a given accuracy). Additionally, the resolution required for computing features can be estimated. Both of these additions to verification connect to the broader scientific enterprise of simulation and modeling. This can revitalize verification as a valued scientific activity.

“Never underestimate the big importance of small things” ― Matt Haig

The Value of Verification

“Two wrongs don’t make a right, but they make a good excuse.” ― Thomas Szasz

Conceptually, verification is a simple prospect. It has two parts: is the model implemented correctly in the code, and how accurate is it? Verification is structured to answer these questions. Part one, about correctness, is called code verification. Part two, about accuracy, is called solution verification. This structure is simple and unfortunately lacks practical priority. This leads to the activity being largely ignored by science and engineering. It shouldn’t be, but it is. I’ve seen the evidence in scientific proposals. V&V is about evidence and paying attention to it. There is a need to change the underlying narrative around verification.

Under the current definition, code verification relies upon determining the order of accuracy for correctness. There is nothing wrong with this. The order of accuracy should match the design of the code (method) for correctness. This is connected to the fundamental premise of advanced computers: more computing leads to better answers. This is the process of convergence, where solutions get closer to exact and accuracy improves. Today this premise is simply assumed and evidence of it is not sought. Reality is rarely that simple. Solution verification happens when you are modeling and do not have access to an exact solution. It is a process to estimate the numerical error. These two things complement each other.
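To make the mechanics concrete, here is a minimal sketch in Python of how the observed order of accuracy is extracted from a mesh refinement study. The error values and resolutions are invented purely for illustration; in a real study they would come from comparing the code’s output to the analytical (or manufactured) solution at each resolution.

import math

# Hypothetical numbers purely for illustration: errors measured against a known
# (analytical or manufactured) solution on a sequence of meshes refined by 2x.
mesh_sizes = [1.0/16, 1.0/32, 1.0/64, 1.0/128]   # cell size h
errors     = [3.2e-2, 8.3e-3, 2.1e-3, 5.3e-4]    # e.g., L1 norm of the error

# Observed order of accuracy between successive meshes:
#   p = log(e_coarse/e_fine) / log(h_coarse/h_fine)
for (h1, e1), (h2, e2) in zip(zip(mesh_sizes, errors), zip(mesh_sizes[1:], errors[1:])):
    p = math.log(e1 / e2) / math.log(h1 / h2)
    print(f"h = {h2:.5f}: observed order ~ {p:.2f}")

# If the observed order matches the method's design order (roughly 2 for these
# made-up numbers), the implementation passes this test; if not, there is an
# error in the code or in the construction of the method.

The same two-line formula is what solution verification leans on when no exact solution exists, with an extrapolated estimate standing in for the analytical answer.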

“The body of science is not, as it is sometimes thought, a huge coherent mass of facts, neatly arranged in sequence, each one attached to the next by a logical string. In truth, whenever we discover a new fact it involves the elimination of old ones. We are always, as it turns out, fundamentally in error.” — Lewis Thomas

In code verification, you can also compute the error precisely. The focus on the order of convergence dims the attention to error. Yet the issue of error in science is primal. Conversely, the focus on errors in solution verification dims the order of convergence there. In verification work, both metrics need to be given equal focus. Ultimately, the order of accuracy and the reduction of error should both be emphasized.

As will be discussed later, the order of accuracy influences efficiency mightily. A broad observation from my practical experience is that the order of accuracy in application modeling is low. It is lower than theory expects. It is lower than the design order of the methods going into codes. Thus, the observed order of convergence actually governs the efficiency of numerical modeling. Combined with the error, it determines the efficiency of the simulation.

“The game of science is, in principle, without end. He who decides one day that scientific statements do not call for any further test and that they can be regarded as finally verified, retires from the game.” — Karl Popper

Why Verification is Broken?

Verification is broken because it is disconnected from science. It has been structured to be irrelevant. In reviewing more than 100 proposals in modeling and simulation over a few years, this is obvious. Verification as an activity is beneath mentioning. When it is mentioned, it is out of duty: it is in the proposal call and mentioned because it is expected. There is little or no earnest interest or effort. Thus the view of the broader community is that verification is an empty activity, not worth doing. It is done out of duty, not out of free will.

I should have seen this coming.

“True ignorance is not the absence of knowledge, but the refusal to acquire it.” — Karl Popper

Back in the mid-oughts (I like saying that!) I was trying to advance methods for solving hyperbolic conservation laws. I had some ideas about overcoming the limitations of existing methods. In doing this work it was important to precisely measure the impact of my methods compared with existing methods. Verification is the way to do this, and I highlighted it in the paper. The response from the community via the review process was negative,… very negative,… very fucking negative.

In the end, I had to remove the material to get the paper published. I got a blunt message from an associate editor: “if you want the paper published, get that shit out of your paper”. By “shit” they meant the content related to verification. I’ll also say this is someone I know personally, so the familiarity in the conversation is normal. Even worse, this came from someone with a distinguished applied math background and a great record of achievement. You find that most of the community despises verification.

I will note in passing that this person’s work actually does very well in verification. In another paper, I confirmed this. For more practical, realistic problems it does far better than a more popular method. It would actually benefit greatly from what I propose below. What had become the publication standard was a purely qualitative measure of accuracy for the calculations that matter. Honestly, this attitude is stupid and shameful. It is also the standard that exists. As I will elaborate shortly, this is a massive missed opportunity. It is counter-productive to progress and the adoption of better methods.

I found this situation to be utterly infuriating. It was deeply troubling to me too. When I stepped back to look at my own career path, I realized the nexus of the problem. Back in the 1990s I got into verification. I used verification to check the correctness of the code I wrote, but that was not its real value. I used verification to measure the efficiency and errors of the methods I developed. I used it to measure the error in the modeling I pursued. The direct measurement of error and its comparison to alternatives was the reason I did it. It provides direct and immediate feedback on method development. These notions are absent from the verification narrative. Measuring and reducing errors is one of the core activities of science. It is the right way to conduct science.

Verification needs to embrace this narrative for it to have an impact.

How to Fix Verification?

“I ask you to believe nothing that you cannot verify for yourself.” — G.I. Gurdjieff

As noted above, the key to fixing verification is to keep both the order of accuracy and the numerical error in mind. This is true for both code and solution verification. The second part of the fix is expanding the utility of verification. Verification can measure the efficiency of methods. What I mean by efficiency requires a bit of explanation. The first thing is to define it.

Simply put, efficiency is the amount of computational resources used to achieve a certain degree of accuracy. The resources are defined by the mesh size and number of time steps (degrees of freedom). The algorithm used to solve the problem then defines the amount of computer memory and the number of operations used. Less is obviously better. Runtime for a code is a good proxy for this. Lower error is also better. The convergence rate defines the relationship between the amount of effort and the accuracy. The product of cost and error defines the efficiency; lower is better for this composite metric.
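A minimal sketch of this composite metric, assuming you already have a measured error and a cost for each method on the same problem. The method names and numbers below are hypothetical, invented only to show the bookkeeping.

# Hypothetical measurements for two methods on the same problem and mesh; cost
# could be runtime in seconds or total degrees of freedom, and error is a
# verified norm of the numerical error against a known solution.
methods = {
    "method_A": {"cost": 10.0, "error": 4.0e-3},
    "method_B": {"cost": 60.0, "error": 2.0e-3},
}

for name, m in methods.items():
    efficiency = m["cost"] * m["error"]   # the composite metric: lower is better
    print(f"{name}: cost x error = {efficiency:.3g}")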

The first time I published something that exposed this was with Jeff Greenough. We compared two popular methods on a set of problems. One was the piecewise linear method (PLM) using Colella’s improvements (fourth-order slopes); it is a second-order method. The second was the very popular weighted ENO (WENO) method, which is fifth-order in space and third-order in time. Both of these methods are designed to solve shock wave problems. One might think that the fifth-order method should win every time. This is true if you’re solving a problem where that accuracy matters. The issue is that all the applications of these methods are limited to first-order at best.

“Science replaces private prejudice with public, verifiable evidence.” — Richard Dawkins

This is where the accepted practice breaks down. When Gary Sod published his test problem and method comparison, the only metric was runtime. Despite having an analytical solution, no error was measured. Results for the Sod shock tube problem are always qualitative. Early on, results were bad enough that qualitative comparison mattered. Today all the results are qualitatively good and vastly better than nearly 50 years ago. This accepted practice implies that there is no quantitative difference. This is objectively false. At a given mesh resolution the error differences are significant. I showed this with Jeff for two “good” methods. As I will further amplify in what follows, at the lower convergence rates in shocks, the level of error means vast differences in efficiency. This is true in 1-D and becomes massive in 3-D (really 2-D and 4-D when you add in time).

“The first principle is that you must not fool yourself and you are the easiest person to fool.” — Richard P. Feynman

It turns out that the fifth-order WENO method is about six times as expensive as the second-order PLM scheme. This was true on the desktop computers of 2004 and is close to the same now. The WENO method might have better computational intensity and have advantages on modern GPUs. What we discovered was that the second-order method produced half the error of the WENO method on simple problems (Sod’s shock tube). Thus the WENO method didn’t really pay off. At first-order convergence, this means that WENO would need about 24 times the effort to match the accuracy of PLM. For problems with more structure, the situation gets marginally better for WENO. In terms of efficiency, WENO never catches up with PLM, ever. As we will shortly see, in 3-D the comparison is even worse: refining the mesh is much more costly and the accuracy advantage grows.
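The arithmetic behind that factor of 24 is worth seeing once. Here is a minimal sketch; the 6x cost and 2x error ratios are the measured values quoted above, while the 3-D number is my own extension of the same scaling argument, not a result from the paper.

# Measured inputs quoted above: WENO costs roughly 6x PLM per cell-step and
# carries roughly 2x the error, with both converging at first order on these
# problems. To halve its error, WENO must refine the mesh by 2x, which in 1-D
# plus time multiplies the work by 2**2, and in 3-D plus time by 2**4.
cost_ratio, error_ratio, order = 6.0, 2.0, 1.0
refine = error_ratio ** (1.0 / order)               # refinement needed to match accuracy
for space_dims in (1, 3):
    work = cost_ratio * refine ** (space_dims + 1)  # +1 for the time dimension
    print(f"{space_dims}-D plus time: WENO needs ~{work:.0f}x the effort of PLM")
# Gives ~24x in 1-D plus time; the ~96x in 3-D plus time follows from the same
# argument but is my extrapolation, not a number from the original paper.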

“Whatever you do will be insignificant, but it is very important that you do it.” ― Mahatma Gandhi

If This is Done, Verification’s Value Skyrockets

“The truth is rarely pure and never simple.” ― Oscar Wilde

Let’s consider a simple example to explain. Consider a method that is twice as expensive and twice as accurate as another method. The methods produce the same order of convergence. The order of convergence matters a great deal in determining efficiency. Consider three-dimensional time-dependent calculations. If the methods are fourth-order accurate, the two break even. For any lower order of convergence, the more expensive, more accurate method wins. The lower the order of convergence, the greater the difference. For first-order convergence the advantage is a factor of eight. By the time you drop to half-order convergence, the advantage grows to 128 times.
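Those numbers all come from one scaling relation. A short sketch of the arithmetic follows; nothing is assumed beyond the 2x cost, 2x accuracy, three-dimensions-plus-time setup described above.

# Setup from the example: method B costs 2x as much as method A and is 2x as
# accurate at the same resolution, in a 3-D time-dependent (four-dimensional)
# calculation. To match B's accuracy, A must refine by 2**(1/p) for convergence
# order p, multiplying its work by (2**(1/p))**4; B's advantage is that work
# divided by its 2x cost penalty.
for p in (4.0, 1.0, 0.5):
    advantage = (2.0 ** (1.0 / p)) ** 4 / 2.0
    print(f"order {p}: advantage of the more accurate method = {advantage:g}x")
# order 4 -> 1x (break even), order 1 -> 8x, order 0.5 -> 128x, matching the text.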

This example provides a powerful punchline to efficiency. If the order of accuracy is fixed, the level of accuracy makes a huge difference in efficiency. This points to the power of both algorithms and verification in demonstrating the metrics. It is absolutely essential for verification to amplify its impact on science.

“It’s easy to attack and destroy an act of creation. It’s a lot more difficult to perform one.” — Chuck Palahniuk

For hyperbolic PDEs, the convergence rates are well defined by theory. For the nonlinear compressible structures (shocks), the rate is first-order. For linear waves (what Lax called linearly degenerate), the convergence rate is less than one, so the impact of accuracy is even greater. In my experience, first-order accuracy is optimistic for practical application problems. Invariably, the accuracy for the practical problems codes are applied to is low order. Thus the accuracy on smooth problems, where code verification is done, has little relevance. It can show that the method is correct as derived, but it does not relate to the method’s use.

Code verification needs to focus on results that relate more directly to how methods are used in practice. This is a challenge that needs focused research. Rather than a check done before practical use, code verification needs utility in practical use. Today this is largely absent. This must change.

The other great use is the study of efficiency. With Moore’s law dead and buried, algorithms are the path to computational progress. In addition, verification is needed for the expanding use of machine learning (ML, the techniques used for artificial intelligence). The greatest gap for ML is the absence of theory to support verification, closely followed by a lack of accepted practice. Again, this supports algorithm development, which is the path to progress when computing and data are limited.

“The important thing is not to stop questioning. Curiosity has its own reason for existing.” ― Albert Einstein

Greenough, J. A., and W. J. Rider. “A quantitative comparison of numerical methods for the compressible Euler equations: fifth-order WENO and piecewise-linear Godunov.” Journal of Computational Physics 196, no. 1 (2004): 259-281.

Rider, William J., Jeffrey A. Greenough, and James R. Kamm. “Accurate monotonicity-and extrema-preserving methods through adaptive nonlinear hybridizations.” Journal of Computational Physics 225, no. 2 (2007): 1827-1848.

Sod, Gary A. “A survey of several finite difference methods for systems of nonlinear hyperbolic conservation laws.” Journal of Computational Physics 27, no. 1 (1978): 1-31.

Colella, Phillip. “A direct Eulerian MUSCL scheme for gas dynamics.” SIAM Journal on Scientific and Statistical Computing 6, no. 1 (1985): 104-117.

Jiang, Guang-Shan, and Chi-Wang Shu. “Efficient implementation of weighted ENO schemes.” Journal of Computational Physics 126, no. 1 (1996): 202-228.

Majda, Andrew, and Stanley Osher. “Propagation of error into regions of smoothness for accurate difference approximations to hyperbolic equations.” Communications on Pure and Applied Mathematics 30, no. 6 (1977): 671-705.

Banks, Jeffrey W., T. Aslam, and William J. Rider. “On sub-linear convergence for linearly degenerate waves in capturing schemes.” Journal of Computational Physics 227, no. 14 (2008): 6985-7002.

Trust, But Verify (for Computing)

14 Friday Mar 2025

Posted by Bill Rider in Uncategorized

≈ Leave a comment

TL;DR

A faster computer is always a good thing, but it is not the only way, or even the best way, to get faster or better results. A better program (or method, or algorithm) is as important, if not more so. A new algorithm can be transformational and create new value. The faster computer also depends on a correct program, which isn’t a foregone conclusion, and demonstrating that things are done right is difficult. Technically, this is called verification. Here, we get at the challenges of doing verification across the computing landscape. The challenge is especially acute as machine learning (AI) grows in importance, because verification for it is not possible today.

“The man of science has learned to believe in justification, not by faith, but by verification.” ― Thomas H. Huxley

The Basic Premise of Better Computing

All of us have experienced the joy of a faster computer. We buy a new laptop and it responds far better than the old one. Faster internet is similar: all of a sudden streaming is routine and painless. If your phone has more memory, you can more freely shoot pictures and video at the highest resolution. At the same time, the new computer can bring problems we all recognize. Often our software does not move smoothly over to the new machine. Sometimes a beloved program is incompatible with the new computer and a replacement must be adopted. This sort of change is difficult whenever it is encountered.

Each of these commonplace things has a parallel in the more technical world of professional computing. There are places where proving the improvement from a new computer is difficult. In some cases, it is so difficult that the improvement is actually an article of faith, not science. It is in these areas where science needs to step up and provide the means of proof. My broad observation is that the faster, bigger computer being a good thing is largely taken on faith. Without the means to prove it, we are left to believe, and belief is neither science nor reliable. By and large, we are not doing the necessary work to turn this belief into fact. That means recognizing the gaps where faith is being applied dangerously.

A more critical issue is the recession of algorithmic progress. As we struggle to keep making computers faster at the historical pace, algorithms are the means to progress. Instead, we have doubled down on computers even as they become a worse path to progress. This is just plain stupid. Algorithmic progress requires different strategies that embrace risky, failure-prone research. Progress in algorithms occurs in leaps and bounds after long fallow periods. It also requires investments in thinking, particularly in mathematics.

“If I had asked people what they wanted, they would have said faster horses.” ― Henry Ford

Verification in Classical Computational Science

“Trust, but verify.” ― Felix Edmundovich Dzerzhinsky

Where this situation is clearest is traditional computational science. In the areas where computers are employed to solve classical science problems, the issues are well known. To a large extent, mathematical foundations are firmly established and employed. The math is a springboard for progress, and a long and storied track record of achievement exists to provide examples. For the most part, this area drove early advances in computing and laid the groundwork for today’s computational wonders. For most of the history of computing, scientific computing drove all the advances. All of it is built on a solid foundation of mathematics and domain science. Today, progress lacks these advantages.

In no area was the advance more powerful than the solution of (partial) differential equations. This was the original killer app. Computers were employed to design nuclear weapons, understand the weather, simulate complex materials, and more. These tasks produced the will to create generations of supercomputers. They also drove the creation of programming languages and operating systems. Eventually, computers leaked out to be used for business purposes; tasks such as accounting were obvious, along with related business systems. Still, scientific computing was the vanguard. It is useful to examine its foundations, and more importantly, to see where the foundations in other areas are weak. We have a history of success to guide our path ahead.

The impact of computing on society today is huge and powerful. It forms the basis of powerful businesses; the incredible run-up of the stock market is all computing. The promise of artificial intelligence is driving recent advances. Most of this is built on a solid technical foundation, yet in key areas of progress the objective evidence for improvement is flimsy. This is not good. We are ignoring history. In the long run, we are threatening the sustainability of progress and economic success. We need sustained, strategic investment in mathematical foundations and algorithmic research. Without it, we put the entire field at extreme risk.

If one goes back to the origins of computational science, the practice showed promise first: first in application to nuclear weapons, then rapidly in weather and climate. Based on this success, computers were advanced as the technology was refined. As these efforts began to yield progress, mathematics joined in. One of the key pieces of theoretical work was the set of conditions for proper numerical solution of models. Chief among this theory was the equivalence theorem of Peter Lax (along with Robert Richtmyer), which states that for a consistent discretization of a well-posed linear initial value problem, stability is necessary and sufficient for convergence. This theorem established the conditions under which numerical solutions converge to the exact solution of the model. Convergence means that as more computing is applied, the solution gets closer to exact.

This is the theoretical justification for more computing: more computing power produces more accuracy. It is a pretty basic assumption, but it does not come for free. To get convergence, the methods must do things correctly. In the same breath, the theory tells us how to do things better as well. Just as importantly, the theorem gives us guidance on how to check for correctness. This is the foundation for the practice of verification.

In verification, we can do many things. In its simplest form, we get evidence of the correctness of the method. We get evidence that the method is implemented as intended and provides the accuracy advertised. This is essential for trustworthy, credible computational results. With these guarantees in place, the work done with computational science can be used with confidence, and that confidence allows it to be invested in and trusted. Verification and theory have provided a confident means to improve methods and measure their impact. For 70 years this has been a guiding light for computational science.
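
As a reminder of what the simplest check looks like, here is a minimal sketch (an illustration of the standard practice, not any particular code’s tooling): compute the observed order of accuracy from errors on two meshes and compare it with the design order.

# Hedged sketch: the basic verification check of observed versus design order.
import math

def observed_order(err_coarse, err_fine, refinement=2.0):
    # p_obs = log(E_coarse / E_fine) / log(r) for mesh refinement ratio r.
    return math.log(err_coarse / err_fine) / math.log(refinement)

# Hypothetical errors from a grid-doubling study of a second-order method:
print(observed_order(4.0e-3, 1.0e-3))   # prints 2.0, matching the design order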

We should be paying attention to its importance moving forward. We are not.

Machine Learning and Artificial Intelligence Are An Issue

“If people do not believe that mathematics is simple, it is only because they do not realize how complicated life is.” ― John von Neumann

More recently, the promise of artificial intelligence (AI) has grabbed the headlines. The technical foundation for AI is machine learning (ML). The breakthrough of generative AI with large language models (LLMs) has rightly captured the world’s imagination and interest. A combination of algorithmic (method) advances with high-end computing powers LLMs. These LLMs are now one of the strongest economic driving forces in the world. Their power is founded partly on the fruits of computational science discussed above: computers, software, and algorithms. Unfortunately, that history of success is not getting sufficient attention.

Current development and investment in AI/ML are focused on computers (big iron, following the exascale program’s emphasis). Software is given attention secondarily. What is missing is investment in and attention to algorithms and applied math. We seem to have lost the ability to focus on laying the groundwork for algorithmic advances. A key driver of algorithmic advances is applied mathematics, where theory guides practice.

In the formative years of computational science, applied math gave key guidance. Theoretical advances and knowledge are essential to progress. Today that experience seems to be on the verge of being forgotten. The irony is that the LLM breakthrough of the past few years was dominated by algorithmic innovation: the transformer architecture. Its attention mechanism is responsible for the phase transition in performance that produced the amazing LLM results grabbing everyone’s notice. Investments in mathematics could provide avenues for the next such advance.

What is missing today is much of the mathematical theory driving credibility and trustworthy methods.

One of the essential aspects of computational science is the concept of convergence: more computation yields better results. Mathematics provides the theory underpinning this idea, and the process of demonstrating it is known as verification. In verification, convergence is used to prove the correctness and accuracy of algorithms. One of the biggest problems for AI/ML is the lack of such theory, which amounts to a lack of rigor. Thus the process of verification is simply not available. Furthermore, the understanding of accuracy for AI/ML is similarly threadbare. Investment and focus to fill these gaps are needed and long overdue.

One of the problems is that this research is extremely difficult and success is not guaranteed. It is likely to be failure-prone and to take time. Nonetheless, the stakes of not having such a theory are growing. Moreover, success would likely provide pathways for improving algorithms. Many essential ML methods are ill-behaved and perform erratically. Better mathematical theory involving convergence could pave the way for better ML: theory tells us what works and how to structurally improve techniques. This is what happened in computational science, and we should expect the same for AI/ML. Such work would likely improve trust in these systems as well. The combination of trust, efficiency, and accuracy should be enough to inspire investment, if a logical, rational policy were in place.

It is not, either in government or in private enterprise. We will all suffer for this lack of foresight.

“Today’s scientists have substituted mathematics for experiments, and they wander off through equation after equation, and eventually build a structure which has no relation to reality.” ― Nikola Tesla

Algorithms Win and Verification Matters

Anyone who has read my writing knows that algorithms are a clear winning path for computing. Verification is the testing and measurement of algorithmic performance. If one is interested in better computing, algorithmic verification is a vehicle for progress. Verification produces evidence of correctness and performance, giving a concrete measurement of algorithmic quality that can be an engine for progress. In a future without Moore’s law, algorithms are the path to improvement.

“Pure mathematics is in its way the poetry of logical ideas.” ― Albert Einstein

As I’ve written before, algorithmic improvement is currently hampered by a lack of support; some of this is funding and the rest is risk aversion. Algorithmic research is highly failure-prone and progress is episodic, so a great deal of tolerance for risk and failure is necessary for algorithmic advances. All of this can benefit from a focused verification effort, which measures the impact of work and provides immediate feedback. The mathematical expectations underpinning verification also provide a basis for improvements; this math gives the work focus and inspiration.

“We can only see a short distance ahead, but we can see plenty there that needs to be done.” ― Alan Turing
