
The Regularized Singularity

~ The Eyes of a citizen; the voice of the silent



Practical Application Accuracy Is Essential

15 Sunday Jun 2025

Posted by Bill Rider in Uncategorized

≈ 5 Comments

TL;DR

In classical computational science applications for solving partial differential equations, discretization accuracy is essential. In a rational world, this solution accuracy would rule (other things are important too). It does not! Worse yet, the way accuracy is considered is not connected to the objective reality of how the methods are used. It is time for this to end. Two things govern true accuracy in practice. One is the construction of discretization algorithms, which carry counterproductive biases focused on order of accuracy. The other is that, in real applications, solutions only achieve a low order of accuracy. Thus accuracy is dominated by different considerations than those assumed. It is time to get real.

“What’s measured improves” ― Peter Drucker

The Stakes Are Big

The topic of solving hyperbolic conservation laws has advanced tremendously during my lifetime. Today, we have powerful and accurate methods at our disposal to solve important societal problems. That said, problems and habits are limiting the advances. At the head of the list is a poor measurement of solution accuracy.

Solution accuracy is expected to be measured, but in practice only on ideal problems where full accuracy can be expected. When these methods are used practically, such accuracy cannot be expected; any method will produce first-order or lower accuracy. Fortunately, analytical problems exist that allow accuracy to be assessed under realistic conditions. The missing ingredient is actually doing the measurement. Changing our practice would focus energy and attention on methods that perform better under realistic circumstances. Today, methods with relatively poor accuracy and great cost are favored. This limits the power of these advances.

I’ll elaborate on the handful of issues hidden by current practices. Our current dialog in methods is driven by high-order methods. These are studied without regard for their efficiency on real problems. The popular methods such as ENO, WENO, TENO, and discontinuous Galerkin dominate, but practical accuracy is ignored. A couple of big issues I’ve written about reign broadly over the field. Time-stepping methods follow the same basic pattern: inefficient methods with formal accuracy dominate over practical concerns and efficiency. We do not have a good understanding of which aspects of high-order methods pay off in practice. Some high-order aspects appear to matter, but others do not yield practical benefit. This includes truly bounding stability conditions for nonlinear systems, computing strong rarefactions and low Mach flows, and multiphysics integration.

The Bottom Line

I will get right to the punch line for my argument and hopefully show the importance of my perspective. Key to my argument is the observation that “real” or “practical” problems converge at a low order of accuracy. Usually, the order of accuracy is less than one, so assuming first-order accuracy is actually optimistic. The second key assumption is what efficiency means in the context of modeling and simulation. I will define it as the relative cost of getting an answer of a specified accuracy. This seems obvious and more than reasonable.

“You have to be burning with an idea, or a problem, or a wrong that you want to right. If you’re not passionate enough from the start, you’ll never stick it out.” ― Steve Jobs

To illustrate my point I’ll construct a contrived, simple example. Define three methods for getting our solution that are otherwise similar. Method 1 gives an accuracy of one for a cost of one. Method 2 gives accuracy twice as good as Method 1 for double the cost. Method 3 gives accuracy four times as good as Method 1 at four times the cost. We can now compare the total cost for the same level of accuracy, looking at the efficiency of the solution. Each method converges at the (optimistic) first-order rate.

If we use Method 3 on our “standard” mesh we get an answer with one-quarter the error for a cost of four. To get the same error with Method 2, we need to use a mesh of half the spacing (and twice the points). With Method 1 we need four times the mesh resolution for the same accuracy. The relative cost of the equally accurate solution depends on the dimensionality of the problem. For transient fluid dynamics, we solve problems in one, two, or three dimensions plus time. We are operating with methods that need the same time step control; the time step size is always proportional to the spatial mesh spacing.

Let’s consider a one-dimensional problem, where the cost scales quadratically with the mesh (time plus one space dimension). Method 2 will cost a factor of eight to get the same accuracy as Method 3. Thus it costs twice as much. Method 1 needs two mesh refinements for a cost of 16. Thus it costs four times as much as Method 3. So in one dimension, the more accurate method pays off tremendously, and this is the proverbial tip of the iceberg. As we shall see, the efficiency gains grow in two or three dimensions.

In two dimensions the benefits grow. Now Method 2 costs 16 units and thus Method 3 pays off by a factor of four. For Method 1 we have a cost of 64 and the payoff is a factor of 16. You can probably see where this is going. In three dimensions Method 2 now costs 32 and the payoff is a factor of 8. For Method 1 the payoff is huge: it now costs 256 times as much to get the same accuracy, so the efficiency payoff is a factor of 64. That is almost two orders of magnitude difference. This is meaningful and important whether you are doing science or engineering.
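To make the arithmetic above concrete, here is a minimal Python sketch (my own illustration, not from any production code) that reproduces these factors under the stated assumptions: first-order convergence for every method, the base errors and costs of 1, 2, and 4 given above, and cost scaling like refinement to the power d+1 for d space dimensions plus time.

# Minimal sketch: relative cost to reach Method 3's error on the standard mesh,
# assuming every method converges at first order and cost scales like
# refinement**(d + 1) (d space dimensions plus time).

def cost_for_equal_error(base_error, base_cost, target_error, d):
    refine = base_error / target_error      # first order: error shrinks linearly with h
    return base_cost * refine ** (d + 1)    # time step tied to mesh spacing

methods = {"Method 1": (1.0, 1.0), "Method 2": (0.5, 2.0), "Method 3": (0.25, 4.0)}
target = 0.25                               # Method 3's error on the standard mesh

for d in (1, 2, 3):
    print(f"--- {d}D plus time ---")
    for name, (err, cost) in methods.items():
        c = cost_for_equal_error(err, cost, target, d)
        print(f"{name}: cost {c:g}, payoff vs Method 3 = {c / 4.0:g}x")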

Imagine how a seven-dimensional method like full radiation transport would scale. The payoffs for accuracy could be phenomenal. This is a type of efficiency that has been largely ignored in computational physics. It is time for this to end and for us to focus on what really matters in computational performance. The accuracy under conditions actually faced in applications of the methods matters. This is real efficiency, and an efficiency not examined at all in practice.

“Progress isn’t made by early risers. It’s made by lazy men trying to find easier ways to do something.” ― Robert Heinlein

The Usual Approach To Accuracy

The usual approach to designing algorithms is to define accuracy on a basic “mesh,” prototypically in space and time. The usual mantra is that the most accurate methods are higher order: the higher the order of the method, the more accurate it is. High-order often simply means more than second-order accurate. Nonetheless, the assumption is that higher-order methods are always more accurate. Thus the best you can do is a spectral method, where every degree of freedom available contributes to the approximation. This belief has driven research in numerical methods forever (many decades at least). We know these methods are not practical for realistic problems.

The standard tool for designing methods is the Taylor series. It relies on several things being true: the function needs to be smooth, and the expansion needs to be in a variable that is “small” in some vanishing sense. This is a classical tool and has been phenomenally useful over centuries of work in numerical analysis. The ideal nature of when it holds is also a limitation. While the Taylor series still holds for nonlinear cases, the dynamics of nonlinearity invariably destroy the smoothness. If smoothness is retained nonlinearly, the problem is pathological. The classic mechanism for this is shocks and other discontinuities. Even smooth nonlinear structures still have issues like cusps, as seen in expansion waves. As we will discuss, accuracy is not retained in the face of this.
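To see the loss concretely, here is a tiny Python sketch (mine, purely illustrative) of a textbook second-order central difference: it delivers its design order on a smooth function, but the observed order collapses to essentially zero when the point of interest sits near a kink.

import numpy as np

# Sketch: second-order central differences hit their design order on a smooth
# function, but the observed order collapses near a non-smooth feature (a kink).

def observed_order(f, df_exact, x0, hs):
    errs = [abs((f(x0 + h) - f(x0 - h)) / (2 * h) - df_exact) for h in hs]
    return [np.log(errs[i] / errs[i + 1]) / np.log(hs[i] / hs[i + 1])
            for i in range(len(errs) - 1)]

hs = [0.1 / 2**k for k in range(5)]
print("smooth sin(x):", observed_order(np.sin, np.cos(1.0), 1.0, hs))   # ~2.0
print("kink |x|     :", observed_order(abs, 1.0, 1e-4, hs))             # ~0, no convergence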

If your solution is analytic in the best possible sense, this Taylor-series machinery works. This means the solution can be differentiated infinitely. While this is ideal, it is also very infrequently (basically never) encountered in practice. The other issue is that the complexity of a method grows massively as you go to higher order. This is true for linear problems, and even more so for nonlinear problems where the error has many more terms. If only it were this simple! It is not by any stretch of the imagination.

“We must accept finite disappointment, but never lose infinite hope.” ― Martin Luther King Jr.

Stability: Linear and Nonlinear

For any integrator for partial differential equations, stability is a key property. Basically, it is the property that any “noise” in the solution decays away. The truth is that there is always a bit of noise in a computed solution. You never want it to dominate the solution. For convergent solutions, stability is one of two ingredients for convergence under mesh refinement. This is a requirement of the Lax equivalence theorem. The other requirement is the consistency of the approximation with the original differential equation. Together these yield the property of convergence, where solutions become more accurate as meshes are refined. This principle is one of the foundational aspects of the use of high-performance computing.

Von Neumann invented a classical method to investigate stability. When devising a method, doing this analysis is a wise and necessary first step. Often subtle things can threaten stability, and the method is good for unveiling such issues. For real problems, this linear stability is only the first step in the derivation. It is necessary, but not sufficient. Most problems have a structure that requires nonlinear stability.
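Before getting to the nonlinear story, the linear analysis itself is concrete. Here is a minimal sketch (my own, not from any production code) for the simplest case, first-order upwind advection: the amplification factor g(theta) = 1 - c(1 - exp(-i*theta)) stays bounded by one exactly when the CFL number c is at most one.

import numpy as np

# Sketch of von Neumann analysis for first-order upwind advection:
#   u_j^{n+1} = u_j^n - c*(u_j^n - u_{j-1}^n),  c = a*dt/dx (CFL number).
# Substituting u_j^n = g^n * exp(i*j*theta) gives g(theta) = 1 - c*(1 - exp(-i*theta)).
# Stability requires max over theta of |g(theta)| <= 1, which holds iff 0 <= c <= 1.

theta = np.linspace(0.0, 2.0 * np.pi, 721)
for c in (0.5, 1.0, 1.1):
    g = 1.0 - c * (1.0 - np.exp(-1j * theta))
    print(f"CFL = {c}: max |g| = {np.abs(g).max():.4f}",
          "(stable)" if np.abs(g).max() <= 1.0 + 1e-12 else "(unstable)")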

The need for nonlinear stability comes from nonlinearities in real problems or non-differentiable features in the solution (like shocks or other discontinuities). These require mechanisms to control things like oscillations and the positivity of the solution. These mechanisms are invariably nonlinear, even for linear problems. This has a huge influence on accuracy and on the sort of accuracy that is important to measure. The nonlinear stability assures usable results in real circumstances. It has a relatively dominant impact on solutions and lets methods get accurate solutions when things are difficult. One of the damning observations is that the accuracy impact of these measures is largely ignored under realistic circumstances. The only things really examined are robustness and low-level compliance with the design.

“We don’t want to change. Every change is a menace to stability.” ― Aldous Huxley

What Accuracy Actually Matters

In the published literature it is common to see accuracy reported for idealized conditions. These are conditions where the nonlinear stability is completely unnecessary. We do get to see whether and how the nonlinear stability impacts this ideal accuracy. This is not a bad thing at all. It goes into the pile of necessary steps for presenting a method. The problems are generally smooth and infinitely differentiable. A method of increasingly higher-order accuracy will get the full order of convergence and very small errors as the mesh is refined. It is a demonstration of the results of the stability analysis. This is to say that a stability analysis can provide convergence and error characterization. There is also a select set of problems for fully nonlinear effects (e.g., the isentropic vortex or the like).

“I have to go. I have a finite amount of life left and I don’t want to spend it arguing with you.” ― Jennifer Armintrout

There is a huge rub to this practice. This error and behavior for the method is never encountered in practical problems. For practical problems shocks, contacts, and other discontinuous phenomena abound. They are inescapable. Once these are present in the solution the convergence rate is first-order or less (theory for this exists). Now the nonlinear stability and its accuracy character take over and become completely essential. The issue with the literature is that errors are rarely reported under these circumstances, even when the exact error can be computed. The standard is simply “the eyeball norm.” This standard serves the use of these methods poorly indeed. Reporting for more realistic problems is close to purely qualitative, even when an exact solution is available.

One of the real effects of this difference comes down to the issue of what accuracy really matters. If the goal of computing a solution is to get a certain low level of error for the least effort, the difference is profound. The assessment of this might reasonably be called efficiency. In cases where the full order of accuracy can be achieved, the higher the order of the method, the more efficient it will be for small errors. These cases are virtually never encountered practically. The upshot is that accuracy is examined in cases that are trivial and unimportant.

Practical cases converge at first order, and the theoretical order of accuracy for a method doesn’t change that. It can change the relative accuracy, but the relationship there is not one-to-one. That said, a higher-order method will not always be better than a low-order method. One of our gaps in analysis is understanding how the details of a method lead to practical accuracy. Right now, it is just explored empirically during testing. The issue is that the testing and reporting of said accuracy is quite uncommon in the literature. Making this a standard expectation would improve the field productively.
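The measurement I am asking for is cheap. Here is a small Python sketch of it (the error values are made-up placeholders, not results from any particular code): given errors against an exact solution on a sequence of meshes, report the observed order.

import numpy as np

# Sketch: compute the observed order of accuracy from errors measured against an
# exact (or analytical) solution on successively refined meshes.
# The error values below are made-up placeholders for illustration only.

def observed_orders(h, err):
    h, err = np.asarray(h, float), np.asarray(err, float)
    return np.log(err[:-1] / err[1:]) / np.log(h[:-1] / h[1:])

h   = [1/50, 1/100, 1/200, 1/400]          # mesh spacings
err = [4.0e-2, 2.2e-2, 1.2e-2, 6.6e-3]     # L1 errors vs. exact solution (placeholders)
print("observed orders:", observed_orders(h, err))   # hovers near ~0.86, below first order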

“Don’t be satisfied with stories, how things have gone with others. Unfold your own myth.” ― Rumi

References

Study of real accuracy

Greenough, J. A., and W. J. Rider. “A quantitative comparison of numerical methods for the compressible Euler equations: fifth-order WENO and piecewise-linear Godunov.” Journal of Computational Physics 196, no. 1 (2004): 259-281.

Nonlinear Stability

Guermond, Jean-Luc, and Bojan Popov. “Fast estimation from above of the maximum wave speed in the Riemann problem for the Euler equations.” Journal of Computational Physics 321 (2016): 908-926.

Toro, Eleuterio F., Lucas O. Müller, and Annunziato Siviglia. “Bounds for wave speeds in the Riemann problem: direct theoretical estimates.” Computers & Fluids 209 (2020): 104640.

Li, Jiequan, and Zhifang Du. “A two-stage fourth order time-accurate discretization for Lax–Wendroff type flow solvers I. Hyperbolic conservation laws.” SIAM Journal on Scientific Computing 38, no. 5 (2016): A3046-A3069.

High Order Methods

Jiang, Guang-Shan, and Chi-Wang Shu. “Efficient implementation of weighted ENO schemes.” Journal of Computational Physics 126, no. 1 (1996): 202-228.

Cockburn, Bernardo, Claes Johnson, Chi-Wang Shu, and Eitan Tadmor. Essentially non-oscillatory and weighted essentially non-oscillatory schemes for hyperbolic conservation laws. Springer Berlin Heidelberg, 1998.

Balsara, Dinshaw S., and Chi-Wang Shu. “Monotonicity preserving weighted essentially non-oscillatory schemes with increasingly high order of accuracy.” Journal of Computational Physics 160, no. 2 (2000): 405-452.

Cockburn, Bernardo, and Chi-Wang Shu. “Runge–Kutta discontinuous Galerkin methods for convection-dominated problems.” Journal of Scientific Computing 16 (2001): 173-261.

Spiteri, Raymond J., and Steven J. Ruuth. “A new class of optimal high-order strong-stability-preserving time discretization methods.” SIAM Journal on Numerical Analysis 40, no. 2 (2002): 469-491.

Methods Advances Not Embraced Enough

Suresh, Ambady, and Hung T. Huynh. “Accurate monotonicity-preserving schemes with Runge–Kutta time stepping.” Journal of Computational Physics 136, no. 1 (1997): 83-99.

Colella, Phillip, and Michael D. Sekora. “A limiter for PPM that preserves accuracy at smooth extrema.” Journal of Computational Physics 227, no. 15 (2008): 7069-7076.

Rider, William J., Jeffrey A. Greenough, and James R. Kamm. “Accurate monotonicity- and extrema-preserving methods through adaptive nonlinear hybridizations.” Journal of Computational Physics 225, no. 2 (2007): 1827-1848.

A Great Workshop Is Inspirational

08 Sunday Jun 2025

Posted by Bill Rider in Uncategorized

≈ Leave a comment

“Knowledge has to be improved, challenged, and increased constantly, or it vanishes.” ― Peter Drucker

Back in the day I used to write up my thoughts on the conferences I attended in this blog. It was a good practice and encouraged me to sit back and get perspective on what I saw, what I learned, and what I felt. The workshop I attended this week was excellent, with amazing researchers: thoughtful and wise people who shared their knowledge and wisdom. I saw a great menu of super talks and I had phenomenal conversations. Some of these were one-on-one sidebars, but there were also panel discussions that were engaging and thought-provoking. I am left with numerous themes to write about for the foreseeable future. A good week indeed, but it left me in mourning too.

The workshop was called “Multiphysics Algorithms for the Post Moore’s Law Era.” It was organized by Brian O’Shea from Michigan State along with a group of illustrious scientists largely from Los Alamos. It was really well done and a huge breath of fresh air. Los Alamos air is good for that too. I was there largely because I had an invited talk, which I really enjoyed giving. I had put a great deal of thought into my talk. It held some thoughts needed for this present moment. Invited talks are an honor and a good thing to accept. They look great on a resume or annual assessment. I quickly lost any sense of having made the wrong decision and immediately felt grateful to attend.

I won’t and really can’t hit all the high points or talks, but will give a flavor of the meeting.

Moore’s law is the empirical observation about the growth of computing power. For about fifty-some-odd years computer power doubled about every 18 months. Over such a span, this gives an advance of over a billion times (2 to the 30th power). Starting around 2010 people started to see the end of the road for the law. Physics itself is getting in the way, and parallel computing or those magical GPUs that AMD and Nvidia produce aren’t enough. Plus those GPUs are a giant fucking pain in the ass to program. We now spend a vast amount of money to keep advancing computing, and we are not going to be able to keep up. This era is over and what the fuck are we going to do? The workshop was put together to answer this WTF question.

“Vulnerability is the birthplace of innovation, creativity and change.” ― Brene Brown

I will start by saying Los Alamos carries some significant meaning for me personally. I lived and worked there for almost 18 years. It shaped me as a scientist, if not made me the one I am today. It has (had) a culture of scientific achievement and open inquiry that I fully embrace and treasure. I had not spent time like this on the main town site for years. It was a stunning melange of things unchanged and radical change. I ate at new places and old places, running into old friends with regularity. I was left with mixed feelings and deep emotions at the end, most of all about whether leaving there was the right professional move for me. It was probably a good idea. The Lab I knew and loved is almost gone. It has disappeared into the maw of our dysfunctional nation’s destruction of science. It is a real example of where greatness has gone, and the MAGA folks are not doing jack shit to fix it.

More later about the Lab and its directions since I left. Now for the good part of the week, the Workshop.

“The important thing is not to stop questioning.” ― Albert Einstein

The first day of the workshop should have left me a bit cold, but it didn’t. The focus was the computing environment of the near future. It was all the stuff the high-performance computing people are doing to forestall the demise of Moore’s law. There are a bunch of ideas and zero of them are really appealing or exciting. The biggest message of the day was a focus on missed opportunities. The decade of focus on exascale computers has meant a huge opportunity cost. This would unfold brilliantly as the week went along. The greatest take-home message was the cost of keeping up and the drop-off of performance in the aggregate list of the fastest computers. We can’t do this anymore. The other big lesson is that quantum computing is no way out. It is cool and does some great shit, but it is limited. Plus it’s always attached to a regular computer, so that’s an intrinsic limit.

The second day was much more about software. We have made a bunch of amazing software to support all these leading-edge computers. This software is created on a shoestring budget and maintaining it is an increasing tax. The biggest point is that GPUs suck ass to program. We have largely wasted 10 years programming these motherfucking monstrosities. If we weren’t doing that, what could we have done? Plus the GPUs have a limited future. There have been some great ideas for dealing with complexity, like Sandia’s Kokkos, but there are also dead ends. We are so attached to performance; why can’t we work with computers that are a joy to program? Maybe that would be a path we could all support.

At the end of each day, all the speakers formed a panel and we had a moderated conversation with the audience. The first day they asked Mike Norman to lead the conversation. Mike is a renowned astrophysicist and leader in the history of high-performance computing. It was cool to get to meet him. During the discussions, major perspectives came clearly into focus. An example is the question above about whether we wasted 10 years on GPUs. Yes is the answer. Another issue is the problems and cost of software, which isn’t well funded or supported. I can report from my job that the maintenance cost of code can quickly swallow all your resources. This grows as the code gets old, and we make a lot of legacy codes in science. Another topic of repeated discussion every day of the meeting was the growing obsession with AI. There is a manic zeal for AI on the part of managers, and it puts all our science at serious risk. A bit more later about this.

Finally, at the end of day 2 we started in on algorithms and the science done with computing. Thank god! While I appreciate learning all about software and computing, I need some science! I was introduced to tensor trains and I’ll admit to not quite grokking how they work. It was one of several ideas for extremely compressed computing. A great thing is to leave a workshop with homework. After this, we heard about MFEM from Livermore. Lots of computing results and not nearly enough algorithms (which I know exist). They didn’t talk about results with the code, only how fucking great it runs. That said, this talk was almost an exclamation point on what GPU-based computing has destroyed.

Wednesday was my talk. I was sandwiched between two phenomenal astrophysics talks with jaw-dropping results and incredible graphics. I felt honored and challenged. Jim Stone gave the first talk and wow! Cool methods and amazing studies of important astrophysical questions. He uses methods I know well and they produce magic. My physics brain left the talk wishing for more. I could watch a week of talks like that. Even better he teed up some topics my talk would attack head-on. After my talk, Bronson Messer from Oak Ridge talked about supernovae. It was sort of a topic I have an amateur taste for. Incredible physics again like Jim’s talk and gratifying uses of computing. I want more!

I gave my talk in a state where I was both inspired and a bit gobsmacked at having to sit between these two masterpieces. I had trimmed my talk down to 30 minutes to allow 15 minutes for questions. Undaunted, I stepped into the task. My talk had three main pieces: a discussion of the power and nature of algorithms, how V&V is the scientific method, and how to use verification to embrace true computational efficiency. I sized the talk almost perfectly. I do wish I would move more during a talk and be more dynamic. I was too chained to my laptop. I also hated the hand mike (I would have loved to drop it at the end, but that would be a total dick move).

[Image: The Aurora supercomputer, the United States’ first exascale system, at Argonne National Laboratory. (Credit: Argonne National Laboratory)]

“The only way of discovering the limits of the possible is to venture a little way past them into the impossible.” ― Arthur C. Clarke

I always believe that a good talk should generate questions. My talk generated a huge reaction and question after question. Some asked about making V&V more efficient and cheaper. I have a new idea about that after answering: no, V&V should not be cheap. It is the scientific method and a truly great human endeavor. It is labor intensive because it is hard and challenging. People don’t do V&V because they are lazy and want it on the cheap. It is just like thinking and AI. We still need to think when we do math, code, or write. Nothing about AI should take that away. Science is about thinking, and we need to think a lot more, not less. Computers, AI, algorithms, and code are all tools, and we need to be skilled and powerful at using them. We need to be encouraged to think more, question more, and do the hard things. None of that should be done away with by these new tools. These new tools should augment productivity, making us more efficient. They should free up time to really think more.

The big lasting thought from my talk is about the power of algorithms. Algorithms fall into a set of three rough categories worth paying attention to. The taxonomy is ordered by the power of these algorithms too. I will write about this more. I have in the past, but now I have new clarity! Thanks workshop! What an amazing fucking gift!

This taxonomy has three parts:

1. Standard efficiency mapping to computers (parallel, vector, memory serving, …). This has been the focus lately. It is the lowest rung of the ladder.

2. Algorithms that change the scaling of the method in terms of operations. The archetypical example is linear algebra, where the original scaling, as in Gaussian elimination, was the cube of the number of equations. The best is multigrid, which scales linearly with the number of equations. The difference in scaling is truly quantum and rivals or beats Moore’s law easily.

3. Next are the algorithms that are the game changers. These algorithms transform a field of science or the world. The archetype of this is the PageRank algorithm that made Google what it is. Google is now a verb. These algorithms are as close to magic as computers get.

The trick is that each rung up the hierarchy of algorithms is harder, more failure-prone, and rarer. These days the last two rungs are ignored and only happen with serendipity. We could do so much more if we were intentional about what we pursue. It also requires a taste for risk and a tolerance of failure.
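A back-of-the-envelope sketch (my own, with purely illustrative numbers) shows how big the rung-two gap is: compare cube scaling to linear scaling and express the difference as equivalent Moore’s-law doublings.

import math

# Sketch of the rung-two payoff: operation counts that scale like n**3 (Gaussian
# elimination) versus n (multigrid-like), and the gap expressed as the number of
# Moore's-law doublings it would take hardware to make up the difference.

for n in (1_000, 100_000, 10_000_000):
    gap = float(n) ** 3 / n                    # = n**2
    doublings = math.log2(gap)
    print(f"n = {n:>10,}: algorithmic gap ~ {gap:.1e}  (~{doublings:.0f} Moore's-law doublings)")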

“Any sufficiently advanced technology is indistinguishable from magic.”― Arthur C. Clarke

I wanted this to be a brief post. I have failed. The workshop was a wonderful gift to my brain, so this is a core dump and only a partial one. I even had to clip off the last two days of it (shout out to Riley and Daniel for great talks, plus the rest; even more homework). Having worked at Los Alamos, I have friends and valued colleagues there. To say that conversations left me troubled is an understatement. I am fairly sure that the Los Alamos I knew and loved as a staff member is dead. I’m always struck by how many of my friends are Lab Fellows, and how dismal my recognition is at Sandia. At Los Alamos, I would have been so much more, at least technically. That said, I’m not sure my heart could take what was reported to me. The Lab is something else now and has lost its identity as something special.

The Lab was somewhere special and wonderful. It was a place that I owe my scientific identity to. That place no longer exists. You can still make it out in the shadows and echoes of the past, but those are dimming with each passing day. You may recall that last month, Peter Lax died. A friend shared the Lab’s obituary with me. It wasn’t anything horrible or awful, but it was full of outright errors and a lack of attention to detail. Here is one of the greats of the Lab and one of the few remaining scientists from the Manhattan Project. He was someone whose contributions to science via applied math define what is missing today. The kind of work in applied math that Peter did is exactly what AI and machine learning need, and it is absent. Worse yet, the current leaders of the Lab and the nation are oblivious. They botched his obituary, and I suppose that’s a minor crime compared to the scientific malpractice.

One cool moment happened at Starbucks on Thursday morning. It was a total “only in Los Alamos” moment. I was sitting down enjoying coffee, and a man came up to me. He asked, “Are you Bill Rider?” He was a fan of this blog. I invited him to sit and talk. We had a great conversation, although it did little to calm my fears about the Lab. I can’t decide whether to feel disgust, resignation, or deep sadness. A beacon of science in the USA and the world is flickering out. At the very least this is a tragedy. The tragedy is born of a lack of vision, trust, and stewardship. It’s not like the Lab does anything essential; it’s just nuclear weapons.

“The present changes the past. Looking back you do not find what you left behind.” ― Kiran Desai

Rather than close on this truly troubling note, I’ll end on a bit of gratitude. First, I would like to give much appreciation to Brian, who did much of the operation and management of the workshop. He did an outstanding job. Chris Fryer and CNLS hosted the workshop under its auspices. It was joyful to be back in the CNLS fold once again. I have so many great memories of attending seminars there, along with a few that I gave. Chris and his wife Aimee host wonderful parties at their home. They are truly epic and wonderful, with a tremendous smorgasbord of culinary delights and even more stimulating conversations with a plethora of brilliant people. It is always a delight to visit them and enjoy their generous hospitality.

“Every revolutionary idea seems to evoke three stages of reaction. They may be summed up by the phrases: (1) It’s completely impossible. (2) It’s possible, but it’s not worth doing. (3) I said it was a good idea all along.” ― Arthur C Clarke

When is Research Done?

01 Sunday Jun 2025

Posted by Bill Rider in Uncategorized

≈ Leave a comment

TL;DR

There is a trend I’ve noticed over my career: an increasing desire to see research as finished. The results are good enough, and the effort is moved to new endeavors. The success is then divested. But research is never done, never good enough; it is simply the foundation for the next discovery. The results of this are tragic. Unless we are continually striving for better, knowledge and capability stagnate and then decay. Competence fades and disappears with a lack of attention. In many important areas, this decay is already fully in effect. The engine of mediocrity is project management with milestones and regular progress reports. Underlying this trend is a lack of trust and a short-term focus. The result is a looming stench of mediocrity where excellence should be demanded. The cost to society is boundless.

Capabilities versus Projects

“The worst enemy to creativity is self-doubt.” ― Sylvia Plath

Throughout my career, I have seen a troubling trend in funding for science. This trend has transformed into sprawling mismanagement. Once upon a time, we funded capabilities and competence in specific areas. I’ve worked at multi-program labs that apply a multitude of disciplines to execute complex programs. Nuclear weapons are the archetype of these programs. These programs require executing and weaving together a vast array of technical areas into a cohesive whole. Amongst these capabilities are a handful of overarching necessities. Even the necessities of competence for nuclear weapons are being ignored. This is a fundamental failure of our national leadership. It is getting much worse, too.

The thing that has changed is the projectization of science. We have moved toward applying project management principles to everything. The dumbest part of this is the application of construction-style project management to science. We get to plan breakthroughs (planning that makes sure they don’t happen) and apply concepts like “earned value.” The result is the destruction of science, not its execution. Make-believe success is messaged by managers, but is empty in reality. Instead of useful work, we have constant progress reports, updates, and milestones. We have lost the ability to move forward and replaced it with the appearance of progress. Project management has simply annihilated science and destroyed productivity. Competence is a thing of the past.

“The problem with the world is that the intelligent people are full of doubts, while the stupid ones are full of confidence.” ― Charles Bukowski

The milestones themselves are the subject of great management malpractice. These are supposed to serve as the high-level measure of success. We operate under rules where success is highly scrutinized. The milestones cannot fail, and they don’t. The reason is simple: they are engineered to be foolproof. Thus, any and all risk is avoided. Upper management has its compensation attached to them, too. No one wants to take “food” out of their boss’ mouth either (that’s the quiet part, said out loud). The end result is not excellence, but rather a headlong leap into mediocrity. Milestones are the capstone on project management’s corrosive impact on science.

Rather than great work and maintaining capability, we have the opposite, mediocrity and decay.

The Desire to Finish

“Highly organized research is guaranteed to produce nothing new.” ― Frank Herbert

One of the most insidious aspects of the project mindset is the move to terminate work at the end of the project. There is a lot of work that they want to put a bow on and say, “It is finished.” Then we move to focus on something else. The management is always interested in saying, “This work is done,” and “move to something new.” The something new is whatever will bring big funding and contribute to managerial empire building (another pox). Once upon a time, the Labs were a national treasure (crown jewels). Now we are just a bunch of cheap whores looking for our next trick. This is part of the legacy of project management and our executive compensation philosophy. Much less progress and competence, much more graft and spin.

A few years ago, we focused on computing and exascale machines. Now we see artificial intelligence as the next big thing (to bring in the big bucks). Nothing is wrong with a temporary emphasis and shift of focus as opportunity knocks. Interestingly, exascale was not an opportunity, but rather a struggle against the inevitable death of Moore’s law. Moore’s law was the gift that kept on giving for project management, reliable progress like clockwork.

Project management desires explain exascale more than any technical reason does. Faster computers are worthwhile for sure; however, the current moment does not favor this as a strategy. In fact, it is time to move away from it. AI is different. It is a once-in-a-generation technology to be harnessed, but we are fucking that up too. We seek computing power to make AI work and step away from algorithms and innovation. Brute force has its limits, and progress will soon languish. We suffer from a horrendous lack of intellectual leadership, basic common sense, and courage. We cannot see the most obvious directions to power scientific progress. The project management obsession can be tagged as a reason. If the work doesn’t fit into that approach, it can’t be funded.

Continual progress and competence are out the window. The skills to do the math, engineering, and physics are deep and difficult. The same holds for the skills to work in high-performance computing. The same again for artificial intelligence. Application knowledge is yet another deep, expansive expertise. None of this expertise easily transfers to the next hot thing. Worse yet, expertise fades and ossifies as those mental patterns lapse into hibernation. Now the projects need to finish, and the program should move to something shiny and new. The cost of this attitude is rather profound, as I explore next.

The Problems with Finishing: Loss of Competence

“The purpose of bureaucracy is to compensate for incompetence and lack of discipline.” ― Jim Collins

This all stems from the need for simplicity in a sales pitch. Simple gets the money today. Much of the explanation for this is our broken politics. Congress and the people have lost confidence and trust in science. We live in a time of extremes and an inability to live in the gray. No one can manage a scintilla of subtlety. Thus, we finish things, followed by a divestment of emphasis. That divestment ultimately ends up hollowing out the built expertise needed for achievement. Eventually, the tools developed during the success of one project and emphasis decay too. Essential capabilities cannot be maintained successfully without continual focus and support.

A story is helpful here. As part of the nuclear weapons program at the end of the Cold War, simulation tools were developed. These tools were an alternative to full-scale nuclear tests. To me, one of the more horrifying aspects of today’s world is how many of these tools from that era are still essential today. Even tools built as part of the start of stockpile stewardship after the Cold War are long in the tooth today. In virtually every case, these tools were state of the art when conceived originally. Once they were “finished” and accepted for use in applications, the tools went into stasis. In a world of state-of-the-art science, stasis is decline. The only exception is the move of these codes to new computing platforms. This is an ever-present challenge. The stasis is in the intellectual content of the tools, which matters far more than the computing platforms.

What usually does not change are the numerical methods, physics, and models in the codes. These become frozen in time. While all of these can be argued to be state of the art when the code was created, they cease to be with time. We are talking decades. This is the trap of finishing these projects and moving on; the state of the art is transitory. If you rest on success and declare victory, time will take that from you. This is the state that too much of our program is in. We have declared victory and failed to see how time eats away at our edge. Today, we have tools operated by people who don’t understand what they are using. The punch line is that research is never done, and never completed. Today’s research is the foundation of tomorrow’s discoveries and an advancing state of the art.

Some of this is the ravages of age for everything. People age and retire. Skills dull and wither from lack of use. Codes age and become dusty, no longer embodying the state of the art. The state of the art moves forward and leaves the former success as history. All of this is now influencing our programs. Over enough time, this evolves into outright incompetence. Without a change in direction and philosophy, that incompetence is inevitable. In some particular corners of our capability, the incompetence is already here.

“Here’s my theory about meetings and life: the three things you can’t fake are erections, competence and creativity.” ― Douglas Coupland

A Mercy Killing of an Ill Patient

“Let’s have a toast. To the incompetence of our enemies.” ― Holly Black

The core issues at work in destroying competence are a combination of short-term thinking and lack of trust. The whole project attitude is emblematic of it. The USA has already ceded the crown of scientific and engineering supremacy to China. American leaders won’t admit this, but it’s already true. Recent actions by the Administration and DOGE will simply unilaterally surrender the lead completely and irreversibly. The corollary to all this negativity is that maintaining the edge of competence requires trust and long-term thinking. Neither is available today in the USA.

There is a sharp critique of our scientific establishment available in the recent book Abundance. There, Klein and Thomson provide commentary on what ails science in the USA. It rings true to me, having worked actively for the last 35 years at two National Labs. Risk avoidance, paralyzing bureaucracy, and misaligned priorities have sapped vitality. Too much overhead wastes money. All these ills stem from those problems of short-termism combined with a lack of trust. A good amount of largess and overconfidence conspires as well. Rather than encourage honesty, the lack of trust empowers bullshit. Our key approach to declaring success is to bullshit our masters.

Today is not the time to fix any of this. It is time to think about what a fix will look like. Recent events are the wanton destructive dismantling of the federal scientific establishment. Nothing is getting fixed or improved. It is simply being thrown into the shredder. If we get to rebuild science, we need to think about what it should look like. If we continue with short-term thinking, success won’t be found. The project management approach needs to be rejected. Trust is absolutely necessary, too. Today, trust is also in freefall. Much of the wanton destruction stems from a lack of trust. This issue is shared by both sides of the partisan divide. Their reasons are different, and the truth is in the middle. Unless the foundation for success is available, scientific success won’t return.

“The problem with doing nothing is not knowing when you are finished.” ― Nelson De Mille

What Americans don’t seem to realize is that so much of our success is science-based. During the Cold War, the connection between science and national security was obvious. Nuclear weapons made the case with overwhelming clarity. Economic security and success are no less bound to science. The effect is more subtle and longer-term. The loss of scientific power won’t be obvious for a long time. Eventually, we will suffer from the loss of scientific and engineering success. Our children and grandchildren will be poorer, less safe, and live shorter lives due to our actions today. The past four months simply drove nails into a coffin that had already been fashioned by decades of mismanagement.

“Never put off till tomorrow what may be done day after tomorrow just as well.” ― Mark Twain

A Little Verification Idea Seldom Tried

28 Wednesday May 2025

Posted by Bill Rider in Uncategorized

≈ Leave a comment

“We don’t want to change. Every change is a menace to stability.” ― Aldous Huxley

There is a problem with finishing up a blog post on a vacation day: you forget one of the nice ideas you wanted to share. So here is a brief addendum to what I wrote yesterday.

Here is a little addition to my last post. It is another type of test I’ve tried, but also seldom seen documented. A standard threat to numerical methods is violating stability conditions. Stability is one of the most important numerical concepts. It is a prerequisite for convergence and is usually implicitly assumed. What is not usually done is an active test of the transition of a calculation to instability. The simplest way to do this is to violate the time step size determined by the stability condition.

The tests are simple. Basically, run the code with time steps over the stability limit and observe how sharp the limits are in practice. This also does a good job of documenting what an instability actually looks like when it appears. If the limit is not sharp, it might indicate an opportunity to improve the code by sharpening a bound. One could also examine whether the lack of stability inhibits convergence. This would just be cases where the instability is mild and not catastrophic.
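Here is a minimal sketch of such a test on a toy problem of my own choosing (first-order upwind advection with a CFL limit of one, not any production code): sweep the time step across the limit and watch the solution norm. Mild violations grow slowly; large ones blow up.

import numpy as np

# Sketch of a stability-violation test: run first-order upwind advection with
# CFL numbers straddling the nominal limit (CFL = 1) and see how sharp the
# limit is in practice by watching the solution norm.

def run_upwind(cfl, nx=200, n_steps=400):
    x = np.linspace(0.0, 1.0, nx, endpoint=False)
    u = np.exp(-200.0 * (x - 0.5) ** 2)         # smooth initial pulse
    for _ in range(n_steps):
        u = u - cfl * (u - np.roll(u, 1))       # periodic upwind update
        if not np.isfinite(u).all():
            return np.inf
    return np.abs(u).max()

for cfl in (0.9, 1.0, 1.02, 1.1, 1.5):
    print(f"CFL = {cfl:<4}: max |u| after run = {run_upwind(cfl):.3e}")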

I did this once in my paper with Jeff Greenough, comparing a couple of methods for computing shocks. In this case, the test was the difference between linear and nonlinear stability for a Runge-Kutta integrator. The linear limit is far more generous than the nonlinear limit (by about a factor of three!). The accuracy of the method is significantly impacted at the limit of each of the two conditions. For shock problems, the difference in solutions and accuracy is much stronger. It also impacts the efficiency of the method a great deal.

Greenough, J. A., and W. J. Rider. “A quantitative comparison of numerical methods for the compressible Euler equations: fifth-order WENO and piecewise-linear Godunov.” Journal of Computational Physics 196, no. 1 (2004): 259-281.

There’s Much More to Code Verification

27 Tuesday May 2025

Posted by Bill Rider in Uncategorized

≈ Leave a comment

TL;DR

The standard narrative for code verification is demonstrating correctness and finding bugs. While this is true, it also sells the practice of verification wildly short. Code verification has a myriad of other uses, foremost the assessment of accuracy without ambiguity. It can also define features of a code, such as adherence to invariants in solutions. Perhaps most compellingly, it can define the limits of a code and method, and the research needed to advance capability.

“Details matter, it’s worth waiting to get it right.” ― Steve Jobs

What People Think It Does

There is a standard accepted narrative for code verification. It is a technical process of determining that a code is correct. A lack of correctness is caused by bugs in the code that implements a method. It is supported by two engineering society standards written by the AIAA (aerospace engineers) and ASME (mechanical engineers). The DOE-NNSA’s computing program, ASC, has adopted the same definition. It is important for the quality of code, but it is drifting into obscurity and a lack of any priority. (Note that the IEEE has a different definition of verification, leading to widespread confusion.)

The definition has several really big issues that I will discuss below. Firstly, the definition is too limited and arguably wrong in emphasis. Secondly, it means that most of the scientific community doesn’t give a shit about it. It is boring and not a priority. It plays a tiny role in research. Thirdly, it sells the entire practice short by a huge degree. Code verification can do many important things that are currently overlooked and valuable. Basically there are a bunch of reasons to give a fuck about it. We need to stop undermining the practice.

The basics of code verification are simple. A method for solving differential equations has an ideal order of accuracy. Code verification compares the solution produced by the code with an analytical solution over a sequence of meshes. If the observed order of accuracy matches the theory, the code is correct. If it does not, there is an error either in the implementation or in the construction of the method. One of the key reasons we solve differential equations with computers is the dearth of analytical solutions. For most circumstances of practical interest, there is no analytical solution, nor do the circumstances match the design order of accuracy of the method.

One of the answers to the dearth of analytical solutions is the practice of the method of manufactured solutions (MMS). It is a simple idea in concept: an analytical right-hand side is added to the equations to force a solution. The forced solution is known and ideal. Using this practice, the code can be studied. The technique has several practical problems that should be acknowledged. First, the complexity of these right-hand sides is often extreme, and the source term must be added to the code. This makes the code different from the code used to solve practical problems in this key way. Secondly, the MMS problems are wildly unrealistic. Generally speaking, the solutions with MMS are dramatically unlike any realistic solution.
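For concreteness, here is a tiny sketch of the MMS recipe for a toy one-dimensional advection-diffusion equation using sympy (my own example, not any production workflow): pick a manufactured solution, push it through the operator symbolically, and the residual is the source term that gets added to the code.

import sympy as sp

# Sketch of the method of manufactured solutions (MMS) for a toy 1D
# advection-diffusion equation  u_t + a*u_x - nu*u_xx = S(x, t).
# Pick a manufactured solution, apply the operator symbolically, and the
# result is the source term S that must be added to the code being verified.

x, t = sp.symbols("x t")
a, nu = sp.symbols("a nu", positive=True)

u_manufactured = sp.sin(2 * sp.pi * x) * sp.exp(-t)      # chosen, not physical

S = sp.diff(u_manufactured, t) + a * sp.diff(u_manufactured, x) \
    - nu * sp.diff(u_manufactured, x, 2)

print("source term S(x, t) =", sp.simplify(S))
# The code is then run with S as a forcing term and its numerical solution is
# compared against u_manufactured on a sequence of meshes.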

MMS simply expands the distance between code verification and the code’s actual use. All this does is amplify the degree to which code users disdain code verification. The whole practice is almost constructed to destroy the importance of code verification. It’s also pretty much dull as dirt. Unless you just love math (some of us do), MMS isn’t exciting. We need to move toward practices people give a shit about. I’m going to start by naming a few.

“If you thought that science was certain – well, that is just an error on your part.” ― Richard P. Feynman

It Can Measure Accuracy

“I learned very early the difference between knowing the name of something and knowing something.” ― Richard P. Feynman

I’ve already taken a stab at this topic, noting that code verification needs a refresh:

https://williamjrider.wordpress.com/2024/09/21/code-verification-needs-a-refresh/

Here, I will just recap the first and most obvious overlooked benefit: measuring the meaningful accuracy of codes. Code verification’s standard definition concentrates on order of accuracy as the key metric. Practical solutions with a code rarely achieve the design order of accuracy. This further undermines code verification as significant. Most practical solutions give first-order accuracy (or lower). The second metric from verification is error, and with analytical solutions, you can get precise errors from the code. The next thing to focus on is the efficiency of a code, which connects directly.

A practical measure of code efficiency is accuracy per unit effort. Both of these can be measured with code verification. One can get the precise errors by solving a problem with an analytical solution. By simultaneously measuring the cost of the solution, the efficiency can be assessed. For practical use, this measurement means far more than finding bugs via standard code verification. Users simply assume codes are bug-free and discount the importance of this. They don’t actually care much because they can’t see it. Yes, this is dysfunctional, but it is the objective reality.
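The bookkeeping is simple. Here is a sketch with placeholder numbers standing in for measured (error, cost) pairs from verification runs of two hypothetical code options; the values are illustrative only.

# Sketch: efficiency as accuracy per unit effort. The entries below are
# placeholders standing in for measured (error vs. exact solution, wall-clock
# cost) pairs from verification runs of two hypothetical code options.

runs = {
    "option A": [(0.040, 1.0), (0.021, 7.9), (0.011, 63.0)],   # (error, cost)
    "option B": [(0.015, 4.1), (0.008, 33.0), (0.004, 260.0)],
}

target_error = 0.011
for name, data in runs.items():
    # cheapest recorded run that meets the target error (crude, sketch-level)
    feasible = [cost for err, cost in data if err <= target_error]
    print(name, "cost to reach error <= %.3f:" % target_error,
          min(feasible) if feasible else "not reached in these runs")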

The measurement and study of code accuracy is the most straightforward extension of the nominally dull-as-dirt practice. There’s so much more, as we shall see.

It Can Test Symmetries

“Symmetry is what we see at a glance; based on the fact that there is no reason for any difference…” ― Blaise Pascal

One of the most important aspects of many physical laws is symmetry. Symmetries are often preserved exactly by the ideal versions of these laws (like the differential equations the codes solve). Many of these symmetries are preserved only inexactly by the methods in codes. Some of these symmetries are simple, like the preservation of geometric symmetry, such as cylindrical or spherical flows. This can give rise to simple measures that accompany classical analytical solutions. The symmetry measure can augment the standard verification approach with additional value. In some applications, the symmetry is of extreme importance.

There are many more problems that can be examined for symmetry without having an analytical solution. One can create all sorts of problems with symmetries built into the solution. A good example of this is a Rayleigh-Taylor instability problem with a symmetry plane where left-right symmetry is desired. The solution can be examined as a function of time. This is an instability, and the challenge to symmetry grows over time. As the problem evolves forward the lack of symmetry becomes more difficult to control. It makes the test extreme if run for very long times. Symmetry problems also tend to grow as the mesh is refined.
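The measurement itself is trivial. Here is a generic sketch (my own, with a toy field) where the asymmetry norm is just the difference between the field and its mirror image about the symmetry plane.

import numpy as np

# Sketch: quantify left-right asymmetry of a 2D field about a vertical mirror
# plane. For an exactly symmetric solution this norm is zero; in practice it is
# tracked over time and across mesh refinements.

def asymmetry_norm(field):
    mirrored = field[:, ::-1]                 # reflect about the mid-plane
    return np.abs(field - mirrored).max()

# Toy field: symmetric part plus a small deliberate asymmetry for illustration.
ny, nx = 64, 64
x = np.linspace(-1.0, 1.0, nx)
y = np.linspace(0.0, 1.0, ny)
X, Y = np.meshgrid(x, y)
field = np.cos(np.pi * X) * Y + 1e-6 * X      # the 1e-6*X term breaks symmetry
print("max asymmetry:", asymmetry_norm(field))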

The Rayleigh-Taylor problem is one I used over 30 years ago to learn how to preserve symmetry in an incompressible variable-density code. I found the symmetry could be threatened by two main parts of the code: the details of upwinding in the discretization, and the numerical linear algebra. I found that the pressure solve needed to be symmetric as well. I had to modify each part of the algorithm to get my desired result. The upwinding had to be changed to avoid any asymmetry with respect to the sign of the upwind direction. This sort of testing and improvement is the hallmark of high-quality codes and algorithms. Too little of this sort of work is taking place today.

It Can Find “Features”

“A clever person solves a problem. A wise person avoids it.” ― Albert Einstein

The usual mantra for code verification is that a lack of convergence means a bug. This is not true; it is a very naive and limiting perspective. For codes that compute solutions with shock waves (and weak solutions generally), correct solutions require conservation and entropy conditions. Methods and codes do not always adhere to these conditions. In those cases, a “perfect,” bug-free code will produce incorrect solutions. It will converge, just to the wrong solution. The wrong solution is a feature of the method and code. These wrong solutions are revealed easily by extreme problems with very strong shocks.

These features are easily fixed by using different methods. The problem is that the codes with these features are the product of decades of investment and reflect deeply held cultural norms. Respect for verification is acutely not one of those norms. My experience is that users of these codes make all sorts of excuses for the feature. Mostly, this sounds like a systematic devaluing of the verification work and excuses for ignoring the problem. Usually, they start talking about how important the practical work the code does is. They fail to see how damning the failure to solve these problems is. Frankly, it is a pathetic and unethical stand. I’ve seen this over and over at multiple Labs.

Before I leave this topic, I will get to another example of a code feature. This one has a similarity to the symmetry examination. Shock codes can often have a shock instability with a funky name: the carbuncle phenomenon. This is where you have a shock aligned with the grid, and the shock becomes non-aligned and unstable. This feature is a direct result of properly implemented methods. It is subtle and difficult to detect. For a large class of problems, it is a fatal flaw. Fixing the problem requires some relatively simple but detailed changes to the code. It also shows up in strong shock problems like Noh and Sedov. At the symmetry axes, the shocks can lose stability and show anomalous jetting.

This gets to the last category of code verification benefits, determining a code’s limits and a research agenda.

“What I cannot create, I do not understand.” ― Richard P. Feynman

It Can Find Your Limits and Define Research

If you are doing code verification correctly, the results will show you a couple of key things: what the limits of the code are, and where research is needed. My philosophy of code verification is to beat the shit out of a code. Find problems that break the code. The better the code is, the harder it is to break. The way you break the code is to define harder problems with more extreme conditions. One needs to do research to get to the correct convergent solutions.

Where the code breaks is a great place to focus research. Moving the horizons of capability outward can define an excellent and useful research agenda. In a broad sense, the identification of negative features is a good practice (the previous section). Another example of this is extreme expansion waves approaching vacuum conditions. In the past, I have found that most standard shock methods cannot solve this problem well. Solutions are either non-convergent or so poorly convergent as to render the code useless.

This problem is not altogether surprising given the emphasis on methods. Computing shock waves has been the priority for decades. When a method cannot compute a shock properly, the issues are more obvious. There is a clear lack of convergence in some cases or catastrophic instability. Expansion waves are smooth, offering less challenge, but they are also dissipation-free and nonlinear. Methods focused on shocks shouldn’t necessarily solve them well (and don’t when they’re strong enough).

I’ll close with another related challenge. The use of methods that are not conservative is driven by the desire to compute adiabatic flows. For some endeavors like fusion, adiabatic mechanisms are essential. Conservative methods necessary for shocks often (or generally) cannot compute adiabatic flows well. A good research agenda might be finding methods that achieve conservation, preserve adiabatic flows, and capture strong shock waves. A wide range of challenging verification test problems is absolutely essential for success.

“The first principle is that you must not fool yourself and you are the easiest person to fool.” ― Richard P. Feynman

Does Uncertainty Quantification Replace V&V?

19 Monday May 2025

Posted by Bill Rider in Uncategorized

≈ Leave a comment

tl;dr

Uncertainty quantification (UQ) is ascending while verification & validation (V&V) is declining. UQ is largely done in silico and offers results that trivially harness modern parallel computing. UQ thus parallels the appeal of AI and easy computational results. It is also easily untethered from objective reality. V&V is a deeply technical and naturally critical practice. Verification is extremely technical. Validation is extremely difficult and time-consuming. Furthermore, V&V can be deeply procedural and regulatory. UQ has few of these difficulties although its techniques are quite technical. Unfortunately, UQ without V&V is not viable. In fact, UQ without the grounding of V&V is tailor-made for bullshit and hallucinations. The community needs to take a different path that mixes UQ with V&V, or disaster awaits.

“Doubt is an uncomfortable condition, but certainty is a ridiculous one.” ― Voltaire

UQ is Hot

If one looks at the topics of V&V and UQ, it is easy to see that UQ is a hot topic. It is in vogue and garners great attention and support. Conversely, V&V is not. Before I give my sharp critique of UQ, I need to make something clear. UQ is extremely important and valuable. It is necessary. We need better techniques, codes, and methodologies to produce estimates of uncertainty. The study of uncertainty is forcing us to grapple with a key question: how much don't we know? This is difficult and uncomfortable work. Our emphasis on UQ is welcome. That said, this emphasis needs to be grounded in reality. I've spoken in the past about the danger of ignoring uncertainty. When an uncertainty is ignored, it gets the default value of ZERO. In other words, ignored uncertainties are assigned the smallest possible value.

“The quest for certainty blocks the search for meaning. Uncertainty is the very condition to impel man to unfold his powers. ” ― Erich Fromm

As my last post discussed, the focus needs to be rational and balanced. When I observe the current conduct of UQ research, I see neither quality. UQ takes on the mantle of the silver bullet. It is subject to the fallacy of the "free lunch" solution. It seems easy and tailor-made for our currently computationally rich environment. It produces copious results without many of the complications of V&V. The practice of V&V is doubt based on deep technical analysis. It is uncomfortable and asks hard questions, often questions that don't have easy or available answers. UQ just gives answers with ease, and it's semi-automatic, in silico. You just need lots of computing power.

“I would rather have questions that can’t be answered than answers that can’t be questioned.” ― Richard Feynman

This ease gets to the heart of the problem. UQ needs validation to connect it to objective reality. UQ needs verification to make sure the code is faithfully correct to the underlying math. Neither practice is so well established that it can be ignored. Yet, increasingly, they are being ignored. Experimental results are needed to challenge our surety. UQ has a natural appetite for computing, so our exascale computers lap it up. It is a natural way to create vast amounts of data. UQ attaches statistics naturally to modeling & simulation, filling a long-standing gap. Machine learning connects naturally as well, being the algorithmic extension of statistics. The mindset parallels the recent euphoria for AI.

For these reasons, UQ is a hot topic attracting funding and attention today. Being in silico, UQ becomes an easy way to get V&V-like results from ML/AI. What is missing? For the most part, UQ is done assuming V&V is done. For AI/ML this is a truly bad assumption. If you're working in simulation, you know that assumption is suspect. The basics of V&V are largely complete, but its practice is generally haphazard and poor. My observation is that computational work has regressed concerning V&V. Advances made in publishing and research standards have gone backward in recent years. Rather than V&V being completed and simply applied, it is despised.

All of this adds up to a distinct danger for UQ. Without V&V, UQ is simply an invitation for ModSim hallucinations akin to the problem AI has. Worse yet, it is an invitation to bullshit the consumers of simulation results. Answers can be given knowing they have a tenuous connection to reality. It is a recipe to fool ourselves with false confidence.

“The first principle is that you must not fool yourself and you are the easiest person to fool.” ― Richard P. Feynman

AI/ML Feels Like UQ, and Surrogates Are Dangerous

One of the big takeaways from the problems with UQ is the appeal of its in-silico nature. This paves the way to easy results. Once you have a model and a working simulation, UQ is like falling off a log. You just need lots of computing power and patience. Yes, it can be improved and made more accurate and efficient. Nonetheless, UQ asks no real questions about the results. Turn the crank and the results fall out (unless it triggers a problem with the code). You can easily get results, although doing this efficiently is a research topic. Nonetheless, being in silico removes most of the barriers. Better yet, it uses the absolute fuck out of supercomputers. You can fill a machine right up with calculations. You get results galore.
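
To see why it feels this easy, here is the whole turn-the-crank loop in a few lines: sample the uncertain inputs, push every sample through the model, and summarize the outputs. The run_model function and the input distributions below are stand-ins for illustration, not any particular code:

    import numpy as np

    rng = np.random.default_rng(1)

    def run_model(density, velocity):
        """Stand-in for an expensive simulation; returns a scalar quantity of interest."""
        return 0.5 * density * velocity**2          # toy "dynamic pressure"

    # Sample the uncertain inputs (assumed distributions, for illustration only).
    n = 10_000
    density  = rng.normal(1.2, 0.05, n)             # mean 1.2, std 0.05
    velocity = rng.uniform(90.0, 110.0, n)

    # Turn the crank: propagate every sample through the model.
    qoi = np.array([run_model(d, v) for d, v in zip(density, velocity)])

    # Summarize the output uncertainty.
    print(f"mean = {qoi.mean():.1f}, std = {qoi.std():.1f}, "
          f"95% interval = [{np.percentile(qoi, 2.5):.1f}, {np.percentile(qoi, 97.5):.1f}]")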

If you are paying attention to the computational landscape, you should be experiencing deja vu. AI is some exciting in-silico shit that uses the fuck out of the hardware. Better yet, UQ is an exciting thing to do with AI (or machine learning really). Even better, you can use AI/ML to make UQ more efficient. We can use computational models and results to train ML models that can cheaply evaluate uncertainty. All those super-fast computers can generate a shitload of data. These are called surrogates, and they are all the rage. Now you don't have to run the expensive model anymore; you just train the surrogate and evaluate the fuck out of it on the cheap. Generally, machine learning is poor at extrapolating, and in high dimensions (UQ is very high dimensional) you are always extrapolating. You had better understand what you're doing, and machine learning isn't well understood.
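
The extrapolation danger is easy to demonstrate with a toy: fit a surrogate to samples from a limited input range, then evaluate it outside that range. A sketch using a polynomial fit as the stand-in surrogate (real surrogates are fancier, but the failure mode is the same); the truth function is invented for illustration:

    import numpy as np

    rng = np.random.default_rng(2)

    def truth(x):
        """Stand-in for the expensive model."""
        return np.exp(-x) * np.sin(3.0 * x)

    # Train the surrogate only on x in [0, 1].
    x_train = rng.uniform(0.0, 1.0, 40)
    y_train = truth(x_train)
    surrogate = np.polynomial.Polynomial.fit(x_train, y_train, deg=5)

    # Interpolation looks great; extrapolation quietly falls apart.
    for x in (0.5, 1.0, 1.5, 2.0):
        print(f"x={x:3.1f}  truth={truth(x):+.3f}  surrogate={float(surrogate(x)):+.3f}")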

What could possibly go wrong?

If the model you trained the surrogate on has weak V&V, a lot can go wrong. You are basically evaluating bullshit squared. Validation is essential to establishing how good a model is. You should produce a model form error estimate that expresses how well the computational model works. The model also has numerical errors due to the finite representation of the computation. I can honestly say that I've never seen either of these fundamental errors associated with a surrogate. Nonetheless, surrogates are being developed to power UQ all over the place. It's not a bad idea, but these basic V&V steps should be an intrinsic part of it. To me, the state of affairs says more about the rot at the heart of the field. We have lost the ability to seriously question computed results. V&V is a vehicle for asking these questions. These questions are too uncomfortable to be confronted.

V&V is Hard; Too Hard?

I've worked at two NNSA Labs (Los Alamos and Sandia) in the NNSA V&V program. So I know where the bodies are buried. I've been part of some of the achievements of the program and some of its failures. I was at Los Alamos when V&V arrived. It was like pulling teeth to make progress. I still remember the original response to validation as a focus: "Every calculation a designer does is all the validation we need!" The Los Alamos weapons designers wanted hegemonic control over any assessment of simulation quality. Code developers, computer scientists, and any non-designer were deemed incompetent to assess quality. To say it was an uphill battle is an understatement. Nonetheless, progress was made, albeit modestly.

V&V fared better at Sandia. In many ways, the original composition of the program had its intellectual base at Sandia. This explained a lot of the foundational resistance from the physics labs. It also gets at much of the problem with V&V today. Its focus on the credibility of simulations makes it very process-oriented and regulatory. As such, it is eye-rollingly boring and generates hate. This character was the focus of the opposition at Los Alamos (Livermore too). V&V has too much of an "I told you so" vibe. No one likes this, and V&V starts to get ignored because it just delivers bad news. Put differently, V&V asks lots of questions but generates few answers.

Since budgets are tight and experiments are scarce, the problems grow. We start to demand that calculations have close agreement with available data. Early predictions typically don't meet standards. By and large, simulations have a lot of numerical error even on the fastest computers. The cure for this is an ex post facto calibration of the model to match experiments better. The problem is that this short-circuits validation. Basically, there is little or no actual validation. Almost everything is calibrated unless the simulation is extremely easy. The model has a fixed grid, so there's no verification either. Verification is bad news without a viable current solution. What you can do with such a model is lots of UQ. Thus UQ becomes the output of the entire V&V program.

To really see this clearly we need to look West to Livermore.

What does UQ mean without V&V?

I will say up front that I’m going to give Livermore’s V&V program a hard time, but first, they need big kudos. The practice of computational physics and science at Livermore is truly first-rate. They have eclipsed both Sandia and Los Alamos in these areas. They are exceptional code developers and use modern supercomputers with immense skill. They are a juggernaut in computing.

By almost any objective measure, Livermore's V&V program produced the most important product of the entire ASC program, common models. Livermore has a track record of telling Washington one thing and doing something a bit different. Even better, this something different is something better. Not just a little better, but a lot better. Common models are the archetypal example of this. Back in the 1990s, there was a metrics project looking at validating codes. Not a terrible idea at all. In a real sense, Livermore did do the metrics in the end but took a different, smarter path to them.

What Livermore scientists created instead was a library of common models that combined experimental data, computational models, and auxiliary experiments. The data was accumulated across many different experiments and connected into a self-consistent set of models. It is an incredible product. It has been repeated at Livermore and then at Los Alamos too across the application space. I will note that Sandia hasn’t created this, but that’s another story of differences in Lab culture. These suites of common models are utterly transformative to the program. It is a massive achievement. Good thing too because the rest of V&V there is far less stellar.

What Livermore has created instead is lots of UQ tools and practice. The common models are great for UQ too. One of the first things you notice about Livermore’s V&V is the lack of V&V. Verification leads the way in being ignored. The reasons are subtle and cultural. Key to this is an important observation about Livermore’s identity as the fusion lab. Achieving fusion is the core cultural imperative. Recently, Livermore has achieved a breakthrough at the National Ignition Facility (NIF). This was “breakeven” in terms of energy. They got more fusion energy out than laser energy in for some NIF experiments (all depends on where you draw the control volume!).

The NIF program also has the archetypal example of UQ gone wrong. Early in the NIF program, there was a study of fusion capsule design and results. They looked at a large span of uncertainties in the modeling of NIF and NIF capsules. It was an impressive display of the UQ tools developed by Livermore, their codes, and computers. At the end of the study, they created an immensely detailed and carefully studied prediction of outcomes for the upcoming experiments. This was presented as a probability distribution function of capsule yield. It covered an order of magnitude from 900 kJ to 9 MJ of yield. When they started to conduct experiments, the results were not even in the predicted range, missing it by about a factor of three on the low side. The problems and dangers of UQ were laid bare.

“The mistake is thinking that there can be an antidote to the uncertainty.” ― David Levithan

If you want to achieve fusion, the key is really hot and dense matter. The way to get this hot and dense matter is to adiabatically compress the living fuck out of material. To do this, there is a belief that Lagrangian hydrodynamics with the numerical viscosity turned off in adiabatic regions gives good results. The methods they use are classical and oppositional to modern shock-capturing methods. Computing shocks properly needs dissipation and conservation. The ugly reality is that hydrodynamic mixing (excessively non-adiabatic, dissipative, and ubiquitous) is anathema to Lagrangian methods. Codes need to leave the Lagrangian frame of reference and remap. Conserving energy is one considerable difficulty.

Thus, the methods favored at Livermore cannot successfully pass verification tests for strong shocks, and so verification results are not shown. For simple verification problems there is a technique to get good answers. The problem is that the technique doesn't work on applied problems. Good verification results for shocks therefore don't apply to the key cases where the codes are used. They know the results will be bad because they are following the fusion mantra. Failure to recognize the consequences of bad code verification results is magical thinking. Nothing induces magical thinking like a cultural perspective that values one thing over all others. This is a form of the extremism discussed in the last blog post.

There are two unfortunate side effects. The first obvious one is the failure to pursue numerical methods that simultaneously preserve adiabats and compute shocks correctly. This is a serious challenge for computational physics. It should be vigorously pursued and developed by the program. It also represents a question asked by verification without an easy answer. The second side effect is a complete commitment to UQ as the vehicle for V&V where answers are given and questions aren’t asked. At least not really hard questions.

UQ is what remains, and it becomes the focus. It is much better for producing results and giving answers. If those answers don't need to be correct, we have an easy "success."

“It doesn’t matter how beautiful your theory is, it doesn’t matter how smart you are. If it doesn’t agree with experiment, it’s wrong.” ― Richard P. Feynman

A Better Path Forward

Let's be crystal clear about the UQ work at Livermore: it is very good. Their tools are incredible. In a purely in silico way, the work is absolutely world-class. The problems with the UQ results are all related to gaps in verification and validation. The numerical results are suspect and generally under-resolved. The validation of the models is lacking. This gap stems chiefly from the lack of acknowledgment of calibration. Calibration of results is essential for useful simulations of challenging systems. NIF capsules are one obvious example. Global climate models are another. We need to choose a better way to focus our work and use UQ properly.

“Science is what we have learned about how to keep from fooling ourselves.” ― Richard Feynman

My first key recommendation is to ground validation with calibrated models. There needs to be a clear separation of what is validated and what is calibrated. One of the big parts of the calibration is the finite mesh resolution of the models. Thus the calibrations mix both model form error and numerical error. All of this needs to be sorted out and clarified. In many cases, these are the dominant uncertainties in simulations. They swallow and neutralize the UQ we spend our attention on. This is the most difficult problem we are NOT solving today. It is one of the questions raised by V&V that we need to answer.
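
One concrete way to start the sorting is to estimate the numerical error by Richardson extrapolation on a grid sequence and treat what remains, relative to the measurement, as apparent model form error (plus experimental uncertainty). A sketch with made-up numbers, not drawn from any real code or test:

    import numpy as np

    def richardson(f_coarse, f_fine, r, p):
        """Richardson-extrapolated value and numerical error estimate on the fine grid.

        r : grid refinement ratio, p : observed order of accuracy
        """
        f_exact_est = f_fine + (f_fine - f_coarse) / (r**p - 1.0)
        return f_exact_est, abs(f_exact_est - f_fine)

    # Illustrative numbers only: a quantity of interest on two grids,
    # a measured value, and an observed order near one (typical in practice).
    f_coarse, f_fine, r, p = 9.10, 9.40, 2.0, 1.0
    measurement = 10.2

    f_est, num_err = richardson(f_coarse, f_fine, r, p)
    model_form_err = measurement - f_est     # what calibration would otherwise absorb

    print(f"grid-converged estimate = {f_est:.2f}, numerical error = {num_err:.2f}")
    print(f"apparent model form error = {model_form_err:.2f}")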

"Maturity, one discovers, has everything to do with the acceptance of 'not knowing.'" ― Mark Z. Danielewski

The practice of verification is in crisis. The lack of meaningful estimates of numerical error in our most important calculations is appalling. Code verification has become a niche activity without any priority. Code verification for finding bugs is about as important as taking your receipt with you from the grocery store. Nice to have, but it's very rarely checked by anybody. When it is checked, it's probably because you're doing something a little sketchy. Code verification is key to code and model quality. It needs to expand in scope and utility. It is also the recipe for improved codes. Our codes and numerical methods need to progress; they must continue to get better. The challenge of correctly computing strong shocks and adiabats is but one. There are many others that matter. We are still far from meeting our modeling and simulation needs.

“What I cannot create, I do not understand.” ― Richard P. Feynman

Ultimately, we need to recognize how V&V is a partner to science. It asks key questions of our computational science. The goal is then to meet these questions with a genuine search for answers. V&V also provides evidence of how well each question is answered. This question-and-answer cycle is how science must work. Without the cycle, progress stalls, and the hopes of the future are put at risk.

“If you thought that science was certain – well, that is just an error on your part.” ― Richard P. Feynman

In a Time of Extremes, Balance is the Answer

17 Saturday May 2025

Posted by Bill Rider in Uncategorized

≈ Leave a comment

tl;dr

Today's world is ruled by extreme views and movements. These extremes are often a direct reaction to progress and change. People often prefer order and a well-defined direction. The current conservative backlash is a reaction. Much of it is based on discomfort with social progress and economic disorder. In almost all things, balance and moderation are a better path. The middle way leads to progress that sticks. Slow and deliberate change is accepted, while fast and precipitous change invites opposition. I will discuss how this plays out in both politics and science. In science, many elements contribute to the success of any field. These elements need to be present for success. Without that balance, failure is the ultimate end. Peter Lax's work was an example of this balance.

“All extremes of feeling are allied with madness.” ― Virginia Woolf

The Extremes Rule Today

It is not controversial to say that today's world is dominated by extremism. What is a bit different is to point to the world of science for the same trend. The political world's trends are illustrated vividly on screens across the world. We see the damage these extremes are doing in the USA, Russia, Israel-Gaza, and everywhere online. One extreme will likely breed an equal and opposite reaction. Examples abound today, such as over-regulation or political correctness in the USA. In each, true excesses are greeted with equal or greater excesses reversing them.

Science is not immune to these trends and excesses. I will detail below a couple of examples where no balance exists. One is the exascale computing program, which focuses on computational hardware. A second example is the support for AI. It is similarly hardware-focused. In both cases, we fail to recognize how the original success was catalyzed. The impacts, the need, and the route to progress are not seen clearly. We have programs that can only see computing hardware and fail to see the depth and nature of software.

In science, software is a concrete instantiation of theory and mathematics. If theory and math are not supported, the software is limited in value. The progress desired by these efforts is effectively short-circuited. In computing, software has the lion's share of the value. Hardware is necessary, but not where the largest advances have occurred. Yet the hardware is the most tangible and visible aspect of technology. In mathematical parlance, the hardware is necessary, but not sufficient. We are advancing science in a clearly insufficient way.

“To put everything in balance is good, to put everything in harmony is better.” ― Victor Hugo

Balance in All Things

Extremism is simple. Reality is complex. Simple answers fall apart upon meeting the real world. This is true for politics and science. When extreme responses to problems are implemented, they invite an opposite extreme reaction. Lasting solutions require a subtle balance of perspectives and sources of progress. The best from each approach is necessary.

Take the issue of over-regulation as an example. Simply removing regulations invites the same forces that created the over-regulation in the first place. A far better approach is to pare back regulation thoughtfully. Careful identification of excess regulation builds support for the project. The same thing applies to freedom of speech and the excesses of "woke" or "cancel culture". Those movements overstepped and created a powerful backlash. They were also responding to genuine societal problems of bigotry and oppressive hierarchy. A complete reversal is wrong and will create horrible excesses inviting a reverse backlash. Again, smaller changes that balance progress with genuine freedom would be lasting.

With those examples in mind, let us turn to science. In the computational world, we have seen a decade of excess. First came the pursuit of exascale high-performance computing. Next, in a starkly similar fashion, artificial intelligence became an obsession. Current efforts to advance AI are focused on computational hardware. Other sources of progress are nearly ignored. In each case, there is serious hype along with an appealing simplicity in the sales pitch. In both cases, the simple approach short-changes progress and hampers broader success and long-term progress.

Let’s turn briefly to what a more balanced approach would look like.

At the core of any discussion of science should be the execution of the scientific method. This has two key parts working in harmony, theory and experiment (observation). Making these two approaches harmonize is the route to progress. Theory is usually expressed in mathematics and is most often solved on computers. If this theory describes what can be measured in the real world, we believe that we understand reality. Better yet, we can predict reality leading to engineering and control. Our technology is a direct result of this and much of our societal prosperity.

With this foundation, we can judge other scientific efforts. Take the recent Exascale program, which focused on creating faster supercomputers. The project focused on computing hardware and computer science while not supporting mathematics and theory vibrantly. This is predicated on the poor assumptions that theory is adequate (it isn't) and that our math is healthy. Both the theory and the math need sustained attention. Examining history shows that math and theory have been at the core of progress in computing. It is worse than this. The exascale focus came as Moore's law ended. This was the exponential growth in computing power that held for almost a half-century starting in 1965. Its end was largely based on encountering physical barriers to progress. The route to increased computing value should shift to theory and math (i.e., algorithms). Yet, the focus was on hardware, trying to breathe life into the dying Moore's law. It is both inefficient and ultimately futile.

Today, we see the same thing happening with AI. The focus is on computing hardware even though the growth in power is incremental. Meanwhile, the massive breakthrough in AI was enabled by algorithms. Limits on trust and correctness of AI are also grounded in the weakness of the underlying math for AI. A more vibrant and successful AI program would reduce hardware focus and increase math and algorithm support. This would serve both progress and societal needs. Yet we see the opposite.

We are failing to learn from our mistakes. The main reason is that we can't call them mistakes. In every case, we are seeing excess breed more excess. Instead, we should find balance and strike a middle way.

“Progress and motion are not synonymous.” ― Tim Fargo

A Couple Examples

The USA today is living through a set of extremes that are closely related. There is a burgeoning authoritarian oligarchy emerging as the central societal power. Much of the blame for this development lies with out-of-control capitalism without legal boundaries. There is virtually limitless wealth flowing to a small number of individuals. With this wealth, political power is amassing. A backlash is inevitable. The worst correction to this capitalist overreach would be a suspension of capitalism. Socialism seems to be the obvious cure. A mistake would be too much socialism. It would throw the baby out with the bathwater. A mixture of capitalism and socialism works best. A subtle balance of the two is needed. The most obvious societal example is health care, where capitalism is a disaster. We have huge costs with worse outcomes.

In science, recent years have seen an overemphasis on computing hardware over all else. My comments apply to exascale and AI almost equally. Granted, there are differences in the computing approach needed for each, but the commonality is obvious. The value of that hardware is bound to software. Software's value is bound to algorithms, which in turn are grounded in mathematics. That math can be discrete, information-related, or a theory of the physical world. This pipeline is the route to computing's ability to transform our world and society. The answer is not to ignore hardware but to moderate it with the other parts of the computing recipe for science. Without that balance, the pipeline empties and becomes stale. That has probably already happened.

As I published this article, news of the death of Peter Lax arrived. I was buoyed by the prominence of his obituary in the New York Times. Peter's work was essential to my career, and it is good to see him appropriately honored. Peter was the epitome of the value created by the balance I'm discussing here. He was the consummate mathematical genius applying his gifts to solve essential problems. While he contributed to applications of math, his work had the elegance and beauty of pure math. He also recognized the essential role of computing and computers. We would be wise to honor his contributions by following his path more closely. I've written about Peter and his work on several occasions here (links below).

"Keep in mind that there is in truth no central core theory of nonlinear partial differential equations, nor can there be. The sources of partial differential equations are so many – physical, probabilistic, geometric etc. – that the subject is a confederation of diverse subareas, each studying different phenomena for different nonlinear partial differential equation by utterly different methods." – Peter Lax

https://williamjrider.wordpress.com/2015/06/25/peter-laxs-philosophy-about-mathematics/
https://williamjrider.wordpress.com/2016/05/20/the-lax-equivalence-theorem-its-importance-and-limitations/
https://williamjrider.wordpress.com/2013/09/19/classic-papers-lax-wendroff-1960/

“To light a candle is to cast a shadow…” ― Ursula K. Le Guin

Epilog: What if Hallucinations are Really Bullshit?

07 Wednesday May 2025

Posted by Bill Rider in Uncategorized

≈ 3 Comments

“It is impossible for someone to lie unless he thinks he knows the truth. Producing bullshit requires no such conviction.” ― Harry G. Frankfurt, On Bullshit

It's really cool when your blog post generates a lot of feedback. It's like when you give a talk and you get lots of questions. It is a sign that people give a fuck. No questions means no engagement and lots of no fucks given.

One friend sent me an article, “ChatGPT is Bullshit.” It was riffing on Harry Frankfurt’s amazing monograph, “On Bullshit.” To put it mildly, bullshit is much worse than a hallucination. Even if that hallucination is produced by drugs. Hallucinations are innocent and morally neutral. Bullshit is unethical. The paper makes the case that LLMs are bullshitting us, not offering some innocent hallucinations. We should apply the same standard to computational modeling of the classical sort.

This is, of course, not a case of anthropomorphizing the LLM. The people responsible for designing LLMs want them to provide answers. Providing answers makes the users of LLMs happy. Happy users use their product. Unhappy ones don't. Bullshit is willful deception. It is deception with a purpose. We should be mindful about willful deception in classical modeling & simulation work. In a subtle way, the absence of due diligence, such as avoiding V&V, treads close to the line. If V&V is done and then silenced and ignored, you have bullshit. I've seen a lot of bullshit in my career. So have you.

Bullshit is a pox. We need to recognize and eliminate bullshit. Bullshit is the enemy of truth. It is vastly worse than hallucinations and demands attention.

"The bullshitter ignores these demands altogether. He does not reject the authority of the truth, as the liar does, and oppose himself to it. He pays no attention to it at all. By virtue of this, bullshit is a greater enemy of the truth than lies are." ― Harry G. Frankfurt, On Bullshit

Hicks, Michael Townsen, James Humphries, and Joe Slater. “ChatGPT is bullshit.” Ethics and Information Technology 26, no. 2 (2024): 1-10.

Does Modeling and Simulation Hallucinate Too?

03 Saturday May 2025

Posted by Bill Rider in Uncategorized

≈ Leave a comment

tl;dr

One of the most damning aspects of the amazing results from LLMs is hallucination. These are garbage answers delivered with complete confidence. Is this purely an artifact of LLMs, or is it more common than believed? I believe it is more common. Classical modeling and simulation using differential equations can deliver confident results without credibility. In the service of prediction this can be common. It is especially true when the models are crude or heavily calibrated and then used to extrapolate away from data. The key to avoiding hallucination is the scientific method. For modeling and simulation this means verification and validation. It requires that care and due diligence be applied to all computed results.

“It’s the stupid questions that have some of the most surprising and interesting answers. Most people never think to ask the stupid questions.” ― Cory Doctorow

What are Hallucinations in LLMs?

In the last few years, one of the most stunning technological breakthroughs has been Large Language Models (LLMs, like ChatGPT, Claude, Gemini …). This breakthrough has spurred visions of achieving artificial general intelligence soon. The answers to queries are generally complete and amazing. In many contexts we see LLMs replacing search as a means of information gathering. It is clearly one of the most important technologies for the future. There is a general view of LLMs broadly driving the economy of our future. There is a blemish on this forecast, hallucinations! Some of these completely confident answers are partial or complete bullshit.

“I believe in everything until it’s disproved. So I believe in fairies, the myths, dragons. It all exists, even if it’s in your mind. Who’s to say that dreams and nightmares aren’t as real as the here and now?” ― John Lennon

An LLM answers questions with unwavering confidence. Most of the time this is well grounded in objective facts. Unfortunately, this confidence persists even when the results are false. Many examples show that LLMs will make up answers that sound great, but are lies. I asked ChatGPT to create a bio for me, and it constructed a great-sounding lie. It had me born in 1956 (actually 1963) with a PhD in Math from UC Berkeley (actually Nuke Engineering, New Mexico). Other times, I've asked it to elaborate on experts in fields I know well. More than half the information is spot on, but a few of the experts are fictional.

“It was all completely serious, all completely hallucinated, all completely happy.” ― Jack Kerouac

The question is what would be better?

In my opinion, the correct response is for the LLM to say, "I don't know," or "That's not something I can answer." To a serious extent this is starting to happen. LLMs will tell you they don't have the ability to answer a question. You also get responses saying a question violates their rules. We see the LLM community responding to this terrible problem. The current state is imperfect, and hallucinations still happen. The community guidelines for LLMs are tantamount to censorship in many cases. That said, they are moving toward dealing with it.

Do classical computational models have the same issue?

Are they dealing with problems as they should?

Yes and no.

“I don’t paint dreams or nightmares, I paint my own reality.” ― Frida Kahlo

What would Hallucination be in Simulation?

Standard computational modeling is thought to be better because it is based on physical principles. We typically solve well-defined and accepted governing equations. These equations are solved in a manner based on well-known mathematical and computer science methods. This is correct, but it is not bulletproof. The reasons are multiple. One of the principal ways problems occur is through the properties of the materials in a problem. A second way is physics left out of the governing equations and supplied separately (often called closure). A third major category is the construction of a problem in terms of initial and boundary conditions, or major assumptions. Numerical solutions can be under-resolved or produce spurious solutions. Mesh resolution can be suspect or inadequate for the numerical solution. The governing equations themselves include assumptions that may not be true or apply to the problem being solved.

“Why should you believe your eyes? You were given eyes to see with, not to believe with. Your eyes can see the mirage, the hallucination as easily as the actual scenery.” ― Ward Moore

The major weakness is the need for closure of the physical models used. This can take the form of constitutive relations for the media in the problem. It also applies to unresolved scales or physics in the problem. Constitutive relations usually abide by well-defined principles, quite often grounded in thermodynamics. They are the product of considering the nature of the material at scales below the simulation's resolution. Almost always these scales are treated through averaged/mean values. Thus, the variability in the true solution is excluded from the problem. Large or divergent physics can emerge if the variability of materials is great at the scale of the simulation. Simple logic dictates that this variability grows larger as the simulation's resolution scale becomes smaller.

A second, connected piece of this problem is subscale physics that is not resolved, but dynamic. Turbulence modeling is the classical version of this. These models have significantly limited applicability and great shortcomings. This gets to the first category of assumption that needs to be taken into account: is the model being used in a manner appropriate for it? Models also interact heavily with the numerical solution. The numerical effects and errors can often mimic the model's physical effects. Numerical dissipation is the most common version of this: turbulence is a nonlinear dissipative process, and numerical diffusion is often essential for stability. Surrounding all of this is the necessity of identifying the unresolved physics to begin with.

Problems are defined by analysts in a truncated version of the universe. The interaction of a problem with that universe is defined by boundary conditions. The analyst also defines a starting point for a problem if it involves time evolution. Usually the state a problem starts from is a simple, quiescent version of reality. Typically it is far more homogeneous and simple than reality is. The same goes for the boundary conditions. Each of these decisions influences the subsequent solution. In general, these selections make the problem less dynamically rich than reality.

Finally, we can choose the wrong governing equations based on assumptions about a problem. This can include choosing equations that leave out major physical effects. A compressible flow problem cannot be described by incompressible equations. Including radiation or multi-material effects is greatly complicating. Radiation transport has a hierarchy of equations ranging from diffusion to full transport. Each level of approximation involves vast assumptions and loss of fidelity. The more complete the equations are physically, the more expensive they are. There is great economy in choosing the proper level of modeling. The wrong choice can produce results that are not meaningful for a problem.

Classical simulations are distinguished by the use of numerical methods to solve equations. This produces solutions to these equations far beyond where analytical solutions exist. These numerical methods are grounded in powerful, proven mathematics. Of course, the proven, powerful math needs to be used and listened to. When it isn't, the solutions are suspect. Too often corners are cut and the theory is not applied. This can result in poorly resolved or spurious solutions. Marginal stability can threaten solutions. Numerical solutions can be non-converged or poorly resolved. Some of the biggest issues numerically are seen with solutions labeled as direct numerical simulation (DNS). Often the declaration of DNS means all doubt and scrutiny is short-circuited. This is dangerous because DNS is often treated as a substitute for experiments.

These five categories of modeling problems should convince the reader that mistakes are likely. These errors can be large and create extremely unphysical results. If the limitations or wrong assumptions are not acknowledged or known, the solutions might fairly be viewed as hallucinations. The solution may be presented with extreme confidence. It may seem impressive or use vast amounts of computing power. It may be presented as being predictive, an oracle. This may be far from the truth.

The conclusion is that classical modeling can definitely hallucinate too! Holy shit! Houston, we have a problem!

“Software doesn’t eat the world, it enshittifies it” – Cory Doctorow

How to Detect Hallucinations?

To find hallucinations, you need to look for them. There needs to be some active doubt applied to results given by the computer. Period.

The greatest risk is for computational results to be believed a priori. Doubt is a good thing. Much of the problem with hallucinating LLMs is the desire to give the user an answer no matter what. In the process of always giving an answer, bad answers are inevitable. Rather than reply that the answer can't be given or is unreliable, the answer is given with confidence. In response to this problem, LLMs have started to respond with caution.

“Would you mind repeating that? I’m afraid I might have lost my wits altogether and just hallucinated what I’ve longed to hear.” ― Jeaniene Frost

The same thing happens with classical simulations. In my experience, the users of our codes always want to get an answer. If the codes succeed at this, some of the answers will be wrong. Solutions do not include any warnings or caveats. There are two routes to fixing this. Technically, the harder route is for the code itself to give warnings and caveats when used improperly. This would require the code's authors to understand its limits. The other route is V&V. This requires the solutions to be examined carefully for credibility using standard techniques. Upon reflection, both routes go through the same steps, applied systematically to the code's solutions. The caveats are simply applied up front if the knowledge of limits is extensive. This can only be achieved through extensive V&V.

Some of these problems are inescapable. Still, there is a way to minimize these hallucinations systematically. In a nutshell, the scientific method offers the path. Again, verification and validation is the scientific method for computational simulation. It offers specific techniques and steps to guard against the problems outlined above. These steps offer an examination of the elements going into the solution. The suitability of the numerical solutions and of the models themselves is examined critically. Detailed comparisons of solutions are made to experimental results. We see how well models produce solutions that reflect reality. Of course, this is expensive and time-consuming. It is much easier to just accept solutions and confidently forge ahead. This is also horribly irresponsible.
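
At the validation end, the examination can start as simply as asking whether the simulation-to-experiment difference is explained by the stated uncertainties. A minimal sketch of that check with placeholder values; real validation metrics are more sophisticated, but this is the basic question:

    import numpy as np

    def validation_check(sim, sim_unc, exp, exp_unc):
        """Compare a simulated and measured quantity of interest.

        Returns the discrepancy and the combined (root-sum-square) uncertainty;
        a discrepancy well beyond the combined uncertainty is a red flag.
        """
        discrepancy = abs(sim - exp)
        combined = np.hypot(sim_unc, exp_unc)
        return discrepancy, combined

    # Placeholder values: simulated and measured peak pressure with uncertainties.
    d, u = validation_check(sim=41.0, sim_unc=2.5, exp=47.2, exp_unc=1.5)
    print(f"|sim - exp| = {d:.1f}, combined uncertainty = {u:.1f}, "
          f"ratio = {d/u:.1f} -> {'suspect' if d > 2*u else 'consistent'}")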

“In the land of the blind, the one-eyed man is a hallucinating idiot…for he sees what no one else does: things that, to everyone else, are not there.” ― Marshall McLuhan

How to Prevent Hallucinations?

To a great extent, the problem of hallucinations cannot be prevented. The question is whether the users of software are subjected to it without warning. An alternative is to present users with an explicit note of caution alongside the results. This points to the need for caution to be exercised for all computational results. This is true for classical simulations and LLM results. All results should be treated with doubt and scrutiny.

V&V is the route to this scrutiny. For classical simulation, V&V is a well-developed field, but often not practiced. For LLMs (AI/ML), V&V is nascent and growing, but immature. In both cases, V&V is difficult and time-consuming. The biggest impediment to V&V is a lack of willingness to do it. The users of all computational work would rather just blindly accept results than apply due diligence.

For the users, it is about motivation and culture. Is the pressure on getting answers and moving to the next problem, or on getting the right answer? My observation is that horizons are short-term and little energy goes into getting the right answer. With fewer experiments and tests to examine the solutions, the answer isn't even checked. Where I've worked, this is about nuclear weapons. You would think that due diligence would be a priority. Sadly it isn't. I'm afraid we are fucked. How fucked? Time will tell.

“We’re all living through the enshittocene, a great enshittening, in which the services that matter to us, that we rely on, are turning into giant piles of shit.” – Cory Doctorow

V&V: Too Much Process And Not Enough Science

21 Monday Apr 2025

Posted by Bill Rider in Uncategorized

≈ Leave a comment

tl;dr

The narrative of verification and validation (V&V) is mired in process. The process is in the service of V&V as a means of assessment and credibility. This makes V&V as exciting as a trip to the DMV. The V&V community would be far better served by connecting itself to science. Science by its very nature is inspirational, serving as the engine of knowledge and discovery. As I noted before, V&V is simply the scientific method applied to computational science. This model serves far better than processes for engineering assessment.

“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.” ― Albert Einstein

The Symposium

A couple of weeks ago I went to one of my favorite meetings of the year, the ASME VVUQ Symposium. It is an area of knowledge and personal expertise. I've committed and contributed a lot to it. The topics are important and interesting. The conference also seems to be slowly dying. The number of attendees and papers is dropping. This parallels the drop in interest in V&V as a whole. In published research, the V&V content has dropped off as practice gets worse. Editorial standards are dropping. By the same token, V&V content in research proposals is dropping off. Where it is expected, it is treated only as a duty and generally half-hearted.

The health of the conference can be measured by its length. The third day was canceled. Signs are bad.

In many places, it has been displaced by uncertainty quantification (UQ). UQ has the luxury of mostly producing results "in silico", totally artificially. You get results without having to go outside a code. Thus, objective reality can be avoided; the actual universe is a pain in the ass and a harsh mistress. UQ is full of complex mathematics and has inspired great work worldwide. It is an indispensable tool for V&V when used correctly. As computing has become the focus of funding, UQ has become the thing. Part of the reason is the huge appetite for computing UQ provides, along with its avoidance of reality. The impact is the erosion of V&V.

There is a deeper perspective to consider. V&V has moved into practical use as part of applied engineering using simulation. V&V is a process to determine the quality of results. The key to that sentence is the word “process” and process sucks. Process is something to hate. Process is defined when practice is missing. Rather than an engine of progress and improvement, V&V has become an impediment to results. The movement to process is an implicit acknowledgment that practice doesn’t exist.

This is truly sad as V&V is equivalent to the scientific method. It absolutely should be standard practice. We should be working to change this.

“Societies in decline have no use for visionaries.” ― Anais Nin

Process is Death

“A process cannot be understood by stopping it. Understanding must move with the flow of the process, must join it and flow with it.” ― Frank Herbert

Process is something that is getting a lot of attention lately. The explosion of process is being recognized as overhead and an impediment to getting shit done. Process is bureaucracy with its dullness and boredom. Rather than being interesting and exciting, it is dull and formal. The same thing is true in government. Process is in the way of everything. Things are checked and double-checked, then triple-checked. As Tim Trucano said, "V&V takes the fun out of computational simulation." Science is fun, and V&V is all engineering and pocket protectors.

Process, regulation, and procedure are also a reaction to problems. When principles do not guide action, regulations become necessary. The lack of principle is still there, and resistance will occur. We see this all the time across society. We also see process and procedure taking the place of a professional practice. If the professional practice is in place, the process becomes natural. It can also adapt and bend to the circumstance instead of being blindly followed. There is also an exercise of judgment that right-sizes the effort. Sometimes the situation calls for more effort and sometimes less. A professional practice guided by principles can do this. Regulation cannot.

Fortunately, the process overload has been recognized broadly. Recently, Ezra Klein's book, Abundance, has put this into the discourse. It provides examples and evidence of process overreach while suggesting a path to a better future. The example of California's high-speed rail is compelling. There, the regulations and process have led to little progress and huge costs. I've seen the same with nuclear power, where process and regulation have choked off progress. Everything costs much more and takes longer than it should. Rather than produce safe and sensible progress, process is a recipe for no progress. V&V feels like the same thing to many practicing engineers and scientists.

It doesn’t need to be. It shouldn’t be.

V&V needs to be an engine of progress and excellence. The right way to orient V&V is as science. This should be easy because V&V is the scientific method for computational science. Here V&V can find its principles. Verification connects directly to modeling, and validation connects directly to experiments. Modeling that provides good results for experiments is the goal. V&V provides proof of the success (or failure). Going through the steps of V&V can be a practice where practitioners of modeling find confidence. It can be a means to produce evidence for doubters. I will note that the practitioner should be the first doubter and needs to be convinced first.

Science is Inspiration

"We are what we pretend to be, so we must be careful about what we pretend to be." ― Kurt Vonnegut

Science is the place to look for a different path for V&V. The fact is that V&V should be essential for the conduct of science when computing is used. Not as a demand, but because it is simply how the scientific method works. Models are essential for science and drive prediction. These models are then tested and refined by experiments or observations. Models are mathematical and most often applied via numerical methods and codes. This points directly to the practice of verification. Validation is exactly the synthesis of the modeling with comparison to measurements. My proposition is that more V&V should be happening naturally. Science done properly should pave the way.

This is not happening.

If one looks at the scientific literature, V&V practice is receding. Thirty years ago there was a push for V&V in publishing. Editorial standards were introduced to enforce the push. Refereeing was haphazard and uneven, with modest support from editors. The result was a temporary advance in the practice. As time has proceeded, the advance has been blunted, and we've regressed to the former mean.

To some extent, this is an indictment of the literature. There is a gap between current practice and the proper scientific method. Unfortunately, the progress started 30 years ago was not sustained. Part of the issue is genuine animosity toward V&V from many quarters. I attribute much of this animosity to the dull, process aspect of V&V. A worry about V&V as regulation contributed to this pushback. V&V as a process and requirement also challenged the role of editors and referees as the ultimate gatekeepers of science.

“In the long run, we shape our lives, and we shape ourselves. The process never ends until we die. And the choices we make are ultimately our own responsibility.” ― Eleanor Roosevelt

Science has always had an element of magic to it. The ability of models expressed in mathematics to describe the universe is incredible. It does feel almost magical when you first encounter it. Progress is a constant source of wonder. V&V is often a source of doubt. As such it challenges progress and is resisted by many. Instead, V&V should be a source of further focus and inspiration for science. It is an engine for better science and more solid progress. Science should also be the place for V&V to claim its place and legitimacy. V&V provides evidence of where science should focus on progress. Again, this challenges the gatekeepers.

If V&V continues to be a regulatory and bureaucratic process, it will die. It will become part of our modern decline and descent into mediocrity. The path forward for V&V is to be an engine of knowledge and discovery. This focuses on action through principles and the adoption of the practices that science depends upon. Good V&V is good science and could flourish as such.

“Whoever fights monsters should see to it that in the process he does not become a monster. And if you gaze long enough into an abyss, the abyss will gaze back into you.” ― Friedrich Nietzsche

https://williamjrider.wordpress.com/2016/12/22/verification-and-validation-with-uncertainty-quantification-is-the-scientific-method/
https://williamjrider.wordpress.com/2016/10/25/science-is-still-the-same-computation-is-just-a-tool-to-do-it/
