Direct numerical simulation (DNS): it is not what you think it is

“Your life is not a simulation; it’s the real game. Play wisely.”― Richelle E. Goodrich

Direct numerical simulation (DNS) is one of the most powerful uses of our vast computing power. With that power comes great responsibility. That responsibility is currently not being met by the vast majority of practitioners. The common issue is a lack of attention to accuracy. This is basic quality control. Some of what gets called direct numerical simulation is nothing more than marketing for extremely expensive, powerful computers, marketing that exists because we spend so much time and money on them.

Numerical simulation in general is not practiced with the care its promise deserves. That promise is access to vast quantities of precise data that rival experiments in their power to unveil the mysteries of the universe. Much of the problem comes down to verification and validation (V&V). These activities are essential for ensuring the quality of computations. As a rule, DNS is not subjected to high-quality V&V. Instead, practitioners rely on rules of thumb and expansive claims about accuracy. Many of the people who consume DNS results treat the label "DNS" as equivalent to a declaration that the results are exact. This is a patently absurd notion that should be rejected reflexively.

I have written about this before, and I will reiterate some of the main points here. Over the past ten years, I have encountered these practices more frequently, engaged with some of the most prominent practitioners, and gained perspective. It is also worth mapping perspectives on DNS onto the claims now being made about AI. As it turns out, the two subjects are closely connected. The hubris and the sweeping claims surrounding DNS feel like a reflection of the hubris and the sweeping claims about AI.

“The simulation had now become indistinguishable from real life.”― Ernest Cline

Questions about the legitimacy and accuracy of DNS are best framed in two ways. First, do the physical laws being solved to high accuracy actually describe the physical phenomena of interest? Second, does the accuracy of the numerical treatment itself meet requirements? Numerical solutions to the equations of physics, typically partial differential equations, are intrinsically approximate, and those approximations carry errors. In general, both the physical model and the numerical method are assumed to be highly accurate. It is damning that the errors associated with them are rarely, if ever, estimated and reported as part of a DNS study.

A good place to start is the most common and well-known version of DNS: Navier–Stokes fluid turbulence. This is the practice that made DNS famous, and it is often the most well-developed approach. As a result, it also exhibits almost all of the common pathologies. Both the good practices and the pathological ones deserve discussion, because the latter probably require more care than they are usually given. The habits of research communities often run counter to better practice, and they can encourage some of the more egregious examples of overreach and missing quality control.

“The Navier-Stokes equation probably contains all of turbulence.” – Uriel Frisch

This form of DNS begins with the widely accepted contention that the incompressible Navier–Stokes equations contain all of turbulence. Uriel Frisch states this explicitly in his book Turbulence. I think the claim deserves more scrutiny than it gets. For one thing, all of these physical laws are to some degree continuum approximations of behavior that is itself non-continuum in nature. The deeper problem is that incompressible flows do not exist in nature. There is no such thing as an incompressible flow. This is easy to see: an incompressible flow implies an infinite sound speed, or, as a friend from Los Alamos used to quip, superluminal sound waves (sound traveling faster than light). What incompressibility really does is eject thermodynamics from the system of equations in any meaningful sense. Given that fluid turbulence remains a mystery, throwing thermodynamics out of the equations seems more than a little foolish.

“The law that entropy always increases holds, I think, the supreme position among the laws of Nature. If someone points out to you that your pet theory of the universe is in disagreement with Maxwell’s equations – then so much the worse for Maxwell’s equations. If it is found to be contradicted by observation – well, these experimentalists do bungle things sometimes. But if your theory is found to be against the Second Law of Thermodynamics I can give you no hope; there is nothing for it but to collapse in deepest humiliation.” ― Arthur Eddington

The folly runs deeper when you consider some of the best-known facts about turbulence. The first is the broad acceptance that turbulence has, in some form, a singularity associated with it. Proving the existence or non-existence of that singularity, whether smooth solutions exist for all time, is the essence of one of the Clay Millennium Prize problems. The singularity is seen most clearly in the famous Kolmogorov four-fifths law, which shows that as viscosity goes to zero, dissipation approaches a finite value. The compressibility that has been ejected from the equations is precisely a mechanism by which a singularity would naturally form; this is the same way one forms in standard compressible flow.
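For reference, the four-fifths law can be written as below. The notation is assumed for illustration and is not taken from the text above: δu_L(r) is the longitudinal velocity increment over a separation r, ε is the mean energy dissipation rate per unit mass, and the angle brackets denote an ensemble average. The relation is understood to hold for separations in the inertial range, in the limit of very high Reynolds number.

```latex
\left\langle \left( \delta u_L(r) \right)^3 \right\rangle = -\frac{4}{5}\, \varepsilon\, r,
\qquad
\delta u_L(r) = \left[ \mathbf{u}(\mathbf{x} + \mathbf{r}) - \mathbf{u}(\mathbf{x}) \right] \cdot \frac{\mathbf{r}}{r}
```

Note that the viscosity does not appear on the right-hand side. The third-order structure function is tied to a dissipation rate that stays finite as the viscosity goes to zero, which is the sense in which the law points to a singularity.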

It would be a genuine irony if it turned out that turbulence has little or nothing to do with the incompressible flow equations. The Clay prize problem would then be physically meaningless, and its solution simply an oddity of higher mathematics. It would be nothing more than the study of a challenging and oddly difficult class of equations that were believed to have physical significance, but in reality have little, other than as a useful approximation for a broad class of flows that does not include turbulence. The fact that the problem remains unsolved is probably telling us something!

One key feature of compressible flows is the presence of a clear, phenomenological structure that leads to the formation of the singularities the four-fifths law points to. The same structure and dynamics appear in shock wave formation and propagation. The dissipation, or entropy creation, rates are functionally similar, being cubic in the difference in longitudinal velocity. The main difference is that the compressible theory only works in one dimension, whereas turbulence is intrinsically three-dimensional. What you have in turbulence research is a field staring futilely at a horrendous physical system, incompressible Navier–Stokes, while pushing aside an obvious solution to the problem, compressible Navier–Stokes.
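The cubic dependence on the shock side can be made concrete with the classical leading-order weak-shock expansion, sketched here under assumed notation: subscripts 1 and 2 label the states ahead of and behind the shock, s is specific entropy, V specific volume, T temperature, p pressure, ρ density, c sound speed, and u the velocity normal to the shock. This is only the leading term for a weak compressive shock, not a general result.

```latex
s_2 - s_1 \approx \frac{1}{12\, T_1} \left( \frac{\partial^2 V}{\partial p^2} \right)_{s} \left( p_2 - p_1 \right)^3,
\qquad
p_2 - p_1 \approx \rho_1 c_1 \left| u_2 - u_1 \right|
```

Substituting the second relation into the first gives an entropy jump proportional to the cube of the velocity jump across the shock, which is the same functional form as the cubic velocity increment in the four-fifths law.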

What we have is the pursuit of an essential physical theory using a system of equations that combines hyperbolic, parabolic, and elliptic forms, and that refuses to yield to the most powerful mathematical analyses available to mankind. We still do not have any constructive proof of a singularity. By removing the unphysical aspect of this system, the divergence-free velocity, we get singularities forming naturally. The resulting compressible system is well-posed and matches the kinds of singularities and rate-of-production behavior we expect from theory and experiment. Frankly, it boggles my mind that we continue to pursue this theory down the incompressible rat hole.

Incompressibility removes sound waves from the equations, and it also removes thermodynamics. The key point is that sound waves are the precise physical mechanism in compressible flow that produces singularities. That is the other essential nonlinearity that the incompressible flow equations make completely degenerate. Frankly, it is no wonder we have failed to make real progress in nearly a century. This is the first and perhaps most important objection to current DNS practice.

The second concerns the numerical methods and the integration of the equations. The prevailing standards rest on rules of thumb established in the foundational channel-flow simulations of the early-to-mid 1980s, with resolution set relative to the Kolmogorov length. These give rough accuracy bounds; a stated error on the order of five percent is commonly used to set resolution. This is best defined in Moin and Mahesh’s review paper of 1998. It deserves more scrutiny. The current rules of thumb produce flows that look reasonably well resolved, but there is no well-established sense of the error. Usually, there is no real knowledge of the numerical errors incurred in integrating a DNS. To put it bluntly, error bars do not exist for these calculations. Where error bars do appear, they almost always reflect the statistical convergence of computed quantities, not the numerical error of the solution.
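To make concrete what a numerical error bar could look like, here is a minimal sketch of a three-grid convergence study with Richardson extrapolation. Everything in it is an assumption for illustration: the "simulation" is a stand-in (a trapezoid-rule integral of sin(x) on [0, π]), and the function names are hypothetical. A real DNS would substitute a computed quantity of interest on three systematically refined grids, and would need to be in the asymptotic convergence range for the estimate to mean anything.

```python
import math


def trapezoid_sin(n):
    """Stand-in 'simulation': trapezoid estimate of the integral of sin on [0, pi] with n cells."""
    h = math.pi / n
    # sin(0) and sin(pi) vanish (to rounding), so only interior points contribute.
    return h * sum(math.sin(k * h) for k in range(1, n))


def observed_order(f_coarse, f_medium, f_fine, r):
    """Observed convergence order from three grids with a constant refinement ratio r."""
    return math.log(abs(f_coarse - f_medium) / abs(f_medium - f_fine)) / math.log(r)


def fine_grid_error(f_medium, f_fine, r, p):
    """Richardson estimate of (f_fine - f_exact), assuming error = C * h**p."""
    return (f_medium - f_fine) / (r**p - 1.0)


r = 2  # each grid halves the cell size of the previous one
f16, f32, f64 = (trapezoid_sin(n) for n in (16, 32, 64))
p = observed_order(f16, f32, f64, r)
err = fine_grid_error(f32, f64, r, p)

print(f"observed order of convergence: {p:.3f}")
print(f"fine-grid value: {f64:.6f}, estimated numerical error: {err:.2e}")
print(f"Richardson-extrapolated value: {f64 - err:.6f}")
```

The point of the sketch is the reporting pattern rather than the integral: a result, the observed order of convergence, and an estimated numerical error that is distinct from any statistical sampling error.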

“This defines the minimum scale, the size of the smallest feature in the flow.” – Henshaw, Kreiss & Reyna

The Kolmogorov length is an energy-norm (L2-norm) scale that marks where dissipation occurs in a turbulent flow. It yields a fairly optimistic view of how computational effort scales with Reynolds number. Others have taken far more pessimistic views, most notably Kreiss, who worked from an L∞-norm length scale. The question is which scale actually needs to be resolved. Kreiss's estimate puts detailed simulation of turbulence completely out of reach for any meaningful Reynolds number. This may well be the right view, if singularities are the heart of turbulence and the proper focus of any DNS. If turbulent flows are weak solutions, perhaps an L1-norm view would be appropriate. My fear is that the pessimistic view is true: that the resolution of singularities in turbulent flow is exactly the secret we are missing, and the breakthrough we so badly want.
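As a rough illustration of the optimistic Kolmogorov-based estimate, the sketch below uses the standard scaling arguments: the Kolmogorov length is η = (ν³/ε)^(1/4), and with ε ~ u³/L the ratio of the large scale to η grows like Re^(3/4), so a three-dimensional grid needs on the order of Re^(9/4) points. The function names and the `points_per_eta` parameter are hypothetical, and the prefactors are set to one, so the numbers indicate scaling only.

```python
def kolmogorov_length(nu, epsilon):
    """Kolmogorov length eta = (nu**3 / epsilon)**(1/4) for viscosity nu and dissipation rate epsilon."""
    return (nu**3 / epsilon) ** 0.25


def dns_grid_points(reynolds_number, points_per_eta=1.0):
    """Order-of-magnitude 3D grid-point count, assuming L / eta ~ Re**(3/4) with unit prefactor."""
    points_per_direction = points_per_eta * reynolds_number ** 0.75
    return points_per_direction ** 3


for reynolds_number in (1e3, 1e4, 1e5, 1e6):
    n_points = dns_grid_points(reynolds_number)
    print(f"Re = {reynolds_number:.0e}: roughly {n_points:.1e} grid points")
```

Each factor of ten in Reynolds number costs a factor of about 10^2.25 in grid points under this estimate, before time stepping is counted. A finer required scale, such as one driven by the maximum velocity gradient, would make the growth steeper still.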

Now consider the cultural side of DNS practice. The published literature, and the credit for contributing to our knowledge of turbulence, is driven by computing DNS at the highest Reynolds number possible. That pursuit leads to corner-cutting and less care, which works directly against the questions raised above and against the error estimation and quality assurance the field so badly needs.

The field needs high quality because DNS is so often used to replace or augment experimental data. When computation stands in for experiment, it should be held to the same standards and the same rigorous procedures as experiment. Arguably it should be held to higher standards, since this is a man-made source of data. In almost every respect the opposite is true. DNS is simply assumed to be like experimental data, only more copious and easier to obtain, at least once you have the high-performance computing needed to produce it.

The same trends appear in other fields that use “first-principles” calculations to do DNS. In molecular dynamics, for example, potentials are used to describe the behavior of molecules. These potentials are highly accurate, but still approximate and imperfect: compact descriptions of the physical behavior rather than the behavior itself. The same mindset prevails: the prize goes to the biggest, most expansive, largest-scale simulation one can achieve. All of it works against the pursuit of quality. V&V is largely absent and treated as surplus to requirements.

“It takes less time to do a thing right than to explain why you did it wrong.”― Longfellow

Finally, you reach the ragged edge of what gets called DNS. These are the simulations that are largely marketing exercises on the part of institutions looking to promote themselves. Here a DNS is simply a very large-scale calculation.

I have seen a great deal of this at the national labs, where you will find a code solving the Euler equations together with some other combination of physics to produce a very expensive, very detailed model of some system. It gets promoted as a DNS purely on the strength of the computing resources consumed. The calculation is enormous, and it is called a DNS by virtue of being massive.

This is not to say such exercises are useless. But calling them DNS does a disservice to every other DNS, and lends them an air of legitimacy and truth they have not earned. They are best understood as exploratory attempts to explain complex phenomena, a worthwhile and valuable use of computing, but not direct numerical simulation. What the DNS label really provides is marketing, for the very expensive computers and the very expensive programs these laboratories are engaged in. That institutions get away with it calls into question the nature of peer review and the quality of the broader scientific enterprise they are part of.

All of this brings us full circle, back to ideas related to AI. The current push for computing at a massive scale is focused on AI, and you have the same claims that massive quantities of data and computing lead to some sort of magical access to the truth. Fortunately, we have already seen through this, in the much-noted discussion of how often large language models hallucinate and tell falsehoods. That is largely a positive thing to consider going forward. We see the problem; now we need to solve it.

A deeper issue to consider is whether the hallucinations we see in AI are also present in DNS. Do DNS results hallucinate as well, and if so, how do we find them? In both cases, identifying and eliminating these hallucinations is a key technological advance worth pursuing. Given the economic, political, and national security consequences of AI, that pursuit moves over into something much closer to a life-and-death struggle.

There is a clear path forward for both DNS and AI: V&V, and lots of it. The approach to making progress is straightforward, even if the work is detailed and technically demanding, and requires quantifying both the uncertainty and the fidelity of calculations. That is why we fund research in the first place, right?

In both areas, the first priority should be genuine measurement and testing of accuracy, with a clear understanding of the error, the uncertainty, and the computational cost. This applies whether we are dealing with DNS of turbulence or an LLM. Developing this capability is essential, because accuracy is not simple to measure and has multiple layers.

“A brand that feels human earns something no algorithm can replicate: trust” ― Warren Kornblum

Efficiency is the second key factor. In other words, how much computing cost is needed to achieve a certain level of accuracy? To know the efficiency, knowledge of accuracy is essential. In the AI case, for example, the cost per token has become a problem. The recent approach to tokenmaxxing has led some companies to withdraw support for AI and reassess its utility and value. This is positive, if we focus on improving efficiency and avoid trying to solve problems by throwing more and more computing power at them. How efficiently and effectively we use the computing power we have matters greatly, and this has been a problem across computational science.

“There is nothing so useless as doing efficiently that which should not be done at all.” ― Peter F. Drucker

Ignorance of accuracy and efficiency has led to stagnation in methods and methodology. This comes with an attitude that says, “Methods are done, there is nothing to do here.” Nothing could be further from the truth. This stagnation is antithetical to progress. We see it in the quest for high-resolution methods, where an obsession with formal high-order accuracy has killed the ability to develop more efficient, more effective methods. There we never measure accuracy on practical problems. In either case, the joint focus should be on the accuracy and fidelity achieved on practical, real-world problems, using idealized problems only to guide us. Idealized problems should be used only where accuracy on them can be directly tied to accuracy in the real world.

Brute-force computing is an amazing thing to have. What has become clear is the vast cost of that computing. It is becoming a huge technical and political issue. We should feel duty-bound to use it as effectively and efficiently as possible. This pursuit of efficiency should be a unifying principle across the world of computational science, driving important, real-world impacts.

We should also recognize that the pathologies of high-performance computing that have consumed computational science for the past decade or more are now being inherited by AI. The whole notion of data centers is the sharp end of the spear here. The AI world needs to be more mindful about using that computing power efficiently and effectively. The unfortunate thing is that there is an obsession with raw computing power, without regard for the efficiency or the accuracy that results from its use. This has been a plague on the field, and it needs a correction sooner rather than later.

“People don’t buy what you do; they buy why you do it. And what you do simply proves what you believe”― Simon Sinek

References

Bethe, H. A. “The Theory of Shock Waves for an Arbitrary Equation of State.” Office of Scientific Research and Development, Report No. 545, 1942.

Frisch, Uriel. Turbulence: The Legacy of A. N. Kolmogorov. Cambridge: Cambridge University Press, 1995.

Henshaw, William D., Heinz-Otto Kreiss, and Luis G. Reyna. “Smallest Scale Estimates for the Navier–Stokes Equations for Incompressible Fluids.” Archive for Rational Mechanics and Analysis 112, no. 1 (1990): 21–44.

Kolmogorov, Andrey Nikolaevich. “The Local Structure of Turbulence in Incompressible Viscous Fluid for Very Large Reynolds Numbers.” Comptes Rendus (Doklady) de l’Académie des Sciences de l’URSS 30 (1941): 301–305.

Menikoff, Ralph, and Bradley J. Plohr. “The Riemann Problem for Fluid Flow of Real Materials.” Reviews of Modern Physics 61, no. 1 (1989): 75–130.

Moin, Parviz, and Krishnan Mahesh. “Direct Numerical Simulation: A Tool in Turbulence Research.” Annual Review of Fluid Mechanics 30 (1998): 539–578.
