I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.
― Abraham Maslow
High performance computing is a big deal these days and may become a bigger deal very soon. It has become a new battleground for national supremacy. The United States will very likely soon commit to a new program for achieving progress in computing. By all accounts this program will focus primarily on the computing hardware first, and then on the system software that connects directly to that hardware. The goal will be the creation of a new generation of supercomputers that attempt to continue the growth of computing power into the next decade and provide a path to “exascale”. I think it is past time to ask, “Do we have the right priorities?” and “Is this goal important and worthy of achieving?”
Lack of direction, not lack of time, is the problem. We all have twenty-four hour days.
― Zig Ziglar
I’ll return to these two questions at the end, but first I’d like to touch on an essential concept in high performance computing: scaling. Scaling is a big deal; it is how success in computing is measured and, in a nutshell, it describes the efficiency of solving problems, particularly as the problem size or the computing resource changes. In scientific computing one of the primary assumptions is that bigger, faster computers yield better, more accurate results that have greater relevance to the real world. The success of computing depends on scaling, and breakthroughs in achieving it define the sorts of problems that can be solved.
Nothing is less productive than to make more efficient what should not be done at all.
― Peter Drucker
There are several types of scaling with distinctly different character. Lately the dominant scaling in computing has been associated with parallel computing performance. Originally the focus was on strong scaling, which is defined by the ability of greater computing resources to solve a problem of fixed size faster. In other words, perfect strong scaling would mean solving a problem twice as fast with two CPUs as with one CPU.
Lately this has been replaced by weak scaling, where the problem size is adjusted along with the resource. The goal is to solve a problem that is twice as big with two CPUs just as fast as the original problem is solved with one CPU. These scaling results depend both on the software implementation and on the quality of the hardware. They are the stock in trade of success in the currently envisioned national high-performance computing program. They are also both relatively unimportant and poor measures of the power of computing to solve scientific problems.
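To make the two definitions concrete, here is a minimal sketch (in Python, with hypothetical timings rather than measurements from any real machine) of how the corresponding efficiencies are usually computed; perfect scaling of either kind gives an efficiency of 1.

```python
# Minimal sketch of strong- and weak-scaling efficiency.
# The timings below are hypothetical, purely for illustration.

def strong_scaling_efficiency(t1, tp, p):
    """Fixed problem size: ideal time on p CPUs is t1/p, so efficiency = t1 / (p * tp)."""
    return t1 / (p * tp)

def weak_scaling_efficiency(t1, tp):
    """Problem size grows with p: ideal time stays constant, so efficiency = t1 / tp."""
    return t1 / tp

# Say a problem takes 100 s on 1 CPU, 60 s on 2 CPUs (same problem),
# and 110 s on 2 CPUs for a problem twice as large.
print(strong_scaling_efficiency(100.0, 60.0, 2))  # ~0.83 (perfect would be 1.0)
print(weak_scaling_efficiency(100.0, 110.0))      # ~0.91 (perfect would be 1.0)
```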
Two things are infinite: the universe and human stupidity; and I’m not sure about the universe.
― Albert Einstein
Algorithmic scaling is another form of scaling, and it is massive in its power. We are failing to measure, invest in, and utilize it as we move forward in computing nationally. The gains to be made through algorithmic scaling will almost certainly lay waste to anything that computing hardware will deliver. It isn’t that hardware investments aren’t necessary; they are simply grossly over-emphasized, to a harmful degree.
The saddest aspect of life right now is that science gathers knowledge faster than society gathers wisdom.
― Isaac Asimov
The archetype of algorithmic scaling is sorting a list, which is an amazingly common and important function for a computer program. Common sorting algorithms are things like insertion sort or quicksort, and each comes with a scaling for the memory required and the number of operations needed to run to completion. In the best cases (for special kinds of keys) the scaling is linear; in other words, for a list that is $N$ items long it takes order $N$ operations, meaning that for a sufficiently large list the cost is proportional to some constant times the length of the list, $cN$. General-purpose algorithms like quicksort take order $N \log N$ operations, but may carry a smaller constant; these can be faster for shorter lists. If one chooses very poorly, the sorting can scale like $N^2$. There are also aspects of an algorithm and its scaling that speak to the memory required and the complexity of the implementation. These themes carry over to a discussion of more esoteric computational science algorithms next.
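Before moving on to those, here is a small sketch (in Python) of how these scalings show up in practice: a quadratic insertion sort against the built-in $N \log N$ sort. The list sizes are arbitrary; the point is how the times grow as $N$ doubles.

```python
# Sketch: how sorting cost grows with list length N, comparing a quadratic
# insertion sort with Python's built-in O(N log N) sort (Timsort).
import random, time

def insertion_sort(a):
    a = list(a)                           # work on a copy
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:      # shift larger elements right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

for n in (1_000, 2_000, 4_000):
    data = [random.random() for _ in range(n)]
    t0 = time.perf_counter(); insertion_sort(data); t1 = time.perf_counter()
    t2 = time.perf_counter(); sorted(data);         t3 = time.perf_counter()
    # Doubling N roughly quadruples the insertion-sort time, but only a bit
    # more than doubles the built-in sort's time.
    print(f"N={n:5d}  insertion={t1 - t0:.4f}s  builtin={t3 - t2:.6f}s")
```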
In scientific computing two categories of algorithm loom large over substantial swaths of the field: numerical linear algebra, and discretization-methods. Both of these categories have important scaling relations associated with their use that have a huge impact on the efficiency of solution. We have not been paying much attention at all to the efficiencies possible from these areas. Improvements in both areas could yield improvements in performance that would put any new computer to shame.
For numerical linear algebra the issue is the cost of solving the matrix problem with respect to the number of equations. For the simplest view of the problem one uses a naïve method like Gaussian elimination (or LU decomposition), which scales like $N^3$ where $N$ is the number of equations to be solved. This method is designed to solve a dense matrix, where nearly all of the entries are non-zero. In scientific computing the matrices are typically “sparse”, meaning most entries are zero. An algorithm built specifically for sparse (banded) matrices lowers the scaling to $N^2$. Both of these methods produce “exact” solutions to the system (modulo poorly conditioned problems).

If an approximate solution is desired or useful, one can use lower cost iterative methods. The simplest methods like the Jacobi or Gauss-Seidel iteration also scale at $N^2$. Modern iterative methods are based on the Krylov subspace, with the conjugate gradient method being the classical example. Run to exact solutions these methods scale as $N^2$, but used as iterative methods for approximate solutions the scaling lowers to $N^{3/2}$. One can do even better with multigrid methods, lowering the scaling to $N$.
Each of these methods has a constant in front of the scaling, and the constant gets larger as the scaling gets better. Nonetheless it is easy to see that if you’re solving for a billion unknowns the difference between $N^3$ and $N$ is immense, a factor of a billion billion. The difference in constants between the two methods is several thousand. In the long run multigrid wins. One might even do better than multigrid, with current research in data analysis producing sublinear algorithms for large-scale data analysis. Another issue is the difficulty of making multigrid work in parallel, as the method is inherently NOT parallel in important parts. Multigrid performance is also not robust, and Krylov subspace methods still dominate actual use.
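As an illustration of the same problem attacked by methods with very different scaling, here is a sketch using SciPy on a small two-dimensional Poisson matrix: dense Gaussian elimination, a sparse direct solve, and conjugate gradient. The grid size is arbitrary, and this is a sketch of the comparison, not a benchmark.

```python
# Sketch: one 2D Poisson problem solved by dense elimination (~N^3 work),
# a sparse direct factorization, and the conjugate gradient iteration.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def poisson_2d(m):
    """Standard 5-point Laplacian on an m-by-m grid (N = m*m unknowns)."""
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
    I = sp.identity(m)
    return (sp.kron(I, T) + sp.kron(T, I)).tocsc()

m = 50                                    # N = 2500; raise m to watch the costs diverge
A = poisson_2d(m)
b = np.ones(A.shape[0])

x_dense = np.linalg.solve(A.toarray(), b)   # dense Gaussian elimination
x_sparse = spla.spsolve(A, b)               # sparse direct solve
x_cg, info = spla.cg(A, b)                  # Krylov (CG), approximate solution

# The two direct answers agree to round-off; CG agrees to its iterative tolerance.
print(np.linalg.norm(x_dense - x_sparse), np.linalg.norm(x_dense - x_cg), info)
```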
Learn from yesterday, live for today, hope for tomorrow. The important thing is to not stop questioning.
― Albert Einstein
Discretization can provide even greater wins. If a problem is amenable to high-order accuracy, a higher-order method will unequivocally win over a low-order method. The problem is that most practical problems you can get paid to solve don’t have this property. In almost every case the solution will converge at first-order accuracy. This is the nature of the world. The knee-jerk response is that this means high-order methods are not useful. This shows a lack of understanding of what they bring to the table and how they scale. High-order methods produce lower errors than low-order methods even when high-order accuracy cannot be achieved.
As a simple example, take a high-order method that delivers half the error of a low-order method. To get equivalent results the high-order method can use half the mesh, defined by the number of “cells” or “elements” per dimension, $N$. If one is interested in time-dependent problems, the number of time steps is usually proportional to $N$. Hence a one-dimensional problem would require of order $N^2$ degrees of freedom. For equivalent accuracy the high-order method would require $N/2$ cells and one-fourth of the degrees of freedom; it breaks even at four times the cost per degree of freedom. In three-dimensional time-dependent problems the scaling is $N^4$, and the break-even point is a factor of 16 in cost. This is eminently doable. Even larger improvements in accuracy would provide an even more insurmountable advantage.
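The arithmetic behind these break-even points can be written out directly, assuming as above a method that reaches a given error with half as many cells per dimension. With $N$ cells per dimension, $d$ space dimensions, and a number of time steps proportional to $N$,
\[
\text{DOF}_{\text{low-order}} \sim N^{d+1}, \qquad
\text{DOF}_{\text{high-order}} \sim \left(\frac{N}{2}\right)^{d+1} = \frac{N^{d+1}}{2^{d+1}},
\]
so in one space dimension ($d=1$) the high-order method uses one-fourth of the degrees of freedom and breaks even at a cost ratio of $2^2 = 4$, while in three space dimensions ($d=3$) it uses one-sixteenth and breaks even at $2^4 = 16$.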
The first counter-point to these methods is their computational cost and complexity. The second is their fragility, which can be recast as their robustness or stability in the face of real problems. Still, their performance gains are sufficient to amortize the costs, given the vast magnitude of the accuracy gains and the effective scaling.
An expert is a person who has made all the mistakes that can be made in a very narrow field.
― Niels Bohr
The last issue to touch upon is the need to make algorithms robust, which is just another word for stable. Work on the stability of algorithms is simply not happening these days, and part of the consequence is a lack of progress. For example, one way to view the failure of multigrid to dominate numerical linear algebra is its lack of robustness (stability). The same thing holds for high-order discretizations, which are typically not as robust or stable as low-order ones. As a result, low-order methods dominate scientific computing. For algorithms to prosper, work on stability and robustness needs to be part of the recipe.
If we knew what it was we were doing, it would not be called research, would it?
― Albert Einstein
Performance is a monotonically sucking function of time. Our current approach to HPC will not help matters, and it effectively ignores the ability of algorithms to make things better. So, “do we have the right priorities?” and “is this goal (of computing supremacy) important and worthy of achieving?” The answers are an unqualified NO and a qualified YES. The goal of computing dominance and supremacy is certainly worth achieving, but having the fastest computer will absolutely not get us there. It is neither necessary nor sufficient for success.
This gets to the issue of priorities directly. Our current program is so intellectually bankrupt as to be comical, and it reflects a starkly superficial thinking that ignores the facts staring it directly in the face, such as the evidence from commercial computing. Computing matters because of how it impacts the real world we live in. This means the applications of computing matter most of all. In the approach to computing taken today the applications are taken completely for granted, and reality is a mere afterthought.
Any sufficiently advanced technology is indistinguishable from magic.
― Arthur C. Clarke
We appear to be living in a golden age of progress. I’ve come increasingly to the view that this is false. We are living in an age that is enjoying the fruits of a golden age and coasting on the inertia of a scientific golden age. The forces powering the “progress” we enjoy are not being replenished for our future generations. So, what are we going to do when we run out of the gains made by our forebears?
Progress is a tremendous bounty to all. We can all benefit from wealth, longer and healthier lives, greater knowledge, and general well-being. The forces arrayed against progress are small-minded and petty. For some reason those small-minded and petty interests have swamped the forces behind good and beneficial efforts. Another way of saying this is that the forces of the status quo are working to keep change from happening. The status quo forces are powerful and well-served by keeping things as they are. Income inequality and conservatism are closely related, because progress and change favor those who would benefit from change, while the people at the top favor keeping things just as they are.
Most of the technology that powers today’s world was actually developed a long time ago. Today the technology is simply being brought to “market”. Technology at a commercial level has a very long lead-time. The breakthroughs in science that surrounded the effort fighting the Cold War provide the basis of most of our modern society. Cell phones, computers, cars, planes, etc. are all associated with the science done decades ago. The road to commercial success is long and today’s economic supremacy is based on yesterday’s investments.
plenty there that needs to be done.
Our leadership is caught up in trying to justify the funding for the path they are already taking. The damage done to long-term progress is accumulating with each passing year. Our leadership will not put significant resources into things that pay off far into the future (what good will that do them?). We have missed a number of potentially massive breakthroughs by chasing progress from computers alone. The lack of perspective and balance in the course set for progress shows a stunning lack of knowledge of the history of computing. The entire strategy is remarkably bankrupt philosophically. It is playing to the lowest intellectual denominator. An analogy that does the strategy too much justice would compare this to rating cars solely on the basis of horsepower.
The end product of our current strategy will ultimately starve the World of an avenue for progress. Our children will be those most acutely impacted by our mistakes. Of course, we could chart another path that balances the emphasis on computing hardware with algorithms, methods, and models. Improvements in our grasp of physics and engineering should probably be in the driver’s seat. This would require a significant shift in focus, but the benefits would be profound.
Uncertainty quantification is a hot topic. It is growing in importance and practice, but people should be realistic about it. It is always incomplete. We hope that we have captured the major forms of uncertainty, but the truth is that our assumptions about simulation blind us to some degree. This is the impact of “unknown knowns”, the assumptions we make without knowing we are making them. In most cases our uncertainty estimates are held hostage to the tools at our disposal. One way of thinking about this looks at codes as the tools, but the issue goes far deeper, being the basic foundation we base our modeling of reality upon.
One of the really uplifting trends in computational simulation is the focus on uncertainty estimation as part of the solution. This work serves the demands of decision makers who increasingly depend on simulation. The practice allows simulations to come with a multi-faceted “error bar”. Just like the simulations themselves, the uncertainty estimate is going to be imperfect, and typically far more imperfect than the simulations themselves. It is important to recognize the nature of the imperfection and incompleteness inherent in uncertainty quantification. The uncertainty itself comes from a number of sources, some interchangeable.
Aleatory: This is uncertainty due to the variability of phenomena. This is the weather. The archetype of variability is turbulence, but also think about the detailed composition of every single device; they are all different to some small degree, never mind their histories after being built. To some extent aleatory uncertainty is associated with a breakdown of the continuum hypothesis and is distinctly scale dependent. As things are simulated at smaller scales, different assumptions must be made. Systems vary across a range of length and time scales, and as those scales come into focus their variation must be simulated. One might argue that this is epistemic, in that if we could measure things precisely enough then they could be precisely simulated (given the right equations, constitutive relations, and boundary conditions). This point of view is rational and constructive only to a small degree. For many systems of interest chaos reigns, and measurements will never be precise enough to matter. By and large this form of uncertainty is simply ignored because simulations can’t provide the information.
bar. Too often these errors are ignored, wrongly assumed to be small, or incorrectly estimated. There is no excuse for this today.
gical if the conditions being modeled are known with exceeding precision. The problem is that such precision is virtually impossible in any circumstance. This is the core of the problem with simulating the aleatory uncertainty that so frequently remains untreated. It is almost completely ignored because of a host of fundamental assumptions in modeling that are inherited by simulations. These assumptions are holding back real progress in a host of fields of major importance.
methods are understood only superficially, and this results in a superficial uncertainty estimate. Often the black-box thinking extends to the tool used to estimate the uncertainty too, and we then get the result of a superposition of two black boxes. Not a lot of light gets shed on reality in the process. Numerical errors are ignored, or simply misdiagnosed. Black-box users often simply do a mesh sensitivity study and assume that small changes under mesh variation are indicative of convergence and small errors. They may or may not be such evidence; without doing a more formal analysis this sort of conclusion is not justified. If the code and problem are not converging, the small changes may be indicative of very large numerical errors or even divergence and a complete lack of control.
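As an illustration of what a more formal analysis looks like, here is a sketch (in Python, with hypothetical numbers) of the standard verification practice: compute an observed order of convergence from three systematically refined meshes and a Richardson-style error estimate, rather than declaring victory because the answers “don’t change much”.

```python
# Sketch: observed order of convergence and a Richardson-style error estimate
# from three solutions on meshes refined by a constant ratio r.
import math

def observed_order(f_coarse, f_medium, f_fine, r):
    """Observed convergence order p from three solutions with refinement ratio r."""
    return math.log(abs(f_coarse - f_medium) / abs(f_medium - f_fine)) / math.log(r)

def richardson_error(f_medium, f_fine, r, p):
    """Estimated error in the fine-grid solution via Richardson extrapolation."""
    return abs(f_fine - f_medium) / (r**p - 1.0)

# Hypothetical quantities of interest from meshes refined by a factor of 2:
f = {"coarse": 1.120, "medium": 1.060, "fine": 1.031}
p = observed_order(f["coarse"], f["medium"], f["fine"], r=2.0)
err = richardson_error(f["medium"], f["fine"], r=2.0, p=p)
print(f"observed order ~ {p:.2f}, estimated fine-grid error ~ {err:.4f}")
```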
Moore’s law isn’t a law, but rather an empirical observation that has held sway for far longer than could have been imagined fifty years ago. In some way, shape, or form, Moore’s law has provided a powerful narrative for the triumph of computer technology in our modern World. For a while it seemed almost magical in its gift of massive growth in computing power over the scant passage of time. Like all good things, it will come to an end, and soon if not already.
For those of us doing real practical work on computers this program is a disaster. Even doing the same things we do today will be harder and more expensive. It is likely that the practical work will become harder to complete and more difficult to be sure of. Real gains in throughput are likely to be far less than the reported gains in performance attributed to the new computers, too. In sum, the program will almost certainly be a massive waste of money. The plan is for most of the money to go to the hardware and the hardware vendors (should I say corporate welfare?). All of this will be done to squeeze another 7 to 10 years of life out of Moore’s law, even though the patient is metaphorically in a coma already.
If someone gives you some data and asks you to fit a function that “models” the data, many of you know the intuitive answer: “least squares”. This is the obvious, simple choice and, perhaps not surprisingly, not the best answer. How bad this choice may be depends on the situation. One way to do better is to recognize the situations where the solution via least squares may be problematic and produce an undue influence on the results.
one. If the deviations are large or some of your data might be corrupt (i.e., outliers), the choice of least squares can be catastrophic. The corrupt data may have a completely overwhelming impact on the fit. There are a number of methods for dealing with outliers within least squares, and in my opinion none of them is good.
Fortunately there are existing methods that are free from these pathologies. For example, a least absolute deviation fit can deal with corrupt data easily. It naturally excludes outliers from the fit because of a different underlying model: where least squares is the solution of a minimization problem in the energy or L2 norm, the least absolute deviation fit uses the L1 norm (related robust fits, such as the least median of squares, go even further). The problem is that the fitting algorithm is inherently nonlinear, and it is generally not included in most software.
One of the problems is that least squares is virtually knee-jerk in its application. It is contained in standard software such as Microsoft Excel and can be applied with almost no thought. If you have to write your own curve-fitting program, by far the simplest approach is to use least squares; it often produces a linear system of equations to solve, where the alternatives are invariably nonlinear. The key point is to realize that this convenience has a consequence. If your data reduction is important, it might be a good idea to think a bit more about what you ought to do.
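To make the contrast concrete, here is a sketch (using NumPy and SciPy, with synthetic data and one deliberately corrupted point) of an ordinary least-squares line fit next to a least absolute deviation (L1) fit. The nonlinear L1 minimization is handled by a general-purpose optimizer, since, as noted, such fits are rarely built into standard tools.

```python
# Sketch: ordinary least squares vs. a least absolute deviation (L1) line fit
# on synthetic data containing one badly corrupted point (an outlier).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.2, x.size)
y[15] += 30.0                                   # one corrupt measurement

# Least squares: a linear problem, one call, but dragged around by the outlier.
ls_slope, ls_intercept = np.polyfit(x, y, 1)

# L1 fit: an inherently nonlinear minimization of the sum of absolute deviations.
def l1_cost(params):
    slope, intercept = params
    return np.sum(np.abs(y - (slope * x + intercept)))

l1_slope, l1_intercept = minimize(l1_cost, x0=[1.0, 0.0], method="Nelder-Mead").x

print(f"least squares: y = {ls_slope:.2f} x + {ls_intercept:.2f}")  # pulled away from 2x + 1
print(f"L1 fit:        y = {l1_slope:.2f} x + {l1_intercept:.2f}")  # stays close to 2x + 1
```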
A week ago I received bad news: the reviews for a paper were back. One might think that getting a review back would be good, but it rarely is. These reviews are too often a horrible, soul-crushing experience. In this case I had reports from two reviewers, and one of them delivered the ego thrashing I’ve come to fear.
In total the two reviews were generally consistent on the details of the paper and on the sorts of suggestions for bringing it into the condition needed for publication. The difference was the tone of the reviews. One of the reviews was completely constructive and detailed in its critique. Each and every criticism was offered in a positive light, even when the error was pure carelessness.
it could have been much easier. There is nothing wrong with being critical, but the way it’s done matters a lot.