The Regularized Singularity

~ The Eyes of a citizen; the voice of the silent

Monthly Archives: December 2013

Why do people prefer worse methods?

Tuesday, December 31, 2013

Posted by Bill Rider in Uncategorized

Here is the answer in a nutshell: people use inappropriate approaches and test problems to characterize their methods.

The reason I started doing work in verification was directly related to my work in developing better numerical methods.  In the early 90’s I was developing an improved implementation of a volume tracking method (or VOF, where an interface is tracked through cell volumes and reconstructed geometrically).  The standard implementation of the method resulted in horrific spaghetti code and made subsequent development and debugging a nightmare.  I had developed a better, more object-oriented implementation and was eager to share it.  In the course of doing this implementation work, I engaged in extensive testing to put the method through its paces.  I felt that the standard tests were insufficiently taxing and provided poor code coverage, so I came up with several alternative test problems.  Existing tests involved translating objects on a grid, or engaging in solid-body rotation; difficulty was defined by the complexity of the shape being moved.  My new problems used time-dependent flow fields with non-zero vorticity that could be reversed in time, allowing exact error assessment.  These new problems were a hit and have largely replaced the earlier (too) simple tests.
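
For readers who have not seen these tests, here is a minimal sketch of the idea behind a time-reversed, vortical advection test.  The particular stream-function coefficients below are illustrative rather than a transcription of the published problems; the essential feature is the cosine factor in time, which reverses the flow halfway through so the exact final state equals the initial state and the error is directly computable.

```python
import numpy as np

def vortex_velocity(x, y, t, T=8.0):
    """Illustrative single-vortex velocity field with time reversal.

    The cos(pi*t/T) factor reverses the flow at t = T/2, so any material
    volume returns to its initial position at t = T.  The field has
    non-zero vorticity, unlike translation or solid-body rotation tests.
    """
    u = -np.sin(np.pi * x) ** 2 * np.sin(2.0 * np.pi * y) * np.cos(np.pi * t / T)
    v = np.sin(2.0 * np.pi * x) * np.sin(np.pi * y) ** 2 * np.cos(np.pi * t / T)
    return u, v

# Because the exact solution at t = T equals the initial volume fractions,
# the error of a volume-tracking method is simply the difference between
# the final and initial fields, e.g., in a discrete L1 norm over the grid.
```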

Now, fifteen years on, the paper is very highly cited, but mostly because of the better tests, not because of the reason the paper was actually written.  I’d say about 75-90% of the citations are primarily for the tests, not the methodology.  There is a positive lesson in this: people want to test their methods rigorously.  For myself, I was a verification devotee thereafter, largely because of the experience.

In terms of numerical methods and physics my greatest interest is the solution of hyperbolic PDEs.  I find the mathematics and physics of these equations engaging on multiple levels, and the numerical methods to solve them spellbindingly interesting.  It is also an area of numerical methods where verification is not the standard practice, at least for the full nonlinear PDEs that characterize real applications.  This isn’t entirely fair; for simple linear equations with smooth solutions, or nonlinear equations with smooth solutions, verification is commonplace and expected.  Under these limited conditions authors regularly verify their methods and present results in virtually every paper.  When a discontinuity forms, accuracy in the sense of high-order convergence is lost; solutions are limited to first-order convergence or lower.  Under these conditions verification is not done as a matter of course; in fact, it is exceedingly rare, to the point of being almost unheard of.

This is extremely unfortunate.  Verification is not only about order of accuracy; it is also about estimating the magnitude of numerical error.

The standard approach when smoothness is not present is to plot results at some resolution against an exact solution and show these results graphically.  The main reason to show the results is to demonstrate a lack of oscillatory wiggles near the discontinuity.  The error itself is easily (almost trivially) computable, but almost never presented.  The basis of this reasoning is that all results are basically first-order accurate, and the error isn’t important.  Implicit in this judgment is the belief that only the order of accuracy matters, and by virtue of this the magnitude of the numerical error is unimportant.  Unfortunately, this point of view is not entirely correct, and it misses a key aspect of method character.
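
To make “trivially computable” concrete, here is a minimal sketch of the two numbers that could accompany every such plot: a discrete L1 error against an exact (or reference) solution, and the observed convergence rate from two resolutions.  The exact_solution callable is a placeholder for whatever analytic or reference solution the test problem provides, and a uniform mesh is assumed.

```python
import numpy as np

def l1_error(x, u_numerical, exact_solution):
    """Discrete L1 error of a computed solution against an exact solution.

    x              : cell-center coordinates (uniform spacing assumed)
    u_numerical    : computed cell values at the final time
    exact_solution : callable returning exact values at x (placeholder)
    """
    dx = x[1] - x[0]
    return dx * np.sum(np.abs(u_numerical - exact_solution(x)))

def observed_order(error_coarse, error_fine, refinement=2.0):
    """Observed convergence rate from errors on two meshes differing by the
    given refinement ratio (assumes the error behaves like C*h^p)."""
    return np.log(error_coarse / error_fine) / np.log(refinement)
```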

When all solutions are first-order accurate there are actually significant differences in the level of numerical error.  Under mesh refinement these differences can become quite significant in terms of the quality of solution.  Consider the following example: for a first-order method, a factor-of-two increase in mesh resolution results in a halving of the numerical error.  If instead you changed the numerical method to halve the error on the original grid, the savings in compute time could be large.  Halving the mesh spacing increases the computational work by a factor of four in one dimension (assuming the Courant number is held constant), by a factor of eight in two dimensions, and by a factor of sixteen in three dimensions.  If the more accurate numerical method does not require four or eight or sixteen times the effort, it is a win.  More accurate methods are more computationally intensive, but rarely so much that they aren’t more efficient than lower-order methods.  The cases where these dynamics play out are the closest to actual applications, where the problems are discontinuous and never well behaved mathematically.  Indeed, it is this character that explains the rapid adoption of the second-order MUSCL method (Van Leer differencing for you weapons-lab folks) over first-order methods.
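
A back-of-the-envelope sketch of that arithmetic, under the stated assumption that the Courant number is held fixed so the time step halves along with the mesh spacing; the function names are mine, introduced only for illustration.

```python
def refinement_work_factor(dim):
    """Work multiplier for halving the mesh spacing in `dim` dimensions,
    assuming a fixed Courant number (so the number of time steps doubles too)."""
    return 2 ** (dim + 1)  # 4 in 1-D, 8 in 2-D, 16 in 3-D

def better_method_wins(cost_ratio, dim):
    """True if a method that halves the error on the *same* grid, at
    `cost_ratio` times the expense per step, beats halving the mesh."""
    return cost_ratio < refinement_work_factor(dim)

print(refinement_work_factor(3))   # 16
print(better_method_wins(3.0, 2))  # True: 3x the cost beats the 8x from refinement
```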

The differences are not geometric as they would be with high-order accuracy and smooth solutions, but quite frankly no one really gives a damn about smooth solutions where high-order accuracy is achievable.  They are literally only of academic interest.  Seriously.  So why isn’t more rigorous error estimation on more difficult problems the status quo?  Fundamentally, people simply haven’t come to terms with the inevitable loss of accuracy for real problems, and the consequences this has for how methods should be designed and evaluated.

Despite this, the community solving hyperbolic PDEs believes that verification is being routinely done.  It is.  It is being done on cases that matter little, and it only reflects on the order of accuracy of the method in cases no one cares about.  For people doing applications work, it makes verification seem like a pitiful and useless activity.  My suggestion that verification be done for error characterization and estimation is not considered useful.  I looked back through three and a half decades of the literature at results for the venerable Sod shock tube problem (en.wikipedia.org/wiki/Sod_shock_tube), and could find no examples of accuracy being quoted with computed results.  Sod showed runtime in his 1978 paper.  The implication is that the accuracy of the results is comparable and only the computational cost at fixed resolution matters.  This seems to be an article of faith in the community, and it is wrong!

The fact is that both speed and accuracy matter, and results should be presented thusly.

There are large differences in accuracy between methods.  If one measures efficiency as the computational effort required to achieve a fixed accuracy, one can find a factor of 30 difference between different approaches (http://www.sciencedirect.com/science/article/pii/S0021999107000897).  This difference in efficiency is in one dimension, with the potential for dramatically larger differences in two or three dimensions.  Yet this community almost systematically ignores this issue, and ignores verification of methods for practical problems.

This is a long-winded way of saying that there are a lot of cases where people prefer to use methods that perform worse.  Part of this preference is driven by the acceptance of the sort of verification practice discussed above.  An example would be WENO (http://www.sciencedirect.com/science/article/pii/S0021999196901308), where elegance and the promise of high order would seem to drive people, despite its low resolution.  WENO stands for weighted essentially non-oscillatory; it blends preservation of order of accuracy with a lack of Gibbs oscillations in an elegant, almost algebraic algorithm.  Most of the development of WENO since its inception has revolved around formal accuracy, which is compounded by computing results only for smooth, effectively linear test problems.  The focus on order of accuracy results in a method that is quite poor at resolving discontinuities, with performance at those points of the solution on par with the minmod second-order method (the worst and most dissipative of the second-order TVD methods!).  Instead of improving WENO for practical problems and focusing on its efficiency, the mathematical community has focused on its ability to achieve formally high-order accuracy in situations no one really cares about.
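
For readers who have not coded it, here is a minimal sketch of the nonlinear weighting at the heart of the classic fifth-order WENO reconstruction, written in the textbook Jiang–Shu form for a left-biased reconstruction at a single cell face.  It is meant only to illustrate the “elegant, almost algebraic” character described above, not to reproduce any particular production implementation.

```python
import numpy as np

def weno5_reconstruct(f, eps=1e-6):
    """Fifth-order WENO value at the face i+1/2 from five cell averages
    f = [f_{i-2}, f_{i-1}, f_i, f_{i+1}, f_{i+2}] (Jiang-Shu weights)."""
    fm2, fm1, f0, fp1, fp2 = f

    # Three third-order candidate reconstructions at the face.
    q0 = (2*fm2 - 7*fm1 + 11*f0) / 6.0
    q1 = (-fm1 + 5*f0 + 2*fp1) / 6.0
    q2 = (2*f0 + 5*fp1 - fp2) / 6.0

    # Smoothness indicators: large where a stencil crosses a discontinuity.
    b0 = 13.0/12.0*(fm2 - 2*fm1 + f0)**2 + 0.25*(fm2 - 4*fm1 + 3*f0)**2
    b1 = 13.0/12.0*(fm1 - 2*f0 + fp1)**2 + 0.25*(fm1 - fp1)**2
    b2 = 13.0/12.0*(f0 - 2*fp1 + fp2)**2 + 0.25*(3*f0 - 4*fp1 + fp2)**2

    # Nonlinear weights: recover the optimal linear weights (1/10, 6/10, 3/10)
    # in smooth regions, and shut off stencils that cross a discontinuity.
    d = np.array([0.1, 0.6, 0.3])
    alpha = d / (eps + np.array([b0, b1, b2]))**2
    w = alpha / alpha.sum()

    return w[0]*q0 + w[1]*q1 + w[2]*q2
```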

This might explain the lack of penetration of this class of method into practical computations in spite of thousands of citations. It is an utterly absurd state of affairs.

MUSCL (http://en.wikipedia.org/wiki/MUSCL_scheme) is an example of a much more successful method, and as I showed with Jeff Greenough, a well-written MUSCL code kills a WENO code in actual efficiency (http://www.sciencedirect.com/science/article/pii/S0021999103005965).  That comparison was just in one dimension, and the gains for MUSCL over WENO in multiple dimensions are expected to be as great.  It isn’t a surprise that WENO has not been as successful for applications as MUSCL.  A change in focus by the community doing WENO development might serve the method well.

PPM (http://crd.lbl.gov/assets/pubs_presos/AMCS/ANAG/A141984.pdf) is much better than either WENO or MUSCL.  The gains in efficiency are real (about a factor of two over MUSCL in 1-D).  Part of this resolution is due to the difference between a linear and a parabolic local representation of the solution, and the parabolic profile’s capacity to represent local extrema.  PPM can also exhibit bona fide high-order accuracy through the selection of high-order approximations for the edge values used in determining the parabola.  The bottom line is that the PPM method has some intrinsic flexibility that can be ruthlessly exploited.  PPM is widely used by astrophysicists for many applications.  Beyond astrophysics it has been less successful, and perhaps that should be studied.
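
As a small illustration of the “high-order edge values” point: the unlimited PPM edge value is a fourth-order interpolation of the neighboring cell averages, and the parabola built from those edge values can represent an interior extremum that a single limited linear slope cannot.  This sketch shows only that interpolation; the monotonicity constraints that make the full method robust at discontinuities are omitted.

```python
def ppm_edge_value(am1, a0, ap1, ap2):
    """Fourth-order estimate of the value at the face between cells i and i+1
    from the four surrounding cell averages (unlimited PPM edge value).
    The monotonization steps of the full PPM algorithm are omitted."""
    return (7.0/12.0)*(a0 + ap1) - (1.0/12.0)*(am1 + ap2)
```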

WENO is elegant.  It is a beautiful method, and I loved coding it up.  I would have to say that my WENO code is extremely attractive, just as the mathematics underpinning these methods is appealing.  But this appealing veneer can be superficial and hide the true performance of the method.  It is sort of like a beautiful person with a horrible personality or weak intellect; the attractive package hides the less than attractive interior.  This isn’t to say that the method can’t be improved in terms of efficiency.  Indeed, by taking various lessons learned from MUSCL and PPM and applying them systematically to WENO, the method could be dramatically improved.  In doing so, some of the properties of WENO for smooth hyperbolic PDEs would be undermined, while the properties for discontinuous problems would be improved.  It could be done quite easily.

Ultimately the community has to decide where it wants impact.  Are hyperbolic PDEs important for applications?

Yes, and completely beyond a shadow of doubt.

Why does the published literature act the way it does? How has the community evolved to where the actual performance on more application-relevant testing matters less than beauty and elegance?

I think this whole issue says a great deal about how the applied mathematics community has drifted away from relevance.  That is a topic for another day, but keenly associated with everything discussed above.

Thanks for listening, happy New Year.

What makes a computational simulation good?

Thursday, December 26, 2013

Posted by Bill Rider in Uncategorized

Tags

algorithms, calibration, efficiency, models, Moore's law, supercomputer, validation, verification

A better question is: how do we improve computational simulations most effectively?  Should we focus more on creating better computations instead of faster ones?

On Christmas morning I unwrapped a new iPhone 5S and got rid of my clunky ole iPhone 4S.  Amazingly enough, the Linpack benchmark runs on iPhones with a cool little app (https://itunes.apple.com/us/app/linpack/id380883195 or https://itunes.apple.com/us/app/linpack-benchmark/id390185550?mt=8 or https://play.google.com/store/apps/details?id=com.greenecomputing.linpack&hl=en for Android).  More amazing than that, the clunky iPhone 4S clocks in at around 130 Mflops, which coincidentally enough is just about the same as the Cray XMP I used as a brand new professional in 1989.  Moreover, the XMP I had access to in 1989 was one of the most powerful computers in the world.  Yet today I happily chose to recycle my little “XMP” without a second thought.  In 1989 that would have been unthinkable, just as unthinkable as holding that computational horsepower in the palm of my hand!  The new iPhone 5S is just shy of a gigaflop, and it mostly plays music and surfs the Internet rather than computing turbulent fluid flows.  What a world we live in!

One of the things that the XMP had was an awesome operating system called CTSS.  In some ways it was a horror show with a flat file system, but in other ways it was a wonder.  It could create something called a “drop file” that saved the complete state of a code and could be picked up by a debugger.  You could change the values of variables, or figure out exactly why your calculation had a problem.  Of course this power could be misused, but you had the power.  Soon Cray replaced CTSS with UNICOS, their version of Unix, and we had a modern hierarchical file system, but lost the power of the drop file.  A lot of computational scientists would enjoy the power of drop files much more than a brand new supercomputer!

What’s the point of this digression?  We have focused on supercomputing as the primary vehicle for progress in computational science for the past 25 years, while not putting nearly so much emphasis on how computations are done.  Computing power comes without asking for it, and yesterday’s supercomputing power provides gaming today, and today’s supercomputing will power the games of tomorrow.  None of this has really changed what we do on supercomputers, and changing what we do on supercomputers has real scientific value and importance.   

The truth of the matter is that the most difficult problems in simulation will not be solved through faster computers alone.  In areas I know a great deal about this is true: direct numerical simulation of turbulence has not yielded understanding, and the challenge of climate modeling depends more on the models than on raw computing power.  Those who claim that a finer mesh will provide clarity have been shown to be overly optimistic.  Some have characterized stockpile stewardship as underground nuclear testing in a “box,” but like the other examples it depends on greater acuity in modeling, numerical methods and physical theory.  Computational simulation is a holistic undertaking dependent upon all the tools available, not simply the computer.  Likewise, improvement in this endeavor is dependent on all the constituent tools.

Most of the money flowing into scientific computing is focused on making computations faster by providing faster computers.  In my opinion we should be more focused upon improving the calculations themselves.  Improving them includes improving algorithms, methods, efficiency, and models, not to mention improved practice in conducting and analyzing computations.  The standard approach to improving computational capability is the development of faster computers.  In fact, developing the fastest computer in the world is treated as a measure of economic and military superiority.  The US government has made the development of the fastest computers a research priority, with the exascale program gobbling up resources.  Is this the best way to improve?  I’m fairly sure it isn’t, and our overemphasis on speed is extremely suboptimal.

Moore’s law has provided a fifty-year glide path for supercomputing to ride.  Supercomputers weathered the storm of the initial generation of commodity-based computing development, and continued to provide exponential growth in computing power.  The next ten years represent a significant challenge to the nature of supercomputing.  Computers are changing dramatically as current technology hits fundamental physical limits.  To achieve higher performance, levels of parallelism need to grow to unprecedented levels.  Moreover, existing challenges with computer memory, disk access and communication all introduce additional difficulties.  The power consumed by computers also poses a difficulty.  All of these factors are conspiring to make the development of supercomputing in the next decade an enormous challenge, and by no means a sure thing.

I am going to question the default approach.

The signs pointing to the wastefulness of this approach have been with us for a while.  During the last twenty years the actual performance for the bulk of computational simulations has been far below the improvements that Moore’s law would have you believe.  Computational power is measured by the Linpack benchmark, which papers over many of the problems in making “real” applications work on computers.  It solves a seemingly important problem, a large dense system of linear equations, using dense linear algebra.  The problem in a nutshell is that dense linear algebra is not terribly important, and it makes the computers look a lot better than they actually are.  The actual performance as a proportion of the peak Linpack-measured performance has been dropping for decades.  Many practical applications run at much less than 1% of the quoted peak speed.  Everything I mentioned above makes this worse, much worse.

Part of the problem is that many of the methods and algorithms used on computers are not changing or adapting to the character of the new hardware.  In a lot of cases we simply move old codes onto new computers.  The codes run faster, but nowhere near as fast as the Linpack benchmark would lead us to believe.  The investment in computer hardware isn’t paying off to the degree that people advertise.

Computational modeling is extremely important to modern science.  It provides substantial new capability to the scientific community.  Modeling is a reflection of our understanding of a scientific field.  If we can model something, we tend to understand it much better.  Lack of modeling capability usually reflects a gap in our understanding.  Better put, computational modeling is important to the progress of science, and its status reflects the degree of understanding that exists in a given field.  That said, faster computers do not provide any greater understanding in and of themselves.  Period.  Faster, more capable computers allow more complex models to be used, and those more complex models may yield better predictions.  These complex models can be contemplated with better computers, but their development is not spurred by the availability of supercomputing power.  Complex models are the product of physical understanding and the algorithmic guile allowing for their solution.

I am going to suggest that there be a greater focus on the development of better models, algorithms and practice instead of vast resources focused on supercomputers.  The lack of focus on models, algorithms and practice is limiting the effectiveness of computing far more than the power of the computers is.  A large part of the issue is that the degree of improvement new supercomputers provide is overblown; applications see only a fraction of the reported power.  There is a great deal of potential headroom for greater performance with computers already available and plugged in.  If we can achieve greater efficiency, we can compute much faster without any focus at all on hardware.  Restructuring existing methods, or developing new methods with greater accuracy and/or greater data locality and parallelism, can gain efficiency.  Compilers are another way to improve code, and great strides could be made there to the benefit of any code using computers.

One of the key areas where supercomputing is designed to make a big impact is direct numerical simulation (DNS), or first-principles physical simulation.  These calculations have endless appetites for computing power, but limited utility in solving real problems.  Turbulence, for example, has generally eluded understanding, and our knowledge seems to be growing slowly.  DNS is often at the heart of the use case for cutting-edge computing.  Given its limited ability to provide results, the case for supercomputing is weakened.  Perhaps now we ought to focus more on modeling and physical understanding instead of brute force.

Advances in algorithms are another fruitful path for improving results.  Algorithmic advances are systematically underestimated in terms of their impact.  Several studies have demonstrated that algorithmic improvements have added as much or more to computational power than Moore’s law.  Numerical linear algebra is one area where the case is clear; optimization methods are another.  Numerical discretization approaches may be yet another.  Taken together, the gains from algorithms may dwarf those from pure computing power.  Despite this, algorithmic research is conducted as a mere afterthought, and more often than not it is cut first from a computational science program.

One of the key issues with algorithmic research is the “quantum” nature of the improvements.  Rather than coming in a steady, predictable stream like Moore’s law, algorithmic improvements are more like a phase transition, where performance jumps by an order of magnitude when a breakthrough is made.  Such breakthroughs are rare and are the consequence of many less fruitful research directions.  Once the breakthrough is made, the efficiency of the method is improved in a small steady stream, but nothing like the original discovery.  Many examples of these quantum, phase-transition-type improvements exist: conjugate gradient, multigrid, flux-limited finite differences, artificial viscosity, Karmarkar’s method, and others.

The final area I will touch on is computational practice.  This is where things like verification and validation come into the picture.  Modern computational science ought to be about being honest and straightforward about our capability, and V&V is one of the things at the heart of this.  Too often computations are steered into agreement with reality by the heavy hand of calibration.  In fact, calibration is almost always necessary in practice, but the magnitude of its impact is far too infrequently measured.  Even more importantly, the physical nature of the calibration is not identified.  In a crude sense calibration is a picture of our uncertainty.  Too often calibration uses one sort of physics to cover up our lack of knowledge of something else.  My experience has taught me to look at turbulence and mixing physics as the first place for calibration to be identified.

If calibration is the public face of uncertainty, what is the truth?  In fact, the truth is hard to find.  Many investigations of uncertainty focus upon lack of knowledge, which is distinctly different from physical uncertainty.  Lack of knowledge is often explored via parametric uncertainty in the models used to close the physics.  This parametric uncertainty often does not look like the physical sources of uncertainty, which arise from imprecisely known initial conditions that grow into large-scale differences in physical states.  These distinctions loom large in many applications such as climate and weather modeling.  Unraveling the differences between the two types of uncertainty should be one of computational science’s greatest foci because of its distinct policy implications.  It also figures greatly in determining the proper placement of future scientific resources.

Calibration is also used to paper over finite computational resolution.  Many models need to be retuned (i.e., recalibrated) when the computational resolution changes.  This effect can easily be measured, but we stick our collective heads in the sand.  All one has to do is take a calibrated solution and systematically change the resolution.  Repeatedly, people respond, “I can’t afford a refined calculation!”  Then coarsen the mesh and see how big the changes are.  If you can’t do this, you have big problems, and any predictive capability is highly suspect.  This sort of estimation should provide a very good idea of how much calibration is impacting your solution.  In most big computational studies calibration is important, and unmeasured.  It is time to stop this and come clean.  Ending this sort of systematic delusion is far more important than buying bigger, faster computers.  In the long run, “coming clean” will allow us to improve computational science’s positive impact on society far more than a short-term focus on keeping Moore’s law alive.
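
A minimal sketch of the check being advocated here, with hypothetical run_model and qoi names standing in for whatever code and quantity of interest you actually have: run the calibrated model at the working resolution and at a coarsened one, and report the relative change as a crude lower bound on how much numerical error the calibration is absorbing.

```python
def resolution_sensitivity(run_model, n_cells, qoi, coarsen=2):
    """Crude check of how sensitive a calibrated result is to resolution.

    run_model : callable taking a cell count and returning a solution (hypothetical)
    n_cells   : the working resolution
    qoi       : callable extracting the quantity of interest from a solution
    coarsen   : factor by which to coarsen the mesh for the comparison
    """
    q_fine = qoi(run_model(n_cells))
    q_coarse = qoi(run_model(n_cells // coarsen))
    # The relative change is a lower bound on the numerical error being swept
    # into the calibration; a large value means the "prediction" is
    # resolution-dependent and the calibration is doing heavy lifting.
    return abs(q_fine - q_coarse) / max(abs(q_fine), 1e-300)
```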

Computational science isn’t just computers, it is modeling, it is physical theory, it is algorithmic innovation and efficiency, it is mathematics, it is programming languages, programming practice, it is validation against experiments and measurements, it is statistical science, and data analysis.  Computer hardware is only one of the things we should focus on, and that focus shouldn’t choke resources away from things that would actually make a bigger difference in the quality.  Today it does.  A balanced approach would recognize that greater opportunities exist in other aspects of computational science.

At least 13 Things a Systematic V&V Approach Will Do For You and Your Simulation

Friday, December 20, 2013

Posted by Bill Rider in Uncategorized

One of the big problems that the entire V&V enterprise has is the sense of imposition on others.  Every simulation worth discussing does “V&V” at some level, and almost without exception these efforts have weaknesses.  Doing V&V “right” or “well” is not easy or simple.  Usually, the proper conduct of V&V will expose numerous problems with a code and/or simulation.  It’s kind of like subjecting yourself to an annual physical; it’s good for you, but you might have to face some unpleasant realities.  In addition, the activity of V&V is quite broad, and something almost always slips between the cracks (or chasms, in many cases).

To deal with this breadth, the V&V community has developed some frameworks to hold all the details together.  Sometimes these frameworks are approached as prescriptions for all the things you must do.  Instead, I’ll suggest that these frameworks are not recipes, nor should they be thought of as prescriptions.  They are “thou should,” not “thou shalt,” or even just “you might.”

Several frameworks exist today, and none of them is fit for all purposes, but all of them are instructive on the full range of activities that should be at least considered, if not engaged in.

CSAU – Code Scaling, Applicability and Uncertainty, developed by the Nuclear Regulatory Commission to manage the quality of analyses done for power plant accidents.  It is principally applied to thermal-fluid (i.e., thermal-hydraulic) phenomena that could potentially threaten the ability of nuclear fuel to contain radioactive products.  This process led the way, but it has failed in many respects to keep up to date.  Nonetheless, it includes processes and perspectives that have not been fully replicated in subsequent work.  PCMM is attempting to utilize these lessons in improving its completeness.

PCMM – Predictive Capability Maturity Model, developed at Sandia National Laboratories for the stockpile stewardship program over the last 10 years.  As such, it reflects the goals and objectives of this program and Sandia’s particular mission space.  It was inspired by the CMMI developed at Carnegie Mellon University to measure software process maturity.  PCMM was Sandia’s response to calls for greater attention to detail in defining the computational input into quantification of margins and uncertainties (QMU), the process for nuclear weapons certification completed annually.

CAS – Credibility Assessment Scale, developed by NASA.  NASA created a framework similar to PCMM for simulation quality in the wake of the shuttle accidents, specifically after Columbia, where simulation quality played an unfortunate role.  In the process that unfolded after that accident, the practices and approach to modeling and simulation were found to be unsatisfactory.  The NASA approach has been adopted by the agency, but it does not seem to be enforced.  This is a clear problem and a potentially important lesson.  There is a difference between an enforced standard (i.e., CSAU) and one that comes across as a well-intentioned but powerless directive.  Analysis should be done with substantial rigor when lives are on the line.  Ironically, formally demanding this rigor may not be the most productive way to achieve this end.

PMI – Predictive Maturity Index, developed at Los Alamos.  This framework is substantially more focused upon validation and uncertainty, and IMHO it is a bit lax with respect to the code’s software and numerical issues.  In my view, these aspects are necessary to focus upon given the advances in the 25 years since CSAU came into use in the nuclear industry.

Computational simulations are increasingly used in our modern society to replace some portion of expensive or dangerous experiments and tests.  Computational fluid and solid mechanics are ever more commonplace in modern engineering practice.  The challenge of climate change may be another avenue where simulation quality is scrutinized and could benefit from a structured, disciplined approach to quality.  Ultimately, these frameworks serve the role of providing greater confidence (faith) in simulation results and their place in decision-making.  Climate modeling is a place where simulation and modeling play a large role, and the decisions being made are huge.

The question lingers in the mind, “what can these frameworks do for me?”  My answer follows:

  1. V&V and UQ are both deep fields with numerous deep subfields.  Keeping all of this straight is a massive undertaking beyond the capacity of most professional scientists or engineers.
  2. Everyone will default to focusing on where they are strong and comfortable, or interested.  For some people it is mesh generation, for others it is modeling, and for yet others it is analysis of results.  Such deep focus may not lead (or is not likely to lead) to the right sort of quality.  Where quality is needed is dependent upon the problem itself and how the problem’s solution is used.
  3. These are useful outlines for all of the activities that a modeling and simulation project might consider.  Project planning can use the frameworks to develop objectives and subtasks, prioritize and review.
  4. These are menus of all the sort of things you might do, not all the things you must do. 
  5. They provide a set of activities, prepared in a sequenced, rational manner with an eye toward what the modeling and simulation is used for.
  6. They help keep your activities in balance.  They will help keep you honest.
  7. You will understand what is fit for purpose, and when you have put too much effort into a single aspect of quality.
  8. V&V and UQ are developing quickly and the frameworks provide a “cheat sheet” for all of the different aspects.
  9. The frameworks’ flexibility is key; not every application should focus on every quality aspect, or apply every quality approach in equal measure.
  10. Validation itself is incredibly hard in both breadth and depth.  It should be engaged in a structured, thoughtful manner with a strong focus on the end application.  Validation is easy to do poorly.
  11. The computational science community largely ignores verification of codes and calculations.  Even when it is done, it is usually done poorly.
  12. Error estimation and uncertainty quantification too rarely include the impact of numerical error, and they estimate uncertainty primarily through parametric changes in models.
  13. Numerical error is usually much larger than acknowledged.  A lot of calibration is actually accounting for numerical error, or providing numerical stability, rather than physical modeling.

Referee or Coach? What is the right model for peer review?

Friday, December 13, 2013

Posted by Bill Rider in Uncategorized

Tags

anonymous, coach, open, peer review, Referee

The process of peer review is viewed as one of the cornerstones of our research system.  The approach to peer review is often cast as an adversarial interaction where the peer’s job is to critique the research, and in the process vet it for suitability for publication.  The most common form of this peer interaction is the refereeing of research papers for journals (and conferences). 

Very few scientists can’t share a few war stories about their experiences with peer review, and most of us have been on both the receiving and the giving end of a contentious exchange.  It is almost a form of professional hazing (or it IS a form of this).  Almost everyone can relate to the moment of dread upon opening the report from the referee.  What outlandish things will they demand?  Will they engage in ad hominem attacks?  What if it is an outright rejection?  I am often left thinking: is this really the best system?  It is pretty clear that it’s not.

On the other hand, research without peer review is viewed dimly, as it should be.  Numerous times the referee’s report, and my response to it, made for a better paper.  A few times the referee has succeeded in making me publish a worse paper.  Seriously.  I’ve had referees defend backwards and ignorant practices and demand they continue.  A better question on this topic is whether the current form of the practice is actually the most beneficial one.  I think we might do well to rethink our model.

I’ll make a modest suggestion for a huge problem; we should strive to engage more as “coaches” instead of as “referees”.

Sports analogies are often useful to many people, or at the very least they are useful to me.  Sports have four categories of people involved: players, coaches, referees and fans.  The players are the source of the action; coaches train, plan and orchestrate the action; and referees make sure the rules are followed.  Fans are primarily interested in the outcomes and only influence them indirectly (or as much as they are allowed to).  Fans are basically like the (interested) public; this isn’t my focus today, although it might be an analogy worth developing.  I’m going to apply this analogy to work in science and how different people can or should adopt roles that parallel the actors in sports.

Referees are given authority; their authority is sanctioned.  This is true for a game and this is true for a peer review.  Despite this, they are generally reviled.  What happens when you try to referee without sanctioned authority?  Nothing good, that’s what.  Even with the authority granted, the act of refereeing is met with some degree of disdain.

At work, I spend a lot of time worrying about the quality of other people’s work. In fact I’m paid to do this, and sometimes have to do this professionally in reviewing papers or projects. This seems to be the lot of someone who does V&V.   What is the right way to do this?  It is easy for this sort of worrying-based activity to become horribly adversarial. 

After all, I know the right way to do things.  Of course I don’t, but the referee in me acts this way all the time.  This sort of interaction is bound to go south.

In this vein I’ve probably tended to act more like a referee in a game.  Everyone knows that players, coaches and fans generally despise referees.  Referees are not only hated, they are conned all the time.  Players and coaches love to get away with cheating the referee, and expect to get away with it.  This gets really ugly fast, and I’ve come to realize this isn’t the best way to approach improving quality, which ought to be my main objective.  Even in that situation the sports referee is encouraged to engage in “social engineering” instead of hard-nosed policing.  Often you are negotiating with and urging the players into behaving themselves.  This is much more effective than calling foul after foul, because the game rapidly turns into a farce.  On the other hand, there is a fuzzy line where you must assert authority; once it is crossed, the result is often a red card being shown.  Whenever I’ve had to do this it has filled me with regret.  “What could I have done differently to prevent this?”

Of course, referees also appear prominently in science as those anonymous people who review manuscripts.  Referees are viewed as utterly essential to the entire system of the peer-reviewed literature.  They are also those who worry about quality, and they are often despised.  Almost everyone who has published a paper has had to deal with an asshole referee.  I certainly have.  I might also honestly assess my own behavior as a referee; I’ve probably been an asshole too.  Hiding behind the cloak of anonymity has much to do with this, although the editors of journals know who the referees are and have to tolerate the asshole behavior to some degree for it to persist.  It could be that some editors have a different threshold for what constitutes unreasonable or unprofessional conduct.  Of course some people are just assholes and enjoy being ones.

Of course, game refereeing is a public, open activity, and here the analogy gets a bit thin.  As poorly as refereeing goes under the public eye, an anonymous referee would be worse.  The outright rejection of a paper is the proverbial red card, and it’s handed out anonymously.  I think this aspect of the system is ultimately harmful.  It isn’t that some work shouldn’t be rejected, but rather that it shouldn’t be rejected behind a veil.  More than this, do we really give the rejected work a chance?  Or the feedback it needs to succeed?  More often than not, the answer is no, and the system itself aids in the process.

Given this, and the scandals appearing in the public eye, one might reasonably ask whether the current system of peer review is working.  Most of the embarrassing cases are not in fields remotely related to computational science and physics, but it is probably only a matter of time until something bubbles up.  The flip side of the horrible, asshole referee is the crony referee.  The sports analogy is the corrupt referee who fixes the game.  In the academic literature this ought to be more clearly reviled.  Again, the specter of anonymity raises its ugly head to assist the problem’s prominence.  Again, I might ask myself whether I’m even slightly guilty of this sin.  The key point is that the nasty or crony referees do not serve science well, and their influence is hurting research.  This isn’t to say that most reviews aren’t well intentioned and professional.  Most are, but enough aren’t that it might be worth considering a different path.  This could be coupled to other aspects of the open-source, open-publishing movement, which is also coupled to reproducibility, another big trend in publishing and peer review.

A more positive thing to think about is whether the current system is beneficial, or as beneficial as it could be.  The referees and editors are the gatekeepers, who can allow certain things to be published, and other things not to be published (in a given journal).  For the most part they do not work with the authors to improve the work.  A referee’s report usually only contains a small number of the items that might be improved.  The system limits the amount of feedback the referee might give.  Too much feedback is a sign that the paper should be rejected. 

I’ve seen cases where an earnest but overly detailed review on my part resulted in the editor rejecting the paper.  It was the exact opposite of what I wanted!  I ended up regretting not writing a more effusive statement of the value and worth of the paper despite its flaws or necessary (in my opinion!) improvements.  The fact is that if I want to reject a paper I will not write a long review.  I will write why it isn’t “up to scratch” and what is valuable and worth focusing on when resubmitting.  A long review means, “I’m interested.”

The system is geared toward an adversarial relationship rather than a collaborative one.  The general adversarial approach permeates science and greatly undermines the collective ability of the community to solve deep problems. We should develop a peer review process that confers positive properties onto this essential interaction.  Negative behaviors are deeply empowered by the anonymity of the interaction.  If the referee was known to the authors and a collaborative model were established, the behavior of the referees would be better, and the spirit of collaboration could be established and used to improve the work.  Maybe an email exchange or phone call could be used to work out the details of how to improve a paper instead of the horribly inefficient system of review & revision.  Communication could take place in hours or days that now takes weeks or months. 

This is where the analogy of “coach” comes in (teacher or tutor works too).  The referee could be recast as a coach.  A coach for the different subfields the paper covers.  For example, I could be assigned as a “verification” or “numerical methods” coach for a paper.  As a coach I would suggest things to do in order for the work to improve.  Perhaps I might be given a minor, lesser stake in the paper.  In other words, the paper would still have authors, but the “coaches” would also be identified in a manner more significant than an acknowledgement, but less significant than authors.  This role would have to have value professionally for the idea to work. Maybe this could lead to a better, more collaborative community for publishing. A formal role would assist the reviewers in being more energetic and focused in their reviews.  They would become a stakeholder in the paper, not simply an anonymous servant.  Ultimately a positive interaction with a “coach” might lead to a more substantial interaction later.

Coaches operate in many ways, but their objectives are similar: to win (i.e., succeed), and often winning is related to teaching.  Great coaches are very often great teachers.  The best coaches not only work toward the success of their team (or players), but also develop their players to their potential.  Coaches work in numerous ways to achieve these ends, through drills, feedback, and challenges.  Each of these approaches develops something greater than what the team or players can do by themselves.  As such, the coach is a model for developing an improved capability.  This is basically another way of stating the objective of replacing referees with something more positive and constructive.  I am saying that we should think more like a coach and less like a referee.

There is a need for some authority to be exercised.  There is a reason we have referees, and they are necessary.  Referees are adopted when the stakes are high and the outcome is meaningful.  Should we be viewing all of research as a high-stakes “game”?  Or would we be better served by approaching most research as practice, where we want to improve, and saving the game approach for when things really matter?  This certainly should be applied to the educational setting, where the current system does not maximize the training that young professionals receive.  Perhaps something more balanced can be used, where we have both coaches and referees assigned to work with a given piece of research.  The key to effective refereeing is fairness and impartiality, along with competence, something we clearly don’t have enough of today.

I think the same model should be adopted in looking at simulation quality in V&V and UQ work.  Those of us with such expertise should act more like coaches and less like referees.   This way we can work constructively to make computations better, we become part of the team instead of part of the “other” team, or the enemy in the middle.

Postscript on Trust (or Trust and Inequality)

Friday, December 6, 2013

Posted by Bill Rider in Uncategorized

The thoughts associated with last week’s post continued to haunt me.  They hung over my thoughts as I grappled with the whole concept of thankfulness for last week’s holiday.  I know implicitly that I have a tremendous amount to be thankful for at both a personal and professional level.  By all but the most generous standards, I am very lucky, certainly far better off than the vast, vast majority of humanity.  Nonetheless, it is hard to shake the sense of missed opportunity and squandered potential.  The USA is not a healthy country right now, and that fills me with sadness.

With a system that is so dysfunctional, most of what we can be thankful for is close to home.  Things we used to take for granted, like freedom and the idea that your children have a better chance at a good life than you did, are not secure any more.  We have transformed into a society where corporate rights exceed personal or human rights, and there is little trust in any institution, public or private.  I wrote about the lack of trust in science last week and how it makes a scientist’s work both overly expensive and insufficiently effective.  It is a tragic situation.

Of course the Internet has a lot to do with it.  Or at the very least the Internet has amplified many of the trends that were already apparent preceding its rise to societal prominence.  In a sense the Internet is a giant communication magnifier, but it also allows anonymous communication at a previously unimaginable scale.  I am reminded of Clay Shirky’s observation that we may be in the midst of a huge reorganization of society, similar to what happened after Gutenberg invented the printing press.  It’s a big deal, but not much fun to be part of.  Anonymity can be a horrible thing if misused.  We are seeing this in spades.  You really can have too much information, or have it too unfiltered.  Worse yet than a filter is the propaganda that spins almost everything we read toward some world-view.  The money and talent are largely in the corporate camp, and we are being sold the ideas they favor with a marketer’s talent for shaping our views.

Part of my inability to shake these ideas came in the form of a “longread” recommendation about the 40-year decline in the prospects of the American worker (http://prospect.org/article/40-year-slump), starting in 1974.  After massive growth in equality in the years following World War II, the trend ended in 1974 and began the steady march toward today’s perverse levels of inequality.  This was the same year I had posited as the beginning of the decline in trust for public institutions such as science.  While some institutions such as science, research and government have lost the public’s trust, the corporation has become the centerpiece of society.

Stockholder return has become the benchmark for success.  This stands in stark contrast to the actual health or long-term prospects of a company.  Likewise, government-funded research has become starkly risk averse and commensurately short-term focused.  Society as a whole will suffer the effects of these twin pathologies.  Neither public nor private interests are investing in the future.  Our infrastructure crumbles, and no one even thinks of creating a 21st Century infrastructure because we can’t keep our 20th Century infrastructure healthy.  R&D has become all development, and it has to be attached to the bottom line, preferably the next quarter’s.  This progress-averse corporate principle is adopted whole cloth by the government because of a faith-driven belief that the “market knows best” and that business practice is inherently superior to other approaches.  One might expect that such research would be application- and mission-oriented, but no, it is simply risk averse and pursued because “success,” such as it is defined, is almost a sure thing.

In the process, the “1%” has risen to dizzying heights of wealth, largely through a decision-making process where corporations invest far less capital in their futures and instead pay massive dividends to their stockholders.  The implications for science and society are far reaching.  Is there a connection?  I think it is all part of the same thing.  Any science that makes life difficult or complicated for a corporation is attacked, because the profit motive and margin have become the sole measure of societal success.  The human toll is massive, and our society shows no mercy or thought for the suffering all this wealth accumulation is unleashing.

I am reminded of the problem of confusing correlation with causation, but the seeming coincidence in timing might be more meaningful.  How are these two issues connected?  I’m not sure, but I do think there is a connection.  Perhaps science, being an endeavour for the common good, has lost favor because its benefits can be shared by all rather than a few.  Everything that produces a shared benefit seems to be in decline, and the entire society is being restructured to serve the very top of the food chain.  The wealthy have become apex predators in the human ecosystem, and seem to have no qualms about “feeding” on the ever-expanding masses in poverty.  The loss of trust powers the problem because of the unique nature of the American psyche and its devotion to the individual above all.  Now we are transitioning to a view where the devotion is to the corporation (but really the rich who profit from the corporate “success”).

So as these themes stretch across the expanse of years that defines nearly my entire life, the mystery of how to leave something better for my children remains unsolved.  This is what really haunts me, and why I’m not as thankful as I would like to be.
