The Regularized Singularity

~ The Eyes of a citizen; the voice of the silent

Watching Regression of Progress Live

Thursday, January 30, 2025

Posted by Bill Rider in Uncategorized


Tags

ai, artificial-intelligence, machine-learning, tech, technology

tl;dr

When I examine the arc of my career, I see a repeated over-emphasis on computers at the expense of balanced progress. The first epoch came in the mid-1990s, when buying fast computers was sold as a replacement for nuclear testing. We are now in the midst of a longer period of focusing only on computers (the Exascale project). This is a simple narrative, and it sells. It is also wrong. Progress depends on support for a broader set of activities, and without that support progress is diminished. We are witnessing this again with AI via DeepSeek’s revelations this week.

“Don’t let the noise of others’ opinions drown out your own inner voice.” ― Steve Jobs

Progress is not Guaranteed

I’ve devoted my life to science and its progress. The promise of doing exactly that was the reason I joined Los Alamos back in 1989. Its track record in science was stellar, powered by a pantheon of science superstars. I was honored and lucky to join them. At that time Los Alamos still had a marvelous spirit of discovery, and I benefited from a fantastic generosity among my fellow scientists, who shared their knowledge with me. It made me who I am and shaped my career. The other centerpiece of my career, there and afterward, has been stockpile stewardship: the care and understanding of the nuclear weapons stockpile using science and engineering. A big part of this program is the use of modeling and simulation as a tool so that testing nuclear weapons is unnecessary.

A big part of modeling and simulation is computers. The conventional wisdom is that the bigger, or rather the faster, the computer, the better; faster computers are “always” better. That “always” comes with caveats, and the conditions behind it are subtle and complex. Being subtle and complex, they are ignored, and ignored at our peril. Even science has succumbed to the superficial nature of today’s society. In a nutshell, many technical fundamentals need to be in place for “bigger and faster is better” to hold. Today those fundamentals are at risk, because they are ignored with astounding regularity.

To review, one of the key aspects of stockpile stewardship when it was initiated in the mid-1990s was simulating nuclear weapons. The original approach was basically to put computer codes on the fastest computers in the world. The program arrived exactly as high-performance computing shifted from Cray vector machines to massively parallel computing, which meant rewriting the codes for the new type of computer. Replacing the old codes (deemed legacy codes) was difficult because those codes had been used to design and analyze weapons during the test era and had been tested repeatedly against difficult experiments. The legacy codes were trusted and ably used, so they were held onto and revered almost as sacred. It took more than a decade to replace them, and it was a mighty struggle.

The program did not explicitly set out to produce better codes with better methods, algorithms, or physics. Nonetheless, some of this happened, because methods, algorithms, and physics make a big difference in modeling quality. It happened almost sub rosa, since “modernized” codes really only meant codes running on modern computers.

Simple Narratives Win

“Physics is to math what sex is to masturbation.” ― Richard Feynman

The reason for this is easy to see. A faster computer is obviously better than a slower computer. Speed makes for an easy narrative about improvement, and we live in an age where simple narratives rule. The public and politicians alike seem to recoil from complex and subtle explanations or solutions. In the wake of this trend, we see a loss of effectiveness and a massive waste of resources. The constant din of the simple solution is reliance on computing power alone to carry progress. This did not make sense in the era when Moore’s law was in effect. It makes even less sense now that Moore’s law is dead.

Moore’s law was an empirical law about the growth of computing power over a long period. It was first observed by Gordon Moore in 1965 and held until around 2015. It was a powerful exponential law, with computing power doubling every 18 months to 2 years. Doing the math over that 50-year period yields a cumulative factor of a million or more, phenomenal speed-ups in computing. Physical limits of computing hardware eventually slowed and then ended Moore’s law; computers are still speeding up, but much more slowly now. The government’s funding response was then to focus even harder on computing, which is mind-blowing. The money was applied to try to bring the dead patient back to life, producing the National Exascale program with its focus on computing hardware.
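To put the compounding in perspective, here is a minimal sketch of the arithmetic (my own illustration, not from the original post). It takes the 18-month-to-2-year doubling cadence and the roughly 50-year span quoted above at face value; the exact cumulative factor depends on which doubling interval you assume.

```python
# Minimal sketch of the compounding behind Moore's law.
# Assumes the doubling cadence quoted above (18 months to 2 years) and a
# roughly 50-year run (1965-2015); the exact factor depends on the cadence.

def cumulative_speedup(years: float, doubling_period_years: float) -> float:
    """Total growth from repeated doubling over a span of years."""
    return 2.0 ** (years / doubling_period_years)

for period in (1.5, 2.0):  # doubling every 18 months vs. every 2 years
    print(f"doubling every {period} yr for 50 yr: ~{cumulative_speedup(50, period):.1e}x")
```

Either way the compounded gain is well beyond a millionfold, which is the scale of free improvement that has now disappeared.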

Why is this so dumb?

Over the long history of computational science, other advances have been as beneficial as computing power. In a nutshell, algorithmic advances have contributed more improvement than computing itself. In the modern era, the focus on and support for algorithmic advances have slowed to a trickle. Even in the early years, algorithmic work received less support while having an equal or greater impact on computing capability; the support never reflected the value of the approach. The reasons will be explored next.

“Creativity is intelligence having fun.” ― Albert Einstein

Algorithmic advances never produced the sort of steady improvement that Moore’s law did. They tend to be episodic and unpredictable, and modern project management is not suited to such things. Progress is often fallow for long periods, then arrives in a sudden advance. In a low-trust environment, this is unacceptable. Algorithmic research is extremely risky, and risk is exactly what low trust annihilates. The long-term impact of the failure to invest in algorithmic work is a profound and massive reduction in computing benefits. I would argue that we have lost orders of magnitude of computing ability through a lack of investment in algorithms alone.
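To make the “orders of magnitude” claim concrete, here is a toy illustration of my own (not from the post): the textbook progression of algorithms for solving a model Poisson problem, a standard yardstick in computational science, set against a hypothetical 1,000x hardware speed-up. The problem size and the 1,000x figure are assumptions chosen purely for illustration, and constant factors are ignored.

```python
# Toy comparison: asymptotic operation counts for successive Poisson-solver
# algorithms on N unknowns, versus a hypothetical 1000x faster computer.
# Constant factors are ignored, so read these as order-of-magnitude estimates.
import math

N = 1_000_000  # e.g., a 1000 x 1000 grid

ops = {
    "dense Gaussian elimination": N**3,
    "banded Gaussian elimination": N**2,
    "optimal SOR": N**1.5,
    "FFT-based fast solver": N * math.log2(N),
    "multigrid": N,
}

baseline = ops["dense Gaussian elimination"]
for name, count in ops.items():
    print(f"{name:28s} ~{count:.1e} ops  ({baseline / count:.1e}x over dense)")

# A computer that is 1000x faster buys exactly 1000x, tiny next to the
# ~1e12x gap between dense elimination and multigrid at this problem size.
```

At this modest problem size the algorithmic gains dwarf anything hardware alone can deliver, which is exactly the kind of progress that goes missing when algorithm research is starved.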

Reality is Complex

As I was writing, news broke of DeepSeek, the Chinese AI chatbot that shattered the narratives around LLMs. The assumptions that American companies and the government held about advancing AI were overturned.

Why did this happen?

Our actions as a nation forced the Chinese to adapt and innovate. American companies were following the path of brute force, exactly like computational science’s exascale program. The warning signs have been brewing for a while: LLMs are running out of data to scale on, we are running out of computing power, the energy demands are huge and excessive, AI is starting to look like a threat to the environment, and training LLMs is inefficient and very expensive. In a nutshell, the brute-force approach was about to collapse as a source of progress. This was predictable.

This problem was ripe for disruption and a bolt from the blue. DeepSeek looks like that bolt.

I will return to this. The narrative applies to our approach to computational science more broadly: we have an overreliance on brute force while discounting and ignoring the power of innovation and efficiency. The lessons embedded in the unfolding DeepSeek episode have been present and obvious for years. But obvious does not mean acknowledged, and obvious does not compel our leaders to action. The overreliance on computing power is a simple narrative that is hard to dislodge unless it smacks us in the face.

Here is the truth and a lesson worth holding close. Reality is a real motherfucker. Reality is complex and dangerous. Reality will eventually win every battle. Reality is undefeated. Reality will fuck you up. This is the maxim of “fuck around and find out.”

There are many paths to progress. As the DeepSeek example shows, when people are denied the obvious path, they innovate. In computational science in the USA, we have made computing power the obvious choice, while other paths are ignored and systematically divested from; in some cases, paths to progress are explicitly removed from possibility. This looks like doubling down on the current approach even as evidence piles up that it is stupid. The worst part is the outright refusal to learn from the past. We ought to know better, because the evidence is overwhelming.

Balance and Opportunity

“The formulation of the problem is often more essential than its solution, which may be merely a matter of mathematical or experimental skill.” ― Albert Einstein

Over the long run, computational science has progressed on many fronts. There is no doubt that raw computing power is part of the reason. Computers are tangible and obvious signposts of progress, and the various eras of computational science are clearly marked by the computers used to do the science: early machines were far different from vector Crays or massively parallel computers, and massive GPU machines and data centers are emblematic of today. Models, algorithms, and computer codes are far more abstract and less obvious to the casual observer. Nonetheless, these abstract aspects of progress are essential, and perhaps more important.

The models encoded in the codes are solved using algorithms that harness computer power. The computers are useless without them. A bad algorithm assures that the computer itself is used ineffectively and wastes time and energy. Almost the entire utility of modeling and simulation is bound up in the modeling, hence its importance. Computer code has become an important part of modern life, with an entire discipline devoted to it; at least the code gets some attention.

The problem with models and algorithms is twofold. Above I focused on their abstract nature: being abstract, they are difficult to understand. The second, more difficult issue is the episodic nature of their progress. Improving models and algorithms requires difficult theoretical work that is highly prone to failure, and improvements are often many years apart with extensive failure in between. Yet when modeling or algorithms do improve, the leap in performance is large and essentially discontinuous, a quantum leap as opposed to the incremental, steady climb of Moore’s law. Risk-averse program managers wanting predictable outcomes recoil from this. As a result, work in this area is not favored, and years of failure are punished rather than seen as laying the ground for glorious success. All of this chokes off progress in these areas.

The damage to potential progress is massive. Rather than taking a balanced approach, we put all our effort into incremental computing growth, while the equal or greater source of progress is ignored because we do not know how to manage it. Computer codes move along, adapted to new computers but encoding old models and algorithms. These new codes would nominally be perfect vehicles for introducing new algorithms and models, yet more often than not they simply reimplement the old ones. In many cases, we get the same wrong answers with poor efficiency at a greater cost. This is nothing short of a tragedy.

On occasion, we get a peek at these things. DeepSeek is one such view, and it was a shock. Suddenly we saw that everything we thought and had been told about LLMs was suspect. The reason is our acceptance of the narrative that the quality of LLMs is built on massive data and computing. The breakthrough we saw a couple of years ago was powered by an algorithm (ChatGPT and LLMs were enabled by the “Transformer” algorithm). After that, we were lulled into seeing the progress as deriving from raw computational power alone. Plus, it was great for NVIDIA stock and our 401Ks. It did not spur investment in what actually drove the progress.

The algorithm was the actual “secret sauce.”

“The measure of intelligence is the ability to change.” ― Albert Einstein

Why are we such idiots? Why do we make the same mistakes over and over? We watch the focus on computing rise while everything else fades. We learn nothing from the past. Reality is coming for us again.

“We can only see a short distance ahead, but we can see plenty there that needs to be done.” ― Alan Turing
