Why we don't have 128-bit CPUs

We do, depending on how you count it.

There are two major widths in a processor: the data register width and the address bus width. But even that is not the whole story. If you go back to a processor like the 68000, the classic 16-bit processor, it has:

32-bit data registers
16-bit ALU
16-bit data bus
32-bit address registers
24-bit address bus

Some people called it a 16/32-bit processor, but really it was the 16-bit ALU that classified it as 16-bit.

If you look at a Zen 4 core it has:

64-bit data registers
512-bit AVX data registers
6 x 64-bit integer ALUs
4 x 256-bit AVX ALUs
2 x 128-bit data bus to DDR5 (dual edge 64-bit)
~40-bits of addressable physical RAM

So, what do you want to call this processor?

64-bit (integer width), 128-bit (physical data bus width), 256-bit (widest ALU) or 512-bit (widest register width)? Do you want to multiply those numbers up by the number of ALUs in a core? ...by the number of cores on a piece of silicon?

Me, I'd say Zen4 was a 256-bit core, but you could argue any of the above numbers.

Basically, it's a measurement that lost all meaning so people stopped using it.

I would say that you make a decent argument that the ALU has the strongest claim to the “bitness” of a CPU. In that way, we are already beyond 64 bit.

For me though, what really defines a CPU is the software that runs natively. The Zen4 runs software written for the AMD64 family of processors. That is, it runs 64 bit software. This software will not run on the “32 bit” x86 processors that came before it ( like the K5, K6, and original Athlon ). If AMD released the AMD128 instruction set, it would not run on the Zen4 even though it may technically be enough hardware to do so.

The Motorola 68000 only had a 16 bit ALU but was able to run the same 32 bit software that ran on later Motorola processors that were truly 32 bit. Software written for the 68000 was essentially still native on processors sold as late as 2014 ( 35 years after the 68000 was released ). This was not some kind of compatibility mode; these processors were still using the same 32 bit ISA.

The Linux kernel that runs on the Zen4 will also run on 64 bit machines made 20 years ago as they also support the amd64 / x86-64 ISA.

Where the article is correct is that there does not seem to be much push to move on from 64 bit software. The Zen4 supports instructions to perform higher-bit operations but they are optional. Most applications do not rely on them, including the operating system. For the most part, the Zen4 runs the same software as the Opteron ( released in 2003 ). The same pre-compiled Linux distro will run on both.
At less than a tenth the size, this is actually a better explanation than the article, and it corrects the premise at the very beginning: we do have them.
If you absolutely had to put a bit width on the Zen 4, the 2x128 bit data bus is probably the best single measure totaling 256 bit IMO.
- Even then, at what point do you measure it? DDR interface is likely very much narrower than the interfaces between cache levels. Where does the core end and the memory begin?
I gave up trying to figure out what the "bitness" of a CPU was around the time the Atari Jaguar came out and people described it as 64 bit because it had a 32 bit graphics chip plus a 32 bit sound chip.

It's been mostly marketing bollocks since forever.
- The Jaguar lied with the truth, and I say this as someone who still owns one.
Not to mention most "8-bit" CPUs had a 16 bit address bus.
- Yes, because 256 memory locations is a bit limiting.
So, you're saying it already goes to '11'?
With AMX, now we have 1024 bit processors!
I'm surprised some marketing genius at Intel/AMD hasn't started using the bigger numbers.
- I expect the engineers are telling the marketing people "No! You can't do that. You'll scare everyone that it's incompatible."
Very well said. I think you make your point quite effectively!
I see it as the number of possible instructions.

As in, 8 bit 8085 had 2⁸ possible instructions, 32 bit ones had 2³² and already had enough possible combinations that we couldn't come up with enough functions to fill the provided space.

- So "instruction encoding length".
  
  I don't think that works though. For something like RISC-V, RV64 has a maximum 32-bit instruction encoding. For x86-64 those original 8-bit instructions still exist, and take up a huge part of the encoding space, cutting the number of n-bit instructions to more like 2^(n-7).

Is this a question?

We haven't even come close to exhausting 64-bit addresses yet. If you think the bit number makes things faster, it's technically the opposite.

It's a link to an article I found interesting. It basically details why we're still using 64-bit CPUs, just as you mentioned.
- Comment OP must never learn anything new. Good find.
We don't even have true 64-bit addressing yet. x86-64 uses only 48 bits of a 64 bit address and 64-bit ARM can use anything between 40 and 52 depending on the specific configuration.
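To put numbers on the address widths mentioned in this thread, here is a minimal sketch that just computes 2^bits and prints it in binary units. The helper names `address_space_bytes` and `human` are made up for illustration; the bit widths are the ones quoted in the comments (32, 36 for PAE, ~40, 48, 52, 64).

```python
def address_space_bytes(bits: int) -> int:
    """Number of distinct byte addresses reachable with `bits` address bits."""
    return 2 ** bits


def human(n: int) -> str:
    """Format a byte count using binary units (KiB, MiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"


# Widths quoted in the thread: 32-bit, PAE (36), Zen 4 physical (~40),
# x86-64 virtual (48), the upper ARM figure (52), and a full 64 bits.
for bits in (32, 36, 40, 48, 52, 64):
    print(f"{bits}-bit addresses: {human(address_space_bytes(bits))}")
```

Even the 48-bit case is 256 TiB, which is why nobody is in a hurry to wire up the remaining bits.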
Yeah, 64 bit handles almost all use cases we have. Sometimes we want double the precision (a double) or length (a long), but we can do that without being 128-bit. It's harder to do half. Sure, it'd be slightly faster for some things, but it's not significant.
- And you can get 128-bit data to the CPU, so those things can be fast if we need them to be.
- There's plenty of instructions for processing integers and fp numbers from 8 bits to 512 bits with a single instruction and register. There's been a lot of work in packed math instructions for neural network inference.
Is this a question?

For the people who don't know the answer? Yes.

Not everything you see is intended for your consumption. Let people enjoy learning things.
- I totally agree. I know a teacher who likes to say:
  
  "I believe there really is no such thing as a dumb question. As long as it's an honest question (not rhetorical or sarcastic), then it's a genuine request for more information. So even if it's coming from a place of extreme ignorance, asking a question is an attempt to learn something, and the effort should be applauded."
Is this a question?

Woah, meta.

Yes, it is.

This is not a question, though.

We used to drive bicycles when we were children. Then we started driving cars. Bicycles have two wheels, cars have four. Eight wheels seems to be the logical next step, why don't we drive eight-wheel vehicles?

Lobbying by the auto corporations obviously. More wheels is more better
- So VM? Actually makes sense.
- Huh, I've been in that train. Sudden, random hit of Nostalgia.
Funny how we are moving back to bicycles, as cars aren’t a scalable solution.
- Bus is, though
- But we aren't really.
Some of us drive 18-wheeled vehicles.
- Some of us drive 48-wheeled vehicles.
See here's where this analogy is perfect. Sometimes a bicycle is the best solution, just like how sometimes a microcontroller is the best solution. You use the tool you need for the job, and American product design is creating way too many "smart" products just like how American town planning demands too many cars. Bring back the microcontroller! Bring back the bike!
I mean we do right?

Trains are typically 2 x 4 bogies.

But then high speed rail has fewer wheels due to friction.
https://youtube.com/watch?v=vmRDcXADnbA

32 bit CPUs having difficulty accessing more than 4 GB of memory was exclusively a Windows problem.

You still had a 4GB memory limit for processes, as well as a total memory limit of 64GB. Especially the first one was a problem for Java apps before AMD introduced 64bit extensions and a reason to use Sun servers for that.
- Yeah I acknowledged the shortcomings in a different comment.
  
  It was a duct tape solution for sure.
Interesting! Do you have a link to a write up about this? I don’t know anything about the windows memory manager
- Only slightly related, but here's the compiler flag to disable an arbitrary 2GB limit on x86 programs.
  
  Finding the reason for its existence from a credible source isn't as easy, however. If you're fine with an explanation from StackOverflow, you can infer that it's there because some programs treat pointers as signed integers and die horribly when anything above 7FFFFFFF gets returned by the allocator.
- Intel PAE is the answer, but it still came with other issues, so 64 was still the better answer.
  
  Also the entire article comes down to simple math.
  
  Bits is the number of digits.
  
  So like a 4 digit number maxes out at 9999 but an 8 digit number maxes out at 99 999 999
  
  So when you double the number of digits, the max size available is exponential. 10^4 bigger in this case. It just sounds small because you’re showing that the exponent doubles.
  
  10^4 is WAY smaller than 10^8
- It was actually 3 GB, because operating systems have to reserve parts of the memory address space for other things. Addressing above 4 GB is harder for all 32-bit operating systems; most just implemented the additional complexity much earlier, because Linux runs on large servers and the like. Windows actually had a way to switch over to support it in some versions too, probably the NT kernels that were also running on servers.
  
  A quick skim of the Wikipedia seems like a good starting point for understanding the old problem.
  
  https://en.m.wikipedia.org/wiki/3_GB_barrier
- https://en.wikipedia.org/wiki/Physical_Address_Extension
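The signed-pointer failure mode mentioned above for the 2 GB limit is easy to show. This is only a sketch of the arithmetic, not anything from a real allocator: the helper `as_signed_32` is a made-up name, and it uses Python's `struct` module to reinterpret the same 32 bits as a signed integer.

```python
import struct


def as_signed_32(address: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", address))[0]


# The last address below 2 GB, the first one above it, and one at 3 GB.
for addr in (0x7FFFFFFF, 0x80000000, 0xC0000000):
    print(f"{addr:#010x} -> {as_signed_32(addr)}")
```

Anything at or above 0x80000000 comes out negative, so code that compares or subtracts pointers as signed integers starts misbehaving the moment the allocator hands back an address in the upper half.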
I'm not sure what you are talking about. Linux got PAE in 1999. Windows XP got PAE in 2001.
Not really, Raspberry Pi had that same issue with its 32 bit distros.

The comments on this one really surprised me. I thought the kinds of people who hang out on XDA-developers were developers. I assumed that developers had a much better understanding of computer architecture than the people commenting (who of course may not be representative of all readers).

I also get the idea that the writer is being vague not to simplify but because they genuinely don’t know the details, which feels even worse.

I think it’s a D-tier article. I wouldn’t be surprised if it was half gpt. It could have been summarized in a single paragraph, but was clearly being drawn out to make screen real-estate for the ads.
- The majority of articles I come across are exactly like this, needlessly drawing everything out to maximize word count and, thus, maximize ad space.

Because computers haven't come even close to needing more than 16 exabytes of memory for anything. And how many applications need to do basic mathematical operations on numbers greater than 2^64? Most applications haven't even exceeded the need for 32 bit operations, so really the push to 64bit was primarily to address more than 4GB of memory without slow workarounds.

I know a google engineer who was saying they're having to update their code bases to handle > 16 exabytes of storage, if you can imagine. But yeah, that's storage, not RAM.
- I would kind of enjoy the trouble of needing to store and owning the place for 16 exabytes...
Tons of computing is done on x86 these days with 256 bit numbers, and even 512-bit numbers.
- Being pedantic, but...
  
  The amd64 ISA doesn't have native 256-bit integer operations, let alone 512-bit. Those numbers you mention are for SIMD instructions, which is just 8x 32-bit integer operations running at the same time.
- You can always combine integer operations in smaller chunks to simulate something that's too big to fit in a register. Python even does this transparently for you, so your integers can be as big as you want.
  
  The fundamental problem that led to requiring 64-bit was when we needed to start addressing more than 4 GB of RAM. It's kind of similar to the problem of the Internet, where 4 billion unique IP addresses falls rather short of what we need. IPv6 has a host of improvements, but the massively improved address space is what gets talked about the most since that's what is desperately needed.
  
  Going back to RAM though, it's sort of interesting that at the lowest levels of accessing memory, it is done in chunks that are larger than 8 bits, and that's been the case for a long time now. CPUs have to provide the illusion that an 8-bit byte is the smallest addressable unit of memory since software would break badly were this not the case, but it's somewhat amusing to me that we still shouldn't really need more than 32 bits to address RAM at the lowest levels even with the 16 GB I have in my laptop right now. I've worked with 32-bit microcontrollers where the byte size is > 8 bits, and yeah, you can have plenty of addressable memory in there if you wanted.
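A rough sketch of the last point: if each address names a chunk bigger than one byte, a 32-bit address reaches proportionally more memory. The function name `addressable_bytes` and the chunk sizes of 8 and 64 bytes are assumptions picked for illustration, not figures from the comment.

```python
GIB = 2 ** 30


def addressable_bytes(address_bits: int, unit_bytes: int) -> int:
    """Total bytes reachable when every address names a unit of `unit_bytes`."""
    return (2 ** address_bits) * unit_bytes


# 1 byte per address is the classic model; 8 and 64 are illustrative
# word-sized and cache-line-sized units (assumed values).
for unit in (1, 8, 64):
    total = addressable_bytes(32, unit)
    print(f"32-bit address, {unit}-byte units: {total // GIB} GiB")
```

With 8-byte units, 32 address bits already cover 32 GiB, comfortably more than the 16 GB laptop in the example.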

tell that to playstation2 owners

Dreamcast disagrees. https://www.consoledatabase.com/consoleinfo/segadreamcast/ /s

Was that a marketing thing? Because the SH-4 was only 32-bit AFAIK.
- Only thing I can find is that it has 128-bit graphics-oriented floating-point unit delivering 1.4 GFLOPS.
  
  Probably only for marketing reasons. Everyone was desperate not to be worse than N64.
- Yes.
So cool.

We do. Next question.

Why are we not using them in end-user devices
- We are.
  
  Addressing-wise, no we don't have consumer level 128bit CPUs and probably won't ever need them.
  
  Instructions though, SSE had 128bit vector ops (OR/XOR, MOVE and packed math), AVX is 256bit floating-point vector math, AVX2 extends that to 256bit integer vector math, and AVX512 is, you guessed it, 512bit vector math. AltiVec on PPC had 128bit vectors 20 years ago.
- There's no benefit.

John Mashey wrote about this nearly 30 years ago. This Usenet thread is worth a read.

That would be like 6 minutes abs.

That's crazy. You can't do six. It's seven! SEVEN MINUTE ABS!
What's this in reference to?
- There's Something about Mary (1998)

Uh, the PlayStation 2 would like a word?

Not true 128 bit. It has 128 bit SIMD capabilities, but that’s about it. Probably mostly because of marketing reasons to show how much better it is than N64 (which also is “64 bit” for marketing reasons).
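To make the "128-bit SIMD is not a 128-bit integer" distinction concrete, here is a small simulation in plain Python (no real SIMD involved). It treats a 128-bit value as four independent 32-bit lanes; the names `pack`, `unpack` and `packed_add` are invented for this sketch.

```python
LANE_BITS = 32
LANES = 4                      # 4 x 32 bits = one 128-bit register
LANE_MASK = (1 << LANE_BITS) - 1


def pack(lanes):
    """Pack a list of lane values into one integer, lane 0 in the low bits."""
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & LANE_MASK) << (i * LANE_BITS)
    return value


def unpack(value):
    """Split a packed integer back into its 32-bit lanes."""
    return [(value >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add(a, b):
    """Lane-wise add: each lane wraps on its own, no carry crosses lanes."""
    return pack([(x + y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])


a = pack([1, 2, 0xFFFFFFFF, 4])
b = pack([10, 20, 1, 40])

# SIMD-style add: lane 2 overflows and wraps to 0, lane 3 is unaffected.
print(unpack(packed_add(a, b)))

# A true 128-bit integer add: the carry out of lane 2 spills into lane 3.
print(unpack((a + b) & ((1 << 128) - 1)))
```

The two results differ only in the top lane, which is exactly the carry a packed add throws away.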

In that case, we’re having 512 bit computers now: https://en.wikipedia.org/wiki/AVX-512

So I guess the next step after 64-bit CPUs is the qubit, the quantum bit.

Quantum computers won't displace traditional computers. There's certain niche use-cases for which quantum computers can become wildly faster in the future. But for most calculations we do today, they're just unreliable. So, they'll mostly coexist.
- In other words like GPUs. GPUs suck ass at complex calculations. They however, work great for a large number of easy calculations, which is what is needed for graphics processing.
- Presumably you’d have a QPU in your regular computer, like with other accelerators for graphics etc, or possibly a tiny one for cryptography integrated in the CPU
Probably not in consumer grade products in any foreseeable future.

https://pixelfed.social/p/sedah/710395721014627367

Would it be a downside? Slower? Very costly?

If you made memory access lines twice as wide, they'd take up more space. More space means (a) chips run slower, because it takes time for the electricity to get there (b) they'd be bigger and more expensive.

The main problem with 32-bit, as others have noticed, is that that's not really so much RAM. CPUs do addition and subtraction the way we were taught at school - 'carry the one', they've an overflow bit that's set when your sum doesn't fit in the columns. On 8-bit CPUs, we were always checking back when adding up large numbers. On 64-bit CPUs, we can deal with truly massive numbers anyway, it's not such a hassle. And they're so fast at doing sums anyway and usually waiting for memory, it's barely a hassle.
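The 'carry the one' routine on an 8-bit CPU looks roughly like this. It is a sketch in Python rather than real assembly, and the helpers `add_with_carry_8bit`, `to_bytes_le` and `from_bytes_le` are hypothetical names; the loop mimics a chain of add-with-carry steps, one byte per step.

```python
def add_with_carry_8bit(a_bytes, b_bytes):
    """Add two equal-length little-endian byte lists, one byte at a time."""
    result = []
    carry = 0
    for a, b in zip(a_bytes, b_bytes):
        total = a + b + carry
        result.append(total & 0xFF)   # keep the low 8 bits
        carry = total >> 8            # the "overflow bit" for the next column
    return result, carry


def to_bytes_le(value, length):
    """Split an integer into `length` little-endian bytes."""
    return list(value.to_bytes(length, "little"))


def from_bytes_le(byte_list):
    """Join little-endian bytes back into an integer."""
    return int.from_bytes(bytes(byte_list), "little")


x, y = 50_000, 20_000

# Two bytes (16 bits) are not enough: the sum wraps and the carry flag is set.
total16, carry16 = add_with_carry_8bit(to_bytes_le(x, 2), to_bytes_le(y, 2))
print(from_bytes_le(total16), carry16)

# Four bytes (32 bits) hold the result with no carry out.
total32, carry32 = add_with_carry_8bit(to_bytes_le(x, 4), to_bytes_le(y, 4))
print(from_bytes_le(total32), carry32)
```

A 64-bit CPU does the whole thing in one instruction for anything up to 2^64, so the checking only matters for numbers far larger than most programs ever touch.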

Moving to 128-bit would give us a truly minuscule, probably unmeasurable, benefit in exchange for significant downsides. We could make them, but it would be pointless.
More complexity with barely any (practical) benefits for consumers.

Okay, so why can't we just not use exponentially growing values? Like 96 bit (64 + 32). Is there something intrinsic about the size increases that they HAVE to be exponential? Why not linear scaling? 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, etc.

We can, but it's awkward to do so. By having everything work with powers of 2 you don't need to have everything the same size, but can still pack things in memory efficiently.

If your registers were 48bits long, you can use it to store 6 bytes, or 3 short ints, but only one int with 16-bits going unused. If they are powers of two in size, you can always fit smaller things in them with no wasted space.
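The packing argument can be checked with a few lines of arithmetic. This is just a sketch; `wasted_bits` is a made-up helper, and the item sizes (8, 16, 32 bits) are the byte, short and int from the example above.

```python
def wasted_bits(register_bits: int, item_bits: int) -> int:
    """Bits left unused after packing as many whole items as will fit."""
    if item_bits > register_bits:
        return register_bits
    return register_bits % item_bits


for reg in (48, 64):
    for item in (8, 16, 32):
        count = reg // item
        print(f"{reg}-bit register: {count} x {item}-bit, "
              f"{wasted_bits(reg, item)} bits wasted")
```

The 48-bit register only wastes space on the 32-bit item, while the 64-bit register packs every power-of-two size cleanly.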
- A better example is to explain the chaos of having to go to the grocery store and pick up some hot dogs and buns. You know the pain.
Because CPU registers are all powers of 2, i.e. exponential in this fashion. And it's also just the same reason - 64 is high enough, why go to 96 or 80 or something?
In binary, when you add one more numeric place, things double. Not doubling would be like having two digit decimal numbers but only allowing people to count to 50.

Even the newest "64-bit" CPUs are really just 48-bit (or 36-bit on the low end), or if bleeding edge 56-bit, physical addressing processors. This is the maximum amount of virtual memory a process can have access to. You could memory map all your hard disks and still have room to map more physical memory to VMA.