Bit manipulation instructions are part and parcel of any curriculum that teaches CPU architecture. They are the basic building blocks for many more complex instructions.
https://five-embeddev.com/riscv-bitmanip/1.0.0/bitmanip.html
I can see quite a few items on that list that imnsho should have been included in the core and for the life of me I can't see the rationale behind leaving them out. Even the most basic 8 bit CPU had various shifts and rolls baked in.
This is the reason behind the profiles like RVA23 which include bitmanip, vector and a large number of other extensions. Real chips coming very soon will all be RVA23.
Neat. I can't wait to get my hands on a devboard.
The earlierst I know of coming is the SpaceMit K3, which Sipeed will have dev boards for.
The Milk-V Jupiter 2 (coming out in April) is RV23 too
Nice board but very low on max RAM.
The Milk-V Titan (https://milkv.io/titan) can take up to 64GB which is fine considering the number of cores and the cost of RAM. If you needed and could afford more RAM you'd be better off distributing the work across more than one board.
I simply want to replace my desktop with open hardware. That board would be fine, thank you for the pointer.
Unfortunately they found a bug and had to redesign the boards. I've had one of these on pre-order since last year. Latest is I think they're intending to ship them next month (April).
The SpacemiT K3 (https://www.spacemit.com/products/keystone/k3 https://www.cnx-software.com/2026/01/23/spacemit-k3-16-core-...) is the one everyone is waiting for. We have one in house (as usual, cannot discuss benchmarks, but it's good). Unfortunately I don't think there is anyone reputable offering pre-orders yet.
Ok! I will keep an eye out. It is one of the most interesting developments for me hardware wise in the last decade, and I definitely want to show my support by buying one or more of the boards. Respin is always really annoying this late in, the post mortem on that must make for interesting reading.
You're super lucky to have your hands on one!
32-bit barrel shifters consume significant area and RISC-V was developed to support resource constrained low cost embedded hardware in a minimal ISA implementation.
The 32-bit ARM architecture included a barrel shifter as part of its basic design, as in every instruction had a shift field.
If a CPU built in 1985 with a grand total of 26 000 transistors could afford it, I am pretty sure that anything built in this century could afford it too.
26k is a lot of transistors for an embedded MCU.
You'd be excluding many small CPUs which exist within other chips running very specialized code.
As profiles mandate these instructions anyway, there's no good reason to complicate the most basic RISC-V possible.
RISC-V is the ISA for everything, from the smallest such CPUs to supercomputers.
What MCUs are you thinking of?
To the best of my knowledge (and Google-fu), 26K really isn't a lot of transistors for an embedded MCU - at least not a fully-featured 32-bit one comparable to a minimal RISC-V core. An ARM Cortex M0, which is pretty much the smallest thing out there, is around 10K gates => around 40K transistors. This is also around the same size as a minimal RISC-V core AFAICT.
The ARM core has a shifter, though.
There's reason RV32E and RV64E, with half the registers, are a thing. RV32I/RV64I isn't small enough.
There are many chips in the market that do embed 8051s for janitorial tasks, because it is small and not legally encumbered. Some chips have several non-exposed tiny embedded CPUs within.
RISC-V is replacing many of these, bringing modern tooling. There's even open source designs like SERV that fit in a corner of an already small FPGA, leaving room for other purposes.
Per https://en.wikipedia.org/wiki/Transistor_count, even an 8051 has 50K transistors, which reinforces my claim that 26K really doesn't seem like a big ask for an MCU core. Whether that means a barrel shifter is worth it or not is a totally orthogonal question, of course.
(Although I do have to eat my words here - I didn't check that Wikipedia page, and it does actually list a ~6K RISC-V core! It's an experimental academic prototype "made from a two-dimensional material [...] crafted from molybdenum disulfide"; I don't know if that construction might allow for a more efficient transistor count and it's totally impractical - 1KHz clock speed, 1-bit ALU, etc. - for almost any purpose, but it is technically a RISC-V implementation significantly smaller than 26K)
I don't know if that construction might allow for a more efficient transistor count and it's totally impractical - 1KHz clock speed, 1-bit ALU, etc. - for almost any purpose, but it is technically a RISC-V implementation significantly smaller than 26K
That sounds like a microcoded RISC-V implementation, which can really be done for any ISA at the extreme expense of speed.
If I'm not mistaken, microcode is a thing at least on Intel CPU's, and that is how they patched Spectre, Meltdown and other vulnerabilities – Intel released a microcode update that BIOS applies at the cold start and hot patches the CPU.
Maybe other CPU's have it as well, though I do not have enough information on that.
> There's reason RV32E and RV64E, with half the registers, are a thing. RV32I/RV64I isn't small enough.
This is actually kind of counter to your point. The really tiny micro-controllers from the 80s only had 224 bits of registers. RV32E is at least twice that (16 registers*32 bits), and modern mcus generally use 2-4kbs of sram, so the overhead of a 32 bit barrel shifter is pretty minimal.
IIUC this is a lot less true in the modern era. Even with 24nm transistors (the cheapest transistor last time I checked), modern microcontrollers have a fairly big transistor budget for the core (since 80+% of the transistors are going to sram anyway).
It was the case even 15 years ago when Cortex M0/M3 really started to get traction, that the processor area of ARM cores was small enough to not make a difference in practice.
You can save a lot of silicon by doing 8 or 16 bit shifters and then doing the rest at the code generation level. Not having any seems really anemic to me.
Yeah I don’t get it. Shifts and rolls are among the simplest of all instructions to implement because they can be done with just wires, zero gates. Hard to imagine a justification for leaving them out.