It would need to be ported, but the Talkie library for AVR / STM32 / SAMD / ESP, with its roots dating back to the TI Speak & Spell toy, gets a phoneme engine with a good vocabulary into less than 8k. It’s not musical though lol.
The PWM on the CH32V003 is pretty similar to the STM32 implementation, so porting might not take much.
It would be really cool to have a phoneme/text-based vocabulary like Talkie on that little chip! Since it works from text/phonemes, it can have a large vocabulary for such a tiny chip. About 1500 words in an 8k dictionary, plus 2k for code and 4k for the phoneme engine, comes to 14k and would still fit in the 16k of flash it comes with.
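For anyone who wants to check the budget, here is a quick back-of-the-envelope sketch. The sizes are just the estimates from this comment, not measurements of a real Talkie port, and the bytes-per-word figure is derived from them rather than taken from any datasheet.

```python
KIB = 1024

# Estimates from the comment above; none of these are measured values.
FLASH_TOTAL = 16 * KIB      # CH32V003 flash
DICTIONARY = 8 * KIB        # word dictionary
APP_CODE = 2 * KIB          # application code
PHONEME_ENGINE = 4 * KIB    # speech engine
WORDS = 1500                # words assumed to fit in the dictionary

used = DICTIONARY + APP_CODE + PHONEME_ENGINE
spare = FLASH_TOTAL - used
bytes_per_word = DICTIONARY / WORDS

print(f"used {used} of {FLASH_TOTAL} bytes, {spare} spare")
print(f"average dictionary entry: {bytes_per_word:.2f} bytes")
```

That leaves 2k of headroom, and it implies each word has to average about 5.5 bytes, which is the number that would make or break the 1500-word guess.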
Incidentally, there is another RISC-V part from WCH that also features BLE, 200K+ of flash, and 18K of RAM in an ESSOP-10 package (same size as the SOP8 but only 4 GPIO). It’s around $0.41 in Q1. The vocabulary with that would be 20k-plus words with 100k left for code lol.
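A rough sketch of how that scales, assuming the bigger chip stores words at the same density as the 8k / 1500-word estimate (that density is a guess, and `words_that_fit` is just a helper made up for this example):

```python
KIB = 1024

# Assumed density: 1500 words per 8 KiB of dictionary (a guess, not measured).
WORDS_PER_BLOCK = 1500
BLOCK_BYTES = 8 * KIB

def words_that_fit(flash_kib, reserved_kib):
    """Words that fit in the flash left over after reserving space for code."""
    dictionary_bytes = (flash_kib - reserved_kib) * KIB
    # Integer math avoids float rounding on exact multiples.
    return dictionary_bytes * WORDS_PER_BLOCK // BLOCK_BYTES

print(words_that_fit(200, 100))  # exactly 200K of flash, 100K kept for code
print(words_that_fit(220, 100))  # a bit more flash, same 100K kept for code
```

At exactly 200K of flash this works out to a little under 20k words, so the 20k-plus figure leans on the "+" in 200K+.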
It’s just nuts what a dollar will get you these days in that space.