It's probably worth mentioning the 2400bps (300 bytes per second) LPC10 codec built into SoX. If you have SoX installed, try
rec -t lpc10 speech.lpc
and then speak into your microphone for ten or fifteen seconds before you ^C it. Then play it back with `play speech.lpc`.
It will sound very robotic but pretty comprehensible, at least with an adult male voice in English, and it preserves a lot of the prosody and enunciation that is so hard to get out of speech-synthesis packages. 12 KiB of data at 300 bytes per second would be about 41 seconds of recorded speech.
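For anyone who wants to redo that arithmetic with a different storage budget, here is a minimal Python sketch. The function name and the 12 KiB budget are just illustrative; the only fact it relies on is the 2400 bps rate mentioned above.

```python
# Rough duration estimate for LPC10 data at a fixed bitrate.
LPC10_BITS_PER_SECOND = 2400  # 2400 bps, i.e. 300 bytes per second


def seconds_of_speech(num_bytes, bits_per_second=LPC10_BITS_PER_SECOND):
    """Return how many seconds of speech fit in num_bytes at the given bitrate."""
    bytes_per_second = bits_per_second / 8
    return num_bytes / bytes_per_second


if __name__ == "__main__":
    budget = 12 * 1024  # 12 KiB of storage for speech data
    print(f"{seconds_of_speech(budget):.2f} seconds")  # prints "40.96 seconds"
```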
Decoding the LPC10 data on the CH32V003 might be tricky. On amd64, running `make CFLAGS=-Os` followed by `ld -r -o tmp.o *.o` inside sox-14.4.2+git20190427/lpc10 yields a tmp.o with 25243 bytes of text (including .rodata, etc.) and 356 bytes of data. I'm not optimistic that a RISC-V build would come out small enough to fit inside the CH32's flash. And I find the code in that directory inscrutable; it's Fortran that's been mechanically translated to C.
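To put a number on that pessimism, here is a rough sketch of the size gap. It assumes the CH32V003 has 16 KiB of flash (my recollection of the part, so check the datasheet), that initialized data also occupies flash, and that the amd64 sizes are a fair stand-in for a RISC-V build, which they may well not be. The function name is made up for illustration.

```python
# Assumption: the CH32V003 has 16 KiB of flash; verify against the datasheet.
FLASH_BYTES = 16 * 1024


def flash_shortfall(text_bytes, data_bytes, flash_bytes=FLASH_BYTES):
    """Return how many bytes the image exceeds flash by (negative means it fits).

    Initialized data is counted too, since its initial values live in flash.
    """
    return text_bytes + data_bytes - flash_bytes


if __name__ == "__main__":
    # Sizes measured from the amd64 build of the SoX lpc10 directory.
    over = flash_shortfall(25243, 356)
    print(f"over budget by {over} bytes")  # prints "over budget by 9215 bytes"
```

And that is before leaving any room in flash for the speech data itself.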
Still, it seems plausible that you could massage the LPC10 data into a format that something like Talkie would understand.