The CRU design is definitely convoluted, but there *are* some instructions dedicated to using it. But like Chuck(G) said, it really depends on your design.
To me it seems like the 9900 was the test-bed for a lot of ideas about CPU design, and unfortunately the designers decided to try them all out in a single CPU. The CRU I/O, no stack pointer, having the general purpose registers stored in external RAM (allegedly to speed up context switching), read-before-write on all memory operations, etc.
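Also, the read-before-write makes memory-mapped I/O more complicated: every write to a device register is preceded by a read of the same address, and that read can trigger side effects on any register that reacts to being read.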
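To make the read-before-write problem concrete, here is a rough toy model in Python. It is not a 9900 emulator. The `Bus` class, the `STATUS_ADDR` address and the read-to-clear status register are all made up for illustration, and the only behavior taken from the above is that a store performs a read of the destination before writing it.

```python
class Bus:
    """Toy 16-bit bus with one memory-mapped device register (hypothetical)."""

    STATUS_ADDR = 0x8000  # assumed address of a read-to-clear status register

    def __init__(self):
        self.ram = {}
        self.status = 0x0001   # pretend the device has a flag pending
        self.control = 0x0000  # last value written to the device
        self.log = []          # bus cycles seen, as ("R" or "W", address)

    def read(self, addr):
        self.log.append(("R", addr))
        if addr == self.STATUS_ADDR:
            value = self.status
            self.status = 0    # side effect: reading clears the flag
            return value
        return self.ram.get(addr, 0)

    def write(self, addr, value):
        self.log.append(("W", addr))
        if addr == self.STATUS_ADDR:
            self.control = value
        else:
            self.ram[addr] = value


def store_plain(bus, addr, value):
    """Store as most CPUs do it: a single write cycle."""
    bus.write(addr, value)


def store_read_before_write(bus, addr, value):
    """Store with a read of the destination first; the value read is discarded."""
    bus.read(addr)
    bus.write(addr, value)


plain = Bus()
store_plain(plain, Bus.STATUS_ADDR, 0x00FF)

rbw = Bus()
store_read_before_write(rbw, Bus.STATUS_ADDR, 0x00FF)

print(plain.log, plain.status)
print(rbw.log, rbw.status)
```

In this model the plain store leaves the pending flag alone, while the read-before-write store clears it as a side effect of a write, so software would never see it. One way around this in hardware is to decode separate addresses for reading and writing a device, so the extra read lands somewhere harmless.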
I have thought a lot about making a home-brew around the 9900 too, since I really like the CPU's assembly language, but they really messed up the hardware implementation. We can fix that now, though.
With an FPGA you can implement the 9900 and correct those little problems, plus run it faster (100MHz 9900, anyone?). I've actually started work on a 9900 core (it seems no one has done that yet), so we'll see how it goes.
If you do make a system around the 9900, please make all the RAM 16-bit and zero wait-state. The 8-bit, wait-stated RAM is probably the single biggest kludge of the 99/4A.