- Add cpuid / getXCR0 functions for the cbe to use instead of asm blocks
- Don't cast between 128 bit types during truncation
- Fixup truncation to use functions for shifts / adds
- Fixup float casts for undefined values
- Add test for 128 bit integer truncation