LemonBoy 61e9e82bdc std: Make the CRC32 calculation slightly faster
Speed up a little the slicing-by-8 code path by replacing the
(load+shift+xor)*4 sequence with a single u32 load plus a xor.

Before:

```
iterative:  1018 MiB/s [000000006c3b110d]
small keys:  1075 MiB/s [0035bf3dcac00000]
```

After:

```
iterative:  1114 MiB/s [000000006c3b110d]
small keys:  1324 MiB/s [0035bf3dcac00000]
```
2020-09-13 16:32:21 -04:00
..
2020-08-30 21:28:11 -07:00