An interesting feature of the AArch64 ABI simplifies this code relative
to other platforms. AArch64 reserves a dedicated register (x8) for the
address of a struct return value, so the receiver and selector stay in
x0 and x1 whether or not the method returns a structure.
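
To make the register point concrete, here is a minimal C sketch of a
struct-returning method implementation. The names `big_rect` and
`rect_imp` are hypothetical, and `void *` stands in for `id` and `SEL`
so the sketch compiles without an Objective-C runtime. The struct is
deliberately 32 bytes of integers: it is too large for registers and is
not a floating-point aggregate, so it is returned through memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-byte aggregate: too big to return in registers and not
   a homogeneous floating-point aggregate, so both AAPCS64 and the x86-64
   System V ABI return it in caller-allocated memory. */
typedef struct {
    int64_t x, y, width, height;
} big_rect;

static_assert(sizeof(big_rect) == 32, "four 64-bit fields, no padding");

/* Hypothetical method implementation with the usual (self, _cmd) prefix.
   AArch64: the caller passes the result buffer's address in x8, so self
   is still in x0 and _cmd in x1, exactly as for a scalar return.
   x86-64 SysV: the hidden result pointer is passed in rdi as if it were
   the first argument, which shifts self to rsi and _cmd to rdx. */
static big_rect rect_imp(void *self, const void *cmd, int64_t scale)
{
    (void)self;
    (void)cmd;
    big_rect r = { 1 * scale, 2 * scale, 3 * scale, 4 * scale };
    return r;
}
```

Because x8 sits outside the argument registers, a single dispatch entry
point can forward both kinds of call on AArch64, whereas on x86-64 a
runtime typically needs a separate stret variant that expects self in
rsi.
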
This saves one load on the message send path for each level of the
tree (2 loads in the common case, 3 if you have a lot of selectors),
which should improve cache usage considerably.
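
A minimal C sketch of a tree lookup in which every node stores its
entries inline, so descending one level costs exactly one load.
Everything here (`entry`, `dtable_lookup`, `dtable_insert`, and 8 bits
of selector index per level) is a hypothetical illustration, not the
runtime's actual dispatch-table code; the `loads` counter only exists to
make the per-level cost visible.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef void (*IMP)(void);

/* Hypothetical geometry: each level consumes 8 bits of the selector index. */
#define LEVEL_BITS  8
#define LEVEL_WIDTH (1u << LEVEL_BITS)
#define LEVEL_MASK  (LEVEL_WIDTH - 1)

/* Entries live directly in each node's array: interior nodes hold child
   pointers, leaf nodes hold IMPs. No per-node header to dereference. */
typedef union entry {
    union entry *child;
    IMP imp;
} entry;

/* Walk a tree of `depth` levels (depth >= 1). Returns NULL for a missing
   selector. *loads counts the table loads performed on the way down. */
static IMP dtable_lookup(entry *root, unsigned depth, uint32_t sel,
                         unsigned *loads)
{
    entry *node = root;
    *loads = 0;
    for (unsigned level = depth - 1; level > 0; level--) {
        unsigned idx = (sel >> (LEVEL_BITS * level)) & LEVEL_MASK;
        node = node[idx].child;   /* one load per interior level */
        (*loads)++;
        if (node == NULL)
            return NULL;
    }
    (*loads)++;                   /* final load fetches the IMP */
    return node[sel & LEVEL_MASK].imp;
}

/* Store `imp` for `sel`, allocating zeroed interior nodes on demand. */
static void dtable_insert(entry *root, unsigned depth, uint32_t sel, IMP imp)
{
    entry *node = root;
    for (unsigned level = depth - 1; level > 0; level--) {
        unsigned idx = (sel >> (LEVEL_BITS * level)) & LEVEL_MASK;
        if (node[idx].child == NULL) {
            node[idx].child = calloc(LEVEL_WIDTH, sizeof(entry));
            assert(node[idx].child != NULL);
        }
        node = node[idx].child;
    }
    node[sel & LEVEL_MASK].imp = imp;
}
```

With this layout a two-level tree resolves a selector in 2 loads and a
three-level tree in 3. If each node instead held a pointer to a
separately allocated array, every level would cost one more load before
reaching the entry.
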
Note: This is a checkpoint commit. Currently, every objc_msgSend()
implementation except for x86-64 is broken.