I'm writing an extremely optimized leaf function, and to make it run faster I want to use R13 as a general-purpose register. I preserve R13 by moving it to one of the VFP registers before using it, and I restore it by moving it back before returning from the function. It looks like this:

/* Start of the function */
push { r4 - r12, r14 }
vmov s0, r13
/* Body of the function. Here I use R13
 * as a general purpose register */
vmov r13, s0
pop { r4 - r12, r14 }
bx lr

And it works. But I have read that some operating systems assume that R13 is always used as a stack pointer, and using it as a general purpose register can cause crashes. I should also say that this function is intended to run only on Android (Linux). Thanks!

  • Not a good idea, and there's no real reason to use it for anything other than a stack. If you get an interrupt it is game over, you crash. Wrapping this with protection from interrupts is worse than just wrapping this code with pushing and popping some other register and using that. So just use another register. – old_timer Mar 6 at 15:34
  • Not sure that the r13 register is used for the interrupt stack. Interrupts are odd on ARM. – Robin Davies Mar 6 at 15:53
  • @old_timer: I assume/hope the OP is already using all the other GP registers, or they wouldn't be trying to scrounge 1 more by saving/restoring the stack pointer. – Peter Cordes Mar 6 at 22:08
  • Use the stack then, don't mess with interrupts, especially not on Android, Linux... – old_timer Mar 7 at 0:46
  • Whether or not r13 is used depends on the core and the mode; that is a good point, though... – old_timer Mar 7 at 0:50

Obviously you should only consider this if you're already using all the other GP registers, including lr, and can't shift some of your work to NEON registers, e.g. using packed-integer instructions even if you only care about the low 32 bits.

(Using SIMD regs for more scalar integer is usually only useful if there's an isolated set of values that don't interact with the other values in your algorithm, and you don't need to branch on them or use them as pointers. Transfer between int and SIMD is slow on some ARM CPUs.)

This is very non-standard, and only even possibly safe in user-space, not kernel

If you have any signal handlers installed, your stack pointer must be valid when one of those signals arrives. (And that's asynchronous.)

There's no other async usage of the user-space stack pointer in Linux beyond signal handlers. (Except if you're debugging with GDB and use print foo(123) where foo is a function in the target process.)

As mentioned in comments on Can I use rsp as a general purpose register (the x86-64 equivalent of this question), there's a workaround even for signals:

Use sigaltstack to set up an alternative stack, and specify SA_ONSTACK in the flags for sigaction when installing a handler.
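A minimal sketch of that setup in C, assuming a single-threaded process and a 64 KiB alternate stack. The names `install_altstack_handler`, `on_signal`, `handler_ran_on_altstack` and `ALT_STACK_SIZE` are made up for illustration; only `sigaltstack`, `sigaction` and `SA_ONSTACK` come from the text above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALT_STACK_SIZE ((size_t)64 * 1024) /* assumed size, not a recommendation */

static void *alt_base;                          /* lowest address of the alt stack */
static volatile sig_atomic_t ran_on_altstack;   /* set by the handler */

/* Handler: records whether one of its own locals lives inside the alt stack. */
static void on_signal(int signo)
{
    char marker;
    uintptr_t here = (uintptr_t)&marker;
    uintptr_t lo = (uintptr_t)alt_base;

    (void)signo;
    ran_on_altstack = (here >= lo && here < lo + ALT_STACK_SIZE);
}

/* Registers an alternate stack, then a handler that asks to run on it.
 * Returns 0 on success, -1 on failure. */
int install_altstack_handler(int signo)
{
    stack_t ss;
    struct sigaction sa;

    assert(ALT_STACK_SIZE >= (size_t)MINSIGSTKSZ);

    alt_base = malloc(ALT_STACK_SIZE);
    if (alt_base == NULL)
        return -1;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_base;
    ss.ss_size = ALT_STACK_SIZE;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sa.sa_flags = SA_ONSTACK;   /* deliver this signal on the alternate stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Returns 1 if the last delivered signal ran on the alternate stack. */
int handler_ran_on_altstack(void)
{
    return ran_on_altstack;
}
```

Note that the alternate stack is a per-thread setting, so every thread that runs the SP-clobbering function would need its own `sigaltstack` call.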

As @Timothy points out, if your scratch value of SP could be an integer that happens to "point" into the alt stack, the signal dispatch mechanism will assume this is a nested signal and won't modify SP (because in an actual nested-signal case that would overwrite the first signal handler's still-in-use stack). So you could be one push away from SP going into an unmapped page, unless you allocate twice as much as you need and only pass the top half to sigaltstack. (Maybe just 2k or 4k for simple signal handlers that return after not doing much.)
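To make the "happens to point into the alt stack" condition concrete, here is a rough model of the range test for a downward-growing stack. This is a sketch of the idea, not the kernel's code: the exact boundary handling is an assumption on my part, and `sp_looks_on_altstack` is a hypothetical helper name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if an SP value would be taken as "already on the alternate
 * stack" described by *ss, 0 otherwise. With a downward-growing stack,
 * an SP equal to the top of the region (base + size) counts as inside,
 * while an SP equal to the base does not (assumed boundary behaviour). */
int sp_looks_on_altstack(uintptr_t sp, const stack_t *ss)
{
    uintptr_t base;

    assert(ss != NULL);
    base = (uintptr_t)ss->ss_sp;
    return sp > base && sp - base <= ss->ss_size;
}
```

Any scratch value in SP for which this returns 1 at the moment a signal arrives is the case described above: the handler's frame goes just below that value instead of at the top of the alt stack.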

This should be safe even with nested signals: only the outer-most signal handler can start near the bottom of the alt stack, and use some of the allocated space beyond the actual altstack. Another signal will use space below that, if SP is still within the altstack. Or it will use the top of the altstack if SP has gotten outside the altstack.
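The over-allocation could look something like this, assuming a downward-growing stack (as on ARM) so that the "top half" is the half at higher addresses. `install_padded_altstack` and `HANDLER_STACK_BYTES` are hypothetical names, and the 64 KiB budget is just a placeholder.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

#define HANDLER_STACK_BYTES ((size_t)64 * 1024) /* assumed budget for handlers */

/* Allocates twice the handler budget but registers only the upper half
 * with sigaltstack. The lower half is never reported to the kernel; it
 * exists so a handler that starts near the bottom of the registered
 * range still has mapped memory to grow into.
 * Returns the base of the whole block, or NULL on failure. */
void *install_padded_altstack(void)
{
    char *block;
    stack_t ss;

    assert(HANDLER_STACK_BYTES >= (size_t)MINSIGSTKSZ);

    block = malloc(2 * HANDLER_STACK_BYTES);
    if (block == NULL)
        return NULL;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = block + HANDLER_STACK_BYTES; /* upper half: higher addresses */
    ss.ss_size = HANDLER_STACK_BYTES;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) {
        free(block);
        return NULL;
    }
    return block;
}
```

The handlers themselves would still be installed with SA_ONSTACK as before; this only changes how much memory sits below the registered range.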

Or you can avoid the need for this over-allocation by using SP to hold a pointer to something else that's definitely not the alt stack, if any of your GP registers need to be a pointer. Having it be a valid pointer opens you up to corruption instead of faults if a debugger uses the current SP for something, or if you get the altstack mechanism wrong. But that's just a difference in failure mode: either is catastrophic.

Hardware interrupts save state on the kernel stack, not the user-space stack. If they used the user stack:

  1. user-space could crash the OS by having an invalid SP.
  2. user-space could gain kernel privileges by having another user-space thread modify the kernel's stack data (including return addresses.)

(All user-space threads of a process share the same page table, and can read/write each other's stack mappings.)

Linux/Android is very different from a lightweight RTOS without virtual memory or strict enforcement of privilege separation.

  • As the stack switching is triggered using the stack pointer value, it is necessary to allocate double the required stack size and pass the top half to sigaltstack. – Timothy Baldwin Mar 6 at 17:57
  • @TimothyBaldwin: why would it have to be contiguous with the thread's main stack at all? Or the same size? If your signal handlers are all simple, you might only need a page or two for them to run and make a sigreturn system call to get the kernel to restore the old context. – Peter Cordes Mar 6 at 22:06
  • The OP is using the stack pointer as a general-purpose register, so it may contain any value, including values in the range passed to sigaltstack. Suppose a signal occurs with the stack pointer pointing to the base of the range passed to sigaltstack: in that case the current value of the stack pointer will be used, as the alternative stack is apparently already in use. Therefore there must be enough space below the range passed to sigaltstack for the signal handler. – Timothy Baldwin Mar 12 at 0:26
  • @TimothyBaldwin: Ah I see, I didn't know signal stacks checked if SP was already in that range and if so didn't start from the top of the block. But now that you mention it, obviously nested signals shouldn't do that. – Peter Cordes Mar 12 at 0:46

If a context switch or IRQ triggers while your code is executing, the OS/hardware will probably assume that R13 is the top of the stack (TOS), so it will save it on the assumption that it can restore the TOS when execution resumes.

This might be a problem in your case.

A sensible approach would be to make the piece of code critical and somehow force the system tick/irq to pend until the routine finishes/R13 is restored.

You are probably better off using LR (R14) if you really need the extra register.

  • The OP is running in user-space on Linux. Hardware interrupts save state on the kernel stack, not the user-space stack. If they used the user stack for anything: 1. user-space could crash the OS by having an invalid SP. 2. user-space could gain kernel privileges by having another user-space thread modify the kernel's stack data. (All user-space threads of a process share the same page table, and can read/write each other's stack mappings.) What you say would be true on a lightweight RTOS without virtual memory and separate user/kernel stacks, but not Linux/Android unless I'm very mistaken – Peter Cordes Mar 6 at 22:03
  • You are correct. It was not obvious at first to me that he's doing userspace stuff. – iocapa Mar 7 at 8:28
  • That's a fair point, you couldn't do this in the Linux kernel. Of course, unless you're an Android phone vendor, you don't get to run kernel code. And in case anyone else is wondering, using VFP regs to save/restore integer also implies non-kernel, unless this was inside a kernel_fpu_begin() / kernel_fpu_end() block. – Peter Cordes Mar 7 at 8:43
  • @PeterCordes You did miss an important point here. LR is a much better register to use than SP. In fact, the newer EABI by ARM allows use of LR; you need to annotate your assembler to prevent attempts to trace it; alternate section information can be used to provide tracing info if needed. 'LR' is definitely used as a general-purpose register in places within the ARM Linux kernel. Other OSes, hypervisors, etc. use a banked LR as scratch to bootstrap context switches while the banked SP is used for context stores. – artless noise Mar 7 at 16:56
  • @artlessnoise: I was assuming that the OP was already using LR. It's not an instead, it's an "as well". BTW, did you mean to comment on my answer? I wasn't aware that there was any expectation to not use LR. – Peter Cordes Mar 7 at 20:54
