For a 32-bit Windows application, is it valid to use stack memory below ESP for temporary swap space without explicitly decrementing ESP?
Consider a function that returns a floating-point value in ST(0). If our value is currently in EAX we would, for example:
    PUSH EAX
    FLD DWORD PTR [ESP]
    ADD ESP,4          // or POP EAX, etc
    // return...
Or, without modifying the ESP register, we could just:
    MOV [ESP-4], EAX
    FLD DWORD PTR [ESP-4]
    // return...
In both cases the same thing happens except that in the first case we take care to decrement the stack pointer before using the memory, and then to increment it afterwards. In the latter case we do not.
Notwithstanding any real need to persist this value on the stack (reentrancy issues, function calls between
PUSHing and reading the value back, etc) is there any fundamental reason why writing to the stack below ESP like this would be invalid?
First of all, if you care about efficiency, can't you avoid x87 in your calling convention?
movd xmm0, eax is a more efficient way to return a
float that was in an integer register. (And you can often avoid moving FP values to integer registers in the first place, using SSE2 integer instructions to pick apart exponent / mantissa for a
log(x), or integer add 1 for
nextafter(x).) But if you need to support very old hardware, then you need a 32-bit x87 version of your program as well as an efficient 64-bit version.
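To make the integer-side tricks above concrete, here is a minimal C sketch, assuming `float` is IEEE 754 binary32. The helper names (`float_from_bits`, `bits_from_float`, `next_up_positive`, `unbiased_exponent`) are made up for illustration. On x86 with SSE2, compilers typically turn the `memcpy` bit-casts into a single register move such as `movd`, though that depends on the compiler and flags.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float without going through x87.
   memcpy is the portable way to bit-cast in C. */
float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* The reverse direction: get the raw bit pattern of a float. */
uint32_t bits_from_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Next representable float above x, done as an integer add of 1.
   Assumption: x is finite, non-negative, and below FLT_MAX; negative
   values, infinities, and NaN need separate handling. */
float next_up_positive(float x)
{
    return float_from_bits(bits_from_float(x) + 1u);
}

/* Unbiased exponent of a normal, finite float, picked apart with integer
   shifts and masks (zero and denormals are not handled here). */
int unbiased_exponent(float x)
{
    return (int)((bits_from_float(x) >> 23) & 0xFFu) - 127;
}
```

For example, `float_from_bits(0x3F800000u)` is `1.0f`, and `unbiased_exponent(8.0f)` is `3`.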
But there are other use-cases for small amounts of scratch space on the stack where it would be nice to save a couple instructions that offset ESP/RSP.
Based on other answers and discussion in comments under them:
Microsoft explicitly documents this as not safe (for 64-bit code; I didn't find an equivalent statement for 32-bit code, but I assume there is one):
Stack Usage (for x64)
All memory beyond the current address of RSP is considered volatile: The OS, or a debugger, may overwrite this memory during a user debug session, or an interrupt handler.
So that's the documentation, but the reasons stated don't make sense for the user-space stack. The important part is that they document it as not guaranteed safe, not this bogus reasoning / explanation for the rule.
Hardware interrupts can't use the user stack; that would let user-space crash the kernel with
mov esp, 0, or worse take over the kernel by having another thread in the user-space process modify return addresses while an interrupt handler was running. This is why kernels always configure things so interrupt context is pushed onto the kernel stack.
Modern debuggers run in a separate process, and are not "intrusive". Back in 16-bit DOS days, without a multi-tasking protected-memory OS to give each task its own address space, debuggers would use the same stack as the program being debugged, between any two instructions while single-stepping. But I hope no current debuggers do that now. The Windows calling convention allows debuggers to step on stack memory below RSP, but I see no reason why they would do that or why the convention needs to allow that.
As far as we can tell, what you propose is safe in practice in user-space code on current 32 and 64-bit Windows.
Related: the x86-64 System V ABI (Linux, OS X, all other non-Windows systems) does define a red-zone for user-space code (64-bit only): 128 bytes below RSP that is guaranteed not to be asynchronously clobbered. Unix signal handlers can run asynchronously between any two user-space instructions, but the kernel respects the red-zone by leaving a 128 byte gap below the old user-space RSP, in case it was in use. With no signal handlers installed, you have an effectively unlimited red-zone even in 32-bit mode (where the ABI does not guarantee a red-zone). Compiler-generated code, or library code, of course can't assume that nothing else in the whole program has installed a signal handler.
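As a rough model of what the red-zone guarantee covers, the following C sketch checks whether a scratch access below the stack pointer stays inside a red zone of a given size. The function name `scratch_fits_red_zone` and the macro `SYSV_X86_64_RED_ZONE` are hypothetical; this only models the address arithmetic and does not touch the real stack.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bytes below RSP that the x86-64 System V ABI protects from
   asynchronous clobbering. */
#define SYSV_X86_64_RED_ZONE 128u

/* Model of an access covering [SP - offset_below_sp, SP - offset_below_sp + size).
   Returns true if the whole access lies below SP and within `red_zone`
   bytes of it. Pass red_zone = 0 for an ABI that guarantees nothing
   below the stack pointer. */
bool scratch_fits_red_zone(size_t offset_below_sp, size_t size, size_t red_zone)
{
    return size > 0
        && offset_below_sp >= size        /* does not reach SP or above */
        && offset_below_sp <= red_zone;   /* does not start below the zone */
}
```

Under this model, a 4-byte store to `[RSP-4]` fits the 128-byte System V red zone, while the same store with `red_zone` set to 0 (no guarantee, as in 32-bit code or on Windows) does not.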
So the question becomes: is there anything on Windows that can asynchronously run code using the user-space stack between two arbitrary instructions? (i.e. any equivalent to a Unix signal handler.)
It seems the answer is "no", for current versions of Windows, but there's no guarantee that future Windows won't include a new feature.
Current compilers don't take advantage of space below ESP/RSP on Windows, even though they do take advantage of the red-zone in x86-64 System V (in leaf functions that need to spill / reload something, exactly like what you're doing for int -> x87.)
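A small C leaf function is enough to observe this compiler behavior. The sketch below uses a hypothetical function name, `spill_and_reload`, and a `volatile` local to force a real store and reload. When targeting x86-64 System V, GCC and Clang with optimization enabled would typically address that slot below RSP without adjusting the stack pointer; when targeting Windows, the expectation from the discussion above is that the slot is never placed below the stack pointer. The exact output depends on compiler version and flags, so treat this as something to inspect in the generated assembly rather than a guarantee.

```c
#include <stdint.h>

/* Leaf function: it calls nothing, so no synchronous call can clobber
   its scratch slot. */
uint32_t spill_and_reload(uint32_t value)
{
    /* volatile forces the compiler to store to memory and load back,
       instead of keeping the value in a register. */
    volatile uint32_t slot = value;
    return slot + 1u;
}
```

Compiling with something like `gcc -O2 -S` and looking for where `slot` lives relative to the stack pointer shows which convention the compiler followed.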
Things that you'd think might be a problem in current Windows, and why they're not:
The guard page stuff below ESP: as long as you don't go too far below the current ESP, you'll be touching the guard page and triggering allocation of more stack space instead of faulting. This is fine as long as the kernel doesn't check user-space ESP and find out that you're touching stack space without having "reserved" it first.
kernel reclaim of pages below ESP/RSP: apparently Windows doesn't currently do this. So using a lot of stack space once ever will keep those pages allocated for the rest of your process lifetime, unless you manually
VirtualAlloc(MEM_RESET) them. (The kernel would be allowed to do this, though, because the docs say memory below RSP is volatile. The kernel could effectively zero it asynchronously if it wants to, copy-on-write mapping it to a zero page instead of writing it to the pagefile under memory pressure.)
SEH (Structured Exception Handling: hardware exceptions like divide by zero are delivered somewhat similarly to C++ exceptions): Unless you have a
catch() clause in the current function, SEH will unwind the stack so the current stack frame is discarded. If you do have a
catch() in the current function, this could be the one case where stack memory below ESP is clobbered between any 2 instructions.
APC (Asynchronous Procedure Calls): They can only be delivered when the thread is in an "alertable state", which means only when inside a
call to a function that performs an alertable wait. But
calling a function already uses an unknown amount of space below E/RSP, so you already have to assume that every
call clobbers everything below the stack pointer. Thus these "async" callbacks are not truly asynchronous with respect to normal execution the way Unix signal handlers are. (Fun fact: POSIX async I/O can use signal handlers to run callbacks.)
Console-application callbacks for Ctrl-C and other events (SetConsoleCtrlHandler). This looks exactly like registering a Unix signal handler, but in Windows the handler runs in a separate thread with its own stack. (See RbMm's comment)
And apparently there are no other ways that another process (or something this thread registered) can trigger execution of anything asynchronously with respect to the execution of user-space code on Windows.
Thus it seems that current Windows does have a 4096 byte red-zone below ESP (or maybe more if you touch it incrementally?), but RbMm says nobody takes advantage of it in practice.
Obviously anything that would synchronously clobber it (like a
call) must be avoided, again same as when using the red-zone in the x86-64 System V calling convention. (See https://stackoverflow.com/tags/red-zone/info for more about it.)