Consider the following loop:
for i in range(20): if i == 10: subprocess.Popen(["echo"]) # command 1 t_start = time.time() 1+1 # command 2 t_stop = time.time() print(t_stop - t_start)
“command 2” command takes systematically longer to run when “command 1” is run before it. The following plot shows the execution time of
1+1 as a function of the loop index
i, averaged over 100 runs.
1+1 is 30 times slower when preceded by
It gets even weirder. One may think that only the first command run after
subprocess.Popen() is affected, but it is not the case. The following loop shows that all commands in the current loop iteration are affected. But the subsequent loops iterations seem to be mostly OK.
var = 0 for i in range(20): if i == 10: # command 1 subprocess.Popen(['echo']) # command 2a t_start = time.time() 1 + 1 t_stop = time.time() print(t_stop - t_start) # command 2b t_start = time.time() print(1) t_stop = time.time() print(t_stop - t_start) # command 2c t_start = time.time() var += 1 t_stop = time.time() print(t_stop - t_start)
Here’s a plot of the execution times for this loop, average over 100 runs:
- We get the same effect when replacing
subprocess.Popen()(“command 1”) with
time.sleep(), or rawkit’s libraw C++ bindings initialization (
libraw.bindings.LibRaw()). However, using other libraries with C++ bindings such as libraw.py, or OpenCV’s
cv2.warpAffine()do not affect execution times. Opening files don’t either.
- The effect is not caused by
time.time(), because it is visible with
timeit.timeit(), and even by measuring manually when
- It also happens without a for-loop.
- This happens even when a lot of different (possibly CPU- and memory-consuming) operations are performed between “command 1” (
subprocess.Popen) and “command 2”.
- With Numpy arrays, the slowdown appears to be proportional to the size of the array. With relatively big arrays (~ 60 M points), a simple
arr += 1operation can take up to 300 ms!
Question: What may cause this effect, and why does it affect only the current loop iteration?
I suspect that it could be related to context switching, but this doesn’t seem to explain why a whole loop iteration would affected. If context switching is indeed the cause, why do some commands trigger it while others don’t?
my guess would be that this is due to the Python code being evicted from various caches in the CPU/memory system
perflib package can be used to extract more detailed CPU level stats about the state of the cache — i.e. the number of hits/misses.
I get ~5 times the
LIBPERF_COUNT_HW_CACHE_MISSES counter after the
from subprocess import Popen, DEVNULL from perflib import PerfCounter import numpy as np arr =  p = PerfCounter('LIBPERF_COUNT_HW_CACHE_MISSES') for i in range(100): ti =  p.reset() p.start() ti.extend(p.getval() for _ in range(7)) Popen(['echo'], stdout=DEVNULL) ti.extend(p.getval() for _ in range(7)) p.stop() arr.append(ti) np.diff(np.array(arr), axis=1).mean(axis=0).astype(int).tolist()
2605, 2185, 2127, 2099, 2407, 2120, 5481210, 16499, 10694, 10398, 10301, 10206, 10166
(lines broken in non-standard places to indicate code flow)