I was trying to understand how long supervisord waits between retries. After a few internet searches I came up empty-handed. I pitched the question to other developers at the place I work and none of them seemed to know the answer. I decided to take a few minutes to find out.
If you have the autorestart config option set to unexpected, supervisord will try to restart your process when it exits with an unexpected exit code. When a process fails to start, supervisord retries it 3 times by default. You can override that default by setting the startretries option. The documentation mentions that “Each start retry will take progressively more time”, but it doesn’t say how much time that is.
If a database, network connection, or third-party API were to go down, how long would it have to be down before the process that relies on it switched to the FATAL state? Is it milliseconds? Seconds? Minutes?
It turns out that supervisord restarts the process after 1 second, then after 2 more seconds, then after 3 more seconds. All three retry attempts happen in about 6 seconds. As far as I can tell this is not currently configurable.
| Attempt | Delay |
|---|---|
| First Retry | 1 Second |
| Second Retry | +2 Seconds |
| Third Retry | +3 Seconds |
| Total Time | 6 Seconds |
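The totals above are just a running sum, which can be sketched in a few lines of shell. This assumes the same pattern (retry n waits n seconds) continues for larger startretries values, which I only observed for the default of 3. The function name total_backoff is made up for this example.

```bash
#!/bin/bash
# Sum the delays before each retry, ASSUMING retry n waits n seconds.
# That pattern was only observed for the default of 3 retries.
total_backoff() {
  retries=$1
  total=0
  n=1
  while [ "$n" -le "$retries" ]; do
    total=$((total + n))
    n=$((n + 1))
  done
  echo "$total"
}

for r in 3 5 10; do
  echo "startretries=$r: about $(total_backoff "$r") seconds"
done
```

Under that assumption, raising startretries to 10 would still only buy about 55 seconds before supervisord gives up.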
By default, supervisord treats exit codes 0 and 2 as expected exits.
To test this, I set up a very simple bash script that prints the date and time and then exits with a non-zero exit code. Then I added that script to supervisord and started it. Finally, I looked at the output log file to see the timestamps for each start. Here’s the script I used.
```bash
#!/bin/bash
date
exit 1
```
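For anyone who wants to repeat the test, a program entry along these lines should work. This is a sketch rather than the exact config I used: the program name datetest and every path are placeholders. The option names (command, autorestart, startretries, stdout_logfile) are standard supervisord program options, and the two values shown for autorestart and startretries are the defaults, spelled out for clarity.

```bash
#!/bin/bash
# Write a hypothetical supervisord program entry for the test script.
# The program name and all paths are placeholders, not the original setup.
conf_dir=$(mktemp -d)
cat > "$conf_dir/datetest.conf" <<'EOF'
[program:datetest]
command=/path/to/datetest.sh
autorestart=unexpected
startretries=3
stdout_logfile=/tmp/datetest.log
EOF

cat "$conf_dir/datetest.conf"
# After copying the file to wherever your supervisord.conf [include]
# section points, load it with:
#   supervisorctl reread
#   supervisorctl update
```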
After a few seconds supervisord had already switched the task to FATAL. I looked at the output log file and saw the following.
```
Wed Oct 3 09:35:35 MDT 2018
Wed Oct 3 09:35:36 MDT 2018
Wed Oct 3 09:35:38 MDT 2018
Wed Oct 3 09:35:41 MDT 2018
```
As you can see, it started the process once at 35 seconds. Then it retried 3 times, at 36, 38, and 41 seconds: 1, 2, and 3 seconds after the previous attempt.
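If you would rather not do the subtraction by eye, the gaps can be pulled out of the log with a little awk. This is a minimal sketch that assumes the log is in the default date format shown above and that all attempts fall on the same day, so only the HH:MM:SS field is compared. The function name gaps is made up for this example.

```bash
#!/bin/bash
# Print the seconds between consecutive lines of `date` output.
# Assumes the default `date` format (time in field 4) and a single day.
gaps() {
  awk '{
    split($4, t, ":")
    now = t[1] * 3600 + t[2] * 60 + t[3]
    if (NR > 1) print now - prev
    prev = now
  }'
}

log='Wed Oct 3 09:35:35 MDT 2018
Wed Oct 3 09:35:36 MDT 2018
Wed Oct 3 09:35:38 MDT 2018
Wed Oct 3 09:35:41 MDT 2018'

printf '%s\n' "$log" | gaps
```

For the log above this prints 1, 2, and 3 on separate lines. Against a real log file you would run something like gaps < /path/to/logfile, where the path is whatever stdout_logfile points to in your config.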