# Using supervise-daemon

Beginning with OpenRC-0.21 we have our own daemon supervisor,
`supervise-daemon`, which can start a daemon and restart it if it
terminates unexpectedly.

The following is a brief guide on using this capability.

* **Use default start, stop and status functions**.
  If you write your own start, stop and status functions in your service
  script, none of this will work. You must allow OpenRC to use the default
  functions.

* **Daemons must not fork**.
  Any daemon that you would like to have monitored by `supervise-daemon`
  must not fork. Instead, it must stay in the foreground. If the daemon
  forks, the supervisor will be unable to monitor it.

  If the daemon can be configured to not fork, this should be done in the
  daemon's configuration file, or by adding a command line option that
  instructs it not to fork to the `command_args_foreground` variable shown
  below.

## Health checks

Health checks are a way to make sure a service monitored by
`supervise-daemon` stays healthy. To configure a health check for a
service, you need to write a `healthcheck()` function, and optionally an
`unhealthy()` function in the service script. Also, you will need to set
the `healthcheck_timer` and optionally `healthcheck_delay` variables.

### healthcheck() function

The `healthcheck()` function is run repeatedly based on the settings of
the `healthcheck_*` variables. This function should return zero if the
service is currently healthy or non-zero otherwise.
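
As a minimal sketch, a `healthcheck()` function in a service script could test for a file that the daemon is assumed to create once it is ready. The variable `mydaemon_ready_file` and its path are hypothetical and exist only for this illustration; substitute whatever check suits your daemon.

```sh
# Hypothetical path: the daemon is assumed to create this file when ready.
: "${mydaemon_ready_file:=/run/mydaemon/ready}"

healthcheck() {
    # Return zero (healthy) only if the readiness file exists.
    [ -e "$mydaemon_ready_file" ]
}
```

The exit status of the last command in the function becomes its return value, so no explicit `return` is needed.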

### unhealthy() function

If the `healthcheck()` function returns non-zero, the `unhealthy()` function
is run, then the service is restarted. Since the service will be
restarted by the supervisor, the unhealthy function should not try to
restart it; the purpose of the function is to allow any cleanup tasks
other than restarting the service to be run.
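
A possible `unhealthy()` function, sketched under the assumption that the daemon may leave a stale lock file behind, could look like this. The variable `mydaemon_lock_file` and its path are hypothetical; the point is that the function only cleans up and leaves the restart to the supervisor.

```sh
# Hypothetical path: a lock file the daemon is assumed to leave behind.
: "${mydaemon_lock_file:=/run/mydaemon/mydaemon.lock}"

unhealthy() {
    # Clean up only; the supervisor restarts the service itself.
    rm -f -- "$mydaemon_lock_file"
}
```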

## Variable settings

The most important setting is the `supervisor` variable. At the top of
your service script, you should set this variable as follows:

```sh
supervisor=supervise-daemon
```

Several other variables affect the way services behave under
`supervise-daemon`. They are documented in the openrc-run man page, but I
will list them here for convenience:

```sh
command_args_foreground="arguments"
```

This should be used if the daemon you want to monitor
forks and goes to the background by default. This should be set to the
command line option that instructs the daemon to stay in the foreground.
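
Put together, a service script for a supervised daemon might look roughly like the following sketch. The daemon name `mydaemon`, its path and its `--foreground` flag are made up for illustration; `command` is the usual openrc-run variable that names the executable.

```sh
#!/sbin/openrc-run
# Sketch only: "mydaemon", its path and its --foreground flag are hypothetical.

supervisor=supervise-daemon
command="/usr/sbin/mydaemon"
# Passed to the daemon so that it does not fork into the background.
command_args_foreground="--foreground"

# No start(), stop() or status() functions are defined, so OpenRC uses
# its default implementations, as supervision requires.
```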

```sh
healthcheck_delay=seconds
```

This is the delay, in seconds, before the first health check is run.
If it is not set, we use the value of `healthcheck_timer`.

```sh
healthcheck_timer=seconds
```

This is the number of seconds between health checks. If it is not set,
no health checks will be run.
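
For instance, the two health check variables could be combined as in this sketch. The values are arbitrary examples, not recommendations.

```sh
# Example values only, chosen for illustration.
healthcheck_delay=30   # wait 30 seconds before the first health check
healthcheck_timer=10   # after that, 10 seconds between health checks
```

If `healthcheck_delay` were left out here, the first check would also wait 10 seconds, the value of `healthcheck_timer`.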

```sh
respawn_delay=seconds
```

This is the number of seconds to delay before attempting to respawn a
supervised process after it dies unexpectedly.
The default is to respawn immediately.

```sh
respawn_max=x
```

This is the maximum number of times to respawn a supervised process
during the given respawn period.
The default is 10. 0 means unlimited.

```sh
respawn_period=seconds
```

This is the length, in seconds, of the period over which respawns are
counted. It works in conjunction with `respawn_max` and `respawn_delay`
above to decide whether a process should no longer be respawned.

For example, if `respawn_period` is 10 and `respawn_max` is 2, the process
would need to die 3 times within 10 seconds to no longer be respawned.
Note that `respawn_delay` spaces out the respawns, so in the above scenario
a `respawn_delay` of greater than 5 means that 3 deaths can never fall
within 10 seconds, and the process will be respawned indefinitely.

By default, this is unset and `respawn_max` applies to the entire lifetime
of the service.
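
The arithmetic behind the `respawn_period` example can be sketched as a small shell function. This is a simplified model built only from the description in this guide, not the supervisor's actual logic. It assumes the worst case, where the process dies as soon as it is started, so that consecutive deaths are exactly `respawn_delay` seconds apart, and it does not model an unset `respawn_period`. The function name `respawn_limit_reachable` is hypothetical.

```sh
# Hypothetical helper, not part of OpenRC.
# Usage: respawn_limit_reachable DELAY MAX PERIOD
# Succeeds if MAX + 1 deaths can fall within PERIOD seconds when the
# process dies immediately after every start.
respawn_limit_reachable() {
    delay=$1 max=$2 period=$3

    # respawn_max=0 means unlimited, so there is no limit to reach.
    [ "$max" -eq 0 ] && return 1

    # MAX + 1 deaths spaced DELAY seconds apart span MAX * DELAY seconds.
    [ $((max * delay)) -le "$period" ]
}

# The scenario from the text, with a respawn_delay greater than 5.
if respawn_limit_reachable 6 2 10; then
    echo "respawning can stop"
else
    echo "respawns continue indefinitely"
fi
```

With a delay of 6, three deaths span at least 12 seconds, which never fits in the 10 second period, so the limit is never reached.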