LP#1494486: Limit damage caused by dropped drone XMPP sockets

It is apparently possible for drones to get into a state where their XMPP
socket is closed but they don't notice. This is bad because the drone can
continue to receive requests from its listener but can no longer respond
to them. To limit the pain this can cause, we should kill the drone as soon
as we notice this condition.

To avoid the overhead of proactively checking the socket, this commit
detects when a write to the socket returns an error (or raises a signal,
in Perl) and exits immediately. One message will be lost, but the drone
will no longer be a black hole that does nothing but absorb requests it
can never fulfill.
To test
-------
[1] Start an OpenSRF stack and look for a drone process.
[2] Use lsof to identify which socket that drone is using
    to talk to XMPP.
[3] Use gdb to attach to the process and close the socket, e.g.,

    $ gdb -p $PID
    (gdb) p close(11) # or whatever the socket number was
    (gdb) c

[4] Use srfsh to make requests of that service. Eventually, one
    of them will hit the drone.
[5] Sans patch, the request will get handled by the drone, but
    the results will never get sent, and the drone will remain
    available to handle other requests.
[6] With the patch, the drone will exit when it discovers that it
    can no longer write to the XMPP socket.

Signed-off-by: Mike Rylander <mrylander@gmail.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>