The client now sends results back early (i.e., before all jobs are
done) if the client response time (CLIENT_JOB_REQUEST_SEC) is
exceeded. This ensures that extraordinarily long-running
experiments are reported back before, e.g., the LIDO job timeout
kills the Fail* instance.
Change-Id: I3ada0360ec54b63f80a7008570ca514449720220
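The early-report rule from the message above can be sketched as a small predicate. This is an illustration only, not the Fail* client code: `shouldSendResults`, `kClientJobRequestSec`, and the 30-second value are hypothetical stand-ins for the real CLIENT_JOB_REQUEST_SEC handling.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical stand-in for CLIENT_JOB_REQUEST_SEC; the value is illustrative.
constexpr std::chrono::seconds kClientJobRequestSec{30};

// Decide whether the client should send its finished results now.
// Results go back when the whole job set is done, or early once the
// time spent on the current job set exceeds the limit.
bool shouldSendResults(std::size_t finished, std::size_t total,
                       std::chrono::seconds elapsed,
                       std::chrono::seconds limit = kClientJobRequestSec)
{
    assert(finished <= total);
    if (finished == 0)
        return false;            // nothing to report yet
    if (finished == total)
        return true;             // normal case: the job set is complete
    return elapsed > limit;      // early report: time budget exceeded
}
```

A client loop would call this after each finished job and send the partial result set as soon as it returns true.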
Quoting connect(3posix): "If connect() fails, the state of the socket is
unspecified. Conforming applications should close the file descriptor and
create a new socket before attempting to reconnect."
Change-Id: Ibcdcc0f546560a41009832894659a37947243f2f
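The rule quoted from connect(3posix) above could look roughly like this in a retry loop. `connectWithRetry` and its parameters are hypothetical and assume IPv4/TCP; only the POSIX calls themselves are real.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper: try up to max_attempts times to connect to addr.
// Returns a connected descriptor, or -1 if every attempt failed.
int connectWithRetry(const sockaddr_in &addr, int max_attempts,
                     unsigned delay_sec)
{
    assert(max_attempts > 0);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        // A fresh socket for every attempt: after a failed connect()
        // the state of the old descriptor is unspecified.
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        if (connect(fd, reinterpret_cast<const sockaddr *>(&addr),
                    sizeof(addr)) == 0)
            return fd;           // connected; the caller owns fd
        close(fd);               // never reuse the failed socket
        if (attempt + 1 < max_attempts)
            sleep(delay_sec);    // back off before the next attempt
    }
    return -1;
}
```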
This prevents client and server from being sent a SIGPIPE (and
terminating) when the other side unexpectedly closes the connection.
It is much easier to handle this condition by checking the write()
return value than to do anything smart in a SIGPIPE handler. More
details:
<http://stackoverflow.com/questions/108183/how-to-prevent-sigpipes-or-handle-them-properly>
Change-Id: I1da5bf5ef79c8b7b00ede976e96ed4f1c560049d
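A minimal sketch of the approach described above, assuming the signal is suppressed process-wide with SIG_IGN (the message does not say which mechanism the change uses). `ignoreSigpipe` and `writeAll` are hypothetical names.

```cpp
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <csignal>

// Assumption: SIGPIPE is ignored for the whole process, so a write()
// to a closed connection fails with EPIPE instead of killing us.
void ignoreSigpipe()
{
    std::signal(SIGPIPE, SIG_IGN);
}

// Write the whole buffer. Returns false if the peer closed the
// connection (errno == EPIPE) or another write error occurred.
bool writeAll(int fd, const char *buf, size_t len)
{
    assert(buf != nullptr);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;        // interrupted by a signal, try again
            return false;        // EPIPE and other errors end up here
        }
        buf += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}
```

With this in place the caller only has to check the return value and, for example, drop the connection and reconnect.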
The new throughput is now calculated as:
  0.5 * old throughput + 0.5 * current throughput of the last job set
This prevents excessive variations in the calculated throughput.
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@2079 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
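The averaging rule above, written out as a function. The name `smoothedThroughput` is hypothetical; the 0.5/0.5 weights are the ones given in the message.

```cpp
#include <cassert>

// Blend the previous throughput estimate with the throughput measured
// for the last job set, each weighted 0.5, so that a single unusually
// fast or slow job set moves the estimate only halfway.
double smoothedThroughput(double old_throughput, double current_throughput)
{
    assert(old_throughput >= 0.0 && current_throughput >= 0.0);
    return 0.5 * old_throughput + 0.5 * current_throughput;
}
```

For example, an old estimate of 10 jobs/s and a measured 20 jobs/s give a new estimate of 15 jobs/s.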
A campaign server now tells all clients a unique run ID (the UNIX timestamp
when it was started). This allows us to ignore results from "old" clients
that talked to another server before, and to tell them to die.
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1677 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
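The run ID check described above might be sketched as follows. `CampaignServer`, `Reply`, and `onResult` are hypothetical names and do not reflect the actual Fail* protocol messages; only the idea (start timestamp as run ID, mismatch means "die") comes from the message.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Hypothetical reply codes for a client that reports a result.
enum class Reply { Accept, Die };

class CampaignServer {
public:
    explicit CampaignServer(std::uint64_t run_id) : run_id_(run_id) {}

    // The run ID is the UNIX timestamp at which the server was started.
    static CampaignServer startNow()
    {
        std::time_t now = std::time(nullptr);
        assert(now != static_cast<std::time_t>(-1));
        return CampaignServer(static_cast<std::uint64_t>(now));
    }

    // Sent to every client when it first talks to this server.
    std::uint64_t runId() const { return run_id_; }

    // A result tagged with a different run ID comes from a client that
    // talked to another server before: ignore it and tell it to die.
    Reply onResult(std::uint64_t client_run_id) const
    {
        return client_run_id == run_id_ ? Reply::Accept : Reply::Die;
    }

private:
    std::uint64_t run_id_;
};
```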