882d4f381b8af72d8859b7d849c241d9578e24ff
The JobClient currently takes a very long time to actually shut down
after failing to reach the server in sendResultsToServer() (which is,
unfortunately, by far the most likely place in the code to detect
this):
- A different bug (fixed in the previous commit) caused far too many
jobs to be fetched beforehand.
- sendResult() (called after each experiment iteration) noticed
that CLIENT_JOB_REQUEST_SEC seconds had elapsed, and tried to
call home early to send the first results (without planning to
fetch new jobs yet).
- If the server was gone (done, or aborted), the connect in
sendResultsToServer() failed after several retries and timeouts.
- All subsequent calls to sendResult() retried connecting to the
server (again, with retries and timeouts), once for each remaining
job.
- When all jobs were done, getParam() tried to connect one last time,
finally telling the experiment that nobody was home.
This resulted in client shutdown times of up to four hours (for the
default CLIENT_JOB_LIMIT of 1000) after the campaign server
terminated. This change solves the issue by not handing out new
(cached) jobs after the connect failed once, making the experiment
terminate quickly.
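
The behaviour after the fix can be sketched roughly as follows. This is a
minimal illustration, not the actual Fail* code: the class SketchJobClient,
its members, the connect callback and the helper runAgainstDeadServer() are
hypothetical names chosen for this example, and the sketch assumes that every
sendResult() call attempts to report (as if CLIENT_JOB_REQUEST_SEC had already
elapsed).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for the JobClient described above; names and
// signatures are illustrative and not taken from the Fail* sources.
class SketchJobClient {
public:
    // `connect` models one complete connection attempt (including its own
    // retries and timeouts) and returns true on success.
    SketchJobClient(std::function<bool()> connect, std::deque<int> cachedJobs)
        : m_connect(std::move(connect)), m_jobs(std::move(cachedJobs)) {
        assert(m_connect && "connect callback must be callable");
    }

    // Hands out the next cached job. Returns false once the cache is empty
    // or the server has been found unreachable.
    bool getParam(int &job) {
        if (m_serverGone || m_jobs.empty()) {
            return false;
        }
        job = m_jobs.front();
        m_jobs.pop_front();
        return true;
    }

    // Called after each experiment iteration; tries to report results.
    void sendResult() {
        if (m_serverGone) {
            return; // do not pay for retries and timeouts again
        }
        ++m_connectAttempts;
        if (!m_connect()) {
            m_serverGone = true; // remember the failure
        }
    }

    std::size_t connectAttempts() const { return m_connectAttempts; }

private:
    std::function<bool()> m_connect;
    std::deque<int> m_jobs;
    bool m_serverGone = false;
    std::size_t m_connectAttempts = 0;
};

// Runs the experiment loop and returns how many jobs were executed.
inline std::size_t runAgainstDeadServer(SketchJobClient &client) {
    std::size_t executed = 0;
    int job = 0;
    while (client.getParam(job)) {
        ++executed;          // the experiment itself would run here
        client.sendResult(); // a failed connect sets the flag
    }
    return executed;
}
```

With a connect callback that always fails and 1000 cached jobs, the loop in
this sketch runs a single job and makes a single connection attempt; the
remaining cached jobs are never handed out.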
Change-Id: I0d8cb2e084d783aca74c51a503fa72eb2b2eb0b7
This is an import of the old danceos svn repository. The Fail* development started with rev 187, but this git import only contains revisions 956 and newer due to directory structure changes. Imported from external gitsvn checkout. http://www.kernel.org/pub/software/scm/git/docs/howto/using-merge-subtree.html