The JobClient currently waits a LONG time until it really shuts down
after not having reached the server in sendResultsToServer() (which is
unfortunately the by far most probable point in the code to determine
this):
- A different bug (fixed in the previous commit) provoked the
situation that a (way) too large amount of jobs was fetched
before.
- sendResult() (called after each experiment iteration) realized
that CLIENT_JOB_REQUEST_SEC seconds are over, and tried to
prematurely call home to send first results (without planning to
get new jobs yet).
- If the server was gone (done, or aborted), connect in
sendResultsToServer() failed after several retries and timeouts.
- All subsequent calls to sendResult() retried connecting to the
server (again, with retries and timeouts), once for each remaining
job.
- When all jobs were done, getParam() tries to connect a last time,
finally telling the experiment that nobody's home.
This resulted in client shutdown times of up to four hours (for the
default CLIENT_JOB_LIMIT of 1000) after the campaign server
terminated. This change solves the issue by not handing out new
(cached) jobs after the connect failed once, making the experiment
terminate quickly.
Change-Id: I0d8cb2e084d783aca74c51a503fa72eb2b2eb0b7
The client sends results back earlier (i.e., before all jobs are
done) if the client response time (CLIENT_JOB_REQUEST_SEC) is
exceeded. This makes sure that extraordinarily long-running
experiments get reported back before, e.g., the LIDO job timeout
kills the Fail* instance.
Change-Id: I3ada0360ec54b63f80a7008570ca514449720220
Since several jobs can be fetched from the server, it is interesting to
know how much undone jobs are still available. This will accomplished by
the new method getNumberOfUndoneJobs().
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@2041 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
A campaign server now tells all clients a unique run ID (the UNIX timestamp
when it was started). This allows us to ignore results from "old" clients
that talked to another server before, and to tell them to die.
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1677 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
The FailBochs client is not linked by the Bochs build system anymore, but
by our cmake scripts (make fail-client):
- All Bochs libraries are merged into libfailbochs.a (a new target
within the Bochs Autotools scripts).
- The previous libfail.a is *not* a merge of all Fail* libraries anymore,
but pulls these in via library dependencies.
Additionally I did a lot of build system cleanup, e.g. additional external
libraries may now be pulled in where they're needed.
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1390 8c4709b5-6ec9-48aa-a5cd-a96041d1645a