When we want to use a bounded job queue, the job producer and the job
consumer must run in different threads to avoid deadlocking the campaign.
Ideally, this functionality should later move into the CampaignManager
(including the retrieval of existing experiment results and the storage of
new ones).
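The producer/consumer split could look roughly like the following minimal
sketch using standard C++ threading primitives; the BoundedQueue name and
its interface are illustrative, not FAIL*'s actual API:

    #include <condition_variable>
    #include <cstddef>
    #include <mutex>
    #include <queue>
    #include <utility>

    // Illustrative bounded queue: push() blocks while the queue is full,
    // pop() blocks while it is empty.
    template <typename T>
    class BoundedQueue {
    public:
        explicit BoundedQueue(std::size_t capacity) : m_capacity(capacity) {}

        void push(T job) {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_notFull.wait(lock, [&] { return m_queue.size() < m_capacity; });
            m_queue.push(std::move(job));
            m_notEmpty.notify_one();
        }

        T pop() {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_notEmpty.wait(lock, [&] { return !m_queue.empty(); });
            T job = std::move(m_queue.front());
            m_queue.pop();
            m_notFull.notify_one();
            return job;
        }

    private:
        std::size_t m_capacity;
        std::queue<T> m_queue;
        std::mutex m_mutex;
        std::condition_variable m_notFull, m_notEmpty;
    };

If producer and consumer shared one thread, a full queue would block push()
forever (and an empty queue would block pop()), which is the deadlock
mentioned above.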
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1946 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
Synchronize re-sending jobs in sendPendingExperimentData() with modifying
(or, indirectly via getDone() and the campaign, deleting) jobs in the
m_runningJobs queue.
a) sendPendingExperimentData() needs an intact job to serialize and send it.
b) After a job has been moved to m_doneJobs, it may be retrieved and deleted
   by the campaign at any time.
Additionally, receiving a result overwrites the job's contents, which alone
may already break sendPendingExperimentData() (a).
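A minimal, self-contained sketch of the intended synchronization; the
JobTracker type and its members merely stand in for the JobServer
bookkeeping around m_runningJobs and m_doneJobs:

    #include <deque>
    #include <mutex>
    #include <vector>

    // Hypothetical stand-ins for the job bookkeeping, for illustration only.
    struct Job { int id; std::vector<char> payload; };

    struct JobTracker {
        std::mutex mtx;                // guards both containers below
        std::deque<Job*> runningJobs;  // stands in for m_runningJobs
        std::deque<Job*> doneJobs;     // stands in for m_doneJobs

        // (a) Re-sending needs each job intact while it is serialized.
        void sendPendingExperimentData() {
            std::lock_guard<std::mutex> guard(mtx);
            for (Job* job : runningJobs) {
                // serialize and resend *job here
                (void)job;
            }
        }

        // Receiving a result overwrites the job and hands it to the campaign;
        // taking the same lock keeps this from racing with (a).
        void receiveResult(Job* job, const std::vector<char>& result) {
            std::lock_guard<std::mutex> guard(mtx);
            job->payload = result;    // overwrites the job's contents
            doneJobs.push_back(job);  // (b) campaign may delete it at any time
        }
    };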
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1943 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
This modification creates and uses multiple intermediate snapshot states
(one every 1,000,000 instructions) to fast-forward to the FI site.
Unfortunately, this does not work yet; the trace seems to change in many
(not all!) of the cases where we do this. One possible cause could be an
incorrect (off-by-one or similar?) restoration of the serial device timers,
and therefore an earlier/later transition to "output buffer empty",
resulting in eCos' serial putc function needing a different number of
polling-loop iterations. Needs more investigation.
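For illustration, the fast-forward arithmetic assumed here boils down to
restoring the nearest snapshot at or below the FI site and executing only
the remainder (function names are stand-ins, not the actual implementation):

    #include <cstdint>

    // One intermediate snapshot per 1,000,000 instructions.
    static const uint64_t kSnapshotInterval = 1000000;

    // Index of the snapshot taken at or before the FI site.
    uint64_t snapshotIndexFor(uint64_t fi_instruction_offset)
    {
        return fi_instruction_offset / kSnapshotInterval;
    }

    // Instructions left to execute after restoring that snapshot.
    uint64_t remainingInstructions(uint64_t fi_instruction_offset)
    {
        return fi_instruction_offset % kSnapshotInterval;
    }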
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1940 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
For the eCos kernel test campaign we define a relatively "special" metric
to compare FI results from different applications: instead of aligning the
fault-space dimensions (e.g., by artificially adding "don't care" space to
the right for shorter application variants), we only keep the data-address
axis constant. The rationale is that -- despite the benchmark applications'
run-to-completion behavior -- an operating system's scheduler usually runs
ad infinitum, so we can extrapolate from a short application run to any
longer period.
In essence, this means we compare application variants by a
failure/experiment-length quotient (ignoring the Y axis for simplicity, as
it is constant in size) instead of by an absolute failure count. One
important side effect is that this metric does not punish any application
slowdown. (And simply prolonging non-susceptible program sections with,
e.g., NOPs seemingly "improves" fault tolerance.)
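As a sketch, with the experiment length measured in dynamic instructions
(the constant data-address axis cancels out when comparing variants; names
are illustrative):

    #include <cstdint>

    // Comparable metric: failures per unit of experiment length, instead of
    // an absolute failure count.
    double failureQuotient(uint64_t failures, uint64_t experiment_length)
    {
        return static_cast<double>(failures)
             / static_cast<double>(experiment_length);
    }

A variant padded with NOPs grows the denominator without growing the
numerator, which is exactly the side effect described above.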
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1922 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
As we have a global CampaignManager instance in the fail-cpn library, a
JobServer member variable is not such a good idea: before this commit, we
essentially started all JobServer threads (which happens in its
constructor) within a fail-client.
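One way to avoid this, sketched here with simplified names, is to defer
constructing the JobServer (and thus starting its threads) until a campaign
actually runs:

    #include <memory>

    class JobServer {
    public:
        JobServer() { /* starts all JobServer threads */ }
    };

    class CampaignManager {
    public:
        void runCampaign() {
            if (!m_jobServer)
                m_jobServer.reset(new JobServer());  // threads start only here
            // ... distribute experiment jobs via m_jobServer ...
        }
    private:
        std::unique_ptr<JobServer> m_jobServer;  // was: a JobServer member
    };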
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1915 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
Through a few indirections, this broke Bochs' 16550 UART simulation after a
few hundred bytes had been transferred. The UART uses internal timers to
simulate the configured baud rate; these timers cease to work after
repeated misuse by the Fail/Bochs bridge.
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1887 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
This allows single-stepping through REP/REPZ/REPNZ iterations. We mainly
need this for a slightly more realistic timing model when, e.g., copying
large memory areas with REP MOVSB.
Be aware that memory-access tracing only works reliably for REPxx-prefixed
instructions if Bochs was configured with --disable-repeat-speedups, as
this Bochs optimization completely bypasses the usual memory-access paths.
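For illustration, a large-block copy with REP MOVSB, whose iterations can
now be single-stepped, can be written as a single instruction (GCC inline
assembly on x86; the helper function is hypothetical):

    #include <cstddef>

    // Copies n bytes with one REPxx-prefixed instruction; each iteration of
    // the REP loop is an individually steppable event.
    void rep_movsb_copy(void* dst, const void* src, std::size_t n)
    {
        asm volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
    }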
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1885 8c4709b5-6ec9-48aa-a5cd-a96041d1645a