This prevents integer overflows when using addresses > 2GiB, which are
common for x86 operating systems with paging (Linux, Fiasco.OC) or
some test cases on the PandaBoard.
Note that this results in slightly different result table definitions
when automatically translating an experiment's protobuf message in the
DatabaseCampaign.
This change affects all existing protobuf messages to prevent
copy/paste propagation of this issue.
Change-Id: I09ec4b9d45eddd67a7a24c8b101e8b2b258df5e2
This change became necessary as we observed weird fail-client SIGSEGV
crashes with both Bochs and Gem5 backends and different experiments.
Some Fail* components are instantiated statically: the
SimulatorController instance "simulator", containing the
ListenerManager and the CoroutineManager, and the active
ExperimentFlow subclass(es)
(experiments/instantiate-experiment*.ah.in). The experiment(s) is
registered as an active flow in the CoroutineManager at startup.
As plugins (which are ExperimentFlows themselves) are often created on
an experiment's stack, ExperimentFlows deregister themselves on
destruction (e.g., when leaving the plugin variable's scope). The
core problem is, that the creation and destruction order of statically
instantiated objects depends on the link order; if the experiment is
destroyed after the CoroutineManager, its automatic self-deregistering
feature talks to the smoking ruins of the latter.
This change removes all static instantiations of ExperimentFlow and
replaces them with constructions on the heap. Additionally it makes
sure that the CoroutineManager recognizes that a shutdown is in
progress, and refrains from touching potentially already destroyed
data structures when a (mistakenly globally instantiated)
ExperimentFlow deregisters in this case.
Change-Id: I8a7d42fb141222cd2cce6040ab1a01f9de61be24
This change introduces a CMake-style FindMySQL.cmake properly looking for
libmysqlclient_r with mysql_config. This also fixes linking on some
machines.
Change-Id: Ifdbfdc3c7440dead37a8b63aaa86732d636aa0e2
- Don't enforce the join order, MariaDB usually gets this right.
- Update DB statistics before terminating.
Change-Id: If7bbbe146321430d199811062d05b3c179c5732f
We're not yet using the common DatabaseCampaign as it doesn't allow
for additional experiment parameters (such as "variant" and
"benchmark") yet. TODO: Integrate changes in DatabaseCampaign in a
generic way and use it.
Change-Id: I45480003be433654aea8d3a417fbfa66be31155b
When eCos is built as a multiboot binary, some of its data structures
are still at very low (<<1M) addresses, but the rest moves to
addresses >1M. This change makes sure our invalid mem access
detection is not overly generous.
Change-Id: If8265a407b3706a4ff71562b316e05aa22255f62
This allows us to generate prerequisites (traces, state snapshots, memory
maps) in parallel, and without the previous shell script hacks.
Change-Id: I05a0321a794b4033d05eed20f5bffbd1e910cf1b
An experiment talking to a campaign server via the JobClient/JobServer
interface needs the FailControlMessage.proto compiler to run before the
experiment is compiled. A dependency on fail-comm ensures this.
- getSection/getSymbol now returns an ElfSymbol reference.
Searching by address now searches if address is within
symbol address and symbol address + size.
So we can test, if we are *within* a function, object or
section and not only at the start address.
At the moment, the experiment only works with a hard-coded address for the _stext symbol because the ElfReader cannot extract the symbol's address from the elf-binary. The experiment has been tested locally with PREREQUISITES = 1 and PREREQUISITES = 0. In the latter case, the test only considered the client/server communication, i.e., no FI has been performed at all (NOT tested yet).
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@2066 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
The includes of these headers have already been removed from the experiments. In the current code, the content of the header BochsRegister.hpp is rather simply copied to x86/Architecture.hpp. It is therefore necessary to revisit the code soon (especially the FIXME related to register IDs).
Another problem is that there is no generalization of register IDs. Thus, all experiments are currently specific to a concrete architecture (which is not desired).
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@2010 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
When we want to use a bounded job queue, job producer and job consumer must
run in different threads in order not to deadlock the campaign. Ideally
this functionality moves to the CampaignManager later (including the
retrieval of existing experiment results and storing new results).
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1946 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
This modification creates and uses multiple intermediate snapshot states
(one every 1,000,000 instructions) to fast-forward to the FI site.
Unfortunately this doesn't work yet; the trace seems to change in many (not
all!) cases we do this. One possible cause could be an incorrect
(off-by-one or alike?) restoration of the serial device timers, and
therefore an earlier/later transition to "output buffer empty", resulting
in eCos' serial putc function needing a different amount of polling loop
iterations. Needs more investigation.
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1940 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
For the eCos kernel test campaign we define a relatively "special" metric
to compare FI results from different applications: Instead of aligning
the fault-space dimensions (e.g., by artificially adding "don't care" space
to the right for shorter application variants), we only keep the data
address axis constant. The rationale behind this is that -- despite the
benchmark applications' run-to-completion behavior -- an operating system's
scheduler usually runs ad infinitum, and that we therefore can extrapolate
from a short application run to any longer period.
In essence, this means we use a failure/experiment-length quotient (ignoring
the Y axis for simplicity, as it is constant in size) to compare
application variants, instead of an absolute failure count. One important
side effect is that we do not punish any application slowdown with this
metric. (And simply prolongating non-susceptible program sections with,
e.g., NOPs, seemingly "improves" fault tolerance.)
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1922 8c4709b5-6ec9-48aa-a5cd-a96041d1645a