This change removes support for earlier LLVM versions; making them
work as well is simply too tedious.
Change-Id: I372a151279ceb2bfd6de101c9e0c15f0a4b18c03
I did this mainly so server and client use a common networking API.
IMO, using Boost::asio results in nicer name-lookup code.
Since no longer needed, I removed the SocketComm stuff.
The client is still synchronous; I see no benefit in having it
asynchronous.
I'm not super happy with the random backoff by the clients if they
can't connect to the server. It makes the code really messy, and 3 retries
is totally arbitrary, as are the backoff windows. I believe launching
the server and clients in the correct order should be handled by a
launch script.
Change-Id: Ifea64919fc228aa530c90449686f51bf63eb70e7
This patch overhauls the FAIL* server code to leverage Boost.Asio so that it can
handle a large number of clients (>4000). In this implementation the server is
now single-threaded. I've not encountered any problems with this for up to
about 10k clients. Boost.Asio can also be used multithreaded, but I assume the
FAIL* internal data structures (Synchronized*) will become a bottleneck first.
The code now additionally depends on Boost.Coroutine and Boost.Context, as well as
a C++14 compiler, although the only C++14 feature required is a lambda capture
with initializer, such as [ x = std::move(x) ]. gcc-4.9.2 supports this.
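For illustration, a minimal sketch of the lambda capture with initializer
mentioned above. The helper make_handler and its buffer argument are
hypothetical and not taken from the FAIL* sources; they only show why a
move-only object needs this C++14 feature to be handed to a callback.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical helper (not part of FAIL*): hands a uniquely owned buffer
// to a callable, as one might do for an asynchronous completion handler.
inline auto make_handler(std::unique_ptr<std::string> buf) {
    assert(buf != nullptr);
    // C++14 init capture: the closure takes ownership of the buffer.
    // A C++11 capture could only copy or reference it, and a unique_ptr
    // cannot be copied.
    return [buf = std::move(buf)]() { return buf->size(); };
}
```

The returned closure keeps the buffer alive for as long as the closure itself
exists, which is the property an asynchronous handler typically needs.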
The code could (and probably should) be cleaned up more. Comments are wordy,
code is unnecessary now (multiple server threads), code is not self-contained
(headers spread dependencies), many ifdef's (server performance measuring
should be runtime rather than a compile time option), and much more. But for
this patch I was going for a minimal changeset to get the functionality in,
to have an easier review. Alas, FAIL* has no Unit-test suite to run the changes
against.
To handle such a large number of clients more changes were necessary, for
example server status output is now performed every 1s, instead of for every
request.
The class Minion was removed completely; the only thing it was doing was
encapsulate an int.
The server now has a runtime-configurable port, or it can select a free port on
its own if none is specified. This requires the CampaignManager to add a port
argument and instantiate the JobServer dynamically.
Change-Id: Iad9238972161f95f5802bd2251116f8aeee14884
- search for libdwarf.h in new locations (e.g., /usr/include/libdwarf/)
- build Bochs with -std=gnu++98 (gnu++14 is default since GCC 6.1)
- specify "proto2" syntax for protobuf messages
- minor build-system and C++ namespace fixes
Change-Id: I16dbc622c797ef8e936fe3c0fb9b03029d27529d
This change removes the hard compile-time dependency from the
performance-improving dedicated listener-list implementation
(core/sal/perf/) to basic watchpoints / breakpoints being enabled in
the cmake config. This allows keeping the CONFIG_FAST_* switches
enabled in practically every experiment.
The primary reason for this change was the recent insight that enabled
breakpoints with disabled CONFIG_FAST_BREAKPOINTS can massively slow
down an experiment even if the latter does not use a single breakpoint
itself.
Change-Id: I5e3f5c1632ed1ee98a3ec887f18b174fa0e15773
Before this change, the GenericExperiment only recorded port 0xe9 output
*after* the fault was injected. When a fault was injected during the
workload's output loop, the output data before that point in time was
missing, and the experiment outcome was wrongly classified as SDC.
This change moves the logging activation to before the fast-forwarding
step (DatabaseExperiment::cb_before_fast_forward). It also makes sure the
DatabaseExperiment only clears its own listeners instead of also touching
the SerialOutputLogger's.
Change-Id: I66bda4ee318d271ddda6f7ade4e817bf9d14cf46
Limit the serial-output logger buffer to prevent overly large memory
consumption in case the target system ends up, e.g., in an endless loop.
The buffer is limited to (golden-run output size)+1 to be able to detect
the case when the target system makes a correct output but faultily adds
extra characters afterwards.
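A minimal sketch of the size limit described above. The class
BoundedOutputLog and the function matches_golden are hypothetical names, not
the actual SerialOutputLogger interface; the sketch only shows why one extra
character beyond the golden-run size is enough.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the serial-output logger (illustrative only).
class BoundedOutputLog {
public:
    // Keep at most (golden-run output size) + 1 characters.
    explicit BoundedOutputLog(std::size_t golden_size)
        : m_limit(golden_size + 1) {}

    // Store one output character; silently drop it once the limit is hit.
    void put(char c) {
        if (m_data.size() < m_limit) m_data.push_back(c);
        assert(m_data.size() <= m_limit);
    }

    const std::string &data() const { return m_data; }

private:
    std::size_t m_limit;
    std::string m_data;
};

// The single extra character makes "correct output plus trailing garbage"
// compare unequal to the golden output, however long the garbage is.
inline bool matches_golden(const BoundedOutputLog &log,
                           const std::string &golden) {
    return log.data() == golden;
}
```

With a limit equal to the golden-run size, a run that prints the correct
output and then keeps printing would be truncated to exactly the golden
output and could not be told apart from a correct run.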
Change-Id: I50c082f8fb09a702d87ab83732ca3e3463c46597
This change prevents an integer overflow in the memory-access listener
for WRITE_OUTERSPACE. Because of the overflow, l_mem_outerspace never
matched in the generic-experiment's "--catch-write-outerspace" mode,
instead of matching all addresses above maxima_data.
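As a hedged illustration of how such an overflow can make a range listener
never match: the sketch below assumes a 32-bit start address and width, and
the function names are hypothetical. It is not necessarily the exact bug or
fix in the FAIL* listener code.

```cpp
#include <cstdint>

// Naive range check: start + width wraps around in 32-bit arithmetic when
// the range is meant to reach the top of the address space.
inline bool matches_naive(std::uint32_t start, std::uint32_t width,
                          std::uint32_t addr) {
    return addr >= start && addr < start + width;
}

// Overflow-safe variant: compare the offset into the range instead of
// computing the (possibly wrapping) end address.
inline bool matches_safe(std::uint32_t start, std::uint32_t width,
                         std::uint32_t addr) {
    return addr >= start && addr - start < width;
}
```

For a start of 0x1000 and a width of 0xFFFFFFFF, the naive end address wraps
to 0x0FFF, so no address can satisfy both conditions, while the safe variant
matches the addresses from 0x1000 upwards.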
Change-Id: I8f4ee4515af3998b7c2a8e83c7a18306c26d8d66
This change adds detection of SDCs to GenericTracing and
GenericExperiment via Bochs's I/O port E9.
Change-Id: Ie036aa97468b45cad94b6c8f73d1ef2d227547b2
This change introduces the ability to inject burst faults to
the DatabaseCampaign/-Experiment and thus to all derived
campaigns/experiments.
Change-Id: I491d021ed3953562bd7c908e9de50d448bc8ef33
Up until now only generic-tracing had the feature to directly
pass an ELF file to the experiment. generic-experiment lacked
that functionality and resorted to using the $FAIL_ELF_PATH
environment variable.
This change introduces the "--elf-file" command line argument
to generic-experiment.
Change-Id: Ie74de9e1781275ab247786856e13e412bac39224
ERIKA Enterprise is an OSEK-conforming embedded RTOS. The supplied tracer
and experiment are similar to the cored-{tracing,tester} experiment, but
check the integrity of the RTOS application in a different
manner. Stacks and stack pointers are located differently in ERIKA. This
experiment was used in Hoffmann et al., RTAS'15.
Change-Id: Idc8d874eb4d4ef15837f903270cfa521bc9514a2
With the instantiate-indirect.ah method, we can choose between different
experiment flows at runtime. This way, we can combine tracing and actual
injection into one fail-client binary. A -Wf,--mode={tester,tracer}
switch hands control to the selected experiment flow.
Change-Id: Ia268489ff6bc74dffea745b7aedcb36e262e8079
For redoing the bench-coptermock-isorc experiment, we have to change the
timeout settings. We now use a soft timeout setting. A soft timeout is
reset after each checkpoint event. If a hard timeout (2 seconds) is
reached although the soft timeout was reset, we also abort the injection.
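The interplay of the two timeouts can be sketched as follows. The class
TimeoutTracker, its method names, and the use of plain integer timestamps
are assumptions for illustration; only the 2-second hard timeout comes from
the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping for a soft and a hard timeout. Timestamps are
// plain integers (e.g. milliseconds); the unit is an assumption.
class TimeoutTracker {
public:
    TimeoutTracker(std::uint64_t start, std::uint64_t soft, std::uint64_t hard)
        : m_soft(soft),
          m_soft_deadline(start + soft),
          m_hard_deadline(start + hard) {
        assert(soft <= hard);
    }

    // Every checkpoint event pushes the soft deadline further out.
    void on_checkpoint(std::uint64_t now) { m_soft_deadline = now + m_soft; }

    // Abort when either deadline has passed; the hard deadline never moves.
    bool expired(std::uint64_t now) const {
        return now >= m_soft_deadline || now >= m_hard_deadline;
    }

private:
    std::uint64_t m_soft;
    std::uint64_t m_soft_deadline;
    std::uint64_t m_hard_deadline;
};
```

A target that keeps hitting checkpoints never trips the soft timeout, but is
still stopped once the fixed hard deadline is reached.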
Change-Id: Ib7c2b1ad201641f47434a11d3273dde797e0012e
The checkpoint plugin is able to use dynamic values from within the
target to calculate its ranges. If the experiment injects faults within
those dynamic values, we will always get a digest error, even if it has
no influence on the outcome of the experiment or the integrity of the
data range.
Therefore, the plugin now provides a possibility to cache those dynamic
values before faults are injected into them. This has to be done
explicitly within the experiment.
Change-Id: Ib2cbedd570ea9ab9c97efc152279e8eb79c573f4
The injection_instr_absolute can be NULL if the trace was imported with
--faultspace-rightmargin R. The database-experiment then aborted the
injection, since a non-present injection instruction is encoded as 0,
which is != 0.
Change-Id: I0abcbf102e8b26678ea574d6f73741c2cfac6781
This option performs a restore to the saved state of the machine immediately
after saving (default: off). This option is needed when the state is used by
other experiments that depend on the trace, which slightly differs without a
restore.
Change-Id: I4fdf4c5e03779bb9c6e0a0fa335ceae3e20608a5
The generic-tracing experiment now supports logging of I/O port accesses to a
file. To use it, the serialoutput plugin needs to be included in the experiment
configuration. Without the --serial-file option specified, logging is disabled.
Change-Id: I9e60d8ffd598ee04a50b4d92fc283f75382d478a
- Add missing iomanip header: Without this one, Fail/gem5 does not
compile.
- Remove unnecessary sal/bochs header: This seems to be a relic from
when the DatabaseExperiment was Bochs-specific.
Change-Id: I91c991795c2c2e76359e9d11415f5119d225a4ab
This change makes MemoryAccessListeners deliver linear addresses
instead of virtual ones deprived of their segment selector. Even in
modern operating systems, segment selectors are still used for, e.g.,
thread-local storage.
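For readers unfamiliar with the distinction, a minimal sketch of how a linear
address relates to a segment-relative one. It assumes 32-bit protected mode
and ignores limit and permission checks; the struct and function names are
hypothetical and do not mirror the Bochs or FAIL* data structures.

```cpp
#include <cstdint>

// Hypothetical representation of a segment-relative memory reference.
struct SegmentedAddress {
    std::uint32_t segment_base;  // base of the segment selected by e.g. FS/GS
    std::uint32_t offset;        // effective address within that segment
};

// Linear address = segment base + offset, truncated to 32 bits.
inline std::uint32_t to_linear(SegmentedAddress a) {
    return a.segment_base + a.offset;
}
```

With a flat segment (base 0) both addresses coincide, which is why the
difference only shows up for segments with a non-zero base, such as those
used for thread-local storage.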
The hooks within MemAccess.ah could maybe be implemented in a simpler
and less fragile way using the BX_INSTR_LIN_ACCESS instrumentation
hook, but this needs more investigation.
Change-Id: I0cee6271d6812d0a29b3a24f34d605a327ced7da
* Removed all command-line options.
* Read all required information from *-traceinfo.txt file or kernel ELF file.
* Record error_corrected (but only in the 'OK' case).
* Add support for multiple variants (similar to the ecos experiment).
Change-Id: I933e52881fc6bee0750d8aaef813fe2539166b06
Due to a bug (most likely a copy and paste issue), the detected-marker
group was defined to point to the "FAIL_marker"-set, which would be
redundant. This commit will correctly map it to the "DETECTED_marker"
group.
Change-Id: I7de688357006ced1adf2423e213ae6633629cb81
The color_assert_port symbol does not exist in all dOSEK variants,
therefore we add the listener only if the symbol exists. Otherwise the
invalid handler will trigger on INV_ADDR.
Change-Id: I7b81940a8413850527efb9e4bae86248794c622c
Use the newly introduced SimulatorController::getCPUCount() instead of
BX_SMP_PROCESSORS to figure out the number of CPUs the back end provides.
Change-Id: I6d6521ae508154366ab5d0c23ddcb6f2de99aa04
This change adds some missing headers needed for compiling the
PandaBoard variant, which seems to not have seen a compiler for a
while.
Change-Id: Ifb54abb4dc676fafc29ecbae97bafaa547fcfc80
This change adapts several experiments, including the
DatabaseExperiment framework, to the restore() behavior update from
the previous change. Existing traces should continue to be usable.
This is not tested yet, mainly because I don't have access to most of
the experiment targets / guest systems necessary for testing. Please
test your own experiments if possible, or at least leave me a note
that you couldn't test it!
Especially the cored-voter/experiment.cc update may be broken, but
maybe the "FISHY" +2 in there was not OK in the first place.
Change-Id: I0c5daeabc8fe6ce0c3ce3e7e13d02195f41340ad
BochsController::restore() now recreates a state that is closer to what
an experiment expects. The state is now the same that save() leaves behind
in its most prominent use case, after hitting a breakpoint. This
change breaks backwards compatibility with some experiments, see
below!
Right after a breakpoint on a specific address fired and
BochsController::save() was called, another breakpoint on that
specific address would not fire again (unless that instruction is
executed again later on).
Up to this change, the situation after calling
BochsController::restore() was different: A breakpoint on that
specific address would fire twice. This difference led to the problem
that running the tracing plugin after save() would work fine
(recording the current instruction once, since 3dc752c "tracing: fix
loss of first dynamic instruction"), but running it after restore()
would record the current instruction *twice*.
This change aligns restore()'s behavior to that of save(). The
implications for existing experiments, traces and results are:
- Existing result data should not be affected at all, as
trace.time1/time2 were correct before this change. Nevertheless,
the assumption time2-time1 >= instr2-instr1 does not hold for
equivalence classes including the first instruction, if the latter
was faultily recorded twice (see below).
- Existing traces that were recorded after a restore() (with a
tracing plugin including the aforementioned commit 3dc752c)
contain the first instruction twice. An affected trace can be
corrected with this command line:
    dump-trace old.tc | tail -n +2 | convert-trace -f dump -t new.tc
- For experiments that record traces after a restore() (such as
ecos_kernel_test), nothing changes, as both the tracing and the
fast-forwarding before the fault injection now see one instruction
event less.
- Experiments that record traces after a save(), especially those
that rely on the generic-tracing experiment for tracing, now see
one instruction event less, before they need to inject their
fault. These experiments need to be adjusted, for example
dciao-kernelstructs now should use bp.setCounter(injection_instr)
instead of bp.setCounter(injection_instr+1).
Change-Id: I913bed9f1cad91ed3025f610024d62cfc2b9b11b
BochsController::save() now can in principle be called multiple times
in a row. Not that this would really make sense, but the results are
consistent now.
Change-Id: Ib4c6eb571a364b0f7ea6142c8cfec004a12f98b3
BochsHelpers.hpp is included by some aspect headers, which are implicitly
included into many (all?) translation units. As the getCPU function, defined
as "static inline", is not used in most TUs, an "unused function" warning
was generated for each of them.
Change-Id: Ibb903fe7a11aaf1f455a626c8bf8b86f50857645
This fixes the resource-leaking "should never happen" case when no
element is found by returning a notfound member. Found by Coverity
Scan, CID 25555.
Change-Id: I9055ae0a3b31e61f3a8e3b098ec5613c3b5535f6
Tracing only the instruction pointer was broken: memory accesses were
always traced as well. Found by Coverity Scan, CID 25495.
Change-Id: Ideb66175865c85bcd48f4b3786d5d8f16810d4f1
The contained state is not used across function boundaries anyway.
Found by Coverity Scan, CID 25689.
Change-Id: I34e42c227710be4859f6d62de9311c4201ed29b0
This most probably is not a real problem, but does not take much work
to fix. Found by Coverity Scan, in several reports.
Change-Id: I8bd12e3f7afeb4b1c4e1b057bdbd95da9aa9211c