Commit Graph

945 Commits

Author SHA1 Message Date
805bede338 util: LLVM code cleanups
Among others, rename instr_info to instr to avoid shadowing the class
member with the same name.

Change-Id: I53d2ee08f11a944528931bf8cb4003ec64391016
2018-09-03 14:14:27 +02:00
527763e87f JobServer: remove "come again" diagnostic
The "--[Server] No workload, come again..." appears every time a
larger job set is loaded from the database, once for every client that
knocks.  This isn't helpful and scrolls out relevant information,
hence I'm removing it for now.

Change-Id: Ic7ca5b3a0c096b384ba4803df5b482a96bf803b1
2018-08-27 20:20:53 +02:00
8426084e5a CampaignManager: avoid parameter-name clash
The -p parameter is already being used by several campaign servers for
the prune method to restrict to (which was broken in commit
6c120004e), hence allow only --port to choose a different server TCP
port at runtime.

Change-Id: Ia30e40d564e85a9702118dc28df4988ec628e491
2018-08-27 15:08:17 +02:00
d11579db30 GenericTracing/-Experiment: fix missing dependency
Change-Id: Iec285afbd3315b3fb124e97a9ce0fb10b60e6f52
2018-08-09 10:59:27 +02:00
3a47b20df2 JobServer: use steady_clock for interval measurement
std::chrono::system_clock is not monotonic, instead use
std::chrono::steady_clock for interval measurements.

Change-Id: I231affecfe8e89481720e47b59132fc838cdf73c
2018-08-03 22:00:23 +02:00
a547b0d5b4 JobServer: print completion percentage and ETA
If the JobServer is provided a total number of experiments by the
campaign, it now prints a completion percentage and an estimated
remaining runtime along the usual progress reports.

Change-Id: Ibd781ba8bff9af3a85683bbd29728216e316da57
2018-08-03 19:53:45 +02:00
f89794329c JobServer: progress-report overhaul
The JobServer progress-report output now shows the total number of
completed jobs instead of the (almost always zero) inbound queue fill
level.  Additionally, the current number of incoming results per
second is shown, which also prepares for an ETA calculation in the
following commit.

Change-Id: I6b71c45f44b9e6b9b17c059959a90068b51c165c
2018-08-03 19:51:07 +02:00
453a6efe0b GenericExperiment: command-line --help overhaul
Change-Id: I8eff38043efcbeef0026c7a26dd6cc14fa6af673
2018-08-01 14:32:58 +02:00
a256e1c5af GenericExperiment: optionally continue if symbol not found
When prefixing a symbol name with '?', the GenericExperiment does not abort
in case the symbol is not found in the provided ELF binary:

fail-client -Wf,--detected-marker=?eddiErrorHandler
[...]
[GenericExperiment] ELF Symbol not found, ignoring: eddiErrorHandler

Change-Id: Iec12416ce8e38ff0ee1704e3a725c2cadc97b756
2018-08-01 14:19:05 +02:00
1c774ce50d JobClient: fix retry delay
Only wait for the retry delay if really retrying.

Change-Id: If12bd3745c799edc5933874d9a44d049646e0e87
2018-08-01 14:19:05 +02:00
00882f98ad JobClient: resolve endpoint only once
The JobClient now resolves the server IP once (lazily, when needed) instead
on each connect attempt, reducing the amount of DNS requests sent out.

Change-Id: I9804048d3252da333cb3addbe94a01fdf3c707c8
2018-07-31 12:33:52 +02:00
ad0640cedd GenericExperiment: fix output formatting
Change-Id: I42c49fbeb15cdebd3f77124554efb8c1f40f429f
2018-07-31 12:33:48 +02:00
742ec092eb DatabaseExperiment: fix output formatting
Change-Id: If882a9ec68b5d2d040d8a047c2b1ea53eea4c21f
2018-07-31 12:29:20 +02:00
2c7640fe90 import-trace: record stats on failed register mappings
The import-trace tool now systematically collects statistics on which
LLVM -> FAIL* register ID mappings failed during import, and presents
those after the import finished.

Change-Id: Ied67853d754483277868fe21bf2c6efeaeb60f09
2018-07-30 14:36:33 +02:00
d581fd27a2 GenericTracing: typo
Change-Id: I02b39a7ad0db49899dd602c1da472b76472da979
2018-07-30 14:20:48 +02:00
d370ded9b9 generic-experiment: generalize serial-output monitoring
The generic-experiment now learned to record and compare output on an
arbitrary serial port.  Using Bochs' port 0xe9 hack (parameter
--e9-file) is kept for compatibility reasons.

Change-Id: I5b1aa02d244e8b474919e1bdf043e523ea0e4f45
2018-07-27 21:12:41 +02:00
226545de58 util: LLVM test code output simplified
llvmDisTest now explicitly catches LLVMtoFailTranslator::notfound.

Change-Id: I45306212d45e00cfabb867159a13ce6d247e8e0f
2018-07-27 08:55:16 +02:00
eef19b80a0 FAIL* works with LLVM 3.9, 4.0, 5.0 or 6.0
Change-Id: I5480c3451daac7c8ea6160a9afe5ce557b73afb1
2018-07-27 08:55:09 +02:00
5d5927a88a DatabaseExperiment: add register FI
Calling the DatabaseCampaign with --inject-registers or
--force-inject-registers now injects into CPU registers.  This is achieved
by reinterpreting data addresses in the DB as addresses within the register
file.  (The mapping between registers and data addresses is implemented in
core/util/llvmdisassembler/LLVMtoFailTranslator.hpp.)  The difference
between --inject-registers and --force-inject-registers is what the
experiment does when a data address is not interpretable as a register: the
former option then injects into memory (DatabaseCampaignMessage,
RegisterInjectionMode AUTO), the latter skips the injection altogether
(FORCE).

Currently only compiles together with the Bochs backend; the
DatabaseExperiment's redecodeCurrentInstruction() function must be
moved into the Bochs EEA to remedy this.

Change-Id: I23f152ac0adf4cb6fbe82377ac871e654263fe57
2018-07-24 09:45:00 +02:00
54f3d3f9b6 x86: add amd64 registers
Floating-point related registers are still missing.

Change-Id: If0e0fa2b25cf2fda6e23aeddb3a72744e6c079a6
2018-07-24 09:24:45 +02:00
dd1b18e580 remove unused elfinfo/*
elfinfo was what ElfReader started from, but is not needed in itself
anymore.  The code has been mostly rewritten, so an explicit mention
of the original authors is not necessary anymore.

Change-Id: Iea48c80f9174504bbb56cc02ee2de5eda4a81489
2018-07-24 09:22:29 +02:00
9bd58cb294 ElfReader: read 64-bit ELF binaries
ElfReader now detects whether a 32- or 64-bit ELF is opened, and uses
the corresponding elf.h data structures.  Internally maps 32-bit ELF
structures onto 64-bit structures to use common processing code.

Change-Id: Ib42a4b21701aeadac7568e369a80c08f2807694e
2018-07-24 09:21:12 +02:00
3d292cb217 generic-tracing: add error handling
Instead of using assert() (which only does something in a Debug
build), explicitly fail when a user-specified symbol is not found.

Change-Id: I33ac59ca4483ee65ba70c264b5153a7766a919d2
2018-07-24 09:21:09 +02:00
e63f7376f8 JobClient: connect to IPv4 endpoints only
As long as the JobServer only listens on IPv4 endpoints, it makes no
sense to attempt a connect to an IPv6 endpoint on the client side.

(However, it's 2018 and we should also be capable of using IPv6 on
both the client and server side ...)

Change-Id: I9c3916466c350ce74a31cef3b6ae0e7ac56367c7
2018-07-24 09:16:33 +02:00
c5e0825c6f Database: reduce varchar cols to fit MyISAM indexes
MyISAM indexes are limited to 1000 bytes per index.  Recently, Linux
distros (e.g. Debian 9) started to default MariaDB installations to
utf8mb4, which can use up to 4 bytes per character.  Hence, two
varchar columns indexed in a single key have a total maximum length of
250.  Instead, we use some lower, round numbers.

Change-Id: I4b53bc217912bc7070102a0af4938763e61b041d
2018-07-24 09:16:33 +02:00
ff3a5fb498 move to LLVM 3.9
This change removes support for earlier LLVM versions; making them
work as well is simply too tedious.

Change-Id: I372a151279ceb2bfd6de101c9e0c15f0a4b18c03
2018-07-24 09:15:33 +02:00
baaa6c3ce8 JobClient/Server fixes
- Retain original CLIENT_RETRY_COUNT semantics after Boost::Asio
  switch
- JobClient is C++11 now, too
- Message reception copy/paste error fixes

Change-Id: I19c474b2a79cd2ac8657e8d58d6170202d096fb0
2018-05-09 17:43:28 +02:00
9272c5cbed Move JobClient to Boost::asio as well
I did this mainly so server and client use a common networking API
IMO, using Boost::asio results in nicer name-lookup code.
Since no longer needed, I removed the SocketComm stuff.
The client is still synchronous; I see no benefit in having it
asynchronous.

I'm not super happy with the random backoff by the clients, if they
can't connect to the server. It makes the code really messy, 3 retries
is totally arbitrary, as is the backup windows. I believe launching
the server and clients in the correct order should be handled by a
launch script
Change-Id: Ifea64919fc228aa530c90449686f51bf63eb70e7
2018-05-09 17:41:52 +02:00
9ae8123433 JobServer: fix C++14 dependency
The recent Boost.Asio overhaul requires C++14 features, not only C++11.

Change-Id: I6decf0e6532956f7061d8a9021ec2c8406679266
2018-05-03 16:28:26 +02:00
6f41ad73d3 util: MemoryMap test failure more verbose
Change-Id: Ie42e1983d8cc5658b7e88d59cdbe689e6aefe9f2
2018-05-03 15:24:52 +02:00
6c120004eb Use boost-asio to improve FAIL* server performance
This patch overhauls the FAIL* server code to leverage Boost asio to be able to
handle a large number of clients (>4000). In this implementation the server is
now single threaded. I've not encountered any problems with this for up to
about 10k clients. Boost ASIO can also be used multithreaded, but I assume the
FAIL* internal data structures (Synchronized*) will become a bottleneck first.

The code now additionally depends on Boost Coro and Boost Context, as well as
a C++ 14 compiler, although the only C++14 feature required is a lambda capture
with initializer, such as [ x = std::move(x) ]. gcc-4.9.2 does this.

The code could (and probably should) be cleaned up more. Comments are wordy,
code is unnecessary now (multiple server threads), code is not self-contained
(headers spread dependencies), many ifdef's (server performance measuring
should be runtime rather than a compile time option), and much more. But for
this patch I was going for a minimal changeset the get the functionality in,
to have an easier review. Alas, FAIL* has no Unit-test suite to run the changes
against.

To handle such a large number of clients more changes were necessary, for
example server status output is now performed every 1s, instead for every
request.

The class Minion was removed completely; the only thing it was doing was
encapsulate an int.

The server has now a runtime-configurable port, or it can select a free port on
its own if none is specified. This requires the CampaignManager to add a port
argument and instantiate the JobServer dynamically.

Change-Id: Iad9238972161f95f5802bd2251116f8aeee14884
2017-09-15 06:26:14 +02:00
3ad42e270c fixes for Debian 9
- search for libdwarf.h in new locations (e.g., /usr/include/libdwarf/)
- build Bochs with -std=gnu++98 (gnu++14 is default since GCC 6.1)
- specify "proto2" syntax for protobuf messages
- minor build-system and C++ namespace fixes

Change-Id: I16dbc622c797ef8e936fe3c0fb9b03029d27529d
2017-08-01 14:12:03 +02:00
d0d62de3f4 sal: remove perf dependency to watchpoints/breakpoints
This change removes the hard compile-time dependency from the
performance-improving dedicated listener-list implementation
(core/sal/perf/) to basic watchpoints / breakpoints being enabled in
the cmake config.  This allows to keep the CONFIG_FAST_* switches
enabled in practically every experiment.

The primary reason for this change was the recent insight that enabled
breakpoints with disabled CONFIG_FAST_BREAKPOINTS can massively slow
down an experiment even if the latter does not use a single breakpoint
itself.

Change-Id: I5e3f5c1632ed1ee98a3ec887f18b174fa0e15773
2016-12-03 17:49:07 +01:00
3c6502a111 ecos_kernel_test: check if addr_errors_corrected is mapped before access
Change-Id: I08e751feeffc41a51312b8a9ad4b28a57a45a487
2016-09-09 11:26:07 +02:00
85844b86cc ecos_kernel_test: compare serial output for coptermock benchmark
Change-Id: Ic4f13035d55c811bda7fa020114141b816a11ed8
2016-08-29 10:50:35 +01:00
436930de71 rampage: fix integer overflow
Change-Id: I18ee65335efd0207c27da9524d74be5d5a575329
2016-08-01 16:39:42 +02:00
2aeded20be rampage: link correctly with Bochs
Change-Id: I7a0231c6b6e8983f86b94c2bfde78d2524dbfc8d
2016-08-01 16:36:17 +02:00
d3d2faf680 globally rename Fail* to FAIL*
Change-Id: Ief2cb687cc69dd92c2e04f9314f0f1347e0a84ed
2016-07-26 17:41:32 +02:00
449ac1a692 DatabaseExperiment: local debug helper code
Change-Id: Ibf42c93df26f6123edc867147621a011665e9c43
2016-03-11 20:59:01 +01:00
39b120f7ca GenericExperiment: record output during complete runtime
Before this change, the GenericExperiment only recorded port 0xe9 output
*after* the fault was injected.  When a fault was injected during the
workload's output loop, the output data before that point in time was
missing, and the experiment outcome was wrongly classified as SDC.

This change moves the logging activation to before the fast-forwarding
step (DatabaseExperiment::cb_before_fast_forward).  It also makes sure the
DatabaseExperiment only clears its own listeners instead of also touching
the SerialOutputLogger's one.

Change-Id: I66bda4ee318d271ddda6f7ade4e817bf9d14cf46
2016-03-11 20:59:01 +01:00
5bd7c4a9c5 GenericExperiment: limit output logger buffer
Limit the serial-output logger buffer to prevent overly large memory
consumption in case the target system ends up, e.g., in an endless loop.
The buffer is limited to (golden-run output size)+1 to be able to detect
the case when the target system makes a correct output but faultily adds
extra characters afterwards.

Change-Id: I50c082f8fb09a702d87ab83732ca3e3463c46597
2016-03-11 20:59:01 +01:00
e08deef9d5 GenericExperiment: prevent integer overflow
This change prevents an integer overflow in the memory-access listener
for WRITE_OUTERSPACE.  Instead of matching all addresses above
maxima_data, l_mem_outerspace never matched in the
generic-experiment's "--catch-write-outerspace" mode.

Change-Id: I8f4ee4515af3998b7c2a8e83c7a18306c26d8d66
2016-03-11 20:45:50 +01:00
cae6860e4e generic-tracing typo
Change-Id: Ie642487f3c30ec2b99d0d4ee469acb6a987e17c3
2016-03-11 19:01:30 +01:00
d46b81eb3d GenericTracing/-Experiment: add SDC detection
This change adds detection of SDCs to GenericTracing and
GenericExperiment via Bochs's I/O port E9.

Change-Id: Ie036aa97468b45cad94b6c8f73d1ef2d227547b2
2016-03-11 19:01:17 +01:00
ad558abeb6 DatabaseCampaign/-Experiment: add burst faults
This change introduces the ability to inject burst faults to
the DatabaseCampaign/-Experiment and thus to all derived
campaigns/experiments.

Change-Id: I491d021ed3953562bd7c908e9de50d448bc8ef33
2016-03-11 19:01:17 +01:00
bcf75bceee GenericExperiment: target ELF file now specifiable
Up until now only generic-tracing had the feature to directly
pass an ELF file to the experiment. generic-experiment lacked
that functionality and resorted to using the $FAIL_ELF_PATH
environment variable.
This change introduces the "--elf-file" command line argument
to generic-experiment.

Change-Id: Ie74de9e1781275ab247786856e13e412bac39224
2016-03-10 15:15:27 +01:00
bfcb1b415b experiments: experiment for tracing and testing ERIKA
ERIKA Enterprise is a OSEK conforming embedded RTOS. The supplied tracer
and experiment are similar to the cored-{tracing,tester} experiment, but
checks the integrity of the RTOS application in a different
manner. Stacks and stackpointers are located differently in ERIKA. This
experiment was used in RTAS'15 Hoffmann et al.

Change-Id: Idc8d874eb4d4ef15837f903270cfa521bc9514a2
2015-09-18 12:51:57 +02:00
1e572faa04 dosek: merge trace and test experiment
With the instantiate-indirect.ah method, we can choose between different
experiment flows at runtime. By this, we can combine tracing and actual
injection into one fail-client binary. A -Wf,--mode={tester,tracer}
switch does hand the control to different experiment flows.

Change-Id: Ia268489ff6bc74dffea745b7aedcb36e262e8079
2015-09-18 12:51:57 +02:00
f73008f60a cored-tester: adapt the cored tester experiment
For redoing the bench-coptermock-isorc experiment, we have to change the
timeout settings. We now use a soft timeout setting. A soft timeout is
resetted after each checkpoint event. If a hard timeout (2 seconds) is
reached although the soft timeout was resetted, we also abort the injection.

Change-Id: Ib7c2b1ad201641f47434a11d3273dde797e0012e
2015-09-18 12:51:56 +02:00
07b6275cbb plugin/checkpoint: add possibility to cache data
The checkpoint plugin is able to use dynamic values from within the
target to calculate its ranges. If the experiment injects faults within
those dynamic values, we will always get an digest error, even if it has
no influence on the outcome of the experiment or the integrity of the
data range.

Therefore, the plugin now provides a possiblity to cache those dynamic
values, before injecting them. This has to be done explicitly within the
experiment.

Change-Id: Ib2cbedd570ea9ab9c97efc152279e8eb79c573f4
2015-09-18 12:51:56 +02:00