1432 Commits

Author SHA1 Message Date
805bede338 util: LLVM code cleanups
Among other things, rename instr_info to instr to avoid shadowing the
class member with the same name.

Change-Id: I53d2ee08f11a944528931bf8cb4003ec64391016
2018-09-03 14:14:27 +02:00
527763e87f JobServer: remove "come again" diagnostic
The "--[Server] No workload, come again..." appears every time a
larger job set is loaded from the database, once for every client that
knocks.  This isn't helpful and scrolls out relevant information,
hence I'm removing it for now.

Change-Id: Ic7ca5b3a0c096b384ba4803df5b482a96bf803b1
2018-08-27 20:20:53 +02:00
8426084e5a CampaignManager: avoid parameter-name clash
The -p parameter is already used by several campaign servers to select
the pruning method to restrict to (this was broken by commit
6c120004e), hence only --port is now accepted for choosing a different
server TCP port at runtime.

Change-Id: Ia30e40d564e85a9702118dc28df4988ec628e491
2018-08-27 15:08:17 +02:00
d11579db30 GenericTracing/-Experiment: fix missing dependency
Change-Id: Iec285afbd3315b3fb124e97a9ce0fb10b60e6f52
2018-08-09 10:59:27 +02:00
3a47b20df2 JobServer: use steady_clock for interval measurement
std::chrono::system_clock is not monotonic, instead use
std::chrono::steady_clock for interval measurements.

Change-Id: I231affecfe8e89481720e47b59132fc838cdf73c
2018-08-03 22:00:23 +02:00
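As a rough illustration of the steady_clock change above (not the actual
JobServer code), the following sketch times an interval with
std::chrono::steady_clock, which is monotonic, instead of
std::chrono::system_clock, which may jump when the wall-clock time is
adjusted.  IntervalTimer and its member names are hypothetical.

```cpp
#include <chrono>

// Hypothetical helper, not taken from the FAIL* sources: measures elapsed
// time with a monotonic clock, so the result can never go backwards.
class IntervalTimer {
public:
    using clock = std::chrono::steady_clock;

    IntervalTimer() : m_start(clock::now()) {}

    // Seconds elapsed since construction or the last restart().
    double elapsed_seconds() const {
        return std::chrono::duration<double>(clock::now() - m_start).count();
    }

    // Begin a new measurement interval.
    void restart() { m_start = clock::now(); }

private:
    clock::time_point m_start;
};
```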
a547b0d5b4 JobServer: print completion percentage and ETA
If the JobServer is provided a total number of experiments by the
campaign, it now prints a completion percentage and an estimated
remaining runtime alongside the usual progress reports.

Change-Id: Ibd781ba8bff9af3a85683bbd29728216e316da57
2018-08-03 19:53:45 +02:00
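A minimal sketch of how such a completion percentage and ETA could be
derived from the completed-job count and the incoming results per second.
The formula (remaining jobs divided by the current rate) is an assumption,
not necessarily what the JobServer implements; ProgressEstimate and
estimate_progress are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical result type; not taken from the FAIL* sources.
struct ProgressEstimate {
    double percent_done;  // 0..100
    bool   eta_valid;     // false if no ETA can be computed
    double eta_seconds;   // only meaningful if eta_valid is true
};

// Assumed formula: ETA = remaining jobs / current results per second.
ProgressEstimate estimate_progress(std::uint64_t completed,
                                   std::uint64_t total,
                                   double results_per_second)
{
    ProgressEstimate e{0.0, false, 0.0};
    if (total == 0) {
        return e;  // campaign did not provide a total: nothing to estimate
    }
    if (completed > total) {
        completed = total;  // clamp to avoid unsigned underflow below
    }
    e.percent_done = 100.0 * static_cast<double>(completed)
                           / static_cast<double>(total);
    if (results_per_second > 0.0) {
        e.eta_valid = true;
        e.eta_seconds = static_cast<double>(total - completed)
                        / results_per_second;
    }
    return e;
}
```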
f89794329c JobServer: progress-report overhaul
The JobServer progress-report output now shows the total number of
completed jobs instead of the (almost always zero) inbound queue fill
level.  Additionally, the current number of incoming results per
second is shown, which also prepares for an ETA calculation in the
following commit.

Change-Id: I6b71c45f44b9e6b9b17c059959a90068b51c165c
2018-08-03 19:51:07 +02:00
500d060376 import-trace: progress and summary report for FullTraceImporter
Change-Id: I13a4352f6addc972ce2e24768d4079780ed1f554
2018-08-01 14:32:58 +02:00
453a6efe0b GenericExperiment: command-line --help overhaul
Change-Id: I8eff38043efcbeef0026c7a26dd6cc14fa6af673
2018-08-01 14:32:58 +02:00
a256e1c5af GenericExperiment: optionally continue if symbol not found
When prefixing a symbol name with '?', the GenericExperiment does not abort
in case the symbol is not found in the provided ELF binary:

fail-client -Wf,--detected-marker=?eddiErrorHandler
[...]
[GenericExperiment] ELF Symbol not found, ignoring: eddiErrorHandler

Change-Id: Iec12416ce8e38ff0ee1704e3a725c2cadc97b756
2018-08-01 14:19:05 +02:00
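The '?' prefix handling described above could look roughly like the
following sketch.  SymbolRequest and parse_symbol_argument are hypothetical
names and not the GenericExperiment's actual code; the sketch only shows
how the marker can be split off the symbol name.

```cpp
#include <string>

// Hypothetical representation of a symbol given on the command line.
struct SymbolRequest {
    std::string name;  // symbol name without the '?' marker
    bool optional;     // true: do not abort if the symbol is missing
};

// Splits a leading '?' off the argument and records it as "optional".
SymbolRequest parse_symbol_argument(const std::string& arg)
{
    if (!arg.empty() && arg[0] == '?') {
        return SymbolRequest{arg.substr(1), true};
    }
    return SymbolRequest{arg, false};
}
```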
1c774ce50d JobClient: fix retry delay
Only wait for the retry delay if really retrying.

Change-Id: If12bd3745c799edc5933874d9a44d049646e0e87
2018-08-01 14:19:05 +02:00
00882f98ad JobClient: resolve endpoint only once
The JobClient now resolves the server IP once (lazily, when needed) instead
of on each connect attempt, reducing the number of DNS requests sent out.

Change-Id: I9804048d3252da333cb3addbe94a01fdf3c707c8
2018-07-31 12:33:52 +02:00
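The resolve-once idea can be sketched without any networking code.  In the
example below the resolver is an injected callback standing in for the real
name lookup (which the JobClient performs via Boost.Asio); LazyEndpoint and
its members are hypothetical names used for illustration only.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical cache: resolves the host name on first use and reuses the
// result for all later connect attempts.
class LazyEndpoint {
public:
    using Resolver = std::function<std::string(const std::string&)>;

    LazyEndpoint(std::string host, Resolver resolver)
        : m_host(std::move(host)), m_resolver(std::move(resolver)),
          m_resolved(false) {}

    // Returns the resolved address; the resolver runs at most once.
    const std::string& address() {
        if (!m_resolved) {
            m_address = m_resolver(m_host);
            m_resolved = true;
        }
        return m_address;
    }

private:
    std::string m_host;
    Resolver m_resolver;
    bool m_resolved;
    std::string m_address;
};
```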
ad0640cedd GenericExperiment: fix output formatting
Change-Id: I42c49fbeb15cdebd3f77124554efb8c1f40f429f
2018-07-31 12:33:48 +02:00
742ec092eb DatabaseExperiment: fix output formatting
Change-Id: If882a9ec68b5d2d040d8a047c2b1ea53eea4c21f
2018-07-31 12:29:20 +02:00
68229afa84 import-trace: fix same-address symbol import
This bugfix makes sure that from a set of symbols with the same
address, only the first one gets imported.

Once it has been assessed whether analysis scripts can deal with
multiple symbols at the same address, importing all symbols should be
made possible in the future.  This will also require relaxing the
primary-key constraint of the `symbols' table.

Change-Id: I61c4ddb1af1556d44eab54e53eaa3d0fc20de7c1
2018-07-30 16:00:48 +02:00
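A sketch of the "first symbol per address wins" rule described above,
assuming symbols arrive as (address, name) pairs in import order.  The
function name and types are hypothetical; the actual importer works on
ELF symbols and database rows.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Symbol = std::pair<std::uint64_t, std::string>;  // (address, name)

// Keeps only the first symbol seen for each address.
std::map<std::uint64_t, std::string>
first_symbol_per_address(const std::vector<Symbol>& symbols)
{
    std::map<std::uint64_t, std::string> result;
    for (const Symbol& s : symbols) {
        // emplace() does not overwrite an existing key, so later symbols
        // at the same address are silently dropped.
        result.emplace(s.first, s.second);
    }
    return result;
}
```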
2c7640fe90 import-trace: record stats on failed register mappings
The import-trace tool now systematically collects statistics on which
LLVM -> FAIL* register ID mappings failed during import, and presents
those after the import finished.

Change-Id: Ied67853d754483277868fe21bf2c6efeaeb60f09
2018-07-30 14:36:33 +02:00
d581fd27a2 GenericTracing: typo
Change-Id: I02b39a7ad0db49899dd602c1da472b76472da979
2018-07-30 14:20:48 +02:00
d370ded9b9 generic-experiment: generalize serial-output monitoring
The generic-experiment can now record and compare output on an
arbitrary serial port.  The Bochs port 0xe9 hack (parameter
--e9-file) is kept for compatibility reasons.

Change-Id: I5b1aa02d244e8b474919e1bdf043e523ea0e4f45
2018-07-27 21:12:41 +02:00
226545de58 util: LLVM test code output simplified
llvmDisTest now explicitly catches LLVMtoFailTranslator::notfound.

Change-Id: I45306212d45e00cfabb867159a13ce6d247e8e0f
2018-07-27 08:55:16 +02:00
45c7906d41 import-trace: cleanup
Change-Id: I9f658c1bb9881fd1ef70f1744b6a2e2c36ad7142
2018-07-27 08:55:16 +02:00
eef19b80a0 FAIL* works with LLVM 3.9, 4.0, 5.0 or 6.0
Change-Id: I5480c3451daac7c8ea6160a9afe5ce557b73afb1
2018-07-27 08:55:09 +02:00
5d5927a88a DatabaseExperiment: add register FI
Calling the DatabaseCampaign with --inject-registers or
--force-inject-registers now injects into CPU registers.  This is achieved
by reinterpreting data addresses in the DB as addresses within the register
file.  (The mapping between registers and data addresses is implemented in
core/util/llvmdisassembler/LLVMtoFailTranslator.hpp.)  The difference
between --inject-registers and --force-inject-registers is what the
experiment does when a data address is not interpretable as a register: the
former option then injects into memory (DatabaseCampaignMessage,
RegisterInjectionMode AUTO), the latter skips the injection altogether
(FORCE).

Currently only compiles together with the Bochs backend; the
DatabaseExperiment's redecodeCurrentInstruction() function must be
moved into the Bochs EEA to remedy this.

Change-Id: I23f152ac0adf4cb6fbe82377ac871e654263fe57
2018-07-24 09:45:00 +02:00
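The difference between the two options can be summarized as a small
decision function.  This is only a sketch: AUTO and FORCE are the
RegisterInjectionMode values named above, whereas the OFF value, the
InjectionTarget enum and choose_target() are assumptions made for
illustration.

```cpp
// AUTO and FORCE are named in the commit message; OFF is an assumed value
// for "neither option given".
enum class RegisterInjectionMode { OFF, AUTO, FORCE };

// Hypothetical outcome of the decision.
enum class InjectionTarget { MEMORY, REGISTER, SKIP };

// Decides where to inject, given whether the data address from the DB can
// be interpreted as an address within the register file.
InjectionTarget choose_target(RegisterInjectionMode mode,
                              bool address_is_register)
{
    switch (mode) {
    case RegisterInjectionMode::AUTO:
        // --inject-registers: fall back to memory injection
        return address_is_register ? InjectionTarget::REGISTER
                                   : InjectionTarget::MEMORY;
    case RegisterInjectionMode::FORCE:
        // --force-inject-registers: skip the injection altogether
        return address_is_register ? InjectionTarget::REGISTER
                                   : InjectionTarget::SKIP;
    case RegisterInjectionMode::OFF:
        break;
    }
    return InjectionTarget::MEMORY;  // plain memory fault injection
}
```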
8adc859223 import-trace: fail gracefully if --elf is missing but needed
Change-Id: Ib154326507e307b65099f1b84c44e796b1aef98a
2018-07-24 09:24:47 +02:00
54f3d3f9b6 x86: add amd64 registers
Floating-point related registers are still missing.

Change-Id: If0e0fa2b25cf2fda6e23aeddb3a72744e6c079a6
2018-07-24 09:24:45 +02:00
dd1b18e580 remove unused elfinfo/*
elfinfo was what ElfReader started from, but is not needed in itself
anymore.  The code has been mostly rewritten, so an explicit mention
of the original authors is not necessary anymore.

Change-Id: Iea48c80f9174504bbb56cc02ee2de5eda4a81489
2018-07-24 09:22:29 +02:00
9bd58cb294 ElfReader: read 64-bit ELF binaries
ElfReader now detects whether a 32- or 64-bit ELF is opened, and uses
the corresponding elf.h data structures.  Internally maps 32-bit ELF
structures onto 64-bit structures to use common processing code.

Change-Id: Ib42a4b21701aeadac7568e369a80c08f2807694e
2018-07-24 09:21:12 +02:00
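A minimal sketch of the two steps described above, assuming a Linux system
that provides <elf.h>: detect the ELF class from the identification bytes,
and widen a 32-bit structure (here a symbol-table entry) into its 64-bit
counterpart so that common code can process both.  The function names are
hypothetical and not taken from ElfReader.

```cpp
#include <elf.h>

#include <cstddef>
#include <cstring>

// Returns ELFCLASS32 or ELFCLASS64 as stored in the file, or ELFCLASSNONE
// if the buffer is too short or lacks the ELF magic number.
int detect_elf_class(const unsigned char* ident, std::size_t len)
{
    if (len < EI_NIDENT || std::memcmp(ident, ELFMAG, SELFMAG) != 0) {
        return ELFCLASSNONE;
    }
    return ident[EI_CLASS];
}

// Maps a 32-bit symbol entry onto the 64-bit structure.  The field order
// differs between the two layouts, so fields are copied one by one.
Elf64_Sym widen_symbol(const Elf32_Sym& s)
{
    Elf64_Sym w{};
    w.st_name  = s.st_name;
    w.st_info  = s.st_info;
    w.st_other = s.st_other;
    w.st_shndx = s.st_shndx;
    w.st_value = s.st_value;
    w.st_size  = s.st_size;
    return w;
}
```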
3d292cb217 generic-tracing: add error handling
Instead of using assert() (which only does something in a Debug
build), explicitly fail when a user-specified symbol is not found.

Change-Id: I33ac59ca4483ee65ba70c264b5153a7766a919d2
2018-07-24 09:21:09 +02:00
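To illustrate the point about assert(): it compiles to nothing when NDEBUG
is defined, so a check on user input must not rely on it.  The sketch below
shows an explicit failure path instead; require_symbol() and the
map-based symbol table are hypothetical stand-ins, not the generic-tracing
code.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Looks up a user-specified symbol and fails explicitly if it is missing.
// Unlike assert(), this check is also active in Release builds.
std::uint64_t require_symbol(const std::map<std::string, std::uint64_t>& symtab,
                             const std::string& name)
{
    auto it = symtab.find(name);
    if (it == symtab.end()) {
        throw std::runtime_error("ELF symbol not found: " + name);
    }
    return it->second;
}
```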
c11547a952 visualfail: migrate to PHP 7
The "mysql" extension is gone, using "mysqli" now.  Plus tiny
syntax/operator precedence changes.

Change-Id: Icad4329521a2a42d7cd73408588d210431ec04c6
2018-07-24 09:21:04 +02:00
385830969c faultspaceplot: add sanity check
faultspaceplot.sh now fails gracefully if the requested
variant/benchmark combination does not exist in the database.

Change-Id: Ied3b5a0e72cc5ae8e6ce352b65486f15bb13576b
2018-07-24 09:20:37 +02:00
e64fd740fe data-aggregator: EAFC+coverage from sampling
This change adds global fault-coverage and occurrence count
measurement scripts that work with sampling results.

Change-Id: I14d94a2c549cff3256fc7b0800cfd4a702e6ad35
2018-07-24 09:19:50 +02:00
0baca64468 data-aggregator: document "onwrite" fault model requirements
The *-onwrite.sh analysis scripts only work if import-trace was not
run with --no-write-ecs, i.e. they only work if writing memory
accesses were imported into the "trace" table.

Change-Id: Icb2ea4e72d2200c886d4f9074f2da0f9bfd6ac85
2018-07-24 09:16:33 +02:00
3abdc51043 data-aggregator: more script renamings
The terms "occurrences" and "coverage" contradict each other.

Change-Id: I2c3b3a2fa046f9ab77ea4bec9a13eafa2ebfba58
2018-07-24 09:16:33 +02:00
a88b014578 data-aggregator: rename resulttype-* to global-*
This is more in line with the other scripts' names.

Change-Id: I8f645a3b93bce60fe167eeb93bb8c8e285f4038a
2018-07-24 09:16:33 +02:00
f5b34a962c data-aggregator: fix alphabetic resulttype sorting
Depending on SQL-statement nesting, some scripts already correctly sorted
resulttypes alphabetically, but some sorted along the numeric ENUM value
behind the resulttype name.  This change explicitly converts the resulttype
to a string before sorting.

Change-Id: Ia18aa4e75b94a6a9f7bb125953bc85b86b3cbd6e
2018-07-24 09:16:33 +02:00
27b697200b data-aggregator: specifically limit to fspmethod 'basic'
In their current implementation, the data-aggregator scripts do not work
correctly on sampling results.

Change-Id: I1035970b352f513d725bd1a40ac9262368ffbcc0
2018-07-24 09:16:33 +02:00
e63f7376f8 JobClient: connect to IPv4 endpoints only
As long as the JobServer only listens on IPv4 endpoints, it makes no
sense to attempt a connect to an IPv6 endpoint on the client side.

(However, it's 2018 and we should also be capable of using IPv6 on
both the client and server side ...)

Change-Id: I9c3916466c350ce74a31cef3b6ae0e7ac56367c7
2018-07-24 09:16:33 +02:00
c5e0825c6f Database: reduce varchar cols to fit MyISAM indexes
MyISAM indexes are limited to 1000 bytes per index.  Recently, Linux
distros (e.g. Debian 9) started to default MariaDB installations to
utf8mb4, which can use up to 4 bytes per character.  Hence, two
varchar columns indexed in a single key have a total maximum length of
250.  Instead, we use some lower, round numbers.

Change-Id: I4b53bc217912bc7070102a0af4938763e61b041d
2018-07-24 09:16:33 +02:00
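The arithmetic behind the 250-character limit can be spelled out as a
compile-time check.  The constants restate the figures from the commit
message above; the function names and the example column lengths are
hypothetical.

```cpp
// Figures from the commit message: 1000 bytes per MyISAM index, and up to
// 4 bytes per character with utf8mb4.
constexpr unsigned kMaxIndexBytes = 1000;
constexpr unsigned kMaxBytesPerChar = 4;

// Total number of characters all varchar columns of one key may hold.
constexpr unsigned max_indexed_chars()
{
    return kMaxIndexBytes / kMaxBytesPerChar;
}

// True if two varchar columns of the given lengths fit into one key.
constexpr bool key_fits(unsigned len_a, unsigned len_b)
{
    return (len_a + len_b) * kMaxBytesPerChar <= kMaxIndexBytes;
}

static_assert(max_indexed_chars() == 250, "1000 bytes / 4 bytes per char");
static_assert(key_fits(100, 100), "two varchar(100) columns fit");
static_assert(!key_fits(255, 255), "two varchar(255) columns do not fit");
```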
c88c034ca7 cmake: default build type 'Release'
+Make available build types explicit (pull-down in CMake GUI)

Change-Id: Ib2cdd31ad038cef1bb27fcd14f089a35a9751e76
2018-07-24 09:16:33 +02:00
be0b7b630c doc update
Change-Id: Ie8f9011b7718c971de74ab40689c9de7fbeb3b18
2018-07-24 09:16:33 +02:00
ff3a5fb498 move to LLVM 3.9
This change removes support for earlier LLVM versions; making them
work as well is simply too tedious.

Change-Id: I372a151279ceb2bfd6de101c9e0c15f0a4b18c03
2018-07-24 09:15:33 +02:00
baaa6c3ce8 JobClient/Server fixes
- Retain original CLIENT_RETRY_COUNT semantics after Boost::Asio
  switch
- JobClient is C++11 now, too
- Message reception copy/paste error fixes

Change-Id: I19c474b2a79cd2ac8657e8d58d6170202d096fb0
2018-05-09 17:43:28 +02:00
9272c5cbed Move JobClient to Boost::asio as well
I did this mainly so that server and client use a common networking API.
IMO, using Boost::asio results in nicer name-lookup code.
Since it is no longer needed, I removed the SocketComm stuff.
The client is still synchronous; I see no benefit in having it
asynchronous.

I'm not super happy with the random backoff by the clients if they
can't connect to the server. It makes the code really messy; 3 retries
is totally arbitrary, as are the backoff windows. I believe launching
the server and clients in the correct order should be handled by a
launch script.
Change-Id: Ifea64919fc228aa530c90449686f51bf63eb70e7
2018-05-09 17:41:52 +02:00
191219ad06 data-aggregator: variant-durations.sh w/o filter
Change-Id: I7a3164635fc2fbd65d99fc8bba66e956d505a515
2018-05-09 15:25:45 +02:00
42d6ff4a97 data-aggregator: "on write" fault model metrics
Change-Id: I784618fd4b3a0074153ce074957b57e363c54657
2018-05-09 15:25:45 +02:00
bbe60745e1 data-aggregator: script overhaul + modularization
Change-Id: I4353db1475f00956d19d91c8c558c34506ec836b
2018-05-09 15:25:45 +02:00
9ae8123433 JobServer: fix C++14 dependency
The recent Boost.Asio overhaul requires C++14 features, not only C++11.

Change-Id: I6decf0e6532956f7061d8a9021ec2c8406679266
2018-05-03 16:28:26 +02:00
5a5a99145c bochs: fix ac++-caused preprocessor namespace clash
When building with an experiment activated, the generated
instantiate-<experimentname>.ah gets included in each and every FAIL*
translation unit including Bochs's ones.  In the case of the
generic-experiment (and probably many others), this indirectly included
Google protobuf headers, which failed to compile for Bochs's gui/wx.cc and
gui/x.cc: The included X headers pollute the preprocessor namespace with a
"Status" macro that clashes with an internal protobuf "Status" class.

Change-Id: I613f5c792a9519cf2573eddc7fef6266c7168494
2018-05-03 16:26:13 +02:00
6f41ad73d3 util: MemoryMap test failure more verbose
Change-Id: Ie42e1983d8cc5658b7e88d59cdbe689e6aefe9f2
2018-05-03 15:24:52 +02:00
4a068792e8 fixes for Ubuntu 17.10
- Bochs: wx_gtk3 needs g(d|t)k2

Change-Id: I0a014e3ce7f1d40d215d5309e842db618a2971ed
2018-03-01 15:57:24 +01:00
6c120004eb Use boost-asio to improve FAIL* server performance
This patch overhauls the FAIL* server code to leverage Boost asio to be able to
handle a large number of clients (>4000). In this implementation the server is
now single threaded. I've not encountered any problems with this for up to
about 10k clients. Boost ASIO can also be used multithreaded, but I assume the
FAIL* internal data structures (Synchronized*) will become a bottleneck first.

The code now additionally depends on Boost Coroutine and Boost Context, as
well as a C++14 compiler, although the only C++14 feature required is a lambda
capture with initializer, such as [ x = std::move(x) ]. gcc-4.9.2 supports this.

The code could (and probably should) be cleaned up more. Comments are wordy,
some code is unnecessary now (multiple server threads), code is not
self-contained (headers spread dependencies), there are many ifdefs (server
performance measuring should be a runtime rather than a compile-time option),
and much more. But for this patch I was going for a minimal changeset to get
the functionality in, to allow for an easier review. Alas, FAIL* has no
unit-test suite to run the changes against.

To handle such a large number of clients, more changes were necessary; for
example, server status output is now performed every 1s instead of for every
request.

The class Minion was removed completely; the only thing it was doing was
encapsulating an int.

The server now has a runtime-configurable port, or it can select a free port on
its own if none is specified. This requires the CampaignManager to add a port
argument and instantiate the JobServer dynamically.

Change-Id: Iad9238972161f95f5802bd2251116f8aeee14884
2017-09-15 06:26:14 +02:00
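For readers unfamiliar with the C++14 feature mentioned above, the
following sketch shows a lambda capture with initializer moving a move-only
object into a callback.  It is unrelated to the actual server code;
make_handler() is a hypothetical name.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Moves a move-only buffer into the returned callback.  In C++11 a
// std::unique_ptr could not be captured by value; the C++14
// init-capture [ msg = std::move(msg) ] makes this possible.
auto make_handler(std::unique_ptr<std::string> msg)
{
    return [msg = std::move(msg)]() -> std::size_t {
        return msg->size();  // the lambda now owns the buffer
    };
}
```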