Commit Graph

40 Commits

Author SHA1 Message Date
33d40df4bd DatabaseCampaign/-Experiment: add randomjump faults
Taken from experiments/erika-tester.

Change-Id: Ic1aa72d1bfc839009297892cf9e50b3edb53fef9
2020-05-23 22:52:00 +02:00
5d5927a88a DatabaseExperiment: add register FI
Calling the DatabaseCampaign with --inject-registers or
--force-inject-registers now injects into CPU registers.  This is achieved
by reinterpreting data addresses in the DB as addresses within the register
file.  (The mapping between registers and data addresses is implemented in
core/util/llvmdisassembler/LLVMtoFailTranslator.hpp.)  The difference
between --inject-registers and --force-inject-registers is what the
experiment does when a data address is not interpretable as a register: the
former option then injects into memory (DatabaseCampaignMessage,
RegisterInjectionMode AUTO), the latter skips the injection altogether
(FORCE).

Currently only compiles together with the Bochs backend; the
DatabaseExperiment's redecodeCurrentInstruction() function must be
moved into the Bochs EEA to remedy this.

Change-Id: I23f152ac0adf4cb6fbe82377ac871e654263fe57
2018-07-24 09:45:00 +02:00
9272c5cbed Move JobClient to Boost::asio as well
I did this mainly so server and client use a common networking API
IMO, using Boost::asio results in nicer name-lookup code.
Since no longer needed, I removed the SocketComm stuff.
The client is still synchronous; I see no benefit in having it
asynchronous.

I'm not super happy with the random backoff by the clients, if they
can't connect to the server. It makes the code really messy, 3 retries
is totally arbitrary, as is the backup windows. I believe launching
the server and clients in the correct order should be handled by a
launch script
Change-Id: Ifea64919fc228aa530c90449686f51bf63eb70e7
2018-05-09 17:41:52 +02:00
3ad42e270c fixes for Debian 9
- search for libdwarf.h in new locations (e.g., /usr/include/libdwarf/)
- build Bochs with -std=gnu++98 (gnu++14 is default since GCC 6.1)
- specify "proto2" syntax for protobuf messages
- minor build-system and C++ namespace fixes

Change-Id: I16dbc622c797ef8e936fe3c0fb9b03029d27529d
2017-08-01 14:12:03 +02:00
ad558abeb6 DatabaseCampaign/-Experiment: add burst faults
This change introduces the ability to inject burst faults to
the DatabaseCampaign/-Experiment and thus to all derived
campaigns/experiments.

Change-Id: I491d021ed3953562bd7c908e9de50d448bc8ef33
2016-03-11 19:01:17 +01:00
d58694521c dbcampaign: don't include fspmethod/variant ID in job msg
These IDs don't make sense by themselves but only after a lookup in the
database, which clients usually don't have (and don't need) access to.

Conflicts:
	src/core/comm/DatabaseCampaignMessage.proto.in

Change-Id: Ice739463552039b7fb48581722ea2e05984cea47
2015-01-21 14:53:32 +01:00
6ebd9b003a DatabaseExperiment: base class for distributed fail experiments
The DatabaseExperiment is a class a concrete experiment can inherit
from. It handles the communication with the campaign server. Does the
fast forward to the fault location, injects the fault and gives the
result over experiment outcome to the child class.

Change-Id: I1fb676da6c704cd570a638f0dfaadd4f1a9845e4
2014-10-22 17:30:07 +02:00
0ba03ca97e cmake: Set PROTOBUF_IMPORT_DIRS in toplevel dir
The variable PROTOBUF_IMPORT_DIRS has to be set in the toplevel
CMakeLists.txt, since the import path has to be available for all .proto
files within all subdirectories. Without this addition, the
GenericExperiment will fail to compile.

Change-Id: I676e0abd83bd1c5d247afcd33e7522e72da3dc2f
2014-10-22 16:35:17 +02:00
7ee105016c cmake/bochs: more strict dependency handling for libfailboch_external
The configure step for libfailbochs_external could be executed parallel
to other build steps, which required the files produced by the configure
step. Therefore a race-condition occurred. By giving the configure step
an explicit target name, more correct dependencies could be modeled
within bochs.cmake.

Change-Id: If2d7dafdace23b0eba6efcdff3ed0bfca2423048
2014-10-21 12:39:42 +02:00
c827750090 Merge branch 'failpanda'
Conflicts:
	src/core/comm/DatabaseCampaignMessage.proto.in
	src/core/cpn/CMakeLists.txt
	src/core/cpn/DatabaseCampaign.cc
	src/core/sal/ConcreteCPU.hpp
	src/core/sal/SALConfig.hpp
	src/core/util/CMakeLists.txt

Change-Id: Id86b93d0e3ea4d9963fcc88605eec0603575ec83
2014-06-03 12:24:49 +02:00
277958b31b cleanups
Change-Id: I8022d937477668253c613e97c3a579ae65084b1e
2014-06-03 11:47:20 +02:00
5ccc6e3525 comm: ExperimentData needs a virtual destructor
Classes deriving from ExperimentData usually contain the
experiment-specific Protobuf message, which needs to be properly
destroyed.  This is particularly a problem in the generic
DatabaseCampaign, as it never downcasts ExperimentData objects
retrieved from JobServer::getDone().  As the embedded
DatabaseCampaignMessage (usually named "fsppilot") is allocated on the
heap (this happens in the campaign's cb_send_pilot() function, asking
for a mutable_fsppilot()), the lack of a virtual destructor in
ExperimentData led to a memory leak, rendering the campaign server
inoperable after handling ~1E7 messages (with a 4GiB / 32-bit process
memory limit).

Change-Id: I4cb8a26d5a702e03189c4aae340051ce62a9c9ce
2014-02-25 13:32:56 +01:00
83ac40dfb5 comm: use unsigned trace positions in protobuf
Change-Id: I91927ce2c54989c724ea537a6bd344de8c467a16
2014-01-23 18:53:19 +01:00
e824e7a0fa cpn: Parsing of unsigned int fixed
As atoi caps the value of a unsigned int bigger than (2^31 - 1) other
than just letting it overflow to the corresponding negative value on
32Bit-integer machines, it must not be used for parsing to unsigned int.

TODO: Also apply this fix to all other unsigned values (in database)
which get parsed by atoi.

Change-Id: I96e29b14d36479ab6e567c527a40feb0b5fb14e5
2014-01-23 18:53:18 +01:00
8b5098abdd tools: added compute-hops and dump-hops tools
As these tools work closely together with fail components, its
easiest, to build them in this context. As these tools don't
really matter for fail use, they might never be pushed to the
master branch.

Change-Id: I8c8bd80376d0475f08a531a995d829e85032371b
2014-01-23 18:53:11 +01:00
c142818325 cpn: Generic wrapper for injection point
As for the pandaboard to navigate fast to the injection
instruction we need to deliver a hop chain to the fail-client,
this commit adds a generic wrapper for a injection point.
For now we have only the two options hop chain and instruction
offset, so it is activated via a cmake ON/OFF switch.

Change-Id: Ic01a07a30ac386d4316e6d6d271baf1549db966a
2014-01-22 18:04:29 +01:00
582459c5bb panda: non-returning openocd-loop at terminate
Previously for correct termination, the PandaController called
the finish-function of the openocd wrapper, invoked a coroutine
switch and waited for the openocd wrapper to finish up and switch
coroutine again, so the PandaController could exit with correct
exitStatus. Now the openocd-wrapper directly exits with chosen
exit status.

Change-Id: I8d318a4143c53340896ccee4d059a0d79fdcfe89
2014-01-22 17:43:31 +01:00
db0b82daca fail: modifications for pandaboard support
Change-Id: I52d3c9b9862b206a000394c45126f0afdfee081f
2014-01-22 17:43:31 +01:00
4cb97a7fa5 formatting, typos, comments, details
Change-Id: Iae5f1acb653a694622e9ac2bad93efcfca588f3a
2014-01-22 13:08:13 +01:00
7591c9edc5 Merge branch 'jobclientserver-fixes' 2014-01-22 13:07:59 +01:00
122eb8c9dc use uint32 for addresses in protobuf msgs
This prevents integer overflows when using addresses > 2GiB, which are
common for x86 operating systems with paging (Linux, Fiasco.OC) or
some test cases on the PandaBoard.

Note that this results in slightly different result table definitions
when automatically translating an experiment's protobuf message in the
DatabaseCampaign.

This change affects all existing protobuf messages to prevent
copy/paste propagation of this issue.

Change-Id: I09ec4b9d45eddd67a7a24c8b101e8b2b258df5e2
2014-01-21 00:08:41 +01:00
73adc71437 jobserver: use non-blocking accept
To allow the JobServer to shutdown properly, the accept() loop in
JobServer::run() needs to regularly check whether we're done.  This
change introduces a timed, non-blocking variant of accept() into
SocketComm to achieve this.

Change-Id: Id411096be816c4ed6c7b0b37674410e22152eb22
2014-01-20 22:48:08 +01:00
9c984b9704 fail/cpn: (Database)Campaign no longer loses jobs
Up until now the JobServer was silently losing jobs and only claiming to be
finished - a workaround for this was to restart the campaign until all jobs
were finished according to the database and the campaign's output.
This change fixes the underlying problem, so a single campaign-run suffices
and does no longer lose any jobs.
Debugging this was awful and took us quite some time...

Change-Id: Ie6c982cc3b2ce11128941f1f13be563bae22565c
2014-01-15 12:59:13 +01:00
abd9decf0b fail/cpn: removed USE_SIZE_PREFIX from SocketComm
This removes the ability to directly parse protobufs from the socket, because
google::protobuf::Message::ParseFromFileDescriptor() needs a EOF after each message;
thus preventing us from sending multiple Message objects over a single socket.

Change-Id: I67c0f631071470d6e0ae597e42848036a6db3656
2014-01-15 12:56:38 +01:00
d26fc28fa4 cpn/database: include data_width in the fsppilot during prune step
During the prune step the data_width of the injected location was not
propagated before. It is now stored in fsppilot (database layout change!) and
sent in the fsppilot protobuf message.

Change-Id: I0562f6fc8957adea0f8a9fb63469ca5e3f4b7b2d
2013-09-11 10:27:04 +02:00
9843b520c1 dbcampaign: select multiple variants/benchmark pairs
The variant/benchmark selection now can use SQL LIKE syntax, all unfinished
pilots from all selected variants are sent to the clients. E.g.:

./cored-voter-server  -v x86-cored-voter -b simple-% -p basic

Will select the fsppilots in the variants:

- x86-cored-voter/simple-ip/basic
- x86-cored-voter/simple-instr/basic

The variant and benchmark information is now sent within the
fsppilot.

Change-Id: I287bfcddc478d0b79d89e156d6f5bf8188674532
2013-07-05 10:19:58 +02:00
6d8b3331d8 doxygen: doc generation fixed
Doxygen skips undesired directories and files now. In addition, the
documentation of the "fail" namespace has been fixed. Note that there
are still several warnings (due to incomplete documentations) in the
Doxygen output.

Change-Id: Idad4f1ecff453765b307fa40a5c1cebc0c2ce2bb
2013-05-29 13:34:12 +02:00
880e7a81ff comm: ignore SIGPIPE
This prevents client and server from being sent a SIGPIPE (and
terminating) when the other side unexpectedly closes the connection.
It's way easier to handle this condition when checking the write()
return value, than to do anything smart in a SIGPIPE handler.  More
details:
<http://stackoverflow.com/questions/108183/how-to-prevent-sigpipes-or-handle-them-properly>

Change-Id: I1da5bf5ef79c8b7b00ede976e96ed4f1c560049d
2013-04-29 15:32:12 +02:00
e5fe9dd525 core/sal: interface for backend-specific notion of time
This adds an interface for a backend-specific notion of time, e.g. CPU
cycles since simulator start, and a concrete implementation for the
Bochs backend.  This is needed to record CPU idle times (e.g., HLT
instruction), and for target backends capable of more timing-accurate
execution.

This change also modifies the tracing plugin to add the time to all
trace events.

Change-Id: I93ac1d54c07f32b0b8f84f333417741d8e9c8288
2013-04-10 13:00:49 +02:00
c24ed774b0 experiments/dciao-kernelstructs: new database driven experiment for DCiAO
The dciao-kernelstructs experiment does a trace imported by the
DCiAOKernelImporter:

   bin/import-trace -t trace.pb  -i DCiAOKernelImporter --elf-file app.elf

Pruned by the basic method:

   bin/prune-trace

and does CiAO fault injection experiments, where the results are
stored in the database.

Change-Id: I485dc2e5097b3ebaf354241f474ee3d317213707
2013-04-03 10:39:51 +02:00
f18cddc63c DatabaseCampaign: abstract campain for interaction with MySQL Database
The DatabaseCampaign interacts with the MySQL tables that are created
by the import-trace and prune-trace tools. It does offer all
unfinished experiment pilots from the database to the
fail-clients. Those clients send back a (by the experiment) defined
protobuf message as a result. The custom protobuf message does have to
need the form:

   import "DatabaseCampaignMessage.proto";

   message ExperimentMsg {
       required DatabaseCampaignMessage fsppilot = 1;

       repeated group Result = 2 {
          // custom fields
          required int32 bitoffset = 1;
          optional int32 result = 2;
       }
   }

The DatabaseCampaignMessage is the pilot identifier from the
database. For each of the repeated result entries a row in a table is
allocated. The structure of this table is constructed (by protobuf
reflection) from the description of the message. Each field in the
Result group becomes a column in the result table. For the given
example it would be:

    CREATE TABLE result_ExperimentMessage(
           pilot_id INT,
           bitoffset INT NOT NULL,
           result INT,
           PRIMARY_KEY(pilot_id)
    )

Change-Id: I28fb5488e739d4098b823b42426c5760331027f8
2013-04-02 09:52:42 +02:00
58c5ef98f4 typo-fix: ExperimentData.hpp
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@2020 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
2013-01-24 13:26:22 +00:00
d7842c2ad7 The Jobclient can get several jobs with one request
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1963 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
2012-11-30 16:50:02 +00:00
hsc
49d1608969 correct sanity checks for client/server communication
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1933 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
2012-11-14 13:31:53 +00:00
hsc
61ab977f01 bugfix: read(2) returns 0 on EOF
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1925 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
2012-11-13 00:17:25 +00:00
hsc
3dcc684400 read(2)/write(2) wrappers for reliable delivery
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1850 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
2012-10-26 16:13:41 +00:00
hsc
7513dacad1 properly deal with clients that talked to another campaign server before
A campaign server now tells all clients a unique run ID (the UNIX timestamp
when it was started).  This allows us to ignore results from "old" clients
that talked to another server before, and to tell them to die.

git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1677 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
2012-09-23 17:28:07 +00:00
hsc
f9c96ddf2d prefix internal libraries to avoid naming conflicts with system libraries
This is a precaution to avoid current and future naming conflicts with
common system libraries.  libutil (part of libc) is the first, but probably
not the last example that already caused trouble twice.

git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1614 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
2012-09-12 07:52:30 +00:00
hsc
4a4b3ea7e2 FailBochs build process reversed
The FailBochs client is not linked by the Bochs build system anymore, but
by our cmake scripts (make fail-client):
 -  All Bochs libraries are merged into libfailbochs.a (a new target
    within the Bochs Autotools scripts).
 -  The previous libfail.a is *not* a merge of all Fail* libraries anymore,
    but pulls these in via library dependencies.

Additionally I did a lot of build system cleanup, e.g. additional external
libraries may now be pulled in where they're needed.

git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1390 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
2012-06-29 22:22:41 +00:00
2575604b41 Fail* directories reorganized, Code-cleanup (-> coding-style), Typos+comments fixed.
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1321 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
2012-06-08 20:09:43 +00:00