Classes deriving from ExperimentData usually contain the
experiment-specific Protobuf message, which needs to be properly
destroyed. This is particularly a problem in the generic
DatabaseCampaign, as it never downcasts ExperimentData objects
retrieved from JobServer::getDone(). As the embedded
DatabaseCampaignMessage (usually named "fsppilot") is allocated on the
heap (this happens in the campaign's cb_send_pilot() function, asking
for a mutable_fsppilot()), the lack of a virtual destructor in
ExperimentData led to a memory leak, rendering the campaign server
inoperable after handling ~1E7 messages (with a 4GiB / 32-bit process
memory limit).
Change-Id: I4cb8a26d5a702e03189c4aae340051ce62a9c9ce
This prevents integer overflows when using addresses > 2GiB, which are
common for x86 operating systems with paging (Linux, Fiasco.OC) or
some test cases on the PandaBoard.
Note that this results in slightly different result table definitions
when automatically translating an experiment's protobuf message in the
DatabaseCampaign.
This change affects all existing protobuf messages to prevent
copy/paste propagation of this issue.
Change-Id: I09ec4b9d45eddd67a7a24c8b101e8b2b258df5e2
To allow the JobServer to shutdown properly, the accept() loop in
JobServer::run() needs to regularly check whether we're done. This
change introduces a timed, non-blocking variant of accept() into
SocketComm to achieve this.
Change-Id: Id411096be816c4ed6c7b0b37674410e22152eb22
Up until now the JobServer was silently losing jobs and only claiming to be
finished - a workaround for this was to restart the campaign until all jobs
were finished according to the database and the campaign's output.
This change fixes the underlying problem, so a single campaign-run suffices
and does no longer lose any jobs.
Debugging this was awful and took us quite some time...
Change-Id: Ie6c982cc3b2ce11128941f1f13be563bae22565c
This removes the ability to directly parse protobufs from the socket, because
google::protobuf::Message::ParseFromFileDescriptor() needs a EOF after each message;
thus preventing us from sending multiple Message objects over a single socket.
Change-Id: I67c0f631071470d6e0ae597e42848036a6db3656
During the prune step the data_width of the injected location was not
propagated before. It is now stored in fsppilot (database layout change!) and
sent in the fsppilot protobuf message.
Change-Id: I0562f6fc8957adea0f8a9fb63469ca5e3f4b7b2d
The variant/benchmark selection now can use SQL LIKE syntax, all unfinished
pilots from all selected variants are sent to the clients. E.g.:
./cored-voter-server -v x86-cored-voter -b simple-% -p basic
Will select the fsppilots in the variants:
- x86-cored-voter/simple-ip/basic
- x86-cored-voter/simple-instr/basic
The variant and benchmark information is now sent within the
fsppilot.
Change-Id: I287bfcddc478d0b79d89e156d6f5bf8188674532
Doxygen skips undesired directories and files now. In addition, the
documentation of the "fail" namespace has been fixed. Note that there
are still several warnings (due to incomplete documentations) in the
Doxygen output.
Change-Id: Idad4f1ecff453765b307fa40a5c1cebc0c2ce2bb
This prevents client and server from being sent a SIGPIPE (and
terminating) when the other side unexpectedly closes the connection.
It's way easier to handle this condition when checking the write()
return value, than to do anything smart in a SIGPIPE handler. More
details:
<http://stackoverflow.com/questions/108183/how-to-prevent-sigpipes-or-handle-them-properly>
Change-Id: I1da5bf5ef79c8b7b00ede976e96ed4f1c560049d
This adds an interface for a backend-specific notion of time, e.g. CPU
cycles since simulator start, and a concrete implementation for the
Bochs backend. This is needed to record CPU idle times (e.g., HLT
instruction), and for target backends capable of more timing-accurate
execution.
This change also modifies the tracing plugin to add the time to all
trace events.
Change-Id: I93ac1d54c07f32b0b8f84f333417741d8e9c8288
The dciao-kernelstructs experiment does a trace imported by the
DCiAOKernelImporter:
bin/import-trace -t trace.pb -i DCiAOKernelImporter --elf-file app.elf
Pruned by the basic method:
bin/prune-trace
and does CiAO fault injection experiments, where the results are
stored in the database.
Change-Id: I485dc2e5097b3ebaf354241f474ee3d317213707
The DatabaseCampaign interacts with the MySQL tables that are created
by the import-trace and prune-trace tools. It does offer all
unfinished experiment pilots from the database to the
fail-clients. Those clients send back a (by the experiment) defined
protobuf message as a result. The custom protobuf message does have to
need the form:
import "DatabaseCampaignMessage.proto";
message ExperimentMsg {
required DatabaseCampaignMessage fsppilot = 1;
repeated group Result = 2 {
// custom fields
required int32 bitoffset = 1;
optional int32 result = 2;
}
}
The DatabaseCampaignMessage is the pilot identifier from the
database. For each of the repeated result entries a row in a table is
allocated. The structure of this table is constructed (by protobuf
reflection) from the description of the message. Each field in the
Result group becomes a column in the result table. For the given
example it would be:
CREATE TABLE result_ExperimentMessage(
pilot_id INT,
bitoffset INT NOT NULL,
result INT,
PRIMARY_KEY(pilot_id)
)
Change-Id: I28fb5488e739d4098b823b42426c5760331027f8
A campaign server now tells all clients a unique run ID (the UNIX timestamp
when it was started). This allows us to ignore results from "old" clients
that talked to another server before, and to tell them to die.
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1677 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
This is a precaution to avoid current and future naming conflicts with
common system libraries. libutil (part of libc) is the first, but probably
not the last example that already caused trouble twice.
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1614 8c4709b5-6ec9-48aa-a5cd-a96041d1645a
The FailBochs client is not linked by the Bochs build system anymore, but
by our cmake scripts (make fail-client):
- All Bochs libraries are merged into libfailbochs.a (a new target
within the Bochs Autotools scripts).
- The previous libfail.a is *not* a merge of all Fail* libraries anymore,
but pulls these in via library dependencies.
Additionally I did a lot of build system cleanup, e.g. additional external
libraries may now be pulled in where they're needed.
git-svn-id: https://www4.informatik.uni-erlangen.de/i4svn/danceos/trunk/devel/fail@1390 8c4709b5-6ec9-48aa-a5cd-a96041d1645a