The JobClient currently takes a LONG time to actually shut down after
failing to reach the server in sendResultsToServer() (which is
unfortunately by far the most likely place in the code to notice
this):
- A different bug (fixed in the previous commit) caused far too many
jobs to be fetched beforehand.
- sendResult() (called after each experiment iteration) realized
that CLIENT_JOB_REQUEST_SEC seconds are over, and tried to
prematurely call home to send first results (without planning to
get new jobs yet).
- If the server was gone (done, or aborted), connect in
sendResultsToServer() failed after several retries and timeouts.
- All subsequent calls to sendResult() retried connecting to the
server (again, with retries and timeouts), once for each remaining
job.
- When all jobs were done, getParam() tried to connect one last time,
finally telling the experiment that nobody's home.
This resulted in client shutdown times of up to four hours (for the
default CLIENT_JOB_LIMIT of 1000) after the campaign server
terminated. This change solves the issue by not handing out new
(cached) jobs after the connect failed once, making the experiment
terminate quickly.
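
A minimal sketch of the idea, not the actual JobClient code: the class
name, the integer job type, and the simulated connectToServer() are
assumptions made for illustration. Once a connect attempt fails, the
failure is remembered, no further cached jobs are handed out, and no
further connect attempts are made.

```cpp
#include <cassert>
#include <deque>

// Hypothetical, heavily simplified stand-in for the real JobClient.
class SketchJobClient {
public:
    // server_reachable simulates the outcome of every connect attempt.
    explicit SketchJobClient(bool server_reachable)
        : m_server_reachable(server_reachable),
          m_connect_failed(false),
          m_attempts(0) {}

    // Simulates a job that was fetched earlier and is cached locally.
    void cacheJob(int job) { m_cache.push_back(job); }

    // Hands out the next cached job, unless a connect has failed before.
    bool getParam(int &job) {
        if (m_connect_failed || m_cache.empty()) {
            return false;  // nobody's home, or nothing left to do
        }
        job = m_cache.front();
        m_cache.pop_front();
        return true;
    }

    // Called after each experiment iteration. For brevity, this sketch
    // calls home every time instead of every CLIENT_JOB_REQUEST_SEC.
    void sendResult() {
        if (m_connect_failed) {
            return;  // server is known to be gone: do not retry
        }
        if (!connectToServer()) {
            m_connect_failed = true;  // remember the failure
        }
    }

    int connectAttempts() const { return m_attempts; }

private:
    bool connectToServer() {
        ++m_attempts;
        return m_server_reachable;
    }

    std::deque<int> m_cache;
    bool m_server_reachable;
    bool m_connect_failed;
    int m_attempts;
};
```

With an unreachable server, the first sendResult() costs one connect
attempt (including its retries and timeouts); every later call returns
immediately, and getParam() reports that no jobs are left.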
Change-Id: I0d8cb2e084d783aca74c51a503fa72eb2b2eb0b7
If we don't properly initialize the job timing statistics, the number
of jobs to be requested in the second request to the server is based
on the wrong timings. In our test case, CLIENT_JOB_LIMIT jobs were
requested at once.
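
To illustrate why uninitialized timings matter, here is a sketch of
how a request size could be derived from them. The function name and
the formula are assumptions, not the actual JobClient logic: the
client asks for as many jobs as should fit into request_sec seconds,
capped at job_limit.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: number of jobs to request so that processing them
// takes roughly request_sec seconds, based on the timings measured so far.
unsigned jobsToRequest(unsigned jobs_done, double elapsed_sec,
                       double request_sec, unsigned job_limit) {
    assert(job_limit >= 1);
    if (jobs_done == 0 || elapsed_sec <= 0.0) {
        return 1;  // no usable measurements yet: start small
    }
    double sec_per_job = elapsed_sec / jobs_done;
    double wanted = request_sec / sec_per_job;
    if (wanted >= job_limit) {
        return job_limit;  // never exceed the configured limit
    }
    return std::max(1u, static_cast<unsigned>(wanted));
}
```

In this sketch, a bogus near-zero elapsed time makes every job look
almost free, so the result saturates at job_limit, which matches the
symptom described above.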
Change-Id: I7e9d8ab6fe14e4488b3a74baf061d9a07f3a77c4
Delay insertion of to-be-sent jobs into m_runningJobs until they are
really sent, as getMessage() won't work anymore (as in: segfault) if
the job is concurrently re-sent (due to campaign end), its result is
received, and it is deleted in the campaign. This becomes non-hypothetical
with larger values for CLIENT_JOB_LIMIT and CLIENT_JOB_REQUEST_SEC.
Additionally, reinsert the remaining jobs into the input queue if
communication fails, instead of inefficiently delaying redistribution
until the campaign end.
Change-Id: If85e3c8261deda86beb8d4d93343429223753f22
Bounding the outgoing queue is always a good idea: If the campaign has
separate threads for outgoing and incoming jobs (true for the
DatabaseCampaign), this keeps memory requirements reasonable. If the
campaign works in a single thread, this is not disadvantageous either.
Change-Id: Ic75272daa8266f051adf7b23e2ffe87f5c965b86
To allow the JobServer to shut down properly, the accept() loop in
JobServer::run() needs to regularly check whether we're done. This
change introduces a timed, non-blocking variant of accept() into
SocketComm to achieve this.
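
One way to build such a timed accept() on POSIX is to wait with
select() before calling accept(). The sketch below is not the actual
SocketComm interface; timedAccept(), acceptUntilDone(), and the 100 ms
poll interval are assumptions for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper: waits up to timeout_ms for listen_fd to become
// readable, then accepts. Returns the new fd, or -1 on timeout or error.
int timedAccept(int listen_fd, int timeout_ms) {
    assert(listen_fd >= 0 && listen_fd < FD_SETSIZE && timeout_ms >= 0);

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(listen_fd, &readfds);

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int ready = select(listen_fd + 1, &readfds, nullptr, nullptr, &tv);
    if (ready <= 0) {
        return -1;  // timeout (0) or error such as EINTR (-1)
    }
    return accept(listen_fd, nullptr, nullptr);
}

// Accept loop that re-checks the 'done' flag at least every 100 ms.
// Returns the number of connections accepted (and closed right away here).
int acceptUntilDone(int listen_fd, const std::atomic<bool> &done) {
    int accepted = 0;
    while (!done.load()) {
        int fd = timedAccept(listen_fd, 100);
        if (fd < 0) {
            continue;  // nothing arrived in time: check 'done' again
        }
        ++accepted;
        close(fd);
    }
    return accepted;
}
```

The timeout bounds how long a shutdown request can go unnoticed; a
shorter interval reacts faster at the cost of more wakeups.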
Change-Id: Id411096be816c4ed6c7b0b37674410e22152eb22
To avoid accessing destroyed resources in CommThreads talking to clients,
we need to properly join them on shutdown. The m_CommMutex becomes a
JobServer member to make sure it isn't destroyed before the JobServer
itself.
Change-Id: I35b9fb93ace08a7a9476650f8f5e93597a3a8aa0
This change cleans up in/out queue synchronization in the job server.
End-of-jobs conditions are now properly signaled through the
SynchronizedQueue, which allows blocked readers to be resumed and
aborted when no more input is expected.
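
The signaling idea can be sketched as a queue with an explicit
"finished" state. This is not the real SynchronizedQueue; the class
name and the finish()/dequeue() signatures are assumptions made for
illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical queue whose readers are woken up when input ends.
template <typename T>
class ClosableQueue {
public:
    ClosableQueue() : m_finished(false) {}

    void enqueue(T value) {
        std::lock_guard<std::mutex> lock(m_mutex);
        assert(!m_finished);  // no input is expected after finish()
        m_items.push_back(std::move(value));
        m_cond.notify_one();
    }

    // Signals that no more input will arrive and wakes all blocked readers.
    void finish() {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_finished = true;
        m_cond.notify_all();
    }

    // Blocks until an item is available or the queue is finished.
    // Returns false only if the queue is finished and fully drained.
    bool dequeue(T &out) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return m_finished || !m_items.empty(); });
        if (m_items.empty()) {
            return false;
        }
        out = std::move(m_items.front());
        m_items.pop_front();
        return true;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    std::deque<T> m_items;
    bool m_finished;
};
```

A reader loops on `while (queue.dequeue(job)) { ... }` and leaves the
loop cleanly once the writer has called finish() and the remaining
items have been drained.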
Change-Id: I3eaf37115ccf8c5b5afe3d971c7109cd62b68906
According to
<http://dev.mysql.com/doc/refman/5.5/en/c-api-threaded-clients.html>,
(potentially) threaded clients should use the reentrant
libmysqlclient_r. This is just a precaution; I haven't seen any
issues with the normal libmysqlclient.
Change-Id: Icb29df6dd54eb666e3b43b73fbda406acccd11cb
According to
<http://dev.mysql.com/doc/refman/5.5/en/c-api-threaded-clients.html>,
a MySQL connection handle must not be used concurrently with an open
result set and mysql_use_result() in one thread
(DatabaseCampaign::run()), and mysql_query() in another
(DatabaseCampaign::collect_result_thread()). This indeed leads to
crashes when bounding the outgoing job queue (SERVER_OUT_QUEUE_SIZE),
and maybe even more insidious effects in other cases. The solution is
to create a separate connection for each thread.
Additionally, call mysql_library_init() before spawning any threads.
Change-Id: I2981f2fdc67c9a2cbe8781f1a21654418f621aeb
Up until now the JobServer was silently losing jobs and only claiming to be
finished - a workaround for this was to restart the campaign until all jobs
were finished according to the database and the campaign's output.
This change fixes the underlying problem, so a single campaign run suffices
and no longer loses any jobs.
Debugging this was awful and took us quite some time...
Change-Id: Ie6c982cc3b2ce11128941f1f13be563bae22565c
This removes the ability to directly parse protobufs from the socket, because
google::protobuf::Message::ParseFromFileDescriptor() needs an EOF after each
message, which prevents us from sending multiple Message objects over a
single socket.
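
A common way to send several messages over one stream is to prefix
each serialized message with its length. The sketch below shows that
technique on plain byte strings; the function names and the 4-byte
big-endian prefix are assumptions, and whether the actual code uses
exactly this layout is not stated here. The payloads stand in for
serialized protobuf messages (e.g., from SerializeToString()).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Appends a 4-byte big-endian length followed by the payload to 'stream'.
void appendFrame(std::string &stream, const std::string &payload) {
    assert(payload.size() <= 0xFFFFFFFFu);
    std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    for (int shift = 24; shift >= 0; shift -= 8) {
        stream.push_back(static_cast<char>((len >> shift) & 0xFF));
    }
    stream.append(payload);
}

// Extracts the frame starting at 'offset' and advances 'offset' past it.
// Returns false if the stream does not hold a complete frame there.
bool nextFrame(const std::string &stream, std::size_t &offset,
               std::string &payload) {
    if (offset > stream.size() || stream.size() - offset < 4) {
        return false;  // length prefix incomplete
    }
    std::uint32_t len = 0;
    for (int i = 0; i < 4; ++i) {
        len = (len << 8) | static_cast<unsigned char>(stream[offset + i]);
    }
    if (stream.size() - offset - 4 < len) {
        return false;  // payload incomplete
    }
    payload = stream.substr(offset + 4, len);
    offset += 4 + static_cast<std::size_t>(len);
    return true;
}
```

Because the receiver knows each message's size up front, it can read
exactly that many bytes and hand them to ParseFromString() without
waiting for the connection to close.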
Change-Id: I67c0f631071470d6e0ae597e42848036a6db3656
"Removed" unnecessary memory mapping ("Step 0").
Cleaned out ExperimentData; it now consists only of fsppilot and resultset.
resultset now contains bitoffset, which is part of the result table's primary key.
Adapted code to work with msg.fsppilot() instead of ExperimentData values.
Change-Id: I3b310e7a71d4b28479028250cd5722b3b2ce9f8c
Although we know that a known_outcome=1 pilot does not exhibit
behavior different from the golden run, the database schema does not
yet know what this behavior looks like (in terms of result-table
column values). In order to be able to JOIN valid results for all
memory writes in the trace table (fspgroup maps them all onto *one*
pilot per variant), we need to run these experiments, too.
Additionally, don't join the fspgroup table; we only need this one for
result calculations afterwards.
Change-Id: Idcd2991274fede84526b1eee68a231774625d11a
As non-gzipped trace files cause import-trace to always import zero
events, the input file is now opened the same way as in the dump-trace
tool, where opening non-gzipped files obviously works fine.
In the medium term we should find a centralized solution for this,
instead of re-implementing it all over the place.
Change-Id: I75845c03c0bbdc2b6b578b83d492b7dbbb40f051
With the recent updates to record one additional instruction at the trace
start, I broke memory-map handling (restrictMemoryAddresses() and
restrictInstructionAddresses()). This change repairs this functionality.
Change-Id: I0daf9f474d0efe3f8e30a168c0ccc1e993e7ddc6
Listens on a configurable global variable of the SUT.
On read access, a signal pattern value is calculated and sent back
to the SUT.
Currently, only a superimposable sine wave signal form is implemented.
Further signal forms can be implemented by inheriting from the
abstract SignalForm class.
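
A sketch of what such a hierarchy could look like. Only the SignalForm
name comes from this change; the calculate() method, the SineComponent
struct, and the Sine class layout are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Abstract base: maps a point in time to a signal value.
// The method name and signature are hypothetical.
class SignalForm {
public:
    virtual ~SignalForm() {}
    virtual double calculate(double time_sec) const = 0;
};

// One sine component: amplitude * sin(2*pi*freq_hz*t + phase_rad).
struct SineComponent {
    double amplitude;
    double freq_hz;
    double phase_rad;
};

// Superimposes (sums) any number of sine components.
class Sine : public SignalForm {
public:
    explicit Sine(const std::vector<SineComponent> &components)
        : m_components(components) {
        assert(!components.empty());
    }

    double calculate(double time_sec) const override {
        const double two_pi = 6.283185307179586;
        double sum = 0.0;
        for (std::size_t i = 0; i < m_components.size(); ++i) {
            const SineComponent &c = m_components[i];
            sum += c.amplitude *
                   std::sin(two_pi * c.freq_hz * time_sec + c.phase_rad);
        }
        return sum;
    }

private:
    std::vector<SineComponent> m_components;
};
```

A new signal form (say, a square wave) would derive from SignalForm
and implement calculate() in the same way.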
Change-Id: I2e6cf49cd44797999691c9e9cf0c54dd3c96875e
Logs accesses to a global variable of the SUT, identified by its
symbol name, and writes the value to a file whenever the variable is
written.
Format:
<Simulation time>;<Value of variable>
Change-Id: I81b581e571be4255a1a2200c41e7c16657ddfd3d
Add two new breakpoints to the L4Sys experiment that allow detecting that
execution terminated with an error: vga_console_blink() is called by the
kernel if JDB was entered (meaning we are hanging, e.g., due to an
assertion); longjmp() is only used by the PF handling code after no
valid page fault handling could be performed.
Change-Id: Ice61039c4bd07815a316bbc0bdb39f3483d9a1da
* after injecting a fault, track how many instructions it takes until
execution deviates from original execution
* also track what the first deviating EIP value is
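
The comparison can be sketched on two recorded EIP sequences. This is
an illustration only: the names are hypothetical, and the experiment
itself may well compare on the fly instead of on complete traces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of comparing a faulty run against the original (golden) run.
struct Deviation {
    bool found;               // did execution deviate at all?
    std::size_t instr_count;  // instructions after injection until deviation
    std::uint32_t eip;        // first deviating EIP (valid only if found)
};

// golden/faulty hold one EIP per executed instruction; inject_idx is the
// index of the first instruction executed after the fault was injected.
Deviation firstDeviation(const std::vector<std::uint32_t> &golden,
                         const std::vector<std::uint32_t> &faulty,
                         std::size_t inject_idx) {
    assert(inject_idx <= golden.size() && inject_idx <= faulty.size());
    Deviation result = {false, 0, 0};
    std::size_t common =
        golden.size() < faulty.size() ? golden.size() : faulty.size();
    for (std::size_t i = inject_idx; i < common; ++i) {
        if (golden[i] != faulty[i]) {
            result.found = true;
            result.instr_count = i - inject_idx;
            result.eip = faulty[i];
            break;
        }
    }
    return result;
}
```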
Change-Id: I18a9250517ca90214728c2c4b036b412f5dbf224
When a register in the extended trace was dereferenced and its value
was smaller than the memory pool size, but the address was not mapped,
an assertion was triggered and the tracing plugin terminated the
simulator. Now the dereferenced memory address is checked for being
mapped and not being smaller than the memory pool.
Change-Id: I9ac954988ef860969679f9f360814c5e4b66f473
The ElfImporter is not a real trace importer, but we place it in
the import-trace utility, since the infrastructure for importing
things related to an ELF binary into the database is already in
place there.
The ElfImporter calls objdump to disassemble an ELF binary
and imports the results into the database.
Change-Id: I6e35673c8dbee3b7e8dfc7549d10e5dca9b55935
* introduce L4SYS_ADDRESS_SPACE_TRACE to indicate that we want
to trace instructions in a different AS from the one we are starting
the experiment in
* add CR3Run() to determine address space ID
Change-Id: I7bdaf1e858a6dd369af5175bd56e1b4e2d5f05ef
The internal m_iponly / m_memonly bools are a bit hackish; in particular, it's
unclear what should happen if both are set. The m_tracetype enum now
encompasses all possible configurations, while the plugin's user interface
remains unchanged.
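
A sketch of the pattern, with hypothetical names throughout: the
enumerators, the class, and the setter names are assumptions, as is
the "latest call wins" policy for conflicting settings.

```cpp
#include <cassert>

// One value per valid configuration; "both only" cannot be expressed.
enum TraceType { TRACE_BOTH, TRACE_IP, TRACE_MEM };

// Hypothetical stand-in for the plugin's configuration interface.
class TraceConfig {
public:
    TraceConfig() : m_tracetype(TRACE_BOTH) {}

    // The bool-based setters keep their shape; only the storage changes.
    void setTraceIPOnly(bool on) { m_tracetype = on ? TRACE_IP : TRACE_BOTH; }
    void setTraceMemOnly(bool on) { m_tracetype = on ? TRACE_MEM : TRACE_BOTH; }

    bool tracesIP() const { return m_tracetype != TRACE_MEM; }
    bool tracesMem() const { return m_tracetype != TRACE_IP; }

private:
    TraceType m_tracetype;
};
```

With a single enum member, every state the object can hold is a valid
configuration, so the ambiguous "both flags set" case disappears.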
Change-Id: Ibdd872b5cc5781836428b27bfb2db3825700e671