The dependency on fail-comm does not only exist at link time, but also
at compile time (the latter is due to protobuf header generation).
Change-Id: I2bae51e763d9a385bda94e77df3e88619fa28a30
As openocd can read chunks of at most 4 bytes at a time, memory is now
read in 4-byte chunks to improve performance.
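A minimal sketch of the idea, assuming a hypothetical wrapper type that
exposes readWord()/readByte() helpers (names invented for illustration,
not the actual openocd wrapper API):

    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    // Sketch: read a buffer in 4-byte chunks where possible, falling back
    // to single-byte reads for the unaligned tail.
    template <class Ocd>
    void read_memory(Ocd &ocd, uint32_t addr, uint8_t *buf, size_t len)
    {
        size_t i = 0;
        for (; i + 4 <= len; i += 4) {
            uint32_t word = ocd.readWord(addr + i);  // one 4-byte access
            std::memcpy(buf + i, &word, sizeof(word));
        }
        for (; i < len; ++i)
            buf[i] = ocd.readByte(addr + i);         // remaining 1-3 bytes
    }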
Change-Id: I79f85e580240f913b5a3d7b49bc0698390644ca8
If we halt at a watchpoint, we need to figure out which one it was.
From now on, this is done by decoding the current instruction.
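Roughly, the matching step looks like this sketch, assuming a
hypothetical decoder that yields the data address accessed by the
instruction at the current pc (all names invented for illustration):

    #include <cstdint>
    #include <vector>

    struct Watchpoint { uint32_t addr; uint32_t len; int id; };

    // Sketch: find the watchpoint hit by the instruction at 'pc'.
    // decode_access() stands for the instruction decoder, which computes
    // the effective load/store address from the opcode and the current
    // register contents.
    int find_hit_watchpoint(uint32_t pc,
                            uint32_t (*decode_access)(uint32_t),
                            const std::vector<Watchpoint> &wps)
    {
        uint32_t access = decode_access(pc);
        for (const Watchpoint &wp : wps)
            if (access >= wp.addr && access < wp.addr + wp.len)
                return wp.id;
        return -1;  // no registered watchpoint matches
    }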
Change-Id: Ib62df0016c60044f2618af00e853b4373eb00bd7
As we want to use the cortex-m3 only for memory access, it should
always be halted. To achieve this, we need to halt it after every
reboot.
Change-Id: I5f0edf4986b65aea5a2aa59020247b9676de4dcb
Halting can now be done for both the cortex-a9 and the cortex-m3 target
with the halting function, which was originally only able to halt the
cortex-a9 target.
Change-Id: I9ced64253405654c4155c8f776534bc7231387b2
As the cycle counter apparently keeps on running, we need to halt it to
get exact cycle-count values.
Change-Id: Id85c052b88cec48b25ee0975ad47369587e08096
As we need hop chains to navigate efficiently to the injection point on
the pandaboard, this campaign uses them. As we do not yet have a
component that automatically navigates to a generic InjectionPoint (the
API needs to be properly designed), we do this explicitly.
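Roughly, the explicit navigation boils down to this sketch (the wrapper
method names are hypothetical):

    #include <cstdint>
    #include <vector>

    // Sketch: walk a hop chain by setting a breakpoint on each hop
    // address, resuming, and waiting until the target halts there.
    template <class Target>
    void navigate_hop_chain(Target &t, const std::vector<uint32_t> &hops)
    {
        for (uint32_t hop : hops) {
            t.setBreakpoint(hop);
            t.resume();
            t.waitForHalt();          // blocks until the breakpoint is hit
            t.deleteBreakpoint(hop);
        }
        // the target is now halted at the last hop, i.e. the injection point
    }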
Change-Id: I26ca6ebb3f05cde735f9641551a8ce5478e463f6
As we need to deliver a hop chain to the fail-client for the pandaboard
to navigate quickly to the injection instruction, this commit adds a
generic wrapper for an injection point. For now there are only the two
options hop chain and instruction offset, so the choice is activated via
a cmake ON/OFF switch.
Change-Id: Ic01a07a30ac386d4316e6d6d271baf1549db966a
Single-stepping, as used in tracing, sometimes fails in a long step
chain, so we repeat the step until it really stepped. It can be observed
that if openocd returns a step error, it never accidentally steps
nonetheless. To make sure of this behaviour, we could additionally check
for the correct pc.
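The retry loop is essentially this sketch (method names are
placeholders; the pc comparison is the optional extra check mentioned
above):

    // Sketch: repeat the single step until openocd reports success. A
    // step error means no step happened at all, so retrying is safe.
    template <class Target>
    void robust_step(Target &t)
    {
        while (!t.step()) {
            // retry; optionally log the failed attempt
        }
        // optional sanity check:
        // assert(t.readPC() == expected_pc);
    }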
Change-Id: I05f82e2af0ca822cd6cd5571ffc3845f4e6a1d91
As non-gzipped trace files caused the tool to always import zero
events, the input file is now opened the same way as in the dump-trace
tool, where opening non-gzipped files works fine.
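One way to get this transparent behaviour is zlib's gz* interface,
which reads plain (non-gzipped) files as-is; a minimal sketch (the file
name is just an example):

    #include <zlib.h>

    int main()
    {
        // gzopen()/gzread() handle both gzipped and uncompressed files,
        // so the importer sees the raw trace byte stream either way.
        gzFile f = gzopen("trace.tc", "rb");
        if (!f) return 1;
        char buf[4096];
        int n;
        while ((n = gzread(f, buf, sizeof(buf))) > 0) {
            // feed 'n' bytes into the trace parser here
        }
        gzclose(f);
        return 0;
    }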
Change-Id: If2575dbeb93ed657c7b8ddd9f14f41b5cc7bf7c6
Added the opcode parser from the F.E.H.L.E.R project for analyzing
memory accesses in mmu-abort handling, tracing, etc.
Change-Id: I5912fa4a4d51ee0501817c43bae05e87ac0e9b90
Added the performance-monitor hardware function for cycle counting.
Also fixes the single-stepping exit, adds some additional register
exits, and prevents reboot failures.
Change-Id: I74196905dc39ecc14ae78366e7e1cb70ec7092f1
Polling of the current target system state was done non-blocking until
now. Because of this, when the target was executing for a longer time,
the main loop was traversed several times and unwanted state changes
were triggered. After this fix, polling of the execution state blocks in
a while loop until the target system hits any halting condition.
Also adds some minor fixes.
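The blocking poll is essentially this sketch (hypothetical wrapper API
and state enum):

    #include <unistd.h>

    enum class TargetState { RUNNING, HALTED };

    // Sketch: block until the target actually reaches a halt condition
    // (breakpoint, watchpoint, finished single step, ...) instead of
    // returning while it is still running.
    template <class Target>
    void wait_until_halted(Target &t)
    {
        while (t.poll() == TargetState::RUNNING)
            usleep(1000);  // avoid hammering the debug interface
    }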
Change-Id: I4cbbef6eb6ff6ff8a3451affb8409a0df6a95fc5
Previously, for correct termination, the PandaController called the
finish function of the openocd wrapper, invoked a coroutine switch, and
waited for the openocd wrapper to finish up and switch coroutines again,
so the PandaController could exit with the correct exitStatus. Now the
openocd wrapper exits directly with the chosen exit status.
Change-Id: I8d318a4143c53340896ccee4d059a0d79fdcfe89
In some cases the write-pilot is located at the upper boundary of the
experiment and thus is in a race situation with the experiment's end.
If the experiment's end occurs first, the campaign ends and complains
about missing data, otherwise everything is fine.
This patch circumvents this by using "the first" writing pilot; only if
the only write is located at the experiment's end will the race still
occur, but cleverly written experiment code can, according to hsc,
circumvent it.
Change-Id: I6a27a8c4770c04ea8dcaef8aa7bd85d18f43f0b5
Unfortunately, this implicit dependency is currently not resolved
anywhere else (e.g., in FindBoost.cmake), although the issue is heavily
discussed on the net.
Change-Id: I8a7c8518394cdba27e591fed250623011d988067
As 32-bit libc6 atoi() caps unsigned values bigger than 2^31-1 (instead
of just letting them overflow to the corresponding negative value, as on
x86_64), it must not be used, especially not for converting 32-bit
pointers.
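A safe replacement is e.g. strtoul(), which covers the full unsigned
32-bit range on both 32- and 64-bit hosts; a minimal sketch:

    #include <cstdint>
    #include <cstdlib>

    // Sketch: parse a 32-bit address/pointer value without the 2^31-1 cap.
    uint32_t parse_addr(const char *s)
    {
        // base 0 additionally accepts 0x... prefixes
        return static_cast<uint32_t>(std::strtoul(s, nullptr, 0));
    }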
Change-Id: Ie0821a6f4cd04aebd37ea3d4028b63a05373810f
This prevents integer overflows when using addresses > 2GiB, which are
common for x86 operating systems with paging (Linux, Fiasco.OC) or
some test cases on the PandaBoard.
Note that this results in slightly different result table definitions
when automatically translating an experiment's protobuf message in the
DatabaseCampaign.
This change affects all existing protobuf messages to prevent
copy/paste propagation of this issue.
Change-Id: I09ec4b9d45eddd67a7a24c8b101e8b2b258df5e2
The new CLIENT_JOB_INITIAL configuration option allows configuring the
client to request more than one job in the first request round.
If a reasonable initial value is chosen, this removes the job ramp-up
after each fail-client restart, and slightly improves overall
throughput.
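Conceptually (the numbers and helper name are only for illustration;
CLIENT_JOB_INITIAL itself is the new build-time option):

    // Sketch: request CLIENT_JOB_INITIAL jobs right away in the first
    // round, then let the usual throughput-based estimate take over.
    unsigned jobs_to_request(bool first_round, unsigned estimate)
    {
        const unsigned CLIENT_JOB_INITIAL = 32;  // example value
        return first_round ? CLIENT_JOB_INITIAL : estimate;
    }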
Change-Id: Idac2721264ec264c520d341fac64a8311a974708
The JobClient currently waits a LONG time until it really shuts down
after not having reached the server in sendResultsToServer() (which is
unfortunately by far the most probable point in the code to determine
this):
- A different bug (fixed in the previous commit) provoked the
situation that a (way) too large amount of jobs was fetched
before.
- sendResult() (called after each experiment iteration) realized
that CLIENT_JOB_REQUEST_SEC seconds are over, and tried to
prematurely call home to send first results (without planning to
get new jobs yet).
- If the server was gone (done, or aborted), connect in
sendResultsToServer() failed after several retries and timeouts.
- All subsequent calls to sendResult() retried connecting to the
server (again, with retries and timeouts), once for each remaining
job.
- When all jobs were done, getParam() tried to connect one last time,
finally telling the experiment that nobody's home.
This resulted in client shutdown times of up to four hours (for the
default CLIENT_JOB_LIMIT of 1000) after the campaign server
terminated. This change solves the issue by not handing out new
(cached) jobs after the connect failed once, making the experiment
terminate quickly.
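The gist of the fix, sketched with placeholder names (the real change
lives in the JobClient's getParam()/sendResult() path):

    #include <deque>

    // Sketch: once a connect to the campaign server has failed, stop
    // handing out locally cached jobs so the experiment loop terminates.
    struct JobClientSketch {
        std::deque<int> cached_jobs;    // stand-in for the real job cache
        bool server_gone = false;       // set wherever a connect fails,
                                        // e.g. when sending results

        bool connect_to_server() { return false; }  // placeholder

        bool get_param(int &job)
        {
            if (server_gone)
                return false;           // "no more jobs": terminate quickly
            if (!cached_jobs.empty()) {
                job = cached_jobs.front();
                cached_jobs.pop_front();
                return true;
            }
            if (!connect_to_server()) {
                server_gone = true;     // remember: nobody's home
                return false;
            }
            // ... fetch new jobs from the server ...
            return true;
        }
    };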
Change-Id: I0d8cb2e084d783aca74c51a503fa72eb2b2eb0b7
If we don't properly initialize the job timing statistics, the number
of jobs to be requested in the second request to the server is based
on the wrong timings. In our test case, CLIENT_JOB_LIMIT jobs were
requested at once.
Change-Id: I7e9d8ab6fe14e4488b3a74baf061d9a07f3a77c4
Delay insertion of to-be-sent jobs into m_runningJobs until they are
really sent, as getMessage() won't work anymore (as in: segfault) once
this job has concurrently been re-sent (due to campaign end), its result
received, and the job deleted in the campaign. This becomes non-hypothetical
with larger values for CLIENT_JOB_LIMIT and CLIENT_JOB_REQUEST_SEC.
Additionally, reinsert the remaining jobs into the input queue if
communication fails, instead of inefficiently delaying redistribution
until the campaign end.
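Sketched with the container name from above and otherwise hypothetical
types and helpers:

    #include <cstddef>
    #include <vector>

    // Sketch: only record a job as "running" once it has actually been
    // sent; on a communication failure, put the remaining jobs back into
    // the input queue instead of waiting for campaign-end redistribution.
    template <class Job, class Queue, class Map, class Socket>
    bool send_jobs(std::vector<Job*> &to_send, Queue &input_queue,
                   Map &m_runningJobs, Socket &sock)
    {
        for (size_t i = 0; i < to_send.size(); ++i) {
            Job *job = to_send[i];
            if (!sock.send(job->getMessage())) {      // communication failed
                for (size_t j = i; j < to_send.size(); ++j)
                    input_queue.enqueue(to_send[j]);  // reinsert the rest
                return false;
            }
            m_runningJobs.insert(job->id(), job);     // now really "running"
        }
        return true;
    }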
Change-Id: If85e3c8261deda86beb8d4d93343429223753f22
Bounding the outgoing queue is always a good idea: If the campaign has
separate threads for outgoing and incoming jobs (true for the
DatabaseCampaign), this keeps memory requirements reasonable. If the
campaign works in a single thread, this is not disadvantageous either.
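A bounded, thread-safe queue in the spirit of this change; a minimal
sketch with std::mutex/std::condition_variable (the project's actual
SynchronizedQueue may look different):

    #include <condition_variable>
    #include <cstddef>
    #include <deque>
    #include <mutex>
    #include <utility>

    template <class T>
    class BoundedQueue {
        std::deque<T> q;
        std::mutex m;
        std::condition_variable not_full, not_empty;
        size_t max_size;
    public:
        explicit BoundedQueue(size_t max) : max_size(max) {}
        void enqueue(T v) {
            std::unique_lock<std::mutex> lk(m);
            not_full.wait(lk, [&]{ return q.size() < max_size; });  // block producer
            q.push_back(std::move(v));
            not_empty.notify_one();
        }
        T dequeue() {
            std::unique_lock<std::mutex> lk(m);
            not_empty.wait(lk, [&]{ return !q.empty(); });
            T v = std::move(q.front());
            q.pop_front();
            not_full.notify_one();
            return v;
        }
    };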
Change-Id: Ic75272daa8266f051adf7b23e2ffe87f5c965b86
To allow the JobServer to shut down properly, the accept() loop in
JobServer::run() needs to regularly check whether we're done. This
change introduces a timed, non-blocking variant of accept() into
SocketComm to achieve this.
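The usual way to build such a timed accept is select() with a timeout
on the listening socket; a sketch (not necessarily the exact SocketComm
interface):

    #include <sys/select.h>
    #include <sys/socket.h>
    #include <sys/time.h>

    // Sketch: wait up to timeout_sec for an incoming connection; return
    // the accepted fd, 0 on timeout, or -1 on error. The caller checks
    // its "we're done" flag after every timeout.
    int timed_accept(int listen_fd, int timeout_sec)
    {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(listen_fd, &rfds);
        struct timeval tv = { timeout_sec, 0 };
        int ret = select(listen_fd + 1, &rfds, 0, 0, &tv);
        if (ret <= 0)
            return ret;                    // 0 = timeout, -1 = error
        return accept(listen_fd, 0, 0);    // pending connection: won't block
    }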
Change-Id: Id411096be816c4ed6c7b0b37674410e22152eb22
To avoid accessing destroyed resources in CommThreads talking to clients,
we need to properly join them on shutdown. The m_CommMutex becomes a
JobServer member to make sure it isn't destroyed before the JobServer
itself.
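In essence (sketch; the real code uses the project's own thread
abstraction rather than std::thread):

    #include <thread>
    #include <vector>

    // Sketch: keep the communication threads around and join them on
    // shutdown, so none of them can still touch JobServer resources
    // afterwards.
    struct CommThreads {
        std::vector<std::thread> threads;
        void shutdown() {
            for (std::thread &t : threads)
                if (t.joinable())
                    t.join();
        }
    };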
Change-Id: I35b9fb93ace08a7a9476650f8f5e93597a3a8aa0
This change cleans up in/out queue synchronization in the job server.
End-of-jobs conditions are now properly signaled through the
SynchronizedQueue, allowing blocked readers to be resumed and aborted when
no more input is expected.
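Building on the BoundedQueue sketch further above, the end-of-jobs
signalling can be sketched as two additional members (plus a
"bool finished = false;" flag):

    // set_finished() wakes all blocked readers; try_dequeue() lets them
    // detect that no more input will ever arrive and abort cleanly.
    void set_finished() {
        std::lock_guard<std::mutex> lk(m);
        finished = true;
        not_empty.notify_all();
    }
    bool try_dequeue(T &out) {
        std::unique_lock<std::mutex> lk(m);
        not_empty.wait(lk, [&]{ return !q.empty() || finished; });
        if (q.empty())
            return false;                 // finished and drained
        out = std::move(q.front());
        q.pop_front();
        not_full.notify_one();
        return true;
    }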
Change-Id: I3eaf37115ccf8c5b5afe3d971c7109cd62b68906
The Fail* tools expect trace events to be ordered in a specific way:
memory-access events are supposed to come *after* the instruction
event for the instruction that caused them. Using a different order
may cause subtle problems with both fault-space pruning and fast
forwarding. This change introduces a warning message when such a
malformed trace is detected (i.e., when the instruction pointer of a
memory-access event does not match the preceding instruction event).
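The check boils down to this sketch (event representation simplified):

    #include <cstdint>
    #include <iostream>

    struct TraceEvent { bool is_mem; uint64_t ip; };

    // Sketch: warn if a memory-access event does not belong to the
    // directly preceding instruction event, i.e. the trace is malformed.
    void check_order(const TraceEvent &prev_instr, const TraceEvent &ev)
    {
        if (ev.is_mem && ev.ip != prev_instr.ip)
            std::cerr << "warning: memory event IP 0x" << std::hex << ev.ip
                      << " does not match preceding instruction event 0x"
                      << prev_instr.ip << " -- malformed trace?\n";
    }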
Change-Id: I8ae7420fd8ff26e2574590748bdcc5a63db76490
According to
<http://dev.mysql.com/doc/refman/5.5/en/c-api-threaded-clients.html>,
(potentially) threaded clients should use the reentrant
libmysqlclient_r. This is just a precaution, I haven't seen any
issues with the normal libmysqlclient.
Change-Id: Icb29df6dd54eb666e3b43b73fbda406acccd11cb
According to
<http://dev.mysql.com/doc/refman/5.5/en/c-api-threaded-clients.html>,
a MySQL connection handle must not be used concurrently with an open
result set and mysql_use_result() in one thread
(DatabaseCampaign::run()), and mysql_query() in another
(DatabaseCampaign::collect_result_thread()). This indeed leads to
crashes when bounding the outgoing job queue (SERVER_OUT_QUEUE_SIZE),
and maybe even more insidious effects in other cases. The solution is
to create separate connections for both threads.
Additionally, call mysql_library_init() before spawning any threads.
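The pattern, sketched with the plain C API (error handling omitted,
credentials and database name are placeholders):

    #include <mysql/mysql.h>

    int main()
    {
        mysql_library_init(0, 0, 0);            // before spawning any threads

        // ... then, once per thread, a dedicated connection instead of a
        // shared MYSQL handle:
        MYSQL *conn = mysql_init(0);
        mysql_real_connect(conn, "localhost", "user", "password",
                           "faildb", 0, 0, 0);
        // ... mysql_query()/mysql_use_result() only on this handle ...
        mysql_close(conn);

        mysql_library_end();
        return 0;
    }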
Change-Id: I2981f2fdc67c9a2cbe8781f1a21654418f621aeb
Up until now the JobServer was silently losing jobs and only claiming to be
finished - a workaround for this was to restart the campaign until all jobs
were finished according to the database and the campaign's output.
This change fixes the underlying problem, so a single campaign run
suffices and no longer loses any jobs.
Debugging this was awful and took us quite some time...
Change-Id: Ie6c982cc3b2ce11128941f1f13be563bae22565c