Commit Graph

420 Commits

Author SHA1 Message Date
9df6d983bf util/llvmdisassembler: compile with -fno-rtti
For some reason, this is required even when LLVM is not built using
-fno-rtti.

Change-Id: I992799c8b54135a0a87b2de7c4a3d57f2d3670d9
2014-02-26 14:46:23 +01:00
5ccc6e3525 comm: ExperimentData needs a virtual destructor
Classes deriving from ExperimentData usually contain the
experiment-specific Protobuf message, which needs to be properly
destroyed.  This is particularly a problem in the generic
DatabaseCampaign, as it never downcasts ExperimentData objects
retrieved from JobServer::getDone().  As the embedded
DatabaseCampaignMessage (usually named "fsppilot") is allocated on the
heap (this happens in the campaign's cb_send_pilot() function, asking
for a mutable_fsppilot()), the lack of a virtual destructor in
ExperimentData led to a memory leak, rendering the campaign server
inoperable after handling ~1E7 messages (with a 4GiB / 32-bit process
memory limit).

Change-Id: I4cb8a26d5a702e03189c4aae340051ce62a9c9ce
2014-02-25 13:32:56 +01:00
5ee96032c9 jobserver: gracefully handle thread creation failures
Due to the previous DatabaseCampaign fix, this may not be necessary
anymore, but it's nevertheless a good idea to handle thread creation
failures properly.

Change-Id: I8317a77dd5338509727e737040944320e7755ae3
2014-02-25 13:32:56 +01:00
25a390970a DatabaseCampaign: avoid table locking
It is necessary to copy pilot IDs of existing results to a temporary table
before fetching undone jobs from the DB: Otherwise, due to MyISAMs
table-level locking, collect_result_thread() will block in INSERT (SHOW
PROCESSLIST state "Waiting for table level lock") until the (streamed)
pilot query finishes.  As one pilot query follows after the other,
collect_result_thread() may even starve until the memory for the
JobServer's "done" queue runs out, resulting in a crash and the loss of all
queued results.

Change-Id: Ib0ec5fa84db466844b1e9aa0e94142b4d336b022
2014-02-25 13:32:55 +01:00
bc2103c527 sal/bochs: don't show errors in non-verbose mode
The patched eCos variant we analyze intentionally overflows the 16550
UART FIFOs, flooding the terminal with Bochs error messages.  Enabling
CONFIG_BOCHS_NON_VERBOSE now also enforces ignoring error messages,
regardless of log verbosity settings in the bochsrc.

Change-Id: If14e2532234e61bf60720a45150ef4973e8d508b
2014-02-25 13:32:55 +01:00
1df43e9726 import-trace: major speedup
Using Database::insert_multiple() instead of prepared statements
speeds up trace import by a factor of 3-4.  While being there, we now
properly deal with nonexistent extended trace values (i.e., put NULLs
into the DB).

Side note: The ElfImporter should switch to insert_multiple(), too.

Change-Id: I96785e9775e3ef4f242fd50720d5c34adb4e88a1
2014-02-25 13:32:55 +01:00
58fa4c59cc sal/bochs: fix handling of unmapped memory
Up to now, BochsMemory::isMapped() always returned true in 32-bit protected
mode with a 4GB linear address space (as used by, e.g., eCos), even for
addresses greater than the configured memory size.  This led to lots of
bogus memory dereferences in the (extended) tracing plugin.

This change (a follow-up to commit 5171645) additionally checks the return
value of getHostMemAddr(), and announces BX_RW (read/write access) instead
of BX_READ as the intended type of memory access.  In the aforementioned
scenario, memory addresses greater than the memory size are now correctly
detected as "not mapped".

Change-Id: Ic2fa7554c869cb90191164535a601bae4dbb49b6
2014-02-17 23:24:16 +01:00
4b921a5fe3 util: MemoryMap test
Change-Id: I54680685326a85cfd723a47e8aef8d71662c9aeb
2014-01-30 15:26:20 +01:00
4bcce14659 util: space-efficient MemoryMap
We now use boost::icl::interval_set internally, consuming extremely
lower amounts of memory.  boost::icl was introduced with Boost 1.46;
Debian 7.0 comes with 1.49, so this dependency should be no problem
anymore.

Both the class interface and the memory-map file format stay the same.

Change-Id: I38e8148384c90aa493984d0f6280817df00f1702
2014-01-30 15:26:12 +01:00
119ae40be9 util/Database: added a wrapper function for mysql_real_escape_string()
Change-Id: I999aad3c35c5f389fa3acfe8d7a11c417c478787
2014-01-28 11:07:34 +01:00
13175c259b import-trace: import debug info
If the --debug option is set, the line number table of the elf binary will
be imported into the database. The information will be stored in the
"dbg_mapping" table.

If the --sources option is set, the source files will be imported
into the database. Only the files that were actually used in the
elf binary will be imported.

Change-Id: I0e9de6b456bc42b329c1700c25e5839d9552cdbb
2014-01-28 11:07:34 +01:00
c48c7296fb util/WallclockTimer: bugfix: include ostream
This only compiled everywhere because all users included (i)ostream.

Change-Id: I29b0fb13a01606fdffd8ebdb9701eff652065916
2014-01-24 20:33:32 +01:00
85e3911202 Merge branch 'ubuntu-saucy-fixes' 2014-01-24 17:02:44 +01:00
83ac40dfb5 comm: use unsigned trace positions in protobuf
Change-Id: I91927ce2c54989c724ea537a6bd344de8c467a16
2014-01-23 18:53:19 +01:00
eab469192c cpn: regard equivalence classes of length 1
Previously the code did not handle equivalence classes, which consist
only of one instruction (length 1). As these classes for example
come up at two consecutive read instructions, we have to handle them.

Change-Id: Ib9e475a782828a380dfc79f5b390ca9192f4b8e3
2014-01-23 18:53:19 +01:00
286a28e7ae smarthops: calculate target instruction & allow zero input
As we might need information of target instruction (in case of
checkpoint, etc.) this information is now added to the output
protobuf message.

Trace-Events are generated also for position zero, so this case is
also regarded.

Change-Id: I69ff4818e7f8d6771923802f65bf0aa1b81883c5
2014-01-23 18:53:19 +01:00
ba765c16c2 cpn: pruning-aware injection points
As we gain some degrees of freedom in choice of the specific
injection instruction offset, this can be used to minimize
navigational costs. This is a first approach towards pruning-aware
injection points.

To do so, we need to modify the sql query, which gets the pilots,
so we additionally join with the trace table to get begin and
end information for equivalence classes, which are feeded into
the creation of InjectionPoints.

Change-Id: I343b712dfcbed1299121f02eee9ce1b136a7ff15
2014-01-23 18:53:19 +01:00
d7a9a2811d cpn: Not every InjectionPointHops calcs smart-hops
As the InjectionPoint is considered to be a container for abstract
"points in time" which can be navigated to, not every object of
a InjectionPointHops needs a smart-hopping calculator.

Change-Id: I150a46cf79a2b9d8ddb2d24a6d89dc3d4246cdb3
2014-01-23 18:53:19 +01:00
e824e7a0fa cpn: Parsing of unsigned int fixed
As atoi caps the value of a unsigned int bigger than (2^31 - 1) other
than just letting it overflow to the corresponding negative value on
32Bit-integer machines, it must not be used for parsing to unsigned int.

TODO: Also apply this fix to all other unsigned values (in database)
which get parsed by atoi.

Change-Id: I96e29b14d36479ab6e567c527a40feb0b5fb14e5
2014-01-23 18:53:18 +01:00
8b5098abdd tools: added compute-hops and dump-hops tools
As these tools work closely together with fail components, its
easiest, to build them in this context. As these tools don't
really matter for fail use, they might never be pushed to the
master branch.

Change-Id: I8c8bd80376d0475f08a531a995d829e85032371b
2014-01-23 18:53:11 +01:00
17e76c140b cpn: needs comm and MySQL at link time
The dependency on fail-comm exists not only at compile time (the
latter is due to protobuf header generation).

Change-Id: I2bae51e763d9a385bda94e77df3e88619fa28a30
2014-01-23 14:31:24 +01:00
16ce7a4fee panda: read memory in chunks of 4 byte
As openocd is able to read maximally 4-Byte sized chunks,
this will be done for performance improvement.

Change-Id: I79f85e580240f913b5a3d7b49bc0698390644ca8
2014-01-22 18:04:36 +01:00
2285427a4c panda: invalidate cache by default
When writing to memory, related cache-lines should by default be invalidated.

Change-Id: I5477fa86c75d512f483a5b61f147666d3d845ea2
2014-01-22 18:04:36 +01:00
c142818325 cpn: Generic wrapper for injection point
As for the pandaboard to navigate fast to the injection
instruction we need to deliver a hop chain to the fail-client,
this commit adds a generic wrapper for a injection point.
For now we have only the two options hop chain and instruction
offset, so it is activated via a cmake ON/OFF switch.

Change-Id: Ic01a07a30ac386d4316e6d6d271baf1549db966a
2014-01-22 18:04:29 +01:00
146984f2fc openocd: Added support for MMU memory watch
AccessListener with long width are implemented as MMU page-faults

Change-Id: I85208463b1f7eb3dbab187287caa387394a4af90
2014-01-22 17:54:09 +01:00
809af0ae55 openocd: add cycle counter for trace timestamp
Added performance monitor hw-function cycle count.
Also fix for single-stepping exit, some additional register
exits and prevention of reboot failures.

Change-Id: I74196905dc39ecc14ae78366e7e1cb70ec7092f1
2014-01-22 17:54:03 +01:00
1feab4fd54 sal: wrong include fixed
Include of ArmArchitecture was misspelled

Change-Id: Iba3e0a9f1b687cfcd640c74ad9d185f0ffabe510
2014-01-22 17:47:17 +01:00
d4776cf628 panda: fix in breakpoints aspect
Halt condition type not properly set

Change-Id: I17c780216606b89a7c8a0ace03ac3788582d95ac
2014-01-22 17:47:17 +01:00
98a478badd openocd: arm register mapping
Mapping register id (ArmArchitecture) to openocd register id.

Change-Id: Id951ce1606e1720e7bc2fd7d6686cff8c1d5c9b4
2014-01-22 17:47:10 +01:00
582459c5bb panda: non-returning openocd-loop at terminate
Previously for correct termination, the PandaController called
the finish-function of the openocd wrapper, invoked a coroutine
switch and waited for the openocd wrapper to finish up and switch
coroutine again, so the PandaController could exit with correct
exitStatus. Now the openocd-wrapper directly exits with chosen
exit status.

Change-Id: I8d318a4143c53340896ccee4d059a0d79fdcfe89
2014-01-22 17:43:31 +01:00
0d2a5175cf panda: comment fix & remove unimplemented functions
Change-Id: Ibe533a41871bbf186272d6df43966dabb692dede
2014-01-22 17:43:31 +01:00
db0b82daca fail: modifications for pandaboard support
Change-Id: I52d3c9b9862b206a000394c45126f0afdfee081f
2014-01-22 17:43:31 +01:00
749631e21c fail: add support for pandaboard
Change-Id: I1525c9b36d58bf53ad238a553d914f183f983bba
2014-01-22 17:43:23 +01:00
4cb97a7fa5 formatting, typos, comments, details
Change-Id: Iae5f1acb653a694622e9ac2bad93efcfca588f3a
2014-01-22 13:08:13 +01:00
7591c9edc5 Merge branch 'jobclientserver-fixes' 2014-01-22 13:07:59 +01:00
fa1690bd1f Merge "core/sal: Added features that indicate whether FAIL* is initialized" 2014-01-21 15:35:22 +01:00
813414984c util: boost::thread 1.53 depends on boost::system
Unfortunately this implicit dependency is currently not resolved anywhere
else (e.g., FindBoost.cmake), although the 'net heavily discusses this
issue.

Change-Id: I8a7c8518394cdba27e591fed250623011d988067
2014-01-21 00:29:34 +01:00
4e21b42374 cpn: use strtoul for conversion of unsigned ints
As 32-bit libc6 atoi() caps the value of unsigned ints bigger than
2^31-1 (instead of just letting it overflow to the corresponding
negative value, as on x86_64), it must not be used especially for the
conversion of 32-bit pointers.

Change-Id: Ie0821a6f4cd04aebd37ea3d4028b63a05373810f
2014-01-21 00:10:56 +01:00
122eb8c9dc use uint32 for addresses in protobuf msgs
This prevents integer overflows when using addresses > 2GiB, which are
common for x86 operating systems with paging (Linux, Fiasco.OC) or
some test cases on the PandaBoard.

Note that this results in slightly different result table definitions
when automatically translating an experiment's protobuf message in the
DatabaseCampaign.

This change affects all existing protobuf messages to prevent
copy/paste propagation of this issue.

Change-Id: I09ec4b9d45eddd67a7a24c8b101e8b2b258df5e2
2014-01-21 00:08:41 +01:00
de39bf6120 jobclient: use initializer list
Change-Id: I7eb42f947bbabd61e1aad9224cedd7ffceec4f10
2014-01-20 22:48:08 +01:00
5ffcb82138 jobclient: initial number of jobs configurable
The new CLIENT_JOB_INITIAL configuration option allows to configure
the client to request more than one job in the first request round.
If a reasonable initial value is chosen, this removes the job ramp-up
after each fail-client restart, and slightly improves overall
throughput.

Change-Id: Idac2721264ec264c520d341fac64a8311a974708
2014-01-20 22:48:08 +01:00
2c31bf79b0 jobclient: expect communication failures
This change makes the JobClient act properly on communication aborts.

Change-Id: I0a76489f117e9721546215e3b627002605e25452
2014-01-20 22:48:08 +01:00
882d4f381b jobclient: bugfix: faster shutdown at campaign end
The JobClient currently waits a LONG time until it really shuts down
after not having reached the server in sendResultsToServer() (which is
unfortunately the by far most probable point in the code to determine
this):

 -  A different bug (fixed in the previous commit) provoked the
    situation that a (way) too large amount of jobs was fetched
    before.
 -  sendResult() (called after each experiment iteration) realized
    that CLIENT_JOB_REQUEST_SEC seconds are over, and tried to
    prematurely call home to send first results (without planning to
    get new jobs yet).
 -  If the server was gone (done, or aborted), connect in
    sendResultsToServer() failed after several retries and timeouts.
 -  All subsequent calls to sendResult() retried connecting to the
    server (again, with retries and timeouts), once for each remaining
    job.
 -  When all jobs were done, getParam() tries to connect a last time,
    finally telling the experiment that nobody's home.

This resulted in client shutdown times of up to four hours (for the
default CLIENT_JOB_LIMIT of 1000) after the campaign server
terminated.  This change solves the issue by not handing out new
(cached) jobs after the connect failed once, making the experiment
terminate quickly.

Change-Id: I0d8cb2e084d783aca74c51a503fa72eb2b2eb0b7
2014-01-20 22:48:08 +01:00
ee7bc23d85 jobclient: bugfix: initialize timing statistics
If we don't properly initialize the job timing statistics, the number
of jobs to be requested in the second request to the server is based
on the wrong timings.  In our test case, CLIENT_JOB_LIMIT jobs were
requested at once.

Change-Id: I7e9d8ab6fe14e4488b3a74baf061d9a07f3a77c4
2014-01-20 22:48:08 +01:00
1f6e275e5e jobserver: bugfix: potential race
Delay insertion of to-be-sent jobs into m_runningJobs until they are
really sent, as getMessage() won't work anymore (as in: segfault) if
this job is concurrently re-sent (due to campaign end), its result is
received, and deleted in the campaign.  This becomes non-hypothetical
with larger values for CLIENT_JOB_LIMIT and CLIENT_JOB_REQUEST_SEC.

Additionally, reinsert the remaining jobs into the input queue if
communication fails, instead of inefficiently delaying redistribution
until the campaign end.

Change-Id: If85e3c8261deda86beb8d4d93343429223753f22
2014-01-20 22:48:08 +01:00
128b54b045 jobserver: outgoing jobqueue bounded by default
Bounding the outgoing queue is always a good idea:  If the campaign has
separate threads for outgoing and incoming jobs (true for the
DatabaseCampaign), this keeps memory requirements reasonable.  If the
campaign works in a single thread, this is not disadvantageous either.

Change-Id: Ic75272daa8266f051adf7b23e2ffe87f5c965b86
2014-01-20 22:48:08 +01:00
73adc71437 jobserver: use non-blocking accept
To allow the JobServer to shutdown properly, the accept() loop in
JobServer::run() needs to regularly check whether we're done.  This
change introduces a timed, non-blocking variant of accept() into
SocketComm to achieve this.

Change-Id: Id411096be816c4ed6c7b0b37674410e22152eb22
2014-01-20 22:48:08 +01:00
8671669053 jobserver: join remaining threads on shutdown
To avoid accessing destroyed resources in CommThreads talking to clients,
we need to properly join them on shutdown.  The m_CommMutex becomes a
JobServer member to make sure it isn't destroyed before the JobServer
itself.

Change-Id: I35b9fb93ace08a7a9476650f8f5e93597a3a8aa0
2014-01-20 22:48:08 +01:00
8505ddbb04 jobserver: synchronization cleanup
This change cleans up in/out queue synchronization in the job server.
End-of-jobs conditions are now properly signaled through the
SynchronizedQueue, allowing to resume and abort blocked readers when
no more input is expected.

Change-Id: I3eaf37115ccf8c5b5afe3d971c7109cd62b68906
2014-01-20 22:48:08 +01:00
5ac108ea4b Merge branch 'mysql-concurrency-fixes' 2014-01-20 18:35:35 +01:00