Commit Graph

646 Commits

Author SHA1 Message Date
836325e74b generic-tracing: cleanups
Change-Id: I5c3d1131248910228cb4fee44cf107c750c01e21
2014-02-19 19:08:46 +01:00
85152238da tracing: fix endless loop when only tracing mem accesses
With m_tracetype=TRACE_MEM, bool first was never reset to false in the
tracing plugin's main loop.  This bug was most probably never
triggered, though, as nobody only traces memory accesses.

This change also slightly simplifies the internal logic in the tracing
plugin.

Change-Id: I65d7df6a3781ec552cfb892bbf3394b421e227f1
2014-02-19 19:08:46 +01:00
01c1321b48 tracing: bugfix for mem dereferences at mapping boundary
As we copy a 32-bit word from the dereferenced address, we need to
check whether address+3 is mapped as well.  (Yes, I've seen this in the
wild.)

Change-Id: I43f891c56e077333670c9cb48c0ee8e9342fa41d
2014-02-17 23:24:16 +01:00
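The boundary check described in commit 01c1321b48 can be sketched as follows. This is a minimal illustration, not the tracing plugin's code: wordIsMapped and the isMapped callback are hypothetical names, and the sketch assumes that mappings have no holes smaller than four bytes, so checking the first and last byte of the word is enough.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical helper: returns true if the whole 32-bit word starting at
// addr can be read. isMapped stands in for a per-address mapping query.
bool wordIsMapped(std::uint32_t addr,
                  const std::function<bool(std::uint32_t)>& isMapped) {
    // addr + 3 must not wrap around the 32-bit address space.
    if (addr > UINT32_MAX - 3) {
        return false;
    }
    // Check the first and the last byte of the word.
    return isMapped(addr) && isMapped(addr + 3);
}
```

With a mapping that ends at 0x0FFF, a word at 0x0FFC is readable, while a word at 0x0FFD crosses the mapping boundary and is rejected.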
58fa4c59cc sal/bochs: fix handling of unmapped memory
Up to now, BochsMemory::isMapped() always returned true in 32-bit protected
mode with a 4GB linear address space (as used by, e.g., eCos), even for
addresses greater than the configured memory size.  This led to lots of
bogus memory dereferences in the (extended) tracing plugin.

This change (a follow-up to commit 5171645) additionally checks the return
value of getHostMemAddr(), and announces BX_RW (read/write access) instead
of BX_READ as the intended type of memory access.  In the aforementioned
scenario, memory addresses greater than the memory size are now correctly
detected as "not mapped".

Change-Id: Ic2fa7554c869cb90191164535a601bae4dbb49b6
2014-02-17 23:24:16 +01:00
4b921a5fe3 util: MemoryMap test
Change-Id: I54680685326a85cfd723a47e8aef8d71662c9aeb
2014-01-30 15:26:20 +01:00
4bcce14659 util: space-efficient MemoryMap
We now use boost::icl::interval_set internally, which consumes far
less memory.  boost::icl was introduced with Boost 1.46;
Debian 7.0 comes with 1.49, so this dependency should no longer be a
problem.

Both the class interface and the memory-map file format stay the same.

Change-Id: I38e8148384c90aa493984d0f6280817df00f1702
2014-01-30 15:26:12 +01:00
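Commit 4bcce14659 does not say why an interval set is smaller, but the usual reason is that contiguous address ranges are stored as one entry each instead of one entry per address. The sketch below illustrates that idea with a std::map as a stand-in for boost::icl::interval_set; AddressIntervalSet and its methods are hypothetical names and not the MemoryMap interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Stand-in for an interval set: maps the start of each half-open range
// [start, end) to its end, and merges ranges that overlap or touch.
class AddressIntervalSet {
public:
    void add(std::uint64_t start, std::uint64_t size) {
        if (size == 0) return;
        std::uint64_t end = start + size;
        auto it = m_ranges.upper_bound(start);  // first range starting after start
        if (it != m_ranges.begin()) {
            auto prev = std::prev(it);
            if (prev->second >= start) {        // predecessor overlaps or touches
                start = prev->first;
                end = std::max(end, prev->second);
                it = m_ranges.erase(prev);
            }
        }
        while (it != m_ranges.end() && it->first <= end) {  // absorb successors
            end = std::max(end, it->second);
            it = m_ranges.erase(it);
        }
        m_ranges[start] = end;
    }

    bool contains(std::uint64_t addr) const {
        auto it = m_ranges.upper_bound(addr);
        if (it == m_ranges.begin()) return false;
        --it;
        return addr < it->second;
    }

    std::size_t intervalCount() const { return m_ranges.size(); }

private:
    std::map<std::uint64_t, std::uint64_t> m_ranges;
};
```

Adding two adjacent 4-byte ranges yields a single stored interval, so memory use grows with the number of disjoint ranges rather than with the number of addresses.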
119ae40be9 util/Database: added a wrapper function for mysql_real_escape_string()
Change-Id: I999aad3c35c5f389fa3acfe8d7a11c417c478787
2014-01-28 11:07:34 +01:00
13175c259b import-trace: import debug info
If the --debug option is set, the line number table of the ELF binary will
be imported into the database. The information will be stored in the
"dbg_mapping" table.

If the --sources option is set, the source files will be imported
into the database. Only the files that were actually used in the
ELF binary will be imported.

Change-Id: I0e9de6b456bc42b329c1700c25e5839d9552cdbb
2014-01-28 11:07:34 +01:00
d307dd2ecb dciao-kernelstructs: reuse sobres experiment for ISORC2014
Differences:

- the task activation order is determined in the faulty experiment as
  well as in the golden run (which is now done by
  fail-generic-tracing) by observing a variable fail_virtual_port.
- There is a panic value read from the fail_virtual_port
- The golden run task activation is determined by giving an extended
  trace to task_activation.py. The script collects all writes to
  fail_virtual_port, and determines the activation from this.

Change-Id: Id401b78933b45a4b2cf031fc0a8b5ac90151ec24
2014-01-27 10:32:09 +01:00
c48c7296fb util/WallclockTimer: bugfix: include ostream
This only compiled everywhere because all users included (i)ostream.

Change-Id: I29b0fb13a01606fdffd8ebdb9701eff652065916
2014-01-24 20:33:32 +01:00
85e3911202 Merge branch 'ubuntu-saucy-fixes' 2014-01-24 17:02:44 +01:00
17e76c140b cpn: needs comm and MySQL at link time
The dependency on fail-comm exists not only at compile time (where it
is due to protobuf header generation), but also at link time.

Change-Id: I2bae51e763d9a385bda94e77df3e88619fa28a30
2014-01-23 14:31:24 +01:00
4cb97a7fa5 formatting, typos, comments, details
Change-Id: Iae5f1acb653a694622e9ac2bad93efcfca588f3a
2014-01-22 13:08:13 +01:00
7591c9edc5 Merge branch 'jobclientserver-fixes' 2014-01-22 13:07:59 +01:00
fa1690bd1f Merge "core/sal: Added features that indicate whether FAIL* is initialized" 2014-01-21 15:35:22 +01:00
813414984c util: boost::thread 1.53 depends on boost::system
Unfortunately this implicit dependency is currently not resolved anywhere
else (e.g., FindBoost.cmake), although the 'net heavily discusses this
issue.

Change-Id: I8a7c8518394cdba27e591fed250623011d988067
2014-01-21 00:29:34 +01:00
4e21b42374 cpn: use strtoul for conversion of unsigned ints
As 32-bit libc6 atoi() caps the value of unsigned ints bigger than
2^31-1 (instead of just letting it overflow to the corresponding
negative value, as on x86_64), it must not be used, especially not for
the conversion of 32-bit pointers.

Change-Id: Ie0821a6f4cd04aebd37ea3d4028b63a05373810f
2014-01-21 00:10:56 +01:00
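The conversion issue from commit 4e21b42374 can be illustrated with a small strtoul-based parser. This is a sketch under assumptions: parseUint32 is a hypothetical helper, not the function used in cpn, and base 0 is chosen here so that both decimal and 0x-prefixed input are accepted.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: parses text as an unsigned 32-bit value.
// Returns false on empty input, trailing garbage, a minus sign,
// or a value that does not fit into 32 bits.
bool parseUint32(const char* text, std::uint32_t& out) {
    if (text == nullptr || *text == '\0') return false;
    // strtoul() silently negates input with a leading '-', so reject it.
    if (std::strchr(text, '-') != nullptr) return false;

    errno = 0;
    char* end = nullptr;
    unsigned long value = std::strtoul(text, &end, 0);
    if (errno == ERANGE || end == text || *end != '\0') return false;
    if (value > UINT32_MAX) return false;  // only relevant if long is 64 bits wide

    out = static_cast<std::uint32_t>(value);
    return true;
}
```

Unlike atoi(), this reports failure instead of returning a capped or wrapped value, so an address such as 3221225472 (0xC0000000) survives the conversion on both 32-bit and 64-bit hosts.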
122eb8c9dc use uint32 for addresses in protobuf msgs
This prevents integer overflows when using addresses > 2GiB, which are
common for x86 operating systems with paging (Linux, Fiasco.OC) or
some test cases on the PandaBoard.

Note that this results in slightly different result table definitions
when automatically translating an experiment's protobuf message in the
DatabaseCampaign.

This change affects all existing protobuf messages to prevent
copy/paste propagation of this issue.

Change-Id: I09ec4b9d45eddd67a7a24c8b101e8b2b258df5e2
2014-01-21 00:08:41 +01:00
de39bf6120 jobclient: use initializer list
Change-Id: I7eb42f947bbabd61e1aad9224cedd7ffceec4f10
2014-01-20 22:48:08 +01:00
5ffcb82138 jobclient: initial number of jobs configurable
The new CLIENT_JOB_INITIAL configuration option lets the client
request more than one job in the first request round.
If a reasonable initial value is chosen, this removes the job ramp-up
after each fail-client restart, and slightly improves overall
throughput.

Change-Id: Idac2721264ec264c520d341fac64a8311a974708
2014-01-20 22:48:08 +01:00
2c31bf79b0 jobclient: expect communication failures
This change makes the JobClient act properly on communication aborts.

Change-Id: I0a76489f117e9721546215e3b627002605e25452
2014-01-20 22:48:08 +01:00
882d4f381b jobclient: bugfix: faster shutdown at campaign end
The JobClient currently waits a LONG time until it really shuts down
after not having reached the server in sendResultsToServer() (which is
unfortunately by far the most probable point in the code to determine
this):

 -  A different bug (fixed in the previous commit) provoked the
    situation that far too many jobs were fetched
    before.
 -  sendResult() (called after each experiment iteration) realized
    that CLIENT_JOB_REQUEST_SEC seconds are over, and tried to
    prematurely call home to send first results (without planning to
    get new jobs yet).
 -  If the server was gone (done, or aborted), connect in
    sendResultsToServer() failed after several retries and timeouts.
 -  All subsequent calls to sendResult() retried connecting to the
    server (again, with retries and timeouts), once for each remaining
    job.
 -  When all jobs were done, getParam() tried to connect one last time,
    finally telling the experiment that nobody's home.

This resulted in client shutdown times of up to four hours (for the
default CLIENT_JOB_LIMIT of 1000) after the campaign server
terminated.  This change solves the issue by not handing out new
(cached) jobs after the connect failed once, making the experiment
terminate quickly.

Change-Id: I0d8cb2e084d783aca74c51a503fa72eb2b2eb0b7
2014-01-20 22:48:08 +01:00
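The fix described in commit 882d4f381b, not handing out cached jobs once a connect has failed, can be sketched like this. JobCache and its methods are hypothetical names chosen for illustration, and jobs are reduced to plain integer IDs; the real JobClient is not shown in this log.

```cpp
#include <cassert>
#include <deque>

// Hypothetical job cache: once the server is known to be unreachable,
// no further cached jobs are handed out.
class JobCache {
public:
    void add(int jobId) { m_cached.push_back(jobId); }

    // Called when connecting to the server has failed.
    void connectFailed() { m_serverGone = true; }

    // Fetches the next cached job. Returns false if the cache is empty
    // or the server is gone, which lets the experiment terminate.
    bool next(int& jobId) {
        if (m_serverGone || m_cached.empty()) return false;
        jobId = m_cached.front();
        m_cached.pop_front();
        return true;
    }

private:
    std::deque<int> m_cached;
    bool m_serverGone = false;
};
```

With this behavior, the remaining cached jobs no longer trigger one round of connection retries and timeouts each.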
ee7bc23d85 jobclient: bugfix: initialize timing statistics
If we don't properly initialize the job timing statistics, the number
of jobs to be requested in the second request to the server is based
on the wrong timings.  In our test case, CLIENT_JOB_LIMIT jobs were
requested at once.

Change-Id: I7e9d8ab6fe14e4488b3a74baf061d9a07f3a77c4
2014-01-20 22:48:08 +01:00
1f6e275e5e jobserver: bugfix: potential race
Delay insertion of to-be-sent jobs into m_runningJobs until they are
really sent, as getMessage() won't work anymore (as in: segfault) if
this job is concurrently re-sent (due to campaign end), its result is
received, and deleted in the campaign.  This becomes non-hypothetical
with larger values for CLIENT_JOB_LIMIT and CLIENT_JOB_REQUEST_SEC.

Additionally, reinsert the remaining jobs into the input queue if
communication fails, instead of inefficiently delaying redistribution
until the campaign end.

Change-Id: If85e3c8261deda86beb8d4d93343429223753f22
2014-01-20 22:48:08 +01:00
128b54b045 jobserver: outgoing jobqueue bounded by default
Bounding the outgoing queue is always a good idea:  If the campaign has
separate threads for outgoing and incoming jobs (true for the
DatabaseCampaign), this keeps memory requirements reasonable.  If the
campaign works in a single thread, this is not disadvantageous either.

Change-Id: Ic75272daa8266f051adf7b23e2ffe87f5c965b86
2014-01-20 22:48:08 +01:00
73adc71437 jobserver: use non-blocking accept
To allow the JobServer to shut down properly, the accept() loop in
JobServer::run() needs to regularly check whether we're done.  This
change introduces a timed, non-blocking variant of accept() into
SocketComm to achieve this.

Change-Id: Id411096be816c4ed6c7b0b37674410e22152eb22
2014-01-20 22:48:08 +01:00
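A timed accept as described in commit 73adc71437 can be built on POSIX poll(). The following is a sketch under assumptions: timedAccept and the two return constants are hypothetical names, and the actual SocketComm implementation may use a different mechanism such as select().

```cpp
#include <cassert>
#include <poll.h>
#include <sys/socket.h>

const int kAcceptFailed = -1;    // poll() or accept() reported an error
const int kAcceptTimedOut = -2;  // no connection arrived within the timeout

// Waits up to timeoutMs milliseconds for an incoming connection on
// listenFd. Returns the new connection's descriptor, or one of the
// constants above.
int timedAccept(int listenFd, int timeoutMs) {
    struct pollfd pfd;
    pfd.fd = listenFd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int ready = poll(&pfd, 1, timeoutMs);
    if (ready == 0) return kAcceptTimedOut;
    if (ready < 0) return kAcceptFailed;
    return accept(listenFd, nullptr, nullptr);
}
```

A server loop can then call timedAccept() with a short timeout and re-check its shutdown flag whenever kAcceptTimedOut is returned, instead of blocking in accept() indefinitely.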
8671669053 jobserver: join remaining threads on shutdown
To avoid accessing destroyed resources in CommThreads talking to clients,
we need to properly join them on shutdown.  The m_CommMutex becomes a
JobServer member to make sure it isn't destroyed before the JobServer
itself.

Change-Id: I35b9fb93ace08a7a9476650f8f5e93597a3a8aa0
2014-01-20 22:48:08 +01:00
8505ddbb04 jobserver: synchronization cleanup
This change cleans up in/out queue synchronization in the job server.
End-of-jobs conditions are now properly signaled through the
SynchronizedQueue, which allows blocked readers to be resumed and
aborted when no more input is expected.

Change-Id: I3eaf37115ccf8c5b5afe3d971c7109cd62b68906
2014-01-20 22:48:08 +01:00
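The combination of a bounded queue (commit 128b54b045) and end-of-jobs signaling (commit 8505ddbb04) can be sketched as a closable queue. ClosableQueue is a hypothetical class written for illustration; it does not reproduce the interface of FAIL*'s SynchronizedQueue.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Bounded queue whose blocked readers and writers are woken up by close().
template <typename T>
class ClosableQueue {
public:
    explicit ClosableQueue(std::size_t capacity) : m_capacity(capacity) {}

    // Blocks while the queue is full. Returns false if the queue is closed.
    bool push(T value) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notFull.wait(lock, [this] { return m_closed || m_items.size() < m_capacity; });
        if (m_closed) return false;
        m_items.push_back(std::move(value));
        m_notEmpty.notify_one();
        return true;
    }

    // Blocks while the queue is empty. Returns false once the queue is
    // closed and drained, telling the reader that no more input will come.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notEmpty.wait(lock, [this] { return m_closed || !m_items.empty(); });
        if (m_items.empty()) return false;
        out = std::move(m_items.front());
        m_items.pop_front();
        m_notFull.notify_one();
        return true;
    }

    // Signals the end-of-jobs condition and wakes all blocked threads.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_closed = true;
        }
        m_notEmpty.notify_all();
        m_notFull.notify_all();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_notEmpty, m_notFull;
    std::deque<T> m_items;
    std::size_t m_capacity;
    bool m_closed = false;
};
```

Items that were queued before close() can still be popped; only after the queue has been drained does pop() return false.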
5ac108ea4b Merge branch 'mysql-concurrency-fixes' 2014-01-20 18:35:35 +01:00
84aac60a70 use libmysqlclient_r to ensure thread safety
According to
<http://dev.mysql.com/doc/refman/5.5/en/c-api-threaded-clients.html>,
(potentially) threaded clients should use the reentrant
libmysqlclient_r.  This is just a precaution, I haven't seen any
issues with the normal libmysqlclient.

Change-Id: Icb29df6dd54eb666e3b43b73fbda406acccd11cb
2014-01-20 18:34:51 +01:00
8f9ee3fddd DatabaseCampaign: run statistics update when finished
Change-Id: Ib68e54ba82e988db0d2d74ffafa6dc9bd54cd272
2014-01-20 18:34:51 +01:00
33b63651ae DatabaseCampaign: MySQL / concurrency fixes
According to
<http://dev.mysql.com/doc/refman/5.5/en/c-api-threaded-clients.html>,
a MySQL connection handle must not be used concurrently with an open
result set and mysql_use_result() in one thread
(DatabaseCampaign::run()), and mysql_query() in another
(DatabaseCampaign::collect_result_thread()).  This indeed leads to
crashes when bounding the outgoing job queue (SERVER_OUT_QUEUE_SIZE),
and maybe even more insidious effects in other cases.  The solution is
to create separate connections for both threads.

Additionally, call mysql_library_init() before spawning any threads.

Change-Id: I2981f2fdc67c9a2cbe8781f1a21654418f621aeb
2014-01-20 18:34:51 +01:00
0534b503a6 Merge branch 'use_size_prefix-REMOVED' 2014-01-15 13:54:25 +01:00
9c984b9704 fail/cpn: (Database)Campaign no longer loses jobs
Up until now the JobServer was silently losing jobs and only claiming to be
finished - a workaround for this was to restart the campaign until all jobs
were finished according to the database and the campaign's output.
This change fixes the underlying problem, so a single campaign run suffices
and no longer loses any jobs.
Debugging this was awful and took us quite some time...

Change-Id: Ie6c982cc3b2ce11128941f1f13be563bae22565c
2014-01-15 12:59:13 +01:00
abd9decf0b fail/cpn: removed USE_SIZE_PREFIX from SocketComm
This removes the ability to directly parse protobufs from the socket, because
google::protobuf::Message::ParseFromFileDescriptor() needs an EOF after each message,
which prevents us from sending multiple Message objects over a single socket.

Change-Id: I67c0f631071470d6e0ae597e42848036a6db3656
2014-01-15 12:56:38 +01:00
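Commit abd9decf0b implies that messages on a shared socket must be delimited explicitly, and a length prefix is one common way to do so. The sketch below frames opaque byte strings with a 4-byte big-endian length; frame and unframe are hypothetical helpers, and the actual wire format used by SocketComm is not shown in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Prepends a 4-byte big-endian length to the payload.
std::string frame(const std::string& payload) {
    std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string out;
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>((n >> shift) & 0xFF));
    }
    out += payload;
    return out;
}

// Extracts the next frame from stream, starting at offset. On success,
// stores the payload, advances offset past the frame, and returns true.
// Returns false if no complete frame is available.
bool unframe(const std::string& stream, std::size_t& offset, std::string& payload) {
    if (offset > stream.size() || stream.size() - offset < 4) return false;
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i) {
        n = (n << 8) | static_cast<unsigned char>(stream[offset + i]);
    }
    if (stream.size() - offset - 4 < n) return false;
    payload = stream.substr(offset + 4, n);
    offset += 4 + static_cast<std::size_t>(n);
    return true;
}
```

Because each frame carries its own length, several serialized messages can follow each other on one connection without relying on EOF as a delimiter.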
0a5e54e9aa ecos_kernel_test experiment bugfix: don't resume if 'experiment reached finish() before FI'
Change-Id: Id0bb9400b8aa28307ed385a8c32b91b17254ba1c
2014-01-15 12:52:44 +01:00
3c7861ff06 core/sal: Added features that indicate whether FAIL* is initialized
GEM5 throws a reset trap during initialization.
This happens before the startup function is called.
This leads to problems because the startup function fills the m_CPUs list.
m_CPUs is needed for the TrapListener.
Therefore, we only react to traps after initialization.
This is needed in the following commit (see gem5/src/arch/arm/faults.cc).

Change-Id: I9ec6fd453705feb54b4f8a87d024181323a2d7ef
2014-01-14 13:07:21 +01:00
efbb6c6831 Merge "sal/gem5: getTimerTicks(), getTimerTicksPerSecond() implemented" 2014-01-14 12:45:13 +01:00
f359364888 sal/gem5: getTimerTicks(), getTimerTicksPerSecond() implemented
Change-Id: I01fdb5e4bdd61fc761e93ef77904c830131c9ed6
2014-01-14 12:13:55 +01:00
34065fea60 weather-monitor: command-line parameters are forwarded now
Parameters that are specified on the command line are now also forwarded.

Change-Id: I0e636f14dba43ef7877ce6e6deca1abb1f00a8a6
2014-01-03 16:38:02 +01:00
0907dfb0ae weather-monitor: now is a DatabaseCampaign
"removed" unnecessary memory-mapping ("Step 0")
cleaned out ExperimentData - now consists only of fsppilot and resultset
resultset now contains bitoffset which is part of result-table's primary key
adapted code to work with msg.fsppilot() instead of ExperimentData-values

Change-Id: I3b310e7a71d4b28479028250cd5722b3b2ce9f8c
2013-12-11 14:38:01 +01:00
839913592a Merge "Coding Guideline: Fixes." 2013-12-06 20:18:06 +01:00
ab9c0edf10 DatabaseCampaign: run jobs for known-outcome exps, too
Although we know that a known_outcome=1 pilot does not exhibit
behavior different from the golden run, the database schema does not
yet know what this behavior looks like (in terms of result-table
column values).  In order to be able to JOIN valid results for all
memory writes in the trace table (fspgroup maps them all onto *one*
pilot per variant), we need to run these experiments, too.

Additionally, don't join the fspgroup table; we only need this one for
result calculations afterwards.

Change-Id: Idcd2991274fede84526b1eee68a231774625d11a
2013-12-05 19:27:44 +01:00
85fffe007e tracing: bugfix for enabled memory maps
With the recent updates to record one additional instruction at the trace
start, I broke memory-map handling (restrictMemoryAddresses() and
restrictInstructionAddresses()).  This change repairs this functionality.

Change-Id: I0daf9f474d0efe3f8e30a168c0ccc1e993e7ddc6
2013-11-18 15:49:06 +01:00
bd91549367 Merge "gem5: restore works now" 2013-11-13 17:20:53 +01:00
45e0b41022 gem5: restore works now
The function restore(PATH) can now be used to restore a checkpoint.

Change-Id: I25faf9f6335261d2b3ade4185eae93983ece9f97
2013-11-13 17:15:19 +01:00
f31548c026 Merge "core/sal: register issue fixed" 2013-11-13 16:08:40 +01:00
6fa0ae970b Coding Guideline: Fixes.
Sorry for the small changesets.

Change-Id: I12e7b1b4efff0c63020613e399f8185ace97aec7
2013-11-11 13:27:43 +01:00
cf95437e65 RealtimeLogger: Fixed coding guideline issues.
Change-Id: I1172e0c60e2d6e895b4d3f99eb1a023c348bd3b3
2013-11-11 13:18:26 +01:00
8f0db45dfe Exp: Base system for the real time systems lecture
Change-Id: I3e5b8c6e60b57e6ec03500e9ee109fd5fb322cb2
2013-11-11 13:08:26 +01:00