[Rock-dev] Discussion about fault response tables

Tue May 7 16:29:26 CEST 2013

On 05/06/2013 03:47 PM, Chris Mueller wrote:
> Hi,
>
> I have a few small points that came to mind:
>
> 1) I have little experience with professional fault-tolerant systems.
> It might also help, at the beginning, to specify the types of possible
> responses the system could model, e.g.:
>   1. Respawning a task that has failed due to hardware defects,
> software bugs, etc. (retry)
>   2. Retrying a task with other property values because the
> configuration of e.g. a detector is completely wrong for the current
> environment conditions.
>   3. Starting a repair task to replace the failed task until the system
> is back in a stable state (that's one of the common use cases Roby
> currently provides)
>   4. Abandoning a mission if the failed plan absolutely cannot succeed,
> and trying to continue with the rest of the plan (that's a more
> high-level failure)
>   5. Spawning a complete alternative plan if a specific task / action
> cannot succeed and is necessary for the rest of a plan.
> I guess there are probably many more concrete response types that
> could be derived from our past experience and requirements with
> several systems.
> This could help make the system more concrete, because in my opinion
> it's not a trivial matter to design a system model that can handle
> every kind of error.
All these points are covered by the proposed fault response tables.
Points 1, 2, 3 and 5 are provided by on_fault. It is interesting to note
that, from the point of view of Roby, 2, 3 and 5 are equivalent.

> 2) If a fault exception is raised in the system, a fault handler
> (on_fault ...) should also provide the task that has failed. An
> idealized example could be:
It is already provided by exception.origin

> on_fault EXCEPTION do |exception, failed_task|
>     failed_task.prepare_restart
>     failed_task.reconfigure(:param => BETTER_PARAM_VALUE)
>     failed_task.respawn
> end
Handling the fault at the task level is ill-conceived, as you can only
very rarely recover that way.
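
As a rough illustration of the exception.origin point, the sketch below
shows a handler that takes only the exception and still reaches the
failed task through it. ToyTask, ToyException and :detector_failed are
hypothetical stand-ins invented for this example, not Roby classes or
names; only the accessor name `origin` comes from this thread.

```ruby
# Hypothetical stand-ins, NOT Roby classes: a task with a name, and an
# exception that remembers which task it came from.
ToyTask = Struct.new(:name)
ToyException = Struct.new(:symbol, :origin)

# The handler receives only the exception; the failed task is reached
# through exception.origin instead of a second block argument.
handler = lambda do |exception|
  "fault #{exception.symbol} raised by #{exception.origin.name}"
end

message = handler.call(
  ToyException.new(:detector_failed, ToyTask.new("detector"))
)
puts message
```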

> 3) Fault tolerance tables could probably also be visualized in syskit
> browse/roby-display. That would later be helpful for implementing and
> debugging the response management. That would be great.
As well as the action interface in general, yes.

> 4) Could you make the meaning of "symbol" within the FAULT_MATCHER
> specification a little more concrete (maybe with an example)?
> Is it some kind of custom signal that can be raised from any
> composition/task when a specific data port doesn't output an expected
> value within a given time?
> (That's currently my interpretation of the concept 'data_predicates'
> mentioned in the wiki.)
It is just a means to give a name to an error. For instance, you would do:

# Monitor a battery level
fault :battery_low do
   battery0_dev.status_port.battery_level < 1
end
# React to it
on_fault :battery_low do |exception|
   surface
end
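
To make the naming idea concrete outside of Roby, here is a minimal,
self-contained sketch of the same pattern: a predicate registered under
a symbol, and a response looked up by that symbol. ToyFaultTable, its
`poll` method and the `state` hash are assumptions made up for this
example; this is not how Roby implements fault response tables.

```ruby
# Toy stand-in for a fault response table; NOT the Roby API.
class ToyFaultTable
  # Carries the fault name and whatever triggered it.
  FaultError = Struct.new(:name, :origin)

  def initialize
    @faults = {}    # symbol => predicate block
    @handlers = {}  # symbol => response block
  end

  # Give a name (a symbol) to an error condition.
  def fault(name, &predicate)
    @faults[name] = predicate
  end

  # Register the response for a named fault.
  def on_fault(name, &handler)
    @handlers[name] = handler
  end

  # Evaluate every predicate against the given state and collect the
  # results of the handlers whose fault is currently active.
  def poll(state)
    @faults.each_with_object([]) do |(name, predicate), responses|
      next unless predicate.call(state)
      handler = @handlers[name]
      responses << handler.call(FaultError.new(name, state)) if handler
    end
  end
end

table = ToyFaultTable.new
# Monitor a battery level
table.fault(:battery_low) { |state| state[:battery_level] < 1 }
# React to it
table.on_fault(:battery_low) { |exception| :surface }

p table.poll({ battery_level: 0.5 })
p table.poll({ battery_level: 12.0 })
```

The symbol is the only link between the monitor and the response, so
either side can be changed without touching the other.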

-- 
Sylvain Joyeux (Dr.Ing.)
Space & Security Robotics
