PICASSO: a PI-CAlculus-based Static SOftware analyzer

Examples and Applications

Encoding map-reduce as a graph rewriting system.

The map-reduce algorithm takes a collection of data elements as input and computes a set of key/value pairs. The algorithm consists of two stages: the mapping and the reduction stage. In the mapping stage each input data value is mapped to a set of key/value pairs. In the reduction stage, the values across all input data that are associated with the same key are aggregated and then reduced to a single key/value pair. A typical example of an application of map-reduce is counting key words in web documents. The input data is a set of web documents. The mapping stage counts the occurrences of the key words in each document. The reduction stages sums up the occurrences for all documents.
For this example, we are inspired by an implementation of a map-reduce algorithm in Scala using actors. The code is in the book Actors in Scala chapter 9. In the implementation of the mapping stage, the master actor creates a worker actor for each element of the input data and sends the data to the workers, which apply the mapping function and send the lists of key/value pairs back to the master. In the reduction stage, the master creates a reducer actor for each key and sends it the aggregated values for that key. The reducer then reduces all these values to a single value which is again sent back to the master.

Encoding conventions. Heap objects (including actors) are represented by nodes in the graph and references between objects by edges. Each node is labeled by the internal state of the associated object and each edge is labeled by the name of the associated reference variable. For instance the node corresponding to an actor is labeled with the class of the object and the value of the program counter of the associated thread, e.g. master1 for an instance of the master actor at program location 1.

Simple graph rewriting rule: creating the mapper

The first rule we consider models the master creating a mapper for a piece of input data.
The left-hand-side (LHS) of the rule matches the master having a reference to a piece of input data. The right-hand-side (RHS) shows the master with a reference to a mapper which has a reference to the input. Furthermore, we also needs to tell which nodes in the LHS correspond to nodes in the RHS. In this case the master and input are in both and the mapper is new.

This rule, written in Picasso format, looks like:

transition "master: make mappers"
pre
    (m, master) -> (i, input)
post
    (m, master) -> (w, mapper)
    (w, mapper) -> (i, input)
==>
    m -> m
    i -> i
<==
Edges are written (id, label) -> (id, label).
Single node [not shown here] are node (id, label).
The LHS is pre and the RHS post.
==> is a partial morphism between LHS and RHS (given using the node ids).
For the moment you can ignore <==.

A graphical representation of the rule is

trs cluster_trslhs LHS cluster_trsrhs RHS trslhs0 (input,0) trsrhs0 (input,0) trslhs0->trsrhs0 trslhs1 (master,0) trslhs1->trslhs0 trsrhs2 (master,0) trslhs1->trsrhs2 trsrhs1 (mapper,0) trsrhs1->trsrhs0 trsrhs2->trsrhs1
This image is produce by Picasso and is included in the output. You can notice that nodes have an associated depth which is always 0 in this example. Nodes with depth zero are normal nodes. We will see later the meaning of non-zero depth. [hint: this is related to the finite representation of some class of infinite graphs.]

Guarded rule: no more input, all the mappers are created

Picasso provides guarded transitions. A rewriting rule can be guarded by an inhibitory graph. The semantics is that the rewriting rule should not fire on a graph G if the inhibitor can be matched to a subgraph of G.
For instance, a guard can be used to express the fact that the master has created all the mappers, i.e. there is no more input. In Picasso input format:

transition "master: mappers created"
pre
    node (m, master)
post
    node (m, master1)
==>
    m -> m
<==
no
    (m, master) -> (i, input)
==>
    m -> m
Notice that there is also a partial morphism between the guard and the LHS.

The graphical representation of the rule:

trs cluster_trsguard GUARD cluster_trslhs LHS cluster_trsrhs RHS trsguard0 (input,0) trsguard1 (master,0) trsguard1->trsguard0 trslhs0 (master,0) trslhs0->trsguard1 trsrhs0 (master1,0) trslhs0->trsrhs0

Rule with wildcard and nested graphs

A wildcard is a special node label that is denoted by the underscore symbol _. When matching the LHS of a rule in a graph G, a wildcard matches against all possible labels and is instantiated by the matching. This is useful for translating code that applies to an object regardless of what its internal state is. Using wildcards, we can model the behavior of such code using a single rewriting rule.
For instance, the mapper does not need to know the precise state, i.e. label, of the master to send back the intermediate result. Instead a wildcard can be used to match the master actor in any state.

transition "mapper"
pre
    (m, _) -> (w, mapper)
    (w, mapper) -> (i, input)
post
    (m, _) -> (k, key)* [inter]
    (k, key)* -> (v, value)**
==>
<==
    m -> m
Notice that the morphism between wildcard nodes is given as part of <==. Edge's label are given inside [] after the edge.

This rule also make use of what we call nested graphs. A nested graph represents the downward-closure of all graphs that are obtained by recursively unfolding the nested subgraph components. In this case, the RHS represents many keys each one of them pointing to many values.
Warning: it is not advised to use nested graphs as part of the LHS of a transition. Originally, transition were supposed to work only on normal graphs. We allowed nested graphs in the RHS since the resulting semantics is clear. On the other hand, the semantics of nested graphs as part of the LHS is not clear to us, thus not supported.

The graphical representation of the rule:

trs cluster_trslhs LHS cluster_trsrhs RHS trslhs0 (input,0) trslhs1 (mapper,0) trslhs1->trslhs0 trslhs2 (_,0) trslhs2->trslhs1 trsrhs2 (_,0) trslhs2->trsrhs2 trsrhs0 (value,2) trsrhs1 (key,1) trsrhs1->trsrhs0 trsrhs2->trsrhs1 inter

Initial state

A nested graph that represents the downward-closure of the set of initial states. The following graph represents a master actor master with an arbitrary number of associated input data values:

init
    (m, master) -> (i, input)*

The graphical representation of the graph: first by Picasso where the depth is the number next to the node label, then a (easier to understand) version where the nested subgraph is shown by surrounding it with a dashed box.

init 0 (input,1) 1 (master,0) 1->0 image/svg+xml master input

Evaluation steps

We are now going to explain how Picasso evaluates a rewriting rule on a nested graph.
Consider the following rule which models a step of the reduction stage in which a reducer has reduced its last value, sends the accumulated value back to the master, and then terminates.

transition "reducer"
pre
    (m, _) -> (w, reducer)
    (w, reducer) -> (k1, key)
    (w, reducer) -> (v1, value)
post
    (m, _) -> (k2, key) [result]
    (k2, key) -> (v1, value)
==>
    k1 -> k2
    v1 -> v1
<==
    m -> m
no
    (k1, key) -> (v2, value)
==>
    k1 -> k1
The inhibitor graph in the guard is used to express that all values have been reduced. The wildcard is used to match the master actor, regardless of what its internal state is.

trs cluster_trsguard GUARD cluster_trslhs LHS cluster_trsrhs RHS trsguard0 (value,0) trsguard1 (key,0) trsguard1->trsguard0 trslhs0 (value,0) trsrhs0 (value,0) trslhs0->trsrhs0 trslhs1 (reducer,0) trslhs1->trslhs0 trslhs2 (key,0) trslhs1->trslhs2 trslhs2->trsguard1 trsrhs1 (key,0) trslhs2->trsrhs1 trslhs3 (_,0) trslhs3->trslhs1 trsrhs2 (_,0) trslhs3->trsrhs2 trsrhs1->trsrhs0 trsrhs2->trsrhs1 result

Picasso divides the application of a rule in 4 steps as follows:

Unfolding
When the LHS matches a part of a nested subgraph, we unfold the nested subgraph to get a concrete copy of the matched part.
Inhibiting
The extension of rewriting rules with guards potentially breaks the monotonicity property that is required for WSTS. Therefore, we use the idea of monotonic abstraction to obtain a sound implementation of guarded rewriting rules. Monotonic abstraction adds additional transitions to the model to enforce monotonicity. Concretely, if the inhibitor graph matches a graph G, we remove the matched part from G to obtain G1 on which we apply the rewriting rule as usual. Since G1 is not necessarily reachable in the original system, guarded rules introduce a potential source of incompleteness to the analysis.
Rewriting
The unfolded match of the LHS is replaced by the RHS.
Folding
To maintain a compact representation of nested graphs in the tool, we fold nested graphs by removing subgraphs which are subsumed within the nested graph itself.

image/svg+xml master reducer key value key value result master key value result key value result unfolding inhibiting rewriting folding value master reducer key value key value result value reducer key value value master reducer key value key value result value reducer key value reducer key value value master key value result reducer key value value

Notice that the final nested graph is identical to the nested graph to which the rule has been applied. This is what makes nested graph suitable to represent the result of the analysis performed by Picasso. The covering set can indeed be represented by a finite union of nested graphs.

Running Picasso

Please refer to the manual for the details of installing Picasso. Assuming that you have done so and the example of interest is in a file named example.dbp, it is possible to run the example with the command:
# java -jar picasso-core.jar example.dbp -r
The -r option tells Picasso to store the result of the analysis in an html report. For instance, the report produced by Picasso on the map-reduce example can be found here.

Proving termination of depth-bounded processes.

Running Picasso with the --termination option will produce a numerical abstraction of the program. This abstraction can be used to prove termination of a DBS (see publications for the details).

Consider the following example:

/*  Compact encoding of Treiber's stack (content of the stack is abstracted):
 *  S(x, head) = out(x)(head).S(x, head)
 *             + in(x)(h2, h, succ, fail).([head = h].out(succ)().S(x, h2) |
 *                                         [head != h].out(fail)().S(x, head))
 *
 *  Op1(x) = in(x)(h).Op2(x,h)
 *   Op2(x,h) = (\nu h2,succ,fail) out(x)(h2, h, succ, fail).(in(succ)().0 |
 *                                                            in(fail)().Op1(x))
 */

init
  (s, S) -> (h, head) [head]
  (o, Op1)* -> (s, S) [x]

transition "client: Op1 -> Op2" 
pre
  (s, S) -> (h, head) [head]
  (o, Op1) -> (s, S) [x]
post
  (s, S) -> (h, head) [head]
  (o, Op2) -> (s, S) [x]
  (o, Op2) -> (h, head) [h]
  (o, Op2) -> (h2, head) [h2]
==>
  s -> s
  o -> o
  h -> h
<==

transition "client: Op2 Success" 
pre
  (s, S) -> (h, head) [head]
  (o, Op2) -> (s, S) [x]
  (o, Op2) -> (h, head) [h]
  (o, Op2) -> (h2, head) [h2]
post
  (s, S) -> (h2, head) [head]
  node (h, head)
==>
  s -> s
  h -> h
  h2 -> h2
<==

transition "client: Op2 Fail" 
pre
  (s, S) -> (h, head) [head]
  (o, Op2) -> (s, S) [x]
  (o, Op2) -> (h2, head) [h]
  (o, Op2) -> (h2, head) [h2]
post
  (s, S) -> (h, head) [head]
  (o, Op1) -> (s, S) [x]
  node (h2, head)
  node (h2, head)
==>
  s -> s
  o -> o
  h -> h
  h2 -> h2
  h2 -> h2
<==
Assuming that this file is named example2.dbp, it is possible to run the example with the command:
# java -jar picasso-core.jar --termination example2.dbp
In case ARMC diverges we can add some hints in the form of transition predicates. Either the additional option --namedTPreds Op or --makeTPreds should be enough help in this example.

Further applications

More examples about proving termination of DBS can be found here.

There is also some ongoing work to extract state machine interfaces for groups of interconnected objects. Preliminary results are available here.

IST Austria