The map-reduce algorithm takes a collection of data elements as input and computes a set of key/value pairs.
The algorithm consists of two stages: the mapping stage and the reduction stage.
In the mapping stage, each input data value is mapped to a set of key/value pairs.
In the reduction stage, the values across all input data that are associated with the same key are aggregated and then reduced to a single key/value pair.
A typical application of map-reduce is counting keywords in web documents.
The input data is a set of web documents.
The mapping stage counts the occurrences of the keywords in each document.
The reduction stage sums up the occurrences across all documents.
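As a point of reference, here is a minimal, sequential Scala sketch of the two stages for this word-counting example (this is our own illustration, not the implementation discussed below; all names are ours):

object WordCount {
  // Mapping stage: each document yields a list of (word, 1) pairs.
  def mapping(doc: String): List[(String, Int)] =
    doc.split("\\s+").filter(_.nonEmpty).map(w => (w, 1)).toList

  // Reduction stage: all pairs with the same key are aggregated and
  // reduced to a single (word, count) pair.
  def reduction(pairs: List[(String, Int)]): Map[String, Int] =
    pairs.groupBy(_._1).map { case (w, ps) => (w, ps.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    val docs   = List("map reduce with actors", "actors map messages")
    val counts = reduction(docs.flatMap(mapping))
    println(counts) // e.g. Map(map -> 2, actors -> 2, reduce -> 1, ...)
  }
}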
Our model of this example is inspired by an implementation of a map-reduce algorithm in Scala using actors.
The code can be found in Chapter 9 of the book Actors in Scala.
In the implementation of the mapping stage, the master actor creates a worker actor for each element of the input data and sends the data to the workers, which apply the mapping function and send the lists of key/value pairs back to the master.
In the reduction stage, the master creates a reducer actor for each key and sends it the aggregated values for that key.
The reducer then reduces all these values to a single value which is again sent back to the master.
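To make these roles more concrete, the following Scala sketch lists the kinds of messages that flow between the actors in such an implementation (the type and field names are ours, not the book's):

// Messages of the map-reduce protocol sketched above (illustrative only).
sealed trait Msg
case class MapTask(doc: String)                       extends Msg // master -> mapper: input data
case class Intermediate(pairs: List[(String, Int)])   extends Msg // mapper -> master: key/value pairs
case class ReduceTask(key: String, values: List[Int]) extends Msg // master -> reducer: aggregated values
case class Result(key: String, value: Int)            extends Msg // reducer -> master: reduced value

In the graph encoding below, the intermediate and final results roughly correspond to the edges labeled inter and result between the master and the key nodes.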
Encoding conventions. Heap objects (including actors) are represented by nodes in the graph and references between objects by edges. Each node is labeled with the internal state of the associated object, and each edge is labeled with the name of the associated reference variable. For instance, the node corresponding to an actor is labeled with the class of the object and the value of the program counter of the associated thread, e.g., master1 for an instance of the master actor at program location 1.
The first rule we consider models the master creating a mapper for a piece of input data. The left-hand side (LHS) of the rule matches the master having a reference to a piece of input data. The right-hand side (RHS) shows the master with a reference to a mapper, which in turn has a reference to the input. Furthermore, we also need to specify which nodes in the LHS correspond to which nodes in the RHS. In this case, the master and the input occur on both sides, and the mapper is new.
This rule, written in Picasso format, looks like:
transition "master: make mappers" pre (m, master) -> (i, input) post (m, master) -> (w, mapper) (w, mapper) -> (i, input) ==> m -> m i -> i <==
A graphical representation of the rule:
Picasso provides guarded transitions. A rewriting rule can be guarded by an inhibitory graph. The semantics is that the rewriting rule may not fire on a graph G if the inhibitor can be matched to a subgraph of G. For instance, a guard can be used to express the fact that the master has created all the mappers, i.e., that there is no more input. In Picasso input format:
transition "master: mappers created" pre node (m, master) post node (m, master1) ==> m -> m <== no (m, master) -> (i, input) ==> m -> m
The graphical representation of the rule:
A wildcard is a special node label that is denoted by the underscore symbol _. When matching the LHS of a rule against a graph G, a wildcard matches any label and is instantiated by the matching. This is useful for translating code that applies to an object regardless of its internal state. Using wildcards, we can model the behavior of such code with a single rewriting rule. For instance, the mapper does not need to know the precise state, i.e., the label, of the master in order to send back the intermediate result. Instead, a wildcard can be used to match the master actor in any state.
transition "mapper" pre (m, _) -> (w, mapper) (w, mapper) -> (i, input) post (m, _) -> (k, key)* [inter] (k, key)* -> (v, value)** ==> <== m -> m
This rule also makes use of what we call nested graphs. A nested graph represents the downward-closure of all graphs that are obtained by recursively unfolding the nested subgraph components. In this case, the RHS represents many keys, each of them pointing to many values. Warning: it is not advised to use nested graphs in the LHS of a transition. Originally, transitions were supposed to work only on normal graphs. We allow nested graphs in the RHS since the resulting semantics is clear. On the other hand, the semantics of nested graphs in the LHS is not clear to us and is therefore not supported.
The graphical representation of the rule:
The set of initial states is also given as a nested graph that represents its downward-closure. The following graph represents a master actor master with an arbitrary number of associated input data values, i.e., its downward-closure contains the graphs in which the master has zero, one, two, or any finite number of input nodes:
init (m, master) -> (i, input)*
The graphical representation of the graph: first as rendered by Picasso, where the nesting depth is the number next to the node label, then an (easier to understand) version where the nested subgraph is surrounded by a dashed box.
We are now going to explain how Picasso evaluates a rewriting rule on a nested graph. Consider the following rule, which models a step of the reduction stage in which a reducer reduces its last value, sends the accumulated value back to the master, and then terminates.
transition "reducer" pre (m, _) -> (w, reducer) (w, reducer) -> (k1, key) (w, reducer) -> (v1, value) post (m, _) -> (k2, key) [result] (k2, key) -> (v1, value) ==> k1 -> k2 v1 -> v1 <== m -> m no (k1, key) -> (v2, value) ==> k1 -> k1
Picasso divides the application of a rule into four steps, as follows:
Notice that the final nested graph is identical to the nested graph to which the rule was applied. This is what makes nested graphs suitable for representing the result of the analysis performed by Picasso: the covering set can indeed be represented by a finite union of nested graphs.
Please refer to the manual for the details of installing Picasso.
Assuming that you have done so and that the example of interest is in a file named example.dbp, you can run the example with the command:
# java -jar picasso-core.jar example.dbp -r
The -r option tells Picasso to store the result of the analysis in an HTML report.
For instance, the report produced by Picasso on the map-reduce example can be found here.
Running Picasso with the --termination option will produce a numerical abstraction of the program. This abstraction can be used to prove termination of a DBS (see publications for the details).
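For instance, assuming the same invocation style as above (the exact combination of arguments is our guess), this would look like:

# java -jar picasso-core.jar example.dbp --termination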
Consider the following example:
/* Compact encoding of Treiber's stack (content of the stack is abstracted):
 * S(x, head) = out(x)(head).S(x, head)
 *            + in(x)(h2, h, succ, fail).([head = h].out(succ)().S(x, h2) |
 *                                        [head != h].out(fail)().S(x, head))
 *
 * Op1(x)   = in(x)(h).Op2(x,h)
 * Op2(x,h) = (\nu h2,succ,fail) out(x)(h2, h, succ, fail).(in(succ)().0 |
 *                                                          in(fail)().Op1(x))
 */

init
  (s, S) -> (h, head) [head]
  (o, Op1)* -> (s, S) [x]

transition "client: Op1 -> Op2"
pre
  (s, S) -> (h, head) [head]
  (o, Op1) -> (s, S) [x]
post
  (s, S) -> (h, head) [head]
  (o, Op2) -> (s, S) [x]
  (o, Op2) -> (h, head) [h]
  (o, Op2) -> (h2, head) [h2]
==>
  s -> s
  o -> o
  h -> h
<==

transition "client: Op2 Success"
pre
  (s, S) -> (h, head) [head]
  (o, Op2) -> (s, S) [x]
  (o, Op2) -> (h, head) [h]
  (o, Op2) -> (h2, head) [h2]
post
  (s, S) -> (h2, head) [head]
  node (h, head)
==>
  s -> s
  h -> h
  h2 -> h2
<==

transition "client: Op2 Fail"
pre
  (s, S) -> (h, head) [head]
  (o, Op2) -> (s, S) [x]
  (o, Op2) -> (h1, head) [h]
  (o, Op2) -> (h2, head) [h2]
post
  (s, S) -> (h, head) [head]
  (o, Op1) -> (s, S) [x]
  node (h1, head)
  node (h2, head)
==>
  s -> s
  o -> o
  h -> h
  h1 -> h1
  h2 -> h2
<==
More examples about proving termination of DBS can be found here.
There is also some ongoing work to extract state machine interfaces for groups of interconnected objects. Preliminary results are available here.