

[==============================================================================]
[-----------------[ Dynamic programming with functional style ]----------------]
[==============================================================================]


       _.d####b._
     .############.
   .################.
  .##################.__ __ __ __ _ _ __ __ 
  ##############/´_`|#\ V  V // _` | '_/ -_)
  ##############\__,|# \_/\_/ \__,_|_| \___|
  ###########>"<######    
  *#########(   )####* 
   ##########>.<#####    author: Teferi
    ################       date: 2008/08/08
     *############*       
       "T######T"



--[ 00 ]----------------------------------------------[ Table Of Contents ]-----

  1.  Introduction
  2.  Introducing: The algorithm
  3.  The naive approach
  4.  Digression 1: Untying the recursive knot
  5.  The untied approach
  6.  Digression 2: Y-Combinators
  7.  Re-introducing the caching
  8.  Getting rid of the imperative cache
  9.  Digression 3: Monads 
 10.  Final solution 
 11.  Conclusions and further perspectives



--[ 01 ]---------------------------------------------------[ Introduction ]-----


Today you are going to learn a fair number of techniques that will aid you in
programming in functional languages. The way I will introduce these techniques
is by translating an imperative algorithm that uses the paradigm of dynamic
programming into a functional one.

This article is intended for people who are not yet functional programming 
gurus, but like the prospect of learning some more helpful stuff. You should,
however, be familiar with the basics of this sort of programming. Also, some
basic knowledge of OCaml [1] might be helpful, since that will be the language
I will be using. This language makes it easier to cross the border from
an imperative setting to a functional one (yes, I will be cheating quite a bit
in doing that, but the final result will be purely functional).

As you probably know, the usual way of dynamic programming is to solve small
subproblems and combine them, bottom up, into ever larger subproblems - until
you are left with the final problem and a set of subsolutions that can be
combined into an actual solution. Ok, that was really informal; I think we need
to have a deeper look at the principles at work here:
To be able to use dynamic programming you first need to identify something
called a "principle of optimality" in the literature [2]. My lecture notes from
a few years ago contain an example of what such a principle might look like:

  OptSol(x1,...,xn) = Comb(OptSol(x1,...,x(n-1)),OptSol(x2,...,xn))

But do not be fooled by the general look of this formula, the "principle of
optimality" might look quite different in other cases. Actually it will even
look different in the case we are going to discuss. All you have to remember
is that you need to be able to derive an optimal solution for a problem from
optimal solutions of the subproblems. For more on this, have a look at the 
wikipedia entry on the Bellman Equation[3].

To find the optimal solution, we now need to partition our problem into 
subproblems, solve these and combine them. Sounds a bit like "Divide and 
Conquer" doesn't it? Well, please don't try divide and conquer for 
solving the subproblems, because for most cases they tend to overlap quite a
bit. A quick example for this:

fib(0) = 1
fib(1) = 1
fib(x) = fib(x-1)+fib(x-2)

This is the divide and conquer method. You can easily see that this creates
an overhead by calculating quite a bit of the Fibonacci sequence more than
once. If you do it this way, though:

fib(0) = 1
fib(1) = 1
for i = 2:n
  fib(i) = fib(i-1) + fib(i-2)
end

then you are caching the solutions as you go, so you can access them again
whenever subproblems overlap. We will go down the "Divide and Conquer" road
again when we want to see how to do this in a functional setting, but that will
only be a small side track, which we hope to leave as soon as possible.
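
To see this difference in the language we will be using, here is a rough OCaml
sketch of both variants. The names fib and fib_dp are mine and only serve as an
illustration; they are not part of the DTW code developed later:

(* naive divide and conquer: recomputes overlapping subproblems, exponential *)
let rec fib n =
   if n < 2 then 1 else fib (n-1) + fib (n-2)

(* bottom-up dynamic programming: every value is computed exactly once *)
let fib_dp n =
   let table = Array.make (max 2 (n+1)) 1 in
   for i = 2 to n do
      table.(i) <- table.(i-1) + table.(i-2)
   done;
   table.(n)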



--[ 02 ]--------------------------------------------------[ The Algorithm ]-----


Ok, so now we need to decide on an algorithm to twist around. Since I am mainly
working in the field of MIR (Music Information Retrieval), I'll propose dynamic
time warping (DTW). Any objections to that? No? Ok, then here it goes.

The goal of this algorithm is to find out how much it costs to align two 
sequences of data. These sequences could be anything; in my case they usually
are feature sequences for musical pieces. Another really popular application
for this algorithm is the alignment of gene sequences. Let's start out with a
nice picture:
 (editor's note: lol nice picture T)

Sequence 1: a a b c
            |/ /| |
Sequence 2: a b b c

We have two pointers, one pointing into each of these sequences, starting at the
beginning. The cost of one step is measured by a metric on the feature space 
(our a's, b's and c's in this case), representing the "distance" between two
features. In each step we can go forward one step on either sequence alone or on
both of them. Now, when we sum up all the costs, what will be the minimum cost
we have to pay to walk across the whole pair of sequences? In our example above,
if we assume the discrete metric, we do not have any costs at all. The way to 
walk across them is hinted at by the bars. For more information I'll have to 
redirect you to wikipedia again [4]. For more ideas on how to use this algorithm 
I can point you to [5].

To compute the cost, we could of course calculate all possible action sequences
and then pick the optimal one, but that is a big no-no. Instead, we take a
closer look at the problem: if we are at the end of both sequences and have
optimal solutions for all states that lead to the end, we can simply take the
best one and obtain an optimal solution for the problem itself.

Hey, that sounds almost like a principle of optimality. More formally:

DTW(0,0) = dist(s1[0],s2[0])
DTW(a,b) = infinity; if a or b are smaller than zero
DTW(a,b) = dist(s1[a],s2[b]) + min {DTW(a-1,b-1), DTW(a  ,b-1), DTW(a-1,b  )}

Iterating over this gives us:

int DTWDistance(char s[1..n], char t[1..m], int d[1..n,1..m]) {
    declare int DTW[0..n,0..m]
    declare int i, j, cost

    for i := 1 to m
        DTW[0,i] := infinity
    for i := 1 to n
        DTW[i,0] := infinity
    DTW[0,0] := 0

    for i := 1 to n
        for j := 1 to m
            cost := d[i,j]
            DTW[i,j] := minimum(DTW[i-1,j  ] + cost,    // insertion
                                DTW[i  ,j-1] + cost,    // deletion
                                DTW[i-1,j-1] + cost)    // match

    return DTW[n,m]
}

(Thanks to wikipedia for sparing me from having to write this)

However, this relies heavily on loops, and we want to be able to do it in
functional style. So the next step is to discard this algorithm and start over
again.



--[ 03 ]---------------------------------------------[ The naive approach ]-----


We will at first do this very naively using "divide and conquer". So we have our
first solution:

let rec dtw ?pos s1 s2 dist =
   match pos with
      None -> dtw ~pos:(Array.length s1-1,Array.length s2-1) s1 s2 dist 
   |  Some (a,b) ->
         if a <0 || b < 0 then
            infinity
         else if a = 0 && b = 0 then
            dist s1.(0) s2.(0)
         else
            dist s1.(a) s2.(b) +.
               List.fold_left min  infinity
                  [dtw ~pos:(a-1,b-1) s1 s2 dist ; 
                   dtw ~pos:(a  ,b-1) s1 s2 dist ;
                   dtw ~pos:(a-1,b  )  s1 s2 dist ]
   ;;

Quite simple, isn't it? We transformed our equation into an algorithm. So now
we point this at our mp3 files and get busy with something else? Well, not
exactly. As I pointed out before, this algorithm is horribly inefficient, so
on long enough sequences of features it would terminate briefly before the
milky way implodes. We now have to find a way to introduce caching.

let dtw s1 s2 dist =
   let cache = Hashtbl.create 10 in
   let rec dtw' (a,b) =
      try Hashtbl.find cache (a,b)
      with Not_found ->
         let v =
            if a < 0 || b < 0 then
               infinity
            else if a = 0 && b = 0 then
               dist s1.(0) s2.(0)
            else
               dist s1.(a) s2.(b) +.
                  List.fold_left min infinity
                     [dtw' (a-1,b-1);
                      dtw' (a  ,b-1);
                      dtw' (a-1,b  )]
         in
         Hashtbl.add cache (a,b) v;
         v
   in
   dtw' (Array.length s1-1,Array.length s2-1)
   ;;

Yes, I admit, that is not as pretty as before, and it's even _cheating_, as we
are using an imperative hash table for caching. We are going to get rid of these
deficiencies soon, but first a little digression into a very useful technique.



--[ 04 ]-----------------------[ Digression 1: Untying the recursive knot ]-----


So what does it mean when someone tells you to untie the recursive knot? It
means that there is a recursion hidden somewhere that we are going to get rid
of. Indeed, we are going to write a recursive definition without the use
of recursion. Theory tells us that this can be done.
If you are not familiar with lambda-calculus, now is the time to have a look at
it: For example, check [7] or find a more entertaining introduction in [8].

As you might know, functional programming is loosely based on lambda-calculus.
In lambda-calculus we only have one type (let's call it F here) and this type
consists of functions from F to F... Somewhat like this:

type F = F -> F;;  
(* No, this won't work in OCaml, unless you turn -rectypes on *)

All these functions are nameless functions (so-called lambda functions). So,
without a name - how are you going to refer to a function for recursion? Well,
actually you won't, because you can't. There is no such thing as recursion in
lambda-calculus. However, let's just see how we can do something similar to
recursion, in a less formal version of lambda-calculus:

let fac' = \ fac n -> if n = 0 then 1 else n * fac fac (n-1) in
    fac' fac'

It might take a while until this approach starts to make sense, depending on
your knowledge of mathematics in general and of lambda-calculus in particular.
What we do here is pass the function a version of itself. It then
gets called, again passing itself to itself. This is not recursion, since we do
not explicitly refer to the function within itself. The recursive knot, however,
has to be retied, and this happens in the call "fac' fac'". If you really want
to understand the idea behind this, try programming this version of the
factorial function on your own in a programming language of your choice. It
actually works in almost any language; I did it myself in Java, C, OCaml and
BASH a few years ago. You might have to fiddle around a bit, though, to convince
the compiler that you know what you are doing.
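
If you want to try it in OCaml without turning on -rectypes, one way is to hide
the recursive type behind a constructor. This is only a sketch; the type self
and the names fac' and fac are mine:

(* wrap the "function that takes itself" type in a constructor, so the
   type checker accepts it without -rectypes *)
type 'a self = Self of ('a self -> 'a)

let fac' (Self f as self) n =
   if n = 0 then 1 else n * (f self) (n - 1)

(* retying the knot: fac' applied to a wrapped version of itself *)
let fac = fac' (Self fac')

(* fac 5 evaluates to 120 *)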

So here is one example of how to use this technique in a practical way:
Let's say you have two functions that are mutually recursive; we'll just call
them even and odd here (thanks to Dr. Jon Harrop for giving this example on the
caml_beginners::[] mailing list [6]).

let rec even ?(xs=[]) ?(ys=[]) = function
   | [] -> xs, ys
   | h::t -> odd ~xs:(h::xs) ~ys t
and odd ?(xs=[]) ?(ys=[]) = function
   | [] -> xs, ys
   | h::t -> even ~xs ~ys:(h::ys) t;;

These two functions partition a list into those elements at odd places and
those at even places:

# even [0;1;2;3;4;5];;
- : int list * int list = ([4; 2; 0], [5; 3; 1])

Now we untie these definitions: 

let rec even odd xs ys = function
   | [] -> xs, ys
   | h::t -> odd (h::xs) ys t;;

let odd even xs ys = function
   | [] -> xs, ys
   | h::t -> even xs (h::ys) t;;

Now we could, for example, put both functions into separate modules; we just
have to retie the knot at some other place:

let rec even' xs = even odd' xs and odd' xs = odd even' xs;;
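
After retying, the accumulators have to be passed explicitly, but the result
should be the same partition as before (a quick check in the toplevel):

# even' [] [] [0;1;2;3;4;5];;
- : int list * int list = ([4; 2; 0], [5; 3; 1])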

We'll be able to do this almost automatically later on, but first we have to
take some further steps and learn some more.



--[ 05 ]--------------------------------------------[ The untied approach ]-----


So let's just apply the newly learned to our algorithm. We will go back to the 
first step and use the version without caching, to make it more concise and
readable. Here's the algorithm again:

let rec dtw ?pos s1 s2 dist =
   match pos with
      None -> dtw ~pos:(Array.length s1-1,Array.length s2-1) s1 s2 dist 
   |  Some (a,b) ->
         if a <0 || b < 0 then
            infinity
         else if a = 0 && b = 0 then
            dist s1.(0) s2.(0)
         else
            dist s1.(a) s2.(b) +.
               List.fold_left min  infinity
                  [dtw ~pos:(a-1,b-1) s1 s2 dist ; 
                   dtw ~pos:(a  ,b-1) s1 s2 dist ;
                   dtw ~pos:(a-1,b  )  s1 s2 dist ]
   ;;

The optional pos argument can get quite annoying, so we'll rid ourselves of it
first:

let dtw s1 s2 dist = 
   let rec dtw' (a,b) =
      if a <0 || b < 0 then
         infinity
      else if a = 0 && b = 0 then
         dist s1.(0) s2.(0)
      else
         dist s1.(a) s2.(b) +.
            List.fold_left min  infinity
               [dtw' (a-1,b-1); 
                dtw' (a  ,b-1);
                dtw' (a-1,b  )]
   in
   dtw' (Array.length s1-1,Array.length s2-1)
   ;;

This is much better. Now we can easily untie the recursive knot and retie it
again, at the end:

let dtw s1 s2 dist = 
   (* untied version *)
   let dtw' dtw (a,b) =
      if a <0 || b < 0 then
         infinity
      else if a = 0 && b = 0 then
         dist s1.(0) s2.(0)
      else
         dist s1.(a) s2.(b) +.
         List.fold_left min infinity
            [dtw (a-1,b-1); 
             dtw (a  ,b-1);
             dtw (a-1,b  )]
   in
   (* retying the knot: the extra pos argument is needed here, because OCaml
      does not accept an application like "dtw' dtw" as the right-hand side
      of a let rec *)
   let rec dtw pos = dtw' dtw pos in
   dtw (Array.length s1-1,Array.length s2-1)
   ;;


Now - what have we actually achieved by this? You should be able to tell that
we're still horribly inefficient, because there is no caching. This will be
re-introduced at a later stage, though, when we automate the step of
retying. For this, we need to have a look at Y-Combinators.



--[ 06 ]------------------------------------[ Digression 2: Y-Combinators ]-----


Let's see if we can extract the process of retying the knot from the function
above. It seems that at the end we take the untied function, apply it to the
retied version of itself, and get back exactly that retied, recursive version.
Well, it sounds like black magic at first. But read it a few times, and you will
see that this is just the basic pattern of recursion. In a general way it can
be programmed like this:

let y f = 
   let rec f' arg = f f' arg in
   f'
   ;;

The type of this is:

val y : (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b


This is one possibility for the so-called Y-Combinator. An alternative
would be:

(* careful, this one won't work as expected *)
let rec y f = f (y f);;

At least it would be in pure lambda calculus. However, OCaml is not a lazy
language, and will hence try to evaluate the full recursion before calculating
anything else. We have to trick it into doing normal-order (or lazy) evaluation
here. We do that using eta-expansion [9], i.e. adding an argument to the
functions on both sides of the equation:

let rec y f x = f (y f) x;;

In pure lambda calculus we would have the following 
 (taken from Wikipedia again [10])

Y = \f . (\x . f(x x)) (\x . f (x x))

Applying this to a function gives us:

    Y g = (\f . (\x . f (x x)) (\x . f (x x))) g
=>  Y g = (\x . g (x x)) (\x . g (x x))
=>  Y g = (\y . g (y y)) (\x . g (x x))
=>  Y g = g ((\x . g (x x)) (\x . g (x x)))
=>  Y g = g (Y g)

So using this would give us an infinite sequence akin to:

g ( g ( g ( g ( g ( g ( g ( ... ( Y g) ... )))))))

This is the sequence the computer attempted to compute when we gave it the 
simple form of the Y-Combinator. If you look at this from a different 
perspective, you will notice that this is very much like a fixed-point iteration
on the function g. So the Y-Combinator is one of the basic forms of fixed-point
combinators. For any function you pass to it in pure lambda calculus, it will
yield the fixed point of that function. That was quite a shock to Church and
other researchers when they studied lambda calculus, because there are some
functions that simply should not have fixed points, like the "not" function or
the increment function. In lambda calculus they do, however. This is due to
non-terminating computations having a value as well. Well, I'd better not get
into more depth here, because we only need the Y-Combinator for our recursive
knots and not to find fixed points.
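
Before we get back to DTW, here is the combinator at work on something smaller.
The factorial is just my example, not part of the DTW development:

let y f =
   let rec f' arg = f f' arg in
   f'
   ;;

(* the untied factorial receives "itself" as its first argument *)
let fac = y (fun self n -> if n = 0 then 1 else n * self (n - 1));;

(* fac 5 evaluates to 120, although fac itself is never declared recursive *)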



--[ 07 ]-------------------------------------[ Re-introducing the caching ]-----


So this is where we stopped:

let dtw s1 s2 dist = 
   (* untied version *)
   let dtw' dtw (a,b) =
      if a <0 || b < 0 then
         infinity
      else if a = 0 && b = 0 then
         dist s1.(0) s2.(0)
      else
         dist s1.(a) s2.(b) +.
         List.fold_left min infinity
            [dtw (a-1,b-1); 
             dtw (a  ,b-1);
             dtw (a-1,b  )]
   in
   (* retying the knot: the extra pos argument is needed here, because OCaml
      does not accept an application like "dtw' dtw" as the right-hand side
      of a let rec *)
   let rec dtw pos = dtw' dtw pos in
   dtw (Array.length s1-1,Array.length s2-1)
   ;;

Now we add the Y-Combinator to this and we get the following:

let y f = 
   let rec f' arg = f f' arg in
   f'
   ;;

let dtw s1 s2 dist = 
   let dtw' dtw (a,b) =
      if a <0 || b < 0 then
         infinity
      else if a = 0 && b = 0 then
         dist s1.(0) s2.(0)
      else
         dist s1.(a) s2.(b) +.
         List.fold_left min infinity
            [dtw (a-1,b-1); 
             dtw (a  ,b-1);
             dtw (a-1,b  )]
   in
   y dtw' (Array.length s1-1,Array.length s2-1)
   ;;

Now what did we do this for? Quite simple: we want to re-introduce the caching,
but we don't do this in our function directly. Rather, we produce a version of
the Y-Combinator which does the caching for us:

let y f =
   let cache = Hashtbl.create 10 in
   let rec f' arg =
      try 
         let v = Hashtbl.find cache arg in
         v
      with Not_found -> 
         let v = f f' arg in
         Hashtbl.add cache arg v;
         v
   in
   f'
   ;;

let dtw s1 s2 dist = 
   let dtw' dtw (a,b) =
      if a <0 || b < 0 then
         infinity
      else if a = 0 && b = 0 then
         dist s1.(0) s2.(0)
      else
         dist s1.(a) s2.(b) +.
            List.fold_left min infinity
               [dtw (a-1,b-1);
                dtw (a  ,b-1);
                dtw (a-1,b  )]
   in
   y dtw' (Array.length s1-1,Array.length s2-1)
   ;;

Now, whenever a value is to be calculated, the combinator automatically checks
whether it has been calculated before. Also, the combinator produces a fresh
cached version of the function at each call to itself. So whenever we pass
different arguments to our final dtw function, it will produce a unique cached
and recursive version, which is then applied. Thus we do not even have to worry
about cleaning the cache or anything.
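
Since the memoization now lives entirely in the combinator, the same y works for
any recursion of this shape, not just for DTW. A small sketch, reusing the
Fibonacci numbers from the introduction (again just my example):

(* the caching y from above, applied to an untied Fibonacci function *)
let fib = y (fun self n -> if n < 2 then 1 else self (n-1) + self (n-2));;

(* every fib value is computed only once; later calls hit the hash table,
   so even fib 80 returns immediately *)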



--[ 08 ]----------------------------[ Getting rid of the imperative cache ]-----


The next step is rather straightforward. We have to exchange the imperative
cache based on a hash table for a functional one based on a map. Unfortunately,
the Map interface in OCaml is not what it should be (please, INRIA, give us a
better standard library containing all the stuff we need), so we have to code
a module which is then used as an argument to a functor, etc.

Straightforward code:

module OrderedPair =
struct
   type t = int * int;;
   let compare (a1,b1) (a2,b2) =
      let v1 = a1-a2 in
      if v1 = 0 then b1-b2 else v1
   ;;
end

module PairMap = Map.Make(OrderedPair);;

I will assume this has been done, wherever I use a map using pairs as keys. If
you don't understand this part, don't worry too much, it is not at all important
for the other steps. Ok, let's see how our new combinator looks now: 

let y f arg =
   let rec f' (cache,arg) = 
      try 
         let v = PairMap.find arg cache in
         (cache,v)
      with Not_found ->
         let (cache',v) =  f f' (cache,arg) in
         (PairMap.add arg v cache',v)
   in
   let _,v = f' (PairMap.empty,arg) in
   v
   ;;

This is basically the same as before, except that our DTW function has
gained one extra argument. This makes sure the cache is passed on
through all recursive calls. This is also reflected in the changed type:

val y : 
   (('a PairMap.t * PairMap.key -> 'a PairMap.t * 'a) ->
     'a PairMap.t * PairMap.key -> 'a PairMap.t * 'a) ->
      PairMap.key -> 'a

The uncurrying has been done to make the input and output of the function more
similar, so it is easier to follow what is happening here. A rather similar,
uncurried version can be found in [11].

The next part is to include the threading of the cache in our dtw algorithm:

let dtw s1 s2 dist = 
   let dtw' dtw (cache,(a,b)) =
      Printf.printf "Calling DTW on %d %d\n" a b;
      if a <0 || b < 0 then
         cache,infinity
      else if a = 0 && b = 0 then
         cache,dist s1.(0) s2.(0)
      else
         let cache',p1 = dtw (cache,(a-1,b-1)) in
         let cache'',p2 = dtw (cache',(a  ,b-1)) in 
         let cache''',p3 = dtw (cache'',(a-1,b  )) in
         let v = dist s1.(a) s2.(b) +.
            List.fold_left min infinity
               [p1; p2; p3]
         in
         cache''',v
   in
   y dtw' (Array.length s1-1,Array.length s2-1)
   ;;

So now we have occurrences of the cache floating around everywhere in our main
algorithm. It would be nice if there were a way to get rid of those. Well, we
are not the first ones to face such a problem. What we are going to use
this time are monads, or more specifically a kind of state monad.



--[ 09 ]-------------------------------------------[ Digression 3: Monads ]-----


Ok, here comes the last technique you need to know. You've probably heard about
this before, because it is often mentioned in the context of functional 
programming. What I am talking about are Monads. Monads are a rather powerful
device, allowing us to cope with the loss of side effects in a purely 
functional environment. Some concepts that can be realized using monads are,
for example: I/O, list comprehensions, stateful programming, exceptions.
One sentence that occurs a lot when reading up on this topic is that some kinds
of monads are so simple that "you could have invented them yourself, and
probably have". Isn't it good to know how smart you *really* are?

So what exactly are monads? Informally speaking, monads are some kind of wrapped
up values. Whenever you have a monad, it will somehow contain one or more
values, but you cannot easily access those values. Let's look at a simple
example, one that I personally find really easy to understand and one that has
not been covered that much: Functions from Ints to something else ('a).
That's already a monad. The values you have in those functions are wrapped:
you need to know the place if you want to access a specific value. However, it
would be a nuisance if you always had to figure out the place (or some other
sort of key) first. You can also provide rules on how to combine the values, 
without knowing the place. Which places are combined then depends a lot on the
implementation of the monadic operators. Oh, and of course, there is a simple
way to construct monads: The return function, which every monad must supply.
The return function constructs a monad in the "most simple way", whatever that 
means. For our function monad the return looks quite simple:

let return a =
   fun ( _ : int) -> a;;

See, we wrapped up the value a at all places. Now we have another operator to
give rules on how to handle values inside. It's usually called >>= and it has
the type:

val (>>=) :
   'a monad -> ('a -> 'b monad) -> 'b monad

So it takes a monad and a rule, and constructs a new monad from that. One simple
example of how to use it:

m >>= fun x ->
return (x + 2);;

This will construct a function monad that contains the former value increased by
two at all places. Or we can apply a Heaviside step function to all our values:

m >>= fun x ->
if x > 10 then return 1 else return 0

And quite a lot more. The way the >>= is implemented for this is:

let (>>=) m fm =
   fun (v : int) ->
      (fm (m v)) v;;
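
To see these operators in action, here is a tiny sketch; the monad m and the
numbers are made up purely for illustration:

(* return and >>= exactly as defined above *)
let return a = fun (_ : int) -> a;;
let (>>=) m fm = fun (v : int) -> (fm (m v)) v;;

(* the value at place i is i*i; then we add two at every place *)
let m  = fun i -> i * i;;
let m' = m >>= fun x -> return (x + 2);;

(* m' 3 is (3*3) + 2 = 11 *)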

So now it is time to get a bit more formal. For our definitions to make sense,
we would like them to follow a few simple axioms [12]:


1. "return" must preserve all information about its argument.
   
   (return a) >>= f <-> f a
   m >>= return <-> m

2. Binding two functions in succession is the same as binding one function that
   can be determined from them.
   
   (m >>= f) >>= g <-> m >>= (\x -> (f x >>= g))


(Here the symbol "<->" is used to indicate that these two expressions are 
 equivalent)

Let's check that these hold for our function monad:

1.1: 
    (return a) >>= f 
<=> fun v -> f ((return a) v) v           | Definition of >>=
<=> fun v -> f ((fun _ -> a) v) v         | Definition of return
<=> fun v -> f a v                        | Beta-reduction for "(fun _ -> a) v"
<=> f a                                   | Eta-contraction

1.2:
    m >>= return
<=> fun v -> return (m v) v               | Definition of >>=
<=> fun v -> (fun _ -> (m v)) v           | Definition of return
<=> fun v -> m v                          | Beta-reduction
<=> m                                     | Eta-contraction

2:
    (m >>= f) >>= g
<=> fun v -> (g ((m >>= f) v)) v          | Definition of >>=
<=> fun v ->                              |
     (g ((fun v' -> (f (m v')) v') v)) v  | Definition of >>=
<=> fun v ->                              |
     (g ((f (m v)) v)) v                  | Beta-reduction
<=> fun v ->                              |
     (fun v' ->                           |
       (g ((f (m v)) v')) v') v           | Beta-expansion
<=> fun v ->                              |
     (f (m v) >>= g) v                    | Definition of >>=
<=> fun v ->                              |
     ((fun x -> f x >>= g) (m v)) v       | Beta-expansion
<=> m >>= (fun x -> f x >>= g)            | Definition of >>=

                                                                     [ q.e.d. ]


Now, instead of >>= and the lambda expressions, you can often use do-notation
(as found in Haskell):

m >>= fun x ->
return (x + 2);;

becomes 

do
   x <- m
   return (x + 2)

For how to translate between these two notations, see a nice post on the
Jane Street blog [13].

As mentioned before, you can also wrap up values with error codes, separating
real calculations from those that have produced errors [14]. Or do list
comprehensions using those operators:

let return a =
   [a];;

let (>>=) m fm =
   List.concat (List.map fm m);;
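
With these two definitions, a classic list-comprehension example looks like
this (pairs is just my name for it):

(* all pairs (x, y) with x taken from xs and y taken from ys,
   built with the list versions of return and >>= defined above *)
let pairs xs ys =
   xs >>= fun x ->
   ys >>= fun y ->
   return (x, y);;

(* pairs [1; 2] ["a"; "b"] gives [(1, "a"); (1, "b"); (2, "a"); (2, "b")] *)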

For more information I recommend the chapters on monads from the Haskell School
of Expression [15].



--[ 10 ]-------------------------------------------------[ Final Solution ]-----


Ok, here is our code so far:

let y f arg =
   let rec f' (cache,arg) = 
      try 
         let v = PairMap.find arg cache in
         (cache,v)
      with Not_found ->
         let (cache',v) =  f f' (cache,arg) in
         (PairMap.add arg v cache',v)
   in
   let _,v = f' (PairMap.empty,arg) in
   v
   ;;


let dtw s1 s2 dist = 
   let dtw' dtw (cache,(a,b)) =
      Printf.printf "Calling DTW on %d %d\n" a b;
      if a <0 || b < 0 then
         cache,infinity
      else if a = 0 && b = 0 then
         cache,dist s1.(0) s2.(0)
      else
         let cache',p1 = dtw (cache,(a-1,b-1)) in
         let cache'',p2 = dtw (cache',(a  ,b-1)) in 
         let cache''',p3 = dtw (cache'',(a-1,b  )) in
         let v = dist s1.(a) s2.(b) +.
            List.fold_left min infinity
               [p1; p2; p3]
         in
         cache''',v
   in
   y dtw' (Array.length s1-1,Array.length s2-1)
   ;;

We wanted to get a monad to do the threading of the cache through the recursions
for us. Thus, instead of producing values, we produce functions that depend on a
cache to produce values. Things like this are usually called state monads. 
Because the only type of state we have is our cache, I will call this a cache
monad or cmonad here. Let's see how our operators look:

let return a = fun c -> c,a;;

Ok, that was easy, we just pass on the cache and always return the same value
a here.

let (>>=) m1 fm2  =
   fun c0 ->
       let (c1,v1) = m1 c0 in
       let m2 = fm2 v1 in
       m2 c1
   ;;

Hmm, not too bad either. We just feed the cache to our first function, see what
comes back and pass that on to the next. Now all we have to do is reverse the
order of the arguments in our Y-Combinator, passing the position first and
then the cache, and then use the monad in our main function:


let y f arg =
   let rec f' arg cache = 
      try 
         let v = PairMap.find arg cache in
         (cache,v)
      with Not_found ->
         let (cache',v) =  f f' arg cache in
         (PairMap.add arg v cache',v)
   in
   let _,v = f' arg PairMap.empty in
   v
   ;;

let dtw s1 s2 dist = 
   let dtw' dtw (a,b) =
      if a <0 || b < 0 then
         return infinity
      else if a = 0 && b = 0 then
         return (dist s1.(0) s2.(0))
      else
         dtw (a-1,b-1) >>= fun p1 ->
         dtw (a  ,b-1) >>= fun p2 ->
         dtw (a-1,b  ) >>= fun p3 ->
         let v = dist s1.(a) s2.(b) +.
            List.fold_left min infinity
               [p1; p2; p3]
         in
         return v
   in
   y dtw' (Array.length s1-1,Array.length s2-1)
   ;;


Looks a bit more complicated than the imperative version, doesn't it? Well,
yeah. If you look at the Y-Combinator, it does. But you can wrap those 
combinators in a library and reuse them. Then all you have to do is to write
down your principle of optimality and it will produce a dynamic programming
algorithm for you. Neat, huh?



--[ 11 ]---------------------------[ Conclusions and Further perspectives ]-----


Ok, so that is all. I hope you enjoyed learning a few more functional 
programming techniques. If you really want to get the hang of those, I suggest you
do a few calculations or proofs on your own. Figuring out what actually goes on
in a program is one of the best things you can do to increase your programming
skill. Hum, let me get this short example from Hudak's book:

instance Monad Label where
  return a
    = Label (\ s -> (s,a))
  Label lt0 >>= flt1
    = Label $ \ s0 ->
       let (s1,a1) = lt0 s0
           Label lt1 = flt1 a1
       in lt1 s1

mlabel :: Tree a -> Tree Integer
mlabel t = let Label lt = mlab t
           in snd (lt 0)

mlab :: Tree a -> Label (Tree Integer)
mlab (Leaf a)
   = do n <- getLabel
        return (Leaf n)
mlab (Branch t1 t2)
   = do t1' <- mlab t1
        t2' <- mlab t2
        return (Branch t1' t2')

getLabel :: Label Integer
getLabel = Label (\ n -> (n+1,n))

mtest = let t = Branch (Leaf 'a') (Leaf 'b')
        in mlabel t

or alternatively:

mtest = let t = Branch (Leaf 'a') (Leaf 'b')
        in mlabel (Branch t t)

If you do those calculations you will probably be left with several pages of
writing you had better hide under your bed, because people visiting you would
think that you have gone completely nuts. On the other hand, it will also give
you a deeper understanding of how things work under the hood of a functional
program. Oh yeah, the final code for the example I used looks like this:

module OrderedPair =
struct
   type t = int * int;;
   let compare (a1,b1) (a2,b2) =
      let v1 = a1-a2 in
      if v1 = 0 then b1-b2 else v1
   ;;
end

module PairMap = Map.Make(OrderedPair);;

type 'a cmonad = 'a PairMap.t -> 'a PairMap.t * 'a;;

let return a = fun c -> c,a;;

let (>>=) m1 fm2  =
   fun c0 ->
       let (c1,v1) = m1 c0 in
       let m2 = fm2 v1 in
       m2 c1
   ;;

let y f arg =
   let rec f' arg cache = 
      try 
         let v = PairMap.find arg cache in
         let a,b = arg in
         Printf.printf "Retrieving cached value for %d %d\n" a b;
         (cache,v)
      with Not_found ->
         let (cache',v) =  f f' arg cache in
         (PairMap.add arg v cache',v)
   in
   let _,v = f' arg PairMap.empty in
   v
   ;;

let dtw s1 s2 dist = 
   let dtw' dtw (a,b) =
      Printf.printf "Calling DTW on %d %d\n" a b;
      if a <0 || b < 0 then
         return infinity
      else if a = 0 && b = 0 then
         return (dist s1.(0) s2.(0))
      else
         dtw (a-1,b-1) >>= fun p1 ->
         dtw (a  ,b-1) >>= fun p2 ->
         dtw (a-1,b  ) >>= fun p3 ->
         let v = dist s1.(a) s2.(b) +.
            List.fold_left min infinity
               [p1; p2; p3]
         in
         return v
   in
   y dtw' (Array.length s1-1,Array.length s2-1)
   ;;

I added some printfs here, so you can see how it builds up the table for the
DTW. All the techniques described here have been taken further in [11], where
the authors use multi-stage programming (a special type of meta-programming) in
MetaOCaml to build a library that can solve dynamic programming problems
blazingly fast.
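
If you want to try it yourself, a call could look like this. The sequences and
the discrete metric are my example data, mirroring the picture in section 02:

let s1 = [| 'a'; 'a'; 'b'; 'c' |];;
let s2 = [| 'a'; 'b'; 'b'; 'c' |];;

(* the discrete metric: 0. for equal features, 1. otherwise *)
let discrete x y = if x = y then 0. else 1.;;

let () = Printf.printf "cost: %f\n" (dtw s1 s2 discrete);;
(* prints the "Calling DTW on ..." trace and then cost: 0.000000,
   the zero-cost alignment claimed in section 02 *)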

I also found something called "Algebraic Dynamic Programming" [16,17], but I
haven't managed to do more than to scan over those papers so far.  Maybe I'll
tell you something about this the next time.



-------------------------------------------------------------[ References ]-----

[1]  http://caml.inria.fr/
[2]  R. Bellman. Dynamic Programming. Princeton University Press, 1957.
[3]  http://en.wikipedia.org/wiki/Bellman_equation
[4]  http://en.wikipedia.org/wiki/Dynamic_time_warping
[5]  M. Müller. Information Retrieval for Music and Motion. Springer 2007.
[6]  http://tech.groups.yahoo.com/group/ocaml_beginners/message/9075
[7]  Achim Jung. A short Introduction to the Lambda Calculus. 
     Available at: http://www.cs.bham.ac.uk/~axj/pub/papers/lambda-calculus.pdf
[8]  David C. Keenan. To Dissect a Mockingbird: A Graphical Notation for the 
     Lambda Calculus with Animated Reduction. 
     Available at: http://users.bigpond.net.au/d.keenan/Lambda/
[9]  O. Danvy, K. Malmkjær, J. Palsberg. Eta-expansion does The Trick.
     ACM Transactions on Programming Languages and Systems (TOPLAS) 18.6.
     ACM, 1996.
[10] http://en.wikipedia.org/wiki/Fixed_point_combinator
[11] K. Swadi, W. Taha, O. Kiselyov. Staging dynamic programming algorithms.
     April 2005. Unpublished manuscript.
     available from http://www.cs.rice.edu/~taha/publications.html.
[12] http://en.wikipedia.org/wiki/Monads_in_functional_programming
[13] http://ocaml.janestcapital.com/?q=node/23
[14] D. Teller, A. Spiwack, T. Varoquaux. Catch me if you can: Towards 
     type-safe, hierarchical, lightweight, polymorphic and efficient error
     management in OCaml. 2008, unpublished manuscript.
     available at: http://lambda-the-ultimate.org/node/2892
[15] P. Hudak. The Haskell School of Expression.
     Cambridge University Press, 2000.
[16] R. Giegerich, C. Meier. Algebraic Dynamic Programming. 
     Springer LNCS. Springer 2002
[17] R. Giegerich  and P. Steffen. Implementing Algebraic Dynamic Programming
     in the Functional and the Imperative Programming Paradigm.
     Springer LNCS. Springer 2002.

--------------------------------------------------------------------[ EOF ]-----