No. 1 the Foreach Loop

Last weekend I attended JB Evian’s code camp in the l’Ardeche département in the south of France, just to the north of the Provence region that forms the south east part of the French Mediterranean coast. (I was very interested in visiting this area since my wife and I became begin fans of Nigel Farrell’s “A Place in the France”  and  A Place in the France – Indian Summer when we first move to France, Nigel’s problems seem to dwarf any we had and helped put things in perspective).

Anyway the aim of the project was to create a decompile. The problem is quite easy to describe, take the IL codes and reconstruct the C# code that the created them (we also did some work on constructing VB.NET or F# in the place of C# but haven’t got very far on this front). With C# it’s quite easy to create a first version that’s made up of the method calls and operations, because there’s roughly a one to one mapping between C# and most IL constructs. However C# has lots of constructs that generate quite complex IL patterns, recognising these patterns is quite challenging. A good example of this is the foreach loop. A foreach loop is expanded by the compiler into something that looks a lot like:

IEnumerator<string> V_0 = strings.GetEnumerator();

try

{

      while (V_0.MoveNext())

      {

            string V_1 = V_0.Current;

            Console.WriteLine(V_1);

      }

}

finally

{

      if (V_0 != null)

      {

            V_0.Dispose();

      }

}

Reconstructing this expansion from the IL is fairly easy since each of the constructs shown has an equivalent in IL, spotting that this is an IL loop is fairly tricky. The first attempt we made took about 300 lines of C# code. To try and improve on this we looked at re-implementing this in F# and came up with a solution that’s about 160 lines long, quite an improvement but still a fair amount of code. The F# implementation is shorter for two main reasons, the pattern match you over C# types and using the option type (Some / None) to single the success or failure of an operations and return a value at the same time.

Let’s look at code fragment that demonstrates these two F# features, here’s a fairly typical example:

let isVariableAssign (exp: Expression) =

    match exp with

    | :? AssignExpression as assign ->

        match assign.Target with

        | :? VariableReferenceExpression as var ->

            Some { Target = var.Variable; Expression = assign.Expression }

        | _ -> None

    | _ -> None

Here we can see that we are looking for a variable assignment. Here the “exp” is a C# class that can we know is of base type “Expression”, which is abstract so we know we will receive a concrete class that inherits from it, in this case we’re interested in knowing if it’s a “AssignExpression”. We test this using the :? syntax:

    match exp with

    | :? AssignExpression as assign ->

We can use the keyword “as” to give a name object of this type in the case when a match is made. So in this case we have “assign” which is typed as an “AssignExpression” giving access to all the members that are specific to the “AssignExpression” without having to perform the cast ourselves.

The option type is used to communicate whether match was successful or not. In the case where it wasn’t successful we return “None”, when was successful we return “Some” and the structure of “Some” allows us to return some extra information, in this case a record:

        match assign.Target with

        | :? VariableReferenceExpression as var ->

            Some { Target = var.Variable; Expression = assign.Expression }

        | _ -> None

While this is a satisfy saving we can go further if we use F#’s union types to represent the tree rather than C# types. This is because union types can be decomposed further into within a pattern matching statement. In short you can simply write out the data structure that you are looking for:

    match stmts with

    | ExpressionStatement

        (AssignExpression

           { Target = VariableDeclarationExpression _;

             Expression =

               MethodInvocationExpression

                { Method = MethodReferenceExpression { Target = exp; Method = getEnum } } })

      ::  TryStatement

            { Try = [ WhileStatement

                        { Condition =

                            MethodInvocationExpression

                                { Method = MethodReferenceExpression { Method = moveN } };

                          Body =

                            (ExpressionStatement

                                (AssignExpression

                                    { Target = VariableDeclarationExpression var;

                                      Expression =

                                        MethodInvocationExpression

                                            { Method =

                                                MethodReferenceExpression { Method = getCurr } } })) :: rest } ];

              CatchClauses = [];

              Finally = Some [ IfStatement { Condition =

                                                BinaryExpression { Left = VariableReferenceExpression _;

                                                                   Operator = BinaryOperator.ValueInequality;

                                                                   Right = LiteralExpression null };

                                             Then =

                                                    [ ExpressionStatement

                                                        (MethodInvocationExpression

                                                            { Method =

                                                                MethodInvocationExpression

                                                                    { Method =

                                                                        MethodReferenceExpression { Method = disp } } }) ]

                                             Else = None } ] }

      :: tail

      when getEnum.Name = "GetEnumerator" &&

           moveN.Name = "MoveNext" &&

           getCurr.Name = "get_Current" &&

           disp.Name = "Dispose" -> ForEachStatement

                                        { Variable = var;

                                          Expression = exp;

                                          Body = rest; }

                                    :: matchForEach tail

Despite the advantages of F# we decided not to implement it the C# decompiler in F#, this is because a lot of work has already been done in C# and we don’t want to throw this away, and also there’s quite a few developers on the project who don’t know F# yet. However when we get round to implementing the F# decompiler we’ll definitely be using F# and really digging into some of these techniques.

 The full code is available here. You’ll need to download and compile it against the latest version of Cecil.Decompile, which is available from the mono project svn.

Feedback:

Feedback was imported from my only blog engine, it's no longer possible to post feedback here.

re: How to Recognise Code Constructs (from quite a long way away) - Luis Diego Fallas

Very nice post and very interesting work with the decompiler.

I think you have another alternative to do pattern matching with the existing C# types by using Active Patterns.

Active Patterns were very useful to manipulate Expression Trees while creating a LINQ provider here.

Looking forward to read about the F# decompiler!

Thanks,
Luis

re: How to Recognise Code Constructs (from quite a long way away) - Robert Pickering

Thanks for you comments Luis. I had considered active patterns over the C# types, but have a feeling that union types will still work better. However it's probably worth at least trying active patterns at some point.

I should also stress that I'm quite busy with other projects so, work on the F# decompiler is quite a long way off.

Thanks,
Robert

Evilznet.com &amp;raquo; Blog Archive &amp;raquo; Evilznet apr??s un CodeCamp - http://www.evilznet.com/?p=51

Evilznet.com &amp;raquo; Blog Archive &amp;raquo; Evilznet apr??s un CodeCamp