Strings and F# Immutable Lists

In .NET strings are immutable. I think most .NET programmers agree that this is a good thing, as it generally makes code that works with strings safer and more predictable. However, operations that involve large amounts of string manipulation perform poorly because each time a string concatenation occurs then both strings must be copied. His often leads people to believe that all immutable objects will perform poorly. The aim of the article is to show that this is not the case, or at least it’s not as black and white as you may think. Specifically we’re going to compare how.NET strings differ considerable in performance characteristics to F#’s immutable lists.

Before we dive into immutable lists, let’s take a look at some code that demonstrates some poorly performing string concatenation and then take a look at how we can fix this.

class Program

{

    static void Main(string[] args)

    {

        var strings = new List<string> { "Toto", "Titi", "Tata" };

        var res = "";

        foreach (var s in strings)

           {

            res += s;

           }

        Console.WriteLine(res);

    }

}

 

As the length of the list “strings” grows, we will start to see the performance of this code degrade. The performance of this code for a list of 3 items, will be perfectly acceptable, but we’ll start to serious perf problems as we approach 100s or 1000s of items (also depending on the length of the strings in the list). This is because both “res” and “s” must be copied to perform the concatenation. So as “res” grows with each iteration of the loop so does the time to perform the copy.

The way to fix this is fairly obvious, we use a string builder:

class Program2

{

    static void Main(string[] args)

    {

        var strings = new List<string> { "Toto", "Titi", "Tata" };

        var res = new StringBuilder();

        foreach (var s in strings)

        {

            res.Append(s);

        }

        Console.WriteLine(res.ToString());

    }

}

 

What’s more interesting is to look at why this fixes the problem. It would be fairly easy to say a string builder is mutable therefore faster, but this isn’t quite the answer we’re looking for. The reason using a string builder is faster, in this case, is because we avoid the cost of copying “res” each iteration, only the newly input “s” needs to be copied into the string builder. While a string builder is capable of performing inserts/removes in this case we get good performance because we limit ourselves to append to the end of collection of characters (string builder could be thought of as just a special type of collections of characters).

F#’s lists are immutable linked list. This means each items in the list is either empty, which always represents the end of a list, or the item stored in the list along with a reference to the remaining list. This means a graphical representation of an F# list would look something like this:

 F# linked list

We build up F# lists by appending new items to the head of the list. Here in C# is how we would build up an F# list from the frameworks generic List<> class.

class Program3

{

    static void Main(string[] args)

    {

        var strings = new List<string> { "Toto", "Titi", "Tata" };

        var res = ListModule.Empty<string>();

        foreach (var s in strings)

        {

            res = FSharpList<string>.Cons(s, res);

        }

        Console.WriteLine(res.ToString());

    }

}

 

While the syntax for working with F# lists from C# is rather ugly, I think this example makes it clear how F# list work. Each item is the collection “strings” is taken in turn and append to the head of the F# list using its “Cons” operator. In many ways this is similar to how the string builder behaves, we have no need to make a copy of “res”, we simple append each new “s” to the end of it. For those of you curious to see what this would look like in F# here’s the equivalent in F#, although it is really idiomatic F# since it uses a mutable identifier:

let main() =

    let strings = new ResizeArray<string>(["Tata"; "Toto"; "Titi"])

    let mutable res = List.empty

    for s in strings do

        res  <- s :: res

    printfn "%A" res

 

main()

 

In conclusion, you need to look beyond whether a data structure is mutable or immutable to see whether it will perform well. In some circumstances F#’s immutable list can perform very well, i.e. when you only need the ability to append to the end of the collection and enumerate the collection. In other circumstance they may perform poorly, i.e. when you need random access to items within the lists. These performance tradeoffs also need to be balanced with the fact the F# list can also safely be shared between threads or tasks, we can be very convenient in many concurrency scenarios.

Bookmark
dotnetkicks+, digg+, reddit+, del.icio.us+, dzone+, facebook+

Print | posted @ Sunday, April 18, 2010 10:10 AM

Comments on this entry:

Gravatar # re: Strings and F# Immutable Lists
by Jonathan Allen at 4/19/2010 7:24 AM

That seems quite silly to me. Linked lists can grow cheaply, but each node wastes a lot of memory on pointers and object overhead. You have to waste even more memory on back-pointers unless you want to limit yourself to backwards-only iteration. If you want a count, then you have to either iterate over the whole thing, give up immutablility, or keep a counter in each node.

To me it makes a lot more sense to just use a normal CLR List. Once completely built, just place a read-only wrapper around it or do a one-time copy into a ReadOnlyCollection.
Gravatar # The Morning Brew - Chris Alcock &amp;raquo; The Morning Brew #582
by Pingback/TrackBack at 4/19/2010 9:35 AM

The Morning Brew - Chris Alcock &amp;raquo; The Morning Brew #582
Gravatar # re: Strings and F# Immutable Lists
by Robert Pickering at 4/21/2010 11:42 AM

Yes, linked list are less memory efficient than a list which uses an array as its underlying storage, like the BCL’s List<T>. However, List<T> is not perfect in memory efficiently either, since the underlying array is doubled in size each time it runs out of space there are cases where it’s almost as memory inefficient as a linked list. While the programmer can control this though the .Capacity property, generally no one bothers to that. Why? Because give the size of memory available in modern machines, memory efficient simple isn’t the issue that it was, you can afford to have a list were each element has a forward pointer or has extra space available in its underlying store. If your collection really is the size that memory efficiency is going to be a problem, then I don’t think either List<T> or F# list is going to the answer to your problem, you’re probably going to need to move to some kind of lazy collection than enumerates over a file (a stream or F#’ seq collection, for example).

I agree that ReadOnlyCollection<T> can be useful in some circumstances, but it suffers from the same problem as .NET strings. If you want to add even one item you need to copy the whole collection into a mutable list, then wrap this in another read only collection. It’s also debatable whether it’s to safe share ReadOnlyCollection<T> between multiple threads. Since ReadOnlyCollection<T> doesn’t copy the collection it’s given in the constructor, if a thread has access to the underlying collection then it may modify the collection, possibly while other threads were enumerating what they thought was a read only collection.

So I think F# immutable list are a big win multi thread scenarios, since then can be safely shared between threads and each thread can enumerate the collection or add items to its local copy of list. Yes, the lack of random access and having to enumerate to count, is a trade off you have to live with (as a I already mentioned in the conclusion). At the end of the day, I don’t believe there are any perfect solutions, just a series of tradeoffs you have balance.

Your comment:

(Note: all comments are moderated so it may take sometime to appear)

Title:
Name:
Email:
Website:
 
Italic Underline Blockquote Hyperlink
 
 
Please add 5 and 3 and type the answer here: