Tuesday, December 9, 2008

.NET structure performance tips and tricks

Recently I dug into CLR structure performance and found it rather curious. Some of the findings were new to me.

First, returning a structure with two boolean fields from a method is five times faster than returning one with three boolean fields, yet it is just as fast as returning one with four! My test simply returns a structure from a method fifty million times and measures the elapsed time with a Stopwatch (passing a structure to a method or returning one from it is a common scenario):
using System;
using System.Diagnostics;

public struct TestStruct
{
    public bool Field1;
    public bool Field2;
    public bool Field3;
    public bool Field4;

    // Returns a zero-initialized instance of the struct.
    public static TestStruct CreateNew()
    {
        return default(TestStruct);
    }
}

public static class Program
{
    static void Main(string[] args)
    {
        const int iterations = 50000000;
        var sw = Stopwatch.StartNew();

        for (int i = 0; i < iterations; i++)
        {
            // The returned value is not used.
            var a = TestStruct.CreateNew();
        }

        sw.Stop();
        Console.WriteLine("elapsed {0} ms", sw.ElapsedMilliseconds);
        Console.ReadKey();
    }
}
It seems to depend on how well the structure's layout suits the CLR. A boolean field occupies one byte. Let's add a fifth boolean field: the five-fold slowdown is back. A sixth boolean field gives the same loss in performance, and a seventh makes it seven times slower :).
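To compare several sizes in one run, the benchmark can be parameterized over the struct type. This is a minimal sketch, not the original test: the struct names TwoBools, ThreeBools and FourBools and the helpers Create, Measure and Report are hypothetical, Unsafe.SizeOf<T>() assumes a modern .NET (5 or later), and [MethodImpl(MethodImplOptions.NoInlining)] is added so that each iteration performs a real call that returns T by value instead of being inlined by the JIT.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

// Hypothetical test structs: only the number of bool fields differs.
public struct TwoBools { public bool F1, F2; }
public struct ThreeBools { public bool F1, F2, F3; }
public struct FourBools { public bool F1, F2, F3, F4; }

public static class SizeBenchmark
{
    // NoInlining keeps the JIT from inlining the call, so every iteration
    // really returns a T by value from a separate method.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static T Create<T>() where T : struct => default(T);

    // Times `iterations` calls to Create<T>() and returns elapsed milliseconds.
    public static long Measure<T>(int iterations) where T : struct
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            Create<T>();
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Report<T>(int iterations) where T : struct
    {
        // Unsafe.SizeOf<T>() reports the managed size of T in bytes.
        Console.WriteLine("{0,-10} size {1} bytes, elapsed {2} ms",
            typeof(T).Name, Unsafe.SizeOf<T>(), Measure<T>(iterations));
    }

    static void Main()
    {
        const int iterations = 50000000;

        // Warm-up runs so the initial JIT compilation is not timed.
        Measure<TwoBools>(1000);
        Measure<ThreeBools>(1000);
        Measure<FourBools>(1000);

        Report<TwoBools>(iterations);
        Report<ThreeBools>(iterations);
        Report<FourBools>(iterations);
    }
}
```

Running it in a Release build without a debugger attached gives timings closer to what the optimizing JIT actually produces.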

Conclusion: the optimal structure sizes are 1, 2, 4 and 8 bytes. Let's look at another example:


// First layout: the short Field6 follows five bool fields
public bool Field1;
public bool Field2;
public bool Field3;
public bool Field4;
public bool Field5;
public short Field6;
public bool Field7;

// Second layout: the short comes first
public short Field1;
public bool Field2;
public bool Field3;
public bool Field4;
public bool Field5;
public bool Field6;
public bool Field7;

The first layout is four times slower than the second. It seems this is because Field6 starts at the odd byte offset 5 (counting one byte per bool). Conclusion: a field that occupies more than one byte should start at an even offset.
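Whether Field6 really lands on an odd offset depends on how the runtime lays out the struct, so it is worth checking. The sketch below is an illustration under stated assumptions: PackedLayout, DefaultLayout and LayoutInspector are hypothetical names, byte replaces bool because byte is blittable (so the layout reported by Marshal.OffsetOf and Marshal.SizeOf should match the in-memory layout), and Pack = 1 is used to force the odd offset 5. With default packing the runtime is expected to insert a padding byte so that the short starts at an even offset.

```csharp
using System;
using System.Runtime.InteropServices;

// byte stands in for bool: byte is blittable, so the interop layout
// reported by Marshal should match the struct's in-memory layout.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct PackedLayout
{
    public byte F1, F2, F3, F4, F5;
    public short F6; // Pack = 1 places this at the odd offset 5
    public byte F7;
}

// Same fields with default packing: F6 is aligned to a 2-byte boundary.
[StructLayout(LayoutKind.Sequential)]
public struct DefaultLayout
{
    public byte F1, F2, F3, F4, F5;
    public short F6;
    public byte F7;
}

public static class LayoutInspector
{
    // Byte offset of field F6 inside the given struct type.
    public static int OffsetOfF6(Type t) => Marshal.OffsetOf(t, "F6").ToInt32();

    // Total size of the struct type in bytes, including padding.
    public static int SizeOf(Type t) => Marshal.SizeOf(t);

    static void Main()
    {
        foreach (var t in new[] { typeof(PackedLayout), typeof(DefaultLayout) })
        {
            Console.WriteLine("{0}: F6 at offset {1}, size {2} bytes",
                t.Name, OffsetOfF6(t), SizeOf(t));
        }
    }
}
```

Timing CreateNew-style methods for both types with the benchmark loop from the first example would then compare the misaligned and aligned layouts directly.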

The structure from the last example is eight bytes in size, the same as a single long. What if we create a 9-byte structure? As expected, it is slower: four times slower. The even/odd offset rule still holds, though, and breaking it can cost roughly a factor of two.
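A "9-byte" structure is only 9 bytes if nothing pads it. A minimal sketch with the hypothetical types OneLong, LongAndByte and NineBytes: with default packing a long followed by a byte is rounded up to a multiple of the long's 8-byte alignment, while Pack = 1 keeps the same fields at exactly 9 bytes. Sizes are read with Marshal.SizeOf, which for blittable structs like these should agree with the managed size.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical types for comparing struct sizes around the 8-byte mark.
public struct OneLong { public long A; }

// A long followed by a byte; default packing pads the struct so its size
// is a multiple of the long's 8-byte alignment.
public struct LongAndByte { public long A; public byte B; }

// The same fields with Pack = 1: no padding, exactly 9 bytes.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct NineBytes { public long A; public byte B; }

public static class NineByteDemo
{
    // Size of T in bytes, including any padding.
    public static int SizeOf<T>() where T : struct => Marshal.SizeOf(typeof(T));

    static void Main()
    {
        Console.WriteLine("OneLong:     {0} bytes", SizeOf<OneLong>());
        Console.WriteLine("LongAndByte: {0} bytes", SizeOf<LongAndByte>());
        Console.WriteLine("NineBytes:   {0} bytes", SizeOf<NineBytes>());
    }
}
```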

Can we draw a more general conclusion? I think so. If we create a structure with a single long field and then keep adding long fields one at a time, the elapsed time of our loop grows roughly as an arithmetic progression. Conclusion: the smaller the structure, the better. The largest jump is between one and two long fields (five times); the ratio then shrinks for 2 vs 3 and 3 vs 4 long fields.
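Why the ratio shrinks follows directly from the arithmetic progression. Assume an idealized model (an assumption, not a measurement) in which the loop time t(n) for n long fields grows exactly linearly with step d, starting from t_1 = t(1), and plug in the five-fold jump from one to two fields:

```latex
\begin{aligned}
t(n) &= t_1 + (n-1)\,d \\
\frac{t(2)}{t(1)} &= \frac{t_1 + d}{t_1} = 5 \quad\Rightarrow\quad d = 4t_1 \\
\frac{t(3)}{t(2)} &= \frac{9t_1}{5t_1} = 1.8, \qquad
\frac{t(4)}{t(3)} = \frac{13t_1}{9t_1} \approx 1.44 \\
\frac{t(n+1)}{t(n)} &= \frac{4n+1}{4n-3} \;\to\; 1 \quad (n \to \infty)
\end{aligned}
```

Each extra field adds the same absolute time, so its relative cost keeps falling even though the structure keeps getting slower.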

Comparing the creation time of a simple class instance with that of a structure with two long fields shows that the structure is still three times faster. A structure with six long fields, however, takes roughly as long to create as the class instance.
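A minimal sketch of the class-versus-structure comparison, assuming hypothetical types TwoLongsClass and TwoLongsStruct with the same two long fields. Each iteration adds a field of the created value to a static Sink so the work stays observable, and NoInlining keeps the factory calls from being inlined. The class version allocates on the garbage-collected heap on every iteration, while the structure is returned by value.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

// Hypothetical types with the same two long fields.
public sealed class TwoLongsClass { public long A, B; }
public struct TwoLongsStruct { public long A, B; }

public static class ClassVsStruct
{
    // Accumulating a field here keeps the created values observable.
    public static long Sink;

    [MethodImpl(MethodImplOptions.NoInlining)]
    static TwoLongsClass CreateClass() => new TwoLongsClass(); // heap allocation

    [MethodImpl(MethodImplOptions.NoInlining)]
    static TwoLongsStruct CreateStruct() => new TwoLongsStruct(); // returned by value

    // Elapsed milliseconds for `iterations` class instantiations.
    public static long TimeClass(int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) Sink += CreateClass().A;
        return sw.ElapsedMilliseconds;
    }

    // Elapsed milliseconds for `iterations` struct creations.
    public static long TimeStruct(int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) Sink += CreateStruct().A;
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        const int iterations = 50000000;

        // Warm-up runs so the initial JIT compilation is not timed.
        TimeClass(1000);
        TimeStruct(1000);

        Console.WriteLine("class:  {0} ms", TimeClass(iterations));
        Console.WriteLine("struct: {0} ms", TimeStruct(iterations));
    }
}
```

Adding fields to both types (for example up to six longs) lets you repeat the six-field comparison with the same harness.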
