Thursday 29 May 2008

Wanted Language Feature: Reinterpret Cast of Byte Arrays

I am a huge fan of C#, but one of the most frustrating things about it is dealing with byte arrays that actually represent some other type of data. For example, suppose I have an array of bytes that I know actually contains floating-point numbers. What I would like to be able to do is:

byte[] blah = new byte[1024];
float[] flah = (float[])blah;

But of course, this won't compile. There are two options:

1. Create a new array of floats and copy the contents of the byte array into it using the BitConverter.ToSingle method, then access the contents as floats. The disadvantages are obvious: it requires twice the memory, and the copy is not free. Also, if I modify any values, they may need to be copied back into the original byte array.
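To make the costs of option 1 concrete, here is a minimal sketch that copies out with BitConverter.ToSingle and copies back with BitConverter.GetBytes. It assumes a C# 9 or later compiler (top-level statements), and BytesToFloats and FloatsToBytes are hypothetical helper names, not framework methods.

```csharp
using System;

byte[] blah = new byte[1024];
float[] flah = BytesToFloats(blah);   // a second buffer holding the same 1024 bytes of data
flah[0] = 0.5f;                       // edit the copy...
FloatsToBytes(flah, blah);            // ...then copy it back into the original
Console.WriteLine(flah.Length);       // 256

// Hypothetical helper: copies every whole float out of the byte array.
static float[] BytesToFloats(byte[] bytes)
{
    // Trailing bytes that don't form a whole float are ignored.
    var floats = new float[bytes.Length / sizeof(float)];
    for (int i = 0; i < floats.Length; i++)
    {
        // Uses the machine's native byte order.
        floats[i] = BitConverter.ToSingle(bytes, i * sizeof(float));
    }
    return floats;
}

// Hypothetical helper: writes the floats back over the start of the byte array.
static void FloatsToBytes(float[] floats, byte[] bytes)
{
    for (int i = 0; i < floats.Length; i++)
    {
        byte[] four = BitConverter.GetBytes(floats[i]); // allocates on every call
        Buffer.BlockCopy(four, 0, bytes, i * sizeof(float), sizeof(float));
    }
}
```

Every element passes through BitConverter in both directions, and GetBytes allocates a small array per float, which is exactly the overhead a direct reinterpretation would avoid.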

2. Use the unsafe and fixed keywords to pin the byte array where it is and obtain a float pointer. This has drawbacks too. First, pinning objects interferes with the garbage collector: a pinned object cannot be moved when the heap is compacted, which can fragment memory and reduce performance (and performance is often exactly what you want when you are dealing with arrays of numbers). Second, as the keyword suggests, pointers are unsafe: the runtime no longer checks array bounds, and the project must be compiled with unsafe code enabled. Here's some example code from my open-source audio library NAudio that shows me using this method to mix some audio:

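unsafe void Sum32BitAudio(byte[] destBuffer, int offset, byte[] sourceBuffer, int bytesRead)
{
    // volume is not declared here: it is a member of the enclosing class,
    // the gain applied to the source samples.
    // Pin both arrays so the GC can't move them while we hold raw pointers.
    fixed (byte* pDestBuffer = &destBuffer[offset],
              pSourceBuffer = &sourceBuffer[0])
    {
        // Reinterpret the pinned bytes as 32-bit floats.
        float* pfDestBuffer = (float*)pDestBuffer;
        float* pfReadBuffer = (float*)pSourceBuffer;
        int samplesRead = bytesRead / 4; // 4 bytes per float sample
        // No bounds checks: the caller must ensure both buffers are large enough.
        for (int n = 0; n < samplesRead; n++)
        {
            pfDestBuffer[n] += (pfReadBuffer[n] * volume);
        }
    }
}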

But does it really need to be this way? Why can't the .NET Framework let me treat a byte array as a float array, without the need for copying, pinning or unsafe code? I've tried to think through whether there would be any showstoppers for a feature like this being added:

1. The garbage collector shouldn't need any extra knowledge. The float array reference would be just like another reference to the byte array, so the garbage collector would not collect the memory until all references were gone. It could still be moved around in memory when necessary without causing problems, since no raw pointers are involved.

2. Sizing need not be an issue. If my byte array is not an exact multiple of four bytes in length, then the corresponding float array would simply be as long as possible: a 1027-byte array would give 256 floats, with the last three bytes not visible through the float view.

3. This would only work for value types that themselves contain only value types. Casting an array of bytes to any type that contained a reference type would of course be unsafe, since it would let you forge or corrupt object references. But there is nothing unsafe about casting, say, an array of bytes into an array of DateTimes. The worst that could happen would be to create invalid DateTime values.
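For comparison, .NET Core 2.1 (well after this post) introduced MemoryMarshal.Cast in System.Runtime.InteropServices, which behaves much like the view the three points above describe, except that it returns a Span<T> rather than a new array type. A minimal sketch, assuming .NET Core 2.1 or later and C# 9 top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

byte[] blah = new byte[1027];   // deliberately not a multiple of 4

// A float view over the same memory: no copy, no pinning, no unsafe block.
Span<float> flah = MemoryMarshal.Cast<byte, float>(blah.AsSpan());
Console.WriteLine(flah.Length); // 256: the last 3 bytes are not visible

// Writes through the view land in the byte array.
flah[0] = 1.0f;
Console.WriteLine(BitConverter.ToSingle(blah, 0) == 1.0f); // True

// Any struct without references works, e.g. DateTime (8 bytes each).
Span<DateTime> dates = MemoryMarshal.Cast<byte, DateTime>(blah.AsSpan());
Console.WriteLine(dates.Length); // 128

// A struct that holds a reference (here a string) is refused at run time.
bool rejected = false;
try
{
    MemoryMarshal.Cast<byte, KeyValuePair<int, string>>(blah.AsSpan());
}
catch (ArgumentException)
{
    rejected = true;
}
Console.WriteLine(rejected);    // True
```

A Span<T> can only live on the stack, so it cannot be stored in a field the way a float[] could; inside a method like Sum32BitAudio, though, two such float spans could stand in for the pinned pointers.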

The benefits of adding this as a language feature would go beyond simply playing with numbers. It would be ideal for interop scenarios, removing the need for Marshal.PtrToStructure in many cases. Imagine being able to write code like the following:

byte[] blah = new byte[1024];
int x = MyExternalDllFunction(blah);
if (x == 0)
{
    MyStructType myStruct = (MyStructType)blah;
}
else
{
    MyOtherStructType myOtherStruct = (MyOtherStructType)blah;
}

What do you think? Would you use this feature if it were in C#? It needn't be implemented as a cast; it could be a library function. The key thing would be to have two different struct (or struct array) types that provide views onto the same block of managed memory.
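A sketch of the struct case, assuming .NET Core 2.1 or later (where MemoryMarshal.AsRef was introduced) and C# 9 top-level statements. It contrasts the existing route, pinning with GCHandle and copying a struct out with Marshal.PtrToStructure, with MemoryMarshal.AsRef, which returns a reference into the byte array itself, so two different struct types can view the same bytes. MyStructType and MyOtherStructType are hypothetical stand-ins with made-up fields, and the buffer is filled by hand instead of by a native call.

```csharp
using System;
using System.Runtime.InteropServices;

// Hand-filled stand-in for a buffer a native call would normally fill.
byte[] blah = new byte[1024];
BitConverter.GetBytes(42).CopyTo(blah, 0);    // bytes 0..3: MyStructType.Id
BitConverter.GetBytes(3.5f).CopyTo(blah, 4);  // bytes 4..7: MyStructType.Gain

// Existing route: pin the array, then let Marshal.PtrToStructure copy a struct out.
MyStructType copy;
GCHandle handle = GCHandle.Alloc(blah, GCHandleType.Pinned);
try
{
    copy = Marshal.PtrToStructure<MyStructType>(handle.AddrOfPinnedObject());
}
finally
{
    handle.Free();
}

// View route: refs into blah itself, with no pinning and no copy.
ref MyStructType view = ref MemoryMarshal.AsRef<MyStructType>(blah.AsSpan());
ref MyOtherStructType other = ref MemoryMarshal.AsRef<MyOtherStructType>(blah.AsSpan());

view.Gain = 0.25f; // writes straight into bytes 4..7 of blah

// The copy kept the old value; both views see the new bytes.
Console.WriteLine(copy.Gain == view.Gain ? "same" : "different"); // different
Console.WriteLine(other.Raw == BitConverter.ToInt64(blah, 0));    // True

// Hypothetical struct types with made-up fields, laid out to match the bytes above.
[StructLayout(LayoutKind.Sequential)]
struct MyStructType
{
    public int Id;
    public float Gain;
}

[StructLayout(LayoutKind.Sequential)]
struct MyOtherStructType
{
    public long Raw;
}
```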

1 comment:

Anonymous said...

Coming from the C/C++ world, this is an efficient way of coding. I am writing a program where I wanted to do this type of cast. I'm reading several hundred megabytes, and reading all the data as bytes and reinterpreting it as floats is very fast.