Saturday, May 10, 2008

Cloning Objects In .Net

(Download source at the end of the article)

The issue of cloning pops up about once a year for me, and I'm always perplexed at the complexity of what seems to be a very simple thing...give me a copy of my object, please!

This is what you want to do, but of course it won't work:

    Foo f = new Foo();
    Foo f2 = new Foo();
    f = f2; // f now refers to the same object as f2; nothing has been copied


This isn't a clone at all. With the above code, you simply have two variables referring to the same object in memory. A 'shallow clone' (what Object.MemberwiseClone() gives you) does create a new object, but any reference-type members of the copy still point at the same child objects as the source. In all the research I've done, there seems to be confusion about what a 'deep clone' is. If I am told that a method will do a deep clone, I assume that the object, all its properties, and all their children's properties will be cloned along with it. (I'm going to call this 'recursive cloning'.) However, in a lot of my reading not everyone holds this to be true. Many people will tell you that deep cloning is only cloning properties one level deep, as such:

    [Serializable]
    public class A : BaseClass
    {
        public string AProp { get; set; } //Cloned
        public List<B> AChildren { get; set; } //Copies a reference from the source object
    }
 
    [Serializable]
    public class B : BaseClass
    {
        public string BProp { get; set; } //Not cloned
        public List<C> BChildren { get; set; } //Not cloned
    }
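
To see what 'one level deep' means in practice, here is a small sketch using Object.MemberwiseClone(). Parent, Child, and ShallowCopyDemo are hypothetical stand-ins made up for illustration, not the A and B classes above.

```csharp
using System;
using System.Collections.Generic;

public class Child
{
    public string Name { get; set; }
}

public class Parent
{
    public string Title { get; set; }
    public List<Child> Children { get; set; } = new List<Child>();

    // MemberwiseClone is protected on System.Object, so expose it through a public method.
    public Parent ShallowCopy()
    {
        return (Parent)MemberwiseClone();
    }
}

public static class ShallowCopyDemo
{
    public static void Run()
    {
        var original = new Parent { Title = "original" };
        original.Children.Add(new Child { Name = "first" });

        Parent alias = original;               // same object, no copy
        Parent copy = original.ShallowCopy();  // new Parent, but the Children list is shared

        copy.Title = "copy";
        copy.Children.Add(new Child { Name = "second" });

        Console.WriteLine(ReferenceEquals(alias, original));                  // True
        Console.WriteLine(ReferenceEquals(copy, original));                   // False
        Console.WriteLine(original.Title);                                    // original
        Console.WriteLine(ReferenceEquals(copy.Children, original.Children)); // True
        Console.WriteLine(original.Children.Count);                           // 2
    }
}
```

The last two lines are the catch: adding a child through the copy also changes the original, because both objects hold a reference to the same list.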

So now that I have that straight in my mind, I need to think about my options.

1) Manual clone
For each one of your objects, implement the ICloneable interface: new up a copy and assign each of the source object's properties to it. (Sidebar: It's becoming taboo to use ICloneable, since its contract doesn't say whether Clone() is deep or shallow.)

I'm throwing out the manual cloning technique, because I'm lazy, and who wants to have to remember to add another line of code to the Clone() method every time a property is added? You can get your recursive clone using this approach, but it's going to be time-consuming and make your code less readable. My vision for a clone method is to sock it away in a base class...with manual cloning you're going to have to implement a Clone() method for each concrete class.
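
For comparison, a manual recursive clone might look like the sketch below. Order and OrderLine are hypothetical classes invented for the example, and the Clone() methods are plain typed methods rather than ICloneable implementations.

```csharp
using System.Collections.Generic;

public class OrderLine
{
    public string Product { get; set; }
    public int Quantity { get; set; }

    public OrderLine Clone()
    {
        // Every property has to be listed here by hand.
        return new OrderLine { Product = Product, Quantity = Quantity };
    }
}

public class Order
{
    public string Customer { get; set; }
    public List<OrderLine> Lines { get; set; } = new List<OrderLine>();

    public Order Clone()
    {
        var copy = new Order { Customer = Customer };

        // Clone each child so the copy does not share OrderLine instances with the source.
        foreach (OrderLine line in Lines)
        {
            copy.Lines.Add(line.Clone());
        }

        return copy;
    }
}
```

Add a property to either class and forget to update its Clone() method, and the copy quietly loses that value. That maintenance burden is the main argument against this approach.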

2) Reflection
Code using reflection is hard to read - I'll say that first because it's my biggest peeve with it. Besides that, you're going to have a hard time implementing a 'recursive clone' with it. The reason it's difficult is that you have to write clone logic for each generic collection type. If you're only using value types and strings (string, int, long, etc.) then it's great.
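
Here is a rough sketch of what a reflection-based recursive clone can look like. It is not the code from the download; ReflectionCloner and TreeNode are hypothetical names, and the sketch assumes every class has a public parameterless constructor and that the only collections involved implement IList (List<T> does). Arrays, dictionaries, and circular references are deliberately not handled.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;

public static class ReflectionCloner
{
    public static object DeepClone(object source)
    {
        if (source == null) return null;

        Type type = source.GetType();

        // Value types are copied on assignment and strings are immutable,
        // so both can be handed back as-is.
        if (type.IsValueType || type == typeof(string)) return source;

        // Assumption: the type has a public parameterless constructor.
        object copy = Activator.CreateInstance(type);

        // Collections need their own branch: clone element by element.
        if (source is IList sourceList)
        {
            var copyList = (IList)copy;
            foreach (object item in sourceList)
            {
                copyList.Add(DeepClone(item));
            }
            return copy;
        }

        foreach (PropertyInfo property in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (!property.CanRead || !property.CanWrite) continue;
            if (property.GetIndexParameters().Length > 0) continue; // skip indexers

            object value = property.GetValue(source, null);
            property.SetValue(copy, DeepClone(value), null);
        }

        return copy;
    }
}

// Hypothetical type used only to exercise the cloner.
public class TreeNode
{
    public string Name { get; set; }
    public List<TreeNode> Children { get; set; } = new List<TreeNode>();
}
```

The IList branch is where the pain shows up: each additional kind of collection (dictionaries, sets, arrays) would need a similar special case.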

3) Serialization
This is the clear winner. Serializing your object to another place in memory and then de-serializing it into a new instance is a great way to get a recursive clone. Every class in the object graph has to be marked [Serializable], as in the sample classes above. If you don't want to clone a particular field, just use the [NonSerialized] attribute.
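
The serialize-then-deserialize idea can be sketched in a few lines. One loud assumption: this sketch swaps in System.Text.Json as the serializer, because BinaryFormatter (the usual partner of [Serializable] and [NonSerialized]) is marked obsolete in current .NET. With this serializer, [JsonIgnore] plays the role that [NonSerialized] plays for binary serialization. SerializationCloner, Customer, and Address are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public static class SerializationCloner
{
    // Round-trips the object through JSON to build an independent copy of the whole graph.
    // Only public properties are copied, and T needs a public parameterless constructor.
    public static T DeepClone<T>(T source)
    {
        string json = JsonSerializer.Serialize(source);
        return JsonSerializer.Deserialize<T>(json);
    }
}

public class Address
{
    public string City { get; set; }
}

public class Customer
{
    public string Name { get; set; }
    public List<Address> Addresses { get; set; } = new List<Address>();

    [JsonIgnore] // excluded from the round trip, so it is not cloned
    public string SessionToken { get; set; }
}
```

Calling SerializationCloner.DeepClone(customer) returns a new Customer with its own Address instances and a null SessionToken. Because the method is generic, it can live in one place (a base class or a helper) instead of being reimplemented per class.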

I did a quick and dirty performance test on a clone method using reflection (doing a recursive clone) and a method using serialization. Here are the results:

For 100,000 objects with an object graph 3 levels deep (see sample app attached):
Serialization: 20.234 seconds.
Reflection: 17.613 seconds.

Download the source here for an example of cloning using serialization and reflection.

Why is cloning useful?
Two reasons I can think of off the top of my head:

1) To allow your user to create a copy of a domain object in the application. Maybe you want to allow the user to copy an "Order" object because they want to create a new order with very similar details to an existing one, but don't want to go through all the work of creating it from scratch.

2) To keep track of state changes to your object. This is especially applicable when developing SOAs. The client made changes to your object, but before you apply the changes you want to keep a copy in memory so you can say things like "if the user changed the status, then we need to do xyz."
