The copy module provides two functions, copy and deepcopy. When you use the assignment operator, Python copies only the reference; it does not create a new copy of the object.

copy performs a shallow copy, while deepcopy performs a deep copy. The two behave exactly the same if the object you are copying is not a compound object, i.e. it does not contain other objects. A simple example of a compound object is a list. Let’s see these in action.

a = [1, 2, [3, 4]]
b = a
print(id(a), id(b))

The numbers are the same, which means a and b are the same object. The id function takes an object and returns an integer that is guaranteed to be unique and constant for that object’s lifetime (two objects whose lifetimes do not overlap may end up with the same id).
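The same check can be written with the is operator, which compares identity directly. Here is a minimal sketch of that; the helper variable names are made up for illustration.

```python
a = [1, 2, [3, 4]]
b = a  # b is just another name for the same list object, not a copy

# Identity comparison; equivalent to id(a) == id(b)
same_object = a is b

# Mutating through one name is visible through the other
b.append(5)
visible_through_a = a[-1]

print(same_object, visible_through_a)
```

Because both names point at one list, there is nothing to keep "in sync": there is only one object.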

OK. Let’s use the copy function to copy a and assign the result to b.

import copy

b = copy.copy(a)
print(id(a), id(b))

This time the numbers are different. This means a and b are different objects, but what about the objects contained in those lists? Let’s check.

print(id(a[0]), id(b[0]))

Both calls return the same number, which means a[0] and b[0] are the same object. Since they are the same object, changing the value in one place should modify the other, right? Let’s try it.

a[0] = 5
print(a, b)
print(id(a[0]), id(b[0]))

The numbers are not the same anymore. But why? Because the assignment did not modify the integer 1. It rebound a[0] to a different integer object, 5, and left b[0] pointing at the original 1. Integers are immutable, so they can only be replaced, never changed in place. This is usually the expected behaviour.

But let’s try modifying an object in the list in place, i.e. we will not rebind the reference but mutate the object it points to. In our example, the third item is a list, which can be mutated by replacing one of its elements or by calling methods such as append. Let’s replace an element of the list contained in “a” and see what happens.

a[2][0] = 3.1
print(a, b)
print(id(a[2]), id(b[2]))

This time the change shows up in both a and b. We modified the list at “a[2]” in place, without rebinding the reference to it, and because b is a shallow copy, “b[2]” refers to that same inner list. A shallow copy only “creates a duplicate” of the “main object”; the new object holds the same references to the inner objects as the original. This can lead to very subtle bugs that are hard to track down.
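As a side note, this behaviour is not specific to copy.copy. The other common ways of duplicating a list (slicing, the list constructor and the list's own copy method) are shallow too. A quick sketch to illustrate; the variable names are just for this example.

```python
import copy

a = [1, 2, [3, 4]]

# Four common ways to duplicate a list; all of them are shallow.
copies = [copy.copy(a), a[:], list(a), a.copy()]

# Each copy is a new outer list...
outer_distinct = all(c is not a for c in copies)
# ...but every one of them still shares the inner list with a.
inner_shared = all(c[2] is a[2] for c in copies)

a[2].append(5)  # mutate the shared inner list in place
print(outer_distinct, inner_shared)
print(copies)   # the appended 5 shows up in every copy
```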

To prevent this, use deepcopy instead. It recursively copies the contained objects, so you get a “true duplicate” that you can modify to your heart’s content without worrying about changing the original. Let’s run the same experiment as above, but with deepcopy this time.

b = copy.deepcopy(a)
print(a, b)
print(id(a), id(b))
print(id(a[2]), id(b[2]))

This time we see different numbers for the “main list” as well as the “inner list”. Let’s change one of the elements of the inner list in “a”.

a[2][0] = 3.5
print(a, b)

The change is reflected only in the list “a” and not in the list “b”.
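One more detail worth knowing: deepcopy keeps track of the objects it has already copied (in a memo dictionary), so an object that appears more than once is copied only once, and a structure that contains itself does not send it into infinite recursion. A small sketch with a deliberately odd list, just to show the effect:

```python
import copy

inner = [3, 4]
a = [inner, inner]  # the same inner list appears twice
a.append(a)         # and the outer list contains itself

b = copy.deepcopy(a)

independent = b[0] is not inner  # the inner list was duplicated
sharing_kept = b[0] is b[1]      # both slots point at one (new) inner list
cycle_kept = b[2] is b           # the self-reference points at the copy, not at a

print(independent, sharing_kept, cycle_kept)
```

So the copy has the same "shape" as the original, without sharing any mutable object with it.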

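Now that we have some idea about this, how does Python know how to copy or deepcopy user-defined classes? By default the copy module can already copy most class instances, using the same machinery that pickling uses. If you want to control the behaviour yourself, implement the __copy__() and __deepcopy__() methods. copy.copy calls __copy__() with no extra arguments, and copy.deepcopy calls __deepcopy__() with a single argument, the memo dictionary.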
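To make this concrete, here is a minimal sketch of a class that defines both methods. Playlist and its attributes are hypothetical and exist only for this example; a real class may need to copy more state than this.

```python
import copy


class Playlist:
    """Hypothetical class used only for illustration."""

    def __init__(self, name, tracks):
        self.name = name
        self.tracks = tracks  # a mutable list

    def __copy__(self):
        # Shallow copy: a new Playlist that shares the same tracks list.
        return Playlist(self.name, self.tracks)

    def __deepcopy__(self, memo):
        # Deep copy: a new Playlist with its own, recursively copied tracks.
        new = Playlist(self.name, [])
        memo[id(self)] = new  # register early so cycles resolve to the copy
        new.tracks = copy.deepcopy(self.tracks, memo)
        return new


original = Playlist("mix", ["intro", "outro"])
shallow = copy.copy(original)
deep = copy.deepcopy(original)

original.tracks.append("bonus")
print(shallow.tracks)  # shares the list with original, so it sees "bonus"
print(deep.tracks)     # has its own list, so it does not
```

Passing memo along to the nested copy.deepcopy call is what lets deepcopy recognise objects it has already copied.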
