Python Object References

Mon, Jul 9, 2018 6-minute read

Python Object Reference ELI5


I first learnt about variables through the analogy that “variables are boxes” and that we assign things to those boxes. It turns out this isn’t particularly helpful in object-oriented programming. This post is about how Python treats object assignment and some of the hidden gotchas that can cause unintended errors along the way. Instead of “boxes”, it is better to think of variables as “labels” that we attach to objects. And, as everything in Python is an object, it’s important to remember that every object has three things: an identity, a type and a value. The value is the only one of these that can change once an object is created (and only for mutable objects), and it is the value that we usually care about, and hence label.

Labels not boxes

Extending the “labels” metaphor a little, let’s look at variable assignment.

a = 2 # we label the integer 2 as 'a'
b = a # the same 2 now also carries the label 'b'
c = b # and a third label, 'c'

Above we can see that the object 2 is assigned to the variable ‘a’. Each subsequent assignment is simply another reference to the same object. Viewed through this lens, you can start to see how objects have labels. The 2 cannot sit in three different boxes at once; instead we visualise 2 as having three sticky notes attached to it. If we changed a with a = 20, it would just be a matter of peeling the sticky note with a written on it off 2 and attaching it to 20. To further aid this way of thinking, always read assignments from right to left. The right side is where the object is created or retrieved, and the left side is the label we bind to it. (Enough, you get it already!)
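A minimal sketch of the sticky-note idea, reusing the a, b and c names from above: rebinding a moves only that one label, and the other two stay on 2.

```python
a = 2
b = a
c = b

a = 20  # peel the 'a' label off 2 and stick it on 20

print(a)  # 20
print(b)  # 2
print(c)  # 2
```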

When an object like 2 has many labels, we call this aliasing. Aliasing is an important concept to grasp, and to illustrate why, we will examine the identity of a, b and c.

print(f'a id: {id(a)}')  # original label -> a id: 139886603774600
print(f'b id: {id(b)}')  # alias of a     -> b id: 139886603774600
print(f'c id: {id(c)}')  # alias of a     -> c id: 139886603774600

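All aliases of a have the same identity, which is an integer that is unique to the object for as long as it exists; in CPython it is the object’s memory address. The identity belongs to the object, not the label: if we rebound a to a different object, id(a) would change, while id(b) and id(c) would stay the same.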
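As a rough sketch of the difference between changing an object and moving a label, the snippet below uses a hypothetical list called nums (not from the examples above). Mutating the list in place keeps its identity, while rebinding the name points it at a different object.

```python
nums = [1, 2, 3]           # hypothetical list, used only for this sketch
before = id(nums)

nums.append(4)             # mutate the object in place
print(id(nums) == before)  # True: same object, new contents

nums = nums + [5]          # build a new list and rebind the label to it
print(id(nums) == before)  # False: the label now sits on a different object
```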

When is == true?

Let’s check out Equality and Identity (and aliases, too)

An object’s identity never changes once it has been created. Its value might, however, and generally this is what we care about more. Python lets us check either one:

a == b # compares the values
a is b # compares the identities

Let’s extend this with a more complex example using dictionaries.

batman = {'name': 'Bruce Wayne', 'job': 'crime fighter'}
bruce = batman
print(batman == bruce)  # True
print(batman is bruce)  # True

Both batman and bruce are equal in identity and in value. Suppose there is a vigilante crime fighter out there pretending to be batman, named manbat. Does he pass the same checks?

manbat = {'name': 'Bruce Wayne', 'job': 'crime fighter'}
print(batman == manbat) # True
print(batman is manbat) # False

In this case, manbat and batman share equal values but not the same identity. manbat is not an alias of bruce or batman, and thus has his own unique identity. This is because we created an entirely new object, albeit one with the same values as batman.

Much of the time we care about the values an object holds rather than its identity, but you will see the is operator a lot in conditionals such as:

x = None
if x is None:
    print('x is None')
if x is not None:
    print('x has a value')
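One reason the is operator tends to be preferred over == for None checks is that a class can define its own idea of equality. The sketch below uses a hypothetical AlwaysEqual class, invented here purely for illustration, to show how the two checks can disagree.

```python
class AlwaysEqual:
    """Hypothetical class whose == claims to equal everything."""

    def __eq__(self, other):
        return True


x = AlwaysEqual()
print(x == None)  # True: the custom __eq__ says so
print(x is None)  # False: x is not the None object
```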

Alias Issues

Something I didn’t realise until it came back to haunt me much later is that aliases can have unintended side effects with mutable types. Let’s say we have a list and an alias of it. We want to add items through the alias but leave the original untouched for whatever reason.

orig = [10, 20, 30, [100, 200]]
new = orig

Looks good, we can now make changes to new.

new.append('FizzBuzz')
print(orig) # [10, 20, 30, [100, 200], 'FizzBuzz']
print(new)  # [10, 20, 30, [100, 200], 'FizzBuzz']

After appending to new, it becomes apparent that the change shows up in both. This happens because orig and new are two labels on the same list, so a change made through one is visible through the other. I think this is really important to know: aliases are not copies!
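The same thing happens when a list is passed to a function: the parameter is just another label on the caller’s list. A small sketch, with a hypothetical add_item function made up for this example:

```python
def add_item(items):
    # 'items' is an alias of whatever list the caller passed in
    items.append('FizzBuzz')
    return items


orig = [10, 20, 30]
new = add_item(orig)

print(orig)         # [10, 20, 30, 'FizzBuzz']
print(new is orig)  # True
```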

Copies

If aliases aren’t copies then how do we copy?

orig = [10, 20, 30, [100, 200]]
new = list(orig)
# dict(x) also works this way
print('orig id:', id(orig)) # orig id: 140443406513496
print('new id:', id(new))   # new id:  140443402343535

By calling list() on orig we create a new list object, so orig and new are now two separate objects. Now if we append or remove items from either list, it does not propagate through. Except, it does sometimes.

In this case we are only making a new copy of the outer object, not of any mutable objects nested within it. So while changes made to the first layer are contained within the copy, any mutable objects nested more deeply will be aliases.

Confused? Here is an example.

orig = [10, 20, 30, [100, 200]]
new = list(orig)
new.append('not nested')
print(orig) # [10, 20, 30, [100, 200]]
print(new)  # [10, 20, 30, [100, 200], 'not nested']
# first layer is not affected as it is a copy, not an alias
orig[-1].append('i am aliased to orig')
print(orig) # [10, 20, 30, [100, 200, 'i am aliased to orig']]
print(new)  # [10, 20, 30, [100, 200, 'i am aliased to orig'], 'not nested']

While orig and new are independent of each other when changes are made to the first layer, any mutable objects nested inside are simply aliases of the ones in the copy’s source.

Another example to check this out.

# before we started making alterations to the lists
print(id(orig))     # 140443390926984
print(id(new))      # 140443392352593
print(id(orig[-1])) # 140443395483400
print(id(new[-1]))  # 140443395483400

Inspecting the identities reveals that only the outer objects were created as new objects, while the nested list in the copy is bound to the original nested list - an alias!
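The other common ways of copying a list are shallow in the same sense. The sketch below compares four of them; the copies dictionary and its keys are just names chosen for this example.

```python
import copy

orig = [10, 20, 30, [100, 200]]

copies = {
    'list(orig)': list(orig),
    'orig[:]': orig[:],
    'orig.copy()': orig.copy(),
    'copy.copy(orig)': copy.copy(orig),
}

for name, new in copies.items():
    # the outer list is a new object, but the nested list is shared
    print(name, new is orig, new[-1] is orig[-1])  # False True for each
```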

This is something to take into consideration when passing around variables that have nested types. To circumvent this, immutable types such as tuples can be used in place of the nested lists.
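A quick sketch of the tuple approach, using the same orig list but with the nested list swapped for a tuple. The nested object is still shared between the two lists, but it can no longer be changed through either of them.

```python
orig = [10, 20, 30, (100, 200)]  # nested tuple instead of a nested list
new = list(orig)

try:
    new[-1].append(300)  # tuples have no append method
except AttributeError as err:
    print('cannot mutate:', err)

print(orig)  # [10, 20, 30, (100, 200)]
```

This only helps if the tuple itself holds immutable items; a list inside the tuple could still be changed.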

Python can do deep copies, which take care of this issue, but they have their own drawbacks. These will not be discussed here, as this post is already quite long. See Dan Bader's excellent post for more information.
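For completeness, here is a minimal sketch of a deep copy using copy.deepcopy from the standard library, with the same orig list as before. It only shows the basic behaviour and does not cover the drawbacks.

```python
import copy

orig = [10, 20, 30, [100, 200]]
new = copy.deepcopy(orig)

new[-1].append('only in new')

print(orig)                 # [10, 20, 30, [100, 200]]
print(new)                  # [10, 20, 30, [100, 200, 'only in new']]
print(orig[-1] is new[-1])  # False
```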

Wrapping Up

In Python all objects have a type, an identity and a value. Only the value can change after an object is created, and knowing a little more about how this works can help us prevent unintended bugs.

Notes:

  • assignment does not create copies
  • nested mutable types within shallow copies are aliases
  • there are two different comparisons: identity (is) and value (==)
