Is Switching to Python 3 and Back Worth It?

It has been seven years since Python 3 was released, but there are still those who choose Python 2 over the newer version. This is particularly problematic for newcomers to the Python programming language. I personally encountered this issue at my previous workplace, where colleagues found themselves in this predicament. They were not only unaware of the differences between the two versions but also uncertain about the version installed on their systems.

Naturally, different colleagues had different versions of the interpreter, which could lead to significant issues if they attempted to share scripts without considering the version incompatibility.

However, it’s important to note that this was not entirely their fault. There’s a need for improved documentation and awareness to address the fear, uncertainty, and doubt (FUD) that sometimes influence our decisions. This article is intended for those individuals, as well as those who currently use Python 2 but are unsure about upgrading, perhaps due to negative experiences with early versions of Python 3 that lacked refinement and library support.

Two Dialects Within a Single Language

A fundamental question arises: are Python 2 and Python 3 truly distinct languages? This seemingly simple question doesn't have a straightforward answer. The short answer is “no, it's not a new language”: several proposals that would have broken compatibility without yielding important advantages were rejected. The full picture, however, is more nuanced.

Python 3 is a new iteration of Python, but it doesn’t guarantee backward compatibility with code written for Python 2. However, it is possible to write code that works seamlessly with both versions. This is not accidental but a deliberate effort by Python developers, as outlined in various Python Enhancement Proposals (PEPs). In the rare instances where syntax incompatibility arises, Python’s dynamic nature allows us to modify code at runtime, eliminating the need for preprocessors with unfamiliar syntax.
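As an illustration of that runtime approach: the three-argument `raise tp, value, tb` statement is a SyntaxError on Python 3, so single-source code can keep the Python 2 spelling inside a string and compile it only when running on Python 2. The six compatibility library uses this trick for its `reraise` helper; the sketch below is a simplified version written for this article, not six's actual code.

```python
import sys

if sys.version_info[0] >= 3:
    def reraise(tp, value, tb=None):
        """Re-raise `value` (an instance of `tp`) with traceback `tb`."""
        if value is None:
            value = tp()
        raise value.with_traceback(tb)
else:
    # `raise tp, value, tb` does not parse on Python 3, so it lives in a
    # string that is only compiled when this branch runs on Python 2.
    exec("def reraise(tp, value, tb=None):\n"
         "    raise tp, value, tb\n")
```

Both branches parse on both versions; only the string's contents differ, and Python 3 never looks at them.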

Therefore, syntax differences are not a major concern (especially when disregarding Python 3 versions prior to 3.3). The more significant differences lie in code behavior, semantics, and the availability of major libraries specific to each version. While this presents a challenge, it’s not entirely unfamiliar to experienced programmers who have encountered similar situations with other languages. It’s not uncommon to encounter legacy codebases or libraries that fail to build with newer versions of the same compiler. In such cases, the compiler typically assists in resolving these issues. In Python, this assistance comes from your test suite.

This raises the question: why introduce a new version with these changes? What advantages do these modifications offer?

A Practical Illustration

Let’s imagine we want to create a program that reads the owner of files and directories (on a Unix system) within the current directory and displays them on the screen.

# encoding: utf-8

from os import listdir, stat

# to keep this example simple, we won't use the `pwd` module
names = {1000: 'dario',
         1001: u'олга'}

for node in listdir(b'.'):
    owner = names[stat(node).st_uid]
    print(owner + ': ' + node)

At first glance, everything appears to function correctly. We've specified the encoding of the source file, so if a file owned by ‘олга’ (user ID 1001) exists in the directory, her name will be printed correctly, and so will files with non-ASCII names owned by ‘dario’.

However, there's a scenario we haven't considered: a file with a non-ASCII name created by ‘олга’.

su олга -c "touch é"

Executing this script will result in the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

Imagine encountering a similar situation with a program consisting of thousands of lines of code instead of the four in our example. Your program gains users, including those from non-English speaking countries with names containing special characters. Everything runs smoothly until a user with such a name creates a file. Suddenly, your code throws an error, potentially causing server errors (HTTP 500), and you’re left to debug a large codebase to identify the root cause.

How does Python 3 address this issue? If you execute the same script using Python 3, you’ll notice that it immediately detects the potential problem. Even without files with special characters or users with unique names, you’ll receive an exception like this:

`TypeError: Can't convert 'bytes' object to str implicitly`

Pointing to the following line:

print(owner + ': ' + node)

The error message is quite clear: the owner variable is a string object, while node is a bytes object. This indicates that the issue stems from listdir returning a list of bytes objects.

What many might not know is that listdir returns either bytes objects or Unicode strings, depending on the type of its argument. We deliberately called listdir(b'.') rather than listdir('.') so that Python 2 and 3 both receive bytes; with listdir('.'), Python 3 would have returned Unicode strings and the problem would have stayed hidden.
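The type-dependent behavior is easy to check on Python 3. In the sketch below (the throwaway temporary directory is our own addition, to keep the example self-contained), the same directory is listed once with a str argument and once with a bytes argument, and mixing the two types raises TypeError instead of being coerced silently.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    tmp_bytes = os.fsencode(tmp)
    # Create a file named "é", written here as its UTF-8 bytes.
    open(os.path.join(tmp_bytes, b'\xc3\xa9'), 'w').close()

    as_text = os.listdir(tmp)         # str argument   -> list of str, e.g. ['é']
    as_bytes = os.listdir(tmp_bytes)  # bytes argument -> list of bytes, e.g. [b'\xc3\xa9']

    # Python 3 refuses to mix the two types implicitly.
    mixing_error = None
    try:
        'owner: ' + as_bytes[0]
    except TypeError as exc:
        mixing_error = exc
```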

Changing a single character from listdir(b'.') to listdir(u'.') ensures the code functions correctly on both Python versions. For consistency, we should also change 'dario' to u'dario'.
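Putting both changes together gives something like the sketch below. Wrapping the loop in an `owner_lines` function, joining the directory path, and falling back to the numeric uid for owners missing from `names` are our own additions (hypothetical, not part of the original script), so the sketch works on any directory and does not crash on unknown users. As in the original, a hard-coded table stands in for the `pwd` module.

```python
# encoding: utf-8
from os import listdir, stat
from os.path import join

names = {1000: u'dario',
         1001: u'олга'}


def owner_lines(directory=u'.'):
    """Return one 'owner: name' line per entry in `directory`."""
    lines = []
    for node in listdir(directory):  # text argument -> text file names
        uid = stat(join(directory, node)).st_uid
        # Assumption: owners not in `names` are shown by their numeric uid.
        owner = names.get(uid, u'%d' % uid)
        lines.append(owner + u': ' + node)
    return lines
```

Calling `for line in owner_lines(): print(line)` reproduces the original output, with text on both sides of every concatenation.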

This difference in behavior stems from a fundamental change in how Python 2 and 3 handle strings, and it becomes particularly apparent when porting code between the two versions.

This situation exemplifies the adage, “Splitters can be lumped more easily than lumpers can be split.” Python 2’s approach of lumping together Unicode strings and default byte strings, allowing for implicit coercion, has been split in Python 3.

Automated Conversion Tools

While tools like 2to3 are well-designed and helpful for automating the conversion process, they have limitations due to this bytes/Unicode split. Since the behavioral difference emerges during runtime, tools limited to static analysis and parsing cannot fully address the complexities of a large Python 2 codebase that mixes these types. You’ll need to manually design your API to determine whether functions that previously accepted any string type should now work with specific types. Conversely, tools converting Python 3 code to Python 2 have a much easier task.
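One common way to make that decision explicit is to decode bytes into text as soon as they enter the program and to encode only on the way out. The helper below is a hypothetical example of such a boundary function for Python 3; the name `to_text` and the UTF-8 default are our assumptions, not part of any library.

```python
def to_text(value, encoding='utf-8', errors='strict'):
    """Return `value` as str, decoding bytes with `encoding`.

    Anything that is neither bytes nor str is rejected, so mistakes
    surface at the boundary instead of deep inside the program.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    if isinstance(value, str):
        return value
    raise TypeError('expected bytes or str, got %s' % type(value).__name__)
```

With such a helper, the earlier script could write `owner + ': ' + to_text(node)` and never mix types again.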

Here’s an example:

I once wrote a toy HTTP server in Python 2, with python-magic as its only dependency: https://gist.github.com/berdario/8abfd9020894e72b310a.

You can convert it to Python 3 with 2to3 on your own system and examine the result. When you attempt to run the converted code, you'll find that every error you encounter relates to the bytes/Unicode distinction.

Applying manual changes like these (https://gist.github.com/berdario/34370a8bc39895cae139/revisions) makes the program work on Python 3. While not overly complex, these changes require careful consideration of the data types your functions use and of the program's control flow. With thousands of lines of code, this can become a daunting task, potentially requiring hundreds of modifications.

For curiosity’s sake, you can try converting this Python 3 code back to Python 2 using 3to2, resulting in this: https://gist.github.com/berdario/cbccaf7f36d61840e0ed. The only manual change required is adding .encode('utf-8') at line 55.

Starting with Python 3 makes the process much smoother if you ever need to go back to Python 2. However, if you must support both versions, a one-time conversion isn't ideal; it's better to maintain a single codebase that stays compatible with both. Tools like futurize are invaluable for this.
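A typical starting point for such a single codebase is a set of `__future__` imports at the top of every module, which make Python 2 adopt Python 3 semantics for printing, division, imports and string literals (on Python 3 they are no-ops). The sketch below is ours and only shows that starting point; futurize automates this and much more. The names `PY2`, `GREETING`, `mean` and `text_type` are illustrative.

```python
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

import sys

PY2 = sys.version_info[0] == 2

# With unicode_literals, this literal is text on both versions.
GREETING = 'héllo'


def mean(values):
    # True division on both versions: mean([1, 2]) == 1.5, not 1.
    return sum(values) / len(values)


if PY2:
    text_type = unicode  # noqa: F821 (only evaluated on Python 2)
else:
    text_type = str
```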

Beyond Unicode: Exploring Other Python 3 Advantages

Even if you’re unable to utilize Python 3 in production (perhaps due to reliance on a bulky Python 2-only library), maintaining code compatibility with Python 3 is advisable. You could even stub/mock incompatible libraries to benefit from continuous integration with your tests on both versions. This simplifies future migration to Python 3 and promotes better API design and early error detection, similar to the example discussed earlier.
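Stubbing can be as simple as registering a fake module in `sys.modules` before the code under test imports it. In the sketch below, `legacy_reports` is a hypothetical Python 2-only dependency and `render_summary` a hypothetical function that uses it; `unittest.mock` is part of the Python 3 standard library (on Python 2, the `mock` package from PyPI offers the same API).

```python
import sys
import unittest
from unittest import mock

try:
    import legacy_reports  # hypothetical Python 2-only library
except ImportError:
    # Not importable here (e.g. on Python 3): register a stand-in so the
    # rest of the code base can still be imported and tested.
    sys.modules['legacy_reports'] = mock.MagicMock()
    import legacy_reports


def render_summary(rows):
    """Code under test: delegates formatting to the legacy library."""
    return legacy_reports.format_table(rows)


class RenderSummaryTest(unittest.TestCase):
    def test_delegates_to_legacy_formatter(self):
        with mock.patch.object(legacy_reports, 'format_table',
                               return_value='formatted') as fake:
            self.assertEqual(render_summary([1, 2]), 'formatted')
        fake.assert_called_once_with([1, 2])
```

Run under `python -m unittest` on both interpreters, the same test file then exercises your own code even where the heavy dependency cannot be installed.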

While the discussion about porting and the bytes/Unicode split might make Python 3 seem like the lesser of two evils, it’s important to highlight its inherent benefits. Are these benefits limited to new features added to the language and its standard library?

In the five years since the last minor release of Python 2, Python 3 has gained numerous compelling additions. For instance, I frequently rely on the new keyword-only arguments.

Keyword-Only Arguments

When I wrote a function to merge an arbitrary number of dictionaries (similar to dict.update, but without modifying its inputs), it felt natural to add a function argument to customize the merging logic. With the default logic, values from the rightmost dictionaries win:

merge_dicts({'a':1, 'c':3}, {'a':4, 'b':2}, {'b': -1})
# {'b': -1, 'a': 4, 'c': 3}

Similarly, for merging by adding values:

from operator import add
merge_dicts({'a':1, 'c':3}, {'a':4, 'b':2}, {'b': -1}, withf=add)
# {'b': 1, 'a': 5, 'c': 3}

Implementing this in Python 2 would have required accepting **kwargs and looking up a withf key in it. However, a typo like withfun would then go unnoticed unless you added an explicit check for unexpected keys. Python 3, in contrast, allows optional arguments after variable positional arguments; these can only be passed by keyword:

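def second(a, b):
    # Default merge logic: keep the value from the later dictionary
    return b

def merge_dicts(*dicts, withf=second):
    # withf follows *dicts, so it can only be passed by keyword
    newdict = {}
    for d in dicts:
        shared_keys = newdict.keys() & d.keys()
        # Copy keys not seen yet, then combine the shared ones with withf
        newdict.update({k: d[k] for k in d.keys() - newdict.keys()})
        newdict.update({k: withf(newdict[k], d[k]) for k in shared_keys})
    return newdict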
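For comparison, here is a sketch of the **kwargs alternative next to the keyword-only version, both run under Python 3. The name `merge_dicts_py2` is ours, and both bodies use a simplified loop rather than the set operations above. The misspelled `withfun` is silently swallowed by the first function, while the second rejects it.

```python
from operator import add


def second(a, b):
    return b


def merge_dicts_py2(*dicts, **kwargs):
    # Python 2 style: the option hides inside **kwargs, so a misspelled
    # keyword just sits in kwargs and is never looked at.
    withf = kwargs.pop('withf', second)
    newdict = {}
    for d in dicts:
        for k, v in d.items():
            newdict[k] = withf(newdict[k], v) if k in newdict else v
    return newdict


def merge_dicts(*dicts, withf=second):
    # Python 3 style: withf is keyword-only and checked by the interpreter.
    newdict = {}
    for d in dicts:
        for k, v in d.items():
            newdict[k] = withf(newdict[k], v) if k in newdict else v
    return newdict


# The typo is ignored: values are overwritten instead of added.
print(merge_dicts_py2({'a': 1}, {'a': 4}, withfun=add))  # {'a': 4}

try:
    merge_dicts({'a': 1}, {'a': 4}, withfun=add)
except TypeError as exc:
    print(exc)  # ... got an unexpected keyword argument 'withfun'
```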

Unpacking Operator

From Python 3.5 onwards, naive merging (where later values simply overwrite earlier ones) can be achieved with the new dictionary unpacking syntax. But Python 3 had already improved unpacking long before version 3.5:

a, b, *rest = [1, 2, 3, 4, 5]
rest
# [3, 4, 5]

Available since Python 3.0, this unpacking, akin to destructuring, offers a limited form of pattern matching commonly found in functional languages. It’s also a staple in dynamic languages like Ruby and JavaScript (with ECMAScript 2015 support).
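Both forms in a short sketch (the values are arbitrary sample data): dictionary unpacking in a literal needs Python 3.5+ (PEP 448), while starred assignment works from 3.0 (PEP 3132).

```python
# Python 3.5+: naive merge, where later dictionaries win on duplicate keys.
merged = {**{'a': 1, 'c': 3}, **{'a': 4, 'b': 2}, **{'b': -1}}
print(merged == {'a': 4, 'b': -1, 'c': 3})  # True

# Python 3.0+: the starred name may appear anywhere in the target list.
first, *middle, last = [1, 2, 3, 4, 5]
print(first, middle, last)  # 1 [2, 3, 4] 5

# It also works in for loops, destructuring each item.
for head, *tail in [(1, 2, 3), (4,)]:
    print(head, tail)  # 1 [2, 3], then 4 []
```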

Streamlined APIs for Iterables

Python 2 had duplicated APIs for handling iterables, such as zip/itertools.izip, dict.items/dict.iteritems, map/itertools.imap and range/xrange, and the default ones were strict, building complete lists up front. Python 3 streamlines this: zip(), map() and range() produce values on demand, and dict.items() returns a lightweight view. Want to create a custom enumerate function? In Python 3 it's effortless to compose one from standard library functions:

import itertools
zip(itertools.count(1), 'abc')

This achieves the same result as enumerate('abc', 1).
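A quick way to see the on-demand behavior (the values are arbitrary):

```python
import itertools

# map over an infinite iterator returns instantly; values come on request.
squares = map(lambda n: n * n, itertools.count(1))
print(list(itertools.islice(squares, 5)))  # [1, 4, 9, 16, 25]

# range does not build a list, even for a trillion elements.
big = range(10 ** 12)
print(big[-1], 10 ** 6 in big)  # 999999999999 True

# dict.items() is a live view, not a copy.
prices = {'apple': 3}
items = prices.items()
prices['pear'] = 5
print(sorted(items))  # [('apple', 3), ('pear', 5)]

print(list(zip(itertools.count(1), 'abc')))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```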

Function Annotations

Wouldn’t it be convenient to define HTTP APIs like this?

from decimal import Decimal

# get and post stand for route decorators provided by a web framework
@get('/balance')
def balance(user_id: int):
    pass

@post('/pay')
def pay(user_id: int, amount: Decimal):
    pass

No more need for ad-hoc syntax like '<int:user_id>'. You can use any type or constructor (e.g., Decimal) within routes without custom converters.

This is achievable using Python 3's function annotations, and something along these lines has already been implemented. Because annotations are valid Python syntax, the handler signatures also serve as self-documenting API definitions.
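A minimal sketch of how such decorators could work, assuming a toy in-process router: `get`, `post`, `route`, `ROUTES` and `dispatch` are hypothetical names, not the API of any existing framework, and the handlers return dictionaries instead of `pass` so the result can be inspected. The decorator only records the function; `dispatch` reads its signature with `inspect.signature` and converts each raw string parameter with the annotated type.

```python
import inspect
from decimal import Decimal

ROUTES = {}  # (method, path) -> handler; hypothetical registry


def route(method, path):
    def register(func):
        ROUTES[(method, path)] = func
        return func
    return register


def get(path):
    return route('GET', path)


def post(path):
    return route('POST', path)


def dispatch(method, path, raw_params):
    """Call the handler, converting each string parameter via its annotation."""
    handler = ROUTES[(method, path)]
    converted = {}
    for name, param in inspect.signature(handler).parameters.items():
        convert = param.annotation
        if convert is inspect.Parameter.empty:
            convert = str  # unannotated parameters stay strings
        converted[name] = convert(raw_params[name])
    return handler(**converted)


@get('/balance')
def balance(user_id: int):
    return {'user_id': user_id}


@post('/pay')
def pay(user_id: int, amount: Decimal):
    return {'user_id': user_id, 'amount': amount}


print(dispatch('POST', '/pay', {'user_id': '7', 'amount': '9.99'}))
# {'user_id': 7, 'amount': Decimal('9.99')}
```

Any callable that accepts a string works as an annotation here, which is why Decimal needs no custom converter.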

Conclusion

These are just a few examples of the many improvements in Python 3 that make code more robust. Another notable addition is chained exception tracebacks, shown by default, as highlighted in Ionel Cristian Mărieș's blog post “The most underrated feature in Python 3”. Aaron Maxwell also covers this in another post, along with Python 3's stricter comparison semantics and the new super() behavior.
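The sketch below touches the three features just mentioned; `ConfigError`, `load_port`, `Base` and `Child` are illustrative names of ours. If the ConfigError were left uncaught, Python 3 would print both tracebacks, joined by "The above exception was the direct cause of the following exception".

```python
class ConfigError(Exception):
    pass


def load_port(settings):
    try:
        return int(settings['port'])
    except (KeyError, ValueError) as exc:
        # Explicit chaining: the original error is kept as __cause__.
        raise ConfigError('invalid port setting') from exc


try:
    load_port({'port': 'eighty'})
except ConfigError as err:
    chained = err
    print(type(err.__cause__).__name__)  # ValueError

# Stricter comparisons: ordering unrelated types is an error, not a guess.
try:
    1 < 'a'
except TypeError:
    print('cannot order int and str')


# Zero-argument super(): no need to repeat the class name.
class Base:
    def greet(self):
        return 'base'


class Child(Base):
    def greet(self):
        return 'child of ' + super().greet()


print(Child().greet())  # child of base
```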

Moreover, numerous other improvements significantly impact daily coding:

- Many functions and constructors now return context managers, simplifying resource cleanup and error handling: gzip.open (available since 2.7), mmap, ThreadPoolExecutor, memoryview, FTP, TarFile, socket.create_connection, epoll, NNTP, SMTP, aifc.open, Shelf, and more.
- lzma typically offers better compression than gzip.
- asyncio facilitates asynchronous programming.
- pathlib provides a Pythonic, expressive way to work with paths.
- ipaddress provides types for creating and manipulating IPv4 and IPv6 addresses and networks.
- functools.lru_cache memoizes the results of expensive function calls.
- The mock module (previously available only from PyPI) is now part of the standard library as unittest.mock.
- A more memory-efficient string representation (PEP 393).
- OrderedDict, a feature every Python developer has needed at some point.
- Autocompletion within pdb.
- The __pycache__ directory keeps .pyc files from cluttering project folders.
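A few of these in action. The sketch is ours: `fib`, the temporary directory layout and the sample network are arbitrary examples.

```python
import ipaddress
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def fib(n):
    """Naive recursion made fast by memoizing earlier results."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(80))                    # 23416728348467685
print(fib.cache_info().hits > 0)  # True

# tempfile.TemporaryDirectory is itself a context manager.
with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / 'notes' / 'todo.txt'  # paths compose with /
    notes.parent.mkdir()
    notes.write_text('port to Python 3\n')
    print(notes.name, notes.suffix)                    # todo.txt .txt
    print([p.name for p in Path(tmp).rglob('*.txt')])  # ['todo.txt']

net = ipaddress.ip_network('192.168.0.0/24')
print(ipaddress.ip_address('192.168.0.42') in net)  # True
print(net.num_addresses)                            # 256
```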

The official Python documentation's “What's New” pages provide a comprehensive overview of the changes. For further insights, I recommend Aaron Maxwell's post and Brett Cannon's blog posts and slides.

While Python 2.7 will be supported until 2020, there's no need to wait until then to embrace a newer and better version of Python!