This is a conversion of the official "Porting Python 2 Code to Python 3" HOWTO into notebook format (in August 2015). You can always view and download the .ipynb file on nbviewer. If you would like to play with/edit this notebook online, you can use the in-preview notebook service in Azure ML Studio:

1. Use File -> Open... to access the native Jupyter Notebook management page
2. Click the Upload button to upload the .ipynb file
3. Click the Upload button next to the file name and wait for the Upload button to disappear
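To make your project be single-source Python 2/3 compatible, the basic steps are:

1. Only worry about supporting Python 2.7 (at the very least, drop Python 2.5 and older)
2. Make sure you have good test coverage (coverage.py can help; `pip install coverage`)
3. Learn the differences between Python 2 & 3
4. Use Modernize or Futurize to update your code (`pip install modernize` or `pip install future`, respectively)
5. Use Pylint to help make sure you don't regress on your Python 3 support (`pip install pylint`)
6. Use caniusepython3 to find out which of your dependencies are blocking your use of Python 3 (`pip install caniusepython3`)
7. Once your dependencies are no longer blocking you, use continuous integration to make sure you stay compatible with Python 2 & 3 (tox can help test against multiple versions of Python; `pip install tox`)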
If you are dropping support for Python 2 entirely, then after you learn the differences between Python 2 & 3 you can run 2to3 over your code and skip the rest of the steps outlined above.
A key point about supporting Python 2 & 3 simultaneously is that you can start today! Even if your dependencies do not support Python 3 yet, that does not mean you can't modernize your code now to support Python 3. Most changes required to support Python 3 lead to cleaner code using newer practices even in Python 2.
Another key point is that modernizing your Python 2 code to also support Python 3 is largely automated for you. While you might have to make some API decisions thanks to Python 3 clarifying text data versus binary data, the lower-level work is now mostly done for you, so you can benefit from the automated changes immediately.
Keep those key points in mind while you read on about the details of porting your code to support Python 2 & 3 simultaneously.
While you can make Python 2.5 work with Python 3, it is much easier if you only have to work with Python 2.6 or newer (and easier still if you only have to work with Python 2.7). If dropping Python 2.5 is not an option then the six project can help you support Python 2.5 & 3 simultaneously (`pip install six`). Do realize, though, that nearly all the projects listed in this HOWTO will not be available to you.
If you are able to only support Python 2.6 or newer, then the required changes to your code should continue to look and feel like idiomatic Python code. At worst you will have to use a function instead of a method in some instances or have to import a function instead of using a built-in one, but otherwise the overall transformation should not feel foreign to you.
But please aim for Python 2.7. Bugfixes for that version of Python will continue until 2020 while Python 2.6 is no longer supported. There are also some tools mentioned in this HOWTO which do not support Python 2.6 (e.g., Pylint), and this will become more commonplace as time goes on.
In your `setup.py` file you should have the proper trove classifier specifying what versions of Python you support. As your project does not support Python 3 yet, you should at least have `Programming Language :: Python :: 2 :: Only` specified. Ideally you should also specify each major/minor version of Python that you do support, e.g. `Programming Language :: Python :: 2.7`.
Once you have your code supporting the oldest version of Python 2 you want it to, you will want to make sure your test suite has good coverage. A good rule of thumb is that you want to be confident enough in your test suite that any failures that appear after having tools rewrite your code are actual bugs in the tools and not in your code. If you want a number to aim for, try to get over 80% coverage (and don't feel bad if you can't easily get past 90%). If you don't already have a tool to measure test coverage, then coverage.py is recommended.
Once you have your code well-tested you are ready to begin porting your code to Python 3! But to fully understand how your code is going to change and what you want to look out for while you code, you will want to learn what changes Python 3 makes relative to Python 2. Typically the two best ways of doing that are reading the What's New doc for each release of Python 3 and the Porting to Python 3 book (which is free online). There is also a handy cheat sheet from the Python-Future project.
Once you feel like you know what is different in Python 3 compared to Python 2, it's time to update your code! You have a choice between two tools for porting your code automatically: Modernize and Futurize. Which tool you choose will depend on how much like Python 3 you want your code to be. Futurize does its best to make Python 3 idioms and practices exist in Python 2, e.g. backporting the `bytes` type from Python 3 so that you have semantic parity between the major versions of Python. Modernize, on the other hand, is more conservative and targets a Python 2/3 subset of Python, relying on six to help provide compatibility.
Regardless of which tool you choose, it will update your code to run under Python 3 while staying compatible with the version of Python 2 you started with. Depending on how conservative you want to be, you may want to run the tool over your test suite first and visually inspect the diff to make sure the transformation is accurate. After you have transformed your test suite and verified that all the tests still pass as expected, you can transform your application code knowing that any test that fails indicates a translation failure.
Unfortunately the tools can't automate everything to make your code work under Python 3, so there are a handful of things you will need to update manually to get full Python 3 support (which of these steps are necessary varies between the tools). Read the documentation for the tool you choose to see what it fixes by default and what it can do optionally, so you know what will (not) be fixed for you and what you may have to fix on your own (e.g. using `io.open()` over the built-in `open()` function is off by default in Modernize). Luckily, though, there are only a couple of things to watch out for which can be considered large issues that may be hard to debug if not watched for.
In Python 2, integer division operated in a way that programmers are used to: performing the division and then flooring the result to an integer (referred to as classic division in the Python documentation).
```python
%%python2
print 5 / 2  # prints 2
```
But for Python 3 it was decided that integer division should be more like what you learned in school and return a `float` result that more accurately represents the result of the division (what the Python documentation refers to as true division).
```python
%%python3
print(5 / 2)  # prints 2.5
```
This change has actually been planned since Python 2.2.0, which was released in December 2001. Since then users have been encouraged to add the appropriate `__future__` statement to get Python 3 semantics in Python 2.
```python
%%python2
from __future__ import division
print 5 / 2  # prints 2.5
```
You can also run the interpreter with the `-Q` flag to get the same semantics as the `__future__` import, or to get a warning when using classic division in Python 2.
```python
!python2 -Qnew -c "print 5 / 2"
```

```python
!python2 -Qwarn -c "print 5 / 2"
```

```
-c:1: DeprecationWarning: classic int division
2
```
If you want the Python 2 style of integer division in either Python 2 or 3, then you can use the `//` operator (the floor division operator).
```python
%%python2
print 5 // 2  # prints 2
```

```python
%%python2
from __future__ import division  # Does not affect floor division.
print 5 // 2  # prints 2
```

```python
%%python3
print(5 // 2)  # prints 2
```
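Put together, a module that wants the same arithmetic on both versions can import true division and spell floor division explicitly. A minimal sketch, assuming you are free to add the `__future__` import to the module; the functions `average` and `pages_needed` are hypothetical examples, not part of the HOWTO:

```python
from __future__ import division, print_function


def average(values):
    # `/` is true division on Python 3 and, thanks to the import, on Python 2 too.
    return sum(values) / len(values)


def pages_needed(item_count, per_page):
    # `//` is floor division on both versions; negating twice rounds up instead.
    return -(-item_count // per_page)


print(average([1, 2]))     # 1.5
print(pages_needed(5, 2))  # 3
```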
The reason that `/` isn't simply translated to `//` automatically is that if an object defines its own `__floordiv__` method, then the translation would lead to different results under Python 3.
```python
%%python2
from __future__ import print_function

class FloorDivIsZero(int):
    def __floordiv__(self, _):
        return 0

print(FloorDivIsZero(5) / 2)  # prints 2: `/` never calls __floordiv__
```
```python
%%python3
from __future__ import print_function

# Class contains the same code as in the Python 2 cell above.
class FloorDivIsZero(int):
    def __floordiv__(self, _):
        return 0

print(FloorDivIsZero(5) // 2)  # Translated from `/` in Python 2 code; prints 0.
```
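The underlying reason is that the two operators dispatch to different special methods. A small sketch with a hypothetical `Recorder` class (not from the HOWTO) whose hooks simply report which one Python called:

```python
from __future__ import division, print_function


class Recorder(object):
    """Hypothetical class that reports which division hook was invoked."""

    def __truediv__(self, other):
        return 'true division'

    # Python 2 calls __div__ for `/` in modules without the division import;
    # aliasing it keeps the class consistent in such modules as well.
    __div__ = __truediv__

    def __floordiv__(self, other):
        return 'floor division'


print(Recorder() / 2)   # true division
print(Recorder() // 2)  # floor division
```

Any class whose two hooks disagree, like `FloorDivIsZero` above, changes behaviour when a tool rewrites one operator as the other.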
In Python 2 you could use the `str` type for both text and binary data. Unfortunately this confluence of two different concepts could lead to brittle code which sometimes worked for either kind of data, sometimes not. It also could lead to confusing APIs if people didn't explicitly state that something that accepted `str` accepted either text or binary data instead of one specific type. This complicated the situation especially for anyone supporting multiple languages, as APIs wouldn't bother explicitly supporting `unicode` when they claimed text data support.
To make the distinction between text and binary data clearer and more pronounced, Python 3 did what most languages created during the age of the internet have done and made text and binary data distinct types that cannot blindly be mixed together (Python predates widespread access to the internet thanks to its development starting in December 1989). For any code that only deals with text or only binary data, this separation doesn't pose an issue. But for code that has to deal with both, it does mean you might have to now care about when you are using text compared to binary data, which is why this cannot be entirely automated.
To start, you will need to decide which APIs take text and which take binary (it is highly recommended you don't design APIs that can take both due to the difficulty of keeping the code working; as stated earlier it is difficult to do well). In Python 2 this means making sure the APIs that take text can work with `unicode`, and those that work with binary data work with the `bytes` type from Python 3, which is a subset of `str` in Python 2 (the `bytes` type in Python 2 is simply an alias for `str`). Usually the biggest issue is realizing which methods exist on which types in Python 2 & 3 simultaneously (for text that's `unicode` in Python 2 and `str` in Python 3, for binary that's `str`/`bytes` in Python 2 and `bytes` in Python 3). The following cell shows the unique methods of each data type between Python 2 versus Python 3, as well as what is unique regardless of version (e.g., the `decode()` method is usable on the equivalent binary data type in either Python 2 or 3, but it can't be used by the text data type consistently between Python 2 and 3 because `str` in Python 3 doesn't have the method).
```python
import ast

def output_to_set(output):
    """Convert captured shell output (a list of lines) to a frozenset."""
    return frozenset(ast.literal_eval('\n'.join(output)))

def pprint_set(set_, *, indent):
    """Print each item in a set on its own line with a specified indentation."""
    for item in sorted(set_):
        # Skip private helpers such as Python 2's `_formatter_parser`.
        if item.startswith('_') and not item.startswith('__'):
            continue
        print(' ' * indent + item)

py2_text_methods_output = !python2 -c "print dir(str)"
py2_text_unicode_methods_output = !python2 -c "print dir(unicode)"
py2_binary_methods_output = !python2 -c "print dir(bytes)"
py3_text_methods_output = !python3 -c "print(dir(str))"
py3_binary_methods_output = !python3 -c "print(dir(bytes))"

py2_text_methods = output_to_set(py2_text_methods_output)
py2_text_unicode_methods = output_to_set(py2_text_unicode_methods_output)
py2_binary_methods = output_to_set(py2_binary_methods_output)
py3_text_methods = output_to_set(py3_text_methods_output)
py3_binary_methods = output_to_set(py3_binary_methods_output)

print("Methods unique to Python 2 (i.e., not available in Python 3 on the equivalent type):")
print('    Text type (as str from Python 2):')
pprint_set(py2_text_methods.difference(py3_text_methods), indent=8)
print('    Text type (as unicode from Python 2):')
pprint_set(py2_text_unicode_methods.difference(py3_text_methods), indent=8)
print('    Binary type:')
pprint_set(py2_binary_methods.difference(py3_binary_methods), indent=8)

common_text_methods = py2_text_methods.intersection(py3_text_methods)
common_binary_methods = py2_binary_methods.intersection(py3_binary_methods)
print("Methods unique to a type regardless of Python version (i.e., don't use on the wrong type in Python 2):")
print('    Text type:')
pprint_set(common_text_methods.difference(common_binary_methods), indent=8)
print('    Binary type:')
pprint_set(common_binary_methods.difference(common_text_methods), indent=8)
```
```
Methods unique to Python 2 (i.e., not available in Python 3 on the equivalent type):
    Text type (as str from Python 2):
        __getslice__
        decode
    Text type (as unicode from Python 2):
        __getslice__
        decode
    Binary type:
        __getslice__
        __mod__
        __rmod__
        encode
        format
Methods unique to a type regardless of Python version (i.e., don't use on the wrong type in Python 2):
    Text type:
        __mod__
        __rmod__
        encode
        format
    Binary type:
        decode
```

This output was produced with Python 3.4. Python 3.5 added %-formatting for `bytes` (PEP 461), so with a newer Python 3 the `__mod__` and `__rmod__` entries drop out of both lists.
Making the distinction easier to handle can be accomplished by encoding and decoding between binary data and text at the edge of your code. This means that when you receive text in binary data, you should immediately decode it. And if your code needs to send text as binary data then encode it as late as possible. This allows your code to work with only text internally and thus eliminates having to keep track of what type of data you are working with.
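A minimal sketch of that "decode at the edges" pattern, assuming UTF-8 as the wire encoding; `handle_message()` and `process_text()` are hypothetical names standing in for your own I/O boundary and core logic:

```python
from __future__ import unicode_literals


def process_text(text):
    # Core logic only ever sees text (unicode on Python 2, str on Python 3).
    return text.strip().upper()


def handle_message(raw_bytes, encoding='utf-8'):
    # Edge of the program: decode incoming binary data immediately...
    text = raw_bytes.decode(encoding)
    result = process_text(text)
    # ...and encode back to binary data only when it leaves the program.
    return result.encode(encoding)


print(repr(handle_message(b'  caf\xc3\xa9 \n')))
```

Keeping the encoding as a single parameter at the boundary means the rest of the code never has to ask which kind of string it is holding.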
The next issue is making sure you know whether the string literals in your code represent text or binary data. At minimum you should add a `b` prefix to any literal that represents binary data. For text you should either use the `from __future__ import unicode_literals` statement or add a `u` prefix to the text literal.
```python
%%python2
from __future__ import unicode_literals
print type('')  # <type 'unicode'>
```
```python
%%python2
print type(u'')  # <type 'unicode'>
```
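A sketch of how marked literals look in a module meant for both versions. The `text_type` fallback below is a hand-rolled stand-in for what six offers as `six.text_type`, and the constant names are only illustrative:

```python
from __future__ import unicode_literals

MAGIC = b'\x89PNG'  # binary data: always marked with a b prefix
GREETING = 'hello'  # text: unicode on Python 2 thanks to the import

try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3

print(isinstance(MAGIC, bytes), isinstance(GREETING, text_type))
```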
As part of this dichotomy you also need to be careful about opening files. Unless you have been working on Windows, there is a chance you have not always bothered to add the `b` mode when opening a binary file (e.g., `rb` for binary reading). Under Python 3, binary files and text files are clearly distinct and mutually incompatible; see the io module for details. Therefore, you must decide whether a file will be used for binary access (allowing you to read and/or write binary data) or text access (allowing you to read and/or write text data). You should also use `io.open()` for opening files instead of the built-in `open()` function, as the io module is consistent from Python 2 to 3 while the built-in `open()` function is not (in Python 3 it's actually `io.open()`).
```python
%%python2
from __future__ import unicode_literals
import io
import os

# Working with a file specifying a 'b' mode.
with io.open('some_binary_file.txt', 'wb') as file:
    file.write('Some text encoded as UTF-32'.encode('utf-32'))

# Working with a file with a specified encoding.
with io.open('some_binary_file.txt', 'r', encoding='utf-32') as file:
    print file.read()

# Clean up the example file.
os.remove('some_binary_file.txt')
```
Some text encoded as UTF-32
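The same file can be viewed either way, depending on the mode you pass to `io.open()`. A minimal sketch, assuming a throwaway temporary directory so no project files are touched:

```python
from __future__ import unicode_literals

import io
import os
import tempfile

directory = tempfile.mkdtemp()
path = os.path.join(directory, 'example.txt')
message = 'na\u00efve text'  # text containing a non-ASCII character

# Text mode: you write text and io.open() encodes it with the explicit encoding.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(message)

# Binary mode ('rb'): you get back exactly the bytes stored on disk.
with io.open(path, 'rb') as f:
    raw = f.read()

# Text mode again: io.open() decodes the bytes for you.
with io.open(path, 'r', encoding='utf-8') as f:
    text = f.read()

os.remove(path)
os.rmdir(directory)

print(repr(raw))
print(text == message)
```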
Another semantic change to be aware of is that the constructors of both `str` and `bytes` have different semantics for the same arguments between Python 2 & 3.
```python
%%python2
from __future__ import print_function
print('`int` argument for `bytes` type:', repr(bytes(3)))
print('`bytes` argument for `str` type:', repr(str(b'3')))
```

```
`int` argument for `bytes` type: '3'
`bytes` argument for `str` type: '3'
```
```python
%%python3
# Exact same code as above for Python 2.
from __future__ import print_function
print('`int` argument for `bytes` type:', repr(bytes(3)))
print('`bytes` argument for `str` type:', repr(str(b'3')))
```

```
`int` argument for `bytes` type: b'\x00\x00\x00'
`bytes` argument for `str` type: "b'3'"
```
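If you need either meaning on both versions, it is safer to avoid the ambiguous constructor calls altogether. A minimal sketch of portable spellings (the variable names are just for illustration):

```python
# Portable spellings for constructor calls whose meaning changed between versions.
zero_bytes = b'\x00' * 3              # what bytes(3) means on Python 3 only
digit_bytes = str(3).encode('ascii')  # what bytes(3) means on Python 2 only: b'3'
digit_text = b'3'.decode('ascii')     # the text '3'; str(b'3') gives "b'3'" on Python 3

print(repr(zero_bytes), repr(digit_bytes), repr(digit_text))
```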
Finally, indexing binary data requires careful handling because Python 3 returns an `int` in that instance (slicing does not require any special handling).
```python
%%python2
from __future__ import print_function
print(b'B')
print(b'ABC'[1])
print('Indexing compares equal to `bytes` literal:', b'ABC'[1] == b'B')
```

```
B
B
Indexing compares equal to `bytes` literal: True
```
```python
%%python3
from __future__ import print_function
print(b'B')
print(b'ABC'[1])
print('Indexing compares equal to `bytes` literal:', b'ABC'[1] == b'B')
```

```
b'B'
66
Indexing compares equal to `bytes` literal: False
```
The six project has a function named `six.indexbytes()` which will return an integer like in Python 3: `six.indexbytes(b'ABC', 1) == ord(b'B')`.
The other option is to use a single-item slice instead of indexing, but be aware that this won't raise an `IndexError` if the slice extends beyond the length of the binary data; you get an empty `bytes` object instead.
```python
%%python2
from __future__ import print_function
print(b'ABC'[1:2] == b'B')  # True
```
```python
%%python3
from __future__ import print_function
print(b'ABC'[1:2] == b'B')  # True
```
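If you would rather not depend on six, two dependency-free options work on Python 2.6+ and 3. A sketch; `byte_at()` is a hypothetical helper mimicking what `six.indexbytes()` returns, and the second option relies on `bytearray` indexing always yielding an `int`:

```python
from __future__ import print_function


def byte_at(data, index):
    """Return the integer value of data[index] on both Python 2 and 3."""
    value = data[index]  # still raises IndexError when out of range
    # Python 3 indexing already yields an int; Python 2 yields a length-1 str.
    return value if isinstance(value, int) else ord(value)


data = b'ABC'
print(byte_at(data, 1))    # 66 on both versions
print(bytearray(data)[1])  # 66: bytearray indexing returns an int on both versions
print(data[5:6] == b'')    # True: an out-of-range slice is simply empty
```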
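To summarize:

1. Decide which of your APIs take text and which take binary data
2. Make sure that your code that works with text also works with `unicode` and code for binary data works with `bytes` in Python 2 (see the section above for what methods you can or cannot use for each type)
3. Mark all binary literals with a `b` prefix, and use a `__future__` import statement for text literals
4. Decode binary data to text as soon as possible, and encode text as binary data as late as possible
5. Open files using `io.open()` and make sure to specify the `b` mode when appropriate
6. Be careful when indexing binary data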
Once you have fully translated your code to be compatible with Python 3, you will want to make sure your code doesn't regress and stop working under Python 3. This is especially true if you have a dependency which is blocking you from actually running under Python 3 at the moment.
To help with staying compatible, any new modules you create should have at least the following block of code at the top of it:
```python
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
```
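With those four imports at the top, a module gets the Python 3 behaviour of each feature under Python 2.6+ as well. A minimal self-check sketch (the assertions are illustrative additions, not part of the HOWTO):

```python
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

# absolute_import: `import foo` always means the top-level module foo,
# never a sibling module inside the current package.

# division: `/` is true division, while `//` stays floor division.
assert 7 / 2 == 3.5
assert 7 // 2 == 3

# unicode_literals: unprefixed literals are text (unicode on Python 2).
assert type('text') is type(u'text')

# print_function: print is a function, so keyword arguments such as sep work.
print('print', 'is', 'a', 'function', sep='-')
```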
You can also run Python 2 with the `-3` flag to be warned about various compatibility issues your code triggers during execution. If you turn warnings into errors with `-Werror`, then you can make sure that you don't accidentally miss a warning.
```python
!python2 -3 -Werror -c "print 5 / 2"
```

```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
DeprecationWarning: classic int division
```
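On the Python 3 side, the `-bb` flag makes the interpreter raise an error (a `BytesWarning`) when you compare a `bytes` object against a `str`, and starting in Python 3.5 it does the same when you compare a `bytes` object against an `int`.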
```python
import sys

!python3 -bb -c "print(b'B' == 'B')"
print()
# This checks the notebook kernel's version, which is assumed to match `python3`.
if sys.version_info >= (3, 5):
    !python3 -bb -c "print(b'B' == ord(b'B'))"
else:
    print('Demonstrating the `bytes == int` warning requires at least Python 3.5, using Python',
          '.'.join(map(str, sys.version_info[:3])))
```
```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
BytesWarning: Comparison between bytes and string

Demonstrating the `bytes == int` warning requires at least Python 3.5, using Python 3.4.3
```
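To make such checks part of an automated test rather than a manual notebook step, you could launch a fresh interpreter with `-bb` from Python itself. A sketch assuming Python 3.5+ for `subprocess.run()`; `raises_bytes_warning()` is a hypothetical helper name:

```python
import subprocess
import sys


def raises_bytes_warning(snippet):
    """Run `snippet` in a fresh interpreter with -bb; True if it died with BytesWarning."""
    proc = subprocess.run(
        [sys.executable, '-bb', '-c', snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode != 0 and 'BytesWarning' in proc.stderr


print(raises_bytes_warning("print(b'B' == 'B')"))   # comparing bytes with str
print(raises_bytes_warning("print(b'B' == b'B')"))  # comparing bytes with bytes is fine
```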
You can also use the Pylint project and its `--py3k` flag to lint your code to receive warnings when your code begins to deviate from Python 3 compatibility. This also prevents you from having to run Modernize or Futurize over your code regularly to catch compatibility regressions. This does require you to only support Python 2.7 and Python 3.4 or newer, as that is Pylint's minimum Python version support.
After you have made your code compatible with Python 3 you should begin to care about whether your dependencies have also been ported. The caniusepython3 project was created to help you determine which projects (directly or indirectly) are blocking you from supporting Python 3. There is both a command-line tool and a web interface at caniusepython3.com.
The project also provides code which you can integrate into your test suite so that you will have a failing test when you no longer have dependencies blocking you from using Python 3. This allows you to avoid having to manually check your dependencies and to be notified quickly when you can start running on Python 3.
Update your setup.py file to denote Python 3 compatibility
Once your code works under Python 3, you should update the classifiers in your `setup.py` to contain `Programming Language :: Python :: 3` and to not specify sole Python 2 support. This will tell anyone using your code that you support Python 2 and 3. Ideally you will also want to add classifiers for each major/minor version of Python you now support.
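A sketch of what the change looks like; the version list, the project name in the comment, and the `claims_python3()` helper are hypothetical, and `setup()` itself is only shown as a comment:

```python
# Hypothetical classifier lists for one project, before and after porting.
BEFORE_PORT = [
    'Programming Language :: Python :: 2 :: Only',
    'Programming Language :: Python :: 2.7',
]

AFTER_PORT = [
    'Programming Language :: Python :: 2',
    'Programming Language :: Python :: 2.7',
    'Programming Language :: Python :: 3',
    'Programming Language :: Python :: 3.4',
]


def claims_python3(classifiers):
    """True if the list advertises Python 3 and drops the Python 2-only marker."""
    return ('Programming Language :: Python :: 3' in classifiers
            and 'Programming Language :: Python :: 2 :: Only' not in classifiers)


# In setup.py the list is passed to setup(), e.g.:
# setup(name='example-project', version='1.0', classifiers=AFTER_PORT, ...)
print(claims_python3(BEFORE_PORT), claims_python3(AFTER_PORT))  # False True
```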
Once you are able to fully run under Python 3 you will want to make sure your code always works under both Python 2 & 3. Probably the best tool for running your tests under multiple Python interpreters is tox. You can then integrate tox with your continuous integration system so that you never accidentally break Python 2 or 3 support.
Do make sure to follow the suggestions in the "Prevent Compatibility Regressions" section of this notebook when running your continuous integration. For example, it is worth always running your tests under the `-bb` flag.
And that's mostly it! At this point your code base is compatible with both Python 2 and 3 simultaneously. Your testing will also be set up so that you don't accidentally break Python 2 or 3 compatibility regardless of which version you typically run your tests under while developing.
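If you are able to fully drop support for Python 2, then the steps required to transition to Python 3 simplify greatly: make sure you have good test coverage, learn the differences between Python 2 & 3, and then run 2to3 over your code.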
After this your code will be fully Python 3 compliant but in a way that is not supported by Python 2. You should also update the classifiers in your `setup.py` to contain `Programming Language :: Python :: 3 :: Only`.