#!/usr/bin/env python
# coding: utf-8

# A [rebuttal](https://learnpythonthehardway.org/book/nopython3.html) against Python 3 was recently written by the (in)famous Zed Shaw, with [many](https://eev.ee/blog/2016/11/23/a-rebuttal-for-python-3/) responses to various arguments and counter-arguments.
#
# One particular topic which caught my eye was the `bytearray` vs `unicodearray` debate. I'll try to explicitly avoid the `str`/`string`/`bytes`/`unicode` naming as it is (IMHO) confusing, but that's a debate for another time. If you pay attention to the above debates, you might see that there are roughly two camps:
#
# - `bytearray` and `unicodearray` are two different things, and we should _never_ convert from one to the other. (That's roughly the Pro-Python-3 camp.)
# - `bytearray` and `unicodearray` are similar enough in most cases that we should do the magic for users.
#
# I'm greatly exaggerating here, and the following is neither for one side nor the other. I have my personal preference of what I think is good, but that's irrelevant for now. Note that both sides argue that _their_ preference is better for beginners.

# You can often find posts trying to explain the string/str/bytes misconception, like [this one](https://sircmpwn.github.io/2017/01/13/The-problem-with-Python-3.html), which keep insisting on the fact that `str` in Python 3 is far different from bytes.

# ## The mistake in the REPR

# I have one theory that the `bytes`/`str` issue is not in their behavior, but in their REPR. The REPR is in the end the main channel of information between the object and the brain of the programmer or user. Also, Python is "duck typed", and you have to admit that `bytes` and `str` kinda _look_ similar when printed, so assuming they should behave in a similar way is not far-fetched. I'm not saying that users will _consciously_ assume bytes/str are the same. I'm saying that the human brain may inherently make such an association.
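# To make the resemblance concrete, here is a minimal sketch in plain Python 3. The sample value is made up for illustration and the variable names are arbitrary; it only shows that the two reprs differ by a single leading `b`, while the objects themselves neither compare equal nor index the same way.

```python
# Sample value chosen for illustration only.
text = 'Luke Skywalker'
raw = text.encode('utf-8')

# The two reprs differ only by the leading `b`...
print(repr(text))
print(repr(raw))
looks_alike = repr(raw)[1:] == repr(text)

# ...yet the objects never compare equal, and indexing
# yields a character for str but an integer for bytes.
are_equal = raw == text
first_items = (text[0], raw[0])
print(looks_alike, are_equal, first_items)
```

# So the visual cue separating two quite different types is one character wide, which is easy to miss when skimming output.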
#
# Off the top of your head, what does `requests.get(url).content` return?

# In[1]:


import requests_cache
import requests
requests_cache.install_cache('cachedb.tmp')


# In[2]:


requests.get('http://swapi.co/api/people/1').content


# ... bytes...
#
# I'm pretty sure you glanced ahead in this post and probably thought it was "text", probably even JSON in this case. It might be invalid JSON; I'm pretty sure you cannot tell.
#
# Why does it return bytes? Because it could fetch an image:

# In[3]:


requests.get('https://avatars0.githubusercontent.com/u/335567').content[:200]


# And what if you decode a response like the first one?

# In[4]:


requests.get('http://swapi.co/api/people/2').content.decode()


# Well, that looks the same (except for the leading `b`...). Go explain to a beginner that the two above are totally different things, while they already struggle with zero-based indexing, iterators, and the syntax of the language.

# ## Changing the repr

# Let's change the `repr` of `bytearray` to better represent what they are. IPython makes it easy to change an object's repr:

# In[5]:


text_formatter = get_ipython().display_formatter.formatters['text/plain']


# In[6]:


def _print_bytestr(arg, p, cycle):
    # Hide the content entirely. The exact placeholder text was lost;
    # '<bytes>' is an assumed stand-in.
    p.text('<bytes>')
text_formatter.for_type(bytes, _print_bytestr)


# In[7]:


requests.get('http://swapi.co/api/people/4').content


# ## Make a useful repr

# A bare placeholder may not be a useful repr, so let's try to make a repr that:
#
# - Conveys that bytes are, in general, **not** text.
# - Lets us *peek* into the content to guess what it is.
# - Pushes the user to `.decode()` if necessary.
#
# Generally in Python, objects have a repr which starts with `<`, then has the _class name_, a _quoted representation_ of the object, the memory location of the object, and a closing `>`.
#
# As the _quoted representation_ of the object may be really long, we can elide it.
#
# A common representation of bytes could be binary, but it's not really compact. Hex is compact but more difficult to read, and makes _peeking_ at the content hard when it could be ASCII.
# So let's go with an ASCII representation where we escape non-ASCII characters.

# In[8]:


ellide = lambda s: s if (len(s) < 75) else s[0:50]+'...'+s[-16:]


# In[9]:


def _print_bytestr(arg, p, cycle):
    # Assumed repr shape, following the description above:
    # <bytes ELIDED_QUOTED_CONTENT at 0xADDRESS>
    p.text('<bytes {} at {}>'.format(ellide(repr(arg)), hex(id(arg))))
text_formatter.for_type(bytes, _print_bytestr)


# In[10]:


requests.get('http://swapi.co/api/people/12').content


# In[11]:


requests.get('http://swapi.co/api/people/12').content.decode()


# Advantage: it is not gobbledygook anymore when getting binary resources!

# In[12]:


requests.get('https://avatars0.githubusercontent.com/u/335567').content
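

# The same idea can be tried outside IPython. What follows is only a sketch under stated assumptions: `bytes_repr` is a hypothetical helper name, the `<bytes ... at 0x...>` shape is inferred from the description of typical Python reprs given earlier, and the sample inputs are made up. It reuses the 75/50/16 thresholds from the `ellide` lambda.

```python
def ellide(s, limit=75, head=50, tail=16):
    """Shorten long strings, keeping the beginning and the end."""
    return s if len(s) < limit else s[:head] + '...' + s[-tail:]


def bytes_repr(data):
    """Hypothetical helper: build '<bytes QUOTED_CONTENT at 0xADDRESS>'."""
    return '<bytes {} at {}>'.format(ellide(repr(data)), hex(id(data)))


# Made-up sample inputs: a short JSON-like payload and 256 raw byte values.
short = bytes_repr(b'{"name": "Wilhuff Tarkin"}')
long_ = bytes_repr(bytes(range(256)))
print(short)
print(long_)
```

# Passing the quoted content as a `format` argument, rather than concatenating it into the template, keeps any `{` or `}` inside the payload (common in JSON) from being read as format fields.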