A little experiment based on the fact that YAML is apparently designed to be more readable by humans than JSON. We've also had some complaints that metadata are not kept by nbconvert when round-tripping through Markdown; those two things made me want to see what ipynb files stored as YAML would look like.
I'll also use this post to experiment with future nbviewer features; if you see anything wrong with the CSS on some device, please tell me.
Apparently, JSON is a subset of YAML:
cp foo.ipynb foo.ipyamlnb
Yeah, mission accomplished!
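More seriously, the subset claim is easy to check: here is a sketch that feeds a JSON document straight to PyYAML's `safe_load` (the document contents are made up for illustration):

```python
import yaml

# A JSON document is, for all practical purposes, also a YAML document,
# so a YAML parser can read it directly. The structure and values below
# are made up to look vaguely like an ipynb file.
json_text = '{"metadata": {"name": "demo"}, "nbformat": 3, "worksheets": []}'
data = yaml.safe_load(json_text)
print(data["metadata"]["name"], data["nbformat"])
```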
Install PyYAML, and see what we can do.
import json
import yaml
from IPython.nbformat import current as nbf
ls Y*.ipynb
YAML Notebook.ipynb
with open('YAML Notebook.ipynb') as f:
nbook = nbf.read( f, 'json')
nbook.worksheets[0].cells[9]
{u'cell_type': u'code', u'collapsed': False, u'input': u'from IPython.nbformat import current as nbf', u'language': u'python', u'metadata': {}, u'outputs': []}
I'll skip the fiddling around with the YAML converter. In short, you have to explicitly mark the parts you want dumped in literal form, otherwise they are exported as lists of strings, which is a little painful to edit afterward. I'm using the `safe_dump` and `safe_load` methods (or, equivalently, passing the SafeLoader and SafeDumper classes). Those should be the defaults; otherwise you could unserialise arbitrary objects and have code executed. We probably don't want to reproduce the critical Rails YAML vulnerability that happened not so long ago.
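To illustrate why the safe variants matter, here is a sketch of the kind of payload the unsafe `yaml.load` would happily execute, and that `safe_load` rejects (the payload itself is made up):

```python
import yaml

# With the unsafe yaml.load, the !!python/object/apply tag would call an
# arbitrary Python function; safe_load refuses to construct it and raises
# a constructor error instead.
payload = "!!python/object/apply:os.system ['echo pwned']"
try:
    yaml.safe_load(payload)
    rejected = False
except yaml.YAMLError:
    rejected = True
print("payload rejected:", rejected)
```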
# we'll patch a safe Yaml Dumper
sd = yaml.SafeDumper
# Dummy class, just to mark the part we want with custom dumping
class folded_unicode(unicode): pass
class literal_unicode(unicode): pass
I know class names should start with upper case, but we just want to hide from the end user the fact that these are classes. At the same time I define a folded variant to use with Markdown cells: when Markdown contains really long lines, those will be wrapped in the YAML document.
def folded_unicode_representer(dumper, data):
return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='>')
def literal_unicode_representer(dumper, data):
return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')
sd.add_representer(folded_unicode, folded_unicode_representer)
sd.add_representer(literal_unicode, literal_unicode_representer)
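To get a feel for what the two styles produce, here is a self-contained Python 3 sketch of the same trick (`str` subclasses instead of `unicode`; the sample strings are made up):

```python
import yaml

# Marker subclasses of str, plus custom representers on a SafeDumper,
# exactly as above but for Python 3.
class folded_str(str): pass
class literal_str(str): pass

def folded_representer(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='>')

def literal_representer(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')

sd = yaml.SafeDumper
sd.add_representer(folded_str, folded_representer)
sd.add_representer(literal_str, literal_representer)

doc = {
    "source": folded_str("one fairly long markdown paragraph that will wrap"),
    "input": literal_str("import json\nimport yaml"),
}
dumped = yaml.dump(doc, default_flow_style=False, Dumper=sd)
print(dumped)  # "input" comes out as a literal block, "source" as folded
```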
with open('YAML Notebook.ipynb') as f:
nbjson = json.load(f)
Now we patch the parts of the ipynb file that we know we want to be literal or folded:
for tcell in nbjson['worksheets'][0]['cells']:
if 'source' in tcell.keys():
tcell['source'] = folded_unicode("".join(tcell['source']))
if 'input' in tcell.keys():
tcell['input'] = literal_unicode("".join(tcell['input']))
with open('Yaml.ipymlnb','w') as f:
f.write(yaml.dump(nbjson, default_flow_style=False, Dumper=sd))
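As a sanity check on round-tripping, a standalone sketch (with a made-up, minimal stand-in for a notebook dict) can verify that `safe_dump` followed by `safe_load` gives back the original structure:

```python
import yaml

# Minimal stand-in for a notebook dict; not a real .ipynb file.
nb = {
    "nbformat": 3,
    "worksheets": [{"cells": [
        {"cell_type": "code", "input": "print(1)\n", "outputs": []},
    ]}],
}
dumped = yaml.safe_dump(nb, default_flow_style=False)
roundtripped = yaml.safe_load(dumped)
print(roundtripped == nb)
```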
You can round-trip it to JSON, and it's still a valid ipynb file that can be loaded. I haven't fiddled with it much more. There are just a few gotchas with empty lines and trailing whitespace at end-of-line, which can respectively disappear or make the dumper fall back to a quoted-string style to store values.
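The trailing-whitespace gotcha is easy to demonstrate: PyYAML's emitter will not use a block style for a scalar whose lines end in spaces, and falls back to a quoted style (the code strings below are made up):

```python
import yaml

# The same two-line string, with and without trailing spaces on the first line.
clean = yaml.dump("x = 1\ny = 2\n", default_style='|')
dirty = yaml.dump("x = 1  \ny = 2\n", default_style='|')
print(clean)  # emitted as a literal block scalar
print(dirty)  # falls back to a double-quoted string
```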
You can skip down to the end of this notebook to see what it looks like. It's probably more compact than the current JSON we emit, and in some cases it might be easier to read, but I don't think it is worth considering for the format specification.
ipynb files are meant to be fixable by humans, and I strongly prefer a consistent format with simple rules over having to explain the meaning of the different shenanigans like `: |2+` for literal strings.
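For the curious, here is a sketch of what that header actually means: `|` is a literal block scalar, `2` is an explicit indentation indicator, and `+` keeps trailing newlines (the sample document is made up):

```python
import yaml

# '|2+': literal block scalar, content indented 2 spaces past the key's
# indentation, '+' chomping keeps trailing newlines in the value.
doc = "source: |2+\n    cp foo.ipynb foo.ipyamlnb\n\n"
value = yaml.safe_load(doc)["source"]
print(repr(value))
```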
Also, YAML support across languages is not consistent, and it would probably be too much of a security burden for every piece of code that loads ipynb files to take care of sanitising YAML.
One area where I would use it is to describe the ipynb format in a talk, for example, and/or to make metadata editing more human-readable and writable.
!cat Yaml.ipymlnb
metadata:
  name: YAML Notebook
nbformat: 3
nbformat_minor: 0
worksheets:
- cells:
  - cell_type: heading
    level: 1
    metadata: {}
    source: >-
      YAML IPython notebook
  - cell_type: markdown
    metadata: {}
    source: "Little experiment base on the fact that apparently YAML is made to be\
      \ better readable by Humans than JSON.\nWe've also had some complaint that metadata\
      \ are not keep in nbconvert when roundtripping through markdown, those two\n\
      made me think that I could try to see what ipynb files stored as YAML would\
      \ look like. "
  - cell_type: heading
    level: 4
    metadata: {}
    source: >-
      First atempt
  - cell_type: markdown
    metadata: {}
    source: >-
      Apparently Json is a subset of YAML:
  - cell_type: markdown
    metadata: {}
    source: >2+
          cp foo.ipynb foo.ipyamlnb

  - cell_type: markdown
    metadata: {}
    source: >-
      Yeah, Mission acomplished !
  - cell_type: heading
    level: 4
    metadata: {}
    source: >-
      Second try
  - cell_type: markdown
    metadata: {}
    source: "Install PyYaml, and see what we can do. "
  - cell_type: code
    collapsed: false
    input: |-
      import json
      import yaml
    language: python
    metadata: {}
    outputs: []
  - cell_type: code
    collapsed: false
    input: |-
      from IPython.nbformat import current as nbf
    language: python
    metadata: {}
    outputs: []
  - cell_type: code
    collapsed: false
    input: |-
      ls Y*.ipynb
    language: python
    metadata: {}
    outputs: []
  - cell_type: code
    collapsed: false
    input: |-
      with open('YAML Notebook.ipynb') as f:
          nbook = nbf.read( f, 'json')
    language: python
    metadata: {}
    outputs: []
  - cell_type: code
    collapsed: false
    input: |-
      nbook.worksheets[0].cells[9]
    language: python
    metadata: {}
    outputs: []
  - cell_type: markdown
    metadata: {}
    source: >-
      I'll skipp the fiddling around with the yaml converter. In short, you have to
      specify explicitely the part you want to dump in the literal form, otherwise
      they are exported as list of strings, which is a little painfull to edit
      afterward. I'm using the `safe_dump` and `safe_load` methods (or pass
      safeLoader and Dumper). Those should be default or otherwise you could
      unserialise arbitrary object, and have code exucuted. We probably don't want
      to reproduct the recent file Rail's critical vulnerability that append not so
      long ago.
  - cell_type: code
    collapsed: false
    input: |-
      # we'll patch a safe Yaml Dumper
      sd = yaml.SafeDumper
      # Dummy class, just to mark the part we want with custom dumping
      class folded_unicode(unicode): pass
      class literal_unicode(unicode): pass
    language: python
    metadata: {}
    outputs: []
  - cell_type: markdown
    metadata: {}
    source: >-
      I know classes should be wit upper case, but we just want to hide the fact
      that thoses a class to end user. At the same time I define a folded method if
      I want to use it later.
  - cell_type: code
    collapsed: false
    input: |-
      def folded_unicode_representer(dumper, data):
          return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='>')
      def literal_unicode_representer(dumper, data):
          return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')
      sd.add_representer(folded_unicode, folded_unicode_representer)
      sd.add_representer(literal_unicode, literal_unicode_representer)
      with open('YAML Notebook.ipynb') as f:
          nbjson = json.load(f)
    language: python
    metadata: {}
    outputs: []
  - cell_type: markdown
    metadata: {}
    source: >-
      now we patch the part of the ipynb file we know we want to be literal or folded
  - cell_type: code
    collapsed: false
    input: |-
      for tcell in nbjson['worksheets'][0]['cells']:
          if 'source' in tcell.keys():
              tcell['source'] = folded_unicode("".join(tcell['source']))
          if 'input' in tcell.keys():
              tcell['input'] = literal_unicode("".join(tcell['input']))
    language: python
    metadata: {}
    outputs: []
  - cell_type: code
    collapsed: false
    input: |-
      with open('Yaml.ipymlnb','w') as f:
          f.write(yaml.dump(nbjson, default_flow_style=False, Dumper=sd))
    language: python
    metadata: {}
    outputs: []
  - cell_type: markdown
    metadata: {}
    source: >-
      You can round trip it to json, and it's still a valid ipynb file that can be
      loaded. Haven't fiddled with it much more. There are just a few gotchas with
      empty lines as well as trailing whitespace at EOL that can respectively
      diseapear or make the dumper fall back to a string quoted methods to store
      values. One could also try to tiker with `folded_unicode` in markdown cell
      that tipically have long lines to play a little more nicely with VCS.
  - cell_type: markdown
    metadata: {}
    source: >-
      You can skip down to the end of this notebook to loko at how it looks like.
      It's probably much compact than the current json we emit, in **some** cases
      it might be more easy to read, but I don't think it is worth considering
      using in the format specification. ipynb files are ment to be humanely
      fixable, and I strongly prefere having a consistent format with simple rules
      than having to explain what are the meaning of the differents shenigan like
      `: |2+` for literal string. Also support across languages are not consistent,
      and it would probably be too much of a security burden for all code that will
      support loading ipynb to take care of sanitazing Yaml. One area where I woudl
      use it would be to describe the ipynb format at a talk for example, and/or to
      have metadata editing more human readable/writable.
  - cell_type: code
    collapsed: false
    input: |-
      !cat Yaml.ipymlnb
    language: python
    metadata: {}
    outputs: []
  metadata: {}