Visualising DROID CSV

This is a quick experiment in visualising format identification results from DROID using a Jupyter Notebook.

First we load in some example results...

In [13]:
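from io import StringIO

import pandas as pd
import requests

# Example DROID export from the demystify test corpus
url = 'https://raw.githubusercontent.com/exponential-decay/demystify/master/opf-test-corpus-test-output/opf-test-corpus-droid-analysis.csv'
response = requests.get(url)
response.raise_for_status()  # fail early if the download did not succeed

# keep_default_na=False leaves empty fields as '' rather than NaN
df = pd.read_csv(StringIO(response.text), keep_default_na=False)
df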
Out[13]:
| | ID | PARENT_ID | URI | FILE_PATH | NAME | METHOD | STATUS | SIZE | TYPE | EXT | LAST_MODIFIED | EXTENSION_MISMATCH | SHA1_HASH | FORMAT_COUNT | PUID | MIME_TYPE | FORMAT_NAME | FORMAT_VERSION |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | file:////10.1.4.222/gda/archives-sample-files/... | \\10.1.4.222\gda\archives-sample-files\opf-for... | format-corpus | | Done | | Folder | | 2014-02-28T15:49:11 | False | | | | | | |
| 1 | 3 | 2 | file:////10.1.4.222/gda/archives-sample-files/... | \\10.1.4.222\gda\archives-sample-files\opf-for... | video | | Done | | Folder | | 2014-02-28T15:48:47 | False | | | | | | |
| 2 | 4 | 3 | file:////10.1.4.222/gda/archives-sample-files/... | \\10.1.4.222\gda\archives-sample-files\opf-for... | Quicktime | | Done | | Folder | | 2014-02-28T15:48:59 | False | | | | | | |
| 3 | 5 | 4 | file:////10.1.4.222/gda/archives-sample-files/... | \\10.1.4.222\gda\archives-sample-files\opf-for... | apple-intermediate-codec.mov | Signature | Done | 319539 | File | mov | 2014-02-18T16:58:16 | False | d097cf36467373f52b974542d48bec134279fa3f | 1 | x-fmt/384 | video/quicktime | Quicktime | |
| 4 | 6 | 4 | file:////10.1.4.222/gda/archives-sample-files/... | \\10.1.4.222\gda\archives-sample-files\opf-for... | animation.mov | Signature | Done | 1020209 | File | mov | 2014-02-18T16:58:16 | False | edb5226b963f449ce58054809149cb812bdf8c0a | 1 | x-fmt/384 | video/quicktime | Quicktime | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 394 | 396 | 395 | file:////10.1.4.222/gda/archives-sample-files/... | \\10.1.4.222\gda\archives-sample-files\opf-for... | InDesign | | Done | | Folder | | 2014-02-28T15:49:11 | False | | | | | | |
| 395 | 397 | 396 | file:////10.1.4.222/gda/archives-sample-files/... | \\10.1.4.222\gda\archives-sample-files\opf-for... | Neddy_Flyer_ft_HeatherRyan.jpg | Signature | Done | 1620612 | File | jpg | 2014-02-18T16:58:08 | False | 884de50cb1c052c0e10bef306850ee995d965175 | 1 | fmt/41 | image/jpeg | Raw JPEG Stream | |
| 396 | 398 | 396 | file:////10.1.4.222/gda/archives-sample-files/... | \\10.1.4.222\gda\archives-sample-files\opf-for... | Neddy_Flyer_HeatherRyan.pdf | Signature | Done | 59106 | File | pdf | 2014-02-18T16:58:08 | False | 9e19b76e8364c840945bc380ab5f98f00a23ab80 | 1 | fmt/17 | application/pdf | Acrobat PDF 1.3 - Portable Document Format | 1.3 |
| 397 | 399 | 396 | file:////10.1.4.222/gda/archives-sample-files/... | \\10.1.4.222\gda\archives-sample-files\opf-for... | Neddy_Flyer_HeatherRyan.indd | Signature | Done | 1503232 | File | indd | 2014-02-18T16:58:08 | False | d9211fe38e79f34fb7a043fe34df59527fb6e179 | 1 | fmt/196 | | Adobe InDesign Document | CS |
| 398 | 400 | 396 | file:////10.1.4.222/gda/archives-sample-files/... | \\10.1.4.222\gda\archives-sample-files\opf-for... | Neddy_Flyer_README_HeatherRyan.md.rtf | Signature | Done | 1210 | File | rtf | 2014-02-18T16:58:08 | False | 3665b0c1457f996359939746752fe86f2025b68d | 1 | fmt/50 | application/rtf, text/rtf | Rich Text Format | 1.5-1.6 |

399 rows × 18 columns
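
The `keep_default_na=False` argument passed to `read_csv` matters for DROID output, because folder rows leave fields such as PUID and SIZE empty. The following is a minimal sketch of the difference, using a tiny made-up CSV (the `csv_text` rows are illustrative, not taken from the real export):

```python
from io import StringIO

import pandas as pd

# Hypothetical three-column extract; the folder row has an empty PUID field.
csv_text = 'NAME,TYPE,PUID\nvideo,Folder,\nanimation.mov,File,x-fmt/384\n'

# Default behaviour: empty fields are parsed as NaN.
default_df = pd.read_csv(StringIO(csv_text))

# With keep_default_na=False and no na_values, empty fields stay as ''.
kept_df = pd.read_csv(StringIO(csv_text), keep_default_na=False)

missing_by_default = int(default_df['PUID'].isna().sum())
empty_when_kept = int((kept_df['PUID'] == '').sum())
print(missing_by_default, empty_when_kept)
```

Keeping the empty strings means folder rows still take part in later grouping and counting, under an empty label, rather than being dropped as missing values.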

Now we have the data, we can explore ways to visualise it.

Here's a simple bar chart counting the entries for each PUID, coloured by TYPE and sorted from most to least common...

In [22]:
import altair as alt

alt.Chart(df).mark_bar().encode(
    x=alt.X('PUID', sort='-y'),
    y='count()',
    color='TYPE',
    tooltip=['TYPE','PUID', 'FORMAT_NAME', 'FORMAT_VERSION', 'count()']
).interactive()
Out[22]:
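
To check the numbers behind a chart like this, the same aggregation can be done directly in pandas. This is only a sketch: the `sample` frame below is a hypothetical handful of rows modelled on the DROID columns, not the real corpus, and it stands in for `df`.

```python
import pandas as pd

# Hypothetical rows modelled on the DROID output; folders have an empty PUID.
sample = pd.DataFrame({
    'TYPE': ['Folder', 'File', 'File', 'File', 'File'],
    'PUID': ['', 'x-fmt/384', 'x-fmt/384', 'fmt/41', 'fmt/17'],
})

# Count rows per (PUID, TYPE) pair, largest first, mirroring the chart's bars.
counts = (
    sample.groupby(['PUID', 'TYPE'])
    .size()
    .sort_values(ascending=False)
    .reset_index(name='count')
)
print(counts)
```

Note that folder rows are counted under an empty PUID, since the data was loaded with `keep_default_na=False`; filtering on `TYPE == 'File'` first would leave only identified files.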

Alternatively, we can group together the different MIME types, and use the colours to show the various PUIDs associated with each...

In [31]:
alt.Chart(df).mark_bar().encode(
    x=alt.X('MIME_TYPE', sort='-y'),
    y='count()',
    color=alt.Color('PUID', legend=None),
    tooltip=['TYPE','MIME_TYPE', 'PUID', 'FORMAT_NAME', 'FORMAT_VERSION', 'count()']
).interactive()
Out[31]:
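
Because the legend is hidden in this chart, a small table can help show how many PUIDs sit behind each MIME type. A minimal sketch, again using a hypothetical `sample` frame in place of `df` (the rows are made up for illustration):

```python
import pandas as pd

# Hypothetical rows: two PDF versions share one MIME type.
sample = pd.DataFrame({
    'MIME_TYPE': ['video/quicktime', 'video/quicktime', 'image/jpeg',
                  'application/pdf', 'application/pdf'],
    'PUID': ['x-fmt/384', 'x-fmt/384', 'fmt/41', 'fmt/17', 'fmt/18'],
})

# For each MIME type, count the rows and the number of distinct PUIDs.
summary = (
    sample.groupby('MIME_TYPE')['PUID']
    .agg(files='count', distinct_puids='nunique')
    .sort_values('files', ascending=False)
)
print(summary)
```

A MIME type with more than one distinct PUID corresponds to a bar made up of several coloured segments in the chart.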