Ideas are related temporally. This notebook explores the git history to quantify weekly activity on the deathbeds blog.
import git, pandas, numpy, pathlib, networkx, matplotlib.pyplot
repo = git.Repo('..')
commits = pandas.Series(repo.iter_commits())
Compute the files changed for each of the commits in the repo.
# This takes time.
changed = commits.apply(lambda x: pandas.Series(x.stats.files))
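A minimal sketch of what `changed` looks like, using hypothetical stats: GitPython's `Commit.stats.files` maps each touched path to a dict of `insertions`, `deletions`, and `lines`, so applying `pandas.Series` expands one column per file.

```python
import pandas

# Hypothetical per-commit stats, shaped like GitPython's ``Commit.stats.files``.
fake_commits = pandas.Series([
    {'a.ipynb': {'insertions': 3, 'deletions': 1, 'lines': 4}},
    {'a.ipynb': {'insertions': 2, 'deletions': 2, 'lines': 4},
     'b.ipynb': {'insertions': 5, 'deletions': 0, 'lines': 5}},
])
changed = fake_commits.apply(pandas.Series)
# One row per commit, one column per file; each cell holds that commit's
# stats dict, or NaN when the commit didn't touch the file.
```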
Create a tidy 𝔇𝔉 to hold the weekly file changes.
𝔇𝔉 = changed.stack().apply(pandas.Series).reset_index(-1).rename(columns={
'level_1': 'name'
}).join(commits.apply(lambda x: x.committed_datetime).rename('time')).pipe(
lambda df: df.set_index(pandas.to_datetime(df['time'], utc=True))
).groupby(['name', pandas.Grouper(freq='W')]).agg({
'lines': pandas.Series.count, 'insertions': 'sum', 'deletions': 'sum'
}).rename(columns=dict(lines='changes')).unstack('time').fillna(0)
𝔇𝔉 = 𝔇𝔉[𝔇𝔉.index.str.endswith('.ipynb')]
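The stack-then-pivot step above can be sketched end-to-end on made-up data — two hypothetical commits touching one file a week apart (the file name and dates are invented for illustration):

```python
import pandas

# Two hypothetical commits, each touching a.ipynb in a different week.
changed = pandas.DataFrame({
    'a.ipynb': [
        {'insertions': 3, 'deletions': 1, 'lines': 4},
        {'insertions': 2, 'deletions': 2, 'lines': 4},
    ],
})
times = pandas.Series(
    pandas.to_datetime(['2018-01-01', '2018-01-08'], utc=True), name='time')

# Stack the per-commit dicts into tidy rows, then pivot to a
# (file name) x (week) table of change counts.
tidy = changed.stack().apply(pandas.Series).reset_index(-1).rename(
    columns={'level_1': 'name'}).join(times)
weekly = tidy.set_index(pandas.to_datetime(tidy['time'], utc=True)).groupby(
    ['name', pandas.Grouper(freq='W')])['lines'].count().unstack('time').fillna(0)
```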
Separate the changes, insertions, and deletions for each of the files.
ℭℌ𝔄𝔑𝔊𝔈𝔖, insertions, deletions = (𝔇𝔉[key] for key in 'changes insertions deletions'.split())
We'll only continue forward with the changes.
Create 𝔊 & draw the networkx.Graph.
import numpy
𝔊 = (
    ℭℌ𝔄𝔑𝔊𝔈𝔖.set_index(ℭℌ𝔄𝔑𝔊𝔈𝔖.index.rename('source'))
    @ ℭℌ𝔄𝔑𝔊𝔈𝔖.set_index(ℭℌ𝔄𝔑𝔊𝔈𝔖.index.rename('target')).T
).pipe(
    # pandas.np was removed in pandas 2.0; use numpy directly to zero the
    # diagonal, removing each file's self co-occurrence.
    lambda df: df - numpy.eye(len(df)) * numpy.diag(df)
)
# x/x binarizes the weights; 0/0 is NaN, so dropna removes the non-edges.
𝔊 = 𝔊.divide(𝔊).stack().dropna().to_frame('value').reset_index().pipe(networkx.from_pandas_edgelist)
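The adjacency trick can be sketched on a tiny made-up changes table (the file names and weeks are hypothetical): the matrix product counts the weeks in which two files changed together, subtracting the diagonal removes self-loops, and dividing the matrix by itself binarizes the surviving edges.

```python
import pandas, numpy, networkx

# Hypothetical weekly change counts: b.ipynb changed both weeks, a.ipynb once.
CH = pandas.DataFrame(
    [[1, 0], [1, 1]],
    index=pandas.Index(['a.ipynb', 'b.ipynb'], name='source'),
    columns=pandas.to_datetime(['2018-01-07', '2018-01-14']))

# Co-occurrence counts, with the diagonal (self co-occurrence) zeroed.
A = CH @ CH.set_index(CH.index.rename('target')).T
A = A - numpy.eye(len(A)) * numpy.diag(A)

# Binarize (x/x is 1, 0/0 is NaN), drop the NaNs, and build the graph.
edges = A.divide(A).stack().dropna().to_frame('value').reset_index()
G = networkx.from_pandas_edgelist(edges)
```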
A view of the connectivity of the files.
if __name__ == '__main__':
networkx.draw_networkx(𝔊)
matplotlib.pyplot.gcf().set_size_inches(24, 16)
Some clusters of ideas have a longer continuity than others. Newer ideas haven't yet become parts of larger ideas.