Ideas are related temporally. This notebook explores the git history to quantify weekly activity on the deathbeds blog.
import git, pandas, numpy, pathlib, networkx, matplotlib.pyplot
repo = git.Repo('..')
commits = pandas.Series(repo.iter_commits())
Compute the files changed for each of the commits in the repo.
# This takes time.
changed = commits.apply(lambda x: pandas.Series(x.stats.files))
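A minimal sketch of what `changed` looks like, using hypothetical stats: GitPython's `Commit.stats.files` maps each touched path to a dict of `insertions`, `deletions`, and `lines`, so applying `pandas.Series` expands one column per file.

```python
import pandas

# Hypothetical per-commit stats, shaped like GitPython's ``Commit.stats.files``.
fake_commits = pandas.Series([
    {'a.ipynb': {'insertions': 3, 'deletions': 1, 'lines': 4}},
    {'a.ipynb': {'insertions': 2, 'deletions': 2, 'lines': 4},
     'b.ipynb': {'insertions': 5, 'deletions': 0, 'lines': 5}},
])
changed = fake_commits.apply(pandas.Series)
# One row per commit, one column per file; each cell holds that commit's
# stats dict, or NaN when the commit didn't touch the file.
```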
Create a tidy 𝔇𝔉 to hold the weekly file changes.
𝔇𝔉 = changed.stack().apply(pandas.Series).reset_index(-1).rename(columns={
'level_1': 'name'
}).join(commits.apply(lambda x: x.committed_datetime).rename('time')).pipe(
lambda df: df.set_index(pandas.to_datetime(df['time'], utc=True))
).groupby(['name', pandas.Grouper(freq='W')]).agg({
'lines': pandas.Series.count, 'insertions': 'sum', 'deletions': 'sum'
}).rename(columns=dict(lines='changes')).unstack('time').fillna(0)
𝔇𝔉 = 𝔇𝔉[𝔇𝔉.index.str.endswith('.ipynb')]
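The stack-then-pivot step above can be sketched end-to-end on made-up data — two hypothetical commits touching one file a week apart (the file name and dates are invented for illustration):

```python
import pandas

# Two hypothetical commits, each touching a.ipynb in a different week.
changed = pandas.DataFrame({
    'a.ipynb': [
        {'insertions': 3, 'deletions': 1, 'lines': 4},
        {'insertions': 2, 'deletions': 2, 'lines': 4},
    ],
})
times = pandas.Series(
    pandas.to_datetime(['2018-01-01', '2018-01-08'], utc=True), name='time')

# Stack the per-commit dicts into tidy rows, then pivot to a
# (file name) x (week) table of change counts.
tidy = changed.stack().apply(pandas.Series).reset_index(-1).rename(
    columns={'level_1': 'name'}).join(times)
weekly = tidy.set_index(pandas.to_datetime(tidy['time'], utc=True)).groupby(
    ['name', pandas.Grouper(freq='W')])['lines'].count().unstack('time').fillna(0)
```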
Separate the changes, insertions, and deletions for each of the files.
ℭℌ𝔄𝔑𝔊𝔈𝔖, insertions, deletions = (𝔇𝔉[key] for key in 'changes insertions deletions'.split())
We'll only continue forward with the changes.
Create 𝔊 & draw the networkx.Graph.
import numpy
𝔊 = (
    ℭℌ𝔄𝔑𝔊𝔈𝔖.set_index(ℭℌ𝔄𝔑𝔊𝔈𝔖.index.rename('source'))
    @ ℭℌ𝔄𝔑𝔊𝔈𝔖.set_index(ℭℌ𝔄𝔑𝔊𝔈𝔖.index.rename('target')).T
).pipe(
    # pandas.np was removed in pandas 2.0; use numpy directly to zero the
    # diagonal, removing each file's self co-occurrence.
    lambda df: df - numpy.eye(len(df)) * numpy.diag(df)
)
# x/x binarizes the weights; 0/0 is NaN, so dropna removes the non-edges.
𝔊 = 𝔊.divide(𝔊).stack().dropna().to_frame('value').reset_index().pipe(networkx.from_pandas_edgelist)
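The adjacency trick can be sketched on a tiny made-up changes table (the file names and weeks are hypothetical): the matrix product counts the weeks in which two files changed together, subtracting the diagonal removes self-loops, and dividing the matrix by itself binarizes the surviving edges.

```python
import pandas, numpy, networkx

# Hypothetical weekly change counts: b.ipynb changed both weeks, a.ipynb once.
CH = pandas.DataFrame(
    [[1, 0], [1, 1]],
    index=pandas.Index(['a.ipynb', 'b.ipynb'], name='source'),
    columns=pandas.to_datetime(['2018-01-07', '2018-01-14']))

# Co-occurrence counts, with the diagonal (self co-occurrence) zeroed.
A = CH @ CH.set_index(CH.index.rename('target')).T
A = A - numpy.eye(len(A)) * numpy.diag(A)

# Binarize (x/x is 1, 0/0 is NaN), drop the NaNs, and build the graph.
edges = A.divide(A).stack().dropna().to_frame('value').reset_index()
G = networkx.from_pandas_edgelist(edges)
```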
A view of the connectivity of the files.
if __name__ == '__main__':
networkx.draw_networkx(𝔊)
matplotlib.pyplot.gcf().set_size_inches(24, 16)
Some clusters of ideas have a longer continuity than others. Newer ideas haven't yet become parts of larger ideas.