WikiWho Edit Persistence tracks the persistence of all actions done on a page, or of all actions done by an editor, by providing monthly sums of the actions (insertions, deletions and reinsertions) per page.
The method edit_persistence queries the persistence of changes on a page.
from wikiwho_wrapper import WikiWho
ww = WikiWho(lng='en')
df = ww.dv.edit_persistence(page_id=6187)
df.head()
The dataframe accumulates several actions by month (year_month column) and editor (editor_id column):
adds: number of tokens inserted for the first time
adds_surv_48h: number of tokens inserted for the first time that survived at least 48 hours
adds_persistent: number of tokens inserted for the first time that survived at least until the end of the month
adds_stopword_count: number of tokens inserted that were stop words
dels: number of tokens deleted
dels_surv_48h: number of tokens deleted that were not reinserted in the next 48 hours
dels_persistent: number of tokens deleted that were not reinserted at least until the end of the month
dels_stopword_count: number of tokens deleted that were stop words
reins: number of tokens reinserted
reins_surv_48h: number of tokens reinserted that survived at least 48 hours
reins_persistent: number of tokens reinserted that survived until the end of the month
reins_stopword_count: number of tokens reinserted that were stop words

The dataframe presents the persistence of granular actions, i.e. insertions, reinsertions and deletions, so a common operation is to add them up to get the big picture of the persistence.
# Total actions (regardless of persistence)
df['total_actions'] = df['adds'] + df['dels'] + df['reins']
# Total actions that survived at least 48 hours
df['total_actions_48h'] = df['adds_surv_48h'] + df['dels_surv_48h'] + df['reins_surv_48h']
# Total persistent actions (survived until the end of the month)
df['total_persistent'] = df['adds_persistent'] + df['dels_persistent'] + df['reins_persistent']
# Total stop word counts
df['total_stopword_count'] = df['adds_stopword_count'] + df['dels_stopword_count'] + df['reins_stopword_count']
# Display the aggregated columns
df[['year_month', 'editor_id', 'total_actions', 'total_actions_48h', 'total_persistent', 'total_stopword_count']].head()
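Once the totals are available, a natural follow-up is to express persistence as a share of all actions. The sketch below is only an illustration: the dataframe is synthetic (made-up numbers that mimic the column names described above, not real WikiWho output), and survival_share is a hypothetical helper, not part of wikiwho_wrapper.

```python
import pandas as pd

# Synthetic rows that only mimic the column names described above;
# the numbers are made up and are not real WikiWho output.
toy = pd.DataFrame({
    "year_month": ["2019-01", "2019-02"],
    "editor_id": [1, 2],
    "adds": [10, 0],
    "adds_surv_48h": [8, 0],
    "dels": [6, 0],
    "dels_surv_48h": [3, 0],
    "reins": [4, 0],
    "reins_surv_48h": [1, 0],
})


def survival_share(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: fraction of all actions that survived 48 hours."""
    total = frame["adds"] + frame["dels"] + frame["reins"]
    survived = (
        frame["adds_surv_48h"] + frame["dels_surv_48h"] + frame["reins_surv_48h"]
    )
    # Rows without any action become NaN instead of dividing by zero.
    return survived / total.where(total > 0)


toy["share_surv_48h"] = survival_share(toy)
print(toy[["year_month", "editor_id", "share_surv_48h"]])
```

Masking zero totals keeps months without activity from producing infinities, which would otherwise distort later averages.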
The dataframe presents the data by month and by editor. If the goal is to track only the changes to the page (ignoring the editor), a simple groupby is sufficient.
df.drop(columns=['editor_id']).groupby(['year_month', 'page_id']).sum().head()
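To make the effect of this groupby concrete, here is a small sketch on synthetic data (the rows are invented and only reuse the column names from this notebook). Passing numeric_only=True to sum is a defensive choice on my part, assuming that some columns might not be numeric.

```python
import pandas as pd

# Synthetic data: two editors active on the same page in January,
# one of them again in February. Not real WikiWho output.
toy = pd.DataFrame({
    "year_month": ["2019-01", "2019-01", "2019-02"],
    "page_id": [6187, 6187, 6187],
    "editor_id": [1, 2, 1],
    "adds": [10, 5, 3],
    "dels": [2, 1, 0],
    "reins": [0, 4, 1],
})

# Drop the editor dimension and sum the actions per month and page.
per_month = (
    toy.drop(columns=["editor_id"])
       .groupby(["year_month", "page_id"], as_index=False)
       .sum(numeric_only=True)
)
print(per_month)
```

The two January rows collapse into a single row per month and page, so the result no longer distinguishes who performed the actions.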
Another possibility is to calculate the total number of actions per editor, regardless of time.
df.drop(columns=['page_id']).groupby('editor_id').sum().head()
We can also check the number of months in which each editor has made at least one contribution.
df.groupby('editor_id').size().sort_values(ascending=False).head()
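Note that size() counts rows, so it equals the number of active months only when every editor appears at most once per month. If that cannot be guaranteed (for instance, after concatenating results for several pages, which is an assumption made here for illustration), counting distinct months is a safer variant. The data below is synthetic.

```python
import pandas as pd

# Synthetic data in which editor 1 has two rows for the same month.
toy = pd.DataFrame({
    "year_month": ["2019-01", "2019-02", "2019-02", "2019-01"],
    "editor_id": [1, 1, 1, 2],
    "adds": [3, 1, 2, 7],
})

# Number of rows per editor (what size() reports).
rows_per_editor = toy.groupby("editor_id").size()

# Number of distinct months per editor.
months_per_editor = toy.groupby("editor_id")["year_month"].nunique()

print(rows_per_editor.sort_values(ascending=False))
print(months_per_editor.sort_values(ascending=False))
```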
The most valuable service that the edit_persistence function provides is tracking all actions of an editor across all pages. Let's start with the editor id 2092791, taken from the previous section.
df = ww.dv.edit_persistence(editor_id=2092791)
df.head()
The dataframe accumulates several actions by month (year_month column) and page (page_id column):
adds: number of tokens inserted for the first time
adds_surv_48h: number of tokens inserted for the first time that survived at least 48 hours
adds_persistent: number of tokens inserted for the first time that survived at least until the end of the month
adds_stopword_count: number of tokens inserted that were stop words
dels: number of tokens deleted
dels_surv_48h: number of tokens deleted that were not reinserted in the next 48 hours
dels_persistent: number of tokens deleted that were not reinserted at least until the end of the month
dels_stopword_count: number of tokens deleted that were stop words
reins: number of tokens reinserted
reins_surv_48h: number of tokens reinserted that survived at least 48 hours
reins_persistent: number of tokens reinserted that survived until the end of the month
reins_stopword_count: number of tokens reinserted that were stop words

The dataframe presents the persistence of granular actions, i.e. insertions, reinsertions and deletions, so a common operation is to add them up to get the big picture of the persistence.
# Total actions (regardless of persistence)
df['total_actions'] = df['adds'] + df['dels'] + df['reins']
# Total actions that survived at least 48 hours
df['total_actions_48h'] = df['adds_surv_48h'] + df['dels_surv_48h'] + df['reins_surv_48h']
# Total persistent actions (survived until the end of the month)
df['total_persistent'] = df['adds_persistent'] + df['dels_persistent'] + df['reins_persistent']
# Total stop word counts
df['total_stopword_count'] = df['adds_stopword_count'] + df['dels_stopword_count'] + df['reins_stopword_count']
# Display the aggregated columns
df[['year_month', 'page_id', 'total_actions', 'total_actions_48h', 'total_persistent', 'total_stopword_count']].head()
The dataframe presents the data by month and by page. If the goal is to track only the actions of the editor (ignoring the pages), a simple groupby is sufficient.
df.drop(columns=['page_id']).groupby(['year_month', 'editor_id']).sum().head()
Another possibility is to calculate the editor's total number of actions on each page, regardless of time.
df.drop(columns=['editor_id']).groupby(['page_id']).sum().head()
Or we can get the number of actions across all pages.
df.drop(columns=['page_id']).groupby('editor_id').sum().head()
We can also check, for each page, the number of months in which the editor has made at least one contribution.
df.groupby('page_id').size().sort_values(ascending=False).head()
Notice that the above also lists all the pages to which the editor has contributed.
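The per-page view can be combined with the action totals to rank the pages an editor has worked on. The following is a minimal sketch on synthetic data: the page ids 100 and 200 and all counts are invented for illustration, and only the column names follow this notebook.

```python
import pandas as pd

# Synthetic data for a single editor on two made-up pages.
toy = pd.DataFrame({
    "year_month": ["2019-01", "2019-02", "2019-01"],
    "page_id": [100, 100, 200],
    "editor_id": [2092791, 2092791, 2092791],
    "adds": [4, 6, 1],
    "dels": [1, 0, 0],
    "reins": [0, 2, 0],
})

toy["total_actions"] = toy["adds"] + toy["dels"] + toy["reins"]

# One row per page: distinct active months and summed actions,
# ordered from the most to the least edited page.
pages = (
    toy.groupby("page_id")
       .agg(
           active_months=("year_month", "nunique"),
           total_actions=("total_actions", "sum"),
       )
       .sort_values("total_actions", ascending=False)
)
print(pages)
```

Sorting by total_actions instead of by the number of months highlights the pages where the editor was most active, even if that activity was concentrated in a short period.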
from utils.notebooks import get_next_notebook
from IPython.display import HTML, display
try:
    # Link to the next notebook as resolved by the helper
    display(HTML(f'<a href="{get_next_notebook()}" target="_blank">Go to next workbook</a>'))
except Exception:
    # Fall back to a static link if the helper fails
    display(HTML('<a href="6. Find all revisions and context involving actions on a word" target="_blank">Go to next workbook</a>'))