# Analyzing a simple Crowdcrafting application with Enki

PyBossa provides a very simple and interesting package for analyzing any PyBossa application statistically: Enki.

The package allows you to statistically analyze all the task runs contributed to your application, for example the Video Pattern Recognition application in Crowdcrafting.

In order to use it, all you have to do is the following: first, install it.

NOTE: it takes some minutes to compile all the required libraries, so be patient.

In [4]:
!pip install enki

Requirement already satisfied (use --upgrade to upgrade): enki in ./env/lib/python2.7/site-packages
Cleaning up...


After installing the package you can import it:

In [5]:
import enki


And then, you can start analyzing the application with the following command:

In [6]:
e = enki.Enki(api_key='private', endpoint='http://crowdcrafting.org', app_short_name='vimeo')


The api_key argument is your private API key, but reading from the API does not require a valid one, so a placeholder such as 'private' works here.

Then, you can get all the completed tasks and their associated task_runs in order to start analyzing them:

In [7]:
e.get_all()


• e.tasks_df: a Pandas data frame for analyzing them

Let's have a look at e.tasks and e.task_runs:

In [8]:
e.tasks

Out[8]:
[pybossa.Task(256071),
pybossa.Task(256075)]

In [9]:
e.task_runs

Out[9]:
{256071: [pybossa.TaskRun(81768),
pybossa.TaskRun(596550)]}

OK, so the data are ready to be analyzed ;-) As I said before, we have a data frame per task, so it is really easy to analyze the results of the contributed answers by our volunteers.

Enki uses the Pandas package, so it is really easy to statistically analyze the answers. For example, let's get an overview of the answers for the first task of the application within the data frame:

NOTE: Enki explodes the PyBossa task_run.info field if it is a dictionary. In this case, the info field of each Vimeo task_run holds a dictionary with this structure: task_run.info = {'answer': 'Yes'}, so the following command will automatically show a column named answer.

In [10]:
e.task_runs_df[e.tasks[0].id]

Out[10]:
88027 NaN 598 None 2013-06-28T15:00:07.695365 2013-06-28T15:00:07.695384 88027 {} <link rel='self' title='taskrun' href='http://... [<link rel='parent' title='app' href='http://c... 256071 None 1138 None
145662 NaN 598 None 2013-08-08T16:15:34.834110 2013-08-08T16:15:34.834130 145662 {} <link rel='self' title='taskrun' href='http://... [<link rel='parent' title='app' href='http://c... 256071 None NaN 79.158.104.138
175363 NaN 598 None 2013-08-13T20:32:30.328999 2013-08-13T20:32:30.329018 175363 {} <link rel='self' title='taskrun' href='http://... [<link rel='parent' title='app' href='http://c... 256071 None NaN 99.243.63.115
214622 NaN 598 None 2013-08-22T12:09:41.845525 2013-08-22T12:09:41.845541 214622 {} <link rel='self' title='taskrun' href='http://... [<link rel='parent' title='app' href='http://c... 256071 None 1869 None
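The "explosion" of the info field described in the note above can be imitated with plain Pandas. The following is only a minimal sketch of the idea, not Enki's actual implementation: the records are made up, and the helper name explode_info is hypothetical.

```python
import pandas as pd

# Made-up task runs shaped loosely like PyBossa task_run records
# (illustrative data only, not taken from Crowdcrafting).
task_runs = [
    {"id": 1, "task_id": 10, "info": {"answer": "Yes"}},
    {"id": 2, "task_id": 10, "info": {"answer": "No"}},
    {"id": 3, "task_id": 10, "info": {"answer": "No"}},
]


def explode_info(records):
    """Build a data frame in which every key of a dict-valued 'info'
    field becomes its own column (hypothetical helper, not Enki's API)."""
    rows = []
    for record in records:
        row = dict(record)          # copy so the input is not modified
        info = row.get("info")
        if isinstance(info, dict):
            # Keys in info overwrite same-named top-level keys, if any.
            row.update(info)
        rows.append(row)
    return pd.DataFrame(rows).set_index("id")


df = explode_info(task_runs)
print(df["answer"].tolist())
```

If info is not a dictionary (for example a plain string), the sketch leaves the row untouched, so no extra column is created for it.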

As we have a column called answer with all the answers, let's analyze it for our current application, Vimeo.

NOTE: The possible answers that the volunteers can provide are: Yes, No, or I don't know. Enki will detect it, and look for the most voted answer, showing all the relevant information.

In [11]:
e.task_runs_df[e.tasks[0].id]['answer'].describe()

Out[11]:
count     26
unique     3
top       No
freq      15
dtype: object
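To see what each field of that summary means, here is a small self-contained sketch that calls describe() on a toy series of answers. The data are invented for illustration and are unrelated to the real Vimeo task runs.

```python
import pandas as pd

# Toy answers (invented), using the three options volunteers can choose from.
answers = pd.Series(["No", "Yes", "No", "I don't know", "No"])

# For non-numeric data, describe() reports:
#   count  - number of non-missing answers
#   unique - number of distinct answers
#   top    - the most frequent answer
#   freq   - how many times the top answer occurs
summary = answers.describe()
print(summary)
```

Note that top only names the most frequent answer; comparing freq against count shows how strong the agreement among volunteers actually is.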

And now you can iterate over each task and get the most voted answer from the users with the following snippet of code:

In [12]:
for t in e.tasks:
    # Summarize the answers contributed to this task
    desc = e.task_runs_df[t.id]['answer'].describe()
    print("The top answer for task.id %s is %s" % (t.id, desc['top']))

The top answer for task.id 256071 is No
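The same loop can be wrapped in a small function that also reports how large the majority is. This is a sketch under assumptions: it takes any dictionary that maps a task id to a data frame with an answer column (the shape e.task_runs_df appears to have above), the function name top_answers is hypothetical, and the data are invented.

```python
import pandas as pd

# Invented stand-in for a {task_id: DataFrame} mapping.
task_runs_df = {
    101: pd.DataFrame({"answer": ["No", "No", "Yes"]}),
    102: pd.DataFrame({"answer": ["Yes", "Yes", "I don't know", "Yes"]}),
}


def top_answers(frames, column="answer"):
    """Return {task_id: (top_answer, share)} where share is the fraction
    of the given (non-missing) answers that match the top answer."""
    results = {}
    for task_id, frame in frames.items():
        counts = frame[column].value_counts()  # sorted, most frequent first
        if counts.empty:
            continue  # no answers for this task, nothing to report
        share = int(counts.iloc[0]) / float(counts.sum())
        results[task_id] = (counts.index[0], share)
    return results


top = top_answers(task_runs_df)
for task_id in sorted(top):
    answer, share = top[task_id]
    print("Task %s: top answer %r with %.0f%% of the votes"
          % (task_id, answer, share * 100))
```

When two answers are tied, value_counts() still puts one of them first, so a low share is a hint that the task deserves a closer look.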


If you want to create some charts and graphics, you will need to install matplotlib:

In [13]:
!pip install matplotlib

Requirement already satisfied (use --upgrade to upgrade): matplotlib in ./env/lib/python2.7/site-packages

s = e.task_runs_df[e.tasks[0].id]['answer'].value_counts()

<matplotlib.axes.AxesSubplot at 0x3ee4c50>
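The output above is the axes object that a Pandas plotting call returns, although the call itself does not appear in the cell. A minimal sketch of how such a chart might be produced follows. The choice of a bar chart is an assumption, and the answers are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required

import pandas as pd

# Invented answers standing in for e.task_runs_df[task_id]['answer'].
answers = pd.Series(["No", "Yes", "No", "I don't know", "No", "Yes"])

# Count how often each answer occurs, most frequent first.
s = answers.value_counts()

# Series.plot returns a matplotlib Axes object, which can be customized.
ax = s.plot(kind="bar")
ax.set_xlabel("answer")
ax.set_ylabel("number of task runs")
```

Inside a notebook the chart is rendered inline when inline plotting is enabled. In a script, ax.figure.savefig can be used to write it to a file instead.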