PyBossa provides a simple package for statistically analyzing any PyBossa application: Enki.
The package allows you to statistically analyze all the task runs contributed to your application, for example the Video Pattern Recognition application in Crowdcrafting.
To use it, first install it.
NOTE: it can take a few minutes to compile all the required libraries, so be patient.
!pip install enki
Requirement already satisfied (use --upgrade to upgrade): enki in ./env/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): pybossa-client in ./env/lib/python2.7/site-packages (from enki)
Requirement already satisfied (use --upgrade to upgrade): pandas in ./env/lib/python2.7/site-packages (from enki)
Requirement already satisfied (use --upgrade to upgrade): requests>=0.13.0 in ./env/lib/python2.7/site-packages (from pybossa-client->enki)
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in ./env/lib/python2.7/site-packages (from pandas->enki)
Requirement already satisfied (use --upgrade to upgrade): pytz>=2011k in ./env/lib/python2.7/site-packages (from pandas->enki)
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.6.1 in ./env/lib/python2.7/site-packages (from pandas->enki)
Requirement already satisfied (use --upgrade to upgrade): six in ./env/lib/python2.7/site-packages (from python-dateutil->pandas->enki)
Cleaning up...
After installing the package you can import it:
import enki
And then, you can start analyzing the application with the following command:
e = enki.Enki(api_key='private', endpoint='http://crowdcrafting.org', app_short_name='vimeo')
As you can see, api_key is your private key, but for read-only access to the API you don't need a valid one.
Then, you can get all the completed tasks and their associated task_runs in order to start analyzing them:
e.get_all()
The previous command downloads all the tasks and task_runs, and creates 4 variables:
Let's have a look at e.tasks:
e.tasks
[pybossa.Task(256071), pybossa.Task(256072), pybossa.Task(256073), pybossa.Task(256074), pybossa.Task(256075)]
And now at e.task_runs:
e.task_runs
{256071: [pybossa.TaskRun(81768), pybossa.TaskRun(82458), pybossa.TaskRun(88027), pybossa.TaskRun(94595), pybossa.TaskRun(96020), pybossa.TaskRun(109055), pybossa.TaskRun(122053), pybossa.TaskRun(134030), pybossa.TaskRun(145662), pybossa.TaskRun(147257), pybossa.TaskRun(154250), pybossa.TaskRun(163983), pybossa.TaskRun(163987), pybossa.TaskRun(165169), pybossa.TaskRun(175363), pybossa.TaskRun(179023), pybossa.TaskRun(189571), pybossa.TaskRun(196359), pybossa.TaskRun(196736), pybossa.TaskRun(209642), pybossa.TaskRun(210179), pybossa.TaskRun(210652), pybossa.TaskRun(211688), pybossa.TaskRun(214622), pybossa.TaskRun(220670), pybossa.TaskRun(226763), pybossa.TaskRun(229376), pybossa.TaskRun(229411), pybossa.TaskRun(232332), pybossa.TaskRun(235082)], 256072: [pybossa.TaskRun(81769), pybossa.TaskRun(88028), pybossa.TaskRun(94596), pybossa.TaskRun(122054), pybossa.TaskRun(145673), pybossa.TaskRun(147291), pybossa.TaskRun(154267), pybossa.TaskRun(163984), pybossa.TaskRun(163988), pybossa.TaskRun(165170), pybossa.TaskRun(175364), pybossa.TaskRun(179024), pybossa.TaskRun(189576), pybossa.TaskRun(196360), pybossa.TaskRun(196737), pybossa.TaskRun(209643), pybossa.TaskRun(210180), pybossa.TaskRun(210653), pybossa.TaskRun(211689), pybossa.TaskRun(214623), pybossa.TaskRun(229377), pybossa.TaskRun(229378), pybossa.TaskRun(229379), pybossa.TaskRun(229412), pybossa.TaskRun(232333), pybossa.TaskRun(235083), pybossa.TaskRun(235126), pybossa.TaskRun(236173), pybossa.TaskRun(236825), pybossa.TaskRun(257037)], 256073: [pybossa.TaskRun(81770), pybossa.TaskRun(94597), pybossa.TaskRun(122055), pybossa.TaskRun(154274), pybossa.TaskRun(163985), pybossa.TaskRun(163989), pybossa.TaskRun(165171), pybossa.TaskRun(175365), pybossa.TaskRun(179026), pybossa.TaskRun(189587), pybossa.TaskRun(196361), pybossa.TaskRun(196739), pybossa.TaskRun(209644), pybossa.TaskRun(210181), pybossa.TaskRun(210654), pybossa.TaskRun(211690), pybossa.TaskRun(214624), pybossa.TaskRun(229413), pybossa.TaskRun(232334), 
pybossa.TaskRun(235084), pybossa.TaskRun(235127), pybossa.TaskRun(257038), pybossa.TaskRun(257840), pybossa.TaskRun(262876), pybossa.TaskRun(346996), pybossa.TaskRun(358431), pybossa.TaskRun(399260), pybossa.TaskRun(416175), pybossa.TaskRun(435963), pybossa.TaskRun(586064)], 256074: [pybossa.TaskRun(81771), pybossa.TaskRun(94598), pybossa.TaskRun(122056), pybossa.TaskRun(154281), pybossa.TaskRun(163986), pybossa.TaskRun(163990), pybossa.TaskRun(165172), pybossa.TaskRun(179027), pybossa.TaskRun(189616), pybossa.TaskRun(196362), pybossa.TaskRun(196740), pybossa.TaskRun(209645), pybossa.TaskRun(210655), pybossa.TaskRun(211691), pybossa.TaskRun(214625), pybossa.TaskRun(232335), pybossa.TaskRun(235128), pybossa.TaskRun(257039), pybossa.TaskRun(262877), pybossa.TaskRun(347003), pybossa.TaskRun(358433), pybossa.TaskRun(399261), pybossa.TaskRun(416176), pybossa.TaskRun(435968), pybossa.TaskRun(586111), pybossa.TaskRun(589078), pybossa.TaskRun(592392), pybossa.TaskRun(592873), pybossa.TaskRun(593926), pybossa.TaskRun(596473)], 256075: [pybossa.TaskRun(82060), pybossa.TaskRun(94599), pybossa.TaskRun(122057), pybossa.TaskRun(154315), pybossa.TaskRun(163991), pybossa.TaskRun(165173), pybossa.TaskRun(189643), pybossa.TaskRun(196363), pybossa.TaskRun(196741), pybossa.TaskRun(209646), pybossa.TaskRun(210656), pybossa.TaskRun(211692), pybossa.TaskRun(214626), pybossa.TaskRun(235871), pybossa.TaskRun(257040), pybossa.TaskRun(262878), pybossa.TaskRun(347043), pybossa.TaskRun(358436), pybossa.TaskRun(416177), pybossa.TaskRun(435973), pybossa.TaskRun(586160), pybossa.TaskRun(589079), pybossa.TaskRun(592393), pybossa.TaskRun(592874), pybossa.TaskRun(592875), pybossa.TaskRun(592876), pybossa.TaskRun(592877), pybossa.TaskRun(592878), pybossa.TaskRun(593927), pybossa.TaskRun(596550)]}
OK, so the data are ready to be analyzed ;-) As I said before, we have a data frame per task, so it is really easy to analyze the answers contributed by our volunteers.
Enki uses the pandas package, which provides the statistical tools. For example, let's get an overview of the answers for the first task of the application within the data frame:
NOTE: Enki explodes the PyBossa task_run.info field if it is a dictionary. In this case, the info field of each Vimeo task_run holds a dictionary with this structure: task_run.info = {'answer': 'Yes'}, so the following command will automatically show a column named answer.
e.task_runs_df[e.tasks[0].id]
| | answer | app_id | calibration | created | finish_time | id | info | link | links | task_id | timeout | user_id | user_ip |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
81768 | No | 598 | None | 2013-06-18T09:08:19.075539 | 2013-06-18T09:08:19.075560 | 81768 | {u'answer': u'No'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | 3 | None |
82458 | Yes | 598 | None | 2013-06-24T12:52:21.573111 | 2013-06-24T12:52:21.573128 | 82458 | {u'answer': u'Yes'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | NaN | 125.22.43.12 |
88027 | NaN | 598 | None | 2013-06-28T15:00:07.695365 | 2013-06-28T15:00:07.695384 | 88027 | {} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | 1138 | None |
94595 | No | 598 | None | 2013-07-16T10:54:44.987977 | 2013-07-16T10:54:44.987993 | 94595 | {u'answer': u'No'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | NaN | 46.162.161.63 |
96020 | Yes | 598 | None | 2013-07-21T01:28:44.488405 | 2013-07-21T01:28:44.488419 | 96020 | {u'answer': u'Yes'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | NaN | 201.250.176.37 |
109055 | No | 598 | None | 2013-08-03T11:01:53.143951 | 2013-08-03T11:01:53.143968 | 109055 | {u'answer': u'No'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | 1258 | None |
122053 | Yes | 598 | None | 2013-08-05T11:31:44.318728 | 2013-08-05T11:31:44.318744 | 122053 | {u'answer': u'Yes'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | NaN | 194.81.199.55 |
134030 | NotKnown | 598 | None | 2013-08-07T09:28:09.741601 | 2013-08-07T09:28:09.741625 | 134030 | {u'answer': u'NotKnown'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | 1323 | None |
145662 | NaN | 598 | None | 2013-08-08T16:15:34.834110 | 2013-08-08T16:15:34.834130 | 145662 | {} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | NaN | 79.158.104.138 |
147257 | No | 598 | None | 2013-08-08T17:46:54.688632 | 2013-08-08T17:46:54.688650 | 147257 | {u'answer': u'No'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | 1484 | None |
154250 | No | 598 | None | 2013-08-09T09:13:07.940880 | 2013-08-09T09:13:07.940898 | 154250 | {u'answer': u'No'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | 1590 | None |
163983 | NotKnown | 598 | None | 2013-08-11T01:47:58.604305 | 2013-08-11T01:47:58.604327 | 163983 | {u'answer': u'NotKnown'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | NaN | 189.71.6.185 |
163987 | Yes | 598 | None | 2013-08-11T02:01:00.579848 | 2013-08-11T02:01:00.579867 | 163987 | {u'answer': u'Yes'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | 1659 | None |
165169 | No | 598 | None | 2013-08-11T15:23:08.467515 | 2013-08-11T15:23:08.467533 | 165169 | {u'answer': u'No'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | 1379 | None |
175363 | NaN | 598 | None | 2013-08-13T20:32:30.328999 | 2013-08-13T20:32:30.329018 | 175363 | {} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | NaN | 99.243.63.115 |
179023 | NotKnown | 598 | None | 2013-08-14T11:33:29.942327 | 2013-08-14T11:33:29.942344 | 179023 | {u'answer': u'NotKnown'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | 1824 | None |
189571 | No | 598 | None | 2013-08-15T17:51:32.663308 | 2013-08-15T17:51:32.663325 | 189571 | {u'answer': u'No'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | NaN | 82.234.143.184 |
196359 | NotKnown | 598 | None | 2013-08-16T20:08:36.783964 | 2013-08-16T20:08:36.783985 | 196359 | {u'answer': u'NotKnown'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | NaN | 82.228.190.168 |
196736 | No | 598 | None | 2013-08-16T23:41:46.782379 | 2013-08-16T23:41:46.782396 | 196736 | {u'answer': u'No'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | 2024 | None |
209642 | No | 598 | None | 2013-08-20T07:48:59.958100 | 2013-08-20T07:48:59.958119 | 209642 | {u'answer': u'No'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | NaN | 88.186.11.35 |
210179 | No | 598 | None | 2013-08-20T15:50:45.972478 | 2013-08-20T15:50:45.972499 | 210179 | {u'answer': u'No'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | NaN | 129.11.77.198 |
210652 | Yes | 598 | None | 2013-08-20T22:23:01.771939 | 2013-08-20T22:23:01.771963 | 210652 | {u'answer': u'Yes'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | 2134 | None |
211688 | Yes | 598 | None | 2013-08-21T15:53:44.000784 | 2013-08-21T15:53:44.000801 | 211688 | {u'answer': u'Yes'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | 2147 | None |
214622 | NaN | 598 | None | 2013-08-22T12:09:41.845525 | 2013-08-22T12:09:41.845541 | 214622 | {} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | 1869 | None |
220670 | Yes | 598 | None | 2013-08-23T10:12:17.192103 | 2013-08-23T10:12:17.192124 | 220670 | {u'answer': u'Yes'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | NaN | 61.115.201.241 |
226763 | No | 598 | None | 2013-08-27T11:39:25.651255 | 2013-08-27T11:39:25.651273 | 226763 | {u'answer': u'No'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | NaN | 129.31.25.80 |
229376 | No | 598 | None | 2013-08-29T16:03:31.663031 | 2013-08-29T16:03:31.663049 | 229376 | {u'answer': u'No'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | 2199 | None |
229411 | No | 598 | None | 2013-08-29T23:08:07.813845 | 2013-08-29T23:08:07.813865 | 229411 | {u'answer': u'No'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | NaN | 177.158.76.105 |
232332 | No | 598 | None | 2013-09-06T08:58:02.237716 | 2013-09-06T08:58:02.237731 | 232332 | {u'answer': u'No'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | 2237 | None |
235082 | No | 598 | None | 2013-09-13T08:25:04.359900 | 2013-09-13T08:25:04.359918 | 235082 | {u'answer': u'No'} | <link rel='self' title='taskrun' href='http://... | [<link rel='parent' title='app' href='http://c... | 256071 | None | NaN | 192.128.3.241 |
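To make the NOTE about the info field concrete, here is a minimal sketch of what expanding a dictionary into columns amounts to. It uses only the standard library and three records shaped like the rows above; `explode_info` is a hypothetical helper written for illustration, not Enki's actual implementation.

```python
def explode_info(task_run):
    """Return a flat copy of a task run, promoting the keys of a dict `info`.

    Hypothetical helper for illustration only; Enki's own code may differ.
    """
    flat = dict(task_run)
    info = task_run.get("info")
    if isinstance(info, dict):
        for key, value in info.items():
            # Never overwrite existing fields such as 'id' or 'task_id'.
            flat.setdefault(key, value)
    return flat


# Three task runs copied from the table above (only a few fields kept).
task_runs = [
    {"id": 81768, "task_id": 256071, "info": {"answer": "No"}},
    {"id": 82458, "task_id": 256071, "info": {"answer": "Yes"}},
    {"id": 88027, "task_id": 256071, "info": {}},
]

rows = [explode_info(tr) for tr in task_runs]
for row in rows:
    # A run with an empty info dict gets no 'answer' key; in the table
    # above such runs appear with NaN in the answer column.
    print(row["id"], row.get("answer"))
```

The original info field is kept next to the new column, which matches the table above, where both answer and info appear.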
As we have a column with all the answers, called answer, let's analyze it for our current application, Vimeo.
NOTE: The possible answers that the volunteers can provide are: Yes, No, or I don't know (stored as NotKnown, as the table above shows). Since the answer column holds strings, describe() reports how many answers there are, how many distinct values they take, and which one is the most voted.
e.task_runs_df[e.tasks[0].id]['answer'].describe()
count 26
unique 3
top No
freq 15
dtype: object
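For readers less familiar with pandas, the four fields reported by describe() for a column of strings can be reproduced by hand. The sketch below does so with the standard library only, using the 30 answers of task 256071 copied from the table above (None stands in for NaN). It illustrates what the numbers mean; it is not how pandas computes them.

```python
from collections import Counter

# Answers of task 256071 in table order; None marks an empty info field (NaN).
answers = [
    "No", "Yes", None, "No", "Yes", "No", "Yes", "NotKnown", None, "No",
    "No", "NotKnown", "Yes", "No", None, "NotKnown", "No", "NotKnown", "No", "No",
    "No", "Yes", "Yes", None, "Yes", "No", "No", "No", "No", "No",
]

# Missing answers are left out, as describe() does for NaN values.
valid = [a for a in answers if a is not None]
counts = Counter(valid)
top, freq = counts.most_common(1)[0]

summary = {
    "count": len(valid),    # number of non-missing answers
    "unique": len(counts),  # number of distinct answers
    "top": top,             # most voted answer
    "freq": freq,           # votes received by the most voted answer
}
print(summary)
```

This gives count 26, unique 3, top No and freq 15, matching the output above: the 4 task runs with an empty info field are not counted.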
And now you can iterate over each task and get the most voted answer from the users with the following snippet of code:
for t in e.tasks:
    # describe() on the answer column returns count, unique, top and freq
    desc = e.task_runs_df[t.id]['answer'].describe()
    print("The top answer for task.id %s is %s" % (t.id, desc['top']))
The top answer for task.id 256071 is No
The top answer for task.id 256072 is Yes
The top answer for task.id 256073 is Yes
The top answer for task.id 256074 is Yes
The top answer for task.id 256075 is Yes
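One thing to keep in mind is that describe() reports a single top value, so a tie between two answers is not visible in the loop above. A possible way to flag such tasks is sketched below with the standard library. The task ids and votes are made up for the example, and `top_answers` is a hypothetical helper, not part of Enki or pandas.

```python
from collections import Counter


def top_answers(answers):
    """Return (winners, votes): every answer sharing the highest vote count.

    Missing answers (None) are ignored. Hypothetical helper for illustration.
    """
    counts = Counter(a for a in answers if a is not None)
    if not counts:
        return [], 0
    best = max(counts.values())
    winners = sorted(a for a, n in counts.items() if n == best)
    return winners, best


# Made-up votes keyed by made-up task ids, NOT data from the application.
votes_by_task = {
    1: ["Yes", "Yes", "No", None],
    2: ["Yes", "No", "NotKnown", "No", "Yes"],
    3: [None, None],
}

for task_id, answers in sorted(votes_by_task.items()):
    winners, votes = top_answers(answers)
    if len(winners) == 1:
        print("task %s: %s (%d votes)" % (task_id, winners[0], votes))
    elif winners:
        print("task %s: tie between %s (%d votes each)"
              % (task_id, ", ".join(winners), votes))
    else:
        print("task %s: no answers" % task_id)
```

With the real data, the list for each task could be taken from e.task_runs_df[t.id]['answer'] after dropping the missing values, for instance with dropna().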
If you want to create some charts and graphics, you will need to install matplotlib:
!pip install matplotlib
Requirement already satisfied (use --upgrade to upgrade): matplotlib in ./env/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.5 in ./env/lib/python2.7/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in ./env/lib/python2.7/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): tornado in ./env/lib/python2.7/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): pyparsing>=1.5.6,!=2.0.0 in ./env/lib/python2.7/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): nose in ./env/lib/python2.7/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): six in ./env/lib/python2.7/site-packages (from python-dateutil->matplotlib)
Cleaning up...
Now we count the values, which gives us a pandas Series, and plot it to see the distribution of answers for the first task:
s = e.task_runs_df[e.tasks[0].id]['answer'].value_counts()
s.plot(kind='bar', rot=0)
<matplotlib.axes.AxesSubplot at 0x3ee4c50>