Hacker News (HN) is a social news website run by Y Combinator that focuses on computer science and entrepreneurship. It is widely read and also covers infosec-related topics such as hacking news, cyber attacks and computer security, among others. It attracts about 8 million readers monthly, including IT experts, hackers and enthusiasts. We analysed the site's data to find the number of posts and comments by hour of post creation, the average number of points and comments by hour of posting, and the most-read articles.
Our aim is to analyse the data available on the Hacker News site and recommend the best hour to post and the best topic to post about.
To achieve this I will use Python libraries such as pandas, matplotlib and numpy.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
hn_df=pd.read_csv("HN_posts_year_to_Sep_26_2016.csv")
hn_df
index | id | title | url | num_points | num_comments | author | created_at |
---|---|---|---|---|---|---|---|
0 | 12579008 | You have two days to comment if you want stem ... | http://www.regulations.gov/document?D=FDA-2015... | 1 | 0 | altstar | 9/26/2016 3:26 |
1 | 12579005 | SQLAR the SQLite Archiver | https://www.sqlite.org/sqlar/doc/trunk/README.md | 1 | 0 | blacksqr | 9/26/2016 3:24 |
2 | 12578997 | What if we just printed a flatscreen televisio... | https://medium.com/vanmoof/our-secrets-out-f21... | 1 | 0 | pavel_lishin | 9/26/2016 3:19 |
3 | 12578989 | algorithmic music | http://cacm.acm.org/magazines/2011/7/109891-al... | 1 | 0 | poindontcare | 9/26/2016 3:16 |
4 | 12578979 | How the Data Vault Enables the Next-Gen Data W... | https://www.talend.com/blog/2016/05/12/talend-... | 1 | 0 | markgainor1 | 9/26/2016 3:14 |
... | ... | ... | ... | ... | ... | ... | ... |
293114 | 10176919 | Ask HN: What is/are your favorite quote(s)? | NaN | 15 | 20 | kumarski | 9/6/2015 6:02 |
293115 | 10176917 | Attention and awareness in stage magic: turnin... | http://people.cs.uchicago.edu/~luitien/nrn2473... | 14 | 0 | stakent | 9/6/2015 6:01 |
293116 | 10176908 | Dying vets fuck you letter (2013) | http://dangerousminds.net/comments/dying_vets_... | 10 | 2 | mycodebreaks | 9/6/2015 5:56 |
293117 | 10176907 | PHP 7 Coolest Features: Space Ships, Type Hint... | https://www.zend.com/en/resources/php-7 | 2 | 0 | Garbage | 9/6/2015 5:55 |
293118 | 10176903 | Toyota Establishes Research Centers with MIT a... | http://newsroom.toyota.co.jp/en/detail/9233109/ | 4 | 0 | tim_sw | 9/6/2015 5:50 |
293119 rows × 7 columns
# Split the titles into Ask, Show and other posts by their prefix.
ask_post = []
show_post = []
other_post = []
for title in hn_df['title']:
    if title.startswith('Ask'):
        ask_post.append(title)
    elif title.startswith('Show'):
        show_post.append(title)
    else:
        other_post.append(title)
print(len(other_post))
print(len(ask_post))
print(len(show_post))
273664 9248 10207
In the data frame, 9248 posts have titles starting with Ask (the Ask HN posts), 10207 start with Show (the Show HN posts), and the remaining 273664 fall into the Other category, that is, any post that is neither of the two. As analysed later, this prefix is associated with the number of points and comments a post receives.
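The same prefix-based split can be sketched on a small, made-up set of titles. The `sample` frame and the `categorise` helper below are hypothetical names used only for illustration; they are not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample titles, not rows from the real dataset.
sample = pd.DataFrame({
    "title": [
        "Ask HN: How do you learn?",
        "Show HN: My side project",
        "A regular story",
        "Ask HN: Who is hiring?",
    ]
})


def categorise(title: str) -> str:
    """Label a title by its prefix, mirroring the loop in the notebook."""
    if title.startswith("Ask"):
        return "ask"
    if title.startswith("Show"):
        return "show"
    return "other"


# One label per row, then a count of rows per label.
sample["category"] = sample["title"].apply(categorise)
category_counts = sample["category"].value_counts().to_dict()
print(category_counts)
```

Keeping the label as a column, rather than in three separate lists, makes it possible to group by category later without filtering the frame again.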
ask_df= hn_df[hn_df ['title'].str.startswith ('Ask')].copy()
ask_df.sort_values('num_points',ascending=False, inplace=True)
ask_df[['title','num_points','num_comments']].head(20)
index | title | num_points | num_comments |
---|---|---|---|
145256 | Ask HN: How much do you make at Amazon? Here i... | 1213 | 691 |
130553 | Ask HN: Pick startups for YC to fund | 867 | 286 |
11218 | Ask HN: Is web programming a series of hacks o... | 822 | 660 |
9004 | Ask HN: What's your favorite HN post? | 691 | 138 |
32876 | Ask HN: Is it possible to run your own mail se... | 648 | 299 |
86558 | Ask HN: Who is hiring? (June 2016) | 644 | 1007 |
43916 | Ask HN: What was your why didn't I start doing... | 630 | 767 |
63744 | Ask HN: Who is hiring? (July 2016) | 566 | 898 |
109928 | Ask HN: Who is hiring? (May 2016) | 553 | 937 |
42275 | Ask HN: Who is hiring? (August 2016) | 534 | 947 |
18411 | Ask HN: Who is hiring? (September 2016) | 521 | 910 |
263117 | Ask HN: How to Be a Good Technical Lead? | 520 | 174 |
249406 | Ask HN: Who is hiring? (November 2015) | 512 | 896 |
64356 | Ask HN: Just got an innocent man out of prison... | 510 | 199 |
46837 | Ask HN: Why don't companies hire programmers f... | 498 | 348 |
249731 | Ask HN: New attempt at mobile markup keep or ... | 497 | 285 |
121694 | Ask HN: Describe your first enterprise sale | 496 | 96 |
158995 | Ask HN: Who is hiring? (March 2016) | 488 | 825 |
183255 | Ask HN: Who is hiring? (February 2016) | 455 | 778 |
45530 | Ask HN: Anonymous person sent proof of SSH acc... | 450 | 231 |
# Sum the comments over all Ask posts, then divide by the number of posts.
total_comments = 0
for comments in ask_df['num_comments']:
    total_comments = total_comments + int(comments)
average_comments = round(total_comments / ask_df.shape[0], 2)
print('The ask_post has ' + str(average_comments) + ' comments on average')
total_comments
The ask_post has 10.35 comments on average
95671
relativity=(ask_df["num_points"].sum()) / (ask_df["num_comments"].sum())
relativity
1.0900272809942406
show_df= hn_df[hn_df ['title'].str.startswith ('Show')].copy()
show_df.sort_values('num_points',ascending=False, inplace=True)
show_df[['title','num_points','num_comments']].head(20)
index | title | num_points | num_comments |
---|---|---|---|
46219 | Show HN: Web Design in 4 minutes | 1624 | 152 |
289195 | Show HN: Make a programmable mirror | 1172 | 136 |
214398 | Show HN: Open Hunt an open and community-run ... | 1093 | 180 |
35517 | Show HN: Generating fantasy maps an interacti... | 1004 | 64 |
4362 | Show HN: Primitive Pictures | 893 | 169 |
118299 | Show HN: I made an interactive Bootstrap 4 che... | 847 | 35 |
83116 | Show HN: New calendar app idea | 825 | 197 |
176162 | Show HN: I've been writing daily TILs for a year | 819 | 150 |
251128 | Show HN: Twitch Installs Arch Linux A coopera... | 802 | 250 |
78072 | Show HN: I made a database of remote companies | 747 | 114 |
229163 | Show HN: Something pointless I made | 747 | 102 |
278169 | Show HN: I spent a year making an electro-mech... | 681 | 103 |
241787 | Show HN: Parinfer a simpler way to write Lisp | 674 | 134 |
24471 | Show HN: Carbide A New Programming Environment | 628 | 136 |
174950 | Show HN: Htop 2.0 released, now cross-platform | 613 | 153 |
282857 | Show HN: %%30%30: A Game | 608 | 89 |
282903 | Show HN: Hacker News Simulator | 572 | 163 |
134197 | Show HN: What every browser knows about you | 553 | 206 |
265877 | Show HN: Exposé a static site generator for ... | 548 | 73 |
41144 | Show HN: Noms A new decentralized database ba... | 508 | 167 |
# Sum the comments over all Show posts, then divide by the number of posts.
total_comments = 0
for comments in show_df['num_comments']:
    total_comments = total_comments + int(comments)
average_comments = round(total_comments / show_df.shape[0], 2)
print('The show_post has ' + str(average_comments) + ' comments on average')
total_comments
The show_post has 4.87 comments on average
49668
relativity=(show_df["num_points"].sum()) / (show_df["num_comments"].sum())
relativity
3.041495530321334
ASK posts have a total of 95671 comments, 10.35 per post on average. SHOW posts have a total of 49668 comments, 4.87 per post on average. Based on these findings, a post whose title starts with ASK HN is likely to get more comments than one whose title starts with SHOW HN. I would therefore recommend posting an article whose title starts with ASK HN.
RELATIVITY
On the ASK HN posts there is about one point for every comment (a ratio of 1.09), whereas on the SHOW HN posts there are about three points for every comment (a ratio of 3.04).
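As a rough sketch of how the two figures are computed, the average comments per post and the points-per-comment ratio can be obtained directly from the columns. The `posts` frame below holds made-up numbers, not values from the dataset.

```python
import pandas as pd

# Hypothetical posts: three rows with made-up points and comments.
posts = pd.DataFrame({
    "num_points": [10, 4, 6],
    "num_comments": [5, 3, 2],
})

# Average comments per post: total comments divided by number of posts.
avg_comments = round(posts["num_comments"].mean(), 2)

# Points earned per comment: total points divided by total comments.
points_per_comment = posts["num_points"].sum() / posts["num_comments"].sum()

print(avg_comments, points_per_comment)
```

A ratio above 1 means a category collects more points than comments; the ratio says nothing about which category gets more comments per post.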
hourpattern=r'[0-9]{1,2}?\S[0-9]{1,2}?\S[0-9]{4}\W([0-9]{1,2}?)\S[0-9]{2}'
ask_df['hour']=ask_df['created_at'].str.extract(hourpattern)
ask_df
index | id | title | url | num_points | num_comments | author | created_at | hour |
---|---|---|---|---|---|---|---|---|
145256 | 11312984 | Ask HN: How much do you make at Amazon? Here i... | NaN | 1213 | 691 | boren_ave11 | 3/18/2016 16:43 | 16 |
130553 | 11440627 | Ask HN: Pick startups for YC to fund | NaN | 867 | 286 | dang | 4/6/2016 18:19 | 18 |
11218 | 12477190 | Ask HN: Is web programming a series of hacks o... | NaN | 822 | 660 | barefootcoder | 9/12/2016 4:29 | 4 |
9004 | 12496558 | Ask HN: What's your favorite HN post? | NaN | 691 | 138 | rkhraishi | 9/14/2016 13:20 | 13 |
32876 | 12282231 | Ask HN: Is it possible to run your own mail se... | NaN | 648 | 299 | jdmoreira | 8/13/2016 17:25 | 17 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
49154 | 12141044 | Ask HN: What would happen if AI would write Pr... | NaN | 1 | 1 | holaboyperu | 7/22/2016 0:42 | 0 |
184957 | 10998232 | Ask HN: Do you know/use speaker recognition so... | NaN | 1 | 0 | alistproducer2 | 1/29/2016 20:27 | 20 |
49136 | 12141157 | Ask HN: How to Peacock my Resume | NaN | 1 | 1 | thirstysusrando | 7/22/2016 1:26 | 1 |
185081 | 10997158 | Ask HN: Best Android phone today? | NaN | 1 | 2 | simonebrunozzi | 1/29/2016 18:15 | 18 |
193157 | 10936916 | Ask HN: Whats a good book to learn Startup 'Ma... | NaN | 1 | 2 | nns | 1/20/2016 9:36 | 9 |
9248 rows × 8 columns
ask_df['hour'].value_counts(ascending=False)
15    651
18    626
17    598
16    586
19    562
14    523
21    520
20    516
13    447
22    383
12    350
23    348
11    315
0     301
10    287
1     286
3     274
2     270
8     260
4     246
6     238
7     227
9     224
5     210
Name: hour, dtype: int64
hourpattern= r'[0-9]{1,2}?\S[0-9]{1,2}\S[0-9]{4}\W([0-9]{1,2})\S[0-9]{1,2}?'
show_df['hour']=show_df['created_at'].str.extract(hourpattern)
show_df
index | id | title | url | num_points | num_comments | author | created_at | hour |
---|---|---|---|---|---|---|---|---|
46219 | 12166687 | Show HN: Web Design in 4 minutes | http://jgthms.com/web-design-in-4-minutes/ | 1624 | 152 | bbx | 7/26/2016 16:17 | 16 |
289195 | 10204018 | Show HN: Make a programmable mirror | https://github.com/HannahMitt/HomeMirror | 1172 | 136 | hannahmitt | 9/11/2015 14:58 | 14 |
214398 | 10759879 | Show HN: Open Hunt an open and community-run ... | https://www.openhunt.co | 1093 | 180 | mhurwi | 12/18/2015 18:12 | 18 |
35517 | 12260794 | Show HN: Generating fantasy maps an interacti... | http://mewo2.com/notes/terrain/ | 1004 | 64 | mewo2 | 8/10/2016 11:17 | 11 |
4362 | 12539109 | Show HN: Primitive Pictures | https://github.com/fogleman/primitive | 893 | 169 | fogleman | 9/20/2016 12:55 | 12 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
38865 | 12231435 | Show HN: A tool that recycles your best social... | http://recurpost.com | 1 | 1 | dinwal | 8/5/2016 11:32 | 11 |
184997 | 10997980 | Show HN: Silence your phone | http://georing.strikingly.com/ | 1 | 0 | dwursteisen | 1/29/2016 19:50 | 19 |
184942 | 10998349 | Show HN: Multiple invitations on top of devise... | https://github.com/RoxasShadow/devise_invitations | 1 | 0 | RoxasShadow | 1/29/2016 20:44 | 20 |
84569 | 11830999 | Show HN: Noti 2.2.0: desktop/mobile notificati... | https://github.com/variadico/noti/releases/tag... | 1 | 0 | ahci8e | 6/3/2016 15:48 | 15 |
134272 | 11406809 | Show HN: Feather A Python Micro-Web Framework... | https://github.com/Max00355/Feather/blob/maste... | 1 | 1 | max0563 | 4/1/2016 17:41 | 17 |
10207 rows × 8 columns
show_df['hour'].value_counts(ascending=False)
15    836
16    806
17    764
14    700
18    661
13    613
19    558
20    528
12    519
21    432
11    403
22    380
10    325
23    320
8     318
9     303
0     280
1     248
7     237
2     210
3     206
4     196
6     193
5     171
Name: hour, dtype: int64
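An alternative to the regular expression, sketched here as an assumption rather than as the method used above, is to parse `created_at` as a datetime and read the hour from it. The format string assumes month/day/year timestamps like those shown in the tables, and the `sample` frame contains only three illustrative values.

```python
import pandas as pd

# Three timestamps in the same layout as the created_at column.
sample = pd.DataFrame({
    "created_at": ["9/26/2016 3:26", "3/18/2016 16:43", "7/22/2016 0:42"]
})

# Parse month/day/year hour:minute, then take the hour component.
parsed = pd.to_datetime(sample["created_at"], format="%m/%d/%Y %H:%M")
sample["hour"] = parsed.dt.hour

hours = sample["hour"].tolist()
print(hours)
```

Note that this yields integer hours, whereas `str.extract` yields strings, so a comparison such as `== str(h)` would become `== h` with this approach.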
ask_df[['num_comments','hour']]
index | num_comments | hour |
---|---|---|
145256 | 691 | 16 |
130553 | 286 | 18 |
11218 | 660 | 4 |
9004 | 138 | 13 |
32876 | 299 | 17 |
... | ... | ... |
49154 | 1 | 0 |
184957 | 0 | 20 |
49136 | 1 | 1 |
185081 | 2 | 18 |
193157 | 2 | 9 |
9248 rows × 2 columns
# Total comments on Ask posts for each hour of the day (0-23).
comment_dic = {}
for h in range(0, 24):
    # 'hour' holds strings (from str.extract), so compare against str(h)
    new_df = ask_df[ask_df['hour'] == str(h)]
    sum_comment = new_df['num_comments'].sum()
    comment_dic[h] = sum_comment
items = comment_dic.items()
comment_list = list(items)
comment_df = pd.DataFrame(comment_list)
comment_df.columns = ['Hour', 'Total Comments']
comment_df.sort_values('Total Comments', ascending=False, inplace=True)
comment_df
index | Hour | Total Comments |
---|---|---|
15 | 15 | 18525 |
13 | 13 | 7242 |
17 | 17 | 5629 |
14 | 14 | 4985 |
18 | 18 | 4874 |
16 | 16 | 4601 |
21 | 21 | 4500 |
20 | 20 | 4470 |
12 | 12 | 4271 |
19 | 19 | 4139 |
22 | 22 | 3369 |
10 | 10 | 3018 |
2 | 2 | 2997 |
11 | 11 | 2798 |
23 | 23 | 2467 |
8 | 8 | 2378 |
4 | 4 | 2372 |
0 | 0 | 2265 |
3 | 3 | 2191 |
1 | 1 | 2091 |
5 | 5 | 1838 |
6 | 6 | 1589 |
7 | 7 | 1585 |
9 | 9 | 1477 |
# Total comments on Show posts for each hour of the day (0-23).
comment_dic = {}
for h in range(0, 24):
    # 'hour' holds strings (from str.extract), so compare against str(h)
    new_df = show_df[show_df['hour'] == str(h)]
    sum_comment = new_df['num_comments'].sum()
    comment_dic[h] = sum_comment
items = comment_dic.items()
comment_list = list(items)
comment_df = pd.DataFrame(comment_list)
comment_df.columns = ['Hour', 'Total Comments']
comment_df.sort_values('Total Comments', ascending=False, inplace=True)
comment_df
index | Hour | Total Comments |
---|---|---|
14 | 14 | 3844 |
15 | 15 | 3823 |
16 | 16 | 3771 |
12 | 12 | 3610 |
13 | 13 | 3314 |
17 | 17 | 3262 |
18 | 18 | 3243 |
19 | 19 | 2791 |
11 | 11 | 2413 |
20 | 20 | 2183 |
8 | 8 | 1770 |
21 | 21 | 1759 |
7 | 7 | 1576 |
22 | 22 | 1451 |
23 | 23 | 1443 |
9 | 9 | 1411 |
0 | 0 | 1284 |
10 | 10 | 1228 |
2 | 2 | 1076 |
1 | 1 | 1006 |
4 | 4 | 981 |
3 | 3 | 934 |
6 | 6 | 904 |
5 | 5 | 591 |
From the data above, 15:00 is the hour with the largest number of posts in both the SHOW and ASK categories (836 and 651 posts respectively).
As for the number of comments, 15:00 leads the ASK posts with a total of 18525 comments, more than twice the total of the next hour (13:00, with 7242). For the SHOW posts, however, 14:00 leads with a total of 3844 comments. The gap here is much smaller than in the ASK category, since 15:00 follows closely with 3823 comments.
Based on the data above, I would recommend that a person who wants to post on Hacker News do so at 15:00.
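The hourly totals discussed above can also be produced with a single `groupby`, shown here as a minimal sketch on made-up data. The `sample` frame and its integer `hour` column are assumptions for illustration; the notebook itself stores the hour as a string and loops over the 24 hours.

```python
import pandas as pd

# Hypothetical posts with an integer hour and a comment count.
sample = pd.DataFrame({
    "hour": [15, 15, 13, 2],
    "num_comments": [10, 5, 7, 1],
})

# Sum the comments within each hour and rank the hours by that total.
totals = (
    sample.groupby("hour")["num_comments"]
    .sum()
    .sort_values(ascending=False)
)

busiest_hour = totals.index[0]
print(totals)
```

Because totals grow with the number of posts in an hour, a busy hour can lead this ranking even if its individual posts are not unusually popular.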
# Total points on Ask posts for each hour of the day (0-23).
points_dic = {}
for h in range(0, 24):
    # 'hour' holds strings (from str.extract), so compare against str(h)
    new_df = ask_df[ask_df['hour'] == str(h)]
    sum_points = new_df['num_points'].sum()
    points_dic[h] = sum_points
items = points_dic.items()
points_list = list(items)
points_df = pd.DataFrame(points_list)
points_df.columns = ['Hour', 'Total Points']
points_df.sort_values('Total Points', ascending=False, inplace=True)
points_df
index | Hour | Total Points |
---|---|---|
15 | 15 | 13991 |
13 | 13 | 7972 |
17 | 17 | 7236 |
18 | 18 | 6872 |
16 | 16 | 6088 |
14 | 14 | 5413 |
21 | 21 | 5051 |
19 | 19 | 4993 |
12 | 12 | 4713 |
20 | 20 | 4503 |
10 | 10 | 3800 |
22 | 22 | 3599 |
2 | 2 | 2945 |
11 | 11 | 2870 |
23 | 23 | 2852 |
0 | 0 | 2837 |
8 | 8 | 2756 |
1 | 1 | 2673 |
4 | 4 | 2667 |
3 | 3 | 2559 |
5 | 5 | 2047 |
7 | 7 | 2041 |
6 | 6 | 2038 |
9 | 9 | 1768 |
# Total points on Show posts for each hour of the day (0-23).
points_dic = {}
for h in range(0, 24):
    # 'hour' holds strings (from str.extract), so compare against str(h)
    new_df = show_df[show_df['hour'] == str(h)]
    sum_points = new_df['num_points'].sum()
    points_dic[h] = sum_points
items = points_dic.items()
points_list = list(items)
points_df = pd.DataFrame(points_list)
points_df.columns = ['Hour', 'Total Points']
points_df.sort_values('Total Points', ascending=False, inplace=True)
points_df
index | Hour | Total Points |
---|---|---|
15 | 15 | 11653 |
16 | 16 | 11494 |
12 | 12 | 10790 |
17 | 17 | 10722 |
14 | 14 | 10511 |
13 | 13 | 10393 |
18 | 18 | 9951 |
19 | 19 | 8930 |
11 | 11 | 7743 |
20 | 20 | 6951 |
21 | 21 | 5995 |
23 | 23 | 5061 |
22 | 22 | 5036 |
8 | 8 | 4643 |
10 | 10 | 4307 |
0 | 0 | 4303 |
9 | 9 | 3764 |
7 | 7 | 3304 |
6 | 6 | 3072 |
1 | 1 | 2933 |
2 | 2 | 2765 |
4 | 4 | 2752 |
3 | 3 | 2168 |
5 | 5 | 1824 |
As with the comments, 15:00 leads in total points. In the ASK posts it leads with 13991 points, about 1.8 times the total of the next hour (13:00, with 7972). In the SHOW category it leads with 11653 points, narrowly ahead of 16:00 with 11494.
# Average comments per Ask post for each hour of the day (0-23).
comment_dic = {}
for h in range(0, 24):
    new_df = ask_df[ask_df['hour'] == str(h)]
    sum_comment = new_df['num_comments'].sum()
    num_time = new_df.shape[0]  # number of posts created in this hour
    comment_dic[h] = [sum_comment, num_time]
items = comment_dic.items()
comment_list = list(items)
comment_df = pd.DataFrame(comment_list)
comment_df.columns = ['Hour', 'Total Comments']
comment_df.sort_values('Total Comments', ascending=False, inplace=True)
# Unpack the [total comments, number of posts] pairs into two columns.
l = comment_df['Total Comments'].tolist()
list_one = l
list_1 = []
list_2 = []
for rows in list_one:
    list_1.append(rows[0])
    list_2.append(rows[1])
comment_df['Total_comments'] = list_1
comment_df['Total_users'] = list_2  # despite the name, this is the post count
comment_df['average_comments'] = comment_df['Total_comments'].astype(float) / comment_df['Total_users'].astype(float)
comment_df = comment_df.sort_values('average_comments', ascending=False)
comment_df[['Hour', "Total Comments", "average_comments"]]
index | Hour | Total Comments | average_comments |
---|---|---|---|
15 | 15 | [18525, 651] | 28.456221 |
13 | 13 | [7242, 447] | 16.201342 |
12 | 12 | [4271, 350] | 12.202857 |
2 | 2 | [2997, 270] | 11.100000 |
10 | 10 | [3018, 287] | 10.515679 |
4 | 4 | [2372, 246] | 9.642276 |
14 | 14 | [4985, 523] | 9.531549 |
17 | 17 | [5629, 598] | 9.413043 |
8 | 8 | [2378, 260] | 9.146154 |
11 | 11 | [2798, 315] | 8.882540 |
22 | 22 | [3369, 383] | 8.796345 |
5 | 5 | [1838, 210] | 8.752381 |
20 | 20 | [4470, 516] | 8.662791 |
21 | 21 | [4500, 520] | 8.653846 |
3 | 3 | [2191, 274] | 7.996350 |
16 | 16 | [4601, 586] | 7.851536 |
18 | 18 | [4874, 626] | 7.785942 |
0 | 0 | [2265, 301] | 7.524917 |
19 | 19 | [4139, 562] | 7.364769 |
1 | 1 | [2091, 286] | 7.311189 |
23 | 23 | [2467, 348] | 7.089080 |
7 | 7 | [1585, 227] | 6.982379 |
6 | 6 | [1589, 238] | 6.676471 |
9 | 9 | [1477, 224] | 6.593750 |
# Average comments per Show post for each hour of the day (0-23).
comment_dic = {}
for h in range(0, 24):
    new_df = show_df[show_df['hour'] == str(h)]
    sum_comment = new_df['num_comments'].sum()
    num_time = new_df.shape[0]  # number of posts created in this hour
    comment_dic[h] = [sum_comment, num_time]
items = comment_dic.items()
comment_list = list(items)
comment_df = pd.DataFrame(comment_list)
comment_df.columns = ['Hour', 'Total Comments']
comment_df.sort_values('Total Comments', ascending=False, inplace=True)
# Unpack the [total comments, number of posts] pairs into two columns.
l = comment_df['Total Comments'].tolist()
list_one = l
list_1 = []
list_2 = []
for rows in list_one:
    list_1.append(rows[0])
    list_2.append(rows[1])
comment_df['Total_comments'] = list_1
comment_df['Total_users'] = list_2  # despite the name, this is the post count
comment_df['average_comments'] = comment_df['Total_comments'].astype(float) / comment_df['Total_users'].astype(float)
comment_df = comment_df.sort_values('average_comments', ascending=False)
comment_df[['Hour', "Total Comments", "average_comments"]]
index | Hour | Total Comments | average_comments |
---|---|---|---|
12 | 12 | [3610, 519] | 6.955684 |
7 | 7 | [1576, 237] | 6.649789 |
11 | 11 | [2413, 403] | 5.987593 |
8 | 8 | [1770, 318] | 5.566038 |
14 | 14 | [3844, 700] | 5.491429 |
13 | 13 | [3314, 613] | 5.406199 |
2 | 2 | [1076, 210] | 5.123810 |
4 | 4 | [981, 196] | 5.005102 |
19 | 19 | [2791, 558] | 5.001792 |
18 | 18 | [3243, 661] | 4.906203 |
6 | 6 | [904, 193] | 4.683938 |
16 | 16 | [3771, 806] | 4.678660 |
9 | 9 | [1411, 303] | 4.656766 |
0 | 0 | [1284, 280] | 4.585714 |
15 | 15 | [3823, 836] | 4.572967 |
3 | 3 | [934, 206] | 4.533981 |
23 | 23 | [1443, 320] | 4.509375 |
17 | 17 | [3262, 764] | 4.269634 |
20 | 20 | [2183, 528] | 4.134470 |
21 | 21 | [1759, 432] | 4.071759 |
1 | 1 | [1006, 248] | 4.056452 |
22 | 22 | [1451, 380] | 3.818421 |
10 | 10 | [1228, 325] | 3.778462 |
5 | 5 | [591, 171] | 3.456140 |
# Average points per Ask post for each hour of the day (0-23).
point_dic = {}
for h in range(0, 24):
    new_df = ask_df[ask_df['hour'] == str(h)]
    sum_point = new_df['num_points'].sum()
    num_time = new_df.shape[0]  # number of posts created in this hour
    point_dic[h] = [sum_point, num_time]
items = point_dic.items()
point_list = list(items)
point_df = pd.DataFrame(point_list)
point_df.columns = ['Hour', 'Total Points']
point_df.sort_values('Total Points', ascending=False, inplace=True)
# Unpack the [total points, number of posts] pairs into two columns.
l = point_df['Total Points'].tolist()
list_one = l
list_1 = []
list_2 = []
for rows in list_one:
    list_1.append(rows[0])
    list_2.append(rows[1])
point_df['Total_points'] = list_1
point_df['Total_users'] = list_2  # despite the name, this is the post count
point_df['average_points'] = point_df['Total_points'].astype(float) / point_df['Total_users'].astype(float)
point_df = point_df.sort_values('average_points', ascending=False)
point_df[['Hour', "Total Points", "average_points"]]
point_df
index | Hour | Total Points | Total_points | Total_users | average_points |
---|---|---|---|---|---|
15 | 15 | [13991, 651] | 13991 | 651 | 21.491551 |
13 | 13 | [7972, 447] | 7972 | 447 | 17.834452 |
12 | 12 | [4713, 350] | 4713 | 350 | 13.465714 |
10 | 10 | [3800, 287] | 3800 | 287 | 13.240418 |
17 | 17 | [7236, 598] | 7236 | 598 | 12.100334 |
18 | 18 | [6872, 626] | 6872 | 626 | 10.977636 |
2 | 2 | [2945, 270] | 2945 | 270 | 10.907407 |
4 | 4 | [2667, 246] | 2667 | 246 | 10.841463 |
8 | 8 | [2756, 260] | 2756 | 260 | 10.600000 |
16 | 16 | [6088, 586] | 6088 | 586 | 10.389078 |
14 | 14 | [5413, 523] | 5413 | 523 | 10.349904 |
5 | 5 | [2047, 210] | 2047 | 210 | 9.747619 |
21 | 21 | [5051, 520] | 5051 | 520 | 9.713462 |
0 | 0 | [2837, 301] | 2837 | 301 | 9.425249 |
22 | 22 | [3599, 383] | 3599 | 383 | 9.396867 |
1 | 1 | [2673, 286] | 2673 | 286 | 9.346154 |
3 | 3 | [2559, 274] | 2559 | 274 | 9.339416 |
11 | 11 | [2870, 315] | 2870 | 315 | 9.111111 |
7 | 7 | [2041, 227] | 2041 | 227 | 8.991189 |
19 | 19 | [4993, 562] | 4993 | 562 | 8.884342 |
20 | 20 | [4503, 516] | 4503 | 516 | 8.726744 |
6 | 6 | [2038, 238] | 2038 | 238 | 8.563025 |
23 | 23 | [2852, 348] | 2852 | 348 | 8.195402 |
9 | 9 | [1768, 224] | 1768 | 224 | 7.892857 |
# Average points per Show post for each hour of the day (0-23).
point_dic = {}
for h in range(0, 24):
    new_df = show_df[show_df['hour'] == str(h)]
    sum_point = new_df['num_points'].sum()
    num_time = new_df.shape[0]  # number of posts created in this hour
    point_dic[h] = [sum_point, num_time]
items = point_dic.items()
point_list = list(items)
point_df = pd.DataFrame(point_list)
point_df.columns = ['Hour', 'Total Points']
point_df.sort_values('Total Points', ascending=False, inplace=True)
# Unpack the [total points, number of posts] pairs into two columns.
l = point_df['Total Points'].tolist()
list_one = l
list_1 = []
list_2 = []
for rows in list_one:
    list_1.append(rows[0])
    list_2.append(rows[1])
point_df['Total_points'] = list_1
point_df['Total_users'] = list_2  # despite the name, this is the post count
point_df['average_points'] = point_df['Total_points'].astype(float) / point_df['Total_users'].astype(float)
point_df = point_df.sort_values('average_points', ascending=False)
point_df[['Hour', "Total Points", "average_points"]]
point_df
index | Hour | Total Points | Total_points | Total_users | average_points |
---|---|---|---|---|---|
12 | 12 | [10790, 519] | 10790 | 519 | 20.789981 |
11 | 11 | [7743, 403] | 7743 | 403 | 19.213400 |
13 | 13 | [10393, 613] | 10393 | 613 | 16.954323 |
19 | 19 | [8930, 558] | 8930 | 558 | 16.003584 |
6 | 6 | [3072, 193] | 3072 | 193 | 15.917098 |
23 | 23 | [5061, 320] | 5061 | 320 | 15.815625 |
0 | 0 | [4303, 280] | 4303 | 280 | 15.367857 |
18 | 18 | [9951, 661] | 9951 | 661 | 15.054463 |
14 | 14 | [10511, 700] | 10511 | 700 | 15.015714 |
8 | 8 | [4643, 318] | 4643 | 318 | 14.600629 |
16 | 16 | [11494, 806] | 11494 | 806 | 14.260546 |
4 | 4 | [2752, 196] | 2752 | 196 | 14.040816 |
17 | 17 | [10722, 764] | 10722 | 764 | 14.034031 |
7 | 7 | [3304, 237] | 3304 | 237 | 13.940928 |
15 | 15 | [11653, 836] | 11653 | 836 | 13.938995 |
21 | 21 | [5995, 432] | 5995 | 432 | 13.877315 |
22 | 22 | [5036, 380] | 5036 | 380 | 13.252632 |
10 | 10 | [4307, 325] | 4307 | 325 | 13.252308 |
2 | 2 | [2765, 210] | 2765 | 210 | 13.166667 |
20 | 20 | [6951, 528] | 6951 | 528 | 13.164773 |
9 | 9 | [3764, 303] | 3764 | 303 | 12.422442 |
1 | 1 | [2933, 248] | 2933 | 248 | 11.826613 |
5 | 5 | [1824, 171] | 1824 | 171 | 10.666667 |
3 | 3 | [2168, 206] | 2168 | 206 | 10.524272 |
The hour with the highest average points on the ASK posts is 15:00, with about 21 points per post (21.49). In other words, posts made in the 3:00 p.m. hour received about 21 points each on average.
In the SHOW posts category the leading hour is 12:00 noon, with an average of about 21 points (20.79).
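The per-hour averages above are a total divided by a post count. A compact way to get all three numbers at once, sketched here on made-up data, is named aggregation on a `groupby`. The `sample` frame and the output column names are hypothetical and chosen only for this example.

```python
import pandas as pd

# Hypothetical posts with an integer hour and a points score.
sample = pd.DataFrame({
    "hour": [15, 15, 12, 12, 12],
    "num_points": [30, 10, 3, 6, 9],
})

# For each hour: total points, number of posts, and points per post.
per_hour = (
    sample.groupby("hour")["num_points"]
    .agg(total_points="sum", posts="count", average_points="mean")
    .sort_values("average_points", ascending=False)
)

print(per_hour)
```

Ranking by the average rather than the total removes the advantage that busy hours get simply from having more posts.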
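In this analysis, we looked at posts from Hacker News, focusing only on posts whose titles start with ASK HN or SHOW HN. ASK HN posts received more comments than SHOW HN posts (10.35 against 4.87 per post on average), while SHOW HN posts collected more points in total (151065 against 104284, summing the hourly tables above). The single best hour, however, belongs to ASK HN: posts created at 15:00 averaged 28.46 comments and 21.49 points, the highest averages of any hour in either category. We can therefore conclude that the best topic to post about is one that starts with the words ASK HN and the best time to do it is 3:00 p.m., since this combination has the highest average number of comments and points and, in turn, the most engagement.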