EXPLORING HACKER NEWS SITE¶

Introduction¶

Hacker News (HN) is a social news website run by Y Combinator that focuses on computer science and entrepreneurship. Users submit posts, which other users can upvote (earning the post points) and comment on. In this project we analyse a dataset of HN posts to find the number of posts and comments by hour of post creation, the average number of points and comments by hour of posting, and the posts that attracted the most points.

Objective¶

Our aim is to analyse the data available from the Hacker News site and recommend the best hour to post and the best topic (here, the type of post: Ask HN or Show HN).

To achieve this I will use the Python libraries pandas, matplotlib and NumPy.

1. Importing necessary libraries¶

In [1]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:

hn_df=pd.read_csv("HN_posts_year_to_Sep_26_2016.csv")
hn_df

Out[2]:

	id	title	url	num_points	num_comments	author	created_at
0	12579008	You have two days to comment if you want stem ...	http://www.regulations.gov/document?D=FDA-2015...	1	0	altstar	9/26/2016 3:26
1	12579005	SQLAR the SQLite Archiver	https://www.sqlite.org/sqlar/doc/trunk/README.md	1	0	blacksqr	9/26/2016 3:24
2	12578997	What if we just printed a flatscreen televisio...	https://medium.com/vanmoof/our-secrets-out-f21...	1	0	pavel_lishin	9/26/2016 3:19
3	12578989	algorithmic music	http://cacm.acm.org/magazines/2011/7/109891-al...	1	0	poindontcare	9/26/2016 3:16
4	12578979	How the Data Vault Enables the Next-Gen Data W...	https://www.talend.com/blog/2016/05/12/talend-...	1	0	markgainor1	9/26/2016 3:14
...	...	...	...	...	...	...	...
293114	10176919	Ask HN: What is/are your favorite quote(s)?	NaN	15	20	kumarski	9/6/2015 6:02
293115	10176917	Attention and awareness in stage magic: turnin...	http://people.cs.uchicago.edu/~luitien/nrn2473...	14	0	stakent	9/6/2015 6:01
293116	10176908	Dying vets fuck you letter (2013)	http://dangerousminds.net/comments/dying_vets_...	10	2	mycodebreaks	9/6/2015 5:56
293117	10176907	PHP 7 Coolest Features: Space Ships, Type Hint...	https://www.zend.com/en/resources/php-7	2	0	Garbage	9/6/2015 5:55
293118	10176903	Toyota Establishes Research Centers with MIT a...	http://newsroom.toyota.co.jp/en/detail/9233109/	4	0	tim_sw	9/6/2015 5:50

293119 rows × 7 columns

2. Isolate 'Ask' and 'Show' posts¶

2.1 Separating ASK and SHOW posts from OTHERS¶

In [3]:

ask_post = []
show_post = []
other_post = []

# Sort every title into one of three lists based on its first word
for title in hn_df['title']:
    if title.startswith('Ask'):
        ask_post.append(title)
    elif title.startswith('Show'):
        show_post.append(title)
    else:
        other_post.append(title)

In [4]:

print(len(other_post))
print(len(ask_post))
print (len(show_post))

273664
9248
10207

2.2 Total number of posts in each category¶

In the data frame, 9248 posts have titles that start with Ask (the Ask HN posts), 10207 start with Show (the Show HN posts), and the remaining 273664 fall into the Other category, meaning any post that belongs to neither of the first two groups. How the type of post relates to the number of points and comments it receives is analysed in the sections that follow.
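
The same split can also be written with pandas string methods instead of a loop. The sketch below is one way to do it, assuming a small made-up sample in place of hn_df; the names sample, title_lower and category are hypothetical. Note that it matches the full prefixes Ask HN and Show HN (ignoring case), which is stricter than the check used in the cell above, so on the real data its counts may differ from those reported.

```python
import pandas as pd

# Hypothetical sample titles standing in for hn_df['title']
sample = pd.DataFrame({
    "title": [
        "Ask HN: What's your favorite HN post?",
        "Show HN: Primitive Pictures",
        "SQLAR the SQLite Archiver",
        "Asking the right questions",
    ]
})

# Lower-case once so the prefix test ignores case
title_lower = sample["title"].str.lower()

# Start with everything as 'other', then overwrite the two special cases
sample["category"] = "other"
sample.loc[title_lower.str.startswith("ask hn"), "category"] = "ask"
sample.loc[title_lower.str.startswith("show hn"), "category"] = "show"

category_counts = sample["category"].value_counts()
print(category_counts)
```

The last sample title starts with "Ask" but is not an Ask HN post, so it lands in the other group here, whereas a plain startswith('Ask') test would count it as an ASK post.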

2.3 ASK posts¶

In [5]:

# Keep only the posts whose title starts with 'Ask', then rank them by points
ask_df = hn_df[hn_df['title'].str.startswith('Ask')].copy()
ask_df.sort_values('num_points', ascending=False, inplace=True)
ask_df[['title', 'num_points', 'num_comments']].head(20)

Out[5]:

	title	num_points	num_comments
145256	Ask HN: How much do you make at Amazon? Here i...	1213	691
130553	Ask HN: Pick startups for YC to fund	867	286
11218	Ask HN: Is web programming a series of hacks o...	822	660
9004	Ask HN: What's your favorite HN post?	691	138
32876	Ask HN: Is it possible to run your own mail se...	648	299
86558	Ask HN: Who is hiring? (June 2016)	644	1007
43916	Ask HN: What was your why didn't I start doing...	630	767
63744	Ask HN: Who is hiring? (July 2016)	566	898
109928	Ask HN: Who is hiring? (May 2016)	553	937
42275	Ask HN: Who is hiring? (August 2016)	534	947
18411	Ask HN: Who is hiring? (September 2016)	521	910
263117	Ask HN: How to Be a Good Technical Lead?	520	174
249406	Ask HN: Who is hiring? (November 2015)	512	896
64356	Ask HN: Just got an innocent man out of prison...	510	199
46837	Ask HN: Why don't companies hire programmers f...	498	348
249731	Ask HN: New attempt at mobile markup keep or ...	497	285
121694	Ask HN: Describe your first enterprise sale	496	96
158995	Ask HN: Who is hiring? (March 2016)	488	825
183255	Ask HN: Who is hiring? (February 2016)	455	778
45530	Ask HN: Anonymous person sent proof of SSH acc...	450	231

2.3.1 Average number of comments on ASK posts¶

In [6]:

total_comments=0
for comments in ask_df['num_comments']:
    total_comments=total_comments+ int(comments)
    
average_comments=round(total_comments/ask_df.shape[0],2)
print('The ask_post has '+ str(average_comments)+ ' comments on average')
total_comments

The ask_post has 10.35 comments on average

Out[6]:

95671

2.3.2 RATIO OF POINTS TO COMMENTS IN 'ASK' POSTS¶

In [7]:

relativity=(ask_df["num_points"].sum()) / (ask_df["num_comments"].sum())
relativity

Out[7]:

1.0900272809942406
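
The average and the ratio computed in the two cells above can each be obtained in one line with built-in pandas methods. This is a minimal sketch on a made-up three-row frame (ask_sample is hypothetical, not the real ask_df), shown only to make the two formulas explicit: average comments is total comments divided by number of posts, and the ratio is total points divided by total comments.

```python
import pandas as pd

# Hypothetical stand-in for ask_df with the two numeric columns used above
ask_sample = pd.DataFrame({
    "num_points": [12, 3, 9],
    "num_comments": [6, 0, 6],
})

# Mean comments per post: (6 + 0 + 6) / 3
average_comments = round(ask_sample["num_comments"].mean(), 2)

# Points earned per comment: (12 + 3 + 9) / (6 + 0 + 6)
points_per_comment = ask_sample["num_points"].sum() / ask_sample["num_comments"].sum()

print(average_comments)    # 4.0
print(points_per_comment)  # 2.0
```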

2.4 SHOW posts¶

In [8]:

# Keep only the posts whose title starts with 'Show', then rank them by points
show_df = hn_df[hn_df['title'].str.startswith('Show')].copy()
show_df.sort_values('num_points', ascending=False, inplace=True)
show_df[['title', 'num_points', 'num_comments']].head(20)

Out[8]:

	title	num_points	num_comments
46219	Show HN: Web Design in 4 minutes	1624	152
289195	Show HN: Make a programmable mirror	1172	136
214398	Show HN: Open Hunt an open and community-run ...	1093	180
35517	Show HN: Generating fantasy maps an interacti...	1004	64
4362	Show HN: Primitive Pictures	893	169
118299	Show HN: I made an interactive Bootstrap 4 che...	847	35
83116	Show HN: New calendar app idea	825	197
176162	Show HN: I've been writing daily TILs for a year	819	150
251128	Show HN: Twitch Installs Arch Linux A coopera...	802	250
78072	Show HN: I made a database of remote companies	747	114
229163	Show HN: Something pointless I made	747	102
278169	Show HN: I spent a year making an electro-mech...	681	103
241787	Show HN: Parinfer a simpler way to write Lisp	674	134
24471	Show HN: Carbide A New Programming Environment	628	136
174950	Show HN: Htop 2.0 released, now cross-platform	613	153
282857	Show HN: %%30%30: A Game	608	89
282903	Show HN: Hacker News Simulator	572	163
134197	Show HN: What every browser knows about you	553	206
265877	Show HN: ExposÃ© a static site generator for ...	548	73
41144	Show HN: Noms A new decentralized database ba...	508	167

2.4.1 TOTAL AND AVERAGE NUMBER OF COMMENTS ON THE 'SHOW' POSTS¶

In [9]:

total_comments=0
for comments in show_df['num_comments']:
    total_comments=total_comments+ int(comments)
    
average_comments=round(total_comments/show_df.shape[0],2)
print('The show_post has '+ str(average_comments)+ ' comments on average')
total_comments

The show_post has 4.87 comments on average

Out[9]:

49668

2.4.2 RATIO OF POINTS TO COMMENTS IN 'SHOW' POSTS¶

In [10]:

relativity=(show_df["num_points"].sum()) / (show_df["num_comments"].sum())
relativity

Out[10]:

3.041495530321334

2.5 OBSERVATION ON THE MEAN COMMENTS OF THE TWO POST CATEGORIES AND THE POINTS-TO-COMMENTS RATIO¶

ASK posts have a total of 95671 comments, or 10.35 per post on average. SHOW posts have a total of 49668 comments, or 4.87 per post on average. Based on these findings, a post whose title starts with ASK HN is likely to attract more comments than one whose title starts with SHOW HN. For someone who wants comments, I would therefore recommend a post that starts with ASK HN.

POINTS-TO-COMMENTS RATIO

The ratio computed above is total points divided by total comments. ASK HN posts earn about 1.09 points for every comment, whereas SHOW HN posts earn about 3.04 points for every comment. SHOW HN posts therefore tend to be upvoted more than they are discussed, while ASK HN posts draw points and comments in nearly equal measure.

3. Number of posts & comments by hour created¶

3.1 Number of ASK posts by hour created¶

In [11]:

hourpattern=r'[0-9]{1,2}?\S[0-9]{1,2}?\S[0-9]{4}\W([0-9]{1,2}?)\S[0-9]{2}'
ask_df['hour']=ask_df['created_at'].str.extract(hourpattern)
ask_df

Out[11]:

	id	title	url	num_points	num_comments	author	created_at	hour
145256	11312984	Ask HN: How much do you make at Amazon? Here i...	NaN	1213	691	boren_ave11	3/18/2016 16:43	16
130553	11440627	Ask HN: Pick startups for YC to fund	NaN	867	286	dang	4/6/2016 18:19	18
11218	12477190	Ask HN: Is web programming a series of hacks o...	NaN	822	660	barefootcoder	9/12/2016 4:29	4
9004	12496558	Ask HN: What's your favorite HN post?	NaN	691	138	rkhraishi	9/14/2016 13:20	13
32876	12282231	Ask HN: Is it possible to run your own mail se...	NaN	648	299	jdmoreira	8/13/2016 17:25	17
...	...	...	...	...	...	...	...	...
49154	12141044	Ask HN: What would happen if AI would write Pr...	NaN	1	1	holaboyperu	7/22/2016 0:42	0
184957	10998232	Ask HN: Do you know/use speaker recognition so...	NaN	1	0	alistproducer2	1/29/2016 20:27	20
49136	12141157	Ask HN: How to Peacock my Resume	NaN	1	1	thirstysusrando	7/22/2016 1:26	1
185081	10997158	Ask HN: Best Android phone today?	NaN	1	2	simonebrunozzi	1/29/2016 18:15	18
193157	10936916	Ask HN: Whats a good book to learn Startup 'Ma...	NaN	1	2	nns	1/20/2016 9:36	9

9248 rows × 8 columns

In [12]:

ask_df['hour'].value_counts(ascending=False)

Out[12]:

15    651
18    626
17    598
16    586
19    562
14    523
21    520
20    516
13    447
22    383
12    350
23    348
11    315
0     301
10    287
1     286
3     274
2     270
8     260
4     246
6     238
7     227
9     224
5     210
Name: hour, dtype: int64
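
As an alternative to the regular expression, the hour can be read by parsing created_at as a timestamp. The sketch below assumes the month/day/year hour:minute layout seen in the table above and uses three made-up rows (sample is hypothetical). It is not the method used in this notebook, only a possible substitute.

```python
import pandas as pd

# Hypothetical rows in the same 'M/D/YYYY H:MM' layout as created_at
sample = pd.DataFrame({
    "created_at": ["9/26/2016 3:26", "3/18/2016 16:43", "7/22/2016 0:42"]
})

# Parse the strings into timestamps, then pull out the hour component
created = pd.to_datetime(sample["created_at"], format="%m/%d/%Y %H:%M")
sample["hour"] = created.dt.hour

print(sample)
```

One design difference: this produces integer hours, while the regular expression produces strings such as '16'. Later filters written as ask_df['hour']==str(h) would need to compare against h directly if this approach were used.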

3.2 Number of SHOW posts by hour created¶

In [13]:

hourpattern= r'[0-9]{1,2}?\S[0-9]{1,2}\S[0-9]{4}\W([0-9]{1,2})\S[0-9]{1,2}?'
show_df['hour'] = show_df['created_at'].str.extract(hourpattern)
show_df

Out[13]:

	id	title	url	num_points	num_comments	author	created_at	hour
46219	12166687	Show HN: Web Design in 4 minutes	http://jgthms.com/web-design-in-4-minutes/	1624	152	bbx	7/26/2016 16:17	16
289195	10204018	Show HN: Make a programmable mirror	https://github.com/HannahMitt/HomeMirror	1172	136	hannahmitt	9/11/2015 14:58	14
214398	10759879	Show HN: Open Hunt an open and community-run ...	https://www.openhunt.co	1093	180	mhurwi	12/18/2015 18:12	18
35517	12260794	Show HN: Generating fantasy maps an interacti...	http://mewo2.com/notes/terrain/	1004	64	mewo2	8/10/2016 11:17	11
4362	12539109	Show HN: Primitive Pictures	https://github.com/fogleman/primitive	893	169	fogleman	9/20/2016 12:55	12
...	...	...	...	...	...	...	...	...
38865	12231435	Show HN: A tool that recycles your best social...	http://recurpost.com	1	1	dinwal	8/5/2016 11:32	11
184997	10997980	Show HN: Silence your phone	http://georing.strikingly.com/	1	0	dwursteisen	1/29/2016 19:50	19
184942	10998349	Show HN: Multiple invitations on top of devise...	https://github.com/RoxasShadow/devise_invitations	1	0	RoxasShadow	1/29/2016 20:44	20
84569	11830999	Show HN: Noti 2.2.0: desktop/mobile notificati...	https://github.com/variadico/noti/releases/tag...	1	0	ahci8e	6/3/2016 15:48	15
134272	11406809	Show HN: Feather A Python Micro-Web Framework...	https://github.com/Max00355/Feather/blob/maste...	1	1	max0563	4/1/2016 17:41	17

10207 rows × 8 columns

In [14]:

show_df['hour'].value_counts(ascending=False)

Out[14]:

15    836
16    806
17    764
14    700
18    661
13    613
19    558
20    528
12    519
21    432
11    403
22    380
10    325
23    320
8     318
9     303
0     280
1     248
7     237
2     210
3     206
4     196
6     193
5     171
Name: hour, dtype: int64

In [15]:

ask_df[['num_comments','hour']]

Out[15]:

	num_comments	hour
145256	691	16
130553	286	18
11218	660	4
9004	138	13
32876	299	17
...	...	...
49154	1	0
184957	0	20
49136	1	1
185081	2	18
193157	2	9

9248 rows × 2 columns

3.3 Number of comments on ASK posts by hour created¶

In [16]:

comment_dic={}

for h in range(0,24):
    new_df=ask_df[ask_df['hour']==str(h)]
    sum_comment=new_df['num_comments'].sum()
    comment_dic[h]=sum_comment
    
         
items= comment_dic.items()
comment_list= list(items)
comment_df=pd.DataFrame(comment_list)
comment_df.columns=['Hour','Total Comments']
comment_df.sort_values('Total Comments',ascending= False, inplace=True)
comment_df

Out[16]:

	Hour	Total Comments
15	15	18525
13	13	7242
17	17	5629
14	14	4985
18	18	4874
16	16	4601
21	21	4500
20	20	4470
12	12	4271
19	19	4139
22	22	3369
10	10	3018
2	2	2997
11	11	2798
23	23	2467
8	8	2378
4	4	2372
0	0	2265
3	3	2191
1	1	2091
5	5	1838
6	6	1589
7	7	1585
9	9	1477
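
The hour-by-hour loop above amounts to a group-and-sum, which pandas offers directly. This is a minimal sketch on made-up data (sample and total_by_hour are hypothetical names), assuming the hour column holds strings as it does in ask_df.

```python
import pandas as pd

# Hypothetical posts: two created at 15:00, one at 13:00, one at 2:00
sample = pd.DataFrame({
    "hour": ["15", "15", "13", "2"],
    "num_comments": [10, 5, 7, 1],
})

# Sum the comments within each hour, largest total first
total_by_hour = (
    sample.groupby("hour")["num_comments"]
    .sum()
    .sort_values(ascending=False)
)

print(total_by_hour)
```

Unlike the loop over range(0,24), groupby only returns hours that actually occur in the data, so an hour with no posts would be missing rather than listed with a total of 0.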

3.4 Number of comments on SHOW posts by hour created.¶

In [17]:

comment_dic={}

for h in range(0,24):
    new_df=show_df[show_df['hour']==str(h)]
    sum_comment=new_df['num_comments'].sum()
    comment_dic[h]=sum_comment
    
         
items= comment_dic.items()
comment_list= list(items)
comment_df=pd.DataFrame(comment_list)
comment_df.columns=['Hour','Total Comments']
comment_df.sort_values('Total Comments',ascending= False, inplace=True)
comment_df

Out[17]:

	Hour	Total Comments
14	14	3844
15	15	3823
16	16	3771
12	12	3610
13	13	3314
17	17	3262
18	18	3243
19	19	2791
11	11	2413
20	20	2183
8	8	1770
21	21	1759
7	7	1576
22	22	1451
23	23	1443
9	9	1411
0	0	1284
10	10	1228
2	2	1076
1	1	1006
4	4	981
3	3	934
6	6	904
5	5	591

3.5 OBSERVATION ON THE NUMBER OF POSTS AND COMMENTS BY HOUR CREATED¶

From the data above, 1500hrs is the hour in which the most posts are created in both the SHOW and ASK categories (836 and 651 posts respectively).

For comments, 1500hrs also leads the ASK posts with a total of 18525 comments, more than twice the total of the next hour (1300hrs, with 7242). For the SHOW posts, however, 1400hrs leads with a total of 3844 comments. The gap here is much smaller than in the ASK category, since 1500hrs follows closely with 3823 comments.

Based on these totals, I would recommend that a person who wants to post on Hacker News do so at 1500hrs.
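
A bar chart makes the gap between 1500hrs and the other hours easier to see. The sketch below plots only the four largest ASK comment totals copied from the table in 3.3; the variable names and chart labels are my own, and the Agg backend line is an assumption for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Four largest ASK comment totals by hour, copied from the table in 3.3
top_ask_comments = pd.Series(
    {15: 18525, 13: 7242, 17: 5629, 14: 4985}, name="Total Comments"
).sort_index()

fig, ax = plt.subplots(figsize=(6, 3))
# Use the hours as category labels so the bars are evenly spaced
ax.bar(top_ask_comments.index.astype(str), top_ask_comments.values)
ax.set_xlabel("Hour of post creation")
ax.set_ylabel("Total comments")
ax.set_title("ASK posts: total comments by hour created")
fig.tight_layout()
```

Inside a notebook the figure is displayed automatically; elsewhere it can be written to disk with fig.savefig.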

4. Number of points by hour of post creation¶

4.1 Number of points on ASK post¶

In [18]:

points_dic={}

for h in range(0,24):
    new_df=ask_df[ask_df['hour']==str(h)]
    sum_points=new_df['num_points'].sum()
    points_dic[h]=sum_points
    
         
items= points_dic.items()
points_list= list(items)
points_df=pd.DataFrame(points_list)
points_df.columns=['Hour','Total Points']
points_df.sort_values('Total Points',ascending= False, inplace=True)
points_df

Out[18]:

	Hour	Total Points
15	15	13991
13	13	7972
17	17	7236
18	18	6872
16	16	6088
14	14	5413
21	21	5051
19	19	4993
12	12	4713
20	20	4503
10	10	3800
22	22	3599
2	2	2945
11	11	2870
23	23	2852
0	0	2837
8	8	2756
1	1	2673
4	4	2667
3	3	2559
5	5	2047
7	7	2041
6	6	2038
9	9	1768

4.2 Number of points on SHOW posts¶

In [19]:

points_dic={}

for h in range(0,24):
    new_df=show_df[show_df['hour']==str(h)]
    sum_points=new_df['num_points'].sum()
    points_dic[h]=sum_points
    
         
items= points_dic.items()
points_list= list(items)
points_df=pd.DataFrame(points_list)
points_df.columns=['Hour','Total Points']
points_df.sort_values('Total Points',ascending= False, inplace=True)
points_df

Out[19]:

	Hour	Total Points
15	15	11653
16	16	11494
12	12	10790
17	17	10722
14	14	10511
13	13	10393
18	18	9951
19	19	8930
11	11	7743
20	20	6951
21	21	5995
23	23	5061
22	22	5036
8	8	4643
10	10	4307
0	0	4303
9	9	3764
7	7	3304
6	6	3072
1	1	2933
2	2	2765
4	4	2752
3	3	2168
5	5	1824

4.3 OBSERVATION ON THE NUMBER OF POINTS BY HOUR CREATED.¶

As with comments, the hour that leads in total points is 1500hrs. In the ASK posts it leads with 13991 points, about 1.75 times the total of the hour that follows (1300hrs, with 7972). In the SHOW posts it leads with 11653 points, only slightly ahead of 1600hrs with 11494.

5. Average comments per hour¶

5.1 Average comments per hour on ASK posts¶

In [20]:

comment_dic={}

for h in range(0,24):
    new_df=ask_df[ask_df['hour']==str(h)]
    sum_comment=new_df['num_comments'].sum()
    num_time=new_df.shape[0]
    comment_dic[h]=[sum_comment,num_time]
    
         
items= comment_dic.items()
comment_list= list(items)
comment_df=pd.DataFrame(comment_list)
comment_df.columns=['Hour','Total Comments']
comment_df.sort_values('Total Comments',ascending= False, inplace=True)
l=comment_df['Total Comments'].tolist()
list_one=l
list_1=[]
list_2=[]
for rows in list_one:
    list_1.append(rows[0])
    list_2.append(rows[1])  
comment_df['Total_comments']=list_1
comment_df['Total_users']=list_2
comment_df['average_comments'] = comment_df['Total_comments'].astype(float)/ comment_df['Total_users'].astype(float)
comment_df=comment_df.sort_values('average_comments',ascending=False)
comment_df[['Hour',"Total Comments","average_comments"]]

Out[20]:

	Hour	Total Comments	average_comments
15	15	[18525, 651]	28.456221
13	13	[7242, 447]	16.201342
12	12	[4271, 350]	12.202857
2	2	[2997, 270]	11.100000
10	10	[3018, 287]	10.515679
4	4	[2372, 246]	9.642276
14	14	[4985, 523]	9.531549
17	17	[5629, 598]	9.413043
8	8	[2378, 260]	9.146154
11	11	[2798, 315]	8.882540
22	22	[3369, 383]	8.796345
5	5	[1838, 210]	8.752381
20	20	[4470, 516]	8.662791
21	21	[4500, 520]	8.653846
3	3	[2191, 274]	7.996350
16	16	[4601, 586]	7.851536
18	18	[4874, 626]	7.785942
0	0	[2265, 301]	7.524917
19	19	[4139, 562]	7.364769
1	1	[2091, 286]	7.311189
23	23	[2467, 348]	7.089080
7	7	[1585, 227]	6.982379
6	6	[1589, 238]	6.676471
9	9	[1477, 224]	6.593750
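
The total, the number of posts and the average for each hour can also be produced in a single groupby call. This is a sketch on made-up data, not the code that generated the table above; sample, summary and the column names total_comments, num_posts and average_comments are hypothetical.

```python
import pandas as pd

# Hypothetical posts with string hours, as in ask_df
sample = pd.DataFrame({
    "hour": ["15", "15", "13", "13", "2"],
    "num_comments": [30, 10, 8, 4, 3],
})

# One row per hour: total comments, number of posts, and mean comments per post
summary = (
    sample.groupby("hour")["num_comments"]
    .agg(total_comments="sum", num_posts="count", average_comments="mean")
    .sort_values("average_comments", ascending=False)
)

print(summary)
```

Here num_posts plays the role of the Total_users column in the cell above, which counts the posts created in each hour.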

5.2 Average comments per hour on SHOW posts¶

In [21]:

comment_dic={}

for h in range(0,24):
    new_df=show_df[show_df['hour']==str(h)]
    sum_comment=new_df['num_comments'].sum()
    num_time=new_df.shape[0]
    comment_dic[h]=[sum_comment,num_time]
    
         
items= comment_dic.items()
comment_list= list(items)
comment_df=pd.DataFrame(comment_list)
comment_df.columns=['Hour','Total Comments']
comment_df.sort_values('Total Comments',ascending= False, inplace=True)
l=comment_df['Total Comments'].tolist()
list_one=l
list_1=[]
list_2=[]
for rows in list_one:
    list_1.append(rows[0])
    list_2.append(rows[1])  
comment_df['Total_comments']=list_1
comment_df['Total_users']=list_2
comment_df['average_comments'] = comment_df['Total_comments'].astype(float)/ comment_df['Total_users'].astype(float)
comment_df=comment_df.sort_values('average_comments',ascending=False)
comment_df[['Hour',"Total Comments","average_comments"]]

Out[21]:

	Hour	Total Comments	average_comments
12	12	[3610, 519]	6.955684
7	7	[1576, 237]	6.649789
11	11	[2413, 403]	5.987593
8	8	[1770, 318]	5.566038
14	14	[3844, 700]	5.491429
13	13	[3314, 613]	5.406199
2	2	[1076, 210]	5.123810
4	4	[981, 196]	5.005102
19	19	[2791, 558]	5.001792
18	18	[3243, 661]	4.906203
6	6	[904, 193]	4.683938
16	16	[3771, 806]	4.678660
9	9	[1411, 303]	4.656766
0	0	[1284, 280]	4.585714
15	15	[3823, 836]	4.572967
3	3	[934, 206]	4.533981
23	23	[1443, 320]	4.509375
17	17	[3262, 764]	4.269634
20	20	[2183, 528]	4.134470
21	21	[1759, 432]	4.071759
1	1	[1006, 248]	4.056452
22	22	[1451, 380]	3.818421
10	10	[1228, 325]	3.778462
5	5	[591, 171]	3.456140

6. Average points per hour¶

6.1 Average points per hour on ASK posts¶

In [23]:

point_dic={}

for h in range(0,24):
    new_df=ask_df[ask_df['hour']==str(h)]
    sum_point=new_df['num_points'].sum()
    num_time=new_df.shape[0]
    point_dic[h]=[sum_point,num_time]
    
         
items= point_dic.items()
point_list= list(items)
point_df=pd.DataFrame(point_list)
point_df.columns=['Hour','Total Points']
point_df.sort_values('Total Points',ascending= False, inplace=True)
l=point_df['Total Points'].tolist()
list_one=l
list_1=[]
list_2=[]
for rows in list_one:
    list_1.append(rows[0])
    list_2.append(rows[1])  
point_df['Total_points']=list_1
point_df['Total_users']=list_2
point_df['average_points'] = point_df['Total_points'].astype(float)/ point_df['Total_users'].astype(float)
point_df=point_df.sort_values('average_points',ascending=False)
point_df

Out[23]:

	Hour	Total Points	Total_points	Total_users	average_points
15	15	[13991, 651]	13991	651	21.491551
13	13	[7972, 447]	7972	447	17.834452
12	12	[4713, 350]	4713	350	13.465714
10	10	[3800, 287]	3800	287	13.240418
17	17	[7236, 598]	7236	598	12.100334
18	18	[6872, 626]	6872	626	10.977636
2	2	[2945, 270]	2945	270	10.907407
4	4	[2667, 246]	2667	246	10.841463
8	8	[2756, 260]	2756	260	10.600000
16	16	[6088, 586]	6088	586	10.389078
14	14	[5413, 523]	5413	523	10.349904
5	5	[2047, 210]	2047	210	9.747619
21	21	[5051, 520]	5051	520	9.713462
0	0	[2837, 301]	2837	301	9.425249
22	22	[3599, 383]	3599	383	9.396867
1	1	[2673, 286]	2673	286	9.346154
3	3	[2559, 274]	2559	274	9.339416
11	11	[2870, 315]	2870	315	9.111111
7	7	[2041, 227]	2041	227	8.991189
19	19	[4993, 562]	4993	562	8.884342
20	20	[4503, 516]	4503	516	8.726744
6	6	[2038, 238]	2038	238	8.563025
23	23	[2852, 348]	2852	348	8.195402
9	9	[1768, 224]	1768	224	7.892857

6.2 Average points per hour on SHOW posts¶

In [24]:

point_dic={}

for h in range(0,24):
    new_df=show_df[show_df['hour']==str(h)]
    sum_point=new_df['num_points'].sum()
    num_time=new_df.shape[0]
    point_dic[h]=[sum_point,num_time]
    
         
items= point_dic.items()
point_list= list(items)
point_df=pd.DataFrame(point_list)
point_df.columns=['Hour','Total Points']
point_df.sort_values('Total Points',ascending= False, inplace=True)
l=point_df['Total Points'].tolist()
list_one=l
list_1=[]
list_2=[]
for rows in list_one:
    list_1.append(rows[0])
    list_2.append(rows[1])  
point_df['Total_points']=list_1
point_df['Total_users']=list_2
point_df['average_points'] = point_df['Total_points'].astype(float)/ point_df['Total_users'].astype(float)
point_df=point_df.sort_values('average_points',ascending=False)
point_df

Out[24]:

	Hour	Total Points	Total_points	Total_users	average_points
12	12	[10790, 519]	10790	519	20.789981
11	11	[7743, 403]	7743	403	19.213400
13	13	[10393, 613]	10393	613	16.954323
19	19	[8930, 558]	8930	558	16.003584
6	6	[3072, 193]	3072	193	15.917098
23	23	[5061, 320]	5061	320	15.815625
0	0	[4303, 280]	4303	280	15.367857
18	18	[9951, 661]	9951	661	15.054463
14	14	[10511, 700]	10511	700	15.015714
8	8	[4643, 318]	4643	318	14.600629
16	16	[11494, 806]	11494	806	14.260546
4	4	[2752, 196]	2752	196	14.040816
17	17	[10722, 764]	10722	764	14.034031
7	7	[3304, 237]	3304	237	13.940928
15	15	[11653, 836]	11653	836	13.938995
21	21	[5995, 432]	5995	432	13.877315
22	22	[5036, 380]	5036	380	13.252632
10	10	[4307, 325]	4307	325	13.252308
2	2	[2765, 210]	2765	210	13.166667
20	20	[6951, 528]	6951	528	13.164773
9	9	[3764, 303]	3764	303	12.422442
1	1	[2933, 248]	2933	248	11.826613
5	5	[1824, 171]	1824	171	10.666667
3	3	[2168, 206]	2168	206	10.524272

6.3 OBSERVATION ON THE AVERAGE NUMBER OF POINTS ON SHOW AND ASK POSTS BY CREATION HOUR¶

The hour with the highest average points on the ASK posts is 1500hrs (3:00 p.m.), with about 21.5 points per post. This is an average, so it describes the typical result across all posts created in that hour rather than what any single post is most likely to get.

In the SHOW posts category the leading hour is 1200hrs (noon), with an average of about 20.8 points per post.
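
The leading hour can also be read off programmatically instead of by eye. The sketch below uses the top three averages copied from the tables in 6.1 and 6.2; the Series names are my own.

```python
import pandas as pd

# Top three average points per post by hour, copied from 6.1 and 6.2
ask_avg_points = pd.Series({15: 21.491551, 13: 17.834452, 12: 13.465714})
show_avg_points = pd.Series({12: 20.789981, 11: 19.213400, 13: 16.954323})

# idxmax returns the index label (the hour) of the largest value
for label, averages in [("ASK", ask_avg_points), ("SHOW", show_avg_points)]:
    best_hour = averages.idxmax()
    print(f"{label}: hour {best_hour} with {averages.max():.2f} points per post on average")
```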

7. Conclusion¶

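In this analysis, we analysed posts from Hacker News, focusing only on posts whose titles start with ASK HN or SHOW HN. ASK HN posts received more comments than SHOW HN posts (10.35 against 4.87 per post on average). SHOW HN posts, on the other hand, received more points: summing the hourly totals in section 4 gives 151065 points across 10207 SHOW posts (about 14.8 per post) against 104284 points across 9248 ASK posts (about 11.3 per post). We can therefore conclude that, for someone who wants to start a discussion, the best type of post is one that starts with ASK HN and the best time to create it is 1500hrs (3:00 p.m.), the hour with the highest total and average number of comments and points for ASK posts. SHOW HN posts earned their highest average points when created at 1200hrs (noon).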
