Hacker News is a site started by the startup incubator Y Combinator, where user-submitted stories (known as "posts") are voted and commented upon.
In this project, I analyze submission data and compare the number of comments on "Ask HN" and "Show HN" posts. Then I analyze whether posts created at a certain time receive more comments on average, so that in the end I can find the best hour to create a post on Hacker News from Hong Kong in order to receive the most comments.
from csv import reader
import datetime as dt
opened_file = open('hacker_news.csv')
read_file = reader(opened_file)
hn = list(read_file)
print(hn[:5])
[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]
headers = hn[0]
hn = hn[1:]
print(headers)
print(hn[0])
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']
ask_posts = []
show_posts = []
other_posts = []
for row in hn:
    title = row[1]
    title = title.lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))
print(ask_posts[:5])
print(show_posts[:5])
1744
1162
17194
[['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'], ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'], ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'], ['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20'], ['10394168', 'Ask HN: Someone offered to buy my browser extension from me. What now?', '', '28', '17', 'roykolak', '10/15/2015 16:38']]
[['10627194', 'Show HN: Wio Link ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03'], ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46'], ['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05'], ['12178806', 'Show HN: Webscope Easy way for web developers to communicate with Clients', 'http://webscopeapp.com', '3', '3', 'fastbrick', '7/28/2016 7:11'], ['10872799', 'Show HN: GeoScreenshot Easily test Geo-IP based web pages', 'https://www.geoscreenshot.com/', '1', '9', 'kpsychwave', '1/9/2016 20:45']]
total_ask_comments = 0
for row in ask_posts:
    num_comments = row[4]
    num_comments = int(num_comments)
    total_ask_comments += num_comments
avg_ask_comments = total_ask_comments / len(ask_posts)
print(avg_ask_comments)
total_show_comments = 0
for row in show_posts:
    num_comments = row[4]
    num_comments = int(num_comments)
    total_show_comments += num_comments
# Average over the show posts.
avg_show_comments = total_show_comments / len(show_posts)
print(avg_show_comments)
14.038417431192661
10.31669535283993
The average number of comments is about 14.0 for ask posts and about 10.3 for show posts. So ask posts receive more comments on average.
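Because the two averages are computed by near-identical loops, a small helper can remove the duplication. The following is only a sketch of one possible refactoring, not part of the original analysis: `average_comments` and `sample_ask` are hypothetical names, the two sample rows are copied from the preview output above, and the helper assumes `num_comments` sits at index 4 as it does in this dataset.

```python
def average_comments(posts, comments_index=4):
    """Return the mean number of comments for a list of rows (0.0 if empty)."""
    if not posts:
        return 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# Two rows copied from the preview output, used only as a small demo.
sample_ask = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
]
print(average_comments(sample_ask))  # (6 + 29) / 2 = 17.5
```

Passing each list to the same function means the divisor always comes from the list being averaged.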
result_list = []
for row in ask_posts:
    created_at = row[6]
    num_comments = int(row[4])
    result_list.append([created_at, num_comments])
counts_by_hour = {}
comments_by_hour = {}
for row in result_list:
    created_at = row[0]
    # Parse the timestamp and keep only the zero-padded hour, e.g. '09'.
    created_at = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M')
    created_at = created_at.strftime('%H')
    if created_at not in counts_by_hour:
        counts_by_hour[created_at] = 1
        comments_by_hour[created_at] = row[1]
    else:
        counts_by_hour[created_at] += 1
        comments_by_hour[created_at] += row[1]
avg_by_hour = []
for hour in comments_by_hour:
    hour_avg = comments_by_hour[hour] / counts_by_hour[hour]
    avg_by_hour.append([hour, hour_avg])
print(avg_by_hour)
[['09', 5.5777777777777775], ['13', 14.741176470588234], ['10', 13.440677966101696], ['14', 13.233644859813085], ['16', 16.796296296296298], ['23', 7.985294117647059], ['12', 9.41095890410959], ['17', 11.46], ['15', 38.5948275862069], ['21', 16.009174311926607], ['20', 21.525], ['02', 23.810344827586206], ['18', 13.20183486238532], ['03', 7.796296296296297], ['05', 10.08695652173913], ['19', 10.8], ['01', 11.383333333333333], ['22', 6.746478873239437], ['08', 10.25], ['04', 7.170212765957447], ['00', 8.127272727272727], ['06', 9.022727272727273], ['07', 7.852941176470588], ['11', 11.051724137931034]]
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])
sorted_swap = sorted(swap_avg_by_hour, reverse = True)
print('Top 5 Hours for Ask Posts Comments')
for row in sorted_swap[:5]:
    hour = dt.datetime.strptime(row[1], '%H')
    hour = hour.strftime('%H:00')
    string = '{h}: {a:.2f} average comments per post'.format(h = hour, a = row[0])
    print(string)
Top 5 Hours for Ask Posts Comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
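The swap-then-sort step can also be written with the `key` argument of `sorted`, which orders the `[hour, average]` pairs by their second element directly. This is a minimal sketch of that alternative, not the approach used in this project; `avg_by_hour_sample` and `top_hours` are hypothetical names, and the pairs are a subset copied from the printed `avg_by_hour` list.

```python
# Hour/average pairs copied from the printed avg_by_hour list (subset for brevity).
avg_by_hour_sample = [
    ['09', 5.5777777777777775],
    ['15', 38.5948275862069],
    ['20', 21.525],
    ['02', 23.810344827586206],
    ['16', 16.796296296296298],
]

# Sort by the average (index 1), largest first, without building a swapped list.
top_hours = sorted(avg_by_hour_sample, key=lambda pair: pair[1], reverse=True)

for hour, avg in top_hours[:3]:
    print('{h}:00: {a:.2f} average comments per post'.format(h=hour, a=avg))
```

For this subset the loop prints the 15:00, 02:00 and 20:00 rows, in the same order as the top three lines of the output above.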
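The timestamps in the dataset are assumed to be in US Eastern Time. Hong Kong is 13 hours ahead of Eastern Standard Time (12 hours ahead while daylight saving time is in effect), so I add 13 hours to the top 5 hours to estimate the best times to create a post in Hong Kong.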
for row in sorted_swap[:5]:
    hour = dt.datetime.strptime(row[1], '%H')
    hk_time = hour + dt.timedelta(hours = 13)
    hk_time = hk_time.strftime('%H:00')
    string = '{h}: {a:.2f} average comments per post'.format(h = hk_time, a = row[0])
    print(string)
04:00: 38.59 average comments per post
15:00: 23.81 average comments per post
09:00: 21.52 average comments per post
05:00: 16.80 average comments per post
10:00: 16.01 average comments per post
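A fixed offset ignores daylight saving time in the United States. As a cross-check, individual timestamps can be converted with the standard-library `zoneinfo` module (Python 3.9+). This is a sketch under the assumption that `created_at` is recorded in US Eastern Time; `to_hong_kong` is a hypothetical helper, and the two timestamps are taken from the dataset preview.

```python
import datetime as dt
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

EASTERN = ZoneInfo('America/New_York')  # assumption: time zone of the source data
HONG_KONG = ZoneInfo('Asia/Hong_Kong')


def to_hong_kong(created_at):
    """Convert a 'created_at' string, assumed to be US Eastern, to Hong Kong time."""
    naive = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M')
    return naive.replace(tzinfo=EASTERN).astimezone(HONG_KONG)


# Timestamps taken from the dataset preview: one in summer, one in winter.
print(to_hong_kong('8/4/2016 11:52').strftime('%m/%d/%Y %H:%M'))   # 08/04/2016 23:52
print(to_hong_kong('1/26/2016 19:30').strftime('%m/%d/%Y %H:%M'))  # 01/27/2016 08:30
```

The summer timestamp shifts by 12 hours and the winter one by 13, so under this assumption the 13-hour offset is exact only for posts created during Eastern Standard Time.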
In conclusion, to receive the most comments on an "Ask HN" post, one should create it at 4:00 AM HKT.
Since most people in Hong Kong are asleep at 4 AM, the more practical choices are the second- and third-best hours: 3:00 PM HKT and 9:00 AM HKT.