In this project I will explore a dataset of submissions from the popular technology site Hacker News.
A popular feature of Hacker News is the ability to ask questions of other users. These are known as ask posts.
I will determine the best time to post a question by comparing the average number of comments that ask posts receive at different times of day.
The dataset contains the following columns: id, title, url, num_points, num_comments, author and created_at.
First, I will read the dataset into a list of lists.
import csv
# Read the dataset into a list of lists and assign it to the variable hn
# (the with statement closes the file once it has been read)
with open('hacker_news.csv') as file:
    read_file = csv.reader(file)
    hn = list(read_file)
# Extract the header row (the column names)
header = hn[0]
# Remove the header row from the dataset
hn = hn[1:]
# Show first 5 rows
hn[:5]
# Show column names
print('Column headings:')
print(header)
Column headings:
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
Our dataset is now available in the variable hn. We can see the columns above.
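The header and row split can also be tried without the CSV file on disk. The sketch below is a minimal illustration that feeds csv.reader an in-memory string. The two sample rows and the helper name split_header are made up for this example and are not taken from the real dataset.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only
SAMPLE_CSV = (
    'id,title,url,num_points,num_comments,author,created_at\n'
    '1,Ask HN: Example question?,,5,3,alice,8/16/2016 9:55\n'
    '2,"Show HN: Example, with a comma",http://example.com,10,0,bob,11/22/2015 13:43\n'
)


def split_header(lines):
    """Return (header, rows) from an iterable of CSV lines."""
    all_rows = list(csv.reader(lines))
    return all_rows[0], all_rows[1:]


sample_header, sample_rows = split_header(io.StringIO(SAMPLE_CSV))
print(sample_header)
print(len(sample_rows))
```

Note that csv.reader keeps the quoted title containing a comma as a single field, which a plain split on commas would not do.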
Identify ask posts.
First, I will separate the different types of posts. This involves separating the rows into three separate lists: 'ask_posts', 'show_posts' and 'other_posts'.
# Create empty lists
ask_posts = []
show_posts = []
other_posts = []
# Fill empty lists with correct rows
for rows in hn:
    title = rows[1].lower()
    if title.startswith('ask hn'):
        ask_posts.append(rows)
    elif title.startswith('show hn'):
        show_posts.append(rows)
    else:
        other_posts.append(rows)
# Show number of rows in each list
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))
1744
1162
17194
We now have three lists. There are 1744 rows in the ask_posts list, 1162 rows in the show_posts list and 17194 rows in the other_posts list.
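The prefix check used for this split can be isolated into a small function, which makes the matching rule easy to test on its own. This is a minimal sketch: the function name classify_post and the example titles are hypothetical and only illustrate the rule.

```python
def classify_post(title):
    """Return 'ask', 'show' or 'other' based on the title prefix (case-insensitive)."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'


# Hypothetical titles, invented for illustration only
example_titles = [
    'Ask HN: How do you learn?',
    'SHOW HN: My side project',
    'Why is this fast?',
    'Please ask HN first',
]
for example_title in example_titles:
    print(classify_post(example_title), '-', example_title)
```

Because the check uses startswith, a title that only mentions "ask HN" later in the text is classified as 'other'.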
Find average number of comments per post.
Next, I will calculate the average number of comments for the ask posts and the show posts.
# Loop through ask posts and find the average number of comments.
total_ask_comments = 0
for rows in ask_posts:
    total_ask_comments += int(rows[4])
average_ask_comments = total_ask_comments / len(ask_posts)
# Loop through show posts and find the average number of comments.
total_show_comments = 0
for rows in show_posts:
    total_show_comments += int(rows[4])
average_show_comments = total_show_comments / len(show_posts)
# Show results
print(f'Ask Comments - {average_ask_comments}')
print(f'Show Comments- {average_show_comments}')
Ask Comments - 14.038417431192661
Show Comments- 10.31669535283993
We can see that, on average, ask posts receive just under four more comments per post than show posts.
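Since the same averaging loop is written twice, it could be folded into one helper. The sketch below assumes rows shaped like those in the dataset, with the comment count stored as a string at index 4. The function name average_comments and the sample rows are hypothetical. The last lines recompute the gap between the two averages reported above.

```python
def average_comments(posts, comments_index=4):
    """Return the mean number of comments per post, or 0.0 for an empty list."""
    if not posts:
        return 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# Hypothetical rows, invented for illustration only (index 4 holds num_comments)
sample_posts = [
    ['1', 'Ask HN: A?', '', '5', '3', 'alice', '8/16/2016 9:55'],
    ['2', 'Ask HN: B?', '', '2', '5', 'bob', '11/22/2015 13:43'],
    ['3', 'Ask HN: C?', '', '9', '10', 'carol', '5/2/2016 10:14'],
]
print(average_comments(sample_posts))

# Difference between the two averages reported in the output above
difference = 14.038417431192661 - 10.31669535283993
print(f'{difference:.2f}')
```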
Calculate number of posts and comments at different times of day.
Next, let's look at the most popular times for ask posts and the times that receive the most comments.
# Import datetime module
import datetime as dt
# Loop through ask posts and collect the creation time and number of comments for each post
result_list = []
for rows in ask_posts:
    created_at = rows[6]
    num_comments = int(rows[4])
    result_list.append([created_at, num_comments])
counts_by_hour = {}
comments_by_hour = {}
# Count the posts and total the comments for each hour of the day.
for rows in result_list:
    time = rows[0]
    comments = rows[1]
    time = dt.datetime.strptime(time, '%m/%d/%Y %H:%M')
    hour = time.hour
    if hour in counts_by_hour:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comments
    else:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comments
# Show sorted results
print('Number of posts per hour')
print(sorted(counts_by_hour.items()))
print('\n')
print('Number of comments per hour')
print(sorted(comments_by_hour.items()))
Number of posts per hour
[(0, 55), (1, 60), (2, 58), (3, 54), (4, 47), (5, 46), (6, 44), (7, 34), (8, 48), (9, 45), (10, 59), (11, 58), (12, 73), (13, 85), (14, 107), (15, 116), (16, 108), (17, 100), (18, 109), (19, 110), (20, 80), (21, 109), (22, 71), (23, 68)]

Number of comments per hour
[(0, 447), (1, 683), (2, 1381), (3, 421), (4, 337), (5, 464), (6, 397), (7, 267), (8, 492), (9, 251), (10, 793), (11, 641), (12, 687), (13, 1253), (14, 1416), (15, 4477), (16, 1814), (17, 1146), (18, 1439), (19, 1188), (20, 1722), (21, 1745), (22, 479), (23, 543)]
From this data we can see that the hours from 2pm to 9pm (14:00 to 21:00) are the most common times for new ask posts, and that these hours generally receive the most comments as well. However, busy hours collect more comments simply because they have more posts, so this does not tell us the best time of day to post. Instead, we need to find the average number of comments per post for each hour.
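The per-hour tally can also be written with collections.defaultdict, which removes the need to check whether an hour has been seen before. This is a sketch of an alternative, not the approach used in this project. The function name tally_by_hour and the sample pairs are hypothetical, and the date format is assumed to match the one used for created_at.

```python
import datetime as dt
from collections import defaultdict

DATE_FORMAT = '%m/%d/%Y %H:%M'


def tally_by_hour(posts):
    """Return (counts, comments) dicts keyed by hour.

    posts is an iterable of (created_at string, num_comments) pairs.
    """
    counts = defaultdict(int)
    comments = defaultdict(int)
    for created_at, num_comments in posts:
        hour = dt.datetime.strptime(created_at, DATE_FORMAT).hour
        counts[hour] += 1
        comments[hour] += num_comments
    return dict(counts), dict(comments)


# Hypothetical (created_at, num_comments) pairs, invented for illustration only
sample_pairs = [
    ('8/16/2016 9:55', 6),
    ('11/22/2015 13:43', 29),
    ('5/2/2016 9:14', 1),
]
sample_counts, sample_comments = tally_by_hour(sample_pairs)
print(sorted(sample_counts.items()))
print(sorted(sample_comments.items()))
```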
Find average number of comments at each hour of the day.
Now let's find the average number of comments per post for each hour of the day.
# Loop through comments per hour and find the average number of comments per post.
average_ask_posts = []
for key, value in comments_by_hour.items():
    average_ask_posts.append([key, value / counts_by_hour[key]])
avg_by_hour = sorted(average_ask_posts)
# Swap the values in each pair so that sorting orders by average instead of by hour
swap_avg_by_hour = []
for value in avg_by_hour:
    swap_avg_by_hour.append([value[1], value[0]])
sorted_swap = sorted(swap_avg_by_hour, reverse=True)
# Present the 5 hours with the highest average number of comments
print('Top 5 hours for Ask Posts Comments\n')
for values in sorted_swap[0:5]:
    time = dt.time(hour=values[1])
    hour = time.strftime('%H')
    print(f'{hour}:00 - {values[0]:.2f} average comments per post.')
Top 5 hours for Ask Posts Comments

15:00 - 38.59 average comments per post.
02:00 - 23.81 average comments per post.
20:00 - 21.52 average comments per post.
16:00 - 16.80 average comments per post.
21:00 - 16.01 average comments per post.
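The swap-and-sort step can also be expressed with the key argument of sorted, which orders the pairs by average without rearranging them. The sketch below is one possible alternative. It uses the per-hour totals printed earlier in this project, and the names avg_by_hour_dict and top_five are introduced here for illustration.

```python
# Per-hour post counts and comment totals, copied from the output shown earlier
counts_by_hour = {
    0: 55, 1: 60, 2: 58, 3: 54, 4: 47, 5: 46, 6: 44, 7: 34,
    8: 48, 9: 45, 10: 59, 11: 58, 12: 73, 13: 85, 14: 107, 15: 116,
    16: 108, 17: 100, 18: 109, 19: 110, 20: 80, 21: 109, 22: 71, 23: 68,
}
comments_by_hour = {
    0: 447, 1: 683, 2: 1381, 3: 421, 4: 337, 5: 464, 6: 397, 7: 267,
    8: 492, 9: 251, 10: 793, 11: 641, 12: 687, 13: 1253, 14: 1416, 15: 4477,
    16: 1814, 17: 1146, 18: 1439, 19: 1188, 20: 1722, 21: 1745, 22: 479, 23: 543,
}

# Average comments per post for each hour
avg_by_hour_dict = {
    hour: comments_by_hour[hour] / counts_by_hour[hour]
    for hour in counts_by_hour
}

# Sort the (hour, average) pairs by average, highest first, and keep the top five
top_five = sorted(avg_by_hour_dict.items(), key=lambda item: item[1], reverse=True)[:5]

for hour, average in top_five:
    print(f'{hour:02d}:00 - {average:.2f} average comments per post.')
```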
Conclusion
So there we have it. If you are going to ask a question on Hacker News, the best time to do it is around 15:00 ET, when ask posts received an average of 38.59 comments each.