Hacker News is a website, similar to Reddit, where users submit posts that other members of the community can vote and comment on.
We are interested in comparing two specific types of posts, identified by how their titles start:
- Ask HN: asks the Hacker News community a specific question
- Show HN: shows the Hacker News community a project, product, or something just generally interesting

We would like to find out which of these post types receives more comments and points on average, and whether posts created at a certain time of day receive more of either.
The dataset we are using comes from Link. It originally contained Hacker News posts from a 12-month period (until September 26, 2016), but for this exercise it has been reduced to approximately 20,000 rows by removing the posts without comments and then randomly sampling from the remaining posts.
The columns in the dataset are:
Index | Column name | Description |
---|---|---|
0 | id | The unique identifier from Hacker News for the post |
1 | title | Post title |
2 | url | URL for post, if any |
3 | num_points | Number of points the post acquired. Calculated as total number of upvotes minus total number of downvotes |
4 | num_comments | Number of comments on post |
5 | author | Username for the person who submitted the post |
6 | created_at | Date and time of post submission (in the eastern US timezone) |
We'll begin the analysis by reading in the data.
# Read in the CSV; the with block closes the file once it has been read
from csv import reader
with open('hacker_news.csv') as opened_file:
    hn = list(reader(opened_file))
# Separate the header from the dataset
headers = hn[0]
hn = hn[1:]
# Display header
print(headers)
# Display first 3 rows
print('\n')
for row in hn[:3]:
    print(row)
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']
['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']
['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']
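As an alternative to indexing each row by position, the rows can be read as dictionaries keyed by column name. The following is a minimal sketch using `csv.DictReader` from the standard library. It reads from an in-memory sample (two rows copied from the output above) rather than from hacker_news.csv so that it runs on its own; the names `SAMPLE_CSV` and `rows` are hypothetical and not part of the analysis.

```python
import csv
import io

# Two rows copied from the output above, kept in memory so the sketch is
# self-contained. SAMPLE_CSV is a hypothetical stand-in for hacker_news.csv.
SAMPLE_CSV = """id,title,url,num_points,num_comments,author,created_at
12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52
11964716,"Florida DJs May Face Felony for April Fools' Water Joke",http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/,2,1,vezycash,6/23/2016 22:20
"""

# DictReader treats the first line as the header and yields one dict per row
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

for row in rows:
    # Values are still strings, so convert before doing arithmetic
    print(row['title'], int(row['num_comments']))
```

With this approach, `row['num_comments']` replaces `row[4]`, which may be easier to read, at the cost of departing from the list-of-lists structure used in the rest of this analysis.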
We will often want to display the first three rows of a dataset, so let's write a function display_3_rows to reduce repetition in our code moving forward.
def display_3_rows(dataset):
    for row in dataset[:3]:
        print(row)
We care about 3 types of posts: Ask HN posts, Show HN posts, and other. We need to split the current dataset into three separate lists of lists to isolate the types of posts.
# Separate the hn dataset into separate lists of lists for the
# various post types (Ask HN, Show HN, and other types)
ask_posts = []
show_posts = []
other_posts = []
for row in hn:
    title = row[1]
    title = title.lower() # to account for variable case
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
# Count number of posts per type
num_ask_posts = len(ask_posts)
num_show_posts = len(show_posts)
num_other_posts = len(other_posts)
# Print information on the separate posts and new lists of lists.
# display_3_rows prints the rows itself and returns None, so we call it
# directly instead of wrapping it in print(), which would also print "None".
print('There are {} Ask HN posts. Examples:'.format(num_ask_posts))
display_3_rows(ask_posts)
print('\n')
print('There are {} Show HN posts. Examples:'.format(num_show_posts))
display_3_rows(show_posts)
print('\n')
print('There are {} Other posts. Examples:'.format(num_other_posts))
display_3_rows(other_posts)
There are 1744 Ask HN posts. Examples:
['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55']
['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43']
['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14']

There are 1162 Show HN posts. Examples:
['10627194', 'Show HN: Wio Link ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03']
['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46']
['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05']

There are 17194 Other posts. Examples:
['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']
['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']
['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']
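The classification rule used above (lowercase the title, then check its prefix) can also be isolated in a small helper, which makes it easy to check on individual titles. This is only a sketch; the function name `classify_title` and its return labels are hypothetical and are not used elsewhere in this analysis.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on how the title starts."""
    lowered = title.lower()  # ignore case, as in the loop above
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'

# Titles copied from the example rows above
titles = [
    'Ask HN: How to improve my personal website?',
    'Show HN: Something pointless I made',
    'Interactive Dynamic Video',
]
labels = [classify_title(t) for t in titles]
print(labels)
```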
Now that we have three separate datasets, one for each type of post, we can compare the number of comments for each post type. Since we have to do the same analysis for the three separate datasets, we'll write a function count_metric that takes in a dataset, iterates through all the rows to total the number of comments OR points, and divides that total by the number of posts of that type to get the average. The function count_metric prints the total and the average, and returns both values.
# This function can be used to count either comments OR points for a dataset
def count_metric(dataset, post_type_string, metric):
    # accommodates either "comments" or "points" as key metric to analyze
    if metric == "comments":
        index = 4 # num_comments in index 4
    elif metric == "points":
        index = 3 # num_points in index 3
    else:
        # without this, an unknown metric would leave index undefined
        raise ValueError('metric must be "comments" or "points"')
    total_metric = 0
    num_posts = 0
    string_template_1 = 'Our dataset contains {m_total} {post_type} {m} over {p_total} posts.'
    string_template_2 = 'There is an average of {number} {m} on {post_type} posts'
    for row in dataset:
        num_metric = int(row[index]) # need to convert string to int type
        total_metric += num_metric
        num_posts += 1
    avg_metric = total_metric / num_posts
    print(string_template_1.format(m_total=total_metric, post_type=post_type_string, p_total=num_posts, m=metric))
    print(string_template_2.format(number=avg_metric, post_type=post_type_string, m=metric))
    print('\n')
    return total_metric, avg_metric
#Ask HN Posts
total_ask_comments, avg_ask_comments = count_metric(ask_posts, "Ask HN", "comments")
#Show HN Posts
total_show_comments, avg_show_comments = count_metric(show_posts, "Show HN", "comments")
#Other Posts
total_other_comments, avg_other_comments = count_metric(other_posts, "Other", "comments")
Our dataset contains 24483 Ask HN comments over 1744 posts.
There is an average of 14.038417431192661 comments on Ask HN posts

Our dataset contains 11988 Show HN comments over 1162 posts.
There is an average of 10.31669535283993 comments on Show HN posts

Our dataset contains 462055 Other comments over 17194 posts.
There is an average of 26.8730371059672 comments on Other posts
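If only the numbers are needed, without the printed summary, the same total and average can be computed in a few lines. The sketch below assumes the same row layout (num_comments at index 4); the name `summarize_metric` is hypothetical, and returning 0.0 for an empty dataset is a design choice of this sketch rather than something the analysis requires.

```python
def summarize_metric(dataset, index):
    """Return (total, average) for the integer column at position `index`."""
    values = [int(row[index]) for row in dataset]  # stored as strings in the CSV
    total = sum(values)
    # Guard against ZeroDivisionError when a post type has no rows
    average = total / len(values) if values else 0.0
    return total, average

# The three Ask HN example rows shown earlier
sample = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'],
]
total, average = summarize_metric(sample, 4)  # index 4 is num_comments
print(total, average)
```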
We see that Ask HN posts are both more numerous and receive more comments per post, on average, than Show HN posts. There is an average of 14 comments per Ask HN post vs an average of 10 comments per Show HN post.
It is also interesting to note that there are many more posts in the Other category, and these posts have an average of 27 comments per post, which is greater than the average for either Ask HN or Show HN posts.
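We have found that Ask HN posts generate more comments per post, on average, than Show HN posts. To dig deeper, we'd like to see if there is a certain time of day that attracts more comments. Our strategy is to count the posts created during each hour of the day together with the comments they received, and then calculate the average number of comments per post for each hour.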
We'll make a function count_posts_hour that returns two dictionaries, each keyed by the hour of the day:
- counts_by_hour, which contains the number of posts of that type created during each hour of the day
- metric_by_hour, which contains the corresponding number of comments OR points that the posts created at each hour received

Since it takes in any dataset, we can complete the same analysis for the 3 datasets we have for Ask HN posts, Show HN posts, and all other types.
We will look at the Ask HN posts first.
import datetime as dt
# Creates counts for post/hr and comments OR points per hr with any dataset
def count_posts_hour(dataset, post_type, metric, show=False):
    # Allows user to select metric of success as comments or points
    if metric == "comments":
        index = 4
    elif metric == "points":
        index = 3
    else:
        # without this, an unknown metric would leave index undefined
        raise ValueError('metric must be "comments" or "points"')
    # isolating the columns we need (time created and comments/points)
    result_list = []
    for row in dataset:
        created_at = row[6]
        num_metric = int(row[index])
        result_list.append([created_at, num_metric])
    # populate metrics by hour in dictionaries via iterating through result_list
    counts_by_hour = {}
    metric_by_hour = {}
    date_format = '%m/%d/%Y %H:%M' # example format for str: '11/22/2015 13:43'
    for row in result_list:
        temp_metric = row[1]
        created_at_str = row[0]
        created_at_dt = dt.datetime.strptime(created_at_str, date_format)
        hour = created_at_dt.strftime('%H')
        if hour not in counts_by_hour:
            counts_by_hour[hour] = 1
            metric_by_hour[hour] = temp_metric
        else:
            counts_by_hour[hour] += 1
            metric_by_hour[hour] += temp_metric
    # print summaries for posts/hr and points/comments per hr only if show=True
    # show=False by default. This is primarily for demonstration.
    if show:
        print('For {} Posts:'.format(post_type))
        print('Frequency table of posts per hour')
        print(counts_by_hour)
        print('\n')
        print('Number of {} asks posts created at each hour received:'.format(metric))
        print(metric_by_hour)
    return counts_by_hour, metric_by_hour
ask_counts_by_hour, ask_comments_by_hour = count_posts_hour(ask_posts, 'Ask HN', 'comments', show=True)
For Ask HN Posts:
Frequency table of posts per hour
{'09': 45, '13': 85, '10': 59, '14': 107, '16': 108, '23': 68, '12': 73, '17': 100, '15': 116, '21': 109, '20': 80, '02': 58, '18': 109, '03': 54, '05': 46, '19': 110, '01': 60, '22': 71, '08': 48, '04': 47, '00': 55, '06': 44, '07': 34, '11': 58}

Number of comments asks posts created at each hour received:
{'09': 251, '13': 1253, '10': 793, '14': 1416, '16': 1814, '23': 543, '12': 687, '17': 1146, '15': 4477, '21': 1745, '20': 1722, '02': 1381, '18': 1439, '03': 421, '05': 464, '19': 1188, '01': 683, '22': 479, '08': 492, '04': 337, '00': 447, '06': 397, '07': 267, '11': 641}
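The per-hour tally can also be written without the "is this hour already a key?" check by using `collections.defaultdict`, which starts every missing key at 0. The following is a sketch under the same assumptions as the function above (created_at at index 6, in the format '11/22/2015 13:43'); the names `tally_by_hour` and `DATE_FORMAT` are hypothetical, and the last sample row is made up purely to show two posts landing in the same hour.

```python
import datetime as dt
from collections import defaultdict

DATE_FORMAT = '%m/%d/%Y %H:%M'  # same format as created_at, e.g. '11/22/2015 13:43'

def tally_by_hour(rows, index):
    """Return (posts per hour, metric total per hour) keyed by 'HH' strings."""
    counts = defaultdict(int)  # missing hours start at 0
    totals = defaultdict(int)
    for row in rows:
        hour = dt.datetime.strptime(row[6], DATE_FORMAT).strftime('%H')
        counts[hour] += 1
        totals[hour] += int(row[index])
    return dict(counts), dict(totals)

sample = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'],
    # Hypothetical row, not from the dataset: a second post in the 13:00 hour
    ['0', 'Ask HN: hypothetical example', '', '3', '4', 'someone', '1/1/2016 13:05'],
]
counts, totals = tally_by_hour(sample, 4)  # index 4 is num_comments
print(counts)
print(totals)
```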
We will combine the two dictionaries to determine the average number of comments OR points per post for each hour of the day. We'll write the code in a function average_metric_by_hour so that we can run the analysis on the various types of posts. This function takes in both dictionaries, calculates the average for each matching key, and appends each pairing of hour and average to a new list that is returned.
# calculates average comments or points per post for each hour. Not ordered.
def average_metric_by_hour(counts_by_hour, metric_by_hour, show=False):
    avg_by_hour = []
    for key in counts_by_hour:
        avg = metric_by_hour[key] / counts_by_hour[key]
        avg_by_hour.append([key, avg])
    if show: # off by default; prints each pairing for demonstration
        for row in avg_by_hour: # one pairing per line for easier readability
            print(row)
    return avg_by_hour
In action, we look specifically at the Ask HN posts:
# 'show' argument = True for the sake of demonstration
ask_avg_by_hour = average_metric_by_hour(ask_counts_by_hour, ask_comments_by_hour, show=True)
['09', 5.5777777777777775]
['13', 14.741176470588234]
['10', 13.440677966101696]
['14', 13.233644859813085]
['16', 16.796296296296298]
['23', 7.985294117647059]
['12', 9.41095890410959]
['17', 11.46]
['15', 38.5948275862069]
['21', 16.009174311926607]
['20', 21.525]
['02', 23.810344827586206]
['18', 13.20183486238532]
['03', 7.796296296296297]
['05', 10.08695652173913]
['19', 10.8]
['01', 11.383333333333333]
['22', 6.746478873239437]
['08', 10.25]
['04', 7.170212765957447]
['00', 8.127272727272727]
['06', 9.022727272727273]
['07', 7.852941176470588]
['11', 11.051724137931034]
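The averaging step is a key-by-key division of one dictionary by the other, which can be sketched as a dictionary comprehension. The three hours below are copied from the Ask HN output above; the variable names are hypothetical.

```python
# Counts and comment totals for three hours, copied from the Ask HN output above
counts_by_hour = {'15': 116, '02': 58, '20': 80}
comments_by_hour = {'15': 4477, '02': 1381, '20': 1722}

# Divide total comments by number of posts for each matching hour
avg_comments = {
    hour: comments_by_hour[hour] / counts_by_hour[hour]
    for hour in counts_by_hour
}

for hour, avg in avg_comments.items():
    print('{}: {:.2f}'.format(hour, avg))
```

This returns a dictionary instead of a list of lists, so it is not a drop-in replacement for average_metric_by_hour, but it shows the same calculation.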
We are able to list out the average number of comments for posts created during each hour of the day, but this format is difficult to read. For easier data analysis, we will sort the list of lists and print the 5 highest and 5 lowest values.
# Builds on average_metric_by_hour by sorting to find top 5 and bottom 5 hours for success metric
def sort_avg_by_hour(avg_by_hour, post_type, metric):
    swap_avg_by_hour = []
    for row in avg_by_hour:
        swap_avg_by_hour.append([row[1], row[0]])
    # Sort in highest to lowest to find best hours
    sorted_swap = sorted(swap_avg_by_hour, reverse=True)
    print('Top 5 Hours for {} Posts {}'.format(post_type, metric))
    show_sorted(sorted_swap, metric)
    # Sort in lowest to highest to find worst hours
    sorted_swap_bottom = sorted(swap_avg_by_hour, reverse=False)
    print('\n')
    print('Bottom 5 Hours for {} Posts {}'.format(post_type, metric))
    show_sorted(sorted_swap_bottom, metric)
# Sub function to display first 5 rows in specified format
def show_sorted(sorted_swap, metric):
    str_format = "{hr}: {avg:.2f} average {m} per post"
    for row in sorted_swap[:5]:
        hour = dt.datetime.strptime(row[1], '%H')
        hour_str = hour.strftime("%H:%M")
        average = row[0]
        print(str_format.format(hr=hour_str, avg=average, m=metric))
sort_avg_by_hour(ask_avg_by_hour, 'Ask HN', 'comments')
Top 5 Hours for Ask HN Posts comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post

Bottom 5 Hours for Ask HN Posts comments
09:00: 5.58 average comments per post
22:00: 6.75 average comments per post
04:00: 7.17 average comments per post
03:00: 7.80 average comments per post
07:00: 7.85 average comments per post
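Swapping the columns is one way to make `sorted` order the rows by average. Another option, sketched here, is to leave the rows as [hour, average] and pass a `key` function that tells `sorted` which element to compare. The five rows are copied from the Ask HN averages above; `top` and `bottom` are hypothetical names.

```python
from operator import itemgetter

# A few [hour, average] rows copied from the Ask HN output above
avg_by_hour = [
    ['09', 5.5777777777777775],
    ['15', 38.5948275862069],
    ['02', 23.810344827586206],
    ['22', 6.746478873239437],
    ['20', 21.525],
]

# itemgetter(1) makes sorted() compare rows by their second element (the average)
top = sorted(avg_by_hour, key=itemgetter(1), reverse=True)[:3]
bottom = sorted(avg_by_hour, key=itemgetter(1))[:2]

print([hour for hour, _ in top])
print([hour for hour, _ in bottom])
```

One difference worth noting: when two hours have exactly the same average, the swapped-list approach breaks the tie by comparing the hour strings, while sorting with a key keeps tied rows in their original order.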
From our analysis, the top 5 hours to create Ask HN posts, in terms of average comments per post, are 3:00PM, 2:00AM, 8:00PM, 4:00PM, and 9:00PM (eastern US timezone).
The bottom 5 hours, where Ask HN posts on average generate the fewest comments, are 9:00AM, 10:00PM, 4:00AM, 3:00AM, and 7:00AM (eastern US timezone).
With some exceptions, afternoon or evening hours seem like the best time to create Ask HN posts for more comment engagement.
We'd like to do the same analysis for Show HN posts, and we can use the functions we wrote previously.
show_counts_by_hour, show_comments_by_hour = count_posts_hour(show_posts, 'Show HN', 'comments')
show_avg_by_hour = average_metric_by_hour(show_counts_by_hour, show_comments_by_hour)
sort_avg_by_hour(show_avg_by_hour, 'Show HN','comments')
Top 5 Hours for Show HN Posts comments
18:00: 15.77 average comments per post
00:00: 15.71 average comments per post
14:00: 13.44 average comments per post
23:00: 12.42 average comments per post
22:00: 12.39 average comments per post

Bottom 5 Hours for Show HN Posts comments
05:00: 3.05 average comments per post
02:00: 4.23 average comments per post
08:00: 4.85 average comments per post
21:00: 5.79 average comments per post
15:00: 8.10 average comments per post
Show HN posts that were created at 6:00PM, midnight, 2:00PM, 11:00PM, and 10:00PM EST on average generate the most comments per post. Show HN posts that were created at 5:00AM, 2:00AM, 8:00AM, 9:00PM, and 3:00PM on average generate the fewest comments per post. This broadly agrees with what we found for the Ask HN posts, where afternoon and evening posts seem to attract the most comments, although 3:00PM, the best hour for Ask HN posts, is among the bottom 5 hours here. Let's look at Other posts.
other_counts_by_hour, other_comments_by_hour = count_posts_hour(other_posts, 'Other', 'comments')
other_avg_by_hour = average_metric_by_hour(other_counts_by_hour, other_comments_by_hour)
sort_avg_by_hour(other_avg_by_hour, 'Other', 'comments')
Top 5 Hours for Other Posts comments
14:00: 32.33 average comments per post
13:00: 30.90 average comments per post
12:00: 30.35 average comments per post
11:00: 29.59 average comments per post
15:00: 29.52 average comments per post

Bottom 5 Hours for Other Posts comments
06:00: 21.36 average comments per post
01:00: 23.07 average comments per post
20:00: 23.14 average comments per post
22:00: 23.27 average comments per post
21:00: 23.61 average comments per post
Other posts generally agree with the Ask HN and Show HN findings in that posts created in the afternoon hours tend to get more comments on average per post.
We can also look at the entire hn dataset, as it was before we divided our data into separate lists of lists for the Ask HN, Show HN, and Other categories.
all_counts_by_hour, all_comments_by_hour = count_posts_hour(hn, 'all', 'comments', show=True)
all_avg_by_hour = average_metric_by_hour(all_counts_by_hour, all_comments_by_hour)
sort_avg_by_hour(all_avg_by_hour, 'All', 'comments')
For all Posts:
Frequency table of posts per hour
{'11': 762, '19': 1145, '22': 875, '00': 697, '04': 527, '09': 609, '16': 1302, '18': 1254, '14': 1151, '10': 686, '12': 923, '13': 1102, '20': 1051, '03': 488, '17': 1362, '01': 588, '23': 778, '08': 578, '02': 529, '21': 1030, '15': 1234, '06': 468, '07': 508, '05': 453}

Number of comments asks posts created at each hour received:
{'11': 20664, '19': 27894, '22': 18684, '00': 17478, '04': 11537, '09': 15274, '16': 30857, '18': 31587, '14': 33545, '10': 16818, '12': 25351, '13': 30562, '20': 23414, '03': 11626, '17': 34784, '01': 12465, '23': 17582, '08': 14062, '02': 13762, '21': 22652, '15': 35809, '06': 9253, '07': 12576, '05': 10290}
Top 5 Hours for All Posts comments
14:00: 29.14 average comments per post
15:00: 29.02 average comments per post
13:00: 27.73 average comments per post
12:00: 27.47 average comments per post
11:00: 27.12 average comments per post

Bottom 5 Hours for All Posts comments
06:00: 19.77 average comments per post
01:00: 21.20 average comments per post
22:00: 21.35 average comments per post
04:00: 21.89 average comments per post
21:00: 21.99 average comments per post
Looking at the entire hn dataset, without differentiating by post type, it looks like posts created in the late morning or early-to-mid afternoon tend to attract the most comments, with the top 5 hours being 2:00PM, 3:00PM, 1:00PM, 12:00PM, and 11:00AM. This is very similar to what we found for the Other posts, which makes sense since our entire hn dataset is dominated by Other posts in number, as we've seen in the previous section.
The most popular hour for creating posts of any type is 5:00PM, with 1362 posts.
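The busiest hour can be read directly from the frequency table with `max` and a key function. This sketch uses only the five afternoon and evening hours with the highest counts, copied from the all-posts frequency table above; the names `counts` and `busiest_hour` are hypothetical.

```python
# Subset of the all-posts frequency table above (posts created per hour)
counts = {'15': 1234, '16': 1302, '17': 1362, '18': 1254, '19': 1145}

# max() iterates over the keys; counts.get tells it to compare by the stored count
busiest_hour = max(counts, key=counts.get)
print(busiest_hour, counts[busiest_hour])
```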
Similar to the analysis we did to compare the average number of comments per type of post, we can compare the average number of points per type of post. We created the functions above to be flexible and accommodate looking at either comments or points. We will use the same steps and functions to look at points.
#Ask HN Posts
total_ask_points, avg_ask_points = count_metric(ask_posts, "Ask HN", "points")
#Show HN Posts
total_show_points, avg_show_points = count_metric(show_posts, "Show HN", "points")
#Other Posts
total_other_points, avg_other_points = count_metric(other_posts, "Other", "points")
Our dataset contains 26268 Ask HN points over 1744 posts.
There is an average of 15.061926605504587 points on Ask HN posts

Our dataset contains 32019 Show HN points over 1162 posts.
There is an average of 27.555077452667813 points on Show HN posts

Our dataset contains 952664 Other points over 17194 posts.
There is an average of 55.4067698034198 points on Other posts
When we compared which types of posts generate the most comments on average per post, we found that Ask HN posts generated more comments relative to Show HN posts. Shifting our metric to points, we see instead that Show HN posts have more points, with an average of 28 points compared to Ask HN posts with an average of 15 points.
We also point out that Other posts on average outperform both Show HN and Ask HN posts in points per post, with an average of 55 points.
Similar to the analysis we did earlier to determine if posts created at a certain time are more likely to receive more comments, we can look at the same dataset to determine if posts created at a certain time are more likely to receive more points, again reusing the same functions.
ask_counts_by_hour, ask_points_by_hour = count_posts_hour(ask_posts, 'Ask HN', 'points')
ask_avg_by_hour_pt = average_metric_by_hour(ask_counts_by_hour, ask_points_by_hour)
sort_avg_by_hour(ask_avg_by_hour_pt, 'Ask HN', 'points')
Top 5 Hours for Ask HN Posts points
15:00: 29.99 average points per post
13:00: 24.26 average points per post
16:00: 23.35 average points per post
17:00: 19.41 average points per post
10:00: 18.68 average points per post

Bottom 5 Hours for Ask HN Posts points
03:00: 6.93 average points per post
22:00: 7.20 average points per post
09:00: 7.31 average points per post
00:00: 8.20 average points per post
04:00: 8.28 average points per post
Similar to what we found for Ask HN posts and comments, the best hour to create a post that is likely to receive both points AND comments is 3:00PM EST. Posts created in the afternoon seem likely to get more points, as the top 5 hours by average points per post are 3:00PM, 1:00PM, 4:00PM, 5:00PM, and 10:00AM.
In general, the hours in which posting leads to fewer points per post on average are late at night or in the morning, as the bottom 5 hours are 3:00AM, 10:00PM, 9:00AM, 12:00AM, and 4:00AM EST.
show_counts_by_hour, show_points_by_hour = count_posts_hour(show_posts, 'Show HN', 'points')
show_avg_by_hour_pt = average_metric_by_hour(show_counts_by_hour, show_points_by_hour)
sort_avg_by_hour(show_avg_by_hour_pt, 'Show HN', 'points')
Top 5 Hours for Show HN Posts points
23:00: 42.39 average points per post
12:00: 41.69 average points per post
22:00: 40.35 average points per post
00:00: 37.84 average points per post
18:00: 36.31 average points per post

Bottom 5 Hours for Show HN Posts points
05:00: 5.47 average points per post
02:00: 11.33 average points per post
04:00: 14.85 average points per post
08:00: 15.26 average points per post
21:00: 18.43 average points per post
Similar to what we found for Show HN posts and comments, posts created in the evening seem likely to get more points, as the top 5 hours by average points per post are 11:00PM, 12:00PM, 10:00PM, 12:00AM, and 6:00PM EST.
In general, the hours in which posting leads to fewer points per post on average are mostly in the morning (also similar to what we found for comments), as the bottom 5 hours are 5:00AM, 2:00AM, 4:00AM, 8:00AM, and 9:00PM EST.
other_counts_by_hour, other_points_by_hour = count_posts_hour(other_posts, 'Other', 'points')
other_avg_by_hour_pt = average_metric_by_hour(other_counts_by_hour, other_points_by_hour)
sort_avg_by_hour(other_avg_by_hour_pt, 'Other', 'points')
Top 5 Hours for Other Posts points
13:00: 62.53 average points per post
14:00: 61.79 average points per post
15:00: 60.54 average points per post
10:00: 60.48 average points per post
19:00: 60.01 average points per post

Bottom 5 Hours for Other Posts points
20:00: 45.24 average points per post
06:00: 46.24 average points per post
21:00: 49.37 average points per post
04:00: 49.67 average points per post
05:00: 49.97 average points per post
For Other posts, we first notice that even the bottom 5 hours for points still have a very high average number of points per post, as the averages are all >45 points. This exceeds the average points per post for the top hour of both Show HN posts (42.39) and Ask HN posts (29.99).
We also see that the afternoon hours are generally best for getting points in the Other category, with 10:00AM and 7:00PM also among the top 5 hours.
all_counts_by_hour, all_points_by_hour = count_posts_hour(hn, 'All', 'points')
all_avg_by_hour_pt = average_metric_by_hour(all_counts_by_hour, all_points_by_hour)
sort_avg_by_hour(all_avg_by_hour_pt, 'All', 'points')
Top 5 Hours for All Posts points
13:00: 56.17 average points per post
15:00: 55.65 average points per post
10:00: 54.71 average points per post
14:00: 54.44 average points per post
19:00: 54.17 average points per post

Bottom 5 Hours for All Posts points
20:00: 42.04 average points per post
06:00: 42.37 average points per post
05:00: 44.25 average points per post
04:00: 44.26 average points per post
21:00: 44.40 average points per post
Again, as with comments, we see that the behavior for ALL posts is similar to the behavior of Other posts, likely because the dataset is dominated by Other posts. We generally see that the afternoon or early evening hours are best for getting points on any post. The average points per post are lower than what we saw in the same analysis for the Other posts because the all-posts dataset also includes the lower-scoring Show HN and Ask HN posts.
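To make the weighting argument concrete, the overall average is the total number of points divided by the total number of posts, so each post type pulls the result toward its own average in proportion to how many posts it contributes. The sketch below works this out for the whole dataset (not per hour) using the totals printed by count_metric earlier; the variable names are hypothetical.

```python
# (number of posts, total points) per post type, copied from the
# count_metric output for points shown earlier
groups = {
    'Ask HN': (1744, 26268),
    'Show HN': (1162, 32019),
    'Other': (17194, 952664),
}

total_posts = sum(n for n, _ in groups.values())
total_points = sum(p for _, p in groups.values())

# Overall average = all points / all posts (a post-count-weighted mean)
overall_avg = total_points / total_posts
# Fraction of the dataset made up of Other posts
other_share = groups['Other'][0] / total_posts

print('{} posts, {:.2f} average points, {:.1%} Other'.format(
    total_posts, overall_avg, other_share))
```

The overall average comes out a few points below the Other average of about 55.4 and far above the Ask HN and Show HN averages, because Other posts make up roughly 86% of the rows.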
Closing out this segment of analysis, we note that we generally get similar "top" hours whether we look at comments or points, for whatever post type. This likely means that we can use either comments or points as a metric for engagement.
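One rough way to quantify how similar the two rankings are is to count the hours that appear in both top-5 lists for each post type. The lists below are copied from the outputs above; the set-overlap measure itself is an assumption of this sketch, not something used elsewhere in the analysis.

```python
# Top 5 hours by average comments per post, copied from the outputs above
top5_comments = {
    'Ask HN': ['15', '02', '20', '16', '21'],
    'Show HN': ['18', '00', '14', '23', '22'],
    'Other': ['14', '13', '12', '11', '15'],
}
# Top 5 hours by average points per post, copied from the outputs above
top5_points = {
    'Ask HN': ['15', '13', '16', '17', '10'],
    'Show HN': ['23', '12', '22', '00', '18'],
    'Other': ['13', '14', '15', '10', '19'],
}

# Hours that appear in both top-5 lists, per post type
overlap = {
    post_type: sorted(set(top5_comments[post_type]) & set(top5_points[post_type]))
    for post_type in top5_comments
}

for post_type, hours in overlap.items():
    print('{}: {} shared hours {}'.format(post_type, len(hours), hours))
```

By this measure the agreement is strongest for Show HN posts and weakest for Ask HN posts, so the two metrics are similar but not interchangeable in every case.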
On average, there are more comments on Ask HN posts relative to Show HN posts. On average, there are more points on Show HN posts relative to Ask HN posts. The posts categorized as Other in our dataset surpass both Ask HN and Show HN in terms of both average number of comments AND points.
In general, we see that the top hours for creating a post that will likely get more comments or points are in the afternoon or evening hours in the EST timezone. Comments and points both seem to give similar results, so they seem like similarly adequate metrics to gauge post engagement.
While we compared Show HN and Ask HN posts relative to each other, the posts in the Other category on average had more comments and points than both the Show HN and Ask HN posts. If we want to dig deeper into what characteristics of posts generally generate more engagement through comments or points, we should try to segment the post types into more categories beyond what we did for this analysis.
Post creation and engagement with posts both appear to peak in the afternoon or evening in the EST timezone, which may reveal a little about our users. In the future, if we need to make announcements or serve users better, we may be more successful if we plan for these hours, when our site sees more activity.