
Exploring Hacker News Posts

In this project we will be exploring and comparing 'Ask HN' and 'Show HN' posts on Hacker News to answer the following questions:
Do Ask HN or Show HN posts receive more comments on average?
Do posts created at a certain time receive more comments on average?

Importing and Testing the Data Set:

In [46]:

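import csv
import datetime as dt

# Read the whole CSV into a list of lists; the with block closes the file afterwards
with open('hacker_news.csv', newline='') as open_file:
    read_file = csv.reader(open_file)
    hacker_raw_list = list(read_file)

# The first row holds the column names; the remaining rows are the posts
headers = hacker_raw_list[0]
hn = hacker_raw_list[1:]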
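Later cells pick columns by position (`row[4]` for comments, `row[6]` for the timestamp). A minimal sketch of an alternative, assuming the file has the header row printed in the next cell: `csv.DictReader` turns each row into a dict keyed by column name. `load_posts` and `SAMPLE_CSV` are hypothetical names, and the sample is just the header plus the first data row printed in this notebook, held in memory so the snippet runs without the file.

```python
import csv
import io

# Hypothetical in-memory sample: the header of hacker_news.csv plus the first
# data row printed in this notebook. With the real file you would pass
# open('hacker_news.csv', newline='') instead of io.StringIO(...).
SAMPLE_CSV = (
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,"
    "386,52,ne0phyte,8/4/2016 11:52\n"
)


def load_posts(file_obj):
    """Return every data row as a dict keyed by the header names."""
    return list(csv.DictReader(file_obj))


posts = load_posts(io.StringIO(SAMPLE_CSV))
# Columns are now addressed by name rather than by index 4 and 6
print(posts[0]["num_comments"], posts[0]["created_at"])  # → 52 8/4/2016 11:52
```

Values still arrive as strings, so `int(...)` is needed before summing comments, exactly as with `csv.reader`.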

Testing that 'hacker_news.csv' was correctly extracted into a list of lists by printing the header and the first 5 rows separately

In [20]:

print(headers)
print("\n")
print(hn[:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]
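Printing five rows only spot-checks the extraction. A minimal sketch of a broader check, assuming the same list-of-lists layout as `hacker_raw_list` (header first): report every row whose field count differs from the header's. `find_malformed_rows` and `sample_raw_list` are hypothetical names; the sample rows are copied from the output above.

```python
# Hypothetical sample in the same shape as hacker_raw_list: header, then data rows
sample_raw_list = [
    ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
    ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/',
     '386', '52', 'ne0phyte', '8/4/2016 11:52'],
    ['11919867', 'Technology ventures: From Idea to Enterprise',
     'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
     '3', '1', 'hswarna', '6/17/2016 0:01'],
]


def find_malformed_rows(raw_list):
    """Return (row_number, row) pairs whose length differs from the header's."""
    header = raw_list[0]
    return [
        (i, row)
        for i, row in enumerate(raw_list[1:], start=1)
        if len(row) != len(header)
    ]


print(find_malformed_rows(sample_raw_list))  # → []
```

An empty list means every row has exactly one value per column, so positional indexes such as 4 and 6 are safe to use.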

Separating Ask HN and Show HN posts:

In this section we separate posts whose titles begin with Ask HN or Show HN (ignoring case) into two different lists.
A third list, other_posts, contains the posts that fit neither category.

In [30]:

ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    # Lower-case the title so that 'Ask HN', 'ASK HN', etc. all match
    title = row[1].lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
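The classification rule in the loop above can be isolated into a small function, which makes it easy to test on individual titles. A minimal sketch; `classify_title` is a hypothetical helper, two titles come from rows printed in this notebook, and the upper-case Ask HN title is made up to show the case-insensitive match.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' using a case-insensitive prefix check."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'


print(classify_title('Show HN: Something pointless I made'))  # → show
print(classify_title('Interactive Dynamic Video'))             # → other
print(classify_title('ASK HN: hypothetical example title'))    # → ask
```

Because only the prefix is checked, a title that mentions "Ask HN" somewhere other than the start is counted as an other post.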
    

Checking the number of posts in each category

In [32]:

print("Ask HN: " +str(len(ask_posts)))
print("Show HN: "+ str(len(show_posts)))
print("Other Posts: "+ str(len(other_posts)))

Ask HN: 1744
Show HN: 1162
Other Posts: 17194
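Since every row goes into exactly one of the three lists, the counts must add up to the number of posts: 1744 + 1162 + 17194 = 20100, which should equal `len(hn)`. A minimal sketch of the same tally done with `collections.Counter`, including that consistency check; `category` and `titles` are hypothetical names, and the titles are a small illustrative sample rather than the real data.

```python
from collections import Counter

# Hypothetical sample titles; in the notebook these would be row[1] for each row in hn
titles = [
    'Show HN: Something pointless I made',
    'Interactive Dynamic Video',
    'Show HN: Shanhu.io, a programming playground powered by e8vm',
    'Ask HN: hypothetical question',
]


def category(title):
    """Map a title to one of the three mutually exclusive categories."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'Ask HN'
    if lowered.startswith('show hn'):
        return 'Show HN'
    return 'Other Posts'


counts = Counter(category(t) for t in titles)
print(counts)

# The categories do not overlap, so the counts must sum to the number of titles
assert sum(counts.values()) == len(titles)
```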

In [34]:

print(show_posts[:5])

[['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03'], ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46'], ['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05'], ['12178806', 'Show HN: Webscope  Easy way for web developers to communicate with Clients', 'http://webscopeapp.com', '3', '3', 'fastbrick', '7/28/2016 7:11'], ['10872799', 'Show HN: GeoScreenshot  Easily test Geo-IP based web pages', 'https://www.geoscreenshot.com/', '1', '9', 'kpsychwave', '1/9/2016 20:45']]

Average Ask vs Average Show comments:

In this section we determine whether Show HN or Ask HN posts receive more comments per post.
For each category we total the comments, divide by the number of posts to get the average, and compare the two averages.

In [48]:

def get_total_comments(dataset, column_index):
    """Sum the comment counts stored (as strings) in column_index of each row."""
    total_comments = 0
    for row in dataset:
        total_comments += int(row[column_index])
    return total_comments

# Column 4 is num_comments (see headers)
avg_ask_comments = get_total_comments(ask_posts, 4) / len(ask_posts)
avg_ask_comments = round(avg_ask_comments, 2)
print("Average Ask Comments: " + str(avg_ask_comments))

avg_show_comments = get_total_comments(show_posts, 4) / len(show_posts)
avg_show_comments = round(avg_show_comments, 2)
print("Average Show Comments: " + str(avg_show_comments))

Average Ask Comments: 14.04
Average Show Comments: 10.32
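Dividing a total by `len(...)` raises `ZeroDivisionError` if a category happens to be empty. A minimal sketch of the same average using `statistics.mean`, with an explicit guard that returns `None` when there are no posts; `average_comments` and `sample_show` are hypothetical names, and the two sample rows are Show HN posts printed earlier.

```python
from statistics import mean


def average_comments(posts, column_index=4):
    """Mean of the num_comments column, or None when there are no posts."""
    if not posts:
        return None
    return mean(int(row[column_index]) for row in posts)


# Sample rows copied from the Show HN output above (num_comments is index 4)
sample_show = [
    ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/',
     '747', '102', 'dhotson', '11/29/2015 22:46'],
    ['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm',
     'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05'],
]

print(round(average_comments(sample_show), 2))  # → 51.5
print(average_comments([]))                     # → None
```

The two sample posts (102 and 1 comments) also show how a single heavily commented post can pull a mean far above a typical post.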

The above analysis shows that Ask HN posts receive more comments on average than Show HN posts: about 14.04 comments per post versus 10.32.

Average Comments by Time of Day:

In this section we will explore the following:
1. The number of Ask posts created in each hour of the day, and the total number of comments those posts received
2. The average number of comments that Ask posts receive, by hour created

In [106]:

# Pair each Ask HN post's creation time with its comment count
result_list = []
for row in ask_posts:
    created_at = row[6]
    num_comments = int(row[4])
    result_list.append([created_at, num_comments])

counts_by_hour = {}    # hour -> number of Ask posts created in that hour
comments_by_hour = {}  # hour -> total comments on those posts

for row in result_list:
    post_time = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
    num_comments = row[1]
    # Zero-padded hour string such as '09'; post_time.hour would return an int instead
    hour = post_time.strftime("%H")
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = num_comments
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += num_comments

print("Posts per hour: " + str(counts_by_hour))
print("\n")
print("Total comments per hour: " + str(comments_by_hour))


Posts per hour: {'09': 45, '13': 85, '10': 59, '14': 107, '16': 108, '23': 68, '12': 73, '17': 100, '15': 116, '21': 109, '20': 80, '02': 58, '18': 109, '03': 54, '05': 46, '19': 110, '01': 60, '22': 71, '08': 48, '04': 47, '00': 55, '06': 44, '07': 34, '11': 58}


Total comments per hour: {'09': 251, '13': 1253, '10': 793, '14': 1416, '16': 1814, '23': 543, '12': 687, '17': 1146, '15': 4477, '21': 1745, '20': 1722, '02': 1381, '18': 1439, '03': 421, '05': 464, '19': 1188, '01': 683, '22': 479, '08': 492, '04': 337, '00': 447, '06': 397, '07': 267, '11': 641}
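The "first time this hour is seen / seen before" branching can be avoided with `collections.defaultdict(int)`, which starts every missing key at 0. A minimal sketch of the same two tallies, assuming `(created_at, num_comments)` pairs like those in `result_list`; `tally_by_hour` is a hypothetical helper, and the sample timestamps and counts are taken from rows printed earlier purely to illustrate the format.

```python
import datetime as dt
from collections import defaultdict


def tally_by_hour(pairs):
    """Return (counts_by_hour, comments_by_hour) keyed by zero-padded hour strings."""
    counts_by_hour = defaultdict(int)
    comments_by_hour = defaultdict(int)
    for created_at, num_comments in pairs:
        # strptime accepts single-digit months, days and hours such as '6/17/2016 0:01'
        hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += num_comments
    return dict(counts_by_hour), dict(comments_by_hour)


# Sample pairs in the same format as result_list
sample_pairs = [
    ('6/23/2016 22:20', 1),
    ('11/29/2015 22:46', 102),
    ('6/17/2016 0:01', 1),
]
counts, comments = tally_by_hour(sample_pairs)
print(counts)    # → {'22': 2, '00': 1}
print(comments)  # → {'22': 103, '00': 1}
```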

Using these two dictionaries, we can now calculate the average number of comments per post for each hour

In [110]:

avg_by_hour = []

# Average comments per post = total comments in that hour / posts created in that hour
for hour in counts_by_hour:
    avg_comments = comments_by_hour[hour] / counts_by_hour[hour]
    avg_by_hour.append([hour, round(avg_comments, 2)])

print(avg_by_hour)

[['09', 5.58], ['13', 14.74], ['10', 13.44], ['14', 13.23], ['16', 16.8], ['23', 7.99], ['12', 9.41], ['17', 11.46], ['15', 38.59], ['21', 16.01], ['20', 21.52], ['02', 23.81], ['18', 13.2], ['03', 7.8], ['05', 10.09], ['19', 10.8], ['01', 11.38], ['22', 6.75], ['08', 10.25], ['04', 7.17], ['00', 8.13], ['06', 9.02], ['07', 7.85], ['11', 11.05]]
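The same averages can be kept as an hour-to-average dict built with a comprehension. This relies on both dictionaries having the same keys, which holds because the counting loop always updates them together. A minimal sketch; `average_by_hour` is a hypothetical helper, and the two sample entries are copied from the dictionaries printed above.

```python
def average_by_hour(counts_by_hour, comments_by_hour):
    """Return {hour: average comments per post}, rounded to 2 decimals."""
    return {
        hour: round(comments_by_hour[hour] / counts_by_hour[hour], 2)
        for hour in counts_by_hour
    }


# Two entries copied from the printed counts_by_hour / comments_by_hour output
sample_counts = {'15': 116, '09': 45}
sample_comments = {'15': 4477, '09': 251}
print(average_by_hour(sample_counts, sample_comments))  # → {'15': 38.59, '09': 5.58}
```

These match the `['15', 38.59]` and `['09', 5.58]` entries in the list above.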

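In the next cells we sort the hours by their average number of comments per post, highest first. Because sorted() compares lists element by element, we first swap each pair so that the average comes before the hour.
From the sorted list we then display the top 5 results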

In [112]:

# Put the average first so that sorting orders the pairs by average
swap_avg_by_hour = []

for row in avg_by_hour:
    swap = [row[1], row[0]]
    swap_avg_by_hour.append(swap)

print(swap_avg_by_hour)

[[5.58, '09'], [14.74, '13'], [13.44, '10'], [13.23, '14'], [16.8, '16'], [7.99, '23'], [9.41, '12'], [11.46, '17'], [38.59, '15'], [16.01, '21'], [21.52, '20'], [23.81, '02'], [13.2, '18'], [7.8, '03'], [10.09, '05'], [10.8, '19'], [11.38, '01'], [6.75, '22'], [10.25, '08'], [7.17, '04'], [8.13, '00'], [9.02, '06'], [7.85, '07'], [11.05, '11']]
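Swapping is one way to sort by the average; `sorted` can also take a `key` function that picks the sort value directly, leaving the `[hour, average]` pairs untouched. A minimal sketch with three pairs copied from `avg_by_hour`; `sample_avg_by_hour` and `top_hours` are hypothetical names.

```python
# Three [hour, average] pairs copied from avg_by_hour
sample_avg_by_hour = [['09', 5.58], ['15', 38.59], ['02', 23.81]]

# Sort on the average (index 1), highest first; no swapped copy is needed
top_hours = sorted(sample_avg_by_hour, key=lambda pair: pair[1], reverse=True)
print(top_hours)  # → [['15', 38.59], ['02', 23.81], ['09', 5.58]]
```

One difference: when two averages tie, the swapped-list sort breaks the tie by comparing hour strings, while the `key` sort is stable and keeps the tied pairs in their original order.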

In [116]:

# Sort by average (the first element of each pair), highest first
sorted_swap = sorted(swap_avg_by_hour, reverse=True)
print(sorted_swap)

[[38.59, '15'], [23.81, '02'], [21.52, '20'], [16.8, '16'], [16.01, '21'], [14.74, '13'], [13.44, '10'], [13.23, '14'], [13.2, '18'], [11.46, '17'], [11.38, '01'], [11.05, '11'], [10.8, '19'], [10.25, '08'], [10.09, '05'], [9.41, '12'], [9.02, '06'], [8.13, '00'], [7.99, '23'], [7.85, '07'], [7.8, '03'], [7.17, '04'], [6.75, '22'], [5.58, '09']]

In [142]:

print("Top 5 Hours for Ask Posts Comments")
print("\n")

# Round the average to a whole number of comments for display
template = "{hour}: {avg_comments:.0f} average comments per post."
for avg_comments, hour in sorted_swap[:5]:
    print(template.format(hour=hour, avg_comments=avg_comments))

Top 5 Hours for Ask Posts Comments


15: 39 average comments per post.
02: 24 average comments per post.
20: 22 average comments per post.
16: 17 average comments per post.
21: 16 average comments per post.
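The hours above are bare strings such as `15`, and `.0f` rounds 38.59 up to 39. A minimal sketch of a display variant that shows each hour as a clock time and keeps two decimals, using `datetime.strptime` and `strftime`; `format_hour` is a hypothetical helper and the two pairs are copied from `sorted_swap`.

```python
import datetime as dt


def format_hour(hour_str):
    """Turn a zero-padded hour string such as '15' into '15:00'."""
    return dt.datetime.strptime(hour_str, "%H").strftime("%H:%M")


template = "{hour}: {avg_comments:.2f} average comments per post."
for avg_comments, hour in [[38.59, '15'], [23.81, '02']]:
    print(template.format(hour=format_hour(hour), avg_comments=avg_comments))
# → 15:00: 38.59 average comments per post.
# → 02:00: 23.81 average comments per post.
```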

Conclusion:

Ask HN posts created during the 3 p.m. hour (15:00) received the most comments, about 38.59 per post on average.
Posts created at 8 p.m., 4 p.m., and 9 p.m. also received a relatively high average number of comments (roughly 16 to 22 per post).
There is an outlier at 2 a.m. (23.81 comments per post); this could be due to members from a different time zone posting in the forum.