
Guided Project: Exploring Hacker News Posts

In this project, we'll work with a dataset of submissions to Hacker News, a community-based forum similar to Reddit that is extremely popular in the technology sector. We're specifically interested in posts whose titles begin with either Ask HN or Show HN. Users submit Ask HN posts to ask the Hacker News community a specific question, and Show HN posts to show the community a project, product, or just something interesting.

We'll compare these two types of posts to determine the following:

Do Ask HN or Show HN posts receive more comments on average?
Do posts created at a certain time receive more comments on average?

Let's start by importing the `reader` function from the `csv` module and reading the dataset into a list of lists.

In [81]:

from csv import reader

# Read the CSV into a list of lists; `with` closes the file afterwards
with open('hacker_news.csv') as opened_file:
    hn = list(reader(opened_file))
hn_header = hn[0]

print(hn[0:5])
print()
print(hn_header)

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
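The same read-and-split pattern can be sketched on a tiny in-memory CSV. This is a minimal sketch, not the notebook's code: `io.StringIO` stands in for `hacker_news.csv`, the two data rows are copied from the sample output above, and `read_rows` is a hypothetical helper name.

```python
import io
from csv import reader


def read_rows(file_obj):
    """Return (header, data_rows) from an open CSV file object."""
    rows = list(reader(file_obj))
    return rows[0], rows[1:]


# Stand-in for open('hacker_news.csv'); two rows copied from the output above
sample = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52\n"
    "11919867,Technology ventures: From Idea to Enterprise,"
    "https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429,3,1,hswarna,6/17/2016 0:01\n"
)

header, rows = read_rows(sample)
print(header)
print(len(rows))
```

With the real file, the call would look like `with open('hacker_news.csv') as f: hn_header, hn = read_rows(f)`. Note that `csv.reader` returns every field as a string, which is why later cells call `int()` on the counts.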

Removing the header row from the list of lists:

In [82]:

hn = hn[1:]  # drop the header row
print(hn_header)
print()
print(hn[0:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]

Extracting Ask HN and Show HN posts:

In [136]:

ask_posts = []
show_posts = []
other_posts = []

# Compare titles in lower case so that 'Ask HN', 'ASK HN', etc. all match
for row in hn:
    title = row[1]

    if title.lower().startswith('ask hn'):
        ask_posts.append(row)

    elif title.lower().startswith('show hn'):
        show_posts.append(row)

    else:
        other_posts.append(row)

# Print a sample of each group
print(ask_posts[0:3])
print()
print()
print(show_posts[0:3])
print()
print()
print(other_posts[0:3])

[['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'], ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'], ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14']]


[['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03'], ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46'], ['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05']]


[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']]
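The three-way split can also be written as a small function that returns a group label, which makes the prefix rule easy to test on its own. A minimal sketch; `classify_title` is a hypothetical name, the first three titles come from the samples above, and the upper-case title is made up to show the case-insensitive match.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix (case-insensitive)."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'


titles = [
    'Ask HN: How to improve my personal website?',
    'Show HN: Something pointless I made',
    'Interactive Dynamic Video',
    'ASK HN: upper-case prefix still matches',  # hypothetical title
]

groups = {'ask': [], 'show': [], 'other': []}
for title in titles:
    groups[classify_title(title)].append(title)

print({name: len(items) for name, items in groups.items()})
# {'ask': 2, 'show': 1, 'other': 1}
```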

In [84]:

# Print the number of posts in each group
print('Ask Posts Total:', len(ask_posts))
print()
print('Show Posts Total:', len(show_posts))
print()
print('Other Posts Total:', len(other_posts))

Ask Posts Total: 1744

Show Posts Total: 1162

Other Posts Total: 17194

Calculating the Average Number of Comments for Ask HN and Show HN Posts

In [85]:

total_ask_comments = []  # start empty: a seed value would be counted by len()

for row in ask_posts:
    num_comments = int(row[4])  # num_comments column
    total_ask_comments.append(num_comments)

print(total_ask_comments[0:10])

[6, 29, 1, 3, 17, 1, 4, 1, 1, 2]

In [86]:

# Average = total comments / number of Ask posts
avg_ask_comments = sum(total_ask_comments) / len(total_ask_comments)

print(avg_ask_comments)
print(round(avg_ask_comments))

14.038417431192661
14

In [87]:

total_show_comments = []  # start empty: a seed value would be counted by len()

for row in show_posts:
    num_comments = int(row[4])  # num_comments column
    total_show_comments.append(num_comments)

print(total_show_comments[0:10])

[22, 102, 1, 3, 9, 3, 1, 1, 1, 2]

In [88]:

# Average = total comments / number of Show posts
avg_show_comments = sum(total_show_comments) / len(total_show_comments)

print(avg_show_comments)
print(round(avg_show_comments))

10.31669535283993
10

In [89]:

print('Average Ask Comments:',round(avg_ask_comments))
print('Average Show Comments:',round(avg_show_comments))

Average Ask Comments: 14
Average Show Comments: 10

On average, Ask posts receive about 14 comments and Show posts about 10. It appears that more people respond to Ask posts than to Show posts.
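Each average above is a total divided by a count of posts. A minimal sketch using the first four Ask HN comment counts from the output above shows why an accumulator list should start empty: a placeholder `0` leaves the sum unchanged but adds one to `len()`, which pulls the mean down.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


comments = [6, 29, 1, 3]  # first four Ask HN comment counts from the output above

print(mean(comments))        # 39 / 4 = 9.75
print(mean([0] + comments))  # 39 / 5 = 7.8: the seeded 0 adds one to the count
```

On the full lists the effect is small (one extra item in roughly a thousand), but it grows as the groups get smaller.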

Finding the Number of Ask Posts and Comments by Hour Created

Since Ask posts receive more comments on average, I'll focus the remaining analysis on them and determine whether Ask posts created at certain hours attract more comments.

In [90]:

print(hn_header)
print(ask_posts[0:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'], ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'], ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'], ['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20'], ['10394168', 'Ask HN: Someone offered to buy my browser extension from me. What now?', '', '28', '17', 'roykolak', '10/15/2015 16:38']]

In [91]:

import datetime as dt

result_list = []

for row in ask_posts: 
    created_at = row[6]
    num_comments = row[4]
    num_comments = int(num_comments)
    result_list.append([created_at, num_comments])

print(result_list[0:20])

[['8/16/2016 9:55', 6], ['11/22/2015 13:43', 29], ['5/2/2016 10:14', 1], ['8/2/2016 14:20', 3], ['10/15/2015 16:38', 17], ['9/26/2015 23:23', 1], ['4/22/2016 12:24', 4], ['11/16/2015 9:22', 1], ['2/24/2016 17:57', 1], ['6/4/2016 17:17', 2], ['9/19/2015 17:04', 7], ['9/22/2015 13:16', 1], ['6/21/2016 15:45', 1], ['1/13/2016 21:17', 4], ['10/4/2015 21:27', 4], ['1/25/2016 20:27', 2], ['10/27/2015 2:47', 3], ['1/19/2016 12:01', 1], ['3/22/2016 2:05', 22], ['9/8/2015 14:04', 2]]

Above, I paired each Ask post's creation timestamp with its number of comments and printed a sample to check what the cleaned data looks like.

Below, I group the posts by the hour of the day (00 to 23) in which they were created, counting the posts and summing their comments for each hour.
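Extracting the hour and bucketing the posts can also be sketched with `collections.defaultdict`, which removes the `if key not in dict` branch. A minimal sketch, assuming rows shaped like `result_list` (`[created_at, num_comments]`); the three sample rows are copied from the output above. The format `%m/%d/%Y %H:%M` also parses the unpadded values in this dataset, such as `9:55`, while `strftime("%H")` returns a zero-padded hour.

```python
import datetime as dt
from collections import defaultdict

# [created_at, num_comments] pairs copied from result_list above
rows = [['8/16/2016 9:55', 6], ['11/22/2015 13:43', 29], ['9/22/2015 13:16', 1]]

counts_by_hour = defaultdict(int)    # hour -> number of posts
comments_by_hour = defaultdict(int)  # hour -> total comments

for created_at, num_comments in rows:
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")
    counts_by_hour[hour] += 1
    comments_by_hour[hour] += num_comments

print(dict(counts_by_hour))    # {'09': 1, '13': 2}
print(dict(comments_by_hour))  # {'09': 6, '13': 30}
```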

In [92]:

import datetime as dt

counts_by_hour = {}    # hour -> number of Ask posts created in that hour
comments_by_hour = {}  # hour -> total comments on those posts

for row in result_list:
    created_at = row[0]
    num_comments = row[1]
    date_format = "%m/%d/%Y %H:%M"
    hour = dt.datetime.strptime(created_at, date_format).strftime("%H")

    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = num_comments
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += num_comments

display(counts_by_hour)
print()
display(comments_by_hour)

{'09': 45,
 '13': 85,
 '10': 59,
 '14': 107,
 '16': 108,
 '23': 68,
 '12': 73,
 '17': 100,
 '15': 116,
 '21': 109,
 '20': 80,
 '02': 58,
 '18': 109,
 '03': 54,
 '05': 46,
 '19': 110,
 '01': 60,
 '22': 71,
 '08': 48,
 '04': 47,
 '00': 55,
 '06': 44,
 '07': 34,
 '11': 58}

{'09': 251,
 '13': 1253,
 '10': 793,
 '14': 1416,
 '16': 1814,
 '23': 543,
 '12': 687,
 '17': 1146,
 '15': 4477,
 '21': 1745,
 '20': 1722,
 '02': 1381,
 '18': 1439,
 '03': 421,
 '05': 464,
 '19': 1188,
 '01': 683,
 '22': 479,
 '08': 492,
 '04': 337,
 '00': 447,
 '06': 397,
 '07': 267,
 '11': 641}

Calculating the Average Number of Comments for Ask HN Posts by Hour

In [93]:

avg_by_hour = []

for hour in counts_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour]/counts_by_hour[hour]])
    
display(avg_by_hour)    

[['09', 5.5777777777777775],
 ['13', 14.741176470588234],
 ['10', 13.440677966101696],
 ['14', 13.233644859813085],
 ['16', 16.796296296296298],
 ['23', 7.985294117647059],
 ['12', 9.41095890410959],
 ['17', 11.46],
 ['15', 38.5948275862069],
 ['21', 16.009174311926607],
 ['20', 21.525],
 ['02', 23.810344827586206],
 ['18', 13.20183486238532],
 ['03', 7.796296296296297],
 ['05', 10.08695652173913],
 ['19', 10.8],
 ['01', 11.383333333333333],
 ['22', 6.746478873239437],
 ['08', 10.25],
 ['04', 7.170212765957447],
 ['00', 8.127272727272727],
 ['06', 9.022727272727273],
 ['07', 7.852941176470588],
 ['11', 11.051724137931034]]

Sorting and Printing Values from a List of Lists

In [138]:

swap_avg_by_hour = []

# Put the average first so that sorting orders the hours by average
for hour, avg in avg_by_hour:
    swap_avg_by_hour.append([avg, hour])

sorted_swap = sorted(swap_avg_by_hour, reverse=True)

display(sorted_swap)

[[38.5948275862069, '15'],
 [23.810344827586206, '02'],
 [21.525, '20'],
 [16.796296296296298, '16'],
 [16.009174311926607, '21'],
 [14.741176470588234, '13'],
 [13.440677966101696, '10'],
 [13.233644859813085, '14'],
 [13.20183486238532, '18'],
 [11.46, '17'],
 [11.383333333333333, '01'],
 [11.051724137931034, '11'],
 [10.8, '19'],
 [10.25, '08'],
 [10.08695652173913, '05'],
 [9.41095890410959, '12'],
 [9.022727272727273, '06'],
 [8.127272727272727, '00'],
 [7.985294117647059, '23'],
 [7.852941176470588, '07'],
 [7.796296296296297, '03'],
 [7.170212765957447, '04'],
 [6.746478873239437, '22'],
 [5.5777777777777775, '09']]
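Swapping the columns works because Python sorts lists by their first element. An alternative sketch keeps the `[hour, average]` order and passes a `key` function to `sorted` instead; the three pairs below are copied from `avg_by_hour` above.

```python
# [hour, average comments] pairs copied from avg_by_hour above
avg_by_hour = [['09', 5.5777777777777775], ['15', 38.5948275862069], ['02', 23.810344827586206]]

# Sort by the average (index 1), highest first, without swapping the columns
ranked = sorted(avg_by_hour, key=lambda pair: pair[1], reverse=True)
print(ranked)
# [['15', 38.5948275862069], ['02', 23.810344827586206], ['09', 5.5777777777777775]]
```

`operator.itemgetter(1)` is an equivalent choice for the `key` argument.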

Printing the "Top 5 hours for Ask Posts Comments:¶

In [95]:

print ('Top 5 Hours for Ask Posts Comments:', sorted_swap[0:5])

Top 5 Hours for Ask Posts Comments: [[38.5948275862069, '15'], [23.810344827586206, '02'], [21.525, '20'], [16.796296296296298, '16'], [16.009174311926607, '21']]

In [139]:

# Convert each hour to 12-hour format and print the top 5
for avg, hour in sorted_swap[0:5]:
    hour_12 = dt.datetime.strptime(hour, "%H").strftime("%I:%M %p")
    print("{}: {:.2f} average comments per post".format(hour_12, avg))
    

03:00 PM: 38.59 average comments per post
02:00 AM: 23.81 average comments per post
08:00 PM: 21.52 average comments per post
04:00 PM: 16.80 average comments per post
09:00 PM: 16.01 average comments per post
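The 12-hour labels come from parsing the two-digit hour with `%H` and formatting it with `%I:%M %p`. A minimal sketch as a hypothetical helper, `to_12_hour`; `%p` is locale-dependent, so the `AM`/`PM` text shown assumes Python's default C locale.

```python
import datetime as dt


def to_12_hour(hour):
    """Convert a two-digit 24-hour string such as '15' to a label such as '03:00 PM'."""
    return dt.datetime.strptime(hour, "%H").strftime("%I:%M %p")


for hour in ['15', '02', '00', '12']:
    print(hour, '->', to_12_hour(hour))
# 15 -> 03:00 PM
# 02 -> 02:00 AM
# 00 -> 12:00 AM
# 12 -> 12:00 PM
```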

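Ask posts created at 3 PM receive by far the most comments on average (about 38.6 per post), followed by 2 AM (23.8) and 8 PM (21.5). The dataset's timestamps are already in US Eastern Time, which is my local time zone. In conclusion, the best time to create an Ask post to receive comments is around 3 PM Eastern. Apart from the overnight 2 AM hour, 8 PM is the next best, so 8 to 9 PM is a reasonable second choice.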

Additional tasks:

1) Determine whether Show or Ask posts receive more points on average.

2) Determine whether posts created at a certain time are more likely to receive more points.

3) Compare these results to the average number of comments and points other posts receive.

We will start with the first question: do Show or Ask posts receive more points on average?

In [141]:

print (hn_header)
print ()
print (ask_posts[0:4])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

[['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'], ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'], ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'], ['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20']]

In [98]:

total_ask_points = []  # start empty: a seed value would be counted by len()

for row in ask_posts:
    num_points = int(row[3])  # num_points column
    total_ask_points.append(num_points)

avg_ask_points = sum(total_ask_points) / len(total_ask_points)

display(round(avg_ask_points))

In [143]:

print (hn_header)
print ()
print (show_posts[0:4])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

[['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03'], ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46'], ['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05'], ['12178806', 'Show HN: Webscope  Easy way for web developers to communicate with Clients', 'http://webscopeapp.com', '3', '3', 'fastbrick', '7/28/2016 7:11']]

In [100]:

total_show_points = []  # start empty: a seed value would be counted by len()

for row in show_posts:
    num_points = int(row[3])  # num_points column
    total_show_points.append(num_points)

avg_show_points = sum(total_show_points) / len(total_show_points)

display(round(avg_show_points))

In [101]:

print ("Average Ask Points:", (round(avg_ask_points)))
print ("Average Show Points:", (round(avg_show_points)))

Average Ask Points: 15
Average Show Points: 28

According to the averages above, `Show posts` receive more points than `Ask posts`: about 28 points per post versus 15.
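The points averages repeat the comment-averaging loop with a different column index (`num_points` is index 3 and `num_comments` is index 4 in `hn_header`). A minimal sketch of a hypothetical helper parameterized by that index, run on the first three Ask HN rows from the output above:

```python
def column_mean(rows, index):
    """Mean of the integer values stored as strings in column `index`."""
    values = [int(row[index]) for row in rows]
    return sum(values) / len(values)


# First three Ask HN rows from the output above
ask_sample = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'],
]

NUM_POINTS, NUM_COMMENTS = 3, 4
print(column_mean(ask_sample, NUM_POINTS))    # (2 + 28 + 1) / 3
print(column_mean(ask_sample, NUM_COMMENTS))  # (6 + 29 + 1) / 3 = 12.0
```

With the full lists, `column_mean(show_posts, NUM_POINTS)` would stand in for the Show points loop.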

In this part, we will determine whether posts created at certain hours receive more points, analyzing Ask and Show posts separately.

First we will start with Ask posts:

In [102]:

time_ask_posts = []

for row in ask_posts:
    time_ask = row[6]            # created_at
    points_ask = int(row[3])     # num_points
    time_ask_posts.append([time_ask, points_ask])

print(time_ask_posts[0:15])

[['8/16/2016 9:55', 2], ['11/22/2015 13:43', 28], ['5/2/2016 10:14', 1], ['8/2/2016 14:20', 1], ['10/15/2015 16:38', 28], ['9/26/2015 23:23', 2], ['4/22/2016 12:24', 4], ['11/16/2015 9:22', 2], ['2/24/2016 17:57', 2], ['6/4/2016 17:17', 3], ['9/19/2015 17:04', 11], ['9/22/2015 13:16', 1], ['6/21/2016 15:45', 1], ['1/13/2016 21:17', 8], ['10/4/2015 21:27', 3]]

Above, I paired each Ask post's creation timestamp with its number of points and printed a sample to check what the cleaned data looks like.

Below, I group the posts by the hour of the day (00 to 23) in which they were created, counting the posts and summing their points for each hour.

In [103]:

import datetime as dt

ask_date_time = {}    # hour -> number of Ask posts created in that hour
ask_point_count = {}  # hour -> total points on those posts

for row in time_ask_posts:
    created_at = row[0]
    points = row[1]
    hour_points = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")

    if hour_points not in ask_date_time:
        ask_date_time[hour_points] = 1
        ask_point_count[hour_points] = points
    else:
        ask_date_time[hour_points] += 1
        ask_point_count[hour_points] += points

display(ask_date_time)
display(ask_point_count)
    

{'09': 45,
 '13': 85,
 '10': 59,
 '14': 107,
 '16': 108,
 '23': 68,
 '12': 73,
 '17': 100,
 '15': 116,
 '21': 109,
 '20': 80,
 '02': 58,
 '18': 109,
 '03': 54,
 '05': 46,
 '19': 110,
 '01': 60,
 '22': 71,
 '08': 48,
 '04': 47,
 '00': 55,
 '06': 44,
 '07': 34,
 '11': 58}

{'09': 329,
 '13': 2062,
 '10': 1102,
 '14': 1282,
 '16': 2522,
 '23': 581,
 '12': 782,
 '17': 1941,
 '15': 3479,
 '21': 1721,
 '20': 1151,
 '02': 793,
 '18': 1741,
 '03': 374,
 '05': 552,
 '19': 1513,
 '01': 700,
 '22': 511,
 '08': 515,
 '04': 389,
 '00': 451,
 '06': 591,
 '07': 361,
 '11': 825}

Average points per Ask post by hour:

In [104]:

avg_ask_points = []

for hour in ask_date_time:
    avg_ask_points.append([hour, ask_point_count[hour]/ask_date_time[hour]])
    
display(avg_ask_points)    

[['09', 7.311111111111111],
 ['13', 24.258823529411764],
 ['10', 18.677966101694917],
 ['14', 11.981308411214954],
 ['16', 23.35185185185185],
 ['23', 8.544117647058824],
 ['12', 10.712328767123287],
 ['17', 19.41],
 ['15', 29.99137931034483],
 ['21', 15.788990825688073],
 ['20', 14.3875],
 ['02', 13.672413793103448],
 ['18', 15.972477064220184],
 ['03', 6.925925925925926],
 ['05', 12.0],
 ['19', 13.754545454545454],
 ['01', 11.666666666666666],
 ['22', 7.197183098591549],
 ['08', 10.729166666666666],
 ['04', 8.27659574468085],
 ['00', 8.2],
 ['06', 13.431818181818182],
 ['07', 10.617647058823529],
 ['11', 14.224137931034482]]

Swapping the hour and average columns so that sorting orders the hours by average, and rounding the averages to whole points:

In [146]:

avg_ask_points_swap = []

# Put the rounded average first so that sorting orders the hours by average
for hour, avg in avg_ask_points:
    avg_ask_points_swap.append([round(avg), hour])

avg_ask_points_swap = sorted(avg_ask_points_swap, reverse=True)
display(avg_ask_points_swap)

[[30, '15'],
 [24, '13'],
 [23, '16'],
 [19, '17'],
 [19, '10'],
 [16, '21'],
 [16, '18'],
 [14, '20'],
 [14, '19'],
 [14, '11'],
 [14, '02'],
 [13, '06'],
 [12, '14'],
 [12, '05'],
 [12, '01'],
 [11, '12'],
 [11, '08'],
 [11, '07'],
 [9, '23'],
 [8, '04'],
 [8, '00'],
 [7, '22'],
 [7, '09'],
 [7, '03']]
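Rounding before sorting creates ties: 10 AM (18.68) and 5 PM (19.41) both become 19. Python compares lists element by element, so a tie on the average is broken by the hour string, and with `reverse=True` the later hour (`'17'`) comes first. A small sketch using three pairs from the output above:

```python
# [rounded average, hour] pairs copied from avg_ask_points_swap above
pairs = [[19, '10'], [30, '15'], [19, '17']]

# Equal averages fall back to comparing the hour strings
print(sorted(pairs, reverse=True))
# [[30, '15'], [19, '17'], [19, '10']]
```

Sorting on the unrounded averages and rounding only for display would avoid these ties.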

Top 5 hours by average points per Ask post:

In [107]:

# Convert each hour to 12-hour format and print the top 5
for avg, hour in avg_ask_points_swap[0:5]:
    hour_12 = dt.datetime.strptime(hour, "%H").strftime("%I:%M %p")
    print("{}: {:.2f} average points per post".format(hour_12, avg))

03:00 PM: 30.00 average points per post
01:00 PM: 24.00 average points per post
04:00 PM: 23.00 average points per post
05:00 PM: 19.00 average points per post
10:00 AM: 19.00 average points per post

`Ask posts` created at 3 PM receive the most points on average (about 30 per post). Four of the top five hours fall between 1 PM and 5 PM Eastern Time; the fifth is 10 AM, and the only hour in that afternoon window missing from the top five is 2 PM (about 12 points per post).

I have already determined that `Show posts` receive more points than `Ask posts`. Next, I will prepare the `Show posts` data to find the top 5 hours of the day for points.

Since this code mirrors the `Ask posts` analysis above, I will run it in fewer cells.
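Because the Ask, Show, and Other analyses differ only in the list of posts and the column being averaged, the steps can be folded into one function. A sketch under that assumption: `average_by_hour` is a hypothetical name, it assumes `created_at` is column 6 as in `hn_header`, and the sample rows are the first three Show HN posts from the output above.

```python
import datetime as dt
from collections import defaultdict


def average_by_hour(posts, column_index):
    """Return [hour, average] pairs for one numeric column, highest average first."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for row in posts:
        hour = dt.datetime.strptime(row[6], "%m/%d/%Y %H:%M").strftime("%H")  # created_at
        counts[hour] += 1
        totals[hour] += int(row[column_index])
    averages = [[hour, totals[hour] / counts[hour]] for hour in counts]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)


show_sample = [
    ['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03'],
    ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46'],
    ['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05'],
]

print(average_by_hour(show_sample, 3))  # column 3 = num_points
# [['22', 747.0], ['14', 26.0], ['18', 1.0]]
```

Calling it as `average_by_hour(show_posts, 3)` or `average_by_hour(other_posts, 4)` would cover each of the per-group cells.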

In [108]:

time_show_posts = []

for row in show_posts:
    time_show = row[6]            # created_at
    points_show = int(row[3])     # num_points
    time_show_posts.append([time_show, points_show])

print(time_show_posts[0:15])

[['11/25/2015 14:03', 26], ['11/29/2015 22:46', 747], ['4/28/2016 18:05', 1], ['7/28/2016 7:11', 3], ['1/9/2016 20:45', 1], ['3/7/2016 5:17', 3], ['11/20/2015 20:23', 4], ['3/27/2016 16:19', 8], ['9/26/2015 19:02', 6], ['8/9/2016 16:11', 2], ['9/11/2015 18:32', 18], ['6/6/2016 16:36', 1], ['2/19/2016 15:34', 43], ['4/7/2016 3:26', 8], ['8/30/2016 7:45', 3]]

In [109]:

import datetime as dt

show_counts_by_hour = {}  # hour -> number of Show posts created in that hour
show_points_by_hour = {}  # hour -> total points on those posts

for created_at, points in time_show_posts:
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")

    if hour not in show_counts_by_hour:
        show_counts_by_hour[hour] = 1
        show_points_by_hour[hour] = points
    else:
        show_counts_by_hour[hour] += 1
        show_points_by_hour[hour] += points

# Store [rounded average, hour] so that sorting orders the hours by average
avg_show_points_swap = []
for hour in show_counts_by_hour:
    avg = show_points_by_hour[hour] / show_counts_by_hour[hour]
    avg_show_points_swap.append([round(avg), hour])

avg_show_points_swap = sorted(avg_show_points_swap, reverse=True)
display(avg_show_points_swap)

# Convert each hour to 12-hour format and print the top 5
for avg, hour in avg_show_points_swap[0:5]:
    hour_12 = dt.datetime.strptime(hour, "%H").strftime("%I:%M %p")
    print("{}: {:.2f} average points per post".format(hour_12, avg))

[[42, '23'],
 [42, '12'],
 [40, '22'],
 [38, '00'],
 [36, '18'],
 [34, '11'],
 [31, '19'],
 [30, '20'],
 [29, '15'],
 [28, '16'],
 [27, '17'],
 [25, '14'],
 [25, '13'],
 [25, '03'],
 [25, '01'],
 [23, '06'],
 [19, '10'],
 [19, '07'],
 [18, '21'],
 [18, '09'],
 [15, '08'],
 [15, '04'],
 [11, '02'],
 [5, '05']]

11:00 PM: 42.00 average points per post
12:00 PM: 42.00 average points per post
10:00 PM: 40.00 average points per post
12:00 AM: 38.00 average points per post
06:00 PM: 36.00 average points per post

Among `Show posts`, those created at 11 PM and at 12 noon earn the most points on average (about 42 per post each), followed by 10 PM (40) and midnight (38), so the late evening from 10 PM to midnight Eastern Time is the strongest window. The noon result suggests that many readers may also check Hacker News during their lunch break.

I have also found that `Show posts` earn the most points late in the evening, while `Ask posts` earn the most points, and the most comments, in the afternoon.

Next, I will prepare the data for `Other posts` and determine the average number of comments and points they receive.

I will also check whether `Other posts` show any noticeable pattern by hour.

In [150]:

other_posts_comments = []

for row in other_posts: 
    time_other = row[6]
    comments_other = row[4]
    comments_other = int(comments_other)
    other_posts_comments.append([time_other,comments_other])
                                 

In [148]:

import datetime as dt

other_posts_date = {}  # hour -> number of other posts created in that hour
other_posts_com = {}   # hour -> total comments on those posts

for created_at, num_comments in other_posts_comments:
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")

    if hour not in other_posts_date:
        other_posts_date[hour] = 1
        other_posts_com[hour] = num_comments
    else:
        other_posts_date[hour] += 1
        other_posts_com[hour] += num_comments

avg_other_posts = []

for hour in other_posts_date:
    avg_other_posts.append([hour, other_posts_com[hour] / other_posts_date[hour]])
    

In [112]:

swap_other_posts = []

# Put the average first so that sorting orders the hours by average
for hour, avg in avg_other_posts:
    swap_other_posts.append([avg, hour])

sorted_other_posts = sorted(swap_other_posts, reverse=True)
display(sorted_other_posts)

# Convert each hour to 12-hour format and print the top 5
for avg, hour in sorted_other_posts[0:5]:
    hour_12 = dt.datetime.strptime(hour, "%H").strftime("%I:%M %p")
    print("{}: {:.2f} average comments per post".format(hour_12, avg))

[[32.33089770354906, '14'],
 [30.896514161220043, '13'],
 [30.34727503168568, '12'],
 [29.593939393939394, '11'],
 [29.51923076923077, '15'],
 [27.99572284003422, '17'],
 [27.786848072562357, '02'],
 [27.588014981273407, '09'],
 [27.076923076923077, '00'],
 [27.026209677419356, '08'],
 [26.924354243542435, '18'],
 [26.825552825552826, '03'],
 [26.808035714285715, '07'],
 [26.701020408163266, '19'],
 [26.612521150592215, '10'],
 [25.394187102633968, '16'],
 [25.175257731958762, '05'],
 [24.617210682492583, '23'],
 [24.125550660792953, '04'],
 [23.60983981693364, '21'],
 [23.265171503957784, '22'],
 [23.13940724478595, '20'],
 [23.072, '01'],
 [21.357843137254903, '06']]

02:00 PM: 32.33 average comments per post
01:00 PM: 30.90 average comments per post
12:00 PM: 30.35 average comments per post
11:00 AM: 29.59 average comments per post
03:00 PM: 29.52 average comments per post

In [113]:

total_other_comments = []  # start empty: a seed value would be counted by len()

for row in other_posts:
    num_comments_other = int(row[4])  # num_comments column
    total_other_comments.append(num_comments_other)

avg_other_comments = sum(total_other_comments) / len(total_other_comments)

print('Average Other Comments:', round(avg_other_comments))
print('Average Ask Comments:', round(avg_ask_comments))
print('Average Show Comments:', round(avg_show_comments))

Average Other Comments: 27
Average Ask Comments: 14
Average Show Comments: 10

For other posts, those created at 2 PM Eastern Time receive the most comments on average (about 32.3 per post), and all of the top five hours fall between 11 AM and 3 PM.

Overall, other posts receive the most comments on average, followed by Ask posts, with Show posts last.


Now I will look at the points received by hour for other posts, and compare their overall average points with those of Ask and Show posts.

In [114]:

other_posts_points = []

for row in other_posts: 
    time_other = row[6]
    points_other = row[3]
    points_other = int(points_other)
    other_posts_points.append([time_other, points_other])
    
print (other_posts_points[0:15])

[['8/4/2016 11:52', 386], ['1/26/2016 19:30', 39], ['6/23/2016 22:20', 2], ['6/17/2016 0:01', 3], ['9/30/2015 4:12', 8], ['10/31/2015 9:48', 53], ['11/13/2015 0:45', 3], ['3/22/2016 16:18', 34], ['10/13/2015 9:30', 91], ['3/27/2016 18:08', 3], ['5/10/2016 4:46', 2], ['6/26/2016 16:36', 18], ['4/28/2016 10:01', 59], ['8/22/2016 12:37', 7], ['4/1/2016 9:45', 1]]

In [115]:

other_post_date = {}  # hour -> number of other posts created in that hour
other_points = {}     # hour -> total points on those posts

for created_at, points in other_posts_points:
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")

    if hour not in other_post_date:
        other_post_date[hour] = 1
        other_points[hour] = points
    else:
        other_post_date[hour] += 1
        other_points[hour] += points

# Store [average, hour] so that sorting orders the hours by average
swap_other_points = []
for hour in other_post_date:
    swap_other_points.append([other_points[hour] / other_post_date[hour], hour])

sorted_other_points = sorted(swap_other_points, reverse=True)
display(sorted_other_points)

# Convert each hour to 12-hour format and print the top 5
for avg, hour in sorted_other_points[0:5]:
    hour_12 = dt.datetime.strptime(hour, "%H").strftime("%I:%M %p")
    print("{}: {:.2f} average points per post".format(hour_12, avg))

[[62.525054466230934, '13'],
 [61.78601252609603, '14'],
 [60.542307692307695, '15'],
 [60.4839255499154, '10'],
 [60.01122448979592, '19'],
 [58.471655328798185, '02'],
 [58.4582651391162, '00'],
 [57.97861420017109, '17'],
 [57.56818181818182, '11'],
 [57.3979721166033, '12'],
 [56.92137592137592, '03'],
 [56.832589285714285, '07'],
 [54.182561307901906, '16'],
 [54.09274193548387, '08'],
 [53.93632958801498, '09'],
 [53.928966789667896, '18'],
 [52.02967359050445, '23'],
 [50.606, '01'],
 [50.236147757255935, '22'],
 [49.96649484536083, '05'],
 [49.66740088105727, '04'],
 [49.369565217391305, '21'],
 [46.23529411764706, '06'],
 [45.24478594950604, '20']]

01:00 PM: 62.53 average points per post
02:00 PM: 61.79 average points per post
03:00 PM: 60.54 average points per post
10:00 AM: 60.48 average points per post
07:00 PM: 60.01 average points per post

Other posts created at 1 PM receive the most points on average (about 62.5 per post), and the 1 PM to 3 PM hours take the top three spots. This is in line with the 11 AM to 3 PM Eastern Time window in which other posts receive the most comments.
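The overlap between the comment and point windows for other posts can be checked directly with a set intersection. A minimal sketch; the two top-5 hour lists are copied from the outputs above.

```python
# Top 5 hours for other posts, copied from the outputs above
top_comment_hours = ['14', '13', '12', '11', '15']
top_point_hours = ['13', '14', '15', '10', '19']

# Hours that rank in the top 5 for both comments and points
shared = sorted(set(top_comment_hours) & set(top_point_hours))
print(shared)  # ['13', '14', '15']
```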

In [133]:

total_other_posts_points = []  # start empty: a seed value would be counted by len()

for row in other_posts:
    num_other_points = int(row[3])  # num_points column
    total_other_posts_points.append(num_other_points)

avg_other_points = sum(total_other_posts_points) / len(total_other_posts_points)

print("Average Other Points:", round(avg_other_points))

Average Other Points: 55

Other posts receive 55 points per post on average. They receive more comments and more points than either Ask or Show posts. Another observation is that the peak hours for points on other posts and on Ask posts seem to overlap in a similar time frame. Below, I break this down for comparison.