Let's summarize everything we learned in this course:
How to work with strings.
Object-oriented programming.
Dates and times.
In this project, we'll work with a dataset of submissions to the popular technology site Hacker News.
We're specifically interested in posts whose titles begin with either Ask HN or Show HN. Users submit Ask HN posts to ask the Hacker News community a specific question, and Show HN posts to show the community a project or product. You can download the dataset from Kaggle, but in this project I am using the cleaned dataset from mission 356 on dataquest.io.
I examined this dataset carefully and consider that it contains the following reliable information:
1. The date and time when each post was created.
2. Three post categories:
2.1 Ask HN.
2.2 Show HN.
2.3 Others HN.
3. The total number of comments on each post. The dataset does not contain the date and time of the individual comments. Linking comments to the creation date and time of a post is therefore only approximate: if we start dividing the number of comments by hours or days of publication, we get artificial and unreliable values, and it will look like an attempt to "stretch an owl over a globe", that is, to force the data to fit a conclusion.
Therefore I divide this project into two parts:
Part 1. Split the data into three datasets by the required criteria and collect general statistics.
Part 2. Collect detailed information on the distribution of posts by date and time.
Let's go to Part 1.
Load the functions required for extracting the dataset from the csv file:
def extract_variable_name(variable_name):
    """
    Help on extract_variable_name:
    Find the name of a variable in the global namespace and return it.
    Required argument - the object whose name is needed.
    """
    try:
        # Compare the object with every module-level value (no eval needed).
        for name, value in list(globals().items()):
            if value == variable_name:
                return name
    except Exception as X:
        return "\nError {} in function extract_variable_name(). ".format(X)
def csv_reader(csv_name, header = True):
    """
    Help on custom csv_reader:
    Open a csv file and return its data as lists.
    Required arguments - csv_name and the optional header flag:
    1. With default arguments returns two separate lists - the header row
       (data[0]) and the data rows (data[1:]).
    2. With header = False returns one list that still contains the header row.
    The csv module is used to read the csv file into a list.
    """
    from csv import reader
    try:
        # The context manager closes the file even if reading fails.
        with open(csv_name) as open_file:
            data = list(reader(open_file))
        if header:
            return data[0], data[1:]
        return data
    except Exception as X:
        print("\nError {} in function csv_reader()!".format(X))
def list_veiwer(list_name, start = 0, end = 0):
    """
    Help on custom function list_veiwer (list_name(list), start(int), end(int)):
    Takes in a list and returns general information about it and, optionally,
    a view of the rows in the given range.
    Required arguments - one or three:
    1. list_veiwer(list) returns general information about the list and its
       first row.
    2. list_veiwer(list, start, end) returns general information about the list
       and the rows from start to end-1.
    Negative indexes are converted to positive ones.
    To print the result use * unpacking and sep = "\n".
    """
    # Fallback name, so the except branch works even if the lookup never runs.
    name_list = "list"
    try:
        tested_list = list_name[:]
        rows = len(tested_list)
        name_list = extract_variable_name(tested_list)
        if not rows:
            return ("The {} is a empty list with zero rows and columns!"
                    .format(name_list), tested_list)
        if not isinstance(tested_list[0], list):
            # Flat list: report its length and show at most about 100 elements.
            elements = len(tested_list)
            if elements > 101:
                print_limit = 100
            else:
                print_limit = elements
            return ("The {} is a single row list contains {:,} elements.\n"
                    "View first {} element"
                    .format(name_list, elements, print_limit),
                    tested_list[:print_limit])
        columns = len(tested_list[0])
        if start == 0 and end == 0:
            return ("The {} is a list contains {:,} rows and {:,} columns.\n"
                    "View its first row".format(name_list, rows, columns),
                    tested_list[0])
        if start == end and start != 0 and end != 0:
            elements = len(tested_list[start])
            if elements > 101:
                print_limit = 100
            else:
                print_limit = elements
            return ("Entered start and end are same! The {} contains {:,} row and "
                    "{:,} columns. View first row:"
                    .format(name_list, rows, columns),
                    tested_list[start][:print_limit])
        # Convert negative indexes to positive ones.
        if start < 0:
            start = start + rows
        if end < 0:
            end = end + rows
        if start > end:
            return ("Entered start greater than end! The {} is a list contains {:,} "
                    "row and {:,} columns. View first row:"
                    .format(name_list, rows, columns),
                    tested_list[0])
        return_list = ["The {} list contains {:,} rows and {:,} columns\n"
                       "View rows from {:,} to {}"
                       .format(name_list, rows, columns, start, end-1)]
        for i in range(start, end):
            return_list.append(tested_list[i])
        return return_list
    except Exception as X:
        return ("\nError for {}: {} in function list_veiwer()!"
                .format(name_list, X), "Check entered values!")
Extract the header and the list of data separately from "hacker_news.csv" and view general information about them.
try:
    header_hn, hn = csv_reader("hacker_news.csv")
    print(*list_veiwer(header_hn), sep ="\n")
    print(*list_veiwer(hn, 0, 5), sep = "\n")
except Exception as X:
    print ("\nError for {}:".format(X))
The header_hn is a single row list contains 7 elements.
View first 7 element
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
The hn list contains 20,100 rows and 7 columns
View rows from 0 to 4
['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']
['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']
['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']
['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']
['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']
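The created_at column in the rows above stores the date as month/day/year followed by hour:minute, without leading zeros. Below is a small sketch of how such a string can be parsed with the standard datetime module. The variable names are only an illustration and are not part of the project code; the month name is printed in English on the assumption that the default locale is in use.

```python
import datetime as dt

# Sample value copied from the first data row shown above.
created_at = "8/4/2016 11:52"

# strptime accepts month, day and hour without leading zeros.
parsed = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M")

# Re-format the same moment in the style used later in the tables.
formatted = parsed.strftime("%d %B %Y %H:%M")
print(formatted)

# Separate parts that are useful for grouping posts.
print(parsed.strftime("%A"))  # full name of the day of week
print(parsed.hour)            # hour of the day as an integer
```

The same format string, "%m/%d/%Y %H:%M", is the one the statistics functions in this project pass to strptime.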
Let's split the hn list into three lists, one per post category:
List ask_posts for Ask HN
List show_posts for Show HN
List other_posts for Others HN
using the simple code below, and print general information about each list.
try:
    ask_posts = []
    show_posts = []
    other_posts = []
    for row in hn:
        # Compare titles in lower case so that "Ask HN" and "ask hn" both match.
        title = row[1].lower()
        if title.startswith("ask hn"):
            ask_posts.append(row)
        elif title.startswith("show hn"):
            show_posts.append(row)
        else:
            other_posts.append(row)
except Exception as X:
    print ("Error for {}:".format(X))
print(*list_veiwer(ask_posts), sep = '\n')
print(*list_veiwer(show_posts), sep = '\n')
print(*list_veiwer(other_posts), sep = '\n')
The ask_posts is a list contains 1,744 rows and 7 columns.
View its first row
['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55']
The show_posts is a list contains 1,162 rows and 7 columns.
View its first row
['10627194', 'Show HN: Wio Link ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03']
The other_posts is a list contains 17,194 rows and 7 columns.
View its first row
['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']
As we see:
Ask HN has 1,744 posts (8.68% share),
Show HN has 1,162 posts (5.78% share), 33.37% fewer than Ask HN,
Others HN has 17,194 posts (85.54% share).
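The shares above follow directly from the three row counts. Here is a minimal sketch of that arithmetic; the dictionary and variable names are my own illustration and are not part of the project code.

```python
# Post counts reported by list_veiwer for the three lists.
counts = {"Ask HN": 1744, "Show HN": 1162, "Others HN": 17194}
total = sum(counts.values())

# Share of each category in the whole dataset.
shares = {name: n / total for name, n in counts.items()}
for name, share in shares.items():
    print("{}: {:,} posts, share {:.2%}".format(name, counts[name], share))

# Relative gap between the Show HN and Ask HN post counts.
gap = (counts["Ask HN"] - counts["Show HN"]) / counts["Ask HN"]
print("Show HN has {:.2%} fewer posts than Ask HN".format(gap))
```

The total equals the 20,100 rows of the hn list, so no post is lost or counted twice by the split.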
Load the functions required for getting more detailed information:
def pretty_table_print(list_name, header, footer):
    """
    Help on custom function pretty_table_print (list_name, header, footer):
    Takes in a list of rows and prints it as a formatted table.
    Required arguments - list, header and footer
    """
    try:
        from prettytable import PrettyTable
        n_columns = len(list_name[0])
        pretty_table = PrettyTable()
        pretty_table.field_names = header
        # Prefix every row with its 1-based number.
        for i, row in enumerate(list_name):
            pretty_table.add_row([i + 1] + list(row[:n_columns]))
        pretty_table.add_row(footer)
        pretty_table.align = "l"
        print(pretty_table.get_string())
    except Exception as X:
        return "\nError {} for function pretty_table_print()!".format(X)
def posts_stats(list_name, type_post):
    try:
        import datetime as dt
        tested_list = list_name[:]
        # Pair the number of comments of each post with its row index
        comments_idx = []
        for i, row in enumerate(tested_list):
            comments_idx.append([int(row[4]), i])
        # Sort the pairs by number of comments, descending
        comments_idx = sorted(comments_idx,
                              key = lambda comm: comm[0],
                              reverse = True)
        # Separate the sorted pairs into a list of comments and a list of indexes
        comments = [pair[0] for pair in comments_idx]
        comments_idx = [pair[1] for pair in comments_idx]
        comments_tot = sum(comments)
        comments_avg = comments_tot/len(comments)
        # Number of posts with at least the average number of comments
        count_over_comm_avg = sum(1 for value in comments
                                  if value >= comments_avg)
        print("The {} contains {:,} different posts."
              .format(str(type_post), len(comments)))
        print("The {} contains total {:,} comments."
              .format(type_post, comments_tot))
        print("The {} has average {:.3f} comments for each post."
              .format(type_post, comments_avg))
        print("The {} has maximal {:,} and minimal {} comments."
              .format(type_post, comments[0], comments[-1]))
        # Replace dot with comma in the percent value
        share_comma = "{:.3%}".format(count_over_comm_avg/len(comments)).replace(".", ",")
        print("The number of posts in {} having comments over average {:,}, "
              "with total share {}"
              .format(type_post, count_over_comm_avg, share_comma))
        print("The top ten posts in {} with largest comments see below: "
              .format(type_post))
        # Print the top ten posts with the largest number of comments
        out_list = []
        top_ten_idx = comments_idx[0:10]
        top_ten_comments_tot = 0
        for idx in top_ten_idx:
            row = tested_list[idx]
            row_list = []
            # Columns: id, title, num_comments, author, created_at
            row_idx = [0, 1, 4, 5, 6]
            for i in row_idx:
                if i != 4 and i != 6:
                    row_list.append(row[i])
                elif i == 4:
                    row_list.append("{:,}".format(int(row[i])))
                    top_ten_comments_tot += int(row[i])
                elif i == 6:
                    create_dt = dt.datetime.strptime(row[i], "%m/%d/%Y %H:%M")
                    row_list.append(create_dt.strftime("%d %B %Y %H:%M"))
            out_list.append(row_list)
        header = ["No", "id", "title", "num_comments", "author",
                  "created_at"]
        footer = ["", "", "Total top ten post comments:",
                  "{:,}".format(top_ten_comments_tot), "", ""]
        pretty_table_print(out_list, header, footer)
        return "EOF for general post statistics for {}.".format(type_post)
    except Exception as X:
        print ("Error {} for function posts_stats().".format(X))
And print more detailed statistics:
print(posts_stats(ask_posts, "Ask HN"))
print(posts_stats(show_posts, "Show HN"))
print(posts_stats(other_posts, "Others HN"))
The Ask HN contains 1,744 different posts.
The Ask HN contains total 24,483 comments.
The Ask HN has average 14.038 comments for each post.
The Ask HN has maximal 947 and minimal 1 comments.
The number of posts in Ask HN having comments over average 270, with total share 15,482%
The top ten posts in Ask HN with largest comments see below:
+----+----------+---------------------------------------------------------------------------+--------------+-------------+-------------------------+
| No | id       | title                                                                     | num_comments | author      | created_at              |
+----+----------+---------------------------------------------------------------------------+--------------+-------------+-------------------------+
| 1  | 12202865 | Ask HN: Who is hiring? (August 2016)                                      | 947          | whoishiring | 01 August 2016 15:01    |
| 2  | 12405698 | Ask HN: Who is hiring? (September 2016)                                   | 910          | whoishiring | 01 September 2016 15:00 |
| 3  | 11694277 | Ask HN: What's the best tool you used to use that doesn't exist anymore?  | 868          | mod50ack    | 14 May 2016 02:07       |
| 4  | 11312984 | Ask HN: How much do you make at Amazon? Here is how much I make at Amazon | 691          | boren_ave11 | 18 March 2016 16:43     |
| 5  | 10998486 | Ask HN: What should we fund at YC Research?                               | 520          | sama        | 29 January 2016 21:03   |
| 6  | 12243611 | Ask HN: What book have you given as a gift?                               | 514          | schappim    | 07 August 2016 20:57    |
| 7  | 12567645 | Ask HN: What do you wish someone would build?                             | 477          | prmph       | 23 September 2016 20:18 |
| 8  | 11247368 | Ask HN: Moving Out of Silicon Valley because of housing? Where to?        | 383          | Apocryphon  | 08 March 2016 18:35     |
| 9  | 11405241 | Ask HN: Who wants to be hired? (April 2016)                               | 283          | whoishiring | 01 April 2016 15:01     |
| 10 | 10642500 | Ask HN: Which open source projects have kind, supportive, talented teams? | 281          | mikemajzoub | 28 November 2015 21:23  |
|    |          | Total top ten post comments:                                              | 5,874        |             |                         |
+----+----------+---------------------------------------------------------------------------+--------------+-------------+-------------------------+
EOF for general post statistics for Ask HN.
The Show HN contains 1,162 different posts.
The Show HN contains total 11,988 comments.
The Show HN has average 10.317 comments for each post.
The Show HN has maximal 306 and minimal 1 comments.
The number of posts in Show HN having comments over average 232, with total share 19,966%
The top ten posts in Show HN with largest comments see below:
+----+----------+-----------------------------------------------------------------------+--------------+---------------+-------------------------+
| No | id       | title                                                                 | num_comments | author        | created_at              |
+----+----------+-----------------------------------------------------------------------+--------------+---------------+-------------------------+
| 1  | 11667494 | Show HN: BitKeeper Enterprise-ready version control, now open-source  | 306          | wscott        | 10 May 2016 14:39       |
| 2  | 11778077 | Show HN: Automatic private time tracking for OS X                     | 233          | ivm           | 26 May 2016 14:12       |
| 3  | 11407536 | Show HN: What every browser knows about you                           | 206          | Capira        | 01 April 2016 18:55     |
| 4  | 11846108 | Show HN: New calendar app idea                                        | 197          | petermolyneux | 06 June 2016 12:02      |
| 5  | 10849460 | Show HN: Nodal. Next-Generation Node.js Server and Framework          | 168          | keithwhor     | 06 January 2016 09:20   |
| 6  | 12211754 | Show HN: Noms A new decentralized database based on ideas from Git    | 167          | ahl           | 02 August 2016 17:57    |
| 7  | 10248773 | Show HN: Hacker News Simulator                                        | 163          | orf           | 20 September 2015 19:50 |
| 8  | 11354021 | Show HN: Watch movies with the freedom to filter                      | 143          | marco1        | 24 March 2016 16:16     |
| 9  | 10355030 | Show HN: Download any song without knowing its name                   | 134          | yask123       | 08 October 2015 18:41   |
| 10 | 11174815 | Show HN: CuriosityStream Netflix for non-fiction                      | 113          | MPetitt       | 25 February 2016 14:58  |
|    |          | Total top ten post comments:                                          | 1,830        |               |                         |
+----+----------+-----------------------------------------------------------------------+--------------+---------------+-------------------------+
EOF for general post statistics for Show HN.
The Others HN contains 17,194 different posts.
The Others HN contains total 462,055 comments.
The Others HN has average 26.873 comments for each post.
The Others HN has maximal 1,733 and minimal 1 comments.
The number of posts in Others HN having comments over average 4,248, with total share 24,706%
The top ten posts in Others HN with largest comments see below:
+----+----------+------------------------------------------------------------------------+--------------+---------------+-------------------------+
| No | id       | title                                                                  | num_comments | author        | created_at              |
+----+----------+------------------------------------------------------------------------+--------------+---------------+-------------------------+
| 1  | 12445994 | iPhone 7                                                               | 1,733        | benigeri      | 07 September 2016 18:52 |
| 2  | 11618896 | A Basic Income Should Be the Next Big Thing                            | 809          | warrenmar     | 03 May 2016 08:39       |
| 3  | 12494998 | Pardon Snowden                                                         | 781          | erlend_sh     | 14 September 2016 08:31 |
| 4  | 12211651 | Massachusetts Bans Employers from Asking Applicants About Previous Pay | 760          | OhHeyItsE     | 02 August 2016 17:44    |
| 5  | 10580208 | VLC contributor living in Aleppo writing about the Paris attacks       | 705          | etix          | 17 November 2015 10:27  |
| 6  | 12133766 | Master Plan, Part Deux                                                 | 677          | arturogarrido | 21 July 2016 00:52      |
| 7  | 10339388 | New Windows 10 Devices From Microsoft                                  | 644          | yread         | 06 October 2015 15:11   |
| 8  | 10563540 | Paris Shootings and Explosions Kill Over 100, Police Say               | 624          | franzb        | 14 November 2015 00:25  |
| 9  | 12480733 | How the Sugar Industry Shifted Blame to Fat                            | 599          | okket         | 12 September 2016 15:51 |
| 10 | 11196718 | Tech workers are increasingly looking to leave Silicon Valley          | 569          | prostoalex    | 29 February 2016 17:21  |
|    |          | Total top ten post comments:                                           | 7,901        |               |                         |
+----+----------+------------------------------------------------------------------------+--------------+---------------+-------------------------+
EOF for general post statistics for Others HN.
As we see from the general statistics:
1. Ask HN and Show HN posts receive far fewer comments per post than Others HN: 14.04 and 10.32 on average versus 26.87, that is, about 48% and 62% fewer.
2. The share of popular posts, those with an above-average number of comments, is also lower in Ask HN (15.48%) and Show HN (19.97%) than in Others HN (24.71%).
3. The most popular posts in Ask HN are gossip from the IT-adjacent crowd and the reflections of a Russian IT emigrant on whether or not to build a pretty home in Silicon Valley, and so on.
4. The most popular posts in Show HN are posts about software.
5. The most popular posts in Others HN are general non-IT news and a little SMM pushing mainstream products from Microsoft and Apple.
6. Many of the most discussed posts were created between May and September 2016.
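The first two observations can be checked with a few lines of arithmetic on the totals printed by posts_stats. This is only a sketch: the stats dictionary and the names in it are my own illustration, filled in by hand from the output above.

```python
# Totals copied by hand from the posts_stats output.
stats = {
    "Ask HN": {"posts": 1744, "comments": 24483, "above_avg": 270},
    "Show HN": {"posts": 1162, "comments": 11988, "above_avg": 232},
    "Others HN": {"posts": 17194, "comments": 462055, "above_avg": 4248},
}

# Average number of comments per post and share of above-average posts.
averages = {k: v["comments"] / v["posts"] for k, v in stats.items()}
popular_share = {k: v["above_avg"] / v["posts"] for k, v in stats.items()}

# Relative difference of each category against Others HN.
baseline = averages["Others HN"]
relative = {k: averages[k] / baseline - 1 for k in ("Ask HN", "Show HN")}

for name in stats:
    print("{}: {:.2f} comments per post, {:.2%} of posts above average"
          .format(name, averages[name], popular_share[name]))
for name, diff in relative.items():
    print("{} vs Others HN: {:+.1%} comments per post".format(name, diff))
```

A negative relative difference means the category gets fewer comments per post than Others HN.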
Let's finish Part 1 and go to Part 2.
Load the functions required for getting more detailed statistics on the distribution of posts over time:
def convert_dt_list_to_freq_post_table(list_name, hour = False):
    """
    Help on custom convert_dt_list_to_freq_post_table function:
    Convert a single list of date and time strings to a formatted
    frequency table sorted in descending order.
    Required arguments - list_name and hour = True / False
    """
    try:
        tested_list = list_name[:]
        # Count how many times each value occurs
        pivot_dict = {}
        for value in tested_list:
            if value not in pivot_dict:
                pivot_dict[value] = 1
            else:
                pivot_dict[value] += 1
        total = sum(list(pivot_dict.values()))
        values_list = []
        for k, v in pivot_dict.items():
            values_list.append([k, float(v)/total, int(v)])
        desc_list = sorted(values_list, key = lambda v : v[2], reverse = True)
        out_list = []
        # Replace dot with comma in the percent values
        for row in desc_list:
            if not hour:
                out_list.append([row[0],
                                 ("{:.3%}".format(row[1])).replace(".", ","),
                                 "{:,}".format(row[2])])
            else:
                out_list.append([row[0] + ":00 - " + row[0] + ":59",
                                 ("{:.3%}".format(row[1])).replace(".", ","),
                                 "{:,}".format(row[2])])
        return out_list, ["", "Total", " posts: ", "{:,}.".format(total)]
    except Exception as X:
        print("\nError {} in function convert_dt_list_to_freq_post_table()!".format(X))
def time_stats_posts(list_name, type_post):
    try:
        import datetime as dt
        tested_list = list_name[:]
        # Create lists with the number of comments, date and time, and weekday
        comments = []
        dates_times = []
        days_months_years = []
        months_years = []
        days = []
        hours = []
        for row in tested_list:
            comments.append(int(row[4]))
            date_time_dt = dt.datetime.strptime(row[6], "%m/%d/%Y %H:%M")
            dates_times.append(date_time_dt)
            # day, full month name, year
            days_months_years.append(date_time_dt.strftime("%d %B %Y"))
            # full month name and year
            months_years.append((date_time_dt.strftime("%B %Y")))
            # full name of the day of week
            days.append(date_time_dt.strftime("%A"))
            # hour of the day
            hours.append(date_time_dt.strftime("%H"))
        # Calculate the time interval and the general statistics
        start_dt = min(dates_times)
        end_dt = max(dates_times)
        diff_dt = end_dt - start_dt
        diff_days = round(diff_dt.total_seconds()/(24*3600), 3)
        posts_tot = len(tested_list)
        posts_avg_day = round(posts_tot/diff_days, 3)
        comments_tot = sum(comments)
        comments_avg_post = round(comments_tot/len(comments), 3)
        comments_avg_day = round(comments_tot/diff_days, 3)
        comments_count_over_avg = 0
        for value in comments:
            if value > int(comments_avg_post):
                comments_count_over_avg += 1
        # Print the total and average values
        print("The {} from {} to {} contains total {:,} posts and {:,} comments."
              .format(type_post, start_dt.strftime("%d %B %Y %H:%M"),
                      end_dt.strftime("%d %B %Y %H:%M"),
                      posts_tot, comments_tot))
        print("Average posts in the day {:.2f}, average comments for each posts "
              "{:.2f}, average comments in the each day {:.2f}."
              .format(posts_avg_day, comments_avg_post, comments_avg_day))
        # Replace dot with comma in the percent value
        share_comma = ("{:.3%}".format(comments_count_over_avg/len(comments))
                       .replace(".", ","))
        print("The {:,} posts of {} that have ratio comments/post over average ratio "
              "comments/post with share of {} from total."
              .format(comments_count_over_avg,
                      type_post,
                      share_comma))
        # Convert the lists to frequency tables sorted in descending order
        # and output them using pretty tables
        print("\n1. Dispersion top ten posts numbers by date.")
        print_dmy, footer_dmy = convert_dt_list_to_freq_post_table(days_months_years, False)
        print_dmy = print_dmy[0:10]
        header_dmy = ["No", "Date", "Share %", "Post numbers"]
        pretty_table_print(print_dmy, header_dmy, footer_dmy)
        print("\n2. Dispersion posts numbers by month year.")
        print_my, footer_my = convert_dt_list_to_freq_post_table(months_years, False)
        header_my = ["No", "Month Year", "Share %", "Post numbers"]
        pretty_table_print(print_my, header_my, footer_my)
        print("\n3. Dispersion posts numbers by day of week.")
        print_d, footer_d = convert_dt_list_to_freq_post_table(days, False)
        header_d = ["No", "Day of week", "Share %", "Post numbers"]
        pretty_table_print(print_d, header_d, footer_d)
        print("\n4. Dispersion posts numbers by hours .")
        print_hours, footer_hours = convert_dt_list_to_freq_post_table(hours, True)
        header_hours = ["No", "Hours of day", "Share %", "Post numbers"]
        pretty_table_print(print_hours, header_hours, footer_hours)
        return "EOF for time dispersion for {}.".format(type_post)
    except Exception as X:
        print ("Error {} for function time_stats_posts().".format(X))
And print detailed statistics on the distribution by date and time:
print(time_stats_posts(ask_posts, "Ask HN"))
print(time_stats_posts(show_posts, "Show HN"))
print(time_stats_posts(other_posts, "Others HN"))
The Ask HN from 06 September 2015 14:53 to 26 September 2016 01:17 contains total 1,744 posts and 24,483 comments.
Average posts in the day 4.53, average comments for each posts 14.04, average comments in the each day 63.52.
The 270 posts of Ask HN that have ratio comments/post over average ratio comments/post with share of 15,482% from total.

1. Dispersion top ten posts numbers by date.
+----+-------------------+----------+--------------+
| No | Date              | Share %  | Post numbers |
+----+-------------------+----------+--------------+
| 1  | 19 September 2016 | 0,688%   | 12           |
| 2  | 02 May 2016       | 0,573%   | 10           |
| 3  | 13 September 2016 | 0,573%   | 10           |
| 4  | 08 April 2016     | 0,573%   | 10           |
| 5  | 23 March 2016     | 0,573%   | 10           |
| 6  | 14 January 2016   | 0,573%   | 10           |
| 7  | 13 January 2016   | 0,516%   | 9            |
| 8  | 25 January 2016   | 0,516%   | 9            |
| 9  | 19 January 2016   | 0,516%   | 9            |
| 10 | 20 July 2016      | 0,516%   | 9            |
|    | Total             |  posts:  | 1,744.       |
+----+-------------------+----------+--------------+

2. Dispersion posts numbers by month year.
+----+----------------+----------+--------------+
| No | Month Year     | Share %  | Post numbers |
+----+----------------+----------+--------------+
| 1  | January 2016   | 9,174%   | 160          |
| 2  | August 2016    | 9,002%   | 157          |
| 3  | April 2016     | 8,486%   | 148          |
| 4  | March 2016     | 8,200%   | 143          |
| 5  | May 2016       | 8,085%   | 141          |
| 6  | July 2016      | 8,028%   | 140          |
| 7  | November 2015  | 7,569%   | 132          |
| 8  | June 2016      | 7,569%   | 132          |
| 9  | October 2015   | 7,225%   | 126          |
| 10 | February 2016  | 7,167%   | 125          |
| 11 | September 2016 | 7,167%   | 125          |
| 12 | December 2015  | 7,053%   | 123          |
| 13 | September 2015 | 5,275%   | 92           |
|    | Total          |  posts:  | 1,744.       |
+----+----------------+----------+--------------+

3. Dispersion posts numbers by day of week.
+----+-------------+----------+--------------+
| No | Day of week | Share %  | Post numbers |
+----+-------------+----------+--------------+
| 1  | Wednesday   | 16,858%  | 294          |
| 2  | Tuesday     | 16,514%  | 288          |
| 3  | Monday      | 16,342%  | 285          |
| 4  | Friday      | 15,539%  | 271          |
| 5  | Thursday    | 14,564%  | 254          |
| 6  | Saturday    | 10,894%  | 190          |
| 7  | Sunday      | 9,289%   | 162          |
|    | Total       |  posts:  | 1,744.       |
+----+-------------+----------+--------------+

4. Dispersion posts numbers by hours .
+----+---------------+----------+--------------+
| No | Hours of day  | Share %  | Post numbers |
+----+---------------+----------+--------------+
| 1  | 15:00 - 15:59 | 6,651%   | 116          |
| 2  | 19:00 - 19:59 | 6,307%   | 110          |
| 3  | 21:00 - 21:59 | 6,250%   | 109          |
| 4  | 18:00 - 18:59 | 6,250%   | 109          |
| 5  | 16:00 - 16:59 | 6,193%   | 108          |
| 6  | 14:00 - 14:59 | 6,135%   | 107          |
| 7  | 17:00 - 17:59 | 5,734%   | 100          |
| 8  | 13:00 - 13:59 | 4,874%   | 85           |
| 9  | 20:00 - 20:59 | 4,587%   | 80           |
| 10 | 12:00 - 12:59 | 4,186%   | 73           |
| 11 | 22:00 - 22:59 | 4,071%   | 71           |
| 12 | 23:00 - 23:59 | 3,899%   | 68           |
| 13 | 01:00 - 01:59 | 3,440%   | 60           |
| 14 | 10:00 - 10:59 | 3,383%   | 59           |
| 15 | 02:00 - 02:59 | 3,326%   | 58           |
| 16 | 11:00 - 11:59 | 3,326%   | 58           |
| 17 | 00:00 - 00:59 | 3,154%   | 55           |
| 18 | 03:00 - 03:59 | 3,096%   | 54           |
| 19 | 08:00 - 08:59 | 2,752%   | 48           |
| 20 | 04:00 - 04:59 | 2,695%   | 47           |
| 21 | 05:00 - 05:59 | 2,638%   | 46           |
| 22 | 09:00 - 09:59 | 2,580%   | 45           |
| 23 | 06:00 - 06:59 | 2,523%   | 44           |
| 24 | 07:00 - 07:59 | 1,950%   | 34           |
|    | Total         |  posts:  | 1,744.       |
+----+---------------+----------+--------------+
EOF for time dispersion for Ask HN.
The Show HN from 06 September 2015 12:38 to 25 September 2016 19:06 contains total 1,162 posts and 11,988 comments.
Average posts in the day 3.02, average comments for each posts 10.32, average comments in the each day 31.12.
The 232 posts of Show HN that have ratio comments/post over average ratio comments/post with share of 19,966% from total.

1. Dispersion top ten posts numbers by date.
+----+-----------------+----------+--------------+
| No | Date            | Share %  | Post numbers |
+----+-----------------+----------+--------------+
| 1  | 21 March 2016   | 0,775%   | 9            |
| 2  | 12 October 2015 | 0,775%   | 9            |
| 3  | 31 March 2016   | 0,775%   | 9            |
| 4  | 02 August 2016  | 0,688%   | 8            |
| 5  | 01 August 2016  | 0,688%   | 8            |
| 6  | 30 March 2016   | 0,688%   | 8            |
| 7  | 17 August 2016  | 0,688%   | 8            |
| 8  | 14 January 2016 | 0,688%   | 8            |
| 9  | 07 March 2016   | 0,602%   | 7            |
| 10 | 06 July 2016    | 0,602%   | 7            |
|    | Total           |  posts:  | 1,162.       |
+----+-----------------+----------+--------------+

2. Dispersion posts numbers by month year.
+----+----------------+----------+--------------+
| No | Month Year     | Share %  | Post numbers |
+----+----------------+----------+--------------+
| 1  | March 2016     | 11,102%  | 129          |
| 2  | November 2015  | 8,778%   | 102          |
| 3  | February 2016  | 8,692%   | 101          |
| 4  | January 2016   | 8,520%   | 99           |
| 5  | August 2016    | 8,348%   | 97           |
| 6  | October 2015   | 8,348%   | 97           |
| 7  | July 2016      | 7,831%   | 91           |
| 8  | September 2015 | 6,885%   | 80           |
| 9  | December 2015  | 6,713%   | 78           |
| 10 | April 2016     | 6,540%   | 76           |
| 11 | May 2016       | 6,368%   | 74           |
| 12 | June 2016      | 6,282%   | 73           |
| 13 | September 2016 | 5,594%   | 65           |
|    | Total          |  posts:  | 1,162.       |
+----+----------------+----------+--------------+

3. Dispersion posts numbers by day of week.
+----+-------------+----------+--------------+
| No | Day of week | Share %  | Post numbers |
+----+-------------+----------+--------------+
| 1  | Wednesday   | 18,589%  | 216          |
| 2  | Monday      | 17,642%  | 205          |
| 3  | Tuesday     | 17,298%  | 201          |
| 4  | Thursday    | 16,523%  | 192          |
| 5  | Friday      | 12,048%  | 140          |
| 6  | Sunday      | 9,122%   | 106          |
| 7  | Saturday    | 8,778%   | 102          |
|    | Total       |  posts:  | 1,162.       |
+----+-------------+----------+--------------+

4. Dispersion posts numbers by hours .
+----+---------------+----------+--------------+
| No | Hours of day  | Share %  | Post numbers |
+----+---------------+----------+--------------+
| 1  | 13:00 - 13:59 | 8,520%   | 99           |
| 2  | 16:00 - 16:59 | 8,003%   | 93           |
| 3  | 17:00 - 17:59 | 8,003%   | 93           |
| 4  | 14:00 - 14:59 | 7,401%   | 86           |
| 5  | 15:00 - 15:59 | 6,713%   | 78           |
| 6  | 18:00 - 18:59 | 5,250%   | 61           |
| 7  | 12:00 - 12:59 | 5,250%   | 61           |
| 8  | 20:00 - 20:59 | 5,164%   | 60           |
| 9  | 19:00 - 19:59 | 4,733%   | 55           |
| 10 | 21:00 - 21:59 | 4,045%   | 47           |
| 11 | 22:00 - 22:59 | 3,959%   | 46           |
| 12 | 11:00 - 11:59 | 3,787%   | 44           |
| 13 | 23:00 - 23:59 | 3,098%   | 36           |
| 14 | 10:00 - 10:59 | 3,098%   | 36           |
| 15 | 08:00 - 08:59 | 2,926%   | 34           |
| 16 | 00:00 - 00:59 | 2,668%   | 31           |
| 17 | 02:00 - 02:59 | 2,582%   | 30           |
| 18 | 09:00 - 09:59 | 2,582%   | 30           |
| 19 | 01:00 - 01:59 | 2,410%   | 28           |
| 20 | 03:00 - 03:59 | 2,324%   | 27           |
| 21 | 07:00 - 07:59 | 2,238%   | 26           |
| 22 | 04:00 - 04:59 | 2,238%   | 26           |
| 23 | 05:00 - 05:59 | 1,635%   | 19           |
| 24 | 06:00 - 06:59 | 1,377%   | 16           |
|    | Total         |  posts:  | 1,162.       |
+----+---------------+----------+--------------+
EOF for time dispersion for Show HN.
The Others HN from 06 September 2015 05:56 to 26 September 2016 03:13 contains total 17,194 posts and 462,055 comments.
Average posts in the day 44.56, average comments for each posts 26.87, average comments in the each day 1197.38.
The 4,248 posts of Others HN that have ratio comments/post over average ratio comments/post with share of 24,706% from total.

1. Dispersion top ten posts numbers by date.
+----+-------------------+----------+--------------+
| No | Date              | Share %  | Post numbers |
+----+-------------------+----------+--------------+
| 1  | 07 July 2016      | 0,430%   | 74           |
| 2  | 14 September 2016 | 0,425%   | 73           |
| 3  | 14 January 2016   | 0,425%   | 73           |
| 4  | 16 March 2016     | 0,419%   | 72           |
| 5  | 12 November 2015  | 0,419%   | 72           |
| 6  | 30 September 2015 | 0,413%   | 71           |
| 7  | 10 December 2015  | 0,390%   | 67           |
| 8  | 27 January 2016   | 0,384%   | 66           |
| 9  | 17 November 2015  | 0,384%   | 66           |
| 10 | 28 April 2016     | 0,378%   | 65           |
|    | Total             |  posts:  | 17,194.      |
+----+-------------------+----------+--------------+

2. Dispersion posts numbers by month year.
+----+----------------+----------+--------------+
| No | Month Year     | Share %  | Post numbers |
+----+----------------+----------+--------------+
| 1  | October 2015   | 8,549%   | 1,470        |
| 2  | January 2016   | 8,346%   | 1,435        |
| 3  | November 2015  | 8,323%   | 1,431        |
| 4  | March 2016     | 8,166%   | 1,404        |
| 5  | April 2016     | 8,090%   | 1,391        |
| 6  | June 2016      | 8,026%   | 1,380        |
| 7  | December 2015  | 7,852%   | 1,350        |
| 8  | May 2016       | 7,660%   | 1,317        |
| 9  | August 2016    | 7,607%   | 1,308        |
| 10 | February 2016  | 7,474%   | 1,285        |
| 11 | July 2016      | 7,410%   | 1,274        |
| 12 | September 2016 | 6,322%   | 1,087        |
| 13 | September 2015 | 6,177%   | 1,062        |
|    | Total          |  posts:  | 17,194.      |
+----+----------------+----------+--------------+

3. Dispersion posts numbers by day of week.
+----+-------------+----------+--------------+
| No | Day of week | Share %  | Post numbers |
+----+-------------+----------+--------------+
| 1  | Thursday    | 16,704%  | 2,872        |
| 2  | Wednesday   | 16,389%  | 2,818        |
| 3  | Tuesday     | 16,337%  | 2,809        |
| 4  | Monday      | 15,087%  | 2,594        |
| 5  | Friday      | 14,970%  | 2,574        |
| 6  | Sunday      | 10,434%  | 1,794        |
| 7  | Saturday    | 10,079%  | 1,733        |
|    | Total       |  posts:  | 17,194.      |
+----+-------------+----------+--------------+

4. Dispersion posts numbers by hours .
+----+---------------+----------+--------------+
| No | Hours of day  | Share %  | Post numbers |
+----+---------------+----------+--------------+
| 1  | 17:00 - 17:59 | 6,799%   | 1,169        |
| 2  | 16:00 - 16:59 | 6,403%   | 1,101        |
| 3  | 18:00 - 18:59 | 6,305%   | 1,084        |
| 4  | 15:00 - 15:59 | 6,049%   | 1,040        |
| 5  | 19:00 - 19:59 | 5,700%   | 980          |
| 6  | 14:00 - 14:59 | 5,572%   | 958          |
| 7  | 13:00 - 13:59 | 5,339%   | 918          |
| 8  | 20:00 - 20:59 | 5,298%   | 911          |
| 9  | 21:00 - 21:59 | 5,083%   | 874          |
| 10 | 12:00 - 12:59 | 4,589%   | 789          |
| 11 | 22:00 - 22:59 | 4,409%   | 758          |
| 12 | 23:00 - 23:59 | 3,920%   | 674          |
| 13 | 11:00 - 11:59 | 3,839%   | 660          |
| 14 | 00:00 - 00:59 | 3,554%   | 611          |
| 15 | 10:00 - 10:59 | 3,437%   | 591          |
| 16 | 09:00 - 09:59 | 3,106%   | 534          |
| 17 | 01:00 - 01:59 | 2,908%   | 500          |
| 18 | 08:00 - 08:59 | 2,885%   | 496          |
| 19 | 04:00 - 04:59 | 2,640%   | 454          |
| 20 | 07:00 - 07:59 | 2,606%   | 448          |
| 21 | 02:00 - 02:59 | 2,565%   | 441          |
| 22 | 06:00 - 06:59 | 2,373%   | 408          |
| 23 | 03:00 - 03:59 | 2,367%   | 407          |
| 24 | 05:00 - 05:59 | 2,257%   | 388          |
|    | Total         |  posts:  | 17,194.      |
+----+---------------+----------+--------------+
EOF for time dispersion for Others HN.
We see the following general trends:
1. Most of the posts with the most comments were created in 2016.
2. Gossip from the IT-adjacent crowd in Ask HN is more popular than the boring everyday life of software developers or the high art of programming in Show HN.
3. The share of popular posts, those with an above-average number of comments, does not exceed 24.7% in any category.
4. Most posts are created on weekdays between 13:00 and 22:00; why Ask HN shows a dip on Thursday is an open question.
5. The number of posts rises during the time period.
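Trend 4 rests on counting posts by day of week and by hour. As a hedged alternative to the hand-written dictionary counting used in this project, the same kind of count can be sketched with collections.Counter from the standard library. The sample list below holds only a few created_at values copied from rows shown earlier, so it illustrates the technique and does not reproduce the tables; all names here are my own.

```python
import datetime as dt
from collections import Counter

# A few created_at strings copied from rows printed earlier in this project.
sample = [
    "8/4/2016 11:52", "1/26/2016 19:30", "6/23/2016 22:20",
    "6/17/2016 0:01", "9/30/2015 4:12", "8/16/2016 9:55",
    "11/25/2015 14:03",
]
parsed = [dt.datetime.strptime(s, "%m/%d/%Y %H:%M") for s in sample]

# Frequency of posts per day of week and per hour of the day.
by_weekday = Counter(d.strftime("%A") for d in parsed)
by_hour = Counter(d.hour for d in parsed)

# Posts created from 13:00 up to, but not including, 22:00.
afternoon = sum(n for hour, n in by_hour.items() if 13 <= hour < 22)

# most_common() returns (value, count) pairs sorted by count, descending.
print(by_weekday.most_common())
print("{} of {} sample posts fall between 13:00 and 21:59"
      .format(afternoon, len(parsed)))
```

Passing the full list of created_at values of ask_posts, show_posts or other_posts instead of the sample should give counts comparable to the "day of week" and "hours" tables above.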
Conclusion: I hope this is the last project with tabulated tables, and that I will soon learn how to display data in a more modern form than tables in the style of the MS-DOS 6.22 era. :)
Created on Jan 13, 2021
@author: Vadim Maklakov, used some ideas from public Internet resources.
© 3-clause BSD License
Software environment:
Debian 10.7
Python 3.8.7
Required preinstalled Python modules:
csv
datetime
prettytable
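Of these three modules, csv and datetime ship with Python itself, while prettytable is a third-party package that has to be installed separately. A small sketch of how the environment can be checked before running the project; the required and missing names are my own illustration.

```python
from importlib.util import find_spec

# Modules listed above as required for this project.
required = ["csv", "datetime", "prettytable"]

# find_spec returns None when a top-level module cannot be found.
missing = [name for name in required if find_spec(name) is None]

if missing:
    print("Missing modules: {}".format(", ".join(missing)))
else:
    print("All required modules are available.")
```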