Heroes Of Pymoli Data Analysis¶

Of the 576 active players, the vast majority are male (84%). There also exists, a smaller, but notable proportion of female players (14%).
Our peak age demographic falls between 20-24 (44.8%) with secondary groups falling between 15-19 (18.60%) and 25-29 (13.4%).

The total revenue is \$ 2379.77 from 780 purchases out of 179 unique items, in which the average item price is \$ 3.05.
Males have spent almost three times more on purchases than females (\$1967.64-male vs \$361.94-female). This would be an interesting topic as to why females prefer playing computer games less than males, in general.
The teenages aged less than 19 years old made up 25.35% of the group, which is concerning on the total time consumption these teenagers would spend on playing computer games. Their mental and physical health related to the time consumption on playing games should raise public awareness.
The "Final Critic" game is the most profitable item and the most popular item, which has been sold 13 times and made $60 as the total purchase value.

In [14]:

# Dependencies and Setup
import pandas as pd

# File to Load 
# file_to_load = "Resources/purchase_data.csv"
file_to_load = "purchase_data.csv"
# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)

As final considerations:

You must use the Pandas Library and the Jupyter Notebook. You must submit a link to your Jupyter Notebook with the viewable Data Frames. You must include a written description of three observable trends based on the data. See Example Solution for a reference on expected format.

In [15]:

purchase_data.describe()

Out[15]:

	Purchase ID	Age	Item ID	Price
count	780.000000	780.000000	780.000000	780.000000
mean	389.500000	22.714103	91.755128	3.050987
std	225.310896	6.659444	52.697702	1.169549
min	0.000000	7.000000	0.000000	1.000000
25%	194.750000	20.000000	47.750000	1.980000
50%	389.500000	22.000000	92.000000	3.150000
75%	584.250000	25.000000	138.000000	4.080000
max	779.000000	45.000000	183.000000	4.990000

In [16]:

purchase_data.head()

Out[16]:

	Purchase ID	SN	Age	Gender	Item ID	Item Name	Price
0	0	Lisim78	20	Male	108	Extraction, Quickblade Of Trembling Hands	3.53
1	1	Lisovynya38	40	Male	143	Frenzied Scimitar	1.56
2	2	Ithergue48	24	Male	92	Final Critic	4.88
3	3	Chamassasya86	24	Male	100	Blindscythe	3.27
4	4	Iskosia90	23	Male	131	Fury	1.44

Player Count¶

Display the total number of players

In [17]:

player_number = len(purchase_data['SN'].unique())

total_players = pd.DataFrame([
    {"Total Players": player_number}
])

total_players

Out[17]:

	Total Players
0	576

In [ ]:

Purchasing Analysis (Total)¶

Run basic calculations to obtain number of unique items, average price, etc.
Create a summary data frame to hold the results
Optional: give the displayed data cleaner formatting
Display the summary data frame

In [18]:

unique_items=len(purchase_data["Item ID"].unique())
avg_price=purchase_data["Price"].mean()
purchase_number = sum(purchase_data["Item ID"].value_counts())
total_revenue = purchase_data["Price"].sum()

p_analysis = pd.DataFrame([
    {"Number of Unique Items": unique_items,
     "Average Price": avg_price,
     "Number of Purchases": purchase_number,
     "Total Revenue": total_revenue}])

p_analysis["Average Price"]=p_analysis["Average Price"].map("${:.2f}".format)
p_analysis["Total Revenue"]=p_analysis["Total Revenue"].map("${:.2f}".format)
p_analysis

Out[18]:

	Number of Unique Items	Average Price	Number of Purchases	Total Revenue
0	179	$3.05	780	$2379.77

Gender Demographics¶

Percentage and Count of Male Players
Percentage and Count of Female Players
Percentage and Count of Other / Non-Disclosed

In [19]:

demo=pd.DataFrame(purchase_data.groupby('SN')["Gender"].max())

gender_demo=pd.DataFrame(demo.groupby('Gender')['Gender'].count())
total=len(demo['Gender'])
gender_demo["Percentage of Players"]= (gender_demo['Gender']/total)*100
gender_demo["Percentage of Players"]=gender_demo["Percentage of Players"].map("{0:.2f}%".format)

gender_demo=gender_demo.rename(columns={'Gender': 'Total Count'})
gender_final=gender_demo.sort_values('Total Count', ascending=False)
gender_final

Out[19]:

	Total Count	Percentage of Players
Gender
Male	484	84.03%
Female	81	14.06%
Other / Non-Disclosed	11	1.91%

Purchasing Analysis (Gender)¶

Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender
Create a summary data frame to hold the results
Optional: give the displayed data cleaner formatting
Display the summary data frame

In [20]:

count= purchase_data.groupby('Gender')['Item Name'].count().values.tolist()
avgprice=purchase_data.groupby('Gender')['Price'].mean().values.tolist()
totalprice=purchase_data.groupby('Gender')['Price'].sum().values.tolist()


sumpricepp=purchase_data.groupby(['Gender','SN'])['Price'].sum().to_frame(name='Sum').reset_index()
ppcount=purchase_data.groupby(['Gender','SN'])['SN'].count().to_frame(name='Count').reset_index()
merge_pricepp=pd.merge(sumpricepp, ppcount, on="SN", how="outer")
merge_pricepp["Average"]=(merge_pricepp['Sum']/merge_pricepp['Count']).astype("float")
avgpricepp=merge_pricepp.groupby(['Gender_y'])['Sum'].mean()
avgpricepp_list = avgpricepp.values.tolist()

purchase_gender_df=pd.DataFrame([
    {"Gender": "Female", "Purchase Count": count[0], "Average Purchase Price": avgprice[0], 
     "Total Purchase Value": totalprice[0], "Avg Total Purchase per Person": avgpricepp_list[0]},
    {"Gender": "Male", "Purchase Count": count[1], "Average Purchase Price": avgprice[1], 
     "Total Purchase Value": totalprice[1], "Avg Total Purchase per Person": avgpricepp_list[1]},
    {"Gender": "Other / Non-Disclosed", "Purchase Count": count[2], "Average Purchase Price": avgprice[2], 
     "Total Purchase Value": totalprice[2], "Avg Total Purchase per Person": avgpricepp_list[2]}
])

purchase_gender_df["Average Purchase Price"]=purchase_gender_df["Average Purchase Price"].map("${:.2f}".format)
purchase_gender_df["Total Purchase Value"]=purchase_gender_df["Total Purchase Value"].map("${:.2f}".format)
purchase_gender_df["Avg Total Purchase per Person"]=purchase_gender_df["Avg Total Purchase per Person"].map("${:.2f}".format)
purchase_gender_df=purchase_gender_df.reset_index(drop=True)
purchase_gender_df

Out[20]:

	Gender	Purchase Count	Average Purchase Price	Total Purchase Value	Avg Total Purchase per Person
0	Female	113	$3.20	$361.94	$4.47
1	Male	652	$3.02	$1967.64	$4.07
2	Other / Non-Disclosed	15	$3.35	$50.19	$4.56

Age Demographics¶

Establish bins for ages
Categorize the existing players using the age bins. Hint: use pd.cut()
Calculate the numbers and percentages by age group
Create a summary data frame to hold the results
Optional: round the percentage column to two decimal points
Display Age Demographics Table

In [21]:

age_bins=[0,9,14,19,24,29,34,39,50]
age_labels=["<10","10-14","15-19","20-24","25-29","30-34","35-39","40+"]

uniquesn=pd.DataFrame(purchase_data.groupby('SN')['Age'].mean())
uniquesn["Age Range"]=pd.cut(uniquesn['Age'],age_bins,labels=age_labels)

new_age_df=pd.DataFrame(uniquesn.groupby("Age Range")["Age"].count())
new_age_df=new_age_df.rename(columns={"Age":"Total Counts"})

new_age_df["Percentage of Players"]=(new_age_df["Total Counts"]/(new_age_df["Total Counts"].sum()))*100
new_age_df["Percentage of Players"] = new_age_df["Percentage of Players"].map("{0:,.2f}%".format)
new_age_df

Out[21]:

	Total Counts	Percentage of Players
Age Range
<10	17	2.95%
10-14	22	3.82%
15-19	107	18.58%
20-24	258	44.79%
25-29	77	13.37%
30-34	52	9.03%
35-39	31	5.38%
40+	12	2.08%

Purchasing Analysis (Age)¶

Bin the purchase_data data frame by age
Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below
Create a summary data frame to hold the results
Optional: give the displayed data cleaner formatting
Display the summary data frame

In [22]:

purchase_data["Age Range"]=pd.cut(purchase_data['Age'],age_bins,labels=age_labels)


age_purchase=purchase_data.groupby("Age Range")["Item ID"].count()
avg_price=purchase_data.groupby("Age Range")['Price'].mean()
total_price=purchase_data.groupby("Age Range")['Price'].sum()

# calculate average total price per person
ppcount=pd.DataFrame(purchase_data.groupby('SN')['Item ID'].count()) 
pptotal=pd.DataFrame(purchase_data.groupby('SN')['Price'].sum())
ppmerge_df=pd.merge(ppcount, pptotal, on="SN", how="outer")
aronly_df=pd.DataFrame(purchase_data.groupby('SN')['Age Range'].max())
pparmerge=pd.merge(aronly_df, ppmerge_df, on="SN", how="outer")
avgpp_df=pparmerge.groupby("Age Range")["Price"].mean()


# final merge all dataframes to store all info
finalmerge1=pd.merge(age_purchase, avg_price, on="Age Range", how="outer")
finalmerge2=pd.merge(finalmerge1, total_price, on="Age Range", how="outer")
finalmerge3=pd.merge(finalmerge2, avgpp_df, on="Age Range", how="outer")


# format
finalmerge3=finalmerge3.rename(columns={"Item ID":"Purchase Count","Price_x": "Average Purchase Price",
                                       "Price_y":"Total Purchase Value","Price":"Avg Total Purchase per Person"})

finalmerge3["Purchase Count"] = finalmerge3["Purchase Count"]
finalmerge3["Average Purchase Price"] = finalmerge3["Average Purchase Price"].map("${:.2f}".format)
finalmerge3["Total Purchase Value"] = finalmerge3["Total Purchase Value"].map("${:.2f}".format)
finalmerge3["Avg Total Purchase per Person"] = finalmerge3["Avg Total Purchase per Person"].map("${:.2f}".format)

finalmerge3

Out[22]:

	Purchase Count	Average Purchase Price	Total Purchase Value	Avg Total Purchase per Person
Age Range
<10	23	$3.35	$77.13	$4.54
10-14	28	$2.96	$82.78	$3.76
15-19	136	$3.04	$412.89	$3.86
20-24	365	$3.05	$1114.06	$4.32
25-29	101	$2.90	$293.00	$3.81
30-34	73	$2.93	$214.00	$4.12
35-39	41	$3.60	$147.67	$4.76
40+	13	$2.94	$38.24	$3.19

Top Spenders¶

Run basic calculations to obtain the results in the table below
Create a summary data frame to hold the results
Sort the total purchase value column in descending order
Optional: give the displayed data cleaner formatting
Display a preview of the summary data frame

In [23]:

ppavg=pd.DataFrame(purchase_data.groupby('SN')['Price'].mean())
topspender=pd.merge(pparmerge, ppavg, on="SN", how='outer')

topspender=topspender.rename(columns={"Item ID": "Purchase Count", "Price_y":"Average Purchase Value", 
                                      "Price_x":"Total Purchase Value"})
topspender=topspender.sort_values("Total Purchase Value", ascending=False)
del topspender['Age Range']

# formatting
topspender=topspender[["Purchase Count", "Average Purchase Value", "Total Purchase Value"]]
topspender["Total Purchase Value"] = topspender["Total Purchase Value"].map("${:.2f}".format)
topspender["Average Purchase Value"] = topspender["Average Purchase Value"].map("${:.2f}".format)

topspender[:5]

Out[23]:

	Purchase Count	Average Purchase Value	Total Purchase Value
SN
Lisosia93	5	$3.79	$18.96
Idastidru52	4	$3.86	$15.45
Chamjask73	3	$4.61	$13.83
Iral74	4	$3.40	$13.62
Iskadarya95	3	$4.37	$13.10

Most Popular Items¶

Retrieve the Item ID, Item Name, and Item Price columns
Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value
Create a summary data frame to hold the results
Sort the purchase count column in descending order
Optional: give the displayed data cleaner formatting
Display a preview of the summary data frame

In [24]:

purchase_data.columns

popularity= purchase_data[["Item ID", "Item Name", "Price"]]

total_price= popularity.groupby(["Item ID","Item Name"])['Price'].sum()
item_price=popularity.groupby(["Item ID","Item Name"])['Price'].mean()
count=popularity.groupby(["Item ID","Item Name"])['Price'].count()

merge1_df=pd.merge(count, item_price, on=["Item ID","Item Name"], how="outer")
merge2_df=pd.merge(merge1_df, total_price, on=["Item ID","Item Name"], how="outer")
merge2_df=merge2_df.rename(columns={"Price_x":"Purchase Count",
                                     "Price_y": "Item Price",
                                    "Price": "Total Purchase Value"})
newmerge_df=merge2_df.sort_values("Purchase Count", ascending=False)



newmerge_df["Item Price"] = newmerge_df["Item Price"].map("${:.2f}".format)
newmerge_df["Total Purchase Value"] = newmerge_df["Total Purchase Value"].map("${:.2f}".format)
newmerge_df[:5] 

                              

Out[24]:

		Purchase Count	Item Price	Total Purchase Value
Item ID	Item Name
92	Final Critic	13	$4.61	$59.99
178	Oathbreaker, Last Hope of the Breaking Storm	12	$4.23	$50.76
145	Fiery Glass Crusader	9	$4.58	$41.22
132	Persuasion	9	$3.22	$28.99
108	Extraction, Quickblade Of Trembling Hands	9	$3.53	$31.77

Most Profitable Items¶

Sort the above table by total purchase value in descending order
Optional: give the displayed data cleaner formatting
Display a preview of the data frame

In [25]:

newmerge_df=merge2_df.sort_values("Total Purchase Value", ascending=False)

newmerge_df["Item Price"] = newmerge_df["Item Price"].map("${:.2f}".format)
newmerge_df["Total Purchase Value"] = newmerge_df["Total Purchase Value"].map("${:.2f}".format)
newmerge_df[:5]

Out[25]:

		Purchase Count	Item Price	Total Purchase Value
Item ID	Item Name
92	Final Critic	13	$4.61	$59.99
178	Oathbreaker, Last Hope of the Breaking Storm	12	$4.23	$50.76
82	Nirvana	9	$4.90	$44.10
145	Fiery Glass Crusader	9	$4.58	$41.22
103	Singed Scalpel	8	$4.35	$34.80

In [ ]: