Of the 576 active players, the vast majority are male (84%). There also exists, a smaller, but notable proportion of female players (14%).
Our peak age demographic falls between 20-24 (44.8%) with secondary groups falling between 15-19 (18.60%) and 25-29 (13.4%).
# Dependencies and Setup
import pandas as pd
# File to Load
# file_to_load = "Resources/purchase_data.csv"
file_to_load = "purchase_data.csv"
# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)
As final considerations:
You must use the Pandas Library and the Jupyter Notebook. You must submit a link to your Jupyter Notebook with the viewable Data Frames. You must include a written description of three observable trends based on the data. See Example Solution for a reference on expected format.
purchase_data.describe()
Purchase ID | Age | Item ID | Price | |
---|---|---|---|---|
count | 780.000000 | 780.000000 | 780.000000 | 780.000000 |
mean | 389.500000 | 22.714103 | 91.755128 | 3.050987 |
std | 225.310896 | 6.659444 | 52.697702 | 1.169549 |
min | 0.000000 | 7.000000 | 0.000000 | 1.000000 |
25% | 194.750000 | 20.000000 | 47.750000 | 1.980000 |
50% | 389.500000 | 22.000000 | 92.000000 | 3.150000 |
75% | 584.250000 | 25.000000 | 138.000000 | 4.080000 |
max | 779.000000 | 45.000000 | 183.000000 | 4.990000 |
purchase_data.head()
Purchase ID | SN | Age | Gender | Item ID | Item Name | Price | |
---|---|---|---|---|---|---|---|
0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |
player_number = len(purchase_data['SN'].unique())
total_players = pd.DataFrame([
{"Total Players": player_number}
])
total_players
Total Players | |
---|---|
0 | 576 |
Run basic calculations to obtain number of unique items, average price, etc.
Create a summary data frame to hold the results
Optional: give the displayed data cleaner formatting
Display the summary data frame
unique_items=len(purchase_data["Item ID"].unique())
avg_price=purchase_data["Price"].mean()
purchase_number = sum(purchase_data["Item ID"].value_counts())
total_revenue = purchase_data["Price"].sum()
p_analysis = pd.DataFrame([
{"Number of Unique Items": unique_items,
"Average Price": avg_price,
"Number of Purchases": purchase_number,
"Total Revenue": total_revenue}])
p_analysis["Average Price"]=p_analysis["Average Price"].map("${:.2f}".format)
p_analysis["Total Revenue"]=p_analysis["Total Revenue"].map("${:.2f}".format)
p_analysis
Number of Unique Items | Average Price | Number of Purchases | Total Revenue | |
---|---|---|---|---|
0 | 179 | $3.05 | 780 | $2379.77 |
Percentage and Count of Male Players
Percentage and Count of Female Players
Percentage and Count of Other / Non-Disclosed
demo=pd.DataFrame(purchase_data.groupby('SN')["Gender"].max())
gender_demo=pd.DataFrame(demo.groupby('Gender')['Gender'].count())
total=len(demo['Gender'])
gender_demo["Percentage of Players"]= (gender_demo['Gender']/total)*100
gender_demo["Percentage of Players"]=gender_demo["Percentage of Players"].map("{0:.2f}%".format)
gender_demo=gender_demo.rename(columns={'Gender': 'Total Count'})
gender_final=gender_demo.sort_values('Total Count', ascending=False)
gender_final
Total Count | Percentage of Players | |
---|---|---|
Gender | ||
Male | 484 | 84.03% |
Female | 81 | 14.06% |
Other / Non-Disclosed | 11 | 1.91% |
Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender
Create a summary data frame to hold the results
Optional: give the displayed data cleaner formatting
Display the summary data frame
count= purchase_data.groupby('Gender')['Item Name'].count().values.tolist()
avgprice=purchase_data.groupby('Gender')['Price'].mean().values.tolist()
totalprice=purchase_data.groupby('Gender')['Price'].sum().values.tolist()
sumpricepp=purchase_data.groupby(['Gender','SN'])['Price'].sum().to_frame(name='Sum').reset_index()
ppcount=purchase_data.groupby(['Gender','SN'])['SN'].count().to_frame(name='Count').reset_index()
merge_pricepp=pd.merge(sumpricepp, ppcount, on="SN", how="outer")
merge_pricepp["Average"]=(merge_pricepp['Sum']/merge_pricepp['Count']).astype("float")
avgpricepp=merge_pricepp.groupby(['Gender_y'])['Sum'].mean()
avgpricepp_list = avgpricepp.values.tolist()
purchase_gender_df=pd.DataFrame([
{"Gender": "Female", "Purchase Count": count[0], "Average Purchase Price": avgprice[0],
"Total Purchase Value": totalprice[0], "Avg Total Purchase per Person": avgpricepp_list[0]},
{"Gender": "Male", "Purchase Count": count[1], "Average Purchase Price": avgprice[1],
"Total Purchase Value": totalprice[1], "Avg Total Purchase per Person": avgpricepp_list[1]},
{"Gender": "Other / Non-Disclosed", "Purchase Count": count[2], "Average Purchase Price": avgprice[2],
"Total Purchase Value": totalprice[2], "Avg Total Purchase per Person": avgpricepp_list[2]}
])
purchase_gender_df["Average Purchase Price"]=purchase_gender_df["Average Purchase Price"].map("${:.2f}".format)
purchase_gender_df["Total Purchase Value"]=purchase_gender_df["Total Purchase Value"].map("${:.2f}".format)
purchase_gender_df["Avg Total Purchase per Person"]=purchase_gender_df["Avg Total Purchase per Person"].map("${:.2f}".format)
purchase_gender_df=purchase_gender_df.reset_index(drop=True)
purchase_gender_df
Gender | Purchase Count | Average Purchase Price | Total Purchase Value | Avg Total Purchase per Person | |
---|---|---|---|---|---|
0 | Female | 113 | $3.20 | $361.94 | $4.47 |
1 | Male | 652 | $3.02 | $1967.64 | $4.07 |
2 | Other / Non-Disclosed | 15 | $3.35 | $50.19 | $4.56 |
Establish bins for ages
Categorize the existing players using the age bins. Hint: use pd.cut()
Calculate the numbers and percentages by age group
Create a summary data frame to hold the results
Optional: round the percentage column to two decimal points
Display Age Demographics Table
age_bins=[0,9,14,19,24,29,34,39,50]
age_labels=["<10","10-14","15-19","20-24","25-29","30-34","35-39","40+"]
uniquesn=pd.DataFrame(purchase_data.groupby('SN')['Age'].mean())
uniquesn["Age Range"]=pd.cut(uniquesn['Age'],age_bins,labels=age_labels)
new_age_df=pd.DataFrame(uniquesn.groupby("Age Range")["Age"].count())
new_age_df=new_age_df.rename(columns={"Age":"Total Counts"})
new_age_df["Percentage of Players"]=(new_age_df["Total Counts"]/(new_age_df["Total Counts"].sum()))*100
new_age_df["Percentage of Players"] = new_age_df["Percentage of Players"].map("{0:,.2f}%".format)
new_age_df
Total Counts | Percentage of Players | |
---|---|---|
Age Range | ||
<10 | 17 | 2.95% |
10-14 | 22 | 3.82% |
15-19 | 107 | 18.58% |
20-24 | 258 | 44.79% |
25-29 | 77 | 13.37% |
30-34 | 52 | 9.03% |
35-39 | 31 | 5.38% |
40+ | 12 | 2.08% |
Bin the purchase_data data frame by age
Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below
Create a summary data frame to hold the results
Optional: give the displayed data cleaner formatting
Display the summary data frame
purchase_data["Age Range"]=pd.cut(purchase_data['Age'],age_bins,labels=age_labels)
age_purchase=purchase_data.groupby("Age Range")["Item ID"].count()
avg_price=purchase_data.groupby("Age Range")['Price'].mean()
total_price=purchase_data.groupby("Age Range")['Price'].sum()
# calculate average total price per person
ppcount=pd.DataFrame(purchase_data.groupby('SN')['Item ID'].count())
pptotal=pd.DataFrame(purchase_data.groupby('SN')['Price'].sum())
ppmerge_df=pd.merge(ppcount, pptotal, on="SN", how="outer")
aronly_df=pd.DataFrame(purchase_data.groupby('SN')['Age Range'].max())
pparmerge=pd.merge(aronly_df, ppmerge_df, on="SN", how="outer")
avgpp_df=pparmerge.groupby("Age Range")["Price"].mean()
# final merge all dataframes to store all info
finalmerge1=pd.merge(age_purchase, avg_price, on="Age Range", how="outer")
finalmerge2=pd.merge(finalmerge1, total_price, on="Age Range", how="outer")
finalmerge3=pd.merge(finalmerge2, avgpp_df, on="Age Range", how="outer")
# format
finalmerge3=finalmerge3.rename(columns={"Item ID":"Purchase Count","Price_x": "Average Purchase Price",
"Price_y":"Total Purchase Value","Price":"Avg Total Purchase per Person"})
finalmerge3["Purchase Count"] = finalmerge3["Purchase Count"]
finalmerge3["Average Purchase Price"] = finalmerge3["Average Purchase Price"].map("${:.2f}".format)
finalmerge3["Total Purchase Value"] = finalmerge3["Total Purchase Value"].map("${:.2f}".format)
finalmerge3["Avg Total Purchase per Person"] = finalmerge3["Avg Total Purchase per Person"].map("${:.2f}".format)
finalmerge3
Purchase Count | Average Purchase Price | Total Purchase Value | Avg Total Purchase per Person | |
---|---|---|---|---|
Age Range | ||||
<10 | 23 | $3.35 | $77.13 | $4.54 |
10-14 | 28 | $2.96 | $82.78 | $3.76 |
15-19 | 136 | $3.04 | $412.89 | $3.86 |
20-24 | 365 | $3.05 | $1114.06 | $4.32 |
25-29 | 101 | $2.90 | $293.00 | $3.81 |
30-34 | 73 | $2.93 | $214.00 | $4.12 |
35-39 | 41 | $3.60 | $147.67 | $4.76 |
40+ | 13 | $2.94 | $38.24 | $3.19 |
Run basic calculations to obtain the results in the table below
Create a summary data frame to hold the results
Sort the total purchase value column in descending order
Optional: give the displayed data cleaner formatting
Display a preview of the summary data frame
ppavg=pd.DataFrame(purchase_data.groupby('SN')['Price'].mean())
topspender=pd.merge(pparmerge, ppavg, on="SN", how='outer')
topspender=topspender.rename(columns={"Item ID": "Purchase Count", "Price_y":"Average Purchase Value",
"Price_x":"Total Purchase Value"})
topspender=topspender.sort_values("Total Purchase Value", ascending=False)
del topspender['Age Range']
# formatting
topspender=topspender[["Purchase Count", "Average Purchase Value", "Total Purchase Value"]]
topspender["Total Purchase Value"] = topspender["Total Purchase Value"].map("${:.2f}".format)
topspender["Average Purchase Value"] = topspender["Average Purchase Value"].map("${:.2f}".format)
topspender[:5]
Purchase Count | Average Purchase Value | Total Purchase Value | |
---|---|---|---|
SN | |||
Lisosia93 | 5 | $3.79 | $18.96 |
Idastidru52 | 4 | $3.86 | $15.45 |
Chamjask73 | 3 | $4.61 | $13.83 |
Iral74 | 4 | $3.40 | $13.62 |
Iskadarya95 | 3 | $4.37 | $13.10 |
Retrieve the Item ID, Item Name, and Item Price columns
Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value
Create a summary data frame to hold the results
Sort the purchase count column in descending order
Optional: give the displayed data cleaner formatting
Display a preview of the summary data frame
purchase_data.columns
popularity= purchase_data[["Item ID", "Item Name", "Price"]]
total_price= popularity.groupby(["Item ID","Item Name"])['Price'].sum()
item_price=popularity.groupby(["Item ID","Item Name"])['Price'].mean()
count=popularity.groupby(["Item ID","Item Name"])['Price'].count()
merge1_df=pd.merge(count, item_price, on=["Item ID","Item Name"], how="outer")
merge2_df=pd.merge(merge1_df, total_price, on=["Item ID","Item Name"], how="outer")
merge2_df=merge2_df.rename(columns={"Price_x":"Purchase Count",
"Price_y": "Item Price",
"Price": "Total Purchase Value"})
newmerge_df=merge2_df.sort_values("Purchase Count", ascending=False)
newmerge_df["Item Price"] = newmerge_df["Item Price"].map("${:.2f}".format)
newmerge_df["Total Purchase Value"] = newmerge_df["Total Purchase Value"].map("${:.2f}".format)
newmerge_df[:5]
Purchase Count | Item Price | Total Purchase Value | ||
---|---|---|---|---|
Item ID | Item Name | |||
92 | Final Critic | 13 | $4.61 | $59.99 |
178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | $4.23 | $50.76 |
145 | Fiery Glass Crusader | 9 | $4.58 | $41.22 |
132 | Persuasion | 9 | $3.22 | $28.99 |
108 | Extraction, Quickblade Of Trembling Hands | 9 | $3.53 | $31.77 |
Sort the above table by total purchase value in descending order
Optional: give the displayed data cleaner formatting
Display a preview of the data frame
newmerge_df=merge2_df.sort_values("Total Purchase Value", ascending=False)
newmerge_df["Item Price"] = newmerge_df["Item Price"].map("${:.2f}".format)
newmerge_df["Total Purchase Value"] = newmerge_df["Total Purchase Value"].map("${:.2f}".format)
newmerge_df[:5]
Purchase Count | Item Price | Total Purchase Value | ||
---|---|---|---|---|
Item ID | Item Name | |||
92 | Final Critic | 13 | $4.61 | $59.99 |
178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | $4.23 | $50.76 |
82 | Nirvana | 9 | $4.90 | $44.10 |
145 | Fiery Glass Crusader | 9 | $4.58 | $41.22 |
103 | Singed Scalpel | 8 | $4.35 | $34.80 |