We are working as data analysts for a company that builds Android and iOS mobile apps. The company builds only free apps (free to download and install), and its main source of revenue is in-app ads. Revenue therefore depends on the number of users: the more users who see and engage with the ads, the higher the revenue. Our aim here is to help our developers understand what types of apps attract more users. We have come up with a list of apps that are likely to be profitable on both the Apple App Store and the Google Play Store.
# Open both CSV files
# Read the data and convert it into a list of lists
import csv
open_Apple = open('AppleStore.csv')
read_Apple = csv.reader(open_Apple)
data_Apple = list(read_Apple)
#header_Apple = list(read_Apple)[0]
open_google = open('googleplaystore.csv')
read_google = csv.reader(open_google)
data_google = list(read_google)
#header_google = list(read_google)[0]
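The `open()` calls above never close the file handles and rely on the platform's default encoding. A minimal sketch of a more defensive loader follows; the helper name `load_csv` and the UTF-8 default are assumptions made for illustration, not part of the original notebook. The demo writes a small temporary file so that it runs without the real datasets.

```python
import csv
import os
import tempfile

def load_csv(path, encoding="utf8"):
    """Hypothetical helper: return (header, rows) for a CSV file."""
    # The 'with' block closes the file even if reading fails
    with open(path, encoding=encoding, newline="") as f:
        data = list(csv.reader(f))
    return data[0], data[1:]

# Demo on a throwaway file with made-up contents
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 encoding="utf8", newline="") as tmp:
    tmp.write("App,Category\nFacebook,SOCIAL\nInstagram,SOCIAL\n")
    demo_path = tmp.name

header, rows = load_csv(demo_path)
os.remove(demo_path)

print(header)     # the first line of the file
print(len(rows))  # number of data rows, header excluded
```

Returning the header separately avoids the repeated `[1:]` slicing used in the rest of the notebook, at the cost of carrying two variables per dataset.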
# Define a function 'explore_data()' that prints the rows in dataset[start:end]
# and, optionally, the number of rows and columns of the dataset
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    # loop through the slice, printing each row followed by a blank line
    for row in dataset_slice:
        print(row)
        print('\n')
    if rows_and_columns:
        print('Number of rows', len(dataset))
        print('Number of columns', len(dataset[0]))
explore_data(data_Apple, 1, 4) #exploring the first three rows of App Store data
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
explore_data(data_google, 1, 4) #exploring the first three rows of Google Play data
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
#Printing the no. of rows & columns for the App Store
print('Number of rows', len(data_Apple[1:]))
print('Number of column', len(data_Apple[0]))
Number of rows 7197
Number of column 16
#Printing the no. of rows & columns for Google Play
print('Number of rows', len(data_google[1:]))
print('Number of columns', len(data_google[0]))
Number of rows 10841
Number of columns 13
print('AppleStore column names:', data_Apple[0]) #Printing the header row for the App Store
print('Required Columns: rating_count_tot, user_rating, cont_rating') #Link to the original data
AppleStore column names: ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
Required Columns: rating_count_tot, user_rating, cont_rating
For more clarity check:
link
print('googleplaystore column names:', data_google[0]) #Printing the header row for Google Play
print('Required Columns: Rating, Reviews, Installs, Content Rating') #Link to the original data
googleplaystore column names: ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
Required Columns: Rating, Reviews, Installs, Content Rating
For more clarity check:
link
Above we explored both datasets by printing their first few rows, their row and column counts, and their column names.
Next we check whether any data is missing in the Google Play dataset. We do this by checking whether the length of any row is not equal to the length of the header row. We will delete such rows.
#select the header row for the Google Play data
header_google = data_google[0]
#loop over the data; if the length of a row differs from the length
#of the header row, print the row and its index
for row in data_google[1:]:
    header_len = len(header_google)
    row_len = len(row)
    if row_len != header_len:
        print(row)
        print(data_google.index(row))
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10473
del data_google[10473] #delete the selected row (run this only once)
print(data_google[10473]) #check that the particular row is deleted
['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
We will perform the same check on the App Store data as well.
header_Apple = data_Apple[0]
for row in data_Apple[1:]:
    header_len = len(header_Apple)
    row_len = len(row)
    if row_len != header_len:
        print(row)
        print(data_Apple.index(row))
We found one row with missing data in the Google Play dataset and deleted it. The check printed nothing for the App Store dataset, so it has no rows with missing data.
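The row-length check can also be wrapped in a reusable function. The sketch below is one possible way to do it; the function name `find_malformed_rows` and the toy rows are made up for illustration. It uses `enumerate()` to get each row's position directly, whereas `list.index(row)` returns the position of the first equal row, which could be the wrong one if a malformed row appeared twice.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header's."""
    header_len = len(dataset[0])
    return [(i, row) for i, row in enumerate(dataset) if len(row) != header_len]

# Toy data (not from the real files): the third entry lacks a category value
toy = [
    ["App", "Category", "Rating"],
    ["A", "TOOLS", "4.1"],
    ["B", "1.9"],
    ["C", "GAME", "3.8"],
]

bad = find_malformed_rows(toy)
print(bad)

# Delete from the highest index down so earlier deletions
# do not shift the positions of later ones
for i, _ in reversed(bad):
    del toy[i]
print(len(toy))
```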
Here we are going to find out whether there are any duplicate apps in the Google Play dataset.
#loop through the android data
for app in data_google:
    name = app[0]
    if name == 'Facebook': #we have taken 'Facebook' for checking duplicate entries
        print(app)
['Facebook', 'SOCIAL', '4.1', '78158306', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']
['Facebook', 'SOCIAL', '4.1', '78128208', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']
Below we calculate the number of duplicate entries in the Google Play dataset.
duplicate_apps = [] #create an empty list called 'duplicate_apps'
unique_apps = [] #create an empty list called 'unique_apps'
#loop through the data; a name seen before goes to 'duplicate_apps',
#a name seen for the first time goes to 'unique_apps'
for app in data_google:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
#Calculate the number of duplicate apps
print('Number of duplicate apps:', len(duplicate_apps))
Number of duplicate apps: 1181
Removing the duplicate rows manually (or at random) is a cumbersome and laborious process, so we should come up with a programmatic way to do it.
Here are a few criteria we could use:
Option 1:
Keep the row with the highest number of reviews (column 4), since a higher count suggests the row was collected more recently, and remove the other duplicates.
Option 2:
Keep the row with the highest number of installs (column 6), since it should be the most recent one, and remove the other duplicates.
Option 3:
Keep the row with the latest "Last Updated" date (column 11), which is the most recently updated entry, and remove the other duplicates.
Option 4:
Keep the row with the latest current version (column 12), since it should be more recent than the others.
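As an illustration of Option 3, which the notebook does not pursue, here is a minimal sketch of picking the most recently updated row per app. The helper name `parse_last_updated` and the three-column toy rows are hypothetical; the date format `'%B %d, %Y'` is inferred from values such as 'August 3, 2018' in the rows printed earlier, and `%B` assumes English month names in the active locale.

```python
from datetime import datetime

def parse_last_updated(text):
    """Parse a date such as 'August 3, 2018' (format inferred from the data)."""
    return datetime.strptime(text, "%B %d, %Y")

# Toy rows: [name, reviews, last updated]; the values are made up
toy_rows = [
    ["Facebook", "78158306", "August 3, 2018"],
    ["Facebook", "78128208", "July 30, 2018"],
    ["Notes", "120", "January 7, 2018"],
]

latest = {}
for row in toy_rows:
    name = row[0]
    updated = parse_last_updated(row[2])
    # keep the row with the newest date; on a tie the first row seen wins
    if name not in latest or updated > parse_last_updated(latest[name][2]):
        latest[name] = row

deduped = list(latest.values())
print(deduped)
```

Note that the two real 'Facebook' rows printed earlier share the same date, so this criterion alone cannot choose between them. That is one reason a numeric tie-breaker such as the review count is attractive.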
Here we use the first option.
We will create a dictionary called reviews_max, where each key is an app name and each value is the maximum number of reviews recorded for that app.
To cross-check the result, we compare the length of the dictionary with the expected number of unique apps:
10840 (total apps) - 1181 (duplicate apps) = 9659
We will create a list called android_clean, to which we add the complete row of each app with the maximum number of reviews.
We will create a list called already_added, to which we add the names of the apps already included in android_clean. (This extra bookkeeping handles the case where more than one duplicate row has the same maximum number of reviews.)
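The arithmetic above can be checked independently with a set, since a set keeps each name only once. The sketch below uses a made-up list of names; with the real data the list would be built from the first column of each row, for example `[row[0] for row in data_google[1:]]`.

```python
# Toy list of app names (illustrative values only)
names = ["Facebook", "Instagram", "Facebook", "Slack", "Instagram", "Facebook"]

n_total = len(names)
n_unique = len(set(names))         # each distinct name counted once
n_duplicates = n_total - n_unique  # extra copies beyond the first occurrence

print(n_total, n_unique, n_duplicates)
```

Applied to the figures above, 10840 rows with 9659 unique names gives 10840 - 9659 = 1181 duplicate entries, the same number the loop reported.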
Below we perform the method mentioned above
reviews_max = {} #Create a dictionary
#loop through the rows and add the 'key' & 'values' to the dictionary
for row in data_google[1:]:
    name = row[0]
    n_reviews = float(row[3]) #convert row[3] to float & name it 'n_reviews'
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
#Calculate the length of the dictionary
print('Expected length:', len(reviews_max))
Expected length: 9659
The length of the reviews_max dictionary exactly matches the expected length. Below we use the reviews_max dictionary to remove the duplicate rows.
android_clean = [] #rows kept after removing duplicates
already_added = [] #names of the apps already in 'android_clean'
#loop through the rows
for row in data_google[1:]:
    name = row[0]
    n_reviews = float(row[3])
    #append the row with the maximum number of reviews to 'android_clean'
    #and record its name in 'already_added', so that only one row is kept
    #when several duplicates share the same maximum
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(row)
        already_added.append(name)
#Check the method by inspecting the cleaned data
#and the number of rows
print((android_clean[:3]))
print('Number of expected rows:', len(android_clean))
[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']]
Number of expected rows: 9659
As expected we have got 9659 rows.
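The two passes above (build reviews_max, then filter) can also be folded into a single pass that stores the best row per name directly. This is a sketch of an alternative, not the method used in this notebook; the function name `dedupe_by_max_reviews` and the four-column toy rows are hypothetical.

```python
def dedupe_by_max_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep, for each app name, the row with the most reviews."""
    best = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # strict '>' keeps the first row seen when review counts tie
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

# Toy rows: [name, category, rating, reviews]; the values are illustrative
toy_rows = [
    ["Facebook", "SOCIAL", "4.1", "78128208"],
    ["Notes", "TOOLS", "4.0", "120"],
    ["Facebook", "SOCIAL", "4.1", "78158306"],
    ["Notes", "TOOLS", "4.0", "120"],
]

clean = dedupe_by_max_reviews(toy_rows)
print(clean)
```

One difference from the two-pass version: the result is ordered by the first appearance of each name (dictionaries preserve insertion order in Python 3.7+), not by the position of the retained row. A dictionary lookup is also faster than the `name not in already_added` list search, which matters little at this data size.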
The company we are working for builds apps specifically for an English-speaking audience, so we need only English apps for our analysis. We will remove all other apps.
First we define a function that uses ord() to check whether an app name contains only characters commonly used in English text, i.e. characters in the ASCII range (0 to 127).
Below we define the function english_app(), which takes a string as its parameter and checks whether the given app name is English or not.
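def english_app(string):
    #loop through the string; a single character outside the ASCII
    #range (code point above 127) marks the name as non-English
    for character in string:
        if ord(character) > 127:
            return False
    #every character was checked and all of them are ASCII
    return True
print(english_app('Instagram'))
True
print(english_app('爱奇艺PPS -《欢乐颂2》电视剧热播'))
False
print(english_app('Docs To Go™ Free Office Suite'))
False
print(english_app('Instachat 😜'))
False
To check the function, we ran it on English and non-English app names. It correctly accepts 'Instagram' and rejects the Chinese app name, but it also rejects 'Docs To Go™ Free Office Suite' and 'Instachat 😜', which are English apps, because '™' and '😜' fall outside the ASCII range. (Note that the return True must come after the loop: placing it in an else branch inside the loop would return after inspecting only the first character.)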
If we use the above function, we may lose some English apps along with the non-English ones. So we are going to define one more function, very similar to the earlier one. Here we remove an app only if its name has more than three characters whose code points fall outside the ASCII range. This means that a name with up to three emoji or other special characters is still labelled as an English app.
# define a function E_A() with string as a parameter
def E_A(string):
    ord_list = 0 #counter for non-ASCII characters, starting at zero
    #loop through string, if the character's number is greater
    #than 127 increment the ord_list
    for character in string:
        if ord(character) > 127:
            ord_list += 1
    #if the value of 'ord_list' is greater than 3 the function
    #returns False else True
    if ord_list > 3:
        return False
    else:
        return True
print(E_A('Docs To Go™ Free Office Suite'))
True
print(E_A('Instachat 😜'))
True
print(E_A('爱奇艺PPS -《欢乐颂2》电视剧热播'))
False
Above we checked the function on a few app names, and it works as intended.
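The threshold of three is a judgment call, so it can be convenient to make it a parameter. The following is a minimal sketch under that assumption; the name `is_probably_english` and the `max_non_ascii` parameter are not part of the notebook.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True if at most `max_non_ascii` characters are outside ASCII."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii

print(is_probably_english('Docs To Go™ Free Office Suite'))
print(is_probably_english('Instachat 😜'))
print(is_probably_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
# With a threshold of zero the check becomes strict ASCII-only
print(is_probably_english('Instachat 😜', max_non_ascii=0))
```

This remains a heuristic: a short non-English name with three or fewer non-ASCII characters would still pass, and an English name with many emoji would be dropped.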
Now we are going to apply the above function to both the Google Play and App Store datasets.
android_English_App = [] #create an empty list
#loop through the cleaned data and append each row
#with an English app name to the list above
for row in android_clean:
    app = row[0]
    if E_A(app):
        android_English_App.append(row)
Apply the function explore_data() on the list android_English_App and observe the results
print(explore_data(android_English_App, 0, 4, rows_and_columns=True))
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']
['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']
Number of rows 9614
Number of columns 13
None
Repeat the same process on Apple Store Data
apple_English_App = [] #create an empty list
#loop through the data and append it to the above list
for row in data_Apple[1:]:
    app = row[1]
    if E_A(app):
        apple_English_App.append(row)
Apply the explore_data() function on the list apple_English_App and observe the data
print(explore_data(apple_English_App, 0, 4, rows_and_columns=True))
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']
Number of rows 6183
Number of columns 16
None
We have successfully removed the non-English apps from both the Google Play and App Store datasets.
The last step of the data cleaning is to isolate the free apps from the paid apps. As mentioned at the beginning, the company is only interested in free apps, and its main revenue comes from in-app ads.
Below I separate the free apps from the paid apps for both Google Play and the App Store.
#create two empty lists
android_free_app = []
ios_free_app = []
#loop through the data and append the rows with zero price to the appropriate list
for row in android_English_App:
    price = row[7]
    if price == '0':
        android_free_app.append(row)
for row in apple_English_App:
    price = row[4]
    #exploration of the data shows that for iOS
    #zero price is listed as '0.0'
    if price == '0.0':
        ios_free_app.append(row)
print(explore_data(android_free_app, 0, 3, rows_and_columns=True ))
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']
Number of rows 8864
Number of columns 13
None
print(explore_data(ios_free_app, 0, 3, rows_and_columns=True ))
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
Number of rows 3222
Number of columns 16
None
We applied the explore_data() function to both lists and observed the number of rows and columns: 8864 free English apps on Google Play and 3222 on the App Store.
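The free-app filter above compares price strings exactly ('0' for Google Play, '0.0' for the App Store), which works for these files but depends on how each store formats a zero price. A more tolerant check could parse the price as a number. The sketch below assumes paid prices may carry a leading dollar sign (for example '$4.99'); that format and the helper name `is_free` are assumptions, not something shown in the rows printed in this notebook.

```python
def is_free(price_text):
    """Return True if the price string represents zero."""
    cleaned = price_text.strip().lstrip("$")  # assumed currency prefix
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # not a number at all (e.g. a shifted column): treat as not free
        return False

for sample in ['0', '0.0', '$4.99', 'Everyone']:
    print(sample, '->', is_free(sample))
```

With such a helper both loops could share the same condition, `if is_free(price):`, differing only in the column index.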
Our goal is to find an app profile that attracts users on both the App Store and Google Play. Once we identify such a profile, we would like to validate our recommendation by first building a new app that fits this profile on one of the platforms (say Android), observing its usage and, if it is successful, porting the app to the other platform.
Our validation strategy for an app idea has 3 steps:
Build a minimal Android version of the app and add it to Google Play.
If the app has a good response from users, develop it further.
If the app is profitable after six months in Google Play, build an iOS version of the app and add it to the App Store.
To generate frequency tables that show the most common genres, we can use:
For Google Play: Column 2 (Category)
and Column 10 (Genres)
For the App Store: Column 12 (prime_genre)
First we define a function that creates a frequency table. The frequency table shows the percentage of apps in each genre.
#create a function called freq_table() which takes
#dataset & index as parameters
def freq_table(dataset, index):
    dict_freq = {}
    total = 0
    #loop through the rows, counting the total number of rows and
    #building a frequency table with genre as key and
    #the number of apps in that genre as value
    for row in dataset:
        total += 1
        genre = row[index]
        if genre in dict_freq:
            dict_freq[genre] += 1
        else:
            dict_freq[genre] = 1
    #build a second table with genre as key and the genre's
    #share of all apps, as a percentage, as value
    dict_freq_percentage = {}
    for genre in dict_freq:
        percentage = (dict_freq[genre] / total) * 100
        dict_freq_percentage[genre] = percentage
    return dict_freq_percentage
We will create one more function called display_table(), which will display the genre percentages in a descending order
#create a function display_table with dataset and index as parameters
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = [] #create an empty list
    #loop through the keys in the above table and append a
    #(table[key], key) tuple to the list
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
    #sort the list using sorted function in descending order
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
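For comparison, the standard library's `collections.Counter` can produce the same kind of percentage table in a few lines. This is a sketch of an equivalent approach rather than the notebook's implementation; the function name `freq_table_pct` and the toy rows are made up.

```python
from collections import Counter

def freq_table_pct(dataset, index):
    """Map each value in column `index` to its percentage share of rows."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    # most_common() yields (value, count) pairs from most to least frequent,
    # so the resulting dict is already in descending order
    return {key: count / total * 100 for key, count in counts.most_common()}

# Toy rows: [name, genre]
toy_rows = [
    ["Chess", "Games"],
    ["Tuner", "Music"],
    ["Ludo", "Games"],
    ["Snake", "Games"],
]

table = freq_table_pct(toy_rows, 1)
for genre, pct in table.items():
    print(genre, ':', pct)
```

Because the dictionary comes back sorted, a separate display function that builds and sorts tuples is not needed for this variant.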
Below we apply the display_table() function on 'android Category', 'android Genre' and 'ios prime_genre' and observe the output
print('android Category:')
print(display_table(android_free_app, 1))
print('\n')
print('android Genre:')
print(display_table(android_free_app, 9))
print('\n')
print('ios prime_genre:')
print(display_table(ios_free_app, 11))
android Category:
FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 0.6430505415162455
COMICS : 0.6204873646209386
BEAUTY : 0.5979241877256317
None

android Genre:
Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075812
Strategy : 0.9138086642599278
House & Home : 0.8235559566787004
Weather : 0.8009927797833934
Events : 0.7107400722021661
Adventure : 0.6768953068592057
Comics : 0.6092057761732852
Beauty : 0.5979241877256317
Art & Design : 0.5979241877256317
Parenting : 0.4963898916967509
Card : 0.45126353790613716
Casino : 0.42870036101083037
Trivia : 0.41741877256317694
Educational;Education : 0.39485559566787
Board : 0.3835740072202166
Educational : 0.3722924187725632
Education;Education : 0.33844765342960287
Word : 0.2594765342960289
Casual;Pretend Play : 0.236913357400722
Music : 0.2030685920577617
Racing;Action & Adventure : 0.16922382671480143
Puzzle;Brain Games : 0.16922382671480143
Entertainment;Music & Video : 0.16922382671480143
Casual;Brain Games : 0.13537906137184114
Casual;Action & Adventure : 0.13537906137184114
Arcade;Action & Adventure : 0.12409747292418773
Action;Action & Adventure : 0.10153429602888085
Educational;Pretend Play : 0.09025270758122744
Simulation;Action & Adventure : 0.078971119133574
Parenting;Education : 0.078971119133574
Entertainment;Brain Games : 0.078971119133574
Board;Brain Games : 0.078971119133574
Parenting;Music & Video : 0.06768953068592057
Educational;Brain Games : 0.06768953068592057
Casual;Creativity : 0.06768953068592057
Art & Design;Creativity : 0.06768953068592057
Education;Pretend Play : 0.056407942238267145
Role Playing;Pretend Play : 0.04512635379061372
Education;Creativity : 0.04512635379061372
Role Playing;Action & Adventure : 0.033844765342960284
Puzzle;Action & Adventure : 0.033844765342960284
Entertainment;Creativity : 0.033844765342960284
Entertainment;Action & Adventure : 0.033844765342960284
Educational;Creativity : 0.033844765342960284
Educational;Action & Adventure : 0.033844765342960284
Education;Music & Video : 0.033844765342960284
Education;Brain Games : 0.033844765342960284
Education;Action & Adventure : 0.033844765342960284
Adventure;Action & Adventure : 0.033844765342960284
Video Players & Editors;Music & Video : 0.02256317689530686
Sports;Action & Adventure : 0.02256317689530686
Simulation;Pretend Play : 0.02256317689530686
Puzzle;Creativity : 0.02256317689530686
Music;Music & Video : 0.02256317689530686
Entertainment;Pretend Play : 0.02256317689530686
Casual;Education : 0.02256317689530686
Board;Action & Adventure : 0.02256317689530686
Video Players & Editors;Creativity : 0.01128158844765343
Trivia;Education : 0.01128158844765343
Travel & Local;Action & Adventure : 0.01128158844765343
Tools;Education : 0.01128158844765343
Strategy;Education : 0.01128158844765343
Strategy;Creativity : 0.01128158844765343
Strategy;Action & Adventure : 0.01128158844765343
Simulation;Education : 0.01128158844765343
Role Playing;Brain Games : 0.01128158844765343
Racing;Pretend Play : 0.01128158844765343
Puzzle;Education : 0.01128158844765343
Parenting;Brain Games : 0.01128158844765343
Music & Audio;Music & Video : 0.01128158844765343
Lifestyle;Pretend Play : 0.01128158844765343
Lifestyle;Education : 0.01128158844765343
Health & Fitness;Education : 0.01128158844765343
Health & Fitness;Action & Adventure : 0.01128158844765343
Entertainment;Education : 0.01128158844765343
Communication;Creativity : 0.01128158844765343
Comics;Creativity : 0.01128158844765343
Casual;Music & Video : 0.01128158844765343
Card;Action & Adventure : 0.01128158844765343
Books & Reference;Education : 0.01128158844765343
Art & Design;Pretend Play : 0.01128158844765343
Art & Design;Action & Adventure : 0.01128158844765343
Arcade;Pretend Play : 0.01128158844765343
Adventure;Education : 0.01128158844765343
None

ios prime_genre:
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665
None
Our analysis of the frequency table generated for the prime_genre column of the App Store dataset:
The frequency table shows that the most common genre is Games (~58% of the free English apps), followed by Entertainment (~8%).
The majority of the apps (more than 70%) serve an entertainment purpose rather than a practical purpose.
It is premature to recommend an app profile based on this frequency table, because it counts how many apps exist in each genre and says nothing about how many users those apps have.
Our analysis of the frequency tables generated for the Category and Genres columns of the Google Play dataset:
In the Genres column the most common values are Tools (~8%) and Entertainment (~6%). In the Category column, Family (~19%) is at the top of the table, followed by Game (~10%).
Here the ratio of practical-purpose apps to entertainment-purpose apps is almost equal, unlike in the App Store.
As with the App Store, it is not yet possible to recommend an app profile for Google Play, because these tables are based on app genre/category and not on any user information.
Below we list each genre and its average number of user ratings for the App Store.
unique_genre = freq_table(ios_free_app, 11) #get the unique genres from the
#frequency table of the 'prime_genre' column
#loop through the unique genres
for genre in unique_genre:
    total = 0 #sum of the rating counts in this genre
    len_genre = 0 #no. of apps in this genre
    #loop through ios_free_app and accumulate the rating
    #counts of the apps that belong to this genre
    for row in ios_free_app:
        genre_app = row[11]
        if genre_app == genre:
            rating_counts = float(row[5])
            total += rating_counts
            len_genre += 1
    #calculate the average rating count
    avg_rating_counts = total / len_genre
    print(genre, ':', avg_rating_counts)
Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
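The nested loop above scans the whole dataset once per genre. The same averages can be computed in a single pass and returned already sorted, which makes the ranking easier to read. The sketch below is an alternative under stated assumptions: the function name `average_by_group` and the toy rows are hypothetical, and the column indices are parameters rather than fixed values.

```python
from collections import defaultdict

def average_by_group(rows, group_idx, value_idx):
    """Return (group, average) pairs sorted by average, highest first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    averages = {g: totals[g] / counts[g] for g in totals}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)

# Toy rows: [name, genre, rating count]; the values are illustrative
toy_rows = [
    ["Waze", "Navigation", "300"],
    ["Maps", "Navigation", "100"],
    ["Chess", "Games", "50"],
]

ranking = average_by_group(toy_rows, 1, 2)
for genre, avg in ranking:
    print(genre, ':', avg)
```

With the real data the call would presumably be `average_by_group(ios_free_app, 11, 5)`.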
Here we recommend an app profile for the iOS App Store based on the number of user ratings.
The averages above show that five genres average more than 50,000 user ratings per app. In descending order: Navigation (~86,090), Reference (~74,942), Social Networking (~71,548), Music (~57,327) and Weather (~52,280).
Three other genres average more than 30,000 ratings. In descending order: Book (~39,759), Food & Drink (~33,334) and Finance (~31,468).
The remaining genres average fewer than 30,000 ratings.
Below we list the Navigation apps with their rating counts for the App Store, as this genre has the highest average rating count (~86,000).
for row in ios_free_app:
    if row[11] == 'Navigation':
        print(row[1], ':', row[5])
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
Below we list each category and its average number of installs for the Google Play Store.
#get the unique categories from the frequency table
unique_app_genre_android = freq_table(android_free_app, 1)
#loop through the unique categories
for category in unique_app_genre_android:
    total = 0
    len_category = 0
    #loop through the free android apps & select those in this category
    for row in android_free_app:
        category_app = row[1]
        #for apps in this category, convert the installs
        #string (e.g. '10,000+') to a number
        if category_app == category:
            n_installs = row[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            n_installs = float(n_installs)
            total += n_installs #add the installs to total
            len_category += 1 #count the apps in this category
    #calculate the average installs and print it
    avg_no_n_installs = total / len_category
    print(category, ':', avg_no_n_installs)
ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_MAGAZINES : 9549178.467741935
MAPS_AND_NAVIGATION : 4056941.7741935486
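The install counts in this dataset are strings such as '10,000+', so they must be cleaned before any arithmetic. The conversion used in the loop above can be isolated in a small helper, sketched here; the name `parse_installs` is hypothetical.

```python
def parse_installs(text):
    """Convert an install string such as '1,000,000+' to an integer."""
    return int(text.replace(",", "").replace("+", ""))

for sample in ['10,000+', '5,000,000+', '1,000,000,000+']:
    print(sample, '->', parse_installs(sample))
```

The trailing '+' indicates an open-ended bucket, so each parsed number is best read as a lower bound. The category averages above are therefore rough estimates, not exact install counts.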
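Here we recommend an app profile for the Google Play Store based on the number of installs.
The averages above show that nine categories average more than 10,000,000 installs per app. In descending order: COMMUNICATION (~38.5 million), VIDEO_PLAYERS (~24.7 million), SOCIAL (~23.3 million), PHOTOGRAPHY (~17.8 million), PRODUCTIVITY (~16.8 million), GAME (~15.6 million), TRAVEL_AND_LOCAL (~14.0 million), ENTERTAINMENT (~11.6 million) and TOOLS (~10.8 million).
Below we list a few Communication apps (a category that averages ~38,456,119 installs) from the Google Play Store.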
for row in android_free_app[:290]:
    if row[1] == 'COMMUNICATION':
        print(row[0], ':', row[5])
WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
Based on our analysis and observation, we have come up with a list of apps common to both Apple Store and Google Play Store.
# #Below we are going to list the *genre* and respective
# #*average number of installs* for 'Google Play Store'
# #perform same steps as above
# unique_app_genre_android = freq_table(android_free_app, 9)
# for genre in unique_app_genre_android:
#     total = 0
#     len_genre = 0
#     for row in android_free_app:
#         genre_app = row[9]
#         if genre_app == genre:
#             n_installs = row[5]
#             n_installs = n_installs.replace('+', '')
#             n_installs = n_installs.replace(',', '')
#             n_installs = float(n_installs)
#             total += n_installs
#             len_genre += 1
#     avg_no_n_installs = total / len_genre
#     print(genre, ':', avg_no_n_installs)
In this project we analyzed app data from the Apple App Store and the Google Play Store. Our goal was to recommend a free app profile that would be profitable on both. We have come up with a list of apps that can be profitable in both stores.