We only build apps that are free to download and install, and our main source of revenue is in-app ads. This means that our revenue for any given app is mostly determined by its number of users: the more users who see and engage with the ads, the better.
Our goal for this project is to analyze data to help our developers understand what type of apps are likely to attract more users.
from csv import reader
### Google Play data set ###
opened_file = open('googleplaystore.csv', encoding='utf8')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]
### App Store data set ###
opened_file = open('AppleStore.csv', encoding='utf8')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')  # blank lines between rows
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
print(android_header)
print('\n')
explore_data(android, 0, 4, True)
print(ios_header)
print('\n')
explore_data(ios, 0, 4, True)  # explore the App Store rows, not the Google Play rows again
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'] ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'] Number of rows: 10841 Number of columns: 13 ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'] ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'] Number of rows: 10841 Number of columns: 13
print(android[10472])
del android[10472]
print(len(android))
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10840
The Google Play data set has duplicate entries that may distort our analysis, so we need to remove them and keep only one entry per app.
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)
duplicate_apps = []
unique_apps = []
for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
Number of duplicate apps: 1181

Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
print('Expected length:', len(android) - 1181)
print('Actual length:', len(reviews_max))
Expected length: 9659
Actual length: 9659
android_clean = []
already_added = []
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
explore_data(android_clean, 0, 3, True)
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']
Number of rows: 9659
Number of columns: 13
def is_english(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
True
False
False
False
def is_english(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1  # count every non-ASCII character
    if non_ascii > 3:
        return False
    else:
        return True
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
True
True
android_english = []
ios_english = []
for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)
for app in ios:
    name = app[1]
    if is_english(name):
        ios_english.append(app)
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'] Number of rows: 9659 Number of columns: 13 ['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'] ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'] ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'] Number of rows: 7197 Number of columns: 16
android_final = []
ios_final = []
for app in android_english:  # English, de-duplicated Google Play apps
    price = app[7]
    if price == '0':
        android_final.append(app)
for app in ios_english:  # English App Store apps
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
print(len(android_final))
print(len(ios_final))
10040 4056
As we mentioned earlier, our objective is to identify the types of apps that attract the most users, since our revenue grows with the number of people using our apps. To minimize risks and overhead, our strategy for developing an app has three levels.
Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful in both markets.
def freq_table(dataset, index):
    table = {}
    total = 0
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
    return table_percentages
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
print(freq_table(ios_final, 11))
print('\n')
print(freq_table(android_final, 9))
print('\n')
print(freq_table(android_final, 1))
{'Social Networking': 3.5256410256410255, 'Photo & Video': 4.117357001972387, 'Games': 55.64595660749507, 'Music': 1.6518737672583828, 'Reference': 0.4930966469428008, 'Health & Fitness': 1.8737672583826428, 'Weather': 0.7642998027613412, 'Utilities': 2.687376725838264, 'Travel': 1.3806706114398422, 'Shopping': 2.983234714003945, 'News': 1.4299802761341223, 'Navigation': 0.4930966469428008, 'Lifestyle': 2.3175542406311638, 'Entertainment': 8.234714003944774, 'Food & Drink': 1.0601577909270217, 'Sports': 1.947731755424063, 'Book': 1.6272189349112427, 'Finance': 2.0710059171597637, 'Education': 3.2544378698224854, 'Productivity': 1.5285996055226825, 'Business': 0.4930966469428008, 'Catalogs': 0.22189349112426035, 'Medical': 0.19723865877712032} {'Art & Design': 0.5478087649402391, 'Art & Design;Pretend Play': 0.0199203187250996, 'Art & Design;Creativity': 0.06972111553784861, 'Art & Design;Action & Adventure': 0.0199203187250996, 'Auto & Vehicles': 0.8167330677290837, 'Beauty': 0.5278884462151394, 'Books & Reference': 2.0219123505976095, 'Business': 4.442231075697211, 'Comics': 0.5876494023904383, 'Comics;Creativity': 0.0099601593625498, 'Communication': 3.5856573705179287, 'Dating': 2.2609561752988045, 'Education;Education': 0.4382470119521913, 'Education': 5.169322709163347, 'Education;Creativity': 0.049800796812749, 'Education;Music & Video': 0.049800796812749, 'Education;Action & Adventure': 0.0398406374501992, 'Education;Pretend Play': 0.0796812749003984, 'Education;Brain Games': 0.0398406374501992, 'Entertainment': 6.01593625498008, 'Entertainment;Music & Video': 0.26892430278884466, 'Entertainment;Brain Games': 0.0796812749003984, 'Entertainment;Creativity': 0.0298804780876494, 'Events': 0.6274900398406374, 'Finance': 3.4760956175298805, 'Food & Drink': 1.245019920318725, 'Health & Fitness': 3.237051792828685, 'House & Home': 0.8764940239043826, 'Libraries & Demo': 0.8366533864541833, 'Lifestyle': 3.6055776892430282, 'Lifestyle;Pretend Play': 
0.0099601593625498, 'Adventure;Action & Adventure': 0.10956175298804782, 'Arcade': 1.9920318725099602, 'Casual': 1.8326693227091633, 'Card': 0.40836653386454186, 'Casual;Pretend Play': 0.24900398406374502, 'Action': 3.396414342629482, 'Strategy': 0.9362549800796812, 'Puzzle': 1.205179282868526, 'Sports': 3.7250996015936253, 'Music': 0.20916334661354583, 'Word': 0.2888446215139442, 'Racing': 0.9462151394422311, 'Casual;Creativity': 0.06972111553784861, 'Casual;Action & Adventure': 0.199203187250996, 'Simulation': 1.902390438247012, 'Adventure': 0.6274900398406374, 'Board': 0.348605577689243, 'Trivia': 0.3784860557768924, 'Role Playing': 0.8665338645418327, 'Action;Action & Adventure': 0.13944223107569723, 'Casual;Brain Games': 0.1294820717131474, 'Simulation;Action & Adventure': 0.10956175298804782, 'Educational;Creativity': 0.0298804780876494, 'Puzzle;Brain Games': 0.1693227091633466, 'Educational;Education': 0.3784860557768924, 'Card;Brain Games': 0.0099601593625498, 'Educational;Brain Games': 0.0597609561752988, 'Educational;Pretend Play': 0.13944223107569723, 'Entertainment;Education': 0.0099601593625498, 'Casual;Education': 0.0199203187250996, 'Music;Music & Video': 0.0199203187250996, 'Racing;Action & Adventure': 0.1892430278884462, 'Arcade;Pretend Play': 0.0099601593625498, 'Role Playing;Action & Adventure': 0.0597609561752988, 'Simulation;Pretend Play': 0.0298804780876494, 'Puzzle;Creativity': 0.0199203187250996, 'Sports;Action & Adventure': 0.0199203187250996, 'Educational;Action & Adventure': 0.0398406374501992, 'Arcade;Action & Adventure': 0.1195219123505976, 'Entertainment;Action & Adventure': 0.0298804780876494, 'Puzzle;Action & Adventure': 0.049800796812749, 'Strategy;Action & Adventure': 0.0099601593625498, 'Music & Audio;Music & Video': 0.0099601593625498, 'Health & Fitness;Education': 0.0099601593625498, 'Adventure;Education': 0.0199203187250996, 'Board;Brain Games': 0.0796812749003984, 'Board;Action & Adventure': 0.0199203187250996, 'Casual;Music & 
Video': 0.0199203187250996, 'Role Playing;Pretend Play': 0.049800796812749, 'Entertainment;Pretend Play': 0.0199203187250996, 'Video Players & Editors;Creativity': 0.0199203187250996, 'Medical': 3.5258964143426295, 'Social': 2.908366533864542, 'Shopping': 2.5697211155378485, 'Photography': 3.117529880478088, 'Travel & Local': 2.4402390438247012, 'Travel & Local;Action & Adventure': 0.0099601593625498, 'Tools': 7.609561752988048, 'Tools;Education': 0.0099601593625498, 'Personalization': 3.0776892430278884, 'Productivity': 3.944223107569721, 'Parenting': 0.4382470119521913, 'Parenting;Music & Video': 0.0597609561752988, 'Parenting;Education': 0.06972111553784861, 'Parenting;Brain Games': 0.0099601593625498, 'Weather': 0.7370517928286853, 'Video Players & Editors': 1.6832669322709164, 'Video Players & Editors;Music & Video': 0.0298804780876494, 'News & Magazines': 2.7988047808764938, 'Maps & Navigation': 1.3147410358565739, 'Health & Fitness;Action & Adventure': 0.0099601593625498, 'Educational': 0.32868525896414347, 'Casino': 0.3784860557768924, 'Trivia;Education': 0.0099601593625498, 'Lifestyle;Education': 0.0099601593625498, 'Card;Action & Adventure': 0.0099601593625498, 'Books & Reference;Education': 0.0099601593625498, 'Simulation;Education': 0.0099601593625498, 'Puzzle;Education': 0.0099601593625498, 'Role Playing;Brain Games': 0.0099601593625498, 'Strategy;Education': 0.0099601593625498, 'Racing;Pretend Play': 0.0099601593625498, 'Communication;Creativity': 0.0099601593625498, 'Strategy;Creativity': 0.0099601593625498} {'ART_AND_DESIGN': 0.6175298804780877, 'AUTO_AND_VEHICLES': 0.8167330677290837, 'BEAUTY': 0.5278884462151394, 'BOOKS_AND_REFERENCE': 2.0219123505976095, 'BUSINESS': 4.442231075697211, 'COMICS': 0.5976095617529881, 'COMMUNICATION': 3.5856573705179287, 'DATING': 2.2609561752988045, 'EDUCATION': 1.5139442231075697, 'ENTERTAINMENT': 1.4641434262948207, 'EVENTS': 0.6274900398406374, 'FINANCE': 3.4760956175298805, 'FOOD_AND_DRINK': 1.245019920318725, 
'HEALTH_AND_FITNESS': 3.237051792828685, 'HOUSE_AND_HOME': 0.8764940239043826, 'LIBRARIES_AND_DEMO': 0.8366533864541833, 'LIFESTYLE': 3.6155378486055776, 'GAME': 10.56772908366534, 'FAMILY': 17.739043824701195, 'MEDICAL': 3.5258964143426295, 'SOCIAL': 2.908366533864542, 'SHOPPING': 2.5697211155378485, 'PHOTOGRAPHY': 3.117529880478088, 'SPORTS': 3.5856573705179287, 'TRAVEL_AND_LOCAL': 2.450199203187251, 'TOOLS': 7.6195219123505975, 'PERSONALIZATION': 3.0776892430278884, 'PRODUCTIVITY': 3.944223107569721, 'PARENTING': 0.5776892430278884, 'WEATHER': 0.7370517928286853, 'VIDEO_PLAYERS': 1.7031872509960162, 'NEWS_AND_MAGAZINES': 2.7988047808764938, 'MAPS_AND_NAVIGATION': 1.3147410358565739}
display_table(ios_final, 11)
Games : 55.64595660749507
Entertainment : 8.234714003944774
Photo & Video : 4.117357001972387
Social Networking : 3.5256410256410255
Education : 3.2544378698224854
Shopping : 2.983234714003945
Utilities : 2.687376725838264
Lifestyle : 2.3175542406311638
Finance : 2.0710059171597637
Sports : 1.947731755424063
Health & Fitness : 1.8737672583826428
Music : 1.6518737672583828
Book : 1.6272189349112427
Productivity : 1.5285996055226825
News : 1.4299802761341223
Travel : 1.3806706114398422
Food & Drink : 1.0601577909270217
Weather : 0.7642998027613412
Reference : 0.4930966469428008
Navigation : 0.4930966469428008
Business : 0.4930966469428008
Catalogs : 0.22189349112426035
Medical : 0.19723865877712032
Among the free English apps, 55.65% are games (more than half). Entertainment apps are close to 8%, followed by photo and video apps at about 4%. Social networking apps account for 3.52% of the apps in our data set, followed by education apps at 3.25%.
The general conclusion from our data set is that the App Store is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer.
However, the fact that fun apps are the most numerous doesn't imply that they also have the greatest number of users; demand might not match supply.
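To make the difference between supply and demand concrete, here is a minimal sketch on made-up rows. The `toy_apps` list and the `share_and_average` helper are hypothetical and are not part of our data sets or of the functions used elsewhere in this project; the sketch only shows that a genre can hold most of the apps while another genre has far more ratings per app.

```python
# Hypothetical toy rows: (genre, number of user ratings).
# These values are invented for illustration and do not come from the data sets.
toy_apps = [
    ('Games', 100),
    ('Games', 200),
    ('Games', 300),
    ('Reference', 9000),
]

def share_and_average(rows):
    """Return {genre: (share of apps in %, average ratings per app)}."""
    counts = {}
    totals = {}
    for genre, n_ratings in rows:
        counts[genre] = counts.get(genre, 0) + 1
        totals[genre] = totals.get(genre, 0) + n_ratings
    n_rows = len(rows)
    return {
        genre: (counts[genre] / n_rows * 100, totals[genre] / counts[genre])
        for genre in counts
    }

summary = share_and_average(toy_apps)
for genre, (share, average) in summary.items():
    print(genre, ':', share, '% of apps,', average, 'ratings per app')
```

In this toy example games make up three quarters of the apps, yet the single reference app has many more ratings than any game, which is the situation we need to check for in the real data.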
display_table(android_final, 1)
FAMILY : 17.739043824701195
GAME : 10.56772908366534
TOOLS : 7.6195219123505975
BUSINESS : 4.442231075697211
PRODUCTIVITY : 3.944223107569721
LIFESTYLE : 3.6155378486055776
SPORTS : 3.5856573705179287
COMMUNICATION : 3.5856573705179287
MEDICAL : 3.5258964143426295
FINANCE : 3.4760956175298805
HEALTH_AND_FITNESS : 3.237051792828685
PHOTOGRAPHY : 3.117529880478088
PERSONALIZATION : 3.0776892430278884
SOCIAL : 2.908366533864542
NEWS_AND_MAGAZINES : 2.7988047808764938
SHOPPING : 2.5697211155378485
TRAVEL_AND_LOCAL : 2.450199203187251
DATING : 2.2609561752988045
BOOKS_AND_REFERENCE : 2.0219123505976095
VIDEO_PLAYERS : 1.7031872509960162
EDUCATION : 1.5139442231075697
ENTERTAINMENT : 1.4641434262948207
MAPS_AND_NAVIGATION : 1.3147410358565739
FOOD_AND_DRINK : 1.245019920318725
HOUSE_AND_HOME : 0.8764940239043826
LIBRARIES_AND_DEMO : 0.8366533864541833
AUTO_AND_VEHICLES : 0.8167330677290837
WEATHER : 0.7370517928286853
EVENTS : 0.6274900398406374
ART_AND_DESIGN : 0.6175298804780877
COMICS : 0.5976095617529881
PARENTING : 0.5776892430278884
BEAUTY : 0.5278884462151394
The most common category is Family at 17.74%, followed by Game at close to 11% and Tools at 7.62%. Business is at 4.4%, followed closely by Productivity at close to 4%. The trend seems significantly different on Google Play compared to the App Store.
There are not that many apps designed for fun, and it seems that a good number of apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.).
display_table(android_final, 9)
Tools : 7.609561752988048 Entertainment : 6.01593625498008 Education : 5.169322709163347 Business : 4.442231075697211 Productivity : 3.944223107569721 Sports : 3.7250996015936253 Lifestyle : 3.6055776892430282 Communication : 3.5856573705179287 Medical : 3.5258964143426295 Finance : 3.4760956175298805 Action : 3.396414342629482 Health & Fitness : 3.237051792828685 Photography : 3.117529880478088 Personalization : 3.0776892430278884 Social : 2.908366533864542 News & Magazines : 2.7988047808764938 Shopping : 2.5697211155378485 Travel & Local : 2.4402390438247012 Dating : 2.2609561752988045 Books & Reference : 2.0219123505976095 Arcade : 1.9920318725099602 Simulation : 1.902390438247012 Casual : 1.8326693227091633 Video Players & Editors : 1.6832669322709164 Maps & Navigation : 1.3147410358565739 Food & Drink : 1.245019920318725 Puzzle : 1.205179282868526 Racing : 0.9462151394422311 Strategy : 0.9362549800796812 House & Home : 0.8764940239043826 Role Playing : 0.8665338645418327 Libraries & Demo : 0.8366533864541833 Auto & Vehicles : 0.8167330677290837 Weather : 0.7370517928286853 Events : 0.6274900398406374 Adventure : 0.6274900398406374 Comics : 0.5876494023904383 Art & Design : 0.5478087649402391 Beauty : 0.5278884462151394 Parenting : 0.4382470119521913 Education;Education : 0.4382470119521913 Card : 0.40836653386454186 Trivia : 0.3784860557768924 Educational;Education : 0.3784860557768924 Casino : 0.3784860557768924 Board : 0.348605577689243 Educational : 0.32868525896414347 Word : 0.2888446215139442 Entertainment;Music & Video : 0.26892430278884466 Casual;Pretend Play : 0.24900398406374502 Music : 0.20916334661354583 Casual;Action & Adventure : 0.199203187250996 Racing;Action & Adventure : 0.1892430278884462 Puzzle;Brain Games : 0.1693227091633466 Educational;Pretend Play : 0.13944223107569723 Action;Action & Adventure : 0.13944223107569723 Casual;Brain Games : 0.1294820717131474 Arcade;Action & Adventure : 0.1195219123505976 Simulation;Action & Adventure : 
0.10956175298804782 Adventure;Action & Adventure : 0.10956175298804782 Entertainment;Brain Games : 0.0796812749003984 Education;Pretend Play : 0.0796812749003984 Board;Brain Games : 0.0796812749003984 Parenting;Education : 0.06972111553784861 Casual;Creativity : 0.06972111553784861 Art & Design;Creativity : 0.06972111553784861 Role Playing;Action & Adventure : 0.0597609561752988 Parenting;Music & Video : 0.0597609561752988 Educational;Brain Games : 0.0597609561752988 Role Playing;Pretend Play : 0.049800796812749 Puzzle;Action & Adventure : 0.049800796812749 Education;Music & Video : 0.049800796812749 Education;Creativity : 0.049800796812749 Educational;Action & Adventure : 0.0398406374501992 Education;Brain Games : 0.0398406374501992 Education;Action & Adventure : 0.0398406374501992 Video Players & Editors;Music & Video : 0.0298804780876494 Simulation;Pretend Play : 0.0298804780876494 Entertainment;Creativity : 0.0298804780876494 Entertainment;Action & Adventure : 0.0298804780876494 Educational;Creativity : 0.0298804780876494 Video Players & Editors;Creativity : 0.0199203187250996 Sports;Action & Adventure : 0.0199203187250996 Puzzle;Creativity : 0.0199203187250996 Music;Music & Video : 0.0199203187250996 Entertainment;Pretend Play : 0.0199203187250996 Casual;Music & Video : 0.0199203187250996 Casual;Education : 0.0199203187250996 Board;Action & Adventure : 0.0199203187250996 Art & Design;Pretend Play : 0.0199203187250996 Art & Design;Action & Adventure : 0.0199203187250996 Adventure;Education : 0.0199203187250996 Trivia;Education : 0.0099601593625498 Travel & Local;Action & Adventure : 0.0099601593625498 Tools;Education : 0.0099601593625498 Strategy;Education : 0.0099601593625498 Strategy;Creativity : 0.0099601593625498 Strategy;Action & Adventure : 0.0099601593625498 Simulation;Education : 0.0099601593625498 Role Playing;Brain Games : 0.0099601593625498 Racing;Pretend Play : 0.0099601593625498 Puzzle;Education : 0.0099601593625498 Parenting;Brain Games : 
0.0099601593625498 Music & Audio;Music & Video : 0.0099601593625498 Lifestyle;Pretend Play : 0.0099601593625498 Lifestyle;Education : 0.0099601593625498 Health & Fitness;Education : 0.0099601593625498 Health & Fitness;Action & Adventure : 0.0099601593625498 Entertainment;Education : 0.0099601593625498 Communication;Creativity : 0.0099601593625498 Comics;Creativity : 0.0099601593625498 Card;Brain Games : 0.0099601593625498 Card;Action & Adventure : 0.0099601593625498 Books & Reference;Education : 0.0099601593625498 Arcade;Pretend Play : 0.0099601593625498
The difference between the Genres and Category columns in our data set is not clear-cut, but one observable feature is that the Genres column has many more distinct values than the Category column.
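One reason the Genres column has so many distinct values is that some entries combine two labels separated by a semicolon, such as 'Art & Design;Pretend Play'. The sketch below shows one possible way to collapse such values to their first label. The `primary_genre` helper is hypothetical and is not used elsewhere in this project, and treating the first label as the main genre is an assumption we make only for illustration.

```python
def primary_genre(genres_value):
    """Return the part of a Genres value that comes before the first ';'."""
    return genres_value.split(';')[0]

# A few values copied from the Genres frequency table printed above.
examples = [
    'Art & Design',
    'Art & Design;Pretend Play',
    'Education;Education',
    'Tools;Education',
]

distinct_full = set(examples)
distinct_primary = set(primary_genre(value) for value in examples)

print(len(distinct_full), 'distinct Genres values')
print(len(distinct_primary), 'distinct primary genres:', sorted(distinct_primary))
```

Collapsing the values this way would bring the Genres column closer to the granularity of the Category column, at the cost of discarding the secondary label.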
Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps.
genres_ios = freq_table(ios_final, 11)
for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[11]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)
Social Networking : 53078.195804195806
Photo & Video : 27249.892215568863
Games : 18924.68896765618
Music : 56482.02985074627
Reference : 67447.9
Health & Fitness : 19952.315789473683
Weather : 47220.93548387097
Utilities : 14010.100917431193
Travel : 20216.01785714286
Shopping : 18746.677685950413
News : 15892.724137931034
Navigation : 25972.05
Lifestyle : 8978.308510638299
Entertainment : 10822.961077844311
Food & Drink : 20179.093023255813
Sports : 20128.974683544304
Book : 8498.333333333334
Finance : 13522.261904761905
Education : 6266.333333333333
Productivity : 19053.887096774193
Business : 6367.8
Catalogs : 1779.5555555555557
Medical : 459.75
On average, reference apps have the highest number of user ratings. This figure is perhaps heavily influenced by a few big players like Bible and Dictionary.com.
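One way to check whether a few big players are driving an average is to compare the mean with the median for the same genre. The sketch below does this on made-up rows. The `ratings_for_genre` helper, the three-column `toy_rows` layout, and all the values are hypothetical; for the App Store data the genre and rating-count columns would be at indexes 11 and 5, as in the loop above.

```python
from statistics import mean, median

def ratings_for_genre(dataset, genre, genre_index, ratings_index):
    """Collect the rating counts (as floats) of every app in the given genre."""
    return [
        float(app[ratings_index])
        for app in dataset
        if app[genre_index] == genre
    ]

# Hypothetical rows in the form [name, genre, rating count].
# The values are invented: four small apps and one very large one.
toy_rows = [
    ['App A', 'Reference', '120'],
    ['App B', 'Reference', '300'],
    ['App C', 'Reference', '450'],
    ['App D', 'Reference', '800'],
    ['App E', 'Reference', '985000'],
    ['App F', 'Games', '50'],
]

ratings = ratings_for_genre(toy_rows, 'Reference', genre_index=1, ratings_index=2)
print('mean:', mean(ratings))
print('median:', median(ratings))
```

When the mean is far above the median, as it is here, the average mostly reflects the largest apps rather than a typical app in the genre.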
category_android = freq_table(android_final, 1)
for category in category_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)
ART_AND_DESIGN : 2005195.1612903227
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 9465252.512315271
BUSINESS : 2245520.3811659194
COMICS : 934769.1666666666
COMMUNICATION : 90683100.55833334
DATING : 1164270.7356828193
EDUCATION : 5729276.315789473
ENTERTAINMENT : 19516734.69387755
EVENTS : 253542.22222222222
FINANCE : 2511355.6790830945
FOOD_AND_DRINK : 2190710.008
HEALTH_AND_FITNESS : 4869225.852307692
HOUSE_AND_HOME : 1917187.0568181819
LIBRARIES_AND_DEMO : 749950.119047619
LIFESTYLE : 1477863.44077135
GAME : 33048939.16116871
FAMILY : 5742274.952835485
MEDICAL : 147563.28813559323
SOCIAL : 48184458.56849315
SHOPPING : 12588522.03488372
PHOTOGRAPHY : 32218111.54952077
SPORTS : 4860918.563888889
TRAVEL_AND_LOCAL : 27921561.32520325
TOOLS : 14968685.586928105
PERSONALIZATION : 7508854.330097088
PRODUCTIVITY : 35794644.73232323
PARENTING : 542603.6206896552
WEATHER : 5747142.162162162
VIDEO_PLAYERS : 36385565.614035085
NEWS_AND_MAGAZINES : 26677267.829181496
MAPS_AND_NAVIGATION : 5486066.590909091
On average, communication apps in the Google Play data set have the highest number of installs. This figure is perhaps heavily influenced by a few apps that have a huge number of installs (Facebook, WhatsApp, Gmail, Google Chrome, etc.).
Social apps follow, though by a wide margin, with an average of close to 50,000,000 installs.
The books and reference category looks fairly popular as well, with an average of 9,465,252 installs. This is interesting, since we found that this genre has some potential to work well on the App Store, and our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.
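The remark that a handful of giant apps may inflate a category average can be checked by recomputing the average without them. The sketch below is a minimal illustration on made-up rows. The `parse_installs` and `average_installs` helpers, the `cap` parameter, the 100,000,000 threshold, and the `toy_category` rows are all hypothetical; only the string clean-up mirrors the loop used above.

```python
def parse_installs(value):
    """Turn an Installs string such as '1,000,000+' into the float 1000000.0."""
    return float(value.replace(',', '').replace('+', ''))

def average_installs(rows, installs_index, cap=None):
    """Average the installs column, skipping rows at or above `cap` if one is given."""
    values = [parse_installs(row[installs_index]) for row in rows]
    if cap is not None:
        values = [value for value in values if value < cap]
    if not values:
        return 0.0
    return sum(values) / len(values)

# Hypothetical rows in the form [name, installs]; the names and values are invented.
toy_category = [
    ['Messenger X', '1,000,000,000+'],
    ['Chat Y', '5,000,000+'],
    ['Chat Z', '1,000,000+'],
    ['Chat W', '10,000+'],
]

print('All apps:', average_installs(toy_category, installs_index=1))
print('Under 100M installs:', average_installs(toy_category, installs_index=1, cap=100_000_000))
```

On the Google Play rows the installs column is at index 5, so the same helper could be applied to the apps of a single category to see how much the average drops once the largest apps are left out.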
In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.
We concluded that taking a popular book and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.