In this analysis we will be able to pinpoint the type of apps our end users interact with the most. This analysis will also show us which apps we profit the most from so that we may focus our efforts in building more apps like them.
from csv import reader
#Apple Store Dataset
opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]
#Google Play Store Dataset
opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]
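The two loading steps above follow the same pattern, so they could be wrapped in a small helper. The sketch below is one possible way to do that: the name `load_dataset` and the `encoding='utf8'` default are assumptions rather than part of the original notebook, and the demo writes a tiny temporary CSV so that it runs without the real files.

```python
import csv
import os
import tempfile


def load_dataset(path, encoding='utf8'):
    """Return (header, rows) for the CSV file at `path` (hypothetical helper)."""
    # The context manager closes the file even if reading fails.
    with open(path, encoding=encoding, newline='') as f:
        rows = list(csv.reader(f))
    return rows[0], rows[1:]


# Demo on a throwaway file so the sketch runs without AppleStore.csv.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 encoding='utf8', newline='') as tmp:
    tmp.write('id,track_name\n1,Facebook\n2,Instagram\n')
    demo_path = tmp.name

demo_header, demo_rows = load_dataset(demo_path)
os.remove(demo_path)
print(demo_header)
print(demo_rows)
```

With the real files, the call would look like `ios_header, ios = load_dataset('AppleStore.csv')`.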
#function created to make it easier to explore data
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
Number of rows: 7197
Number of columns: 16
The App Store dataset has 7,197 applications. The columns we want to focus on are 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre'. Some column names might be confusing, so you can review the definition of each one here.
print(android_header)
print('\n')
explore_data(android, 0, 3, True)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
Number of rows: 10841
Number of columns: 13
The Google Play Store dataset has 10,841 applications, and we want to focus on the columns 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'. If you wish to find more information on the columns or the dataset, you can find it here.
One row in the Google Play Store dataset was recorded incorrectly, so we will clean that data. First, we will identify the row by looking for any row whose length differs from the header's.
for row in android:
    headerlength = len(android_header)
    rowlength = len(row)
    if rowlength != headerlength:
        print(row)
        print(android.index(row))
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10472
Now that we have identified the row for the Google Play Store Dataset, we will proceed to delete it.
del android[10472]
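Deleting by a hardcoded index works, but running that cell twice would silently delete a valid row. As an alternative, the length check used to find the row can also do the removal. This is a minimal sketch on made-up sample rows; `drop_malformed_rows` is a hypothetical helper name, not something from the original notebook.

```python
def drop_malformed_rows(header, rows):
    """Split rows into (kept, dropped) by comparing each length to the header's."""
    expected = len(header)
    kept, dropped = [], []
    for row in rows:
        if len(row) == expected:
            kept.append(row)
        else:
            dropped.append(row)
    return kept, dropped


# Made-up sample data: the second row is one column short, like row 10472.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Toy App A', 'SOCIAL', '4.5'],
    ['Toy App B', '1.9'],
    ['Toy App C', 'BUSINESS', '4.0'],
]

kept, dropped = drop_malformed_rows(sample_header, sample_rows)
print('Kept:', len(kept))
print('Dropped:', dropped)
```

Because the filter only keeps rows that match the header length, applying it again to its own output changes nothing.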
Looking through the data, we also found that the Google Play dataset has duplicate entries. For instance, the app Instagram has four entries:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
We have 1,181 cases of duplicates.
duplicate_apps = []
unique_apps = []
for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])
Number of duplicate apps: 1181
Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
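The counting loop above checks membership in a list, which gets slower as the list grows. As a rough alternative, the same count could be obtained with `collections.Counter` from the standard library. The sketch below uses a short made-up list of names in place of the real dataset.

```python
from collections import Counter

# Made-up names standing in for the first column of the android dataset.
sample_names = ['Box', 'Slack', 'Box', 'Instagram', 'Box', 'Slack']

name_counts = Counter(sample_names)

# Every occurrence beyond the first counts as one duplicate entry,
# which matches what the list-based loop appends to duplicate_apps.
n_duplicates = sum(count - 1 for count in name_counts.values())
n_unique = len(name_counts)

print('Number of duplicate apps:', n_duplicates)
print('Number of unique apps:', n_unique)
```

On the real data, the names would come from `[app[0] for app in android]`.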
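We will not remove duplicates at random. For each app, we will keep the entry with the highest number of reviews, which is most likely the most recent one, and remove the other entries.
To do this, we will build a dictionary in which each key is a unique app name and each value is the highest number of reviews recorded for that app: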
reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
print(len(reviews_max))
9659
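Now that we have a dictionary with one key per app, we can confirm that it holds the right number of unique apps: 10,840 rows - 9,659 unique names = 1,181 duplicates. Next, we will use this dictionary to remove the duplicate rows from the dataset.
The loop below keeps a row only if its review count matches the maximum for that app and the app has not already been added, which handles duplicates that share the same highest review count: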
android_clean = []
already_added = []
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
Let's make sure we have the expected number of rows by exploring the new dataset.
explore_data(android_clean, 0, 3, True)
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']
Number of rows: 9659
Number of columns: 13
As expected, we have 9,659 rows.
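The two-pass approach above (first a dictionary of maximum review counts, then a filtering loop) could also be condensed into a single pass that stores the best row per app directly. This is only a sketch under stated assumptions: `keep_most_reviewed` is a hypothetical helper, the Instagram rows are shortened to four columns, and 'Toy App' is made up.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Return one row per app name, keeping the row with the most reviews."""
    best = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Strict '>' keeps the first row seen when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())


sample_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Toy App', 'TOOLS', '4.0', '120'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
]

deduped = keep_most_reviewed(sample_rows)
for row in deduped:
    print(row)
```

One difference worth noting: the result is ordered by the first appearance of each app name, whereas the original loop orders rows by the position of the kept entry. The dictionary lookup also avoids the repeated list membership test on `already_added`.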
Next, we need to separate the English apps from the non-English ones, since we build apps for an English-speaking audience. To do that, we will create a function that iterates over the characters of an app name and uses each character's code point to decide whether the name is English: characters commonly used in English text fall within the ASCII range (0 to 127).
def is_english(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
True
False
False
False
As you can see, the function returns False for 'Docs To Go™ Free Office Suite' and 'Instachat 😜' because of "™" and "😜", even though both names are English. We need to redefine the function so that it tolerates a small number of non-ASCII characters and still treats the name as English.
def is_english(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    if non_ascii >= 3:
        return False
    else:
        return True
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
True
True
False
Now that we have redefined the function, it rejects any string that contains three or more characters beyond code point 127. As you can see, 'Docs To Go™ Free Office Suite' and 'Instachat 😜' are now accepted as English apps.
We will use this new function to filter out non-English apps from both datasets.
ios_english = []
android_english = []
for app in ios:
    name = app[1]
    if is_english(name):
        ios_english.append(app)
for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)
explore_data(ios_english, 0, 3, True)
print('\n')
explore_data(android_english, 0, 3, True)
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
Number of rows: 6155
Number of columns: 16

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']
Number of rows: 9597
Number of columns: 13
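The threshold rule behind the redefined `is_english` could also be written more compactly, with the tolerance exposed as a parameter so that it is easy to experiment with. The sketch below is an illustration only: the name `is_mostly_ascii` and the `max_non_ascii` parameter are assumptions, and the default of 2 mirrors the "three or more is rejected" rule.

```python
def is_mostly_ascii(string, max_non_ascii=2):
    """Return True if at most `max_non_ascii` characters fall outside ASCII."""
    # Count the characters whose code point is beyond the ASCII range (0-127).
    non_ascii = sum(1 for character in string if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_mostly_ascii('Docs To Go™ Free Office Suite'))
print(is_mostly_ascii('Instachat 😜'))
print(is_mostly_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))
```

This is a heuristic, not language detection: an English name with three emoji would be rejected, and a non-English name written in plain Latin letters would pass.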
Now that we have separated the English apps from the non-English ones, the Apple App Store dataset has 6,155 rows and 16 columns, while the Google Play Store dataset has 9,597 rows and 13 columns. Next, we will separate the free apps from the paid apps.
ios_final = []
android_final = []
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
print('Number of Apple Store Apps:', len(ios_final))
print('Number of Google Play Store Apps:', len(android_final))
Number of Apple Store Apps: 3203
Number of Google Play Store Apps: 8848
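The free-app filter above compares price strings exactly ('0.0' for the App Store, '0' for Google Play), so it depends on how each file happens to write zero. A numeric comparison would be less sensitive to formatting. The following is a sketch only: `is_free` is a hypothetical helper, and the assumption that paid prices may carry a leading dollar sign (for example '$4.99') is not confirmed by the rows shown in this document.

```python
def is_free(price):
    """Return True if a price string represents zero."""
    # Assumption: a price may carry a leading '$', which is stripped first.
    cleaned = price.strip().lstrip('$')
    return float(cleaned) == 0.0


print(is_free('0.0'))
print(is_free('0'))
print(is_free('$4.99'))
```

With such a helper, the same condition could serve both datasets, e.g. `if is_free(app[4])` for iOS rows and `if is_free(app[7])` for Android rows.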
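Now that we have removed the malformed row, removed the duplicate entries, filtered out the non-English apps, and isolated the free apps, we want to determine the kinds of apps that are likely to attract more users, since our revenue is highly influenced by the number of people using the apps we create.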
Since we want to minimize our risks and overhead, the validation strategy for an app idea will be comprised of three steps:
Because our end goal is to add the app to both Google Play and the App Store, we need to find app profiles that are successful in both markets. For example, a profile that could work well in both markets might be a productivity app that makes use of gamification.
We will now begin our analysis by getting a sense of the most common genres in each market. For this, we will build frequency tables for a few of the columns in our datasets.
We will now build two functions: one that generates a frequency table showing percentages, and one that displays those percentages in descending order:
def freq_table(dataset, index):
    table = {}
    total = 0
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
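To make the behavior of the two functions concrete, here is a small worked example on made-up rows. It uses `collections.Counter` as an alternative way to build the same kind of percentage table; `freq_table_counter` is a hypothetical name, and this is a sketch rather than a replacement for the functions above.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage} for one column, most frequent value first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    # most_common() yields (value, count) pairs in descending order of count.
    return {value: count / total * 100 for value, count in counts.most_common()}


# Made-up rows: an app name and a genre.
toy_dataset = [
    ['app_a', 'Games'],
    ['app_b', 'Games'],
    ['app_c', 'Book'],
    ['app_d', 'Games'],
]

toy_table = freq_table_counter(toy_dataset, 1)
for genre, percentage in toy_table.items():
    print(genre, ':', percentage)
```

Three of the four toy rows are games, so Games accounts for 75% and Book for 25%, and the percentages always sum to 100.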
We will now review the frequency table for the prime_genre column of the App Store data set.
display_table(ios_final, 11)
Games : 58.25788323446769
Entertainment : 7.836403371838902
Photo & Video : 4.995316890415236
Education : 3.6840462066812365
Social Networking : 3.3093974399000934
Shopping : 2.5913206369029034
Utilities : 2.466437714642523
Sports : 2.1542304089915705
Music : 2.0605682172962845
Health & Fitness : 2.0293474867311896
Productivity : 1.7483609116453322
Lifestyle : 1.5610365282547611
News : 1.3424914142990947
Travel : 1.248829222603809
Finance : 1.0927255697783327
Weather : 0.8741804558226661
Food & Drink : 0.8117389946924758
Reference : 0.5307524196066188
Business : 0.5307524196066188
Book : 0.3746487667811427
Navigation : 0.18732438339057134
Medical : 0.18732438339057134
Catalogs : 0.1248829222603809
In the App Store, the majority of free English apps are games (58.26%), followed by Entertainment apps at close to 8% and Photo & Video apps at almost 5%. Only 3.68% of the apps are for education.
The App Store market is dominated by apps designed for leisure and fun. That does not necessarily mean these genres also have the most users.
Now, let's review the frequency tables for the Google Play dataset. We will review the Category and Genres columns.
display_table(android_final, 1) #Category
FAMILY : 18.942133815551536
GAME : 9.697106690777577
TOOLS : 8.453887884267631
BUSINESS : 4.599909584086799
PRODUCTIVITY : 3.899186256781193
LIFESTYLE : 3.887884267631103
FINANCE : 3.7070524412296564
MEDICAL : 3.5375226039783
SPORTS : 3.390596745027125
PERSONALIZATION : 3.322784810126582
COMMUNICATION : 3.2323688969258586
HEALTH_AND_FITNESS : 3.0854430379746836
PHOTOGRAPHY : 2.949819168173599
NEWS_AND_MAGAZINES : 2.802893309222423
SOCIAL : 2.667269439421338
TRAVEL_AND_LOCAL : 2.3395117540687163
SHOPPING : 2.2490958408679926
BOOKS_AND_REFERENCE : 2.1360759493670884
DATING : 1.8648282097649187
VIDEO_PLAYERS : 1.7970162748643763
MAPS_AND_NAVIGATION : 1.3901446654611211
FOOD_AND_DRINK : 1.2432188065099457
EDUCATION : 1.164104882459313
ENTERTAINMENT : 0.9606690777576853
LIBRARIES_AND_DEMO : 0.9380650994575045
AUTO_AND_VEHICLES : 0.9267631103074141
HOUSE_AND_HOME : 0.8024412296564195
WEATHER : 0.7911392405063291
EVENTS : 0.7120253164556962
PARENTING : 0.6555153707052441
ART_AND_DESIGN : 0.6442133815551537
COMICS : 0.6103074141048824
BEAUTY : 0.599005424954792
The Google Play market looks quite different. In the Category column, Family is the largest category (18.94%), followed by Game, Tools, and Business. If we investigate this further, we can see that the Family category consists mostly of games for kids.
Even so, practical apps seem to have a solid niche, which the frequency table for the Genres column also shows:
display_table(android_final, 9) #Genres
Tools : 8.44258589511754
Entertainment : 6.080470162748644
Education : 5.357142857142857
Business : 4.599909584086799
Productivity : 3.899186256781193
Lifestyle : 3.8765822784810124
Finance : 3.7070524412296564
Medical : 3.5375226039783
Sports : 3.4584086799276674
Personalization : 3.322784810126582
Communication : 3.2323688969258586
Action : 3.096745027124774
Health & Fitness : 3.0854430379746836
Photography : 2.949819168173599
News & Magazines : 2.802893309222423
Social : 2.667269439421338
Travel & Local : 2.328209764918626
Shopping : 2.2490958408679926
Books & Reference : 2.1360759493670884
Simulation : 2.0456600361663653
Dating : 1.8648282097649187
Arcade : 1.842224231464738
Video Players & Editors : 1.7744122965641953
Casual : 1.763110307414105
Maps & Navigation : 1.3901446654611211
Food & Drink : 1.2432188065099457
Puzzle : 1.1301989150090417
Racing : 0.9945750452079566
Role Playing : 0.9380650994575045
Libraries & Demo : 0.9380650994575045
Auto & Vehicles : 0.9267631103074141
Strategy : 0.9154611211573236
House & Home : 0.8024412296564195
Weather : 0.7911392405063291
Events : 0.7120253164556962
Adventure : 0.6668173598553345
Comics : 0.599005424954792
Beauty : 0.599005424954792
Art & Design : 0.599005424954792
Parenting : 0.4972875226039783
Card : 0.45207956600361665
Trivia : 0.4181735985533454
Casino : 0.4181735985533454
Educational;Education : 0.39556962025316456
Board : 0.3842676311030741
Educational : 0.3729656419529837
Education;Education : 0.33905967450271246
Word : 0.25994575045207957
Casual;Pretend Play : 0.23734177215189875
Music : 0.2034358047016275
Racing;Action & Adventure : 0.16952983725135623
Puzzle;Brain Games : 0.16952983725135623
Entertainment;Music & Video : 0.16952983725135623
Casual;Brain Games : 0.13562386980108498
Casual;Action & Adventure : 0.13562386980108498
Arcade;Action & Adventure : 0.12432188065099457
Action;Action & Adventure : 0.10171790235081375
Educational;Pretend Play : 0.09041591320072333
Simulation;Action & Adventure : 0.07911392405063292
Parenting;Education : 0.07911392405063292
Entertainment;Brain Games : 0.07911392405063292
Board;Brain Games : 0.07911392405063292
Parenting;Music & Video : 0.06781193490054249
Educational;Brain Games : 0.06781193490054249
Casual;Creativity : 0.06781193490054249
Art & Design;Creativity : 0.06781193490054249
Education;Pretend Play : 0.05650994575045208
Role Playing;Pretend Play : 0.045207956600361664
Education;Creativity : 0.045207956600361664
Role Playing;Action & Adventure : 0.033905967450271246
Puzzle;Action & Adventure : 0.033905967450271246
Entertainment;Creativity : 0.033905967450271246
Entertainment;Action & Adventure : 0.033905967450271246
Educational;Creativity : 0.033905967450271246
Educational;Action & Adventure : 0.033905967450271246
Education;Music & Video : 0.033905967450271246
Education;Brain Games : 0.033905967450271246
Education;Action & Adventure : 0.033905967450271246
Adventure;Action & Adventure : 0.033905967450271246
Video Players & Editors;Music & Video : 0.022603978300180832
Sports;Action & Adventure : 0.022603978300180832
Simulation;Pretend Play : 0.022603978300180832
Puzzle;Creativity : 0.022603978300180832
Music;Music & Video : 0.022603978300180832
Entertainment;Pretend Play : 0.022603978300180832
Casual;Education : 0.022603978300180832
Board;Action & Adventure : 0.022603978300180832
Video Players & Editors;Creativity : 0.011301989150090416
Trivia;Education : 0.011301989150090416
Travel & Local;Action & Adventure : 0.011301989150090416
Tools;Education : 0.011301989150090416
Strategy;Education : 0.011301989150090416
Strategy;Creativity : 0.011301989150090416
Strategy;Action & Adventure : 0.011301989150090416
Simulation;Education : 0.011301989150090416
Role Playing;Brain Games : 0.011301989150090416
Racing;Pretend Play : 0.011301989150090416
Puzzle;Education : 0.011301989150090416
Parenting;Brain Games : 0.011301989150090416
Music & Audio;Music & Video : 0.011301989150090416
Lifestyle;Pretend Play : 0.011301989150090416
Lifestyle;Education : 0.011301989150090416
Health & Fitness;Education : 0.011301989150090416
Health & Fitness;Action & Adventure : 0.011301989150090416
Entertainment;Education : 0.011301989150090416
Communication;Creativity : 0.011301989150090416
Comics;Creativity : 0.011301989150090416
Casual;Music & Video : 0.011301989150090416
Card;Action & Adventure : 0.011301989150090416
Books & Reference;Education : 0.011301989150090416
Art & Design;Pretend Play : 0.011301989150090416
Art & Design;Action & Adventure : 0.011301989150090416
Arcade;Pretend Play : 0.011301989150090416
Adventure;Education : 0.011301989150090416
The Category and Genres columns of the Google Play dataset tell much the same story, although Genres is more granular. We will only use the Category column from now on, since we are trying to look at the bigger picture.
We found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced mix of practical and for-fun apps. Now, we would like to get an idea of which apps have the most users.
One way to find which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play dataset, we can find this information in the Installs column, but it is missing from the App Store dataset. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.
genre_ios = freq_table(ios_final, 11)
for genre in genre_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[11]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)
Navigation : 86090.33333333333
Food & Drink : 33333.92307692308
Finance : 32367.02857142857
Utilities : 19156.493670886077
Photo & Video : 28441.54375
Music : 57326.530303030304
Productivity : 21028.410714285714
Shopping : 27230.734939759037
Book : 46384.916666666664
Business : 7491.117647058823
Social Networking : 71548.34905660378
Health & Fitness : 23298.015384615384
News : 21248.023255813954
Reference : 79350.4705882353
Catalogs : 4004.0
Travel : 28243.8
Games : 22886.36709539121
Lifestyle : 16815.48
Education : 7003.983050847458
Weather : 52279.892857142855
Sports : 23008.898550724636
Entertainment : 14195.358565737051
Medical : 612.0
As you can see, Navigation apps have the highest average number of user ratings, probably because of a few very popular apps such as Google Maps and Waze. Social Networking also ranks high, probably because of Facebook and Instagram. Setting aside the genres skewed by such giants, the Book genre still has many user ratings, and this is a niche that we can focus on.
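The nested loop used for these averages scans the whole dataset once per genre. As a rough alternative, the same averages could be accumulated in a single pass and then sorted so that the ranking is easier to read. This is a minimal sketch on made-up rows; `average_by_group` is a hypothetical helper name.

```python
from collections import defaultdict


def average_by_group(rows, group_idx, value_idx):
    """Return {group: mean of the numeric column} computed in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up rows: a genre and a rating count stored as a string.
toy_rows = [
    ['Games', '100'],
    ['Games', '300'],
    ['Book', '50'],
]

averages = average_by_group(toy_rows, 0, 1)
ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
for genre, average in ranked:
    print(genre, ':', average)
```

For the App Store data, the call would be `average_by_group(ios_final, 11, 5)`, using the prime_genre and rating_count_tot column indexes from earlier.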
For Google Play, we do have data about the number of installs, so we should get a clearer picture of popularity. However, the install numbers are not precise: most values are open-ended (100+, 1,000+, 5,000+, etc.):
display_table(android_final, 5) # Installs Columns
1,000,000+ : 15.75497287522604
100,000+ : 11.539330922242314
10,000,000+ : 10.567359855334539
10,000+ : 10.194394213381555
1,000+ : 8.39737793851718
100+ : 6.928119349005425
5,000,000+ : 6.826401446654612
500,000+ : 5.560578661844485
50,000+ : 4.769439421338156
5,000+ : 4.486889692585895
10+ : 3.5375226039783
500+ : 3.2436708860759493
50,000,000+ : 2.2830018083182644
100,000,000+ : 2.1360759493670884
50+ : 1.9213381555153706
5+ : 0.7911392405063291
1+ : 0.5085895117540687
500,000,000+ : 0.27124773960216997
1,000,000,000+ : 0.22603978300180833
0+ : 0.045207956600361664
0 : 0.011301989150090416
One problem with this data is that we do not know whether an app with 100,000+ installs has 100,000, 200,000, or 350,000 installs. However, we do not need precise data for our purposes; we only want to find out which app genres attract the most users.
We will leave the numbers as they are, which means that we will consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on.
To perform computations, however, we'll need to convert each install number from string to float. This means we need to remove the commas and the plus characters, otherwise the conversion will fail and raise an error.
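The cleaning step described here can be isolated in a small function, which makes the conversion easy to check on its own. This is a sketch, with `parse_installs` as a hypothetical helper name; it treats each open-ended value as its lower bound, as stated above.

```python
def parse_installs(installs):
    """Convert an install string such as '1,000,000+' to a float lower bound."""
    # Remove the thousands separators and the trailing plus sign first,
    # because float('1,000,000+') would raise a ValueError.
    cleaned = installs.replace(',', '').replace('+', '')
    return float(cleaned)


for raw in ['100+', '1,000,000+', '0']:
    print(raw, '->', parse_installs(raw))
```

Values without a plus sign or commas, such as '0', pass through unchanged apart from the conversion to float.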
category_android = freq_table(android_final, 1)
for category in category_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)
TOOLS : 10830251.970588235
ENTERTAINMENT : 11640705.88235294
AUTO_AND_VEHICLES : 647317.8170731707
HEALTH_AND_FITNESS : 4188821.9853479853
SOCIAL : 23253652.127118643
VIDEO_PLAYERS : 24727872.452830188
PARENTING : 542603.6206896552
BOOKS_AND_REFERENCE : 8814199.78835979
LIBRARIES_AND_DEMO : 638503.734939759
COMMUNICATION : 38590581.08741259
DATING : 854028.8303030303
HOUSE_AND_HOME : 1360598.042253521
NEWS_AND_MAGAZINES : 9549178.467741935
SHOPPING : 7036877.311557789
TRAVEL_AND_LOCAL : 13984077.710144928
COMICS : 832613.8888888889
EDUCATION : 1833495.145631068
PRODUCTIVITY : 16787331.344927534
MAPS_AND_NAVIGATION : 4049274.6341463416
FOOD_AND_DRINK : 1924897.7363636363
FINANCE : 1387692.475609756
BUSINESS : 1712290.1474201474
SPORTS : 3650602.276666667
GAME : 15544014.51048951
LIFESTYLE : 1446158.2238372094
WEATHER : 5145550.285714285
FAMILY : 3695641.8198090694
PERSONALIZATION : 5201482.6122448975
EVENTS : 253542.22222222222
PHOTOGRAPHY : 17840110.40229885
ART_AND_DESIGN : 1986335.0877192982
MEDICAL : 120550.61980830671
BEAUTY : 513151.88679245283
For the Google Play Store, Communication, Video Players, and Social have the highest average number of installs. This is probably because of apps like WhatsApp, YouTube, Facebook, and Instagram, which each have a huge number of installs on their own.
Setting those aside, we see that Tools and Books and Reference have quite a number of installs, so this is a niche that we could focus on.
After reviewing all of the data, I would recommend developing a book app, since books are a popular category in both stores. We could focus our efforts on one genre and make a high-quality free book app. This way we will not