The ultimate goal of any app-maker is to build apps that generate a steady revenue. Cutting to the chase - there are broadly 3 ways to generate revenue via apps -
We at Raghav & Co. only build apps that are free to download and install, and our main source of revenue consists of in-app ads. So basically -
Making informed decisions about building only those apps that are likely to attract more user attention is the bottom line. This, ladies and gents, is the goal of our project.
Before we get our hands dirty, let us first get our data clean! And before we get our data clean, let's just read the data first..!
We will be working with 2 data-sets of app-data, namely -
First, we write some straightforward code that lets Python read these data sets into two separate lists.
```python
from csv import reader

# opening iOS apps data (the `with` block closes the file afterwards)
with open('AppleStore.csv', encoding='utf8') as opened_file:
    ios_data = list(reader(opened_file))
apple_header = ios_data[0]
ios_data = ios_data[1:]

# opening Android apps data
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    and_data = list(reader(opened_file))
android_header = and_data[0]
and_data = and_data[1:]
```
Let's write another piece of code that prints a couple of rows from each data set, just to get an idea of what app data looks like -
```python
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')  # prints blank lines to separate the rows

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

print(android_header, '\n')
explore_data(and_data, 0, 2)

print(apple_header, '\n')
explore_data(ios_data, 0, 2)
```
```
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
```
Great! We have a fair idea of the structure of our data, and what we shall be dealing with.
For the sake of convenience, let's also record the field names of our data-sets along with the index number they are located at.
Index No. | iOS field names | Android field names |
---|---|---|
0 | id | App |
1 | track_name | Category |
2 | size_bytes | Rating |
3 | currency | Reviews |
4 | price | Size |
5 | rating_count_tot | Installs |
6 | rating_count_ver | Type |
7 | user_rating | Price |
8 | user_rating_ver | Content Rating |
9 | ver | Genres |
10 | cont_rating | Last Updated |
11 | prime_genre | Current Ver |
12 | sup_devices.num | Android Ver |
13 | ipadSc_urls.num | - |
14 | lang.num | - |
15 | vpp_lic | - |
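Instead of keeping the table above in our heads, a small sketch can build the same name-to-index lookup straight from a header row. `field_index` is a hypothetical helper, not part of the original project; it assumes the header is a plain list of strings, as `android_header` and `apple_header` are.

```python
def field_index(header):
    """Map each column name in a CSV header row to its position."""
    return {name: i for i, name in enumerate(header)}

# the Android header, copied from the output printed earlier
android_cols = field_index(['App', 'Category', 'Rating', 'Reviews', 'Size',
                            'Installs', 'Type', 'Price', 'Content Rating',
                            'Genres', 'Last Updated', 'Current Ver',
                            'Android Ver'])
print(android_cols['Installs'])  # 5
print(android_cols['Genres'])    # 9
```

With such a mapping, `row[android_cols['Installs']]` reads the same value as `row[5]`, but the intent is visible in the code.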
In the same way, bad data can spoil an entire analysis.
Before we proceed to analyse our raw data, we need to clean it and make it workable. We need to -
We found no rows with empty values in the iOS data set.
Thanks to PhaniKirenSiddineni's post on the Kaggle discussion forum (https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015), we found out that row 10472 of the Android data set is faulty: a value is missing, so its remaining fields are shifted (its "Rating" field does not hold a valid rating) and the row has one field fewer than the header.
We will not consider this row in our analysis, so we delete it. The function below removes every row whose number of fields differs from that of the first row:
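```python
def null_row(data_set):
    # use the first row's length as the expected number of columns
    expected_len = len(data_set[0])
    # rebuild the list in place; deleting from a list while iterating
    # over it would silently skip the element after each deletion
    data_set[:] = [row for row in data_set if len(row) == expected_len]

null_row(and_data)
null_row(ios_data)
```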
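Before deleting anything, it can help to see exactly which rows would be dropped. The sketch below uses a hypothetical helper, `malformed_rows`, and a made-up toy data set; it returns each offending row together with its position in the list that was passed in.

```python
def malformed_rows(data_set, expected_len):
    """Return (position, row) pairs for rows with the wrong number of fields."""
    return [(i, row) for i, row in enumerate(data_set) if len(row) != expected_len]

# toy data: a 3-column header and one row that is missing a field
toy_header = ['App', 'Category', 'Rating']
toy_rows = [['A', 'TOOLS', '4.1'], ['B', '19'], ['C', 'GAME', '3.9']]
print(malformed_rows(toy_rows, len(toy_header)))  # [(1, ['B', '19'])]
```

Running it on `and_data` with `len(android_header)` before calling `null_row()` would show the faulty row(s) it is about to remove.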
```python
# two lists: one for the first occurrence of each app name,
# and another for every repeat occurrence (the duplicates)
duplicate_apps = []
unique_apps = []

def count_duplicates(data_set):
    # the app name is in column 0 of the Android data set
    for row in data_set:
        app = row[0]
        if app in unique_apps:
            duplicate_apps.append(app)
        else:
            unique_apps.append(app)
    print('Count of Unique apps : ', len(unique_apps))
    print('Count of Duplicate apps : ', len(duplicate_apps))
    print('Count of Total apps : ', len(unique_apps) + len(duplicate_apps))

print('For Android Apps-')
count_duplicates(and_data)
# print('\nFor iOS Apps-')
# count_duplicates(ios_data)
```
```
For Android Apps-
Count of Unique apps : 9659
Count of Duplicate apps : 1181
Count of Total apps : 10840
```
(In the above code, the function count_duplicates() is called with the and_data data set as its argument; the call for ios_data is commented out. The function records the first occurrence of each app name in unique_apps and every repeat occurrence in duplicate_apps, then prints the number of unique apps, duplicate entries and total rows for the Android data set.)
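For comparison, here is a minimal sketch of the same count using `collections.Counter`, which avoids repeated `in` lookups on ever-growing lists. `duplicate_counts` is a hypothetical helper and the rows are a toy example; the "duplicate" figure it reports is the number of repeat entries, which matches what count_duplicates() prints.

```python
from collections import Counter

def duplicate_counts(data_set, name_index=0):
    """Return (distinct names, duplicate entries, total rows)."""
    counts = Counter(row[name_index] for row in data_set)
    distinct = len(counts)
    total = sum(counts.values())
    return distinct, total - distinct, total

toy = [['Instagram'], ['Slack'], ['Instagram'], ['Instagram'], ['Zoom']]
print(duplicate_counts(toy))  # (3, 2, 5)
```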
```python
max_reviews_dict = {}
and_updated = []
already_added = []

def clean_duplicates(data_set):
    # pass 1: record the highest review count seen for each app name
    for row in data_set:
        app = row[0]
        num_reviews = float(row[3])
        if app not in max_reviews_dict or num_reviews > max_reviews_dict[app]:
            max_reviews_dict[app] = num_reviews

    # pass 2: keep every non-duplicated app, and for duplicated apps keep
    # only the first row whose review count equals that maximum
    for row in data_set:
        name = row[0]
        n_reviews = float(row[3])
        if name not in duplicate_apps:
            and_updated.append(row)
            already_added.append(name)
        elif n_reviews == max_reviews_dict[name] and name not in already_added:
            and_updated.append(row)
            already_added.append(name)

clean_duplicates(and_data)
```
(In the above code, the function clean_duplicates() is called with and_data as its argument. It appends one row per app to the list and_updated, thus cleaning out the duplicates: for an app with duplicate rows, only the row with the maximum value of "Reviews" is appended, and the remaining rows are discarded.)
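The same "keep the row with the most reviews" rule can be sketched in a single pass with a dictionary keyed by app name. `dedupe_by_reviews` is a hypothetical helper and the rows are toy data. Like clean_duplicates(), it keeps the first row when several duplicates tie on the maximum review count; the order of the returned rows follows each app's first appearance, which may differ slightly from and_updated.

```python
def dedupe_by_reviews(data_set, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in data_set:
        name = row[name_index]
        reviews = float(row[reviews_index])
        # replace only on a strictly higher count, so ties keep the earlier row
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

toy = [
    ['App A', 'BUSINESS', '4.4', '100'],
    ['App A', 'BUSINESS', '4.4', '120'],
    ['App B', 'TOOLS', '4.0', '30'],
    ['App A', 'BUSINESS', '4.4', '120'],
]
print(dedupe_by_reviews(toy))
# [['App A', 'BUSINESS', '4.4', '120'], ['App B', 'TOOLS', '4.0', '30']]
```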
In the iOS data set, we identify a total of 1014 apps that have non-English names. After we have removed these apps, we should be left with 6183 apps/rows.
In the Android data set, we identify a total of 45 apps that have non-English names. After we have removed these apps, we should be left with 9614 apps/rows.
```python
and_3 = []
ios_3 = []
and_3_ne = []
ios_3_ne = []

# function to separate English from non-English apps: a name with more than
# 3 characters outside the ASCII range (code point > 127) counts as non-English
def eng_check(ios_data, and_updated):
    # iOS apps: the app name is in column 1 (track_name)
    for row in ios_data:
        app = row[1]
        count = 0
        for letter in app:
            if ord(letter) > 127:
                count += 1
        if count <= 3:
            ios_3.append(row)
        else:
            ios_3_ne.append(row)

    # Android apps: the app name is in column 0 (App)
    for row in and_updated:
        app = row[0]
        count = 0
        for letter in app:
            if ord(letter) > 127:
                count += 1
        if count <= 3:
            and_3.append(row)
        else:
            and_3_ne.append(row)

eng_check(ios_data, and_updated)

print('For iOS Apps :')
print('Count of English Apps : ', len(ios_3))
print('Count of Non-English Apps : ', len(ios_3_ne), '\n')
print('For Android Apps :')
print('Count of English Apps : ', len(and_3))
print('Count of Non-English Apps : ', len(and_3_ne))
```
```
For iOS Apps :
Count of English Apps : 6183
Count of Non-English Apps : 1014

For Android Apps :
Count of English Apps : 9614
Count of Non-English Apps : 45
```
(In the above code, the function eng_check() is called with ios_data and and_updated as its arguments. It treats an app name as English if it contains at most three characters outside the ASCII range (i.e. with ord() greater than 127). Rows that pass this check are appended to and_3 (cleaned Android apps) or ios_3 (cleaned Apple apps); rows that fail it go to and_3_ne or ios_3_ne instead, so they never reach the final lists.)
```python
android = []
apple = []

# keep only free apps: Android stores a zero price as '0', iOS as '0.0'
for row in and_3:
    price = row[7]
    if price == '0':
        android.append(row)

for row in ios_3:
    price = row[4]
    if price == '0.0':
        apple.append(row)

# print(len(android))  --> to check the total no. of free Android apps
# print(len(apple))    --> to check the total no. of free iOS apps
```
(In the above code, we work with the and_3 and ios_3 data sets, which are derived from our original and_data and ios_data data sets but contain no non-English apps (and, for Android, no duplicate rows). With two for loops, one per data set, we append to the final lists android and apple only those rows whose apps are free, i.e. whose price is 0.)
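Comparing the price to the exact strings '0' and '0.0' works for these two files, but it depends on each store's formatting. A slightly sturdier sketch parses the price first. `is_free` is a hypothetical helper, and the assumption that paid Android prices carry a leading '$' (e.g. '$4.99') should be checked against the data before relying on it.

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$0.00' equals zero."""
    cleaned = price.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # an unparseable price is treated as "not known to be free"
        return False

print([is_free(p) for p in ['0', '0.0', '$4.99', 'Everyone']])
# [True, True, False, False]
```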
Now that our precious data is rid of the dirt, it's time to work on it!
As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.
To minimize risks and overhead, our validation strategy for an app idea comprises three steps:
Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets.
Let's begin the analysis by getting a sense of what the most common genres are for each market. For this, we'll need to build frequency tables for a few columns in our data sets -
```python
# freq_table() takes a data set and a column index, and returns a list of
# [value, count of apps, percentage] entries sorted by count, descending
def freq_table(data_set, index):
    count_dict = {}
    count_ls = []
    count_list = []
    for row in data_set:
        field = row[index]
        if field in count_dict:
            count_dict[field] += 1
        else:
            count_dict[field] = 1

    for key in count_dict:
        percentage = 100 * count_dict[key] / len(data_set)
        count_ls.append([count_dict[key], key, str(round(percentage, 2)) + '%'])
    # sorting [count, key, pct] lists in reverse puts the largest count first
    # (ties are broken by key, in reverse alphabetical order)
    count_ls = sorted(count_ls, reverse=True)

    for row in count_ls:
        count_list.append([row[1], row[0], row[2]])
    return count_list

# this function displays the genre or category wise app-count for
# apple/android data sets
def display_table(data_set, index):
    data_set_category = freq_table(data_set, index)
    x = 'ANDROID - CATEGORY WISE APP COUNT\nCategory : Count of Apps : Percentage'
    y = 'ANDROID - GENRE WISE APP COUNT\nGenre : Count of Apps : Percentage'
    z = 'APPLE - GENRE WISE APP COUNT\nGenre : Count of Apps : Percentage'
    # `is` checks which list object was passed, instead of comparing
    # the two lists element by element
    if data_set is android and index == 1:
        print(x)
    elif data_set is android and index == 9:
        print(y)
    elif data_set is apple:
        print(z)
    for row in data_set_category:
        print(row[0], ':', row[1], ':', row[2])
    print('')
```
(In the code above, freq_table() returns, for the chosen column, each value together with its number of apps and its percentage share, in descending order of count. display_table() prints this frequency table under a matching title when called.)
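For comparison, `collections.Counter` can produce the same kind of table in fewer lines. `freq_table_counter` is a hypothetical name and the rows are toy data. One difference to keep in mind: `Counter.most_common()` lists equal counts in the order they were first encountered, whereas freq_table() breaks ties by value in reverse alphabetical order, so rows with equal counts may come out in a different order.

```python
from collections import Counter

def freq_table_counter(data_set, index):
    """Return [value, count, 'pct%'] rows sorted by count, descending."""
    counts = Counter(row[index] for row in data_set)
    total = len(data_set)
    return [[value, n, str(round(100 * n / total, 2)) + '%']
            for value, n in counts.most_common()]

toy = [['a', 'GAME'], ['b', 'TOOLS'], ['c', 'GAME'], ['d', 'GAME']]
print(freq_table_counter(toy, 1))
# [['GAME', 3, '75.0%'], ['TOOLS', 1, '25.0%']]
```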
Now that we have built the functions that display the frequency table for genre wise count of apps for both our data-sets, we can call the functions to see which genre / category of apps are more common than others.
First we take a look at the category-wise frequency table for Android apps:
```python
# Android count of apps category wise:
display_table(android, 1)
```
```
ANDROID - CATEGORY WISE APP COUNT
Category : Count of Apps : Percentage
FAMILY : 1676 : 18.91%
GAME : 862 : 9.72%
TOOLS : 750 : 8.46%
BUSINESS : 407 : 4.59%
LIFESTYLE : 346 : 3.9%
PRODUCTIVITY : 345 : 3.89%
FINANCE : 328 : 3.7%
MEDICAL : 313 : 3.53%
SPORTS : 301 : 3.4%
PERSONALIZATION : 294 : 3.32%
COMMUNICATION : 287 : 3.24%
HEALTH_AND_FITNESS : 273 : 3.08%
PHOTOGRAPHY : 261 : 2.94%
NEWS_AND_MAGAZINES : 248 : 2.8%
SOCIAL : 236 : 2.66%
TRAVEL_AND_LOCAL : 207 : 2.34%
SHOPPING : 199 : 2.25%
BOOKS_AND_REFERENCE : 190 : 2.14%
DATING : 165 : 1.86%
VIDEO_PLAYERS : 159 : 1.79%
MAPS_AND_NAVIGATION : 124 : 1.4%
FOOD_AND_DRINK : 110 : 1.24%
EDUCATION : 103 : 1.16%
ENTERTAINMENT : 85 : 0.96%
LIBRARIES_AND_DEMO : 83 : 0.94%
AUTO_AND_VEHICLES : 82 : 0.93%
HOUSE_AND_HOME : 73 : 0.82%
WEATHER : 71 : 0.8%
EVENTS : 63 : 0.71%
PARENTING : 58 : 0.65%
ART_AND_DESIGN : 57 : 0.64%
COMICS : 55 : 0.62%
BEAUTY : 53 : 0.6%
```
Next, we take a look at the genre-wise frequency table for Android apps -
```python
# Android count of apps genre wise:
display_table(android, 9)
```
```
ANDROID - GENRE WISE APP COUNT
Genre : Count of Apps : Percentage
Tools : 749 : 8.45%
Entertainment : 538 : 6.07%
Education : 474 : 5.35%
Business : 407 : 4.59%
Productivity : 345 : 3.89%
Lifestyle : 345 : 3.89%
Finance : 328 : 3.7%
Medical : 313 : 3.53%
Sports : 307 : 3.46%
Personalization : 294 : 3.32%
Communication : 287 : 3.24%
Action : 275 : 3.1%
Health & Fitness : 273 : 3.08%
Photography : 261 : 2.94%
News & Magazines : 248 : 2.8%
Social : 236 : 2.66%
Travel & Local : 206 : 2.32%
Shopping : 199 : 2.25%
Books & Reference : 190 : 2.14%
Simulation : 181 : 2.04%
Dating : 165 : 1.86%
Arcade : 164 : 1.85%
Video Players & Editors : 157 : 1.77%
Casual : 156 : 1.76%
Maps & Navigation : 124 : 1.4%
Food & Drink : 110 : 1.24%
Puzzle : 100 : 1.13%
Racing : 88 : 0.99%
Role Playing : 83 : 0.94%
Libraries & Demo : 83 : 0.94%
Auto & Vehicles : 82 : 0.93%
Strategy : 81 : 0.91%
House & Home : 73 : 0.82%
Weather : 71 : 0.8%
Events : 63 : 0.71%
Adventure : 60 : 0.68%
Comics : 54 : 0.61%
Beauty : 53 : 0.6%
Art & Design : 53 : 0.6%
Parenting : 44 : 0.5%
Card : 40 : 0.45%
Casino : 38 : 0.43%
Trivia : 37 : 0.42%
Educational;Education : 35 : 0.39%
Board : 34 : 0.38%
Educational : 33 : 0.37%
Education;Education : 30 : 0.34%
Word : 23 : 0.26%
Casual;Pretend Play : 21 : 0.24%
Music : 18 : 0.2%
Racing;Action & Adventure : 15 : 0.17%
Puzzle;Brain Games : 15 : 0.17%
Entertainment;Music & Video : 15 : 0.17%
Casual;Brain Games : 12 : 0.14%
Casual;Action & Adventure : 12 : 0.14%
Arcade;Action & Adventure : 11 : 0.12%
Action;Action & Adventure : 9 : 0.1%
Educational;Pretend Play : 8 : 0.09%
Simulation;Action & Adventure : 7 : 0.08%
Parenting;Education : 7 : 0.08%
Entertainment;Brain Games : 7 : 0.08%
Board;Brain Games : 7 : 0.08%
Parenting;Music & Video : 6 : 0.07%
Educational;Brain Games : 6 : 0.07%
Casual;Creativity : 6 : 0.07%
Art & Design;Creativity : 6 : 0.07%
Education;Pretend Play : 5 : 0.06%
Role Playing;Pretend Play : 4 : 0.05%
Education;Creativity : 4 : 0.05%
Role Playing;Action & Adventure : 3 : 0.03%
Puzzle;Action & Adventure : 3 : 0.03%
Entertainment;Creativity : 3 : 0.03%
Entertainment;Action & Adventure : 3 : 0.03%
Educational;Creativity : 3 : 0.03%
Educational;Action & Adventure : 3 : 0.03%
Education;Music & Video : 3 : 0.03%
Education;Brain Games : 3 : 0.03%
Education;Action & Adventure : 3 : 0.03%
Adventure;Action & Adventure : 3 : 0.03%
Video Players & Editors;Music & Video : 2 : 0.02%
Sports;Action & Adventure : 2 : 0.02%
Simulation;Pretend Play : 2 : 0.02%
Puzzle;Creativity : 2 : 0.02%
Music;Music & Video : 2 : 0.02%
Entertainment;Pretend Play : 2 : 0.02%
Casual;Education : 2 : 0.02%
Board;Action & Adventure : 2 : 0.02%
Video Players & Editors;Creativity : 1 : 0.01%
Trivia;Education : 1 : 0.01%
Travel & Local;Action & Adventure : 1 : 0.01%
Tools;Education : 1 : 0.01%
Strategy;Education : 1 : 0.01%
Strategy;Creativity : 1 : 0.01%
Strategy;Action & Adventure : 1 : 0.01%
Simulation;Education : 1 : 0.01%
Role Playing;Brain Games : 1 : 0.01%
Racing;Pretend Play : 1 : 0.01%
Puzzle;Education : 1 : 0.01%
Parenting;Brain Games : 1 : 0.01%
Music & Audio;Music & Video : 1 : 0.01%
Lifestyle;Pretend Play : 1 : 0.01%
Lifestyle;Education : 1 : 0.01%
Health & Fitness;Education : 1 : 0.01%
Health & Fitness;Action & Adventure : 1 : 0.01%
Entertainment;Education : 1 : 0.01%
Communication;Creativity : 1 : 0.01%
Comics;Creativity : 1 : 0.01%
Casual;Music & Video : 1 : 0.01%
Card;Action & Adventure : 1 : 0.01%
Books & Reference;Education : 1 : 0.01%
Art & Design;Pretend Play : 1 : 0.01%
Art & Design;Action & Adventure : 1 : 0.01%
Arcade;Pretend Play : 1 : 0.01%
Adventure;Education : 1 : 0.01%
```
And finally, we take a look at the genre-wise frequency table for iOS apps:
```python
# Apple count of apps genre wise:
display_table(apple, 11)
```
```
APPLE - GENRE WISE APP COUNT
Genre : Count of Apps : Percentage
Games : 1874 : 58.16%
Entertainment : 254 : 7.88%
Photo & Video : 160 : 4.97%
Education : 118 : 3.66%
Social Networking : 106 : 3.29%
Shopping : 84 : 2.61%
Utilities : 81 : 2.51%
Sports : 69 : 2.14%
Music : 66 : 2.05%
Health & Fitness : 65 : 2.02%
Productivity : 56 : 1.74%
Lifestyle : 51 : 1.58%
News : 43 : 1.33%
Travel : 40 : 1.24%
Finance : 36 : 1.12%
Weather : 28 : 0.87%
Food & Drink : 26 : 0.81%
Reference : 18 : 0.56%
Business : 17 : 0.53%
Book : 14 : 0.43%
Navigation : 6 : 0.19%
Medical : 6 : 0.19%
Catalogs : 4 : 0.12%
```
Apple users are really into gaming! The data suggests that more than half (58.16%) of the free, English-language apps in our App Store sample belong to the Games genre. BUT - does that mean the gaming genre is where it's at? We shall see soon.
As for the Android users, the genre availability is much more diverse.
At first, it seems that for Android users the most common apps are in the "FAMILY" category (18.91%), followed by the "GAME" category (9.72%). BUT - a deep dive shows us that most of the FAMILY apps are games for kids, i.e. they are of the gaming type too! See for yourselves -
Now that we know what genre of apps are most commonly available on both the Android and Apple markets, we can do further analysis to see whether the most common genres are also the most popular.
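To do that, we write a neat piece of code that gives us the genre/category-wise average number of installs for the Android data set. The iOS data set has no install column, so for Apple we use the average number of user ratings (rating_count_tot) as a proxy for downloads -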
(NOTE: for the sake of readability - I have printed only the top-15 genres for the data sets. This will also allow us to direct our focus only on the top genres. However, the code is very flexible, and can be easily tweaked to print any number of rows you want)
```python
# this function gives the average number of user ratings (used as a
# proxy for downloads) per genre (for iOS Apps), in descending order
def apple_num_installs():
    genre_list = freq_table(apple, 11)
    install_count = []
    for genrerow in genre_list:
        sum_installs = 0
        genre_name = genrerow[0]
        num_apps = float(genrerow[1])
        for row in apple:
            name = row[11]
            num_installs = float(row[5])  # rating_count_tot
            if genre_name == name:
                sum_installs += num_installs
        avg = round(sum_installs / num_apps, 3)
        install_count.append([avg, genre_name])
    install_count = sorted(install_count, reverse=True)
    return install_count

# this function gives the average number of app installs per
# category/genre (for Android Apps) in descending order,
# depending on the index number passed as argument
# (1 = Category, 9 = Genres)
def android_num_installs(index):
    genre_list = freq_table(android, index)
    install_count = []
    for genrerow in genre_list:
        sum_installs = 0
        genre_name = genrerow[0]
        num_apps = float(genrerow[1])
        for row in android:
            name = row[index]
            if genre_name == name:
                # turn strings like '10,000+' into numbers like 10000.0
                num_installs = row[5].replace('+', '').replace(',', '')
                sum_installs += float(num_installs)
        avg = round(sum_installs / num_apps, 3)
        install_count.append([avg, genre_name])
    install_count = sorted(install_count, reverse=True)
    return install_count
```
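The nested loops above re-scan the whole data set once per genre. A single pass that keeps a running sum and count per group should give the same averages. `group_average` and `parse_installs` are hypothetical helpers, and the parser assumes install counts formatted like '10,000+', as in the Android rows printed earlier.

```python
def parse_installs(text):
    """Turn an Android install string such as '10,000+' into a float."""
    return float(text.replace('+', '').replace(',', ''))

def group_average(data_set, group_index, value_fn):
    """Average value_fn(row) per group, returned as [avg, group] sorted descending."""
    sums, counts = {}, {}
    for row in data_set:
        key = row[group_index]
        sums[key] = sums.get(key, 0.0) + value_fn(row)
        counts[key] = counts.get(key, 0) + 1
    averages = [[round(sums[k] / counts[k], 3), k] for k in sums]
    return sorted(averages, reverse=True)

toy = [['a', 'GAME', '1,000+'], ['b', 'TOOLS', '500+'], ['c', 'GAME', '3,000+']]
print(group_average(toy, 1, lambda row: parse_installs(row[2])))
# [[2000.0, 'GAME'], [500.0, 'TOOLS']]
```

On the real data, `group_average(android, 1, lambda row: parse_installs(row[5]))` would be the counterpart of `android_num_installs(1)`.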
```python
# Android avg-installs category wise
x = android_num_installs(1)
for row in x[:15]:
    print(row[0], row[1])
```
```
38456119.167 COMMUNICATION
24727872.453 VIDEO_PLAYERS
23253652.127 SOCIAL
17840110.402 PHOTOGRAPHY
16787331.345 PRODUCTIVITY
15588015.603 GAME
13984077.71 TRAVEL_AND_LOCAL
11640705.882 ENTERTAINMENT
10801391.299 TOOLS
9549178.468 NEWS_AND_MAGAZINES
8767811.895 BOOKS_AND_REFERENCE
7036877.312 SHOPPING
5201482.612 PERSONALIZATION
5074486.197 WEATHER
4188821.985 HEALTH_AND_FITNESS
```
```python
# Android avg-installs genre wise
x = android_num_installs(9)
for row in x[:15]:
    print(row[0], row[1])
```
```
38456119.167 Communication
35333333.333 Adventure;Action & Adventure
24947335.796 Video Players & Editors
23253652.127 Social
22888365.488 Arcade
19569221.603 Casual
18366666.667 Puzzle;Action & Adventure
17840110.402 Photography
17016666.667 Educational;Action & Adventure
16787331.345 Productivity
15910645.682 Racing
14051476.146 Travel & Local
12916666.667 Casual;Action & Adventure
12603588.873 Action
11199902.531 Strategy
```
```python
# Apple avg-ratings genre wise
apple_app_installs = apple_num_installs()
x = apple_app_installs
for row in x[:15]:
    print(row[0], row[1])
```
```
86090.333 Navigation
74942.111 Reference
71548.349 Social Networking
57326.53 Music
52279.893 Weather
39758.5 Book
33333.923 Food & Drink
31467.944 Finance
28441.544 Photo & Video
28243.8 Travel
26919.69 Shopping
23298.015 Health & Fitness
23008.899 Sports
22788.67 Games
21248.023 News
```
Ladies and Gents - the Games genre has been upset by the Communication and Navigation genres! Who would've thought that these genres would be downloaded more, on average, than Games!
Jokes aside, we clearly see that -
However, from statistics 101, we know that -
Before jumping to conclusions, it is imperative that we take a sneak peek at the most downloaded apps in these top genres of Apple and Android, and check whether those genres are indeed very popular, or whether it's just a handful of outliers pulling up the average and making the genre "seem" popular. We will deep dive further if need be.
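One standard way to run this sanity check is to put the median next to the mean: the median does not care how extreme the largest values are, so a genre whose mean sits far above its median is being pulled up by a few outliers. A minimal sketch using the standard library's `statistics` module; `mean_and_median` is a hypothetical helper and the install counts are made up for illustration.

```python
from statistics import mean, median

def mean_and_median(values):
    """Return (mean, median) of a list of numbers."""
    return mean(values), median(values)

# hypothetical genre: one blockbuster and three ordinary apps
installs = [1_000_000_000, 1_000_000, 1_000_000, 100_000]
avg, mid = mean_and_median(installs)
print(avg)  # about 250.5 million, dragged up by the single blockbuster
print(mid)  # 1 million, closer to what a typical app in the genre gets
```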
```python
# print the top n Android apps of a category, by number of installs
def android_genre_count_check(genre, n):
    x = []
    for row in android:
        category = row[1]
        if category == genre:
            numinstall = row[5].replace('+', '').replace(',', '')
            x.append([float(numinstall), row[0]])
    x = sorted(x, reverse=True)
    for row in x[:n]:
        print(row[0], row[1])

# print the top n iOS apps of a genre, by number of user ratings
def apple_genre_count_check(genre, n):
    x = []
    for row in apple:
        category = row[11]
        if category == genre:
            x.append([float(row[5]), row[1]])
    x = sorted(x, reverse=True)
    for row in x[:n]:
        print(row[0], row[1])
```
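Sorting the whole list just to print its first few entries works fine at this size; `heapq.nlargest` performs the same selection directly. The sketch below uses a hypothetical `top_apps` helper that returns the [value, name] pairs instead of printing them, so they can be reused; the rows are toy data.

```python
import heapq

def top_apps(data_set, group_index, group, value_fn, name_index, n):
    """Return the n largest [value, name] pairs among rows of the given group."""
    pairs = ([value_fn(row), row[name_index]]
             for row in data_set if row[group_index] == group)
    return heapq.nlargest(n, pairs)

def installs(row):
    # toy rows keep the install string in column 2
    return float(row[2].replace('+', '').replace(',', ''))

toy = [['A', 'SOCIAL', '5,000+'], ['B', 'SOCIAL', '100+'],
       ['C', 'GAME', '9,000+'], ['D', 'SOCIAL', '50,000+']]
print(top_apps(toy, 1, 'SOCIAL', installs, 0, 2))
# [[50000.0, 'D'], [5000.0, 'A']]
```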
Let's check the most popular apps by category in Android first, starting with the "COMMUNICATION" category -
```python
android_genre_count_check('COMMUNICATION', 10)
```
```
1000000000.0 WhatsApp Messenger
1000000000.0 Skype - free IM & video calls
1000000000.0 Messenger – Text and Video Chat for Free
1000000000.0 Hangouts
1000000000.0 Google Chrome: Fast & Secure
1000000000.0 Gmail
500000000.0 imo free video calls and chat
500000000.0 Viber Messenger
500000000.0 UC Browser - Fast Download Private & Secure
500000000.0 LINE: Free Calls & Messages
```
For Android, we clearly see above that the COMMUNICATION category owes its popularity to only a few heavyweights like WhatsApp Messenger, Skype and Hangouts, each with 1 billion+ downloads. The average number of downloads for this category is around 38 million, which is highly misleading.
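The claim that a few heavyweights carry the category can be made concrete by measuring what share of the category's total installs its top few apps account for. `top_k_share` is a hypothetical helper and the numbers below are invented for illustration; on the real data it could be fed the install counts of every app in a category.

```python
def top_k_share(values, k):
    """Fraction of the total contributed by the k largest values."""
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0

# invented category: two giants and eight small apps
category_installs = [1_000_000_000, 500_000_000] + [1_000_000] * 8
print(round(top_k_share(category_installs, 2), 3))  # 0.995
```

A share close to 1 means the category's average says little about what a new, typical app in it can expect.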
Raghav & Co. does not recommend going up against the established biggies in this category. Let's check the second and third most popular categories of apps in Android -
```python
print('"VIDEO PLAYERS" Category - ANDROID')
android_genre_count_check('VIDEO_PLAYERS', 5)
print('')
print('"SOCIAL" Category - ANDROID')
android_genre_count_check('SOCIAL', 5)
```
```
"VIDEO PLAYERS" Category - ANDROID
1000000000.0 YouTube
1000000000.0 Google Play Movies & TV
500000000.0 MX Player
100000000.0 VivaVideo - Video Editor & Photo Movie
100000000.0 VideoShow-Video Editor, Video Maker, Beauty Camera

"SOCIAL" Category - ANDROID
1000000000.0 Instagram
1000000000.0 Google+
1000000000.0 Facebook
500000000.0 Snapchat
500000000.0 Facebook Lite
```
Similarly, for the 'VIDEO_PLAYERS' and 'SOCIAL' categories in Android, a few apps with 1 billion+ downloads skew the average installs greatly. Most other apps suffer from the "Big fish - Little fish" syndrome, and are in all likelihood unable to gain popularity due to the existing heavyweights.
Verdict - NOT RECOMMENDED.
Let's now glance at Apple's top three genres by average number of ratings, and see whether any of them shows a promising value that is not driven by just a few apps.
```python
print('"Navigation" Category - APPLE')
apple_genre_count_check('Navigation', 10)
print('\n"Reference" Category - APPLE')
apple_genre_count_check('Reference', 10)
print('\n"Social Networking" Category - APPLE')
apple_genre_count_check('Social Networking', 10)
```
```
"Navigation" Category - APPLE
345046.0 Waze - GPS Navigation, Maps & Real-time Traffic
154911.0 Google Maps - Navigation & Transit
12811.0 Geocaching®
3582.0 CoPilot GPS – Car Navigation & Offline Maps
187.0 ImmobilienScout24: Real Estate Search in Germany
5.0 Railway Route Search

"Reference" Category - APPLE
985920.0 Bible
200047.0 Dictionary.com Dictionary & Thesaurus
54175.0 Dictionary.com Dictionary & Thesaurus for iPad
26786.0 Google Translate
18418.0 Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran
17588.0 New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition
16849.0 Merriam-Webster Dictionary
12122.0 Night Sky
8535.0 City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE)
4693.0 LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools

"Social Networking" Category - APPLE
2974676.0 Facebook
1061624.0 Pinterest
373519.0 Skype for iPhone
351466.0 Messenger
334293.0 Tumblr
287589.0 WhatsApp Messenger
260965.0 Kik
177501.0 ooVoo – Free Video Call, Text and Voice
164963.0 TextNow - Unlimited Text + Calls
164249.0 Viber Messenger – Text & Call
```
In the case of Apple above, Waze and Google Maps give a false sense of prosperity in Navigation apps. Similarly, the "Social Networking" genre's average number of ratings is heavily skewed by "Facebook", "Pinterest" and "Skype". These outliers skew our data heavily, so going into this genre is not advisable for now.
However, the "Reference" genre of the iOS store, whose apps are mainly of the reading/books type, shows less of this heterogeneity. Except for the "Bible" app, most other apps show a promising number of downloads.
Let us now take a deeper dive into the apps related to the Books/Reading type, in both the Android and Apple stores.
First, taking a cue from the "Reference" genre in Apple, we look at the number of downloads of apps in the "Book" genre of Apple (which is the 6th most popular genre) -
```python
print('"Book" Category - APPLE')
apple_genre_count_check('Book', 10)
```
```
"Book" Category - APPLE
252076.0 Kindle – Read eBooks, Magazines & Textbooks
105274.0 Audible – audio books, original series & podcasts
84062.0 Color Therapy Adult Coloring Book for Adults
65450.0 OverDrive – Library eBooks and Audiobooks
47829.0 HOOKED - Chat Stories
879.0 BookShout: Read eBooks & Track Your Reading Goals
451.0 Dr. Seuss Treasury — 50 best kids books
392.0 Green Riding Hood
197.0 Weirdwood Manor
9.0 MangaZERO - comic reader
```
Great! We are going in the right direction!
Looks like in the "Book" genre of the Apple store, "Kindle" and "Audible" aside, other apps also have a considerable number of downloads. It turns out that this genre is not as heavily influenced by outliers as the other top genres.
Now, let's check whether the same pattern follows in the Android market as well.
We will go through the "BOOKS_AND_REFERENCE" category of Android and see how its top 15 apps fare -
```python
android_genre_count_check('BOOKS_AND_REFERENCE', 15)
```
```
1000000000.0 Google Play Books
100000000.0 Wattpad 📖 Free Books
100000000.0 Bible
100000000.0 Audiobooks from Audible
100000000.0 Amazon Kindle
10000000.0 Wikipedia
10000000.0 Spanish English Translator
10000000.0 Quran for Android
10000000.0 Oxford Dictionary of English : Free
10000000.0 NOOK: Read eBooks & Magazines
10000000.0 Moon+ Reader
10000000.0 JW Library
10000000.0 HTC Help
10000000.0 FBReader: Favorite Book Reader
10000000.0 English Hindi Dictionary
```
Hmm... Interesting!
Except for 'Google Play Books', apps in the "BOOKS_AND_REFERENCE" category of the Android market also show promising results: among the top 15, most apps have 10 million+ downloads, and a few have 100 million+.
It seems that we have a winner here..!
"Books & Reference" and related genres emerge in the lead after our brief analysis of the Apple and Android data sets. While genres like Games, Communication, Social Networking and Navigation may appear the most popular as well as the most common, a deep dive shows us that these genres are either plagued by influential outliers that give a false sense of prosperity, or contain a large number of apps with few downloads each, so that the genre looks big only because of the sheer number of apps in it.
The Books & Reference genre, on the other hand, has fewer "dead apps", meaning that most of its apps have a considerable number of downloads. Also, it is not as plagued by influential outliers as the other genres.
Investing resources to develop the next app in the Books & Reference category is the best bet for Raghav & Co.!