In this project we analyze profiles of free apps on the App Store and Google Play. The main source of revenue from these apps is in-app advertising. We use the number of reviews as a proxy for the number of users who download an app and interact with its advertisements.
The goal of the project is to determine the features of an app that attract a large number of users who view and engage with advertisements in the app.
We are using two data sets:
#open the csv file for each dataset, read it with the reader function from the csv module, and store each dataset as a list of lists
from csv import reader
with open('/content/drive/My Drive/Datasets/googleplaystore.csv', encoding='utf8') as opened_file_google:
    google_data = list(reader(opened_file_google))
google_data_header = google_data[0]
google_data = google_data[1:]
with open('/content/drive/My Drive/Datasets/AppleStore.csv', encoding='utf8') as opened_file_apple:
    apple_data = list(reader(opened_file_apple))
apple_data_header = apple_data[0]
apple_data = apple_data[1:]
def explore_data(dataset, start, end, rows_and_columns=False):
    '''Takes the dataset parameter as a list of lists (without the header row) and prints the rows from start up to, but not including, end.
    If rows_and_columns is True, also prints the number of rows and the number of columns in the dataset.'''
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
print(google_data_header)
print('\n')
explore_data(google_data, 0, 3, True)
print('\n'*3)
print(apple_data_header)
print('\n')
explore_data(apple_data, 0, 3, True)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
Number of rows: 10841
Number of columns: 13

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
Number of rows: 7197
Number of columns: 16
Column headings for the googleplaystore.csv data:
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
Column headings for the AppleStore.csv data:
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
#Delete a row with error as identified in discussion forum in dataset documentation
print(len(google_data))
print(google_data[10472])
del google_data[10472]
print(len(google_data))
10841
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10840
There are duplicate entries for apps in the Google Play dataset. For example the app named 'Coloring book moana' has two separate entries in the dataset where each entry has a different value in the 'Reviews' column:
name = 'Coloring book moana'
for index, app in enumerate(google_data):
    if app[0] == name:
        print(app)
        print(index) #position of this row; google_data.index(app) would return the first equal row instead
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
1
['Coloring book moana', 'FAMILY', '3.9', '974', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
2033
duplicate_entries = []
unique_entries = []
for app in google_data:
    name = app[0]
    if name in unique_entries:
        duplicate_entries.append(name)
    else:
        unique_entries.append(name)
print('Number of duplicate apps: ', len(duplicate_entries))
print('Examples of duplicate apps: ', duplicate_entries[:10])
Number of duplicate apps: 1181
Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
There are 1181 duplicate entries in the dataset. We will remove duplicates by keeping only the entry with the highest number of reviews for each app name and discarding the other entries with the same name. The entry with the highest number of reviews is likely the most recently collected, so its rating should be the most reliable.
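The duplicate count can be cross-checked in a different way. The following is a minimal sketch using `collections.Counter` from the standard library; the rows in `sample_rows` are hypothetical and only stand in for `google_data`, where the app name sits at index 0.

```python
from collections import Counter

# Hypothetical sample rows standing in for google_data; only the app name
# (index 0) is used here.
sample_rows = [
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Slack', 'BUSINESS', '4.4', '51510'],
]

# Count how many times each app name appears.
name_counts = Counter(row[0] for row in sample_rows)

# Every occurrence beyond the first is a duplicate entry.
n_duplicates = sum(count - 1 for count in name_counts.values())
n_unique = len(name_counts)

print('Number of duplicate entries:', n_duplicates)
print('Number of unique names:', n_unique)
```

On the real data, the two numbers should add up to `len(google_data)`, which is the same consistency check the notebook performs with its list-based loop.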
To remove the duplicates, we will:
#initialize empty dictionary reviews_max
#loop over apps in google_data: if the app name is already in reviews_max, update the stored value when this entry has more reviews; otherwise add the name and number of reviews as a new key, value pair
reviews_max = {}
for app in google_data:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max:
        if n_reviews > reviews_max[name]:
            reviews_max[name] = n_reviews
    else:
        reviews_max[name] = n_reviews
#print lengths of container variables to check loop has worked correctly
print('Length of google_data minus length of duplicate entries: ', len(google_data) - len(duplicate_entries))
print('Length of unique_entries: ', len(unique_entries))
print('Length of reviews_max: ', len(reviews_max))
Length of google_data minus length of duplicate entries: 9659
Length of unique_entries: 9659
Length of reviews_max: 9659
As expected, the dataset minus the number of duplicate entries, the unique_entries list and the reviews_max dictionary all have the same length.
#create two empty lists: one to store the cleaned dataset and one to store the names of apps already added to it
#loop through apps in the original dataset and store the name and number of reviews
#if the number of reviews equals the max number of reviews for apps of the same name AND the name is not in the list of names already added, append the app to the cleaned dataset
#note: some rows in the original dataset are duplicates with the same number of reviews, hence 'name not in already_added' is required to prevent duplicates of these rows in the cleaned data
google_cleaned = []
already_added = []
for app in google_data:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        google_cleaned.append(app)
        already_added.append(name)
#explore the cleaned dataset
explore_data(google_cleaned, 0, 3, True)
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']
Number of rows: 9659
Number of columns: 13
Our datasets contain apps designed for non-English-speaking users.
The numbers corresponding to the characters we commonly use in an English text are all in the range 0 to 127, according to the ASCII (American Standard Code for Information Interchange) system.
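As a quick illustration of that range, the built-in `ord()` function returns the code point of a single character. The characters below were picked for illustration from the app names tested later in this section.

```python
# ord() returns the Unicode code point of a one-character string.
characters = ['a', 'Z', '5', '+', '™', '爱', '😜']
code_points = {character: ord(character) for character in characters}

for character, code_point in code_points.items():
    # Code points 0 to 127 make up the ASCII range.
    label = 'ASCII' if code_point <= 127 else 'non-ASCII'
    print(repr(character), code_point, label)
```

Letters, digits and common punctuation stay at or below 127, while the trademark sign, the Chinese character and the emoji all land far above it.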
In the code box below is a function which takes a string as an argument and determines if the language of the string is English:
def is_english(s):
    '''is_english returns True if the string only contains characters
    with an output from ord() function in the range 0 to 127 and False if the
    string contains one or more characters outside that range'''
    for character in s:
        if ord(character) > 127:
            return False
    return True
#test is_english function on some strings
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
True
False
False
False
Notice that the function couldn't correctly identify certain English app names like 'Docs To Go™ Free Office Suite' and 'Instachat 😜'. This is because emojis and characters like ™ fall outside the ASCII range and have corresponding numbers over 127. This means that the function will incorrectly identify many apps as non-English and we will lose useful data.
To minimize the data loss, we can modify the is_english function to return False only if the string contains more than 3 characters outside the 0 to 127 ASCII range:
def is_english(s):
    '''is_english returns True if the string contains 3 or fewer characters
    with an output from ord() function outside the range 0 to 127 and False if the
    string contains 4 or more characters outside that range'''
    count = 0
    for character in s:
        if ord(character) > 127:
            count += 1
            if count == 4:
                return False
    return True
#test modified is_english function on same strings
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
True
False
True
True
The function is still not perfect, and a few non-English apps might get past our filter, but it is good enough for this stage of the analysis, so we won't spend more time optimizing it.
Now, use the new function to filter out non-English apps from both data sets. Loop through each data set. If an app name is identified as English, append the whole row to a new list:
english_google_cleaned = []
english_apple_data = []
for app in google_cleaned:
    name = app[0]
    if is_english(name):
        english_google_cleaned.append(app)
for app in apple_data:
    name = app[1] #name is in second column (index=1) of apple_data dataset
    if is_english(name):
        english_apple_data.append(app)
explore_data(english_google_cleaned, 0, 3, True)
print('\n')
explore_data(english_apple_data, 0, 3, True)
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']
Number of rows: 9614
Number of columns: 13

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
Number of rows: 6183
Number of columns: 16
We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Our data sets contain both free and non-free apps; we'll need to isolate only the free apps for our analysis. We will do so with the following process:
Loop through each data set to isolate the free apps in separate lists.
After isolating the free apps, check the length of each data set to see how many apps are remaining.
free_english_google_cleaned = []
free_english_apple_data = []
for app in english_google_cleaned:
    if (app[6] == 'Free') or (app[7] == '0'):
        free_english_google_cleaned.append(app)
for app in english_apple_data:
    if app[4] == '0.0':
        free_english_apple_data.append(app)
explore_data(free_english_google_cleaned, 0, 3, True)
print('\n')
explore_data(free_english_apple_data, 0, 3, True)
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']
Number of rows: 8864
Number of columns: 13

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
Number of rows: 3222
Number of columns: 16
We're left with 8864 Google Play apps and 3222 Apple Store apps, which should be enough for our analysis.
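To see how much data each cleaning step removed, the row counts printed so far can be put side by side. This is a small sketch; the step labels and the `retention` helper are names introduced here for illustration, and the counts are copied from the outputs above.

```python
# Row counts copied from the outputs printed at each cleaning step above.
google_steps = [
    ('raw', 10841),
    ('error row removed', 10840),
    ('duplicates removed', 9659),
    ('English names only', 9614),
    ('free apps only', 8864),
]
apple_steps = [
    ('raw', 7197),
    ('English names only', 6183),
    ('free apps only', 3222),
]

def retention(steps):
    '''Return (label, rows, percent of the first step's rows) for each step.'''
    start = steps[0][1]
    return [(label, rows, round(rows / start * 100, 1)) for label, rows in steps]

for name, steps in [('Google Play', google_steps), ('App Store', apple_steps)]:
    print(name)
    for label, rows, percent in retention(steps):
        print(' ', label, ':', rows, 'rows,', percent, '% of raw')
```

Roughly 82% of the Google Play rows survive the cleaning, against about 45% of the App Store rows, where the free-apps filter removes the most.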
We seek to identify the features of an app profile that is successful in both the Google Play and Apple Store markets. We aim to determine the kinds of apps that are likely to be successful in both markets since our revenue from an app is mainly a function of the number of users of the app.
The validation process (to minimize risks and overheads) for a new app is as follows:
We will begin by inspecting both datasets and determining the columns we could use to generate frequency tables to find the most common app genres in each market.
print(google_data_header)
print('\n')
print(apple_data_header)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
We'll build two functions we can use to analyze the frequency tables:
def freq_table(dataset, index):
    '''dataset is expected to be a list of lists and index is expected to be an integer
    freq_table returns the relative frequency table (as a dictionary) for any column we want.
    '''
    table = {}
    total = len(dataset)
    for app in dataset:
        key = app[index]
        if key in table:
            table[key] += 1
        else:
            table[key] = 1
    for key in table:
        table[key] = (table[key] / total) * 100
    return table
print(freq_table(free_english_google_cleaned, 1))
{'ART_AND_DESIGN': 0.6430505415162455, 'AUTO_AND_VEHICLES': 0.9250902527075812, 'BEAUTY': 0.5979241877256317, 'BOOKS_AND_REFERENCE': 2.1435018050541514, 'BUSINESS': 4.591606498194946, 'COMICS': 0.6204873646209386, 'COMMUNICATION': 3.2378158844765346, 'DATING': 1.861462093862816, 'EDUCATION': 1.1620036101083033, 'ENTERTAINMENT': 0.9589350180505415, 'EVENTS': 0.7107400722021661, 'FINANCE': 3.7003610108303246, 'FOOD_AND_DRINK': 1.2409747292418771, 'HEALTH_AND_FITNESS': 3.0798736462093865, 'HOUSE_AND_HOME': 0.8235559566787004, 'LIBRARIES_AND_DEMO': 0.9363718411552346, 'LIFESTYLE': 3.9034296028880866, 'GAME': 9.724729241877256, 'FAMILY': 18.907942238267147, 'MEDICAL': 3.531137184115524, 'SOCIAL': 2.6624548736462095, 'SHOPPING': 2.2450361010830324, 'PHOTOGRAPHY': 2.944494584837545, 'SPORTS': 3.395758122743682, 'TRAVEL_AND_LOCAL': 2.33528880866426, 'TOOLS': 8.461191335740072, 'PERSONALIZATION': 3.3167870036101084, 'PRODUCTIVITY': 3.892148014440433, 'PARENTING': 0.6543321299638989, 'WEATHER': 0.8009927797833934, 'VIDEO_PLAYERS': 1.7937725631768955, 'NEWS_AND_MAGAZINES': 2.7978339350180503, 'MAPS_AND_NAVIGATION': 1.3989169675090252}
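The dictionary above is hard to read because it is unsorted. As an alternative sketch (not the approach this notebook uses), `collections.Counter` can count the column values and return them ordered by frequency in one step. The function name `freq_table_counter` and the sample rows are hypothetical.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    '''Return (value, percentage) pairs for one column, most frequent first.'''
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    # most_common() orders the values from highest to lowest count.
    return [(value, count / total * 100) for value, count in counts.most_common()]

# Hypothetical rows, used only to illustrate the call.
sample = [['a', 'GAME'], ['b', 'TOOLS'], ['c', 'GAME'], ['d', 'GAME']]
for value, percentage in freq_table_counter(sample, 1):
    print(value, ':', percentage)
```

One difference to keep in mind: `most_common()` leaves values with equal counts in first-seen order, whereas sorting (percentage, value) tuples in reverse breaks ties by the value name.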
def display_table(dataset, index):
    '''Takes in two parameters: dataset and index. dataset is expected to be a list of lists, and index is expected to be an integer.
    Generates a frequency table using the freq_table() function.
    Transforms the frequency table into a list of tuples (value, key), then sorts the list in descending order using the sorted() function.
    Prints the entries of the frequency table.
    Does not return anything.
    '''
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
#use the display_table function to display the frequency table of the prime_genre column from free_english_apple_data
display_table(free_english_apple_data, 11)
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665
For the prime_genre column of the App Store data set:
By far the most common genre is 'Games', with about 58% relative frequency. The second most common genre is 'Entertainment', with around 8%. Together, approximately 66% of the free English apps from the Apple Store in this dataset have a prime genre of 'Games' or 'Entertainment'. 'Photo & Video' and 'Education' together comprise approximately 9%. Most apps are designed for entertainment rather than practical purposes. If we assume that designers of iOS apps are responding to market demand, and that demand for a genre can be inferred from the number of offerings, then an app profile aimed at entertainment seems likely to attract a large number of users. However, the number of offerings does not tell us the number of users: a small number of practical apps could still have a large number of users, for example because one or two very effective and popular apps have been released.
Let's continue by examining the Category and Genres columns of the Google Play data set (two columns which seem to be related).
print('Categories frequency table: ')
print('\n')
display_table(free_english_google_cleaned, 1)
print('\n')
print('Genres frequency table: ')
print('\n')
display_table(free_english_google_cleaned, 9)
Categories frequency table: FAMILY : 18.907942238267147 GAME : 9.724729241877256 TOOLS : 8.461191335740072 BUSINESS : 4.591606498194946 LIFESTYLE : 3.9034296028880866 PRODUCTIVITY : 3.892148014440433 FINANCE : 3.7003610108303246 MEDICAL : 3.531137184115524 SPORTS : 3.395758122743682 PERSONALIZATION : 3.3167870036101084 COMMUNICATION : 3.2378158844765346 HEALTH_AND_FITNESS : 3.0798736462093865 PHOTOGRAPHY : 2.944494584837545 NEWS_AND_MAGAZINES : 2.7978339350180503 SOCIAL : 2.6624548736462095 TRAVEL_AND_LOCAL : 2.33528880866426 SHOPPING : 2.2450361010830324 BOOKS_AND_REFERENCE : 2.1435018050541514 DATING : 1.861462093862816 VIDEO_PLAYERS : 1.7937725631768955 MAPS_AND_NAVIGATION : 1.3989169675090252 FOOD_AND_DRINK : 1.2409747292418771 EDUCATION : 1.1620036101083033 ENTERTAINMENT : 0.9589350180505415 LIBRARIES_AND_DEMO : 0.9363718411552346 AUTO_AND_VEHICLES : 0.9250902527075812 HOUSE_AND_HOME : 0.8235559566787004 WEATHER : 0.8009927797833934 EVENTS : 0.7107400722021661 PARENTING : 0.6543321299638989 ART_AND_DESIGN : 0.6430505415162455 COMICS : 0.6204873646209386 BEAUTY : 0.5979241877256317 Genres frequency table: Tools : 8.449909747292418 Entertainment : 6.069494584837545 Education : 5.347472924187725 Business : 4.591606498194946 Productivity : 3.892148014440433 Lifestyle : 3.892148014440433 Finance : 3.7003610108303246 Medical : 3.531137184115524 Sports : 3.463447653429603 Personalization : 3.3167870036101084 Communication : 3.2378158844765346 Action : 3.1024368231046933 Health & Fitness : 3.0798736462093865 Photography : 2.944494584837545 News & Magazines : 2.7978339350180503 Social : 2.6624548736462095 Travel & Local : 2.3240072202166067 Shopping : 2.2450361010830324 Books & Reference : 2.1435018050541514 Simulation : 2.0419675090252705 Dating : 1.861462093862816 Arcade : 1.8501805054151623 Video Players & Editors : 1.7712093862815883 Casual : 1.7599277978339352 Maps & Navigation : 1.3989169675090252 Food & Drink : 1.2409747292418771 Puzzle : 1.128158844765343 
Racing : 0.9927797833935018 Role Playing : 0.9363718411552346 Libraries & Demo : 0.9363718411552346 Auto & Vehicles : 0.9250902527075812 Strategy : 0.9138086642599278 House & Home : 0.8235559566787004 Weather : 0.8009927797833934 Events : 0.7107400722021661 Adventure : 0.6768953068592057 Comics : 0.6092057761732852 Beauty : 0.5979241877256317 Art & Design : 0.5979241877256317 Parenting : 0.4963898916967509 Card : 0.45126353790613716 Casino : 0.42870036101083037 Trivia : 0.41741877256317694 Educational;Education : 0.39485559566787 Board : 0.3835740072202166 Educational : 0.3722924187725632 Education;Education : 0.33844765342960287 Word : 0.2594765342960289 Casual;Pretend Play : 0.236913357400722 Music : 0.2030685920577617 Racing;Action & Adventure : 0.16922382671480143 Puzzle;Brain Games : 0.16922382671480143 Entertainment;Music & Video : 0.16922382671480143 Casual;Brain Games : 0.13537906137184114 Casual;Action & Adventure : 0.13537906137184114 Arcade;Action & Adventure : 0.12409747292418773 Action;Action & Adventure : 0.10153429602888085 Educational;Pretend Play : 0.09025270758122744 Simulation;Action & Adventure : 0.078971119133574 Parenting;Education : 0.078971119133574 Entertainment;Brain Games : 0.078971119133574 Board;Brain Games : 0.078971119133574 Parenting;Music & Video : 0.06768953068592057 Educational;Brain Games : 0.06768953068592057 Casual;Creativity : 0.06768953068592057 Art & Design;Creativity : 0.06768953068592057 Education;Pretend Play : 0.056407942238267145 Role Playing;Pretend Play : 0.04512635379061372 Education;Creativity : 0.04512635379061372 Role Playing;Action & Adventure : 0.033844765342960284 Puzzle;Action & Adventure : 0.033844765342960284 Entertainment;Creativity : 0.033844765342960284 Entertainment;Action & Adventure : 0.033844765342960284 Educational;Creativity : 0.033844765342960284 Educational;Action & Adventure : 0.033844765342960284 Education;Music & Video : 0.033844765342960284 Education;Brain Games : 0.033844765342960284 
Education;Action & Adventure : 0.033844765342960284 Adventure;Action & Adventure : 0.033844765342960284 Video Players & Editors;Music & Video : 0.02256317689530686 Sports;Action & Adventure : 0.02256317689530686 Simulation;Pretend Play : 0.02256317689530686 Puzzle;Creativity : 0.02256317689530686 Music;Music & Video : 0.02256317689530686 Entertainment;Pretend Play : 0.02256317689530686 Casual;Education : 0.02256317689530686 Board;Action & Adventure : 0.02256317689530686 Video Players & Editors;Creativity : 0.01128158844765343 Trivia;Education : 0.01128158844765343 Travel & Local;Action & Adventure : 0.01128158844765343 Tools;Education : 0.01128158844765343 Strategy;Education : 0.01128158844765343 Strategy;Creativity : 0.01128158844765343 Strategy;Action & Adventure : 0.01128158844765343 Simulation;Education : 0.01128158844765343 Role Playing;Brain Games : 0.01128158844765343 Racing;Pretend Play : 0.01128158844765343 Puzzle;Education : 0.01128158844765343 Parenting;Brain Games : 0.01128158844765343 Music & Audio;Music & Video : 0.01128158844765343 Lifestyle;Pretend Play : 0.01128158844765343 Lifestyle;Education : 0.01128158844765343 Health & Fitness;Education : 0.01128158844765343 Health & Fitness;Action & Adventure : 0.01128158844765343 Entertainment;Education : 0.01128158844765343 Communication;Creativity : 0.01128158844765343 Comics;Creativity : 0.01128158844765343 Casual;Music & Video : 0.01128158844765343 Card;Action & Adventure : 0.01128158844765343 Books & Reference;Education : 0.01128158844765343 Art & Design;Pretend Play : 0.01128158844765343 Art & Design;Action & Adventure : 0.01128158844765343 Arcade;Pretend Play : 0.01128158844765343 Adventure;Education : 0.01128158844765343
Category column of the Google Play dataset: Around 19% of the free English apps on the Google Play store are categorized as 'FAMILY', around 10% as 'GAME' and around 8% as 'TOOLS'. All other categories have a relative frequency below 5%. On first inspection, it seems as though there are significantly fewer apps designed for entertainment in the Google Play market than in the Apple Store market. However, if we examine the 'FAMILY' category on the Google Play store, we can see that this category (which accounts for almost 19% of the apps) consists mostly of games for kids. Even taking this into account, entertainment apps have a much lower relative frequency on the Google Play store than on the Apple Store, and the Google Play market is more balanced between entertainment and practical apps.
Genres column of the Google Play dataset: Around 8% of apps have the genre 'Tools', 6% 'Entertainment' and 5% 'Education'. All other genres have a relative frequency below 5%. Many of the genres with lower relative frequencies in this table are subcategories, and it is likely that the relative frequencies of the top-level genres are lower because of this more granular grouping. It is difficult to compare the groupings in the Category and Genres columns, and since we are looking for a holistic overview, we will work only with the Category column moving forward.
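To illustrate how the granular grouping spreads apps across many small genres, the sketch below collapses each Genres value to the part before the first ';'. Treating that first part as the top-level genre is an assumption made here for illustration, and `primary_genre_counts` is a hypothetical helper that the notebook itself does not use.

```python
from collections import Counter

def primary_genre_counts(genres):
    '''Count only the part of each Genres value before the first ";".'''
    return Counter(genre.split(';')[0] for genre in genres)

# A few Genres values taken from the frequency table above.
sample_genres = [
    'Education',
    'Education;Creativity',
    'Education;Pretend Play',
    'Casual;Pretend Play',
    'Casual',
]
counts = primary_genre_counts(sample_genres)
print(counts)
```

Applied to the whole Genres column, a grouping like this would fold entries such as 'Education;Creativity' back into 'Education', which raises the share of the top-level genres.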
Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea about the kind of apps that have most users.
One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.
Start with calculating the average number of user ratings per app genre on the App Store. To do that, we'll need to:
prime_genre_table = freq_table(free_english_apple_data, 11)
genre_dict = {}
for genre in prime_genre_table:
    total_num_ratings = 0
    len_genre = 0
    for app in free_english_apple_data:
        genre_app = app[11]
        if genre_app == genre:
            app_num_ratings = float(app[5])
            total_num_ratings += app_num_ratings
            len_genre += 1
    mean_num_ratings = total_num_ratings / len_genre
    genre_dict[genre] = mean_num_ratings
genre_list = []
for key in genre_dict:
    key_val_as_tuple = (genre_dict[key], key)
    genre_list.append(key_val_as_tuple)
genre_list_sorted = sorted(genre_list, reverse=True)
for entry in genre_list_sorted:
    print(entry[1], ':', entry[0])
Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0
Navigation apps have the highest average number of user ratings (86k), followed by Reference apps (75k), Social Networking apps (72k), Music apps (57k), Weather apps (52k) and Book apps (40k).
Even though approx. 58% of apps are in the 'Games' genre, reference and practical apps have a much higher average number of user ratings per app.
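The per-genre averages can also be computed in a single pass over the rows, instead of rescanning the dataset once per genre. The sketch below is one possible way to do that; `mean_by_group` is a hypothetical helper and the sample rows are made up to keep the example self-contained.

```python
def mean_by_group(dataset, group_index, value_index):
    '''Return {group: mean of the numeric column} using one pass over the rows.'''
    totals = {}
    counts = {}
    for row in dataset:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}

# Hypothetical rows shaped as [name, genre, rating count].
sample_rows = [
    ['App A', 'Games', '100'],
    ['App B', 'Games', '300'],
    ['App C', 'Weather', '50'],
]
means = mean_by_group(sample_rows, 1, 2)
for genre, mean in sorted(means.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', mean)
```

With the notebook's data, the equivalent call would be `mean_by_group(free_english_apple_data, 11, 5)`, since prime_genre is at index 11 and rating_count_tot at index 5.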
Let us look at the names and number of ratings of apps with a prime genre of 'Navigation':
for app in free_english_apple_data:
    if app[11] == 'Navigation':
        print(app[1], ':', app[5])
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
We can see that the majority of user ratings for apps with the 'Navigation' prime genre are for 'Waze' and 'Google Maps'. It would seem that this genre is dominated by a small number of popular apps and other apps do not have much traffic.
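The degree of concentration can be quantified by computing each app's share of the genre's total ratings. The sketch below hard-codes the six 'Navigation' rows from the output above so that it runs on its own.

```python
# (name, rating count) pairs copied from the 'Navigation' output above.
navigation_apps = [
    ('Waze - GPS Navigation, Maps & Real-time Traffic', 345046),
    ('Google Maps - Navigation & Transit', 154911),
    ('Geocaching®', 12811),
    ('CoPilot GPS – Car Navigation & Offline Maps', 3582),
    ('ImmobilienScout24: Real Estate Search in Germany', 187),
    ('Railway Route Search', 5),
]

total_ratings = sum(count for _, count in navigation_apps)

# Percentage of the genre's ratings that belongs to each app.
shares = [(name, count / total_ratings * 100) for name, count in navigation_apps]

for name, share in shares:
    print(name, ':', round(share, 2), '%')

top_two_share = shares[0][1] + shares[1][1]
print('Top two apps combined:', round(top_two_share, 1), '%')
```

Waze alone accounts for about two thirds of the ratings, and together with Google Maps for about 97%, so the genre average says little about what a typical Navigation app receives.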
for app in free_english_apple_data:
    if app[11] == 'Reference':
        print(app[1], ':', app[5])
print()
for app in free_english_apple_data:
    if app[11] == 'Book':
        print(app[1], ':', app[5])
Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0

Kindle – Read eBooks, Magazines & Textbooks : 252076
Audible – audio books, original series & podcasts : 105274
Color Therapy Adult Coloring Book for Adults : 84062
OverDrive – Library eBooks and Audiobooks : 65450
HOOKED - Chat Stories : 47829
BookShout: Read eBooks & Track Your Reading Goals : 879
Dr. Seuss Treasury — 50 best kids books : 451
Green Riding Hood : 392
Weirdwood Manor : 197
MangaZERO - comic reader : 9
ikouhoushi : 0
MangaTiara - love comic reader : 0
謎解き : 0
謎解き2016 : 0
The 'Reference' and 'Book' genres both appear to be dominated by a small number of popular apps, many of them tied to well-known publications. The 'Reference' apps with the highest number of user ratings are the 'Bible' app and dictionary apps. The 'Book' apps with the highest number of user ratings are mostly eBook readers or audiobook apps.
for app in free_english_apple_data:
    if app[11] == 'Social Networking':
        print(app[1], ':', app[5])
Facebook : 2974676 Pinterest : 1061624 Skype for iPhone : 373519 Messenger : 351466 Tumblr : 334293 WhatsApp Messenger : 287589 Kik : 260965 ooVoo – Free Video Call, Text and Voice : 177501 TextNow - Unlimited Text + Calls : 164963 Viber Messenger – Text & Call : 164249 Followers - Social Analytics For Instagram : 112778 MeetMe - Chat and Meet New People : 97072 We Heart It - Fashion, wallpapers, quotes, tattoos : 90414 InsTrack for Instagram - Analytics Plus More : 85535 Tango - Free Video Call, Voice and Chat : 75412 LinkedIn : 71856 Match™ - #1 Dating App. : 60659 Skype for iPad : 60163 POF - Best Dating App for Conversations : 52642 Timehop : 49510 Find My Family, Friends & iPhone - Life360 Locator : 43877 Whisper - Share, Express, Meet : 39819 Hangouts : 36404 LINE PLAY - Your Avatar World : 34677 WeChat : 34584 Badoo - Meet New People, Chat, Socialize. : 34428 Followers + for Instagram - Follower Analytics : 28633 GroupMe : 28260 Marco Polo Video Walkie Talkie : 27662 Miitomo : 23965 SimSimi : 23530 Grindr - Gay and same sex guys chat, meet and date : 23201 Wishbone - Compare Anything : 20649 imo video calls and chat : 18841 After School - Funny Anonymous School News : 18482 Quick Reposter - Repost, Regram and Reshare Photos : 17694 Weibo HD : 16772 Repost for Instagram : 15185 Live.me – Live Video Chat & Make Friends Nearby : 14724 Nextdoor : 14402 Followers Analytics for Instagram - InstaReport : 13914 YouNow: Live Stream Video Chat : 12079 FollowMeter for Instagram - Followers Tracking : 11976 LINE : 11437 eHarmony™ Dating App - Meet Singles : 11124 Discord - Chat for Gamers : 9152 QQ : 9109 Telegram Messenger : 7573 Weibo : 7265 Periscope - Live Video Streaming Around the World : 6062 Chat for Whatsapp - iPad Version : 5060 QQ HD : 5058 Followers Analysis Tool For Instagram App Free : 4253 live.ly - live video streaming : 4145 Houseparty - Group Video Chat : 3991 SOMA Messenger : 3232 Monkey : 3060 Down To Lunch : 2535 Flinch - Video Chat Staring Contest 
: 2134 Highrise - Your Avatar Community : 2011 LOVOO - Dating Chat : 1985 PlayStation®Messages : 1918 BOO! - Video chat camera with filters & stickers : 1805 Qzone : 1649 Chatous - Chat with new people : 1609 Kiwi - Q&A : 1538 GhostCodes - a discovery app for Snapchat : 1313 Jodel : 1193 FireChat : 1037 Google Duo - simple video calling : 1033 Fiesta by Tango - Chat & Meet New People : 885 Google Allo — smart messaging : 862 Peach — share vividly : 727 Hey! VINA - Where Women Meet New Friends : 719 Battlefield™ Companion : 689 All Devices for WhatsApp - Messenger for iPad : 682 Chat for Pokemon Go - GoChat : 500 IAmNaughty – Dating App to Meet New People Online : 463 Qzone HD : 458 Zenly - Locate your friends in realtime : 427 League of Legends Friends : 420 豆瓣 : 407 Candid - Speak Your Mind Freely : 398 知乎 : 397 Selfeo : 366 Fake-A-Location Free ™ : 354 Popcorn Buzz - Free Group Calls : 281 Fam — Group video calling for iMessage : 279 QQ International : 274 Ameba : 269 SoundCloud Pulse: for creators : 240 Tantan : 235 Cougar Dating & Life Style App for Mature Women : 213 Rawr Messenger - Dab your chat : 180 WhenToPost: Best Time to Post Photos for Instagram : 158 Inke—Broadcast an amazing life : 147 Mustknow - anonymous video Q&A : 53 CTFxCmoji : 39 Lobi : 36 Chain: Collaborate On MyVideo Story/Group Video : 35 botman - Real time video chat : 7 BestieBox : 0 MATCH ON LINE chat : 0 niconico ch : 0 LINE BLOG : 0 bit-tube - Live Stream Video Chat : 0
The 'Social Networking' genre contains a larger number of apps than the other genres with a high average number of ratings. Although this genre also appears to be dominated by well-known, popular apps like 'Facebook', 'Pinterest' and 'Skype', a larger number of its apps have an appreciable number of user ratings, which suggests there is more chance that a new app may also attract a decent number of users.
Since all these popular genres are heavily skewed by a few apps with a large number of ratings, we could make our analysis more relevant to new apps by removing these very popular apps (those with 20% or more of the total number of ratings in their genre) from the dataset and then recalculating the average number of ratings per app:
genre_dict = {}
for genre in prime_genre_table:
    total_num_ratings = 0
    len_genre = 0
    # first pass: total number of ratings in this genre
    for app in free_english_apple_data:
        genre_app = app[11]
        if genre_app == genre:
            app_num_ratings = float(app[5])
            total_num_ratings += app_num_ratings
    # second pass: leave out of the average any app with 20% or more of the genre's total ratings
    new_total_num_ratings = total_num_ratings
    for app in free_english_apple_data:
        genre_app = app[11]
        if genre_app == genre:
            app_num_ratings = float(app[5])
            if app_num_ratings >= 0.2*total_num_ratings:
                new_total_num_ratings -= app_num_ratings
            else:
                len_genre += 1
    mean_num_ratings = new_total_num_ratings / len_genre
    genre_dict[genre] = mean_num_ratings
# sort genres by mean number of ratings, highest first
genre_list = []
for key in genre_dict:
    key_val_as_tuple = (genre_dict[key], key)
    genre_list.append(key_val_as_tuple)
genre_list_sorted = sorted(genre_list, reverse=True)
for entry in genre_list_sorted:
    print(entry[1], ':', entry[0])
Social Networking : 43899.514285714286 Weather : 35859.666666666664 Music : 27782.953125 Shopping : 26919.690476190477 Book : 23426.384615384617 Sports : 23008.898550724636 Games : 22788.6696905016 Reference : 21355.176470588234 Productivity : 21028.410714285714 Finance : 19606.941176470587 Travel : 17527.358974358973 Photo & Video : 15025.716981132075 Entertainment : 14029.830708661417 News : 13323.97619047619 Utilities : 12925.0125 Food & Drink : 12675.083333333334 Health & Fitness : 10044.920634920634 Lifestyle : 9956.1 Education : 7003.983050847458 Business : 5541.75 Navigation : 4146.25 Catalogs : 890.3333333333334 Medical : 9.666666666666666
With the most popular, market-dominating apps removed from each genre, we can compare the average number of ratings for the remaining apps, each of which has less than 20% of the total number of ratings in its genre.
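The filtering step used above can also be written as a small helper. This is only a sketch: the function name mean_without_dominant and the toy rating counts are hypothetical and are not part of the original notebook. It also returns None when every app in a genre is excluded, a case the loop above does not guard against.

```python
def mean_without_dominant(ratings, share=0.2):
    """Mean of the ratings after dropping any value that is `share` or more of the total."""
    total = sum(ratings)
    kept = [r for r in ratings if r < share * total]
    if not kept:
        # every app was dominant (for example, a genre containing a single app)
        return None
    return sum(kept) / len(kept)

# Hypothetical rating counts, not taken from the dataset
toy_genre = [900, 40, 30, 20, 10]
print(mean_without_dominant(toy_genre))
```

With the toy values, the app holding 900 of the 1000 ratings is dropped and the mean is taken over the remaining four.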
The following 5 genres have the highest average number of ratings for smaller apps: Social Networking, Weather, Music, Shopping and Book.
We could build a new social networking app, which is likely to be cheaper to build and not require paying for an external specialized API.
For the Google Play market, we actually have data about the number of installs, so we should be able to get a clearer picture of genre popularity. Using the 'Installs' column of the Google Play dataset, we can look at the relative frequency of each install interval. The intervals are imprecise, but since we do not need high precision for our analysis, we can take the lower limit of each interval as the float value for every app in that interval. Note that the intervals contain the non-numeric characters ',' and '+', which we can strip with the str.replace(old, new) method before converting to float.
display_table(free_english_google_cleaned, 5) #'Installs' column is column indexed 5
1,000,000+ : 15.726534296028879 100,000+ : 11.552346570397113 10,000,000+ : 10.548285198555957 10,000+ : 10.198555956678701 1,000+ : 8.393501805054152 100+ : 6.915613718411552 5,000,000+ : 6.825361010830325 500,000+ : 5.561823104693141 50,000+ : 4.7721119133574 5,000+ : 4.512635379061372 10+ : 3.5424187725631766 500+ : 3.2490974729241873 50,000,000+ : 2.3014440433213 100,000,000+ : 2.1322202166064983 50+ : 1.917870036101083 5+ : 0.78971119133574 1+ : 0.5076714801444043 500,000,000+ : 0.2707581227436823 1,000,000,000+ : 0.22563176895306858 0+ : 0.04512635379061372 0 : 0.01128158844765343
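As a minimal sketch of the conversion described above, the interval labels can be turned into floats by stripping ',' and '+' with str.replace. The helper name installs_to_float is hypothetical and not part of the original notebook; the labels are taken from the frequency table above.

```python
def installs_to_float(label):
    """Convert an install interval such as '1,000,000+' to its lower bound as a float."""
    return float(label.replace(',', '').replace('+', ''))

# Labels as they appear in the 'Installs' column
for label in ['0', '0+', '10,000+', '1,000,000,000+']:
    print(label, '->', installs_to_float(label))
```

Note that '0' and '0+' both map to 0.0, so the two intervals become indistinguishable after conversion.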
Start by generating a frequency table for the Category column of the Google Play data set to get the unique app genres using the freq_table() function:
categories_table = freq_table(free_english_google_cleaned, 1)
category_dict = {}
for category in categories_table:
    total_installs = 0
    len_category = 0
    for app in free_english_google_cleaned:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            # strip '+' and ',' so the lower limit of the interval can be read as a float
            n_installs = float(n_installs.replace('+', '').replace(',', ''))
            total_installs += n_installs
            len_category += 1
    mean_installs = total_installs / len_category
    category_dict[category] = mean_installs
# sort categories by mean number of installs, highest first
category_list = []
for key in category_dict:
    key_val_as_tuple = (category_dict[key], key)
    category_list.append(key_val_as_tuple)
category_list_sorted = sorted(category_list, reverse=True)
for entry in category_list_sorted:
    print(entry[1], ':', entry[0])
COMMUNICATION : 38456119.167247385 VIDEO_PLAYERS : 24727872.452830188 SOCIAL : 23253652.127118643 PHOTOGRAPHY : 17840110.40229885 PRODUCTIVITY : 16787331.344927534 GAME : 15588015.603248259 TRAVEL_AND_LOCAL : 13984077.710144928 ENTERTAINMENT : 11640705.88235294 TOOLS : 10801391.298666667 NEWS_AND_MAGAZINES : 9549178.467741935 BOOKS_AND_REFERENCE : 8767811.894736841 SHOPPING : 7036877.311557789 PERSONALIZATION : 5201482.6122448975 WEATHER : 5074486.197183099 HEALTH_AND_FITNESS : 4188821.9853479853 MAPS_AND_NAVIGATION : 4056941.7741935486 FAMILY : 3695641.8198090694 SPORTS : 3638640.1428571427 ART_AND_DESIGN : 1986335.0877192982 FOOD_AND_DRINK : 1924897.7363636363 EDUCATION : 1833495.145631068 BUSINESS : 1712290.1474201474 LIFESTYLE : 1437816.2687861272 FINANCE : 1387692.475609756 HOUSE_AND_HOME : 1331540.5616438356 DATING : 854028.8303030303 COMICS : 817657.2727272727 AUTO_AND_VEHICLES : 647317.8170731707 LIBRARIES_AND_DEMO : 638503.734939759 PARENTING : 542603.6206896552 BEAUTY : 513151.88679245283 EVENTS : 253542.22222222222 MEDICAL : 120550.61980830671
On average, apps in the COMMUNICATION category have the highest number of installs (approx. 38,000,000). However, as with the average number of ratings per genre in the Apple Store dataset, isolating the very popular apps shows that the mean is skewed heavily upwards by a few apps with 1 billion installs, such as WhatsApp Messenger, Messenger, Skype and Google Chrome, and by others with 500 million installs:
for app in free_english_google_cleaned:
    if app[1] == 'COMMUNICATION' and app[5] in ('1,000,000,000+', '500,000,000+', '100,000,000+'):
        print(app[0], ':', app[5])
B612 - Beauty & Filter Camera : 100,000,000+ YouCam Makeup - Magic Selfie Makeovers : 100,000,000+ Sweet Selfie - selfie camera, beauty cam, photo edit : 100,000,000+ Google Photos : 1,000,000,000+ Retrica : 100,000,000+ Photo Editor Pro : 100,000,000+ BeautyPlus - Easy Photo Editor & Selfie Camera : 100,000,000+ PicsArt Photo Studio: Collage Maker & Pic Editor : 100,000,000+ Photo Collage Editor : 100,000,000+ Z Camera - Photo Editor, Beauty Selfie, Collage : 100,000,000+ PhotoGrid: Video & Pic Collage Maker, Photo Editor : 100,000,000+ Candy Camera - selfie, beauty camera, photo editor : 100,000,000+ YouCam Perfect - Selfie Photo Editor : 100,000,000+ Camera360: Selfie Photo Editor with Funny Sticker : 100,000,000+ S Photo Editor - Collage Maker , Photo Collage : 100,000,000+ AR effect : 100,000,000+ Cymera Camera- Photo Editor, Filter,Collage,Layout : 100,000,000+ LINE Camera - Photo editor : 100,000,000+ Photo Editor Collage Maker Pro : 100,000,000+ WhatsApp Messenger : 1,000,000,000+ imo beta free calls and text : 100,000,000+ Android Messages : 100,000,000+ Google Duo - High Quality Video Calls : 500,000,000+ Messenger – Text and Video Chat for Free : 1,000,000,000+ imo free video calls and chat : 500,000,000+ Skype - free IM & video calls : 1,000,000,000+ Who : 100,000,000+ GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+ LINE: Free Calls & Messages : 500,000,000+ Google Chrome: Fast & Secure : 1,000,000,000+ Firefox Browser fast & private : 100,000,000+ UC Browser - Fast Download Private & Secure : 500,000,000+ Gmail : 1,000,000,000+ Hangouts : 1,000,000,000+ Messenger Lite: Free Calls & Messages : 100,000,000+ Kik : 100,000,000+ KakaoTalk: Free Calls & Text : 100,000,000+ Opera Mini - fast web browser : 100,000,000+ Opera Browser: Fast and Secure : 100,000,000+ Telegram : 100,000,000+ Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+ UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+ Viber Messenger : 500,000,000+ WeChat : 
100,000,000+ Yahoo Mail – Stay Organized : 100,000,000+ BBM - Free Calls & Messages : 100,000,000+
If we also look at the app names and install counts in the categories with the second and third highest average number of installs (VIDEO_PLAYERS and SOCIAL), we see that the VIDEO_PLAYERS category is dominated by YouTube, Google Play Movies & TV and MX Player, and that the SOCIAL category is skewed by Facebook, Facebook Lite, Google+, Instagram and Snapchat.
for category in ['VIDEO_PLAYERS', 'SOCIAL']:
    print(category)
    for app in free_english_google_cleaned:
        if app[1] == category and app[5] in ('1,000,000,000+', '500,000,000+', '100,000,000+'):
            print(app[0], ':', app[5])
    print()
VIDEO_PLAYERS YouTube : 1,000,000,000+ Motorola Gallery : 100,000,000+ VLC for Android : 100,000,000+ Google Play Movies & TV : 1,000,000,000+ MX Player : 500,000,000+ Dubsmash : 100,000,000+ VivaVideo - Video Editor & Photo Movie : 100,000,000+ VideoShow-Video Editor, Video Maker, Beauty Camera : 100,000,000+ Motorola FM Radio : 100,000,000+ SOCIAL Facebook : 1,000,000,000+ Facebook Lite : 500,000,000+ Tumblr : 100,000,000+ Pinterest : 100,000,000+ Google+ : 1,000,000,000+ Badoo - Free Chat & Dating App : 100,000,000+ Tango - Live Video Broadcast : 100,000,000+ Instagram : 1,000,000,000+ Snapchat : 500,000,000+ LinkedIn : 100,000,000+ Tik Tok - including musical.ly : 100,000,000+ BIGO LIVE - Live Stream : 100,000,000+ VK : 100,000,000+
As in the Apple Store, the main concern is that these app genres might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants who are hard to compete against.
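One quick way to see how far a few giants can pull an average is to compare the mean with the median. The sketch below is an illustration only: the install counts are hypothetical, and the median is not a statistic used elsewhere in this analysis.

```python
from statistics import mean, median

# Hypothetical install counts for one category: one giant and several smaller apps
toy_installs = [1_000_000_000, 5_000_000, 1_000_000, 500_000, 100_000]

print('mean  :', mean(toy_installs))
print('median:', median(toy_installs))
```

In this toy category the single giant lifts the mean far above what a typical app achieves, while the median stays at the middle app's value.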
Let us repeat what we did for the Apple Store data: remove the few very popular apps that skew the average for each category (here, apps with 100 million installs or more) and recalculate the average number of installs in each category without these market-dominating apps:
category_dict = {}
for category in categories_table:
    total_installs = 0
    len_category = 0
    # first pass: total number of installs in this category
    for app in free_english_google_cleaned:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = float(n_installs.replace('+', '').replace(',', ''))
            total_installs += n_installs
    # second pass: leave out of the average any app with 100,000,000 installs or more
    new_total_installs = total_installs
    for app in free_english_google_cleaned:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = float(n_installs.replace('+', '').replace(',', ''))
            if n_installs >= 100000000:
                new_total_installs -= n_installs
            else:
                len_category += 1
    mean_installs = new_total_installs / len_category
    category_dict[category] = mean_installs
# sort categories by mean number of installs, highest first
category_list = []
for key in category_dict:
    key_val_as_tuple = (category_dict[key], key)
    category_list.append(key_val_as_tuple)
category_list_sorted = sorted(category_list, reverse=True)
for entry in category_list_sorted:
    print(entry[1], ':', entry[0])
PHOTOGRAPHY : 7670532.29338843 GAME : 6272564.694894147 ENTERTAINMENT : 6118250.0 VIDEO_PLAYERS : 5544878.133333334 WEATHER : 5074486.197183099 SHOPPING : 4640920.541237113 COMMUNICATION : 3603485.3884615386 PRODUCTIVITY : 3379657.318885449 TOOLS : 3191461.128987517 SOCIAL : 3084582.5201793723 SPORTS : 2994082.551839465 TRAVEL_AND_LOCAL : 2944079.6336633665 PERSONALIZATION : 2549775.832167832 MAPS_AND_NAVIGATION : 2484104.7540983604 FAMILY : 2342897.527075812 HEALTH_AND_FITNESS : 2005713.6605166052 ART_AND_DESIGN : 1986335.0877192982 FOOD_AND_DRINK : 1924897.7363636363 EDUCATION : 1833495.145631068 NEWS_AND_MAGAZINES : 1502841.8775510204 BOOKS_AND_REFERENCE : 1437212.2162162163 HOUSE_AND_HOME : 1331540.5616438356 BUSINESS : 1226918.7407407407 LIFESTYLE : 1152128.779710145 FINANCE : 1086125.7859327218 DATING : 854028.8303030303 COMICS : 817657.2727272727 AUTO_AND_VEHICLES : 647317.8170731707 LIBRARIES_AND_DEMO : 638503.734939759 PARENTING : 542603.6206896552 BEAUTY : 513151.88679245283 EVENTS : 253542.22222222222 MEDICAL : 120550.61980830671
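The leading entries of a ranking like the one printed above can be pulled out with sorted() and a slice. This is a sketch only: the helper name top_n is hypothetical, and the dictionary holds just the first seven values copied from the output above rather than the full category_dict.

```python
# A few (category, mean installs) pairs copied from the output above
category_means = {
    'PHOTOGRAPHY': 7670532.29338843,
    'GAME': 6272564.694894147,
    'ENTERTAINMENT': 6118250.0,
    'VIDEO_PLAYERS': 5544878.133333334,
    'WEATHER': 5074486.197183099,
    'SHOPPING': 4640920.541237113,
    'COMMUNICATION': 3603485.3884615386,
}

def top_n(means, n=5):
    """Return the n (category, mean) pairs with the highest mean, highest first."""
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:n]

for category, value in top_n(category_means):
    print(category, ':', round(value))
```

Sorting on the value with a key function avoids building the intermediate list of (value, key) tuples, although both approaches give the same order here.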
The following 5 categories have the highest average number of installs for 'smaller' apps: PHOTOGRAPHY, GAME, ENTERTAINMENT, VIDEO_PLAYERS and WEATHER.
Let's look at the PHOTOGRAPHY category in greater detail:
# the most installed PHOTOGRAPHY apps
for app in free_english_google_cleaned:
    if app[1] == 'PHOTOGRAPHY' and app[5] in ('1,000,000,000+', '500,000,000+', '100,000,000+'):
        print(app[0], ':', app[5])
print('\n'*2)
# every app in the PHOTOGRAPHY category
for app in free_english_google_cleaned:
    if app[1] == 'PHOTOGRAPHY':
        print(app[0], ':', app[5])
Apart from Google Photos (1,000,000,000+ installs), no app in this category has 500,000,000 or more installs, and the other 18 apps returned by the first filter all fall in the 100,000,000+ interval. It appears that the number of installs is more evenly distributed here than in other categories.
We will now look at the names of apps which are 'moderately' popular in this category (between 1,000,000 and 100,000,000 installs):
for app in free_english_google_cleaned:
    if app[1] == 'PHOTOGRAPHY' and app[5] in ('1,000,000+', '5,000,000+', '10,000,000+', '50,000,000+'):
        print(app[0], ':', app[5])
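The same selection can be expressed as a numeric range instead of a list of interval labels. The sketch below assumes the column layout used in this notebook (name at index 0, category at index 1, installs at index 5). The helper names and the toy rows are hypothetical and not taken from the dataset.

```python
def installs_lower_bound(label):
    """Convert an install interval such as '5,000,000+' to its lower bound as a float."""
    return float(label.replace(',', '').replace('+', ''))

def in_install_range(rows, category, low, high):
    """Return (name, installs) pairs for apps in `category` with low <= installs < high."""
    return [(row[0], row[5]) for row in rows
            if row[1] == category and low <= installs_lower_bound(row[5]) < high]

# Hypothetical rows in the same column layout as the Google Play data
toy_rows = [
    ['Big Camera', 'PHOTOGRAPHY', '', '', '', '100,000,000+'],
    ['Mid Editor', 'PHOTOGRAPHY', '', '', '', '5,000,000+'],
    ['Tiny Filter', 'PHOTOGRAPHY', '', '', '', '10,000+'],
    ['Mid Chat', 'SOCIAL', '', '', '', '10,000,000+'],
]
print(in_install_range(toy_rows, 'PHOTOGRAPHY', 1_000_000, 100_000_000))
```

A range check keeps working if new interval labels appear in the data, whereas a list of labels has to be updated by hand.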
This list consists mainly of camera apps and photo editing or organizing apps, with a few photo-sharing (social) apps. It may not be a good idea to build an editing app since there will be significant competition. We may have success with an app that combines photo editing and social media, but giants like Instagram and Snapchat already dominate this market and would be very difficult to compete with.
The game genre is second on this list, but previously we found out this part of the market seems very saturated on the Apple Store, so we'd like to come up with a different app recommendation if possible as we are looking to recommend an app which has the potential to be successful in both the Apple and the Android markets.
The 'COMMUNICATION' and 'SOCIAL' categories remain relatively high on this list of average installs with the larger apps removed. Some popular 'SOCIAL' apps could equally have been classified as 'COMMUNICATION', and 'Social Networking' was a potentially promising genre in the Apple Store market. We will therefore look at the 'SOCIAL' category in more detail by again isolating 'moderately' popular apps, whilst also listing the apps categorized as 'COMMUNICATION' with a very large number of installs:
# COMMUNICATION apps with a very large number of installs
for app in free_english_google_cleaned:
    if app[1] == 'COMMUNICATION' and app[5] in ('1,000,000,000+', '500,000,000+', '100,000,000+'):
        print(app[0], ':', app[5])
print('\n')
# 'moderately' popular SOCIAL apps
for app in free_english_google_cleaned:
    if app[1] == 'SOCIAL' and app[5] in ('1,000,000+', '5,000,000+', '10,000,000+', '50,000,000+'):
        print(app[0], ':', app[5])
WhatsApp Messenger : 1,000,000,000+ imo beta free calls and text : 100,000,000+ Android Messages : 100,000,000+ Google Duo - High Quality Video Calls : 500,000,000+ Messenger – Text and Video Chat for Free : 1,000,000,000+ imo free video calls and chat : 500,000,000+ Skype - free IM & video calls : 1,000,000,000+ Who : 100,000,000+ GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+ LINE: Free Calls & Messages : 500,000,000+ Google Chrome: Fast & Secure : 1,000,000,000+ Firefox Browser fast & private : 100,000,000+ UC Browser - Fast Download Private & Secure : 500,000,000+ Gmail : 1,000,000,000+ Hangouts : 1,000,000,000+ Messenger Lite: Free Calls & Messages : 100,000,000+ Kik : 100,000,000+ KakaoTalk: Free Calls & Text : 100,000,000+ Opera Mini - fast web browser : 100,000,000+ Opera Browser: Fast and Secure : 100,000,000+ Telegram : 100,000,000+ Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+ UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+ Viber Messenger : 500,000,000+ WeChat : 100,000,000+ Yahoo Mail – Stay Organized : 100,000,000+ BBM - Free Calls & Messages : 100,000,000+ TextNow - free text + calls : 10,000,000+ The Messenger App : 1,000,000+ Messenger Pro : 1,000,000+ Free Messages, Video, Chat,Text for Messenger Plus : 1,000,000+ Telegram X : 5,000,000+ Jodel - The Hyperlocal App : 1,000,000+ Hide Something - Photo, Video : 5,000,000+ Love Sticker : 1,000,000+ Web Browser & Fast Explorer : 5,000,000+ LiveMe - Video chat, new friends, and make money : 10,000,000+ VidStatus app - Status Videos & Status Downloader : 5,000,000+ Love Images : 1,000,000+ SPARK - Live random video chat & meet new people : 5,000,000+ Facebook Local : 1,000,000+ Meet – Talk to Strangers Using Random Video Chat : 5,000,000+ MobilePatrol Public Safety App : 1,000,000+ 💘 WhatsLov: Smileys of love, stickers and GIF : 1,000,000+ HTC Social Plugin - Facebook : 10,000,000+ Quora : 10,000,000+ Kate Mobile for VK : 10,000,000+ Family GPS tracker KidControl 
+ GPS by SMS Locator : 1,000,000+ Moment : 1,000,000+ Text Me: Text Free, Call Free, Second Phone Number : 10,000,000+ Text Free: WiFi Calling App : 5,000,000+ Text free - Free Text + Call : 10,000,000+ ooVoo Video Calls, Messaging & Stories : 50,000,000+ Whisper : 5,000,000+ Blogger : 5,000,000+ TwitCasting Live : 1,000,000+ YouNow: Live Stream Video Chat : 10,000,000+ Banjo : 1,000,000+ We Heart It : 10,000,000+ MeetMe: Chat & Meet New People : 50,000,000+ Timehop : 5,000,000+ Frontback - Social Photos : 1,000,000+ Path : 10,000,000+ SayHi Chat, Meet New People : 10,000,000+ Tapatalk - 100,000+ Forums : 10,000,000+ Couple - Relationship App : 1,000,000+ Nextdoor - Local neighborhood news & classifieds : 5,000,000+ LOVOO : 10,000,000+ Jaumo Dating, Flirt & Live Video : 10,000,000+ Zello PTT Walkie Talkie : 50,000,000+ textPlus: Free Text & Calls : 10,000,000+ magicApp Calling & Messaging : 10,000,000+ Dating App, Flirt & Chat : W-Match : 10,000,000+ Meetup : 5,000,000+ POF Free Dating App : 50,000,000+ Tagged - Meet, Chat & Dating : 10,000,000+ SKOUT - Meet, Chat, Go Live : 50,000,000+ Mico- Stranger Chat Random video Chat, Live, Meet : 10,000,000+ Waplog - Free Chat, Dating App, Meet Singles : 10,000,000+ B-Messenger Video Chat : 1,000,000+ Instachat 😜 : 5,000,000+ Fame Boom for Real Followers, Likes : 5,000,000+ FollowMeter for Instagram : 1,000,000+ pixiv : 1,000,000+ U LIVE – Video Chat & Stream : 1,000,000+ VMate Lite - Funny Short Videos Social Network : 1,000,000+ Legend - Animate Text in Video : 10,000,000+ GUYZ - Gay Chat & Gay Dating : 1,000,000+ Snaappy – 3D fun AR core communication platform : 1,000,000+ Find My Friends : 10,000,000+ Grindr - Gay chat : 10,000,000+ Lesbian Chat & Dating - SPICY : 1,000,000+ BOO! 
- Next Generation Messenger : 1,000,000+ Wishbone - Compare Anything : 1,000,000+ Fiesta by Tango - Find, Meet and Make New Friends : 1,000,000+ Periscope - Live Video : 10,000,000+ Free phone calls, free texting SMS on free number : 10,000,000+ Phone Tracker : Family Locator : 10,000,000+ HOLLA Live: Meet New People via Random Video Chat : 5,000,000+ +Download 4 Instagram Twitter : 1,000,000+ Hornet - Gay Social Network : 5,000,000+ Amino: Communities and Chats : 10,000,000+ EZ Video Download for Facebook : 1,000,000+ Messages, Text and Video Chat for Messenger : 10,000,000+ All Social Networks : 1,000,000+ Messenger Messenger : 10,000,000+ Facebook Creator : 1,000,000+ Friendly for Facebook : 1,000,000+ Faster for Facebook Lite : 1,000,000+ Messenger : 10,000,000+ Who Viewed My Facebook Profile - Stalkers Visitors : 5,000,000+ Stickers for Facebook : 1,000,000+ FunForMobile Ringtones & Chat : 5,000,000+ Frim: get new friends on local chat rooms : 5,000,000+
This list contains a variety of dating apps, as well as apps built as 'add-ons' or enhancements to the well-known, very popular social apps. We can also see that a number of very successful apps in the 'COMMUNICATION' category could equally well be categorized as 'SOCIAL'.
There is competition in the middle market, but there may be potential in both markets for a new dating app which stands out somehow. We have seen that the Games genre is saturated on the Apple Store, but perhaps there is a niche for a 'gamified' dating app. The Book genre is quite high up the ranked list of average ratings of 'smaller' apps in the Apple Store, so perhaps we could create a dating app which connects users based upon shared literature preferences, or a dating app targeted at a religious demographic as an offshoot of the very popular existing reference apps for the Bible and the Quran on the Apple Store.
In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.
We concluded that creating a dating app with a unique selling point or specific focus could be profitable for both the Google Play and the App Store markets. The markets are already full of dating apps, so we need to add some special features or make the app specific to people of a certain demographic or with certain shared interests. However, this may limit our market size, so alternatively we could create a 'gamified' dating app that leverages the high popularity of both the game and the social genres in both markets.