As a data scientist at a company that builds apps for the Apple Store and Google Play markets, our goal is to help developers make data-driven decisions about new app ideas.
In this project our purpose is to build a profitable app that is free to download and install. Our main revenue comes from in-app ads. That's why we aim to reach as many users as possible and want them to spend more time in our app: the more users we have, the more of them will see and engage with the ads.
To find out what kind of apps might interest users, we need to analyze data that helps our developers understand which types of apps attract more users.
For this project we're going to study Apple Store and Google Play data to find the best direction for a new app. We want to make our app available on both markets.
We will focus on free, English-language apps.
In this task we're going to use a sample of data that is already available online.
Fortunately, data is available for both platforms.
In the code below we open each CSV file, read it with csv.reader, store all rows in list_data, and split off the header row. After these steps we want to find out which columns may be interesting for our analysis.
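from csv import reader
def open_file(dataset, has_header=True):
    # Read a CSV file; return (header, rows) if it has a header, else all rows
    # newline='' is what the csv module documentation recommends for reading
    with open(dataset, encoding='utf8', newline='') as opened_file:
        list_data = list(reader(opened_file))
    if has_header:
        return list_data[0], list_data[1:]
    else:
        # without a header the first row is data too, so keep every row
        return list_data
ios_header, ios = open_file('AppleStore.csv', has_header=True)
android_header, android = open_file('googleplaystore.csv', has_header=True)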
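The open_file helper reads a file from disk. To experiment without the CSV files, the same header/rows split can be sketched on an in-memory string with io.StringIO. The function name read_rows and the sample rows below are hypothetical and made up for illustration:

```python
import csv
import io


def read_rows(text, has_header=True):
    """Parse CSV text; return (header, rows) or, without a header, all rows."""
    # csv.reader accepts any iterable of lines, such as a StringIO object
    data = list(csv.reader(io.StringIO(text)))
    if has_header:
        return data[0], data[1:]
    return data


sample = "id,track_name,price\n1,Example App,0.0\n2,Other App,1.99\n"
header, rows = read_rows(sample)
print(header)  # ['id', 'track_name', 'price']
print(rows)    # [['1', 'Example App', '0.0'], ['2', 'Other App', '1.99']]
```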
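The function explore_data lets us explore our datasets from different perspectives. It prints the rows between start and end and, if rows_and_columns is True, also prints the number of rows and columns: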
def explore_data(dataset, start, end, rows_and_columns=False):
    # Print the rows in dataset[start:end], separated by blank lines
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
    # Optionally print the dataset dimensions
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
print(ios_header)
print('\n')
explore_data(ios, 2, 5, rows_and_columns=True)
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'] ['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1'] ['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1'] Number of rows: 7197 Number of columns: 16
After a quick glance at the header, we can assume that we might be interested in columns such as:
For more detailed info about columns check here.
print(android_header)
print('\n')
explore_data(android, 2, 5, rows_and_columns=True)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'] ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'] Number of rows: 10841 Number of columns: 13
After a quick glance at the header, we can assume that we might be interested in columns such as:
For more detailed info about columns check here.
The outputs above show that the two datasets have:
1. iOS apps:
* 16 columns
* 7197 rows
2. Android apps:
* 13 columns
* 10841 rows
So we might think that there are many more apps on the Google Play market and that we should build our app for that platform. The discrepancy might come from:
Before starting the analysis we need to check that our data is clean. It should have no errors, and all data points should meet our requirements.
The function below checks whether all rows of a dataset have the same length. More precisely, it compares the length of each row to the length of the header. It returns the 1-based position and the contents of the first row that does not match, or None if every row matches.
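def length_row(header, dataset):
    # Return (row_number, row) for the first row whose length differs from
    # the header; row_number is 1-based. Returns None if all rows match.
    header_length = len(header)
    count = 0
    for row in dataset:
        count += 1
        if header_length != len(row):
            return count, row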
ios_error = length_row(ios_header, ios)
android_error_row, android_error_line = length_row(android_header, android)
print(ios_error)
None
As we can see, our iOS dataset doesn't have this length problem.
Below we print the number of the faulty row and the row itself, followed by the header and another row for comparison. This helps us see where the problem is.
print(android_error_row)
print(android_error_line)
print()
print(android_header)
print()
print(android[0])
10473 ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up'] ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
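Now we can see that the Google Play dataset has a missing cell. The row has no value for the 'Category' column (index 1), so all of its following values are shifted one column to the left. Because length_row counts rows from 1, row 10473 sits at index 10472 of the android list. We delete the row at that index to avoid any problems.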
print(len(android))
del android[10472]  # run this only once: every run deletes whatever row is at index 10472
print(len(android))
10841 10840
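There are two weaknesses in the approach above. length_row stops at the first malformed row, and del android[10472] removes a row by a hard-coded position, so running that cell twice would silently delete a valid app. A more defensive sketch collects every row whose length differs from the header and filters those rows out instead of deleting by index. The helpers bad_rows and drop_bad_rows are hypothetical, and the toy rows are made up:

```python
def bad_rows(header, dataset):
    """Return (index, row) pairs for rows whose length differs from the header."""
    return [(i, row) for i, row in enumerate(dataset) if len(row) != len(header)]


def drop_bad_rows(header, dataset):
    """Return a new list keeping only rows with as many cells as the header."""
    return [row for row in dataset if len(row) == len(header)]


header = ['App', 'Category', 'Rating']
data = [['A', 'GAME', '4.5'], ['B', '4.1'], ['C', 'TOOLS', '3.9']]
print(bad_rows(header, data))            # [(1, ['B', '4.1'])]
print(len(drop_bad_rows(header, data)))  # 2
```

drop_bad_rows builds a new list, so calling it twice gives the same result as calling it once.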
Before moving on we need to make sure that our datasets contain no duplicates. We want to keep just one entry per app; otherwise duplicates could mislead our conclusions.
Below we wrote a function that checks whether our datasets contain duplicate apps, and how many.
def if_duplicate(dataset, index):
    # Split the app names found at `index` into first occurrences and repeats
    unique_apps = []
    duplicate_apps = []
    for app in dataset:
        name = app[index]
        if name in unique_apps:
            duplicate_apps.append(name)
        else:
            unique_apps.append(name)
    return unique_apps, duplicate_apps
ios_unique, ios_duplicate = if_duplicate(ios, 1)
android_unique, android_duplicate = if_duplicate(android, 0)
print('Number of unique iOS apps is ', len(ios_unique))
print('Number of duplicate iOS apps is ', len(ios_duplicate))
print('\n')
print('Number of unique Android apps is ', len(android_unique))
print('Number of duplicate Android apps is ', len(android_duplicate))
Number of unique iOS apps is 7195 Number of duplicate iOS apps is 2 Number of unique Android apps is 9659 Number of duplicate Android apps is 1181
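if_duplicate checks membership in a list, which scans the whole list on every lookup. Only names are compared, so a set gives the same counts with constant-time lookups on average. Here is a minimal sketch; the name find_duplicates and the toy rows are ours:

```python
def find_duplicates(dataset, index):
    """Return (unique_names, duplicate_names) for the column at `index`."""
    seen = set()
    unique_names = []
    duplicate_names = []
    for row in dataset:
        name = row[index]
        if name in seen:
            duplicate_names.append(name)  # every repeat after the first occurrence
        else:
            seen.add(name)
            unique_names.append(name)  # keeps first-seen order, like if_duplicate
    return unique_names, duplicate_names


rows = [['Duolingo'], ['ESPN'], ['Duolingo'], ['Duolingo']]
unique, dupes = find_duplicates(rows, 0)
print(len(unique), len(dupes))  # 2 2
```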
These duplicates should be removed, but we want to choose which copy is useless or less useful. For this reason we are going to inspect the rows of duplicate apps to find a good criterion for deleting or keeping them.
First of all, we are going to find the most frequent duplicates. More occurrences give us a better view of how the values differ.
For this we create a frequency table with the app name as the key and its frequency as the value. Then we print the apps that occur 7 or more times in the Google Play dataset.
duplicates_freq = {}
for app in android:
    name = app[0]
    if name not in duplicates_freq:
        duplicates_freq[name] = 1
    else:
        duplicates_freq[name] += 1
# print the apps that occur 7 or more times
for app in duplicates_freq:
    if duplicates_freq[app] >= 7:
        print(app)
Duolingo: Learn Languages Free ROBLOX Candy Crush Saga 8 Ball Pool ESPN CBS Sports App - Scores, News, Stats & Watch Live
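The frequency table built above by hand is what collections.Counter from the standard library produces. Here is a sketch on toy rows (not the real dataset):

```python
from collections import Counter

rows = [['ROBLOX'], ['ESPN'], ['ROBLOX'], ['8 Ball Pool'], ['ROBLOX']]
# Count how many rows share each app name (column 0)
name_counts = Counter(row[0] for row in rows)

# Names that appear at least `threshold` times, as in the loop above
threshold = 2
frequent = [name for name, n in name_counts.items() if n >= threshold]
print(frequent)                    # ['ROBLOX']
print(name_counts.most_common(1))  # [('ROBLOX', 3)]
```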
In the code below we print all rows for the app named Duolingo: Learn Languages Free.
print(android_header)
for app in android:
    name = app[0]
    if name == 'Duolingo: Learn Languages Free':
        print('\n')
        print(app)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] ['Duolingo: Learn Languages Free', 'EDUCATION', '4.7', '6289924', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Education;Education', 'August 1, 2018', 'Varies with device', 'Varies with device'] ['Duolingo: Learn Languages Free', 'EDUCATION', '4.7', '6290507', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Education;Education', 'August 1, 2018', 'Varies with device', 'Varies with device'] ['Duolingo: Learn Languages Free', 'EDUCATION', '4.7', '6290507', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Education;Education', 'August 1, 2018', 'Varies with device', 'Varies with device'] ['Duolingo: Learn Languages Free', 'EDUCATION', '4.7', '6290507', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Education;Education', 'August 1, 2018', 'Varies with device', 'Varies with device'] ['Duolingo: Learn Languages Free', 'FAMILY', '4.7', '6294400', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Education;Education', 'August 1, 2018', 'Varies with device', 'Varies with device'] ['Duolingo: Learn Languages Free', 'FAMILY', '4.7', '6294397', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Education;Education', 'August 1, 2018', 'Varies with device', 'Varies with device'] ['Duolingo: Learn Languages Free', 'FAMILY', '4.7', '6297590', 'Varies with device', '100,000,000+', 'Free', '0', 'Everyone', 'Education;Education', 'August 6, 2018', 'Varies with device', 'Varies with device']
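We have 7 occurrences with small but important differences. The number of reviews ranges from 6289924 to 6297590, the category is sometimes EDUCATION and sometimes FAMILY, and one entry was last updated on August 6, 2018 instead of August 1, 2018.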
This explains why we cannot delete duplicates at random. In this case we are clearly dealing with the same app. The entries may come from different versions, or this may simply be a data scraping issue (the data may have been collected at different times).
That's why it is more relevant to keep the entry with the highest number of reviews.
print(ios_duplicate)
['Mannequin Challenge', 'VR Roller Coaster']
The iOS dataset has only two duplicates.
for app in ios:
    name = app[1]
    if name == 'VR Roller Coaster':
        print('\n')
        print(app)
print('\n')
print(ios_header)
['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1'] ['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0.0', '67', '44', '3.5', '4.0', '0.81', '4+', 'Games', '38', '0', '1', '1'] ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
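The main difference between the duplicate apps lies in the total rating count (rating_count_tot) and the rating count for the current version (rating_count_ver). For this reason we use the total rating count, our equivalent of a review count, as the criterion for removing the iOS duplicates.
To do so, we build a dictionary that maps each app name to the highest review count found among its rows. We then check that its length matches the expected number of unique apps: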
def highest_review_app(dataset, ind_app, ind_review):
    # Map each app name to the highest review count among its rows
    max_reviews = {}
    for row in dataset:
        app = row[ind_app]
        review = float(row[ind_review])
        if app in max_reviews and max_reviews[app] < review:
            max_reviews[app] = review
        elif app not in max_reviews:
            max_reviews[app] = review
    return max_reviews
ios_max_review = highest_review_app(ios, 1, 5)
android_max_review = highest_review_app(android, 0, 3)
print('Expected Google Market dataset length : ', len(android) - len(android_duplicate))
print('Actual Google Market dataset length : ', len(android_max_review))
print('\n')
print('Expected Apple Store dataset length : ', len(ios) - len(ios_duplicate))
print('Actual Apple Store dataset length : ', len(ios_max_review))
Expected Google Market dataset length : 9659 Actual Google Market dataset length : 9659 Expected Apple Store dataset length : 7195 Actual Apple Store dataset length : 7195
The function below goes through the dataset and keeps a row only if its review count equals the maximum that highest_review_app found for that app. We also need an already_added list. Several rows can share both the same name and the same maximum number of reviews, and without this check we could still end up with duplicates.
def cleaning(dataset, ind_app, ind_review):
    # Keep, for each app name, the first row that has the maximum review count
    apps_clean = []
    already_added = []
    max_review = highest_review_app(dataset, ind_app, ind_review)
    for app in dataset:
        name = app[ind_app]
        n_review = float(app[ind_review])
        if (name not in already_added) and (n_review == max_review[name]):
            apps_clean.append(app)
            already_added.append(name)
    return apps_clean
android_clean = cleaning(android, 0, 3)
explore_data(android_clean, 0, 3, True)
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'] Number of rows: 9659 Number of columns: 13
ios_clean = cleaning(ios, 1, 5)
explore_data(ios_clean, 0, 3, True)
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'] ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'] ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'] Number of rows: 7195 Number of columns: 16
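cleaning makes two passes, one in highest_review_app and one to pick rows, and needs the already_added list to break ties. The same rule can be sketched in a single pass with a dictionary: keep the first row with the highest review count for each name. The name dedupe_by_reviews is hypothetical and not part of the notebook, and the toy rows are made up. Row order can differ slightly from cleaning because rows here follow the first appearance of each name:

```python
def dedupe_by_reviews(dataset, ind_app, ind_review):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # name -> row with the most reviews seen so far
    for row in dataset:
        name = row[ind_app]
        reviews = float(row[ind_review])
        # strict '>' keeps the earliest row when review counts tie
        if name not in best or reviews > float(best[name][ind_review]):
            best[name] = row
    return list(best.values())  # dicts preserve insertion order (Python 3.7+)


rows = [
    ['Duolingo', 'EDUCATION', '6289924'],
    ['Duolingo', 'FAMILY', '6297590'],
    ['Duolingo', 'FAMILY', '6297590'],
    ['ESPN', 'SPORTS', '521138'],
]
print(dedupe_by_reviews(rows, 0, 2))
# [['Duolingo', 'FAMILY', '6297590'], ['ESPN', 'SPORTS', '521138']]
```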
Quick reminder: our goal is to create an app for English-speaking users. This is why our second step is removing non-English apps from the datasets. Unfortunately, we cannot simply delete every app whose name contains non-ASCII characters.
Some apps contain non-ASCII characters but are still English apps. For example, app names may contain emojis or symbols such as ™ that fall outside the ASCII range (code points 0 to 127), as their ord() values show:
print(ord('™'))
print(ord('😜'))
8482 128540
In our case we treat an app name as non-English if it contains 3 or more non-ASCII characters. Names shorter than 3 characters can never reach that threshold, so we also reject names made up entirely of non-ASCII characters.
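def is_english(string):
    # Count characters outside the ASCII range (code points above 127)
    non_ascii = 0
    for ch in string:
        if ord(ch) > 127:
            non_ascii += 1
    # Reject names with 3+ non-ASCII characters or with no ASCII characters at all
    if non_ascii == len(string) or non_ascii >= 3:
        return False
    else:
        return True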
print(is_english('Instachat'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
True True False
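The same rule can be written more compactly with a generator expression: reject a name when it has 3 or more non-ASCII characters or consists only of non-ASCII characters. The name is_english_compact is illustrative. It is meant to behave exactly like is_english above, including returning False for an empty string:

```python
def is_english_compact(name, max_non_ascii=2):
    """Heuristic: treat `name` as English unless too many chars are non-ASCII."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    # at most `max_non_ascii` non-ASCII chars, and at least one ASCII char
    return non_ascii <= max_non_ascii and non_ascii != len(name)


for title in ['Instachat', 'Docs To Go™ Free Office Suite', '爱奇艺PPS -《欢乐颂2》电视剧热播']:
    print(title, is_english_compact(title))
```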
Now, using this function, we can go through our datasets and filter out non-English apps (by our definition), keeping only the English ones.
android_en = []
ios_en = []
for app in android_clean:
    name = app[0]  # app name column in the Google Play data
    if is_english(name):
        android_en.append(app)
for app in ios_clean:
    name = app[1]  # track_name column in the iOS data
    if is_english(name):
        ios_en.append(app)
explore_data(android_en, 0, 3, True)
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'] Number of rows: 9597 Number of columns: 13
explore_data(ios_en, 0, 3, True)
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'] ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'] ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'] Number of rows: 6147 Number of columns: 16
As mentioned above, we're planning to build only free apps, so for our decision-making process we keep only free apps.
In the code below we create new lists that contain only free apps.
android_final = []
ios_final = []
for app in android_en:
    price = app[7]  # Google Play stores free prices as '0'
    if price == '0':
        android_final.append(app)
for app in ios_en:
    price = app[4]  # the Apple Store data stores free prices as '0.0'
    if price == '0.0':
        ios_final.append(app)
print(len(android_final))
print(len(ios_final))
8848 3196
We're left with 8848 Android apps and 3196 iOS apps.
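The two stores encode a zero price differently ('0' on Google Play, '0.0' on the Apple Store), which is why the loops above compare against different strings. A single numeric check can be sketched with a hypothetical helper is_free. It assumes prices are plain numbers, optionally prefixed with '$' as we assume Google Play writes paid prices (e.g. '$4.99'):

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$0.00' equals zero.

    Assumes the string is a number, optionally prefixed with '$'.
    """
    return float(price.lstrip('$')) == 0.0


for p in ['0', '0.0', '$4.99', '2.99']:
    print(p, is_free(p))
```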
After the cleaning process we can study our data to find the category that is presumably the most profitable to develop. Our low-risk strategy is as follows:
We will study both datasets because we plan to release on both markets and need to find the most successful niche.
Our next step will be discovering the most popular genres within each dataset. In the code below we are going to create a frequency table for each genre.
def freq_table(dataset, index):
    table = {}
    count = 0
    # creating a frequency table
    for row in dataset:
        elem = row[index]
        count += 1
        if elem in table:
            table[elem] += 1
        else:
            table[elem] = 1
    # converting each count into a percentage of all apps
    for elem in table:
        percentage = (table[elem] / count) * 100
        table[elem] = percentage
    return table
print(freq_table(ios_final, -5))
{'Social Networking': 3.254067584480601, 'Photo & Video': 5.006257822277847, 'Games': 58.260325406758454, 'Music': 2.065081351689612, 'Reference': 0.5319148936170213, 'Health & Fitness': 2.033792240300375, 'Weather': 0.8760951188986232, 'Utilities': 2.471839799749687, 'Travel': 1.2202753441802252, 'Shopping': 2.5969962453066335, 'News': 1.3454317897371715, 'Navigation': 0.18773466833541927, 'Lifestyle': 1.5644555694618274, 'Entertainment': 7.853566958698373, 'Food & Drink': 0.8135168961201502, 'Sports': 2.1589486858573217, 'Book': 0.37546933667083854, 'Finance': 1.0951188986232792, 'Education': 3.6921151439299122, 'Productivity': 1.7521902377972465, 'Business': 0.5319148936170213, 'Catalogs': 0.1251564455569462, 'Medical': 0.18773466833541927}
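These results are hard to read, so we sort the genres/categories by percentage in descending order.
To do so, the display_table function turns the frequency table into a list of (percentage, genre) tuples, sorts it in descending order and prints each genre with its percentage: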
def display_table(dataset, index):
    # Print the frequency table of column `index`, largest percentage first
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        val_to_append = (table[key], key)  # percentage first so it drives the sort
        table_display.append(val_to_append)
    table_sort = sorted(table_display, reverse=True)
    for elem in table_sort:
        print(elem[1], ' : ', elem[0])
display_table(ios_final, -5)  # exploring prime_genre; the function prints and returns None
Games : 58.260325406758454 Entertainment : 7.853566958698373 Photo & Video : 5.006257822277847 Education : 3.6921151439299122 Social Networking : 3.254067584480601 Shopping : 2.5969962453066335 Utilities : 2.471839799749687 Sports : 2.1589486858573217 Music : 2.065081351689612 Health & Fitness : 2.033792240300375 Productivity : 1.7521902377972465 Lifestyle : 1.5644555694618274 News : 1.3454317897371715 Travel : 1.2202753441802252 Finance : 1.0951188986232792 Weather : 0.8760951188986232 Food & Drink : 0.8135168961201502 Reference : 0.5319148936170213 Business : 0.5319148936170213 Book : 0.37546933667083854 Navigation : 0.18773466833541927 Medical : 0.18773466833541927 Catalogs : 0.1251564455569462
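display_table sorts (percentage, genre) tuples so that the percentage drives the order. An equivalent sketch sorts the dictionary items directly with a key function. sorted_freq is a hypothetical helper and the toy table is made up:

```python
def sorted_freq(table):
    """Return (key, value) pairs ordered by value, largest first."""
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


table = {'Music': 2.07, 'Games': 58.26, 'Entertainment': 7.85}
for genre, pct in sorted_freq(table):
    print(genre, ':', round(pct, 2))
```

The two versions differ only on ties. display_table orders equal percentages by genre name in reverse alphabetical order (Reference before Business above). This version keeps tied genres in their original order, because sorted is stable.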
Observations on the iOS dataset:
- Games is by far the largest category, with more than half of the apps (58.26%)
- Entertainment comes second with almost 8%
- Photo & Video reaches 5%
- Education reaches 3.69%
This shows that most apps in our dataset are designed for fun rather than for practical purposes (education, productivity, utilities, weather, business, etc.).
In this dataset app categories are far from evenly distributed.
Nevertheless, given this information, we cannot assume that these apps have the greatest number of users: demand may not match supply.
display_table(android_final, -4)  # exploring Genres
Tools : 8.44258589511754 Entertainment : 6.080470162748644 Education : 5.357142857142857 Business : 4.599909584086799 Productivity : 3.899186256781193 Lifestyle : 3.8765822784810124 Finance : 3.7070524412296564 Medical : 3.5375226039783 Sports : 3.4584086799276674 Personalization : 3.322784810126582 Communication : 3.2323688969258586 Action : 3.096745027124774 Health & Fitness : 3.0854430379746836 Photography : 2.949819168173599 News & Magazines : 2.802893309222423 Social : 2.667269439421338 Travel & Local : 2.328209764918626 Shopping : 2.2490958408679926 Books & Reference : 2.1360759493670884 Simulation : 2.0456600361663653 Dating : 1.8648282097649187 Arcade : 1.842224231464738 Video Players & Editors : 1.7744122965641953 Casual : 1.763110307414105 Maps & Navigation : 1.3901446654611211 Food & Drink : 1.2432188065099457 Puzzle : 1.1301989150090417 Racing : 0.9945750452079566 Role Playing : 0.9380650994575045 Libraries & Demo : 0.9380650994575045 Auto & Vehicles : 0.9267631103074141 Strategy : 0.9154611211573236 House & Home : 0.8024412296564195 Weather : 0.7911392405063291 Events : 0.7120253164556962 Adventure : 0.6668173598553345 Comics : 0.599005424954792 Beauty : 0.599005424954792 Art & Design : 0.599005424954792 Parenting : 0.4972875226039783 Card : 0.45207956600361665 Trivia : 0.4181735985533454 Casino : 0.4181735985533454 Educational;Education : 0.39556962025316456 Board : 0.3842676311030741 Educational : 0.3729656419529837 Education;Education : 0.33905967450271246 Word : 0.25994575045207957 Casual;Pretend Play : 0.23734177215189875 Music : 0.2034358047016275 Racing;Action & Adventure : 0.16952983725135623 Puzzle;Brain Games : 0.16952983725135623 Entertainment;Music & Video : 0.16952983725135623 Casual;Brain Games : 0.13562386980108498 Casual;Action & Adventure : 0.13562386980108498 Arcade;Action & Adventure : 0.12432188065099457 Action;Action & Adventure : 0.10171790235081375 Educational;Pretend Play : 0.09041591320072333 Simulation;Action & Adventure : 
0.07911392405063292 Parenting;Education : 0.07911392405063292 Entertainment;Brain Games : 0.07911392405063292 Board;Brain Games : 0.07911392405063292 Parenting;Music & Video : 0.06781193490054249 Educational;Brain Games : 0.06781193490054249 Casual;Creativity : 0.06781193490054249 Art & Design;Creativity : 0.06781193490054249 Education;Pretend Play : 0.05650994575045208 Role Playing;Pretend Play : 0.045207956600361664 Education;Creativity : 0.045207956600361664 Role Playing;Action & Adventure : 0.033905967450271246 Puzzle;Action & Adventure : 0.033905967450271246 Entertainment;Creativity : 0.033905967450271246 Entertainment;Action & Adventure : 0.033905967450271246 Educational;Creativity : 0.033905967450271246 Educational;Action & Adventure : 0.033905967450271246 Education;Music & Video : 0.033905967450271246 Education;Brain Games : 0.033905967450271246 Education;Action & Adventure : 0.033905967450271246 Adventure;Action & Adventure : 0.033905967450271246 Video Players & Editors;Music & Video : 0.022603978300180832 Sports;Action & Adventure : 0.022603978300180832 Simulation;Pretend Play : 0.022603978300180832 Puzzle;Creativity : 0.022603978300180832 Music;Music & Video : 0.022603978300180832 Entertainment;Pretend Play : 0.022603978300180832 Casual;Education : 0.022603978300180832 Board;Action & Adventure : 0.022603978300180832 Video Players & Editors;Creativity : 0.011301989150090416 Trivia;Education : 0.011301989150090416 Travel & Local;Action & Adventure : 0.011301989150090416 Tools;Education : 0.011301989150090416 Strategy;Education : 0.011301989150090416 Strategy;Creativity : 0.011301989150090416 Strategy;Action & Adventure : 0.011301989150090416 Simulation;Education : 0.011301989150090416 Role Playing;Brain Games : 0.011301989150090416 Racing;Pretend Play : 0.011301989150090416 Puzzle;Education : 0.011301989150090416 Parenting;Brain Games : 0.011301989150090416 Music & Audio;Music & Video : 0.011301989150090416 Lifestyle;Pretend Play : 0.011301989150090416 
Lifestyle;Education : 0.011301989150090416 Health & Fitness;Education : 0.011301989150090416 Health & Fitness;Action & Adventure : 0.011301989150090416 Entertainment;Education : 0.011301989150090416 Communication;Creativity : 0.011301989150090416 Comics;Creativity : 0.011301989150090416 Casual;Music & Video : 0.011301989150090416 Card;Action & Adventure : 0.011301989150090416 Books & Reference;Education : 0.011301989150090416 Art & Design;Pretend Play : 0.011301989150090416 Art & Design;Action & Adventure : 0.011301989150090416 Arcade;Pretend Play : 0.011301989150090416 Adventure;Education : 0.011301989150090416
These percentages show that the Google Play market has more practical apps than fun apps.
The top of the list is led by Tools (8.44%), Entertainment (6.08%), Education (5.36%) and Business (4.60%).
The remaining practical apps are spread more evenly across the whole set of genres. This was not the case for iOS apps.
display_table(android_final, 1)  # exploring Category
FAMILY : 18.942133815551536 GAME : 9.697106690777577 TOOLS : 8.453887884267631 BUSINESS : 4.599909584086799 PRODUCTIVITY : 3.899186256781193 LIFESTYLE : 3.887884267631103 FINANCE : 3.7070524412296564 MEDICAL : 3.5375226039783 SPORTS : 3.390596745027125 PERSONALIZATION : 3.322784810126582 COMMUNICATION : 3.2323688969258586 HEALTH_AND_FITNESS : 3.0854430379746836 PHOTOGRAPHY : 2.949819168173599 NEWS_AND_MAGAZINES : 2.802893309222423 SOCIAL : 2.667269439421338 TRAVEL_AND_LOCAL : 2.3395117540687163 SHOPPING : 2.2490958408679926 BOOKS_AND_REFERENCE : 2.1360759493670884 DATING : 1.8648282097649187 VIDEO_PLAYERS : 1.7970162748643763 MAPS_AND_NAVIGATION : 1.3901446654611211 FOOD_AND_DRINK : 1.2432188065099457 EDUCATION : 1.164104882459313 ENTERTAINMENT : 0.9606690777576853 LIBRARIES_AND_DEMO : 0.9380650994575045 AUTO_AND_VEHICLES : 0.9267631103074141 HOUSE_AND_HOME : 0.8024412296564195 WEATHER : 0.7911392405063291 EVENTS : 0.7120253164556962 PARENTING : 0.6555153707052441 ART_AND_DESIGN : 0.6442133815551537 COMICS : 0.6103074141048824 BEAUTY : 0.599005424954792
From this breakdown we can assume that most apps were designed for practical purposes. Apart from GAME (about 10%), the top categories are dominated by productivity-oriented apps such as TOOLS, BUSINESS and PRODUCTIVITY.
However, the FAMILY category, the largest at about 19%, is rather vague. We are going to see what kind of apps it contains and which genres are assigned to it.
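In the code below we collect the genre of every app whose category is FAMILY and count how many times each genre occurs.
def category_genre(dataset, elem, ind_imp, ind_comp):
    # Count the values of column ind_comp among rows where column ind_imp == elem;
    # also return how many rows matched
    table = {}
    l_comp = []
    for row in dataset:
        if elem == row[ind_imp]:
            l_comp.append(row[ind_comp])
    for el in l_comp:
        if el in table:
            table[el] += 1
        else:
            table[el] = 1
    return table, len(l_comp)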
cat_to_gen, len_category = category_genre(android_final, 'FAMILY', 1, -4)
# Sort genres by count, largest first
genre_sort = []
for genre in cat_to_gen:
    freq_genre = (cat_to_gen[genre], genre)
    genre_sort.append(freq_genre)
genre_sort = sorted(genre_sort, reverse=True)
print('Google Play Market contains ', len_category, ' apps which category is FAMILY')
print()
for genre in genre_sort:
    print(genre[1], ' : ', genre[0])
Google Play Market contains 1676 apps which category is FAMILY Entertainment : 458 Education : 382 Simulation : 174 Casual : 134 Puzzle : 78 Role Playing : 72 Strategy : 66 Educational;Education : 35 Educational : 33 Education;Education : 24 Casual;Pretend Play : 21 Racing;Action & Adventure : 15 Puzzle;Brain Games : 15 Entertainment;Music & Video : 12 Casual;Action & Adventure : 12 Casual;Brain Games : 11 Arcade;Action & Adventure : 11 Educational;Pretend Play : 8 Action;Action & Adventure : 8 Simulation;Action & Adventure : 7 Board;Brain Games : 7 Entertainment;Brain Games : 6 Educational;Brain Games : 6 Casual;Creativity : 6 Role Playing;Pretend Play : 4 Education;Pretend Play : 4 Role Playing;Action & Adventure : 3 Puzzle;Action & Adventure : 3 Entertainment;Action & Adventure : 3 Educational;Creativity : 3 Educational;Action & Adventure : 3 Education;Music & Video : 3 Education;Action & Adventure : 3 Adventure;Action & Adventure : 3 Sports;Action & Adventure : 2 Simulation;Pretend Play : 2 Puzzle;Creativity : 2 Music;Music & Video : 2 Entertainment;Pretend Play : 2 Entertainment;Creativity : 2 Education;Creativity : 2 Casual;Education : 2 Board;Action & Adventure : 2 Art & Design;Creativity : 2 Video Players & Editors;Music & Video : 1 Trivia;Education : 1 Strategy;Education : 1 Strategy;Creativity : 1 Strategy;Action & Adventure : 1 Simulation;Education : 1 Role Playing;Brain Games : 1 Racing;Pretend Play : 1 Puzzle;Education : 1 Music & Audio;Music & Video : 1 Lifestyle;Education : 1 Health & Fitness;Education : 1 Health & Fitness;Action & Adventure : 1 Entertainment;Education : 1 Education;Brain Games : 1 Communication;Creativity : 1 Casual;Music & Video : 1 Card;Action & Adventure : 1 Books & Reference;Education : 1 Art & Design;Pretend Play : 1 Art & Design;Action & Adventure : 1 Arcade;Pretend Play : 1 Adventure;Education : 1
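The count-then-sort steps above can also be sketched with collections.Counter.most_common, which returns the counts already sorted. genres_in_category is a hypothetical helper. The rows are toy data trimmed so that the category sits at index 1 and the genre at index -4, as in android_final; they are not real results:

```python
from collections import Counter


def genres_in_category(dataset, category, ind_cat=1, ind_genre=-4):
    """Count genres among rows whose category equals `category`."""
    return Counter(row[ind_genre] for row in dataset if row[ind_cat] == category)


# Toy rows: [App, Category, Genres, Last Updated, Current Ver, Android Ver]
rows = [
    ['A', 'FAMILY', 'Entertainment', 'd', 'v', 'a'],
    ['B', 'FAMILY', 'Education', 'd', 'v', 'a'],
    ['C', 'FAMILY', 'Entertainment', 'd', 'v', 'a'],
    ['D', 'GAME', 'Action', 'd', 'v', 'a'],
]
counts = genres_in_category(rows, 'FAMILY')
print(sum(counts.values()))  # 3 apps in FAMILY
for genre, n in counts.most_common():
    print(genre, ':', n)
```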
As expected, most of the apps in the FAMILY category are meant for fun. The Entertainment genre and various sub-types of entertainment and games make up the largest part of this category.
Nevertheless, Google Play apps are spread across categories in a more balanced way than in the iOS dataset.
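We need to find out which genres/categories are the most popular among users. For this we calculate the average number of installs for each app genre.
Our iOS dataset contains no information about the number of installs, so we use the rating_count_tot column (total number of user ratings) as a proxy.
In the code below we loop over every iOS genre, sum rating_count_tot for the apps of that genre, and divide by the number of such apps to get the average number of ratings: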
freq_genre_ios = freq_table(ios_final, -5)
for genre in freq_genre_ios:
    total = 0  # sum of rating_count_tot for this genre
    len_genre = 0  # number of apps in this genre
    for row in ios_final:
        genre_app = row[-5]
        if genre == genre_app:
            rating = float(row[5])
            total += rating
            len_genre += 1
    average_rating = total / len_genre
    print(genre, ' : ', average_rating)
Social Networking : 72916.54807692308 Photo & Video : 28441.54375 Games : 22935.43984962406 Music : 57326.530303030304 Reference : 79350.4705882353 Health & Fitness : 23298.015384615384 Weather : 52279.892857142855 Utilities : 19156.493670886077 Travel : 28964.05128205128 Shopping : 27230.734939759037 News : 21248.023255813954 Navigation : 86090.33333333333 Lifestyle : 16815.48 Entertainment : 14195.358565737051 Food & Drink : 33333.92307692308 Sports : 23008.898550724636 Book : 46384.916666666664 Finance : 32367.02857142857 Education : 7003.983050847458 Productivity : 21028.410714285714 Business : 7491.117647058823 Catalogs : 4004.0 Medical : 612.0
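The nested loop above rescans ios_final once per genre. A single-pass sketch instead keeps a running total and a count per genre in two dictionaries. average_by_group is a hypothetical name, and the toy rows keep only a genre and a rating count:

```python
def average_by_group(dataset, ind_group, ind_value):
    """Return {group: mean of float(row[ind_value])} in one pass over dataset."""
    totals = {}
    counts = {}
    for row in dataset:
        group = row[ind_group]
        totals[group] = totals.get(group, 0.0) + float(row[ind_value])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


rows = [['Navigation', '345046'], ['Navigation', '5'], ['Medical', '612']]
print(average_by_group(rows, 0, 1))
# {'Navigation': 172525.5, 'Medical': 612.0}
```

On the real data the call would be average_by_group(ios_final, -5, 5), matching the prime_genre and rating_count_tot indices used above.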
It seems that the apps that received the most ratings belong to Navigation, Reference, Social Networking, Music and Weather.
These genres are probably dominated by a few very large apps: Navigation by Waze, Google Maps, etc.; Social Networking by Facebook, Pinterest, Skype, Messenger, etc.; Music by Spotify, Shazam, Pandora, etc.
These few giants have a big impact on the averages of their genres.
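def most_comm(dataset, pattern):
    # Print the name and rating_count_tot of every app whose prime_genre is `pattern`
    # (indices 1, -5 and 5 follow the iOS header)
    for app in dataset:
        name = app[1]
        genre = app[-5]
        if genre == pattern:
            print(name, ':', app[5])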
print('Most common apps within Navigation category :')
most_comm(ios_final, 'Navigation')  # prints directly and returns None
Most common apps within Navigation category : Waze - GPS Navigation, Maps & Real-time Traffic : 345046 Google Maps - Navigation & Transit : 154911 Geocaching® : 12811 CoPilot GPS – Car Navigation & Offline Maps : 3582 ImmobilienScout24: Real Estate Search in Germany : 187 Railway Route Search : 5
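We can check how strongly two giants drive the Navigation average by comparing the mean and the median of the six rating_count_tot values printed above, copied by hand from that output. The median is our own illustration here, not a step of the original analysis:

```python
from statistics import mean, median

# rating_count_tot of the six free English Navigation apps listed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print(round(mean(navigation_ratings), 2))  # 86090.33, the genre average reported earlier
print(median(navigation_ratings))          # 8196.5
```

Waze and Google Maps alone push the mean about ten times above the median.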
print('Most common apps within Social Networking category')
most_comm(ios_final, 'Social Networking')
Most common apps within Social Networking category Facebook : 2974676 Pinterest : 1061624 Skype for iPhone : 373519 Messenger : 351466 Tumblr : 334293 WhatsApp Messenger : 287589 Kik : 260965 ooVoo – Free Video Call, Text and Voice : 177501 TextNow - Unlimited Text + Calls : 164963 Viber Messenger – Text & Call : 164249 Followers - Social Analytics For Instagram : 112778 MeetMe - Chat and Meet New People : 97072 We Heart It - Fashion, wallpapers, quotes, tattoos : 90414 InsTrack for Instagram - Analytics Plus More : 85535 Tango - Free Video Call, Voice and Chat : 75412 LinkedIn : 71856 Match™ - #1 Dating App. : 60659 Skype for iPad : 60163 POF - Best Dating App for Conversations : 52642 Timehop : 49510 Find My Family, Friends & iPhone - Life360 Locator : 43877 Whisper - Share, Express, Meet : 39819 Hangouts : 36404 LINE PLAY - Your Avatar World : 34677 WeChat : 34584 Badoo - Meet New People, Chat, Socialize. : 34428 Followers + for Instagram - Follower Analytics : 28633 GroupMe : 28260 Marco Polo Video Walkie Talkie : 27662 Miitomo : 23965 SimSimi : 23530 Grindr - Gay and same sex guys chat, meet and date : 23201 Wishbone - Compare Anything : 20649 imo video calls and chat : 18841 After School - Funny Anonymous School News : 18482 Quick Reposter - Repost, Regram and Reshare Photos : 17694 Weibo HD : 16772 Repost for Instagram : 15185 Live.me – Live Video Chat & Make Friends Nearby : 14724 Nextdoor : 14402 Followers Analytics for Instagram - InstaReport : 13914 YouNow: Live Stream Video Chat : 12079 FollowMeter for Instagram - Followers Tracking : 11976 LINE : 11437 eHarmony™ Dating App - Meet Singles : 11124 Discord - Chat for Gamers : 9152 QQ : 9109 Telegram Messenger : 7573 Weibo : 7265 Periscope - Live Video Streaming Around the World : 6062 Chat for Whatsapp - iPad Version : 5060 QQ HD : 5058 Followers Analysis Tool For Instagram App Free : 4253 live.ly - live video streaming : 4145 Houseparty - Group Video Chat : 3991 SOMA Messenger : 3232 Monkey : 3060 Down 
To Lunch : 2535 Flinch - Video Chat Staring Contest : 2134 Highrise - Your Avatar Community : 2011 LOVOO - Dating Chat : 1985 PlayStation®Messages : 1918 BOO! - Video chat camera with filters & stickers : 1805 Qzone : 1649 Chatous - Chat with new people : 1609 Kiwi - Q&A : 1538 GhostCodes - a discovery app for Snapchat : 1313 Jodel : 1193 FireChat : 1037 Google Duo - simple video calling : 1033 Fiesta by Tango - Chat & Meet New People : 885 Google Allo — smart messaging : 862 Peach — share vividly : 727 Hey! VINA - Where Women Meet New Friends : 719 Battlefield™ Companion : 689 All Devices for WhatsApp - Messenger for iPad : 682 Chat for Pokemon Go - GoChat : 500 IAmNaughty – Dating App to Meet New People Online : 463 Qzone HD : 458 Zenly - Locate your friends in realtime : 427 League of Legends Friends : 420 Candid - Speak Your Mind Freely : 398 Selfeo : 366 Fake-A-Location Free ™ : 354 Popcorn Buzz - Free Group Calls : 281 Fam — Group video calling for iMessage : 279 QQ International : 274 Ameba : 269 SoundCloud Pulse: for creators : 240 Tantan : 235 Cougar Dating & Life Style App for Mature Women : 213 Rawr Messenger - Dab your chat : 180 WhenToPost: Best Time to Post Photos for Instagram : 158 Inke—Broadcast an amazing life : 147 Mustknow - anonymous video Q&A : 53 CTFxCmoji : 39 Lobi : 36 Chain: Collaborate On MyVideo Story/Group Video : 35 botman - Real time video chat : 7 BestieBox : 0 MATCH ON LINE chat : 0 niconico ch : 0 LINE BLOG : 0 bit-tube - Live Stream Video Chat : 0 None
As for the Reference category, its result is heavily influenced by two apps: the Bible app and Dictionary.com.
print('Most common apps within Reference category')
print(most_comm(ios_final, 'Reference'))
Most common apps within Reference category Bible : 985920 Dictionary.com Dictionary & Thesaurus : 200047 Dictionary.com Dictionary & Thesaurus for iPad : 54175 Google Translate : 26786 Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418 New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588 Merriam-Webster Dictionary : 16849 Night Sky : 12122 City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535 LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693 GUNS MODS for Minecraft PC Edition - Mods Tools : 1497 Guides for Pokémon GO - Pokemon GO News and Cheats : 826 WWDC : 762 Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718 VPN Express : 14 Real Bike Traffic Rider Virtual Reality Glasses : 8 Jishokun-Japanese English Dictionary & Translator : 0 None
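The outputs above end with None. This suggests most_comm prints its results and returns nothing, so the surrounding print() then prints that None. Below is a minimal sketch of a variant that returns the data instead, so we can measure how much of a genre's ratings come from its top apps. It quantifies how strongly Bible and Dictionary.com drive the Reference result. The sketch assumes ios_final keeps the ios_header layout: track_name at index 1, rating_count_tot at index 5, prime_genre at index 11. The function names, the make_row helper and the sample rows are hypothetical.

```python
def make_row(name, rating_count, genre):
    # Hypothetical helper: builds a 16-column row in the ios_header layout,
    # filling only track_name (1), rating_count_tot (5) and prime_genre (11).
    row = [''] * 16
    row[1], row[5], row[11] = name, str(rating_count), genre
    return row


def ratings_by_genre(dataset, genre):
    """Return (track_name, rating_count_tot) pairs for one genre, most rated first."""
    pairs = [(row[1], int(row[5])) for row in dataset if row[11] == genre]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def top_k_share(dataset, genre, k):
    """Fraction of a genre's total ratings held by its k most-rated apps."""
    pairs = ratings_by_genre(dataset, genre)
    total = sum(count for _, count in pairs)
    if total == 0:
        return 0.0
    return sum(count for _, count in pairs[:k]) / total


# Hypothetical sample: one genre where a single app collects most ratings.
sample = [
    make_row('App A', 900, 'Reference'),
    make_row('App B', 80, 'Reference'),
    make_row('App C', 20, 'Reference'),
    make_row('App D', 5000, 'Games'),
]

print(ratings_by_genre(sample, 'Reference'))
print(top_k_share(sample, 'Reference', 1))
```

On the real data, a call such as top_k_share(ios_final, 'Reference', 2) would show how much of the genre's ratings the two most-rated apps account for.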
Nevertheless, this category is worth exploring further for a few reasons.
For example, we might:
print('Most common apps within Education category')
print(most_comm(ios_final, 'Education'))
Most common apps within Education category Duolingo - Learn Spanish, French and more : 162701 Guess My Age Math Magic : 123190 Lumosity - Brain Training : 96534 Elevate - Brain Training and Games : 58092 Fit Brains Trainer : 46363 ClassDojo : 35440 Memrise: learn languages : 20383 Peak - Brain Training : 20322 Canvas by Instructure : 19981 ABCmouse.com - Early Learning Academy : 18749 Quizlet: Study Flashcards, Languages & Vocabulary : 16683 Photomath - Camera Calculator : 16523 iTunes U : 15801 Blackboard Mobile Learn™ : 13567 Star Chart : 13482 Remind: Fast, Efficient School Messaging : 9796 PBS KIDS Video : 8651 Toca Kitchen Monsters : 8062 Toca Hair Salon - Christmas Gift : 8049 Edmodo : 7197 Prodigy Math Game : 6683 Epic! - Unlimited Books for Kids : 6676 ChineseSkill -Learn Mandarin Chinese Language Free : 6077 Google Classroom : 5942 TED : 5782 Khan Academy: you can learn anything : 5459 Got It - Homework Help Math, Chem, Physics Solver : 4903 PowerSchool Mobile : 4547 SkyView® Free - Explore the Universe : 4188 Hopscotch : 4057 IXL - Math and English : 3546 Simply Piano by JoyTunes - Learn & play piano : 2925 Kids A-Z : 2887 Infinite Campus Mobile Portal : 2286 PlayKids - Educational Cartoons and Games for Kids : 2196 Memorado Brain Training for Memory & Mindfulness : 2067 Bookshelf : 2064 Mathway - Math Problem Solver : 1854 Schoology : 1777 HelloTalk Language Exchange Learning App : 1619 SpellingCity : 1566 Nick Jr. : 1541 Babbel – Learn Languages Spanish, French & more : 1533 Yup - Homework Help with Math & Science Tutors : 1424 Mondly: Learn 33 Languages: Spanish English French : 1395 WWF Together : 1385 Tinycards - Learn with Fun, Free Flashcards : 1131 Nearpod : 1057 Starfall FREE : 1019 Reflex Student : 1010 GoldieBlox and the Movie Machine : 1000 Pearson eText : 981 codeSpark Academy with The Foos - coding for kids : 977 Dr. Panda Restaurant Asia : 853 NOGGIN - Preschool Shows & Educational Kids Videos : 782 Tynker - Learn to Code. 
Programming Made Easy. : 771 BrainHQ - Brain Training : 684 Top Hat Lecture : 668 Pearson eText for Schools : 609 Curious World: Games, Videos, Books for Children : 604 McGraw-Hill K-12 ConnectED Mobile : 594 Socrative Student : 581 Swift Playgrounds : 578 MarcoPolo Ocean : 529 TestNav : 491 Starfall Learn to Read : 474 Speakaboos Reading App: Stories & Songs for Kids : 440 Bloxels: Build, Play & Share Your Own Video Games : 382 GoNoodle Kids : 372 Global Shark Tracker : 336 The Robot Factory by Tinybop : 335 Daniel Tiger’s Day & Night : 314 Kahoot! Play Fun Learning Games : 300 Spanish SOLO: Learn Spanish With Lessons On The Go : 275 Math 42 : 248 Star Walk 2 Ads+ Night Sky Map - Stars and Planets : 161 Toca Dance Free : 149 Endless Learning Academy : 143 270toWin : 141 Win the White House : 123 Sago Mini Babies Dress Up : 115 Nancy Drew Codes and Clues Mystery Coding Game : 110 1600 : 110 BEAKER by THIX : 94 Highlights™ Shapes - Preschool Learning Puzzles : 90 Little Panda's Candy Shop - Lollipop Factory : 84 Mathpix - Solve and graph math using pictures : 83 Blue Apprentice Elementary Science Game by Galxyz : 79 PINKFONG Birthday Party : 70 Hopster: Kids TV, Nursery Rhymes, Music, Fun Games : 58 Sago Mini Holiday Trucks and Diggers : 56 Dr. Panda Toy Cars Free : 51 Virry Educational. Play, learn with real animals : 50 Highlights Monster Day : 49 PlayKids Learn - Learning through play : 49 PBS KIDS ScratchJr : 38 Lemon Lumberjack's Letter Mill : 34 Ready Jet Go! 
Space Explorer : 34 Chinese Recipes - Asian cuisine : 32 Nature Cat's Great Outdoors : 31 Show My Homework : 17 PINKFONG 123 Numbers : 17 Aquarium VR : 12 Little Panda Mini Games-3D : 9 Stylish School Timetable : 7 Merry Christmas -Activities : 7 Mastering the piano with Lang Lang : 6 Baby Panda's Carnival : 6 Driving test 2017 : 5 Cutie Patootie - Xmas Surprise : 5 Free IQ Test: Calculate your IQ : 5 Beautiful Japanese Handwriting for iPhone : 0 GhostCallDX : 0 Baby Learns Transportation : 0 Baby Panda's Bath Time : 0 Beautiful Japanese Handwriting : 0 Spring Festival by BabyBus : 0 Dinosaur Planet : 0 None
For example, we can see that the Education category is dominated by language-learning and brain-training apps. So we could build an app at the intersection of three categories: Reference, Education and Book.
Other categories are of less interest to us:
- Weather apps: people do not spend much time checking the weather forecast, so our chances of earning a profit from in-app ads are quite low. We would also need reliable weather data, which may require connecting to paid APIs.
- Food and drink apps are dominated by huge businesses (Starbucks, Dunkin' Donuts, McDonald's, etc.), and we might need an actual cooking and delivery service.
- Finance apps would require us to hire domain experts to set up banking, payment systems, transfers, etc.

In the Google Play Store we have the number of installs, which we can use to find the most popular apps.
However, the values in this column are not precise:
print(display_table(android_final, 5)) #installs column
1,000,000+ : 15.75497287522604 100,000+ : 11.539330922242314 10,000,000+ : 10.567359855334539 10,000+ : 10.194394213381555 1,000+ : 8.39737793851718 100+ : 6.928119349005425 5,000,000+ : 6.826401446654612 500,000+ : 5.560578661844485 50,000+ : 4.769439421338156 5,000+ : 4.486889692585895 10+ : 3.5375226039783 500+ : 3.2436708860759493 50,000,000+ : 2.2830018083182644 100,000,000+ : 2.1360759493670884 50+ : 1.9213381555153706 5+ : 0.7911392405063291 1+ : 0.5085895117540687 500,000,000+ : 0.27124773960216997 1,000,000,000+ : 0.22603978300180833 0+ : 0.045207956600361664 0 : 0.011301989150090416 None
Presented this way, the data doesn't tell us whether an app was installed 1,000,000 times or 4,000,000 times.
However, we don't need that level of precision, so we'll take the numbers at face value: 100,000+ will count as 100,000 installs.
To do so, we strip the '+' and ',' characters from each entry, convert it to a number, and then compute the average number of installs for each category.
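The conversion rule described above can be isolated in a small helper. Below is a minimal sketch (the name parse_installs is hypothetical). It uses int rather than float because every install bucket is a whole number. It also handles the bare '0' entry that appears in the frequency table.

```python
def parse_installs(value):
    """Convert a Google Play install bucket such as '1,000,000+' to an int.

    Follows the convention above: '100,000+' counts as exactly 100,000.
    """
    cleaned = value.replace('+', '').replace(',', '')
    return int(cleaned)


for bucket in ['1,000,000,000+', '100,000+', '5+', '0+', '0']:
    print(bucket, '->', parse_installs(bucket))
```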
freq_category_android = freq_table(android_final, 1)
sort_list = []
for category in freq_category_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            # Turn an install bucket such as '10,000+' into the number 10000
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            total += float(n_installs)
            len_category += 1
    average_installs = total / len_category
    install_cat = (average_installs, category)
    sort_list.append(install_cat)
# Sort categories from the highest to the lowest average number of installs
sort_list = sorted(sort_list, reverse=True)
for el in sort_list:
    print(el[1], ' : ', el[0])
COMMUNICATION : 38590581.08741259 VIDEO_PLAYERS : 24727872.452830188 SOCIAL : 23253652.127118643 PHOTOGRAPHY : 17840110.40229885 PRODUCTIVITY : 16787331.344927534 GAME : 15544014.51048951 TRAVEL_AND_LOCAL : 13984077.710144928 ENTERTAINMENT : 11640705.88235294 TOOLS : 10830251.970588235 NEWS_AND_MAGAZINES : 9549178.467741935 BOOKS_AND_REFERENCE : 8814199.78835979 SHOPPING : 7036877.311557789 PERSONALIZATION : 5201482.6122448975 WEATHER : 5145550.285714285 HEALTH_AND_FITNESS : 4188821.9853479853 MAPS_AND_NAVIGATION : 4049274.6341463416 FAMILY : 3695641.8198090694 SPORTS : 3650602.276666667 ART_AND_DESIGN : 1986335.0877192982 FOOD_AND_DRINK : 1924897.7363636363 EDUCATION : 1833495.145631068 BUSINESS : 1712290.1474201474 LIFESTYLE : 1446158.2238372094 FINANCE : 1387692.475609756 HOUSE_AND_HOME : 1360598.042253521 DATING : 854028.8303030303 COMICS : 832613.8888888889 AUTO_AND_VEHICLES : 647317.8170731707 LIBRARIES_AND_DEMO : 638503.734939759 PARENTING : 542603.6206896552 BEAUTY : 513151.88679245283 EVENTS : 253542.22222222222 MEDICAL : 120550.61980830671
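The loop above walks through the whole dataset once for every category. Below is a minimal sketch of a single-pass alternative: it accumulates totals and counts in dictionaries (collections.defaultdict) and returns (category, average) pairs sorted from highest to lowest. It assumes the android_final layout used above, with the category at index 1 and the installs at index 5. The function name and sample rows are hypothetical.

```python
from collections import defaultdict


def average_installs_by_category(dataset, category_index=1, installs_index=5):
    """Average install count per category, computed in a single pass."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in dataset:
        # '10,000+' -> 10000, following the face-value convention above
        installs = int(row[installs_index].replace('+', '').replace(',', ''))
        totals[row[category_index]] += installs
        counts[row[category_index]] += 1
    averages = {cat: totals[cat] / counts[cat] for cat in totals}
    # Sort from the most to the least installed category on average.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rows: only the name, category and installs columns matter.
sample = [
    ['App A', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['App B', 'COMMUNICATION', '', '', '', '10,000+'],
    ['App C', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000+'],
    ['App D', 'BOOKS_AND_REFERENCE', '', '', '', '5,000,000+'],
]

for category, average in average_installs_by_category(sample):
    print(category, ':', average)
```

The result is the same kind of ranking as above, but the work grows with the number of rows only, not with rows times categories.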
As we can see, the COMMUNICATION category has the highest average number of installs. It is dominated by a few apps with over 1 billion installs each (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail and Hangouts), and several others have over 100 million or 500 million installs.
def highest_android_installs(dataset, category, ind_instal):
    # Print every app of `category` with at least 100,000,000+ installs
    top_buckets = ('1,000,000,000+', '500,000,000+', '100,000,000+')
    for app in dataset:
        if app[1] == category and app[ind_instal] in top_buckets:
            print(app[0], ' : ', app[ind_instal])
print(highest_android_installs(android_final, 'COMMUNICATION', 5))
WhatsApp Messenger : 1,000,000,000+ imo beta free calls and text : 100,000,000+ Android Messages : 100,000,000+ Google Duo - High Quality Video Calls : 500,000,000+ Messenger – Text and Video Chat for Free : 1,000,000,000+ imo free video calls and chat : 500,000,000+ Skype - free IM & video calls : 1,000,000,000+ Who : 100,000,000+ GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+ LINE: Free Calls & Messages : 500,000,000+ Google Chrome: Fast & Secure : 1,000,000,000+ Firefox Browser fast & private : 100,000,000+ UC Browser - Fast Download Private & Secure : 500,000,000+ Gmail : 1,000,000,000+ Hangouts : 1,000,000,000+ Messenger Lite: Free Calls & Messages : 100,000,000+ Kik : 100,000,000+ KakaoTalk: Free Calls & Text : 100,000,000+ Opera Mini - fast web browser : 100,000,000+ Opera Browser: Fast and Secure : 100,000,000+ Telegram : 100,000,000+ Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+ UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+ Viber Messenger : 500,000,000+ WeChat : 100,000,000+ Yahoo Mail – Stay Organized : 100,000,000+ BBM - Free Calls & Messages : 100,000,000+ None
If we remove the giants (apps with 100 million installs or more) from this category, the average drops sharply.
under_100_m = []
category = 'COMMUNICATION'
for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace('+', '')
    n_installs = n_installs.replace(',', '')
    # Keep only the apps of this category with fewer than 100 million installs
    if category == app[1] and float(n_installs) < 100000000:
        under_100_m.append(float(n_installs))
print(sum(under_100_m) / len(under_100_m))
3617398.420849421
#difference between average in Communication category with giant apps and without
print(sort_list[0][0] - (sum(under_100_m) / len(under_100_m)))
34973182.66656317
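Removing the giants by hand works. The gap it reveals is the usual difference between a mean and a median: a handful of apps with a billion installs pull the mean up, while the median barely reacts. Below is a minimal sketch using Python's statistics module on hypothetical install counts. Using a median in place of the average is our suggestion here, not something the analysis above does.

```python
from statistics import mean, median

# Hypothetical install counts for one category: many small apps and one giant.
installs = [1_000, 10_000, 50_000, 100_000, 500_000, 1_000_000, 1_000_000_000]

print('mean:  ', mean(installs))
print('median:', median(installs))

# Dropping the giant shrinks the mean by orders of magnitude,
# while the median moves only modestly.
without_giant = [n for n in installs if n < 100_000_000]
print('mean without giant:  ', mean(without_giant))
print('median without giant:', median(without_giant))
```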
The same pattern can be observed in the VIDEO_PLAYERS category, which is largely dominated by 9 apps with at least 100 million installs each.
print(highest_android_installs(android_final, 'VIDEO_PLAYERS', 5))
YouTube : 1,000,000,000+ Motorola Gallery : 100,000,000+ VLC for Android : 100,000,000+ Google Play Movies & TV : 1,000,000,000+ MX Player : 500,000,000+ Dubsmash : 100,000,000+ VivaVideo - Video Editor & Photo Movie : 100,000,000+ VideoShow-Video Editor, Video Maker, Beauty Camera : 100,000,000+ Motorola FM Radio : 100,000,000+ None
These few giants can give the impression that such categories are extremely popular and deserve our attention, even though most of their apps have far fewer installs.
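To put a number on this "domination", we can compute the share of a category's installs that comes from apps above a threshold. Below is a minimal sketch. It assumes the android_final layout (category at index 1, installs at index 5) and the 100 million cut-off used above. The function name, its threshold parameter and the sample categories are hypothetical. A share close to 1 means a few giants account for almost all installs.

```python
def giant_share(dataset, category, threshold=100_000_000):
    """Fraction of a category's installs coming from apps at or above threshold."""
    total = 0
    giants = 0
    for row in dataset:
        if row[1] != category:
            continue
        installs = int(row[5].replace('+', '').replace(',', ''))
        total += installs
        if installs >= threshold:
            giants += installs
    # An unknown or empty category has no installs, hence no giants.
    return giants / total if total else 0.0


# Hypothetical rows for two made-up categories.
sample = [
    ['App A', 'CATEGORY_A', '', '', '', '1,000,000,000+'],
    ['App B', 'CATEGORY_A', '', '', '', '500,000,000+'],
    ['App C', 'CATEGORY_A', '', '', '', '1,000,000+'],
    ['App D', 'CATEGORY_A', '', '', '', '100,000+'],
    ['App E', 'CATEGORY_B', '', '', '', '10,000,000+'],
    ['App F', 'CATEGORY_B', '', '', '', '5,000,000+'],
    ['App G', 'CATEGORY_B', '', '', '', '1,000,000+'],
]

print(round(giant_share(sample, 'CATEGORY_A'), 4))
print(giant_share(sample, 'CATEGORY_B'))
```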
The BOOKS_AND_REFERENCE category seems popular as well, and it is not dominated by just a few apps.
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])
E-Book Read - Read Book for free : 50,000+ Download free book with green book : 100,000+ Wikipedia : 10,000,000+ Cool Reader : 10,000,000+ Free Panda Radio Music : 100,000+ Book store : 1,000,000+ FBReader: Favorite Book Reader : 10,000,000+ English Grammar Complete Handbook : 500,000+ Free Books - Spirit Fanfiction and Stories : 1,000,000+ Google Play Books : 1,000,000,000+ AlReader -any text book reader : 5,000,000+ Offline English Dictionary : 100,000+ Offline: English to Tagalog Dictionary : 500,000+ FamilySearch Tree : 1,000,000+ Cloud of Books : 1,000,000+ Recipes of Prophetic Medicine for free : 500,000+ ReadEra – free ebook reader : 1,000,000+ Anonymous caller detection : 10,000+ Ebook Reader : 5,000,000+ Litnet - E-books : 100,000+ Read books online : 5,000,000+ English to Urdu Dictionary : 500,000+ eBoox: book reader fb2 epub zip : 1,000,000+ English Persian Dictionary : 500,000+ Flybook : 500,000+ All Maths Formulas : 1,000,000+ Ancestry : 5,000,000+ HTC Help : 10,000,000+ English translation from Bengali : 100,000+ Pdf Book Download - Read Pdf Book : 100,000+ Free Book Reader : 100,000+ eBoox new: Reader for fb2 epub zip books : 50,000+ Only 30 days in English, the guideline is guaranteed : 500,000+ Moon+ Reader : 10,000,000+ SH-02J Owner's Manual (Android 8.0) : 50,000+ English-Myanmar Dictionary : 1,000,000+ Golden Dictionary (EN-AR) : 1,000,000+ All Language Translator Free : 1,000,000+ Azpen eReader : 500,000+ URBANO V 02 instruction manual : 100,000+ Bible : 100,000,000+ C Programs and Reference : 50,000+ C Offline Tutorial : 1,000+ C Programs Handbook : 50,000+ Amazon Kindle : 100,000,000+ Aab e Hayat Full Novel : 100,000+ Aldiko Book Reader : 10,000,000+ Google I/O 2018 : 500,000+ R Language Reference Guide : 10,000+ Learn R Programming Full : 5,000+ R Programing Offline Tutorial : 1,000+ Guide for R Programming : 5+ Learn R Programming : 10+ R Quick Reference Big Data : 1,000+ V Made : 100,000+ Wattpad 📖 Free Books : 100,000,000+ Dictionary - 
WordWeb : 5,000,000+ Guide (for X-MEN) : 100,000+ AC Air condition Troubleshoot,Repair,Maintenance : 5,000+ AE Bulletins : 1,000+ Ae Allah na Dai (Rasa) : 10,000+ 50000 Free eBooks & Free AudioBooks : 5,000,000+ Ag PhD Field Guide : 10,000+ Ag PhD Deficiencies : 10,000+ Ag PhD Planting Population Calculator : 1,000+ Ag PhD Soybean Diseases : 1,000+ Fertilizer Removal By Crop : 50,000+ A-J Media Vault : 50+ Al-Quran (Free) : 10,000,000+ Al Quran (Tafsir & by Word) : 500,000+ Al Quran Indonesia : 10,000,000+ Al'Quran Bahasa Indonesia : 10,000,000+ Al Quran Al karim : 1,000,000+ Al-Muhaffiz : 50,000+ Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+ Al-Quran 30 Juz free copies : 500,000+ Koran Read &MP3 30 Juz Offline : 1,000,000+ Hafizi Quran 15 lines per page : 1,000,000+ Quran for Android : 10,000,000+ Surah Al-Waqiah : 100,000+ Hisnul Al Muslim - Hisn Invocations & Adhkaar : 100,000+ Satellite AR : 1,000,000+ Audiobooks from Audible : 100,000,000+ Kinot & Eichah for Tisha B'Av : 10,000+ AW Tozer Devotionals - Daily : 5,000+ Tozer Devotional -Series 1 : 1,000+ The Pursuit of God : 1,000+ AY Sing : 5,000+ Ay Hasnain k Nana Milad Naat : 10,000+ Ay Mohabbat Teri Khatir Novel : 10,000+ Arizona Statutes, ARS (AZ Law) : 1,000+ Oxford A-Z of English Usage : 1,000,000+ BD Fishpedia : 1,000+ BD All Sim Offer : 10,000+ Youboox - Livres, BD et magazines : 500,000+ B&H Kids AR : 10,000+ B y H Niños ES : 5,000+ Dictionary.com: Find Definitions for English Words : 10,000,000+ English Dictionary - Offline : 10,000,000+ Bible KJV : 5,000,000+ Borneo Bible, BM Bible : 10,000+ MOD Black for BM : 100+ BM Box : 1,000+ Anime Mod for BM : 100+ NOOK: Read eBooks & Magazines : 10,000,000+ NOOK Audiobooks : 500,000+ NOOK App for NOOK Devices : 500,000+ Browsery by Barnes & Noble : 5,000+ bp e-store : 1,000+ Brilliant Quotes: Life, Love, Family & Motivation : 1,000,000+ BR Ambedkar Biography & Quotes : 10,000+ BU Alsace : 100+ Catholic La Bu Zo Kam : 500+ Khrifa Hla Bu (Solfa) : 
10+ Kristian Hla Bu : 10,000+ SA HLA BU : 1,000+ Learn SAP BW : 500+ Learn SAP BW on HANA : 500+ CA Laws 2018 (California Laws and Codes) : 5,000+ Bootable Methods(USB-CD-DVD) : 10,000+ cloudLibrary : 100,000+ SDA Collegiate Quarterly : 500+ Sabbath School : 100,000+ Cypress College Library : 100+ Stats Royale for Clash Royale : 1,000,000+ GATE 21 years CS Papers(2011-2018 Solved) : 50+ Learn CT Scan Of Head : 5,000+ Easy Cv maker 2018 : 10,000+ How to Write CV : 100,000+ CW Nuclear : 1,000+ CY Spray nozzle : 10+ BibleRead En Cy Zh Yue : 5+ CZ-Help : 5+ Guide for DB Xenoverse : 10,000+ Guide for DB Xenoverse 2 : 10,000+ Guide for IMS DB : 10+ DC HSEMA : 5,000+ DC Public Library : 1,000+ Painting Lulu DC Super Friends : 1,000+ Dictionary : 10,000,000+ Fix Error Google Playstore : 1,000+ D. H. Lawrence Poems FREE : 1,000+ Bilingual Dictionary Audio App : 5,000+ DM Screen : 10,000+ wikiHow: how to do anything : 1,000,000+ Dr. Doug's Tips : 1,000+ Bible du Semeur-BDS (French) : 50,000+ La citadelle du musulman : 50,000+ DV 2019 Entry Guide : 10,000+ DV 2019 - EDV Photo & Form : 50,000+ DV 2018 Winners Guide : 1,000+ EB Annual Meetings : 1,000+ EC - AP & Telangana : 5,000+ TN Patta Citta & EC : 10,000+ AP Stamps and Registration : 10,000+ CompactiMa EC pH Calibration : 100+ EGW Writings 2 : 100,000+ EGW Writings : 1,000,000+ Bible with EGW Comments : 100,000+ My Little Pony AR Guide : 1,000,000+ SDA Sabbath School Quarterly : 500,000+ Duaa Ek Ibaadat : 5,000+ Spanish English Translator : 10,000,000+ Dictionary - Merriam-Webster : 10,000,000+ JW Library : 10,000,000+ Oxford Dictionary of English : Free : 10,000,000+ English Hindi Dictionary : 10,000,000+ English to Hindi Dictionary : 5,000,000+ EP Research Service : 1,000+ Hymnes et Louanges : 100,000+ EU Charter : 1,000+ EU Data Protection : 1,000+ EU IP Codes : 100+ EW PDF : 5+ BakaReader EX : 100,000+ EZ Quran : 50,000+ FA Part 1 & 2 Past Papers Solved Free – Offline : 5,000+ La Fe de Jesus : 1,000+ La Fe de Jesús : 
500+ Le Fe de Jesus : 500+ Florida - Pocket Brainbook : 1,000+ Florida Statutes (FL Code) : 1,000+ English To Shona Dictionary : 10,000+ Greek Bible FP (Audio) : 1,000+ Golden Dictionary (FR-AR) : 500,000+ Fanfic-FR : 5,000+ Bulgarian French Dictionary Fr : 10,000+ Chemin (fr) : 1,000+ The SCP Foundation DB fr nn5n : 1,000+
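Rather than reading the full listing, we can summarize the same check as a count of apps per install bucket. Below is a minimal sketch with collections.Counter. It assumes the android_final layout (category at index 1, installs at index 5), and the function name and sample rows are hypothetical. Buckets are sorted from the largest to the smallest install figure.

```python
from collections import Counter


def install_bucket_counts(dataset, category):
    """Count how many apps of `category` fall into each install bucket."""
    counts = Counter(row[5] for row in dataset if row[1] == category)

    def bucket_size(bucket):
        # '1,000,000+' -> 1000000, used only for ordering the buckets
        return int(bucket.replace('+', '').replace(',', ''))

    return sorted(counts.items(), key=lambda item: bucket_size(item[0]), reverse=True)


# Hypothetical rows: only the name, category and installs columns matter.
sample = [
    ['Reader A', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Reader B', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000+'],
    ['Dictionary C', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000+'],
    ['Guide D', 'BOOKS_AND_REFERENCE', '', '', '', '10,000+'],
    ['Chat E', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
]

for bucket, n_apps in install_bucket_counts(sample, 'BOOKS_AND_REFERENCE'):
    print(bucket, ':', n_apps)
```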
print(highest_android_installs(android_final, 'BOOKS_AND_REFERENCE', 5))
Google Play Books : 1,000,000,000+ Bible : 100,000,000+ Amazon Kindle : 100,000,000+ Wattpad 📖 Free Books : 100,000,000+ Audiobooks from Audible : 100,000,000+ None
The only giants in this niche are apps for reading and managing books and libraries (Google Play Books, Amazon Kindle, Wattpad, Audible), along with the Bible, so this category may be very interesting for us.
Creating a similar app would be risky, but we could build an app around a specific book and add various fun features.
In this project we analyzed data about App Store and Google Play mobile apps in order to advise our development team on the future app.
We concluded that the best idea would be to take a popular modern book and turn it into an app with fun features. This idea seems likely to be profitable in both markets.