The goal of this project is to carry out an in-depth analysis of available phone app data from both Google and Apple, in order to uncover areas in which app development could be focused to ensure a profitable return. The main intention is to provide an ad-based offering to the app markets, which will ideally receive a large number of downloads.
We will also look into any other usable insights from this data which can further help to steer our focus. This analysis could then be offered to a company which specialises in app development.
With millions of apps on both the Google and Apple stores, it is infeasible to acquire the entire dataset without incurring significant costs (both financially and in terms of time spent). For the purpose of our analysis we only require a small subsection of this data to be able to tease out any significant insights. Luckily, there exist some publicly available datasets which meet these requirements.
Here is a dataset which contains approximately ten thousand Android apps from Google Play. It can be downloaded here.
Here is a dataset which contains approximately seven thousand iOS apps from the App Store. It can be downloaded here.
This should be sufficient to provide us with a basic analysis of app store purchases at this stage.
We will begin by opening both of these datasets and taking a look.
from csv import reader

# Open the Apple dataset and assign header and data info
# (the with-block closes the file once it has been read)
with open('AppleStore.csv', encoding='utf8') as opened_file:
    apple_data = list(reader(opened_file))
apple_header = apple_data[0]
apple_data = apple_data[1:]

# Open the Google dataset and assign header and data info
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    google_data = list(reader(opened_file))
google_header = google_data[0]
google_data = google_data[1:]
Before we begin looking into these datasets any deeper, I will create a function which will display the data in a more readable format for a quick view of the datasets.
def explore_data(dataset, start, end, rows_and_columns=False):
    # Print the rows in dataset[start:end], each followed by a blank line
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
    # Optionally report the overall shape of the dataset
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
I will now use this function to give us a basic snapshot of each of the datasets.
Firstly, we will have a quick glance at the Apple data.
print(apple_header)
print('\n')
explore_data(apple_data, 0, 3, True)
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
Number of rows: 7197
Number of columns: 16
The documentation for this dataset which gives more information on the columns can be found here.
At a glance, the column names here that could help with our analysis are 'price', 'track_name', 'user_rating', 'cont_rating' and 'prime_genre'.
Next we will look at the Google data.
print(google_header)
print('\n')
explore_data(google_data, 0, 3, True)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
Number of rows: 10841
Number of columns: 13
The documentation for this dataset which gives more information on the columns can be found here.
The column names here that could help with our analysis are 'App', 'Rating', 'Installs', 'Type', 'Content Rating' and 'Genres'.
After looking at the discussion forum related to the Google data here, it appears that there is an error in a certain row (10472). Let's compare this row against the header and a correct row.
print(google_header)
print('\n')
explore_data(google_data,10472,10474)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
As we can see, this entry is missing its 'Category' value, which has shifted every later value one column to the left (the row has 12 values rather than 13). In this case it will be easier to just delete this row, rather than trying to fix it.
del google_data[10472]
explore_data(google_data,10472,10474)
['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
['Sat-Fi Voice', 'COMMUNICATION', '3.4', '37', '14M', '1,000+', 'Free', '0', 'Everyone', 'Communication', 'November 21, 2014', '2.2.1.5', '2.2 and up']
This entry has now been removed from the dataset. The discussion section regarding the Apple data does not indicate any such errors as far as we can ascertain.
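A forum report is one way to find a misaligned row; comparing each row's length against the header is another. The following is a minimal sketch of that check. The function name find_malformed_rows and the toy rows are illustrative assumptions, not part of the original notebook; in the notebook it would be called with google_header and google_data.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Toy data standing in for google_header / google_data (illustrative only).
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Good App', 'TOOLS', '4.2'],
    ['Shifted App', '1.9'],           # 'Category' is missing, so one column short
    ['Another App', 'GAME', '4.5'],
]

bad = find_malformed_rows(sample_header, sample_rows)
for index, row in bad:
    print(index, row)
```

A check like this only catches rows with the wrong number of values; a row with the right length but a value in the wrong column would still need to be found by inspection.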
Some further investigation of the Google data shows that some duplicate entries also exist in this dataset. For example, the Instagram app appears several times. We assume this is because the app's data was recorded at multiple different points in time, as the 'Reviews' value differs between entries.
for app in google_data:
    if app[0] == 'Instagram':
        print(app)
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
To try and give us some more insight into how common this issue is, we will loop through the data and count how many duplicates we have.
unique_apps = []
duplicate_apps = []
for app in google_data:
    name = app[0]
    # A name seen before is a duplicate; otherwise record it as unique
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
print('Number of Duplicates: ', len(duplicate_apps))
print('Duplicate Examples: ', duplicate_apps[:10])
Number of Duplicates: 1181
Duplicate Examples: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
In order to ensure our analysis is accurate, we will need to remove these duplicate entries. Rather than just removing them at random, we will need to remove them based on some sort of logic. In this case, we would suggest keeping the entry with the highest number of reviews, on the assumption that it is the most recent one.
To remind ourselves, the starting number of entries in the dataset is 10,840.
print(len(google_data))
10840
We will now run a simple loop on the data to record the highest number of reviews seen for each app name. Any entry that does not match this figure is a duplicate to be removed.
reviews_max = {}
for row in google_data:
    name = row[0]
    n_reviews = float(row[3])
    # Keep the largest review count seen so far for this app name
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    if name not in reviews_max:
        reviews_max[name] = n_reviews
print('Remaining Entries: ', len(reviews_max))
print('Duplicates Found: ', 10840 - len(reviews_max))
Remaining Entries: 9659
Duplicates Found: 1181
As we can now see, this shows that the number of unique entries is 9,659, after highlighting 1,181 duplicate entries.
In order to ensure our analysis is as accurate as it can be, we will need to clean this dataset and remove these duplicate entries. This will be done by using the dictionary object we just created, and having it referenced by our main dataset to create a new clean dataset.
This will be done below.
android_clean = []
already_added = []
for row in google_data:
    name = row[0]
    n_reviews = float(row[3])
    # Keep only the first row that matches the maximum review count for its name
    if reviews_max[name] == n_reviews and name not in already_added:
        android_clean.append(row)
        already_added.append(name)
We have now worked through the dataset and created a new clean list of unique apps based on the entry with the highest number of reviews. As predicted this dataset contains 9,659 entries.
print(len(android_clean))
9659
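The two passes above (first build the maximum review count per name, then filter) can also be expressed as a single pass that stores the best row per name directly. This is a sketch under the same assumption as the notebook, namely that the name is in column 0 and the review count in column 3; the function name dedupe_by_max_reviews and the sample rows are made up for illustration.

```python
def dedupe_by_max_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Strict '>' means the first row wins when review counts are tied
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())


# Illustrative rows in the same column order as the Google data (truncated).
sample_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]

clean = dedupe_by_max_reviews(sample_rows)
for row in clean:
    print(row)
```

The dictionary lookup avoids the repeated list membership test on already_added, which gets slower as the list grows. One difference to be aware of: the rows come back ordered by the first appearance of each name, so the ordering may differ from android_clean even though the same rows are kept.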
Upon further exploration of the data, there appear to be apps which are directed towards non-English speakers. Since we will be developing apps for an English-speaking audience, we will want to have these types of apps removed from our datasets to ensure a more accurate analysis.
print(apple_data[813][1])
print(apple_data[6731][1])
print(android_clean[4412][0])
print(android_clean[7940][0])
爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き&ブロックパズル〜
中国語 AQリスニング
لعبة تقدر تربح DZ
In order to correct this, we will write a function to detect these apps based on whether the name contains non-English characters, which we approximate as characters outside the ASCII range (code points above 127).
def is_english(string):
    # Count the characters that fall outside the ASCII range (0-127)
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    # Allow up to 3 such characters before rejecting the name
    if non_ascii > 3:
        return False
    else:
        return True

print(is_english("爱奇艺PPS -《欢乐颂2》电视剧热播"))
print(is_english("Hello!"))
print(is_english("Instachat 😜"))
False
True
True
So that apps with only a few non-English characters (such as "Instachat 😜") are not flagged, the function returns False only when a name contains more than 3 non-ASCII characters.
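The same rule can be written more compactly with the threshold exposed as a parameter, which makes it easy to see how the cut-off affects borderline names. This is a sketch only; the name is_probably_english and the max_non_ascii parameter are assumptions introduced here, and the ASCII test remains a rough heuristic rather than real language detection.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True if the name has at most max_non_ascii non-ASCII characters."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_probably_english("爱奇艺PPS -《欢乐颂2》电视剧热播"))
print(is_probably_english("Hello!"))
print(is_probably_english("Instachat 😜"))
# With a threshold of 0, a single emoji is enough to reject the name
print(is_probably_english("Instachat 😜", max_non_ascii=0))
```

The heuristic can still misjudge names in either direction: an English name with many emoji would be rejected, and a non-English name written entirely in ASCII letters would be kept.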
We can now loop through our datasets using this function and remove those entries which have more than 3 Non-English characters.
print('Apple Starting app number: ', len(apple_data))
apple_eng = []
google_eng = []
for row in apple_data:
    name = row[1]
    if is_english(name):
        apple_eng.append(row)
print('Apple app number with Non-English apps removed: ', len(apple_eng))
print('\n')
print('Google Starting app number: ', len(android_clean))
for row in android_clean:
    name = row[0]
    if is_english(name):
        google_eng.append(row)
print('Google app number with Non-English apps removed: ', len(google_eng))
Apple Starting app number: 7197
Apple app number with Non-English apps removed: 6183
Google Starting app number: 9659
Google app number with Non-English apps removed: 9614
Our next step will be to isolate the free apps in both datasets, as this is the data we want to do analysis on.
This is easily done by isolating the apps whose 'price' is '0.0' for the Apple apps, and whose 'Type' is 'Free' for the Android apps.
google_free = []
apple_free = []
print('All Android apps: ', len(google_eng))
for row in google_eng:
    # Column 6 is 'Type' ('Free' or 'Paid')
    if row[6] == "Free":
        google_free.append(row)
print('All free Android apps: ', len(google_free))
print('\n')
print('All Apple apps: ', len(apple_eng))
for row in apple_eng:
    # Column 4 is 'price', stored as a string
    if row[4] == '0.0':
        apple_free.append(row)
print('All free Apple apps: ', len(apple_free))
All Android apps: 9614
All free Android apps: 8863
All Apple apps: 6183
All free Apple apps: 3222
As we can see, this leaves us with a 'clean' dataset of 8,863 Android apps, and 3,222 Apple apps.
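Comparing the price against the exact string '0.0' works for this file, but it would silently miss a free app whose price was written as '0' or '0.00'. A slightly more defensive sketch converts the price to a number first. The helper name is_free_price is hypothetical, and the handling of a leading '$' is an assumption about how paid prices may be formatted in the Google 'Price' column.

```python
def is_free_price(price):
    """Return True if a price string such as '0', '0.0' or '$0.00' is zero."""
    # Assumption: paid prices may carry a leading '$' (for example '$4.99')
    return float(price.replace('$', '')) == 0.0


print(is_free_price('0.0'))
print(is_free_price('0'))
print(is_free_price('$4.99'))
```

A value that cannot be parsed as a number will raise a ValueError, which is arguably preferable to quietly treating a malformed row as paid.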
We can now start to do some more in-depth analysis of our data.
As mentioned previously, the goal of this analysis is to find a niche which an app developer could exploit by creating free apps that will be successful in that area. To start, we will look for app profiles which are successful in both the Apple and Play markets. Our goal will then be to use this profile to look into how we could create our own apps that meet the same criteria.
In order to keep costs minimal and to effectively create successful apps, it would be suggested to build a minimal Android version of the app for the Play Store. If the app gets a good response, then further development and porting to Apple would be the recommended next steps. This way not too much time and cost is invested into apps which are not expected to be financially successful.
Our next step will be to get an understanding of the most common genres in each dataset to help give us an understanding of where to begin.
Let us remind ourselves of the structure of our data.
print(google_header)
print(google_free[:3])
print('\n')
print(apple_header)
print(apple_free[:3])
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] [['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']] ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] [['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']]
For the Google dataset, we will want to use the 'Category' and 'Genres' columns to do this.
For the Apple dataset, we will want to use the 'prime_genre' column to do this.
In order to better perform analysis on this data, we will create a function which will create a dictionary object which shows the frequency that each value appears in the dataset as a percentage.
# Function to create a frequency table (as percentages) for one column of a dataset
def freq_table(dataset, index):
    table = {}
    total = 0
    # Count how many times each value appears in the chosen column
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    # Convert the raw counts into percentages of the total number of rows
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = round(percentage, 2)
    return table_percentages
We will also create another function which will sort this frequency table into a more readable order.
# Function to print the frequency table in descending order of percentage
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        # Put the percentage first so that sorted() orders by it
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
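The counting and sorting done by the two functions above can also be sketched with collections.Counter from the standard library, which handles both steps. The function name freq_table_sorted and the sample rows are illustrative assumptions; this is an alternative formulation, not a replacement for the functions used in the rest of the analysis.

```python
from collections import Counter


def freq_table_sorted(dataset, index):
    """Return (value, percentage) pairs for one column, most common first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    # most_common() yields (value, count) pairs in descending order of count
    return [(value, round(count / total * 100, 2))
            for value, count in counts.most_common()]


# Toy rows: (name, genre). Illustrative only.
sample_rows = [
    ['App A', 'Games'],
    ['App B', 'Games'],
    ['App C', 'Music'],
    ['App D', 'Games'],
]

for value, pct in freq_table_sorted(sample_rows, 1):
    print(value, ':', pct)
```

Note that ties are ordered differently: most_common() keeps tied values in the order they were first seen, whereas sorting (percentage, key) tuples in reverse breaks ties by reverse alphabetical order of the key.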
We will now have a look at what genres are the most common in the Apple store by checking how the prime_genre entries are dispersed in the dataset.
display_table(apple_free, 11)
Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12
From the Apple data we can see that, among free English apps in the App Store, the most common genre is Games (58.16%), followed a long way behind by Entertainment (7.88%).
This indicates that the majority of these apps are designed for entertainment rather than for a practical purpose.
An initial recommendation might be to develop games. However, the number of apps in a genre says nothing about how many users those apps have, so further digging into user numbers per genre is required before drawing conclusions, as the raw app counts could be misleading.
We will now do the same for the Google data based on the Category column, and the Genres column respectively.
display_table(google_free, 1)
FAMILY : 18.9
GAME : 9.73
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6
display_table(google_free, 9)
Tools : 8.45 Entertainment : 6.07 Education : 5.35 Business : 4.59 Productivity : 3.89 Lifestyle : 3.89 Finance : 3.7 Medical : 3.53 Sports : 3.46 Personalization : 3.32 Communication : 3.24 Action : 3.1 Health & Fitness : 3.08 Photography : 2.94 News & Magazines : 2.8 Social : 2.66 Travel & Local : 2.32 Shopping : 2.25 Books & Reference : 2.14 Simulation : 2.04 Dating : 1.86 Arcade : 1.85 Video Players & Editors : 1.77 Casual : 1.76 Maps & Navigation : 1.4 Food & Drink : 1.24 Puzzle : 1.13 Racing : 0.99 Role Playing : 0.94 Libraries & Demo : 0.94 Auto & Vehicles : 0.93 Strategy : 0.9 House & Home : 0.82 Weather : 0.8 Events : 0.71 Adventure : 0.68 Comics : 0.61 Beauty : 0.6 Art & Design : 0.6 Parenting : 0.5 Card : 0.45 Casino : 0.43 Trivia : 0.42 Educational;Education : 0.39 Board : 0.38 Educational : 0.37 Education;Education : 0.34 Word : 0.26 Casual;Pretend Play : 0.24 Music : 0.2 Racing;Action & Adventure : 0.17 Puzzle;Brain Games : 0.17 Entertainment;Music & Video : 0.17 Casual;Brain Games : 0.14 Casual;Action & Adventure : 0.14 Arcade;Action & Adventure : 0.12 Action;Action & Adventure : 0.1 Educational;Pretend Play : 0.09 Simulation;Action & Adventure : 0.08 Parenting;Education : 0.08 Entertainment;Brain Games : 0.08 Board;Brain Games : 0.08 Parenting;Music & Video : 0.07 Educational;Brain Games : 0.07 Casual;Creativity : 0.07 Art & Design;Creativity : 0.07 Education;Pretend Play : 0.06 Role Playing;Pretend Play : 0.05 Education;Creativity : 0.05 Role Playing;Action & Adventure : 0.03 Puzzle;Action & Adventure : 0.03 Entertainment;Creativity : 0.03 Entertainment;Action & Adventure : 0.03 Educational;Creativity : 0.03 Educational;Action & Adventure : 0.03 Education;Music & Video : 0.03 Education;Brain Games : 0.03 Education;Action & Adventure : 0.03 Adventure;Action & Adventure : 0.03 Video Players & Editors;Music & Video : 0.02 Sports;Action & Adventure : 0.02 Simulation;Pretend Play : 0.02 Puzzle;Creativity : 0.02 Music;Music & Video : 0.02 
Entertainment;Pretend Play : 0.02 Casual;Education : 0.02 Board;Action & Adventure : 0.02 Video Players & Editors;Creativity : 0.01 Trivia;Education : 0.01 Travel & Local;Action & Adventure : 0.01 Tools;Education : 0.01 Strategy;Education : 0.01 Strategy;Creativity : 0.01 Strategy;Action & Adventure : 0.01 Simulation;Education : 0.01 Role Playing;Brain Games : 0.01 Racing;Pretend Play : 0.01 Puzzle;Education : 0.01 Parenting;Brain Games : 0.01 Music & Audio;Music & Video : 0.01 Lifestyle;Pretend Play : 0.01 Lifestyle;Education : 0.01 Health & Fitness;Education : 0.01 Health & Fitness;Action & Adventure : 0.01 Entertainment;Education : 0.01 Communication;Creativity : 0.01 Comics;Creativity : 0.01 Casual;Music & Video : 0.01 Card;Action & Adventure : 0.01 Books & Reference;Education : 0.01 Art & Design;Pretend Play : 0.01 Art & Design;Action & Adventure : 0.01 Arcade;Pretend Play : 0.01 Adventure;Education : 0.01
As we can see above, Family is the most common category in the Google data, with a large lead over the next category, Game (which was the most common in the Apple data). It is likely, however, that there is a lot of crossover between Family and Game. This is confirmed when we look at the Play Store, as the Family section lists a lot of children's games.
There does not appear to be a massive difference between category and genre in the dataset, except that the Genres column is more granular. When the data is viewed by genre, the highest percentage falls under the Tools genre. Looking through the data, this appears to be because game-related apps are split across many genres, which disperses them through the table.
This dataset does indicate that there is a healthy mix of more practical apps which are also popular, compared to the Apple data, which is more dominated by games.
Without further analysis, it is hard to recommend a particular app category, however, it does appear that games, family and some practical tools would be good choices. The risk of this is we will be entering an already competitive marketplace. It may be wiser to seek out a more profitable, but more niche area of the marketplace we could better exploit.
Firstly, I will just create a couple of simple functions to allow me to order the results by converting the datasets to dictionary objects so they can be sorted to be more readable.
# Function to convert a list of (key, value) pairs to a dictionary
def turn_to_dict(dataset):
    table = {}
    for row in dataset:
        table[row[0]] = row[1]
    return table

# Prints a list of (key, value) pairs in descending order of value
# (the index parameter is unused; it is kept so that existing calls still work)
def sort_tab(dataset, index):
    table = turn_to_dict(dataset)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
We can now begin to get an idea of the most popular genres in the Apple data. This is done by calculating the average number of user ratings per app for each genre.
Apple_genres = freq_table(apple_free, 11)
final_list = []
for genre in Apple_genres:
    total = 0
    len_genre = 0
    # Sum the rating counts of every app in this genre
    for row in apple_free:
        genre_app = row[11]
        if genre_app == genre:
            n_ratings = float(row[5])
            total += n_ratings
            len_genre += 1
    avg_no_ratings = round(total / len_genre, 1)
    final_list.append((genre, avg_no_ratings))
sort_tab(final_list, 0)
Navigation : 86090.3
Reference : 74942.1
Social Networking : 71548.3
Music : 57326.5
Weather : 52279.9
Book : 39758.5
Food & Drink : 33333.9
Finance : 31467.9
Photo & Video : 28441.5
Travel : 28243.8
Shopping : 26919.7
Health & Fitness : 23298.0
Sports : 23008.9
Games : 22788.7
News : 21248.0
Productivity : 21028.4
Utilities : 18684.5
Lifestyle : 16485.8
Entertainment : 14029.8
Business : 7491.1
Education : 7004.0
Catalogs : 4004.0
Medical : 612.0
Using the number of ratings as a proxy for the number of downloads, we can see that in the Apple dataset, the most downloaded genre of app is Navigation, followed by Reference, Social Networking and then by Music.
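The per-genre average above rescans the whole dataset once for every genre. The same figures can be obtained in a single pass by accumulating a running total and count per group. The sketch below assumes the same layout as the Apple data (group label and numeric value held as strings in known columns); the function name average_by_group and the toy rows are assumptions made for illustration.

```python
from collections import defaultdict


def average_by_group(dataset, group_idx, value_idx):
    """Return (group, average) pairs sorted by average, highest first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    averages = {group: round(totals[group] / counts[group], 1) for group in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Toy rows: (name, genre, number of ratings). Illustrative only.
sample_rows = [
    ['App A', 'Navigation', '100'],
    ['App B', 'Navigation', '300'],
    ['App C', 'Reference', '50'],
]

for group, avg in average_by_group(sample_rows, 1, 2):
    print(group, ':', avg)
```

In the notebook this would correspond to average_by_group(apple_free, 11, 5) for the Apple genres.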
Navigation being the most popular download category is quite unexpected.
Let us try and determine why this may be the case:
for app in apple_free:
    # app[-5] is the 'prime_genre' column (index 11 of 16)
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5])  # print name and number of ratings
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
As we can see, there are not very many apps in this genre (only six). Waze and Google Maps account for the vast majority of the ratings, which explains why the average for Navigation is so high.
Let us do the same check for Social Networking.
for app in apple_free:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5])  # print name and number of ratings
Facebook : 2974676 Pinterest : 1061624 Skype for iPhone : 373519 Messenger : 351466 Tumblr : 334293 WhatsApp Messenger : 287589 Kik : 260965 ooVoo – Free Video Call, Text and Voice : 177501 TextNow - Unlimited Text + Calls : 164963 Viber Messenger – Text & Call : 164249 Followers - Social Analytics For Instagram : 112778 MeetMe - Chat and Meet New People : 97072 We Heart It - Fashion, wallpapers, quotes, tattoos : 90414 InsTrack for Instagram - Analytics Plus More : 85535 Tango - Free Video Call, Voice and Chat : 75412 LinkedIn : 71856 Match™ - #1 Dating App. : 60659 Skype for iPad : 60163 POF - Best Dating App for Conversations : 52642 Timehop : 49510 Find My Family, Friends & iPhone - Life360 Locator : 43877 Whisper - Share, Express, Meet : 39819 Hangouts : 36404 LINE PLAY - Your Avatar World : 34677 WeChat : 34584 Badoo - Meet New People, Chat, Socialize. : 34428 Followers + for Instagram - Follower Analytics : 28633 GroupMe : 28260 Marco Polo Video Walkie Talkie : 27662 Miitomo : 23965 SimSimi : 23530 Grindr - Gay and same sex guys chat, meet and date : 23201 Wishbone - Compare Anything : 20649 imo video calls and chat : 18841 After School - Funny Anonymous School News : 18482 Quick Reposter - Repost, Regram and Reshare Photos : 17694 Weibo HD : 16772 Repost for Instagram : 15185 Live.me – Live Video Chat & Make Friends Nearby : 14724 Nextdoor : 14402 Followers Analytics for Instagram - InstaReport : 13914 YouNow: Live Stream Video Chat : 12079 FollowMeter for Instagram - Followers Tracking : 11976 LINE : 11437 eHarmony™ Dating App - Meet Singles : 11124 Discord - Chat for Gamers : 9152 QQ : 9109 Telegram Messenger : 7573 Weibo : 7265 Periscope - Live Video Streaming Around the World : 6062 Chat for Whatsapp - iPad Version : 5060 QQ HD : 5058 Followers Analysis Tool For Instagram App Free : 4253 live.ly - live video streaming : 4145 Houseparty - Group Video Chat : 3991 SOMA Messenger : 3232 Monkey : 3060 Down To Lunch : 2535 Flinch - Video Chat Staring Contest 
: 2134 Highrise - Your Avatar Community : 2011 LOVOO - Dating Chat : 1985 PlayStation®Messages : 1918 BOO! - Video chat camera with filters & stickers : 1805 Qzone : 1649 Chatous - Chat with new people : 1609 Kiwi - Q&A : 1538 GhostCodes - a discovery app for Snapchat : 1313 Jodel : 1193 FireChat : 1037 Google Duo - simple video calling : 1033 Fiesta by Tango - Chat & Meet New People : 885 Google Allo — smart messaging : 862 Peach — share vividly : 727 Hey! VINA - Where Women Meet New Friends : 719 Battlefield™ Companion : 689 All Devices for WhatsApp - Messenger for iPad : 682 Chat for Pokemon Go - GoChat : 500 IAmNaughty – Dating App to Meet New People Online : 463 Qzone HD : 458 Zenly - Locate your friends in realtime : 427 League of Legends Friends : 420 豆瓣 : 407 Candid - Speak Your Mind Freely : 398 知乎 : 397 Selfeo : 366 Fake-A-Location Free ™ : 354 Popcorn Buzz - Free Group Calls : 281 Fam — Group video calling for iMessage : 279 QQ International : 274 Ameba : 269 SoundCloud Pulse: for creators : 240 Tantan : 235 Cougar Dating & Life Style App for Mature Women : 213 Rawr Messenger - Dab your chat : 180 WhenToPost: Best Time to Post Photos for Instagram : 158 Inke—Broadcast an amazing life : 147 Mustknow - anonymous video Q&A : 53 CTFxCmoji : 39 Lobi : 36 Chain: Collaborate On MyVideo Story/Group Video : 35 botman - Real time video chat : 7 BestieBox : 0 MATCH ON LINE chat : 0 niconico ch : 0 LINE BLOG : 0 bit-tube - Live Stream Video Chat : 0
As with Navigation, Social Networking is dominated by just a few massive apps, such as Facebook, and so it may not be feasible for us to exploit that market. The same will no doubt be the case for Photo & Video, as that will be dominated by Instagram, YouTube, etc. I also feel that Games and Music apps will require quite intensive development effort.
Based on this, I believe that looking into the Reference genre may be a good course of action, as it requires a lot less app development and appears to have a healthy number of users.
If we look at the Reference genre, it looks relatively popular, although the Bible and Dictionary.com apps skew the average upwards due to how popular they are.
for app in apple_free:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])  # print name and number of ratings
Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0
We will do a similar analysis on the Google data.
Luckily, for this dataset we actually have information on the number of installs. Below we can see what percentage of the apps falls into each install bracket.
display_table(google_free, 5) # the Installs columns
1,000,000+ : 15.73
100,000+ : 11.55
10,000,000+ : 10.55
10,000+ : 10.2
1,000+ : 8.39
100+ : 6.92
5,000,000+ : 6.83
500,000+ : 5.56
50,000+ : 4.77
5,000+ : 4.51
10+ : 3.54
500+ : 3.25
50,000,000+ : 2.3
100,000,000+ : 2.13
50+ : 1.92
5+ : 0.79
1+ : 0.51
500,000,000+ : 0.27
1,000,000,000+ : 0.23
0+ : 0.05
The issue with this is that the numbers are not overly precise. There is a big difference between 1,000,000+ and 5,000,000+ and we don't know where exactly an app falls between these massive numbers.
Despite this, it does give us some basic info on app popularity, even if not at a very granular level. For the purpose of this analysis we will assume that 1,000,000+ is equal to 1,000,000, etc. This should give us a basic idea of how popular the apps are.
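The assumption that '1,000,000+' is treated as 1,000,000 amounts to stripping the commas and the trailing plus sign and converting what remains to a number. A minimal sketch of that conversion follows; the helper name parse_installs is introduced here for illustration only.

```python
def parse_installs(installs):
    """Convert an install bracket such as '1,000,000+' to its lower bound."""
    return int(installs.replace(',', '').replace('+', ''))


print(parse_installs('1,000,000+'))
print(parse_installs('500+'))
print(parse_installs('0+'))
```

Because each value is the lower bound of its bracket, any average built from these numbers will tend to understate the true install counts.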
We will now analyse this data in the same way as the Apple data above, to see if we can glean any insight into which categories are the most popular.
Google_genres = freq_table(google_free, 1)
final_list = []
for category in Google_genres:
    total = 0
    len_category = 0
    for app in google_free:
        category_app = app[1]
        if category_app == category:
            # Strip the commas and '+' so the bracket can be read as a number
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = round(total / len_category, 1)
    final_list.append((category, avg_n_installs))
sort_tab(final_list, 0)
COMMUNICATION : 38456119.2
VIDEO_PLAYERS : 24727872.5
SOCIAL : 23253652.1
PHOTOGRAPHY : 17840110.4
PRODUCTIVITY : 16787331.3
GAME : 15588015.6
TRAVEL_AND_LOCAL : 13984077.7
ENTERTAINMENT : 11640705.9
TOOLS : 10801391.3
NEWS_AND_MAGAZINES : 9549178.5
BOOKS_AND_REFERENCE : 8767811.9
SHOPPING : 7036877.3
PERSONALIZATION : 5201482.6
WEATHER : 5074486.2
HEALTH_AND_FITNESS : 4188822.0
MAPS_AND_NAVIGATION : 4056941.8
FAMILY : 3697848.2
SPORTS : 3638640.1
ART_AND_DESIGN : 1986335.1
FOOD_AND_DRINK : 1924897.7
EDUCATION : 1833495.1
BUSINESS : 1712290.1
LIFESTYLE : 1437816.3
FINANCE : 1387692.5
HOUSE_AND_HOME : 1331540.6
DATING : 854028.8
COMICS : 817657.3
AUTO_AND_VEHICLES : 647317.8
LIBRARIES_AND_DEMO : 638503.7
PARENTING : 542603.6
BEAUTY : 513151.9
EVENTS : 253542.2
MEDICAL : 120550.6
We can see that COMMUNICATION is the most popular category of download, followed by VIDEO_PLAYERS, SOCIAL and PHOTOGRAPHY.
Again, this will no doubt be dominated by big players which may be skewing the numbers.
Let us examine the most popular category by listing its apps with 100 million or more installs:
for app in google_free:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])
WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Messenger : 500,000,000+
WeChat : 100,000,000+
Yahoo Mail – Stay Organized : 100,000,000+
BBM - Free Calls & Messages : 100,000,000+
As with the Apple data, this category is dominated by a few big players. To understand what the average number of installs is without these giants skewing the figure, we will repeat the calculation with them excluded.
under_100_m = []
all_comm_apps = []
for app in google_free:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if app[1] == 'COMMUNICATION':
        all_comm_apps.append(float(n_installs))
        # Keep a second list that leaves out the 100 million+ giants
        if float(n_installs) < 100000000:
            under_100_m.append(float(n_installs))
avg_all = sum(all_comm_apps) / len(all_comm_apps)
avg_under = sum(under_100_m) / len(under_100_m)
print('All COMMUNICATION apps avg installs:', round(avg_all, 1))
print('COMMUNICATION apps under 100 million installs, avg installs:', round(avg_under, 1))
print('Reduction in average when excluding 100 million+ apps (%):', round(100 - avg_under / avg_all * 100))
All COMMUNICATION apps avg installs: 38456119.2
COMMUNICATION apps under 100 million installs, avg installs: 3603485.4
Reduction in average when excluding 100 million+ apps (%): 91
As we can see, it is quite a substantial difference. This should be considered when looking at the data.
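One way to see this kind of skew without choosing a cut-off by hand is to compare the mean with the median. The sketch below is illustrative only: the install counts are made-up values rather than figures from the dataset, and `summarise` is a hypothetical helper name.

```python
from statistics import mean, median

# Hypothetical install counts for one category (NOT taken from the dataset):
# four modest apps and one giant.
installs = [1_000, 10_000, 100_000, 1_000_000, 1_000_000_000]


def summarise(values, cap):
    """Return the mean of all values, the mean below `cap`, and the median."""
    below_cap = [v for v in values if v < cap]
    return {
        'mean_all': mean(values),
        'mean_below_cap': mean(below_cap),
        'median_all': median(values),
    }


summary = summarise(installs, cap=100_000_000)
for label, value in summary.items():
    print(label, ':', value)
```

In this made-up sample the single giant pulls the overall mean far above both the capped mean and the median, which is the same pattern the COMMUNICATION figures show.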
However, when we look at the Android category averages, BOOKS_AND_REFERENCE appears to be quite popular, at roughly 8.77 million installs per app on average (8767811.9). This may be a good area to exploit, as app development should not be overly expensive, and it was also a suggested area when we looked at the Apple data.
Based on this and the Apple data, this genre appears to be a good place to start looking for app development opportunities. Let us continue with that line of enquiry to see if it is still feasible with the Android data.
for app in google_free:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])
E-Book Read - Read Book for free : 50,000+ Download free book with green book : 100,000+ Wikipedia : 10,000,000+ Cool Reader : 10,000,000+ Free Panda Radio Music : 100,000+ Book store : 1,000,000+ FBReader: Favorite Book Reader : 10,000,000+ English Grammar Complete Handbook : 500,000+ Free Books - Spirit Fanfiction and Stories : 1,000,000+ Google Play Books : 1,000,000,000+ AlReader -any text book reader : 5,000,000+ Offline English Dictionary : 100,000+ Offline: English to Tagalog Dictionary : 500,000+ FamilySearch Tree : 1,000,000+ Cloud of Books : 1,000,000+ Recipes of Prophetic Medicine for free : 500,000+ ReadEra – free ebook reader : 1,000,000+ Anonymous caller detection : 10,000+ Ebook Reader : 5,000,000+ Litnet - E-books : 100,000+ Read books online : 5,000,000+ English to Urdu Dictionary : 500,000+ eBoox: book reader fb2 epub zip : 1,000,000+ English Persian Dictionary : 500,000+ Flybook : 500,000+ All Maths Formulas : 1,000,000+ Ancestry : 5,000,000+ HTC Help : 10,000,000+ English translation from Bengali : 100,000+ Pdf Book Download - Read Pdf Book : 100,000+ Free Book Reader : 100,000+ eBoox new: Reader for fb2 epub zip books : 50,000+ Only 30 days in English, the guideline is guaranteed : 500,000+ Moon+ Reader : 10,000,000+ SH-02J Owner's Manual (Android 8.0) : 50,000+ English-Myanmar Dictionary : 1,000,000+ Golden Dictionary (EN-AR) : 1,000,000+ All Language Translator Free : 1,000,000+ Azpen eReader : 500,000+ URBANO V 02 instruction manual : 100,000+ Bible : 100,000,000+ C Programs and Reference : 50,000+ C Offline Tutorial : 1,000+ C Programs Handbook : 50,000+ Amazon Kindle : 100,000,000+ Aab e Hayat Full Novel : 100,000+ Aldiko Book Reader : 10,000,000+ Google I/O 2018 : 500,000+ R Language Reference Guide : 10,000+ Learn R Programming Full : 5,000+ R Programing Offline Tutorial : 1,000+ Guide for R Programming : 5+ Learn R Programming : 10+ R Quick Reference Big Data : 1,000+ V Made : 100,000+ Wattpad 📖 Free Books : 100,000,000+ Dictionary - 
WordWeb : 5,000,000+ Guide (for X-MEN) : 100,000+ AC Air condition Troubleshoot,Repair,Maintenance : 5,000+ AE Bulletins : 1,000+ Ae Allah na Dai (Rasa) : 10,000+ 50000 Free eBooks & Free AudioBooks : 5,000,000+ Ag PhD Field Guide : 10,000+ Ag PhD Deficiencies : 10,000+ Ag PhD Planting Population Calculator : 1,000+ Ag PhD Soybean Diseases : 1,000+ Fertilizer Removal By Crop : 50,000+ A-J Media Vault : 50+ Al-Quran (Free) : 10,000,000+ Al Quran (Tafsir & by Word) : 500,000+ Al Quran Indonesia : 10,000,000+ Al'Quran Bahasa Indonesia : 10,000,000+ Al Quran Al karim : 1,000,000+ Al-Muhaffiz : 50,000+ Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+ Al-Quran 30 Juz free copies : 500,000+ Koran Read &MP3 30 Juz Offline : 1,000,000+ Hafizi Quran 15 lines per page : 1,000,000+ Quran for Android : 10,000,000+ Surah Al-Waqiah : 100,000+ Hisnul Al Muslim - Hisn Invocations & Adhkaar : 100,000+ Satellite AR : 1,000,000+ Audiobooks from Audible : 100,000,000+ Kinot & Eichah for Tisha B'Av : 10,000+ AW Tozer Devotionals - Daily : 5,000+ Tozer Devotional -Series 1 : 1,000+ The Pursuit of God : 1,000+ AY Sing : 5,000+ Ay Hasnain k Nana Milad Naat : 10,000+ Ay Mohabbat Teri Khatir Novel : 10,000+ Arizona Statutes, ARS (AZ Law) : 1,000+ Oxford A-Z of English Usage : 1,000,000+ BD Fishpedia : 1,000+ BD All Sim Offer : 10,000+ Youboox - Livres, BD et magazines : 500,000+ B&H Kids AR : 10,000+ B y H Niños ES : 5,000+ Dictionary.com: Find Definitions for English Words : 10,000,000+ English Dictionary - Offline : 10,000,000+ Bible KJV : 5,000,000+ Borneo Bible, BM Bible : 10,000+ MOD Black for BM : 100+ BM Box : 1,000+ Anime Mod for BM : 100+ NOOK: Read eBooks & Magazines : 10,000,000+ NOOK Audiobooks : 500,000+ NOOK App for NOOK Devices : 500,000+ Browsery by Barnes & Noble : 5,000+ bp e-store : 1,000+ Brilliant Quotes: Life, Love, Family & Motivation : 1,000,000+ BR Ambedkar Biography & Quotes : 10,000+ BU Alsace : 100+ Catholic La Bu Zo Kam : 500+ Khrifa Hla Bu (Solfa) : 
10+ Kristian Hla Bu : 10,000+ SA HLA BU : 1,000+ Learn SAP BW : 500+ Learn SAP BW on HANA : 500+ CA Laws 2018 (California Laws and Codes) : 5,000+ Bootable Methods(USB-CD-DVD) : 10,000+ cloudLibrary : 100,000+ SDA Collegiate Quarterly : 500+ Sabbath School : 100,000+ Cypress College Library : 100+ Stats Royale for Clash Royale : 1,000,000+ GATE 21 years CS Papers(2011-2018 Solved) : 50+ Learn CT Scan Of Head : 5,000+ Easy Cv maker 2018 : 10,000+ How to Write CV : 100,000+ CW Nuclear : 1,000+ CY Spray nozzle : 10+ BibleRead En Cy Zh Yue : 5+ CZ-Help : 5+ Modlitební knížka CZ : 500+ Guide for DB Xenoverse : 10,000+ Guide for DB Xenoverse 2 : 10,000+ Guide for IMS DB : 10+ DC HSEMA : 5,000+ DC Public Library : 1,000+ Painting Lulu DC Super Friends : 1,000+ Dictionary : 10,000,000+ Fix Error Google Playstore : 1,000+ D. H. Lawrence Poems FREE : 1,000+ Bilingual Dictionary Audio App : 5,000+ DM Screen : 10,000+ wikiHow: how to do anything : 1,000,000+ Dr. Doug's Tips : 1,000+ Bible du Semeur-BDS (French) : 50,000+ La citadelle du musulman : 50,000+ DV 2019 Entry Guide : 10,000+ DV 2019 - EDV Photo & Form : 50,000+ DV 2018 Winners Guide : 1,000+ EB Annual Meetings : 1,000+ EC - AP & Telangana : 5,000+ TN Patta Citta & EC : 10,000+ AP Stamps and Registration : 10,000+ CompactiMa EC pH Calibration : 100+ EGW Writings 2 : 100,000+ EGW Writings : 1,000,000+ Bible with EGW Comments : 100,000+ My Little Pony AR Guide : 1,000,000+ SDA Sabbath School Quarterly : 500,000+ Duaa Ek Ibaadat : 5,000+ Spanish English Translator : 10,000,000+ Dictionary - Merriam-Webster : 10,000,000+ JW Library : 10,000,000+ Oxford Dictionary of English : Free : 10,000,000+ English Hindi Dictionary : 10,000,000+ English to Hindi Dictionary : 5,000,000+ EP Research Service : 1,000+ Hymnes et Louanges : 100,000+ EU Charter : 1,000+ EU Data Protection : 1,000+ EU IP Codes : 100+ EW PDF : 5+ BakaReader EX : 100,000+ EZ Quran : 50,000+ FA Part 1 & 2 Past Papers Solved Free – Offline : 5,000+ La Fe de Jesus 
: 1,000+ La Fe de Jesús : 500+ Le Fe de Jesus : 500+ Florida - Pocket Brainbook : 1,000+ Florida Statutes (FL Code) : 1,000+ English To Shona Dictionary : 10,000+ Greek Bible FP (Audio) : 1,000+ Golden Dictionary (FR-AR) : 500,000+ Fanfic-FR : 5,000+ Bulgarian French Dictionary Fr : 10,000+ Chemin (fr) : 1,000+ The SCP Foundation DB fr nn5n : 1,000+
Like before, there is a small number of very popular apps which are skewing the average.
for app in google_free:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])
Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+
Luckily, only five apps fall into this group, which suggests that the market still has potential. Let us see what some of the more popular apps in this category are, to help give us some app ideas.
for app in google_free:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])
Wikipedia : 10,000,000+ Cool Reader : 10,000,000+ Book store : 1,000,000+ FBReader: Favorite Book Reader : 10,000,000+ Free Books - Spirit Fanfiction and Stories : 1,000,000+ AlReader -any text book reader : 5,000,000+ FamilySearch Tree : 1,000,000+ Cloud of Books : 1,000,000+ ReadEra – free ebook reader : 1,000,000+ Ebook Reader : 5,000,000+ Read books online : 5,000,000+ eBoox: book reader fb2 epub zip : 1,000,000+ All Maths Formulas : 1,000,000+ Ancestry : 5,000,000+ HTC Help : 10,000,000+ Moon+ Reader : 10,000,000+ English-Myanmar Dictionary : 1,000,000+ Golden Dictionary (EN-AR) : 1,000,000+ All Language Translator Free : 1,000,000+ Aldiko Book Reader : 10,000,000+ Dictionary - WordWeb : 5,000,000+ 50000 Free eBooks & Free AudioBooks : 5,000,000+ Al-Quran (Free) : 10,000,000+ Al Quran Indonesia : 10,000,000+ Al'Quran Bahasa Indonesia : 10,000,000+ Al Quran Al karim : 1,000,000+ Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+ Koran Read &MP3 30 Juz Offline : 1,000,000+ Hafizi Quran 15 lines per page : 1,000,000+ Quran for Android : 10,000,000+ Satellite AR : 1,000,000+ Oxford A-Z of English Usage : 1,000,000+ Dictionary.com: Find Definitions for English Words : 10,000,000+ English Dictionary - Offline : 10,000,000+ Bible KJV : 5,000,000+ NOOK: Read eBooks & Magazines : 10,000,000+ Brilliant Quotes: Life, Love, Family & Motivation : 1,000,000+ Stats Royale for Clash Royale : 1,000,000+ Dictionary : 10,000,000+ wikiHow: how to do anything : 1,000,000+ EGW Writings : 1,000,000+ My Little Pony AR Guide : 1,000,000+ Spanish English Translator : 10,000,000+ Dictionary - Merriam-Webster : 10,000,000+ JW Library : 10,000,000+ Oxford Dictionary of English : Free : 10,000,000+ English Hindi Dictionary : 10,000,000+ English to Hindi Dictionary : 5,000,000+
It appears there is a lot of demand for apps which display popular texts, such as religious books and other important reference libraries.
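To put a rough number on that impression, one option is to tally how often certain words appear in the app names. This is only a sketch: the keyword list is a hypothetical choice, `count_keywords` is not part of the original notebook, and the names below are a small hand-picked sample from the listing above rather than the full result.

```python
from collections import Counter

# Hypothetical keywords to look for (an assumption, chosen for illustration)
keywords = ['quran', 'bible', 'dictionary', 'reader']

# A small hand-picked sample of app names from the listing above
sample_names = [
    'Al Quran Indonesia',
    'Bible KJV',
    'English Hindi Dictionary',
    'Dictionary - Merriam-Webster',
    'Moon+ Reader',
    'Wikipedia',
]


def count_keywords(app_names, keywords):
    """Count how many app names contain each keyword (case-insensitive)."""
    counts = Counter()
    for name in app_names:
        lowered = name.lower()
        for keyword in keywords:
            if keyword in lowered:
                counts[keyword] += 1
    return counts


keyword_counts = count_keywords(sample_names, keywords)
for keyword, count in keyword_counts.most_common():
    print(keyword, ':', count)
```

Run against the app names in `google_free` (index 0) for the BOOKS_AND_REFERENCE category, the same function would give a quick view of which themes recur most often.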
This is looking promising. However, recall that the Google data also has a 'Genres' column, which gives more granular data than 'Category'. Let's do the same analysis on that column and see what it shows us.
Google_genres = freq_table(google_free, 9)
final_list = []
for category in Google_genres:
    total = 0
    len_category = 0
    for app in google_free:
        category_app = app[9]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = round(total / len_category, 1)
    final_list.append((category, avg_n_installs))
sort_tab(final_list, 0)
Communication : 38456119.2 Adventure;Action & Adventure : 35333333.3 Video Players & Editors : 24947335.8 Social : 23253652.1 Arcade : 22888365.5 Casual : 19569221.6 Puzzle;Action & Adventure : 18366666.7 Photography : 17840110.4 Educational;Action & Adventure : 17016666.7 Productivity : 16787331.3 Racing : 15910645.7 Travel & Local : 14051476.1 Casual;Action & Adventure : 12916666.7 Action : 12603588.9 Strategy : 11339901.3 Tools : 10802461.2 Tools;Education : 10000000.0 Role Playing;Brain Games : 10000000.0 Lifestyle;Pretend Play : 10000000.0 Casual;Music & Video : 10000000.0 Card;Action & Adventure : 10000000.0 Adventure;Education : 10000000.0 News & Magazines : 9549178.5 Music : 9445583.3 Educational;Pretend Play : 9375000.0 Puzzle;Brain Games : 9280666.7 Word : 9094458.7 Racing;Action & Adventure : 8816666.7 Books & Reference : 8767811.9 Puzzle : 8302861.9 Video Players & Editors;Music & Video : 7500000.0 Shopping : 7036877.3 Role Playing;Action & Adventure : 7000000.0 Casual;Pretend Play : 6957142.9 Entertainment;Music & Video : 6413333.3 Action;Action & Adventure : 5888888.9 Entertainment : 5602792.8 Education;Brain Games : 5333333.3 Casual;Creativity : 5333333.3 Role Playing;Pretend Play : 5275000.0 Personalization : 5201482.6 Weather : 5074486.2 Sports;Action & Adventure : 5050000.0 Music;Music & Video : 5050000.0 Video Players & Editors;Creativity : 5000000.0 Adventure : 4922785.3 Simulation;Action & Adventure : 4857142.9 Education;Education : 4759517.0 Board : 4759209.1 Sports : 4596842.6 Educational;Brain Games : 4433333.3 Health & Fitness : 4188822.0 Maps & Navigation : 4056941.8 Entertainment;Creativity : 4000000.0 Role Playing : 3965645.4 Card : 3815462.5 Trivia : 3475712.7 Simulation : 3475484.1 Casino : 3427910.5 Entertainment;Brain Games : 3314285.7 Arcade;Action & Adventure : 3190909.2 Entertainment;Pretend Play : 3000000.0 Board;Action & Adventure : 3000000.0 Education;Creativity : 2875000.0 Entertainment;Action & Adventure : 2333333.3 
Educational;Creativity : 2333333.3 Art & Design : 2122850.9 Education;Music & Video : 2033333.3 Food & Drink : 1924897.7 Education;Pretend Play : 1800000.0 Educational;Education : 1737143.1 Business : 1712290.1 Casual;Brain Games : 1425916.7 Lifestyle : 1412998.3 Finance : 1387692.5 House & Home : 1331540.6 Parenting;Music & Video : 1118333.3 Strategy;Creativity : 1000000.0 Strategy;Action & Adventure : 1000000.0 Racing;Pretend Play : 1000000.0 Parenting;Brain Games : 1000000.0 Health & Fitness;Action & Adventure : 1000000.0 Entertainment;Education : 1000000.0 Education;Action & Adventure : 1000000.0 Casual;Education : 1000000.0 Arcade;Pretend Play : 1000000.0 Dating : 854028.8 Comics : 831873.1 Puzzle;Creativity : 750000.0 Auto & Vehicles : 647317.8 Libraries & Demo : 638503.7 Education : 550185.4 Simulation;Pretend Play : 550000.0 Beauty : 513151.9 Strategy;Education : 500000.0 Music & Audio;Music & Video : 500000.0 Communication;Creativity : 500000.0 Art & Design;Pretend Play : 500000.0 Parenting : 467977.5 Parenting;Education : 452857.1 Educational : 411184.8 Board;Brain Games : 407142.9 Art & Design;Creativity : 285000.0 Events : 253542.2 Medical : 120550.6 Travel & Local;Action & Adventure : 100000.0 Puzzle;Education : 100000.0 Lifestyle;Education : 100000.0 Health & Fitness;Education : 100000.0 Art & Design;Action & Adventure : 100000.0 Comics;Creativity : 50000.0 Books & Reference;Education : 1000.0 Simulation;Education : 500.0 Trivia;Education : 100.0
As expected, this does not provide a great deal of insight at first glance, because the listing is dominated by various games and entertainment genres. However, 'Books & Reference' still ranks relatively highly (8767811.9 average installs, the same figure as the BOOKS_AND_REFERENCE category), which supports pursuing this area whether we group by genre or by category.
Looking down the list also gives some ideas for other apps which should be relatively inexpensive to develop, should we wish to expand into different niches later on.
If we were to explore offering a games or entertainment app with an in-app purchasing or subscription model, this listing suggests some relatively inexpensive ways to do this. For example, trivia, puzzle and word games appear to be quite popular. These would be relatively inexpensive to develop (compared to high-performance action games, etc.) and could probably share a lot of the code framework of the apps we will have developed for our 'Books & Reference' offerings. 'Travel & Local' apps also appear to be very popular, which is another avenue we could explore. Again, this could build on the existing code base from our 'Books & Reference' offerings, allowing us to develop apps such as travel guides and local listings.
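Many of the combined genres in this listing have suspiciously round averages, which may mean they contain only a handful of apps. A sketch of one way to check this is to compute the averages in a single pass and keep the number of apps next to each average. The function name `avg_installs_with_counts` and the sample rows are hypothetical; the column positions (installs at index 5, genre at index 9) follow the code above.

```python
from collections import defaultdict


def avg_installs_with_counts(rows, key_index, installs_index=5):
    """Return {key: (average installs, number of apps)} in one pass."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of installs, app count]
    for row in rows:
        installs = int(row[installs_index].replace(',', '').replace('+', ''))
        entry = totals[row[key_index]]
        entry[0] += installs
        entry[1] += 1
    return {key: (round(total / count, 1), count)
            for key, (total, count) in totals.items()}


def make_row(name, installs, genre):
    """Build a hypothetical 10-column row with only the fields we need."""
    row = [''] * 10
    row[0], row[5], row[9] = name, installs, genre
    return row


# Hypothetical rows, NOT taken from the dataset
sample = [
    make_row('App A', '10,000,000+', 'Adventure;Action & Adventure'),
    make_row('App B', '1,000+', 'Trivia'),
    make_row('App C', '100,000+', 'Trivia'),
]

result = avg_installs_with_counts(sample, 9)
for genre, (average, count) in result.items():
    print(genre, ':', average, '(', count, 'apps )')
```

Genres backed by only one or two apps can then be treated with caution when ranking, since a single popular app is enough to place them near the top.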
Based on the information we are able to obtain from the datasets we have, I believe that developing apps which present popular texts would be a profitable initial venture. This should be relatively easy to develop, and gives lots of scope in which to expand.
Ideally, to set ourselves apart from the competition, we could add features such as daily quotes, audiobook versions, quick search and bookmarks.
Once this framework is developed and we have some good momentum and income, it should be relatively easy to expand our operation to create popular apps in other areas such as games, travel guides and tools.