
Analysis of App Data with the aim of discovering profitable market niches

The goal of this project is to carry out an in-depth analysis of available phone app data from both Google Play and the Apple App Store. The aim is to uncover areas on which app development could focus to ensure a profitable return. The main intention is to offer free, ad-supported apps, which should ideally receive a large number of downloads.

We will also look into any other usable insights from this data which can further help to steer our focus. This analysis could then be offered to a company which specialises in app development.

Acquiring some sample data for analysis

There are millions of apps on both Google Play and the App Store, so it is unfeasible to acquire the entire dataset without incurring significant costs (both financially and in terms of time spent). For the purpose of our analysis, we only need a small subset of this data to tease out any significant insights. Luckily, there are some publicly available datasets which meet these requirements.

Here is a dataset which contains approximately ten thousand Android apps from Google Play. It can be downloaded here.

Here is a dataset which contains approximately seven thousand iOS apps from the App Store. It can be downloaded here.

This should be sufficient to provide a basic analysis of both app stores at this stage.

We will begin by opening both of these datasets and taking a look.

In [126]:

from csv import reader

# Open the Apple dataset and split it into header and data rows.
# 'with' closes the file automatically; newline='' is recommended for the csv module.
with open('AppleStore.csv', encoding='utf8', newline='') as opened_file:
    apple_data = list(reader(opened_file))
apple_header = apple_data[0]
apple_data = apple_data[1:]

# Open the Google dataset and split it into header and data rows
with open('googleplaystore.csv', encoding='utf8', newline='') as opened_file:
    google_data = list(reader(opened_file))
google_header = google_data[0]
google_data = google_data[1:]

Before we begin looking into these datasets any deeper, I will create a function which will display the data in a more readable format for a quick view of the datasets.

In [127]:

def explore_data(dataset, start, end, rows_and_columns = False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

I will now use this function to give us a basic snapshot of each of the datasets.

Exploring the Data

Firstly, we will take a quick look at the Apple data.

In [128]:

print(apple_header)
print('\n')
explore_data(apple_data, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16

The documentation for this dataset which gives more information on the columns can be found here.

At a glance, the columns that could help with our analysis are 'price', 'track_name', 'user_rating', 'cont_rating' and 'prime_genre'.

Next we will look at the Google data.

In [129]:

print(google_header)
print('\n')
explore_data(google_data, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13

The documentation for this dataset, which gives more information on the columns, can be found here.

The columns that could help with our analysis are 'App', 'Rating', 'Installs', 'Type', 'Content Rating' and 'Genres'.

Cleaning the Data

Removing Errors

After looking at the discussion forum for the Google data here, it appears that there is an error in one row (index 10472). Let's compare this row against the header and a correct row.

In [130]:

print(google_header)
print('\n')
explore_data(google_data,10472,10474)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']

As we can see, this row is missing its 'Category' value. Every later value has therefore shifted one column to the left, and the row has only 12 values against 13 header columns. In this case, it is easier to delete the row than to try to fix it.

In [131]:

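# Only delete if this is still the misaligned row (fewer values than the header),
# so that re-running the cell does not remove a valid row
if len(google_data[10472]) != len(google_header):
    del google_data[10472]
explore_data(google_data, 10472, 10474)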

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


['Sat-Fi Voice', 'COMMUNICATION', '3.4', '37', '14M', '1,000+', 'Free', '0', 'Everyone', 'Communication', 'November 21, 2014', '2.2.1.5', '2.2 and up']
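Deleting by a hard-coded index relies on the forum post having found every bad row. A more general check, sketched below, flags any row whose length differs from the header. The toy rows and the `find_misaligned` helper are hypothetical illustrations. On the real data, the call would be `find_misaligned(google_data, google_header)`.

```python
# Toy rows in the google_data layout (hypothetical values, first 4 columns only);
# the second row is missing its 'Category' value, like row 10472
header = ['App', 'Category', 'Rating', 'Reviews']
rows = [
    ['App A', 'TOOLS', '4.2', '134203'],
    ['App B', '1.9', '19'],
    ['App C', 'SOCIAL', '4.5', '66577313'],
]

def find_misaligned(dataset, header):
    """Return the indices of rows whose number of values differs from the header."""
    return [i for i, row in enumerate(dataset) if len(row) != len(header)]

bad = find_misaligned(rows, header)
print(bad)        # [1]

# Delete from the end so the remaining indices in `bad` stay valid
for i in reversed(bad):
    del rows[i]

print(len(rows))  # 2
```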

This row has now been removed from the dataset. As far as we can ascertain, the discussion forum for the Apple data does not report any similar errors.

Removing Duplicates

Further investigation of the Google data shows that the dataset also contains duplicate entries. For example, the Instagram app appears four times. Based on the differing 'Reviews' values, we assume the app's performance was recorded at several different points in time.

In [132]:

for app in google_data:
    if app[0] == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']

To try and give us some more insight into how common this issue is, we will loop through the data and count how many duplicates we have.

In [133]:

# A set gives fast membership checks; the order of unique names is not needed
unique_apps = set()
duplicate_apps = []

for app in google_data:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.add(name)

print('Number of Duplicates: ', len(duplicate_apps))
print('Duplicate Examples: ', duplicate_apps[:10])

Number of Duplicates:  1181
Duplicate Examples:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
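The same count can be sketched with `collections.Counter`. Every occurrence of a name beyond its first counts as one duplicate, which matches what the loop above appends to `duplicate_apps`. The short list of names is a hypothetical stand-in for `[row[0] for row in google_data]`.

```python
from collections import Counter

# Hypothetical stand-in for the app names in google_data
names = ['Instagram', 'Box', 'Instagram', 'Slack', 'Box', 'Instagram']

counts = Counter(names)
# Each occurrence after the first one is a duplicate
n_duplicates = sum(c - 1 for c in counts.values())
# Names that appear more than once, in order of first appearance
duplicated_names = [name for name, c in counts.items() if c > 1]

print(n_duplicates)      # 3
print(duplicated_names)  # ['Instagram', 'Box']
```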

To keep our analysis accurate, we need to remove these duplicate entries. Rather than removing them at random, we will follow a consistent rule: for each app, keep the entry with the highest number of reviews, as this is likely to be the most recent record.

To remind ourselves, the starting number of entries in the dataset is 10,840.

In [134]:

print(len(google_data))

We will now loop through the data and build a dictionary which maps each app name to its highest number of reviews. Any entry for that app with fewer reviews is a duplicate to be removed.

In [135]:

# Map each app name to the highest number of reviews seen for it
reviews_max = {}

for row in google_data:
    name = row[0]
    n_reviews = float(row[3])

    if name not in reviews_max or reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews

print('Remaining Entries: ' , len(reviews_max))
print('Duplicates Found: ' , len(google_data) - len(reviews_max))

Remaining Entries:  9659
Duplicates Found:  1181

As we can now see, this shows that the number of unique entries is 9,659, after highlighting 1,181 duplicate entries.

We will now loop through the main dataset and use the dictionary we just created to keep only the entry with the highest number of reviews for each app. This builds a new, clean dataset.

In [136]:

android_clean = []
already_added = set()

for row in google_data:
    name = row[0]
    n_reviews = float(row[3])

    # Keep the first row that has the maximum review count for this app
    if reviews_max[name] == n_reviews and name not in already_added:
        android_clean.append(row)
        already_added.add(name)

We have now worked through the dataset and created a new clean list of unique apps based on the entry with the highest number of reviews. As predicted this dataset contains 9,659 entries.

In [137]:

print(len(android_clean))
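The two loops above can also be folded into a single pass. The sketch below keeps one row per name, replacing it only when a later row has strictly more reviews. This keeps the first row among ties, as the two-pass version does. The `dedupe_by_reviews` helper and the toy rows (including the 'Box' review count) are hypothetical. One difference is that the output follows the order in which each name first appears, rather than the position of the kept row, but the same set of rows is kept.

```python
def dedupe_by_reviews(dataset, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in dataset:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Strict '>' keeps the earliest row when review counts tie
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

# Hypothetical rows in the google_data layout (first 4 columns only)
rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]
clean = dedupe_by_reviews(rows)
print(len(clean))   # 2
print(clean[0][3])  # 66577446
```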

Removing Non-English Apps

On further exploration of the data, there appear to be apps aimed at non-English speakers. Since we will be developing apps for an English-speaking audience, we will remove these apps from our datasets to keep the analysis accurate.

In [138]:

print(apple_data[813][1])
print(apple_data[6731][1])
print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜
中国語 AQリスニング
لعبة تقدر تربح DZ

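To correct this, we will write a function which flags a name as non-English based on how many of its characters fall outside the ASCII range (character codes above 127). That range covers the English alphabet, digits and common punctuation.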

In [139]:

def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

In [140]:

print(is_english("爱奇艺PPS -《欢乐颂2》电视剧热播"))
print(is_english("Hello!"))
print(is_english("Instachat 😜"))

False
True
True

To avoid discarding apps whose names contain only a few non-ASCII characters (such as the emoji in "Instachat 😜"), the function only returns False for names with more than 3 non-ASCII characters.
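The threshold is a heuristic, so it can err in both directions. The sketch below restates the same rule compactly and tries it on a few illustrative strings that are not taken from the datasets. The `max_non_ascii` parameter is an addition for illustration. An English name with four emoji is wrongly rejected, while a short non-English name with three or fewer characters slips through.

```python
def is_english(string, max_non_ascii=3):
    """Same rule as above: allow at most `max_non_ascii` characters above ASCII 127."""
    non_ascii = sum(1 for character in string if ord(character) > 127)
    return non_ascii <= max_non_ascii

# Illustrative strings, not taken from the datasets
print(is_english('Instachat 😜'))       # True: one emoji
print(is_english('Instachat 😜😜😜😜'))  # False: four emoji, wrongly flagged
print(is_english('中国語'))              # True: only three non-ASCII characters
```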

We can now loop through both datasets using this function and keep only the apps it identifies as English.

In [141]:

print('Apple Starting app number: ', len(apple_data))

apple_eng = []
google_eng = []

for row in apple_data:
    name = row[1]
    if is_english(name):
        apple_eng.append(row)

print('Apple app number with Non-English apps removed: ', len(apple_eng))
print('\n')
print('Google Starting app number: ', len(android_clean))

for row in android_clean:
    name = row[0]
    if is_english(name):
        google_eng.append(row)

print('Google app number with Non-English apps removed: ', len(google_eng))

Apple Starting app number:  7197
Apple app number with Non-English apps removed:  6183


Google Starting app number:  9659
Google app number with Non-English apps removed:  9614

Isolating the Free Apps

Our next step is to isolate the free apps in both datasets, as these are the apps we want to analyse.

This is easily done by keeping the Apple apps whose 'price' is '0.0', and the Android apps whose 'Type' is 'Free'.

In [142]:

google_free = []
apple_free = []

print('All Android apps: ', len(google_eng))

for row in google_eng:
    if row[6] == "Free":
        google_free.append(row)
        
print('All free Android apps: ', len(google_free))  

print('\n')

print('All Apple apps: ', len(apple_eng))

for row in apple_eng:
    if row[4] == '0.0':
        apple_free.append(row)
        
print('All free Apple apps: ', len(apple_free)) 

All Android apps:  9614
All free Android apps:  8863


All Apple apps:  6183
All free Apple apps:  3222
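The Apple filter above compares strings, so it only works because every free app's price is stored exactly as '0.0'. A check written as `== '0'` would match nothing. The sketch below, using hypothetical price strings, shows that converting to a number first gives the same count while also tolerating other spellings such as '0' or '0.00'.

```python
# Hypothetical price strings in the format of the App Store 'price' column
prices = ['0.0', '0.99', '0.0', '4.99']

n_match_zero = sum(1 for p in prices if p == '0')           # '0' never equals '0.0'
n_match_zero_point = sum(1 for p in prices if p == '0.0')   # exact string match
n_numeric_free = sum(1 for p in prices if float(p) == 0.0)  # numeric comparison

print(n_match_zero, n_match_zero_point, n_numeric_free)  # 0 2 2
```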

As we can see, this leaves us with a clean dataset of 8,863 Android apps and 3,222 Apple apps.

Analysing the Data

We can now start to do some more in-depth analysis of our data.

As mentioned previously, the goal of this analysis is to find a niche which an app developer could exploit by creating free apps that will be successful in that area. To start, we will look for app profiles which are successful in both the App Store and Google Play. We will then use these profiles to consider how we could create our own apps that meet the same criteria.

To keep costs minimal while still creating successful apps, we would suggest first building a minimal Android version of the app for Google Play. If the app gets a good response, further development and a port to iOS would be the recommended next steps. This way, little time and money is invested in apps which do not prove to be financially successful.

Most common apps by Genre

Our next step is to find the most common genres in each dataset, to give us an idea of where to begin.

Let us remind ourselves of the structure of our data.

In [143]:

print(google_header)
print(google_free[:3]) 
print('\n')
print(apple_header)
print(apple_free[:3]) 

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']]


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']]

For the Google dataset, we will use the 'Category' and 'Genres' columns.

For the Apple dataset, we will use the 'prime_genre' column.

To make this analysis easier, we will create a function which returns a dictionary giving the frequency of each value in a column as a percentage of all rows.

In [144]:

#function to create a frequency table of any dataset we give it

def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = round(percentage, 2)
    
    return table_percentages

We will also create another function which will sort this frequency table into a more readable order.

In [145]:

#function to sort the freq table into order

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
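For comparison, the two functions above can be sketched more compactly with `collections.Counter` and `sorted` with a `key` function. The function names and the toy rows are hypothetical. One small difference is tie handling. `display_table` sorts (percentage, value) tuples, so ties are broken by value in reverse alphabetical order. This version keeps tied values in their original order.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Percentage frequency of each value in column `index`, rounded to 2 dp."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {key: round(count / total * 100, 2) for key, count in counts.items()}

def display_table_sorted(dataset, index):
    """Print each value and its percentage, most frequent first."""
    table = freq_table_counter(dataset, index)
    for key, pct in sorted(table.items(), key=lambda kv: kv[1], reverse=True):
        print(key, ':', pct)

# Hypothetical rows with the genre in column 0
rows = [['Games'], ['Games'], ['Games'], ['Education']]
display_table_sorted(rows, 0)
# Games : 75.0
# Education : 25.0
```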

We will now have a look at what genres are the most common in the Apple store by checking how the prime_genre entries are dispersed in the dataset.

In [146]:

display_table(apple_free, 11)

Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12

From the Apple data, we can see that among free English apps in the App Store, the most common genre is Games (58.16%), followed a long way behind by Entertainment (7.88%).

This indicates that most of these apps are designed for entertainment rather than for a practical purpose.

An initial recommendation might be to develop games. However, the raw number of apps per genre could be misleading, so we need to dig further into the number of users for each genre.

We will now do the same for the Google data based on the Category column, and the Genres column respectively.

In [147]:

display_table(google_free, 1)

FAMILY : 18.9
GAME : 9.73
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6

In [148]:

display_table(google_free, 9)

Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.9
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;Brain Games : 0.14
Casual;Action & Adventure : 0.14
Arcade;Action & Adventure : 0.12
Action;Action & Adventure : 0.1
Educational;Pretend Play : 0.09
Simulation;Action & Adventure : 0.08
Parenting;Education : 0.08
Entertainment;Brain Games : 0.08
Board;Brain Games : 0.08
Parenting;Music & Video : 0.07
Educational;Brain Games : 0.07
Casual;Creativity : 0.07
Art & Design;Creativity : 0.07
Education;Pretend Play : 0.06
Role Playing;Pretend Play : 0.05
Education;Creativity : 0.05
Role Playing;Action & Adventure : 0.03
Puzzle;Action & Adventure : 0.03
Entertainment;Creativity : 0.03
Entertainment;Action & Adventure : 0.03
Educational;Creativity : 0.03
Educational;Action & Adventure : 0.03
Education;Music & Video : 0.03
Education;Brain Games : 0.03
Education;Action & Adventure : 0.03
Adventure;Action & Adventure : 0.03
Video Players & Editors;Music & Video : 0.02
Sports;Action & Adventure : 0.02
Simulation;Pretend Play : 0.02
Puzzle;Creativity : 0.02
Music;Music & Video : 0.02
Entertainment;Pretend Play : 0.02
Casual;Education : 0.02
Board;Action & Adventure : 0.02
Video Players & Editors;Creativity : 0.01
Trivia;Education : 0.01
Travel & Local;Action & Adventure : 0.01
Tools;Education : 0.01
Strategy;Education : 0.01
Strategy;Creativity : 0.01
Strategy;Action & Adventure : 0.01
Simulation;Education : 0.01
Role Playing;Brain Games : 0.01
Racing;Pretend Play : 0.01
Puzzle;Education : 0.01
Parenting;Brain Games : 0.01
Music & Audio;Music & Video : 0.01
Lifestyle;Pretend Play : 0.01
Lifestyle;Education : 0.01
Health & Fitness;Education : 0.01
Health & Fitness;Action & Adventure : 0.01
Entertainment;Education : 0.01
Communication;Creativity : 0.01
Comics;Creativity : 0.01
Casual;Music & Video : 0.01
Card;Action & Adventure : 0.01
Books & Reference;Education : 0.01
Art & Design;Pretend Play : 0.01
Art & Design;Action & Adventure : 0.01
Arcade;Pretend Play : 0.01
Adventure;Education : 0.01

As we can see above, Family is the most common category in the Google data, with a large lead over the next category, GAME (Games was the most common genre in the Apple data). However, there is likely to be a lot of overlap between Family and Game. Browsing the Play Store confirms this, as the Family section lists a lot of children's games.

There does not appear to be a big difference between the Category and Genres columns, except that Genres is more granular. By genre, the highest percentage falls under Tools. Looking through the data, this appears to be because game-related apps are split across many genres (Action, Simulation, Arcade, Casual, and combinations such as 'Casual;Brain Games'). This spreads them thinly through the table.

This dataset does show a healthier mix of popular practical apps than the Apple data, which is dominated by games.

Without further analysis, it is hard to recommend a particular app category, however, it does appear that games, family and some practical tools would be good choices. The risk of this is we will be entering an already competitive marketplace. It may be wiser to seek out a more profitable, but more niche area of the marketplace we could better exploit.
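Because a 'Genres' value can hold several genres separated by ';', one way to reduce the dispersion described above is to split the values before counting. The sketch below shows two options on a few values copied from the genre table: counting every component genre, or counting only the first, primary, one. Which option suits the analysis better is a judgment call and is not part of the original method.

```python
from collections import Counter

# A few 'Genres' values as they appear in the table above
genres = ['Art & Design', 'Art & Design;Pretend Play', 'Casual;Brain Games', 'Casual']

# Count every individual genre, so 'Casual;Brain Games' counts towards both
all_genres = Counter(g for value in genres for g in value.split(';'))
# Count only the first (primary) genre of each app
primary = Counter(value.split(';')[0] for value in genres)

print(all_genres['Casual'])     # 2
print(primary['Art & Design'])  # 2
print(primary['Pretend Play'])  # 0
```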

Apple Data Analysis

First, I will create a couple of simple helper functions for sorting the results. One converts a list of (key, value) pairs into a dictionary, and the other prints that dictionary in descending order of value so it is easier to read.

In [149]:

# Convert a list of (key, value) pairs to a dictionary
def turn_to_dict(dataset):
    table = {}

    for row in dataset:
        table[row[0]] = row[1]

    return table

# Print a list of (key, value) pairs sorted by value, largest first.
# `index` is not used; it is kept so that the existing calls still work.
def sort_tab(dataset, index):
    table = turn_to_dict(dataset)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

We can now get an idea of the most popular genres in the Apple data. We do this by calculating the average number of user ratings (the 'rating_count_tot' column) for each genre.

In [150]:

Apple_genres = freq_table(apple_free, 11)

final_list = []

for genre in Apple_genres:
    total = 0
    len_genre = 0
    for row in apple_free:
        genre_app = row[11]
        if genre_app == genre:            
            n_ratings = float(row[5])
            total += n_ratings
            len_genre += 1
            
    avg_no_ratings = round(total / len_genre, 1)

    final_list.append((genre, avg_no_ratings))

sort_tab(final_list, 0)

Navigation : 86090.3
Reference : 74942.1
Social Networking : 71548.3
Music : 57326.5
Weather : 52279.9
Book : 39758.5
Food & Drink : 33333.9
Finance : 31467.9
Photo & Video : 28441.5
Travel : 28243.8
Shopping : 26919.7
Health & Fitness : 23298.0
Sports : 23008.9
Games : 22788.7
News : 21248.0
Productivity : 21028.4
Utilities : 18684.5
Lifestyle : 16485.8
Entertainment : 14029.8
Business : 7491.1
Education : 7004.0
Catalogs : 4004.0
Medical : 612.0
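The nested loop above scans the whole dataset once for every genre. The same averages can be sketched in a single pass by accumulating a running total and a count per genre. The `average_by_group` helper and the toy rows are hypothetical. On the real data, the call would be `average_by_group(apple_free, 11, 5)`.

```python
from collections import defaultdict

def average_by_group(dataset, group_idx, value_idx):
    """Average of column `value_idx` for each distinct value of `group_idx`, in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        key = row[group_idx]
        totals[key] += float(row[value_idx])
        counts[key] += 1
    return {key: round(totals[key] / counts[key], 1) for key in totals}

# Hypothetical rows: [name, genre, number of ratings]
rows = [
    ['App A', 'Navigation', '100'],
    ['App B', 'Navigation', '300'],
    ['App C', 'Games', '50'],
]
averages = average_by_group(rows, 1, 2)
for genre, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(genre, ':', avg)
# Navigation : 200.0
# Games : 50.0
```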

Using the number of ratings as a proxy for the number of downloads, we can see that in the Apple dataset, the most downloaded genre of app is Navigation, followed by Reference, Social Networking and then by Music.

Navigation being the most popular download category is quite unexpected.

Let us try and determine why this may be the case:

In [151]:

for app in apple_free:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5]) # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5

As we can see, there are very few apps in this category, and Waze and Google Maps account for the vast majority of its ratings. This explains the high average.
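To put a number on this, the sketch below uses the six rating counts printed above, with the app names shortened. It confirms that their mean matches the 86,090.3 average for Navigation and computes the share of the category's ratings held by the two largest apps.

```python
# Rating counts for the free English Navigation apps, copied from the output above
navigation = {
    'Waze': 345046,
    'Google Maps': 154911,
    'Geocaching': 12811,
    'CoPilot GPS': 3582,
    'ImmobilienScout24': 187,
    'Railway Route Search': 5,
}

total = sum(navigation.values())
avg_ratings = total / len(navigation)
top_two = sorted(navigation.values(), reverse=True)[:2]
top_two_share = sum(top_two) / total * 100

print(round(avg_ratings, 1))    # 86090.3
print(round(top_two_share, 1))  # 96.8
```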

Let us do the same check for Social Networking.

In [152]:

for app in apple_free:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5]) # print name and number of ratings

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 23965
SimSimi : 23530
Grindr - Gay and same sex guys chat, meet and date : 23201
Wishbone - Compare Anything : 20649
imo video calls and chat : 18841
After School - Funny Anonymous School News : 18482
Quick Reposter - Repost, Regram and Reshare Photos : 17694
Weibo HD : 16772
Repost for Instagram : 15185
Live.me – Live Video Chat & Make Friends Nearby : 14724
Nextdoor : 14402
Followers Analytics for Instagram - InstaReport : 13914
YouNow: Live Stream Video Chat : 12079
FollowMeter for Instagram - Followers Tracking : 11976
LINE : 11437
eHarmony™ Dating App - Meet Singles : 11124
Discord - Chat for Gamers : 9152
QQ : 9109
Telegram Messenger : 7573
Weibo : 7265
Periscope - Live Video Streaming Around the World : 6062
Chat for Whatsapp - iPad Version : 5060
QQ HD : 5058
Followers Analysis Tool For Instagram App Free : 4253
live.ly - live video streaming : 4145
Houseparty - Group Video Chat : 3991
SOMA Messenger : 3232
Monkey : 3060
Down To Lunch : 2535
Flinch - Video Chat Staring Contest : 2134
Highrise - Your Avatar Community : 2011
LOVOO - Dating Chat : 1985
PlayStation®Messages : 1918
BOO! - Video chat camera with filters & stickers : 1805
Qzone : 1649
Chatous - Chat with new people : 1609
Kiwi - Q&A : 1538
GhostCodes - a discovery app for Snapchat : 1313
Jodel : 1193
FireChat : 1037
Google Duo - simple video calling : 1033
Fiesta by Tango - Chat & Meet New People : 885
Google Allo — smart messaging : 862
Peach — share vividly : 727
Hey! VINA - Where Women Meet New Friends : 719
Battlefield™ Companion : 689
All Devices for WhatsApp - Messenger for iPad : 682
Chat for Pokemon Go - GoChat : 500
IAmNaughty – Dating App to Meet New People Online : 463
Qzone HD : 458
Zenly - Locate your friends in realtime : 427
League of Legends Friends : 420
豆瓣 : 407
Candid - Speak Your Mind Freely : 398
知乎 : 397
Selfeo : 366
Fake-A-Location Free ™ : 354
Popcorn Buzz - Free Group Calls : 281
Fam — Group video calling for iMessage : 279
QQ International : 274
Ameba : 269
SoundCloud Pulse: for creators : 240
Tantan : 235
Cougar Dating & Life Style App for Mature Women : 213
Rawr Messenger - Dab your chat : 180
WhenToPost: Best Time to Post Photos for Instagram : 158
Inke—Broadcast an amazing life : 147
Mustknow - anonymous video Q&A : 53
CTFxCmoji : 39
Lobi : 36
Chain: Collaborate On MyVideo Story/Group Video : 35
botman - Real time video chat : 7
BestieBox : 0
MATCH ON LINE chat : 0
niconico ch : 0
LINE BLOG : 0
bit-tube - Live Stream Video Chat : 0

As with Navigation, Social Networking is dominated by a few massive apps such as Facebook, so it may not be feasible for us to compete in that market. The same is likely to be true of Photo & Video, which will be dominated by Instagram, YouTube, etc. I also feel that Games and Music apps would require quite intensive development.

Based on this, I believe the 'Reference' category may be a good area to look into. It appears to have a healthy number of users per app, and such apps are likely to need a lot less development.

Looking at the 'Reference' category, it appears to be relatively popular. However, the Bible and Dictionary.com apps skew this heavily because of how popular they are.

In [153]:

for app in apple_free:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5]) # print name and number of ratings

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0
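One way to see how strongly a couple of apps skew this average is to compare the mean with the median. The sketch below uses the 18 rating counts printed above. The mean reproduces the 74,942.1 figure, while the median is under 7,000. The Bible app alone accounts for roughly 73% of the category's ratings.

```python
from statistics import mean, median

# Rating counts for the free English Reference apps, copied from the output above
reference = [985920, 200047, 54175, 26786, 18418, 17588, 16849, 12122, 8535,
             4693, 1497, 826, 762, 718, 14, 8, 0, 0]

print(round(mean(reference), 1))                      # 74942.1
print(median(reference))                              # 6614.0
print(round(reference[0] / sum(reference) * 100, 1))  # 73.1 (the Bible app's share)
```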

Google Data Analysis

We will do a similar analysis on the Google data.

Luckily, this dataset includes the number of installs. Below, we can see what percentage of the free apps falls into each install bracket.

In [154]:

display_table(google_free, 5) # the Installs column

1,000,000+ : 15.73
100,000+ : 11.55
10,000,000+ : 10.55
10,000+ : 10.2
1,000+ : 8.39
100+ : 6.92
5,000,000+ : 6.83
500,000+ : 5.56
50,000+ : 4.77
5,000+ : 4.51
10+ : 3.54
500+ : 3.25
50,000,000+ : 2.3
100,000,000+ : 2.13
50+ : 1.92
5+ : 0.79
1+ : 0.51
500,000,000+ : 0.27
1,000,000,000+ : 0.23
0+ : 0.05

The issue is that these figures are imprecise. There is a big difference between 1,000,000+ and 5,000,000+, and we don't know exactly where an app falls within each bracket.

Despite this, it does give us some basic info on app popularity, even if not at a very granular level. For the purpose of this analysis we will assume that 1,000,000+ is equal to 1,000,000, etc. This should give us a basic idea of how popular the apps are.
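The convention of treating each bracket as its lower bound can be captured in a small conversion helper. This is a minimal sketch; the name `installs_to_number` is hypothetical, and it simply wraps the same `replace(',', '')` and `replace('+', '')` steps used in the cells below.

```python
def installs_to_number(installs):
    """Convert an Installs bracket such as '1,000,000+' to its lower bound.

    The '+' suffix is dropped, so every app in a bracket is counted as having
    exactly the bracket's minimum, e.g. '1,000,000+' -> 1000000.0.
    """
    return float(installs.replace(',', '').replace('+', ''))


print(installs_to_number('1,000,000+'))  # → 1000000.0
print(installs_to_number('0+'))          # → 0.0
```

Because every value is a lower bound, any averages built from these numbers will tend to underestimate the true number of installs.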

We will do a similar analysis to the one we did with the Apple data above, to see if we can glean any insight into which categories are the most popular.

In [155]:

# Frequency table of the Category column (index 1)
google_categories = freq_table(google_free, 1)

final_list = []

for category in google_categories:
    total = 0
    len_category = 0
    for app in google_free:
        category_app = app[1]
        if category_app == category:
            # Treat each Installs bracket as its lower bound, e.g. '1,000+' -> 1000
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = round(total / len_category, 1)

    final_list.append((category, avg_n_installs))

sort_tab(final_list, 0)

COMMUNICATION : 38456119.2
VIDEO_PLAYERS : 24727872.5
SOCIAL : 23253652.1
PHOTOGRAPHY : 17840110.4
PRODUCTIVITY : 16787331.3
GAME : 15588015.6
TRAVEL_AND_LOCAL : 13984077.7
ENTERTAINMENT : 11640705.9
TOOLS : 10801391.3
NEWS_AND_MAGAZINES : 9549178.5
BOOKS_AND_REFERENCE : 8767811.9
SHOPPING : 7036877.3
PERSONALIZATION : 5201482.6
WEATHER : 5074486.2
HEALTH_AND_FITNESS : 4188822.0
MAPS_AND_NAVIGATION : 4056941.8
FAMILY : 3697848.2
SPORTS : 3638640.1
ART_AND_DESIGN : 1986335.1
FOOD_AND_DRINK : 1924897.7
EDUCATION : 1833495.1
BUSINESS : 1712290.1
LIFESTYLE : 1437816.3
FINANCE : 1387692.5
HOUSE_AND_HOME : 1331540.6
DATING : 854028.8
COMICS : 817657.3
AUTO_AND_VEHICLES : 647317.8
LIBRARIES_AND_DEMO : 638503.7
PARENTING : 542603.6
BEAUTY : 513151.9
EVENTS : 253542.2
MEDICAL : 120550.6
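The nested loop above rescans `google_free` once for every category. The same averages can be computed in a single pass by keeping a running total and count per group in a dictionary. This is a sketch: the function name `avg_installs_by_column` and the toy rows are hypothetical, and it assumes the Installs bracket sits at index 5, as in the Google data.

```python
from collections import defaultdict


def avg_installs_by_column(rows, column, installs_index=5):
    """Return (group, average installs) pairs sorted from highest to lowest."""
    totals = defaultdict(float)  # summed installs per group
    counts = defaultdict(int)    # number of apps per group
    for row in rows:
        group = row[column]
        # Treat each bracket as its lower bound, e.g. '1,000+' -> 1000.0
        installs = float(row[installs_index].replace(',', '').replace('+', ''))
        totals[group] += installs
        counts[group] += 1
    averages = [(group, round(totals[group] / counts[group], 1)) for group in totals]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)


# Toy rows with the same layout as the Google data (category at index 1, installs at index 5)
toy_rows = [
    ['App A', 'COMMUNICATION', '', '', '', '1,000,000+'],
    ['App B', 'COMMUNICATION', '', '', '', '1,000+'],
    ['App C', 'MEDICAL', '', '', '', '10,000+'],
]
for group, average in avg_installs_by_column(toy_rows, 1):
    print(group, ':', average)
```

On the real data this would be called as `avg_installs_by_column(google_free, 1)` for categories, or with column 9 for genres.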

We can see that COMMUNICATION has the highest average number of installs, followed by VIDEO_PLAYERS, SOCIAL and PHOTOGRAPHY.

Again, these averages are likely dominated by a few big players, which may be skewing the numbers.

Let us examine the most popular category to see how this looks.

In [156]:

for app in google_free:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Messenger : 500,000,000+
WeChat : 100,000,000+
Yahoo Mail – Stay Organized : 100,000,000+
BBM - Free Calls & Messages : 100,000,000+

Like the Apple data, this dataset is also dominated by a few big players. To get an idea of the average number of installs without these giants skewing the figure, we will do a quick calculation that excludes apps with 100 million or more installs.

In [157]:

under_100_m = []    # installs of COMMUNICATION apps with fewer than 100 million installs
all_comm_apps = []  # installs of every COMMUNICATION app

for app in google_free:
    if app[1] != 'COMMUNICATION':
        continue
    # Treat each Installs bracket as its lower bound, e.g. '1,000+' -> 1000.0
    n_installs = float(app[5].replace(',', '').replace('+', ''))
    all_comm_apps.append(n_installs)
    if n_installs < 100000000:
        under_100_m.append(n_installs)

avg_all = sum(all_comm_apps) / len(all_comm_apps)
avg_under = sum(under_100_m) / len(under_100_m)

print('All COMMUNICATION Apps avg downloads: ', round(avg_all, 1))
print('COMMUNICATION Apps with under 100 million avg downloads: ', round(avg_under, 1))
print('Drop in avg downloads when 100 million+ apps are excluded (%):', round(100 - avg_under / avg_all * 100))

All COMMUNICATION Apps avg downloads:  38456119.2
COMMUNICATION Apps with under 100 million avg downloads:  3603485.4
Drop in avg downloads when 100 million+ apps are excluded (%): 91

As we can see, the difference is substantial: removing the apps with 100 million+ installs lowers the average by roughly 91%. This should be kept in mind when interpreting the category averages.

However, looking at the Android categories, BOOKS_AND_REFERENCE appears to be quite popular, averaging about 8.8 million installs. This may be a good area to target, as app development should not be overly expensive, and it was also the area suggested by our analysis of the Apple data.
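Choosing a 100 million cut-off is somewhat arbitrary. An alternative that needs no threshold is the median, which is far less sensitive to a few very large values than the mean. This is a swapped-in technique rather than part of the original analysis; the helper name `median_installs` and the toy rows are hypothetical, and it relies on Python's standard `statistics.median`.

```python
from statistics import median


def median_installs(rows, category, category_index=1, installs_index=5):
    """Median installs (bracket lower bounds) for apps in one category, or None if empty."""
    values = [
        float(row[installs_index].replace(',', '').replace('+', ''))
        for row in rows
        if row[category_index] == category
    ]
    return median(values) if values else None


# One giant and two small apps: the mean is pulled up by the giant, the median is not
toy_rows = [
    ['Giant', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Small 1', 'COMMUNICATION', '', '', '', '10,000+'],
    ['Small 2', 'COMMUNICATION', '', '', '', '100,000+'],
]
print(median_installs(toy_rows, 'COMMUNICATION'))  # → 100000.0
```

Applied to the real data, the call would be `median_installs(google_free, 'COMMUNICATION')`.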

Based on this and the Apple data, this genre appears to be a good place to start looking for app development opportunities. Let us continue with that line of enquiry to see if it is still feasible with the Android data.

In [158]:

for app in google_free:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
English translation from Bengali : 100,000+
Pdf Book Download - Read Pdf Book : 100,000+
Free Book Reader : 100,000+
eBoox new: Reader for fb2 epub zip books : 50,000+
Only 30 days in English, the guideline is guaranteed : 500,000+
Moon+ Reader : 10,000,000+
SH-02J Owner's Manual (Android 8.0) : 50,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Azpen eReader : 500,000+
URBANO V 02 instruction manual : 100,000+
Bible : 100,000,000+
C Programs and Reference : 50,000+
C Offline Tutorial : 1,000+
C Programs Handbook : 50,000+
Amazon Kindle : 100,000,000+
Aab e Hayat Full Novel : 100,000+
Aldiko Book Reader : 10,000,000+
Google I/O 2018 : 500,000+
R Language Reference Guide : 10,000+
Learn R Programming Full : 5,000+
R Programing Offline Tutorial : 1,000+
Guide for R Programming : 5+
Learn R Programming : 10+
R Quick Reference Big Data : 1,000+
V Made : 100,000+
Wattpad 📖 Free Books : 100,000,000+
Dictionary - WordWeb : 5,000,000+
Guide (for X-MEN) : 100,000+
AC Air condition Troubleshoot,Repair,Maintenance : 5,000+
AE Bulletins : 1,000+
Ae Allah na Dai (Rasa) : 10,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Ag PhD Field Guide : 10,000+
Ag PhD Deficiencies : 10,000+
Ag PhD Planting Population Calculator : 1,000+
Ag PhD Soybean Diseases : 1,000+
Fertilizer Removal By Crop : 50,000+
A-J Media Vault : 50+
Al-Quran (Free) : 10,000,000+
Al Quran (Tafsir & by Word) : 500,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al-Muhaffiz : 50,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Al-Quran 30 Juz free copies : 500,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
Hafizi Quran 15 lines per page : 1,000,000+
Quran for Android : 10,000,000+
Surah Al-Waqiah : 100,000+
Hisnul Al Muslim - Hisn Invocations & Adhkaar : 100,000+
Satellite AR : 1,000,000+
Audiobooks from Audible : 100,000,000+
Kinot & Eichah for Tisha B'Av : 10,000+
AW Tozer Devotionals - Daily : 5,000+
Tozer Devotional -Series 1 : 1,000+
The Pursuit of God : 1,000+
AY Sing : 5,000+
Ay Hasnain k Nana Milad Naat : 10,000+
Ay Mohabbat Teri Khatir Novel : 10,000+
Arizona Statutes, ARS (AZ Law) : 1,000+
Oxford A-Z of English Usage : 1,000,000+
BD Fishpedia : 1,000+
BD All Sim Offer : 10,000+
Youboox - Livres, BD et magazines : 500,000+
B&H Kids AR : 10,000+
B y H Niños ES : 5,000+
Dictionary.com: Find Definitions for English Words : 10,000,000+
English Dictionary - Offline : 10,000,000+
Bible KJV : 5,000,000+
Borneo Bible, BM Bible : 10,000+
MOD Black for BM : 100+
BM Box : 1,000+
Anime Mod for BM : 100+
NOOK: Read eBooks & Magazines : 10,000,000+
NOOK Audiobooks : 500,000+
NOOK App for NOOK Devices : 500,000+
Browsery by Barnes & Noble : 5,000+
bp e-store : 1,000+
Brilliant Quotes: Life, Love, Family & Motivation : 1,000,000+
BR Ambedkar Biography & Quotes : 10,000+
BU Alsace : 100+
Catholic La Bu Zo Kam : 500+
Khrifa Hla Bu (Solfa) : 10+
Kristian Hla Bu : 10,000+
SA HLA BU : 1,000+
Learn SAP BW : 500+
Learn SAP BW on HANA : 500+
CA Laws 2018 (California Laws and Codes) : 5,000+
Bootable Methods(USB-CD-DVD) : 10,000+
cloudLibrary : 100,000+
SDA Collegiate Quarterly : 500+
Sabbath School : 100,000+
Cypress College Library : 100+
Stats Royale for Clash Royale : 1,000,000+
GATE 21 years CS Papers(2011-2018 Solved) : 50+
Learn CT Scan Of Head : 5,000+
Easy Cv maker 2018 : 10,000+
How to Write CV : 100,000+
CW Nuclear : 1,000+
CY Spray nozzle : 10+
BibleRead En Cy Zh Yue : 5+
CZ-Help : 5+
Modlitební knížka CZ : 500+
Guide for DB Xenoverse : 10,000+
Guide for DB Xenoverse 2 : 10,000+
Guide for IMS DB : 10+
DC HSEMA : 5,000+
DC Public Library : 1,000+
Painting Lulu DC Super Friends : 1,000+
Dictionary : 10,000,000+
Fix Error Google Playstore : 1,000+
D. H. Lawrence Poems FREE : 1,000+
Bilingual Dictionary Audio App : 5,000+
DM Screen : 10,000+
wikiHow: how to do anything : 1,000,000+
Dr. Doug's Tips : 1,000+
Bible du Semeur-BDS (French) : 50,000+
La citadelle du musulman : 50,000+
DV 2019 Entry Guide : 10,000+
DV 2019 - EDV Photo & Form : 50,000+
DV 2018 Winners Guide : 1,000+
EB Annual Meetings : 1,000+
EC - AP & Telangana : 5,000+
TN Patta Citta & EC : 10,000+
AP Stamps and Registration : 10,000+
CompactiMa EC pH Calibration : 100+
EGW Writings 2 : 100,000+
EGW Writings : 1,000,000+
Bible with EGW Comments : 100,000+
My Little Pony AR Guide : 1,000,000+
SDA Sabbath School Quarterly : 500,000+
Duaa Ek Ibaadat : 5,000+
Spanish English Translator : 10,000,000+
Dictionary - Merriam-Webster : 10,000,000+
JW Library : 10,000,000+
Oxford Dictionary of English : Free : 10,000,000+
English Hindi Dictionary : 10,000,000+
English to Hindi Dictionary : 5,000,000+
EP Research Service : 1,000+
Hymnes et Louanges : 100,000+
EU Charter : 1,000+
EU Data Protection : 1,000+
EU IP Codes : 100+
EW PDF : 5+
BakaReader EX : 100,000+
EZ Quran : 50,000+
FA Part 1 & 2 Past Papers Solved Free – Offline : 5,000+
La Fe de Jesus : 1,000+
La Fe de Jesús : 500+
Le Fe de Jesus : 500+
Florida - Pocket Brainbook : 1,000+
Florida Statutes (FL Code) : 1,000+
English To Shona Dictionary : 10,000+
Greek Bible FP (Audio) : 1,000+
Golden Dictionary (FR-AR) : 500,000+
Fanfic-FR : 5,000+
Bulgarian French Dictionary Fr : 10,000+
Chemin (fr) : 1,000+
The SCP Foundation DB fr nn5n : 1,000+

As before, a small number of very popular apps are skewing the average.

In [159]:

for app in google_free:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+

Luckily, only five apps fall into these top brackets, which suggests the market still has room for new entrants. Let us look at some of the more popular apps in this category to get some app ideas.
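Counting the very large apps is only part of the picture; it is also worth checking what share of the category's total installs they account for. This is a sketch assuming the same column layout as above (category at index 1, installs at index 5); `install_share_above` and the toy rows are hypothetical, and since the brackets are lower bounds the result is only approximate.

```python
def install_share_above(rows, category, threshold, category_index=1, installs_index=5):
    """Percentage of a category's total installs held by apps at or above threshold."""
    installs = [
        float(row[installs_index].replace(',', '').replace('+', ''))
        for row in rows
        if row[category_index] == category
    ]
    total = sum(installs)
    if total == 0:
        return 0.0
    big = sum(n for n in installs if n >= threshold)
    return big / total * 100


toy_rows = [
    ['Big app', 'BOOKS_AND_REFERENCE', '', '', '', '100,000,000+'],
    ['Mid app 1', 'BOOKS_AND_REFERENCE', '', '', '', '50,000,000+'],
    ['Mid app 2', 'BOOKS_AND_REFERENCE', '', '', '', '50,000,000+'],
    ['Other', 'GAME', '', '', '', '500,000,000+'],  # different category, ignored
]
print(install_share_above(toy_rows, 'BOOKS_AND_REFERENCE', 100000000))  # → 50.0
```

On the real data, `install_share_above(google_free, 'BOOKS_AND_REFERENCE', 100000000)` would give the share held by the five apps listed above.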

In [160]:

for app in google_free:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
Hafizi Quran 15 lines per page : 1,000,000+
Quran for Android : 10,000,000+
Satellite AR : 1,000,000+
Oxford A-Z of English Usage : 1,000,000+
Dictionary.com: Find Definitions for English Words : 10,000,000+
English Dictionary - Offline : 10,000,000+
Bible KJV : 5,000,000+
NOOK: Read eBooks & Magazines : 10,000,000+
Brilliant Quotes: Life, Love, Family & Motivation : 1,000,000+
Stats Royale for Clash Royale : 1,000,000+
Dictionary : 10,000,000+
wikiHow: how to do anything : 1,000,000+
EGW Writings : 1,000,000+
My Little Pony AR Guide : 1,000,000+
Spanish English Translator : 10,000,000+
Dictionary - Merriam-Webster : 10,000,000+
JW Library : 10,000,000+
Oxford Dictionary of English : Free : 10,000,000+
English Hindi Dictionary : 10,000,000+
English to Hindi Dictionary : 5,000,000+

It appears there is strong demand for apps that present popular texts, such as religious books, dictionaries and other reference material.

This is looking promising. However, recall that the Google data also has a 'Genres' column, which gives more granular data than 'Category'. Let's do the same analysis and see what it shows us.
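One way to check this impression is to count how many app names in the category mention a given kind of text. The keyword list below is a hypothetical illustration, and plain substring matching will miss translations and alternative spellings (for example 'Koran').

```python
def count_by_keyword(app_names, keywords):
    """Count names containing each keyword (case-insensitive substring match)."""
    counts = {}
    for keyword in keywords:
        needle = keyword.lower()
        counts[keyword] = sum(1 for name in app_names if needle in name.lower())
    return counts


# A few names taken from the BOOKS_AND_REFERENCE listing above
names = ['Bible', 'Bible KJV', 'Al-Quran (Free)', 'Quran for Android',
         'Dictionary - WordWeb', 'English Dictionary - Offline', 'Moon+ Reader']
print(count_by_keyword(names, ['Bible', 'Quran', 'Dictionary']))
# → {'Bible': 2, 'Quran': 2, 'Dictionary': 2}
```

On the real data, `names` could be built with `[app[0] for app in google_free if app[1] == 'BOOKS_AND_REFERENCE']`.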

In [161]:

Google_genres = freq_table(google_free, 9)

final_list = []

for category in Google_genres:
    total = 0
    len_category = 0
    for app in google_free:
        category_app = app[9]
        if category_app == category:            
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = round(total / len_category, 1)
    
    final_list.append((category, avg_n_installs))

sort_tab(final_list, 0)

Communication : 38456119.2
Adventure;Action & Adventure : 35333333.3
Video Players & Editors : 24947335.8
Social : 23253652.1
Arcade : 22888365.5
Casual : 19569221.6
Puzzle;Action & Adventure : 18366666.7
Photography : 17840110.4
Educational;Action & Adventure : 17016666.7
Productivity : 16787331.3
Racing : 15910645.7
Travel & Local : 14051476.1
Casual;Action & Adventure : 12916666.7
Action : 12603588.9
Strategy : 11339901.3
Tools : 10802461.2
Tools;Education : 10000000.0
Role Playing;Brain Games : 10000000.0
Lifestyle;Pretend Play : 10000000.0
Casual;Music & Video : 10000000.0
Card;Action & Adventure : 10000000.0
Adventure;Education : 10000000.0
News & Magazines : 9549178.5
Music : 9445583.3
Educational;Pretend Play : 9375000.0
Puzzle;Brain Games : 9280666.7
Word : 9094458.7
Racing;Action & Adventure : 8816666.7
Books & Reference : 8767811.9
Puzzle : 8302861.9
Video Players & Editors;Music & Video : 7500000.0
Shopping : 7036877.3
Role Playing;Action & Adventure : 7000000.0
Casual;Pretend Play : 6957142.9
Entertainment;Music & Video : 6413333.3
Action;Action & Adventure : 5888888.9
Entertainment : 5602792.8
Education;Brain Games : 5333333.3
Casual;Creativity : 5333333.3
Role Playing;Pretend Play : 5275000.0
Personalization : 5201482.6
Weather : 5074486.2
Sports;Action & Adventure : 5050000.0
Music;Music & Video : 5050000.0
Video Players & Editors;Creativity : 5000000.0
Adventure : 4922785.3
Simulation;Action & Adventure : 4857142.9
Education;Education : 4759517.0
Board : 4759209.1
Sports : 4596842.6
Educational;Brain Games : 4433333.3
Health & Fitness : 4188822.0
Maps & Navigation : 4056941.8
Entertainment;Creativity : 4000000.0
Role Playing : 3965645.4
Card : 3815462.5
Trivia : 3475712.7
Simulation : 3475484.1
Casino : 3427910.5
Entertainment;Brain Games : 3314285.7
Arcade;Action & Adventure : 3190909.2
Entertainment;Pretend Play : 3000000.0
Board;Action & Adventure : 3000000.0
Education;Creativity : 2875000.0
Entertainment;Action & Adventure : 2333333.3
Educational;Creativity : 2333333.3
Art & Design : 2122850.9
Education;Music & Video : 2033333.3
Food & Drink : 1924897.7
Education;Pretend Play : 1800000.0
Educational;Education : 1737143.1
Business : 1712290.1
Casual;Brain Games : 1425916.7
Lifestyle : 1412998.3
Finance : 1387692.5
House & Home : 1331540.6
Parenting;Music & Video : 1118333.3
Strategy;Creativity : 1000000.0
Strategy;Action & Adventure : 1000000.0
Racing;Pretend Play : 1000000.0
Parenting;Brain Games : 1000000.0
Health & Fitness;Action & Adventure : 1000000.0
Entertainment;Education : 1000000.0
Education;Action & Adventure : 1000000.0
Casual;Education : 1000000.0
Arcade;Pretend Play : 1000000.0
Dating : 854028.8
Comics : 831873.1
Puzzle;Creativity : 750000.0
Auto & Vehicles : 647317.8
Libraries & Demo : 638503.7
Education : 550185.4
Simulation;Pretend Play : 550000.0
Beauty : 513151.9
Strategy;Education : 500000.0
Music & Audio;Music & Video : 500000.0
Communication;Creativity : 500000.0
Art & Design;Pretend Play : 500000.0
Parenting : 467977.5
Parenting;Education : 452857.1
Educational : 411184.8
Board;Brain Games : 407142.9
Art & Design;Creativity : 285000.0
Events : 253542.2
Medical : 120550.6
Travel & Local;Action & Adventure : 100000.0
Puzzle;Education : 100000.0
Lifestyle;Education : 100000.0
Health & Fitness;Education : 100000.0
Art & Design;Action & Adventure : 100000.0
Comics;Creativity : 50000.0
Books & Reference;Education : 1000.0
Simulation;Education : 500.0
Trivia;Education : 100.0

As expected, this does not provide much insight at first glance, because the listing is dominated by various games and entertainment apps. However, 'Books & Reference' still ranks relatively highly, which further supports pursuing this area when grouping by genre rather than by category.

Furthermore, looking down the list gives some ideas for other niches where apps should be relatively inexpensive to develop, should we wish to expand into them later.

If we were to offer a games or entertainment app with an in-app purchase or subscription model, this listing suggests some relatively inexpensive options. For example, trivia, puzzle and word games appear to be quite popular. These would be cheap to develop compared with high-performance action games, and could probably share much of the code framework of our 'Books & Reference' apps.

'Travel & Local' apps also appear to be very popular, which is another avenue we could explore. Again, these could build on the code base developed for our 'Books & Reference' offerings, allowing us to create apps such as travel guides and local listings.
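Part of the noise comes from compound genres such as 'Puzzle;Brain Games', which spread one primary genre across many small groups. One option, not used in the original analysis, is to keep only the part before the first ';' and average over that. The helper name and toy rows are hypothetical; the genre is assumed to be at index 9 and the installs at index 5, as in the cell above.

```python
from collections import defaultdict


def avg_installs_by_primary_genre(rows, genre_index=9, installs_index=5):
    """Average installs per primary genre (the text before the first ';')."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        primary = row[genre_index].split(';')[0]
        # Treat each bracket as its lower bound, e.g. '1,000+' -> 1000.0
        installs = float(row[installs_index].replace(',', '').replace('+', ''))
        totals[primary] += installs
        counts[primary] += 1
    return {genre: round(totals[genre] / counts[genre], 1) for genre in totals}


# Toy rows: only the installs (index 5) and genres (index 9) columns matter here
toy_rows = [
    ['A', '', '', '', '', '1,000+', '', '', '', 'Puzzle'],
    ['B', '', '', '', '', '3,000+', '', '', '', 'Puzzle;Brain Games'],
    ['C', '', '', '', '', '500+', '', '', '', 'Trivia;Education'],
]
print(avg_installs_by_primary_genre(toy_rows))  # → {'Puzzle': 2000.0, 'Trivia': 500.0}
```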
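For a quick side-by-side view of the niches mentioned above, their averages can be pulled out of the genre table and compared against 'Books & Reference'. This is a sketch; the values are copied from the output above and inherit its lower-bound assumption, and the shortlist itself is just our selection.

```python
# Average installs copied from the genre table above (lower-bound estimates)
genre_averages = {
    'Books & Reference': 8767811.9,
    'Travel & Local': 14051476.1,
    'Word': 9094458.7,
    'Puzzle': 8302861.9,
    'Trivia': 3475712.7,
}

baseline = genre_averages['Books & Reference']
ranked = sorted(genre_averages.items(), key=lambda item: item[1], reverse=True)
for genre, average in ranked:
    # A ratio above 1 means the genre averages more installs than Books & Reference
    print(f'{genre}: {average:,.0f} installs ({average / baseline:.2f}x Books & Reference)')
```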

Conclusion

Based on the information we are able to obtain from the datasets we have, I believe that developing apps which present popular texts would be a profitable initial venture. Such apps should be relatively easy to develop and leave plenty of scope for expansion.

Ideally, to set ourselves apart from the competition, we could add features such as daily quotes, audiobook versions, quick search, bookmarks, etc.

Once this framework is developed and we have some good momentum and income, it should be relatively easy to expand our operation and create popular apps in other areas, such as games, travel guides and tools.