This data analysis project looks at the apps listed in the App Store and Google Play markets and profiles the free apps in each marketplace.
Note: We are only interested in English language apps in this project.
Through this project, our aim is to:
For this project we are going to use the following datasets:
Let's now explore the datasets to understand them in a bit more detail.
First, we will create three reusable functions to read both the iOS and Android datasets from CSV and to print some sample data for our initial analysis.
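# Function to read a CSV file, and return the contents as list of lists
def csv_reader(file_name_with_path):
    from csv import reader
    # The with-block closes the file once it is read; the utf8 encoding is an
    # assumption about the CSV files, adjust it if they are encoded differently
    with open(file_name_with_path, encoding='utf8', newline='') as open_file:
        read_file = reader(open_file)
        dataset = list(read_file)
    return dataset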
# Function to read the dataset (list of lists) and print data range as passed
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row
    # If row and column statistics is required (passed as parameter)
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
# Function to check if all the columns have data in the dataset - Dataset is to be passed with the header row
def print_missing_column_values(dataset):
    header_length = len(dataset[0])
    # enumerate reports the true row position even if two rows are identical
    for index, row in enumerate(dataset[1:], start=1):
        if len(row) != header_length:
            print('Index = ', index)
            print('Data row = ', row)
            print('\n')
Let's look at a few rows from both the datasets.
print('='*5+'Apple Store'+'='*5+'\n')
apple = csv_reader('AppleStore.csv')
explore_data(apple[1:],0,5,True)
print('\n')
print('='*5+'Google Play Store'+'='*5+'\n')
android = csv_reader('googleplaystore.csv')
explore_data(android[1:],0,5,True)
=====Apple Store=====
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']
['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']
Number of rows: 7197
Number of columns: 16

=====Google Play Store=====
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']
['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']
Number of rows: 10841
Number of columns: 13
Now that we see some sample data along with number of rows/columns in each dataset, let's understand the columns, and see the ones that might be useful for our analysis.
print('='*5+'Apple Store'+'='*5+'\n')
print(apple[0])
print('\n')
print('='*5+'Google Play Store'+'='*5+'\n')
print(android[0])
apple_header = apple[0]
android_header = android[0]
=====Apple Store=====
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

=====Google Play Store=====
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
In the iOS dataset, a few columns that could be useful are:
For full details, refer to the dataset documentation.
In the Android dataset, a few columns that could be useful are:
For full details, refer to the dataset documentation.
We will now start analysing the data to see whether any data cleansing is needed before profiling and analysis.
We will use the print_missing_column_values function we created to check both datasets for apps with missing information, printing those rows so that we can decide how to handle them.
print('='*5+'Google Play Store'+'='*5+'\n')
print_missing_column_values(android)
print('='*5+'Apple Play Store'+'='*5+'\n')
print_missing_column_values(apple)
=====Google Play Store=====
Index = 10473
Data row = ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']

=====Apple Play Store=====
From the output above, we can see that one of the apps in the Google Playstore dataset has a data point missing. Comparing it against the sample rows for Google Play apps, it looks like this app is missing a value for the Category column.
Checking this app in the Google Playstore reveals that it is categorised as *Lifestyle*.
We will correct this.
android[10473].insert(1,'LIFESTYLE')
print(android[10473])
['Life Made WI-Fi Touchscreen Photo Frame', 'LIFESTYLE', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
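One caveat with a bare insert is that re-running the cell would add 'LIFESTYLE' a second time and shift the columns again. Below is a minimal sketch of a guarded version. The helper name fix_missing_category and the tiny stand-in dataset are assumptions made for illustration and are not part of the original notebook.

```python
def fix_missing_category(dataset, row_index, category):
    """Insert `category` at column 1 only if the row is shorter than the header."""
    header_length = len(dataset[0])
    row = dataset[row_index]
    if len(row) < header_length:
        row.insert(1, category)
        return True
    return False  # row is already complete, so nothing is changed


# Stand-in data (not the real dataset): a 3-column header and one short row.
sample = [
    ['App', 'Category', 'Rating'],
    ['Some Photo Frame', '1.9'],
]

first_call = fix_missing_category(sample, 1, 'LIFESTYLE')
second_call = fix_missing_category(sample, 1, 'LIFESTYLE')  # safe to repeat
print(sample[1], first_call, second_call)
```

Because the helper checks the row length first, calling it twice leaves the row unchanged the second time.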
As per the discussion about the Google Playstore, this dataset suffers from a lot of duplicates. The Apple App Store dataset, however, does not have any duplicates.
Let's now see how many duplicate apps we have in the Google Playstore, and look at some of those.
duplicate_apps = []
unique_apps = []
for app in android[1:]:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
print('Total number of apps in the dataset: ',len(android[1:]))
print('Number of duplicate apps: ', len(duplicate_apps))
print('\n')
print('='*5+'Few duplicate apps'+'='*5+'\n')
print(duplicate_apps[:15])
Total number of apps in the dataset: 10841
Number of duplicate apps: 1181

=====Few duplicate apps=====
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
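The duplicate count relies on a list membership test, which rescans the list of names for every row. As a hedged alternative, the same count can be obtained in one pass with collections.Counter. The function name count_duplicates and the sample rows below are illustrative assumptions, not data from the real file.

```python
from collections import Counter


def count_duplicates(rows, name_index=0):
    """Return (number of duplicate entries, names that appear more than once)."""
    counts = Counter(row[name_index] for row in rows)
    # Every occurrence beyond the first one counts as a duplicate entry.
    n_duplicates = sum(count - 1 for count in counts.values())
    repeated = [name for name, count in counts.items() if count > 1]
    return n_duplicates, repeated


# Made-up rows for illustration: name in column 0, review count in column 1.
sample_rows = [
    ['Box', '159'],
    ['Slack', '51507'],
    ['Box', '159'],
    ['Box', '160'],
    ['Slack', '51510'],
]

n_duplicates, repeated_names = count_duplicates(sample_rows)
print(n_duplicates, repeated_names)
```

This counts duplicates the same way as the loop in the notebook: the first occurrence of a name is treated as the original and each later occurrence as a duplicate.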
We can see that 1,181 entries are duplicates.
Rather than removing the duplicates at random, we will use the *Reviews* column, on the basis that the higher the total number of reviews, the more recent the app entry.
The first step is to build a dictionary from the Android dataset that maps each app name to the maximum review count recorded for that app.
reviews_max = {}
for row in android[1:]:
    name = row[0]
    n_reviews = float(row[3]) #total number of reviews
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
print('Expected rows after cleanup: ',len(reviews_max))
Expected rows after cleanup: 9660
Now that we have each app and its maximum review count, we use the dictionary above to build our dataset of unique apps. After cleanup we should have 9660 unique rows.
Note: We need to check the already_added list so that each app is added only once when several of its duplicates share the same maximum number of reviews.
android_clean = []
already_added = []
for row in android[1:]:
    name = row[0]
    n_reviews = float(row[3]) #total number of reviews
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(row)
        already_added.append(name)
print(len(android_clean))
print(android_clean[0:3])
9660
[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']]
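The two-step cleanup (a dictionary of maximum review counts, then a second loop guarded by already_added) could also be folded into a single pass. The sketch below is one possible variant under stated assumptions: the function name dedupe_by_max_reviews and the sample rows are hypothetical, and the result is ordered by each app's first appearance, which may differ from the row order the notebook produces.

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the row with the highest review count."""
    best = {}  # app name -> best row seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # The strict '>' keeps the first row when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Made-up rows: name, category, rating, reviews.
sample_rows = [
    ['A', 'TOOLS', '4.0', '10'],
    ['B', 'TOOLS', '4.1', '5'],
    ['A', 'TOOLS', '4.0', '12'],
    ['B', 'TOOLS', '4.1', '5'],
]

deduped = dedupe_by_max_reviews(sample_rows)
print(len(deduped), deduped)
```

Using a dictionary keyed by app name also avoids the repeated list lookups in already_added, which matters little for about ten thousand rows but scales better.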
In this project, our aim is only to analyse and profile English-language apps, so we need to identify any non-English apps and remove them from our dataset.
First, we will write a function is_english that does the following:
Note: To avoid mistakenly removing apps whose names contain smileys or other special characters (e.g. 'Instachat 😜' or 'Docs To Go™ Free Office Suite'), we will establish a rule that a name is treated as non-English only if it contains more than 3 characters outside the ASCII range (0 to 127).
def is_english(a_string):
    no_of_chars = 0
    for char in a_string:
        if ord(char) > 127:
            no_of_chars += 1
    if no_of_chars > 3:
        return False
    return True
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Instachat 😜'))
print(is_english('Docs To Go™ Free Office Suite'))
False
True
True
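The threshold of 3 is a judgment call, so it can help to make it a parameter and experiment with it. The sketch below does that. The names count_non_ascii and is_probably_english and the max_non_ascii parameter are assumptions introduced here, not part of the notebook.

```python
def count_non_ascii(text):
    """Count the characters whose code point falls outside the ASCII range."""
    return sum(1 for char in text if ord(char) > 127)


def is_probably_english(text, max_non_ascii=3):
    """Treat a name as English if it has at most `max_non_ascii` non-ASCII characters."""
    return count_non_ascii(text) <= max_non_ascii


print(is_probably_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_probably_english('Instachat 😜'))
print(is_probably_english('Docs To Go™ Free Office Suite'))
# A stricter setting rejects names with any non-ASCII character at all.
print(is_probably_english('Docs To Go™ Free Office Suite', max_non_ascii=0))
```

This remains a heuristic: an English name with several emoji would be rejected, and a non-English name written entirely in ASCII characters would pass.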
Then, we use this function in a loop over our Apple and Android datasets to build datasets of English-only apps.
apple_english_only = []
android_english_only = []
for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english_only.append(app)
print('English only android apps: ',len(android_english_only))
print('Non-English android apps: ',len(android_clean) - len(android_english_only))
print('% of English apps in Android: ',round((len(android_english_only)/len(android_clean))*100,2))
print('\n')
for app in apple[1:]:
    name = app[1]
    if is_english(name):
        apple_english_only.append(app)
print('English only Apple apps: ',len(apple_english_only))
print('Non-English apple apps: ',len(apple[1:]) - len(apple_english_only))
print('% of English apps in Apple: ',round((len(apple_english_only)/len(apple[1:]))*100,2))
English only android apps: 9615
Non-English android apps: 45
% of English apps in Android: 99.53

English only Apple apps: 6183
Non-English apple apps: 1014
% of English apps in Apple: 85.91
It seems that while the Android dataset has a lot of duplicate apps, the Apple dataset has many more non-English apps than the Android dataset.
As we have mentioned before, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Hence we need to isolate the free apps from the non-free apps in both datasets.
This would be our last step in the data cleaning process.
Note:
apple_final = []
for app in apple_english_only:
    price = float(app[4])
    if price == 0:
        apple_final.append(app)
print('Total free apps in Apple Store: ',len(apple_final))
android_final = []
for app in android_english_only:
    free = app[6]
    if free == 'Free':
        android_final.append(app)
print('Total free apps in Android Store: ',len(android_final))
Total free apps in Apple Store: 3222
Total free apps in Android Store: 8864
After going through a series of data cleaning measures, we finally have 3222 apps in the Apple dataset and 8864 apps in the Android dataset that we are going to use for our profiling and analysis further.
As mentioned at the start, our goal is to determine the kinds of apps that are likely to attract more users, because the number of people using our apps affects our revenue.
To minimise risks and overhead, we plan to develop the same app for both iOS and Android. The idea is to launch on the Android market first, develop the app further based on the response from users, and then build an iOS version if it proves profitable.
For this reason, we need to analyse and determine the app profiles/genres that could attract more users on both iOS and Android.
In the Apple dataset, we have a clear column called prime_genre that can aid our analysis. However, in the Android dataset, we have two columns Category and Genres.
Before we start profiling our app based on the dataset and respective columns for genre, we will build two functions that we will reuse for both datasets to build the frequency table.
freq_table: This function builds a frequency table by taking a dataset (list of lists) and index number of the column for which we are building the frequency table. Also, we will return the frequency table as a percentage.
display_table: This function uses the freq_table from above function, and sorts by highest percentage and displays the result.
# Function to build a frequency table
# Takes:
# A dataset (a list of lists) and
# Index number for the column we are building the frequency table
def freq_table(dataset, index):
    result = {}
    for row in dataset:
        key = row[index]
        if key in result:
            result[key] += 1
        else:
            result[key] = 1
    # Make the frequency table as a percentage
    total_apps = len(dataset)
    for key in result:
        result[key] /= total_apps
        result[key] *= 100
        result[key] = round(result[key] ,2)
    return result
def display_table(freq_tbl):
    table = freq_tbl
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
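For comparison, the same percentage table can be sketched more compactly with collections.Counter. The names freq_table_pct and sorted_table and the sample rows are assumptions made for this illustration. Note one small behavioural difference: ties here keep their first-seen order, whereas display_table breaks ties by sorting the keys in reverse.

```python
from collections import Counter


def freq_table_pct(dataset, index):
    """Frequency table of column `index`, expressed as percentages."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {key: round(count / total * 100, 2) for key, count in counts.items()}


def sorted_table(freq_tbl):
    """Return (key, value) pairs sorted from highest to lowest value."""
    return sorted(freq_tbl.items(), key=lambda item: item[1], reverse=True)


# Made-up rows with the genre in column 0.
sample_rows = [['Games'], ['Games'], ['Games'], ['Music']]

table = freq_table_pct(sample_rows, 0)
for key, value in sorted_table(table):
    print(key, ':', value)
```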
Now we will use the function *display_table* against the Apple app store dataset to see the results based on the *prime_genre* column
print('='*5+'Apple App Store - By Prime_Genre'+'='*5+'\n')
freq_tbl = freq_table(apple_final,11) #prime_genre
display_table(freq_tbl)
print('\n')
=====Apple App Store - By Prime_Genre=====
Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12
Purely from the genre perspective, we can see the following pattern based on the number of apps:
Even though the proportion based on the number of apps might present the above picture, the same might not be true with regard to the number of users/reviews.
Let's now look at the Android dataset using the columns category and genres.
print('='*5+'Google Play Store - By Category '+'='*5+'\n')
freq_tbl = freq_table(android_final,1) #Category
display_table(freq_tbl)
print('\n')
print('='*5+'Google Play Store - By Genre '+'='*5+'\n')
freq_tbl = freq_table(android_final,9) #Genre
display_table(freq_tbl)
=====Google Play Store - By Category =====
FAMILY : 18.9
GAME : 9.72
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.91
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6

=====Google Play Store - By Genre =====
Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.9
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;Brain Games : 0.14
Casual;Action & Adventure : 0.14
Arcade;Action & Adventure : 0.12
Action;Action & Adventure : 0.1
Educational;Pretend Play : 0.09
Simulation;Action & Adventure : 0.08
Parenting;Education : 0.08
Entertainment;Brain Games : 0.08
Board;Brain Games : 0.08
Parenting;Music & Video : 0.07
Educational;Brain Games : 0.07
Casual;Creativity : 0.07
Art & Design;Creativity : 0.07
Education;Pretend Play : 0.06
Role Playing;Pretend Play : 0.05
Education;Creativity : 0.05
Role Playing;Action & Adventure : 0.03
Puzzle;Action & Adventure : 0.03
Entertainment;Creativity : 0.03
Entertainment;Action & Adventure : 0.03
Educational;Creativity : 0.03
Educational;Action & Adventure : 0.03
Education;Music & Video : 0.03
Education;Brain Games : 0.03
Education;Action & Adventure : 0.03
Adventure;Action & Adventure : 0.03
Video Players & Editors;Music & Video : 0.02
Sports;Action & Adventure : 0.02
Simulation;Pretend Play : 0.02
Puzzle;Creativity : 0.02
Music;Music & Video : 0.02
Entertainment;Pretend Play : 0.02
Casual;Education : 0.02
Board;Action & Adventure : 0.02
Video Players & Editors;Creativity : 0.01
Trivia;Education : 0.01
Travel & Local;Action & Adventure : 0.01
Tools;Education : 0.01
Strategy;Education : 0.01
Strategy;Creativity : 0.01
Strategy;Action & Adventure : 0.01
Simulation;Education : 0.01
Role Playing;Brain Games : 0.01
Racing;Pretend Play : 0.01
Puzzle;Education : 0.01
Parenting;Brain Games : 0.01
Music & Audio;Music & Video : 0.01
Lifestyle;Pretend Play : 0.01
Lifestyle;Education : 0.01
Health & Fitness;Education : 0.01
Health & Fitness;Action & Adventure : 0.01
Entertainment;Education : 0.01
Communication;Creativity : 0.01
Comics;Creativity : 0.01
Casual;Music & Video : 0.01
Card;Action & Adventure : 0.01
Books & Reference;Education : 0.01
Art & Design;Pretend Play : 0.01
Art & Design;Action & Adventure : 0.01
Arcade;Pretend Play : 0.01
Adventure;Education : 0.01
 : 0.01
Before we look at the profile of apps, note that the Android dataset has both Category and Genres columns. Looking at the frequency tables above, it is clear that the Genres column is more granular and seems to hold sub-category-level information.
Since our analysis at this point involves high-level categorisation, we will continue with only the Category column for the Android dataset.
At first glance, the frequency table based on the Category column suggests that the Android market has a more balanced spread of apps across categories than the Apple App Store. We also see more apps built for practical purposes (such as productivity, finance, family, tools) than for fun.
However, when we look at the apps in the FAMILY category (about 19%), for example, most of them are games for kids. Even so, practical apps are better represented here than in the Apple App Store.
Up to this point, we found that the Apple app Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea about the kind of apps that have most users.
One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play dataset, we can find this information in the Installs column, but this information is missing from the App Store dataset. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.
Let's start with calculating the average number of user ratings per app genre on the App Store.
app_store_genres = freq_table(apple_final, 11)
genre_avg_ratings = {}
for genre in app_store_genres:
    total = 0 #total rating count of all apps in genre
    len_genre = 0 #total apps in genre
    for app in apple_final:
        genre_app = app[11]
        if genre == genre_app:
            app_usr_rating_tot = int(app[5])
            len_genre += 1
            total += app_usr_rating_tot
    avg_genre = round(total/len_genre)
    genre_avg_ratings[genre] = avg_genre
display_table(genre_avg_ratings)
Navigation : 86090
Reference : 74942
Social Networking : 71548
Music : 57327
Weather : 52280
Book : 39758
Food & Drink : 33334
Finance : 31468
Photo & Video : 28442
Travel : 28244
Shopping : 26920
Health & Fitness : 23298
Sports : 23009
Games : 22789
News : 21248
Productivity : 21028
Utilities : 18684
Lifestyle : 16486
Entertainment : 14030
Business : 7491
Education : 7004
Catalogs : 4004
Medical : 612
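The nested loop used for these averages walks the whole dataset once per genre. A single-pass version is sketched below, accumulating a running total and a count per genre. The function name average_by_group and the sample rows are illustrative assumptions.

```python
from collections import defaultdict


def average_by_group(dataset, group_index, value_index):
    """Average of an integer column, grouped by another column, in one pass."""
    totals = defaultdict(int)  # group -> sum of values
    counts = defaultdict(int)  # group -> number of rows
    for row in dataset:
        group = row[group_index]
        totals[group] += int(row[value_index])
        counts[group] += 1
    return {group: round(totals[group] / counts[group]) for group in totals}


# Made-up rows: name, genre, rating count.
sample_rows = [
    ['app one', 'Games', '100'],
    ['app two', 'Games', '300'],
    ['app three', 'Music', '50'],
]

averages = average_by_group(sample_rows, group_index=1, value_index=2)
print(averages)
```

The resulting dictionary has the same shape as genre_avg_ratings, so it could be passed to display_table in the same way.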
Even though the Games genre had the largest proportion of apps, we see that the Navigation and Reference genres have more user ratings per app on average.
Before drawing any conclusions, let's dig a little deeper into the apps in these genres.
def top_apps_by_genre(dataset, genre, genre_index, appname_index, users_index, top_n = 5, pct = False):
    genre_apps = []
    total_genre_users = 0
    for app in dataset:
        app_genre = app[genre_index]
        app_name = app[appname_index]
        app_users = int((app[users_index].replace(',','')).replace('+',''))
        if app_genre == genre:
            total_genre_users += app_users
            app_tupple = (app_users, app_name)
            genre_apps.append(app_tupple)
    top = 0
    print('*'*5,'Top',top_n,'apps for',genre,'*'*5)
    for app in sorted(genre_apps, reverse = True):
        top += 1
        if top > top_n:
            print('\n')
            break
        app_name = app[1]
        app_users = app[0]
        if pct:
            if total_genre_users != 0:
                app_user_pct = round((app_users / total_genre_users) * 100,2)
            else:
                app_user_pct = 0
            print(app_name,':',app_users, '(' + str(app_user_pct)+'%)')
        else:
            print(app_name,':',app_users)
top_apps_by_genre(dataset = apple_final, genre="Navigation", genre_index=-5, appname_index=1, users_index=5, pct=True)
top_apps_by_genre(dataset = apple_final, genre="Reference", genre_index=-5, appname_index=1, users_index=5, pct=True)
top_apps_by_genre(dataset = apple_final, genre="Social Networking", genre_index=-5, appname_index=1, users_index=5, pct=True)
top_apps_by_genre(dataset = apple_final, genre="Music", genre_index=-5, appname_index=1, users_index=5, pct=True)
top_apps_by_genre(dataset = apple_final, genre="Weather", genre_index=-5, appname_index=1, users_index=5, pct=True)
top_apps_by_genre(dataset = apple_final, genre="Book", genre_index=-5, appname_index=1, users_index=5, pct=True)
top_apps_by_genre(dataset = apple_final, genre="Finance", genre_index=-5, appname_index=1, users_index=5, pct=True)
***** Top 5 apps for Navigation *****
Waze - GPS Navigation, Maps & Real-time Traffic : 345046 (66.8%)
Google Maps - Navigation & Transit : 154911 (29.99%)
Geocaching® : 12811 (2.48%)
CoPilot GPS – Car Navigation & Offline Maps : 3582 (0.69%)
ImmobilienScout24: Real Estate Search in Germany : 187 (0.04%)

***** Top 5 apps for Reference *****
Bible : 985920 (73.09%)
Dictionary.com Dictionary & Thesaurus : 200047 (14.83%)
Dictionary.com Dictionary & Thesaurus for iPad : 54175 (4.02%)
Google Translate : 26786 (1.99%)
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418 (1.37%)

***** Top 5 apps for Social Networking *****
Facebook : 2974676 (39.22%)
Pinterest : 1061624 (14.0%)
Skype for iPhone : 373519 (4.93%)
Messenger : 351466 (4.63%)
Tumblr : 334293 (4.41%)

***** Top 5 apps for Music *****
Pandora - Music & Radio : 1126879 (29.78%)
Spotify Music : 878563 (23.22%)
Shazam - Discover music, artists, videos & lyrics : 402925 (10.65%)
iHeartRadio – Free Music & Radio Stations : 293228 (7.75%)
SoundCloud - Music & Audio : 135744 (3.59%)

***** Top 5 apps for Weather *****
The Weather Channel: Forecast, Radar & Alerts : 495626 (33.86%)
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648 (14.25%)
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583 (12.88%)
MyRadar NOAA Weather Radar Forecast : 150158 (10.26%)
AccuWeather - Weather for Life : 144214 (9.85%)

***** Top 5 apps for Book *****
Kindle – Read eBooks, Magazines & Textbooks : 252076 (45.29%)
Audible – audio books, original series & podcasts : 105274 (18.91%)
Color Therapy Adult Coloring Book for Adults : 84062 (15.1%)
OverDrive – Library eBooks and Audiobooks : 65450 (11.76%)
HOOKED - Chat Stories : 47829 (8.59%)

***** Top 5 apps for Finance *****
Chase Mobile℠ : 233270 (20.59%)
Mint: Personal Finance, Budget, Bills & Money : 232940 (20.56%)
Bank of America - Mobile Banking : 119773 (10.57%)
PayPal - Send and request money safely : 119487 (10.55%)
Credit Karma: Free Credit Scores, Reports & Alerts : 101679 (8.98%)
Looking at the top 5 apps for the few genres that command the most users, we see the following patterns:
Even though the Book genre is dominated by Amazon, there is promise here: small-business apps such as "Color Therapy Adult Coloring Book for Adults", "HOOKED - Chat Stories" and "OverDrive – Library eBooks and Audiobooks" have a good share of the user base too (combined ~35%).
So if we bring out a standalone app in the Book genre, perhaps built around a famous best-selling book, or a creativity app such as a colouring book for kids or adults, there is potential to maximise ad revenue by keeping users within our app for longer.
Note: We do need to check the rights and partnership arrangements for published books; more importantly, the book should not already be available as an eBook or audiobook through Amazon's platform.
The Finance genre is interesting:
Even though this genre requires deeper domain expertise, if we could partner with a wealth management company or a financial adviser, we would have the potential to add significant value for users while maximising ad revenue tied to the financial advice in our app.
Of all the genres in the AppStore, two definitely emerge as potentials:
Both of these definitely fit our theme of apps for practical use.
Let's now explore the Google PlayStore.
In the PlayStore dataset, the Installs column holds ranges such as 1,000+ or 10,000+. For our analysis we will treat these as hard numbers (by removing "," and "+"); because we apply the same technique across the whole dataset, it should not distort our judgment.
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
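The conversion of an Installs value into a number can be sketched as a small helper. The name parse_installs is an assumption introduced here. Since a value such as "1,000+" means "at least 1,000", the parsed number is best read as a lower bound rather than an exact install count.

```python
def parse_installs(installs):
    """Convert an Installs string such as '1,000+' into an integer lower bound."""
    return int(installs.replace(',', '').replace('+', ''))


for raw in ['1,000+', '10,000+', '1,000,000,000+', '0']:
    print(raw, '->', parse_installs(raw))
```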
play_store_genres = freq_table(android_final, 1)
genre_avg_ratings = {}
for genre in play_store_genres:
    total = 0 #total installs of all apps in category
    len_genre = 0 #total apps in category
    for app in android_final:
        genre_app = app[1]
        if genre == genre_app:
            app_usr_rating_tot = int((app[5].replace(',','')).replace('+',''))
            len_genre += 1
            total += app_usr_rating_tot
    avg_genre = round(total/len_genre)
    genre_avg_ratings[genre] = avg_genre
display_table(genre_avg_ratings)
COMMUNICATION : 38456119
VIDEO_PLAYERS : 24727872
SOCIAL : 23253652
PHOTOGRAPHY : 17840110
PRODUCTIVITY : 16787331
GAME : 15588016
TRAVEL_AND_LOCAL : 13984078
ENTERTAINMENT : 11640706
TOOLS : 10801391
NEWS_AND_MAGAZINES : 9549178
BOOKS_AND_REFERENCE : 8767812
SHOPPING : 7036877
PERSONALIZATION : 5201483
WEATHER : 5074486
HEALTH_AND_FITNESS : 4188822
MAPS_AND_NAVIGATION : 4056942
FAMILY : 3697848
SPORTS : 3638640
ART_AND_DESIGN : 1986335
FOOD_AND_DRINK : 1924898
EDUCATION : 1833495
BUSINESS : 1712290
LIFESTYLE : 1433676
FINANCE : 1387692
HOUSE_AND_HOME : 1331541
DATING : 854029
COMICS : 817657
AUTO_AND_VEHICLES : 647318
LIBRARIES_AND_DEMO : 638504
PARENTING : 542604
BEAUTY : 513152
EVENTS : 253542
MEDICAL : 120551
Even though the categorisation on the PlayStore is slightly different from that of the AppStore, there are some common themes and categories. The PlayStore user base shows good promise for both fun and practical apps.
Let's drill a bit deeper into some of the top categories here, and also explore the two potentials from the App Store: the Books and Finance genres.
top_apps_by_genre(dataset = android_final, genre="COMMUNICATION", genre_index=1, appname_index=0, users_index=5, pct=True)
top_apps_by_genre(dataset = android_final, genre="VIDEO_PLAYERS", genre_index=1, appname_index=0, users_index=5, pct=True)
top_apps_by_genre(dataset = android_final, genre="PRODUCTIVITY", genre_index=1, appname_index=0, users_index=5, pct=True)
top_apps_by_genre(dataset = android_final, genre="FINANCE", genre_index=1, appname_index=0, users_index=5, pct=True)
top_apps_by_genre(dataset = android_final, genre="BOOKS_AND_REFERENCE", genre_index=1, appname_index=0, users_index=5, pct=True)
***** Top 5 apps for COMMUNICATION *****
WhatsApp Messenger : 1000000000 (9.06%)
Skype - free IM & video calls : 1000000000 (9.06%)
Messenger – Text and Video Chat for Free : 1000000000 (9.06%)
Hangouts : 1000000000 (9.06%)
Google Chrome: Fast & Secure : 1000000000 (9.06%)

***** Top 5 apps for VIDEO_PLAYERS *****
YouTube : 1000000000 (25.43%)
Google Play Movies & TV : 1000000000 (25.43%)
MX Player : 500000000 (12.72%)
VivaVideo - Video Editor & Photo Movie : 100000000 (2.54%)
VideoShow-Video Editor, Video Maker, Beauty Camera : 100000000 (2.54%)

***** Top 5 apps for PRODUCTIVITY *****
Google Drive : 1000000000 (17.27%)
Microsoft Word : 500000000 (8.63%)
Google Calendar : 500000000 (8.63%)
Dropbox : 500000000 (8.63%)
Cloud Print : 500000000 (8.63%)

***** Top 5 apps for FINANCE *****
Google Pay : 100000000 (21.97%)
PayPal : 50000000 (10.99%)
İşCep : 10000000 (2.2%)
Wells Fargo Mobile : 10000000 (2.2%)
Mobile Bancomer : 10000000 (2.2%)

***** Top 5 apps for BOOKS_AND_REFERENCE *****
Google Play Books : 1000000000 (60.03%)
Wattpad 📖 Free Books : 100000000 (6.0%)
Bible : 100000000 (6.0%)
Audiobooks from Audible : 100000000 (6.0%)
Amazon Kindle : 100000000 (6.0%)
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])
WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Messenger : 500,000,000+
WeChat : 100,000,000+
Yahoo Mail – Stay Organized : 100,000,000+
BBM - Free Calls & Messages : 100,000,000+
If we removed all the communication apps that have 100 million installs or more, the average would be reduced roughly tenfold:
under_100_m = []
for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
sum(under_100_m) / len(under_100_m)
3603485.3884615386
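A complementary way to see how a handful of giants skew a category average is to compare the mean with the median, which is far less sensitive to extreme values. The install figures below are made up for illustration and are not taken from the dataset.

```python
from statistics import mean, median

# Hypothetical install counts for one category: one giant and four small apps.
installs = [1_000_000_000, 500_000, 100_000, 10_000, 1_000]

category_mean = mean(installs)
category_median = median(installs)

print('Mean installs:  ', category_mean)
print('Median installs:', category_median)
```

In this toy example the single giant pulls the mean up to about 200 million, while the median stays at 100,000, which is much closer to what a typical app in the category achieves.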
We see a similar pattern for the video players category, which is the runner-up with 24,727,872 installs. The market is dominated by apps like YouTube, Google Play Movies & TV, or MX Player. The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), or productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).
Again, these app genres might seem more popular than they really are; moreover, these niches seem to be dominated by a few giants who are hard to compete against.
The game genre seems popular here too, but previously we found that in App Store this market seems a bit saturated, so we'd like to come up with a different app recommendation if possible.
The books and reference genre looks fairly popular as well, with an average of 8,767,812 installs, but here again we see a few giants dominating the market, such as Amazon, Google and the Bible app.
To reiterate our primary goal: we are going to launch the app in both the AppStore and the PlayStore, albeit on different timelines.
From that perspective, we looked at the top 5 apps by user base in the PlayStore categories that correspond to the genres we listed as potentials in the AppStore (Books and Finance).
Here again, the Books category is dominated by big players with their marketplace apps, such as Google and Amazon.
This makes it especially hard to launch a famous or best-selling book as an app without running into publishing licence or rights issues with these big players.
Our recommendation at this stage is to develop a Personal Finance app in both AppStore and PlayStore.
Note: