This project aims to identify profitable mobile app profiles for both the Android and iOS markets.
Problem to solve: our client develops free apps and would like to know what app types would be more profitable to build. The business model of the company is building free apps to generate revenue from in-app ads. Thus, our target is to identify app profiles that have a great user reach.
We are going to use two data sets. Both are samples of the apps available on the App Store and the Google Play Store.
The Google Play data set includes approximately ten thousand Android apps. The App Store data set includes around seven thousand iOS apps.
Here are the links to both datasets:
To avoid duplication of code, let's create a function open_dataset that we are going to use to open the files.
from csv import reader

def open_dataset(dataset):
    # utf8 is assumed here, since the app names contain non-ASCII characters
    opened_file = open(dataset, encoding='utf8')
    read_file = reader(opened_file)
    data = list(read_file)
    opened_file.close()
    # the header is kept as row 0 of data; the cells below rely on that
    return data
#open AppleStore data set
applestore_dataset = open_dataset('AppleStore.csv')
#open GooglePlay data set
googleplaystore_dataset = open_dataset('googleplaystore.csv')
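A variant that closes the file automatically and returns the header separately from the data rows could look like the sketch below. The name open_dataset_split and the small sample file are hypothetical and used only for illustration; the rest of this notebook keeps using open_dataset, which returns the header as the first row.

```python
import csv
import os
import tempfile

def open_dataset_split(path, encoding="utf8"):
    """Read a CSV file and return (header, rows)."""
    with open(path, encoding=encoding, newline="") as f:
        data = list(csv.reader(f))
    return data[0], data[1:]

# Tiny made-up file, written only to demonstrate the call
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 encoding="utf8", newline="") as tmp:
    tmp.write("App,Category\nExample App,TOOLS\n")
    sample_path = tmp.name

header, rows = open_dataset_split(sample_path)
os.remove(sample_path)

print(header)
print(rows)
```

Keeping the header apart from the rows avoids having to remember the one-row offset when indexing into the data.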
Next we are going to define a function explore_data to view the headers, row contents, and the number of rows and columns for each data set.
def explore_data(dataset, start, end, rows_and_columns = False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
data_google = explore_data(googleplaystore_dataset, 0, 3, True)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'] Number of rows: 10842 Number of columns: 13
Looking at the Google Play data set columns, we can identify some that could be worth a closer look: Category, Genres, Price, Reviews.
data_apple = explore_data(applestore_dataset, 0, 3, True)
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] ['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'] ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'] Number of rows: 7198 Number of columns: 16
Looking at the Apple data set, columns like track_name, price, rating_count_tot and prime_genre could help answer our question.
Discussions on the Google Play data set suggest that row 10472 has missing or erroneous data. Let's compare the row against the header and, if it is inconsistent, delete it.
print(googleplaystore_dataset[0])
print('\n')
print(googleplaystore_dataset[10473])
del googleplaystore_dataset[10473]
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
Row 10472 proved to be correct, but row 10473 is missing its value for the 'Category' column: it has 12 fields instead of 13, so every remaining value is shifted one column to the left. The one-row offset is most likely because our list still holds the header as row 0.
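Rather than relying on a row number reported elsewhere, rows with the wrong number of fields can be located programmatically. The sketch below is one way to do that; find_malformed_rows and the sample rows are hypothetical names and data used only for illustration.

```python
def find_malformed_rows(dataset):
    """Return the indexes of rows whose length differs from the header's."""
    expected = len(dataset[0])
    return [i for i, row in enumerate(dataset) if len(row) != expected]

# Made-up sample: the last row is missing one field
sample = [
    ['App', 'Category', 'Rating'],
    ['Good App', 'TOOLS', '4.1'],
    ['Broken App', '1.9'],
]

print(find_malformed_rows(sample))
```

Applied to googleplaystore_dataset before the deletion, a check like this would be expected to report the index of the row shown above.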
google_unique_apps_list = []
google_duplicates_apps_list = []

for app in googleplaystore_dataset:
    name = app[0]
    if name in google_unique_apps_list:
        google_duplicates_apps_list.append(name)
    else:
        google_unique_apps_list.append(name)

print("Number of duplicates apps in Google Play data set:", len(google_duplicates_apps_list))
print('\n')
print("Examples of duplicates apps in Google Play data set:", google_duplicates_apps_list[0:10])
Number of duplicates apps in Google Play data set: 1181 Examples of duplicates apps in Google Play data set: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
apple_unique_apps_list = []
apple_duplicates_apps_list = []

for app in applestore_dataset:
    # column 0 of the App Store data is the app id, not the app name
    name = app[0]
    if name in apple_unique_apps_list:
        apple_duplicates_apps_list.append(name)
    else:
        apple_unique_apps_list.append(name)

print("Number of duplicates apps in Apple data set:", len(apple_duplicates_apps_list))
print('\n')
print("Examples of duplicates apps in Apple data set:", apple_duplicates_apps_list[0:10])
Number of duplicates apps in Apple data set: 0 Examples of duplicates apps in Apple data set: []
The Google Play data set has 1181 duplicate entries (by app name), while the App Store data set has none (checked on the id column).
# check what a duplicate looks like
for app in googleplaystore_dataset:
    name = app[0]
    if name == "Slack":
        print(app)
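Checking membership in a list gets slower as the list grows. A set gives the same result with constant-time lookups. The sketch below shows the idea on a made-up sample; find_duplicates is a hypothetical helper, not part of the original notebook.

```python
def find_duplicates(dataset, name_index=0):
    """Return every repeated occurrence of a name, in order of appearance."""
    seen = set()
    duplicates = []
    for row in dataset:
        name = row[name_index]
        if name in seen:
            duplicates.append(name)
        else:
            seen.add(name)
    return duplicates

# Made-up sample rows with a single column
sample = [['Slack'], ['Box'], ['Slack'], ['Slack']]
print(find_duplicates(sample))
```

As in the loops above, an app that appears three times contributes two entries to the list of duplicates.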
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device'] ['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device'] ['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
Looking closer at our duplicates, it seems that some of them differ in the number of reviews (the fourth column). We are going to keep the entry with the highest number of reviews, assuming it reflects the most recent data.
Next we are going to remove duplicates from the Google Play data set.
To do that, we will create a dictionary, reviews_max, whose keys are app names and whose values are the highest number of reviews recorded for each app.
reviews_max = {}

for app in googleplaystore_dataset[1:]:
    name = app[0]
    n_reviews = float(app[3])

    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
print('Initial number of rows:', len(googleplaystore_dataset))
print('Number of rows after removing duplicates:',len(reviews_max))
Initial number of rows: 10841
Number of rows after removing duplicates: 9659
The next lines of code create a data set without duplicates by:
initializing two empty lists, android_clean and already_added;
looping through the data set and adding a row to android_clean only if its number of reviews matches the maximum stored in the reviews_max dictionary;
recording each added name in already_added, to make sure we keep a single entry when the number of reviews is the same for all the duplicates.
# this list will store our new clean data set
android_clean = []
# this list will store just app names
already_added = []

for app in googleplaystore_dataset[1:]:
    name = app[0]
    n_reviews = float(app[3])

    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
print('Number of rows without duplicates:', len(android_clean))
Number of rows without duplicates: 9659
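The two passes above (build reviews_max, then filter) can also be folded into a single pass that stores the best row per app directly. This is only a sketch under stated assumptions: keep_most_reviewed is a hypothetical helper, the sample rows are abbreviated to four columns, and 'Other App' is made up. The resulting rows come out in order of each name's first appearance, which may differ from the order of android_clean.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep, for each app name, the row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict comparison: on a tie, the first row seen is kept
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

sample = [
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Other App', 'TOOLS', '4.0', '10'],
    ['Slack', 'BUSINESS', '4.4', '51510'],
]

for row in keep_most_reviewed(sample):
    print(row)
```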
We want to develop an app directed toward an English-speaking audience. Thus, we want to remove any apps whose names include symbols not commonly used in English text.
The following lines of code define a function is_english that loops through the input string and counts the characters whose code point (as returned by ord) is greater than 127, that is, outside the ASCII range. A name is treated as non-English only if it contains more than three such characters, so names with an emoji or a symbol like ™ are kept.
#define is_english function to identify non-english characters
def is_english(string):
    non_ascii = 0

    for character in string:
        if ord(character) > 127:
            non_ascii += 1

    if non_ascii > 3:
        return False
    else:
        return True
#test our function
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
print(is_english("爱奇艺PPS -《欢乐颂2》电视剧热播"))
True
True
False
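To see why the first two names pass while still containing characters outside the ASCII range, it helps to list those characters with their code points. The helper below, non_ascii_chars, is a hypothetical name introduced only for this illustration.

```python
def non_ascii_chars(string):
    """Return (character, code point) pairs for characters above 127."""
    return [(ch, ord(ch)) for ch in string if ord(ch) > 127]

print(non_ascii_chars('Docs To Go™ Free Office Suite'))
print(non_ascii_chars('Instachat 😜'))
print(len(non_ascii_chars('爱奇艺PPS -《欢乐颂2》电视剧热播')))
```

Each of the first two names contains a single non-ASCII character, which stays under the limit of three, while the third name contains many more.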
Next, we are going to apply the above function to both data sets to remove any apps whose names contain more than three non-ASCII characters.
english_apps_google = []
english_apps_apple = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        english_apps_google.append(app)

for app in applestore_dataset:
    name = app[1]
    if is_english(name):
        english_apps_apple.append(app)

print(len(english_apps_apple))
len(english_apps_google)
explore_data(english_apps_apple, 0, 3, True)
print('\n')
explore_data(english_apps_google, 0, 3, True)
6184 ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] ['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'] ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'] Number of rows: 6184 Number of columns: 16 ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'] Number of rows: 9614 Number of columns: 13
We are left with 9614 Android apps and 6183 iOS apps (the iOS row count of 6184 shown above still includes the header row).
We want to build a free app, so only free apps are representative for our project. Thus, we will keep only the free apps in both data sets.
free_apps_google = []
free_apps_apple = []

for app in english_apps_google:
    name = app[0]
    # column 6 is 'Type', which holds 'Free' or 'Paid'
    price = app[6]
    if price == "Free":
        free_apps_google.append(app)

for app in english_apps_apple:
    name = app[1]
    # column 4 is 'price'; the header row is dropped here because it never equals "0.0"
    price = app[4]
    if price == "0.0":
        free_apps_apple.append(app)
print('iOS, number of rows after isolating free apps:', len(free_apps_apple))
print('Android, number of rows after isolating free apps:', len(free_apps_google))
iOS, number of rows after isolating free apps: 3222
Android, number of rows after isolating free apps: 8863
We now have 8863 Android apps and 3222 iOS apps.
As our goal is to determine what kind of apps are more likely to attract users, we want to determine what app profiles are more popular on both Android and iOS markets.
We'll build two functions to analyze the frequency tables:
freq_table, to generate frequency tables that show percentages;
display_table, to display the percentages in descending order.
def freq_table(dataset, index):
    freq_apps = {}
    total = 0

    for app in dataset:
        total += 1
        column = app[index]
        if column in freq_apps:
            freq_apps[column] += 1
        else:
            freq_apps[column] = 1

    percentages = {}
    for app in freq_apps:
        percentage = (freq_apps[app]/total) * 100
        percentages[app] = percentage
    return percentages
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
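The same percentage table can be built with collections.Counter from the standard library, which also returns the values sorted by count. The sketch below is an alternative for comparison, not the notebook's method; freq_table_counter and the sample rows are hypothetical.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return {value: percentage}, ordered from most to least frequent."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.most_common()}

# Made-up sample: three games and one music app
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]

for value, pct in freq_table_counter(sample, 1).items():
    print(value, ':', pct)
```

Because dictionaries preserve insertion order, iterating over the result already yields the descending order that display_table produces by sorting tuples.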
Let's start with column 'prime_genre' from iOS data set
#display percentage for prime_genre column
display_table(free_apps_apple, 11)
Games : 58.16263190564867 Entertainment : 7.883302296710118 Photo & Video : 4.9658597144630665 Education : 3.662321539416512 Social Networking : 3.2898820608317814 Shopping : 2.60707635009311 Utilities : 2.5139664804469275 Sports : 2.1415270018621975 Music : 2.0484171322160147 Health & Fitness : 2.0173805090006205 Productivity : 1.7380509000620732 Lifestyle : 1.5828677839851024 News : 1.3345747982619491 Travel : 1.2414649286157666 Finance : 1.1173184357541899 Weather : 0.8690254500310366 Food & Drink : 0.8069522036002483 Reference : 0.5586592178770949 Business : 0.5276225946617008 Book : 0.4345127250155183 Navigation : 0.186219739292365 Medical : 0.186219739292365 Catalogs : 0.12414649286157665
More than half of the free apps in the App Store data set are games, followed by entertainment and other recreational apps.
The Android data set seems to have two columns that are relevant for the app genre: Genres and Category. Let's look into both.
#display percentage for Genres column
display_table(free_apps_google, 9 )
Tools : 8.450863138892023 Entertainment : 6.070179397495204 Education : 5.348076272142616 Business : 4.592124562789123 Productivity : 3.8925871601038025 Lifestyle : 3.8925871601038025 Finance : 3.7007785174320205 Medical : 3.5315355974275078 Sports : 3.463838429425702 Personalization : 3.317161232088458 Communication : 3.2381812027530184 Action : 3.102786866749408 Health & Fitness : 3.0802211440821394 Photography : 2.944826808078529 News & Magazines : 2.798149610741284 Social : 2.6627552747376737 Travel & Local : 2.324269434728647 Shopping : 2.245289405393208 Books & Reference : 2.1437436533904997 Simulation : 2.042197901387792 Dating : 1.8616721200496444 Arcade : 1.8503892587160102 Video Players & Editors : 1.771409229380571 Casual : 1.7601263680469368 Maps & Navigation : 1.399074805370642 Food & Drink : 1.241114746699763 Puzzle : 1.128286133363421 Racing : 0.9928917973598104 Role Playing : 0.9364774906916393 Libraries & Demo : 0.9364774906916393 Auto & Vehicles : 0.9251946293580051 Strategy : 0.9026289066907368 House & Home : 0.8236488773552973 Weather : 0.8010831546880289 Events : 0.7108202640189552 Adventure : 0.6769716800180525 Comics : 0.6092745120162473 Beauty : 0.5979916506826132 Art & Design : 0.5979916506826132 Parenting : 0.4964458986799052 Card : 0.4513144533453684 Casino : 0.42874873067809993 Trivia : 0.4174658693444658 Educational;Education : 0.3949001466771973 Board : 0.38361728534356315 Educational : 0.37233442400992894 Education;Education : 0.33848584000902626 Word : 0.25950581067358686 Casual;Pretend Play : 0.2369400880063184 Music : 0.20309150400541578 Racing;Action & Adventure : 0.16924292000451313 Puzzle;Brain Games : 0.16924292000451313 Entertainment;Music & Video : 0.16924292000451313 Casual;Brain Games : 0.1353943360036105 Casual;Action & Adventure : 0.1353943360036105 Arcade;Action & Adventure : 0.1241114746699763 Action;Action & Adventure : 0.10154575200270789 Educational;Pretend Play : 0.09026289066907367 Simulation;Action & Adventure : 
0.07898002933543948 Parenting;Education : 0.07898002933543948 Entertainment;Brain Games : 0.07898002933543948 Board;Brain Games : 0.07898002933543948 Parenting;Music & Video : 0.06769716800180525 Educational;Brain Games : 0.06769716800180525 Casual;Creativity : 0.06769716800180525 Art & Design;Creativity : 0.06769716800180525 Education;Pretend Play : 0.05641430666817105 Role Playing;Pretend Play : 0.045131445334536835 Education;Creativity : 0.045131445334536835 Role Playing;Action & Adventure : 0.033848584000902626 Puzzle;Action & Adventure : 0.033848584000902626 Entertainment;Creativity : 0.033848584000902626 Entertainment;Action & Adventure : 0.033848584000902626 Educational;Creativity : 0.033848584000902626 Educational;Action & Adventure : 0.033848584000902626 Education;Music & Video : 0.033848584000902626 Education;Brain Games : 0.033848584000902626 Education;Action & Adventure : 0.033848584000902626 Adventure;Action & Adventure : 0.033848584000902626 Video Players & Editors;Music & Video : 0.022565722667268417 Sports;Action & Adventure : 0.022565722667268417 Simulation;Pretend Play : 0.022565722667268417 Puzzle;Creativity : 0.022565722667268417 Music;Music & Video : 0.022565722667268417 Entertainment;Pretend Play : 0.022565722667268417 Casual;Education : 0.022565722667268417 Board;Action & Adventure : 0.022565722667268417 Video Players & Editors;Creativity : 0.011282861333634209 Trivia;Education : 0.011282861333634209 Travel & Local;Action & Adventure : 0.011282861333634209 Tools;Education : 0.011282861333634209 Strategy;Education : 0.011282861333634209 Strategy;Creativity : 0.011282861333634209 Strategy;Action & Adventure : 0.011282861333634209 Simulation;Education : 0.011282861333634209 Role Playing;Brain Games : 0.011282861333634209 Racing;Pretend Play : 0.011282861333634209 Puzzle;Education : 0.011282861333634209 Parenting;Brain Games : 0.011282861333634209 Music & Audio;Music & Video : 0.011282861333634209 Lifestyle;Pretend Play : 0.011282861333634209 
Lifestyle;Education : 0.011282861333634209 Health & Fitness;Education : 0.011282861333634209 Health & Fitness;Action & Adventure : 0.011282861333634209 Entertainment;Education : 0.011282861333634209 Communication;Creativity : 0.011282861333634209 Comics;Creativity : 0.011282861333634209 Casual;Music & Video : 0.011282861333634209 Card;Action & Adventure : 0.011282861333634209 Books & Reference;Education : 0.011282861333634209 Art & Design;Pretend Play : 0.011282861333634209 Art & Design;Action & Adventure : 0.011282861333634209 Arcade;Pretend Play : 0.011282861333634209 Adventure;Education : 0.011282861333634209
#display percentage for Category column
print('\n')
display_table(free_apps_google, 1 )
FAMILY : 18.898792733837304 GAME : 9.725826469592688 TOOLS : 8.462146000225657 BUSINESS : 4.592124562789123 LIFESTYLE : 3.9038700214374367 PRODUCTIVITY : 3.8925871601038025 FINANCE : 3.7007785174320205 MEDICAL : 3.5315355974275078 SPORTS : 3.396141261423897 PERSONALIZATION : 3.317161232088458 COMMUNICATION : 3.2381812027530184 HEALTH_AND_FITNESS : 3.0802211440821394 PHOTOGRAPHY : 2.944826808078529 NEWS_AND_MAGAZINES : 2.798149610741284 SOCIAL : 2.6627552747376737 TRAVEL_AND_LOCAL : 2.335552296062281 SHOPPING : 2.245289405393208 BOOKS_AND_REFERENCE : 2.1437436533904997 DATING : 1.8616721200496444 VIDEO_PLAYERS : 1.7939749520478394 MAPS_AND_NAVIGATION : 1.399074805370642 FOOD_AND_DRINK : 1.241114746699763 EDUCATION : 1.1621347173643235 ENTERTAINMENT : 0.9590432133589079 LIBRARIES_AND_DEMO : 0.9364774906916393 AUTO_AND_VEHICLES : 0.9251946293580051 HOUSE_AND_HOME : 0.8236488773552973 WEATHER : 0.8010831546880289 EVENTS : 0.7108202640189552 PARENTING : 0.6544059573507841 ART_AND_DESIGN : 0.6431230960171499 COMICS : 0.6205573733498815 BEAUTY : 0.5979916506826132
The Android data set seems to have a more diverse range of common genres than the App Store data set. Moving forward, we will use the Category column, as we are more interested in the bigger picture.
We are going to look at number of installs for Google Play data and at rating_count_tot for App Store data to determine the most popular apps.
Next we are going to calculate the average number of user ratings per app genre on the App Store:
unique_genres = freq_table(free_apps_apple, -5)

for genre in unique_genres:
    total = 0
    len_genre = 0
    for app in free_apps_apple:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)
Social Networking : 71548.34905660378 Photo & Video : 28441.54375 Games : 22788.6696905016 Music : 57326.530303030304 Reference : 74942.11111111111 Health & Fitness : 23298.015384615384 Weather : 52279.892857142855 Utilities : 18684.456790123455 Travel : 28243.8 Shopping : 26919.690476190477 News : 21248.023255813954 Navigation : 86090.33333333333 Lifestyle : 16485.764705882353 Entertainment : 14029.830708661417 Food & Drink : 33333.92307692308 Sports : 23008.898550724636 Book : 39758.5 Finance : 31467.944444444445 Education : 7003.983050847458 Productivity : 21028.410714285714 Business : 7491.117647058823 Catalogs : 4004.0 Medical : 612.0
Navigation seems to be the most popular genre in App Store data set. Let's have a closer look into top 5 genres.
for app in free_apps_apple:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5])
Waze - GPS Navigation, Maps & Real-time Traffic : 345046 Google Maps - Navigation & Transit : 154911 Geocaching® : 12811 CoPilot GPS – Car Navigation & Offline Maps : 3582 ImmobilienScout24: Real Estate Search in Germany : 187 Railway Route Search : 5
It seems that Waze and Google Maps are outliers that inflate the average number of ratings for this genre.
for app in free_apps_apple:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5])
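One way to quantify how much the outliers pull the average is to compare the mean with the median, which is far less sensitive to a few very large values. The sketch below uses the six Navigation rating counts printed above; using the median here is a suggestion for illustration, not a step of the original analysis.

```python
from statistics import mean, median

# Rating counts of the six free Navigation apps listed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print('mean:', mean(navigation_ratings))
print('median:', median(navigation_ratings))
```

The mean matches the genre average of about 86,090 reported earlier, while the median is 8196.5, roughly ten times smaller.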
Facebook : 2974676 Pinterest : 1061624 Skype for iPhone : 373519 Messenger : 351466 Tumblr : 334293 WhatsApp Messenger : 287589 Kik : 260965 ooVoo – Free Video Call, Text and Voice : 177501 TextNow - Unlimited Text + Calls : 164963 Viber Messenger – Text & Call : 164249 Followers - Social Analytics For Instagram : 112778 MeetMe - Chat and Meet New People : 97072 We Heart It - Fashion, wallpapers, quotes, tattoos : 90414 InsTrack for Instagram - Analytics Plus More : 85535 Tango - Free Video Call, Voice and Chat : 75412 LinkedIn : 71856 Match™ - #1 Dating App. : 60659 Skype for iPad : 60163 POF - Best Dating App for Conversations : 52642 Timehop : 49510 Find My Family, Friends & iPhone - Life360 Locator : 43877 Whisper - Share, Express, Meet : 39819 Hangouts : 36404 LINE PLAY - Your Avatar World : 34677 WeChat : 34584 Badoo - Meet New People, Chat, Socialize. : 34428 Followers + for Instagram - Follower Analytics : 28633 GroupMe : 28260 Marco Polo Video Walkie Talkie : 27662 Miitomo : 23965 SimSimi : 23530 Grindr - Gay and same sex guys chat, meet and date : 23201 Wishbone - Compare Anything : 20649 imo video calls and chat : 18841 After School - Funny Anonymous School News : 18482 Quick Reposter - Repost, Regram and Reshare Photos : 17694 Weibo HD : 16772 Repost for Instagram : 15185 Live.me – Live Video Chat & Make Friends Nearby : 14724 Nextdoor : 14402 Followers Analytics for Instagram - InstaReport : 13914 YouNow: Live Stream Video Chat : 12079 FollowMeter for Instagram - Followers Tracking : 11976 LINE : 11437 eHarmony™ Dating App - Meet Singles : 11124 Discord - Chat for Gamers : 9152 QQ : 9109 Telegram Messenger : 7573 Weibo : 7265 Periscope - Live Video Streaming Around the World : 6062 Chat for Whatsapp - iPad Version : 5060 QQ HD : 5058 Followers Analysis Tool For Instagram App Free : 4253 live.ly - live video streaming : 4145 Houseparty - Group Video Chat : 3991 SOMA Messenger : 3232 Monkey : 3060 Down To Lunch : 2535 Flinch - Video Chat Staring Contest 
: 2134 Highrise - Your Avatar Community : 2011 LOVOO - Dating Chat : 1985 PlayStation®Messages : 1918 BOO! - Video chat camera with filters & stickers : 1805 Qzone : 1649 Chatous - Chat with new people : 1609 Kiwi - Q&A : 1538 GhostCodes - a discovery app for Snapchat : 1313 Jodel : 1193 FireChat : 1037 Google Duo - simple video calling : 1033 Fiesta by Tango - Chat & Meet New People : 885 Google Allo — smart messaging : 862 Peach — share vividly : 727 Hey! VINA - Where Women Meet New Friends : 719 Battlefield™ Companion : 689 All Devices for WhatsApp - Messenger for iPad : 682 Chat for Pokemon Go - GoChat : 500 IAmNaughty – Dating App to Meet New People Online : 463 Qzone HD : 458 Zenly - Locate your friends in realtime : 427 League of Legends Friends : 420 豆瓣 : 407 Candid - Speak Your Mind Freely : 398 知乎 : 397 Selfeo : 366 Fake-A-Location Free ™ : 354 Popcorn Buzz - Free Group Calls : 281 Fam — Group video calling for iMessage : 279 QQ International : 274 Ameba : 269 SoundCloud Pulse: for creators : 240 Tantan : 235 Cougar Dating & Life Style App for Mature Women : 213 Rawr Messenger - Dab your chat : 180 WhenToPost: Best Time to Post Photos for Instagram : 158 Inke—Broadcast an amazing life : 147 Mustknow - anonymous video Q&A : 53 CTFxCmoji : 39 Lobi : 36 Chain: Collaborate On MyVideo Story/Group Video : 35 botman - Real time video chat : 7 BestieBox : 0 MATCH ON LINE chat : 0 niconico ch : 0 LINE BLOG : 0 bit-tube - Live Stream Video Chat : 0
We can notice a similar pattern for the Social Networking genre. The average number of user ratings is heavily influenced by outliers like Facebook, Pinterest, Skype and Messenger.
for app in free_apps_apple:
    if app[-5] == 'Music':
        print(app[1], ':', app[5])
Pandora - Music & Radio : 1126879 Spotify Music : 878563 Shazam - Discover music, artists, videos & lyrics : 402925 iHeartRadio – Free Music & Radio Stations : 293228 SoundCloud - Music & Audio : 135744 Magic Piano by Smule : 131695 Smule Sing! : 119316 TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420 Amazon Music : 106235 SoundHound Song Search & Music Player : 82602 Sonos Controller : 48905 Bandsintown Concerts : 30845 Karaoke - Sing Karaoke, Unlimited Songs! : 28606 My Mixtapez Music : 26286 Sing Karaoke Songs Unlimited with StarMaker : 26227 Ringtones for iPhone & Ringtone Maker : 25403 Musi - Unlimited Music For YouTube : 25193 AutoRap by Smule : 18202 Spinrilla - Mixtapes For Free : 15053 Napster - Top Music & Radio : 14268 edjing Mix:DJ turntable to remix and scratch music : 13580 Free Music - MP3 Streamer & Playlist Manager Pro : 13443 Free Piano app by Yokee : 13016 Google Play Music : 10118 Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975 TIDAL : 7398 YouTube Music : 7109 Nicki Minaj: The Empire : 5196 Sounds app - Music And Friends : 5126 SongFlip - Free Music Streamer : 5004 Simple Radio - Live AM & FM Radio Stations : 4787 Deezer - Listen to your Favorite Music & Playlists : 4677 Ringtones for iPhone with Ringtone Maker : 4013 Bose SoundTouch : 3687 Amazon Alexa : 3018 DatPiff : 2815 Trebel Music - Unlimited Music Downloader : 2570 Free Music Play - Mp3 Streamer & Player : 2496 Acapella from PicPlayPost : 2487 Coach Guitar - Lessons & Easy Tabs For Beginners : 2416 Musicloud - MP3 and FLAC Music Player for Cloud Platforms. 
: 2211 Piano - Play Keyboard Music Games with Magic Tiles : 1636 Boom: Best Equalizer & Magical Surround Sound : 1375 Music Freedom - Unlimited Free MP3 Music Streaming : 1246 AmpMe - A Portable Social Party Music Speaker : 1047 Medly - Music Maker : 933 Bose Connect : 915 Music Memos : 909 UE BOOM : 612 LiveMixtapes : 555 NOISE : 355 MP3 Music Player & Streamer for Clouds : 329 Musical Video Maker - Create Music clips lip sync : 320 Cloud Music Player - Downloader & Playlist Manager : 319 Remixlive - Remix loops with pads : 288 QQ音乐HD : 224 Blocs Wave - Make & Record Music : 158 PlayGround • Music At Your Fingertips : 150 Music and Chill : 135 The Singing Machine Mobile Karaoke App : 130 radio.de - Der Radioplayer : 64 Free Music - Player & Streamer for Dropbox, OneDrive & Google Drive : 46 NRJ Radio : 38 Smart Music: Streaming Videos and Radio : 17 BOSS Tuner : 13 PetitLyrics : 0
for app in free_apps_apple:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])
Bible : 985920 Dictionary.com Dictionary & Thesaurus : 200047 Dictionary.com Dictionary & Thesaurus for iPad : 54175 Google Translate : 26786 Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418 New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588 Merriam-Webster Dictionary : 16849 Night Sky : 12122 City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535 LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693 GUNS MODS for Minecraft PC Edition - Mods Tools : 1497 Guides for Pokémon GO - Pokemon GO News and Cheats : 826 WWDC : 762 Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718 VPN Express : 14 Real Bike Traffic Rider Virtual Reality Glasses : 8 教えて!goo : 0 Jishokun-Japanese English Dictionary & Translator : 0
for app in free_apps_apple:
    if app[-5] == 'Weather':
        print(app[1], ':', app[5])
The Weather Channel: Forecast, Radar & Alerts : 495626 The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648 WeatherBug - Local Weather, Radar, Maps, Alerts : 188583 MyRadar NOAA Weather Radar Forecast : 150158 AccuWeather - Weather for Life : 144214 Yahoo Weather : 112603 Weather Underground: Custom Forecast & Local Radar : 49192 NOAA Weather Radar - Weather Forecast & HD Radar : 45696 Weather Live Free - Weather Forecast & Alerts : 35702 Storm Radar : 22792 QuakeFeed Earthquake Map, Alerts, and News : 6081 Moji Weather - Free Weather Forecast : 2333 Hurricane by American Red Cross : 1158 Forecast Bar : 375 Hurricane Tracker WESH 2 Orlando, Central Florida : 203 FEMA : 128 iWeather - World weather forecast : 80 Weather - Radar - Storm with Morecast App : 78 Yurekuru Call : 53 Weather & Radar : 37 WRAL Weather Alert : 25 Météo-France : 24 JaxReady : 22 Freddy the Frogcaster's Weather Station : 14 Almanac Long-Range Weather Forecast : 12 TodayAir : 0 wetter.com : 0 WarnWetter : 0
The Reference and Weather genres also seem to have outliers, but there are relatively few apps in these genres. Let's keep these two in mind and see whether a similar pattern appears in the Google Play data set.
Let's look into the Google Play data set to see what the average number of installs per category is.
unique_google = freq_table(free_apps_google, 1)

for category in unique_google:
    total = 0
    len_category = 0
    for app in free_apps_google:
        category_app = app[1]
        if category_app == category:
            installs = app[5]
            # strip '+' and ',' so that e.g. '1,000,000+' can be converted to a number
            n_install = installs.replace('+', '')
            new_install = n_install.replace(',', '')
            new_install = float(new_install)
            total += new_install
            len_category += 1
    avg_install = total/len_category
    print(category, ':', avg_install)
ART_AND_DESIGN : 1986335.0877192982 AUTO_AND_VEHICLES : 647317.8170731707 BEAUTY : 513151.88679245283 BOOKS_AND_REFERENCE : 8767811.894736841 BUSINESS : 1712290.1474201474 COMICS : 817657.2727272727 COMMUNICATION : 38456119.167247385 DATING : 854028.8303030303 EDUCATION : 1833495.145631068 ENTERTAINMENT : 11640705.88235294 EVENTS : 253542.22222222222 FINANCE : 1387692.475609756 FOOD_AND_DRINK : 1924897.7363636363 HEALTH_AND_FITNESS : 4188821.9853479853 HOUSE_AND_HOME : 1331540.5616438356 LIBRARIES_AND_DEMO : 638503.734939759 LIFESTYLE : 1437816.2687861272 GAME : 15588015.603248259 FAMILY : 3697848.1731343283 MEDICAL : 120550.61980830671 SOCIAL : 23253652.127118643 SHOPPING : 7036877.311557789 PHOTOGRAPHY : 17840110.40229885 SPORTS : 3638640.1428571427 TRAVEL_AND_LOCAL : 13984077.710144928 TOOLS : 10801391.298666667 PERSONALIZATION : 5201482.6122448975 PRODUCTIVITY : 16787331.344927534 PARENTING : 542603.6206896552 WEATHER : 5074486.197183099 VIDEO_PLAYERS : 24727872.452830188 NEWS_AND_MAGAZINES : 9549178.467741935 MAPS_AND_NAVIGATION : 4056941.7741935486
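The Installs column holds open-ended buckets such as '10,000+' rather than exact counts, so the string has to be cleaned before it can be averaged. The cleaning step can be isolated in a small helper, sketched below; parse_installs is a hypothetical name, and treating each bucket as its lower bound is an assumption that makes the averages rough estimates rather than exact figures.

```python
def parse_installs(installs):
    """Convert a string like '1,000,000+' to a float (the bucket's lower bound)."""
    return float(installs.replace(',', '').replace('+', ''))

print(parse_installs('1,000,000+'))
print(parse_installs('0+'))
```

Reusing one helper keeps the cleaning identical wherever installs are converted.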
The COMMUNICATION category seems to have the highest average number of installs. Let's have a closer look and see if there are any outliers.
display_table(free_apps_google, 5)
1,000,000+ : 15.728308699086089 100,000+ : 11.55365000564143 10,000,000+ : 10.549475346947986 10,000+ : 10.199706645605326 1,000+ : 8.394448832223853 100+ : 6.916393997517771 5,000,000+ : 6.826131106848697 500,000+ : 5.562450637481666 50,000+ : 4.772650344127271 5,000+ : 4.513144533453684 10+ : 3.542818458761142 500+ : 3.2494640640866526 50,000,000+ : 2.3017037120613786 100,000,000+ : 2.1324607920568655 50+ : 1.9180864267178157 5+ : 0.7898002933543946 1+ : 0.5077287600135394 500,000,000+ : 0.270788672007221 1,000,000,000+ : 0.2256572266726842 0+ : 0.045131445334536835
for app in free_apps_google:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])
WhatsApp Messenger : 1,000,000,000+ imo beta free calls and text : 100,000,000+ Android Messages : 100,000,000+ Google Duo - High Quality Video Calls : 500,000,000+ Messenger – Text and Video Chat for Free : 1,000,000,000+ imo free video calls and chat : 500,000,000+ Skype - free IM & video calls : 1,000,000,000+ Who : 100,000,000+ GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+ LINE: Free Calls & Messages : 500,000,000+ Google Chrome: Fast & Secure : 1,000,000,000+ Firefox Browser fast & private : 100,000,000+ UC Browser - Fast Download Private & Secure : 500,000,000+ Gmail : 1,000,000,000+ Hangouts : 1,000,000,000+ Messenger Lite: Free Calls & Messages : 100,000,000+ Kik : 100,000,000+ KakaoTalk: Free Calls & Text : 100,000,000+ Opera Mini - fast web browser : 100,000,000+ Opera Browser: Fast and Secure : 100,000,000+ Telegram : 100,000,000+ Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+ UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+ Viber Messenger : 500,000,000+ WeChat : 100,000,000+ Yahoo Mail – Stay Organized : 100,000,000+ BBM - Free Calls & Messages : 100,000,000+
As expected, a handful of giants such as WhatsApp, Skype, Messenger and Google Chrome skew the average. Let's remove the apps with 100 million or more installs and check the average number of installs again.
# install counts of communication apps with fewer than 100 million installs
under_100_m = []
for app in free_apps_google:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
sum(under_100_m) / len(under_100_m)
3603485.3884615386
We can see that the average drops significantly, from 38456119 to 3603485, once these outliers are excluded.
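The same comparison can be repeated for any category with a small helper. The following is only a sketch: the names parse_installs, average_installs and the cap parameter are hypothetical, and the sample rows are made up for illustration (they only mimic the column layout, with the category at index 1 and the installs at index 5).

```python
def parse_installs(installs):
    """Convert an install bucket such as '1,000,000+' to a float."""
    return float(installs.replace(',', '').replace('+', ''))


def average_installs(apps, category, cap=None):
    """Average installs for one category.

    If cap is given, apps with installs at or above cap are ignored.
    Returns 0.0 when no app matches.
    """
    values = []
    for app in apps:
        if app[1] != category:
            continue
        n_installs = parse_installs(app[5])
        if cap is None or n_installs < cap:
            values.append(n_installs)
    return sum(values) / len(values) if values else 0.0


# Hypothetical rows, NOT taken from googleplaystore.csv:
# index 1 = Category, index 5 = Installs.
sample_apps = [
    ['App A', 'COMMUNICATION', '4.0', '10', '1M', '1,000,000,000+'],
    ['App B', 'COMMUNICATION', '4.0', '10', '1M', '1,000,000+'],
    ['App C', 'COMMUNICATION', '4.0', '10', '1M', '500,000+'],
    ['App D', 'TOOLS', '4.0', '10', '1M', '10,000+'],
]

# average with and without the apps at or above 100 million installs
print(average_installs(sample_apps, 'COMMUNICATION'))
print(average_installs(sample_apps, 'COMMUNICATION', cap=100000000))
```

On the real data, a call such as average_installs(free_apps_google, 'COMMUNICATION', cap=100000000) should reproduce the reduced average computed above, assuming the rows keep the same column order.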
We can see that the reference genre is also popular on Android, so we will stick with the Books and Reference category. Let's have a closer look.
for app in free_apps_google:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])
E-Book Read - Read Book for free : 50,000+ Download free book with green book : 100,000+ Wikipedia : 10,000,000+ Cool Reader : 10,000,000+ Free Panda Radio Music : 100,000+ Book store : 1,000,000+ FBReader: Favorite Book Reader : 10,000,000+ English Grammar Complete Handbook : 500,000+ Free Books - Spirit Fanfiction and Stories : 1,000,000+ Google Play Books : 1,000,000,000+ AlReader -any text book reader : 5,000,000+ Offline English Dictionary : 100,000+ Offline: English to Tagalog Dictionary : 500,000+ FamilySearch Tree : 1,000,000+ Cloud of Books : 1,000,000+ Recipes of Prophetic Medicine for free : 500,000+ ReadEra – free ebook reader : 1,000,000+ Anonymous caller detection : 10,000+ Ebook Reader : 5,000,000+ Litnet - E-books : 100,000+ Read books online : 5,000,000+ English to Urdu Dictionary : 500,000+ eBoox: book reader fb2 epub zip : 1,000,000+ English Persian Dictionary : 500,000+ Flybook : 500,000+ All Maths Formulas : 1,000,000+ Ancestry : 5,000,000+ HTC Help : 10,000,000+ English translation from Bengali : 100,000+ Pdf Book Download - Read Pdf Book : 100,000+ Free Book Reader : 100,000+ eBoox new: Reader for fb2 epub zip books : 50,000+ Only 30 days in English, the guideline is guaranteed : 500,000+ Moon+ Reader : 10,000,000+ SH-02J Owner's Manual (Android 8.0) : 50,000+ English-Myanmar Dictionary : 1,000,000+ Golden Dictionary (EN-AR) : 1,000,000+ All Language Translator Free : 1,000,000+ Azpen eReader : 500,000+ URBANO V 02 instruction manual : 100,000+ Bible : 100,000,000+ C Programs and Reference : 50,000+ C Offline Tutorial : 1,000+ C Programs Handbook : 50,000+ Amazon Kindle : 100,000,000+ Aab e Hayat Full Novel : 100,000+ Aldiko Book Reader : 10,000,000+ Google I/O 2018 : 500,000+ R Language Reference Guide : 10,000+ Learn R Programming Full : 5,000+ R Programing Offline Tutorial : 1,000+ Guide for R Programming : 5+ Learn R Programming : 10+ R Quick Reference Big Data : 1,000+ V Made : 100,000+ Wattpad 📖 Free Books : 100,000,000+ Dictionary - 
WordWeb : 5,000,000+ Guide (for X-MEN) : 100,000+ AC Air condition Troubleshoot,Repair,Maintenance : 5,000+ AE Bulletins : 1,000+ Ae Allah na Dai (Rasa) : 10,000+ 50000 Free eBooks & Free AudioBooks : 5,000,000+ Ag PhD Field Guide : 10,000+ Ag PhD Deficiencies : 10,000+ Ag PhD Planting Population Calculator : 1,000+ Ag PhD Soybean Diseases : 1,000+ Fertilizer Removal By Crop : 50,000+ A-J Media Vault : 50+ Al-Quran (Free) : 10,000,000+ Al Quran (Tafsir & by Word) : 500,000+ Al Quran Indonesia : 10,000,000+ Al'Quran Bahasa Indonesia : 10,000,000+ Al Quran Al karim : 1,000,000+ Al-Muhaffiz : 50,000+ Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+ Al-Quran 30 Juz free copies : 500,000+ Koran Read &MP3 30 Juz Offline : 1,000,000+ Hafizi Quran 15 lines per page : 1,000,000+ Quran for Android : 10,000,000+ Surah Al-Waqiah : 100,000+ Hisnul Al Muslim - Hisn Invocations & Adhkaar : 100,000+ Satellite AR : 1,000,000+ Audiobooks from Audible : 100,000,000+ Kinot & Eichah for Tisha B'Av : 10,000+ AW Tozer Devotionals - Daily : 5,000+ Tozer Devotional -Series 1 : 1,000+ The Pursuit of God : 1,000+ AY Sing : 5,000+ Ay Hasnain k Nana Milad Naat : 10,000+ Ay Mohabbat Teri Khatir Novel : 10,000+ Arizona Statutes, ARS (AZ Law) : 1,000+ Oxford A-Z of English Usage : 1,000,000+ BD Fishpedia : 1,000+ BD All Sim Offer : 10,000+ Youboox - Livres, BD et magazines : 500,000+ B&H Kids AR : 10,000+ B y H Niños ES : 5,000+ Dictionary.com: Find Definitions for English Words : 10,000,000+ English Dictionary - Offline : 10,000,000+ Bible KJV : 5,000,000+ Borneo Bible, BM Bible : 10,000+ MOD Black for BM : 100+ BM Box : 1,000+ Anime Mod for BM : 100+ NOOK: Read eBooks & Magazines : 10,000,000+ NOOK Audiobooks : 500,000+ NOOK App for NOOK Devices : 500,000+ Browsery by Barnes & Noble : 5,000+ bp e-store : 1,000+ Brilliant Quotes: Life, Love, Family & Motivation : 1,000,000+ BR Ambedkar Biography & Quotes : 10,000+ BU Alsace : 100+ Catholic La Bu Zo Kam : 500+ Khrifa Hla Bu (Solfa) : 
10+ Kristian Hla Bu : 10,000+ SA HLA BU : 1,000+ Learn SAP BW : 500+ Learn SAP BW on HANA : 500+ CA Laws 2018 (California Laws and Codes) : 5,000+ Bootable Methods(USB-CD-DVD) : 10,000+ cloudLibrary : 100,000+ SDA Collegiate Quarterly : 500+ Sabbath School : 100,000+ Cypress College Library : 100+ Stats Royale for Clash Royale : 1,000,000+ GATE 21 years CS Papers(2011-2018 Solved) : 50+ Learn CT Scan Of Head : 5,000+ Easy Cv maker 2018 : 10,000+ How to Write CV : 100,000+ CW Nuclear : 1,000+ CY Spray nozzle : 10+ BibleRead En Cy Zh Yue : 5+ CZ-Help : 5+ Modlitební knížka CZ : 500+ Guide for DB Xenoverse : 10,000+ Guide for DB Xenoverse 2 : 10,000+ Guide for IMS DB : 10+ DC HSEMA : 5,000+ DC Public Library : 1,000+ Painting Lulu DC Super Friends : 1,000+ Dictionary : 10,000,000+ Fix Error Google Playstore : 1,000+ D. H. Lawrence Poems FREE : 1,000+ Bilingual Dictionary Audio App : 5,000+ DM Screen : 10,000+ wikiHow: how to do anything : 1,000,000+ Dr. Doug's Tips : 1,000+ Bible du Semeur-BDS (French) : 50,000+ La citadelle du musulman : 50,000+ DV 2019 Entry Guide : 10,000+ DV 2019 - EDV Photo & Form : 50,000+ DV 2018 Winners Guide : 1,000+ EB Annual Meetings : 1,000+ EC - AP & Telangana : 5,000+ TN Patta Citta & EC : 10,000+ AP Stamps and Registration : 10,000+ CompactiMa EC pH Calibration : 100+ EGW Writings 2 : 100,000+ EGW Writings : 1,000,000+ Bible with EGW Comments : 100,000+ My Little Pony AR Guide : 1,000,000+ SDA Sabbath School Quarterly : 500,000+ Duaa Ek Ibaadat : 5,000+ Spanish English Translator : 10,000,000+ Dictionary - Merriam-Webster : 10,000,000+ JW Library : 10,000,000+ Oxford Dictionary of English : Free : 10,000,000+ English Hindi Dictionary : 10,000,000+ English to Hindi Dictionary : 5,000,000+ EP Research Service : 1,000+ Hymnes et Louanges : 100,000+ EU Charter : 1,000+ EU Data Protection : 1,000+ EU IP Codes : 100+ EW PDF : 5+ BakaReader EX : 100,000+ EZ Quran : 50,000+ FA Part 1 & 2 Past Papers Solved Free – Offline : 5,000+ La Fe de Jesus 
: 1,000+ La Fe de Jesús : 500+ Le Fe de Jesus : 500+ Florida - Pocket Brainbook : 1,000+ Florida Statutes (FL Code) : 1,000+ English To Shona Dictionary : 10,000+ Greek Bible FP (Audio) : 1,000+ Golden Dictionary (FR-AR) : 500,000+ Fanfic-FR : 5,000+ Bulgarian French Dictionary Fr : 10,000+ Chemin (fr) : 1,000+ The SCP Foundation DB fr nn5n : 1,000+
Let's repeat the above process to check for outliers.
for app in free_apps_google:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])
Google Play Books : 1,000,000,000+ Bible : 100,000,000+ Amazon Kindle : 100,000,000+ Wattpad 📖 Free Books : 100,000,000+ Audiobooks from Audible : 100,000,000+
It looks like the list of very popular apps in this category is not a huge one. Let's look at the next tier down, the apps with between 1,000,000+ and 50,000,000+ installs.
for app in free_apps_google:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])
Wikipedia : 10,000,000+ Cool Reader : 10,000,000+ Book store : 1,000,000+ FBReader: Favorite Book Reader : 10,000,000+ Free Books - Spirit Fanfiction and Stories : 1,000,000+ AlReader -any text book reader : 5,000,000+ FamilySearch Tree : 1,000,000+ Cloud of Books : 1,000,000+ ReadEra – free ebook reader : 1,000,000+ Ebook Reader : 5,000,000+ Read books online : 5,000,000+ eBoox: book reader fb2 epub zip : 1,000,000+ All Maths Formulas : 1,000,000+ Ancestry : 5,000,000+ HTC Help : 10,000,000+ Moon+ Reader : 10,000,000+ English-Myanmar Dictionary : 1,000,000+ Golden Dictionary (EN-AR) : 1,000,000+ All Language Translator Free : 1,000,000+ Aldiko Book Reader : 10,000,000+ Dictionary - WordWeb : 5,000,000+ 50000 Free eBooks & Free AudioBooks : 5,000,000+ Al-Quran (Free) : 10,000,000+ Al Quran Indonesia : 10,000,000+ Al'Quran Bahasa Indonesia : 10,000,000+ Al Quran Al karim : 1,000,000+ Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+ Koran Read &MP3 30 Juz Offline : 1,000,000+ Hafizi Quran 15 lines per page : 1,000,000+ Quran for Android : 10,000,000+ Satellite AR : 1,000,000+ Oxford A-Z of English Usage : 1,000,000+ Dictionary.com: Find Definitions for English Words : 10,000,000+ English Dictionary - Offline : 10,000,000+ Bible KJV : 5,000,000+ NOOK: Read eBooks & Magazines : 10,000,000+ Brilliant Quotes: Life, Love, Family & Motivation : 1,000,000+ Stats Royale for Clash Royale : 1,000,000+ Dictionary : 10,000,000+ wikiHow: how to do anything : 1,000,000+ EGW Writings : 1,000,000+ My Little Pony AR Guide : 1,000,000+ Spanish English Translator : 10,000,000+ Dictionary - Merriam-Webster : 10,000,000+ JW Library : 10,000,000+ Oxford Dictionary of English : Free : 10,000,000+ English Hindi Dictionary : 10,000,000+ English to Hindi Dictionary : 5,000,000+
It looks like dictionaries are very popular. There are also a few Al-Quran apps and a few other niches. There is only one app for favourite books. This could be a niche for our client to explore: an app that offers readers a forum to discuss and review books, keep lists of their favourites and share ideas with other people.
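The impression that some themes dominate the list can be checked roughly by counting keywords in the app names. This is a minimal sketch, not part of the original analysis: the function count_keywords and the keyword list are assumptions, and the sample names are a small subset copied from the output above.

```python
def count_keywords(app_names, keywords):
    """Count how many app names contain each keyword (case-insensitive).

    Keywords are expected in lowercase; each name counts at most once
    per keyword.
    """
    counts = {keyword: 0 for keyword in keywords}
    for name in app_names:
        lowered = name.lower()
        for keyword in keywords:
            if keyword in lowered:
                counts[keyword] += 1
    return counts


# A few names copied from the list above, for illustration only
sample_names = [
    'Wikipedia',
    'FBReader: Favorite Book Reader',
    'Dictionary - WordWeb',
    'Al Quran Indonesia',
    'Quran for Android',
    'English Hindi Dictionary',
    'Bible KJV',
]

# Hypothetical keyword list; adjust it to the themes of interest
keywords = ['dictionary', 'quran', 'bible', 'reader']

print(count_keywords(sample_names, keywords))
```

Substring matching is crude (it cannot tell a dictionary app from an app that merely mentions one), so the counts are best treated as a quick sanity check rather than a measurement.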
We analysed mobile app data for two markets: the App Store and Google Play. The purpose of the analysis was to recommend an app profile that would be profitable to develop for both markets.
The analysis concluded that an app giving readers a forum to share ideas, write reviews and exchange lists of favourite books could be profitable in both markets.