Our company's main source of revenue is in-app advertising, which appears primarily in free apps: the more users engage with the ads, the higher the revenue. The goal of this project is to analyze app-store data to help our developers understand what types of free apps are likely to attract more users.
from csv import reader

# UTF-8 is requested explicitly because some app names contain non-ASCII characters
opened_file_1 = open('AppleStore.csv', encoding='utf8')
opened_file_2 = open('googleplaystore.csv', encoding='utf8')
read_file_1 = reader(opened_file_1)
read_file_2 = reader(opened_file_2)
ios_dataset = list(read_file_1)
google_dataset = list(read_file_2)
ios_header = ios_dataset[0]
ios_body = ios_dataset[1:]
google_header = google_dataset[0]
google_body = google_dataset[1:]
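As a minimal sketch, the loading steps above could be wrapped in a helper that closes each file once it has been read. The name load_csv is hypothetical (it does not appear in the notebook), and UTF-8 encoding is an assumption about the files.

```python
from csv import reader

def load_csv(path, encoding='utf8'):
    """Return (header, body) for a CSV file; load_csv is a hypothetical helper."""
    # The with-block closes the file as soon as all rows have been read
    with open(path, encoding=encoding, newline='') as f:
        rows = list(reader(f))
    return rows[0], rows[1:]
```

Under those assumptions, ios_header, ios_body = load_csv('AppleStore.csv') would produce the same two variables as the cells above.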
To make it easier to explore a dataset and see how many rows and columns it has, we will write an explore_data function:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')  # adds an empty line after each row
    if rows_and_columns:
        print('Number of row:', len(dataset))
        print('Number of columns:', len(dataset[0]))
Let's explore the iOS dataset:
print(ios_header)
print('\n')
explore_data(ios_body,0,3, rows_and_columns = True)
print('\n')
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
Number of row: 7197
Number of columns: 16
As we can see, there are 7197 rows and 16 columns. The columns that could help with our analysis are 'track_name', 'price', 'currency', 'rating_count_tot', 'user_rating', and 'prime_genre'.
Let's explore the Google Play Store dataset:
print(google_header)
print('\n')
explore_data(google_body,0,3, rows_and_columns = True)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
Number of row: 10841
Number of columns: 13
There are 10841 rows and 13 columns. The following columns can help with our analysis: App, Category, Rating, Installs, Price, and Genres.
The data cleaning process involves removing or correcting wrong data, removing duplicate data, and modifying the data to fit the purpose of our analysis.
According to the dataset's discussion section, row 10472 of the Google Play Store dataset is incorrect. Let's pull it up to check:
print(google_body[10472])
print('\n')
print('No. of columns are: ',len(google_body[10472]))
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
No. of columns are: 12
This row is missing its 'Category' value, which leaves it with 12 columns instead of 13, so we remove it to keep the rest of the data consistent. The del statement must be run only once.
del google_body[10472] # Don't run this more than once
print(len(google_body))
10840
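Rather than relying on a row number reported elsewhere, rows with the wrong number of columns can also be located programmatically. The sketch below assumes that a malformed row is one whose length differs from the header's; find_malformed_rows is a hypothetical helper, and the tiny header and rows are made up for illustration.

```python
def find_malformed_rows(header, body):
    """Return the indexes of rows whose column count differs from the header's."""
    expected = len(header)
    return [i for i, row in enumerate(body) if len(row) != expected]

# Made-up sample data: the second row is missing one value
sample_header = ['App', 'Category', 'Rating']
sample_body = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'GAME', '3.9'],
]
print(find_malformed_rows(sample_header, sample_body))  # prints [1]
```

Run on google_header and google_body before the deletion, a check like this would be expected to flag index 10472.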
The Google Play Store dataset discussion notes that Instagram is duplicated. We can check this as follows:
for app in google_body:
    name = app[0]
    if name == 'Instagram':
        print(app)
        print('\n')
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
duplicate_apps = []
unique_apps = []
for app in google_body:
    name = app[0]  # the app name is the first element of each row in google_body
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])
Number of duplicate apps: 1181
Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
Rather than removing duplicates at random, we can choose a specific criterion: for any given app, keep the entry with the highest number of reviews and remove the others.
Let's create a dictionary that stores the maximum number of reviews for each app.
reviews_max = {}
for app in google_body:
    name = app[0]
    n_reviews = float(app[3])  # convert to float so the counts can be compared numerically
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
len(reviews_max)
9659
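We create two lists, android_clean and already_added. android_clean receives, for each app, the row whose review count matches the maximum stored in the reviews_max dictionary, and already_added tracks the names that have already been included. That way no app is added twice, even when several duplicate rows share the same maximum review count.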
android_clean = []  # only one entry per app: the one with the most reviews
already_added = []
for app in google_body:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
len(android_clean)
9659
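The two passes above can also be collapsed into one by storing, for each app name, the whole row with the most reviews seen so far. This is an alternative sketch rather than the notebook's method; dedupe_by_max_reviews is a hypothetical name, the sample rows are made up, and ties are resolved in favor of the first row encountered.

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the one with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the first row when two rows tie on review count
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Made-up rows in the Google Play column order (App, Category, Rating, Reviews)
sample = [
    ['Chat', 'SOCIAL', '4.5', '100'],
    ['Chat', 'SOCIAL', '4.5', '250'],
    ['Notes', 'TOOLS', '4.0', '30'],
    ['Chat', 'SOCIAL', '4.5', '250'],
]
for row in dedupe_by_max_reviews(sample):
    print(row[0], row[3])
```

The rows come back in the order in which each name first appears (dictionaries preserve insertion order in Python 3.7+), which may differ from the row order in android_clean.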
We would like to analyze only the apps that are designed for an English-speaking audience. Therefore, we will remove non-English apps.
def is_english(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('IDocs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
True
False
False
False
This shows that many English apps will be incorrectly labeled as non-English if their names contain emojis or other characters outside the ASCII range, which would mean losing useful data.
We can relax the condition: a name is treated as non-English only if it contains more than three characters whose code point is above 127.
def is_english(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    if non_ascii > 3:
        return False
    else:
        return True
print(is_english('Instachat 😜'))
True
Now we can use is_english to filter out non-English apps in both of our datasets:
android_english = []
ios_english = []
for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)
for app in ios_body:
    name = app[1]
    if is_english(name):
        ios_english.append(app)
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']
Number of row: 9614
Number of columns: 13
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
Number of row: 6183
Number of columns: 16
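To isolate the free apps, we can use the condition that the price equals 0. The price is stored as a string: '0' in the Google Play data and '0.0' in the iOS data.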
android_final = []
ios_final = []
for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
print(len(android_final))
print(len(ios_final))
8864
3222
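The two string comparisons above depend on each store's exact formatting of a zero price ('0' and '0.0'). A slightly more general check, sketched below, parses the price as a number first. is_free is a hypothetical helper, and the assumption that paid prices may carry a leading '$' is mine rather than something shown in the output above.

```python
def is_free(price):
    """Return True when a price string represents zero."""
    # Assumption: prices are plain numbers, optionally prefixed with '$'
    return float(price.strip().lstrip('$')) == 0.0

print(is_free('0'), is_free('0.0'), is_free('$4.99'))  # prints True True False
```

A value that is not numeric would raise a ValueError here, which can be useful for surfacing malformed rows instead of silently dropping them.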
From our cleaned datasets we need to determine the most common genres in both the Google Play Store and the Apple App Store, and find an app profile that would attract users in both markets.
We can build a frequency table of app genres to see which genres are most common.
def freq_table(dataset, index):
    table = {}
    total = 0
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)  # value first so that sorted() orders by percentage
        table_display.append(key_val_as_tuple)
    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ' :', round(entry[0], 2))  # prints the key first, then the value
display_table(android_final,1)
FAMILY : 18.91
GAME : 9.72
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6
Among the free English apps in the Google Play Store, the Family category dominates at 18.91%, followed by the Game, Tools and Business categories.
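The frequency table used above can also be built with collections.Counter from the standard library. The following is a sketch of that alternative, not the notebook's implementation; freq_table_counter is a hypothetical name and the sample rows are made up.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return {value: percentage} for one column, most common value first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    # most_common() yields (value, count) pairs in descending order of count
    return {value: count / total * 100 for value, count in counts.most_common()}

# Made-up rows: column 1 plays the role of the category column
sample = [['a', 'GAME'], ['b', 'TOOLS'], ['c', 'GAME'], ['d', 'GAME']]
for category, pct in freq_table_counter(sample, 1).items():
    print(category, ':', round(pct, 2))
```

Because the dictionary is already ordered from most to least common, no separate sorting step with tuples is needed for display.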
display_table(ios_final,-5) # For finding frequency table for prime_genre section
Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12
Among the free English apps in the Apple App Store, Games have a 58.16% share, followed by Entertainment at 7.88% and Photo & Video at 4.97%.
Let's check the Genres column in the Google Play Store:
display_table(android_final,-4)
Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.91
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;Brain Games : 0.14
Casual;Action & Adventure : 0.14
Arcade;Action & Adventure : 0.12
Action;Action & Adventure : 0.1
Educational;Pretend Play : 0.09
Simulation;Action & Adventure : 0.08
Parenting;Education : 0.08
Entertainment;Brain Games : 0.08
Board;Brain Games : 0.08
Parenting;Music & Video : 0.07
Educational;Brain Games : 0.07
Casual;Creativity : 0.07
Art & Design;Creativity : 0.07
Education;Pretend Play : 0.06
Role Playing;Pretend Play : 0.05
Education;Creativity : 0.05
Role Playing;Action & Adventure : 0.03
Puzzle;Action & Adventure : 0.03
Entertainment;Creativity : 0.03
Entertainment;Action & Adventure : 0.03
Educational;Creativity : 0.03
Educational;Action & Adventure : 0.03
Education;Music & Video : 0.03
Education;Brain Games : 0.03
Education;Action & Adventure : 0.03
Adventure;Action & Adventure : 0.03
Video Players & Editors;Music & Video : 0.02
Sports;Action & Adventure : 0.02
Simulation;Pretend Play : 0.02
Puzzle;Creativity : 0.02
Music;Music & Video : 0.02
Entertainment;Pretend Play : 0.02
Casual;Education : 0.02
Board;Action & Adventure : 0.02
Video Players & Editors;Creativity : 0.01
Trivia;Education : 0.01
Travel & Local;Action & Adventure : 0.01
Tools;Education : 0.01
Strategy;Education : 0.01
Strategy;Creativity : 0.01
Strategy;Action & Adventure : 0.01
Simulation;Education : 0.01
Role Playing;Brain Games : 0.01
Racing;Pretend Play : 0.01
Puzzle;Education : 0.01
Parenting;Brain Games : 0.01
Music & Audio;Music & Video : 0.01
Lifestyle;Pretend Play : 0.01
Lifestyle;Education : 0.01
Health & Fitness;Education : 0.01
Health & Fitness;Action & Adventure : 0.01
Entertainment;Education : 0.01
Communication;Creativity : 0.01
Comics;Creativity : 0.01
Casual;Music & Video : 0.01
Card;Action & Adventure : 0.01
Books & Reference;Education : 0.01
Art & Design;Pretend Play : 0.01
Art & Design;Action & Adventure : 0.01
Arcade;Pretend Play : 0.01
Adventure;Education : 0.01
From the analysis so far, the Google Play Store seems to have a more balanced breakdown between productivity and entertainment apps, while the Apple App Store is dominated by Games, Entertainment and other fun-related apps. These tables only show how many apps exist in each genre, so we analyze further to find out which apps have the most users.
Let's compute the average number of ratings per app for each genre in the Apple App Store, and arrange the genres in descending order to find out which one has the highest average.
genres_ios = freq_table(ios_final, 11)
display = []
for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[11]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    genre_rating_tuple = (avg_n_ratings, genre)
    display.append(genre_rating_tuple)
display_sorted = sorted(display, reverse=True)
for entry in display_sorted:
    print(entry[1], ': ', round(entry[0], 2))
Navigation : 86090.33
Reference : 74942.11
Social Networking : 71548.35
Music : 57326.53
Weather : 52279.89
Book : 39758.5
Food & Drink : 33333.92
Finance : 31467.94
Photo & Video : 28441.54
Travel : 28243.8
Shopping : 26919.69
Health & Fitness : 23298.02
Sports : 23008.9
Games : 22788.67
News : 21248.02
Productivity : 21028.41
Utilities : 18684.46
Lifestyle : 16485.76
Entertainment : 14029.83
Business : 7491.12
Education : 7003.98
Catalogs : 4004.0
Medical : 612.0
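The nested loop above rescans ios_final once per genre. As a sketch of an alternative, the same averages can be accumulated in a single pass; average_by_group is a hypothetical helper and the sample rows are made up.

```python
from collections import defaultdict

def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the numeric column} computed in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_index]] += float(row[value_index])
        counts[row[group_index]] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Made-up rows: (name, genre, rating count)
sample = [
    ['A', 'Games', '100'],
    ['B', 'Games', '300'],
    ['C', 'Music', '50'],
]
averages = average_by_group(sample, 1, 2)
for genre, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', round(avg, 2))
```

With the notebook's data, the corresponding call would be average_by_group(ios_final, 11, 5).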
Navigation has the highest average number of user ratings, followed by Reference and Social Networking.
We can further analyze which apps in Navigation, Reference and Social Networking receive the most ratings.
for app in ios_final:
    genre = app[11]
    if genre == 'Navigation':
        print(app[1], ': ', app[5])
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
for app in ios_final:
    genre = app[11]
    if genre == 'Reference':
        print(app[1], ': ', app[5])
Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0
# Only the first 250 rows of ios_final are scanned here, which limits the output
for app in ios_final[:250]:
    genre = app[11]
    if genre == 'Social Networking':
        print(app[1], ': ', app[5])
Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
From this we can conclude that the genre averages are skewed by one or two apps that account for most of the user ratings. Waze and Google Maps top the numbers in Navigation, but we do not intend to build a free navigation app, because it would need access to a third-party API, which can be a paid service.
We could consider converting a famous book into an app with built-in dictionary and quiz functionality. This would engage users, who could receive rewards or points for watching in-app ads and use them to unlock further chapters of the book. Quizzes can also have hints that are unlocked by watching in-app ads.
Now let's analyze the Google Play Store data:
display_table(android_final, 5) # Installs in android
1,000,000+ : 15.73
100,000+ : 11.55
10,000,000+ : 10.55
10,000+ : 10.2
1,000+ : 8.39
100+ : 6.92
5,000,000+ : 6.83
500,000+ : 5.56
50,000+ : 4.77
5,000+ : 4.51
10+ : 3.54
500+ : 3.25
50,000,000+ : 2.3
100,000,000+ : 2.13
50+ : 1.92
5+ : 0.79
1+ : 0.51
500,000,000+ : 0.27
1,000,000,000+ : 0.23
0+ : 0.05
0 : 0.01
As we can see, the numbers are not precise: 100,000+ can mean anything from 100,000 upward. For our current analysis, however, we only need to find out which categories attract the most users.
We will therefore treat an app with 100,000+ installs as having exactly 100,000 installs. For the computation we need to remove the commas and plus signs and convert the strings to floats.
categories_android = freq_table(android_final, 1)
display = []
for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    avg_category_tuple = (avg_n_installs, category)
    display.append(avg_category_tuple)
display_sorted = sorted(display, reverse=True)
for entry in display_sorted:
    print(entry[1], ':', round(entry[0], 2))
COMMUNICATION : 38456119.17
VIDEO_PLAYERS : 24727872.45
SOCIAL : 23253652.13
PHOTOGRAPHY : 17840110.4
PRODUCTIVITY : 16787331.34
GAME : 15588015.6
TRAVEL_AND_LOCAL : 13984077.71
ENTERTAINMENT : 11640705.88
TOOLS : 10801391.3
NEWS_AND_MAGAZINES : 9549178.47
BOOKS_AND_REFERENCE : 8767811.89
SHOPPING : 7036877.31
PERSONALIZATION : 5201482.61
WEATHER : 5074486.2
HEALTH_AND_FITNESS : 4188821.99
MAPS_AND_NAVIGATION : 4056941.77
FAMILY : 3695641.82
SPORTS : 3638640.14
ART_AND_DESIGN : 1986335.09
FOOD_AND_DRINK : 1924897.74
EDUCATION : 1833495.15
BUSINESS : 1712290.15
LIFESTYLE : 1437816.27
FINANCE : 1387692.48
HOUSE_AND_HOME : 1331540.56
DATING : 854028.83
COMICS : 817657.27
AUTO_AND_VEHICLES : 647317.82
LIBRARIES_AND_DEMO : 638503.73
PARENTING : 542603.62
BEAUTY : 513151.89
EVENTS : 253542.22
MEDICAL : 120550.62
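The string cleanup used in the loop above can be isolated into a small function, which makes the conversion easier to check on its own. parse_installs is a hypothetical helper; it treats a value such as '100,000+' as exactly 100,000, the same simplification described earlier.

```python
def parse_installs(installs):
    """Convert an install bucket such as '1,000,000+' to a float lower bound."""
    return float(installs.replace(',', '').replace('+', ''))

for raw in ['1,000,000+', '500+', '0']:
    print(raw, '->', parse_installs(raw))
```

Inside the loop, total += parse_installs(app[5]) would then replace the three lines that clean and convert n_installs.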
# List the Communication apps in the two largest install buckets
for app in android_final:
    if app[1] == 'COMMUNICATION' and app[5] in ('1,000,000,000+', '500,000,000+'):
        print(app[0], ':', app[5])
WhatsApp Messenger : 1,000,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Viber Messenger : 500,000,000+
under_100m = []
for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == "COMMUNICATION") and (float(n_installs) < 100000000):
        under_100m.append(float(n_installs))
new_avg_communication = sum(under_100m) / len(under_100m)
print("Average installs for Communication Apps under 100 million is {:,}".format(round(new_avg_communication, 2)))
Average installs for Communication Apps under 100 million is 3,603,485.39
This shows that the data is heavily skewed by apps with 100 million or more installs: if we consider only the apps below 100 million, the average number of installs (about 3.6 million) is roughly ten times lower than the overall Communication average (about 38.5 million).
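One way to see this kind of skew without choosing a cutoff is to compare the mean with the median, which is much less sensitive to a few very large values. The sketch below uses made-up install counts, not the notebook's data.

```python
from statistics import mean, median

# Made-up install counts: one giant app among several small ones
installs = [1_000, 5_000, 10_000, 50_000, 1_000_000_000]

print('mean  :', mean(installs))
print('median:', median(installs))
```

In this toy list the single billion-install app pulls the mean far above the median, which stays at the middle value of the list. Applied to the notebook, the list would hold the parsed install counts of the COMMUNICATION apps.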
The apps at the top of the Communication category are all well-established giants, so competing in this category would make little sense. We might also need extra technical expertise for navigation-related apps, which may rely on paid third-party API access. The same logic applies to other categories such as Video Players, Social, and Photography. Games face huge competition, and many game apps are already freely available, so this category seems saturated. We can ask our developer team to focus on converting any available open-source book into an app that offers interactive features such as quizzes and a dictionary; a user who gets stuck during a quiz can watch our in-app video ads to earn points that unlock hints or solve the quiz. The idea is to excite users and keep them hooked on the storyline. Productivity is another category we could consider as an alternative, but the main hurdle would be engagement time: users open a productivity app only when they need it, so we might not have the opportunity to show in-app ads in such a short span of time.