The main source of revenue for our company is in-app advertising, and in-app ads are primarily shown in free apps. Revenue therefore depends on how many users see and engage with those ads: the higher the engagement, the higher the revenue. The goal of this project is to analyze app-store data to help our developers understand which types of free apps are likely to attract more users.
from csv import reader

# App names include non-ASCII characters (Chinese titles, emojis), so read the files as UTF-8
with open('AppleStore.csv', encoding='utf8') as opened_file_1:
    ios_dataset = list(reader(opened_file_1))
with open('googleplaystore.csv', encoding='utf8') as opened_file_2:
    google_dataset = list(reader(opened_file_2))

# Separate the header row from the data rows
ios_header = ios_dataset[0]
ios_body = ios_dataset[1:]
google_header = google_dataset[0]
google_body = google_dataset[1:]
To make it easier to explore the datasets, we will write an explore_data function that prints a slice of rows and, optionally, the number of rows and columns:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')  # adds an empty line after each row

    if rows_and_columns:
        print('Number of row:', len(dataset))
        print('Number of columns:', len(dataset[0]))
Let's explore the iOS dataset:
print(ios_header)
print('\n')
explore_data(ios_body,0,3, rows_and_columns = True)
print('\n')
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
Number of row: 7197
Number of columns: 16
The iOS dataset has 7197 rows of data and 16 columns. The columns most useful for our analysis are 'track_name', 'price', 'currency', 'rating_count_tot', 'user_rating' and 'prime_genre'.
Let's explore the Google Play Store dataset:
print(google_header)
print('\n')
explore_data(google_body,0,3, rows_and_columns = True)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
Number of row: 10841
Number of columns: 13
The Google Play dataset has 10841 rows of data and 13 columns. The following columns can help with our analysis: 'App', 'Category', 'Rating', 'Installs', 'Price' and 'Genres'.
Before analyzing the data, we need to clean it. This involves removing or correcting wrong data, removing duplicate entries, and modifying the data to fit the purpose of our analysis.
According to the dataset's discussion section, row 10472 of the Google Play Store data is incorrect. Let's pull it up to check:
print(google_body[10472])
print('\n')
print('No. of columns are: ',len(google_body[10472]))
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
No. of columns are: 12
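This row is missing its 'Category' value: it has 12 columns instead of 13, so every value after the app name is shifted one column to the left (the rating '1.9' sits where the category should be). We will remove the row to keep the rest of the data consistent. Because del removes whatever row is at index 10472 when it runs, the statement below must be run only once.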
del google_body[10472] # Don't run this more than once
print(len(google_body))
10840
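Deleting by index is not idempotent: running the cell twice silently removes a valid row. A safer alternative (a sketch, not the approach used in this project) is to filter on the column count, assuming that a row whose length differs from the header is malformed. The helper name drop_malformed_rows and the toy rows are hypothetical; on the real data the call would be google_body = drop_malformed_rows(google_header, google_body).

```python
def drop_malformed_rows(header, rows):
    """Keep only rows with as many values as the header has columns.

    Unlike deleting by index, running this twice is harmless: the second
    call finds nothing left to remove.
    """
    expected = len(header)
    return [row for row in rows if len(row) == expected]


# Hypothetical toy data: the second row is missing its category, like row 10472
header = ['App', 'Category', 'Rating']
rows = [
    ['Instagram', 'SOCIAL', '4.5'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],
]
kept_rows = drop_malformed_rows(header, rows)
print(len(kept_rows))  # → 1
```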
The discussion section also reports duplicate entries in the Google Play data; Instagram, for example, appears more than once. We can check this as follows:
for app in google_body:
    name = app[0]
    if name == 'Instagram':
        print(app)
        print('\n')
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
duplicate_apps = []
unique_apps = []

for app in google_body:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])
Number of duplicate apps: 1181
Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
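The loop above tests name in unique_apps against a list, which scans the whole list for every row. A minimal sketch of the same logic using a set for the membership test (a standard Python technique, not the project's original code) is shown below; find_duplicates and the toy names are hypothetical.

```python
def find_duplicates(names):
    """Split names into (duplicates, unique), like the loop above.

    A set gives constant-time average membership checks instead of
    scanning a growing list on every iteration.
    """
    seen = set()
    unique = []
    duplicates = []
    for name in names:
        if name in seen:
            duplicates.append(name)
        else:
            seen.add(name)
            unique.append(name)
    return duplicates, unique


names = ['Instagram', 'Box', 'Instagram', 'Slack', 'Box', 'Instagram']
duplicates, unique = find_duplicates(names)
print(duplicates)  # → ['Instagram', 'Box', 'Instagram']
print(unique)      # → ['Instagram', 'Box', 'Slack']
```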
Rather than removing duplicates at random, we will use a specific criterion: for each app, keep the entry with the highest number of reviews and remove the others. Since review counts grow over time, the entry with the most reviews is likely the most recently collected one.
Let's create a dictionary that stores the maximum number of reviews for each app:
reviews_max = {}

for app in google_body:
    name = app[0]
    n_reviews = float(app[3])

    # Update the stored maximum if this entry has more reviews
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews

    # First time we see this app: store its review count
    if name not in reviews_max:
        reviews_max[name] = n_reviews

len(reviews_max)
9659
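Next, we create two lists: android_clean, which will store the deduplicated rows, and already_added, which stores the names of apps already kept. We keep a row only if its review count equals the maximum stored in reviews_max for that app and the app is not already in already_added. The second check is needed because an app can have several entries sharing the same maximum review count; without it, all of those entries would be kept.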
android_clean = []
already_added = []

for app in google_body:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)  # remember the app so tied entries are not added twice

len(android_clean)
9659
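The two passes above (first reviews_max, then android_clean) can also be combined into a single pass that stores the best row per app name. This is a sketch assuming column 3 holds the review count as a numeric string; dedupe_by_max_reviews is a hypothetical helper, the toy rows are trimmed to four columns, and the Box row is made up. Ties keep the first row seen, as the already_added check does, but the result is ordered by each app's first appearance, which may differ slightly from the order of android_clean.

```python
def dedupe_by_max_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the one with the most reviews."""
    best = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Replace the stored row only on a strict improvement, so ties keep the first row
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    # Dicts preserve insertion order, so apps appear in first-seen order
    return list(best.values())


rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159872'],  # made-up toy row
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]
deduped = dedupe_by_max_reviews(rows)
print(len(deduped))    # → 2
print(deduped[0][3])   # → 66577446
```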
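We only want to analyze apps designed for an English-speaking audience, so we will remove non-English apps. A simple test is to look at the characters in each app name: standard English text uses characters in the ASCII range (code points 0 to 127), and Python's built-in ord() function returns a character's code point.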
def is_english(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('IDocs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
True
False
False
False
This shows that some English apps would be incorrectly labeled as non-English because their names contain emojis or symbols such as ™, which fall outside the ASCII range. Filtering with this version of is_english would therefore discard useful data.
To reduce this loss, we will label an app as non-English only if its name contains more than three non-ASCII characters (code points above 127):
def is_english(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1

    # Tolerate up to three non-ASCII characters (emojis, ™, etc.)
    if non_ascii > 3:
        return False
    else:
        return True
print(is_english('Instachat 😜'))
True
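The same rule can be written more compactly with sum() over a generator. The sketch below is an equivalent reformulation, not a different method; the name is_english_compact and the max_non_ascii parameter (defaulting to the threshold of 3 used above) are hypothetical. The last example shows the heuristic's limitation: a short non-English name with three or fewer non-ASCII characters still passes, which is why names such as 豆瓣 and 知乎 remain in the Social Networking listing later in this analysis.

```python
def is_english_compact(string, max_non_ascii=3):
    """Return True unless `string` has more than `max_non_ascii`
    characters outside the ASCII range (code point above 127)."""
    non_ascii = sum(1 for character in string if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_english_compact('Instachat 😜'))                   # → True  (1 non-ASCII character)
print(is_english_compact('爱奇艺PPS -《欢乐颂2》电视剧热播'))  # → False (13 non-ASCII characters)
print(is_english_compact('知乎'))                            # → True  (only 2, so the heuristic misses it)
```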
Now we can use is_english to filter out non-English apps from both datasets:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)

for app in ios_body:
    name = app[1]
    if is_english(name):
        ios_english.append(app)

explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']
Number of row: 9614
Number of columns: 13

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
Number of row: 6183
Number of columns: 16
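To isolate the free apps, we keep only the rows whose price is zero. The two datasets store the price differently: as '0' in the Google Play data (column index 7) and as '0.0' in the App Store data (column index 4).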
android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)

for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)

print(len(android_final))
print(len(ios_final))
8864
3222
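The string comparisons above depend on each store's exact formatting ('0' versus '0.0'). A sketch that compares the price numerically instead is shown below, assuming prices are numeric strings that may carry a leading dollar sign (an assumption about the Google Play format for paid apps, not something checked above). The helper name is_free is hypothetical.

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$0.00' equals zero.

    Assumes paid prices are plain numbers optionally prefixed with '$'.
    """
    return float(price.replace('$', '')) == 0.0


print(is_free('0'), is_free('0.0'), is_free('$4.99'))  # → True True False
```

With this helper, both filters could use the same test, for example is_free(app[7]) for Google Play and is_free(app[4]) for the App Store.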
With the data cleaned, we want to find the most common genres in the Google Play Store and the App Store, so that we can identify an app profile likely to attract users in both markets.
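We can build a frequency table of app genres to see which are most common. freq_table returns the percentage of apps for each value in a given column, and display_table prints those percentages sorted from highest to lowest.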
def freq_table(dataset, index):
    table = {}
    total = 0

    # Count how many rows have each value in the given column
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1

    # Convert the counts to percentages of the total
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage

    return table_percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        # Put the percentage first so that sorting orders by frequency
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ' :', round(entry[0], 2))
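The two functions above can also be expressed with collections.Counter from the standard library, which does the counting step in one call. This is a sketch of an equivalent approach, not the project's code; freq_table_counter, display_table_counter and the toy rows are hypothetical. Apps with equal percentages may print in a different order than display_table, which breaks ties by name.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Percentage of rows for each distinct value in column `index`."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}


def display_table_counter(dataset, index):
    """Print the frequency table sorted from most to least common."""
    table = freq_table_counter(dataset, index)
    for value, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
        print(value, ':', round(pct, 2))


# Hypothetical toy rows: [app name, category]
rows = [['a', 'GAME'], ['b', 'GAME'], ['c', 'TOOLS'], ['d', 'FAMILY']]
table = freq_table_counter(rows, 1)
print(table['GAME'])  # → 50.0
display_table_counter(rows, 1)
```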
display_table(android_final,1)
FAMILY : 18.91 GAME : 9.72 TOOLS : 8.46 BUSINESS : 4.59 LIFESTYLE : 3.9 PRODUCTIVITY : 3.89 FINANCE : 3.7 MEDICAL : 3.53 SPORTS : 3.4 PERSONALIZATION : 3.32 COMMUNICATION : 3.24 HEALTH_AND_FITNESS : 3.08 PHOTOGRAPHY : 2.94 NEWS_AND_MAGAZINES : 2.8 SOCIAL : 2.66 TRAVEL_AND_LOCAL : 2.34 SHOPPING : 2.25 BOOKS_AND_REFERENCE : 2.14 DATING : 1.86 VIDEO_PLAYERS : 1.79 MAPS_AND_NAVIGATION : 1.4 FOOD_AND_DRINK : 1.24 EDUCATION : 1.16 ENTERTAINMENT : 0.96 LIBRARIES_AND_DEMO : 0.94 AUTO_AND_VEHICLES : 0.93 HOUSE_AND_HOME : 0.82 WEATHER : 0.8 EVENTS : 0.71 PARENTING : 0.65 ART_AND_DESIGN : 0.64 COMICS : 0.62 BEAUTY : 0.6
Among the free English apps on Google Play, the FAMILY category dominates with 18.91%, followed by GAME (9.72%), TOOLS (8.46%) and BUSINESS (4.59%).
display_table(ios_final, -5)  # -5 selects prime_genre (index 11 of 16 columns)
Games : 58.16 Entertainment : 7.88 Photo & Video : 4.97 Education : 3.66 Social Networking : 3.29 Shopping : 2.61 Utilities : 2.51 Sports : 2.14 Music : 2.05 Health & Fitness : 2.02 Productivity : 1.74 Lifestyle : 1.58 News : 1.33 Travel : 1.24 Finance : 1.12 Weather : 0.87 Food & Drink : 0.81 Reference : 0.56 Business : 0.53 Book : 0.43 Navigation : 0.19 Medical : 0.19 Catalogs : 0.12
On the App Store, games account for 58.16% of the free English apps, followed by Entertainment (7.88%) and Photo & Video (4.97%).
Let's check the Genres column of the Google Play data, which is more granular than Category:
display_table(android_final, -4)  # -4 selects Genres (index 9 of 13 columns)
Tools : 8.45 Entertainment : 6.07 Education : 5.35 Business : 4.59 Productivity : 3.89 Lifestyle : 3.89 Finance : 3.7 Medical : 3.53 Sports : 3.46 Personalization : 3.32 Communication : 3.24 Action : 3.1 Health & Fitness : 3.08 Photography : 2.94 News & Magazines : 2.8 Social : 2.66 Travel & Local : 2.32 Shopping : 2.25 Books & Reference : 2.14 Simulation : 2.04 Dating : 1.86 Arcade : 1.85 Video Players & Editors : 1.77 Casual : 1.76 Maps & Navigation : 1.4 Food & Drink : 1.24 Puzzle : 1.13 Racing : 0.99 Role Playing : 0.94 Libraries & Demo : 0.94 Auto & Vehicles : 0.93 Strategy : 0.91 House & Home : 0.82 Weather : 0.8 Events : 0.71 Adventure : 0.68 Comics : 0.61 Beauty : 0.6 Art & Design : 0.6 Parenting : 0.5 Card : 0.45 Casino : 0.43 Trivia : 0.42 Educational;Education : 0.39 Board : 0.38 Educational : 0.37 Education;Education : 0.34 Word : 0.26 Casual;Pretend Play : 0.24 Music : 0.2 Racing;Action & Adventure : 0.17 Puzzle;Brain Games : 0.17 Entertainment;Music & Video : 0.17 Casual;Brain Games : 0.14 Casual;Action & Adventure : 0.14 Arcade;Action & Adventure : 0.12 Action;Action & Adventure : 0.1 Educational;Pretend Play : 0.09 Simulation;Action & Adventure : 0.08 Parenting;Education : 0.08 Entertainment;Brain Games : 0.08 Board;Brain Games : 0.08 Parenting;Music & Video : 0.07 Educational;Brain Games : 0.07 Casual;Creativity : 0.07 Art & Design;Creativity : 0.07 Education;Pretend Play : 0.06 Role Playing;Pretend Play : 0.05 Education;Creativity : 0.05 Role Playing;Action & Adventure : 0.03 Puzzle;Action & Adventure : 0.03 Entertainment;Creativity : 0.03 Entertainment;Action & Adventure : 0.03 Educational;Creativity : 0.03 Educational;Action & Adventure : 0.03 Education;Music & Video : 0.03 Education;Brain Games : 0.03 Education;Action & Adventure : 0.03 Adventure;Action & Adventure : 0.03 Video Players & Editors;Music & Video : 0.02 Sports;Action & Adventure : 0.02 Simulation;Pretend Play : 0.02 Puzzle;Creativity : 0.02 Music;Music & Video : 0.02 
Entertainment;Pretend Play : 0.02 Casual;Education : 0.02 Board;Action & Adventure : 0.02 Video Players & Editors;Creativity : 0.01 Trivia;Education : 0.01 Travel & Local;Action & Adventure : 0.01 Tools;Education : 0.01 Strategy;Education : 0.01 Strategy;Creativity : 0.01 Strategy;Action & Adventure : 0.01 Simulation;Education : 0.01 Role Playing;Brain Games : 0.01 Racing;Pretend Play : 0.01 Puzzle;Education : 0.01 Parenting;Brain Games : 0.01 Music & Audio;Music & Video : 0.01 Lifestyle;Pretend Play : 0.01 Lifestyle;Education : 0.01 Health & Fitness;Education : 0.01 Health & Fitness;Action & Adventure : 0.01 Entertainment;Education : 0.01 Communication;Creativity : 0.01 Comics;Creativity : 0.01 Casual;Music & Video : 0.01 Card;Action & Adventure : 0.01 Books & Reference;Education : 0.01 Art & Design;Pretend Play : 0.01 Art & Design;Action & Adventure : 0.01 Arcade;Pretend Play : 0.01 Adventure;Education : 0.01
So far, the Google Play Store appears to have a more balanced mix of practical apps (tools, business, productivity) and entertainment apps, while the App Store is dominated by games, entertainment and other fun-oriented apps. However, the number of apps in a genre does not tell us how many users those apps have, so we analyze user numbers next.
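The App Store data has no install counts, so we use the total number of user ratings (rating_count_tot) as a proxy for the number of users. Let's compute the average number of ratings per app for each genre and sort the genres in descending order to find which genre has the highest average.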
genres_ios = freq_table(ios_final, 11)  # used only for its keys: the unique genres
display = []

for genre in genres_ios:
    total = 0
    len_genre = 0
    # Sum the rating counts of all apps in this genre
    for app in ios_final:
        genre_app = app[11]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    genre_rating_tuple = (avg_n_ratings, genre)
    display.append(genre_rating_tuple)

display_sorted = sorted(display, reverse=True)
for entry in display_sorted:
    print(entry[1], ': ', entry[0])
Navigation : 86090.33333333333 Reference : 74942.11111111111 Social Networking : 71548.34905660378 Music : 57326.530303030304 Weather : 52279.892857142855 Book : 39758.5 Food & Drink : 33333.92307692308 Finance : 31467.944444444445 Photo & Video : 28441.54375 Travel : 28243.8 Shopping : 26919.690476190477 Health & Fitness : 23298.015384615384 Sports : 23008.898550724636 Games : 22788.6696905016 News : 21248.023255813954 Productivity : 21028.410714285714 Utilities : 18684.456790123455 Lifestyle : 16485.764705882353 Entertainment : 14029.830708661417 Business : 7491.117647058823 Education : 7003.983050847458 Catalogs : 4004.0 Medical : 612.0
On the App Store, Navigation has the highest average number of user ratings, followed by Reference and Social Networking.
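The averages above are computed with a nested loop that rescans ios_final once per genre. A single pass that accumulates totals and counts in dictionaries gives the same means; the sketch below uses collections.defaultdict and a hypothetical helper average_by_group. On the real data the call would presumably be average_by_group(ios_final, 11, 5) (prime_genre and rating_count_tot); the toy rows here use a three-column [name, genre, ratings] layout with values taken from the listings in this analysis.

```python
from collections import defaultdict


def average_by_group(dataset, group_idx, value_idx):
    """Mean of column `value_idx` for each value of column `group_idx`, in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


rows = [
    ['Waze - GPS Navigation, Maps & Real-time Traffic', 'Navigation', '345046'],
    ['Railway Route Search', 'Navigation', '5'],
    ['Bible', 'Reference', '985920'],
]
averages = average_by_group(rows, 1, 2)
for genre, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', avg)
```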
Let's check which individual apps in the Navigation, Reference and Social Networking genres receive the most ratings.
for app in ios_final:
    genre = app[11]
    if genre == 'Navigation':
        print(app[1], ': ', app[5])  # app name : total number of ratings
Waze - GPS Navigation, Maps & Real-time Traffic : 345046 Google Maps - Navigation & Transit : 154911 Geocaching® : 12811 CoPilot GPS – Car Navigation & Offline Maps : 3582 ImmobilienScout24: Real Estate Search in Germany : 187 Railway Route Search : 5
for app in ios_final:
    genre = app[11]
    if genre == 'Reference':
        print(app[1], ': ', app[5])
Bible : 985920 Dictionary.com Dictionary & Thesaurus : 200047 Dictionary.com Dictionary & Thesaurus for iPad : 54175 Google Translate : 26786 Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418 New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588 Merriam-Webster Dictionary : 16849 Night Sky : 12122 City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535 LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693 GUNS MODS for Minecraft PC Edition - Mods Tools : 1497 Guides for Pokémon GO - Pokemon GO News and Cheats : 826 WWDC : 762 Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718 VPN Express : 14 Real Bike Traffic Rider Virtual Reality Glasses : 8 教えて!goo : 0 Jishokun-Japanese English Dictionary & Translator : 0
for app in ios_final:
    genre = app[11]
    if genre == 'Social Networking':
        print(app[1], ': ', app[5])
Facebook : 2974676 Pinterest : 1061624 Skype for iPhone : 373519 Messenger : 351466 Tumblr : 334293 WhatsApp Messenger : 287589 Kik : 260965 ooVoo – Free Video Call, Text and Voice : 177501 TextNow - Unlimited Text + Calls : 164963 Viber Messenger – Text & Call : 164249 Followers - Social Analytics For Instagram : 112778 MeetMe - Chat and Meet New People : 97072 We Heart It - Fashion, wallpapers, quotes, tattoos : 90414 InsTrack for Instagram - Analytics Plus More : 85535 Tango - Free Video Call, Voice and Chat : 75412 LinkedIn : 71856 Match™ - #1 Dating App. : 60659 Skype for iPad : 60163 POF - Best Dating App for Conversations : 52642 Timehop : 49510 Find My Family, Friends & iPhone - Life360 Locator : 43877 Whisper - Share, Express, Meet : 39819 Hangouts : 36404 LINE PLAY - Your Avatar World : 34677 WeChat : 34584 Badoo - Meet New People, Chat, Socialize. : 34428 Followers + for Instagram - Follower Analytics : 28633 GroupMe : 28260 Marco Polo Video Walkie Talkie : 27662 Miitomo : 23965 SimSimi : 23530 Grindr - Gay and same sex guys chat, meet and date : 23201 Wishbone - Compare Anything : 20649 imo video calls and chat : 18841 After School - Funny Anonymous School News : 18482 Quick Reposter - Repost, Regram and Reshare Photos : 17694 Weibo HD : 16772 Repost for Instagram : 15185 Live.me – Live Video Chat & Make Friends Nearby : 14724 Nextdoor : 14402 Followers Analytics for Instagram - InstaReport : 13914 YouNow: Live Stream Video Chat : 12079 FollowMeter for Instagram - Followers Tracking : 11976 LINE : 11437 eHarmony™ Dating App - Meet Singles : 11124 Discord - Chat for Gamers : 9152 QQ : 9109 Telegram Messenger : 7573 Weibo : 7265 Periscope - Live Video Streaming Around the World : 6062 Chat for Whatsapp - iPad Version : 5060 QQ HD : 5058 Followers Analysis Tool For Instagram App Free : 4253 live.ly - live video streaming : 4145 Houseparty - Group Video Chat : 3991 SOMA Messenger : 3232 Monkey : 3060 Down To Lunch : 2535 Flinch - Video Chat Staring Contest : 2134 Highrise - Your Avatar Community : 2011 LOVOO - Dating Chat : 1985 PlayStation®Messages : 1918 BOO! - Video chat camera with filters & stickers : 1805 Qzone : 1649 Chatous - Chat with new people : 1609 Kiwi - Q&A : 1538 GhostCodes - a discovery app for Snapchat : 1313 Jodel : 1193 FireChat : 1037 Google Duo - simple video calling : 1033 Fiesta by Tango - Chat & Meet New People : 885 Google Allo — smart messaging : 862 Peach — share vividly : 727 Hey! VINA - Where Women Meet New Friends : 719 Battlefield™ Companion : 689 All Devices for WhatsApp - Messenger for iPad : 682 Chat for Pokemon Go - GoChat : 500 IAmNaughty – Dating App to Meet New People Online : 463 Qzone HD : 458 Zenly - Locate your friends in realtime : 427 League of Legends Friends : 420 豆瓣 : 407 Candid - Speak Your Mind Freely : 398 知乎 : 397 Selfeo : 366 Fake-A-Location Free ™ : 354 Popcorn Buzz - Free Group Calls : 281 Fam — Group video calling for iMessage : 279 QQ International : 274 Ameba : 269 SoundCloud Pulse: for creators : 240 Tantan : 235 Cougar Dating & Life Style App for Mature Women : 213 Rawr Messenger - Dab your chat : 180 WhenToPost: Best Time to Post Photos for Instagram : 158 Inke—Broadcast an amazing life : 147 Mustknow - anonymous video Q&A : 53 CTFxCmoji : 39 Lobi : 36 Chain: Collaborate On MyVideo Story/Group Video : 35 botman - Real time video chat : 7 BestieBox : 0 MATCH ON LINE chat : 0 niconico ch : 0 LINE BLOG : 0 bit-tube - Live Stream Video Chat : 0
These listings show that the genre averages are skewed by one or two apps that account for most of the user ratings: Waze and Google Maps dominate Navigation, the Bible and Dictionary.com dominate Reference, and Facebook and Pinterest dominate Social Networking. We also do not intend to build a free navigation app, since it would likely need access to third-party map APIs, which can be a paid service.
The Reference genre suggests another option: we could turn a popular book into an app with built-in dictionary and quiz features. This would keep users engaged: they could earn rewards or points by watching in-app ads to unlock further chapters of the book, and quizzes could offer hints that are also unlocked by watching in-app ads.
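Because the genre averages are skewed by a few very popular apps, the median is a useful cross-check alongside the mean (this is a suggestion, not part of the original analysis). The sketch below uses the standard-library statistics module and a hypothetical helper summarize_by_group. Fed with the six Navigation rating counts listed above, the mean reproduces the Navigation average from the genre table, while the median is far lower, which illustrates how strongly Waze and Google Maps pull the mean up.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_group(dataset, group_idx, value_idx):
    """Return {group: (mean, median)} of column `value_idx` grouped by `group_idx`."""
    groups = defaultdict(list)
    for row in dataset:
        groups[row[group_idx]].append(float(row[value_idx]))
    return {group: (mean(values), median(values)) for group, values in groups.items()}


# The six Navigation apps and rating counts listed above, as [name, genre, ratings]
navigation = [
    ['Waze - GPS Navigation, Maps & Real-time Traffic', 'Navigation', '345046'],
    ['Google Maps - Navigation & Transit', 'Navigation', '154911'],
    ['Geocaching®', 'Navigation', '12811'],
    ['CoPilot GPS – Car Navigation & Offline Maps', 'Navigation', '3582'],
    ['ImmobilienScout24: Real Estate Search in Germany', 'Navigation', '187'],
    ['Railway Route Search', 'Navigation', '5'],
]
nav_mean, nav_median = summarize_by_group(navigation, 1, 2)['Navigation']
print(round(nav_mean, 2), nav_median)  # → 86090.33 8196.5
```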
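Now let's analyze the Google Play Store data. Unlike the App Store data, it includes the number of installs, which we can use directly as a measure of users: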
display_table(android_final, 5) # Installs in android
1,000,000+ : 15.73 100,000+ : 11.55 10,000,000+ : 10.55 10,000+ : 10.2 1,000+ : 8.39 100+ : 6.92 5,000,000+ : 6.83 500,000+ : 5.56 50,000+ : 4.77 5,000+ : 4.51 10+ : 3.54 500+ : 3.25 50,000,000+ : 2.3 100,000,000+ : 2.13 50+ : 1.92 5+ : 0.79 1+ : 0.51 500,000,000+ : 0.27 1,000,000,000+ : 0.23 0+ : 0.05 0 : 0.01
These numbers are not precise: 100,000+ can mean anything from 100,000 installs upward. For our purposes, however, we only need to find which categories attract the most users, so we will treat each bucket as its lower bound (for example, an app with 100,000+ installs is counted as having 100,000 installs). To compute averages, we first remove the commas and the plus sign and convert the strings to floats.
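The conversion just described can be isolated in a small helper, which makes it easy to test on its own. This is a sketch; the name parse_installs is hypothetical, and it assumes every value looks like the buckets in the table above (digits, optional commas, optional trailing plus sign).

```python
def parse_installs(installs):
    """Convert an install bucket like '100,000+' to its lower bound as a float."""
    return float(installs.replace(',', '').replace('+', ''))


print(parse_installs('1,000,000,000+'))  # → 1000000000.0
print(parse_installs('100,000+'))        # → 100000.0
print(parse_installs('0'))               # → 0.0
```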
categories_android = freq_table(android_final, 1)  # used only for its keys: the unique categories
display = []

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            # Turn an install bucket like '100,000+' into the number 100000.0
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    avg_category_tuple = (avg_n_installs, category)
    display.append(avg_category_tuple)

display_sorted = sorted(display, reverse=True)
for entry in display_sorted:
    print(entry[1], ':', entry[0])
COMMUNICATION : 38456119.167247385 VIDEO_PLAYERS : 24727872.452830188 SOCIAL : 23253652.127118643 PHOTOGRAPHY : 17840110.40229885 PRODUCTIVITY : 16787331.344927534 GAME : 15588015.603248259 TRAVEL_AND_LOCAL : 13984077.710144928 ENTERTAINMENT : 11640705.88235294 TOOLS : 10801391.298666667 NEWS_AND_MAGAZINES : 9549178.467741935 BOOKS_AND_REFERENCE : 8767811.894736841 SHOPPING : 7036877.311557789 PERSONALIZATION : 5201482.6122448975 WEATHER : 5074486.197183099 HEALTH_AND_FITNESS : 4188821.9853479853 MAPS_AND_NAVIGATION : 4056941.7741935486 FAMILY : 3695641.8198090694 SPORTS : 3638640.1428571427 ART_AND_DESIGN : 1986335.0877192982 FOOD_AND_DRINK : 1924897.7363636363 EDUCATION : 1833495.145631068 BUSINESS : 1712290.1474201474 LIFESTYLE : 1437816.2687861272 FINANCE : 1387692.475609756 HOUSE_AND_HOME : 1331540.5616438356 DATING : 854028.8303030303 COMICS : 817657.2727272727 AUTO_AND_VEHICLES : 647317.8170731707 LIBRARIES_AND_DEMO : 638503.734939759 PARENTING : 542603.6206896552 BEAUTY : 513151.88679245283 EVENTS : 253542.22222222222 MEDICAL : 120550.61980830671
for app in android_final:
    # List the most-installed COMMUNICATION apps
    if app[1] == 'COMMUNICATION' and app[5] in ('1,000,000,000+', '500,000,000+'):
        print(app[0], ':', app[5])
WhatsApp Messenger : 1,000,000,000+ Google Duo - High Quality Video Calls : 500,000,000+ Messenger – Text and Video Chat for Free : 1,000,000,000+ imo free video calls and chat : 500,000,000+ Skype - free IM & video calls : 1,000,000,000+ LINE: Free Calls & Messages : 500,000,000+ Google Chrome: Fast & Secure : 1,000,000,000+ UC Browser - Fast Download Private & Secure : 500,000,000+ Gmail : 1,000,000,000+ Hangouts : 1,000,000,000+ Viber Messenger : 500,000,000+
These apps are all well-established giants, and competing with them in this category would make little sense. Books and reference apps, on the other hand, perform well in both stores: Reference ranks second by average ratings on the App Store, and BOOKS_AND_REFERENCE averages about 8.8 million installs per app on Google Play. We can therefore ask our developer team to focus on converting an open-source book into an app with interactive features such as quizzes and a dictionary. If users get stuck on a quiz, they can watch our in-app video ads to earn points that unlock hints or solve the quiz. The idea is to excite users and keep them hooked on the storyline.
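To check how much the giants listed above inflate the COMMUNICATION average, one option (not part of the analysis above) is to recompute the average while ignoring apps at or above an install threshold. The sketch below uses a hypothetical helper average_installs and an arbitrary cutoff of 100,000,000 installs; the toy rows are made up and fill only the two columns the helper reads (index 1, category, and index 5, installs). On the real data it would be called as average_installs(android_final, 'COMMUNICATION', below=100_000_000).

```python
def average_installs(rows, category, below=None):
    """Mean installs (bucket lower bounds) for `category`.

    If `below` is given, apps with that many installs or more are ignored.
    """
    values = []
    for row in rows:
        if row[1] != category:
            continue
        n_installs = float(row[5].replace(',', '').replace('+', ''))
        if below is None or n_installs < below:
            values.append(n_installs)
    return sum(values) / len(values) if values else 0.0


# Hypothetical toy rows; only index 1 (category) and index 5 (installs) are used
rows = [
    ['App A', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['App B', 'COMMUNICATION', '', '', '', '1,000,000+'],
    ['App C', 'COMMUNICATION', '', '', '', '100,000+'],
    ['App D', 'TOOLS', '', '', '', '10,000+'],
]
print(average_installs(rows, 'COMMUNICATION'))                     # → 333700000.0
print(average_installs(rows, 'COMMUNICATION', below=100_000_000))  # → 550000.0
```

A large gap between the two numbers, as in this toy example, signals that a category's average is driven by a handful of dominant apps rather than by typical apps in that category.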