For this project, we'll pretend we're working as data analysts for a company that builds Android and iOS mobile apps. We make our apps available on Google Play and the App Store.
We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means our revenue for any given app is mostly influenced by the number of users who use our app — the more users that see and engage with the ads, the better.
Our goal for this project is to analyze data from the Google Play and App Store datasets (both in CSV format) to understand what types of apps are likely to attract more users.
Open both datasets (Google Play Store and Apple App Store) and parse them with the reader function from Python's csv module.
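from csv import reader

# encoding="utf8" decodes the non-ASCII app names (emojis, Chinese, ™)
# regardless of the platform's default encoding
playstore_file = open("googleplaystore.csv", encoding="utf8")
applestore_file = open("AppleStore.csv", encoding="utf8")
playstore = reader(playstore_file)
applestore = reader(applestore_file)
play_data = list(playstore)
apps_data = list(applestore)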
print("Headers from playstore data: \n ",play_data[0])
print("\n first 5 rows from playstore:\n",play_data[1:6])
print("\n \n")
print("Headers from AppStore data: \n ",apps_data[0])
print("\n first 5 rows from appstore:\n",apps_data[1:6])
Headers from playstore data: 
  ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

 first 5 rows from playstore:
 [['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']]

Headers from AppStore data: 
  ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

 first 5 rows from appstore:
 [['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'], ['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1'], ['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']]
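Why use reader instead of splitting each line on commas? Values such as 10,000+ contain commas and are quoted in the file. The minimal sketch below illustrates this on a hypothetical, shortened stand-in for googleplaystore.csv held in an io.StringIO object; read_rows is an illustrative helper, not part of the original project.

```python
import csv
import io

# Hypothetical, shortened sample in the same shape as googleplaystore.csv.
# The quoted field "10,000+" contains a comma.
SAMPLE_CSV = (
    "App,Category,Installs,Type\n"
    'Photo Editor,ART_AND_DESIGN,"10,000+",Free\n'
    'Sketch - Draw & Paint,ART_AND_DESIGN,"50,000,000+",Free\n'
)


def read_rows(file_obj):
    """Return every row of a CSV file object as a list of lists of strings."""
    return list(csv.reader(file_obj))


rows = read_rows(io.StringIO(SAMPLE_CSV))
header, data = rows[0], rows[1:]
print(header)     # ['App', 'Category', 'Installs', 'Type']
print(data[0][2]) # 10,000+

# A naive split breaks the quoted value apart: 5 fields instead of 4.
print(SAMPLE_CSV.splitlines()[1].split(","))
```

With the real files, the same helper could be used inside a with block, for example with open("googleplaystore.csv", encoding="utf8") as f: play_data = read_rows(f), which also closes the file once it has been read.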
Now, let's explore the datasets. The helper function explore_data() prints a slice of a dataset and takes four parameters:
- dataset: expected to be a list of lists.
- start and end: both expected to be integers; they are the starting and ending indices of the slice to print.
- rows_and_columns: expected to be a Boolean, with False as the default argument. When it is True, the function also prints the number of rows and columns.
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')  # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
explore_data()
Call explore_data() to see the results for the Google Play data and for the App Store data.
explore_data(play_data[1:],1,6)
explore_data(play_data[1:],1,4,True)
explore_data(apps_data[1:],1,4)
explore_data(apps_data[1:],1,4,True)
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']
['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']
['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']
Number of rows: 10841
Number of columns: 13
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']
Number of rows: 7197
Number of columns: 16
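Because play_data[1:] already drops the header row, index 1 of that slice is the second app, which is why the output above starts with 'Coloring book moana' rather than the first app. The toy list below (hypothetical data shaped like play_data) sketches how the two slices relate.

```python
# Hypothetical toy dataset shaped like play_data: header row first, then data rows.
toy = [["App", "Rating"], ["A", "4.1"], ["B", "3.9"], ["C", "4.7"], ["D", "4.5"]]

without_header = toy[1:]                # drop the header row
print(without_header[1:3])              # [['B', '3.9'], ['C', '4.7']]  (app "A" is skipped)
print(without_header[0:3])              # [['A', '4.1'], ['B', '3.9'], ['C', '4.7']]
print(toy[1:4] == without_header[0:3])  # True: both start at the first data row
```

So to start at the first app, either call explore_data(play_data[1:], 0, ...) or explore_data(play_data, 1, ...).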
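Before proceeding any further, we need to clean our datasets: remove inaccurate data, remove duplicate entries, remove apps that are not directed toward an English-speaking audience, and isolate the free apps.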
According to the dataset's discussion section, one row contains an error, reported at row index 10472 (i.e., 10473 once the header row is counted). Let's print row 10473 to view the data.
print("Google play data headers: \n",play_data[0:1])
print("Data at index 10473 \n",play_data[10473])
Google play data headers: 
 [['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']]
Data at index 10473 
 ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
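This row has only 12 values while the header has 13: the Category value is missing, so every following value is shifted one column to the left. The Rating column, for example, holds '19', which is impossible on a scale that tops out at 5. Because the row is inaccurate, we delete it with the del statement.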
# print the length of dataset before deletion
print("length of google store dataset before deleting row 10473:",len(play_data))
# Delete the row with inaccurate data
del play_data[10473]
# print the length of dataset after deletion
print("\nlength of google store dataset after deleting row 10473: ",len(play_data))
length of google store dataset before deleting row 10473: 10842

length of google store dataset after deleting row 10473:  10841
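Rather than hard-coding the index 10473 reported in the discussion, a more defensive approach is to flag every row whose length differs from the header's. The sketch below uses a hypothetical helper, find_malformed_rows, and made-up toy rows. It assumes, as with play_data, that the first row is the header.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header row.

    Assumes dataset[0] is the header, as in play_data; indices refer to dataset.
    """
    header_len = len(dataset[0])
    return [(i, row) for i, row in enumerate(dataset[1:], start=1)
            if len(row) != header_len]


# Hypothetical toy data: the row for "B" is missing its Category value.
toy = [
    ["App", "Category", "Rating"],
    ["A", "TOOLS", "4.1"],
    ["B", "1.9"],
    ["C", "GAME", "4.5"],
]
bad = find_malformed_rows(toy)
print(bad)  # [(2, ['B', '1.9'])]

# Delete from the highest index down so earlier deletions don't shift later ones.
for i, _ in reversed(bad):
    del toy[i]
print(len(toy))  # 3
```

Run on play_data before the deletion, it should list index 10473 among its results, since that row has 12 values against the header's 13.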
Exploring the discussions further, we notice that some apps have duplicate entries. For example, the code below shows that Instagram has four entries.
for app in play_data[1:]:
    app_name = app[0]
    if app_name == 'Instagram':
        print(app)
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
Now, let's find the total number of unique and duplicate entries in the Google Play dataset. We loop through the data rows of play_data (skipping the header) and append each app name to distinct the first time we see it, or to duplicates if we have already seen it.
duplicates = []
distinct = []

# skip the header row so that 'App' is not counted as an app name
for app in play_data[1:]:
    app_name = app[0]
    if app_name in distinct:
        duplicates.append(app_name)
    else:
        distinct.append(app_name)

print("total unique apps: ", len(distinct))
print("\ntotal duplicate apps: ", len(duplicates))
total unique apps:  9659

total duplicate apps:  1181
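The membership test app_name in distinct scans the whole list on every iteration. A minimal alternative sketch counts names with collections.Counter in a single pass, assuming the same row layout with the app name in the first column. count_duplicates is a hypothetical helper, and the toy rows are made up.

```python
from collections import Counter


def count_duplicates(rows, name_index=0):
    """Return (number of unique names, number of extra duplicate rows)."""
    counts = Counter(row[name_index] for row in rows)
    n_unique = len(counts)
    # every name seen c times contributes c - 1 duplicate rows
    n_duplicates = sum(c - 1 for c in counts.values())
    return n_unique, n_duplicates


# Hypothetical toy rows (no header): Instagram appears three times.
toy = [["Instagram"], ["Facebook"], ["Instagram"], ["Instagram"], ["Tumblr"]]
print(count_duplicates(toy))  # (3, 2)
```

Called on play_data[1:], it should return the same two numbers as len(distinct) and len(duplicates), because each repeated name adds its extra rows to the duplicate count.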
Now the list duplicates holds the name of every repeated entry, and the list distinct holds each unique app name once.
We shouldn't remove the duplicates randomly. If we examine the rows printed for Instagram, the main difference is in the fourth position of each row (index 3), which holds the number of reviews.
We can use this information to build a criterion for removing the duplicates. The higher the number of reviews, the more recent the data should be. Rather than removing duplicates randomly, we will only keep the row with the highest number of reviews and remove the other entries for any given app.
reviews_max = {}

for app in play_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
We found that there are 1,181 cases where an app occurs more than once, so the length of our dictionary (of unique apps) should be equal to the difference between the length of our data set and 1,181.
Let's check that the length of reviews_max equals the number of data rows in play_data minus the number of duplicates.
print("length of dataset with highest number of reviews for each app:",len(reviews_max))
length of dataset with highest number of reviews for each app: 9659
print("length of dataset after removing the duplicates: ",len(play_data[1:]) - 1181)
length of dataset after removing the duplicates: 9659
Both values are equal (9,659), which confirms that reviews_max holds exactly one entry per unique app: the highest number of reviews recorded for it.
For the duplicate cases, we'll only keep the entry with the highest number of reviews. In the code cell below:
- We start by initializing two empty lists, android_clean and already_added.
- We loop through the Google Play dataset and, for each row, check two conditions: the app's number of reviews equals the maximum stored for it in reviews_max, and the app name is not yet in already_added. When both hold, we append the row to android_clean and the app name to already_added.

The second condition prevents adding an app twice when its highest number of reviews appears in more than one row.

Now, let's code it:
android_clean = []
already_added = []

for app in play_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)   # keep the full row
        already_added.append(name)  # remember that this app has been kept
print("length of the clean dataset:",len(android_clean))
length of the clean dataset: 9659
Let's also confirm with explore_data() that the number of rows is 9,659.
explore_data(android_clean, 0, 3, True)
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']
Number of rows: 9659
Number of columns: 13
The clean dataset has 9,659 rows, which is the expected result.
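The two passes above (building reviews_max, then filtering with already_added) can also be sketched as a single pass over a dictionary that maps each app name to its best row so far. dedupe_keep_max is a hypothetical helper. The Instagram review counts come from the rows printed earlier, and the last row is made up. It keeps the same rows as the approach above (the first row with the maximum review count), although the order of the output rows can differ.

```python
def dedupe_keep_max(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # strict ">" keeps the first row when several rows tie for the maximum
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


toy = [
    ["Instagram", "SOCIAL", "4.5", "66577313"],
    ["Instagram", "SOCIAL", "4.5", "66577446"],
    ["Instagram", "SOCIAL", "4.5", "66577313"],
    ["Instagram", "SOCIAL", "4.5", "66509917"],
    ["Hypothetical App", "TOOLS", "4.0", "10"],  # made-up row
]
for row in dedupe_keep_max(toy):
    print(row[0], row[3])
# Instagram 66577446
# Hypothetical App 10
```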
We use English for the apps we develop at our company, and we'd like to analyze only the apps that are directed toward an English-speaking audience. We're not interested in keeping apps whose names suggest they are aimed at other audiences.
One way to go about this is to remove each app whose name contains a symbol that is not commonly used in English text — English text usually includes letters from the English alphabet, numbers composed of digits from 0 to 9, punctuation marks (., !, ?, ;, etc.), and other symbols (+, *, /, etc.).
All these characters that are specific to English texts are encoded using the ASCII standard. Each ASCII character has a corresponding number between 0 and 127 associated with it, and we can take advantage of that to build a function that checks an app name and tells us whether it contains non-ASCII characters.
We build this function below. It uses the built-in ord() function, which returns the Unicode code point of a character; for ASCII characters, that number is the same as the ASCII code.
print('ASCII for \'a\':',ord('a'))
print('ASCII for \'%\':',ord('%'))
print('ASCII for 4:',ord('4'))
ASCII for 'a': 97
ASCII for '%': 37
ASCII for 4: 52
ASCII characters have codes from 0 to 127, so if an app name contains a character whose code is greater than 127, the app probably has a non-English name.
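The ord() rule can be checked on a few names. In the sketch below, has_non_ascii is a hypothetical helper. The comparison uses str.isascii() (available since Python 3.7), which gives the same answer as testing every character's code point against 127. For reference, ord('™') is 8482, well outside the ASCII range.

```python
def has_non_ascii(text):
    """True if any character's code point is above 127 (outside ASCII)."""
    return any(ord(ch) > 127 for ch in text)


for name in ["Instagram", "Docs To Go™ Free Office Suite", "电视剧热播"]:
    # str.isascii() (Python 3.7+) agrees with the ord() check
    print(name, has_non_ascii(name), not name.isascii())
# Instagram False False
# Docs To Go™ Free Office Suite True True
# 电视剧热播 True True
```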
We define a function isEnglish_app() that takes an app name as its parameter and returns False if the name contains a character outside the ASCII range, and True otherwise.
def isEnglish_app(string):
    for ch in string:
        if ord(ch) > 127:
            return False
    # only reached when no character falls outside the ASCII range
    return True

isEnglish_app('电视剧热播')
False
isEnglish_app('Instachat 😜')
False
isEnglish_app('Docs To Go™ Free Office Suite')
False
Although isEnglish_app() detects non-English app names, it also rejects English names such as 'Docs To Go™ Free Office Suite' and 'Instachat 😜', because emojis and special characters like ™ fall outside the ASCII range.
So to minimize the impact of data loss, we'll only remove an app if its name has more than three characters with corresponding numbers falling outside the ASCII range. This means all English apps with up to three emoji or other special characters will still be labeled as English.
def isEnglish_app(string):
    char_count = 0
    for ch in string:
        if ord(ch) > 127:
            char_count += 1

    # tolerate up to three non-ASCII characters (emojis, ™, etc.)
    if char_count > 3:
        return False
    else:
        return True
print(isEnglish_app('电电电电电'))
print(isEnglish_app('Instachat 😜'))
False
True
Using the new function, let's filter out the non-English apps from both the Google Play dataset and the App Store dataset.
ios_english = []
gplay_english = []

for i in android_clean:
    name = i[0]
    if isEnglish_app(name):
        gplay_english.append(i)

for i in apps_data[1:]:
    name = i[1]
    if isEnglish_app(name):
        ios_english.append(i)

print("English apps in play store", len(gplay_english))
print("\nEnglish apps in apple store", len(ios_english))
print("\n")
explore_data(gplay_english, 1, 3, True)
explore_data(ios_english, 1, 3, True)
English apps in play store 9614

English apps in apple store 6183

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']
Number of rows: 9614
Number of columns: 13
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
Number of rows: 6183
Number of columns: 16
The number of English apps in Google PlayStore is 9614 and in AppStore is 6183.
According to the requirement, we only build apps that are free to download and install. We need to isolate only the free apps for our analysis.
For Google Play apps, we check whether the Type column is Free to tell free apps from paid ones. For the App Store dataset, we check whether the price column is 0.0.
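gplay_free = []
ios_free = []

# Note: gplay_english and ios_english have no header row, so these [1:]
# slices also skip the first app in each list.
for app in gplay_english[1:]:
    app_type = app[6]  # named app_type to avoid shadowing the built-in type()
    if app_type == "Free":
        gplay_free.append(app)

for app in ios_english[1:]:
    price = float(app[4])
    if price == 0.0:
        ios_free.append(app)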
print("Free play store apps: ", len(gplay_free))
print("Free Apple store apps: ", len(ios_free))
Free play store apps:  8862
Free Apple store apps:  3221
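The two free-app checks can be written as small predicate functions and applied with list comprehensions. Since gplay_english and ios_english contain no header row, such a filter can run over the whole list. The sketch below uses hypothetical helpers (is_free_play, is_free_ios) and made-up rows that are padded so that Type sits at index 6 and price at index 4, matching the real column positions.

```python
def is_free_play(row, type_index=6):
    """Google Play rows mark free apps with 'Free' in the Type column."""
    return row[type_index] == "Free"


def is_free_ios(row, price_index=4):
    """App Store rows store the price as a string such as '0.0'."""
    return float(row[price_index]) == 0.0


# Hypothetical rows, padded so the checked column sits at the real index.
play_rows = [
    ["Free App", "TOOLS", "", "", "", "", "Free"],
    ["Paid App", "TOOLS", "", "", "", "", "Paid"],
]
ios_rows = [
    ["1", "Free iOS App", "", "USD", "0.0"],
    ["2", "Paid iOS App", "", "USD", "2.99"],
]
print([r[0] for r in play_rows if is_free_play(r)])  # ['Free App']
print([r[1] for r in ios_rows if is_free_ios(r)])    # ['Free iOS App']
```

With the real lists this would read, for example, gplay_free = [app for app in gplay_english if is_free_play(app)].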
Our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.
We generate frequency tables for the prime_genre column of the App Store dataset and for the Genres and Category columns of the Google Play dataset to find the most common genres in each market.
We'll build two functions to analyze the frequency tables:
- one function to generate frequency tables that show percentages;
- another function to display the percentages in descending order.
def freq_table(dataset, index):
    table = {}
    total = 0

    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1

    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage

    return table_percentages
# to display the frequency table of a column in a dataset, in descending order
# (e.g. prime_genre in the App Store data,
# Genres and Category in the Google Play data)
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
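The same two steps can be sketched more compactly with collections.Counter and sorting by value. freq_table_counter and display_sorted are hypothetical names, and the toy rows are made up. One difference from display_table: ties keep their insertion order here, whereas sorting (percentage, key) tuples breaks ties by key in reverse alphabetical order.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Percentage frequency table of column `index`, built with Counter."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {key: count / total * 100 for key, count in counts.items()}


def display_sorted(table):
    """Print key : percentage pairs, largest percentage first."""
    for key, pct in sorted(table.items(), key=lambda kv: kv[1], reverse=True):
        print(key, ':', pct)


toy = [["Games"], ["Games"], ["Music"], ["Games"]]  # hypothetical rows
display_sorted(freq_table_counter(toy, 0))
# Games : 75.0
# Music : 25.0
```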
print("\nGOOGLE PLAY APPS BY GENRE\n")
display_table(gplay_free, -4)  # -4 is the negative index for Genres
print("\nGOOGLE PLAY APPS BY CATEGORY \n")
display_table(gplay_free,1) # by Category
GOOGLE PLAY APPS BY GENRE Tools : 8.451816745655607 Entertainment : 6.070864364703228 Education : 5.348679756262695 Business : 4.5926427443015125 Productivity : 3.8930264048747465 Lifestyle : 3.8930264048747465 Finance : 3.7011961182577298 Medical : 3.5319341006544795 Sports : 3.4642292936131795 Personalization : 3.3175355450236967 Communication : 3.238546603475513 Action : 3.1031369893929135 Health & Fitness : 3.080568720379147 Photography : 2.945159106296547 News & Magazines : 2.798465357707064 Social : 2.663055743624464 Travel & Local : 2.324531708417964 Shopping : 2.2455427668697814 Books & Reference : 2.143985556307831 Simulation : 2.0424283457458814 Dating : 1.8618821936357481 Arcade : 1.8505980591288649 Video Players & Editors : 1.7716091175806816 Casual : 1.7603249830737984 Maps & Navigation : 1.399232678853532 Food & Drink : 1.2412547957571656 Puzzle : 1.128413450688332 Racing : 0.9930038366057323 Role Playing : 0.9365831640713158 Libraries & Demo : 0.9365831640713158 Auto & Vehicles : 0.9252990295644324 Strategy : 0.9027307605506657 House & Home : 0.8237418190024826 Weather : 0.8011735499887158 Events : 0.7109004739336493 Adventure : 0.6770480704129994 Comics : 0.6093432633716994 Beauty : 0.598059128864816 Art & Design : 0.5867749943579328 Parenting : 0.49650191830286616 Card : 0.45136538027533285 Casino : 0.4287971112615662 Trivia : 0.41751297675468296 Educational;Education : 0.3949447077409162 Board : 0.3836605732340329 Educational : 0.3723764387271496 Education;Education : 0.3385240352064997 Word : 0.2595350936583164 Casual;Pretend Play : 0.23696682464454977 Music : 0.2031144211238998 Racing;Action & Adventure : 0.16926201760324985 Puzzle;Brain Games : 0.16926201760324985 Entertainment;Music & Video : 0.16926201760324985 Casual;Brain Games : 0.13540961408259986 Casual;Action & Adventure : 0.13540961408259986 Arcade;Action & Adventure : 0.12412547957571654 Action;Action & Adventure : 0.1015572105619499 Educational;Pretend Play : 0.09027307605506657 
Simulation;Action & Adventure : 0.07898894154818326 Parenting;Education : 0.07898894154818326 Entertainment;Brain Games : 0.07898894154818326 Board;Brain Games : 0.07898894154818326 Parenting;Music & Video : 0.06770480704129993 Educational;Brain Games : 0.06770480704129993 Casual;Creativity : 0.06770480704129993 Art & Design;Creativity : 0.06770480704129993 Education;Pretend Play : 0.056420672534416606 Role Playing;Pretend Play : 0.045136538027533285 Education;Creativity : 0.045136538027533285 Role Playing;Action & Adventure : 0.033852403520649964 Puzzle;Action & Adventure : 0.033852403520649964 Entertainment;Creativity : 0.033852403520649964 Entertainment;Action & Adventure : 0.033852403520649964 Educational;Creativity : 0.033852403520649964 Educational;Action & Adventure : 0.033852403520649964 Education;Music & Video : 0.033852403520649964 Education;Brain Games : 0.033852403520649964 Education;Action & Adventure : 0.033852403520649964 Adventure;Action & Adventure : 0.033852403520649964 Video Players & Editors;Music & Video : 0.022568269013766643 Sports;Action & Adventure : 0.022568269013766643 Simulation;Pretend Play : 0.022568269013766643 Puzzle;Creativity : 0.022568269013766643 Music;Music & Video : 0.022568269013766643 Entertainment;Pretend Play : 0.022568269013766643 Casual;Education : 0.022568269013766643 Board;Action & Adventure : 0.022568269013766643 Video Players & Editors;Creativity : 0.011284134506883321 Trivia;Education : 0.011284134506883321 Travel & Local;Action & Adventure : 0.011284134506883321 Tools;Education : 0.011284134506883321 Strategy;Education : 0.011284134506883321 Strategy;Creativity : 0.011284134506883321 Strategy;Action & Adventure : 0.011284134506883321 Simulation;Education : 0.011284134506883321 Role Playing;Brain Games : 0.011284134506883321 Racing;Pretend Play : 0.011284134506883321 Puzzle;Education : 0.011284134506883321 Parenting;Brain Games : 0.011284134506883321 Music & Audio;Music & Video : 0.011284134506883321 
Lifestyle;Pretend Play : 0.011284134506883321 Lifestyle;Education : 0.011284134506883321 Health & Fitness;Education : 0.011284134506883321 Health & Fitness;Action & Adventure : 0.011284134506883321 Entertainment;Education : 0.011284134506883321 Communication;Creativity : 0.011284134506883321 Comics;Creativity : 0.011284134506883321 Casual;Music & Video : 0.011284134506883321 Card;Action & Adventure : 0.011284134506883321 Books & Reference;Education : 0.011284134506883321 Art & Design;Pretend Play : 0.011284134506883321 Art & Design;Action & Adventure : 0.011284134506883321 Arcade;Pretend Play : 0.011284134506883321 Adventure;Education : 0.011284134506883321 GOOGLE PLAY APPS BY CATEGORY FAMILY : 18.900925299029563 GAME : 9.726923944933423 TOOLS : 8.463100880162491 BUSINESS : 4.5926427443015125 LIFESTYLE : 3.9043105393816293 PRODUCTIVITY : 3.8930264048747465 FINANCE : 3.7011961182577298 MEDICAL : 3.5319341006544795 SPORTS : 3.39652448657188 PERSONALIZATION : 3.3175355450236967 COMMUNICATION : 3.238546603475513 HEALTH_AND_FITNESS : 3.080568720379147 PHOTOGRAPHY : 2.945159106296547 NEWS_AND_MAGAZINES : 2.798465357707064 SOCIAL : 2.663055743624464 TRAVEL_AND_LOCAL : 2.335815842924848 SHOPPING : 2.2455427668697814 BOOKS_AND_REFERENCE : 2.143985556307831 DATING : 1.8618821936357481 VIDEO_PLAYERS : 1.7941773865944481 MAPS_AND_NAVIGATION : 1.399232678853532 FOOD_AND_DRINK : 1.2412547957571656 EDUCATION : 1.162265854208982 ENTERTAINMENT : 0.9591514330850823 LIBRARIES_AND_DEMO : 0.9365831640713158 AUTO_AND_VEHICLES : 0.9252990295644324 HOUSE_AND_HOME : 0.8237418190024826 WEATHER : 0.8011735499887158 EVENTS : 0.7109004739336493 PARENTING : 0.6544798013992327 ART_AND_DESIGN : 0.631911532385466 COMICS : 0.6206273978785828 BEAUTY : 0.598059128864816
The output shows that FAMILY is the most common category among the free English apps on Google Play, followed by GAME and TOOLS, while Tools is the most common genre, followed by Entertainment and Education.
print("AppStore apps - BY PRIME GENRE \n")
display_table(ios_free, -5)# -5 is the negative index for genre
AppStore apps - BY PRIME GENRE Games : 58.180689226948154 Entertainment : 7.885749767153058 Photo & Video : 4.967401428127911 Education : 3.6634585532443342 Social Networking : 3.2598571872089415 Shopping : 2.607885749767153 Utilities : 2.5147469729897547 Sports : 2.1421918658801617 Music : 2.049053089102763 Health & Fitness : 2.018006830176964 Productivity : 1.7385904998447685 Lifestyle : 1.5833592052157717 News : 1.334989133809376 Travel : 1.2418503570319777 Finance : 1.11766532132878 Weather : 0.8692952499223843 Food & Drink : 0.8072027320707855 Reference : 0.55883266066439 Business : 0.5277864017385905 Book : 0.43464762496119214 Navigation : 0.18627755355479667 Medical : 0.18627755355479667 Catalogs : 0.12418503570319776
Among the free English apps in the App Store, the most common genre is Games, followed by Entertainment and Photo & Video. Since these apps are designed mostly for fun, the App Store offering appears to be dominated by entertainment apps rather than apps built for practical purposes.
Comparing the frequency tables for both datasets, the free English apps in the App Store lean strongly toward games and entertainment, while those on Google Play are more balanced between practical and entertainment apps. However, frequency tables only show which kinds of apps are common, not which kinds attract the most users.
To find the most popular genres, we can calculate the average number of installs for each genre. The Google Play data has an Installs column with the number of installs. The App Store data has no install counts, so we use the rating_count_tot column as a proxy and calculate the average number of user ratings per genre.
We have data about the number of installs for the Google Play market, so we should be able to get a clearer picture of genre popularity. However, the install numbers aren't very precise: most values are open-ended (100+, 1,000+, 5,000+, etc.).
display_table(gplay_free, 5) # 5 is the index for Installs column
1,000,000+ : 15.730083502595352 100,000+ : 11.554953735048521 10,000,000+ : 10.550665763935905 10,000+ : 10.189573459715639 1,000+ : 8.395396073121193 100+ : 6.917174452719477 5,000,000+ : 6.82690137666441 500,000+ : 5.563078311893477 50,000+ : 4.773188896411646 5,000+ : 4.513653802753328 10+ : 3.5432182351613632 500+ : 3.2498307379823967 50,000,000+ : 2.3019634394041977 100,000,000+ : 2.132701421800948 50+ : 1.9183028661701647 5+ : 0.7898894154818324 1+ : 0.5077860528097494 500,000,000+ : 0.2708192281651997 1,000,000,000+ : 0.22568269013766643 0+ : 0.045136538027533285
To perform computations (here, the average number of installs per category for the Google Play dataset), we need to convert each install number from a string to a float. This means removing the commas and the plus signs; otherwise the conversion fails with a ValueError. Since a value such as 100,000+ means at least 100,000 installs, the converted numbers are lower bounds.
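A minimal sketch of this conversion as a reusable helper, plus a single-pass per-group average. parse_installs and average_by are hypothetical names, and the toy rows are made up. The nested loop below computes the same kind of average by scanning all apps once per category.

```python
def parse_installs(value):
    """Convert an open-ended install string like '1,000,000+' to a float.

    The '+' means "at least", so the result is a lower bound.
    """
    return float(value.replace(",", "").replace("+", ""))


def average_by(dataset, key_index, value_fn):
    """Average value_fn(row) for each distinct row[key_index], in one pass."""
    totals, counts = {}, {}
    for row in dataset:
        key = row[key_index]
        totals[key] = totals.get(key, 0.0) + value_fn(row)
        counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}


toy = [["A", "TOOLS", "10,000+"], ["B", "TOOLS", "1,000+"], ["C", "GAME", "5+"]]
print(average_by(toy, 1, lambda row: parse_installs(row[5 - 3])))
# {'TOOLS': 5500.0, 'GAME': 5.0}
```

On the real data, average_by(gplay_free, 1, lambda row: parse_installs(row[5])) should reproduce the per-category averages, and average_by(ios_free, -5, lambda row: float(row[5])) the App Store averages of rating_count_tot.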
category_playstore = freq_table(gplay_free, 1)  # 1 is the index for Category

# sorting the category names gives alphabetical output; the apps themselves
# don't need to be sorted to compute the averages
for category in sorted(category_playstore):
    total = 0
    len_category = 0
    for app in gplay_free:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)
ART_AND_DESIGN : 2021626.7857142857 AUTO_AND_VEHICLES : 647317.8170731707 BEAUTY : 513151.88679245283 BOOKS_AND_REFERENCE : 8767811.894736841 BUSINESS : 1712290.1474201474 COMICS : 817657.2727272727 COMMUNICATION : 38456119.167247385 DATING : 854028.8303030303 EDUCATION : 1833495.145631068 ENTERTAINMENT : 11640705.88235294 EVENTS : 253542.22222222222 FAMILY : 3697848.1731343283 FINANCE : 1387692.475609756 FOOD_AND_DRINK : 1924897.7363636363 GAME : 15588015.603248259 HEALTH_AND_FITNESS : 4188821.9853479853 HOUSE_AND_HOME : 1331540.5616438356 LIBRARIES_AND_DEMO : 638503.734939759 LIFESTYLE : 1437816.2687861272 MAPS_AND_NAVIGATION : 4056941.7741935486 MEDICAL : 120550.61980830671 NEWS_AND_MAGAZINES : 9549178.467741935 PARENTING : 542603.6206896552 PERSONALIZATION : 5201482.6122448975 PHOTOGRAPHY : 17840110.40229885 PRODUCTIVITY : 16787331.344927534 SHOPPING : 7036877.311557789 SOCIAL : 23253652.127118643 SPORTS : 3638640.1428571427 TOOLS : 10801391.298666667 TRAVEL_AND_LOCAL : 13984077.710144928 VIDEO_PLAYERS : 24727872.452830188 WEATHER : 5074486.197183099
On average, *Communication* apps (WhatsApp, Viber, Skype, etc.) have the most installs (about 38.5 million), which makes the category look like a promising choice for Google Play. The next two categories by average installs are *Video Players* (about 24.7 million) and *Social* (about 23.3 million). Keep in mind that an average can be pulled up by a few apps with very large install counts.
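One quick way to see whether a handful of giant apps drive a category's average is to compare the mean with the median, which is insensitive to a single extreme value. The sketch below uses Python's statistics module on hypothetical install counts (one very large app, several small ones), not on values from the dataset.

```python
from statistics import mean, median

# Hypothetical install counts for one category: one giant app, several small ones.
installs = [1_000_000_000, 100_000, 50_000, 10_000, 10_000, 5_000, 1_000]

print(f"mean:   {mean(installs):,.0f}")    # mean:   142,882,286
print(f"median: {median(installs):,.0f}")  # median: 10,000
# dropping the single largest app changes the mean drastically
print(f"mean without the largest app: {mean(sorted(installs)[:-1]):,.0f}")  # 29,333
```

On the real data, the same comparison could be run on the parsed Installs values of each category.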
appstore_genre = freq_table(ios_free, -5)  # -5 is the negative index for prime_genre

for genre in appstore_genre:
    len_genre = 0
    total = 0
    for row in ios_free:
        genre_app = row[-5]
        if genre_app == genre:
            ratings_count = float(row[5])  # rating_count_tot
            total += ratings_count
            len_genre += 1
    average_ratings = total / len_genre
    print(genre, ':', average_ratings)
News : 21248.023255813954 Education : 7003.983050847458 Social Networking : 43899.514285714286 Utilities : 18684.456790123455 Book : 39758.5 Business : 7491.117647058823 Navigation : 86090.33333333333 Health & Fitness : 23298.015384615384 Productivity : 21028.410714285714 Sports : 23008.898550724636 Photo & Video : 28441.54375 Travel : 28243.8 Reference : 74942.11111111111 Games : 22788.6696905016 Entertainment : 14029.830708661417 Finance : 31467.944444444445 Lifestyle : 16485.764705882353 Weather : 52279.892857142855 Food & Drink : 33333.92307692308 Music : 57326.530303030304 Shopping : 26919.690476190477 Catalogs : 4004.0 Medical : 612.0
The three genres with the highest average number of user ratings in the App Store are Navigation, Reference, and Music. Based on this metric alone, Navigation apps look like the strongest candidate for the App Store.
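Rather than reading the leaders off the unsorted output, we could sort the averages by value. The sketch below copies a subset of the averages printed above into a dictionary; top_n is a hypothetical helper.

```python
# A subset of the App Store averages printed above (genre -> mean rating_count_tot).
avg_ratings = {
    "Navigation": 86090.33333333333,
    "Reference": 74942.11111111111,
    "Music": 57326.530303030304,
    "Weather": 52279.892857142855,
    "Social Networking": 43899.514285714286,
    "Games": 22788.6696905016,
}


def top_n(table, n=3):
    """Return the n (key, value) pairs with the largest values."""
    return sorted(table.items(), key=lambda kv: kv[1], reverse=True)[:n]


for genre, avg in top_n(avg_ratings):
    print(f"{genre}: {avg:,.0f}")
# Navigation: 86,090
# Reference: 74,942
# Music: 57,327
```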
Looking closer at the results, Social Networking apps are not far behind, with an average of about 43,900 ratings. Let's list the free English apps in this genre, and then the free English apps in the SOCIAL category on Google Play.
for i in ios_free:
    if i[-5] == 'Social Networking':
        print(i[1])
Pinterest Skype for iPhone Messenger Tumblr WhatsApp Messenger Kik ooVoo – Free Video Call, Text and Voice TextNow - Unlimited Text + Calls Viber Messenger – Text & Call Followers - Social Analytics For Instagram MeetMe - Chat and Meet New People We Heart It - Fashion, wallpapers, quotes, tattoos InsTrack for Instagram - Analytics Plus More Tango - Free Video Call, Voice and Chat LinkedIn Match™ - #1 Dating App. Skype for iPad POF - Best Dating App for Conversations Timehop Find My Family, Friends & iPhone - Life360 Locator Whisper - Share, Express, Meet Hangouts LINE PLAY - Your Avatar World WeChat Badoo - Meet New People, Chat, Socialize. Followers + for Instagram - Follower Analytics GroupMe Marco Polo Video Walkie Talkie Miitomo SimSimi Grindr - Gay and same sex guys chat, meet and date Wishbone - Compare Anything imo video calls and chat After School - Funny Anonymous School News Quick Reposter - Repost, Regram and Reshare Photos Weibo HD Repost for Instagram Live.me – Live Video Chat & Make Friends Nearby Nextdoor Followers Analytics for Instagram - InstaReport YouNow: Live Stream Video Chat FollowMeter for Instagram - Followers Tracking LINE eHarmony™ Dating App - Meet Singles Discord - Chat for Gamers QQ Telegram Messenger Weibo Periscope - Live Video Streaming Around the World Chat for Whatsapp - iPad Version QQ HD Followers Analysis Tool For Instagram App Free live.ly - live video streaming Houseparty - Group Video Chat SOMA Messenger Monkey Down To Lunch Flinch - Video Chat Staring Contest Highrise - Your Avatar Community LOVOO - Dating Chat PlayStation®Messages BOO! - Video chat camera with filters & stickers Qzone Chatous - Chat with new people Kiwi - Q&A GhostCodes - a discovery app for Snapchat Jodel FireChat Google Duo - simple video calling Fiesta by Tango - Chat & Meet New People Google Allo — smart messaging Peach — share vividly Hey! 
VINA - Where Women Meet New Friends Battlefield™ Companion All Devices for WhatsApp - Messenger for iPad Chat for Pokemon Go - GoChat IAmNaughty – Dating App to Meet New People Online Qzone HD Zenly - Locate your friends in realtime League of Legends Friends 豆瓣 Candid - Speak Your Mind Freely 知乎 Selfeo Fake-A-Location Free ™ Popcorn Buzz - Free Group Calls Fam — Group video calling for iMessage QQ International Ameba SoundCloud Pulse: for creators Tantan Cougar Dating & Life Style App for Mature Women Rawr Messenger - Dab your chat WhenToPost: Best Time to Post Photos for Instagram Inke—Broadcast an amazing life Mustknow - anonymous video Q&A CTFxCmoji Lobi Chain: Collaborate On MyVideo Story/Group Video botman - Real time video chat BestieBox MATCH ON LINE chat niconico ch LINE BLOG bit-tube - Live Stream Video Chat
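The loop above only prints the matching names. Below is a sketch of a variant that returns them as a list instead, so they can be counted or compared later. The genre and name column positions are passed in as parameters, which lets the same helper work on both datasets. For the App Store, prime_genre is at -5 and track_name at 1. For Google Play, Category is at 1 and App at 0. The helper name `apps_in_genre` is hypothetical, and the sample rows are copied from the previews printed at the start.

```python
def apps_in_genre(rows, genre, genre_index, name_index):
    """Return the names of all rows whose genre column equals `genre` exactly."""
    return [row[name_index] for row in rows if row[genre_index] == genre]


# Rows copied from the AppStore.csv preview shown earlier
ios_sample = [
    ['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'],
    ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'],
    ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'],
]

# Rows copied from the googleplaystore.csv preview shown earlier
play_sample = [
    ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'],
    ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'],
]

ios_social = apps_in_genre(ios_sample, 'Social Networking', genre_index=-5, name_index=1)
print(ios_social, len(ios_social))  # ['Facebook'] 1

play_art = apps_in_genre(play_sample, 'ART_AND_DESIGN', genre_index=1, name_index=0)
print(len(play_art))  # 2
```

On the full data, `len(apps_in_genre(ios_free, 'Social Networking', -5, 1))` would give the number of competing Social Networking apps directly, instead of counting printed lines.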
for i in gplay_free:
    # i[1] is the Category column, i[0] is the App name
    if i[1] == "SOCIAL":
        print(i[0])
Facebook Facebook Lite Tumblr Social network all in one 2018 Pinterest TextNow - free text + calls Google+ The Messenger App Messenger Pro Free Messages, Video, Chat,Text for Messenger Plus Telegram X The Video Messenger App Jodel - The Hyperlocal App Hide Something - Photo, Video Love Sticker Web Browser & Fast Explorer LiveMe - Video chat, new friends, and make money VidStatus app - Status Videos & Status Downloader Love Images Web Browser ( Fast & Secure Web Explorer) SPARK - Live random video chat & meet new people Golden telegram Facebook Local Meet – Talk to Strangers Using Random Video Chat MobilePatrol Public Safety App 💘 WhatsLov: Smileys of love, stickers and GIF HTC Social Plugin - Facebook Quora Kate Mobile for VK Family GPS tracker KidControl + GPS by SMS Locator Moment Text Me: Text Free, Call Free, Second Phone Number Text Free: WiFi Calling App Badoo - Free Chat & Dating App Text free - Free Text + Call Tango - Live Video Broadcast ooVoo Video Calls, Messaging & Stories Whisper Blogger Bloglovin' Blogaway for Android (Blogger) Instagram TwitCasting Live Stream - Live Video Community YouNow: Live Stream Video Chat Mirrativ: Live Stream Any App Snapchat Banjo We Heart It MeetMe: Chat & Meet New People Timehop Frontback - Social Photos LinkedIn Path SayHi Chat, Meet New People Tapatalk - 100,000+ Forums Couple - Relationship App Nextdoor - Local neighborhood news & classifieds LOVOO Jaumo Dating, Flirt & Live Video Patook - make platonic friends Zello PTT Walkie Talkie textPlus: Free Text & Calls magicApp Calling & Messaging Dating App, Flirt & Chat : W-Match uCiC- Videos and Photos on demand Meetup POF Free Dating App Tagged - Meet, Chat & Dating SKOUT - Meet, Chat, Go Live Mico- Stranger Chat Random video Chat, Live, Meet Waplog - Free Chat, Dating App, Meet Singles Tik Tok - including musical.ly B-Messenger Video Chat BIGO LIVE - Live Stream Greeting Cards & Wishes Share G - Images Sharing - Wallpapers App H letter images Instachat 😜 Fame Boom for 
Real Followers, Likes FollowMeter for Instagram Mali J KDRAMA Amino for K-Drama Fans KPOP Amino for K-Pop Entertainment EXO-L Amino for EXO Fans Verdad o Reto pixiv Join R, Community Engagement U LIVE – Video Chat & Stream See U - Random video chat, video chat What U See U-Report Meet U - Get Friends for Snapchat, Kik & Instagram VMate Lite - Funny Short Videos Social Network V Bucks ProTips New i-share AF/KLM (AFKL ishare) Eternal Light AG Message AI - Write Better Messages (Free) Jamaa Amino for Animal Jam Legend - Animate Text in Video GUYZ - Gay Chat & Gay Dating eChallan Andhra Pradesh (AP) Snaappy – 3D fun AR core communication platform Undertale AU Amino Au Pair Media Sosial TNI AU BA 3 Banjarmasin iCard BD Plus Movement BE YAY - TBH Find My Friends BG LINKED (BGLINKED) Zdravei.BG BGKontakti London BG Kontakti BGKontakti Bayern BG Kontakti BGKontakti Vienna BG Kontakti Discípulos em BH BH Connect Gayvox - Gay Lesbian Bi Dating Grindr - Gay chat Lesbian Chat & Dating - SPICY Alumni BJ VK BK Traffic Control cum Chart Daily Murli Saar Widget Myjob@BM bm-Events BOO! - Next Generation Messenger Wishbone - Compare Anything BR Chat Bot Br Browser Dr B R Ambedkar (Jai Bhim) Dr. B.R.Ambedkar Black Social BT Communicator BT Dating -Find your match, help cupid, be social Fiesta by Tango - Find, Meet and Make New Friends Evasion.bz CB Heroes CG Districts Digi-TV.ch Students.ch CJ Gospel Hour Pekalongan CJ Hashtags For Likes.co CP Dialer C.P. 
CERVANTES (TOBARRA) Cyprus Police Rande.cz signály.cz DB Event App DC Comics Amino DF BugMeNot Noticias DF Periscope - Live Video Daddyhunt: Gay Dating Free phone calls, free texting SMS on free number Phone Tracker : Family Locator HOLLA Live: Meet New People via Random Video Chat Dating.dk DK Murali GirlTalk.dk +Download 4 Instagram Twitter DM Me - Chat DM for IG 😘 - Image & Video Saver for Instagram Auto DM for Twitter 🔥 DM Storage (for twitter) Fake Chat (Direct Message) Otto DM DN Blog DP and Status for WhatsApp 2018 Dp For Whatsapp Profile Pictures and DP for Whatsapp Dp for Facebook Best DP and Status Instant DP Downloader for Instagram DP Display Pictures Life Quotes Motivational GM Qeek for Instagram - Zoom profile insta DP DV Statistics DW Streaming News Dz quran-DZ TN e Sevai TN EB Bill Patta Citta EC Birth All Hub TNEB Bill Online Payment (Tamil) UP EB Bill Payment & Details TN EC Online New TN Patta /Chitta /EC New EG Way Life Equestria Amino for MLP Ek Maratha Hum Ek Hain 2.02 Sabka Malik Ek Sai EO RAIPUR Anime et Manga Amino en Français Hornet - Gay Social Network Amino: Communities and Chats European Solidarity Corps Reisedealz.eu Eddsworld Amino Rejoin Your Ex Amleen Ey Coupe Adhémar EY 2017 EZ Video Download for Facebook Messages, Text and Video Chat for Messenger All Social Networks Messenger Messenger Facebook Creator Swift for Facebook Lite Friendly for Facebook Faster for Facebook Lite Puffin for Facebook Profile Tracker - Who Viewed My Facebook Profile Pink Color for Facebook Messenger Stickers for Imo, fb, whatsapp Who Viewed My Facebook Profile - Stalkers Visitors Downloader plus for FB MB Notifications for FB (Free) Phoenix - Facebook & Messenger Faster Social for Facebook Check Your Visitors on FB ? 
Stickers for Facebook Lite Messenger for Facebook Lite Mini for Facebook lite FB Advanced Search funny Image Comments for FB Unlimited Group Links - Whatsapp, FB, Telegram FCB Connect - FC Barcelona Frases Cristianas de Esperanza y Fe FutureNet your social app Alarm.fo – choose your info FunForMobile Ringtones & Chat Chat For Strangers - Video Chat Fr Daoud Lamei Naruto & Boruto FR Frim: get new friends on local chat rooms
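Several names appear in both listings above, such as Pinterest, Tumblr and LinkedIn. This suggests that some of the same competitors are active on both stores. Below is a minimal sketch for finding such overlaps. It assumes the two name lists were collected into Python lists (for instance with a list comprehension in place of the print loops) rather than printed. Matching ignores case and surrounding whitespace but is otherwise exact, so titles that differ between stores are not counted as shared. For example, 'Whisper - Share, Express, Meet' and 'Whisper' do not match. The helper name `shared_app_names` is hypothetical, and the two short lists are excerpts from the outputs above.

```python
def shared_app_names(names_a, names_b):
    """Return sorted names from names_a that also occur in names_b.

    Comparison ignores case and leading/trailing whitespace, but is otherwise exact.
    """
    normalized_b = {name.strip().casefold() for name in names_b}
    return sorted({name for name in names_a if name.strip().casefold() in normalized_b})


# Short excerpts from the two printed listings above
ios_social = ['Pinterest', 'Skype for iPhone', 'Tumblr', 'LinkedIn', 'Timehop',
              'Whisper - Share, Express, Meet', 'Wishbone - Compare Anything',
              'YouNow: Live Stream Video Chat']
play_social = ['Facebook', 'Tumblr', 'Pinterest', 'Instagram', 'Whisper', 'LinkedIn',
               'Timehop', 'Wishbone - Compare Anything', 'YouNow: Live Stream Video Chat']

print(shared_app_names(ios_social, play_social))
# ['LinkedIn', 'Pinterest', 'Timehop', 'Tumblr', 'Wishbone - Compare Anything', 'YouNow: Live Stream Video Chat']
```

Exact matching undercounts the real overlap. Fuzzier matching would catch more pairs but would also need manual checking for false positives.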
Although many apps are already available in the Social genre on both stores, we recommend an app in the Social (Google Play) or Social Networking (App Store) genre, as it is likely to be profitable in both markets.
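One way to put a number on "many apps already available" is to compute each genre's share of the app list. Below is a minimal sketch using `collections.Counter`. It assumes a list of genre labels extracted from the data rows, for example `row[-5]` for each row of `ios_free` or `row[1]` for each row of `gplay_free`. The helper name `genre_shares` is hypothetical. The four labels come from the first four rows of the AppStore.csv preview, so the percentages are illustrative only.

```python
from collections import Counter


def genre_shares(genres):
    """Return {genre: percentage of apps}, most common genre first."""
    counts = Counter(genres)
    total = sum(counts.values())
    return {genre: 100 * n / total for genre, n in counts.most_common()}


# prime_genre values of the first four AppStore.csv preview rows shown earlier
preview_genres = ['Social Networking', 'Photo & Video', 'Games', 'Games']
shares = genre_shares(preview_genres)
print(shares)  # {'Games': 50.0, 'Social Networking': 25.0, 'Photo & Video': 25.0}
```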
In this project, we analyzed apps on both Google Play and Apple's App Store in order to recommend an app genre that could be profitable in both markets.
After analyzing the popularity of genres on both the App Store and Google Play, we recommend apps in the Social (Google Play) or Social Networking (App Store) genre as likely to be profitable in both markets.