Since the full data is too large to work with (you can check how large on the Statista website), we are going to analyze smaller samples of it, which should still give us a good answer to our question.
With that in mind, we are going to use data from two Kaggle data sources. One contains data about approximately 10,000 Android apps from Google Play and can be found here. The second one contains data about approximately 7,200 iOS apps from the App Store and can be found here.
In this section we are going to import, read and store our data so it can be cleaned next.
# First, import the csv reader that will parse the files
from csv import reader

# Then open the files
opened_ios = open('/home/nathalia/Documents/2 data science/6 DataQuest Projects/1 Apps Analysis/AppleStore.csv', encoding='utf8')
opened_android = open('/home/nathalia/Documents/2 data science/6 DataQuest Projects/1 Apps Analysis/googleplaystore.csv', encoding='utf8')

# Parse them with the csv reader
readed_ios = reader(opened_ios)
readed_android = reader(opened_android)

# Once they are read, we can store the rows inside lists
ios_data = list(readed_ios)
android_data = list(readed_android)

# Close the files now that the rows are in memory
opened_ios.close()
opened_android.close()
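As a side note, the loading step can also be wrapped in a `with` block so the file is closed automatically. The sketch below is only an illustration: `load_csv` is a hypothetical helper name that does not appear in the notebook, and the demo file is a throwaway created so the example runs without the Kaggle data.

```python
import csv
import os
import tempfile


def load_csv(path):
    """Read a CSV file and return its rows as a list of lists (header included)."""
    # The with-block closes the file even if reading fails
    with open(path, encoding='utf8', newline='') as f:
        return list(csv.reader(f))


# Small throwaway file standing in for the real datasets
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 encoding='utf8', newline='') as tmp:
    tmp.write('App,Category\nInstagram,SOCIAL\n')
    demo_path = tmp.name

demo_rows = load_csv(demo_path)
os.remove(demo_path)
print(demo_rows)
```

With the real files, the same helper would be called once per dataset, for example `ios_data = load_csv(path_to_apple_store_csv)`.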
Now we are going to define a function to help us visualize what we just created.
# This function prints the selected rows with an empty line after each one to help readability,
# and can optionally print how many rows and columns the dataset has
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')  # adds a new empty line to help readability
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
Next we'll take a quick look at the dimensions and columns of both datasets. The meaning of the Google Play Store columns is not described by the dataset owner, but the description of the App Store columns can be found here.
print('Google Play Store apps data columns and dimensions: \n')
explore_data(android_data, 0, 1, True)
print('\n','-'*60,'\n')
print('Apple Store apps data columns and dimensions: \n ')
explore_data(ios_data, 0, 1, True)
Google Play Store apps data columns and dimensions:

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

Number of rows: 10842
Number of columns: 13

------------------------------------------------------------

Apple Store apps data columns and dimensions:

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

Number of rows: 7198
Number of columns: 16
Looking at the columns of our dataset we can identify some important information that we'll use to answer our main question, like:
Google Play Store = 'App', 'Category', 'Rating', 'Installs', 'Type', 'Price', 'Genres'
App Store = 'id', 'price', 'rating_count_tot', 'user_rating', 'cont_rating', 'prime_genre'
Now we'll store the column names in separate lists so we can use them for reference later.
# android column names
columns_android = android_data[0]
# ios column names
columns_ios = ios_data[0]
print('Android columns\n', columns_android)
print('\n iOS columns\n', columns_ios)
Android columns
 ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

 iOS columns
 ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
The purpose of this section is to remove wrong data and to check for mistakes and other issues that could compromise our analysis.
Checking the discussion forum of our dataset, we can see there's an error reported in row 10472 (index 10473 in our list, which still includes the header row). We print it to be sure, then we'll delete it.
android_data[10473] # we can see here that there's no 'Category'
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
# because of that we'll delete it
del android_data[10473]
# check if it's gone
explore_data(android_data, 0, 1, True)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

Number of rows: 10841
Number of columns: 13
After the removal, we'll check again for other problems. Looking at the data, we can see that there are several duplicated rows, as the for loop below shows.
# Looking at the Instagram entries
for app in android_data:
    name = app[0]
    if name == 'Instagram':
        print(app)
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
Knowing that, we will write a loop that counts the duplicates and stores their names, so that we can remove them next.
# First we will store the duplicated and unique app names in lists
# (the header row is included in this loop, so it counts as one unique entry)
duplicated = []
unique = []
for app in android_data:
    name = app[0]
    if name in unique:
        duplicated.append(name)
    else:
        unique.append(name)
# Then we can see how many entries each list has
print(len(duplicated))
print(len(unique))
1181
9660
With that in mind, we won't remove the duplicates randomly. Looking at the 'Instagram' entries we printed, we can see that the 4th column holds the number of reviews. It is reasonable to assume that the entry with the highest number of reviews is the most recent one, so this is the entry we are going to keep.
# Creating a dictionary to store each name and its maximum number of reviews
reviews_max = {}
# Looping through the data to populate the dictionary
for app in android_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
# Seeing if everything went fine
print("Size of reviews_max dictionary:", len(reviews_max))
Size of reviews_max dictionary: 9659
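The loop below checks two things for each entry: whether its number of reviews equals the maximum stored in reviews_max for that app, and whether the app name has not been added yet (we keep track of that with the already_added list). Once an entry has the maximum number of reviews and hasn't been added, it is stored in android_clean, the new list created to receive the cleaned data.
Finally, we print the length of the list to see if everything went fine (it has to be the same length as the reviews_max dictionary we built above).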
# Creating a list to store our cleaned data
android_clean = []
# Creating a helper list to store the app names already added
already_added = []
# Looping through the data to clean it
for app in android_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
print("Size of android_clean list:", len(android_clean))
Size of android_clean list: 9659
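The two passes above can also be condensed into a single dictionary that keeps, for each name, the row with the most reviews. The following is a minimal sketch on made-up rows, not the notebook's own method: `keep_most_reviewed` is a hypothetical helper, and the sample rows only mimic the first four columns of the Play Store data. On ties it keeps the first row seen, as the `already_added` check does, but the result is ordered by the first appearance of each name rather than by the position of the kept row.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Return one row per name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Replace the stored row only when this one has strictly more reviews
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())


# Made-up sample rows: name, category, rating, reviews
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Notes', 'TOOLS', '4.0', '120'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]

deduped = keep_most_reviewed(sample)
for row in deduped:
    print(row)
```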
Using a similar approach for the App Store data, we are going to put the unique ids in the list unique and the repeated ones in the list duplicated. Then we'll check the length of each.
# Creating the lists
unique = []
duplicated = []
# Populating them with the unique and duplicated values
for app in ios_data:
    app_id = app[0]
    if app_id in unique:
        duplicated.append(app_id)
    else:
        unique.append(app_id)
# Checking how it looks
print("Unique values: {} \nDuplicated values: {}".format(len(unique), len(duplicated)))
Unique values: 7198
Duplicated values: 0
As we can see, there are no duplicated values in this dataset. But another issue comes up: since our company wants to develop an app for an English-speaking audience, we need to remove the apps that are not in English, and both datasets contain this type of entry.
To see that this is a real issue, let's look at the examples below.
# Printing non-english apps in the ios database
print("Apple Store non-english apps:\n \n{} \n{}\n".format(ios_data[814][1], ios_data[6732][1]))
# Printing non-english apps in the android database
print("\nGoogle Play Store non-english apps:\n \n{} \n{}".format(android_clean[4412][0],android_clean[7940][0]))
Apple Store non-english apps:

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き&ブロックパズル〜

Google Play Store non-english apps:

中国語 AQリスニング
لعبة تقدر تربح DZ
To do so, we need to remove all the apps whose names contain characters that are not used in English. We can detect them through the code point of each character (returned by the ord() built-in function): the characters commonly used in English text fall in the range 0 to 127 (the ASCII range).
With that in mind, we are going to create a function that checks whether an app name has non-English characters.
# Loops over the chars and returns False as soon as a non-english character is found
def english_char(a_string):
    for char in a_string:
        if ord(char) > 127:
            return False
    return True
# Let's check some values
print("'Instagram' is an english-language app.\t\t\t\t", english_char('Instagram'))
print("'爱奇艺PPS -《欢乐颂2》电视剧热播' is an english-language app.\t", english_char('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print("'Docs To Go™ Free Office Suite' is an english-language app.\t", english_char('Docs To Go™ Free Office Suite'))
print("'Instachat 😜' is an english-language app.\t\t\t", english_char('Instachat 😜'))
'Instagram' is an english-language app. True
'爱奇艺PPS -《欢乐颂2》电视剧热播' is an english-language app. False
'Docs To Go™ Free Office Suite' is an english-language app. False
'Instachat 😜' is an english-language app. False
The reason our function classifies 'Instachat 😜' and 'Docs To Go™ Free Office Suite' as non-English apps is that emojis and characters like ™ fall outside the ASCII range and have code points over 127.
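To make this concrete, here is a small sketch that prints the code points of a few sample characters. The sample list is chosen only for illustration; `ord()` is the same built-in used in the function above.

```python
# Code points of a few sample characters; anything above 127 is outside ASCII
samples = ['a', 'Z', '™', '😜']
code_points = {ch: ord(ch) for ch in samples}

for ch, point in code_points.items():
    label = 'ASCII' if point <= 127 else 'non-ASCII'
    print(repr(ch), point, label)
```

The letters come out below 128, while ™ (8482) and the emoji (128540) are far above it, which is why a single such character is enough for the strict version of the function to return False.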
We cannot exclude these apps from our dataset, since they are really useful. As a compromise, we will tolerate a few special characters: a name is rejected only when it contains 3 or more of them. Below we rewrite our function to do that.
# Loops over the chars and returns False when there are 3 or more non-english characters
def english_char(a_string):
    special_char = 0
    for char in a_string:
        if ord(char) > 127:
            special_char += 1
    return special_char < 3
# Let's check some values
print("'Instagram' is an english-language app.\t\t\t\t", english_char('Instagram'))
print("'爱奇艺PPS -《欢乐颂2》电视剧热播' is an english-language app.\t", english_char('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print("'Docs To Go™ Free Office Suite' is an english-language app.\t", english_char('Docs To Go™ Free Office Suite'))
print("'Instachat 😜' is an english-language app.\t\t\t", english_char('Instachat 😜'))
'Instagram' is an english-language app. True
'爱奇艺PPS -《欢乐颂2》电视剧热播' is an english-language app. False
'Docs To Go™ Free Office Suite' is an english-language app. True
'Instachat 😜' is an english-language app. True
Now we'll use our new function to filter the apps that are in English language and make our dataset cleaner.
# Cleaning the android data
# Note: android_clean has no header row, so the [1:] slice below also skips its first app
play_store_data = []
for app in android_clean[1:]:
    name = app[0]
    if english_char(name):
        play_store_data.append(app)
a = len(android_clean)
b = len(play_store_data)
print("The new clean dataset has {} entries and the older had {}, {} entries were removed.".format(b, a, a - b))
The new clean dataset has 9596 entries and the older had 9659, 63 entries were removed.
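# Cleaning the ios data
# Note: english_char is applied to the id (index 0), which contains only digits,
# so every app passes this check; the app name is stored in track_name (index 1).
# The single entry reported as removed below is the header row.
app_store_data = []
for app in ios_data[1:]:
    app_id = app[0]
    if english_char(app_id):
        app_store_data.append(app)
a = len(ios_data)
b = len(app_store_data)
print("The new clean dataset has {} entries and the older had {}, {} entries were removed.".format(b, a, a - b))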
The new clean dataset has 7197 entries and the older had 7198, 1 entries were removed.
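Note that the App Store loop above passes `app[0]` (the id) to `english_char`, and ids are plain digits, so that check passes for every row. The sketch below contrasts filtering on the id with filtering on the name column (`track_name`, index 1). It is only an illustration: the two rows are made up, and `by_id` and `by_name` are hypothetical names. Applying the name-based filter to the real data would change the App Store counts reported in the rest of this analysis.

```python
def english_char(a_string):
    """Same rule as above: fewer than 3 characters outside the ASCII range."""
    special_char = 0
    for char in a_string:
        if ord(char) > 127:
            special_char += 1
    return special_char < 3


# Made-up rows in the App Store layout: id at index 0, track_name at index 1
sample_ios = [
    ['1', 'Instagram'],
    ['2', '爱奇艺PPS -《欢乐颂2》电视剧热播'],
]

by_id = [row for row in sample_ios if english_char(row[0])]
by_name = [row for row in sample_ios if english_char(row[1])]

print(len(by_id))    # both ids are plain digits, so both rows pass
print(len(by_name))  # only the row named 'Instagram' passes
```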
Now that this is done, there is just one more thing to do. Since our company wants to create a free app with ads in it, we need to separate the free apps from the paid ones. That's the last step of our data cleaning process.
First we will check in which columns the prices are in both datasets.
print(columns_android)
print('\n')
print(columns_ios)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
Looking at the lists, we can see that we need the 8th column (index 7) of the Android data and the 5th column (index 4) of the iOS data. Knowing that, we can start the separation.
# Creating lists to store the free apps
free_android_apps = []
free_ios_apps = []
for app in play_store_data:
    price = float(app[7].replace('$',''))
    if price == 0.00:
        free_android_apps.append(app)
for app in app_store_data:
    price = float(app[4].replace('$',''))
    if price == 0.00:
        free_ios_apps.append(app)
print('number of free android apps:',len(free_android_apps))
print('number of all android apps:', len(play_store_data))
print('\nnumber of free ios apps:',len(free_ios_apps))
print('number of all ios apps:', len(app_store_data))
number of free android apps: 8847
number of all android apps: 9596

number of free ios apps: 4056
number of all ios apps: 7197
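The price test used in both loops above can be isolated in a tiny helper. This is a sketch under assumptions: `is_free` is a hypothetical name, and the sample price strings are made up to resemble the two formats (with and without a leading `$`).

```python
def is_free(price_text):
    """Return True when a price string such as '0', '0.0' or '$4.99' equals zero."""
    # Strip the currency sign before converting, as in the loops above
    return float(price_text.replace('$', '')) == 0.0


# Made-up price strings for illustration
sample_prices = ['0', '0.0', '$4.99', '2.99']
free_flags = [is_free(p) for p in sample_prices]
print(free_flags)
```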
Thinking about the primary objective (launching an app on the App Store and the Play Store), we need to evaluate both platforms. To minimize the risk, the validation strategy for an app idea comprises three steps (as described in our DataQuest course):
To do so, we are going to use the Genres column (index 9) and the Category column (index 1) in the Google Play Store dataset, and the prime_genre column (index 11) in our App Store dataset.
To analyze it this way, we are going to make two functions: one to make the frequency tables with percentages and another to put the percentages in descending order.
# Making a function that creates frequency tables that show percentages
def freq_table(dataset, index):
    frequency_table = {}
    total = len(dataset)
    for row in dataset:
        entry = row[index]
        if entry in frequency_table:
            frequency_table[entry] += 1
        else:
            frequency_table[entry] = 1
    for key in frequency_table:
        frequency_table[key] /= total/100
        frequency_table[key] = round(frequency_table[key], 4)
    return frequency_table
# Let's see if it worked by printing the first entries of the dictionary (it's kind of tricky)
print(list((freq_table(free_android_apps,1)).items())[:5])
[('ART_AND_DESIGN', 0.633), ('AUTO_AND_VEHICLES', 0.9269), ('BEAUTY', 0.5991), ('BOOKS_AND_REFERENCE', 2.1363), ('BUSINESS', 4.6004)]
Above we created the dictionary, and it worked. Since dictionaries cannot be sliced, we converted its items to a list and printed a slice of that list; the dictionary itself was not changed, we just printed it in a different way. Next, we will display the tables in descending order for both stores.
# Making a function that displays the percentages in descending order
# This function was provided by DataQuest; they note that this step is easier with other tools
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
# Display table for free_android_apps genres
display_table(free_android_apps, 9)
Tools : 8.4435
Entertainment : 6.0812
Education : 5.3577
Business : 4.6004
Productivity : 3.8996
Lifestyle : 3.877
Finance : 3.7075
Medical : 3.5379
Sports : 3.4588
Personalization : 3.3232
Communication : 3.2327
Action : 3.0971
Health & Fitness : 3.0858
Photography : 2.9502
News & Magazines : 2.8032
Social : 2.6676
Travel & Local : 2.3285
Shopping : 2.2494
Books & Reference : 2.1363
Simulation : 2.0459
Dating : 1.865
Arcade : 1.8424
Video Players & Editors : 1.7746
Casual : 1.7633
Maps & Navigation : 1.3903
Food & Drink : 1.2434
Puzzle : 1.1303
Racing : 0.9947
Role Playing : 0.9382
Libraries & Demo : 0.9382
Auto & Vehicles : 0.9269
Strategy : 0.9156
House & Home : 0.8025
Weather : 0.7912
Events : 0.7121
Adventure : 0.6669
Comics : 0.5991
Beauty : 0.5991
Art & Design : 0.5878
Parenting : 0.4973
Card : 0.4521
Trivia : 0.4182
Casino : 0.4182
Educational;Education : 0.3956
Board : 0.3843
Educational : 0.373
Education;Education : 0.3391
Word : 0.26
Casual;Pretend Play : 0.2374
Music : 0.2035
Racing;Action & Adventure : 0.1695
Puzzle;Brain Games : 0.1695
Entertainment;Music & Video : 0.1695
Casual;Brain Games : 0.1356
Casual;Action & Adventure : 0.1356
Arcade;Action & Adventure : 0.1243
Action;Action & Adventure : 0.1017
Educational;Pretend Play : 0.0904
Simulation;Action & Adventure : 0.0791
Parenting;Education : 0.0791
Entertainment;Brain Games : 0.0791
Board;Brain Games : 0.0791
Parenting;Music & Video : 0.0678
Educational;Brain Games : 0.0678
Casual;Creativity : 0.0678
Art & Design;Creativity : 0.0678
Education;Pretend Play : 0.0565
Role Playing;Pretend Play : 0.0452
Education;Creativity : 0.0452
Role Playing;Action & Adventure : 0.0339
Puzzle;Action & Adventure : 0.0339
Entertainment;Creativity : 0.0339
Entertainment;Action & Adventure : 0.0339
Educational;Creativity : 0.0339
Educational;Action & Adventure : 0.0339
Education;Music & Video : 0.0339
Education;Brain Games : 0.0339
Education;Action & Adventure : 0.0339
Adventure;Action & Adventure : 0.0339
Video Players & Editors;Music & Video : 0.0226
Sports;Action & Adventure : 0.0226
Simulation;Pretend Play : 0.0226
Puzzle;Creativity : 0.0226
Music;Music & Video : 0.0226
Entertainment;Pretend Play : 0.0226
Casual;Education : 0.0226
Board;Action & Adventure : 0.0226
Video Players & Editors;Creativity : 0.0113
Trivia;Education : 0.0113
Travel & Local;Action & Adventure : 0.0113
Tools;Education : 0.0113
Strategy;Education : 0.0113
Strategy;Creativity : 0.0113
Strategy;Action & Adventure : 0.0113
Simulation;Education : 0.0113
Role Playing;Brain Games : 0.0113
Racing;Pretend Play : 0.0113
Puzzle;Education : 0.0113
Parenting;Brain Games : 0.0113
Music & Audio;Music & Video : 0.0113
Lifestyle;Pretend Play : 0.0113
Lifestyle;Education : 0.0113
Health & Fitness;Education : 0.0113
Health & Fitness;Action & Adventure : 0.0113
Entertainment;Education : 0.0113
Communication;Creativity : 0.0113
Comics;Creativity : 0.0113
Casual;Music & Video : 0.0113
Card;Action & Adventure : 0.0113
Books & Reference;Education : 0.0113
Art & Design;Pretend Play : 0.0113
Art & Design;Action & Adventure : 0.0113
Arcade;Pretend Play : 0.0113
Adventure;Education : 0.0113
# Display table for free_android_apps categories
display_table(free_android_apps, 1)
FAMILY : 18.9443
GAME : 9.6982
TOOLS : 8.4548
BUSINESS : 4.6004
PRODUCTIVITY : 3.8996
LIFESTYLE : 3.8883
FINANCE : 3.7075
MEDICAL : 3.5379
SPORTS : 3.391
PERSONALIZATION : 3.3232
COMMUNICATION : 3.2327
HEALTH_AND_FITNESS : 3.0858
PHOTOGRAPHY : 2.9502
NEWS_AND_MAGAZINES : 2.8032
SOCIAL : 2.6676
TRAVEL_AND_LOCAL : 2.3398
SHOPPING : 2.2494
BOOKS_AND_REFERENCE : 2.1363
DATING : 1.865
VIDEO_PLAYERS : 1.7972
MAPS_AND_NAVIGATION : 1.3903
FOOD_AND_DRINK : 1.2434
EDUCATION : 1.1642
ENTERTAINMENT : 0.9608
LIBRARIES_AND_DEMO : 0.9382
AUTO_AND_VEHICLES : 0.9269
HOUSE_AND_HOME : 0.8025
WEATHER : 0.7912
EVENTS : 0.7121
PARENTING : 0.6556
ART_AND_DESIGN : 0.633
COMICS : 0.6104
BEAUTY : 0.5991
# Display table for free_ios_apps prime_genres
display_table(free_ios_apps, 11)
Games : 55.646
Entertainment : 8.2347
Photo & Video : 4.1174
Social Networking : 3.5256
Education : 3.2544
Shopping : 2.9832
Utilities : 2.6874
Lifestyle : 2.3176
Finance : 2.071
Sports : 1.9477
Health & Fitness : 1.8738
Music : 1.6519
Book : 1.6272
Productivity : 1.5286
News : 1.43
Travel : 1.3807
Food & Drink : 1.0602
Weather : 0.7643
Reference : 0.4931
Navigation : 0.4931
Business : 0.4931
Catalogs : 0.2219
Medical : 0.1972
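The sorted percentage tables above can also be produced with the standard library's `collections.Counter`, which counts values and returns them ordered by frequency. The sketch below uses a made-up sample rather than the real datasets, and `sorted_percentages` is a hypothetical helper name, not part of the original notebook.

```python
from collections import Counter


def sorted_percentages(dataset, index):
    """Return (value, percentage) pairs for one column, most frequent first."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    # most_common() yields the values in descending order of their counts
    return [(value, round(count / total * 100, 4))
            for value, count in counts.most_common()]


# Made-up rows: app name at index 0, category at index 1
sample = [['a', 'TOOLS'], ['b', 'GAME'], ['c', 'TOOLS'], ['d', 'TOOLS']]
table = sorted_percentages(sample, 1)
for value, pct in table:
    print(value, ':', pct)
```

One difference from `display_table`: values with equal counts keep the order in which they first appear, instead of being sorted in reverse alphabetical order.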
Relevant questions to answer about our data:
With that in mind, we can see that the App Store is dominated by apps made for fun: games alone account for about 56% of the free apps, and entertainment for another 8%. The Play Store, on the other hand, is more balanced, with productivity, education and business among its top 5 genres. But this data isn't enough to set a strategy.
We need to know how many people have downloaded these apps.
The Play Store dataset has a column called Installs (index 5) that already gives us this information. The App Store dataset does not have it, so the best available substitute is the total number of user ratings in the rating_count_tot column (index 5).
Next we are going to calculate the average number of user ratings per app genre in App Store.
# Making a frequency table for prime_genre
prime_genre_freq = freq_table(free_ios_apps, 11)
for genre in prime_genre_freq:
    total = 0
    len_genre = 0
    for app in free_ios_apps:
        genre_app = app[11]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_ratings = round(total/len_genre, 4)
    genre_ratings = [genre, avg_ratings]
    print(genre, genre_ratings)
Social Networking ['Social Networking', 53078.1958]
Photo & Video ['Photo & Video', 27249.8922]
Games ['Games', 18924.689]
Music ['Music', 56482.0299]
Reference ['Reference', 67447.9]
Health & Fitness ['Health & Fitness', 19952.3158]
Weather ['Weather', 47220.9355]
Utilities ['Utilities', 14010.1009]
Travel ['Travel', 20216.0179]
Shopping ['Shopping', 18746.6777]
News ['News', 15892.7241]
Navigation ['Navigation', 25972.05]
Lifestyle ['Lifestyle', 8978.3085]
Entertainment ['Entertainment', 10822.9611]
Food & Drink ['Food & Drink', 20179.093]
Sports ['Sports', 20128.9747]
Book ['Book', 8498.3333]
Finance ['Finance', 13522.2619]
Education ['Education', 6266.3333]
Productivity ['Productivity', 19053.8871]
Business ['Business', 6367.8]
Catalogs ['Catalogs', 1779.5556]
Medical ['Medical', 459.75]
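The nested loop above rescans the whole list once per genre. As an alternative sketch, the same averages can be accumulated in a single pass with two dictionaries. `average_by_group` is a hypothetical helper and the rows are made up for illustration; the column indexes are passed as parameters.

```python
from collections import defaultdict


def average_by_group(dataset, group_idx, value_idx):
    """Average the numeric column value_idx for each distinct value of group_idx."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    return {group: round(totals[group] / counts[group], 4) for group in totals}


# Made-up rows: genre at index 0, number of ratings at index 1
sample = [['Music', '100'], ['Music', '300'], ['Games', '50']]
averages = average_by_group(sample, 0, 1)
print(averages)
```

For the App Store data the call would look like `average_by_group(free_ios_apps, 11, 5)`.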
Now we need to evaluate the number of installs of the Play Store apps. The dataset does not give an exact number for each app (the values look like '1,000+'), so we are going to work with what we have. Once we extract these values, we will clean them and print the averages so we can analyze them.
# Making a frequency table for Category (PlayStore)
category_freq = freq_table(free_android_apps, 1)
for category in category_freq:
    total = 0
    len_category = 0
    for app in free_android_apps:
        category_app = app[1]
        if category_app == category:
            # Turn a string such as '1,000+' into the number 1000.0
            n_installs = float((app[5].replace('+', '')).replace(',',''))
            total += n_installs
            len_category += 1
    avg_installs = round(total/len_category, 4)
    print(category, avg_installs)
ART_AND_DESIGN 2021626.7857
AUTO_AND_VEHICLES 647317.8171
BEAUTY 513151.8868
BOOKS_AND_REFERENCE 8814199.7884
BUSINESS 1712290.1474
COMICS 832613.8889
COMMUNICATION 38590581.0874
DATING 854028.8303
EDUCATION 1833495.1456
ENTERTAINMENT 11640705.8824
EVENTS 253542.2222
FINANCE 1387692.4756
FOOD_AND_DRINK 1924897.7364
HEALTH_AND_FITNESS 4188821.9853
HOUSE_AND_HOME 1360598.0423
LIBRARIES_AND_DEMO 638503.7349
LIFESTYLE 1446158.2238
GAME 15544014.5105
FAMILY 3695641.8198
MEDICAL 120550.6198
SOCIAL 23253652.1271
SHOPPING 7036877.3116
PHOTOGRAPHY 17840110.4023
SPORTS 3650602.2767
TRAVEL_AND_LOCAL 13984077.7101
TOOLS 10830251.9706
PERSONALIZATION 5201482.6122
PRODUCTIVITY 16787331.3449
PARENTING 542603.6207
WEATHER 5145550.2857
VIDEO_PLAYERS 24727872.4528
NEWS_AND_MAGAZINES 9549178.4677
MAPS_AND_NAVIGATION 4049274.6341
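The install strings handled above are open-ended ranges rather than exact counts, so the converted numbers are best read as lower bounds. A minimal sketch of the conversion on its own, where `parse_installs` is a hypothetical helper name and the sample strings follow the format seen in the Installs column:

```python
def parse_installs(text):
    """Convert an Installs string such as '1,000,000+' to a float lower bound."""
    # Remove the trailing plus sign and the thousands separators
    return float(text.replace('+', '').replace(',', ''))


sample_installs = ['1,000+', '1,000,000,000+']
parsed = [parse_installs(s) for s in sample_installs]
print(parsed)
```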
# Making a frequency table for the Genres column (PlayStore)
genre_freq = freq_table(free_android_apps, 9)
for genre in genre_freq:
    total = 0
    len_genre = 0
    for app in free_android_apps:
        genre_app = app[9]
        if genre_app == genre:
            # Turn a string such as '1,000+' into the number 1000.0
            n_installs = float((app[5].replace('+', '')).replace(',',''))
            total += n_installs
            len_genre += 1
    avg_installs = round(total/len_genre, 4)
    print(genre, avg_installs)
Art & Design 2163482.6923
Art & Design;Creativity 285000.0
Auto & Vehicles 647317.8171
Beauty 513151.8868
Books & Reference 8814199.7884
Business 1712290.1474
Comics 847380.1887
Comics;Creativity 50000.0
Communication 38590581.0874
Dating 854028.8303
Education 550185.443
Education;Creativity 2875000.0
Education;Education 4759517.0
Education;Pretend Play 1800000.0
Education;Brain Games 5333333.3333
Entertainment 5602792.7751
Entertainment;Brain Games 3314285.7143
Entertainment;Creativity 4000000.0
Entertainment;Music & Video 6413333.3333
Events 253542.2222
Finance 1387692.4756
Food & Drink 1924897.7364
Health & Fitness 4188821.9853
House & Home 1360598.0423
Libraries & Demo 638503.7349
Lifestyle 1421219.9096
Lifestyle;Pretend Play 10000000.0
Card 3815462.5
Arcade 23028171.411
Puzzle 8302861.91
Racing 15910645.6818
Sports 4611701.5784
Casual 19569221.6026
Simulation 3475484.0884
Adventure 4158764.7458
Trivia 3475712.7027
Action 12467105.6204
Word 9094458.6957
Role Playing 3965645.4217
Strategy 11199902.5309
Board 4759209.1176
Music 9445583.3333
Action;Action & Adventure 5888888.8889
Casual;Brain Games 1425916.6667
Educational;Creativity 2333333.3333
Puzzle;Brain Games 9280666.6667
Educational;Education 1737143.1429
Casual;Pretend Play 6957142.8571
Educational;Brain Games 4433333.3333
Art & Design;Pretend Play 500000.0
Educational;Pretend Play 9375000.0
Entertainment;Education 1000000.0
Casual;Education 1000000.0
Casual;Creativity 5333333.3333
Casual;Action & Adventure 12916666.6667
Music;Music & Video 5050000.0
Arcade;Pretend Play 1000000.0
Adventure;Action & Adventure 35333333.3333
Role Playing;Action & Adventure 7000000.0
Simulation;Pretend Play 550000.0
Puzzle;Creativity 750000.0
Simulation;Action & Adventure 4857142.8571
Racing;Action & Adventure 8816666.6667
Sports;Action & Adventure 5050000.0
Educational;Action & Adventure 17016666.6667
Arcade;Action & Adventure 3190909.1818
Entertainment;Action & Adventure 2333333.3333
Art & Design;Action & Adventure 100000.0
Puzzle;Action & Adventure 18366666.6667
Education;Action & Adventure 1000000.0
Strategy;Action & Adventure 1000000.0
Music & Audio;Music & Video 500000.0
Health & Fitness;Education 100000.0
Board;Action & Adventure 3000000.0
Board;Brain Games 407142.8571
Casual;Music & Video 10000000.0
Education;Music & Video 2033333.3333
Role Playing;Pretend Play 5275000.0
Entertainment;Pretend Play 3000000.0
Medical 120550.6198
Social 23253652.1271
Shopping 7036877.3116
Photography 17840110.4023
Travel & Local 14051476.1456
Travel & Local;Action & Adventure 100000.0
Tools 10831363.419
Tools;Education 10000000.0
Personalization 5201482.6122
Productivity 16787331.3449
Parenting 467977.5
Parenting;Music & Video 1118333.3333
Parenting;Education 452857.1429
Parenting;Brain Games 1000000.0
Weather 5145550.2857
Video Players & Editors 24947335.7962
Video Players & Editors;Music & Video 7500000.0
Video Players & Editors;Creativity 5000000.0
News & Magazines 9549178.4677
Maps & Navigation 4049274.6341
Health & Fitness;Action & Adventure 1000000.0
Educational 411184.8485
Casino 3520421.6216
Trivia;Education 100.0
Lifestyle;Education 100000.0
Card;Action & Adventure 10000000.0
Books & Reference;Education 1000.0
Simulation;Education 500.0
Puzzle;Education 100000.0
Adventure;Education 10000000.0
Role Playing;Brain Games 10000000.0
Strategy;Education 500000.0
Racing;Pretend Play 1000000.0
Communication;Creativity 500000.0
Strategy;Creativity 1000000.0
To refine our decision and make it more tangible, let's see what the data looks like for two candidate categories: books and reference, and music.
# Making a list with all book and reference android apps
books_android = []
for app in free_android_apps:
    category = app[1]
    if category == 'BOOKS_AND_REFERENCE':
        books_android.append(app)
# Making a list with all music android apps
music_android = []
for app in free_android_apps:
    genre = app[9]
    if 'Music' in genre:
        music_android.append(app)
# Making a list with all book and reference ios apps
books_ios = []
for app in free_ios_apps:
    category = app[11]
    if category in ('Reference', 'Book'):
        books_ios.append(app)
# Making a list with all music ios apps
music_ios = []
for app in free_ios_apps:
    category = app[11]
    if category == 'Music':
        music_ios.append(app)
print("Length of books_android (and references):", len(books_android))
print("Length of music_android:", len(music_android))
print('\n')
print("Length of books_ios (and references):", len(books_ios))
print("Length of music_ios:", len(music_ios))
Length of books_android (and references): 189
Length of music_android: 48


Length of books_ios (and references): 86
Length of music_ios: 67
Now let's put the figures for these two categories in a table. The totals are rebuilt from the per-group averages printed above and the number of apps in each group; for the App Store the figures are user ratings, used as a proxy for downloads.
Market | Category | Number of Apps | Total Installs (Play Store) or Ratings (App Store) | Avg. per App |
---|---|---|---|---|
Google Play Store | Books & Reference | 189 | 1,665,883,760 | 8,814,199.79 |
App Store | Books & Reference | 86 | 1,909,848 | 22,207.53 |
Google Play Store | Music | 48 | 314,630,500 | 6,554,802.08 |
App Store | Music | 67 | 3,784,296 | 56,482.03 |
Looking at the data (and keeping in mind that the average per app is only an estimate, since installs are reported as ranges and the App Store figures are ratings rather than downloads), a good idea would be a Music app: the number of such apps is relatively small (48 and 67) and they attract a large number of users. An interesting Music app with an efficient and unobtrusive in-app ad system could be a real success.