Source: Unsplash (Rami Al-zayat)
With the explosive growth of the smartphone over the last decade there has been a significant surge in app development. If you've thought of it, there is most likely an app for it.
Competition has forced developers to deliver apps that are not only highly functional but also provide basic features for free. Offering free features is essential unless the app targets a niche market (of which almost none exist) or is tailored for specific devices. This forces companies to create apps whose main source of revenue is ads.
The goal of this project is to analyze data from Google's Play Store and Apple's App Store and identify app profiles that:
The expected end result is to identify those app classes that meet the above goals and consider the same for future app development.
Reading in the data from the data files
The focus for this project will be apps on Google's Play Store and Apple's App Store. As of 2018 the Google Play Store had more than 2.1 million apps while the App Store had about 2 million apps. Because of the volume of data, and because this is a learning project, the data sets we will be using contain a sample of about 10,000 apps for analysis. The links to the original data sets can be found below.
#Read the data from the sample datasets
from csv import reader
#App store data
opened_file = open('AppleStore.csv',encoding="utf-8")
read_file = reader(opened_file)
apple_list = list(read_file)
#Playstore data
opened_file = open('googleplaystore.csv', encoding="utf-8")
read_file = reader(opened_file)
google_list = list(read_file)
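The two reads above leave both file handles open. As a minimal sketch of an alternative, the hypothetical helper `load_csv` below (it is not part of the original notebook) wraps the read in a `with` block so the file is closed automatically, and returns the header separately from the data rows:

```python
from csv import reader

def load_csv(path, encoding="utf-8"):
    """Hypothetical helper: return (header, rows) for a CSV file."""
    # newline="" is what the csv module documentation recommends
    # for file objects that are handed to reader()
    with open(path, encoding=encoding, newline="") as opened_file:
        rows = list(reader(opened_file))
    # First row is the header, everything after it is data
    return rows[0], rows[1:]
```

The rest of this walkthrough keeps the header as the first element of `apple_list` and `google_list`, so this helper shows a different layout rather than a drop-in replacement.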
def explore_data(dataset, start, end, rows_and_columns=False):
    """
    Helps with quick analysis of Playstore and App Store data
    by displaying the data slice specified by the user along with the column names.
    Note: the first row of the slice is not printed, so start should be
    0 (the header) or one less than the first row to display.
    Args:
        dataset (list): Data the user wants to analyse
        start (int): Start row of the data slice
        end (int): End row of the data slice
        rows_and_columns (boolean): If True, prints the number of rows and columns of the whole dataset
    """
    dataset_slice = dataset[start:end]
    print(dataset[0])
    print('\n')
    # Skip the first row of the slice (it is the header when start is 0)
    for row in dataset_slice[1:]:
        print(row)
        print('\n') # adds a new (empty) line after each row
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
explore_data(apple_list, 0, 4, True)
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
Number of rows: 7198
Number of columns: 16
explore_data(google_list, 0, 4, True)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
Number of rows: 10842
Number of columns: 13
Based on the above listing there are many columns from both stores that could help in our analysis. A more detailed description has been provided in the Column tab of the pages whose links have been provided at the beginning of this section.
Filling in missing data and removing duplicates
Before proceeding to data analysis, we need to ensure that the data is relevant and accurate. On relevance, note that the analysis focuses on free apps aimed at an English-speaking audience.
Both the data sets have a discussion section.
The discussions here should give some idea of issues found by others in specific areas.
One of the discussions for the Playstore data mentions that the Rating value is missing for one app, causing the remaining columns to shift to the left. Further analysis is required to determine whether the problem exists as described.
explore_data(google_list, 10472, 10474)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
As mentioned, a value is missing; however, closer inspection reveals that the missing value is the category to which the app belongs, not its rating. A quick check on Google reveals that the category is Lifestyle, so the value can be inserted.
google_list[10473] = ['Life Made WI-Fi Touchscreen Photo Frame', 'Lifestyle', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
To ensure that the App Store data does not have any such issues, we can verify that each row has the same number of columns as the header.
#Verify the App store data for missing information
row_with_missing_column = []
#enumerate from 1 so the stored index matches the row's position in apple_list
for row_index, each_app in enumerate(apple_list[1:], start=1):
    if len(each_app) != len(apple_list[0]):
        row_with_missing_column.append(row_index)
if len(row_with_missing_column) == 0:
    print("No rows with missing columns")
else:
    print(row_with_missing_column)
No rows with missing columns
That makes it clear that every row in the App Store data has all the required columns. Further reading of the discussions for the Google Playstore dataset reveals that there are multiple instances of duplicate data. Before taking any action, it is essential to verify the degree to which this issue affects the dataset.
def print_list(a_list):
    """
    Prints each row in a list.
    Args:
        a_list (list): The list to be printed
    """
    for row in a_list:
        print(row)
def bold_print(a_string,a_value=None):
    """
    Boldens the output
    Args:
        a_string (string): String to be bolded
        a_value: Currently unused
    """
    print("\033[1m"+a_string+"\033[0m"+'\n')
unique_apps = []
duplicate_apps = []
for an_app in google_list[1:]:
    app_name = an_app[0]
    if app_name in unique_apps:
        duplicate_apps.append(app_name)
    else:
        unique_apps.append(app_name)
print("Number of instances of duplicate apps:",len(duplicate_apps))
Number of instances of duplicate apps: 1181
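The loop above checks membership in a list, which gets slower as the list grows. As a hedged alternative sketch, `collections.Counter` can produce the same kind of count (every occurrence of a name after its first). The helper name `duplicate_count` and the sample names are illustrative only:

```python
from collections import Counter

def duplicate_count(names):
    """Hypothetical helper: number of entries that repeat an earlier name."""
    counts = Counter(names)
    # An app listed n times contributes n - 1 duplicate entries
    return sum(count - 1 for count in counts.values() if count > 1)

sample_names = ["Box", "Slack", "Box", "Box", "Zenefits"]
print(duplicate_count(sample_names))  # Box appears three times: 2 duplicates
```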
print("\033[1m"+"Some of the apps with multiple entries:"+"\033[0m")
print_list(duplicate_apps[:20])
Some of the apps with multiple entries:
Quick PDF Scanner + OCR FREE
Box
Google My Business
ZOOM Cloud Meetings
join.me - Simple Meetings
Box
Zenefits
Google Ads
Google My Business
Slack
FreshBooks Classic
Insightly CRM
QuickBooks Accounting: Invoicing & Expenses
HipChat - Chat Built for Teams
Xero Accounting Software
MailChimp - Email, Marketing Automation
Crew - Free Messaging and Scheduling
Asana: organize team projects
Google Analytics
AdWords Express
bold_print("Example of an app with duplicate records:")
print(google_list[0])
for apps in google_list[1:]:
    name = apps[0]
    if name == 'Box':
        print(apps)
Example of an app with duplicate records:
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
As can be seen above, there are multiple instances of some apps. One criterion for choosing which instance to keep is the number of reviews.
An app with a large number of reviews is a sign that the app is in use. So for each app, the instance with the most reviews will be kept.
#Identify the highest number of reviews recorded for each app in the Playstore dataset
reviews_max = {}
for app in google_list[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name]<n_reviews:
        reviews_max[name] = n_reviews
    if name not in reviews_max:
        reviews_max[name] = n_reviews
bold_print("Some of the apps and their review counts:")
print_list(list(reviews_max.items())[:10])
Some of the apps and their review counts:
('Photo Editor & Candy Camera & Grid & ScrapBook', 159.0)
('Coloring book moana', 974.0)
('U Launcher Lite – FREE Live Cool Themes, Hide Apps', 87510.0)
('Sketch - Draw & Paint', 215644.0)
('Pixel Draw - Number Art Coloring Book', 967.0)
('Paper flowers instructions', 167.0)
('Smoke Effect Photo Maker - Smoke Editor', 178.0)
('Infinite Painter', 36815.0)
('Garden Coloring Book', 13791.0)
('Kids Paint Free - Drawing Fun', 121.0)
Now that we have a dictionary containing the highest number of reviews for each app, it becomes easier to select a single instance for the apps with multiple entries.
#Remove all instances of the same app that do not have the most reviews
android_clean = []
already_added = []
for row in google_list[1:]:
    name = row[0]
    n_reviews = float(row[3])
    #already_added guards against identical rows that share the highest review count
    if reviews_max[name] == n_reviews and name not in already_added:
        android_clean.append(row)
        already_added.append(name)
bold_print("Duplicates removed.")
Duplicates removed.
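The two passes above (build `reviews_max`, then filter) can also be folded into one. The sketch below is one possible way to do that, not the method used in this notebook. The helper name `keep_most_reviewed` is hypothetical and the Slack review counts are illustrative values rather than figures from the dataset:

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Hypothetical helper: keep one row per app name, the one with the most reviews."""
    best = {}  # app name -> row with the highest review count seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the earlier row when review counts tie
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    # dicts preserve insertion order (Python 3.7+), so apps stay in first-seen order
    return list(best.values())

sample = [
    ["Box", "BUSINESS", "4.2", "159872"],
    ["Slack", "BUSINESS", "4.4", "51507"],
    ["Slack", "BUSINESS", "4.4", "51510"],
    ["Box", "BUSINESS", "4.2", "159872"],
]
deduplicated = keep_most_reviewed(sample)
print([row[0] for row in deduplicated])
```

One difference worth noting: this sketch orders apps by where their name first appears, whereas the loop above orders them by where the most-reviewed row appears.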
To verify that the cleaned list no longer has multiple instances of the same app, we can rerun the duplicate count over the cleaned set.
#Verify that there are no apps with multiple instances in the Playstore dataset
unique_apps = []
duplicate_apps = []
#android_clean has no header row, so every row is checked
for an_app in android_clean:
    app_name = an_app[0]
    if app_name in unique_apps:
        duplicate_apps.append(app_name)
    else:
        unique_apps.append(app_name)
print("Number of instances of duplicate apps:",len(duplicate_apps))
Number of instances of duplicate apps: 0
Seeing as we have cleaned the android set it would be beneficial to find out whether the apple data set suffers from such duplicates.
#Verify the App store dataset for existence of multiple entries
unique_apps = []
duplicate_apps = []
for an_app in apple_list[1:]:
    #Column 0 of the App store data is the app id, so duplicates are detected by id
    app_name = an_app[0]
    if app_name in unique_apps:
        duplicate_apps.append(app_name)
    else:
        unique_apps.append(app_name)
print("Number of instances of duplicate apps:",len(duplicate_apps))
Number of instances of duplicate apps: 0
The App store dataset does not seem to have app data with multiple entries.
Filtering for apps aimed at an English-speaking audience
As mentioned earlier our analysis is on apps that are focused on an English speaking audience. There are many apps in this list that are for audiences of other languages. We first need to identify those non-English apps.
def is_english(app_name):
    """
    Identify whether an app is English language based
    Args:
        app_name (string): Name of the app to be verified
    Returns:
        check (boolean): Indicates whether app name is English or Non-English
    """
    count = 0
    check = True
    for letter in app_name:
        #ord() returns the integer code point of a Unicode character
        if ord(letter)>127:
            count+=1
    #Many English apps have TM symbols or smiley faces in their names,
    #so up to three non-ASCII characters are tolerated
    if count>3:
        check = False
    return check
#Verify the is_english() function
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
True
False
True
True
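To see why a threshold of three works for these names, it helps to count the non-ASCII characters directly. The following is a small sketch under that idea. `count_non_ascii` is a hypothetical helper, not part of the notebook:

```python
def count_non_ascii(text):
    """Hypothetical helper: number of characters whose code point is above 127."""
    return sum(1 for character in text if ord(character) > 127)

names = [
    'Instagram',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]
for name in names:
    # Names with more than three such characters are the ones is_english() rejects
    print(name, '->', count_non_ascii(name))
```

The trademark sign and the emoji each count as a single character, so those two names stay well under the threshold, while the Chinese title exceeds it.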
#Identify and filter the number of english apps in the Playstore dataset
english_apps = []
non_english_apps = []
for row in android_clean:
    if is_english(row[0]):
        english_apps.append(row)
    else:
        non_english_apps.append(row[0])
print("The number of english apps in the Playstore:",len(english_apps))
print("The number of non-english apps in the Playstore:",len(non_english_apps))
The number of english apps in the Playstore: 9615
The number of non-english apps in the Playstore: 45
We can repeat the same exercise for the App Store data and eliminate the non-English apps from that data set.
#Identify and filter the number of english apps in the App store dataset
appl_english_apps = []
appl_non_english_apps = []
#Note: apple_list still contains its header row, and 'track_name' is plain ASCII,
#so the header is counted among the English rows below
for row in apple_list:
    if is_english(row[1]):
        appl_english_apps.append(row)
    else:
        appl_non_english_apps.append(row[0])
print("The number of App store english apps:",len(appl_english_apps))
print("The number of App store non-english apps:",len(appl_non_english_apps))
The number of App store english apps: 6184
The number of App store non-english apps: 1014
Filtering for free English apps
The goal is to identify apps that are both free and aimed at an English-speaking audience. This requires that we remove paid apps from the data sets. In the Google data set the price of an app is held in the Price column (index 7, the eighth column of each row). To identify the free apps, only those apps whose price is '0' will be considered.
#Identify and filter free english apps from the Playstore dataset
free_apps = []
priced_apps = []
for app in english_apps:
    if app[7] != '0':
        priced_apps.append(app)
    else:
        free_apps.append(app)
print("The number of free english apps in the android data set:",len(free_apps))
print("The number of paid english apps in the android data set:",len(priced_apps))
The number of free english apps in the android data set: 8865
The number of paid english apps in the android data set: 750
We could have the App Store data set go through the same filtering process to identify free apps in the App store.
#Identify and filter free english apps from the App store dataset
appl_free_apps = []
appl_priced_apps = []
#Any row whose price is not '0.0' is treated as paid; a header row still present
#in appl_english_apps (price column 'price') would be counted as paid too
for app in appl_english_apps:
    if app[4] != '0.0':
        appl_priced_apps.append(app)
    else:
        appl_free_apps.append(app)
print("The number of free english apps in the app store data set:",len(appl_free_apps))
print("The number of paid english apps in the app store data set:",len(appl_priced_apps))
The number of free english apps in the app store data set: 3222
The number of paid english apps in the app store data set: 2962
Now that we have a clean data set for both the App Store and Google Play, the next step is to work out how the data can help identify a class of apps that would generate revenue.
After discussions with certain stakeholders the planned strategy for development is as follows.
Since the planned app is meant for both the Playstore and App store we need to identify which app types are popular in both stores.
Apps in both stores belong to a genre. Apps in the Play Store also have a Category in addition to a Genre. We can group apps by genre and identify which genres have the most apps. This helps to identify each genre's share of apps in both stores.
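As a compact illustration of what "share by genre" means, here is a minimal sketch using `collections.Counter` on toy data. The helper name `relative_frequencies` and the sample list are assumptions made for this example, not part of the notebook:

```python
from collections import Counter

def relative_frequencies(values):
    """Hypothetical helper: map each distinct value to its share of the total, in percent."""
    counts = Counter(values)
    total = sum(counts.values())
    # most_common() yields values from most to least frequent
    return {value: round(count / total * 100, 2) for value, count in counts.most_common()}

genres = ["Games", "Games", "Social Networking", "Photo & Video"]  # toy data
print(relative_frequencies(genres))
```

With two of the four toy entries being Games, that genre gets a 50 percent share and the other two get 25 percent each.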
def freq_table(dataset, index):
    """
    Calculate the relative frequency of each value in a column of a dataset
    Args:
        dataset (list): List of lists containing the data (app rows only, no header row)
        index (int): Column for which the relative frequencies must be generated
    Returns:
        temp_dict (dictionary): Relative frequency (in percent) of each value of the column supplied as input
    """
    #Create Frequency table
    temp_dict = {}
    #The filtered lists passed to this function carry no header row, so every row is counted
    for row in dataset:
        value_in_column = row[index]
        if value_in_column in temp_dict:
            temp_dict[value_in_column]+=1
        else:
            temp_dict[value_in_column]=1
    #Sum the frequencies for all genres
    sum_of_freq = sum(temp_dict.values())
    # Assign percentage values to identify share of each genre in app store
    for row in temp_dict:
        temp_dict[row] = round(((temp_dict[row] / sum_of_freq) * 100),2)
    return temp_dict
#Function to display the values in the above dictionary in descending order of percentage
#((percentage, key) tuples are used so that sorted() orders the entries by percentage first)
import pandas as pd
def display_table(dataset, index):
    """
    Converts a list into a dataframe which enables easier processing of data
    Args:
        dataset (list): List of lists containing the data
        index (int): Column for which the relative frequencies must be generated
    Returns:
        percentage_table_df (dataframe): Each value of the column specified by the user with its percentage share
    """
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
    table_sorted = sorted(table_display, reverse = True)
    percentage_table = []
    for entry in table_sorted:
        percentage_table.append([entry[1],entry[0]])
    percentage_table_df = pd.DataFrame(percentage_table, columns = ["app_group","percentage"])
    return percentage_table_df
Using the functions created above it is possible to summarize the prime_genre column of the apple data set and the Genre and Category columns of the filtered Google data set.
#Percentage of apps by genre in the filtered App store data set
appl_percent = display_table(appl_free_apps, 11)
#Percentage of apps by category in the filtered Playstore data set
google_percent = display_table(free_apps, 1)
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (13,13))
ax1 = fig.add_subplot(121)
ax1.set_title("Apps by Genre in App Store", size=16)
ax1.pie(x = appl_percent["percentage"],
labels = appl_percent["app_group"].str.lower(),
rotatelabels=True)
ax2 = fig.add_subplot(1,2,2)
ax2.set_title("Apps by Genre in Playstore", size=16)
ax2.pie(x = google_percent["percentage"],
labels = google_percent["app_group"].str.lower(),
rotatelabels=True)
plt.tight_layout()
plt.show()
A significant percentage of apps in the App Store belong to the Games genre. However, this does not make Games the most sought-after genre, as the chart only covers a filtered list of free apps in English. If a generalization is to be made, apps associated with fun (i.e. Games, Entertainment, Photo & Video) have a larger share of the pie.
Contrast this with the Playstore data, where most categories have an almost equal share. Here too, however, the Family and Game categories have a clear lead. What's more interesting is that apps in the Family category are mostly games aimed at children, as can be seen below.
This gives games slightly more than a quarter of the share. However, because non-English and paid apps were eliminated, the Game category cannot conclusively be said to lead.
bold_print("Some apps in the FAMILY Category of PlayStore data")
count = 0
print('\033[4m'+"NAME"+'\033[0m'+":"+'\033[4m'+"CATEGORY"+'\033[0m')
for an_app in free_apps:
    if (an_app[1] == 'FAMILY') and (count<10):
        print(an_app[0],":",an_app[1])
        count+=1
Some apps in the FAMILY Category of PlayStore data
NAME:CATEGORY
Jewels Crush- Match 3 Puzzle : FAMILY
Coloring & Learn : FAMILY
Mahjong : FAMILY
Super ABC! Learning games for kids! Preschool apps : FAMILY
Toy Pop Cubes : FAMILY
Educational Games 4 Kids : FAMILY
Candy Pop Story : FAMILY
Princess Coloring Book : FAMILY
Hello Kitty Nail Salon : FAMILY
Candy Smash : FAMILY
It must be noted at this point that the Genre column of the Playstore dataset was not used because the column's purpose is primarily to display the sub-category of an app. Most apps have the same Category and Genre. However apps that belong to the Family and Games Category can belong to different Genres as shown below.
Since we are not analysing games separately, this column can be ignored.
bold_print("Apps in the GAME and FAMILY Category of PlayStore data")
count = 0
print('\033[4m'+"NAME"+'\033[0m'+":"+'\033[4m'+"CATEGORY"+'\033[0m'+":"+'\033[4m'+"GENRE"+'\033[0m')
for an_app in free_apps:
    if (an_app[1] == 'FAMILY' or an_app[1] == 'GAME') and (count<30):
        print(an_app[0],":",an_app[1],":",an_app[-4])
        count+=1
Apps in the GAME and FAMILY Category of PlayStore data
NAME:CATEGORY:GENRE
Solitaire : GAME : Card
Sonic Dash : GAME : Arcade
PAC-MAN : GAME : Arcade
Bubble Witch 3 Saga : GAME : Puzzle
Race the Traffic Moto : GAME : Racing
Marble - Temple Quest : GAME : Puzzle
Shooting King : GAME : Sports
Geometry Dash World : GAME : Arcade
Jungle Marble Blast : GAME : Casual
Roll the Ball® - slide puzzle : GAME : Puzzle
Block Craft 3D: Building Simulator Games For Free : GAME : Simulation
Farm Fruit Pop: Party Time : GAME : Casual
Love Balls : GAME : Puzzle
Piano Tiles 2™ : GAME : Arcade
Pokémon GO : GAME : Adventure
Paint Hit : GAME : Casual
Snake VS Block : GAME : Arcade
Rolly Vortex : GAME : Arcade
Woody Puzzle : GAME : Puzzle
Stack Jump : GAME : Arcade
The Cube : GAME : Arcade
Extreme Car Driving Simulator : GAME : Racing
Bricks n Balls : GAME : Casual
The Fish Master! : GAME : Arcade
Color Road : GAME : Arcade
Draw In : GAME : Arcade
PLANK! : GAME : Arcade
Looper! : GAME : Puzzle
Trivia Crack : GAME : Trivia
Will it Crush? : GAME : Simulation
The number of users per genre could be a more reliable measure of a genre's popularity, since more users for a genre means more users for apps of that genre.
However, to get there we have to make a couple of adjustments. These are detailed below:
apple_user_rating = []
prime_genre_dict = freq_table(appl_free_apps, -5)
#Calculate the average number of user ratings (rating_count_tot) for each genre in the filtered App Store data
for genre in prime_genre_dict:
    total = 0
    len_genre = 0
    #appl_free_apps carries no header row, so every row is an app
    for row in appl_free_apps:
        genre_app = row[-5]
        if genre_app == genre:
            user_rating = float(row[5])
            total+=user_rating
            len_genre+=1
    avg_user_rating = round((total/len_genre),2)
    apple_user_rating.append([genre,avg_user_rating])
#Generating a Dataframe for a graph
apple_user_rating_df = pd.DataFrame(apple_user_rating, columns = ["genre","apple_avg_user_rating"])
apple_user_rating_df.sort_values(by = ["apple_avg_user_rating"],inplace = True)
apple_user_rating_df['genre'] = apple_user_rating_df['genre'].str.lower()
category_dict = freq_table(free_apps, 1)
google_user_rating = []
#Calculate the average number of installs for each category in the filtered Playstore data
for category in category_dict:
    total = 0
    len_category = 0
    #free_apps carries no header row, so every row is an app
    for row in free_apps:
        if row[1] == category:
            #Installs are buckets such as '10,000+': strip '+' and ',' before converting
            number_of_installs = row[5].replace('+', '')
            number_of_installs = float(number_of_installs.replace(',', ''))
            total+=number_of_installs
            len_category+=1
    google_user_rating.append([category,round((total/len_category),2)])
#Generating a Dataframe for a graph
google_user_rating_df = pd.DataFrame(google_user_rating, columns = ["genre","google_avg_user_downloads"])
google_user_rating_df.sort_values(by = ["google_avg_user_downloads"],inplace = True)
google_user_rating_df['genre'] = google_user_rating_df['genre'].str.lower().str.replace('_',' ').str.replace('and','&')
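The nested loops above rescan the whole dataset once per genre. A single-pass version is sketched below on toy rows. The helper names `parse_installs` and `average_by_group` and the three sample rows are hypothetical, chosen only to illustrate the idea:

```python
from collections import defaultdict

def parse_installs(text):
    """Turn an install bucket such as '10,000+' into the number 10000.0."""
    return float(text.replace('+', '').replace(',', ''))

def average_by_group(rows, group_index, value_index, parse=float):
    """Hypothetical helper: average a numeric column per group in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_index]] += parse(row[value_index])
        counts[row[group_index]] += 1
    return {group: round(totals[group] / counts[group], 2) for group in totals}

toy_rows = [  # made-up rows: name, category, installs
    ["App A", "WEATHER", "1,000+"],
    ["App B", "WEATHER", "5,000+"],
    ["App C", "FINANCE", "10,000+"],
]
print(average_by_group(toy_rows, 1, 2, parse=parse_installs))
```

Because the Installs column holds open-ended buckets ('10,000+' means at least 10,000), averages computed this way are lower bounds rather than exact download counts.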
To make comparison easier, the average number of user ratings per genre (App Store) and the average number of downloads per category (Playstore) are shown in the bar charts below.
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (12,12))
ax1 = fig.add_subplot(121)
ax1.set_title("User Ratings by Genre - App Store",size=16)
ax1.barh(width = apple_user_rating_df['apple_avg_user_rating'], y = apple_user_rating_df['genre'])
#Hide every spine except the top one
for key, values in ax1.spines.items():
    if key!="top":
        ax1.spines[key].set_visible(False)
ax1.xaxis.tick_top()
ax1.tick_params(left = False)
ax2 = fig.add_subplot(122)
ax2.set_title("User Downloads by Genre - PlayStore (million)",size=16)
ax2.barh(width = google_user_rating_df['google_avg_user_downloads']/1000000, y = google_user_rating_df['genre'],color = 'red')
for key, values in ax2.spines.items():
    if key!="top":
        ax2.spines[key].set_visible(False)
ax2.xaxis.tick_top()
ax2.tick_params(left = False)
plt.tight_layout()
plt.show()
What is immediately noticeable is that many genres have exactly the same name in both stores, such as productivity and finance, while many others seem to be the same but have something extra, such as navigation and maps & navigation.
Assuming that Google and Apple have similar definitions for their Categories and Genres respectively, we can take genres with exactly the same name to contain comparable apps. However, Playstore categories with slightly different names may not fit the App Store definition of the corresponding genre.
Consider the navigation and maps & navigation example below.
bold_print("Some apps from the Navigation genre of the App store")
for app in appl_free_apps:
    if app[-5] == "Navigation":
        print(app[1],":",app[5])
Some apps from the Navigation genre of the App store
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
bold_print("Some apps from the MAPS AND NAVIGATION category of the Playstore")
count = 0
for app in free_apps:
    if app[1] == "MAPS_AND_NAVIGATION" and count<30:
        print(app[0],":",app[5])
        count+=1
Some apps from the MAPS AND NAVIGATION category of the Playstore
Waze - GPS, Maps, Traffic Alerts & Live Navigation : 100,000,000+
T map (te map, T map, navigation) : 5,000,000+
MapQuest: Directions, Maps, GPS & Navigation : 10,000,000+
Yahoo! transit guide free timetable, operation information, transfer search : 10,000,000+
乗換NAVITIME Timetable & Route Search in Japan Tokyo : 5,000,000+
Transit: Real-Time Transit App : 5,000,000+
Mapy.cz - Cycling & Hiking offline maps : 1,000,000+
Uber : 100,000,000+
GPS Navigation & Offline Maps Sygic : 50,000,000+
Map and Router Badge : 500,000+
Yandex.Transport : 10,000,000+
Air Traffic : 1,000,000+
Speed Cameras Radar : 1,000,000+
Atlan3D Navigation: Korea navigator : 1,000,000+
Compass : 10,000,000+
Mappy - Plan, route comparison, GPS : 1,000,000+
Gps Route Finder : 100,000+
My Location: GPS Maps, Share & Save Places : 5,000,000+
Yanosik: "antyradar", traffic jams, navigation, camera : 5,000,000+
NAVITIME - Map & Transfer Navi : 5,000,000+
Sygic Car Navigation : 5,000,000+
Czech Public Transport IDOS : 1,000,000+
Karta GPS - Offline Navigation : 1,000,000+
Circle ratio : 1,000,000+
Soviet Military Maps Free : 1,000,000+
Truck Car Navi by Navitime Large size car, traffic jam, traffic closure, live camera, typhoon / precipitation map : 100,000+
Sentin Information Map : 100,000+
Snapp : 1,000,000+
GPS Speedometer and Odometer : 1,000,000+
GPS Traffic Speedcam Route Planner by ViaMichelin : 5,000,000+
While the App Store apps are clearly navigation related, the MAPS AND NAVIGATION category in the Playstore includes apps like Mapy.cz and Compass, which are more related to maps than to actual navigation.
Based on this observation, we will only consider genres that have exactly the same name in both stores and use those to come to a conclusion.
The Finance genre is a good example of this assumption holding up. Data from both sets shows that most apps in it are directly finance related.
bold_print("Some apps from the Finance genre of the App store")
for app in appl_free_apps:
    if app[-5] == "Finance":
        print(app[1],":",app[5])
Some apps from the Finance genre of the App store
Chase Mobile℠ : 233270
Mint: Personal Finance, Budget, Bills & Money : 232940
Bank of America - Mobile Banking : 119773
PayPal - Send and request money safely : 119487
Credit Karma: Free Credit Scores, Reports & Alerts : 101679
Capital One Mobile : 56110
Citi Mobile® : 48822
Wells Fargo Mobile : 43064
Chase Mobile : 34322
Square Cash - Send Money for Free : 23775
Capital One for iPad : 21858
Venmo : 21090
USAA Mobile : 19946
TaxCaster – Free tax refund calculator : 17516
Amex Mobile : 11421
TurboTax Tax Return App - File 2016 income taxes : 9635
Bank of America - Mobile Banking for iPad : 7569
Wells Fargo for iPad : 2207
Stash Invest: Investing & Financial Education : 1655
Digit: Save Money Without Thinking About It : 1506
IRS2Go : 1329
Capital One CreditWise - Credit score and report : 1019
U by BB&T : 790
Paribus - Rebates When Prices Drop : 768
KeyBank Mobile : 623
VyStar Mobile Banking for iPhone : 434
Sparkasse - Your mobile branch : 77
VyStar Mobile Banking for iPad : 57
Zaim : 44
Ma Banque : 17
Lloyds Bank Mobile Banking : 17
Suica : 10
Halifax Mobile Banking : 8
La Banque Postale : 8
币优铺 : 0
Impots.gouv : 0
bold_print("Some apps from the Finance genre of the Playstore")
count = 0
for app in free_apps:
    if app[1] == "FINANCE" and count<20:
        print(app[0],":",app[5])
        count+=1
Some apps from the Finance genre of the Playstore
K PLUS : 10,000,000+
ING Banking : 1,000,000+
Citibanamex Movil : 5,000,000+
The postal bank : 5,000,000+
KTB Netbank : 5,000,000+
Mobile Bancomer : 10,000,000+
Nedbank Money : 500,000+
SCB EASY : 5,000,000+
CASHIER : 10,000,000+
Rabo Banking : 1,000,000+
Capitec Remote Banking : 1,000,000+
Itau bank : 10,000,000+
Nubank : 5,000,000+
The Societe Generale App : 1,000,000+
IKO : 1,000,000+
Cash App : 10,000,000+
Standard Bank / Stanbic Bank : 1,000,000+
Bualuang mBanking : 5,000,000+
Intesa Sanpaolo Mobile : 1,000,000+
UBA Mobile Banking : 1,000,000+
Based on the above assumption, we can compare genres across both datasets and settle on a genre for which to develop an app.
merged_apple_google = pd.merge(apple_user_rating_df,google_user_rating_df,on = 'genre')
merged_apple_google.sort_values(by = 'apple_avg_user_rating', ascending = False)
 | genre | apple_avg_user_rating | google_avg_user_downloads |
---|---|---|---|
12 | weather | 52279.89 | 5074486.20 |
4 | food & drink | 33333.92 | 1924897.74 |
3 | finance | 31467.94 | 1387692.48 |
10 | shopping | 26919.69 | 7036877.31 |
5 | health & fitness | 23298.02 | 4188821.99 |
11 | sports | 23008.90 | 3638640.14 |
9 | productivity | 21028.41 | 16787331.34 |
6 | lifestyle | 16485.76 | 1437816.27 |
7 | lifestyle | 16485.76 | 1000.00 |
2 | entertainment | 14029.83 | 11640705.88 |
0 | business | 7491.12 | 1712290.15 |
1 | education | 7003.98 | 1833495.15 |
8 | medical | 612.00 | 120550.62 |
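The merge keeps only genre names that appear in both stores. A minimal pure-Python sketch of that exact-name matching is given below. The helper names and the short sample lists are hypothetical. It replaces ' and ' with ' & ' including the surrounding spaces, which is slightly stricter than the `str.replace('and','&')` call used earlier:

```python
def normalise_category(name):
    """Lower-case a Playstore category and spell it like an App Store genre."""
    return name.lower().replace('_', ' ').replace(' and ', ' & ')

def shared_genres(apple_genres, google_categories):
    """Hypothetical helper: genre names present in both stores after normalisation."""
    apple = {genre.lower() for genre in apple_genres}
    google = {normalise_category(category) for category in google_categories}
    return sorted(apple & google)

apple_genres = ["Weather", "Finance", "Navigation", "Lifestyle"]
google_categories = ["WEATHER", "FINANCE", "MAPS_AND_NAVIGATION", "LIFESTYLE", "Lifestyle"]
print(shared_genres(apple_genres, google_categories))
```

Note that 'LIFESTYLE' and 'Lifestyle' normalise to the same name. A set collapses them into one entry, whereas a DataFrame merge matches each of the two rows separately, which is how one genre can end up on two rows of a merged table.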
The top 5 genres by average downloads in the Playstore are productivity, entertainment, shopping, weather and health & fitness.
The top 5 genres by average number of user ratings in the App Store are weather, food & drink, finance, shopping and health & fitness.
The second lifestyle row, with an average of 1,000.00 downloads, comes from the single app whose category was inserted earlier as 'Lifestyle' rather than 'LIFESTYLE'. Both spellings become lifestyle once lower-cased, so the merge matches the App Store genre twice.
Based on the comparison above, apps in the Shopping genre are the clear favorites for further development. They perform well in terms of average downloads in the Playstore and have a respectable number of user ratings in the App Store.
However, Productivity apps should also be given strong consideration, as our strategy calls for App Store development only if we see strong outcomes in the Playstore.
While apps associated with Weather could be given some consideration, I am unsure whether the genre is good enough to generate revenue. Unlike Shopping and Productivity apps, it is unlikely that a weather app will be opened multiple times.
However, this result must only be considered while keeping the following points in mind:
fig = plt.figure(figsize = (10,10))
plt.subplot(1,2,1)
plt.title("Free vs. Paid apps in Playstore",size=15)
plt.pie(x = [8865,750])
plt.legend(['Free apps','Paid apps'], loc = 'upper right')
plt.subplot(1,2,2)
plt.title("Free vs. Paid apps in App Store", size=15)
plt.pie(x = [3222,2962])
plt.legend(['Free apps','Paid apps'], loc = 'upper right')
plt.show()
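The pie charts above are drawn from the free and paid counts printed during filtering. The same counts can be turned into percentages with a short calculation. The helper name `free_share` is an assumption made for this sketch:

```python
def free_share(free_count, paid_count):
    """Percentage of apps that are free, rounded to one decimal place."""
    return round(free_count / (free_count + paid_count) * 100, 1)

# Counts taken from the filtering output earlier in this article
print("Playstore:", free_share(8865, 750))
print("App Store:", free_share(3222, 2962))
```

On these counts, roughly 92 percent of the filtered Playstore apps are free, compared with roughly 52 percent in the App Store.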
The aim of this project was to identify an app profile that could help to generate revenue in both Google's Playstore and Apple's App Store. The project began by cleaning up the associated datasets and analysing them. We attempted to identify a popular app profile by going over the most popular genres. However, since the datasets were filtered down to free English-language apps, we switched to a strategy of analysing user preferences for each genre.
Based on the analysis, we concluded that apps in the Shopping genre could prove to be a safe bet, as user preference for them is balanced across both stores. This result, however, comes with a few caveats, as it is built on an assumption.