
Profitable App Profiles for the App Store and Google Play Markets¶

In this project, we aim to find app profiles that are profitable in the App Store and Google Play. We work for a company that builds Android and iOS mobile apps. Our apps are free, so our primary source of revenue is in-app ads. The number of users of an app is the main driver of in-app ad revenue.

We are going to analyze data from the App Store and Google Play to find app profiles with the highest user engagement. To make our recommendation, we will try to find out:

Which app genres are the most common
Which app genres have the most users

Summary of Results¶

After analyzing the data, we have concluded that developing a music game could be profitable in both markets.

For more details, please refer to the full analysis below.

Exploring App Store and Google Play datasets¶

As of July 2019, there were 3.9 million apps available in the App Store and 3.3 million apps on Google Play. Collecting data for over 7 million apps would require time and money. Luckily, there are relevant samples available at no cost:

A data set of around 7,000 apps from the App Store, collected in July 2017, available at this link
A data set of around 10,000 apps from Google Play, collected in August 2018, available at this link

We will start by defining the function explore_data() that allows exploration in an understandable way.

In [1]:

def explore_data(dataset, start, end, rows_and_columns=False):
    '''
    Print a slice of the dataset, indicated by start and end,
    with blank lines after each row.
    
    If rows_and_columns is set to True, the function also prints
    the number of rows and columns in the dataset.
    
    Parameter dataset: the dataset from which we want to print the rows
    Precondition: dataset is a non-empty list of lists
    
    Parameter start: the starting index of the desired slice
    Precondition: start is an integer
    
    Parameter end: the ending index (exclusive) of the desired slice
    Precondition: end is an integer
    
    Parameter rows_and_columns: indicates whether to print the number of rows and columns
                                Set to False by default
    Precondition: rows_and_columns is a boolean
    '''
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # prints two blank lines after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

Apple data exploration¶

We start by exploring the App Store data set.

In [2]:

# import and read the data
from csv import reader
Apple_opened = open('AppleStore.csv', encoding='utf8')
Apple_read = reader(Apple_opened)
Apple_data = list(Apple_read)
Apple_opened.close()

# print the header, then three data rows (indices 1 to 3 of the header-free data)
print(Apple_data[0])
explore_data(Apple_data[1:], 1, 4, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7197
Number of columns: 16

The App Store data set contains 7,197 rows (excluding the header) and 16 columns. Of the 16 columns, the ones of interest to us are:

price, as we are interested in free apps
rating_count_tot, which we can use as a proxy for popularity
prime_genre, which gives us information on app profiles
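
Both CSV files in this project are read with the same steps, so the loading could be wrapped in a small helper. The sketch below is one possible version and is not part of the original notebook: the name load_csv is hypothetical, and it assumes the files are UTF-8 encoded and that the first row is the header.

```python
from csv import reader

def load_csv(path, encoding='utf8'):
    '''
    Hypothetical helper: read a CSV file and return (header, rows).
    Assumes the first row of the file is the header.
    '''
    # The with-block closes the file even if reading fails
    with open(path, encoding=encoding, newline='') as opened_file:
        data = list(reader(opened_file))
    return data[0], data[1:]
```

With this helper, header, apple_rows = load_csv('AppleStore.csv') would return the header and the data rows separately, which avoids repeated [1:] slices later on.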

Android data exploration¶

Below, we explore the Google Play data set.

In [3]:

# import and read the data
Android_opened = open('googleplaystore.csv', encoding='utf8')
Android_read = reader(Android_opened)
Android_data = list(Android_read)
Android_opened.close()

# print the header, then three data rows (indices 1 to 3 of the header-free data)
print(Android_data[0])
explore_data(Android_data[1:], 1, 4, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10841
Number of columns: 13

The Google Play data set contains 10,841 rows (excluding the header) and 13 columns. The columns that could help us in our analysis are:

Type and Price, as we are interested in free apps
Category and Genres, which will be used to profile the apps
Installs, which can be used to estimate popularity

Data cleaning¶

For our analysis, we are focusing on free English-language apps. We will need to:

Remove non-English apps
Remove apps that are not free

However, before that, we are going to remove inaccurate and duplicate data.

Cleaning Android data¶

From the discussion section for the Google Play data set, we can see that the row at index 10473 (counting the header) is missing its Category value. This shifts the remaining values one column to the left, so the Rating column reads 19. We are going to remove this row.

In [4]:

print(Android_data[10473])  # the index counts the header row
del Android_data[10473]

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
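
A row like this can also be found without relying on the discussion section, by comparing the length of each row with the length of the header. The following is a minimal sketch on toy data; the helper name find_malformed_rows and the toy rows are made up for illustration.

```python
def find_malformed_rows(dataset):
    '''
    Hypothetical helper: return (index, row) pairs for rows whose number
    of fields differs from the header (the first row of the dataset).
    '''
    expected = len(dataset[0])
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]

# Toy data standing in for Android_data: a header plus two rows
toy = [
    ['App', 'Category', 'Rating'],
    ['Good App', 'TOOLS', '4.1'],
    ['Broken App', '1.9'],  # the Category value is missing
]

print(find_malformed_rows(toy))
```

On the toy data, only the row at index 2 is reported, because it has two fields instead of three.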

Moreover, some apps in the Google Play data set have duplicate entries, for example, Instagram.

In [5]:

for app in Android_data:
    name=app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']

From the example above, we can see that the entries differ only in the number of reviews (index 3).

In the next step, we are going to count the number of duplicate apps in the Android data set.

In [6]:

duplicate_apps=[]
unique_apps=[]

for app in Android_data[1:]: # Omit the first row as it contains the header
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate apps: ',len(duplicate_apps))
print('\n')
print('Examples of duplicate apps: ',duplicate_apps[:15])

Number of duplicate apps:  1181


Examples of duplicate apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']

There are 1,181 duplicate entries in the data set.

As we have seen above, the duplicate entries differ mainly in the number of reviews. For our analysis, we do not want to count apps more than once, so we are going to remove the duplicates. We will keep the most recent entry for each app, assuming that the entry with the highest number of reviews is the most recent one.

Once we clean the data, the expected length of the data set is 9,659 rows.

In [7]:

print('Expected length: ',len(Android_data[1:])- 1181)

Expected length:  9659

To remove the duplicates, we will:

Create a dictionary, where each key is a unique app name and the value is the highest number of reviews for that app
Use the dictionary to create a new data set with only one entry per app

In [8]:

reviews_max={}

for app in Android_data[1:]:
    name = app[0]
    n_reviews=float(app[3])
   
    # Keep the highest n_reviews value seen so far for each app name
    if name not in reviews_max or reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews

print('reviews_max has '+ str(len(reviews_max)))

reviews_max has 9659

reviews_max has 9,659 entries, which matches the expected length of the Android data set without the duplicates.

In the next step, we will use the reviews_max dictionary to remove the duplicates and create a new dataset, android_clean.

In [9]:

android_clean=[]
already_added=[]

for app in Android_data[1:]:#Omit the first row as it contains the header
    name=app[0]
    n_reviews=float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)

print('android_clean has:',len(android_clean),' rows')

android_clean has: 9659  rows

android_clean has 9,659 rows, which matches the expected length of the data set without the duplicates.

android_clean is a copy of the initial data set without the duplicates and without the malformed row. We will use it for the rest of the analysis.
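
The two steps above can also be combined into a single pass that stores the whole row, rather than only the review count, in the dictionary. This is a sketch under the same assumptions as the notebook (app name at index 0, number of reviews at index 3); the function name keep_most_reviewed and the toy rows are hypothetical. Rows come out ordered by the first appearance of each name, which may differ from the order in android_clean.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    '''
    Hypothetical helper: keep one row per app name, choosing the row with
    the highest number of reviews (the first such row wins on ties).
    Assumes `rows` has no header row.
    '''
    best = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

# Toy rows (made up, shortened to four columns)
toy_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]

for row in keep_most_reviewed(toy_rows):
    print(row)
```

Because dictionary lookups take roughly constant time, this version avoids the repeated list searches of the `name not in already_added` check.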

Removing non-English apps¶

To help us remove the non-English apps, we will use the built-in function ord(), which takes a character and returns its Unicode code point as an integer. The characters commonly used in English text (the ASCII set) fall in the range 0 to 127. We can use this to check whether an app's name is English.

One limitation of this technique is that emojis and special characters like '™' fall outside the range 0 to 127. As a result, we could incorrectly label English apps as non-English. To limit data loss, we will only remove apps whose names contain more than 3 characters with a code point above 127.
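
As a quick illustration of the idea, the snippet below prints the code point of a few sample characters and whether it falls above 127. The sample characters are chosen for illustration only.

```python
# Map a few sample characters to their code points
sample_chars = ['a', 'Z', '5', '+', '™', '爱']
code_points = {ch: ord(ch) for ch in sample_chars}

for ch, code in code_points.items():
    # True in the last column means the character is outside the ASCII range
    print(repr(ch), code, code > 127)
```

The letters, the digit, and the plus sign stay within the range 0 to 127, while '™' (8482) and '爱' (29233) fall outside it.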

The first step is to define the function all_english().

In [10]:

def all_english(a_string):
    '''
    Return True if the string contains at most 3 non-ASCII characters
    (code point above 127), in which case we treat it as an English name.
    Otherwise, return False.

    Parameter a_string: the string we want to analyze
    Precondition: a_string is a string
    '''
    non_english = 0

    for n in a_string:
        if ord(n) > 127:
            non_english += 1

    return non_english <= 3

# test cases to check that the function works correctly
print(all_english('Instagram'))
print(all_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(all_english('Docs To Go™ Free Office Suite'))
        
        

True
False
True

The all_english() function can be used to filter out non-English apps. In the next step, we will use it to remove non-English apps from the Apple data set and from the Android data set.

In [11]:

Apple_english=[]
Android_english=[]

# Removing non-English apps from the Apple dataset
for app in Apple_data[1:]:
    name=app[1]
    
    if all_english(name):
        Apple_english.append(app)
        
# Removing non-English apps from the Android_clean dataset
for app in android_clean:
    name=app[0]
    if all_english(name):
        Android_english.append(app)

#Exploring the data
explore_data(Apple_english,1,4,True)
explore_data(Android_english,1,4,True)

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 6183
Number of columns: 16
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 9614
Number of columns: 13

Once we remove the non-English apps, the new dataset Apple_english has 6,183 rows and Android_english has 9,614 rows.

Removing non-free apps¶

Because our analysis only covers free apps, in the next step we are going to generate copies of the Apple_english and Android_english datasets that contain only free apps.

In [12]:

Apple_cleandata=[]
Android_cleandata=[]

# Removing non-free apps from Apple dataset
# Note: Apple_english has no header row, so [1:] skips its first app
for app in Apple_english[1:]:
    price = float(app[4])  # index 4 contains the price
    
    if price == 0.0:
        Apple_cleandata.append(app)
        
# Removing non-free apps from Android dataset
# Note: Android_english has no header row, so [1:] skips its first app
for app in Android_english[1:]:
    app_type = app[6]      # index 6 (Type) contains either 'Free' or 'Paid'
    if app_type == 'Free':
        Android_cleandata.append(app)


# exploring the new clean dataset
explore_data(Apple_cleandata,1,3,True)
explore_data(Android_cleandata,1,3,True)
    

['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 3221
Number of columns: 16
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 8862
Number of columns: 13

At this point, we have cleaned and prepared two datasets: Apple_cleandata, which has 3,221 rows, and Android_cleandata, which has 8,862 rows. We will use these clean datasets in our analysis.
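
The filter can also be written as a helper that checks every row. In the cell above, the [1:] slices skip the first entry of Apple_english and Android_english, which contain no header row, so a version like the sketch below could return one more app per store than the counts reported here. The helper name keep_free and the toy rows are hypothetical; the column positions (price at index 4, Type at index 6) follow the notebook.

```python
def keep_free(apps, is_free):
    '''
    Hypothetical helper: return the apps for which is_free(app) is True.
    Assumes `apps` has no header row, so every row is checked.
    '''
    return [app for app in apps if is_free(app)]

# Toy rows (made up), laid out like the first columns of each data set
apple_toy = [
    ['1', 'Free App', '100', 'USD', '0.0'],
    ['2', 'Paid App', '100', 'USD', '4.99'],
]
android_toy = [
    ['Free App', 'TOOLS', '4.1', '10', '1M', '100+', 'Free', '0'],
    ['Paid App', 'TOOLS', '4.1', '10', '1M', '100+', 'Paid', '$4.99'],
]

apple_free = keep_free(apple_toy, lambda app: float(app[4]) == 0.0)
android_free = keep_free(android_toy, lambda app: app[6] == 'Free')
print(len(apple_free), len(android_free))
```

Passing the test as a function keeps the loop identical for both stores, even though each store encodes "free" differently.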

Finding profitable app profiles¶

As we mentioned in the introduction, we aim to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:

Build a minimal Android version of the app, and add it to Google Play.
If the app has a good response from users, we develop it further.
If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app to both Google Play and the App Store, we need to find app profiles that are successful in both markets.

To continue with the analysis, we are going to build two functions: freq_table(), to generate a frequency table, and display_table(), to sort and print it.

In [13]:

def freq_table(dataset,index):
    '''
    Take a dataset and a column index, and return a dictionary where the keys
    are the unique values in that column and the values are the percentage
    frequency of each key.
    
    Parameter dataset: the dataset we want to analyze
    Precondition: dataset is a nested list; its first row is skipped as a header
    
    Parameter index: the index of the column we want to analyze
    Precondition: index is a valid column index for the dataset
    '''
    counts = {}
    total = len(dataset[1:])
    
    for app in dataset[1:]: # count the occurrences of each value
        item = app[index]
        
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
    result = {}
    for item in counts:  # convert the counts to percentages
        proportion = counts[item] / total
        percentage = round(proportion * 100, 2)
        result[item] = percentage
    return result

def display_table(dataset, index):
    '''
    Take a dataset and a column index, build the frequency table for that
    column with freq_table(), and print its entries sorted from the most
    to the least frequent. The function prints the table and returns None.
    
    Parameter dataset: the dataset we want to analyze
    Precondition: dataset is a nested list; its first row is skipped as a header
    
    Parameter index: the index of the column we want to analyze
    Precondition: index is a valid column index for the dataset
    '''
    
    table = freq_table(dataset, index)
    table_display = []
    
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
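
For comparison, the same kind of table can be built with collections.Counter from the standard library. This is an alternative sketch, not the notebook's method: the name freq_table_pct is hypothetical, the toy rows are made up, and the function assumes the rows contain no header.

```python
from collections import Counter

def freq_table_pct(rows, index):
    '''
    Hypothetical helper: return {value: percentage} for the column at
    `index`, ordered from the most to the least frequent value.
    Assumes `rows` has no header row.
    '''
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return {value: round(n / total * 100, 2) for value, n in counts.most_common()}

# Toy rows (made up): app name and genre
toy_apps = [
    ['App A', 'Games'],
    ['App B', 'Games'],
    ['App C', 'Music'],
    ['App D', 'Games'],
]

print(freq_table_pct(toy_apps, 1))
```

On the toy data, Games accounts for 75.0% of the rows and Music for 25.0%.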

Most common app genres¶

The first step of our analysis is to find the most common genres in the App Store and Google Play.

We expect the distribution of apps across genres to roughly reflect user preferences. Therefore, we can use genre frequency to estimate which genres are preferred.

First, we are going to generate the frequency table for prime_genre from the Apple dataset.

In [14]:

display_table(Apple_cleandata, 11)

Games : 58.2
Entertainment : 7.89
Photo & Video : 4.94
Education : 3.66
Social Networking : 3.26
Shopping : 2.61
Utilities : 2.52
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.34
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12

The table above shows the frequency of each genre among the free English apps in the App Store data.

We observe that the distribution is skewed towards games, with this genre representing 58.2% of the apps. This suggests that the App Store is saturated with games: while it could be easy to enter this market, it will be hard to gain market share.

The second and third most common genres are Entertainment, with 7.89% of the apps, and Photo & Video, with 4.94%.

As a result, we can say that most of the apps in the App Store data are designed for entertainment, and therefore we should offer entertainment features to be successful in the App Store market.

In our next step, we are going to analyze the most common genres in Google Play. In the Android dataset, two columns describe the app genre: Genres and Category.

We will first display the frequency table for Genres.

In [15]:

display_table(Android_cleandata, 9)

Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.95
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.9
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.58
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;Brain Games : 0.14
Casual;Action & Adventure : 0.14
Arcade;Action & Adventure : 0.12
Action;Action & Adventure : 0.1
Educational;Pretend Play : 0.09
Simulation;Action & Adventure : 0.08
Parenting;Education : 0.08
Entertainment;Brain Games : 0.08
Board;Brain Games : 0.08
Parenting;Music & Video : 0.07
Educational;Brain Games : 0.07
Casual;Creativity : 0.07
Art & Design;Creativity : 0.07
Education;Pretend Play : 0.06
Role Playing;Pretend Play : 0.05
Education;Creativity : 0.05
Role Playing;Action & Adventure : 0.03
Puzzle;Action & Adventure : 0.03
Entertainment;Creativity : 0.03
Entertainment;Action & Adventure : 0.03
Educational;Creativity : 0.03
Educational;Action & Adventure : 0.03
Education;Music & Video : 0.03
Education;Brain Games : 0.03
Education;Action & Adventure : 0.03
Adventure;Action & Adventure : 0.03
Video Players & Editors;Music & Video : 0.02
Sports;Action & Adventure : 0.02
Simulation;Pretend Play : 0.02
Puzzle;Creativity : 0.02
Music;Music & Video : 0.02
Entertainment;Pretend Play : 0.02
Casual;Education : 0.02
Board;Action & Adventure : 0.02
Video Players & Editors;Creativity : 0.01
Trivia;Education : 0.01
Travel & Local;Action & Adventure : 0.01
Tools;Education : 0.01
Strategy;Education : 0.01
Strategy;Creativity : 0.01
Strategy;Action & Adventure : 0.01
Simulation;Education : 0.01
Role Playing;Brain Games : 0.01
Racing;Pretend Play : 0.01
Puzzle;Education : 0.01
Parenting;Brain Games : 0.01
Music & Audio;Music & Video : 0.01
Lifestyle;Pretend Play : 0.01
Lifestyle;Education : 0.01
Health & Fitness;Education : 0.01
Health & Fitness;Action & Adventure : 0.01
Entertainment;Education : 0.01
Communication;Creativity : 0.01
Comics;Creativity : 0.01
Casual;Music & Video : 0.01
Card;Action & Adventure : 0.01
Books & Reference;Education : 0.01
Art & Design;Pretend Play : 0.01
Art & Design;Action & Adventure : 0.01
Arcade;Pretend Play : 0.01
Adventure;Education : 0.01

From the table above, we can see that Genres is more granular than prime_genre in the App Store data. Moreover, apps in the Android data set are distributed fairly evenly, with no single genre accounting for a large share of the apps. As a result, we can say that the Google Play market has a more balanced distribution of app genres than the App Store.

Tools is the most common genre, with 8.45% of the apps, followed by Entertainment, Education, and Business.

So far, we can see that Entertainment apps are common in both markets. We are now going to examine the Category frequency distribution.

In [16]:

display_table(Android_cleandata, 1)

FAMILY : 18.9
GAME : 9.73
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.95
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
COMICS : 0.62
ART_AND_DESIGN : 0.62
BEAUTY : 0.6

From the table above, we can see that Category is a broader description of the app genre. The top 3 categories are FAMILY, GAME, and TOOLS.

In conclusion, from the frequency tables above, we can say that the App Store is dominated by apps designed for entertainment, while Google Play has a more balanced app landscape. We therefore expect users in both markets to respond positively to an app offering entertainment features. Our next step is to gain insight into which apps have the most users.

Finding the most popular app genres¶

One way of finding which genres are the most popular is to calculate the average number of installs for each app genre.

The App Store data set does not include the number of installs. As a workaround, we assume that each user provides a rating only once, and we use rating_count_tot as an approximation.

Below, we analyze the average number of user ratings by app genre for the App Store data set.

In [17]:

prime_genre_table= freq_table(Apple_cleandata,11)

for genre in prime_genre_table:
    total =0
    len_genre=0
    
    for app in Apple_cleandata[1:]:
        genre_app=app[11]
        if genre_app == genre:
            rating=float(app[5])
            total+= rating
            len_genre += 1
            
    average= round(total/ len_genre,2)
    print('Average number of user ratings for '+ genre+ ' is '+ str(average))

Average number of user ratings for Productivity is 21028.41
Average number of user ratings for Food & Drink is 33333.92
Average number of user ratings for Catalogs is 4004.0
Average number of user ratings for Weather is 52279.89
Average number of user ratings for Travel is 28243.8
Average number of user ratings for Reference is 74942.11
Average number of user ratings for Lifestyle is 16485.76
Average number of user ratings for Book is 39758.5
Average number of user ratings for Entertainment is 14029.83
Average number of user ratings for Navigation is 86090.33
Average number of user ratings for Games is 22788.67
Average number of user ratings for Shopping is 26919.69
Average number of user ratings for Business is 7491.12
Average number of user ratings for Utilities is 18684.46
Average number of user ratings for Health & Fitness is 23298.02
Average number of user ratings for Sports is 23008.9
Average number of user ratings for Medical is 612.0
Average number of user ratings for Social Networking is 43899.51
Average number of user ratings for Photo & Video is 15025.72
Average number of user ratings for Music is 57326.53
Average number of user ratings for Finance is 31467.94
Average number of user ratings for News is 21248.02
Average number of user ratings for Education is 7003.98

From the data above, we can see that Navigation, Reference, and Music have the highest average number of user ratings. However, Navigation and Reference represent only 0.19% and 0.56% of the apps in the App Store data, suggesting that a few well-established apps dominate those segments and that it will be hard to compete.

Music, on the other hand, represents 2.05% of the apps, and this market should be easier to enter.
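
The per-genre averages can also be computed in a single pass over the data, instead of looping over all apps once per genre. The sketch below shows one way to do it on toy data; the helper name average_by_group and the rows are hypothetical, and the function assumes the rows contain no header.

```python
from collections import defaultdict

def average_by_group(rows, group_idx, value_idx):
    '''
    Hypothetical helper: return {group: average value}, rounded to two
    decimals, accumulating totals and counts in one pass over the rows.
    Assumes `rows` has no header row.
    '''
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    return {group: round(totals[group] / counts[group], 2) for group in totals}

# Toy rows (made up): genre and number of ratings
toy_ratings = [
    ['Music', '100'],
    ['Music', '300'],
    ['Games', '50'],
]

print(average_by_group(toy_ratings, 0, 1))
```

On the toy data, the Music average is 200.0 and the Games average is 50.0.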

For the Google Play analysis, we will use Category as our genre, as this is a broader description of each app. To measure the number of installs, we will use the Installs column. Note that the Installs values are open-ended (100+, 1,000+, etc.). We don't need perfect precision for our purposes, though; we only want to find out which app genres attract the most users.
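
Converting an Installs value to a number only requires stripping the commas and the plus sign. A minimal sketch, where the helper name parse_installs is hypothetical and the result should be read as a lower bound on the real number of installs:

```python
def parse_installs(value):
    '''
    Hypothetical helper: convert an Installs string such as '1,000+'
    to an integer lower bound (1000).
    '''
    return int(value.replace(',', '').replace('+', ''))

for raw in ['100+', '1,000+', '1,000,000,000+']:
    print(raw, '->', parse_installs(raw))
```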

In the next step, we are going to analyze the average number of installs per Category.

In [18]:

category_table=freq_table(Android_cleandata,1)

for genre in category_table:
    total = 0
    len_category = 0
    
    for app in Android_cleandata[1:]:
        category_app=app[1]
        
        if category_app == genre:
            
            n_installs = app[5]
            n_installs = n_installs.replace(',', '') # removing special characters
            n_installs = n_installs.replace('+', '') # removing special characters
            
            total +=float(n_installs)
            len_category +=1
    average= round(total / len_category,2)
    print('Average number of installs for '+ genre+ ' is '+ str(average))

        

Average number of installs for AUTO_AND_VEHICLES is 647317.82
Average number of installs for DATING is 854028.83
Average number of installs for HOUSE_AND_HOME is 1331540.56
Average number of installs for BUSINESS is 1712290.15
Average number of installs for ENTERTAINMENT is 11640705.88
Average number of installs for SPORTS is 3638640.14
Average number of installs for PERSONALIZATION is 5201482.61
Average number of installs for VIDEO_PLAYERS is 24727872.45
Average number of installs for COMICS is 817657.27
Average number of installs for EDUCATION is 1833495.15
Average number of installs for MEDICAL is 120550.62
Average number of installs for WEATHER is 5074486.2
Average number of installs for BEAUTY is 513151.89
Average number of installs for COMMUNICATION is 38456119.17
Average number of installs for TRAVEL_AND_LOCAL is 13984077.71
Average number of installs for LIBRARIES_AND_DEMO is 638503.73
Average number of installs for FAMILY is 3697848.17
Average number of installs for PHOTOGRAPHY is 17840110.4
Average number of installs for FINANCE is 1387692.48
Average number of installs for SOCIAL is 23253652.13
Average number of installs for PARENTING is 542603.62
Average number of installs for GAME is 15588015.6
Average number of installs for EVENTS is 253542.22
Average number of installs for ART_AND_DESIGN is 1967474.55
Average number of installs for PRODUCTIVITY is 16787331.34
Average number of installs for LIFESTYLE is 1437816.27
Average number of installs for MAPS_AND_NAVIGATION is 4056941.77
Average number of installs for FOOD_AND_DRINK is 1924897.74
Average number of installs for SHOPPING is 7036877.31
Average number of installs for TOOLS is 10801391.3
Average number of installs for HEALTH_AND_FITNESS is 4188821.99
Average number of installs for BOOKS_AND_REFERENCE is 8767811.89
Average number of installs for NEWS_AND_MAGAZINES is 9549178.47

From the table above, we observe that Communication, Video Players, and Social have the highest average number of installs. However, those averages are skewed by a few apps with 1 billion installs or more.

In [21]:

for app in Android_cleandata:
    if app[1] == 'COMMUNICATION' and app[5] == '1,000,000,000+':
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
Skype - free IM & video calls : 1,000,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+

From the data above, we can see that a few big players dominate those markets, and it will be hard for us to enter and gain market share.
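
To see how a single very large app can pull an average up, compare the mean and the median of a small list. The install counts below are invented for illustration and are not taken from the data set.

```python
from statistics import mean, median

# Invented install counts: one giant app and four smaller ones
toy_installs = [1_000_000_000, 500_000, 100_000, 10_000, 1_000]

toy_mean = mean(toy_installs)
toy_median = median(toy_installs)

print('mean:', toy_mean)
print('median:', toy_median)
```

Here the mean is about 200 million installs while the median is 100,000, so the mean says little about a typical app in the list.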

On the other hand, we can see that games are popular on Google Play, although this market is saturated in the App Store. An app that combines music and games could therefore be profitable in both markets.

Conclusion¶

In this project, we have analyzed the most common and most popular app genres in the App Store and Google Play to find an app profile that could be profitable in both markets.

After our analysis, we can conclude that developing a music game could be profitable in both markets.