Free App Profiles with High User Engagement

An Analysis of iOS and Android apps to Increase Ad Revenue

i. Executive Summary¶

At this time, it is recommended that the company's app developers focus resources on creating app profiles that are unique, no-pay-to-play, extremely user-friendly vlogger/tuber simulator games and/or tools similar to 'PewDiePie's Tuber Simulator' or 'Vlogger Go Viral - Tuber Game'. Both app profiles can be viewed here:

Other notable app profiles for consideration include: Head Soccer, Eternium, My Talking Tom, Candy Crush Saga, Sniper 3D Assassin, and Geometry Dash.

Additionally, it is recommended that the company conduct a feasibility study to assess the practicality of, and the resources required for, developing mobile antivirus, cleaning, and boosting apps.

iii. Table of Figures¶

Chart 1: Percent Distribution of Genres - Google Play Apps
Chart 2: Percent Distribution of Genres - Apple App Store
Table 1: Top 10 Google Apps: FAMILY & GAME Genre - Highest Rating Count Total
Table 2: Top 10 Google Apps: All Genres - Most Installs & Highest User Rating
Table 3: Top 10 Apple Apps: Games Genre - Highest Rating Count Total & User Rating

1. Introduction ¶

There are millions of apps available to users from the Apple App Store and Google Play. Our company develops apps for the Apple and Google markets that are typically directed towards an English-speaking audience and are free for users to download. The apps generate revenue for the company by means of in-app ads; therefore, the more users that install and use our apps, the more likely it is that users will see and/or engage with the ads that generate the company's revenue. As we continue to develop apps for these markets, it is beneficial for the company to know which free apps currently on the market attract the most users, in order to aid our developers when creating new app profiles.

2. Scope ¶

This project will include the collection, cleaning, and analysis of two datasets regarding mobile apps available on the Apple App Store and Google Play. In an effort to focus time and resources, only sample datasets available through a third party will be analyzed. The samples consist of data for approximately 10,000 Android apps and 7,000 iOS apps, collected in August 2018 and July 2017 respectively. The data points in the samples include each app's name, price, genre, user rating, rating count total, size, install count, etc.

3. Purpose ¶

The goal of this project is to develop short-lists of apps with high user engagement from the Apple and Google sample datasets. The app short-lists will assist the company's developers in creating similarly engaging apps for both the Apple and Google markets that will encourage user/ad interaction, with the intent to increase company revenue.

4. Data ¶

4.1 Data Collection ¶

The data to be analyzed in this project consist of two third-party sample datasets. The first contains data collected from approximately 10,000 Android apps and can be accessed here:

Google Play Apps

The second dataset contains data for approximately 7,000 iOS apps and can be accessed here:

Apple App Store

The column header names in both the Google and Apple dataset csv files were modified prior to importing so that the column names are consistent and in a similar order. The Google dataset has two additional columns, installs and sub_genre, as shown below.

In [1]:

# google dataset open
# A raw string (r'...') keeps the backslashes in the Windows path literal.
from csv import reader
opened_file = open(r'C:\Shaun\Businesses\Mr Data\DataQuest\Guided Projects\Profitable App Profiles\Datasets\googleplaystore mod.1.csv', encoding = 'utf-8')
read_file = reader(opened_file)
google = list(read_file)
google_header = google[0]
google_data = google[1:]
print(google_header)

['app_name', 'price', 'genre', 'rating_count_total', 'user_rating', 'content_rating', 'size_bytes', 'installs', 'sub_genre']

In [2]:

# apple dataset open
# A raw string (r'...') keeps the backslashes in the Windows path literal.
from csv import reader
opened_file = open(r'C:\Shaun\Businesses\Mr Data\DataQuest\Guided Projects\Profitable App Profiles\Datasets\AppleStore mod.1.csv', encoding = 'utf-8')
read_file = reader(opened_file)
apple = list(read_file)
apple_header = apple[0]
apple_data = apple[1:]
print(apple_header)

['app_name', 'price', 'genre', 'rating_count_total', 'user_rating', 'content_rating', 'size_bytes']
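As a hedged alternative to the two loading cells above, the same steps can be wrapped in one helper that closes the file automatically. The function name load_csv and the small stand-in file written below are assumptions made for illustration; they are not part of the original notebook or datasets.

```python
import csv
import os
import tempfile


def load_csv(path, encoding='utf-8'):
    """Return (header, rows) for a CSV file; the file is closed on exit."""
    # newline='' is the setting recommended by the csv module documentation.
    with open(path, encoding=encoding, newline='') as opened_file:
        rows = list(csv.reader(opened_file))
    return rows[0], rows[1:]


# Tiny stand-in file so the sketch runs without the real datasets.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 encoding='utf-8', newline='') as tmp:
    tmp.write('app_name,price,genre\n')
    tmp.write('Facebook,0,Social Networking\n')
    sample_path = tmp.name

sample_header, sample_data = load_csv(sample_path)
os.remove(sample_path)

print(sample_header)
print(sample_data)
```

With the real files, the call would take the dataset path in place of sample_path, and every value would still arrive as a string, exactly as in the cells above.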

An explorer function was added to assist the analysis by printing slices of the dataset rows in a more readable way. An example of the explore_data function was printed to demonstrate its output.

In [3]:

def explore_data(dataset, start, end, rows_and_columns = True):
    
    dataset_slice = dataset[start:end] 
    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
print('An example of the explore_data output for google_data.')
print(' ')
explore_data(google_data, 0, 1) 

print('\n')
print('An example of the explore_data output for apple_data.')
print(' ')
explore_data(apple_data, 0, 1)

An example of the explore_data output for google_data.
 
['Photo Editor & Candy Camera & Grid & ScrapBook', '0.00', 'ART_AND_DESIGN', '159', '4.1', 'Everyone', '19M', '10,000+', 'Adventure;Action & Adventure']


Number of rows: 10841
Number of columns: 9


An example of the explore_data output for apple_data.
 
['Facebook', '0', 'Social Networking', '2974676', '3.5', '4+', '389879808']


Number of rows: 7197
Number of columns: 7

4.2 Data Cleaning ¶

Since the company typically produces free apps for users that are English speaking, the data cleaning scope will include:

Extraction of free apps where price equals 0.00,
Removal of apps with no user ratings, i.e. where rating_count_total equals zero,
Removal of duplicate apps based on the app_name column,
Removal of apps with non-English characters,
Removal of any incorrect or inaccurate data if detected.

4.3 Google App Data Cleaning ¶

This section of the report describes the data cleaning process for the google_data list only and is more descriptive of the cleaning process. The apple_data list cleaning process includes the same steps and is shown in section 4.4.

4.3.1 Removal of Incorrect or Inaccurate Data - Google ¶

While first attempting to extract free apps with user rating data into a new list called google_data_mod_1, it was noted that a ValueError: could not convert string to float: 'Everyone' would occur in relation to price = float(row[1]). The following code was run to search for the error:

In [4]:

for row in google_data:
    price = row[1]
    if price == 'Everyone':
        print(row)    

['Life Made WI-Fi Touchscreen Photo Frame', 'Everyone', '1.9', '3.0M', '19', '', '1,000+', 'Free', '']

It was found that the row for the app named 'Life Made WI-Fi Touchscreen Photo Frame' had incorrect values in every column except app_name; the values appear to be shifted out of their proper columns. To remove this row from the dataset, its index needed to be known. A list comprehension with the built-in enumerate function was used to identify the correct index for removal.

In [5]:

error_index  = [index for (index, item) in enumerate(google_data) if item == ['Life Made WI-Fi Touchscreen Photo Frame', 'Everyone', '1.9', '3.0M', '19', '', '1,000+', 'Free', '']]
print(error_index)

[10472]

The index of 'Life Made WI-Fi Touchscreen Photo Frame' is 10472. The row was removed from the dataset in the following cell, after which the ValueError raised while extracting free apps no longer occurred. The explore_data function was run before and after the removal to demonstrate that the correct app was removed and that the Number of rows count decreased by one, from 10841 to 10840.

In [6]:

explore_data(google_data, 10472, 10473 )
del google_data[10472]
print('\n')

explore_data(google_data, 10472, 10473 )

['Life Made WI-Fi Touchscreen Photo Frame', 'Everyone', '1.9', '3.0M', '19', '', '1,000+', 'Free', '']


Number of rows: 10841
Number of columns: 9


['osmino Wi-Fi: free WiFi', '0.00', 'TOOLS', '134203', '4.2', 'Everyone', '4.1M', '10,000,000+', '']


Number of rows: 10840
Number of columns: 9
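The search above relied on already knowing the offending value 'Everyone'. A more general check, sketched below under stated assumptions, flags every row whose price field cannot be converted to a number. The helper name find_malformed_rows and the three sample rows are hypothetical and only mimic the shape of the real data.

```python
def find_malformed_rows(dataset, price_index=1):
    """Return (index, row) pairs whose price field is not a number."""
    malformed = []
    for index, row in enumerate(dataset):
        try:
            float(row[price_index])
        except ValueError:
            malformed.append((index, row))
    return malformed


# Hypothetical sample; the middle row mimics the shifted record.
sample_rows = [
    ['App A', '0.00', 'TOOLS'],
    ['Broken App', 'Everyone', '1.9'],
    ['App B', '2.99', 'GAME'],
]

bad = find_malformed_rows(sample_rows)
print(bad)

# Rebuild the list without the flagged indexes rather than calling del
# while iterating, which would shift the remaining indexes.
bad_indexes = {index for index, _ in bad}
cleaned_rows = [row for i, row in enumerate(sample_rows)
                if i not in bad_indexes]
print(len(cleaned_rows))
```

This only tests one column, so a row that is wrong elsewhere but happens to have a numeric price would not be caught.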

4.3.2 Extraction of Free Apps with User Rating Data - Google ¶

As part of the data cleaning scope, a new list google_data_mod_1 was created to extract all free apps with user rating data, where price equals 0.00 and rating_count_total is greater than zero. The explore_data function was called to show the updated row count.

In [7]:

google_data_mod_1 = []

for row in google_data:
    
    app_name = row[0]
    price = float(row[1])
    genre = row[2]
    rating_count_total = int(row[3])
    user_rating = row[4]
    content_rating = row[5]
    size_bytes = row[6]
    installs = row[7]
    sub_genre = row[8]
    
    if price == 0 and rating_count_total != 0:
        google_data_mod_1.append([app_name, price, genre, rating_count_total, user_rating, content_rating, size_bytes, installs, sub_genre])

explore_data(google_data_mod_1, 0, 1)

['Photo Editor & Candy Camera & Grid & ScrapBook', 0.0, 'ART_AND_DESIGN', 159, '4.1', 'Everyone', '19M', '10,000+', 'Adventure;Action & Adventure']


Number of rows: 9520
Number of columns: 9

4.3.3 Removal of Duplicate Data - Google ¶

Continuing with the data cleaning scope, all duplicate apps in the new list google_data_mod_1 needed to be removed based on the app_name column. A new list, duplicate_apps_google, was created to determine how many duplicate apps are in the google_data_mod_1 dataset and to provide examples of duplicate apps in order to establish a criterion for removing specific duplicates.

In [8]:

# duplicate and unique app list creation.
duplicate_apps_google = [] 
unique_apps_google = []

for row in google_data_mod_1:
    
    app_name = row[0]    
    if app_name in unique_apps_google:
        duplicate_apps_google.append(app_name)        
    else:
        unique_apps_google.append(app_name)
        
print(len(duplicate_apps_google))

According to the duplicate_apps_google list, there are 1132 duplicate apps based on the app_name column.
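The same duplicate count can be cross-checked with collections.Counter from the standard library, which avoids the repeated `in` lookups on a list. This is a minimal sketch; the short list of names is made up for illustration and is not taken from the dataset.

```python
from collections import Counter

# Hypothetical app names: 'Box' appears twice and 'Slack' three times.
names = ['Box', 'Slack', 'Box', 'Slack', 'Slack', 'Zenefits']

counts = Counter(names)

# Every appearance after the first one counts as a duplicate, which is
# the same definition the loop above uses.
duplicate_total = sum(count - 1 for count in counts.values())
unique_total = len(counts)

print(duplicate_total)
print(unique_total)
```

By construction, duplicate_total plus unique_total equals the number of input rows, which is the relationship the notebook later uses to predict the size of the de-duplicated list.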

In [9]:

# duplicate app examples.
print(duplicate_apps_google[:15])
print('\n')

def duplicates(dataset, name):
    
    for row in dataset:    
        app_name = row[0]
        if app_name == name:       
            print(row)
        
duplicates(google_data_mod_1, 'Slack')
print('\n')

duplicates(google_data_mod_1, 'Google Ads')

['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


['Slack', 0.0, 'BUSINESS', 51507, '4.4', 'Everyone', 'Varies with device', '5,000,000+', 'Casual']
['Slack', 0.0, 'BUSINESS', 51507, '4.4', 'Everyone', 'Varies with device', '5,000,000+', 'Action']
['Slack', 0.0, 'BUSINESS', 51510, '4.4', 'Everyone', 'Varies with device', '5,000,000+', 'Action']


['Google Ads', 0.0, 'BUSINESS', 29313, '4.3', 'Everyone', '20M', '5,000,000+', 'Action']
['Google Ads', 0.0, 'BUSINESS', 29313, '4.3', 'Everyone', '20M', '5,000,000+', 'Action']
['Google Ads', 0.0, 'BUSINESS', 29331, '4.3', 'Everyone', '20M', '5,000,000+', '']

It appears that the differences between duplicate apps include the number value in the rating_count_total and the designated sub_genre. Using 'Slack' rating_count_total as an example, the first entry has a rating_count_total of 51507 and the third entry has a rating_count_total of 51510.

The criterion moving forward with the data cleaning scope is as follows: duplicate apps with the highest recorded rating_count_total will remain on the final cleaned list as they are most likely the record with the most recent data; all other duplicates will be removed; sub_genre is not taken into consideration at this time.

In [10]:

# Expected number of remaining apps after duplication removal.
print(len(google_data_mod_1) - len(duplicate_apps_google))

After removing all the duplicate apps there should be 8388 unique apps remaining. The following cell creates a dictionary called google_data_mod_2 where each dictionary key is a unique app name and the corresponding dictionary value is the highest rating_count_total, as per the duplicate removal criterion. The purpose of the dictionary is to store only unique apps with the highest rating_count_total, separate from the duplicates.

In [11]:

# Unique app name dictionary creation.
google_data_mod_2 = {}

for row in google_data_mod_1:
    
    app_name = row[0]
    rating_count_total = row[3]
    
    if app_name in google_data_mod_2 and google_data_mod_2[app_name] < rating_count_total:
        google_data_mod_2[app_name] = rating_count_total       
    elif app_name not in google_data_mod_2:
        google_data_mod_2[app_name] = rating_count_total
        
#print(google_data_mod_2)

print('Expected length:', (len(google_data_mod_1)) - (len(duplicate_apps_google)))
print(' ')
print('Actual length:', len(google_data_mod_2))

Expected length: 8388
 
Actual length: 8388

As expected, after removing the 1132 duplicate apps, 8388 unique apps remain. Since google_data_mod_2 is only a dictionary containing key-value pairs of app_name and rating_count_total, a new list is required which includes only apps with unique names and their accompanying data points (price, genre, user_rating, content_rating, size_bytes, installs, sub_genre). Below, a new list called google_data_no_duplicates is created utilizing the google_data_mod_2 dictionary and the google_data_mod_1 list; the list already_added keeps track of the apps already added to google_data_no_duplicates so that no duplicate apps migrate into the new list.

In [12]:

# Complete list of apps with all duplicates removed.
google_data_no_duplicates = []
already_added = []

for row in google_data_mod_1:
    
    app_name = row[0]
    rating_count_total = row[3]    
    if google_data_mod_2[app_name] == rating_count_total and app_name not in already_added:
        google_data_no_duplicates.append(row)
        already_added.append(app_name) 
        
explore_data(google_data_no_duplicates, 0, 1)

['Photo Editor & Candy Camera & Grid & ScrapBook', 0.0, 'ART_AND_DESIGN', 159, '4.1', 'Everyone', '19M', '10,000+', 'Adventure;Action & Adventure']


Number of rows: 8388
Number of columns: 9

The first row of the google_data_no_duplicates list was printed using the explore_data function to demonstrate that google_data_mod_1 and the google_data_mod_2 dictionary were integrated properly with the expected number of rows 8388 remaining.
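The dictionary step and the rebuild step can also be combined into a single pass by storing the whole row, rather than only the count, as the dictionary value. The sketch below is an assumption-laden alternative, not the notebook's method: the function name keep_highest_rating_count is hypothetical, and the sample rows are shortened to four columns using the Slack and Google Ads counts printed earlier.

```python
def keep_highest_rating_count(rows, name_index=0, count_index=3):
    """Keep one row per app name: the first row with the highest count."""
    best = {}
    for row in rows:
        name = row[name_index]
        # Strict '>' keeps the earliest row when counts tie.
        if name not in best or row[count_index] > best[name][count_index]:
            best[name] = row
    # dicts preserve insertion order (Python 3.7+), so apps come out in
    # the order each name was first seen. That ordering can differ from
    # the notebook's list, which follows the position of the kept row.
    return list(best.values())


sample_rows = [
    ['Slack', 0.0, 'BUSINESS', 51507],
    ['Google Ads', 0.0, 'BUSINESS', 29313],
    ['Slack', 0.0, 'BUSINESS', 51510],
    ['Google Ads', 0.0, 'BUSINESS', 29313],
    ['Google Ads', 0.0, 'BUSINESS', 29331],
]

deduplicated = keep_highest_rating_count(sample_rows)
for row in deduplicated:
    print(row)
```

Dictionary lookups are constant time on average, so this avoids the `app_name not in already_added` list scan, which grows slower as the list fills.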

4.3.4 Removal of Non-English Apps - Google ¶

The dataset up to this point appears to include some apps that are for non-English speaking users. As part of the analysis, only apps designed for an English-speaking user will be considered; as such, non-English apps needed to be removed. According to the American Standard Code for Information Interchange system (ASCII), the numbers associated with common English characters are in the number range 0 to 127. Each character in a string has a corresponding number associated with it; for example, the string 'Pie' would be associated with the numbers 80(P), 105(i), and 101(e).
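The 'Pie' example can be reproduced directly with the built-in ord function. This is a small illustrative check; the variable names are arbitrary.

```python
app_word = 'Pie'

# ord() returns the code point of a single character.
code_points = [ord(character) for character in app_word]
print(code_points)

# Every character of 'Pie' falls inside the 0-127 ASCII range.
all_ascii = all(code <= 127 for code in code_points)
print(all_ascii)

# A CJK character falls well outside that range.
outside_ascii = ord('漫') > 127
print(outside_ascii)
```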

A function called english was created with the built-in ord function to detect names of apps that contain strings with characters that fall outside of the common range of English characters (0-127). If a non-English character is detected in a string then the english function returns False for an output as shown in the cell below.

In [13]:

def english(a_string):

    for character in a_string:
        if ord(character) > 127:
            return False
    # Only report True after every character has been checked.
    return True

print(english('Slack'))
print(english('漫咖 Comics - Manga,Novel and Stories'))       

True
False

However, some English app names contain special characters and emojis that fall outside of the ASCII range of 0-127 for common English characters. The english function needed to be modified so it would not incorrectly label English apps as non-English apps and result in the loss of useful data. To minimize data loss, an app was only removed by the modified english function if the app name had more than three characters outside of the 0-127 ASCII range as shown with the printed examples below.

In [14]:

def english(a_string):
    non_ascii = 0
    
    for index in a_string:
        if ord(index) > 127:
            non_ascii += 1    
    if non_ascii > 3:
        return False
    else:
        return True
    
print(english('DM for IG 😘 - Image & Video Saver for Instagram'))
print(english('Combat Strike CS 🔫 Counter Terrorist Attack FPS💣'))
print(english('뽕티비 - 개인방송, 인터넷방송, BJ방송'))

True
True
False
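The threshold used above is a heuristic, and it helps to see where it draws the line. The sketch below restates the same rule with the limit exposed as a parameter; the name is_probably_english and the max_non_ascii parameter are hypothetical, introduced only for this illustration.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True when at most max_non_ascii characters are non-ASCII."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


examples = [
    'Slack',
    '漫咖 Comics - Manga,Novel and Stories',
    '뽕티비 - 개인방송, 인터넷방송, BJ방송',
]

for name in examples:
    print(name, is_probably_english(name))
```

Note that the second name contains only two non-ASCII characters, so it passes the three-character allowance even though the stricter first version of the function rejected it. A name counts as English under this rule whenever it has three or fewer non-ASCII characters, which trades a few false positives for less data loss.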

4.3.5 Final Dataset - Google ¶

The final step in the cleaning process is to remove the 'non-English' apps from the google_data_no_duplicates list and create a final list called google_data_final to use for analysis. Below, the modified english function is applied to filter out non-English apps while minimizing data loss.

In [15]:

google_data_final = []

for row in google_data_no_duplicates:
    
    app_name = row[0]    
    if english(app_name):
        google_data_final.append(row)
        
explore_data(google_data_final, 0, 1)

['Photo Editor & Candy Camera & Grid & ScrapBook', 0.0, 'ART_AND_DESIGN', 159, '4.1', 'Everyone', '19M', '10,000+', 'Adventure;Action & Adventure']


Number of rows: 8348
Number of columns: 9

At the end of the cleaning process, the initial google_data dataset of 10841 apps has been cleaned and trimmed down to the google_data_final list of 8348 apps.

4.4 Apple App Data Cleaning ¶

This section of the report describes the data cleaning process for the apple_data dataset only and is less descriptive of the cleaning process as the steps are the same as the google_data cleaning process.

4.4.1 Removal of Incorrect or Inaccurate Data - Apple ¶

There are no incorrect or inaccurate data to remove from the apple_data list.

4.4.2 Extraction of Free Apps with User Rating Data - Apple ¶

In [16]:

apple_data_mod_1 = []

for row in apple_data:
    
    app_name = row[0]
    price = float(row[1])
    genre = row[2]
    rating_count_total = int(row[3])
    user_rating = row[4]
    content_rating = row[5]
    size_bytes = row[6]
        
    if price == 0 and rating_count_total != 0:
        apple_data_mod_1.append([app_name, price, genre, rating_count_total, user_rating, content_rating, size_bytes])

explore_data(apple_data_mod_1, 0, 1)

['Facebook', 0.0, 'Social Networking', 2974676, '3.5', '4+', '389879808']


Number of rows: 3383
Number of columns: 7

4.4.3 Removal of Duplicate Data - Apple ¶

In [17]:

duplicate_apps_apple = [] 
unique_apps_apple = []

for row in apple_data_mod_1:
    
    app_name = row[0]    
    if app_name in unique_apps_apple:
        duplicate_apps_apple.append(app_name)        
    else:
        unique_apps_apple.append(app_name)
        
print(len(duplicate_apps_apple))

There are only two duplicate apps in the apple_data_mod_1 list.

In [18]:

print(duplicate_apps_apple[:15])
print('\n')

def duplicates(dataset, name):
    
    for row in dataset:    
        app_name = row[0]
        if app_name == name:       
            print(row)
            
duplicates(apple_data_mod_1, 'Mannequin Challenge')
print('\n')

duplicates(apple_data_mod_1, 'VR Roller Coaster')

['Mannequin Challenge', 'VR Roller Coaster']


['Mannequin Challenge', 0.0, 'Games', 668, '3', '9+', '109705216']
['Mannequin Challenge', 0.0, 'Games', 105, '4', '4+', '59572224']


['VR Roller Coaster', 0.0, 'Games', 107, '3.5', '4+', '169523200']
['VR Roller Coaster', 0.0, 'Games', 67, '3.5', '4+', '240964608']

The apple_data_mod_1 list shows that duplicate apps can differ in more columns than those noted for the google_data_mod_1 list in section 4.3.3. As shown, 'Mannequin Challenge' has discrepancies in the rating_count_total, user_rating, content_rating, and size_bytes columns. This is not considered an issue moving forward, as the duplicate with the highest rating_count_total is most likely the record with the most recent data.

In [19]:

print(len(apple_data_mod_1) - len(duplicate_apps_apple))

In [20]:

apple_data_mod_2 = {}

for row in apple_data_mod_1:
    
    app_name = row[0]
    rating_count_total = row[3]    
    if app_name in apple_data_mod_2 and apple_data_mod_2[app_name] < rating_count_total:
        apple_data_mod_2[app_name] = rating_count_total       
    elif app_name not in apple_data_mod_2:
        apple_data_mod_2[app_name] = rating_count_total       

print('Expected length:', (len(apple_data_mod_1)) - (len(duplicate_apps_apple)))
print(' ')
print('Actual length:', len(apple_data_mod_2))

Expected length: 3381
 
Actual length: 3381

In [21]:

apple_data_no_duplicates = []
already_added = []

for row in apple_data_mod_1:
    
    app_name = row[0]
    rating_count_total = row[3]    
    if apple_data_mod_2[app_name] == rating_count_total and app_name not in already_added:
        apple_data_no_duplicates.append(row)
        already_added.append(app_name) 
        
explore_data(apple_data_no_duplicates, 0, 1)

['Facebook', 0.0, 'Social Networking', 2974676, '3.5', '4+', '389879808']


Number of rows: 3381
Number of columns: 7

4.4.4 Removal of Non-English Apps and Final Dataset - Apple ¶

In [22]:

apple_data_final = []

for row in apple_data_no_duplicates:
    
    app_name = row[0]    
    if english(app_name):
        apple_data_final.append(row)
        
explore_data(apple_data_final, 0, 1)

['Facebook', 0.0, 'Social Networking', 2974676, '3.5', '4+', '389879808']


Number of rows: 3069
Number of columns: 7

At the end of the cleaning process, the initial apple_data dataset of 7197 apps has been cleaned and trimmed down to the apple_data_final dataset of 3069 apps.

5. Data Analysis ¶

The intent of the data analysis is to provide company app developers with short-lists of currently available free apps that have high user engagement in both the Google and Apple markets. When building new app profiles, the developers can use the short-lists to model their projects in a similar fashion to increase the likelihood of user/ad interaction. It was decided to start the analysis by determining which app genres are most commonly found in the cleaned google_data_final and apple_data_final datasets.

5.1 Determine the Most Common App Genre ¶

The most common genre was determined by creating and reviewing frequency tables for both the google_data_final and apple_data_final datasets. The percent distribution of each genre was calculated, sorted from highest percent to lowest, and then presented in a bar chart.

In [23]:

# most common genre for google apps.

def freq_table(data_set, index):
    
    table = {}
    total = 0
    
    for row in data_set:
        total += 1
        key = row[index]        
        if key in table:
            table[key] += 1
        else:
            table[key] = 1            
    
    percent = {}
    
    for key in table:
        value = table[key]
        percentage = (value / total) * 100
        percentage = round(percentage, 1)
        percent[key] = percentage
        
    return percent
        
genre_most_common_google = freq_table(google_data_final, 2)

percent_genre_google = []

for key, value in genre_most_common_google.items():
    percent_genre_google.append([key, value])

print(percent_genre_google)

[['ART_AND_DESIGN', 0.7], ['AUTO_AND_VEHICLES', 0.9], ['BEAUTY', 0.6], ['BOOKS_AND_REFERENCE', 2.1], ['BUSINESS', 4.0], ['COMICS', 0.7], ['COMMUNICATION', 3.1], ['DATING', 1.8], ['EDUCATION', 1.2], ['ENTERTAINMENT', 1.0], ['EVENTS', 0.7], ['FINANCE', 3.8], ['FOOD_AND_DRINK', 1.2], ['HEALTH_AND_FITNESS', 3.0], ['HOUSE_AND_HOME', 0.8], ['LIBRARIES_AND_DEMO', 1.0], ['LIFESTYLE', 3.8], ['GAME', 10.2], ['FAMILY', 19.4], ['MEDICAL', 3.1], ['SOCIAL', 2.7], ['SHOPPING', 2.3], ['PHOTOGRAPHY', 3.1], ['SPORTS', 3.3], ['TRAVEL_AND_LOCAL', 2.3], ['TOOLS', 8.6], ['PERSONALIZATION', 3.3], ['PRODUCTIVITY', 3.7], ['PARENTING', 0.7], ['WEATHER', 0.8], ['VIDEO_PLAYERS', 1.9], ['NEWS_AND_MAGAZINES', 2.8], ['MAPS_AND_NAVIGATION', 1.4]]

In [24]:

# most common genre for apple apps.
genre_most_common_apple = freq_table(apple_data_final, 2)

percent_genre_apple = []

for key, value in genre_most_common_apple.items():
    percent_genre_apple.append([key, value])

print(percent_genre_apple)

[['Social Networking', 3.3], ['Photo & Video', 4.9], ['Games', 58.5], ['Music', 2.1], ['Reference', 0.5], ['Health & Fitness', 1.9], ['Weather', 0.8], ['Utilities', 2.4], ['Travel', 1.2], ['Shopping', 2.6], ['News', 1.3], ['Navigation', 0.2], ['Lifestyle', 1.6], ['Entertainment', 8.0], ['Food & Drink', 0.8], ['Sports', 2.2], ['Book', 0.3], ['Finance', 1.1], ['Education', 3.6], ['Productivity', 1.7], ['Business', 0.6], ['Catalogs', 0.1], ['Medical', 0.1]]
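For comparison, a percentage frequency table like the ones above can be built with collections.Counter. This is a sketch rather than a replacement for freq_table: the function name freq_table_percent and the four sample rows are assumptions made for illustration.

```python
from collections import Counter


def freq_table_percent(rows, index):
    """Map each value in column `index` to its share of rows, in percent."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}  # avoid dividing by zero on an empty dataset
    return {key: round(count / total * 100, 1)
            for key, count in counts.items()}


# Hypothetical rows with the genre in column 2.
sample_rows = [
    ['App A', 0.0, 'Games'],
    ['App B', 0.0, 'Games'],
    ['App C', 0.0, 'Music'],
    ['App D', 0.0, 'Games'],
]

table = freq_table_percent(sample_rows, 2)
sorted_table = sorted(table.items(), key=lambda item: item[1], reverse=True)
print(sorted_table)
```

Because each percentage is rounded to one decimal place independently, the values in a larger table may not sum to exactly 100.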

5.1.1 Genre Percent Distribution - Sorted ¶

In [25]:

# sorted() returns a new list and leaves percent_genre_google unchanged.
percent_genre_google_sorted = sorted(percent_genre_google, key=lambda x: x[1], reverse = True)

print('Google apps genre percent distribution sorted highest to lowest:')
print(' ')
print(percent_genre_google_sorted)
print('\n')

# sorted() returns a new list and leaves percent_genre_apple unchanged.
percent_genre_apple_sorted = sorted(percent_genre_apple, key=lambda x: x[1], reverse = True)

print('Apple apps genre percent distribution sorted highest to lowest:')
print(' ')
print(percent_genre_apple_sorted)

Google apps genre percent distribution sorted highest to lowest:
 
[['FAMILY', 19.4], ['GAME', 10.2], ['TOOLS', 8.6], ['BUSINESS', 4.0], ['FINANCE', 3.8], ['LIFESTYLE', 3.8], ['PRODUCTIVITY', 3.7], ['SPORTS', 3.3], ['PERSONALIZATION', 3.3], ['COMMUNICATION', 3.1], ['MEDICAL', 3.1], ['PHOTOGRAPHY', 3.1], ['HEALTH_AND_FITNESS', 3.0], ['NEWS_AND_MAGAZINES', 2.8], ['SOCIAL', 2.7], ['SHOPPING', 2.3], ['TRAVEL_AND_LOCAL', 2.3], ['BOOKS_AND_REFERENCE', 2.1], ['VIDEO_PLAYERS', 1.9], ['DATING', 1.8], ['MAPS_AND_NAVIGATION', 1.4], ['EDUCATION', 1.2], ['FOOD_AND_DRINK', 1.2], ['ENTERTAINMENT', 1.0], ['LIBRARIES_AND_DEMO', 1.0], ['AUTO_AND_VEHICLES', 0.9], ['HOUSE_AND_HOME', 0.8], ['WEATHER', 0.8], ['ART_AND_DESIGN', 0.7], ['COMICS', 0.7], ['EVENTS', 0.7], ['PARENTING', 0.7], ['BEAUTY', 0.6]]


Apple apps genre percent distribution sorted highest to lowest:
 
[['Games', 58.5], ['Entertainment', 8.0], ['Photo & Video', 4.9], ['Education', 3.6], ['Social Networking', 3.3], ['Shopping', 2.6], ['Utilities', 2.4], ['Sports', 2.2], ['Music', 2.1], ['Health & Fitness', 1.9], ['Productivity', 1.7], ['Lifestyle', 1.6], ['News', 1.3], ['Travel', 1.2], ['Finance', 1.1], ['Weather', 0.8], ['Food & Drink', 0.8], ['Business', 0.6], ['Reference', 0.5], ['Book', 0.3], ['Navigation', 0.2], ['Catalogs', 0.1], ['Medical', 0.1]]

Chart 1: Percent Distribution of Genres - Google Play Apps ¶

In [26]:

from matplotlib import pyplot as plt

genre = []
for row in percent_genre_google_sorted:
    g = row[0]
    genre.append(g)
# print(genre)

percent = []
for row in percent_genre_google_sorted:
    p = row[1]
    percent.append(p)
# print(percent)

Color = 'white'
# The three most common genres are highlighted; the rest share one color.
New_Colors = ['sandybrown'] * 3 + ['mediumseagreen'] * (len(genre) - 3)
plt.figure(figsize = (14, 6))
ax = plt.axes()
ax.set_facecolor('0.85') # Background Color
plt.grid(color = Color, alpha = 0.3, linestyle = '--', linewidth = 1)
plt.bar(genre, percent, color = New_Colors, edgecolor = 'black', width = .85)
plt.title('Percent Distribution of Genres - Google Play Apps', fontsize = 18)
plt.ylabel('Distribution (%)', fontsize = 18)
plt.xlabel('Genre', fontsize = 18)
plt.xticks(genre, horizontalalignment = 'right', rotation = 45)
plt.yticks([0, 5, 8.6, 10.2, 15, 19.4, 25])

plt.show()

There are 33 different genres in the google_data_final dataset and the most common is FAMILY at 19.4% of the total distribution. The next most common genres are GAME and TOOLS at 10.2% and 8.6% respectively.
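The two loops that split the sorted pairs into labels and values for the chart can be written more compactly with zip. The sketch below uses the top four Google genres reported above and stops short of plotting, so it runs without matplotlib; the variable names are illustrative.

```python
# Top four (genre, percent) pairs from the sorted Google distribution.
pairs = [['FAMILY', 19.4], ['GAME', 10.2], ['TOOLS', 8.6], ['BUSINESS', 4.0]]

# zip(*pairs) transposes the list of pairs into two tuples.
genres, percents = zip(*pairs)

# Highlight the first three bars and give the remainder one color.
colors = ['sandybrown'] * 3 + ['mediumseagreen'] * (len(genres) - 3)

print(genres)
print(percents)
print(colors)
```

Deriving the color list from len(genres) keeps it in step with the data if the number of genres changes after cleaning.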

Chart 2: Percent Distribution of Genres - Apple App Store ¶

In [27]:

genre = []
for row in percent_genre_apple_sorted:
    g = row[0]
    genre.append(g)
# print(genre)

percent = []
for row in percent_genre_apple_sorted:
    p = row[1]
    percent.append(p)
# print(percent)

Color = 'white'
# The three most common genres are highlighted; the rest share one color.
New_Colors = ['cornflowerblue'] * 3 + ['tomato'] * (len(genre) - 3)
plt.figure(figsize = (14, 6))
ax = plt.axes()
ax.set_facecolor('0.85') # Background Color
plt.grid(color = Color, alpha = 0.3, linestyle = '--', linewidth = 1)
plt.bar(genre, percent, color = New_Colors, edgecolor = 'black', width = .85)
plt.title('Percent Distribution of Genres - Apple App Store', fontsize = 18)
plt.ylabel('Distribution (%)', fontsize = 18)
plt.xlabel('Genre', fontsize = 18)
plt.xticks(genre, horizontalalignment = 'right', rotation = 45)
plt.yticks([0, 4.9, 8, 20, 30, 40, 50, 58.5, 70])

plt.show()

There are 23 different genres in the apple_data_final dataset; the most common is Games with a commanding 58.5% of the total distribution. The next most common genres are Entertainment and Photo & Video at 8.0% and 4.9% respectively. The analysis suggests that the Apple App Store sample is weighted heavily towards the Games and Entertainment genres, which together make up about 67% (58.5% + 8.0%) of the apps in the apple_data_final dataset.

The most common genre combination in the apple_data_final dataset is clearly Games and Entertainment. The google_data_final dataset is not as one-sided: its genre distribution is broader, with 33 primary genres, each of which can be paired with one or more of 84 sub_genre values (including no sub_genre at all). It was decided to explore how the sub_genres are distributed within the most common primary genre, FAMILY, to see if any additional patterns could be identified. A frequency table for FAMILY sub_genres was created and sorted for review.

5.2 Most Common Sub-Genre - Google ¶

In [28]:

# most common sub_genre in the FAMILY genre, not sorted.
most_common_sub_genre_family_google = {}

for row in google_data_final:
    
    genre = row[2]    
    sub_genre = row[8]
    if genre == 'FAMILY':
        if sub_genre in most_common_sub_genre_family_google:
            most_common_sub_genre_family_google[sub_genre] += 1
        else:
            most_common_sub_genre_family_google[sub_genre] = 1

print(most_common_sub_genre_family_google)

{'Role Playing': 8, 'Education': 29, 'Action': 20, 'Trivia': 4, 'Simulation': 7, 'Racing': 4, 'Music': 3, 'Casual': 10, 'Puzzle': 3, 'Entertainment': 45, 'Strategy': 3, 'Word': 2, 'Board': 4, 'Adventure': 1, 'Arcade': 2, 'Education;Education': 1, 'Casino': 1, '': 1473}

5.2.1 Most Common Sub-Genre Sorted - Google ¶

In [29]:

# most common sub-genre within the 'FAMILY' genre for the google_data_final dataset?
family_sub_genre_google = []

for key in most_common_sub_genre_family_google:
    sub_genre = key
    freq = most_common_sub_genre_family_google[key]
    family_sub_genre_google.append([sub_genre, freq])

family_sub_genre_google_sorted = sorted(family_sub_genre_google, key=lambda x: x[1], reverse = True)

print('"FAMILY" genre sub_genre frequency table sorted:')
print(' ')
print(family_sub_genre_google_sorted)
print('\n')

# 'FAMILY' genre app count check
freq_count = 0

for row in family_sub_genre_google_sorted:
    freq = row[1]
    freq_count = freq_count + freq
print(freq_count, '- Check count matches frequency table - Yes')

"FAMILY" genre sub_genre frequency table sorted:
 
[['', 1473], ['Entertainment', 45], ['Education', 29], ['Action', 20], ['Casual', 10], ['Role Playing', 8], ['Simulation', 7], ['Trivia', 4], ['Racing', 4], ['Board', 4], ['Music', 3], ['Puzzle', 3], ['Strategy', 3], ['Word', 2], ['Arcade', 2], ['Adventure', 1], ['Education;Education', 1], ['Casino', 1]]


1620 - Check count matches frequency table - Yes
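As a sketch of an alternative, the sort and the count check can be written so that the check is computed rather than printed as a fixed 'Yes'. The dictionary is copied from the output above; the name expected_family_apps is an assumption and stands in for the FAMILY app count obtained elsewhere in the analysis.

```python
# Frequency table copied from the printed output above.
family_sub_genre_counts = {
    'Role Playing': 8, 'Education': 29, 'Action': 20, 'Trivia': 4,
    'Simulation': 7, 'Racing': 4, 'Music': 3, 'Casual': 10, 'Puzzle': 3,
    'Entertainment': 45, 'Strategy': 3, 'Word': 2, 'Board': 4,
    'Adventure': 1, 'Arcade': 2, 'Education;Education': 1, 'Casino': 1,
    '': 1473,
}

# sorted() returns a new list and leaves the dictionary untouched.
sorted_counts = sorted(
    family_sub_genre_counts.items(), key=lambda item: item[1], reverse=True
)

total = sum(family_sub_genre_counts.values())
expected_family_apps = 1620  # assumed to come from the FAMILY genre count

# The comparison is evaluated, so a mismatch would print False.
print(total, '- matches expected count:', total == expected_family_apps)
```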

Since a very high number (1473 of 1620) of the FAMILY sub_genre values are blank, it was decided to check what share of the entire google_data_final dataset has undefined sub_genres for comparison.

In [30]:

# most common sub-genre in the google_data_final dataset.
sub_genre_google = freq_table(google_data_final, 8)

percent_sub_genre_google = []

for key, value in sub_genre_google.items():
    percent_sub_genre_google.append([key, value])
    
percent_sub_genre_google_sorted = percent_sub_genre_google
percent_sub_genre_google_sorted.sort(key=lambda x: x[1], reverse = True)

print('Google apps sub-genre percent distribution sorted highest to lowest (top 3):')
print(' ')
print(percent_sub_genre_google_sorted[:3])

Google apps sub-genre percent distribution sorted highest to lowest (top 3):
 
[['', 74.9], ['Entertainment', 4.2], ['Education', 3.2]]

It appears that the most common sub_genre value, both within the FAMILY primary genre and across the google_data_final dataset, is blank or undefined. There are 1473 out of 1620 apps in the FAMILY genre that have no defined sub_genre; furthermore, nearly 75% of the entire google_data_final dataset has no defined sub_genre. The most common genre in the google_data_final list remains FAMILY followed by ENTERTAINMENT. However, it was noted during review of the FAMILY genre app list that many of the apps could be considered games or entertainment. The following is a qualitative assessment of the google_data_final list, carried out to develop assumptions that allow further analysis of the list.
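For comparison with the 74.9% figure, the FAMILY share can be worked out from the two counts quoted above. This is a small arithmetic sketch; the variable names are illustrative.

```python
blank_family = 1473       # FAMILY apps with no sub_genre (from the frequency table)
total_family = 1620       # all FAMILY apps
blank_overall_pct = 74.9  # blank sub_genre share across google_data_final

# Share of FAMILY apps with a blank sub_genre, as a percentage.
blank_family_pct = round(blank_family / total_family * 100, 1)

# Gap between the FAMILY share and the dataset-wide share, in percentage points.
gap_points = round(blank_family_pct - blank_overall_pct, 1)

print(blank_family_pct, gap_points)
```

On these numbers, roughly 91% of FAMILY apps have no sub_genre, about 16 percentage points above the dataset-wide share.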

5.3 Qualitative Assessment of Google Dataset ¶

In [31]:

# qualitative assessment of 'FAMILY' genre apps with undefined sub_genres.
def look_up(data_set, start, end, genre_name):
       
    extract = []
    
    for row in data_set:
        app_name = row[0]
        genre = row[2]
        sub_genre = row[8]
        
        if genre == genre_name:
            extract.append([app_name, genre, sub_genre])
            
    data_slice = extract[start:end]
    for row in data_slice:
        print(row)

look_up(google_data_final, 0, 10, 'FAMILY')
print('\n')

look_up(google_data_final, 0, 5, 'BEAUTY')
print('\n')

look_up(google_data_final, 0, 5, 'BOOKS_AND_REFERENCE')

['Jewels Crush- Match 3 Puzzle', 'FAMILY', 'Role Playing']
['Coloring & Learn', 'FAMILY', 'Education']
['Mahjong', 'FAMILY', 'Role Playing']
['Super ABC! Learning games for kids! Preschool apps', 'FAMILY', 'Action']
['Toy Pop Cubes', 'FAMILY', 'Trivia']
['Educational Games 4 Kids', 'FAMILY', 'Action']
['Candy Pop Story', 'FAMILY', 'Simulation']
['Princess Coloring Book', 'FAMILY', 'Education']
['Hello Kitty Nail Salon', 'FAMILY', 'Action']
['Candy Smash', 'FAMILY', 'Education']


['Hush - Beauty for Everyone', 'BEAUTY', 'Casual']
['ipsy: Makeup, Beauty, and Tips', 'BEAUTY', 'Puzzle']
['Natural recipes for your beauty', 'BEAUTY', 'Puzzle']
['BestCam Selfie-selfie, beauty camera, photo editor', 'BEAUTY', 'Arcade']
['Mirror - Zoom & Exposure -', 'BEAUTY', 'Sports']


['E-Book Read - Read Book for free', 'BOOKS_AND_REFERENCE', 'Sports']
['Download free book with green book', 'BOOKS_AND_REFERENCE', 'Action']
['Wikipedia', 'BOOKS_AND_REFERENCE', 'Arcade']
['Cool Reader', 'BOOKS_AND_REFERENCE', 'Word']
['Free Panda Radio Music', 'BOOKS_AND_REFERENCE', 'Action']

A qualitative assessment of the FAMILY genre, based on a review of app descriptions and content, found that many of the apps could be classified as GAME or ENTERTAINMENT. Additional qualitative assessment of all genres in the google_data_final dataset affirms that every genre contains apps that could be defined as GAME or ENTERTAINMENT, or that have mislabeled sub_genres, as shown in the BEAUTY and BOOKS_AND_REFERENCE examples. For instance, ['ipsy: Makeup, Beauty, and Tips', 'BEAUTY', 'Puzzle'] is not a 'Puzzle' type app and ['Wikipedia', 'BOOKS_AND_REFERENCE', 'Arcade'] is not an 'Arcade' type app. It was determined that the sub_genre data was not reliable and that no further analysis of sub_genre was required.
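The manual review could be supplemented with a rough automated flag. The sketch below is not part of the original analysis: the two sets are assumptions about which sub_genre labels describe game styles and which primary genres are unrelated to games, and the rows are copied from the look_up output above.

```python
# Assumption: these sub_genre labels describe game styles.
GAME_STYLE_SUB_GENRES = {'Puzzle', 'Arcade', 'Action', 'Sports', 'Casual'}
# Assumption: apps in these primary genres are not expected to be games.
NON_GAME_GENRES = {'BEAUTY', 'BOOKS_AND_REFERENCE'}

# Rows copied from the look_up output: [app_name, genre, sub_genre].
rows = [
    ['ipsy: Makeup, Beauty, and Tips', 'BEAUTY', 'Puzzle'],
    ['Mirror - Zoom & Exposure -', 'BEAUTY', 'Sports'],
    ['Wikipedia', 'BOOKS_AND_REFERENCE', 'Arcade'],
    ['Coloring & Learn', 'FAMILY', 'Education'],
]

def flag_suspect_rows(rows):
    """Return rows pairing a non-game genre with a game-style sub_genre."""
    return [
        row for row in rows
        if row[1] in NON_GAME_GENRES and row[2] in GAME_STYLE_SUB_GENRES
    ]

suspect = flag_suspect_rows(rows)
for row in suspect:
    print(row)
```

A flag like this only surfaces candidates for review; it does not replace reading the app descriptions.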

The qualitative assessment of the google_data_final list appears to agree with the apple_data_final dataset, in which approximately 67% of the apps (as shown in Chart 2) are defined as Games or Entertainment. Even though Google Play offers a greater variety of apps across a more extensive range of genres and sub_genres than the Apple App Store, the most common genres in the google_data_final dataset are FAMILY and GAME, with a combined total of 30% of the distribution. The assumptions moving forward with the analysis are: (1) the majority of apps in the FAMILY genre are considered GAME and ENTERTAINMENT; (2) some apps in all genres other than FAMILY and GAME are considered GAME and ENTERTAINMENT; and (3) the most common apps in google_data_final are GAME and ENTERTAINMENT.

5.4 App Short Lists ¶

5.4.1 Top 10 Google Apps: FAMILY & GAME Genre - Highest Rating Count Total ¶

This section of the report develops short lists of apps that show high user engagement. Each list contains the top ten apps based on user data such as user_rating, rating_count_total, and installs. The first list covers the top 10 apps in the google_data_final dataset in the FAMILY and GAME genres, ranked by the highest rating_count_total and then the highest user_rating.

In [32]:

def look_up_user_rating_and_rating_count_total(data_set, n_rating_count_total, n_user_rating):
       
    extract = []
    
    for row in data_set:
        app_name = row[0]
        genre = row[2]
        rating_count_total = row[3]
        user_rating = row[4]
        installs = row[7]
        
        # Compare ratings numerically; the dataset stores them as strings such as '4.8'.
        if float(user_rating) >= n_user_rating and rating_count_total >= n_rating_count_total:
            extract.append([app_name, genre, rating_count_total, user_rating, installs])
            
    return extract
        
top_google_apps_genre = look_up_user_rating_and_rating_count_total(google_data_final, 40000, 4.8)

top_google_apps_genre_sorted = top_google_apps_genre
top_google_apps_genre_sorted.sort(key=lambda x: x[2], reverse = True)

def look_up_FAMILY_GAMES(data_set):
       
    extract = []
    
    for row in data_set:
        app_name = row[0]
        genre = row[1]
        rating_count_total = row[2] 
        user_rating = row[3]
        installs = row[4]
        
        if genre == 'FAMILY' or genre == 'GAME':
            extract.append([app_name, genre, rating_count_total, user_rating, installs])
            
    return extract
        
top10_goggle_FAMILY_GAME_apps = look_up_FAMILY_GAMES(top_google_apps_genre_sorted)
print(top10_goggle_FAMILY_GAME_apps)

[['Eternium', 'FAMILY', 1506783, '4.8', '10,000,000+'], ["PewDiePie's Tuber Simulator", 'FAMILY', 1499466, '4.8', '10,000,000+'], ['Vlogger Go Viral - Tuber Game', 'FAMILY', 1304467, '4.8', '10,000,000+'], ['Cash, Inc. Money Clicker Game & Business Adventure', 'GAME', 549720, '4.8', '10,000,000+'], ['Dan the Man: Action Platformer', 'GAME', 528550, '4.8', '10,000,000+'], ['Fernanfloo', 'GAME', 526595, '4.8', '10,000,000+'], ['No. Color - Color by Number, Number Coloring', 'FAMILY', 269194, '4.8', '10,000,000+'], ['Wordscapes', 'GAME', 230849, '4.8', '10,000,000+'], ["Drag'n'Boom", 'GAME', 133180, '4.8', '1,000,000+'], ['PixPanda - Color by Number Pixel Art Coloring Book', 'FAMILY', 55723, '4.9', '1,000,000+'], ['Hungry Hearts Diner: A Tale of Star-Crossed Souls', 'FAMILY', 46253, '4.9', '500,000+']]
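The filter and sort steps can also be combined in one helper that ranks by rating_count_total first and user_rating second, as described above. This is a minimal sketch: the function name top_apps and its parameters are assumptions, and the rows are a subset copied from the output above.

```python
# Rows copied from the output above:
# [app_name, genre, rating_count_total, user_rating, installs]
rows = [
    ['Wordscapes', 'GAME', 230849, '4.8', '10,000,000+'],
    ['Eternium', 'FAMILY', 1506783, '4.8', '10,000,000+'],
    ['PixPanda - Color by Number Pixel Art Coloring Book', 'FAMILY', 55723, '4.9', '1,000,000+'],
    ["PewDiePie's Tuber Simulator", 'FAMILY', 1499466, '4.8', '10,000,000+'],
]

def top_apps(rows, genres, min_count, min_rating, limit=10):
    """Filter by genre and thresholds, then rank by count and rating."""
    kept = [
        row for row in rows
        if row[1] in genres
        and row[2] >= min_count
        and float(row[3]) >= min_rating  # ratings are stored as strings
    ]
    # Tuple key: rating_count_total first, user_rating as the tie-breaker.
    kept.sort(key=lambda row: (row[2], float(row[3])), reverse=True)
    return kept[:limit]

top = top_apps(rows, {'FAMILY', 'GAME'}, 40000, 4.8)
for row in top:
    print(row[0], row[2], row[3])
```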

Table 1: Top 10 Google Apps: FAMILY & GAME Genre - Highest Rating Count Total ¶

In [33]:

data = top10_goggle_FAMILY_GAME_apps[:10]

columns = ['App Name', 'Genre', 'Rating Count Total', 'User Rating', 'Installs'] 
rows = [str(i + 1) for i in range(10)]  # row labels '1' to '10'
r_colors = ['gold', 'silver', 'peru', 'linen', 'linen', 'linen', 'linen', 'linen', 'linen', 'linen',]

fig, ax = plt.subplots()
ax.set_axis_off() # removes the plot x and y axes
table = ax.table( 
    cellText = data, cellLoc = 'center',    
    rowLabels = rows, rowLoc = 'center', rowColours = r_colors,
    colLabels = columns, colColours = ['sandybrown'] * 5,    
    bbox=[0, -.55, 2.5, 1.5], # table (left/right, up/down, Column Width and padding, Row Height and padding)
    loc = 'upper left') 

table.auto_set_column_width(col = [-1, 0, 1, 2, 3, 4]) # auto fit column size
table.auto_set_font_size(False)
table.set_fontsize(12)

ax.set_title('Top 10 Google Apps: FAMILY & GAME Genre - Highest Rating Count Total',
             fontweight = 'bold', fontsize = 16, loc = 'left') 
 
plt.show() 

The top app in the FAMILY and GAME genres is 'Eternium', a classic RPG with impressive graphics that touts its "effortless 'tap to move' and innovative 'swipe to cast'" control features. The developer says the game is "player-friendly" with a "no paywalls, never pay-to-play" philosophy.

The 2nd (PewDiePie's Tuber Simulator) and 3rd (Vlogger Go Viral - Tuber Game) position apps are both 'tuber' simulator games where users play to become the world's "#1" YouTuber influencer or internet sensation.

Other notables include the 7th and 10th position apps, which are both color-by-number games.

5.4.2 Top 10 Google Apps: All Genres - Most Installs & Highest User Rating ¶

The second Google app list includes the top 10 apps across all primary genres, based on the highest number of installs (500,000,000+) and then the highest user_rating.

In [34]:

installs = []
for row in google_data_final:
    count = row[7]
    installs.append(count)
    
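# Note: installs are stored as strings such as '500,000,000+', so min() and
# max() compare them lexicographically, not by numeric value.
min_installs = min(installs)
max_installs = max(installs)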

print(min_installs, 'to', max_installs)

1+ to 500,000,000+
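Because the install values are strings, min() and max() order them character by character rather than by size. A hedged sketch of a numeric comparison follows. The helper parse_installs is an assumption, and the label '1,000,000,000+' is hypothetical, included only to show how the two orderings can differ; it is not confirmed to appear in google_data_final.

```python
def parse_installs(label):
    """Convert a label such as '500,000,000+' to the integer 500000000."""
    return int(label.replace(',', '').replace('+', ''))

# Hypothetical sample of install labels.
labels = ['1+', '500,000,000+', '10,000,000+', '1,000,000,000+']

# String comparison: '5' sorts after '1', so the 500M label wins.
max_as_text = max(labels)

# Numeric comparison: the labels are ranked by their parsed integer value.
max_as_number = max(labels, key=parse_installs)

print(max_as_text)
print(max_as_number)
```

If a larger install bracket exists in the data, a string-based max() or `>=` filter would not pick it up, so parsing the labels first is the safer choice.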

In [35]:

def look_up_installs_and_user_rating(data_set, n_installs, n_user_rating):
       
    extract = []
    
    for row in data_set:
        app_name = row[0]
        genre = row[2]
        rating_count_total = row[3]
        user_rating = row[4]
        installs = row[7]
        sub_genre = row[8]
        
        if user_rating >= n_user_rating and installs >= n_installs:
            extract.append([app_name, genre, installs, user_rating, rating_count_total])
            
    return extract

top10_google_apps_installs_and_user_rating = look_up_installs_and_user_rating(google_data_final, '500,000,000+', '4.4')

top10_google_apps_installs_and_user_rating_sorted = top10_google_apps_installs_and_user_rating
top10_google_apps_installs_and_user_rating_sorted.sort(key=lambda x: x[3], reverse = True)

print(top10_google_apps_installs_and_user_rating_sorted)

[['Clean Master- Space Cleaner & Antivirus', 'TOOLS', '500,000,000+', '4.7', 42916526], ['Security Master - Antivirus, VPN, AppLock, Booster', 'TOOLS', '500,000,000+', '4.7', 24900999], ['Google Duo - High Quality Video Calls', 'COMMUNICATION', '500,000,000+', '4.6', 2083237], ['SHAREit - Transfer & Share', 'TOOLS', '500,000,000+', '4.6', 7790693], ['UC Browser - Fast Download Private & Secure', 'COMMUNICATION', '500,000,000+', '4.5', 17714850], ['My Talking Tom', 'GAME', '500,000,000+', '4.5', 14892469], ['Microsoft Word', 'PRODUCTIVITY', '500,000,000+', '4.5', 2084126], ['MX Player', 'VIDEO_PLAYERS', '500,000,000+', '4.5', 6474672], ['Candy Crush Saga', 'GAME', '500,000,000+', '4.4', 22430188], ['Google Translate', 'TOOLS', '500,000,000+', '4.4', 5745093], ['Dropbox', 'PRODUCTIVITY', '500,000,000+', '4.4', 1861310], ['Flipboard: News For Our Time', 'NEWS_AND_MAGAZINES', '500,000,000+', '4.4', 1284018]]

Table 2: Top 10 Google Apps: All Genres - Most Installs & Highest User Rating ¶

In [36]:

data = top10_google_apps_installs_and_user_rating_sorted[0:10]

columns = ['App Name', 'Genre', 'Installs', 'User Rating', 'Rating Count Total'] 
rows = [str(i + 1) for i in range(10)]  # row labels '1' to '10'
r_colors = ['gold', 'silver', 'peru', 'linen', 'linen', 'linen', 'linen', 'linen', 'linen', 'linen',]

fig, ax = plt.subplots()
ax.set_axis_off() # removes the plot x and y axes
table = ax.table( 
    cellText = data, cellLoc = 'center',    
    rowLabels = rows, rowLoc = 'center', rowColours = r_colors,
    colLabels = columns, colColours = ['cornflowerblue'] * 5,    
    bbox=[0, -.55, 2.5, 1.5], # table (left/right, up/down, Column Width and padding, Row Height and padding)
    loc = 'upper left') 

table.auto_set_column_width(col =  [0, 1, 2, 3, 4]) # auto fit column size
table.auto_set_font_size(False)
table.set_fontsize(12)

ax.set_title('Top 10 Google Apps: All Genres - Most Installs & Highest User Rating',
             fontweight = 'bold', fontsize = 16, loc = 'left') 
 
plt.show()

The majority of the apps in the list above (6 of 10) fall under the TOOLS and COMMUNICATION genres. The #1 and #2 positions are 'Clean Master' and 'Security Master' respectively; both are mobile antivirus, cleaning, and performance boosting apps. Antivirus apps may be an area to consider for development if the company has the experience and resources available.

Two GAME genre apps made this top 10 list: 'My Talking Tom' and 'Candy Crush Saga'. These two games did not make the first top 10 list due to lower user ratings. It should be noted that 'My Talking Tom' and 'Candy Crush Saga' have at least 50x more installs and roughly 10-15x higher rating count totals than the #1 FAMILY and GAME app, 'Eternium'.
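The ratios quoted above can be reproduced from the values in the two lists. This is a small arithmetic sketch; the install labels such as '10,000,000+' are lower bounds, so the install ratio is approximate.

```python
# Values taken from the Table 1 and Table 2 outputs above.
# Install figures are the lower bound of each '...+' bracket.
eternium = {'installs': 10_000_000, 'ratings': 1_506_783}
games = {
    'My Talking Tom': {'installs': 500_000_000, 'ratings': 14_892_469},
    'Candy Crush Saga': {'installs': 500_000_000, 'ratings': 22_430_188},
}

ratios = {}
for name, stats in games.items():
    install_ratio = stats['installs'] / eternium['installs']
    rating_ratio = stats['ratings'] / eternium['ratings']
    ratios[name] = (install_ratio, rating_ratio)
    print(name, round(install_ratio, 1), round(rating_ratio, 1))
```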

5.4.3 Top 10 Apple Apps: Games Genre - Highest Rating Count Total & User Rating ¶

The final top 10 list includes apps from only the Games genre in the apple_data_final dataset. The list was generated by selecting the apps with the highest user_rating and ranking them by the highest rating_count_total.

In [37]:

rating_count_total = []
for row in apple_data_final:
    count = row[3]
    rating_count_total.append(count)
    
min_rating_count = min(rating_count_total)
max_rating_count = max(rating_count_total)

print(min_rating_count, 'to', max_rating_count)

1 to 2974676

In [38]:

def look_up_user_rating_and_genre(data_set, n_genre, n_rating_count_total, n_user_rating):
       
    extract = []
    
    for row in data_set:
        app_name = row[0]
        genre = row[2]
        rating_count_total = row[3]
        user_rating = row[4]
               
        if user_rating == n_user_rating and genre == n_genre and rating_count_total >= n_rating_count_total:
            extract.append([app_name, genre, rating_count_total, user_rating])
            
    return extract
        
top10_apple_apps_games_genre = look_up_user_rating_and_genre(apple_data_final, 'Games', 60000, '5')

top10_apple_apps_games_genre_sorted = top10_apple_apps_games_genre
top10_apple_apps_games_genre_sorted.sort(key=lambda x: x[2], reverse = True)

print(top10_apple_apps_games_genre_sorted)

[['Head Soccer', 'Games', 481564, '5'], ['Sniper 3D Assassin: Shoot to Kill Gun Game', 'Games', 386521, '5'], ['Geometry Dash Lite', 'Games', 370370, '5'], ['CSR Racing 2', 'Games', 257100, '5'], ["Pictoword: Fun 2 Pics Guess What's the Word Trivia", 'Games', 186089, '5'], ['Iron Force', 'Games', 141634, '5'], ['Sniper Shooter: Gun Shooting Games', 'Games', 134080, '5'], ["PewDiePie's Tuber Simulator", 'Games', 90851, '5'], ['Blackbox - think outside the box', 'Games', 80058, '5'], ['Egg, Inc.', 'Games', 79074, '5'], ['Flight Pilot Simulator 3D: Flying Game For Free', 'Games', 60360, '5']]

Table 3: Top 10 Apple Apps: Games Genre - Highest Rating Count Total & User Rating ¶

In [39]:

data = top10_apple_apps_games_genre_sorted[:10]

columns = ['App Name', 'Genre', 'Rating Count Total', 'User Rating'] 
rows = [str(i + 1) for i in range(10)]  # row labels '1' to '10'
r_colors = ['gold', 'silver', 'peru', 'linen', 'linen', 'linen', 'linen', 'linen', 'linen', 'linen',]

fig, ax = plt.subplots()
ax.set_axis_off() # removes the plot x and y axes
table = ax.table( 
    cellText = data, cellLoc = 'center',    
    rowLabels = rows, rowLoc = 'center', rowColours = r_colors,
    colLabels = columns, colColours = ['violet'] * 4,    
    bbox=[0, -.55, 2.5, 1.5], # table (left/right, up/down, Column Width and padding, Row Height and padding)
    loc = 'upper left') 

table.auto_set_column_width(col = [0, 1, 2, 3]) # auto fit column size
table.auto_set_font_size(False)
table.set_fontsize(12)

ax.set_title('Top 10 Apple Apps: Games Genre - Highest Rating Count Total & User Rating',
             fontweight = 'bold', fontsize = 16, loc = 'left') 
 
plt.show()

The #1 game from the apple_data_final dataset is 'Head Soccer'. The app's developers describe it as "a soccer game with easy controls that everyone can learn in 1 second". It was also noted from the Apple App Store that this app has over 100,000,000 downloads. Soccer is the world's most popular sport with an estimated 4 billion fans, followed by cricket with an estimated 2.5 billion fans. The company could develop a two-in-one app where users have the option to play either soccer or cricket to capture fans from both sports markets.

The only overlap between the Google and Apple top 10 lists is 'PewDiePie's Tuber Simulator', which holds the 2nd position in the Google FAMILY and GAME genre list and the 8th position in the Apple Games genre list. This type of app profile should be considered by the company's developers, especially as "vlogging" and "tubing" continue to become more popular and more accessible.
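The overlap can be checked with a set intersection over the app names in the top 10 lists. The names below are copied from the Table 1 and Table 3 outputs; the variable names are illustrative.

```python
# Top 10 names from the Google FAMILY & GAME list (Table 1).
google_top10 = [
    'Eternium',
    "PewDiePie's Tuber Simulator",
    'Vlogger Go Viral - Tuber Game',
    'Cash, Inc. Money Clicker Game & Business Adventure',
    'Dan the Man: Action Platformer',
    'Fernanfloo',
    'No. Color - Color by Number, Number Coloring',
    'Wordscapes',
    "Drag'n'Boom",
    'PixPanda - Color by Number Pixel Art Coloring Book',
]

# Top 10 names from the Apple Games list (Table 3).
apple_top10 = [
    'Head Soccer',
    'Sniper 3D Assassin: Shoot to Kill Gun Game',
    'Geometry Dash Lite',
    'CSR Racing 2',
    "Pictoword: Fun 2 Pics Guess What's the Word Trivia",
    'Iron Force',
    'Sniper Shooter: Gun Shooting Games',
    "PewDiePie's Tuber Simulator",
    'Blackbox - think outside the box',
    'Egg, Inc.',
]

# Names present in both lists, with their 1-based positions.
overlap = set(google_top10) & set(apple_top10)
positions = {
    name: (google_top10.index(name) + 1, apple_top10.index(name) + 1)
    for name in overlap
}
print(positions)
```

An exact-name match like this would miss apps published under slightly different titles on each store, so it is a lower bound on the true overlap.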

6. Summary ¶

As part of this project, third party datasets containing user data for apps currently available on the Apple App Store and Google Play were collected, cleaned, and analyzed. The project's intent was to provide the company's app developers with lists of apps that show high user engagement to consider when creating new app profiles, in order to increase the likelihood of user/ad interaction and generate revenue.

The primary concluding suggestion is to focus the company's app development resources on creating app profiles that are unique, no-pay-to-play, extremely user friendly vlogger/tuber simulator games and/or tools similar to 'PewDiePie's Tuber Simulator' or 'Vlogger Go Viral - Tuber Game'.

Other suggestions include developing a two-in-one soccer/cricket game, apps similar to 'Candy Crush Saga', 'My Talking Tom', and/or an RPG style game with innovative gameplay controls.

Finally, it is recommended that the company look into what resources would be required to develop mobile antivirus, cleaning, and boosting apps.