Free mobile apps: an exploratory data analysis¶

What free app genres are most prevalent on the App Store and Google Play Store? Which have the highest levels of user engagement? Answering these questions is critical for modelling user behavior in the mobile app market.

While simply classifying apps by popularity of genre neglects other decision-relevant information (such as the effort required to build an app, revenue potential, etc.), knowledge of trends in app stores enables developers to make data-driven decisions about what kind of apps they build.

Summary of results¶

I explore and clean datasets from the App Store and Google Play store, obtained from Kaggle.
App genres are compared by number of downloads, to gauge popularity, average number of reviews, to gauge user engagement, and rating inequality, to estimate the viability of smaller apps.
Separate recommendations for promising app genres are made for both the App Store and Google Play Store.

Exploring the data¶

There are ~4 million apps across the App Store and Google Play Store, and comprehensive data is unavailable. To approximate both markets, we'll use publicly available samples that were obtained through web scraping and uploaded to Kaggle.

Play Store¶

The dataset is here, and contains roughly 10,000 apps. Key context and caveats:

Geographical location will influence the list of apps seen (and thus bias web scraping)
Google uses personalized algorithms based on user behavior to determine what apps are seen.

App Store¶

The dataset contains roughly 7,000 apps. The same caveats from the Play Store apply to the App Store; note that the App Store data originates from a more streamlined iTunes API.

Opening and exploring functions¶

The two functions below, open_dataset and explore_data, streamline exploring the datasets.

In [2]:

from csv import reader

def open_dataset(filename, header=True):
    """Opens a given dataset.

    Args:
        filename (str): A .csv file.
        header (bool): Indicates whether the dataset includes a header row.

    Returns:
        If the dataset contains a header returns the header row and data separately.
        If there is no header, only returns the data.
    """
    # the context manager closes the file once all rows have been read
    with open(filename, encoding='utf-8') as opened_file:
        read_file = reader(opened_file) # an iterator
        data = list(read_file)
    if header:
        return data[0], data[1:]
    else:
        return data
    
# open both datasets
play_store_header, play_store = open_dataset('googleplaystore.csv')
app_store_header, app_store = open_dataset('AppleStore.csv')

    

In [3]:

def explore_data(dataset, start, end, rows_and_columns=False):
    """Prints rows from a dataset in a readable format.

    Args:
        dataset (list): The data set to be explored, a list of lists
        start (int): Starting index of a slice of the data set
        end (int): Ending index of slice
        rows_and_columns (bool): Indicates whether to print number of rows and columns or not
        
    Returns:
        None
    """
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # add an empty row after each row
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [4]:

# print headers, and the first couple rows for each dataset.
print("Play Store columns:\n")
print(play_store_header)
print('\n')
explore_data(play_store, 0, 2, True)

print("\nApp Store columns:\n")
print(app_store_header)
print('\n')
explore_data(app_store, 0, 2, True)

Play Store columns:

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13

App Store columns:

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 7197
Number of columns: 16

Preliminary Observations¶

Of note, ad and microtransaction data is not included in either dataset. The columns for the Play Store are self-explanatory; those for the App Store are less clear. Detailed descriptions of each column can be found at the App Store dataset link above.

Cleaning the datasets¶

Erroneous data¶

There is a missing category for one of the apps listed, on row 10472. Discussion about the missing value can be found on this Kaggle thread. We'll deal with this by deleting the app.

In [5]:

explore_data(play_store, 10472, 10473) # the category (index 1) is missing, so the row has only 12 fields

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']

In [6]:

del play_store[10472] # run only once

In [7]:

explore_data(play_store, 10472, 10473) # the next app has moved into this position

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
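
The faulty row could also be located programmatically rather than by a known index. The following is a minimal sketch, assuming the problem always shows up as a row whose field count differs from the header's; `find_malformed_rows` and the sample rows are hypothetical and not part of the original notebook.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose field count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]

# Tiny made-up sample, not taken from the dataset.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.2'],
    ['App B', '1.9'],            # category missing, so the row is one field short
    ['App C', 'GAME', '4.5'],
]

malformed = find_malformed_rows(sample_header, sample_rows)
print(malformed)  # [(1, ['App B', '1.9'])]
```

Applied to the real data, the call would be `find_malformed_rows(play_store_header, play_store)`; it only catches rows with a wrong number of fields, not rows with wrong values.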

Duplicate rows in the data¶

There are duplicate apps in the Play Store dataset.

While the App Store uses an indexed appendix-like page suitable for web scraping, the Play Store relies on modern techniques like dynamic page loading that make scraping difficult, which may explain the duplicates. In this section, I'll consolidate each group of duplicates into the entry that has the greatest number of reviews.

In [8]:

# create a list of duplicate apps
duplicates = []
unique_apps = []
for app in play_store:
    name = app[0]
    if name in unique_apps:
        duplicates.append(name)
    else:
        unique_apps.append(name)

In [9]:

print('There are', len(duplicates), 'duplicates.')

There are 1181 duplicates.
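
The same count can be obtained without scanning a growing list on every iteration. This is a sketch only, using `collections.Counter` from the standard library; `count_duplicates` and the sample rows are hypothetical names introduced for illustration.

```python
from collections import Counter

def count_duplicates(rows, name_index=0):
    """Count every occurrence of a name beyond its first."""
    name_counts = Counter(row[name_index] for row in rows)
    return sum(count - 1 for count in name_counts.values())

# Made-up rows: 'App A' appears three times, so it contributes two duplicates.
sample_rows = [['App A', '10'], ['App B', '7'], ['App A', '12'], ['App A', '12']]

n_duplicates = count_duplicates(sample_rows)
print('There are', n_duplicates, 'duplicates.')  # There are 2 duplicates.
```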

In [10]:

# the app version with the most reviews is probably the most recent version
reviews_max = {}
for app in play_store:
    name = app[0]
    n_reviews = float(app[3])
    if(name in reviews_max and reviews_max[name] < n_reviews or
        name not in reviews_max): # update dictionary if we have a new max, or if there is no key yet
        reviews_max[name] = n_reviews

In [11]:

print('Expected length:', len(play_store) - 1181)
print('Actual length:', len(reviews_max))

Expected length: 9659
Actual length: 9659

In [12]:

# clean the dataset and create play_store_clean
already_added = []
play_store_clean = []
for app in play_store:
    name = app[0]
    n_reviews = float(app[3])
    if(name in reviews_max and n_reviews == reviews_max[name]
        and name not in already_added):
        play_store_clean.append(app)
        already_added.append(name)
explore_data(play_store_clean,0,2,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9659
Number of columns: 13
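
The two passes above can also be collapsed into one by storing the best row per name directly. The sketch below assumes the same column layout (name at index 0, reviews at index 3); `keep_most_reviewed` is a hypothetical helper, and unlike the cell above it orders the result by each name's first appearance rather than by the position of the retained row.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep, for each app name, the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # strict '>' means ties keep the row that was seen first
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Made-up rows for illustration.
sample_rows = [
    ['App A', 'TOOLS', '4.0', '10'],
    ['App B', 'GAME', '4.1', '5'],
    ['App A', 'TOOLS', '4.0', '12'],
    ['App A', 'TOOLS', '4.0', '12'],
]

deduplicated = keep_most_reviewed(sample_rows)
print(deduplicated)
# [['App A', 'TOOLS', '4.0', '12'], ['App B', 'GAME', '4.1', '5']]
```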

Non-English Apps¶

Some of the apps in both stores are non-English. To simplify our analysis, we'll only consider apps directed towards English-speaking audiences by filtering out app names with non-English characters.

In [13]:

def check_ASCII(name):
    '''Checks if there's any character in a string that is outside of the ASCII set.
    
    Args: 
        name (str): The name of the app to be checked.
    Returns:
        A boolean indicating whether the string contains only ASCII characters.
    '''
    for char in name:
        if ord(char) > 127:
            return False
    return True

In [14]:

print(check_ASCII("Instagram"))
print(check_ASCII('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
False

The check_ASCII function is working well. However, some English apps can have characters from outside the ASCII set:

In [15]:

print(check_ASCII('Instachat 😜'))
print(check_ASCII('Docs To Go™ Free Office Suite'))

False
False

Let's relax the criterion in check_ASCII so that the function returns False only if more than three characters fall outside the ASCII set.

In [16]:

def check_ASCII(name):
    '''Checks whether a string has at most three characters outside the ASCII set.
    
    Args: 
        name (str): The name of the app to be checked.
    Returns:
        A boolean: False if more than three characters are non-ASCII, True otherwise.
    '''
    counter = 0
    for char in name:
        if ord(char) > 127:
            counter +=1
    if counter > 3:
        return False
    else:
        return True

In [17]:

print(check_ASCII('Instachat 😜'))
print(check_ASCII('Docs To Go™ Free Office Suite'))

True
True
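
The cutoff of three characters is a judgment call, so it can help to expose it as a parameter. A minimal sketch, assuming the same rule as above; `is_probably_english` and `max_non_ascii` are hypothetical names, not part of the original notebook.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True if the name has at most max_non_ascii non-ASCII characters."""
    non_ascii = sum(1 for char in name if ord(char) > 127)
    return non_ascii <= max_non_ascii

print(is_probably_english('Docs To Go™ Free Office Suite'))   # True
print(is_probably_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))   # False
print(is_probably_english('Instachat 😜', max_non_ascii=0))   # False
```

Setting `max_non_ascii=0` reproduces the strict first version of the check.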

Let's now filter the dataset, so that we only have English apps, and store the data in lists play_store_English and app_store_English.

In [18]:

play_store_English = []
app_store_English = []
for app in play_store_clean:
    name = app[0]
    if check_ASCII(name) == True:
        play_store_English.append(app)

for app in app_store:
    name = app[1] 
    if check_ASCII(name) == True:
        app_store_English.append(app)
    

Isolating free apps¶

Let's isolate the free apps by adding an app to a new list only if its price is 0.

In [19]:

play_store_free = []
app_store_free = []
for app in play_store_English:
    if app[7] == '0':
        play_store_free.append(app)

for app in app_store_English:
    if app[4] == '0.0':
        app_store_free.append(app)
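
Comparing price strings literally works for the values seen so far ('0' and '0.0') but would miss a free app written as, say, '0.00'. The sketch below parses the price instead. It assumes paid prices may carry a leading dollar sign, which is not shown in the rows printed above; `is_free` is a hypothetical helper.

```python
def is_free(price_text):
    """Return True if the price string parses to zero."""
    cleaned = price_text.strip().lstrip('$')  # assumption: '$' is the only currency prefix
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # anything that is not a number is treated as not free
        return False

print(is_free('0'))       # True
print(is_free('0.0'))     # True
print(is_free('$4.99'))   # False
```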
    

The most popular app genres¶

Which app genres are popular on both markets? We'll create a frequency table to assess the percentage of apps among different genres. We'll analyze the genre breakdown for both the Google Play Store and App Store, identify similarities and differences, and derive conclusions.

In [20]:

def freq_table(dataset, index):
    '''Creates a frequency table made of percentages given a dataset and category.
    Args:
        dataset: A list of lists
        index (int): The category from which to generate a frequency table
    Returns:
        A dictionary that associates values with their relative prevalence in a dataset 
    '''
    ft = {}
    for row in dataset:
        value = row[index]
        if value in ft:
            ft[value] += 1
        else:
            ft[value] = 1
    # make percentages
    for key in ft:
        ft[key] = ft[key] * 100 / len(dataset)
    return ft
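
For comparison, the same percentage table can be built with `collections.Counter`, which also returns the values ordered from most to least common. This is a sketch of an alternative, not a replacement for the function above; `freq_table_counter` and the sample rows are hypothetical.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Map each value in a column to its share of rows, in percent."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    # most_common() yields (value, count) pairs sorted by descending count
    return {value: count * 100 / total for value, count in counts.most_common()}

# Made-up rows for illustration.
sample_rows = [['a', 'GAME'], ['b', 'GAME'], ['c', 'TOOLS'], ['d', 'GAME']]

sample_ft = freq_table_counter(sample_rows, 1)
print(sample_ft)  # {'GAME': 75.0, 'TOOLS': 25.0}
```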

In [21]:

def display_table(dataset, index):
    '''Displays a sorted frequency table in a readable format.
    Args:
        dataset: A list of lists
        index (int): The category from which to display a frequency table
    Returns:
        None
    '''
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        if entry[0] > 1: # anything greater than 1% of apps
            print(entry[1], ':', round(entry[0],2)) # displays to two decimal places.

In [22]:

display_table(play_store_free, 1) # Categories
print('\n')
display_table(play_store_free, -4) # Genre

FAMILY : 18.91
GAME : 9.72
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16


Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13

App genres on the Google Play Store¶

Categories are more inclusive than genres, which are relatively granular. While family and game apps are common, we see that there is a significant portion of practical apps, in tools, business, productivity, finance, etc. Let's visualize the categories:

In [23]:

import matplotlib.pyplot as plt
import seaborn as sns
category_ft = freq_table(play_store_free, 1)

fig, ax = plt.subplots(figsize = (12,2))
# x and y must be passed as keywords in seaborn 0.12 and later
sns.barplot(x=list(category_ft.keys()), y=list(category_ft.values()), ax=ax)
plt.xticks(rotation = 90)

plt.show()

App genres on the App Store¶

We'll do a similar analysis for the App Store:

In [24]:

display_table(app_store_free, -5)

Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12

In [25]:

category_ft = freq_table(app_store_free, -5)
sns.barplot(x=list(category_ft.keys()), y=list(category_ft.values()))
plt.xticks(rotation = 90)
plt.show()

User engagement on the App Store¶

We see that games dominate the App Store, in terms of sheer number of apps. The saturation in one genre contrasts with the Google Play Store. Furthermore, the prevalent genres are more geared towards fun; for example, entertainment, photos & videos, and social networking.

It's important to keep in mind that this is just a slice of the data, from a sample, and it only includes free, English-language apps. The number of apps also does not reveal the number of users or their engagement.

A better proxy for user engagement is number of ratings:

In [26]:

app_store_ft = freq_table(app_store_free, -5)
# calculate number of ratings

def genre_avg_n_ratings(dataset, genre_index, n_ratings_index):
    '''Prints the average number of ratings for each genre, and generates a bar plot
    Args:
        dataset: A list of lists
        genre_index (int): index for the genre column (1 for Play Store, -5 for App Store)
        n_ratings_index (int): index for the n_ratings column (3 for Play Store, 5 for App Store)
    Returns:
        None

    '''
    store_ft = freq_table(dataset, genre_index)
    x_genres = []
    y_n_ratings = []
    for genre in store_ft:
        x_genres.append(genre)
        total = 0
        len_genre = 0
        for app in dataset:
            genre_app = app[genre_index]
            if genre_app == genre:
                n_ratings = float(app[n_ratings_index])
                total += n_ratings
                len_genre +=1
        avg_n_ratings = round(total / len_genre, 1)
        y_n_ratings.append(avg_n_ratings)
        print(genre + ": " + str(avg_n_ratings))
    
    sns.barplot(x=x_genres, y=y_n_ratings)
    plt.xticks(rotation = 90)
    plt.show()

genre_avg_n_ratings(app_store_free, -5, 5)

Social Networking: 71548.3
Photo & Video: 28441.5
Games: 22788.7
Music: 57326.5
Reference: 74942.1
Health & Fitness: 23298.0
Weather: 52279.9
Utilities: 18684.5
Travel: 28243.8
Shopping: 26919.7
News: 21248.0
Navigation: 86090.3
Lifestyle: 16485.8
Entertainment: 14029.8
Food & Drink: 33333.9
Sports: 23008.9
Book: 39758.5
Finance: 31467.9
Education: 7004.0
Productivity: 21028.4
Business: 7491.1
Catalogs: 4004.0
Medical: 612.0

Navigation, reference, social networking, and music apps have the highest average numbers of ratings. Games also show solid engagement: considering the vast number of games on the platform, ~23k ratings per app is very respectable.
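
The per-genre averages can also be computed in a single pass over the data, instead of rescanning every app once per genre. A minimal sketch under the same column assumptions; `average_ratings_by_genre` is a hypothetical helper that returns the averages rather than printing or plotting them.

```python
from collections import defaultdict

def average_ratings_by_genre(dataset, genre_index, n_ratings_index):
    """Return {genre: average number of ratings}, rounded to one decimal place."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for app in dataset:
        genre = app[genre_index]
        totals[genre] += float(app[n_ratings_index])
        counts[genre] += 1
    return {genre: round(totals[genre] / counts[genre], 1) for genre in totals}

# Made-up rows: name, genre, number of ratings.
sample_rows = [['A', 'Games', '100'], ['B', 'Games', '50'], ['C', 'Music', '10']]

sample_averages = average_ratings_by_genre(sample_rows, 1, 2)
print(sample_averages)  # {'Games': 75.0, 'Music': 10.0}
```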

Based on average number of ratings, some genres show promise in terms of user engagement: social networking, reference, and navigation. However, these genres are dominated by huge companies. To demonstrate, let's print the top 10 apps for these genres:

In [27]:

def print_top_apps(genre, dataset = app_store_free, n_ratings_index=5, name_index=1, genre_index=-5, num=10):
    '''Prints num of the top apps in a genre by number of ratings. Default args are for the App Store.
    Assumes the dataset is already sorted by number of ratings in descending order.
    Args:
        genre (str): Category of app
        dataset (list): The list of rows corresponding to apps
        n_ratings_index (int): The column index for number of ratings (3 for the Play Store)
        name_index (int): The column index for the name of the app (0 for the Play Store)
        genre_index (int): Column index for genre (1 for the Play Store)
        num (int): How many apps to print
    Returns:
        None
    '''
        
    counter = 0
    print(genre)
    for app in dataset: 
        if app[genre_index] == genre:
            print(app[name_index] + ': ' + app[n_ratings_index])
            counter +=1
        if counter == num: # only want to print num apps
            break
print_top_apps('Social Networking')
print('\n')
print_top_apps('Reference')
print('\n')
print_top_apps('Navigation')

Social Networking
Facebook: 2974676
Pinterest: 1061624
Skype for iPhone: 373519
Messenger: 351466
Tumblr: 334293
WhatsApp Messenger: 287589
Kik: 260965
ooVoo – Free Video Call, Text and Voice: 177501
TextNow - Unlimited Text + Calls: 164963
Viber Messenger – Text & Call: 164249


Reference
Bible: 985920
Dictionary.com Dictionary & Thesaurus: 200047
Dictionary.com Dictionary & Thesaurus for iPad: 54175
Google Translate: 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran: 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition: 17588
Merriam-Webster Dictionary: 16849
Night Sky: 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE): 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools: 4693


Navigation
Waze - GPS Navigation, Maps & Real-time Traffic: 345046
Google Maps - Navigation & Transit: 154911
Geocaching®: 12811
CoPilot GPS – Car Navigation & Offline Maps: 3582
ImmobilienScout24: Real Estate Search in Germany: 187
Railway Route Search: 5

Apps like Facebook, the Bible, and Waze heavily skew the average number of ratings upward for these app genres. For games, this is much less pronounced:

In [28]:

print_top_apps('Games')

Games
Clash of Clans: 2130805
Temple Run: 1724546
Candy Crush Saga: 961794
Angry Birds: 824451
Subway Surfers: 706110
Solitaire: 679055
CSR Racing: 677247
Crossy Road - Endless Arcade Hopper: 669079
Injustice: Gods Among Us: 612532
Hay Day: 567344

The 10th game on the list, Hay Day, has more than three times as many ratings as the 10th social networking app, Viber Messenger, despite the large average number of ratings for social networking apps.
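
One quick way to see the skew is to compare the mean with the median. The sketch below uses the rating counts of the six free Navigation apps printed earlier; the variable names are introduced here for illustration.

```python
from statistics import mean, median

# Rating counts of the six free Navigation apps listed above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

navigation_mean = mean(navigation_ratings)
navigation_median = median(navigation_ratings)

print(round(navigation_mean, 1))  # 86090.3
print(navigation_median)          # 8196.5
```

The mean matches the genre average reported earlier, while the median is roughly a tenth of it, which is what a genre carried by one or two very large apps looks like.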

Ratings inequality in the App Store¶

Let's be a little more systematic and design a function that calculates the ratio between the number of ratings of the top app and the number of ratings of the 20th app, to get a rough idea of which genres are most tractable for small developers.

In [29]:

def rating_inequality(genre, dataset = app_store_free, n_ratings_index=5, name_index=1, genre_index=-5, num=20):
    '''Prints and returns the ratio between the number of ratings of the top app in a genre
       and the app ranked num in that genre. Default args are for the App Store.
       Assumes the dataset is sorted by number of ratings in descending order.
       Genres without a 20th ranking app are not included.
    Args:
        genre (str): Category of app
        dataset (list): The list of rows corresponding to apps
        n_ratings_index (int): The column index for number of ratings (3 for the Play Store)
        name_index (int): The column index for the name of the app (0 for the Play Store)
        genre_index (int): Column index for genre (1 for the Play Store)
        num (int): The app rank to compare to the top app
    Returns:
        The ratio as a float, or None if the genre has fewer than num apps
    '''
        
    counter = 0
    for app in dataset: 
        if app[genre_index] == genre:
            if counter == 0:
                top_rating = float(app[n_ratings_index])
            counter +=1
            if counter == num: 
                num_rating = float(app[n_ratings_index])
                print(genre, '-', top_rating / num_rating)                 
                return top_rating / num_rating
    return None # the genre has fewer than num apps

In [30]:

for genre in app_store_ft:
    rating_inequality(genre)

Social Networking - 60.08232680266613
Photo & Video - 63.72329825182041
Games - 5.261155980020098
Music - 78.97946453602466
Health & Fitness - 74.57491186839013
Weather - 13395.297297297297
Utilities - 51.725105189340816
Travel - 489.23793859649123
Shopping - 17.232263652862564
News - 511.6445086705202
Lifestyle - 154.00493938033227
Entertainment - 5.581047381546135
Food & Drink - 10477.793103448275
Sports - 17.7599023497101
Finance - 154.8937583001328
Education - 22.60678060302904
Productivity - 12.586153004610455
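
The ratio above depends on the dataset already being ordered by number of ratings. A variant that sorts within the genre first avoids that dependency. This is a sketch only; `rating_ratio` and its sample data are hypothetical, and it additionally returns None when the lower-ranked app has zero ratings.

```python
def rating_ratio(dataset, genre, genre_index, n_ratings_index, rank=20):
    """Ratio of the top app's ratings to those of the app at the given rank."""
    ratings = sorted(
        (float(app[n_ratings_index]) for app in dataset if app[genre_index] == genre),
        reverse=True,
    )
    # not enough apps in the genre, or a zero denominator: no meaningful ratio
    if len(ratings) < rank or ratings[rank - 1] == 0:
        return None
    return ratings[0] / ratings[rank - 1]

# Made-up, deliberately unsorted rows: name, genre, number of ratings.
sample_rows = [
    ['A', 'Games', '300'],
    ['B', 'Games', '100'],
    ['C', 'Games', '600'],
    ['D', 'Music', '50'],
]

print(rating_ratio(sample_rows, 'Games', 1, 2, rank=3))  # 6.0
print(rating_ratio(sample_rows, 'Music', 1, 2, rank=3))  # None
```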

Analysis and recommendations for the App Store¶

By this metric, games, entertainment, productivity, shopping, and education apps offer promising opportunities for smaller developers. Given the dominance of games and entertainment apps on the App Store, iOS developers should remain focused on these genres. Beyond these two genres, education and shopping apps are also particularly attractive due to their small but significant share of the App Store.

While the gaming app market on the App Store is relatively saturated, the solid average number of user ratings makes it the primary genre to recommend for aspiring iOS developers interested in growth.

User engagement on the Play Store¶

Let's repeat a similar analysis for the Play Store, and make recommendations. The hard work for writing functions is already done, and the process is straightforward:

In [31]:

# the Play Store is not ordered by reviews
play_store_free = sorted(play_store_free, key = lambda x: float(x[3]), reverse=True)

# calculate number of ratings
play_store_ft = freq_table(play_store_free, 1)
genre_avg_n_ratings(play_store_free, 1, 3)

SOCIAL: 965831.0
COMMUNICATION: 995608.5
GAME: 683523.8
TOOLS: 305732.9
VIDEO_PLAYERS: 425350.1
NEWS_AND_MAGAZINES: 93088.0
PHOTOGRAPHY: 404081.4
FAMILY: 113143.0
TRAVEL_AND_LOCAL: 129484.4
PERSONALIZATION: 181122.3
MAPS_AND_NAVIGATION: 142860.0
SHOPPING: 223887.3
ENTERTAINMENT: 301752.2
PRODUCTIVITY: 160634.5
HEALTH_AND_FITNESS: 78095.0
SPORTS: 116938.6
BOOKS_AND_REFERENCE: 87995.1
LIFESTYLE: 33921.8
WEATHER: 171250.8
FINANCE: 38535.9
BUSINESS: 24239.7
EDUCATION: 56293.1
FOOD_AND_DRINK: 57478.8
COMICS: 42585.6
PARENTING: 16378.7
DATING: 21953.3
HOUSE_AND_HOME: 26435.5
LIBRARIES_AND_DEMO: 10925.8
ART_AND_DESIGN: 24699.4
AUTO_AND_VEHICLES: 14140.3
MEDICAL: 3730.2
BEAUTY: 7476.2
EVENTS: 2555.8

The Play Store data is much more mixed in terms of average number of reviews. Communication, social, and gaming apps have the highest averages.

Ratings inequality in the Play Store¶

Let's get a little more context on the raw number of reviews with our ratings inequality function:

In [32]:

for genre in play_store_ft:
    rating_inequality(genre, play_store_free, 3, 0, 1) # name is column 0 in the Play Store

SOCIAL - 86.83638719024425
COMMUNICATION - 27.142346760262615
GAME - 7.2420067767956455
TOOLS - 23.084384774476028
VIDEO_PLAYERS - 59.57441546710384
NEWS_AND_MAGAZINES - 62.767118202750105
PHOTOGRAPHY - 7.483839421088904
FAMILY - 5.558485710710575
TRAVEL_AND_LOCAL - 42.679691110412776
PERSONALIZATION - 15.392347329070624
MAPS_AND_NAVIGATION - 128.07687131448
SHOPPING - 16.913286503852543
ENTERTAINMENT - 27.59062364112573
PRODUCTIVITY - 6.273359122845857
HEALTH_AND_FITNESS - 16.8683248610772
SPORTS - 8.501647466164416
BOOKS_AND_REFERENCE - 13.846642347554313
LIFESTYLE - 29.08894218236797
WEATHER - 64.26945799457995
FINANCE - 9.933793930809202
BUSINESS - 15.02377179080824
EDUCATION - 14.750612418787943
FOOD_AND_DRINK - 15.130145012450564
COMICS - 199.43823760818253
PARENTING - 339.2201030927835
DATING - 12.91033742101451
HOUSE_AND_HOME - 37.313125
LIBRARIES_AND_DEMO - 149.5195857721747
ART_AND_DESIGN - 194.49077733860344
AUTO_AND_VEHICLES - 53.349028840494405
MEDICAL - 14.604108309990663
BEAUTY - 117.9616182572614
EVENTS - 70.00523560209425

Analysis and recommendations for the Play Store¶

Based on rating inequality, games, photography, family, productivity, sports, and finance apps are promising opportunities for developers.

Games and photography apps are especially notable because of their high volume of ratings, and low ratio between the top app and 20th app in number of user ratings.

In [39]:

# Let's see some examples:
print_top_apps('GAME',play_store_free,3,0,1, 15)
print_top_apps('PHOTOGRAPHY',play_store_free,3,0,1,15)

GAME
Clash of Clans: 44893888
Subway Surfers: 27725352
Clash Royale: 23136735
Candy Crush Saga: 22430188
My Talking Tom: 14892469
8 Ball Pool: 14201891
Shadow Fight 2: 10981850
Pou: 10486018
Pokémon GO: 10424925
Yes day: 10055521
Dream League Soccer 2018: 9883806
My Talking Angela: 9883367
Hill Climb Racing: 8923847
Asphalt 8: Airborne: 8389714
Mobile Legends: Bang Bang: 8219586
PHOTOGRAPHY
Google Photos: 10859051
PicsArt Photo Studio: Collage Maker & Pic Editor: 7594559
PhotoGrid: Video & Pic Collage Maker, Photo Editor: 7529865
Retrica: 6120977
B612 - Beauty & Filter Camera: 5282578
Camera360: Selfie Photo Editor with Funny Sticker: 4865132
Candy Camera - selfie, beauty camera, photo editor: 3368705
YouCam Makeup - Magic Selfie Makeovers: 3337956
BeautyPlus - Easy Photo Editor & Selfie Camera: 3158151
Cymera Camera- Photo Editor, Filter,Collage,Layout: 2418165
Video Editor Music,Cut,No Crop: 2163282
Photo Editor Pro: 1871421
Keepsafe Photo Vault: Hide Private Photos & Videos: 1656808
YouCam Perfect - Selfie Photo Editor: 1579343
Photo Lab Picture Editor: face effects, art frames: 1536512

Conclusions¶

The App Store and Google Play store have a crucial difference in terms of their distributions of free app genres: the App Store is much more saturated by games to the point where it is hard to make other genre recommendations for small developers.

While developing games is a fair choice for the Google Play store, other genres like photography have high potential for developers.

Wild speculation: there's a variety of selfie apps and collage makers in the more popular photography apps on the Play Store -- perhaps a selfie-collage maker could gain traction?

There are still unanswered questions and possible improvements after this initial analysis. Here are some threads:

How does only looking at free apps skew the data? (the App Store has many more paid apps).
What apps make the most profits? (factoring in microtransactions and ads)