Profitable Apps on iOS and Google Play

  • The purpose of the project is to understand which types of free-to-download apps attract the most users.

  • The project looks at free-to-download apps and analyzes the number of users, the app genre, the number of add-ons and the potential revenue.

In [1]:
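from csv import reader

def data_file(filename):
    # Read a CSV file into a list of rows; the header stays as the first row.
    # The with-block closes the file once it has been read.
    with open(filename, encoding='utf8', newline='') as file:
        read_file = reader(file)
        data_list = list(read_file)
    return data_list


ios = data_file('AppleStore.csv')
android = data_file('googleplaystore.csv')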
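`data_file` keeps the header as the first row, so any code that converts a column to numbers has to skip `android[0]`; otherwise `float('Reviews')` raises a `ValueError`. A minimal sketch that splits the header off at load time, assuming that is the desired layout. `read_csv_rows` is a hypothetical helper, and the sample text is made up for illustration, not taken from the data files.

```python
import csv
import io

def read_csv_rows(text):
    """Parse CSV text and return (header, rows), with the header split off."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1:]

# Made-up sample using the first four Google Play column names
sample = "App,Category,Rating,Reviews\nApp A,BUSINESS,4.2,100\nApp B,BUSINESS,4.0,25\n"
header, rows = read_csv_rows(sample)
print(header)     # ['App', 'Category', 'Rating', 'Reviews']
print(len(rows))  # 2
```

For a real file, the same helper could be fed the file contents, e.g. `read_csv_rows(f.read())` inside `with open('googleplaystore.csv', encoding='utf8', newline='') as f:`.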
In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
In [8]:
# explore_data prints the rows and returns None, so there is nothing to assign
explore_data(ios, 0, 3)
explore_data(android, 0, 3)
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


In [3]:
print(ios[0])
print('\n')
print(android[0])
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
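The project statement limits the analysis to free-to-download apps. The headers above give the price columns: `price` is index 4 in the App Store data and `Price` is index 7 in the Google Play data. A minimal sketch of that filter, assuming it runs on rows with the header removed; `free_apps` is a hypothetical helper and the rows are made up.

```python
def free_apps(rows, price_col):
    """Keep rows whose price parses to 0, e.g. '0.0' (App Store) or '0' (Google Play)."""
    # Assumption: a paid price may carry a leading '$', so strip it before converting
    return [row for row in rows if float(row[price_col].lstrip('$')) == 0.0]

# Made-up rows, header excluded; only the columns up to the price column are filled in
ios_rows = [
    ['1', 'App A', '1000', 'USD', '0.0'],
    ['2', 'App B', '2000', 'USD', '2.99'],
]
android_rows = [
    ['App C', 'TOOLS', '4.1', '10', '1M', '1,000+', 'Free', '0'],
    ['App D', 'TOOLS', '4.5', '20', '2M', '5,000+', 'Paid', '$4.99'],
]
print([row[1] for row in free_apps(ios_rows, price_col=4)])      # ['App A']
print([row[0] for row in free_apps(android_rows, price_col=7)])  # ['App C']
```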

Delete row 10472 from the Android data set, because it is missing data.

In [19]:
# Re-running this cell deletes another row: after a deletion, the next row moves up to index 10472
del android[10472]

Number of rows left in the Android data set (header row included) after deleting the faulty row

In [20]:
len(android)
Out[20]:
10840

Check for rows whose number of columns doesn't match the header row

In [29]:
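header_len = len(android[0])
# enumerate gives each row's own index; android.index(app) would return the
# first matching row and rescan the list on every call
for index, app in enumerate(android):
    if len(app) != header_len:
        print(index)  # no output means every row has the expected number of columns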
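Instead of deleting a hard-coded index (which removes a different row if the cell is run twice), the malformed rows can be found and dropped in a single pass. A minimal sketch, assuming the header has already been split off; `split_by_width` is a hypothetical helper and the rows are made up, with `'App B'` missing one field.

```python
def split_by_width(header, rows):
    """Return (good_rows, bad_indices); a bad row has a different column count than header."""
    good, bad = [], []
    for i, row in enumerate(rows):
        if len(row) == len(header):
            good.append(row)
        else:
            bad.append(i)  # index counted from the first data row, header excluded
    return good, bad

header = ['App', 'Category', 'Rating', 'Reviews']
rows = [
    ['App A', 'BUSINESS', '4.2', '100'],
    ['App B', '4.0', '25'],  # one field missing
    ['App C', 'TOOLS', '3.9', '7'],
]
good, bad = split_by_width(header, rows)
print(bad)        # [1]
print(len(good))  # 2
```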
    

The Android data set contains duplicate entries: the same app name appears in several rows, each with a different number of reviews. For each app we keep only the row with the highest number of reviews, because that is most likely the latest and most up-to-date entry.

In [21]:
unique_apps = set()  # set membership checks are fast; a list is rescanned on every check
duplicate_apps = []
for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.add(name)


print('Number of duplicate_apps: ', len(duplicate_apps))
print('Duplicate Apps examples: ', duplicate_apps[1:7])
Number of duplicate_apps:  1181
Duplicate Apps examples:  ['Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits']
In [22]:
print('Expected length: ', len(android)-len(duplicate_apps))
Expected length:  9659
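The expected length works because `duplicate_apps` records every occurrence of a name after its first, so subtracting it from the row count leaves exactly one row per distinct name. Because `android` still contains its header row, the header's `'App'` is counted as one of those names, so 9659 includes the header. A small check of the arithmetic on made-up names:

```python
# Hypothetical app names; 'App A' appears three times
names = ['App A', 'App B', 'App A', 'App C', 'App A']

seen = set()
duplicates = []
for name in names:
    if name in seen:
        duplicates.append(name)  # every occurrence after the first
    else:
        seen.add(name)

print(len(duplicates))               # 2
print(len(names) - len(duplicates))  # 3, the number of distinct names
print(len(set(names)))               # 3
```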
In [26]:
reviews_max = {}

# Skip the header row (android[0]); float('Reviews') would raise a ValueError
for app in android[1:]:
    name = app[0]
    n_reviews = float(app[3])

    # Record the highest number of reviews seen so far for each app name
    if name in reviews_max and (reviews_max[name] < n_reviews):
        reviews_max[name] = n_reviews

    elif name not in reviews_max:
        reviews_max[name] = n_reviews
In [59]:
android_clean = []
already_added = set()

# Keep one row per app: the first row whose review count equals the maximum in
# reviews_max. Skip the header row, as in the previous cell.
for app in android[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.add(name)
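The two passes above (building the per-name maximum, then keeping the first row that matches it) can be combined into one reusable function. A minimal sketch, assuming the header has been removed and the review count sits in column 3 as in the Google Play header; `dedupe_by_max_reviews` is a hypothetical helper and the rows are made up.

```python
def dedupe_by_max_reviews(rows, name_col=0, reviews_col=3):
    """Keep one row per app name: the first row carrying that name's highest review count."""
    # Pass 1: highest review count seen for each name
    reviews_max = {}
    for row in rows:
        name = row[name_col]
        n_reviews = float(row[reviews_col])
        if name not in reviews_max or n_reviews > reviews_max[name]:
            reviews_max[name] = n_reviews

    # Pass 2: keep the first row matching that maximum, once per name
    clean, already_added = [], set()
    for row in rows:
        name = row[name_col]
        if float(row[reviews_col]) == reviews_max[name] and name not in already_added:
            clean.append(row)
            already_added.add(name)
    return clean

rows = [  # header excluded; values made up
    ['App A', 'BUSINESS', '4.2', '90'],
    ['App B', 'TOOLS', '4.0', '25'],
    ['App A', 'BUSINESS', '4.2', '100'],
    ['App A', 'BUSINESS', '4.2', '100'],
]
clean = dedupe_by_max_reviews(rows)
print([(r[0], r[3]) for r in clean])  # [('App B', '25'), ('App A', '100')]
```

The output keeps the original row order of the surviving entries, which is why `'App B'` comes first here.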