In this project, we explore the mobile app market on behalf of a company that develops free Android and iOS apps for Google Play and the App Store.
Our goal is to analyze data on existing apps on Google Play and in the App Store to identify the features that are likely to attract more users.
We used data on thousands of Android and iOS apps and their characteristics in order to determine:
Which genre of app represents a larger percentage of our datasets?
Which kind of app has the highest number of users?
Which genre of app has the highest average ratings?
We then combined these results to identify a possible profile for the next app to be launched on the market.
We identified some categories that meet our company's requirements, namely:
They rank high in users' preferences, based on the number of ratings and installations in both the App Store and Google Play;
They have a medium average rating, which suggests that there is room for improvement and that the niche is not yet saturated.
These categories can be explored one by one, or even combined together, to offer a new and attractive product that can engage our users.
We used data from two publicly available datasets:
A dataset containing data about approximately 10,000 Android apps from Google Play, updated in August 2018;
A dataset containing data about approximately 7,000 iOS apps from the App Store, updated in July 2017.
Although these datasets include only a relatively small sample of all existing apps, they are freely available, which made them convenient for this project. The sample is still large enough for our findings to be indicative of the wider market.
Let's start exploring our datasets. First of all, we need to open the two files:
#open and read the two files
import csv

#'with' closes each file once it has been read
with open('googleplaystore.csv', encoding='utf8', newline='') as f:
    google_data = list(csv.reader(f))

with open('AppleStore.csv', encoding='utf8', newline='') as f:
    apple_data = list(csv.reader(f))
Now let's define a function that allows us to visualize the two datasets in the most readable form:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
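As a quick illustration of what this function prints, here is a minimal sketch on a made-up three-row table. The name `toy_data` and its contents are hypothetical, and the function is repeated only so that the snippet runs on its own.

```python
def explore_data(dataset, start, end, rows_and_columns=False):
    # print the requested slice, one row at a time
    for row in dataset[start:end]:
        print(row)
        print('\n')  # blank lines between rows
    # optionally report the overall shape (header row included)
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

# hypothetical data: a header row followed by two apps
toy_data = [
    ['App', 'Rating'],
    ['Example App A', '4.1'],
    ['Example App B', '3.9'],
]

# prints the header and the first app, then the shape of the whole table
explore_data(toy_data, 0, 2, True)
```

Note that the row count includes the header, so a table with a header and N apps reports N + 1 rows.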
We'll use the function to explore the first rows of each dataset, and to identify the number of rows and columns in each:
#print the first five rows of the Google dataset
#print the number of rows and columns
explore_data(google_data, 0, 5, 1)
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']
Number of rows: 10842
Number of columns: 13
#print the first five rows of the Apple dataset
#print the number of rows and columns
explore_data(apple_data, 0, 5, 1)
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']
['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']
Number of rows: 7198
Number of columns: 16
In this part, we want to clean our data by eliminating mistakes, typos, duplicates, and then selecting the subset of our data that is relevant to our analysis.
First of all, we check for errors and duplicates. The discussion section for the Google Play dataset reports an error in row 10472 (index 10473 in our list, because of the header row). We confirm that the error is actually there (one of the entries is missing, which shifts the remaining columns) and then delete the row.
print(google_data[10473])
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
#delete the line containing the error
del google_data[10473]
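The same kind of error can also be located without knowing its index in advance, by comparing each row's length with the header's length. The following is a minimal sketch under that assumption; `find_malformed_rows` and `sample` are hypothetical names introduced here for illustration and are not part of the original analysis.

```python
def find_malformed_rows(dataset):
    """Return (index, row_length) pairs for rows whose length differs from the header's."""
    expected = len(dataset[0])  # the header defines the expected column count
    return [(i, len(row)) for i, row in enumerate(dataset) if len(row) != expected]

# hypothetical data: the last row is missing its 'Category' entry
sample = [
    ['App', 'Category', 'Rating'],
    ['Example App A', 'GAME', '4.5'],
    ['Example App B', '1.9'],
]

malformed = find_malformed_rows(sample)
print(malformed)  # [(2, 2)]
```

Checking by length rather than by a hard-coded index also guards against deleting a valid row if the `del` statement is accidentally run twice.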
Now we want to check whether there are duplicates in the data. One way to do this is to build two lists: one that stores each name the first time it appears, and one that stores every repeated occurrence.
#create the two empty lists for unique names and duplicates
names = []
copies = []
#loop over the data excluding the header
for app in apple_data[1:]:
    #if app ID is duplicate, add to the list of copies
    if app[0] in names:
        copies.append(app[0])
    #else, add to the list of unique names
    else:
        names.append(app[0])
Then we print the length of the complete list and the length of the list of unique names, to check that the numbers add up (including the header).
print(copies)
print(len(apple_data))
print(len(names))
[]
7198
7197
We see that in the AppleStore dataset there are no two rows with the same ID (the only difference between the complete list and the list of unique names is given by the header).
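The loop above tests membership in a list, which rescans the list for every row. As an alternative, the same check can be sketched in a single pass with `collections.Counter` from the standard library. The helper name `count_duplicates`, its parameters, and the `sample` data are assumptions made for this illustration only.

```python
from collections import Counter

def count_duplicates(dataset, key_index=0, has_header=True):
    """Return {key: occurrences} for every key that appears more than once."""
    rows = dataset[1:] if has_header else dataset
    counts = Counter(row[key_index] for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

# hypothetical data: ID '1' appears twice
sample = [
    ['id', 'track_name'],
    ['1', 'Example App A'],
    ['2', 'Example App B'],
    ['1', 'Example App A'],
]

dupes = count_duplicates(sample)
print(dupes)  # {'1': 2}
```

An empty dictionary means that no key is repeated, which corresponds to the empty `copies` list obtained above.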
Now let's do the same for the Google store dataset:
#create the two empty lists for unique names and duplicates
names = []
copies = []
#loop over the data excluding the header
for app in google_data[1:]:
    #if app name is a duplicate, add to the list of copies
    if app[0] in names:
        copies.append(app[0])
    #else, add to the list of unique names
    else:
        names.append(app[0])
And again check the numbers:
print(copies)
print(len(google_data))
print(len(names))
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software', 'MailChimp - Email, Marketing Automation', 'Crew - Free Messaging and Scheduling', 'Asana: organize team projects', 'Google Analytics', 'AdWords Express', 'Accounting App - Zoho Books', 'Invoice & Time Tracking - Zoho', 'join.me - Simple Meetings', 'Invoice 2go — Professional Invoices and Estimates', 'SignEasy | Sign and Fill PDF and other Documents', 'Quick PDF Scanner + OCR FREE', 'Genius Scan - PDF Scanner', 'Tiny Scanner - PDF Scanner App', 'Fast Scanner : Free PDF Scan', 'Mobile Doc Scanner (MDScan) Lite', 'TurboScan: scan documents and receipts in PDF', 'Tiny Scanner Pro: PDF Doc Scan', 'Docs To Go™ Free Office Suite', 'OfficeSuite : Free Office + PDF Editor', 'Slack', 'QuickBooks Accounting: Invoicing & Expenses', 'WhatsApp Messenger', 'Messenger – Text and Video Chat for Free', 'imo free video calls and chat', 'Viber Messenger', 'Hangouts', 'WeChat', 'Telegram', 'Who', 'Google Voice', 'Android Messages', 'Firefox Focus: The privacy browser', 'Google Allo', 'Google Chrome: Fast & Secure', 'Firefox Browser fast & private', 'Puffin Web Browser', 'Opera Browser: Fast and Secure', 'Opera Mini - fast web browser', 'UC Browser Mini -Tiny Fast Private & Secure', 'UC Browser - Fast Download Private & Secure', 'Calls & Text by Mo+', 'Viber Messenger', 'Call Blocker', 'Gmail', 'Yahoo Mail – Stay Organized', 'Hangouts', 'imo free video calls and chat', 'free video calls and chat', 'Viber Messenger', 'Skype - free IM & video calls', 'WeChat', 'Glide - Video Chat Messenger', 'Talkray - Free Calls & Texts', 'LINE: Free Calls & Messages', 'KakaoTalk: Free Calls & Text', 'OkCupid Dating', 'CMB Free Dating App', 'Hily: Dating, Chat, Match, Meet & Hook 
up', 'Hinge: Dating & Relationships', 'BBW Dating & Plus Size Chat', 'Casual Dating & Adult Singles - Joyride', 'EliteSingles – Dating for Single Professionals', 'Clover Dating App', 'Moco - Chat, Meet People', 'Hot or Not - Find someone right now', 'Just She - Top Lesbian Dating', 'Once - Quality Matches Every day', 'Sudy – Meet Elite & Rich Single', 'muzmatch: Muslim & Arab Singles, Marriage & Dating', 'Chispa, the Dating App for Latino, Latina Singles', 'Blendr - Chat, Flirt & Meet', 'Find Real Love — YouLove Premium Dating', 'Cougar Dating Life : Date Older Women Sugar Mummy', 'stranger chat - anonymous chat', 'Millionaire Match: Rich Singles Dating App', 'Dating for 50 plus Mature Singles – FINALLY', 'Moco+ - Chat, Meet People', 'Chat Rooms, Avatars, Date - Galaxy', 'FastMeet: Chat, Dating, Love', 'Christian Dating For Free App', 'Meet24 - Love, Chat, Singles', 'Black White Interracial Dating - Interracial Match', 'Gay Sugar Daddy Dating & Hookup – Sudy Gay', 'Adult Dirty Emojis', 'Hide App, Private Dating, Safe Chat - PrivacyHider', 'Meet4U - Chat, Love, Singles!', '95Live -SG#1 Live Streaming App', 'Just She - Top Lesbian Dating', 'Hily: Dating, Chat, Match, Meet & Hook up', 'O-Star', 'Random Video Chat', 'Black People Meet Singles Date', 'Howlr', 'Free Dating & Flirt Chat - Choice of Love', 'Cardi B Live Stream Video Chat - Prank', 'Chat Kids - Chat Room For Kids', 'muzmatch: Muslim & Arab Singles, Marriage & Dating', 'BBW Dating & Plus Size Chat', 'Transenger – Ts Dating and Chat for Free', 'BBW Dating & Curvy Singles Chat- LargeFriends', 'MouseMingle', 'FlirtChat - ♥Free Dating/Flirting App♥', 'Live Talk - Free Text and Video Chat', 'Adult Dirty Emojis', 'Free Cam Girls - Live Webcam', 'Random Video Chat App With Strangers', 'Live Girls Talk - Free Video Chat', 'Girls Live Chat - Free Text & Video Chat', 'Free Dating App - Meet Local Singles - Flirt Chat', 'iPair-Meet, Chat, Dating', 'Free Dating Hook Up Messenger', 'Free Dating App - YoCutie - Flirt, 
Chat & Meet', 'Khan Academy', 'TED', 'Lumosity: #1 Brain Games & Cognitive Training App', 'Udemy - Online Courses', 'Khan Academy', 'TED', 'Duolingo: Learn Languages Free', 'Quizlet: Learn Languages & Vocab with Flashcards', 'Coursera: Online courses', 'Udemy - Online Courses', 'Udacity - Lifelong Learning', 'edX - Online Courses by Harvard, MIT & more', 'Lynda - Online Training Videos', 'Learn languages, grammar & vocabulary with Memrise', 'Brilliant', 'Babbel – Learn Languages', 'Duolingo: Learn Languages Free', 'Rosetta Stone: Learn to Speak & Read New Languages', 'Learn languages, grammar & vocabulary with Memrise', 'Learn English with Wlingua', 'Quizlet: Learn Languages & Vocab with Flashcards', 'Google Classroom', 'Duolingo: Learn Languages Free', 'Learn 50 languages', 'Mango Languages: Lovable Language Courses', 'Rosetta Stone: Learn to Speak & Read New Languages', 'Learn languages, grammar & vocabulary with Memrise', 'Babbel – Learn Languages', 'busuu: Learn Languages - Spanish, English & More', 'My Class Schedule: Timetable', 'Socratic - Math Answers & Homework Help', 'Google Classroom', 'ClassDojo', 'HelloTalk — Chat, Speak & Learn Foreign Languages', 'Quizlet: Learn Languages & Vocab with Flashcards', 'busuu: Learn Languages - Spanish, English & More', 'Rosetta Stone: Learn to Speak & Read New Languages', 'Movies by Flixster, with Rotten Tomatoes', 'IMDb Movies & TV', 'Netflix', 'IMDb Movies & TV', 'Netflix', 'Tubi TV - Free Movies & TV', 'Crunchyroll - Everything Anime', 'STARZ', 'Crackle - Free TV & Movies', 'CBS - Full Episodes & Live TV', 'Nick', 'Hulu: Stream TV, Movies & more', 'FOX NOW - On Demand & Live TV', 'The CW', 'HISTORY: Watch TV Show Full Episodes & Specials', 'HBO NOW: Stream TV & Movies', 'A&E - Watch Full Episodes of TV Shows', 'VH1', 'Lifetime - Watch Full Episodes & Original Movies', 'BET NOW - Watch Shows', 'Netflix', 'Tubi TV - Free Movies & TV', 'Crackle - Free TV & Movies', 'Crunchyroll - Everything Anime', 'Nick', 'STARZ', 
'Hulu: Stream TV, Movies & more', 'Food Network', 'FOX NOW - On Demand & Live TV', 'Movies by Flixster, with Rotten Tomatoes', 'HISTORY: Watch TV Show Full Episodes & Specials', 'Viki: Asian TV Dramas & Movies', 'The CW', 'Univision NOW - Live TV and On Demand', 'HBO NOW: Stream TV & Movies', 'A&E - Watch Full Episodes of TV Shows', 'VH1', 'SHOWTIME', 'MTV', 'Lifetime - Watch Full Episodes & Original Movies', 'Comedy Central', 'BET NOW - Watch Shows', 'FOX', 'Telemundo Now', 'Viki: Asian TV Dramas & Movies', 'Nick', 'Fandango Movies - Times + Tickets', 'Google Pay', 'Wells Fargo Daily Change', 'Credit Karma', 'Robinhood - Investing, No Fees', 'Digit Save Money Automatically', 'Acorns - Invest Spare Change', 'Money Lover: Expense Tracker, Budget Planner', 'Mint: Budget, Bills, Finance', 'Simple - Better Banking', 'PayPal', 'Google Pay', 'Wells Fargo Mobile', 'Capital One® Mobile', 'Grubhub: Food Delivery', 'Postmates Food Delivery: Order Eats & Alcohol', "Domino's Pizza USA", 'Chick-fil-A', 'Zomato - Restaurant Finder and Food Delivery App', 'Run with Map My Run', 'Weight Loss Running by Verv', 'Nike+ Run Club', 'Runtastic Running App & Mile Tracker', '8fit Workouts & Meal Planner', 'Daily Yoga - Yoga Fitness Plans', 'Pocket Yoga', 'Calorie Counter - MyFitnessPal', 'Weight Loss Running by Verv', 'Nike+ Run Club', 'Seven - 7 Minute Workout Training Challenge', 'Weight Watchers Mobile', 'Walk with Map My Walk', 'Workout Trainer: fitness coach', 'Run with Map My Run', 'Nike Training Club - Workouts & Fitness Plans', 'Fitbit Coach', 'Calorie Counter - MyFitnessPal', 'Endomondo - Running & Walking', 'Runkeeper - GPS Track Run Walk', 'Nike+ Run Club', 'Runtastic Running App & Mile Tracker', 'Calorie Counter - MyFitnessPal', 'Lose It! 
- Calorie Counter', 'Calorie Counter - MyNetDiary', '10 Best Foods for You', 'MyPlate Calorie Tracker', 'Weight Loss Tracker - RecStyle', 'Calorie Counter by FatSecret', 'Calorie Counter - Macros', 'My Diet Diary Calorie Counter', 'Lark - 24/7 Health Coach', 'Weight Watchers Mobile', 'Calorie Counter & Diet Tracker', 'MealLogger-Photo Food Journal', 'Health and Nutrition Guide', 'Food Calorie Calculator', 'Calorie Counter - MyFitnessPal', 'Lose It! - Calorie Counter', 'Relax Meditation: Sleep with Sleep Sounds', 'Meditation Music - Relax, Yoga', '21-Day Meditation Experience', 'Fabulous: Motivate Me! Meditate, Relax, Sleep', 'Calm - Meditate, Sleep, Relax', 'Relax Melodies: Sleep Sounds', 'Simple Habit Meditation', 'Headspace: Meditation & Mindfulness', 'Daily Yoga - Yoga Fitness Plans', 'Houzz Interior Design Ideas', 'Mortgage by Zillow: Calculator & Rates', 'Redfin Real Estate', 'Apartment List: Housing, Apt, and Property Rentals', 'Realtor.com Real Estate: Homes for Sale and Rent', 'Trulia Real Estate & Rentals', 'Zillow: Find Houses for Sale & Apartments for Rent', 'Apartments & Rentals - Zillow', 'Trulia Rent Apartments & Homes', 'Apartments.com Rental Search', 'Houzz Interior Design Ideas', 'Vaniday - Beauty Booking App', 'StyleSeat', 'JOANN - Crafts & Coupons', 'Fashion in Vogue', 'Wheretoget: Shop in style', 'My Dressing - Fashion closet', 'Chictopia', 'Scarf Fashion Designer', 'Fashion in Vogue', 'Zara', 'Subway Surfers', 'ROBLOX', 'Pou', '8 Ball Pool', 'Clash of Clans', 'Candy Crush Saga', 'Plants vs. 
Zombies FREE', 'My Talking Angela', 'Bubble Shooter', 'Word Search', 'Candy Crush Soda Saga', 'Fishdom', 'Block Puzzle', 'Clash Royale', 'Sniper 3D Gun Shooter: Free Shooting Games - FPS', 'Granny', 'Galaxy Attack: Alien Shooter', 'Angry Birds Rio', 'Zombie Catchers', 'Zombie Hunter King', 'Temple Run 2', 'Zombie Tsunami', 'Farm Heroes Saga', 'Super Jim Jump - pixel 3d', 'slither.io', 'Angry Birds Classic', 'Flow Free', 'ROBLOX', 'Helix Jump', 'Subway Surfers', 'Candy Crush Saga', 'Toon Blast', 'Granny', '8 Ball Pool', 'Sniper 3D Gun Shooter: Free Shooting Games - FPS', 'slither.io', 'Temple Run 2', 'Kick the Buddy', 'Magic Tiles 3', 'Bowmasters', 'Wordscapes', 'Block Craft 3D: Building Simulator Games For Free', 'Helix Jump', 'PUBG MOBILE', 'Wordscapes', 'DRAGON BALL LEGENDS', 'ROBLOX', 'Candy Crush Saga', '8 Ball Pool', 'Harry Potter: Hogwarts Mystery', 'PUBG MOBILE', 'MARVEL Strike Force', 'Merge Dragons!', 'Honkai Impact 3rd', 'Candy Crush Saga', 'ROBLOX', '8 Ball Pool', 'Subway Surfers', 'Candy Crush Soda Saga', 'Zombie Hunter King', 'Bubble Shooter', 'Toon Blast', 'Toy Blast', 'Clash Royale', 'Clash of Clans', 'Farm Heroes Saga', 'Plants vs. Zombies FREE', 'Word Search', 'Block Puzzle', 'Super Jim Jump - pixel 3d', 'Pou', 'Temple Run 2', 'Flow Free', 'Homescapes', 'Wordscapes', 'My Talking Angela', 'slither.io', 'Cooking Fever', 'Gardenscapes', 'Fishdom', 'Galaxy Attack: Alien Shooter', 'Score! 
Hero', 'Zombie Catchers', 'Magic Tiles 3', 'Granny', 'Dream League Soccer 2018', 'Fruits Bomb', 'Angry Birds Classic', 'Talking Tom Gold Run', 'Bowmasters', 'My Talking Tom', 'Hill Climb Racing', 'Sniper 3D Gun Shooter: Free Shooting Games - FPS', 'Pixel Art: Color by Number Game', 'Rider', 'Zombie Tsunami', 'Garena Free Fire', 'Subway Surfers', 'Helix Jump', 'Temple Run 2', 'slither.io', 'Bowmasters', 'Talking Tom Gold Run', 'Zombie Catchers', 'Sniper 3D Gun Shooter: Free Shooting Games - FPS', 'Miraculous Ladybug & Cat Noir - The Official Game', 'Kick the Buddy', 'DRAGON BALL LEGENDS', 'Zombie Hunter King', 'Garena Free Fire', 'Candy Crush Saga', 'Plants vs. Zombies FREE', 'Block Puzzle', 'Helix Jump', '8 Ball Pool', 'Bubble Shooter', 'Solitaire', 'Traffic Racer', 'Hill Climb Racing', 'Earn to Die 2', 'Bubble Shooter 2', 'Flow Free', 'Zombie Catchers', 'Angry Birds Rio', 'Candy Crush Jelly Saga', 'Cut the Rope FULL FREE', 'Jewels Star: OZ adventure', 'Hungry Shark Evolution', 'Angry Birds Classic', 'Best Fiends - Free Puzzle Game', 'Hill Climb Racing 2', 'Swamp Attack', 'Bowmasters', 'Magic Tiles 3', 'Block Puzzle Classic Legend !', 'Pixel Art: Color by Number Game', 'Score! Hero', 'Zombie Tsunami', 'DEAD TARGET: FPS Zombie Apocalypse Survival Games', 'Word Search', 'Farm Heroes Saga', 'YouTube Kids', 'Candy Bomb', 'ROBLOX', 'Solitaire', 'Princess Coloring Book', 'Hello Kitty Nail Salon', 'Dog Run - Pet Dog Simulator', 'Coloring book moana', 'Bubble Shooter', 'Barbie™ Fashion Closet', 'Minion Rush: Despicable Me Official Game', 'No.Draw - Colors by Number 2018', 'Duolingo: Learn Languages Free', 'Super ABC! Learning games for kids! 
Preschool apps', 'ROBLOX', 'PJ Masks: Moonlight Heroes', 'Minion Rush: Despicable Me Official Game', 'Dog Run - Pet Dog Simulator', 'Hot Wheels: Race Off', 'Farming Simulator 14', 'Mcqueen Coloring pages', 'Monster Truck Driver & Racing', 'Strawberry Shortcake BerryRush', 'DC Super Hero Girls™', 'Toca Kitchen 2', 'DC Super Hero Girls™', 'Strawberry Shortcake BerryRush', 'Disney Magic Kingdoms: Build Your Own Magical Park', 'Toca Life: City', 'Papumba Academy - Fun Learning For Kids', 'Kids Balloon Pop Game Free 🎈', 'Sounds for Toddlers FREE', 'Elmo Calls by Sesame Street', 'Sago Mini Friends', 'Papumba Academy - Fun Learning For Kids', 'Tee and Mo Bath Time Free', 'Bita and the Animals - Pelos Ares', 'TO-FU Oh!SUSHI', 'DreamWorks Friends', 'Avokiddo Emotions', 'Nighty Night Circus', 'Sago Mini Babies', "Dr. Panda & Toto's Treehouse", 'ROBLOX', 'PlayKids - Educational cartoons and games for kids', 'YouTube Kids', 'Baby Panda Care', 'Monster High™', 'Duolingo: Learn Languages Free', 'Shopkins World!', 'DisneyNOW – TV Shows & Games', 'Equestria Girls', 'Frozen Free Fall', 'Nick', 'Thomas & Friends: Race On!', 'Inside Out Thought Bubbles', 'Sago Mini Friends', 'School of Dragons', 'Peak – Brain Games & Training', 'Period Tracker', 'Vargo Anesthesia Mega App', 'Monash Uni Low FODMAP Diet', 'mySugr: the blood sugar tracker made just for you', 'Human Anatomy Atlas 2018: Complete 3D Human Body', 'ASCCP Mobile', 'Paramedic Protocol Provider', '2017 EMRA Antibiotic Guide', 'Essential Anatomy 3', 'EMT PASS', 'Block Buddy', 'EMT Review Plus', 'Journal Club: Medicine', 'Pedi STAT', 'AnatomyMapp', 'Diabetes & Diet Tracker', 'A Manual of Acupuncture', 'PTA Content Master', 'Muscle Premium - Human Anatomy, Kinesiology, Bones', 'Cardiac diagnosis (heart rate, arrhythmia)', 'Medical ID - In Case of Emergency (ICE)', 'IBM Micromedex Drug Info', 'Advanced Comprehension Therapy', 'Hospitalist Handbook', 'Teladoc Member', 'Ada - Your Health Guide', 'Ovia Fertility Tracker & Ovulation 
Calculator', 'Youper - AI Therapy', 'MoodSpace', 'Super Hearing Super Ear Amplifier', 'Penn State Health OnDemand', 'ScriptSave WellRx Rx Discounts', 'Free Blood Pressure', 'All Mental disorders', 'Nurse Grid', 'JH Blood Pressure Monitor', 'Blood Pressure', 'RT 516 VET', 'Anthem Anywhere', 'Sway Medical', "fred's Pharmacy", 'Breastfeeding Tracker Baby Log', 'Banfield Pet Health Tracker', '1800 Contacts - Lens Store', 'TextNow - free text + calls', 'Tumblr', 'Snapchat', 'Instagram', 'Periscope - Live Video', 'Snapchat', 'Instagram', 'Pinterest', 'MeetMe: Chat & Meet New People', 'ooVoo Video Calls, Messaging & Stories', 'LinkedIn', 'Tango - Live Video Broadcast', 'SayHi Chat, Meet New People', 'Tapatalk - 100,000+ Forums', 'Badoo - Free Chat & Dating App', 'Nextdoor - Local neighborhood news & classifieds', 'MeetMe: Chat & Meet New People', 'Meetup', 'Text Free: WiFi Calling App', 'textPlus: Free Text & Calls', 'Meetup', 'POF Free Dating App', 'MeetMe: Chat & Meet New People', 'Tagged - Meet, Chat & Dating', 'LOVOO', 'SKOUT - Meet, Chat, Go Live', 'Badoo - Free Chat & Dating App', 'Jaumo Dating, Flirt & Live Video', 'SayHi Chat, Meet New People', 'Couple - Relationship App', 'Meetup', 'Wish - Shopping Made Fun', 'SnipSnap Coupon App', 'Extreme Coupon Finder', 'Checkout 51: Grocery coupons', 'The Coupons App', 'RetailMeNot - Coupons, Deals & Discount Shopping', 'Groupon - Shop Deals, Discounts & Coupons', 'Stocard - Rewards Cards Wallet', 'Extreme Coupon Finder', 'eBay: Buy & Sell this Summer - Discover Deals Now!', 'Checkout 51: Grocery coupons', 'Gyft - Mobile Gift Card Wallet', 'Shopkick: Free Gift Cards, Shop Rewards & Deals', 'The Coupons App', 'Shopular: Coupons, Weekly Ads & Shopping Deals', 'Wish - Shopping Made Fun', 'Carousell: Snap-Sell, Chat-Buy', 'Walmart', 'Ibotta: Cash Back Savings, Rewards & Coupons App', 'AliExpress - Smarter Shopping, Better Living', 'LivingSocial - Local Deals', 'Amazon Shopping', 'RetailMeNot - Coupons, Deals & Discount Shopping', 
'Poshmark - Buy & Sell Fashion', 'Target - now with Cartwheel', 'ZALORA Fashion Shopping', 'eBay: Buy & Sell this Summer - Discover Deals Now!', 'Fancy', "Modcloth – Unique Indie Women's Fashion & Style", 'Gyft - Mobile Gift Card Wallet', "JackThreads: Men's Shopping", 'LivingSocial - Local Deals', 'Zappos – Shoe shopping made simple', 'Wanelo Shopping', 'Etsy: Handmade & Vintage Goods', 'Groupon - Shop Deals, Discounts & Coupons', 'eBay: Buy & Sell this Summer - Discover Deals Now!', "JackThreads: Men's Shopping", 'AliExpress - Smarter Shopping, Better Living', 'Slickdeals: Coupons & Shopping', 'Target - now with Cartwheel', 'Wish - Shopping Made Fun', 'Fancy', 'ASOS', 'Google Photos', 'Shutterfly: Free Prints, Photo Books, Cards, Gifts', 'InstaBeauty -Makeup Selfie Cam', 'B612 - Beauty & Filter Camera', 'BeautyPlus - Easy Photo Editor & Selfie Camera', 'YouCam Perfect - Selfie Photo Editor', 'Google Photos', 'Muzy - Share photos & collages', 'QuickPic - Photo Gallery with Google Drive Support', 'Flickr', 'Shutterfly: Free Prints, Photo Books, Cards, Gifts', 'Open Camera', 'Camera for Android', 'Cymera Camera- Photo Editor, Filter,Collage,Layout', 'Candy Camera - selfie, beauty camera, photo editor', 'Camera360: Selfie Photo Editor with Funny Sticker', 'Facetune - For Free', 'Photo Editor Selfie Camera Filter & Mirror Image', 'HD Camera for Android', 'Photo Editor Pro', 'YouCam Makeup - Magic Selfie Makeovers', 'Photo Editor-', 'Camera for Android', 'Photo Editor', 'Cymera Camera- Photo Editor, Filter,Collage,Layout', 'Adobe Photoshop Express:Photo Editor Collage Maker', 'BeautyPlus - Easy Photo Editor & Selfie Camera', 'InstaSize Photo Filters & Collage Editor', 'Candy Camera - selfie, beauty camera, photo editor', 'YouCam Perfect - Selfie Photo Editor', 'Camera360: Selfie Photo Editor with Funny Sticker', 'Facetune - For Free', 'Shutterfly: Free Prints, Photo Books, Cards, Gifts', 'BeautyPlus - Easy Photo Editor & Selfie Camera', 'CBS Sports App - Scores, News, 
Stats & Watch Live', 'Yahoo Fantasy Sports - #1 Rated Fantasy App', 'ESPN', 'NFL', 'Bleacher Report: sports news, scores, & highlights', 'ESPN Fantasy Sports', 'theScore: Live Sports Scores, News, Stats & Videos', 'CBS Sports App - Scores, News, Stats & Watch Live', 'ESPN', 'MLB At Bat', 'CBS Sports App - Scores, News, Stats & Watch Live', 'WatchESPN', 'Hole19: Golf GPS App, Rangefinder & Scorecard', 'Yahoo Fantasy Sports - #1 Rated Fantasy App', 'ESPN Fantasy Sports', 'CBS Sports Fantasy', 'ESPN', 'Bleacher Report: sports news, scores, & highlights', 'theScore: Live Sports Scores, News, Stats & Videos', 'CBS Sports App - Scores, News, Stats & Watch Live', 'ESPN', 'Bleacher Report: sports news, scores, & highlights', 'MLB At Bat', 'theScore: Live Sports Scores, News, Stats & Videos', 'CBS Sports App - Scores, News, Stats & Watch Live', 'Yahoo Sports - scores, stats, news, & highlights', 'WatchESPN', 'FotMob - Live Soccer Scores', 'Yahoo Fantasy Sports - #1 Rated Fantasy App', 'ESPN', 'NFL', 'US Open Tennis Championships 2018', 'Bleacher Report: sports news, scores, & highlights', 'MLB At Bat', 'FIFA - Tournaments, Soccer News & Live Scores', 'theScore: Live Sports Scores, News, Stats & Videos', 'Golfshot: Golf GPS + Tee Times', 'BBC Sport', 'CBS Sports App - Scores, News, Stats & Watch Live', 'Yahoo Sports - scores, stats, news, & highlights', 'WatchESPN', 'MLB Ballpark', 'FOX Sports: Live Streaming, Scores & News', 'Fantasy Football', 'PGA TOUR', 'UFC', 'trivago: Hotels & Travel', 'Expedia Hotels, Flights & Car Rental Travel Deals', 'TripAdvisor Hotels Flights Restaurants Attractions', 'Skyscanner', 'Booking.com Travel Deals', 'TripAdvisor Hotels Flights Restaurants Attractions', 'Priceline Hotel Deals, Rental Cars & Flights', 'United Airlines', 'Expedia Hotels, Flights & Car Rental Travel Deals', 'Southwest Airlines', 'Hopper - Watch & Book Flights', 'Fly Delta', 'KAYAK Flights, Hotels & Cars', 'American Airlines', 'Skyscanner', 'Priceline Hotel Deals, Rental 
Cars & Flights', 'trivago: Hotels & Travel', 'Expedia Hotels, Flights & Car Rental Travel Deals', 'KAYAK Flights, Hotels & Cars', 'Orbitz - Hotels, Flights & Package Deals', 'Skyscanner', 'Hotels.com: Book Hotel Rooms & Find Vacation Deals', 'Booking.com Travel Deals', 'Hostelworld: Hostels & Cheap Hotels Travel App', 'TripAdvisor Hotels Flights Restaurants Attractions', 'Airbnb', 'Skyscanner', 'HotelTonight: Book amazing deals at great hotels', 'Maps - Navigate & Explore', 'Google Street View', 'Calculator', 'Gboard - the Google Keyboard', 'ZEDGE™ Ringtones & Wallpapers', 'ZEDGE™ Ringtones & Wallpapers', 'Nova Launcher', 'Apex Launcher', 'Smart Launcher 5', 'Google Keep', 'Evernote – Organizer, Planner for Notes & Memos', 'ES File Explorer File Manager', 'Microsoft Word', 'Google Drive', 'Adobe Acrobat Reader', 'Google PDF Viewer', 'Microsoft Excel', 'Google Docs', 'Microsoft PowerPoint', 'Microsoft OneNote', 'Google Calendar', 'Google Keep', 'Wunderlist: To-Do List & Tasks', 'Evernote – Organizer, Planner for Notes & Memos', 'Any.do: To-do list, Calendar, Reminders & Planner', 'Todoist: To-do lists for task management & errands', 'Microsoft OneNote', 'Planner Pro-Personal Organizer', 'Google Calendar', 'Google Drive', 'Microsoft OneDrive', 'Dropbox', 'MX Player', 'Video Editor', 'Google News', 'Flipboard: News For Our Time', 'BBC News', 'Fox News – Breaking News, Live Video & News Alerts', 'USA TODAY', 'CNN Breaking US & World News', 'Twitter', 'BBC News', 'NPR News', 'Fox News – Breaking News, Live Video & News Alerts', 'Haystack TV: Local & World News - Free', 'ABC News - US & World News', 'USA TODAY', 'NBC News', 'The Wall Street Journal: Business & Market News', 'CNN Breaking US & World News', 'NYTimes - Latest News', 'Newsroom: News Worth Sharing', 'Google News', 'BuzzFeed: News, Tasty, Quizzes', 'Flipboard: News For Our Time', 'Snapchat', 'Flashlight', 'Pou', 'Agar.io', 'Angry Birds Classic', 'My Talking Tom', 'Netflix', 'Bubble Shooter', 'Adobe Acrobat 
Reader', 'Subway Surfers', 'LEGO® Juniors Create & Cruise', 'WhatsApp Messenger', 'Google Translate', 'Pokémon GO', 'Instagram', 'My Talking Angela', 'YouTube Kids', 'PAC-MAN', 'Colorfy: Coloring Book for Adults - Free', 'Toca Kitchen 2', 'Amazon Shopping', 'Chick-fil-A', 'YouTube', 'Flow Free', 'Microsoft Word', "Alto's Adventure", 'Zombie Tsunami', 'Facebook', 'B612 - Beauty & Filter Camera', 'Block Craft 3D: Building Simulator Games For Free', '8 Ball Pool', 'VPN Free - Betternet Hotspot VPN & Private Browser', 'EMT Tutor NREMT-B Study Guide', 'EMT-B Pocket Prep', 'UC Browser - Fast Download Private & Secure', 'Hungry Shark Evolution', 'Diary with lock', 'Clash of Clans', 'Clash Royale', 'C Programming', 'CppDroid - C/C++ IDE', 'Candy Crush Saga', 'Google Chrome: Fast & Secure', 'Learn C++', 'Hill Climb Racing', 'ZEDGE™ Ringtones & Wallpapers', 'Minion Rush: Despicable Me Official Game', 'Google Duo - High Quality Video Calls', 'Crossy Road', 'Temple Run 2', 'Dropbox', 'Sniper 3D Gun Shooter: Free Shooting Games - FPS', 'Plants vs. 
Zombies FREE', 'Dream League Soccer 2018', 'eBay: Buy & Sell this Summer - Discover Deals Now!', 'ESPN', 'ES File Explorer File Manager', 'Firefox Browser fast & private', 'E*TRADE Mobile', 'WWE', 'Amazon Kindle', 'Evernote – Organizer, Planner for Notes & Memos', 'Microsoft Edge', 'Microsoft Outlook', 'Pinterest', 'ClassDojo', 'The Coupons App', 'Ebook Reader', 'Gmail', 'Maps - Navigate & Explore', 'AliExpress - Smarter Shopping, Better Living', 'Flipkart Online Shopping App', 'Wish - Shopping Made Fun', 'Messenger – Text and Video Chat for Free', 'Facebook Lite', 'Messenger Lite: Free Calls & Messages', 'Twitter', 'LINE: Free Calls & Messages', 'imo beta free calls and text', 'Geometry Dash World', 'Google+', 'PUBG MOBILE', 'Gboard - the Google Keyboard', 'Granny', 'Google', 'Hangouts', 'G Cloud Backup', 'Google Drive', 'Helix Jump', 'H&M', 'H TV', 'Talking Ben the Dog', 'imo free video calls and chat', 'free video calls and chat', 'slither.io', 'POF Free Dating App', 'Skype - free IM & video calls', 'Sonic Dash', 'Text Free: WiFi Calling App', 'Talkatone: Free Texts, Calls & Phone Number', 'Textgram - write on photos', 'Zombie Catchers', 'Jetpack Joyride', 'Anger of stick 5 : zombie', 'Cut the Rope FULL FREE', 'Turbo FAST', 'K PLUS', 'PowerDirector Video Editor App: 4K, Slow Mo & More', 'Fuzzy Numbers: Pre-K Number Foundation', 'KakaoTalk: Free Calls & Text', 'LiveMe - Video chat, new friends, and make money', 'Talking Tom Gold Run', 'letgo: Buy & Sell Used Stuff, Cars & Real Estate', 'Love Balls', 'Last Day on Earth: Survival', 'Minecraft', 'MEGA', 'MX Player', 'I’m Expecting - Pregnancy App', 'Diabetes:M', 'Opera Mini - fast web browser', 'Opera Browser: Fast and Secure', 'O-Star', 'PicsArt Photo Studio: Collage Maker & Pic Editor', 'Quora', 'ROBLOX', 'SHAREit - Transfer & Share', 'Tumblr', 'Topbuzz: Breaking News, Videos & Funny GIFs', 'Twitch: Livestream Multiplayer Games & Esports', 'Telegram', 'myAT&T', 'Truecaller: Caller ID, SMS spam blocking & Dialer', 
'Showtime Anytime', 'UC Browser Mini -Tiny Fast Private & Secure', 'Uber', 'Viber Messenger', 'VLC for Android', 'Vigo Video', 'WeChat', 'Wattpad 📖 Free Books', 'Waze - GPS, Maps, Traffic Alerts & Live Navigation', 'Dating App, Flirt & Chat : W-Match', 'WPS Office - Word, Docs, PDF, Note, Slide & Sheet', 'LINE WEBTOON - Free Comics', 'WhatsApp Business', 'We Heart It', 'TripAdvisor Hotels Flights Restaurants Attractions', 'Telegram X', 'Share Music & Transfer Files - Xender', 'Yandex Browser with Protect', 'YouTube Studio', 'Yahoo Mail – Stay Organized', 'PBS KIDS Video', 'YouTube Gaming', 'GO Keyboard - Emoticon keyboard, Free Theme, GIF', 'Zello PTT Walkie Talkie', 'Z Camera - Photo Editor, Beauty Selfie, Collage', 'Six Pack in 30 Days - Abs Workout', 'Angry Birds 2', 'Angry Birds Rio', 'AC - Tips & News for Android™', 'CM Browser - Ad Blocker , Fast Download , Privacy', 'Google Ads', 'Mobi Calculator free & AD free!', 'Flipp - Weekly Shopping', 'A&E - Watch Full Episodes of TV Shows', 'Camera FV-5 Lite', 'Cardiac diagnosis (heart rate, arrhythmia)', 'Open Camera', 'All Football - Latest News & Videos', 'Maricopa AH', 'Youper - AI Therapy', 'Animal Jam - Play Wild!', 'RULES OF SURVIVAL', 'Amazon for Tablets', 'Final Fantasy XV: A New Empire', 'The Sims™ FreePlay', 'Text free - Free Text + Call', 'Google Photos', '365Scores - Live Scores', 'DINO HUNTER: DEADLY SHORES', 'AP Mobile - Breaking News', 'Reuters News', 'Puffin Web Browser', 'AccuWeather: Daily Forecast & Live Weather Reports', 'Moovit: Bus Time & Train Time Live Info', 'Jurassic World™ Alive', 'Houzz Interior Design Ideas', 'NYTimes - Latest News', 'Overstock – Home Decor, Furniture Shopping', 'Google Voice', 'Choices: Stories You Play', 'Runtastic Sleep Better: Sleep Cycle & Smart Alarm', 'Google Pay', 'Ringtone Maker', 'Google Allo', 'Plants vs. 
Zombies™ Heroes', 'CM FILE MANAGER HD', 'MLB At Bat', 'HotelTonight: Book amazing deals at great hotels', 'realestate.com.au - Buy, Rent & Sell Property', 'Video Player All Format', 'DEER HUNTER 2018', 'Video Editor', 'Google Play Games', 'ai.type Free Emoji Keyboard', 'Manga AZ - Manga Comic Reader', 'British Airways', 'American Airlines', 'Anthem BC Anywhere', 'Transit: Real-Time Transit App', 'PMHNP-BC Pocket Prep', 'BeautyPlus - Easy Photo Editor & Selfie Camera', 'Bowmasters', 'Nick', 'mySugr: the blood sugar tracker made just for you', 'Backgrounds HD (Wallpapers)', 'Blur Image Background', 'Newegg Mobile', 'Moco - Chat, Meet People', 'Hot or Not - Find someone right now', 'MeetMe: Chat & Meet New People', 'TED', 'English Dictionary - Offline', 'OkCupid Dating', 'Mint: Budget, Bills, Finance', 'English Grammar Test', 'Daily Manga - Comic & Webtoon', 'BBM - Free Calls & Messages', 'BeyondMenu Food Delivery', 'NOOK: Read eBooks & Magazines', 'NOOK App for NOOK Devices', 'Badoo - Free Chat & Dating App', 'HBO GO: Stream with TV Package', 'No Crop & Square for Instagram', 'Hungry Shark World', 'iBP Blood Pressure', 'Blood Pressure', 'Blood Pressure Log - MyDiary', 'Blood Pressure(BP) Diary', 'BP Journal - Blood Pressure Diary', 'Blood Pressure Monitor', 'Blood Pressure Companion', 'Free Blood Pressure', 'High Blood Pressure Symptoms', 'QR Scanner & Barcode Scanner 2018', 'Camera FV-5', 'Camera MX - Free Photo & Video Camera', 'Bleacher Report: sports news, scores, & highlights', 'Cardboard', 'Kick the Buddy', 'Maps & GPS Navigation — OsmAnd', 'Beautiful Widgets Pro', 'Beautiful Widgets Free', 'HD Widgets', 'Color by Number – New Coloring Book', 'Photo Editor by Aviary', 'UNICORN - Color By Number & Pixel Art Coloring', 'Elmo Calls by Sesame Street', '420 BZ Budeze Delivery', 'BZWBK24 mobile', 'Zoosk Dating App: Meet Singles', 'Cricbuzz - Live Cricket Scores & News', 'Cheapflights – Flight Search', 'Chrome Beta', 'Chrome Dev', 'CJmall', 'Credit Karma', 'Castle 
Clash: Heroes of the Empire US', 'CM Launcher 3D - Theme, Wallpapers, Efficient', 'CM Locker - Security Lockscreen', 'CM Flashlight (Compass, SOS)', 'Ruler', 'QuickPic - Photo Gallery with Google Drive Support', 'Cartoon Network App', 'LEGO® TV', 'DisneyNOW – TV Shows & Games', 'Brit + Co', 'CT Scan Cross Sectional Anatomy', 'Shadow Fight 2', 'Curriculum vitae App CV Builder Free Resume Maker', 'The CW', 'CW Seed', 'Hulu: Stream TV, Movies & more', 'Cymera Camera- Photo Editor, Filter,Collage,Layout', 'Camera360 Lite - Selfie Camera', 'Mapy.cz - Cycling & Hiking offline maps', 'Face Filter, Selfie Editor - Sweet Camera', 'Metal Soldiers 2', "COOKING MAMA Let's Cook!", 'Run Sausage Run!', 'Knife Hit', 'DRAGON BALL LEGENDS', 'DC Comics', 'DC Super Hero Girls™', 'MARVEL Contest of Champions', 'MARVEL Strike Force', 'wetter.com - Weather and Radar', 'Babbel – Learn Languages', 'CallApp: Caller ID, Blocker & Phone Call Recorder', 'LINE Camera - Photo editor', 'Racing in Car 2', 'Google PDF Viewer', 'Periscope - Live Video', 'Dungeon Hunter Champions: Epic Online Action RPG', 'Red Bull TV: Live Sports, Music & Entertainment', 'Idle Heroes', 'Duolingo: Learn Languages Free', 'Free phone calls, free texting SMS on free number', 'Phone Tracker : Family Locator', 'My Photo Keyboard', 'Whoscall - Caller ID & Block', 'Google Sheets', 'Video Downloader', 'Google Docs', 'Any.do: To-do list, Calendar, Reminders & Planner', 'Notepad & To do list', 'Polaris Office - Word, Docs, Sheets, Slide, PDF', 'Google Keep', 'Do It Later: Tasks & To-Dos', 'Todoist: To-do lists for task management & errands', 'To-Do Calendar Planner', 'Wunderlist: To-Do List & Tasks', 'TickTick: To Do List with Reminder, Day Planner', 'ColorNote Notepad Notes', 'Apex Launcher', 'Dude Perfect 2', 'Dairy Queen', 'SONIC Drive-In', "McDonald's", 'Wendy’s – Food and Offers', "Dunkin' Donuts", 'SUBWAY®', 'Panera Bread', 'Starbucks', "Dr. 
Panda & Toto's Treehouse", 'Cache Cleaner-DU Speed Booster (booster & cleaner)', 'DU Browser—Browse fast & fun', 'Real Racing 3', 'Equestria Girls', 'My Little Pony Celebration', 'Miraculous Ladybug & Cat Noir - The Official Game', 'My Little Pony: Harmony Quest', 'PJ Masks: Moonlight Heroes', 'The Emirates App', 'Phogy, 3D Camera', 'Weather by eltiempo.es', 'The Game of Life', 'Spanish English Translator', 'Human Anatomy Atlas 2018: Complete 3D Human Body', "Game for KIDS: KIDS match'em", 'CBS Sports App - Scores, News, Stats & Watch Live', 'Yahoo Fantasy Sports - #1 Rated Fantasy App', 'Dictionary - Merriam-Webster', 'Edmodo', 'busuu: Learn Languages - Spanish, English & More', 'Chess Free', 'Oxford Dictionary of English : Free', 'Masha and the Bear Child Games', 'Frozen Free Fall', 'Episode - Choose Your Story', 'The NBC App - Watch Live TV and Full Episodes', 'Moto File Manager', 'HBO NOW: Stream TV & Movies', 'Moneycontrol – Stocks, Sensex, Mutual Funds, IPO', 'CNBC: Breaking Business News & Live Market Data', 'Google Earth', 'Groupon - Shop Deals, Discounts & Coupons', 'Google News', 'Amino: Communities and Chats', 'Nike Training Club - Workouts & Fitness Plans', 'Hangouts Dialer - Call Phones', 'Offline Maps & Navigation', 'Strawberry Shortcake Ice Cream Island', 'Home Workout - No Equipment', 'Home Security Camera WardenCam - reuse old phones', 'Food Network', 'Web Browser for Android', 'Airway Ex - Intubate. Anesthetize. 
Train.', 'FilterGrid - Cam&Photo Editor', 'Messages, Text and Video Chat for Messenger', 'All Social Networks', 'Premier League - Official App', 'Farm Heroes Saga', 'ESPN Fantasy Sports', 'Fallout Shelter', 'Facebook Pages Manager', 'Facebook Ads Manager', 'Who Viewed My Facebook Profile - Stalkers Visitors', 'The 5th Stand', 'Garena Free Fire', 'osmino Wi-Fi: free WiFi', 'Fun Kid Racing - Motocross', 'Podcast App: Free & Offline Podcasts by Player FM', 'Motorola FM Radio', 'FarmersOnly Dating', 'Firefox Focus: The privacy browser', 'FP Notebook', 'Slickdeals: Coupons & Shopping', 'AAFP'] 10841 9659
This time, it appears we have over 1,000 duplicates we need to get rid of.
Since an app can appear more than once, we keep, for each app, the entry with the highest number of reviews (which should be the most recent one) and delete the others.
To do this, we collect the app names and their review counts in a dictionary (one for each of the datasets). The dictionary holds a single entry per app, associated with the highest number of reviews found for that app:
#create the empty dictionary
g_reviews_max = {}
#loop over data
for app in google_data[1:]:
    #look for app names already counted
    if app[0] in g_reviews_max and g_reviews_max[app[0]] < float(app[3]):
        #if the number of reviews is higher,
        #replace them in the dictionary
        g_reviews_max[app[0]] = float(app[3])
    #if not already counted, add it to the dictionary
    elif app[0] not in g_reviews_max:
        g_reviews_max[app[0]] = float(app[3])
    #if already counted and the number of reviews is smaller,
    #ignore it
print(len(g_reviews_max))
9659
Now we check each item in the dataset against this dictionary and retain only the entry with the highest number of reviews. In this way, we end up with a list of data without duplicates.
#list of unique apps
google_cleaned = []
#list of copies to be discarded
g_already_added = []
#loop over the data excluding the header
for app in google_data[1:]:
    #check each item's number of reviews against dictionary
    #if number of reviews corresponds to the maximum for that app
    #and if the app is not already counted
    if float(app[3]) == g_reviews_max[app[0]] and app[0] not in g_already_added:
        #add to the final list and to the check list
        google_cleaned.append(app)
        g_already_added.append(app[0])
    #else (the app is already added and/or
    #the number of reviews is not the highest) ignore it
print(len(google_cleaned))
9659
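The two passes above can also be collapsed into one. The following is only a minimal sketch, not the method used in this project: it assumes rows shaped like those in googleplaystore.csv (name at index 0, review count at index 3), and the function name dedupe_by_max_reviews and the sample rows are made up for illustration.

```python
def dedupe_by_max_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_idx]
        # strict '>' keeps the earliest row when review counts tie
        if name not in best or float(row[reviews_idx]) > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())


# hypothetical sample rows: [App, Category, Rating, Reviews]
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Box', 'BUSINESS', '4.2', '159872'],
]
unique_rows = dedupe_by_max_reviews(sample)
print(len(unique_rows))
```

Storing the whole row in the dictionary removes the need for a separate "already added" list, whose membership test gets slower as the list grows. The result lists apps in order of first appearance, so the row order may differ from the two-pass version even though the same rows are kept.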
As the next step in our data cleaning process, we want to remove non-English apps. Since there is no column devoted to language in our datasets, we can start by removing apps whose names contain characters outside the set of letters, numbers, and symbols used in English.
Characters commonly used in English text fall within the ASCII range, with codes 0 to 127, so we can build a function that reads every character in the app name and counts those whose code is above 127.
The function flags an app as non-English when its name contains three or more such characters. This lets us keep apps whose names contain up to two emoji or special characters.
Let's build the "letter" function and show some examples of how it works:
def letter(string):
    i = 0
    for character in string:
        #if a non-ASCII character is found, increase the i counter by 1
        if ord(character) > 127:
            i += 1
    if i >= 3:
        #names with three or more characters above code 127
        #return False
        return False
    else:
        return True
letter('Instagram')
True
letter('爱奇艺PPS')
False
letter('Instachat 😜')
True
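The cut-off used by letter() can be made explicit as a parameter, which makes it easier to experiment with stricter or looser filters. This is a minimal sketch under assumptions: the name is_probably_english and the default of two allowed non-ASCII characters are illustrative choices, picked so that the three examples above give the same results.

```python
def is_probably_english(name, max_non_ascii=2):
    """Return True when `name` has at most `max_non_ascii` characters above code 127."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_probably_english('Instagram'))
print(is_probably_english('爱奇艺PPS'))
print(is_probably_english('Instachat 😜'))
```

Raising max_non_ascii keeps more names with emoji or trademark symbols, at the cost of letting through short names written in other scripts.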
Now let's apply the function to both datasets to filter out non-English apps and see how many remain:
english_apple = []
#apple_data still includes the header row, so english_apple keeps it too
for app in apple_data:
    #if the function 'letter' returns True, add it to the list
    if letter(app[1]):
        english_apple.append(app)
    #else, ignore it
print(len(english_apple))
6156
english_google = []
for app in google_cleaned:
    #if the function 'letter' returns True, add it to the list
    if letter(app[0]):
        english_google.append(app)
    #else, ignore it
print(len(english_google))
9597
Since our company builds free apps, it would not be appropriate to build our profile based on paid apps. Therefore, the next step is to remove from our datasets the apps that are not free.
We can do this by defining a new list for each store and appending to it only the rows of apps identified as free in the relevant column ("Type" for the Google store, and "Price" for the Apple store).
free_apple = []
#skip the header row, which is still the first item of english_apple
for app in english_apple[1:]:
    #if the 'Price' column has value '0.0'
    if app[4] == '0.0':
        free_apple.append(app)
    #else, ignore it
print(len(free_apple))
3203
free_google = []
for app in english_google:
    #if 'Type' column is 'Free'
    if app[6] == 'Free':
        free_google.append(app)
    #else, ignore it
print(len(free_google))
8847
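The filters above compare raw strings ('0.0' and 'Free'). A slightly more tolerant check parses the price as a number instead. This is a minimal sketch: the helper name is_free is hypothetical, and the assumption that paid prices may be written with a leading '$' (as in '$4.99') is ours, not something verified above.

```python
def is_free(price_text):
    """True when a price string such as '0', '0.0' or '$0.00' represents zero."""
    # drop surrounding spaces and a leading currency sign before parsing
    return float(price_text.strip().lstrip('$')) == 0.0


print([is_free(price) for price in ['0.0', '0', '$4.99', '2.99']])
```

Parsing the number means that '0', '0.0' and '0.00' are all treated as free, whereas an exact string comparison accepts only one spelling.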
This concludes our data cleaning process. Our lists now only contain free, English apps, with no duplicates, and possibly no errors. Now let's move on to the analysis phase.
The main source of revenue for our company is in-app ads. Therefore, we want our apps to attract a large number of users and to engage them as much as possible so that they spend more time interacting with the app.
Our goal in the analysis is to identify the main features that attract users in an app, so that we can develop a minimal Android version of a new app, to be added to Google Play.
If it is profitable, the app will be further developed, and an iOS version will be developed after six months to be added to the App Store. For this reason, we need to identify attractive features in both markets.
Let's start by displaying what genres are most common for each market.
To do this, we define a function "freq_table" that generates frequency tables. A frequency table tells us what share of the apps in the dataset belongs to each genre.
The function receives the dataset and the index of the desired column as input, and it iterates over the dataset to count how many times each value in that column (in our case, each genre) appears. Then, it returns the frequencies as percentages.
def freq_table(dataset, index):
    column = {}
    total = 0
    for app in dataset:
        #add every app to the total number of apps
        total += 1
        #if the genre has already been encountered,
        #increase the number of apps of that genre by 1
        if app[index] in column:
            column[app[index]] += 1
        #if the genre is new, add it to our dictionary
        else:
            column[app[index]] = 1
    #calculate the percentage for each genre
    for key in column:
        column[key] = column[key]/total*100
    return column
This function is in turn called by another function, "display_table", that transforms the dictionary into a list of tuples and sorts it in descending order. This puts the genres that cover the largest percentage at the top of the list, making it easier to skim through the results.
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        #store each entry as a (percentage, genre) tuple so that it can be sorted
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
    #sort the list of tuples, use reverse for descending order
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
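The counting and sorting done by the two helpers above can also be sketched with collections.Counter from the standard library. This is an illustrative alternative, not the code used in this project; the name freq_table_counter and the sample rows are hypothetical.

```python
from collections import Counter


def freq_table_counter(rows, index):
    """Percentage of rows per distinct value in column `index`, most common first."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    # most_common() returns (value, count) pairs sorted by count, descending
    return [(value, count / total * 100) for value, count in counts.most_common()]


# hypothetical rows with a genre in column 1
rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for genre, percent in freq_table_counter(rows, 1):
    print(genre, ':', percent)
```

One difference to keep in mind: when two genres have the same percentage, most_common() keeps them in order of first appearance, while sorting (percentage, genre) tuples in reverse breaks ties by genre name in descending order.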
Now let's use our function to inspect the "prime_genre" column of the AppleStore dataset and the "Category" and "Genres" columns of the Google dataset:
display_table(free_apple, 11)
Games : 58.25788323446769 Entertainment : 7.836403371838902 Photo & Video : 4.995316890415236 Education : 3.6840462066812365 Social Networking : 3.3093974399000934 Shopping : 2.5913206369029034 Utilities : 2.466437714642523 Sports : 2.1542304089915705 Music : 2.0605682172962845 Health & Fitness : 2.0293474867311896 Productivity : 1.7483609116453322 Lifestyle : 1.5610365282547611 News : 1.3424914142990947 Travel : 1.248829222603809 Finance : 1.0927255697783327 Weather : 0.8741804558226661 Food & Drink : 0.8117389946924758 Reference : 0.5307524196066188 Business : 0.5307524196066188 Book : 0.3746487667811427 Navigation : 0.18732438339057134 Medical : 0.18732438339057134 Catalogs : 0.1248829222603809
display_table(free_google, 1)
FAMILY : 18.932971628800725 GAME : 9.698202780603594 TOOLS : 8.45484344975698 BUSINESS : 4.600429524132474 PRODUCTIVITY : 3.8996269922007465 LIFESTYLE : 3.888323725556686 FINANCE : 3.7074714592517237 MEDICAL : 3.537922459590822 SPORTS : 3.39097999321804 PERSONALIZATION : 3.3231603933536795 COMMUNICATION : 3.2327342602011986 HEALTH_AND_FITNESS : 3.0857917938284163 PHOTOGRAPHY : 2.950152594099695 NEWS_AND_MAGAZINES : 2.803210127726913 SOCIAL : 2.6675709279981916 TRAVEL_AND_LOCAL : 2.3397761953204474 SHOPPING : 2.2493500621679665 BOOKS_AND_REFERENCE : 2.136317395727365 DATING : 1.8650389962699219 VIDEO_PLAYERS : 1.797219396405561 MAPS_AND_NAVIGATION : 1.3903017972193965 FOOD_AND_DRINK : 1.2433593308466147 EDUCATION : 1.1642364643381937 ENTERTAINMENT : 0.9607776647451114 LIBRARIES_AND_DEMO : 0.938171131456991 AUTO_AND_VEHICLES : 0.9268678648129309 HOUSE_AND_HOME : 0.8025319317282694 WEATHER : 0.7912286650842093 EVENTS : 0.7121057985757884 PARENTING : 0.6555894653554878 ART_AND_DESIGN : 0.6442861987114276 COMICS : 0.6103763987792472 BEAUTY : 0.5990731321351871
display_table(free_google, 9)
Tools : 8.44354018311292 Entertainment : 6.081157454504352 Education : 5.357748389284503 Business : 4.600429524132474 Productivity : 3.8996269922007465 Lifestyle : 3.8770204589126256 Finance : 3.7074714592517237 Medical : 3.537922459590822 Sports : 3.458799593082401 Personalization : 3.3231603933536795 Communication : 3.2327342602011986 Action : 3.0970950604724763 Health & Fitness : 3.0857917938284163 Photography : 2.950152594099695 News & Magazines : 2.803210127726913 Social : 2.6675709279981916 Travel & Local : 2.3284729286763874 Shopping : 2.2493500621679665 Books & Reference : 2.136317395727365 Simulation : 2.045891262574884 Dating : 1.8650389962699219 Arcade : 1.8424324629818016 Video Players & Editors : 1.7746128631174407 Casual : 1.763309596473381 Maps & Navigation : 1.3903017972193965 Food & Drink : 1.2433593308466147 Puzzle : 1.1303266644060133 Racing : 0.9946874646772917 Role Playing : 0.938171131456991 Libraries & Demo : 0.938171131456991 Auto & Vehicles : 0.9268678648129309 Strategy : 0.9042613315248107 House & Home : 0.8025319317282694 Weather : 0.7912286650842093 Events : 0.7121057985757884 Adventure : 0.6668927319995479 Comics : 0.5990731321351871 Beauty : 0.5990731321351871 Art & Design : 0.5990731321351871 Parenting : 0.49734373233864587 Card : 0.45213066576240535 Trivia : 0.41822086583022494 Casino : 0.41822086583022494 Educational;Education : 0.39561433254210465 Board : 0.38431106589804453 Educational : 0.3730077992539844 Education;Education : 0.339097999321804 Word : 0.2599751328133831 Casual;Pretend Play : 0.23736859952526282 Music : 0.20345879959308238 Racing;Action & Adventure : 0.169548999660902 Puzzle;Brain Games : 0.169548999660902 Entertainment;Music & Video : 0.169548999660902 Casual;Brain Games : 0.1356391997287216 Casual;Action & Adventure : 0.1356391997287216 Arcade;Action & Adventure : 0.12433593308466147 Action;Action & Adventure : 0.10172939979654119 Educational;Pretend Play : 0.09042613315248108 Simulation;Action & Adventure : 
0.07912286650842093 Parenting;Education : 0.07912286650842093 Entertainment;Brain Games : 0.07912286650842093 Board;Brain Games : 0.07912286650842093 Parenting;Music & Video : 0.0678195998643608 Educational;Brain Games : 0.0678195998643608 Casual;Creativity : 0.0678195998643608 Art & Design;Creativity : 0.0678195998643608 Education;Pretend Play : 0.05651633322030067 Role Playing;Pretend Play : 0.04521306657624054 Education;Creativity : 0.04521306657624054 Role Playing;Action & Adventure : 0.0339097999321804 Puzzle;Action & Adventure : 0.0339097999321804 Entertainment;Creativity : 0.0339097999321804 Entertainment;Action & Adventure : 0.0339097999321804 Educational;Creativity : 0.0339097999321804 Educational;Action & Adventure : 0.0339097999321804 Education;Music & Video : 0.0339097999321804 Education;Brain Games : 0.0339097999321804 Education;Action & Adventure : 0.0339097999321804 Adventure;Action & Adventure : 0.0339097999321804 Video Players & Editors;Music & Video : 0.02260653328812027 Sports;Action & Adventure : 0.02260653328812027 Simulation;Pretend Play : 0.02260653328812027 Puzzle;Creativity : 0.02260653328812027 Music;Music & Video : 0.02260653328812027 Entertainment;Pretend Play : 0.02260653328812027 Casual;Education : 0.02260653328812027 Board;Action & Adventure : 0.02260653328812027 Video Players & Editors;Creativity : 0.011303266644060134 Trivia;Education : 0.011303266644060134 Travel & Local;Action & Adventure : 0.011303266644060134 Tools;Education : 0.011303266644060134 Strategy;Education : 0.011303266644060134 Strategy;Creativity : 0.011303266644060134 Strategy;Action & Adventure : 0.011303266644060134 Simulation;Education : 0.011303266644060134 Role Playing;Brain Games : 0.011303266644060134 Racing;Pretend Play : 0.011303266644060134 Puzzle;Education : 0.011303266644060134 Parenting;Brain Games : 0.011303266644060134 Music & Audio;Music & Video : 0.011303266644060134 Lifestyle;Pretend Play : 0.011303266644060134 Lifestyle;Education : 
0.011303266644060134 Health & Fitness;Education : 0.011303266644060134 Health & Fitness;Action & Adventure : 0.011303266644060134 Entertainment;Education : 0.011303266644060134 Communication;Creativity : 0.011303266644060134 Comics;Creativity : 0.011303266644060134 Casual;Music & Video : 0.011303266644060134 Card;Action & Adventure : 0.011303266644060134 Books & Reference;Education : 0.011303266644060134 Art & Design;Pretend Play : 0.011303266644060134 Art & Design;Action & Adventure : 0.011303266644060134 Arcade;Pretend Play : 0.011303266644060134 Adventure;Education : 0.011303266644060134
We can see that in the AppleStore dataset entertainment-oriented genres dominate (games alone account for about 58% of the apps), while the Google profile is more balanced. However, this does not tell us much about the number of users for each category. Let's inspect this information in our datasets.
In the Google Play dataset, we can use the "Installs" column, which reports the number of times the app has been installed.
We need to convert the "Installs" string to a float, which is only possible after removing the characters that cannot be converted, such as '+' and ','.
After computing the average number of installs for each genre, we store it in a list called "users_g", and then use the sorted() function to sort the list in descending order. To sort on a specific index of each inner list, we use "itemgetter" from the operator module.
from operator import itemgetter
#use the previously defined function
#to generate list of unique genres
genre_list = freq_table(free_google, 1)
users_g = []
#loop through the genres
for genre in genre_list:
    total = 0
    len_genre = 0
    #check each genre against all apps in the dataset
    for app in free_google:
        if app[1] == genre:
            #the following two lines convert string to float
            installs = app[5].replace('+', '')
            installs = float(installs.replace(",", ""))
            #increase total number of installs
            total += installs
            #increase number of apps of that genre by 1
            len_genre += 1
    #create a list of lists that associates genre
    #with average installs
    users_g.append([genre, total/len_genre])
#sort and print the list
google_users = sorted(users_g, key=itemgetter(1), reverse = True)
print(google_users)
[['COMMUNICATION', 38590581.08741259], ['VIDEO_PLAYERS', 24727872.452830188], ['SOCIAL', 23253652.127118643], ['PHOTOGRAPHY', 17840110.40229885], ['PRODUCTIVITY', 16787331.344927534], ['GAME', 15544014.51048951], ['TRAVEL_AND_LOCAL', 13984077.710144928], ['ENTERTAINMENT', 11640705.88235294], ['TOOLS', 10830251.970588235], ['NEWS_AND_MAGAZINES', 9549178.467741935], ['BOOKS_AND_REFERENCE', 8814199.78835979], ['SHOPPING', 7036877.311557789], ['PERSONALIZATION', 5201482.6122448975], ['WEATHER', 5145550.285714285], ['HEALTH_AND_FITNESS', 4188821.9853479853], ['MAPS_AND_NAVIGATION', 4049274.6341463416], ['FAMILY', 3697848.1731343283], ['SPORTS', 3650602.276666667], ['ART_AND_DESIGN', 1986335.0877192982], ['FOOD_AND_DRINK', 1924897.7363636363], ['EDUCATION', 1833495.145631068], ['BUSINESS', 1712290.1474201474], ['LIFESTYLE', 1446158.2238372094], ['FINANCE', 1387692.475609756], ['HOUSE_AND_HOME', 1360598.042253521], ['DATING', 854028.8303030303], ['COMICS', 832613.8888888889], ['AUTO_AND_VEHICLES', 647317.8170731707], ['LIBRARIES_AND_DEMO', 638503.734939759], ['PARENTING', 542603.6206896552], ['BEAUTY', 513151.88679245283], ['EVENTS', 253542.22222222222], ['MEDICAL', 120550.61980830671]]
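The nested loop above rescans the whole dataset once per genre. The same averages can be accumulated in a single pass; the sketch below shows one way to do it. The helper names parse_installs and mean_by_group are hypothetical, and the sample rows only imitate the column layout assumed above (category at index 1, installs at index 5).

```python
from collections import defaultdict


def parse_installs(text):
    """Turn an install string such as '1,000,000+' into the float 1000000.0."""
    return float(text.replace('+', '').replace(',', ''))


def mean_by_group(rows, group_idx, value_idx, convert=float):
    """Average convert(row[value_idx]) per distinct row[group_idx], in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_idx]] += convert(row[value_idx])
        counts[row[group_idx]] += 1
    return {group: totals[group] / counts[group] for group in totals}


# hypothetical rows: category at index 1, installs at index 5
rows = [
    ['A', 'TOOLS', '', '', '', '1,000+'],
    ['B', 'TOOLS', '', '', '', '3,000+'],
    ['C', 'GAME', '', '', '', '10,000,000+'],
]
averages = mean_by_group(rows, 1, 5, convert=parse_installs)
print(sorted(averages.items(), key=lambda item: item[1], reverse=True))
```

Because the trailing '+' is simply dropped, each parsed value is the lower edge of the reported install range, so averages computed this way are best read as rough lower estimates.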
Now let's do the same for our other dataset.
Since information on the number of installations is missing in the AppleStore dataset, we'll use a workaround. We'll sum up the number of user ratings for each app in the same genre, then divide by the total number of apps for that genre.
Again, we want our results to be displayed in a sorted list:
#use the previously defined function
#to generate list of unique genres
genre_list = freq_table(free_apple, 11)
users_a = []
#loop through the genres
for genre in genre_list:
    total = 0
    len_genre = 0
    #check each genre against all apps in the dataset
    for app in free_apple:
        if app[11] == genre:
            #increase total number of ratings
            total += float(app[5])
            #increase number of apps of that genre by 1
            len_genre += 1
    #create a list of lists that associates genre
    #with average number of ratings
    users_a.append([genre, total/len_genre])
#sort and print the list
apple_users = sorted(users_a, key=itemgetter(1), reverse = True)
print(apple_users)
[['Navigation', 86090.33333333333], ['Reference', 79350.4705882353], ['Social Networking', 71548.34905660378], ['Music', 57326.530303030304], ['Weather', 52279.892857142855], ['Book', 46384.916666666664], ['Food & Drink', 33333.92307692308], ['Finance', 32367.02857142857], ['Photo & Video', 28441.54375], ['Travel', 28243.8], ['Shopping', 27230.734939759037], ['Health & Fitness', 23298.015384615384], ['Sports', 23008.898550724636], ['Games', 22886.36709539121], ['News', 21248.023255813954], ['Productivity', 21028.410714285714], ['Utilities', 19156.493670886077], ['Lifestyle', 16815.48], ['Entertainment', 14195.358565737051], ['Business', 7491.117647058823], ['Education', 7003.983050847458], ['Catalogs', 4004.0], ['Medical', 612.0]]
To make our results easier to read, let's plot the average number of users for each genre in each dataset as a bar plot.
To do this, we first define the function barplot(), which takes as its argument a list of lists, each made of two items (a label and a value).
import pandas as pd
import re
import matplotlib.pyplot as plt
def barplot(list_of_2_element_list):
    d = {ya[0]:ya[1] for ya in list_of_2_element_list}
    plt.figure(figsize=(9,15))
    axes = plt.axes()
    axes.get_xaxis().set_visible(False)
    spines = axes.spines
    spines['top'].set_visible(False)
    spines['right'].set_visible(False)
    spines['bottom'].set_visible(False)
    spines['left'].set_visible(False)
    ax = plt.barh(*zip(*d.items()), height=.5)
    plt.yticks(list(d.keys()), list(d.keys()))
    plt.xticks(range(4), range(4))
    rectangles = ax.patches
    for rectangle in rectangles:
        x_value = rectangle.get_width()
        y_value = rectangle.get_y() + rectangle.get_height() / 2
        space = 5
        ha = 'left'
        label = "{}".format(x_value)
        if x_value > 0:
            plt.annotate(
                label,
                (x_value, y_value),
                xytext=(space, 0),
                textcoords="offset points",
                va='center',
                ha=ha)
    axes.tick_params(tick1On=False)
    plt.show()
Now, let's use our function on the results obtained above ("google_users" and "apple_users") to visualize our findings.
%matplotlib inline
barplot(apple_users)
%matplotlib inline
barplot(google_users)
To get a more concrete idea of what kind of apps these are, let's look at some of them.
We would like to see the more popular apps first, so we sort our two datasets in descending order on "Installs" for Google, and on the total number of ratings for AppleStore. Both columns are stored as strings, so sorted() compares them as text rather than as numbers, and the lists below are therefore in text order, not in strict numeric order.
Then, we extract the first ten apps that appear in our list for each of the top two categories, namely "COMMUNICATION" and "VIDEO_PLAYERS" for the Google Store, and "Navigation" and "Reference" for the AppleStore.
#create a sorted list of apps ordered by their number of installs
free_google_s = sorted(free_google, key=itemgetter(5), reverse = True)
index = 0
for app in free_google_s:
    #print the first ten items
    if app[1] == 'COMMUNICATION' and index < 10:
        print(app[0], app[1])
        index += 1
index = 0
for app in free_google_s:
    #print the first ten items
    if app[1] == 'VIDEO_PLAYERS' and index < 10:
        print(app[0], app[1])
        index += 1
Google Duo - High Quality Video Calls COMMUNICATION imo free video calls and chat COMMUNICATION LINE: Free Calls & Messages COMMUNICATION UC Browser - Fast Download Private & Secure COMMUNICATION Viber Messenger COMMUNICATION Lightning Web Browser COMMUNICATION Web Browser COMMUNICATION SolMail - All-in-One email app COMMUNICATION LokLok: Draw on a Lock Screen COMMUNICATION U - Webinars, Meetings & Messenger COMMUNICATION MX Player VIDEO_PLAYERS Video Player All Format for Android VIDEO_PLAYERS CJ VLC HD Remote (+ Stream) VIDEO_PLAYERS DR TV VIDEO_PLAYERS GoPlus Cam VIDEO_PLAYERS Video Wallpaper Show VIDEO_PLAYERS A-Z Screen Recorder - VIDEO_PLAYERS YourTube Video Views BG VIDEO_PLAYERS CJ Camcorder VIDEO_PLAYERS CX Monthly Tech News VIDEO_PLAYERS
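Since the "Installs" values are strings, a sort key that converts them to numbers gives a true ranking by installs. The following is a minimal sketch with made-up rows; the helper name installs_as_number is hypothetical, and the column position (index 5) is the one assumed for the Google dataset above.

```python
def installs_as_number(row, installs_idx=5):
    """Numeric sort key for an 'Installs' string such as '500,000+'."""
    return float(row[installs_idx].replace('+', '').replace(',', ''))


# hypothetical rows: name at index 0, installs at index 5
rows = [
    ['App A', 'COMMUNICATION', '', '', '', '500,000+'],
    ['App B', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['App C', 'COMMUNICATION', '', '', '', '50,000,000+'],
]

# sorting on the raw string compares characters from left to right
by_text = sorted(rows, key=lambda row: row[5], reverse=True)
# sorting on the converted value compares the actual magnitudes
by_number = sorted(rows, key=installs_as_number, reverse=True)

print([row[0] for row in by_text])
print([row[0] for row in by_number])
```

In the text ordering, '500,000+' ranks above '1,000,000,000+' because the character '5' sorts after '1'; the numeric key puts the app with a billion installs first. The same idea applies to the ratings-count column of the AppleStore dataset, using float(row[5]) as the key.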
#create a sorted list of apps ordered by their number of ratings
free_apple_s = sorted(free_apple, key=itemgetter(5), reverse = True)
index = 0
for app in free_apple_s:
    #print the first ten items
    if app[11] == 'Navigation' and index < 10:
        print(app[1], app[11])
        index += 1
index = 0
for app in free_apple_s:
    #print the first ten items
    if app[11] == 'Reference' and index < 10:
        print(app[1], app[11])
        index += 1
Railway Route Search Navigation CoPilot GPS – Car Navigation & Offline Maps Navigation Waze - GPS Navigation, Maps & Real-time Traffic Navigation ImmobilienScout24: Real Estate Search in Germany Navigation Google Maps - Navigation & Transit Navigation Geocaching® Navigation Bible Reference City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) Reference Guides for Pokémon GO - Pokemon GO News and Cheats Reference Real Bike Traffic Rider Virtual Reality Glasses Reference WWDC Reference Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free Reference Dictionary.com Dictionary & Thesaurus for iPad Reference LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools Reference Google Translate Reference Dictionary.com Dictionary & Thesaurus Reference
Now let's look at which genres users seem to like the most, based on their ratings. For this part of the analysis only, we will not limit our set to free apps, because users may be more likely to leave a review for an app if they paid for it.
Therefore, instead of "free_google" and "free_apple", we use the datasets prepared in the preceding step, after removing errors, duplicates, and non-English apps: "english_google" and "english_apple".
Some of the apps in the Google Store have no value in the "Rating" column (the field in the original dataset contains the string 'NaN'). Therefore, we add a check to skip those apps, and we also exclude them from the count of apps for each genre, to avoid skewing our results.
We also round the averages to three decimal places, so that differences between the resulting values are easier to see.
#use the previously defined function
#to generate list of unique genres
#(we need to do it again because we're using the non-free dataset)
genre_list = freq_table(english_google, 1)
ratings_g = []
for genre in genre_list:
    total = 0
    len_genre = 0
    #check each genre against all the apps
    for app in english_google:
        #exclude the apps with rating 'NaN'
        if app[1] == genre and app[2] != 'NaN':
            #add the rating to the total
            total += float(app[2])
            #increase the number of apps by 1
            len_genre += 1
    #create a list with average rating for each genre
    #round the average rating to 3 decimals to make it more readable
    ratings_g.append([genre, round(total/len_genre, 3)])
#order the list by ratings in descending order
google_ratings = sorted(ratings_g, key=itemgetter(1), reverse = True)
print(google_ratings)
[['EVENTS', 4.436], ['ART_AND_DESIGN', 4.359], ['EDUCATION', 4.352], ['BOOKS_AND_REFERENCE', 4.344], ['PERSONALIZATION', 4.332], ['PARENTING', 4.3], ['BEAUTY', 4.279], ['SOCIAL', 4.247], ['WEATHER', 4.245], ['GAME', 4.244], ['HEALTH_AND_FITNESS', 4.243], ['SHOPPING', 4.231], ['SPORTS', 4.214], ['AUTO_AND_VEHICLES', 4.19], ['FAMILY', 4.184], ['PRODUCTIVITY', 4.183], ['LIBRARIES_AND_DEMO', 4.178], ['COMICS', 4.171], ['FOOD_AND_DRINK', 4.171], ['MEDICAL', 4.166], ['PHOTOGRAPHY', 4.156], ['HOUSE_AND_HOME', 4.154], ['ENTERTAINMENT', 4.13], ['COMMUNICATION', 4.121], ['FINANCE', 4.116], ['NEWS_AND_MAGAZINES', 4.108], ['BUSINESS', 4.103], ['LIFESTYLE', 4.092], ['TRAVEL_AND_LOCAL', 4.07], ['VIDEO_PLAYERS', 4.045], ['TOOLS', 4.042], ['MAPS_AND_NAVIGATION', 4.028], ['DATING', 3.98]]
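The averaging step above divides by the number of rated apps, which would fail for a genre in which every rating is 'NaN'. A small helper can make that case explicit. This is a sketch under assumptions: the name mean_rating is hypothetical, and returning None for a genre without any rated app is a design choice of this example, not something the analysis above does.

```python
import math


def mean_rating(ratings):
    """Average of rating strings, skipping 'NaN'; None when nothing is rated."""
    values = [float(rating) for rating in ratings]  # float('NaN') parses to nan
    rated = [value for value in values if not math.isnan(value)]
    if not rated:
        return None
    return round(sum(rated) / len(rated), 3)


print(mean_rating(['4.5', 'NaN', '4.0', '3.8']))
print(mean_rating(['NaN']))
```

Excluding the missing ratings from both the sum and the count, as done above, keeps them from pulling the average toward zero.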
Now, let's do the same for the AppleStore dataset.
#use the previously defined function
#to generate list of unique genres
#(we need to do it again because this dataset also includes non-free apps)
genre_list = freq_table(english_apple[1:], 11)
ratings_g = []
for genre in genre_list:
    total = 0
    len_genre = 0
    #check each genre against all the apps
    for app in english_apple[1:]:
        if app[11] == genre:
            #add the rating to the total
            total += float(app[7])
            #increase the number of apps by 1
            len_genre += 1
    #create a list with average rating for each genre
    #round the average rating to 3 decimals to make it more readable
    ratings_g.append([genre, round(total/len_genre, 3)])
#order the list by ratings in descending order
apple_ratings = sorted(ratings_g, key=itemgetter(1), reverse = True)
print(apple_ratings)
[['Catalogs', 4.2], ['Games', 4.057], ['Productivity', 4.03], ['Music', 3.982], ['Shopping', 3.982], ['Business', 3.981], ['Reference', 3.98], ['Health & Fitness', 3.869], ['Book', 3.868], ['Photo & Video', 3.855], ['Weather', 3.703], ['Navigation', 3.661], ['Food & Drink', 3.648], ['Education', 3.586], ['Travel', 3.568], ['Social Networking', 3.508], ['Entertainment', 3.504], ['Finance', 3.469], ['Medical', 3.452], ['Lifestyle', 3.413], ['Utilities', 3.389], ['News', 3.357], ['Sports', 3.087]]
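The two loops above differ only in the dataset, the column indexes, and the 'NaN' check, so they could be folded into a single helper. The following is a minimal sketch, not the code that produced the results above: the name `average_rating_by_genre` and the sample rows are made up for illustration, and the helper groups ratings in one pass with a dictionary instead of rescanning the dataset once per genre.

```python
from operator import itemgetter

def average_rating_by_genre(rows, genre_idx, rating_idx, missing='NaN'):
    """Return [genre, mean rating] pairs sorted by rating, highest first.

    Hypothetical helper: rows whose rating equals `missing` are skipped,
    so they count toward neither the total nor the number of apps.
    """
    totals = {}  # genre -> [sum of ratings, number of rated apps]
    for row in rows:
        rating = row[rating_idx]
        if rating == missing:
            continue
        entry = totals.setdefault(row[genre_idx], [0.0, 0])
        entry[0] += float(rating)
        entry[1] += 1
    averages = [[genre, round(total / count, 3)]
                for genre, (total, count) in totals.items()]
    return sorted(averages, key=itemgetter(1), reverse=True)

# Made-up rows in the Google Play layout (name, category, rating).
sample = [
    ['App A', 'TOOLS', '4.0'],
    ['App B', 'TOOLS', 'NaN'],
    ['App C', 'TOOLS', '4.5'],
    ['App D', 'EVENTS', '4.4'],
]
print(average_rating_by_genre(sample, 1, 2))
```

With the notebook's data, `average_rating_by_genre(english_google, 1, 2)` and `average_rating_by_genre(english_apple[1:], 11, 7)` should give the same averages as the two loops, assuming the column indexes used there; genres with identical averages may appear in a different order.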
Let's plot our findings for Google Play and Apple Store respectively:
%matplotlib inline
barplot(google_ratings)
%matplotlib inline
barplot(apple_ratings)
In this project, we wanted to analyze two datasets that describe the features of a large sample of apps in Google Play and in the Apple Store respectively, in order to determine a potential profile for our next free app to be launched on the market.
To do this, we went through the following preparatory steps:
We collected data from two publicly available datasets and made sure that the source of the datasets was adequate in terms of safety, completeness, and validity of our data.
We checked our datasets for errors and duplicates.
We excluded apps that are not free or not in English.
In order to determine which features can generate higher interaction from the users, we asked:
Which app genres represent a larger percentage of our datasets? We found that most free apps in the Apple Store belong to entertainment-related categories, like games, photos, and social networking; in Google Play, these are accompanied by other kinds of apps, like those related to business, productivity, and general tools.
Which kind of app has the highest number of users? In Google Play, the highest number of users is associated with Communication, Video Players, and Social apps. In the Apple Store, based on the average number of ratings, the most popular apps belong to the categories of Navigation, Reference, Social Networking, and Music.
Which genre of app has the highest average ratings? In Google Play, apps in Events, Art and Design, and Education have the highest ratings, while maps, video players, and communication rate poorly even though they are among the genres with the most users, so there may be room for an app that improves on these services. In the Apple Store, on the other hand, similar categories like photo and video rank in the middle of the list, behind games and music, for example.
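To make the comparison between popularity and rating concrete, the sketch below looks up where the genres named above fall in the Google Play ratings list. It is illustrative only: `rating_rank` is a hypothetical helper, and the list is copied from the `google_ratings` output printed earlier.

```python
# Average ratings per genre, copied from the google_ratings output above
# (already sorted from highest to lowest).
google_ratings = [
    ['EVENTS', 4.436], ['ART_AND_DESIGN', 4.359], ['EDUCATION', 4.352],
    ['BOOKS_AND_REFERENCE', 4.344], ['PERSONALIZATION', 4.332],
    ['PARENTING', 4.3], ['BEAUTY', 4.279], ['SOCIAL', 4.247],
    ['WEATHER', 4.245], ['GAME', 4.244], ['HEALTH_AND_FITNESS', 4.243],
    ['SHOPPING', 4.231], ['SPORTS', 4.214], ['AUTO_AND_VEHICLES', 4.19],
    ['FAMILY', 4.184], ['PRODUCTIVITY', 4.183], ['LIBRARIES_AND_DEMO', 4.178],
    ['COMICS', 4.171], ['FOOD_AND_DRINK', 4.171], ['MEDICAL', 4.166],
    ['PHOTOGRAPHY', 4.156], ['HOUSE_AND_HOME', 4.154], ['ENTERTAINMENT', 4.13],
    ['COMMUNICATION', 4.121], ['FINANCE', 4.116], ['NEWS_AND_MAGAZINES', 4.108],
    ['BUSINESS', 4.103], ['LIFESTYLE', 4.092], ['TRAVEL_AND_LOCAL', 4.07],
    ['VIDEO_PLAYERS', 4.045], ['TOOLS', 4.042], ['MAPS_AND_NAVIGATION', 4.028],
    ['DATING', 3.98],
]

def rating_rank(ratings, genre):
    """Return the 1-based position of `genre` in a list sorted by rating."""
    for position, (name, _rating) in enumerate(ratings, start=1):
        if name == genre:
            return position
    raise ValueError('genre not found: ' + genre)

# Genres the analysis associates with many users or weak ratings.
for genre in ['COMMUNICATION', 'VIDEO_PLAYERS', 'MAPS_AND_NAVIGATION', 'SOCIAL']:
    print(genre, rating_rank(google_ratings, genre), 'of', len(google_ratings))
```

Run as written, this places Communication 24th, Video Players 30th, and Maps and Navigation 32nd out of 33 genres, while Social sits 8th.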
There are some limitations to be addressed in our analysis:
We have to consider that, although some kinds of apps like social networks do attract a lot of users, they are also very difficult to build from scratch, since the market is already dominated by big players like Facebook and Instagram.
Another complication is that there seems to be no general overlap between the findings in the two datasets. This could be caused by our source, which only includes a sample of all the available apps, in which case further research is needed to reach a conclusion, or it may reflect a real difference in users' preferences between Apple Store and Google Play.
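One way to check the overlap between the two stores is to translate the Google Play category labels into App Store genre labels and intersect the two lists of most-used genres. The sketch below is only an illustration: the helper `shared_top_genres` and the label mapping are assumptions, not part of the original analysis, and the two input lists are the genres named in the summary above.

```python
# Hypothetical mapping from Google Play category labels to App Store genre
# labels; the pairs are an assumption made for illustration only.
GOOGLE_TO_APPLE = {
    'SOCIAL': 'Social Networking',
    'MAPS_AND_NAVIGATION': 'Navigation',
    'BOOKS_AND_REFERENCE': 'Reference',
    'PHOTOGRAPHY': 'Photo & Video',
    'WEATHER': 'Weather',
}

def shared_top_genres(google_top, apple_top, mapping=GOOGLE_TO_APPLE):
    """Return (google_label, apple_label) pairs present in both top lists."""
    apple_set = set(apple_top)
    return [(g, mapping[g]) for g in google_top
            if g in mapping and mapping[g] in apple_set]

# Most-used genres as named in the summary above.
google_top = ['COMMUNICATION', 'VIDEO_PLAYERS', 'SOCIAL']
apple_top = ['Navigation', 'Reference', 'Social Networking', 'Music']
print(shared_top_genres(google_top, apple_top))
```

Longer top lists, for example the first ten genres from each store, would give a less brittle picture than the three or four names used here, and the result depends entirely on how the labels are paired.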
However, we can focus on the categories that ranked higher for number of users in both datasets. Those are:
Therefore we could develop our app in one of those dimensions, or maybe combine them. Here are some suggestions:
An app for photo and video editing based on the weather conditions in the picture;
A user guide for common tools in photo and video editing;
A guide that directs users towards spots for taking photos or videos in particular weather conditions (e.g., where to see the full moon tonight, where to spot a great sunset today, ...).