speakers = [{'name': 'Mai Giménez',
             'twitter': '@adahopper',
             'weapons': ['Python', 'Bash', 'C++'],
             'pyladies': True},
            {'name': 'Angela Rivera',
             'twitter': '@ghilbrae',
             'weapons': ['Python', 'Django', 'C++'],
             'pyladies': True}]
for speaker in speakers:
    for k, v in speaker.items():
        print("- {}: {}".format(k, v))
    print()
#print('\n'.join(["- {}: {}".format(k, v) for speaker in speakers for k,v in speaker.items()]))
- name: Mai Giménez
- pyladies: True
- weapons: ['Python', 'Bash', 'C++']
- twitter: @adahopper

- name: Angela Rivera
- pyladies: True
- weapons: ['Python', 'Django', 'C++']
- twitter: @ghilbrae
from IPython.display import Image
Image(filename='pyladies.png')
Image(filename='notebook.png')
Image(filename='marvel_logo.jpg')
Marvel is an American comic book publisher founded by Martin Goodman in 1939 as Timely Publications, whose early flagship title was Marvel Mystery Comics. Marvel as we know it today (Marvel Worldwide Inc.) dates from 1961, with the publication of The Fantastic Four and other superhero stories created by authors such as Stan Lee, Jack Kirby and Steve Ditko, among others.
Marvel is home to hugely popular characters and teams such as:
And all of this data is ours!
from marvel.marvel import Marvel
from marveldev import Developer
To access the API you need to request developer credentials at http://developer.marvel.com/
Watch out for the request limits! We can ask for at most 100 results per request.
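Because each request returns at most 100 results, collecting everything means asking for consecutive pages. The sketch below shows that loop under stated assumptions: fetch_page is a hypothetical callable taking an offset and a limit, and fake_fetch_page is a local stand-in for the remote service. Neither is part of the Marvel client used in this notebook.

```python
def fetch_all(fetch_page, page_size=100):
    """Collect every result by requesting consecutive pages.

    fetch_page(offset, limit) is a hypothetical callable that returns a list
    with at most `limit` items starting at position `offset`.
    """
    results = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        results.extend(page)
        # A short (or empty) page means there is nothing left to request.
        if len(page) < page_size:
            break
        offset += page_size
    return results


# Stand-in for the remote service: 250 made-up character names.
FAKE_CHARACTERS = ["character-{}".format(i) for i in range(250)]


def fake_fetch_page(offset, limit):
    return FAKE_CHARACTERS[offset:offset + limit]


all_characters = fetch_all(fake_fetch_page)
print(len(all_characters))
```

With a real client, fetch_page would wrap the actual request, and it would be wise to pause between pages to stay within the API's rate limits.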
developer = Developer()
marvel = Marvel(*developer.get_marvel_credentials())
character_data_wrapper = marvel.get_characters(orderBy="-modified", limit="100")
print(character_data_wrapper.status)
Ok
for character in character_data_wrapper.data.results[:10]:
    print("* {character.name}: {character.modified_raw}".format(character=character))
* Thor (Goddess): 2014-11-05T15:16:57-0500
* Spider-Man (Miles Morales): 2014-10-23T12:07:33-0400
* Hawkeye (Kate Bishop): 2014-10-23T12:05:03-0400
* Black Widow: 2014-09-09T16:09:03-0400
* New Mutants: 2014-08-12T12:59:29-0400
* Cosmo (dog): 2014-07-24T15:14:21-0400
* Rocket Raccoon: 2014-07-17T17:32:43-0400
* Ronan: 2014-07-17T16:45:26-0400
* Star-Lord (Peter Quill): 2014-07-14T20:45:53-0400
* Captain Marvel (Carol Danvers): 2014-07-08T18:17:18-0400
What information is available for each character?
', '.join([attr for attr in dir(character) if not attr.startswith('_')])
'comics, description, detail, dict, events, get_comics, get_events, get_related_resource, get_series, get_stories, id, list_to_instance_list, marvel, modified, modified_raw, name, resourceURI, resource_url, series, stories, thumbnail, to_dict, urls, wiki'
Hmm, not bad, but is that what we are looking for? Let's look at the wiki:
from IPython.core.display import HTML
HTML("<iframe src={} width=1000 height=800></iframe>".format(character_data_wrapper.data.results[2].wiki))
Here we find much more data about the character: physical traits, occupation, education... It looks promising! Let's scrape the wiki!
The problem is that Marvel only lets us retrieve up to 100 results per request.
The first thing we should do is collect the information from the web and store it.
But someone else has already had that idea, and we are not going to reinvent the wheel. @asamiller has developed a node.js app that crawls the Marvel API and stores the data using Orchestrate. The code is available on GitHub.
# TODO
It would actually be cooler to scrape the wiki instead of relying on a static copy, which may also be somewhat out of date, but that belongs in the pyMarvel library. If you feel like helping, find us after the talk and let's chat.
import json
from os.path import join
from os import listdir
import socket
MARVELOUSDB_PATH_A = "../marvelousdb-master/data/characters/"
MARVELOUSDB_PATH_M = "../marvelousdb/data/characters/"
MARVELOUSDB_PATH = MARVELOUSDB_PATH_M if 'alan' in socket.gethostname() else MARVELOUSDB_PATH_A
json_db = [join(MARVELOUSDB_PATH, json_file) for json_file in listdir(MARVELOUSDB_PATH)]
print("MarvelousDB holds a backup of {} characters".format(len(json_db)))
MarvelousDB holds a backup of 1402 characters
Pandas is an open source, BSD-licensed library for efficient data analysis in Python.
pandas is good at:
import pandas as pd
A DataFrame is a 2-dimensional structure with data labeled in columns. The data in a DataFrame can be of different types. Think of a DataFrame as a spreadsheet or an SQL table.
A DataFrame can be created from:
When creating a DataFrame you can also specify the index (labels for the rows) and the columns. If these labels are not passed as arguments, pandas builds the DataFrame using sensible defaults.
In our case, we will read all the JSON files and build a DataFrame. Because the JSON files hold hierarchical information we need to normalize the data, but pandas has functions that do this for us.
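To make the normalization step concrete, here is a minimal plain-Python sketch of the idea: nested dictionaries are flattened into a single level whose keys are joined with dots, which is where column names such as comics.available or wiki.hair come from. This is an illustration only, not how pandas implements it, and the character record is made up.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            # Lists and scalar values are kept as they are.
            flat[full_key] = value
    return flat


# Made-up record that mimics the shape of one character file.
character = {
    "name": "Example Hero",
    "comics": {"available": 3, "items": [{"name": "Issue #1"}]},
    "wiki": {"hair": "Black", "eyes": "Blue"},
}

print(flatten(character))
```

Each flattened record then becomes one row, and the union of all dotted keys becomes the set of columns.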
json_to_dataframe = []
for json_file in json_db:
    with open(json_file, 'r') as jf:
        json_character = json.load(jf)
    json_plain = pd.io.json.json_normalize(json_character)
    json_to_dataframe.append(json_plain)
marvel_df = pd.concat(json_to_dataframe)
We could do all of this in a single mega-statement. We lose readability but gain coolness. Totally discouraged!
df = pd.concat([pd.io.json.json_normalize(json.loads(''.join(open(json_file,'r').readlines()))) for json_file in json_db])
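We can perform logical operations over all the elements of a DataFrame; these are vectorized operations, which speeds up the computation.
# Note: the built-in all() iterates over the column labels of the comparison
# result, so it returns True whenever the comparison itself succeeds;
# df.equals(marvel_df) is the stricter, value-by-value check.
all(df == marvel_df)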
True
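A word of caution about the check above: iterating over a DataFrame yields its column labels, so the built-in all() does not look at the cell values. The toy frames below (made up for illustration) differ in one cell, yet all() still reports True; reducing with .all() or using equals() gives the expected answer.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"a": [1, 2], "b": [3, 5]})  # differs in one cell

comparison = left == right  # element-wise, a DataFrame of booleans

# Iterating over a DataFrame yields its column labels ("a", "b"), so the
# built-in all() only checks that those labels are truthy.
builtin_result = all(comparison)

# Reduce over the values instead: first per column, then across columns.
elementwise_result = bool(comparison.all().all())

# equals() also treats NaN values in matching positions as equal.
equals_result = left.equals(right)

print(builtin_result, elementwise_result, equals_result)
```

For frames with many empty fields, like ours, equals() is usually the more convenient choice because NaN == NaN is False in an element-wise comparison.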
And what does a DataFrame look like?
marvel_df.head()
 | comics.available | comics.collectionURI | comics.items | comics.returned | description | events.available | events.collectionURI | events.items | events.returned | id | ... | wiki.specieshistory | wiki.team_name | wiki.teamicon | wiki.technology | wiki.tie-ins | wiki.title_graphic | wiki.universe | wiki.weapons | wiki.weaponss | wiki.weight |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 36 | http://gateway.marvel.com/v1/public/characters... | [{'name': 'Marvel Adventures Super Heroes (201... | 36 | AIM is a terrorist organization bent on destro... | 0 | http://gateway.marvel.com/v1/public/characters... | [] | 0 | 1009144 | ... | NaN | NaN | NaN | NaN | NaN | NaN | [[Marvel Universe]] | NaN | NaN | NaN |
0 | 43 | http://gateway.marvel.com/v1/public/characters... | [{'name': 'Incredible Hulks (2009) #619', 'res... | 43 | Formerly known as Emil Blonsky, a spy of Sovie... | 2 | http://gateway.marvel.com/v1/public/characters... | [{'name': 'Chaos War', 'resourceURI': 'http://... | 2 | 1009146 | ... | NaN | NaN | NaN | NaN | NaN | NaN | Marvel Universe | None | NaN | (Abomination) 980 lbs.; (Blonsky) 180 lbs. |
0 | 43 | http://gateway.marvel.com/v1/public/characters... | [{'name': 'Avengers Academy (2010) #21', 'reso... | 43 |  | 4 | http://gateway.marvel.com/v1/public/characters... | [{'name': 'Fear Itself', 'resourceURI': 'http:... | 4 | 1009148 | ... | NaN | NaN | NaN | NaN | NaN | NaN | [[Marvel Universe]] | He uses a prison ball-and-chain as a weapon, a... | NaN | 365 lbs. (variable) |
0 | 8 | http://gateway.marvel.com/v1/public/characters... | [{'name': 'Uncanny X-Men (1963) #402', 'resour... | 8 |  | 1 | http://gateway.marvel.com/v1/public/characters... | [{'name': 'Age of Apocalypse', 'resourceURI': ... | 1 | 1009149 | ... | NaN | NaN | NaN | NaN | NaN | NaN | [[Marvel Universe]] | Unrevealed | NaN | Unrevealed |
0 | 20 | http://gateway.marvel.com/v1/public/characters... | [{'name': 'Weapon X: Days of Future Now (Trade... | 20 |  | 0 | http://gateway.marvel.com/v1/public/characters... | [] | 0 | 1009150 | ... | NaN | NaN | NaN | NaN | NaN | NaN | [[Marvel Universe]] | Agent Zero carries a wide array of weapons inc... | NaN | 230 lbs. |
5 rows × 89 columns
pandas DataFrames are built on top of numpy, so finding out the dimensions of a DataFrame works exactly as in numpy.
marvel_df.shape
(1402, 89)
We have 89 columns, that is, 89 fields to explore about Marvel characters. Great!
', '.join(marvel_df.columns.values)
'comics.available, comics.collectionURI, comics.items, comics.returned, description, events.available, events.collectionURI, events.items, events.returned, id, modified, name, resourceURI, series.available, series.collectionURI, series.items, series.returned, stories.available, stories.collectionURI, stories.items, stories.returned, thumbnail.extension, thumbnail.path, urls, wiki.Date_of_birth, wiki.Place_of_birth, wiki.abilities, wiki.aliases, wiki.appearance, wiki.base_of_operations, wiki.bio, wiki.bio_text, wiki.blurb, wiki.builder, wiki.categories, wiki.categorytext, wiki.citizenship, wiki.creator, wiki.creators, wiki.current_members, wiki.debut, wiki.distinguishing_features, wiki.dstinguishing_features, wiki.education, wiki.event_text, wiki.eyes, wiki.features, wiki.former_members, wiki.govenment, wiki.government, wiki.groups, wiki.hair, wiki.height, wiki.home_world, wiki.identity, wiki.key_characters, wiki.key_issues, wiki.leader, wiki.location, wiki.main_image, wiki.members, wiki.object_text, wiki.occupation, wiki.origin, wiki.other_members, wiki.owner, wiki.paraphernalia, wiki.place_of_birth, wiki.place_of_creation, wiki.place_text, wiki.points_of_interest, wiki.power, wiki.powers, wiki.real_name, wiki.relatives, wiki.significant_citizens, wiki.significant_issues, wiki.skin, wiki.special_limitations, wiki.specieshistory, wiki.team_name, wiki.teamicon, wiki.technology, wiki.tie-ins, wiki.title_graphic, wiki.universe, wiki.weapons, wiki.weaponss, wiki.weight'
Actually, we should not celebrate too soon because (spoiler) many of the fields are empty: no row has a value in every column.
marvel_df.dropna()
comics.available | comics.collectionURI | comics.items | comics.returned | description | events.available | events.collectionURI | events.items | events.returned | id | ... | wiki.specieshistory | wiki.team_name | wiki.teamicon | wiki.technology | wiki.tie-ins | wiki.title_graphic | wiki.universe | wiki.weapons | wiki.weaponss | wiki.weight |
---|
0 rows × 89 columns
A Series is a labeled 1-dimensional array, like a table with a single column. It can store any type of data:
Series are labeled by their index. If, for example, the index we pass consists of dates, a TimeSeries instance is created.
Selecting a single column of a DataFrame yields a Series.
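As a quick illustration of the last point, the toy DataFrame below (made-up data, hypothetical column names) shows that selecting one column returns a Series that keeps the row labels of the DataFrame as its index.

```python
import pandas as pd

table = pd.DataFrame(
    {"name": ["Hero A", "Hero B", "Hero C"], "hair": ["Black", "Red", "Black"]},
    index=["x", "y", "z"],
)

hair = table["hair"]  # selecting a single column yields a Series

print(type(hair).__name__)
print(hair["y"])                      # access by index label
print(hair.value_counts()["Black"])   # how often a value appears
```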
Let's use the comic creators to play a bit with Series.
# Extract the list of creators present in our data
creators_serie = marvel_df['wiki.creators'].dropna()
creators_serie.describe()
count                               119
unique                               37
top       this has not been updated yet
freq                                 44
dtype: object
# Rename the series and the index
creators_serie.name = 'Creadores de personajes'
creators_serie.index.name = 'creators'
# We can use head or, since this is a Series, take a slice as we would with a list
# creators_serie.head()
creators_serie[:20]
creators
0    this has not been updated yet
0
0
0
0
0
0
0    Peter David & Sam Keith
0    Bill Mantlo and Ed Hanigan
0
0    Stan Lee and Steve Ditko
0    Grant Morrison & Igor Kordey
0    Chris Claremont
0
0    Chris Claremont & Dave Cockrum
0    this has not been updated yet
0    this has not been updated yet
0
0    Stan Lee, Jack Kirby
0    Grant Morrison
Name: Creadores de personajes, dtype: object
Let's drop every entry in which the creator is missing, either because it holds the placeholder string or because the field is empty.
default_string = creators_serie != "this has not been updated yet"
default_string.head()
creators
0    False
0     True
0     True
0     True
0     True
Name: Creadores de personajes, dtype: bool
empty_string = creators_serie != ""
empty_string[:10]
creators
0     True
0    False
0    False
0    False
0    False
0    False
0    False
0     True
0     True
0    False
Name: Creadores de personajes, dtype: bool
Now we simply combine these two masks.
default_string and empty_string
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-47-544bf713079b> in <module>()
----> 1 default_string and empty_string

/Users/ada/Dev/.virtualenvs/marvel/lib/python3.3/site-packages/pandas/core/generic.py in __nonzero__(self)
    690         raise ValueError("The truth value of a {0} is ambiguous. "
    691                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 692                          .format(self.__class__.__name__))
    693
    694     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Although and is a Python keyword and we might expect it to work for combining Series, it does not: and needs a single truth value for each operand, so the operation is not applied element by element.
However, pandas anticipates this need and provides element-wise operators: & (and), | (or), ~ (not).
creators_mask = default_string & empty_string
creators_mask[:10]
creators
0    False
0    False
0    False
0    False
0    False
0    False
0    False
0     True
0     True
0    False
Name: Creadores de personajes, dtype: bool
creators_serie[creators_mask].head()
creators
0    Peter David & Sam Keith
0    Bill Mantlo and Ed Hanigan
0    Stan Lee and Steve Ditko
0    Grant Morrison & Igor Kordey
0    Chris Claremont
Name: Creadores de personajes, dtype: object
This already gives us much of the information we want, but let's split up authors who work together so we can count how many characters each one has created.
import re
creators = [re.split('&|and|,', line) for line in creators_serie[creators_mask]]
clean_creators = pd.Series([c.strip() for creator in creators for c in creator])
clean_creators.head()
0    Peter David
1    Sam Keith
2    Bill Mantlo
3    Ed Hanigan
4    Stan Lee
dtype: object
clean_creators.value_counts()
Chris Claremont         17
Stan Lee                 9
John Byrne               7
Jack Kirby               4
Brian K. Vaughan         4
Adrian Alphona           4
Steve Ditko              3
Grant Morrison           2
Christina Weir           2
John Buscema             2
Scott Lobdell            2
Nunzio DeFilippis        2
Paul Smith               1
Jim Lee                  1
Keron Grant              1
Marc Silvestri           1
John Romita Jr.          1
Brian Michael Bendis     1
Joe Bennett              1
Sam Keith                1
John Romita Sr.          1
Roger Cruz               1
Mark Millar              1
Andy Kubert (artist)     1
Javier Saltares          1
John Cassaday            1
Frank Miller             1
Bill Everett             1
Ed Hanigan               1
Len Wein                 1
Peter David              1
Bill Mantlo              1
Marv Wolfman             1
Christopher Priest       1
Alan Moore               1
Keront Grant             1
Chris Bachalo            1
Howard Mackie            1
Mark Millar (writer)     1
Salvador Larroca         1
Art Adams                1
Alan Davis               1
Joss Whedon              1
Dave Cockrum             1
Mark Bagley              1
Igor Kordey              1
                         1
dtype: int64
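The counts above contain an empty name and entries such as "Mark Millar (writer)" that are counted separately from "Mark Millar". In addition, the pattern 'and' used in the split also matches inside names that happen to contain those letters. The sketch below is one possible, more defensive variant and not the code used in the talk: it only splits on the standalone word "and", drops trailing role annotations and skips empty fragments. The sample credits are taken from the outputs above, except for one made-up name that is marked as such.

```python
import re
from collections import Counter

# Split on "&", on commas, or on the standalone word "and".
SEPARATORS = re.compile(r"\s*(?:&|,|\band\b)\s*")
# Trailing role annotations such as "(writer)" or "(artist)".
ROLE_SUFFIX = re.compile(r"\s*\([^)]*\)\s*$")


def split_creators(credit):
    """Return the list of individual creator names found in one credit line."""
    names = []
    for part in SEPARATORS.split(credit):
        name = ROLE_SUFFIX.sub("", part).strip()
        if name:  # skip empty fragments
            names.append(name)
    return names


credits = [
    "Peter David & Sam Keith",
    "Stan Lee and Steve Ditko",
    "Stan Lee, Jack Kirby",
    "Mark Millar (writer)",
    "Alexander Example",  # made-up name that contains the letters "and"
]

counts = Counter(name for credit in credits for name in split_creators(credit))
print(counts.most_common(3))
```

Misspellings such as "Keront Grant" versus "Keron Grant" cannot be fixed this way and would still need manual cleaning.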
We expected Stan Lee to win, and we also have the impression that he has created more than 9 characters, no offense to Chris Claremont.
According to our sources (Wikipedia & ComicVine):
from IPython.display import Image
Image(filename='stanvschris.png')
This is clearly a problem of missing data. That is why we must be very careful about how much we trust our results. A corpus with errors will lead us to wrong conclusions, and we need to be aware of that.
The Marvel API does not distinguish between characters and teams. That is, The Avengers has the same character status as Rachel Grey, but there is a wiki field that lets us tell groups apart from characters: current_members. We will therefore try to filter the data and keep only the characters.
Normally we would want to drop the rows that contain nulls, and pandas implements a function for that: dropna. But here we want to keep the rows whose current_members column is null, because we have checked that an entry without members is a character.
marvel_df.dropna(subset=['wiki.current_members'])['name']
0    A.I.M.
0    Avengers
0    Brotherhood of Evil Mutants
0    Exiles
0    Fantastic Four
0    Force Works
0    Hellfire Club
0    Hydra
0    Imperial Guard
0    Marauders
0    Reavers
0    S.H.I.E.L.D.
0    Serpent Society
0    X-Force
0    X-Men
...
0    Sinister Six
0    ClanDestine
0    New X-Men
0    Masters of Evil
0    Generation X
0    Guardians of the Galaxy
0    U-Foes
0    Sentinels
0    New Mutants
0    Lightning Lords of Nepal
0    Nine-Fold Daughters of Xao
0    Confederates of the Curious
0    X-Babies
0    Lethal Legion
0    Brotherhood of Mutants (Ultimate)
Name: name, Length: 70, dtype: object
%timeit (~marvel_df['wiki.current_members'].isnull())
import numpy as np
%timeit (np.invert(marvel_df['wiki.current_members'].isnull()))
1000 loops, best of 3: 206 µs per loop
1000 loops, best of 3: 226 µs per loop
not_groups_mask = marvel_df['wiki.current_members'].isnull()
not_groups_mask.head()
0    False
0     True
0     True
0     True
0     True
Name: wiki.current_members, dtype: bool
marvel_df_characters = marvel_df[not_groups_mask]
marvel_df_characters.head()
 | comics.available | comics.collectionURI | comics.items | comics.returned | description | events.available | events.collectionURI | events.items | events.returned | id | ... | wiki.specieshistory | wiki.team_name | wiki.teamicon | wiki.technology | wiki.tie-ins | wiki.title_graphic | wiki.universe | wiki.weapons | wiki.weaponss | wiki.weight |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 43 | http://gateway.marvel.com/v1/public/characters... | [{'name': 'Incredible Hulks (2009) #619', 'res... | 43 | Formerly known as Emil Blonsky, a spy of Sovie... | 2 | http://gateway.marvel.com/v1/public/characters... | [{'name': 'Chaos War', 'resourceURI': 'http://... | 2 | 1009146 | ... | NaN | NaN | NaN | NaN | NaN | NaN | Marvel Universe | None | NaN | (Abomination) 980 lbs.; (Blonsky) 180 lbs. |
0 | 43 | http://gateway.marvel.com/v1/public/characters... | [{'name': 'Avengers Academy (2010) #21', 'reso... | 43 |  | 4 | http://gateway.marvel.com/v1/public/characters... | [{'name': 'Fear Itself', 'resourceURI': 'http:... | 4 | 1009148 | ... | NaN | NaN | NaN | NaN | NaN | NaN | [[Marvel Universe]] | He uses a prison ball-and-chain as a weapon, a... | NaN | 365 lbs. (variable) |
0 | 8 | http://gateway.marvel.com/v1/public/characters... | [{'name': 'Uncanny X-Men (1963) #402', 'resour... | 8 |  | 1 | http://gateway.marvel.com/v1/public/characters... | [{'name': 'Age of Apocalypse', 'resourceURI': ... | 1 | 1009149 | ... | NaN | NaN | NaN | NaN | NaN | NaN | [[Marvel Universe]] | Unrevealed | NaN | Unrevealed |
0 | 20 | http://gateway.marvel.com/v1/public/characters... | [{'name': 'Weapon X: Days of Future Now (Trade... | 20 |  | 0 | http://gateway.marvel.com/v1/public/characters... | [] | 0 | 1009150 | ... | NaN | NaN | NaN | NaN | NaN | NaN | [[Marvel Universe]] | Agent Zero carries a wide array of weapons inc... | NaN | 230 lbs. |
0 | 11 | http://gateway.marvel.com/v1/public/characters... | [{'name': 'Uncanny X-Men (1963) #181', 'resour... | 11 | 1 | http://gateway.marvel.com/v1/public/characters... | [{'name': 'Secret Wars', 'resourceURI': 'http:... | 1 | 1009151 | ... | NaN | NaN | NaN | NaN | NaN | NaN | Marvel Universe | NaN | 100 lbs |
5 rows × 89 columns
The Watchers have slipped through. This is one of the problems of machine learning: the input data may contain errors, and the system we train must generalize well enough to overcome them.
marvel_df_characters.shape
(1332, 89)
We started with 1402 characters and, after removing the teams, we are left with 1332. That is, we have lost 4.9929 % of the data. Nothing serious so far.
A case study...
from IPython.display import Image
Image(filename='oracle.jpg')
For example, what do we find regarding racial representation?
marvel_df_characters['wiki.skin'].dropna()
0    White (as GAmbit), Black (as Death)
Name: wiki.skin, dtype: object
Another example: it would be very interesting to know who leads the superhero groups, but...
marvel_groups = marvel_df.dropna(subset=['wiki.current_members'])
marvel_groups['wiki.leader'].dropna()
0    Steve Rogers
0
Name: wiki.leader, dtype: object
Let's drop all the characters for which we have no information. Without data there is nothing to analyze. As scientists we would love to have lots of data available, because that would let us run many experiments and draw conclusions that are probably valid. Unfortunately, most of the time we will not be able to do machine learning on big data.
We want to keep the following information:
# Group the fields so it is clear what we want to work with
# Nobody has 'ocupation', so we drop it
physical_data = {'wiki.hair':'hair', 'wiki.weight':'weight', 'wiki.height':'height', 'wiki.eyes':'eyes'}
cultural_data = {'wiki.education':'education', 'wiki.citizenship':'citizenship',
                 'wiki.place_of_birth':'place_of_birth', 'wiki.occupation':'occupation'}
personal_data = {'wiki.bio':'bio', 'wiki.bio_text':'bio', 'wiki.categories':'categories'}
marvelesque_data = {'wiki.abilities':'abilities', 'wiki.weapons':'weapons', 'wiki.powers': 'powers'}
data_keys = (list(physical_data.keys()) + list(cultural_data.keys()) +
             list(personal_data.keys()) + ['name','comics.available'])
#+ marvelesque_data
print(data_keys)
['wiki.height', 'wiki.hair', 'wiki.eyes', 'wiki.weight', 'wiki.place_of_birth', 'wiki.citizenship', 'wiki.occupation', 'wiki.education', 'wiki.bio_text', 'wiki.bio', 'wiki.categories', 'name', 'comics.available']
clean_df = marvel_df_characters.dropna(subset = data_keys)
clean_df = clean_df[data_keys].set_index('name')
clean_df.shape
(762, 12)
clean_df[list(physical_data.keys())].head()
name | wiki.height | wiki.hair | wiki.eyes | wiki.weight |
---|---|---|---|---|
Abomination (Emil Blonsky) | (Abomination) 6'8"; (Blonsky) 5'10" | (Abomination) None; (Blonsky) Blond | (Abomination) Green; (Blonsky) Blue | (Abomination) 980 lbs.; (Blonsky) 180 lbs. |
Absorbing Man | 6'4" (variable) | Bald | Blue | 365 lbs. (variable) |
Abyss | Unrevealed | Unrevealed | Unrevealed | Unrevealed |
Agent Zero | 6'3" | (Originally) Brown; (currently) Black | Blue | 230 lbs. |
Annihilus | 5'11" | None | Green | 200 lbs. |
clean_df[list(physical_data.keys())].describe()
 | wiki.height | wiki.hair | wiki.eyes | wiki.weight |
---|---|---|---|---|
count | 762 | 762 | 762 | 762 |
unique | 213 | 223 | 165 | 307 |
top | Unrevealed | Black | Blue | Unrevealed |
freq | 44 | 165 | 236 | 48 |
clean_df[list(cultural_data.keys())].head()
name | wiki.place_of_birth | wiki.citizenship | wiki.occupation | wiki.education |
---|---|---|---|---|
Abomination (Emil Blonsky) | Zagreb, Yugoslavia | Citizen of Croatia; former citizen of Yugoslavia | Professional Criminal, Former Spy | Unrevealed |
Absorbing Man | New York City, New York | U.S.A. with a criminal record | Professional criminal; former boxer | High school dropout |
Abyss | Unrevealed | Unrevealed | Cosmic sorcerer | Unrevealed |
Agent Zero | Unrevealed location in former East Germany | German | Mercenary, former government operative, freedo... | Unrevealed |
Annihilus | Planet of [[Arthros]], Sector 17A, [[Negative ... | Arthros | Conqueror, scavenger | Unrevealed |
What would you say the typical Marvel character looks like? (pandas knows)
clean_df[list(cultural_data.keys())].describe()
 | wiki.place_of_birth | wiki.citizenship | wiki.occupation | wiki.education |
---|---|---|---|---|
count | 762 | 762 | 762 | 762 |
unique | 412 | 262 | 636 | 357 |
top | Unrevealed | U.S.A. | Adventurer | Unrevealed |
freq | 156 | 230 | 31 | 236 |
So the archetypal Marvel character has black hair and blue eyes, is from the U.S.A. and works as an adventurer.
Data is expensive! We had 1402 entries, but only 762 characters actually have enough data to work with. We have lost 45.6491 % of the data.
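The two loss percentages quoted so far can be reproduced from the row counts reported by shape. A minimal sketch, where percent_lost is a helper name introduced here for illustration:

```python
def percent_lost(before, after):
    """Percentage of rows dropped when going from `before` to `after` rows."""
    return (before - after) / before * 100


# Removing the teams: 1402 -> 1332 entries.
print("{:.4f} %".format(percent_lost(1402, 1332)))
# Keeping only characters with all the selected fields: 1402 -> 762 entries.
print("{:.4f} %".format(percent_lost(1402, 762)))
```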
Let's explore the DataFrame we are left with:
clean_df.dtypes
wiki.height            object
wiki.hair              object
wiki.eyes              object
wiki.weight            object
wiki.place_of_birth    object
wiki.citizenship       object
wiki.occupation        object
wiki.education         object
wiki.bio_text          object
wiki.bio               object
wiki.categories        object
comics.available        int64
dtype: object
clean_df.describe()
 | comics.available |
---|---|
count | 762.000000 |
mean | 53.292651 |
std | 179.820372 |
min | 0.000000 |
25% | 2.000000 |
50% | 10.000000 |
75% | 33.750000 |
max | 2575.000000 |
clean_df[clean_df['comics.available'] == 2575.000000]
name | wiki.height | wiki.hair | wiki.eyes | wiki.weight | wiki.place_of_birth | wiki.citizenship | wiki.occupation | wiki.education | wiki.bio_text | wiki.bio | wiki.categories | comics.available |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Spider-Man | 5'10" | Brown | Hazel | 167 lbs. | Forest Hills, New York | U.S.A. | Scientist and inventor; former freelance photo... | College graduate (biophysics major), doctorate... | The bite of an irradiated spider granted high-... | The bite of an irradiated spider granted high-... | [Avengers, Civil War, Heroes, Marvel Knights, ... | 2575 |
Spider-Man is the king of comics!
Before playing (further) with the data, note that there is one column we can get a lot out of: "wiki.categories"
clean_df['wiki.categories']
name
Abomination (Emil Blonsky)      [Avengers, Deceased, Hulk, International, Vill...
Absorbing Man                   [Avengers, Civil War, Villains]
Abyss                           [Cosmic, Magic, Villains]
Agent Zero                      [Heroes, X-Men, Villains, International, Mutants]
Annihilus                       [Annihilation, Cosmic, Fantastic Four, Villains]
Apocalypse                      [Mutants, Villains, International, X-Men]
Spider-Girl (Anya Corazon)      [Women, Heroes, Spider-Man, Civil War, Initiat...
Arcade                          [Spider-Man, Villains, X-Men]
Archangel                       [X-Men, Heroes, Reformed Villains, Mutants]
Arclight                        [X-Men, Women, Villains, Mutants, People who u...
Aurora                          [Heroes, Women, X-Men, International, Canadian...
Avalanche                       [X-Men, International, Villains, Mutants]
Banshee                         [X-Men, People who used to be dead but aren't ...
Baron Strucker                  [Villains, International, Thunderbolts, People...
Baron Zemo (Heinrich Zemo)      [Villains, Avengers, Deceased, International]
...
Contessa (Vera Vidal)           [Heroes, Women]
Chores MacGillicudy             [Deceased, Heroes]
Iron Fist (Wu Ao-Shi)           [Heroes, Women, Deceased]
Loa                             [X-Men, Women, Heroes, Mutants]
Grey Gargoyle                   [Avengers, International, Villains]
Nekra                           [Avengers, Mutants, Villains, Women]
Miss America                    [Women, Deceased]
Whizzer (Stanley Stewart)       [Heroes, Avengers, Squadron Supreme]
Scarlet Spider (Kaine)          [Villains, Spider-Man, people who used to be d...
Hope Summers                    [Mutants, Women, X-Men]
Enchantress (Sylvie Lushton)    [Avengers, Magic, Women]
Hank Pym                        [Heroes, Avengers, Civil War, Initiative]
Azazel (Mutant)                 [Magic, Mutants, Villains, X-Men]
Spider-Man (House of M)         [House of M]
Gargoyle (Yuri Topolov)         [Hulk, Villains]
Name: wiki.categories, Length: 762, dtype: object
A priori we have no information about which characters are men, women or aliens. But Marvel must have guessed that we might be interested in the role of women in comics and included a category, "Women", which makes our life much easier. Let's create two new columns in the DataFrame:
women = clean_df['wiki.categories'].map(lambda x: 'Women' in x)
clean_df['Women'] = women
women[:5]
name
Abomination (Emil Blonsky)    False
Absorbing Man                 False
Abyss                         False
Agent Zero                    False
Annihilus                     False
Name: wiki.categories, dtype: bool
# ~ is an element-wise negation
print("Women: #{}, men #{}".format(clean_df[women].shape[0],clean_df[~women].shape[0]))
Women: #199, men #563
That is, we have 199 female characters and 563 that are not tagged as women, which we count as male. In other words, only 26% of the characters are female.
villain = clean_df['wiki.categories'].map(lambda x: 'Villains' in x)
clean_df['Villain'] = villain
men = ~women
gender_data = {'Women':{'Heroes':0,'Villains':0},'Men':{'Heroes':0,'Villains':0}}
# Women and villains
gender_data['Women']['Villains'] = clean_df[villain & women].shape[0]
# Women and heroes
gender_data['Women']['Heroes'] = clean_df[~villain & women].shape[0]
# Men and villains
gender_data['Men']['Villains'] = clean_df[villain & men].shape[0]
# Men and heroes
gender_data['Men']['Heroes'] = clean_df[~villain & men].shape[0]
gender_data
{'Women': {'Villains': 30, 'Heroes': 169}, 'Men': {'Villains': 201, 'Heroes': 362}}
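The same kind of two-way count can be obtained in one call with pd.crosstab, shown here as an alternative sketch rather than the approach used in the talk. The toy DataFrame is made up; only the column names Women and Villain mirror the ones created above.

```python
import pandas as pd

toy = pd.DataFrame({
    "Women":   [True, True, False, False, False],
    "Villain": [False, True, True, False, True],
})

# Map the boolean flags to readable labels before counting.
gender = toy["Women"].map({True: "Women", False: "Men"})
role = toy["Villain"].map({True: "Villains", False: "Heroes"})

# Rows are genders, columns are roles, cells are character counts.
table = pd.crosstab(gender, role)
print(table)
```

As in the manual version, "Heroes" here simply means "not tagged as a villain".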
%matplotlib inline
import matplotlib.pyplot as plt
n_groups = 2
men_data = (gender_data['Men']['Villains'], gender_data['Men']['Heroes'])
women_data = (gender_data['Women']['Villains'], gender_data['Women']['Heroes'])
fig, ax = plt.subplots()
index = np.arange(n_groups)
bar_width = 0.4
opacity = 0.5
rects1 = plt.bar(index, men_data, bar_width,
                 alpha=opacity,
                 color='b',
                 label='Men')
rects2 = plt.bar(index + bar_width, women_data, bar_width,
                 alpha=opacity,
                 color='r',
                 label='Women')
plt.xlabel('Role')
plt.ylabel('Number of characters')
plt.title('Distribution by gender and role')
plt.xticks(index + bar_width, ('Villains', 'Heroes'))
plt.legend(loc=0, borderaxespad=1.)
plt.show()
Besides comparing the number of villains, another case is settling a suspicion of ours: the number of redheads in comics!
Red hair occurs in 1-2% of the population globally and in 2-6% of populations with northern or western European ancestry. Ireland and Scotland stand out with 10% and 13%, respectively.
Let's see what happens in Marvel.
red_heads = clean_df['wiki.hair'].map(lambda x: 'Red' in x)
clean_df['red_heads'] = red_heads
red_heads[:5]
name
Abomination (Emil Blonsky)    False
Absorbing Man                 False
Abyss                         False
Agent Zero                    False
Annihilus                     False
Name: wiki.hair, dtype: bool
print("Red heads: #{}, Non-red heads #{}".format(clean_df[red_heads].shape[0],clean_df[~red_heads].shape[0]))
Red heads: #76, Non-red heads #686
non_red = ~red_heads
hair_data = {'Women':{'Red heads':0,'Non-red heads':0},'Men':{'Red heads':0,'Non-red heads':0}}
# Red haired women
hair_data['Women']['Red heads'] = clean_df[red_heads & women].shape[0]
# Non-red haired women
hair_data['Women']['Non-red heads'] = clean_df[~red_heads & women].shape[0]
# Red haired men
hair_data['Men']['Red heads'] = clean_df[red_heads & men].shape[0]
# Non-red haired men
hair_data['Men']['Non-red heads'] = clean_df[~red_heads & men].shape[0]
hair_data
{'Women': {'Non-red heads': 169, 'Red heads': 30}, 'Men': {'Non-red heads': 517, 'Red heads': 46}}
# What is this?
redwomen = 30 / 199.
redmen = 46 / 563.
print('Women: {0:5.2f}%, Men: {1:5.2f}%'.format(redwomen * 100, redmen * 100))
Women: 15.08%, Men: 8.17%
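The percentages above are typed in by hand from the counts. As a small sketch, they can also be derived from the hair_data dictionary itself, which avoids copying numbers around; red_share is a helper name introduced here for illustration, and the dictionary repeats the values printed earlier.

```python
hair_data = {
    "Women": {"Non-red heads": 169, "Red heads": 30},
    "Men": {"Non-red heads": 517, "Red heads": 46},
}


def red_share(counts):
    """Percentage of red-haired characters within one group."""
    total = counts["Red heads"] + counts["Non-red heads"]
    return counts["Red heads"] / total * 100


shares = {group: red_share(counts) for group, counts in hair_data.items()}
print("Women: {Women:5.2f}%, Men: {Men:5.2f}%".format(**shares))
```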
Machine learning is "simply" a set of algorithms that let a machine learn from data. The basic ingredients of our recipe are therefore data + algorithms.
To mix this cocktail we need skills in computer engineering and statistics, plus knowledge of the problem domain.
And with machine learning we can:
import sys
import matplotlib
%matplotlib inline
import sklearn
print("Python version: ", sys.version)
print("Pandas version: ", pd.__version__)
print("Numpy version: ", np.__version__)
print("Matplotlib version: ", matplotlib.__version__)
print("scikit-learn version: ", sklearn.__version__)
Python version: 3.3.4 (default, Jul 25 2014, 00:04:27)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
Pandas version: 0.14.1
Numpy version: 1.8.2
Matplotlib version: 1.4.0
scikit-learn version: 0.15.2
A simple algorithm for classifying samples.
It needs labeled samples.
It computes the distance from the sample to be labeled to its neighbors. Each neighbor (up to N or K) votes.
Adding new data is not free.
It is computationally expensive.
The majority is not always right.
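The points above can be sketched in a few lines of plain Python. This is an illustrative implementation of nearest-neighbor voting with Euclidean distance, not the code used in the talk; the samples, labels and the function name knn_predict are made up. scikit-learn, imported earlier, offers a ready-made KNeighborsClassifier.

```python
import math
from collections import Counter


def knn_predict(samples, labels, query, k=3):
    """Label `query` by majority vote among its k nearest labeled samples."""
    # Distance from the query to every labeled sample (Euclidean).
    distances = [
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(sample, query))), label)
        for sample, label in zip(samples, labels)
    ]
    # Keep the k closest samples and let each one cast a vote.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up (weight in lbs, height in inches) samples, for illustration only.
samples = [(120, 65), (130, 67), (125, 66), (220, 74), (240, 76), (230, 75)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(samples, labels, (128, 66)))
print(knn_predict(samples, labels, (235, 75)))
```

Every prediction measures the distance to all stored samples, which is why the method becomes expensive as the data grows.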
Hypothesis: physical characteristics distinguish female characters from male ones
clean_df['wiki.weight'].describe()
count            762
unique           307
top       Unrevealed
freq              48
dtype: object
physical = clean_df[clean_df['wiki.weight'] != "Unrevealed"]
any(physical['wiki.height'] == "Unrevealed")
False
Great! At least every character with an unrevealed height also has an unrevealed weight
physical_knn = physical[['wiki.weight', 'wiki.height', 'Women', 'Villain']]
physical_knn.dtypes
wiki.weight    object
wiki.height    object
Women            bool
Villain          bool
dtype: object
We want the weight and height columns to be numeric rather than strings.
physical_knn
name | wiki.weight | wiki.height | Women | Villain |
---|---|---|---|---|
Abomination (Emil Blonsky) | (Abomination) 980 lbs.; (Blonsky) 180 lbs. | (Abomination) 6'8"; (Blonsky) 5'10" | False | True |
Absorbing Man | 365 lbs. (variable) | 6'4" (variable) | False | True |
Agent Zero | 230 lbs. | 6'3" | False | True |
Annihilus | 200 lbs. | 5'11" | False | True |
Apocalypse | 300 lbs. (variable) | Variable (usually around 7') | False | True |
Spider-Girl (Anya Corazon) | 115 lbs. | 5'3" | True | False |
Arcade | 140 lbs. | 5'6" | False | True |
Archangel | 150 lbs. | 6' | False | False |
Arclight | 126 lbs. | 5'8" | True | True |
Aurora | 140 lbs. | 5'11" | True | False |
Avalanche | 195 lbs. | 5'7" | False | True |
Banshee | 170 lbs. | 6' | False | False |
Baron Strucker | 225 lbs. | 6'2" | False | True |
Baron Zemo (Heinrich Zemo) | 180 lbs | 5'9" | False | True |
Bastion | 375 lbs. | 6'3" | False | True |
Batroc the Leaper | 225 lbs. | 6’ | False | True |
Battering Ram | 380 lbs. | 7'4" | False | False |
Beak | 140 | 5'9" | False | False |
Beast | 402 lbs. | 5'11" | False | False |
Beef | 250 lbs. | 6'6" | False | True |
Beta-Ray Bill | (As Bill) 480 lbs.; (as Walters) 132 lbs. | (As Bill) 6'7"; (as Walters) 5'9" | False | False |
Big Wheel | 140 lbs. | 5'5" | False | False |
Bishop | 275 lbs. | 6'6" | False | False |
Black Bolt | 210 lbs | 6' 2" | False | False |
Black Cat | 120 lbs. | 5'10" | True | False |
Black Knight | 180 lbs. | 5’ 11” | False | False |
Black Panther | 200 lbs. | 6' | False | False |
Black Tom | (originally) 200 lbs.; (currently) Variable | (originally) 6'0"; (currently) Variable | False | True |
Black Widow | 131 lbs. | 5'7" | True | False |
Blackheart | 679 lbs (Variable) | 6'10" (Variable) | False | True |
... | ... | ... | ... | ... |
Starhawk (Stakar Ogord) | 450 lbs. | 6'4" | False | False |
Vance Astro | (as an adult, with protective gear) 250 lbs. | (as an adult) 6'1" | False | False |
Jamie Braddock | 151 lbs. | 6'1" | False | True |
Jazinda | 135 lbs. (variable) | 5'6" (variable) | True | False |
Tinkerer | 120 lbs. | 5'4" | False | True |
Cosmo | 70 lbs. | 23" (at withers) | False | False |
Red Hulk | 245 lbs. (Ross); 1200 lbs. (Red Hulk) | 6'1" (Ross); 7' (Red Hulk) | False | False |
American Eagle (Jason Strongbow) | 200 lbs | 6' | False | False |
Cottonmouth | 200 lbs. | 6' | False | True |
Vanisher (Telford Porter) | 175 lbs. | 5'5" | False | True |
Sphinx (Anath-Na Mut) | 450 lbs | 7'2" | False | True |
Molten Man | 550 lbs. | 6’5” | False | True |
Henry Peter Gyrich | 205 lbs. | 6’ 1” | False | False |
Cypher | 150 lbs. | 5'9" | False | False |
Karma | 119 lbs. | 5'4" | True | False |
She-Hulk (Lyra) | 220 lbs. | 6'6" | True | False |
She-Hulk (Ultimate) | 110 lbs. (as Betty); Unrevealed (as She-Hulk) | 5'6" (as Betty); Unrevealed (as She-Hulk) | False | False |
Talon (Fraternity of Raptors) | 180 lbs. | 6'1" | False | False |
Angel (Golden Age) | | | False | False |
Meggan | Usually 120 lbs., 130 lbs. in true form | Usually 5'7", 5'10" in true form | True | False |
Loa | 139 lbs. | 5'8" | True | False |
Grey Gargoyle | (normal) 175 lbs.; (stone) 750 lbs. | 5’11” | False | True |
Nekra | 145 lbs. | 5' 11" | True | True |
Miss America | 130 lbs | 5'8" | True | False |
Whizzer (Stanley Stewart) | 180 lbs. | 5'11" | False | False |
Scarlet Spider (Kaine) | 250 lbs. | 6'4" | False | True |
Hank Pym | varies, normally 185 lbs. | varies, normally 6' | False | False |
Azazel (Mutant) | 149 lbs. | 6' | False | True |
Spider-Man (House of M) | 165 lbs. | 5'10" | False | False |
Gargoyle (Yuri Topolov) | 215 lbs. | 4'6" | False | True |
714 rows × 4 columns
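The raw strings in the table mix units, annotations, and both straight and curly apostrophes. As a hedged alternative to token-by-token parsing, a regular expression can pull out the first "feet and inches" or "lbs" figure. This is only a sketch: the names `parse_height_cm` and `parse_weight_lbs` are hypothetical, and taking the first match is just one possible policy for characters with several forms.

```python
import re

FOOT_CM = 30.48
INCH_CM = 2.54

# Feet, a straight or curly apostrophe, then optional inches.
HEIGHT_RE = re.compile(r"(\d+)\s*['’]\s*(\d+)?")
# An integer followed by "lbs" (with or without a trailing dot).
WEIGHT_RE = re.compile(r"(\d+)\s*lbs")


def parse_height_cm(text):
    """Return the first feet/inches height in `text` as centimetres, or None."""
    match = HEIGHT_RE.search(text)
    if match is None:
        return None
    feet = int(match.group(1))
    inches = int(match.group(2) or 0)
    return feet * FOOT_CM + inches * INCH_CM


def parse_weight_lbs(text):
    """Return the first weight in pounds found in `text`, or None."""
    match = WEIGHT_RE.search(text)
    return int(match.group(1)) if match else None


for raw in ["6'3\"", "6’ 1”", "Variable", "23\" (at withers)"]:
    print(repr(raw), parse_height_cm(raw))
for raw in ["(Abomination) 980 lbs.; (Blonsky) 180 lbs.", "180 lbs", "140"]:
    print(repr(raw), parse_weight_lbs(raw))
```

Rows that yield `None` would then be dropped, in the same spirit as the `dropna()` call in the notebook.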
physical_knn.applymap(str)
physical_knn = physical_knn[physical_knn['wiki.weight'].str.contains("lbs.")]
physical_knn = physical_knn[physical_knn['wiki.height'].str.contains('’|\'')]
def get_weight(pandas_weight):
    """Return the first whitespace-separated token that parses as an int (None if there is none)."""
    for p in pandas_weight.split():
        try:
            return int(p)
        except ValueError:
            pass
physical_knn['wiki.weight'] = physical_knn['wiki.weight'].map(get_weight)
FOOT = 30.48
INCH = 2.54
def get_height(pandas_height):
    """Return the height in centimetres parsed from a feet/inches string (None if it cannot be parsed)."""
    height = None
    for p in pandas_height.split():
        colon_split = p.split('\'')
        strange_colon_split = p.split('’')
        if len(colon_split) == 2 :
            height = colon_split
        elif len(colon_split) == 4 :
            height = colon_split[:2]
            height[1] += "\'"
        elif len(strange_colon_split) == 2 :
            height = strange_colon_split
        elif len((pandas_height.split()[-1]).split('\'')) == 2:
            height = pandas_height.split()[-1].split('\'')
        elif len((pandas_height.split()[-1]).split('’')) == 2:
            height = pandas_height.split()[-1].split('’')
        else:
            universe_split = ((pandas_height.split(';')[0]).split()[-1]).split('\'')
            if len(universe_split) == 2:
                height = universe_split
            else:
                space_split = (pandas_height.split(';')[0].split()[-2:])
                if space_split[0][-1] == '\'' or space_split[0][-1] == '’':
                    height = [space_split[0][:-1], space_split[1]]
                else:
                    return None
        if height:
            try:
                foot_part = int(height[0])
                inch_part = int(height[1][:-1]) if height[1][:-1].strip() else 0
                return (foot_part*FOOT + inch_part*INCH)
            except ValueError:
                pass
physical_knn['wiki.height'] = physical_knn['wiki.height'].map(get_height)
physical_knn = physical_knn.dropna()
physical_knn.dtypes
wiki.weight    float64
wiki.height    float64
Women             bool
Villain           bool
dtype: object
physical_knn.shape
(578, 4)
Now we can get to work!
from math import floor
TRAIN_PERCENTAGE = 0.8
train_section = floor(physical_knn.shape[0]*TRAIN_PERCENTAGE)
test_section = physical_knn.shape[0]-train_section
print("We will use {} characters to train the classifier and"
      " {} to test the trained classifier.".format(train_section, test_section))
We will use 462 characters to train the classifier and 116 to test the trained classifier.
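# replace=False draws each character at most once, so the training set has no repeated rows
train_rows = np.random.choice(physical_knn.index.values, train_section, replace=False)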
test_rows = np.setdiff1d(physical_knn.index.values,train_rows)
physical_knn.loc[train_rows[0]]
wiki.weight      190
wiki.height    190.5
Women           True
Villain        False
Name: Cerise, dtype: object
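The split is meant to be disjoint: 462 distinct characters for training and the remaining 116 for testing. `np.random.choice` only guarantees distinct draws when it is called with `replace=False`; with the default, rows can repeat in the training set and the test set ends up larger than intended. A minimal standard-library sketch of a disjoint split, where the helper `split_index` and the placeholder names are hypothetical:

```python
import random


def split_index(labels, train_fraction=0.8, seed=0):
    """Shuffle `labels` and cut them into disjoint train and test lists."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(labels)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for physical_knn.index.values (578 characters).
names = ["character_{}".format(i) for i in range(578)]
train, test = split_index(names)
print(len(train), len(test), len(set(train) & set(test)))
```

Fixing the seed is a design choice of the sketch: it makes reruns of the notebook produce the same accuracy figures.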
We split the features from the labels
X_train = physical_knn.loc[train_rows][['wiki.weight','wiki.height']]
y_train = physical_knn.loc[train_rows]['Women']
X_test = physical_knn.loc[test_rows][['wiki.weight','wiki.height']]
y_test = physical_knn.loc[test_rows]['Women']
Let's take a look at the data to gauge how hard the task is:
for i, group in physical_knn.groupby(women):
    if not i:
        ax = group.plot(kind='scatter', x='wiki.height', y='wiki.weight',
                        color='DarkBlue', label='Men');
    else:
        print(i)
        group.plot(kind='scatter', x='wiki.height', y='wiki.weight',
                   color='DarkGreen', label='Women', ax=ax)
print(physical_knn.groupby(women).aggregate(np.mean))
True
                 wiki.weight  wiki.height  Women   Villain
wiki.categories
False             243.191943   183.439763      0  0.355450
True              149.628205   170.701026      1  0.134615
We create an instance of the classifier
from sklearn import neighbors
classifier = neighbors.KNeighborsClassifier()
classifier.fit(X_train, y_train)
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_neighbors=5, p=2, weights='uniform')
predict = classifier.predict(X_test)
predict
array([False, False, False, False, False, False, True, True, False, False, False, False, False, False, True, False, False, False, False, False, True, False, True, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, True, False, False, False, False, False, True, True, True, False, True, False, False, False, False, False, False, True, False, False, False, True, True, False, False, False, False, False, False, False, False, False, True, True, False, False, False, False, False, False, True, False, True, True, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, False, False, True, True, False, False, False, False, True, False, False, False, False, False, False, False, True, False, False, False, False, True, True, True, True, False, True, True, False, True, False, False, True, True, False, False, False, False, True, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, True, False, False, False, False, False, False, True, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, True, True, False, True, False, False, True, False, True, False, True, True, False, False, True, True, False, False, False, False, False, True, False, False, True, True, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, False, True, False, False, False, False, True, True, False, False, False, False, False, False, False, True, False, False, False, False], dtype=bool)
from sklearn import metrics
accuracy = metrics.accuracy_score(y_test, predict)
precision, recall, f1, _ = metrics.precision_recall_fscore_support(y_test, predict)
print("* Accuracy: {:.2f}%".format(accuracy*100))
print("* Precision: {}\n* Recall: {}\n* F1-score: {}".format(precision, recall, f1))
* Accuracy: 82.13%
* Precision: [ 0.89162562 0.58333333]
* Recall: [ 0.87864078 0.61403509]
* F1-score: [ 0.88508557 0.5982906 ]
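The per-class arrays returned by `precision_recall_fscore_support` hold one value per class, here `False` (men) first and `True` (women) second. As a worked check, the figures for the `True` class can be reproduced from confusion-matrix counts. The counts below are an assumption: they are inferred from the printed ratios (0.58333333 = 35/60, 0.61403509 = 35/57, 82.13% = 216/263), not read from the notebook.

```python
# Inferred counts for the positive class "Women == True" (assumption, see above).
tp, fp, fn, tn = 35, 25, 22, 181

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)   # how many predicted women really are women
recall = tp / (tp + fn)      # how many actual women were found
f1 = 2 * precision * recall / (precision + recall)

print("accuracy={:.4f} precision={:.4f} recall={:.4f} f1={:.4f}".format(
    accuracy, precision, recall, f1))
```

The gap between overall accuracy and the precision and recall for women reflects the class imbalance: most test characters are men, so predicting "man" is right most of the time.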
%%latex
\begin{align}
accuracy = \frac{\text{# True Positives}+\text{# True Negatives}}
{\text{# True Positives}+\text{# False Positives} + \text{# False Negatives} + \text{# True Negatives}}
\end{align}
%%latex
\begin{align}
precision = \frac{\text{# True Positives}} {\text{# True Positives}+\text{# False Positives}}
\end{align}
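Recall and the F1 score, which the metrics cell also computes, follow the same pattern. These are the standard textbook definitions, written in the notation of the two formulas above:

```latex
\begin{align}
recall = \frac{\text{# True Positives}} {\text{# True Positives}+\text{# False Negatives}}
\end{align}

\begin{align}
F_1 = 2 \cdot \frac{precision \cdot recall} {precision + recall}
\end{align}
```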
from matplotlib.colors import ListedColormap
cmap_light = ListedColormap(['#AAAAFF', '#AAFFAA'])
cmap_bold = ListedColormap(['#0000FF', '#00FF00'])
step = 2
x_min, x_max = X_test['wiki.height'].min() - 1, X_test['wiki.height'].max() + 1
y_min, y_max = X_test['wiki.weight'].min() - 1, X_test['wiki.weight'].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, step),
np.arange(y_min, y_max, step))
prediction = classifier.predict(X_test)
# The classifier was fitted on [weight, height], so the grid columns must follow that order
Z = classifier.predict(np.c_[yy.ravel(), xx.ravel()])
Z = Z.reshape(xx.shape)
plt.figure()
plt.pcolormesh(xx, yy, Z, cmap=cmap_light)
plt.scatter( X_test['wiki.height'], X_test['wiki.weight'], c=y_test, cmap=cmap_bold)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), 400)
(19.0, 400)
import time
for n in range(1, 20, 2):
    classifier = neighbors.KNeighborsClassifier(n_neighbors=n)
    classifier.fit(X_train, y_train)
    predict = classifier.predict(X_test)
    accuracy = metrics.accuracy_score(y_test, predict)
    print("({}) Accuracy: {:.2f}%".format(n, accuracy*100))
(1) Accuracy: 74.52%
(3) Accuracy: 79.09%
(5) Accuracy: 82.13%
(7) Accuracy: 84.03%
(9) Accuracy: 85.93%
(11) Accuracy: 86.69%
(13) Accuracy: 86.31%
(15) Accuracy: 85.93%
(17) Accuracy: 86.31%
(19) Accuracy: 85.93%
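The loop above scores every number of neighbours on the test set itself, so the best figure is a somewhat optimistic estimate. A common alternative is to choose the number of neighbours by cross-validation on the training data only. A minimal leave-one-out sketch in plain Python follows; the helpers `knn_predict` and `loo_accuracy` and the toy data are hypothetical, and scikit-learn is deliberately not used here.

```python
from collections import Counter


def knn_predict(train, labels, point, k):
    """Majority vote among the k training points closest to `point`."""
    by_distance = sorted(
        range(len(train)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train[i], point)))
    votes = Counter(labels[i] for i in by_distance[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(points, labels, k):
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i, point in enumerate(points):
        rest = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        hits += knn_predict(rest, rest_labels, point, k) == labels[i]
    return hits / len(points)


# Hypothetical (weight in lbs, height in cm) toy data; True means "woman".
points = [(120, 170), (130, 168), (115, 160), (140, 175),
          (200, 183), (225, 188), (250, 190), (180, 180)]
labels = [True, True, True, True, False, False, False, False]

scores = {k: loo_accuracy(points, labels, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The test set is then used once, with the chosen value, to report the final accuracy.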
Hypothesis: social characteristics (nationality and education) distinguish female characters from male ones.
cultural_knn = clean_df[['wiki.education', 'wiki.citizenship', 'Women', 'Villain']]
cultural_knn.dtypes
wiki.education      object
wiki.citizenship    object
Women                 bool
Villain               bool
dtype: object
usa = cultural_knn['wiki.citizenship'].map(lambda x: 'U.S.A.' in x)
cultural_knn['USA'] = usa
cultural_knn = cultural_knn.drop('wiki.citizenship', axis=1)
cultural_knn
name | wiki.education | Women | Villain | USA |
---|---|---|---|---|
Abomination (Emil Blonsky) | Unrevealed | False | True | False |
Absorbing Man | High school dropout | False | True | True |
Abyss | Unrevealed | False | True | False |
Agent Zero | Unrevealed | False | True | False |
Annihilus | Unrevealed | False | True | False |
Apocalypse | Centuries of study and experience | False | True | False |
Spider-Girl (Anya Corazon) | High school student | True | False | True |
Arcade | Unrevealed | False | True | True |
Archangel | College degree from Xavier's School for Gifted... | False | False | True |
Arclight | Unrevealed; some military training | True | True | True |
Aurora | Madame DuPont's School for Girls | True | False | False |
Avalanche | Unrevealed | False | True | False |
Banshee | Bachelor of Science degree from Trinity Colleg... | False | False | False |
Baron Strucker | University graduate | False | True | False |
Baron Zemo (Heinrich Zemo) | Doctorate Degree | False | True | False |
Bastion | Inapplicable | False | True | False |
Batroc the Leaper | Military training | False | True | False |
Battering Ram | Unrevealed | False | False | True |
Beak | Some college-level courses | False | False | True |
Beast | Ph.D. Biophysics | False | False | True |
Beef | | False | True | True |
Beta-Ray Bill | Unrevealed | False | False | False |
Big Wheel | College educated | False | False | True |
Bishop | Unrevealed | False | False | True |
Black Bolt | Unrevealed | False | False | False |
Black Cat | College graduate (arts major) | True | False | True |
Black Knight | Unrevealed | False | False | False |
Black Panther | Ph.D in physics | False | False | False |
Black Tom | Oxford University | False | True | False |
Black Widow | Unrevealed; intensive espionage training throu... | True | False | False |
... | ... | ... | ... | ... |
Vanisher (Telford Porter) | | False | True | False |
Sphinx (Anath-Na Mut) | studied under caretakers of Arcturus, absorbed... | False | True | False |
Molten Man | College graduate | False | True | True |
Henry Peter Gyrich | University graduate | False | False | False |
Reptil | Unrevealed | False | False | False |
Cypher | High school, university level courses in langu... | False | False | True |
Karma | unrevealed | True | False | False |
She-Hulk (Lyra) | Tutored by the [[Gynosure]] | True | False | False |
She-Hulk (Ultimate) | Communications degree from Berkeley, studies a... | False | False | True |
Talon (Fraternity of Raptors) | Unrevealed, but the Datasong of Talon's armor ... | False | False | False |
Angel (Golden Age) | Unrevealed | False | False | False |
Romulus | Unrevealed, possible knowledge of genetics. | False | True | False |
Meggan | No formal schooling; self-taught from watching... | True | False | False |
Lucky Pierre | Unrevealed | False | False | False |
Shadu the Shady | Unrevealed | False | False | False |
Contessa (Vera Vidal) | Unrevealed | True | False | False |
Chores MacGillicudy | Unrevealed | False | False | False |
Iron Fist (Wu Ao-Shi) | Unrevealed | True | False | False |
Loa | Currently in high school level courses | True | False | False |
Grey Gargoyle | Unrevealed | False | True | False |
Nekra | Elementary school | True | True | False |
Miss America | Unrevealed | True | False | False |
Whizzer (Stanley Stewart) | High school Graduate | False | False | False |
Scarlet Spider (Kaine) | Possesses memories of Peter Parker's college e... | False | True | False |
Hope Summers | Unrevealed | True | False | False |
Enchantress (Sylvie Lushton) | Unrevealed | True | False | False |
Hank Pym | extensive knowledge in various fields of scien... | False | False | False |
Azazel (Mutant) | Unrevealed | False | True | False |
Spider-Man (House of M) | Ph.D in biochemistry | False | False | True |
Gargoyle (Yuri Topolov) | | False | True | False |
762 rows × 4 columns
def delete_without_education(cultural_knn, not_education):
    for word in not_education:
        cultural_knn = cultural_knn[~cultural_knn['wiki.education'].str.contains(word)]
    return cultural_knn
# Remove every character whose education is "Unrevealed" or otherwise unusable
cultural_knn = delete_without_education(cultural_knn, ["Unrevealed", "unrevealed", 'None', 'none', 'Not applicable',
'Unknown', 'unknown', 'Inapplicable', 'Limited'])
cultural_knn = cultural_knn[cultural_knn['wiki.education'] != '']
# Build the education-level groups
education = cultural_knn['wiki.education']
unfinished = education.map(lambda x: 'unfinished' in x or 'dropout' in x or 'incomplete' in x
or 'drop-out' in x or 'No official schooling' in x
or 'No formal education' in x or 'Unfinished' in x
or 'Incomplete' in x)
education[unfinished].tolist()
education = education[~unfinished]
phd = education.map(lambda x: 'Ph.D' in x or 'master' in x or 'Masters' in x or 'PhD' in x
or 'doctorate' in x or 'Doctorate' in x or 'Ph.d.' in x or 'Doctoral' in x
or 'NASA' in x or 'Journalism graduate' in x or 'scientist' in x
or 'Geneticist' in x or 'residency' in x)
education[phd].tolist()
education = education[~phd]
college = education.map(lambda x: 'College' in x or 'college' in x or 'University' in x
or 'post-graduate' in x or 'B.A' in x or 'B.S.' in x or 'university' in x
or 'Master' in x or 'Collage' in x or 'Degree' in x or 'degree' in x
or 'Engineering' in x or 'engineer' in x or 'programming' in x
or 'Programming' in x or 'Doctor' in x or 'Medical school' in x
or 'higher education' in x)
education[college].tolist()
education = education[~college]
militar = education.map(lambda x: 'Military' in x or 'Xandarian Nova Corps' in x or 'FBI' in x
or 'S.H.I.E.L.D.' in x or 'military' in x or 'Nicholas Fury' in x
or 'Warrior' in x or 'combat' in x or 'Combat' in x or 'Soldier' in x
or 'spy academy' in x or 'Police' in x or 'warfare' in x or 'Public Eye' in x)
education[militar].tolist()
education = education[~militar]
hs = education.map(lambda x: 'High school' in x or 'high school' in x or 'High-school' in x
or 'High School' in x or 'high School' in x)
education[hs].tolist()
education = education[~hs]
tutored = education.map(lambda x: 'Tutored' in x or 'tutors' in x or 'tutored' in x
or 'Mentored' in x or 'Home schooled' in x or 'Private education' in x)
education[tutored].tolist()
education = education[~tutored]
autodidacta = education.map(lambda x: 'Self-taught' in x or 'self-taught' in x
or 'Little or no formal schooling' in x or 'Little formal schooling' in x
or 'Some acting school' in x or 'through observation' in x)
education[autodidacta].tolist()
education = education[~autodidacta]
special = education.map(lambda x: 'Sorcery' in x or 'cosmic experience' in x or 'magic' in x
or 'Priests of Pama' in x or 'Xavier Institute' in x or 'Carlos Javier’s' in x
or 'Self educated' in x or 'Shao-Lom' in x or 'Centuries of study and experience' in x
or 'Askani' in x or 'Madame DuPont' in x or 'Titanian' in x or 'arcane arts' in x
or 'Muir-MacTaggert' in x or 'Uploaded data' in x or 'Programmed' in x
or 'Accelerated' in x or 'Inhumans' in x or 'Able to access knowledge' in x
or 'lifetime' in x or 'Watchers\' homeworld' in x or 'Uranian Eternals' in x
or 'Arcturus' in x or 'Oatridge School for Boys' in x)
education[special].tolist()
education = education[~special]
basic = education.map(lambda x: 'Self-taught' in x or 'Homed schooled' in x or 'graduate school' in x
or 'Elementary school' in x or 'Secondary school' in x or 'school graduate' in x
or 'Boarding school' in x or 'Massachusetts Academy' in x
or 'school graduate' in x)
education[basic].tolist()
education = education[~basic]
educational_dict = {'autodidacta': autodidacta, 'unfinished': unfinished, 'superior': phd, 'college':college,
'militar': militar, 'high school':hs, 'tutored': tutored, 'special':special, 'basic': basic}
numeric = {'autodidacta': 1, 'unfinished': 2, 'superior': 3, 'college':4,
'militar': 5, 'high school':6, 'tutored': 7, 'special':8, 'basic': 9}
def clean_education_levels(educational_dict, cultural_knn):
    """Replace each wiki.education string with the numeric code of its category."""
    for k, education in educational_dict.items():
        index = education[education.loc[:]].index
        for character in index:
            cultural_knn.loc[character, 'wiki.education'] = numeric[k]
clean_education_levels(educational_dict, cultural_knn)
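The grouping above works by successive filtering: each mask is computed only on the rows that earlier groups did not claim, so earlier groups take precedence. The same precedence can be expressed as an ordered rule table. This is a sketch under assumptions, not the notebook's code: the keyword lists are shortened for illustration, the function `education_level` is hypothetical, and matching is made case-insensitive, which the original lambdas are not.

```python
# Ordered (category, keywords) rules; the first rule with a matching keyword wins.
# The keyword lists are shortened, illustrative versions of the notebook's lists.
RULES = [
    ('unfinished', ('dropout', 'drop-out', 'unfinished', 'incomplete')),
    ('superior', ('ph.d', 'phd', 'doctorate', 'masters')),
    ('college', ('college', 'university', 'degree')),
    ('militar', ('military', 'combat', 'police')),
    ('high school', ('high school', 'high-school')),
]


def education_level(text, rules=RULES):
    """Return the first category whose keyword occurs in `text`, else None."""
    lowered = text.lower()  # assumption: case-insensitive matching
    for category, keywords in rules:
        if any(keyword in lowered for keyword in keywords):
            return category
    return None


for raw in ["High school dropout", "Ph.D in physics", "Oxford University",
            "Military training", "High school student", "Unrevealed"]:
    print(raw, '->', education_level(raw))
```

Keeping the rules in one list makes the precedence explicit and avoids repeating upper- and lower-case variants of each keyword.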
TRAIN_PERCENTAGE = 0.8
train_section = floor(cultural_knn.shape[0]*TRAIN_PERCENTAGE)
test_section = cultural_knn.shape[0]-train_section
print("We will use {} characters to train the classifier and"
      " {} to test the trained classifier.\n".format(train_section, test_section))
# replace=False draws each character at most once, so the training set has no repeated rows
train_rows = np.random.choice(cultural_knn.index.values, train_section, replace=False)
test_rows = np.setdiff1d(cultural_knn.index.values,train_rows)
print(cultural_knn.loc[train_rows[0]])
We will use 344 characters to train the classifier and 87 to test the trained classifier.

wiki.education        4
Women             False
Villain            True
USA                True
Name: Toxin (Eddie Brock), dtype: object
for i, group in cultural_knn.groupby(women):
    print(group)
name | wiki.education | Women | Villain | USA |
---|---|---|---|---|
Absorbing Man | 2 | False | True | True |
Apocalypse | 8 | False | True | False |
Archangel | 4 | False | False | True |
Banshee | 4 | False | False | False |
Baron Strucker | 4 | False | True | False |
Baron Zemo (Heinrich Zemo) | 3 | False | True | False |
Batroc the Leaper | 5 | False | True | False |
Beak | 4 | False | False | True |
Beast | 3 | False | False | True |
Big Wheel | 4 | False | False | True |
Black Panther | 3 | False | False | False |
Black Tom | 4 | False | True | False |
Blackheart | 7 | False | True | False |
Blade | 6 | False | False | True |
Blizzard | 6 | False | False | True |
Cable | 8 | False | False | True |
Luke Cage | 2 | False | False | True |
Cannonball | 6 | False | False | True |
Captain America | 5 | False | False | True |
Captain Britain | 3 | False | False | False |
Captain Marvel (Mar-Vell) | 5 | False | False | False |
Captain Stacy | 4 | False | False | True |
Carnage | 6 | False | True | True |
Chamber | 8 | False | False | False |
Cloak | 2 | False | False | True |
Malcolm Colcord | 5 | False | True | False |
Colossus | 4 | False | False | True |
Constrictor | 4 | False | False | True |
Count Nefaria | 4 | False | True | True |
Crimson Dynamo | 3 | False | True | False |
... | ... | ... | ... | ... |
Mindworm | 6 | False | False | True |
Calamity | 4 | False | False | False |
Skaar | 2 | False | False | False |
Supernaut | 5 | False | False | False |
Hypno-Hustler | 2 | False | True | False |
Vin Gonzales | 5 | False | False | False |
Blue Shield | 6 | False | False | False |
Crimson Crusader | 9 | False | False | False |
Cobalt Man | 3 | False | True | False |
Jackal | 3 | False | True | True |
High Evolutionary | 3 | False | True | False |
Anole | 6 | False | False | True |
Justin Hammer | 4 | False | True | False |
Junta | 5 | False | False | True |
Omega Sentinel | 5 | False | False | False |
3-D Man | 5 | False | False | True |
Nightcrawler (Ultimate) | 6 | False | False | False |
Angel (Ultimate) | 4 | False | False | False |
Vance Astro | 4 | False | False | False |
Jamie Braddock | 4 | False | True | False |
Tinkerer | 4 | False | True | False |
Sphinx (Anath-Na Mut) | 8 | False | True | False |
Molten Man | 4 | False | True | True |
Henry Peter Gyrich | 4 | False | False | False |
Cypher | 4 | False | False | True |
She-Hulk (Ultimate) | 4 | False | False | True |
Whizzer (Stanley Stewart) | 6 | False | False | False |
Scarlet Spider (Kaine) | 4 | False | True | False |
Hank Pym | 3 | False | False | False |
Spider-Man (House of M) | 3 | False | False | True |
[305 rows x 4 columns]
name | wiki.education | Women | Villain | USA |
---|---|---|---|---|
Spider-Girl (Anya Corazon) | 6 | True | False | True |
Aurora | 8 | True | False | False |
Black Cat | 4 | True | False | True |
Catseye | 9 | True | True | True |
Clea | 4 | True | False | False |
Crystal | 7 | True | False | False |
Dagger | 2 | True | False | True |
Darkstar | 5 | True | False | False |
Dazzler | 4 | True | False | True |
Dust | 6 | True | False | False |
Elektra | 4 | True | False | False |
Expediter | 7 | True | False | False |
Firestar | 4 | True | False | True |
Emma Frost | 4 | True | False | True |
Husk | 8 | True | False | False |
Invisible Woman | 2 | True | False | True |
Jocasta | 2 | True | False | False |
Jessica Jones | 6 | True | False | False |
Jubilee | 6 | True | False | False |
Lady Deathstrike | 7 | True | True | False |
Magik (Illyana Rasputin) | 6 | True | False | False |
Magma (Amara Aquilla) | 4 | True | False | False |
Marrow | 5 | True | False | True |
Rachel Grey | 4 | True | False | True |
Alicia Masters | 4 | True | False | False |
Medusa | 7 | True | False | False |
Meltdown | 2 | True | False | True |
Moondragon | 8 | True | False | True |
Namorita | 2 | True | False | False |
Nocturne | 4 | True | False | True |
... | ... | ... | ... | ... |
Hobgoblin (Robin Borne) | 3 | True | True | True |
Nova (Frankie Raye) | 4 | True | False | False |
Puck (Zuzha Yu) | 2 | True | False | False |
Wind Dancer | 6 | True | False | False |
Sway | 8 | True | False | True |
Mantis | 8 | True | False | False |
Joystick | 2 | True | False | False |
Satana | 7 | True | False | True |
Turbo | 4 | True | False | True |
M (Monet St. Croix) | 9 | True | False | False |
Bloodaxe | 3 | True | True | True |
Layla Miller | 9 | True | False | True |
Beyonder | 7 | True | False | False |
Cammi | 2 | True | False | True |
Tana Nile | 9 | True | False | False |
Praxagora | 8 | True | False | False |
Skreet | 8 | True | False | False |
Thena | 8 | True | False | False |
Mockingbird | 3 | True | False | False |
Menace | 4 | True | True | True |
Geiger | 4 | True | False | False |
Carlie Cooper | 4 | True | False | True |
Imp | 9 | True | False | False |
Armor (Hisako Ichiki) | 8 | True | False | True |
Thundra | 5 | True | False | False |
Vapor | 4 | True | True | False |
She-Hulk (Lyra) | 7 | True | False | False |
Meggan | 1 | True | False | False |
Loa | 6 | True | False | False |
Nekra | 9 | True | True | False |
[126 rows x 4 columns]
for i, group in cultural_knn.groupby(women):
    if not i:
        area = (np.pi * (group.shape[0])**2)*.002
        ax = group.plot(kind='scatter', x='wiki.education', y='USA', s=area,
                        color='Cornflowerblue', label='Men', alpha=0.5);
    else:
        area = (np.pi * (group.shape[0])**2)*.002
        group.plot(kind='scatter', x='wiki.education', y='USA',
                   color='LightGreen', label='Women', ax=ax, s=area, alpha=0.5)
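ax.set_xticks(range(1,10))
# Order the labels by their numeric code so that each tick gets the matching category
ax.set_xticklabels(sorted(numeric, key=numeric.get), rotation='vertical')
ax.set_yticks(range(0,2))
# USA is boolean: 0 (False) means non USA, 1 (True) means USA
ax.set_yticklabels(['non USA', 'USA'], rotation='horizontal')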
ax.legend(loc='upper center', bbox_to_anchor=(0.5, 1.1), ncol=2, fancybox=True, shadow=True)
<matplotlib.legend.Legend at 0x117198a90>
X_train = cultural_knn.loc[train_rows][['wiki.education','USA']]
y_train = cultural_knn.loc[train_rows]['Women']
X_test = cultural_knn.loc[test_rows][['wiki.education','USA']]
y_test = cultural_knn.loc[test_rows]['Women']
classifier = neighbors.KNeighborsClassifier()
classifier.fit(X_train, y_train)
predict = classifier.predict(X_test)
accuracy = metrics.accuracy_score(y_test, predict)
precision, recall, f1, _ = metrics.precision_recall_fscore_support(y_test, predict)
print("* Accuracy: {:.2f}%".format(accuracy*100))
print("* Precision: {}\n* Recall: {}\n* F1-score: {}".format(precision, recall, f1))
* Accuracy: 68.04%
* Precision: [ 0.72159091 0.27777778]
* Recall: [ 0.90714286 0.09259259]
* F1-score: [ 0.80379747 0.13888889]
cmap_light = ListedColormap(['#AAAAFF', '#AAFFAA'])
cmap_bold = ListedColormap(['#0000FF', '#00FF00'])
step = 1
# The USA axis must cover both values, 0 and 1
xx, yy = np.meshgrid(np.arange(1, 10, step),
                     np.arange(0, 2, step))
prediction = classifier.predict(X_test)
Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure()
plt.pcolormesh(xx, yy, Z, cmap=cmap_light)
plt.scatter( X_test['wiki.education'], X_test['USA'], c=y_test, cmap=cmap_bold)
plt.xlim(xx.min(), xx.max())
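# USA is boolean: 0 (False) means non USA, 1 (True) means USA
plt.yticks(range(0,2),['non USA', 'USA'], rotation='horizontal')
plt.ylim(-0.5, 1.5)
# Nine categories, labelled in the order of their numeric codes
plt.xticks(range(1,10), sorted(numeric, key=numeric.get), rotation='vertical')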
plt.xlabel("Education")
<matplotlib.text.Text at 0x7f8220813e50>
%pylab inline --no-import-all
pd.set_option('display.mpl_style', 'default')
figsize(15, 6)
pd.set_option('display.width', 4000)
pd.set_option('display.max_columns', 100)
from matplotlib.pyplot import *
Populating the interactive namespace from numpy and matplotlib
marvel_df['modified'] = pd.to_datetime(marvel_df['modified'])
plot(marvel_df['modified'])
[<matplotlib.lines.Line2D at 0x7f8221388350>]
start = marvel_df.modified.min()
end = marvel_df.modified.max()
yearly_range = pd.date_range(start, end, freq='365D6H')
marvel_df[['modified']].head()
 | modified |
---|---|
0 | 1970-01-01 00:00:00 |
0 | 1970-01-01 00:00:00 |
0 | 1970-01-01 00:00:00 |
0 | 2011-05-17 21:26:18 |
0 | 1970-01-01 00:00:00 |
characters_per_year = marvel_df.groupby(marvel_df['modified'].map(lambda x: x.year)).size()
characters_per_year
modified
1970    802
2004      8
2010     54
2011    119
2012     53
2013    292
2014     74
dtype: int64
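The rows dated 1970 in the `head()` output all fall on 1970-01-01 00:00:00, the Unix epoch. That suggests, as an assumption not confirmed by the notebook, that the 802 characters counted under 1970 simply have no real modification timestamp. A minimal sketch of leaving them out before counting per year, using hypothetical sample timestamps and only the standard library:

```python
from collections import Counter
from datetime import datetime

EPOCH = datetime(1970, 1, 1)

# Hypothetical sample of `modified` timestamps.
modified = [
    datetime(1970, 1, 1),
    datetime(2011, 5, 17, 21, 26, 18),
    datetime(1970, 1, 1),
    datetime(2013, 9, 18, 15, 54, 4),
    datetime(2014, 3, 5, 18, 58, 52),
]

# Treat the epoch value as "missing" and count the remaining entries per year.
known = [stamp for stamp in modified if stamp != EPOCH]
per_year = Counter(stamp.year for stamp in known)
print(sorted(per_year.items()))
```

Whether to drop or keep those rows depends on the question being asked; the sketch only shows how to separate them.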
characters_per_year.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7f821f8ca990>
marvel_df.sort('modified',ascending=False).head()
comics.available | comics.collectionURI | comics.items | comics.returned | description | events.available | events.collectionURI | events.items | events.returned | id | modified | name | resourceURI | series.available | series.collectionURI | series.items | series.returned | stories.available | stories.collectionURI | stories.items | stories.returned | thumbnail.extension | thumbnail.path | urls | wiki.Date_of_birth | wiki.Place_of_birth | wiki.abilities | wiki.aliases | wiki.appearance | wiki.base_of_operations | wiki.bio | wiki.bio_text | wiki.blurb | wiki.builder | wiki.categories | wiki.categorytext | wiki.citizenship | wiki.creator | wiki.creators | wiki.current_members | wiki.debut | wiki.distinguishing_features | wiki.dstinguishing_features | wiki.education | wiki.event_text | wiki.eyes | wiki.features | wiki.former_members | wiki.govenment | wiki.government | wiki.groups | wiki.hair | wiki.height | wiki.home_world | wiki.identity | wiki.key_characters | wiki.key_issues | wiki.leader | wiki.location | wiki.main_image | wiki.members | wiki.object_text | wiki.occupation | wiki.origin | wiki.other_members | wiki.owner | wiki.paraphernalia | wiki.place_of_birth | wiki.place_of_creation | wiki.place_text | wiki.points_of_interest | wiki.power | wiki.powers | wiki.real_name | wiki.relatives | wiki.significant_citizens | wiki.significant_issues | wiki.skin | wiki.special_limitations | wiki.specieshistory | wiki.team_name | wiki.teamicon | wiki.technology | wiki.tie-ins | wiki.title_graphic | wiki.universe | wiki.weapons | wiki.weaponss | wiki.weight | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 26 | http://gateway.marvel.com/v1/public/characters... | [{'id': 36834, 'resourceURI': 'http://gateway.... | 26 | Decades after participating in military airdro... | 0 | http://gateway.marvel.com/v1/public/characters... | [] | 0 | 1011006 | 2014-03-05 18:58:52 | Wolverine (Ultimate) | http://gateway.marvel.com/v1/public/characters... | 17 | http://gateway.marvel.com/v1/public/characters... | [{'resourceURI': 'http://gateway.marvel.com/v1... | 17 | 21 | http://gateway.marvel.com/v1/public/characters... | [{'resourceURI': 'http://gateway.marvel.com/v1... | 20 | jpg | http://i.annihil.us/u/prod/marvel/i/mg/9/03/53... | [{'url': 'http://marvel.com/comics/characters/... | NaN | NaN | NaN | Logan, Weapon X, Lucky Jim | NaN | NaN | Howlett's past is mostly unknown, but during W... | Howlett's past is mostly unknown, but during W... | NaN | NaN | [Ultimate Marvel, Deceased] | NaN | Presumably Canada | NaN | NaN | NaN | Ultimate X-Men #1 (2001) | NaN | NaN | Unrevealed | NaN | Blue | NaN | NaN | NaN | NaN | [[X-Men (Ultimate)|X-Men]]; formerly [[Brother... | Black | 5'9" | NaN | ("Logan") publicly known; (James Howlett) Know... | NaN | NaN | NaN | NaN | Ultwolv.jpg | NaN | NaN | Student, adventurer; formerly mercenary, gover... | NaN | NaN | NaN | Unrevealed, probably somewhere in Canada | NaN | NaN | NaN | NaN | Wolverine's mutant healing factor enables him ... | James Howlett | Wife (name and status unrevealed); son (allege... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | [[Ultimate]] | NaN | NaN | 292 lbs. (including adamantium) | |
0 | 6 | http://gateway.marvel.com/v1/public/characters... | [{'resourceURI': 'http://gateway.marvel.com/v1... | 6 | While Eddie Brock’s academic career seemed to ... | 0 | http://gateway.marvel.com/v1/public/characters... | [] | 0 | 1011128 | 2014-03-05 18:58:42 | Venom (Ultimate) | http://gateway.marvel.com/v1/public/characters... | 5 | http://gateway.marvel.com/v1/public/characters... | [{'resourceURI': 'http://gateway.marvel.com/v1... | 5 | 3 | http://gateway.marvel.com/v1/public/characters... | [{'resourceURI': 'http://gateway.marvel.com/v1... | 3 | jpg | http://i.annihil.us/u/prod/marvel/i/mg/e/10/53... | [{'url': 'http://marvel.com/comics/characters/... | NaN | NaN | Eddie has a natural aptitude for bioengineerin... | The Suit | NaN | NaN | Eddie Brock was the son of a brilliant scienti... | Eddie Brock was the son of a brilliant scienti... | NaN | NaN | [Ultimate_Marvel] | NaN | United States | NaN | NaN | NaN | Ultimate Spider-Man #33 (2003) | NaN | NaN | College student, extensive Bioengineering studies | NaN | (Eddie) Blue; (Venom) White | NaN | NaN | NaN | NaN | none | (Eddie) Blond; (Venom) None | 5'11" | NaN | Secret | NaN | NaN | NaN | NaN | Ultimatevenom.jpg | NaN | NaN | Student | Ultimate Spider-Man #36-37 (2003) | NaN | NaN | NaN | New York, New York | NaN | NaN | NaN | NaN | The symbiotic suit bonded to Brock was designe... | Edward "Eddie" Brock Jr. | Edward Brock Sr. (father, deceased), unidentif... | NaN | Reunited with Peter, shared their fathers’ wor... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | [[Ultimate]] | NaN | NaN | 175 lbs |
0 | 22 | http://gateway.marvel.com/v1/public/characters... | [{'id': 35528, 'resourceURI': 'http://gateway.... | 22 | One of Spider-Man's oldest enemies, Mac Gargan... | 5 | http://gateway.marvel.com/v1/public/characters... | [{'resourceURI': 'http://gateway.marvel.com/v1... | 5 | 1010788 | 2014-03-05 18:58:37 | Venom (Mac Gargan) | http://gateway.marvel.com/v1/public/characters... | 10 | http://gateway.marvel.com/v1/public/characters... | [{'resourceURI': 'http://gateway.marvel.com/v1... | 10 | 21 | http://gateway.marvel.com/v1/public/characters... | [{'resourceURI': 'http://gateway.marvel.com/v1... | 20 | jpg | http://i.annihil.us/u/prod/marvel/i/mg/5/50/53... | [{'url': 'http://marvel.com/comics/characters/... | NaN | NaN | Mac Gargan has the intellectual skills of an a... | Spider-Man; formerly Scorpion | NaN | NaN | One of Spider-Man's oldest enemies, MacGargan ... | NaN | NaN | [Avengers, Civil War, Spider-Man, Spider-Man V... | NaN | U.S.A. with a criminal record | NaN | NaN | NaN | (As Gargan) Amazing Spider-Man #19 (1964); (as... | NaN | NaN | High school graduate | NaN | Brown | NaN | NaN | NaN | NaN | Formerly [[Avengers (Osborn's team)]], [[Thund... | Brown (shaves head) | 6'3" | NaN | Publicly known | NaN | NaN | NaN | NaN | Venom(MacGargan)_Head.jpg | NaN | NaN | U.S. government agent; former professional cri... | (As Scorpion) Amazing Spider-Man #20 (1965); (... | NaN | NaN | As Scorpion: Mac Gargan wore a costume that wa... | Yonkers, New York | NaN | NaN | NaN | NaN | As Scorpion: enhanced strength, enabling him t... | MacDonald "Mac" Gargan | None | NaN | Confronted by Venom symbiote (Marvel Knights: ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | [[Marvel Universe]] | As Scorpion: The tail on his Scorpion costume ... | NaN | 220 lbs. / 245 lbs. (with symbiote) | |
0 | 0 | http://gateway.marvel.com/v1/public/characters... | [] | 0 | 0 | http://gateway.marvel.com/v1/public/characters... | [] | 0 | 1011239 | 2014-03-05 18:58:33 | Valkyrie (Ultimate) | http://gateway.marvel.com/v1/public/characters... | 0 | http://gateway.marvel.com/v1/public/characters... | [] | 0 | 0 | http://gateway.marvel.com/v1/public/characters... | [] | 0 | jpg | http://i.annihil.us/u/prod/marvel/i/mg/4/20/53... | [{'url': 'http://marvel.com/comics/characters/... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
0 | 30 | http://gateway.marvel.com/v1/public/characters... | [{'id': 41845, 'resourceURI': 'http://gateway.... | 30 | He claims he is the legendary Norse thunder de... | 0 | http://gateway.marvel.com/v1/public/characters... | [] | 0 | 1011025 | 2014-03-05 18:58:19 | Thor (Ultimate) | http://gateway.marvel.com/v1/public/characters... | 19 | http://gateway.marvel.com/v1/public/characters... | [{'resourceURI': 'http://gateway.marvel.com/v1... | 19 | 24 | http://gateway.marvel.com/v1/public/characters... | [{'resourceURI': 'http://gateway.marvel.com/v1... | 20 | jpg | http://i.annihil.us/u/prod/marvel/i/mg/3/80/53... | [{'url': 'http://marvel.com/comics/characters/... | NaN | NaN | NaN | None | NaN | NaN | He claims he is the legendary Norse thunder de... | He claims he is the legendary Norse thunder de... | NaN | NaN | [Ultimate Marvel] | NaN | (Thor) Asgard; (Golmen) Norway | NaN | this has not been updated yet | NaN | The Ultimates #4 (2002) | NaN | NaN | Unrevealed | NaN | Blue | NaN | NaN | NaN | NaN | Formerly Ultimates | Blond | 6'5" | NaN | (Thor) Publicly known; (Golmen) known to autho... | NaN | NaN | NaN | NaN | Thorult head.jpg | NaN | NaN | Guardian deity; formerly psychiatric nurse | NaN | NaN | NaN | NaN | (Thor) Asgard; (Golmen) Norway | NaN | NaN | NaN | NaN | Thor possesses immense superhuman strength, en... | Thor or Thorlief Golmen | (Thor) Odin (father), Loki (half-brother); (Go... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | [[Ultimate]] | Enchanted hammer named Mjolnir. | NaN | 285 lbs. |