import pandas as pd
import numpy as np
data = pd.read_csv("person_info.csv")
You can find the data here: https://github.com/lizhouf/oscr2019/blob/master/person_info.csv
data
| | first_name | last_name | birthday | age | state | address | City | phone | email | car_1 | gpa | year | class_of | online_signiture |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Carol | Davis | 9/29/1996 | 23 | Illinois | 1674 Carolyns Circle | Burr Ridge, Illinois(IL), 60527 | 312-295-1941 | curt1995@gmail.com | 6ZUA618 | 2.85 | 1 | 2022 | Don't aim for success if you want it just do w... |
1 | Bruno | Horan | 6/11/1995 | 24 | California | 1561 Still Street | San Diego, California(CA), 92111 | 858-449-3324 | guadalupe1974@yahoo.com | 982KRK | 3.47 | 2 | 2021 | In any investment, you expect to have fun and ... |
2 | William | Moody | 2/27/1997 | 22 | Illinois | 541 Jade wood Drive | Arlington Heights, Illinois(IL), 60004 | 979-614-4038 | roosevelt.fee@hotmail.com | PS9-S917 | 2.78 | 2 | 2021 | It's not my fault that people don't appreciate... |
3 | Robin | Steel | 8/3/1989 | 57 | Texas | 1674 Caroly ns Circle | Josephine, Texas(TX), 75173 | 214-694-7864 | lloyd2009@hotmail.com | na | 4.33 | 4 | 2019 | The press is the hired agent of a monied system |
4 | Michelle | Roberts | 7/17/1995 | 24 | Oregon | 1372 Gateway Road | Portland, Oregon(OR), 97217\n\n | 503-283-2255 | ben1972@gmail.com | 6XNK620 | 3.75 | NaN | 2019 | I am desperate for change - now - not in 8 yea... |
5 | June | Sneed | 3/27/2000 | 19 | Arizona | 2411 Clarksburg Park Road | Phoenix, Arizona(AZ), 85003 | 256-286-5628 | kathlyn_runolf@yahoo.com | NaN | 3.60 | Jr | 2021 | Civilization is the progress toward a society ... |
6 | Curtis | Campbell | 3/15/1991 | 28 | Idahol | 2760 Science Center Drive | Pocatello, Idaho(ID), 83201 | 979-614-4038 | justen_schust@yahoo.com | PS9-S917 | 2.32 | 3 | 2020 | I have an incredible amount of basketball know... |
7 | Dorothy | Schott | 1/2/1997 | 21 | California | 2742 Sunny Day Drive | Santa Ana, California(CA), 92770 | 501-281-4074 | megane_purd1@hotmail.com | NaN | 3.93 | NaN | 2020 | A lawyer with his briefcase can steal more tha... |
8 | Mae | Skinner | 3/16/1995 | 24 | Pennsylvania | NaN | Newark, Pennsylvania(PA), 19714 | 501-334-8502 | enrique.berni@gmail.com | WCE-2823 | 3.85 | 3 | 2020 | When humor can be made to alternate with melan... |
9 | David | Victoria | 8/2/1996 | 23 | Maine | 3327 Chipmunk Lane | Harpswell, Maine(ME), 04079 | 207-570-1895 | carolina1977@hotmail.com | VDS-5639 | 1.74 | S | 2021 | The difference between a beautifully made fail... |
In Python, we can use the .info() method to summarize a DataFrame: the column names, the non-null count of each column, and each column's data type (note that the "object" dtype includes strings):
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 14 columns):
first_name          10 non-null object
last_name           10 non-null object
birthday            10 non-null object
age                 10 non-null int64
state               10 non-null object
address             9 non-null object
City                10 non-null object
phone               10 non-null object
email               10 non-null object
car_1               8 non-null object
gpa                 10 non-null float64
year                8 non-null object
class_of            10 non-null int64
online_signiture    10 non-null object
dtypes: float64(1), int64(2), object(11)
memory usage: 1.2+ KB
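The same information that .info() prints can also be pulled out piece by piece, which is handy when you want to use the numbers in later code. The following is a minimal sketch on a small hypothetical frame (the `sample` DataFrame and its three rows are made up for illustration and only loosely mirror person_info.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame standing in for person_info.csv
sample = pd.DataFrame({
    "first_name": ["Carol", "Bruno", "Mae"],
    "address": ["1674 Carolyns Circle", "1561 Still Street", np.nan],
    "gpa": [2.85, 3.47, 3.85],
    "class_of": [2022, 2021, 2020],
})

dtypes = sample.dtypes           # data type of each column
missing = sample.isna().sum()    # number of missing values per column
non_null = sample.notna().sum()  # the "non-null" counts shown by .info()

print(dtypes)
print(missing)
```

Here `missing["address"]` is 1 because one address is NaN, just as the address column in the real data shows 9 non-null values out of 10.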
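If we want the class_of column to be stored as strings (i.e. treated as category labels rather than numbers), we can convert it element by element with .astype(str). Note that .to_string() is not suitable here: it renders the whole column as one single string, which would then be assigned to every row.
data["class_of"] = data["class_of"].astype(str)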
We can see that, after this conversion, the dtype of the class_of column has changed from int64 to object.
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 14 columns):
first_name          10 non-null object
last_name           10 non-null object
birthday            10 non-null object
age                 10 non-null int64
state               10 non-null object
address             9 non-null object
City                10 non-null object
phone               10 non-null object
email               10 non-null object
car_1               8 non-null object
gpa                 10 non-null float64
year                8 non-null object
class_of            10 non-null object
online_signiture    10 non-null object
dtypes: float64(1), int64(1), object(12)
memory usage: 1.2+ KB
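For comparison, here is a minimal sketch of two ways a numeric column can be turned into text, using a hypothetical one-column frame (`sample` is made up for illustration). Series.astype(str) converts each value separately, whereas Series.to_string() returns one string that renders the entire Series, index included:

```python
import pandas as pd

# Hypothetical frame with only a class_of column
sample = pd.DataFrame({"class_of": [2022, 2021, 2019]})

# Element-wise conversion: each value becomes its own string
as_text = sample["class_of"].astype(str)

# to_string() renders the whole Series (index included) as a single str
rendered = sample["class_of"].to_string()

print(as_text.tolist())   # ['2022', '2021', '2019']
print(type(rendered))     # <class 'str'>
```

Only the element-wise result keeps one value per row, so it is the one that can be assigned back to the column without overwriting every row with the same text.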