Creating a DataFrame

There are actually quite a few ways to create a DataFrame from existing objects.

Let's explore!

In [1]:
# Setup
import pandas as pd

From a 2-dimensional object

If your data is already in rows and columns, like a list of lists, you can pass it straight to the constructor. Row labels and column headings will be generated automatically as a range starting at 0.

In [2]:
test_users_list = [
    ['Craig', 'Dennis', 42.42],
    ['Treasure', 'Porth', 25.00]
]

pd.DataFrame(test_users_list)
Out[2]:
          0       1      2
0     Craig  Dennis  42.42
1  Treasure   Porth  25.00

Notice how both the row labels and the column headings are autogenerated. You can set them yourself with the index and columns keyword arguments.

In [3]:
pd.DataFrame(test_users_list, index=['craigsdennis', 'treasure'],
             columns=['first_name', 'last_name', 'balance'])
Out[3]:
             first_name last_name  balance
craigsdennis      Craig    Dennis    42.42
treasure       Treasure     Porth    25.00
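
To confirm that the autogenerated labels really are a range, you can inspect the index and columns attributes of each frame. The following is a small sketch that rebuilds the data from the cells above; the names auto_df and labeled_df are introduced here purely for illustration.

```python
import pandas as pd

test_users_list = [
    ['Craig', 'Dennis', 42.42],
    ['Treasure', 'Porth', 25.00]
]

# No index or columns given: pandas falls back to 0-based ranges on both axes
auto_df = pd.DataFrame(test_users_list)
print(list(auto_df.index))    # [0, 1]
print(list(auto_df.columns))  # [0, 1, 2]

# Explicit labels replace the ranges (auto_df and labeled_df are illustrative names)
labeled_df = pd.DataFrame(test_users_list,
                          index=['craigsdennis', 'treasure'],
                          columns=['first_name', 'last_name', 'balance'])

# With labels in place, a single cell can be looked up by row label and column name
print(labeled_df.loc['treasure', 'balance'])  # 25.0
```

Explicit labels are what make lookups such as .loc['treasure', 'balance'] readable; with the default range you would have to remember positions instead.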

From a dictionary

Much like a Series, if you do not specify the index, it will be autogenerated as a range. The dictionary keys become the column headings.

In [4]:
# By default, the dictionary maps each column name to an ordered list of values
test_user_data = {
    'first_name': ['Craig', 'Treasure'],
    'last_name': ['Dennis', 'Porth'],
    'balance': [42.42, 25.00]
}

pd.DataFrame(test_user_data)
Out[4]:
  first_name last_name  balance
0      Craig    Dennis    42.42
1   Treasure     Porth    25.00

And remember that you can specify the index by supplying the index keyword argument.

In [5]:
pd.DataFrame(test_user_data, index=['craigsdennis', 'treasure'])
Out[5]:
             first_name last_name  balance
craigsdennis      Craig    Dennis    42.42
treasure       Treasure     Porth    25.00
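
The list-of-lists route and the dictionary route describe the same table from two directions: one row at a time, or one column at a time. As a rough check, the sketch below builds both frames and compares them cell by cell. The names usernames, from_rows, from_columns and same are made up for this example.

```python
import pandas as pd

usernames = ['craigsdennis', 'treasure']

# Row-oriented input: one inner list per user
from_rows = pd.DataFrame(
    [['Craig', 'Dennis', 42.42], ['Treasure', 'Porth', 25.00]],
    index=usernames,
    columns=['first_name', 'last_name', 'balance'],
)

# Column-oriented input: one list per column, keyed by column name
from_columns = pd.DataFrame(
    {
        'first_name': ['Craig', 'Treasure'],
        'last_name': ['Dennis', 'Porth'],
        'balance': [42.42, 25.00],
    },
    index=usernames,
)

# Convert each frame to {row label: {column: value}} and compare the results
same = from_rows.to_dict(orient='index') == from_columns.to_dict(orient='index')
print(same)  # True
```

Which form to use mostly depends on how the data arrives: records read one at a time fit the row-oriented form, while values already grouped by field fit the dictionary form.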

DataFrame.from_dict adds more options

The orient keyword

The orient keyword allows you to specify whether the keys of your dictionary become the row labels (orient='index') or the column titles (orient='columns', the default). Note how the keys of the nested dictionaries define the columns below. With orient='index', you could also pass a list of names to the columns keyword.

In [6]:
by_username = {
    'craigsdennis': {
        'first_name': 'Craig',
        'last_name': 'Dennis',
        'balance': 42.42
    },
    'treasure': {
        'first_name': 'Treasure',
        'last_name': 'Porth',
        'balance': 25.00
    }
}

pd.DataFrame.from_dict(by_username, orient='index')
Out[6]:
             first_name last_name  balance
craigsdennis      Craig    Dennis    42.42
treasure       Treasure     Porth    25.00
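
To see what orient changes, it may help to run the same dictionary through the default orientation, and to try the columns keyword with list values. This is a sketch only; by_username_lists, as_columns and as_rows are names invented for the example.

```python
import pandas as pd

by_username = {
    'craigsdennis': {'first_name': 'Craig', 'last_name': 'Dennis', 'balance': 42.42},
    'treasure': {'first_name': 'Treasure', 'last_name': 'Porth', 'balance': 25.00},
}

# Default orient='columns': the outer keys become the column titles,
# and the inner keys become the row labels
as_columns = pd.DataFrame.from_dict(by_username)
print(list(as_columns.columns))  # ['craigsdennis', 'treasure']

# orient='index' with plain lists as values: the columns keyword names the fields
# (by_username_lists is a hypothetical variant of the data above)
by_username_lists = {
    'craigsdennis': ['Craig', 'Dennis', 42.42],
    'treasure': ['Treasure', 'Porth', 25.00],
}
as_rows = pd.DataFrame.from_dict(
    by_username_lists,
    orient='index',
    columns=['first_name', 'last_name', 'balance'],
)
print(list(as_rows.index))    # ['craigsdennis', 'treasure']
print(list(as_rows.columns))  # ['first_name', 'last_name', 'balance']
```

Note that the columns keyword is only accepted together with orient='index'; passing it with the default orientation raises a ValueError.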