Get list from pandas DataFrame column headers


Keywords: python


Question: 

I want to get a list of the column headers from a pandas DataFrame. The DataFrame will come from user input so I won't know how many columns there will be or what they will be called.

For example, if I'm given a DataFrame like this:

>>> my_dataframe
    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7
5   4    8    3
6   8    2    8
7   9    9   10
8   6    6    4
9  10   10    7

I would want to get a list like this:

>>> header_list
['y', 'gdp', 'cap']

14 Answers: 

You can get the values as a list by doing:

list(my_dataframe.columns.values)

Also you can simply use:

list(my_dataframe)
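As a quick check, here is a minimal sketch of both calls. It rebuilds the DataFrame from the question by hand; the variable names from_values and from_frame are only illustrative.

```python
import pandas as pd

# Rebuild the example DataFrame from the question (values copied by hand).
my_dataframe = pd.DataFrame(
    {
        "y": [1, 2, 8, 3, 6, 4, 8, 9, 6, 10],
        "gdp": [2, 3, 7, 4, 7, 8, 2, 9, 6, 10],
        "cap": [5, 9, 2, 7, 7, 3, 8, 10, 4, 7],
    },
    columns=["y", "gdp", "cap"],  # pin the column order explicitly
)

# Both approaches yield the column labels as a plain Python list.
from_values = list(my_dataframe.columns.values)
from_frame = list(my_dataframe)

print(from_values)
print(from_frame)
```

Both lists hold the labels in the DataFrame's own column order.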


There is a built-in method, which is also the most performant:

my_dataframe.columns.values.tolist()

.columns returns an Index, .columns.values returns an array, and that array has a tolist() helper method that returns a list.
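To see each step of that chain, the sketch below prints the type at every stage. The small DataFrame is made up for illustration, and the exact array type reported may depend on your pandas version.

```python
import pandas as pd

# A small illustrative DataFrame (not the one from the question).
df = pd.DataFrame([[1, 2, 5], [2, 3, 9]], columns=["y", "gdp", "cap"])

columns = df.columns   # a pandas Index holding the labels
values = columns.values  # the underlying array of labels
names = values.tolist()  # a plain Python list

# Print the type name at each stage of the chain.
for step in (columns, values, names):
    print(type(step).__name__)

print(names)
```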

EDIT

For those who hate typing this is probably the shortest method:

list(df)


Did some quick tests, and perhaps unsurprisingly the built-in version using dataframe.columns.values.tolist() is the fastest:

In [1]: %timeit [column for column in df]
1000 loops, best of 3: 81.6 µs per loop

In [2]: %timeit df.columns.values.tolist()
10000 loops, best of 3: 16.1 µs per loop

In [3]: %timeit list(df)
10000 loops, best of 3: 44.9 µs per loop

In [4]: %timeit list(df.columns.values)
10000 loops, best of 3: 38.4 µs per loop

(I still really like the list(dataframe) though, so thanks EdChum!)
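If you want to repeat this comparison outside IPython, something like the following sketch with the standard timeit module should work. The DataFrame and the candidates dictionary are assumptions made for illustration, and the numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# A small illustrative DataFrame; the original test data is not shown.
df = pd.DataFrame([[1, 2, 5], [2, 3, 9], [8, 7, 2]], columns=["y", "gdp", "cap"])

# The four expressions timed above, wrapped as zero-argument callables.
candidates = {
    "[column for column in df]": lambda: [column for column in df],
    "df.columns.values.tolist()": lambda: df.columns.values.tolist(),
    "list(df)": lambda: list(df),
    "list(df.columns.values)": lambda: list(df.columns.values),
}

# Sanity check: every candidate should produce the same labels.
results = {label: func() for label, func in candidates.items()}

runs = 10_000
for label, func in candidates.items():
    total_seconds = timeit.timeit(func, number=runs)
    print(f"{label:<28} {total_seconds / runs * 1e6:.2f} µs per call")
```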



It gets even simpler (as of pandas 0.16.0):

df.columns.tolist()

will give you the column names in a nice list.



>>> list(my_dataframe)
['y', 'gdp', 'cap']

To list the columns of a dataframe while in debugger mode, use a list comprehension:

>>> [c for c in my_dataframe]
['y', 'gdp', 'cap']

By the way, you can get a sorted list simply by using sorted:

>>> sorted(my_dataframe)
['cap', 'gdp', 'y']


That's available as my_dataframe.columns.



Interestingly, df.columns.values.tolist() is almost 3 times faster than df.columns.tolist(), although I thought they were the same:

In [97]: %timeit df.columns.values.tolist()
100000 loops, best of 3: 2.97 µs per loop

In [98]: %timeit df.columns.tolist()
10000 loops, best of 3: 9.67 µs per loop


[column for column in my_dataframe]

pandas docs: Iteration over DataFrames returns column labels.
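Because iterating over a DataFrame yields its column labels, anything that consumes an iterable works on it directly. A minimal sketch, using a cut-down version of the question's DataFrame:

```python
import pandas as pd

# First two rows of the DataFrame from the question.
my_dataframe = pd.DataFrame([[1, 2, 5], [2, 3, 9]], columns=["y", "gdp", "cap"])

# Iterating over the DataFrame yields column labels, not rows.
labels = [column for column in my_dataframe]

# sorted() consumes the same iteration and returns a new, ordered list.
ordered = sorted(my_dataframe)

print(labels)
print(ordered)
```

Note that sorted() does not keep the original column order, so use it only when the order does not matter.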



In the Notebook

For data exploration in the IPython notebook, my preferred way is this:

sorted(df)

This produces an easy-to-read, alphabetically ordered list.

In a code repository

In code I find it more explicit to do

df.columns

This tells others reading your code what you are doing.



The simplest way is:

list(my_dataframe.columns)


# Collect the column labels one at a time
n = []
for i in my_dataframe.columns:
    n.append(i)
print(n)


I feel the question deserves additional explanation.

As @fixxxer noted, the answer depends on the pandas version you are using in your project, which you can check with pd.__version__.

If, like me, you are for some reason using a pandas version older than 0.16.0 (on Debian Jessie I use 0.14.1), then you need to use:

df.keys().tolist() because df.columns.tolist() is not implemented there yet.

The advantage of the keys method is that it also works in newer versions of pandas, so it is more universal.
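On a current pandas installation, a short sketch like this one can confirm that both spellings agree. The DataFrame is illustrative, and the behaviour on 0.14.1 itself is not verified here.

```python
import pandas as pd

# Check which pandas version is installed.
print(pd.__version__)

# A one-row illustrative DataFrame.
df = pd.DataFrame([[1, 2, 5]], columns=["y", "gdp", "cap"])

# For a DataFrame, keys() returns the column labels.
via_keys = df.keys().tolist()
via_columns = df.columns.tolist()

print(via_keys)
print(via_columns)
```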



As answered by Simeon Visser, you could do

list(my_dataframe.columns.values) 

or

list(my_dataframe) # for less typing.

But I think the sweet spot is:

list(my_dataframe.columns)

It is explicit without being unnecessarily long.



You can use index attributes:

import numpy as np
import pandas as pd

# Example frame with two columns of random values
df = pd.DataFrame({'col1' : np.random.randn(3), 'col2' : np.random.randn(3)},
                 index=['a', 'b', 'c'])