DataFrame manipulation—renaming, rearranging, reversing, and slicing
After creating a DataFrame object, you can perform various operations on it. This recipe covers the following operations on DataFrame objects: renaming a column, rearranging columns, reversing the DataFrame, and slicing the DataFrame to extract a row, a column, or a subset of the data.
Getting ready
Make sure the df object is available in your Python namespace. Refer to the Creating a pandas.DataFrame object recipe of this chapter to set up this object.
How to do it…
Execute the following steps for this recipe:
- Rename the date column of df to timestamp and print df:
>>> df.rename(columns={'date':'timestamp'}, inplace=True)
>>> df
We get the following output:
timestamp open high low close volume
0 2019-11-13 09:00:00 71.8075 71.8450 71.7775 71.7925 219512
1 2019-11-13 09:15:00 71.7925 71.8000 71.7800 71.7925 59252
2 2019-11-13 09:30:00 71.7925 71.8125 71.7600 71.7625 57187
3 2019-11-13 09:45:00 71.7600 71.7650 71.7350 71.7425 43048
4 2019-11-13 10:00:00 71.7425 71.7800 71.7425 71.7775 45863
5 2019-11-13 10:15:00 71.7750 71.8225 71.7700 71.8150 42460
6 2019-11-13 10:30:00 71.8150 71.8300 71.7775 71.7800 62403
7 2019-11-13 10:45:00 71.7750 71.7875 71.7475 71.7525 34090
8 2019-11-13 11:00:00 71.7525 71.7825 71.7475 71.7625 39320
9 2019-11-13 11:15:00 71.7625 71.7925 71.7600 71.7875 20190
- Create a new DataFrame object by rearranging the columns in df:
>>> df.reindex(columns=[
...     'volume',
...     'close',
...     'timestamp',
...     'high',
...     'open',
...     'low'
... ])
We get the following output:
volume close timestamp high open low
0 219512 71.7925 2019-11-13 09:00:00 71.8450 71.8075 71.7775
1 59252 71.7925 2019-11-13 09:15:00 71.8000 71.7925 71.7800
2 57187 71.7625 2019-11-13 09:30:00 71.8125 71.7925 71.7600
3 43048 71.7425 2019-11-13 09:45:00 71.7650 71.7600 71.7350
4 45863 71.7775 2019-11-13 10:00:00 71.7800 71.7425 71.7425
5 42460 71.8150 2019-11-13 10:15:00 71.8225 71.7750 71.7700
6 62403 71.7800 2019-11-13 10:30:00 71.8300 71.8150 71.7775
7 34090 71.7525 2019-11-13 10:45:00 71.7875 71.7750 71.7475
8 39320 71.7625 2019-11-13 11:00:00 71.7825 71.7525 71.7475
9 20190 71.7875 2019-11-13 11:15:00 71.7925 71.7625 71.7600
- Create a new DataFrame object by reversing the rows in df:
>>> df[::-1]
We get the following output:
timestamp open high low close volume
9 2019-11-13 11:15:00 71.7625 71.7925 71.7600 71.7875 20190
8 2019-11-13 11:00:00 71.7525 71.7825 71.7475 71.7625 39320
7 2019-11-13 10:45:00 71.7750 71.7875 71.7475 71.7525 34090
6 2019-11-13 10:30:00 71.8150 71.8300 71.7775 71.7800 62403
5 2019-11-13 10:15:00 71.7750 71.8225 71.7700 71.8150 42460
4 2019-11-13 10:00:00 71.7425 71.7800 71.7425 71.7775 45863
3 2019-11-13 09:45:00 71.7600 71.7650 71.7350 71.7425 43048
2 2019-11-13 09:30:00 71.7925 71.8125 71.7600 71.7625 57187
1 2019-11-13 09:15:00 71.7925 71.8000 71.7800 71.7925 59252
0 2019-11-13 09:00:00 71.8075 71.8450 71.7775 71.7925 219512
- Extract the close column from df:
>>> df['close']
We get the following output:
0 71.7925
1 71.7925
2 71.7625
3 71.7425
4 71.7775
5 71.8150
6 71.7800
7 71.7525
8 71.7625
9 71.7875
Name: close, dtype: float64
- Extract the first row from df:
>>> df.iloc[0]
We get the following output:
timestamp 2019-11-13 09:00:00
open 71.8075
high 71.845
low 71.7775
close 71.7925
volume 219512
Name: 0, dtype: object
- Extract a 2 × 2 matrix with the first two rows and first two columns only:
>>> df.iloc[:2, :2]
We get the following output:
timestamp open
0 2019-11-13 09:00:00 71.8075
1 2019-11-13 09:15:00 71.7925
How it works...
Renaming: In step 1, you rename the date column to timestamp using the rename() method of pandas DataFrame. You pass the columns argument as a dictionary with the existing names to be replaced as keys and their new names as the corresponding values. You also pass the inplace argument as True so that df is modified directly. If it is not passed, the default value is False, meaning a new DataFrame would be created instead of modifying df.
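To see what the inplace argument changes, the following minimal sketch calls rename() without it. The sample object is a hypothetical two-row stand-in for df as it looks before step 1, built from values in the output above, and no_such_column is a made-up name used only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the recipe's df, before step 1 is applied.
sample = pd.DataFrame({
    'date': ['2019-11-13 09:00:00', '2019-11-13 09:15:00'],
    'open': [71.8075, 71.7925],
    'close': [71.7925, 71.7925],
})

# inplace defaults to False, so rename() returns a new DataFrame
# and leaves sample as it was.
renamed = sample.rename(columns={'date': 'timestamp'})

original_columns = list(sample.columns)   # still contains 'date'
renamed_columns = list(renamed.columns)   # contains 'timestamp' instead

# By default, keys that match no existing column are silently ignored.
unchanged = sample.rename(columns={'no_such_column': 'x'})
unchanged_columns = list(unchanged.columns)

print(original_columns)
print(renamed_columns)
print(unchanged_columns)
```

Assigning the result to a new name, as done here, is an alternative to inplace=True when you want to keep the original object around.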
Rearranging: In step 2, you use the reindex() method to create a new DataFrame from df by rearranging its columns. You pass the columns argument with a list of column names as strings in the required order.
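The following sketch illustrates reindex() on a small, hypothetical stand-in for df (three columns, two rows taken from the output above). It also shows one behavior worth knowing: a name in the list that is not a column of the DataFrame does not raise an error, but produces a column filled with NaN. The vwap name is made up for this illustration.

```python
import pandas as pd

# Hypothetical stand-in for the recipe's df.
sample = pd.DataFrame({
    'open': [71.8075, 71.7925],
    'close': [71.7925, 71.7925],
    'volume': [219512, 59252],
})

# The new DataFrame has the columns in the order given in the list.
reordered = sample.reindex(columns=['volume', 'close', 'open'])
reordered_columns = list(reordered.columns)

# 'vwap' is not a column of sample: reindex() adds it, filled with NaN.
with_missing = sample.reindex(columns=['volume', 'vwap'])
vwap_is_all_nan = bool(with_missing['vwap'].isna().all())

# sample itself keeps its original column order.
sample_columns = list(sample.columns)

print(reordered_columns)
print(vwap_is_all_nan)
print(sample_columns)
```

Because a misspelled name silently yields an empty column, it can be worth checking the column names of the result after rearranging.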
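Reversing: In step 3, you create a new DataFrame from df with its rows reversed by using the indexing operator with the slice [::-1]. This is similar to the way you reverse regular Python lists. As the output of step 3 shows, each row keeps its original index label, so the index of the new DataFrame runs from 9 down to 0.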
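The comparison with Python lists can be sketched as follows, using a hypothetical single-column stand-in for df. The sketch also shows one way to renumber the rows after reversing, using reset_index() with drop=True; the recipe itself does not do this, so treat it as an optional extra.

```python
import pandas as pd

# A plain Python list reversed with the same slice syntax.
prices = [71.7925, 71.7925, 71.7625]
reversed_list = prices[::-1]

# Hypothetical stand-in for the recipe's df.
sample = pd.DataFrame({'close': prices})

# Rows are reversed; each row keeps its original index label.
reversed_sample = sample[::-1]
reversed_labels = list(reversed_sample.index)
reversed_values = list(reversed_sample['close'])

# Optional: discard the old labels and renumber the rows from 0.
renumbered = reversed_sample.reset_index(drop=True)
renumbered_labels = list(renumbered.index)

print(reversed_labels)
print(reversed_values == reversed_list)
print(renumbered_labels)
```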
Slicing: In step 4, you extract the close column by using the indexing operator on df. You pass the column name, close, as the index here. The return data is a pandas.Series object. You can use the iloc property on DataFrame objects to extract a row, a column, or a subset DataFrame object. In step 5, you extract the first row using iloc with 0 as the index. The return data is a pandas.Series object. In step 6, you extract a 2 × 2 subset from df using iloc with :2, :2 as the index. As with Python slices, the end position is excluded, so this extracts the rows at positions 0 and 1 and the columns at positions 0 and 1. The return data is a pandas.DataFrame object.
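The return types described above can be checked with a short sketch. The sample object is a hypothetical three-row, three-column stand-in for df, built from values in the outputs above.

```python
import pandas as pd

# Hypothetical stand-in for the recipe's df.
sample = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2019-11-13 09:00:00',
        '2019-11-13 09:15:00',
        '2019-11-13 09:30:00',
    ]),
    'open': [71.8075, 71.7925, 71.7925],
    'close': [71.7925, 71.7925, 71.7625],
})

column = sample['close']       # one column, selected by name
first_row = sample.iloc[0]     # one row, selected by position
subset = sample.iloc[:2, :2]   # rows 0-1 and columns 0-1, by position

print(type(column).__name__)     # Series
print(type(first_row).__name__)  # Series
print(type(subset).__name__)     # DataFrame

# A row Series is named after the index label of the row it came from.
first_row_name = first_row.name
subset_shape = subset.shape
subset_columns = list(subset.columns)
```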
For all the operations shown in this recipe where a new DataFrame object is returned, the original DataFrame object remains unchanged.
There's more...
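The iloc property can also be used to extract a column from a DataFrame. Note that iloc is indexed with square brackets, not called like a method. This is shown in the following code.
Extract the column at position 4 from df. Because positions start at 0, this is the fifth column, close. Observe the output: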
>>> df.iloc[:, 4]
We get the following output:
0 71.7925
1 71.7925
2 71.7625
3 71.7425
4 71.7775
5 71.8150
6 71.7800
7 71.7525
8 71.7625
9 71.7875
Name: close, dtype: float64
Note that this output and the output of step 4 are identical.
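As a quick check of this equivalence, the following sketch compares the two selections with the equals() method of pandas.Series. The sample object is a hypothetical two-row stand-in for df with the same column order. The sketch also uses the get_loc() method of the column index to look up the position of close, which is one way to avoid hard-coding the number 4.

```python
import pandas as pd

# Hypothetical stand-in for the recipe's df, with the same column order.
sample = pd.DataFrame({
    'timestamp': ['2019-11-13 09:00:00', '2019-11-13 09:15:00'],
    'open': [71.8075, 71.7925],
    'high': [71.8450, 71.8000],
    'low': [71.7775, 71.7800],
    'close': [71.7925, 71.7925],
    'volume': [219512, 59252],
})

by_position = sample.iloc[:, 4]   # all rows, column at position 4
by_name = sample['close']         # the same column, selected by name

# equals() compares the values and the index of the two Series.
same = by_position.equals(by_name)

# Look up the position of a column from its name.
close_position = sample.columns.get_loc('close')

print(same)
print(close_position)
```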