Python Algorithmic Trading Cookbook

DataFrame manipulation—renaming, rearranging, reversing, and slicing

After creating a DataFrame object, you can perform various operations on it. This recipe covers the following operations on DataFrame objects: renaming a column, rearranging columns, reversing the DataFrame, and slicing the DataFrame to extract a row, a column, or a subset of data.

Getting ready

Make sure the df object is available in your Python namespace. Refer to the Creating a pandas.DataFrame object recipe of this chapter to set up this object.
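If the earlier recipe is not at hand, a df of the same shape can be sketched as follows. This is only an assumption about how the object was built: the name time_series_data is hypothetical, only the first three rows of the output shown in this recipe are used, and the first column is called date because step 1 renames it to timestamp.

```python
import pandas as pd

# Hypothetical sample data: the first three rows of the output shown in
# this recipe. The 'date' column name is assumed from step 1.
time_series_data = [
    {'date': '2019-11-13 09:00:00', 'open': 71.8075, 'high': 71.8450,
     'low': 71.7775, 'close': 71.7925, 'volume': 219512},
    {'date': '2019-11-13 09:15:00', 'open': 71.7925, 'high': 71.8000,
     'low': 71.7800, 'close': 71.7925, 'volume': 59252},
    {'date': '2019-11-13 09:30:00', 'open': 71.7925, 'high': 71.8125,
     'low': 71.7600, 'close': 71.7625, 'volume': 57187},
]

# Build the DataFrame from the list of dictionaries.
df = pd.DataFrame(time_series_data)

# Parse the date strings into datetime values.
df['date'] = pd.to_datetime(df['date'])
```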

How to do it…

Execute the following steps for this recipe:

1. Rename the date column to timestamp for df. Print it:

>>> df.rename(columns={'date':'timestamp'}, inplace=True)
>>> df

We get the following output:

            timestamp    open    high     low   close volume
0 2019-11-13 09:00:00 71.8075 71.8450 71.7775 71.7925 219512
1 2019-11-13 09:15:00 71.7925 71.8000 71.7800 71.7925  59252
2 2019-11-13 09:30:00 71.7925 71.8125 71.7600 71.7625  57187
3 2019-11-13 09:45:00 71.7600 71.7650 71.7350 71.7425  43048
4 2019-11-13 10:00:00 71.7425 71.7800 71.7425 71.7775  45863
5 2019-11-13 10:15:00 71.7750 71.8225 71.7700 71.8150  42460
6 2019-11-13 10:30:00 71.8150 71.8300 71.7775 71.7800  62403
7 2019-11-13 10:45:00 71.7750 71.7875 71.7475 71.7525  34090
8 2019-11-13 11:00:00 71.7525 71.7825 71.7475 71.7625  39320
9 2019-11-13 11:15:00 71.7625 71.7925 71.7600 71.7875  20190

2. Create a new DataFrame object by rearranging the columns in df:

>>> df.reindex(columns=[
...     'volume',
...     'close',
...     'timestamp',
...     'high',
...     'open',
...     'low'
... ])

We get the following output:

  volume   close           timestamp    high    open     low
0 219512 71.7925 2019-11-13 09:00:00 71.8450 71.8075 71.7775
1  59252 71.7925 2019-11-13 09:15:00 71.8000 71.7925 71.7800
2  57187 71.7625 2019-11-13 09:30:00 71.8125 71.7925 71.7600
3  43048 71.7425 2019-11-13 09:45:00 71.7650 71.7600 71.7350
4  45863 71.7775 2019-11-13 10:00:00 71.7800 71.7425 71.7425
5  42460 71.8150 2019-11-13 10:15:00 71.8225 71.7750 71.7700
6  62403 71.7800 2019-11-13 10:30:00 71.8300 71.8150 71.7775
7  34090 71.7525 2019-11-13 10:45:00 71.7875 71.7750 71.7475
8  39320 71.7625 2019-11-13 11:00:00 71.7825 71.7525 71.7475
9  20190 71.7875 2019-11-13 11:15:00 71.7925 71.7625 71.7600

3. Create a new DataFrame object by reversing the rows in df:

>>> df[::-1]

We get the following output:

            timestamp    open    high     low   close volume
9 2019-11-13 11:15:00 71.7625 71.7925 71.7600 71.7875  20190
8 2019-11-13 11:00:00 71.7525 71.7825 71.7475 71.7625  39320
7 2019-11-13 10:45:00 71.7750 71.7875 71.7475 71.7525  34090
6 2019-11-13 10:30:00 71.8150 71.8300 71.7775 71.7800  62403
5 2019-11-13 10:15:00 71.7750 71.8225 71.7700 71.8150  42460
4 2019-11-13 10:00:00 71.7425 71.7800 71.7425 71.7775  45863
3 2019-11-13 09:45:00 71.7600 71.7650 71.7350 71.7425  43048
2 2019-11-13 09:30:00 71.7925 71.8125 71.7600 71.7625  57187
1 2019-11-13 09:15:00 71.7925 71.8000 71.7800 71.7925  59252
0 2019-11-13 09:00:00 71.8075 71.8450 71.7775 71.7925 219512

4. Extract the close column from df:

>>> df['close']

We get the following output:

0    71.7925
1    71.7925
2    71.7625
3    71.7425
4    71.7775
5    71.8150
6    71.7800
7    71.7525
8    71.7625
9    71.7875
Name: close, dtype: float64

5. Extract the first row from df:

>>> df.iloc[0]

We get the following output:

timestamp    2019-11-13 09:00:00
open                     71.8075
high                      71.845
low                      71.7775
close                    71.7925
volume                    219512
Name: 0, dtype: object

6. Extract a 2 × 2 subset containing only the first two rows and the first two columns:

>>> df.iloc[:2, :2]

We get the following output:

            timestamp    open
0 2019-11-13 09:00:00 71.8075
1 2019-11-13 09:15:00 71.7925

How it works...

Renaming: In step 1, you rename the date column to timestamp using the rename() method of pandas DataFrame. You pass the columns argument as a dictionary with the existing names to be replaced as keys and their new names as the corresponding values. You also pass the inplace argument as True so that df is modified directly. If it is not passed, the default value is False, meaning a new DataFrame would be created instead of modifying df.
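The effect of the inplace argument can be checked with a small sketch. The two-column df below is a stand-in made up for illustration, not the recipe's full object.

```python
import pandas as pd

# Stand-in DataFrame for illustration only.
df = pd.DataFrame({'date': ['2019-11-13 09:00:00'], 'close': [71.7925]})

# Without inplace=True, rename() returns a new DataFrame and leaves df as-is.
renamed = df.rename(columns={'date': 'timestamp'})
columns_before = list(df.columns)

# With inplace=True, df itself is modified and the call returns None.
result = df.rename(columns={'date': 'timestamp'}, inplace=True)
columns_after = list(df.columns)

print(columns_before)  # column names of df before the in-place rename
print(columns_after)   # column names of df after the in-place rename
```

Because the in-place call returns None, assigning its result back to df would discard the DataFrame.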

Rearranging: In step 2, you use the reindex() method to create a new DataFrame from df by rearranging its columns. You pass the columns argument with a list of column names as strings in the required order.
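One behavior of reindex() worth knowing is that a column name not present in df does not raise an error. The sketch below uses a made-up three-column df, and vwap is a hypothetical name chosen only because it is absent from it.

```python
import pandas as pd

# Stand-in DataFrame for illustration only.
df = pd.DataFrame({'timestamp': ['2019-11-13 09:00:00'],
                   'open': [71.8075],
                   'close': [71.7925]})

# reindex() returns a new DataFrame with the columns in the given order.
rearranged = df.reindex(columns=['close', 'timestamp', 'open'])

# A name that is not in df (the hypothetical 'vwap') is added as a column
# filled with NaN instead of raising an error.
with_missing = df.reindex(columns=['close', 'vwap'])

# Selecting with a list of names also reorders the columns, but it raises
# a KeyError for a name that does not exist.
selected = df[['close', 'timestamp', 'open']]
```

If a typo in a column name should fail loudly, list-based selection may be the safer choice of the two.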

Reversing: In step 3, you create a new DataFrame from df with its rows reversed by using the indexing operator with the slice [::-1]. This is similar to the way regular Python lists are reversed.
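As the output of step 3 shows, the index labels travel with the rows when a DataFrame is reversed. A minimal sketch on a made-up single-column df, including one way to renumber the rows afterwards:

```python
import pandas as pd

# Stand-in DataFrame for illustration only.
df = pd.DataFrame({'close': [71.7925, 71.7925, 71.7625]})

# Reverse the rows; df.iloc[::-1] gives the same result.
reversed_df = df[::-1]

# The original index labels stay attached to their rows.
labels = list(reversed_df.index)

# reset_index(drop=True) renumbers the reversed rows starting from 0.
renumbered = reversed_df.reset_index(drop=True)
```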

Slicing: In step 4, you extract the close column by using the indexing operator on df with the column name, close, as the index. The returned data is a pandas.Series object. You can use the iloc property on DataFrame objects to extract a row, a column, or a subset DataFrame object. In step 5, you extract the first row using iloc with 0 as the index. The returned data is a pandas.Series object. In step 6, you extract a 2 × 2 subset from df using iloc with [:2, :2] as the index. This selects the rows at positions before 2 (that is, 0 and 1) and the columns at positions before 2 (again, 0 and 1). The returned data is a pandas.DataFrame object.
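The return types described above can be verified with a short sketch. The three-row df here is a reduced stand-in built from values shown in this recipe.

```python
import pandas as pd

# Reduced stand-in DataFrame for illustration only.
df = pd.DataFrame({
    'timestamp': pd.to_datetime(['2019-11-13 09:00:00',
                                 '2019-11-13 09:15:00',
                                 '2019-11-13 09:30:00']),
    'open': [71.8075, 71.7925, 71.7925],
    'close': [71.7925, 71.7925, 71.7625],
})

column = df['close']      # pandas.Series named after the column
first_row = df.iloc[0]    # pandas.Series named after the row's index label
subset = df.iloc[:2, :2]  # pandas.DataFrame with 2 rows and 2 columns

print(type(column).__name__, type(first_row).__name__, type(subset).__name__)
```

Note that iloc selects by integer position, so it keeps working even when the index labels are not 0, 1, 2, and so on.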

For all the operations shown in this recipe where a new DataFrame object is returned, the original DataFrame object remains unchanged.

There's more...

The iloc property can also be used to extract a column from a DataFrame, as shown in the following code.

Extract the column at position 4 (the fifth column, close) from df. Observe the output:

>>> df.iloc[:, 4]

We get the following output:

0    71.7925
1    71.7925
2    71.7625
3    71.7425
4    71.7775
5    71.8150
6    71.7800
7    71.7525
8    71.7625
9    71.7875
Name: close, dtype: float64

Note that this output and the output of step 4 are identical.
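The equivalence of the two approaches can also be checked in code rather than by eye. The sketch below rebuilds a two-row df with the recipe's column order; the variable names are chosen here for illustration.

```python
import pandas as pd

# Two-row stand-in with the same column order as the recipe's df.
df = pd.DataFrame({
    'timestamp': pd.to_datetime(['2019-11-13 09:00:00',
                                 '2019-11-13 09:15:00']),
    'open': [71.8075, 71.7925],
    'high': [71.8450, 71.8000],
    'low': [71.7775, 71.7800],
    'close': [71.7925, 71.7925],
    'volume': [219512, 59252],
})

by_position = df.iloc[:, 4]  # column at integer position 4
by_name = df['close']        # column selected by name

# Series.equals() compares both the values and the index.
same = by_position.equals(by_name)

# Index.get_loc() looks up the integer position of a column name.
position = df.columns.get_loc('close')
```

Selecting by name is usually more robust, since a position such as 4 silently points at a different column once the columns are rearranged.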
