Python Algorithmic Trading Cookbook

Creating a DataFrame from other formats

In this recipe, you will create DataFrame objects from other formats, such as CSV files, JSON strings, and pickle files. A CSV file created with a spreadsheet application, valid JSON data received from a web API, or a valid pickle object received over a socket can each be converted to a DataFrame object and then processed further in Python.

Loading pickled data received from untrusted sources can be unsafe. Please use read_pickle() with caution. You can find more details here: https://docs.python.org/3/library/pickle.html. If you are using this function on the pickle file created in the previous recipe, it is perfectly safe to use read_pickle().
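To make the warning concrete, here is a minimal sketch using only the standard pickle module. The Demo class is hypothetical and exists only for illustration. Its __reduce__() method tells pickle which callable to invoke when the data is loaded; the harmless print() used here could just as easily be something destructive in a file from an untrusted source.

```python
import pickle


class Demo:
    """Hypothetical class used only to illustrate the risk."""

    def __reduce__(self):
        # pickle stores "call print(...) with this argument" and performs
        # that call when the payload is loaded.
        return (print, ('This line ran while unpickling',))


payload = pickle.dumps(Demo())

# Loading the payload invokes print(); the "object" you get back is
# simply whatever that call returned.
result = pickle.loads(payload)
```

Because pandas.read_pickle() relies on the same mechanism, the same caution applies to it.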

Getting ready

Make sure you have followed the previous recipe before starting this recipe.

How to do it…

Execute the following steps for this recipe:

1. Create a DataFrame object by reading a CSV file:

>>> import pandas
>>> pandas.read_csv('dataframe.csv')

We get the following output:

            timestamp    open    high     low   close volume
0 2019-11-13 09:00:00 71.8075 71.8450 71.7775 71.7925 219512
1 2019-11-13 09:15:00 71.7925 71.8000 71.7800 71.7925  59252
2 2019-11-13 09:30:00 71.7925 71.8125 71.7600 71.7625  57187
3 2019-11-13 09:45:00 71.7600 71.7650 71.7350 71.7425  43048
4 2019-11-13 10:00:00 71.7425 71.7800 71.7425 71.7775  45863
5 2019-11-13 10:15:00 71.7750 71.8225 71.7700 71.8150  42460
6 2019-11-13 10:30:00 71.8150 71.8300 71.7775 71.7800  62403
7 2019-11-13 10:45:00 71.7750 71.7875 71.7475 71.7525  34090
8 2019-11-13 11:00:00 71.7525 71.7825 71.7475 71.7625  39320
9 2019-11-13 11:15:00 71.7625 71.7925 71.7600 71.7875  20190

2. Create a DataFrame object by reading a JSON string:

>>> pandas.read_json("""{
        "timestamp": {
            "0":"13-11-2019 09:00:00", "1":"13-11-2019 09:15:00", 
            "2":"13-11-2019 09:30:00","3":"13-11-2019 09:45:00", 
            "4":"13-11-2019 10:00:00","5":"13-11-2019 10:15:00",
            "6":"13-11-2019 10:30:00","7":"13-11-2019 10:45:00",
            "8":"13-11-2019 11:00:00","9":"13-11-2019 11:15:00"},

        "open":{
            "0":71.8075,"1":71.7925,"2":71.7925,"3":71.76,
            "4":71.7425,"5":71.775,"6":71.815,"7":71.775,
            "8":71.7525,"9":71.7625},

        "high":{
            "0":71.845,"1":71.8,"2":71.8125,"3":71.765,"4":71.78,
            "5":71.8225,"6":71.83,"7":71.7875,"8":71.7825,
            "9":71.7925},

        "low":{
            "0":71.7775,"1":71.78,"2":71.76,"3":71.735,"4":71.7425,
            "5":71.77,"6":71.7775,"7":71.7475,"8":71.7475,
            "9":71.76},

        "close":{
            "0":71.7925,"1":71.7925,"2":71.7625,"3":71.7425,
            "4":71.7775,"5":71.815,"6":71.78,"7":71.7525,
            "8":71.7625,"9":71.7875},

        "volume":{
            "0":219512,"1":59252,"2":57187,"3":43048,"4":45863,
            "5":42460,"6":62403,"7":34090,"8":39320,"9":20190}}
            """)

We get the following output:

            timestamp    open    high     low   close volume
0 2019-11-13 09:00:00 71.8075 71.8450 71.7775 71.7925 219512
1 2019-11-13 09:15:00 71.7925 71.8000 71.7800 71.7925  59252
2 2019-11-13 09:30:00 71.7925 71.8125 71.7600 71.7625  57187
3 2019-11-13 09:45:00 71.7600 71.7650 71.7350 71.7425  43048
4 2019-11-13 10:00:00 71.7425 71.7800 71.7425 71.7775  45863
5 2019-11-13 10:15:00 71.7750 71.8225 71.7700 71.8150  42460
6 2019-11-13 10:30:00 71.8150 71.8300 71.7775 71.7800  62403
7 2019-11-13 10:45:00 71.7750 71.7875 71.7475 71.7525  34090
8 2019-11-13 11:00:00 71.7525 71.7825 71.7475 71.7625  39320
9 2019-11-13 11:15:00 71.7625 71.7925 71.7600 71.7875  20190

3. Create a DataFrame object by unpickling the df.pickle file:

>>> pandas.read_pickle('df.pickle')

We get the following output:

            timestamp    open    high     low   close volume
0 2019-11-13 09:00:00 71.8075 71.8450 71.7775 71.7925 219512
1 2019-11-13 09:15:00 71.7925 71.8000 71.7800 71.7925  59252
2 2019-11-13 09:30:00 71.7925 71.8125 71.7600 71.7625  57187
3 2019-11-13 09:45:00 71.7600 71.7650 71.7350 71.7425  43048
4 2019-11-13 10:00:00 71.7425 71.7800 71.7425 71.7775  45863
5 2019-11-13 10:15:00 71.7750 71.8225 71.7700 71.8150  42460
6 2019-11-13 10:30:00 71.8150 71.8300 71.7775 71.7800  62403
7 2019-11-13 10:45:00 71.7750 71.7875 71.7475 71.7525  34090
8 2019-11-13 11:00:00 71.7525 71.7825 71.7475 71.7625  39320
9 2019-11-13 11:15:00 71.7625 71.7925 71.7600 71.7875  20190

How it works…

In step 1, you use the pandas.read_csv() function to create a DataFrame object from a .csv file. You pass dataframe.csv, the path of the .csv file to read, as an argument. Recall that you created dataframe.csv in step 1 of the previous recipe.
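The following is a minimal, self-contained sketch of the same call. It assumes that dataframe.csv has a header row and no index column; since the file itself is not available here, an io.StringIO object holding three rows from the recipe's output stands in for it. The parse_dates argument is an optional addition, not part of the recipe, that asks pandas to convert the timestamp column to datetime values instead of leaving it as text.

```python
import io

import pandas

# Three rows copied from the recipe's output, laid out as CSV text.
# This layout (header row, no index column) is an assumption.
csv_text = """timestamp,open,high,low,close,volume
2019-11-13 09:00:00,71.8075,71.8450,71.7775,71.7925,219512
2019-11-13 09:15:00,71.7925,71.8000,71.7800,71.7925,59252
2019-11-13 09:30:00,71.7925,71.8125,71.7600,71.7625,57187
"""

# read_csv() accepts a file path or a file-like object; StringIO is used
# here in place of 'dataframe.csv'.
df_csv = pandas.read_csv(io.StringIO(csv_text), parse_dates=['timestamp'])

# Inspect the column types chosen by pandas.
print(df_csv.dtypes)
```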

In step 2, you use the pandas.read_json() function to create a DataFrame object from a valid JSON string. You pass the JSON string from the output of step 2 in the previous recipe as an argument to this function.
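Recent pandas releases emit a FutureWarning when a literal JSON string is passed directly to read_json(), and suggest wrapping the string in io.StringIO instead. The sketch below shows that variant on a three-row subset of the recipe's data. Turning off convert_dates and parsing the day-first timestamps with an explicit format is a choice made for this sketch, not something the recipe does.

```python
import io

import pandas

# A three-row subset of the recipe's JSON, in the same column-oriented layout.
json_text = """{
    "timestamp": {"0": "13-11-2019 09:00:00", "1": "13-11-2019 09:15:00",
                  "2": "13-11-2019 09:30:00"},
    "open": {"0": 71.8075, "1": 71.7925, "2": 71.7925},
    "high": {"0": 71.845, "1": 71.8, "2": 71.8125},
    "low": {"0": 71.7775, "1": 71.78, "2": 71.76},
    "close": {"0": 71.7925, "1": 71.7925, "2": 71.7625},
    "volume": {"0": 219512, "1": 59252, "2": 57187}
}"""

# Wrap the string in a file-like object and keep the timestamps as text.
df_json = pandas.read_json(io.StringIO(json_text), convert_dates=False)

# Parse the day-first timestamps explicitly so the format is never guessed.
df_json['timestamp'] = pandas.to_datetime(
    df_json['timestamp'], format='%d-%m-%Y %H:%M:%S')

print(df_json)
```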

In step 3, you use the pandas.read_pickle() function to create a DataFrame object from a pickle file. You pass df.pickle, the path of the pickle file to read, as an argument to this function. Recall that you created df.pickle in step 3 of the previous recipe.
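As a small sketch of the round trip, the example below pickles a DataFrame to a temporary directory and reads it back. The two-row DataFrame is a stand-in built for this sketch; it is not the df object from the previous recipe.

```python
import os
import tempfile

import pandas

# A small stand-in DataFrame (not the recipe's df).
df = pandas.DataFrame({
    'timestamp': pandas.to_datetime(['2019-11-13 09:00:00',
                                     '2019-11-13 09:15:00']),
    'close': [71.7925, 71.7925],
    'volume': [219512, 59252],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'df.pickle')
    df.to_pickle(path)                  # write the pickle file
    restored = pandas.read_pickle(path) # read it back (trusted, self-made file)

# Values and column types both survive the round trip.
print(restored.equals(df))
```

Unlike CSV, which stores everything as text, a pickle file keeps the column types, so the timestamp column comes back as datetime values without any extra arguments.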

If you have followed the previous recipe, all three steps output the same data, and that data matches df from the previous recipe.

The functions read_csv(), read_json(), and read_pickle() accept many more optional arguments than the ones shown in this recipe. Refer to the official pandas documentation for complete information on these functions.
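As one illustration of such optional arguments, the sketch below passes a few well-known read_csv() parameters: usecols, index_col, parse_dates, and nrows. The CSV text is a three-row stand-in for dataframe.csv, assumed to have a header row and no index column.

```python
import io

import pandas

# Stand-in for dataframe.csv (assumed layout: header row, no index column).
csv_text = """timestamp,open,high,low,close,volume
2019-11-13 09:00:00,71.8075,71.8450,71.7775,71.7925,219512
2019-11-13 09:15:00,71.7925,71.8000,71.7800,71.7925,59252
2019-11-13 09:30:00,71.7925,71.8125,71.7600,71.7625,57187
"""

df_subset = pandas.read_csv(
    io.StringIO(csv_text),
    usecols=['timestamp', 'close', 'volume'],  # load only these columns
    index_col='timestamp',                     # use timestamp as the index
    parse_dates=True,                          # parse the index as datetimes
    nrows=2,                                   # read only the first two rows
)

print(df_subset)
```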