Loading data from files into a DataFrame
The pandas library provides facilities for easy retrieval of data from a variety of data sources as pandas objects. As a quick example, let's examine the ability of pandas to load data in CSV format.
This example uses a file provided with the code from this book, data/goog.csv, which contains time series financial information for the Google stock.
The following statement uses the operating system (from within Jupyter Notebook or IPython) to display the content of this file. Which command you will need to use depends on your operating system:
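In a notebook, this is commonly done with a shell escape such as !head -n 5 data/goog.csv on macOS or Linux, or !type data\goog.csv on Windows. As a portable alternative, the following minimal sketch prints the first few lines of a file using plain Python. The sample rows, the column names other than Date, and the temporary file are assumptions made so the sketch runs on its own; with the book's code you would open data/goog.csv instead.

```python
import os
import tempfile
from itertools import islice

# Made-up rows standing in for data/goog.csv (assumed columns and values).
sample = (
    "Date,Open,High,Low,Close,Volume\n"
    "2011-12-19,100.0,101.5,99.0,100.5,1000\n"
    "2011-12-20,100.5,102.0,100.0,101.5,1200\n"
    "2011-12-21,101.5,103.0,101.0,102.0,900\n"
)

# Write the sample to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write(sample)
    path = f.name

# Read only the first three lines, regardless of the operating system.
with open(path) as f:
    first_lines = [line.rstrip("\n") for line in islice(f, 3)]

for line in first_lines:
    print(line)

os.remove(path)  # clean up the temporary file
```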
This information can be easily imported into a DataFrame using the pd.read_csv() function:
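A minimal sketch of this call follows. It reads a few made-up rows from an in-memory buffer in place of data/goog.csv; the column names other than Date and all of the values are assumptions, not the real file contents.

```python
import io
import pandas as pd

# Made-up rows standing in for data/goog.csv (assumed columns and values).
csv_text = """Date,Open,High,Low,Close,Volume
2011-12-19,100.0,101.5,99.0,100.5,1000
2011-12-20,100.5,102.0,100.0,101.5,1200
2011-12-21,101.5,103.0,101.0,102.0,900
"""

# With the book's code this would be pd.read_csv("data/goog.csv").
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```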
pandas has no idea that the first column in the file is a date and has treated the contents of the date field as a string. This can be verified using the following pandas statement, which shows the type of the Date column as a string:
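One way to check this is sketched below, again with made-up rows in place of data/goog.csv (columns other than Date and all values are assumptions). How the column dtype is displayed may vary with the pandas version, so the sketch also inspects the Python type of a single value.

```python
import io
import pandas as pd

# Made-up rows standing in for data/goog.csv (assumed columns and values).
csv_text = """Date,Open,High,Low,Close,Volume
2011-12-19,100.0,101.5,99.0,100.5,1000
2011-12-20,100.5,102.0,100.0,101.5,1200
"""

df = pd.read_csv(io.StringIO(csv_text))

# Without parse_dates, each value in the Date column is plain text.
first_date = df["Date"].iloc[0]
print(df["Date"].dtype)
print(type(first_date))
```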
The parse_dates parameter of the pd.read_csv() function can be used to guide pandas on how to convert data directly into a pandas date object. The following informs pandas to convert the content of the Date column into actual Timestamp objects:
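A minimal sketch of passing parse_dates, using made-up rows in place of data/goog.csv (columns other than Date and all values are assumptions):

```python
import io
import pandas as pd

# Made-up rows standing in for data/goog.csv (assumed columns and values).
csv_text = """Date,Open,High,Low,Close,Volume
2011-12-19,100.0,101.5,99.0,100.5,1000
2011-12-20,100.5,102.0,100.0,101.5,1200
"""

# parse_dates takes a list of the columns that should be parsed as dates.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(df.dtypes)
```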
If we check whether it worked, we see that the date is a Timestamp:
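The check can be sketched as follows, with made-up rows in place of data/goog.csv (columns other than Date and all values are assumptions):

```python
import io
import pandas as pd

# Made-up rows standing in for data/goog.csv (assumed columns and values).
csv_text = """Date,Open,High,Low,Close,Volume
2011-12-19,100.0,101.5,99.0,100.5,1000
2011-12-20,100.5,102.0,100.0,101.5,1200
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# A single value from the parsed column is now a pandas Timestamp.
first_date = df["Date"].iloc[0]
print(type(first_date))
```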
Unfortunately, this has not used the date field as the index for the DataFrame. Instead, it uses the default zero-based integer index labels:
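Inspecting the index shows this. The sketch below uses made-up rows in place of data/goog.csv (columns other than Date and all values are assumptions):

```python
import io
import pandas as pd

# Made-up rows standing in for data/goog.csv (assumed columns and values).
csv_text = """Date,Open,High,Low,Close,Volume
2011-12-19,100.0,101.5,99.0,100.5,1000
2011-12-20,100.5,102.0,100.0,101.5,1200
2011-12-21,101.5,103.0,101.0,102.0,900
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Date is still an ordinary column; the row labels are integers 0, 1, 2.
print(df.index)
```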
This can be fixed using the index_col parameter of the pd.read_csv() function to specify which column in the file should be used as the index:
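A minimal sketch combining index_col with parse_dates, using made-up rows in place of data/goog.csv (columns other than Date and all values are assumptions):

```python
import io
import pandas as pd

# Made-up rows standing in for data/goog.csv (assumed columns and values).
csv_text = """Date,Open,High,Low,Close,Volume
2011-12-19,100.0,101.5,99.0,100.5,1000
2011-12-20,100.5,102.0,100.0,101.5,1200
2011-12-21,101.5,103.0,101.0,102.0,900
"""

# Parse the Date column as dates and use it as the row index.
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Date"],
    index_col="Date",
)
print(df.index)
```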
The index is now a DatetimeIndex, which lets us look up rows using dates.
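As an illustration of date-based lookup, the following sketch selects a single value by Timestamp and a range of rows by date strings. The rows are made up in place of data/goog.csv, and the column names other than Date are assumptions.

```python
import io
import pandas as pd

# Made-up rows standing in for data/goog.csv (assumed columns and values).
csv_text = """Date,Open,High,Low,Close,Volume
2011-12-19,100.0,101.5,99.0,100.5,1000
2011-12-20,100.5,102.0,100.0,101.5,1200
2011-12-21,101.5,103.0,101.0,102.0,900
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# Look up one cell by its date label and column name.
close_on_20th = df.loc[pd.Timestamp("2011-12-20"), "Close"]

# Slice a date range; label slices include both endpoints.
window = df.loc["2011-12-20":"2011-12-21"]

print(close_on_20th)
print(window)
```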