In this post, we will learn:
- How to import data into Python
- How to import time series data
- How to handle different time series formats while importing
A) Importing Normal Data
Suppose you have a data file saved in CSV format on your computer. How do you import it into Python? For this example, I saved a data set on my computer under the name datasheet.csv.
To build the file path, take the location of the folder that contains the file and add the filename at the end (datasheet.csv in this case).
Change all the backslashes '\' to forward slashes '/'.
Provide a name under which the imported data set is stored in Python (I used the name newdata).
To confirm that the data has been imported, use newdata.head() to display the first five observations. The first column of the output (before the Age column) is the index column (0, 1, 2, 3...).
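The steps above can be sketched as follows. This is only an illustration: the column names other than Age, the sample values, and the temporary folder are my own assumptions, used so that the example runs on any machine. On your computer you would pass your own path to pd.read_csv.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for datasheet.csv (values are made up).
csv_text = (
    "Age,Height,Weight\n"
    "23,170,65\n"
    "31,165,58\n"
    "45,180,82\n"
    "52,175,77\n"
    "38,160,54\n"
    "29,172,70\n"
)

# Write the sample file to a temporary folder so the example is self-contained.
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, "datasheet.csv")

# Change backslashes to forward slashes (only matters on Windows paths).
file_path = file_path.replace("\\", "/")

with open(file_path, "w") as f:
    f.write(csv_text)

# Import the file and store it under the name newdata.
newdata = pd.read_csv(file_path)

# Display the first five observations; the leftmost column is the index.
print(newdata.head())
```

Because no index column was requested, pandas adds its own 0, 1, 2, ... index in front of the Age column.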
B) Importing Time Series Data
1) When 'time' data is in a single column in mm-dd-yyyy format
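If I have the time series data file in CSV format, how do I import it?
Use pd.read_csv with the options parse_dates = True and index_col = 0.
index_col = 0 means treat the first column as the index, and parse_dates = True asks pandas to try reading that index as dates.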
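Here is a minimal sketch of such an import. The column names Date and Sales and the values are hypothetical, and the CSV text is read from memory with io.StringIO instead of from a file so that the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical time series in mm-dd-yyyy format (values are made up).
csv_text = (
    "Date,Sales\n"
    "01-15-2020,100\n"
    "02-20-2020,120\n"
    "03-25-2020,90\n"
)

# index_col=0 -> use the first column (Date) as the index
# parse_dates=True -> try to parse that index as dates
tsdata = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col=0)

print(tsdata)
```

With a real file, replace io.StringIO(csv_text) with the file path.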
To check whether the dates were read as datetime values, inspect the index of the imported data. If the import worked, the index data type is reported as dtype='datetime64[ns]'.
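One way to run this check is sketched below, again with hypothetical Date and Sales columns. Besides printing the index, it uses pd.api.types.is_datetime64_any_dtype, which is my own choice of helper and not necessarily what the original code used. The exact unit shown in the dtype (such as ns) may vary with the pandas version.

```python
import io

import pandas as pd

# Hypothetical time series in mm-dd-yyyy format (values are made up).
csv_text = (
    "Date,Sales\n"
    "01-15-2020,100\n"
    "02-20-2020,120\n"
    "03-25-2020,90\n"
)

tsdata = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col=0)

# Printing the index shows its type and dtype.
print(tsdata.index)

# A programmatic check: True when the index holds datetime values.
is_datetime = pd.api.types.is_datetime64_any_dtype(tsdata.index)
print(is_datetime)
```

If the check is False, the dates were most likely left as plain text, which usually means the format was not recognized.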
2) When 'time' data is in a single column in dd-mm-yyyy format
If the dates are in dd-mm-yyyy format, add the dayfirst = True option so that pandas reads a value such as 05-02-2020 as 5 February rather than 2 May.
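A short sketch of the dayfirst option, using hypothetical Date and Sales columns and made-up values. The first date is deliberately ambiguous so that the effect of the option is visible.

```python
import io

import pandas as pd

# Hypothetical time series in dd-mm-yyyy format (values are made up).
csv_text = (
    "Date,Sales\n"
    "05-02-2020,100\n"
    "15-02-2020,120\n"
    "25-02-2020,90\n"
)

# dayfirst=True tells pandas that the day comes before the month.
tsdata = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=True,
    index_col=0,
    dayfirst=True,
)

print(tsdata)

# The first observation is 5 February 2020, so its month is 2.
print(tsdata.index[0].month)
```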
3) When 'time' data is in multiple columns
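What if the time data is spread over separate columns: day in one column, month in another, and year in a third?
Provide the column names or column numbers as a nested list (for example, parse_dates = [[0, 1, 2]]) and parse_dates will combine the columns into a single date column. Recent pandas releases (2.2 and later) deprecate this combining behavior, so on those versions build the date with pd.to_datetime after reading the file instead.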
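The sketch below combines separate day, month, and year columns into one date index. Note that it swaps in a different technique from the one described above: it uses pd.to_datetime after reading the file, because column combining through parse_dates is deprecated in newer pandas versions. The column names (day, month, year, Sales) and the values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical data with the date split across three columns (values are made up).
csv_text = (
    "day,month,year,Sales\n"
    "15,1,2020,100\n"
    "20,2,2020,120\n"
    "25,3,2020,90\n"
)

raw = pd.read_csv(io.StringIO(csv_text))

# pd.to_datetime can assemble dates from columns named year, month and day.
raw["Date"] = pd.to_datetime(raw[["year", "month", "day"]])

# Drop the separate pieces and use the combined date as the index.
tsdata = raw.drop(columns=["day", "month", "year"]).set_index("Date")

print(tsdata)
```

If your columns have other names, rename them to year, month, and day first (for example with DataFrame.rename) before calling pd.to_datetime.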
Summary
Do you have any questions? I will try to answer them to the best of my abilities.