import datetime
import pandas as pd
import forecastio
import getpass
In this notebook, I’ll write a small illustration on downloading historical weather data using forceast.io. I’ll also illustrate handling timezone issues when using such time series data. I am going to use python-forecastio, which is a Python wrapper around forecast.io service. I’ll be downloading hourly weather data for Austin, Texas.
# Enter your API here
= getpass.getpass() api_key
········
len(api_key)
32
Austin’s Latitude and longitude
= 30.25
lat = -97.25 lng
Let us see the forecast for 1 Jan 2015
= datetime.datetime(2015,1,1) date
= forecastio.load_forecast(api_key, lat, lng, time=date, units="us") forecast
forecast
<forecastio.models.Forecast at 0x10319ce50>
= forecast.hourly() hourly
hourly.data
[<forecastio.models.ForecastioDataPoint at 0x1068643d0>,
<forecastio.models.ForecastioDataPoint at 0x106864bd0>,
<forecastio.models.ForecastioDataPoint at 0x106864ad0>,
<forecastio.models.ForecastioDataPoint at 0x106864cd0>,
<forecastio.models.ForecastioDataPoint at 0x106864fd0>,
<forecastio.models.ForecastioDataPoint at 0x106864d10>,
<forecastio.models.ForecastioDataPoint at 0x100734e10>,
<forecastio.models.ForecastioDataPoint at 0x1061e3450>,
<forecastio.models.ForecastioDataPoint at 0x1061e3350>,
<forecastio.models.ForecastioDataPoint at 0x1068b3250>,
<forecastio.models.ForecastioDataPoint at 0x1068b3110>,
<forecastio.models.ForecastioDataPoint at 0x1068b3150>,
<forecastio.models.ForecastioDataPoint at 0x1068b3190>,
<forecastio.models.ForecastioDataPoint at 0x1068b31d0>,
<forecastio.models.ForecastioDataPoint at 0x1068b3210>,
<forecastio.models.ForecastioDataPoint at 0x1068b3fd0>,
<forecastio.models.ForecastioDataPoint at 0x1068b3dd0>,
<forecastio.models.ForecastioDataPoint at 0x1068b3e10>,
<forecastio.models.ForecastioDataPoint at 0x1068b3e50>,
<forecastio.models.ForecastioDataPoint at 0x1068b3f50>,
<forecastio.models.ForecastioDataPoint at 0x1068c84d0>,
<forecastio.models.ForecastioDataPoint at 0x1068c8390>,
<forecastio.models.ForecastioDataPoint at 0x1068c8510>,
<forecastio.models.ForecastioDataPoint at 0x1068c8550>]
Extracting data for a single hour.
0].d hourly.data[
{u'apparentTemperature': 32.57,
u'dewPoint': 33.39,
u'humidity': 0.79,
u'icon': u'clear-night',
u'precipIntensity': 0,
u'precipProbability': 0,
u'pressure': 1032.61,
u'summary': u'Clear',
u'temperature': 39.46,
u'time': 1420005600,
u'visibility': 10,
u'windBearing': 21,
u'windSpeed': 10.95}
Let us say that we want to use the temperature and humidity only.
= ["temperature", "humidity"] attributes
= []
times = {}
data for attr in attributes:
= [] data[attr]
Now, let us download hourly data for 30 days staring January 1 this year.
= datetime.datetime(2015, 1, 1)
start for offset in range(1, 60):
= forecastio.load_forecast(api_key, lat, lng, time=start+datetime.timedelta(offset), units="us")
forecast = forecast.hourly()
h = h.data
d for p in d:
times.append(p.time)for attr in attributes:
data[attr].append(p.d[attr])
Now, let us create a Pandas data frame for this time series data.
= pd.DataFrame(data, index=times) df
df.head()
humidity | temperature | |
---|---|---|
2015-01-01 11:30:00 | 0.73 | 38.74 |
2015-01-01 12:30:00 | 0.74 | 38.56 |
2015-01-01 13:30:00 | 0.75 | 38.56 |
2015-01-01 14:30:00 | 0.79 | 37.97 |
2015-01-01 15:30:00 | 0.80 | 37.78 |
Now, we need to fix the timezone.
= df.tz_localize("Asia/Kolkata").tz_convert("US/Central") df
df.head()
humidity | temperature | |
---|---|---|
2015-01-01 00:00:00-06:00 | 0.73 | 38.74 |
2015-01-01 01:00:00-06:00 | 0.74 | 38.56 |
2015-01-01 02:00:00-06:00 | 0.75 | 38.56 |
2015-01-01 03:00:00-06:00 | 0.79 | 37.97 |
2015-01-01 04:00:00-06:00 | 0.80 | 37.78 |
I’ll now export this file to a CSV to use it for following demonstrations on aggregations on time series.
"weather.csv") df.to_csv(
A quick validation of our downloaded data.
%matplotlib inline
import matplotlib.pyplot as plt
'ggplot') plt.style.use(
=True); df.plot(subplots