Datalab NOAA weather example.

this notebook uses data lab to look at an anomaly in the records for the weather stations in Washington state.

This must be run in Datalab.

In [7]:
import matplotlib
import numpy as np
import matplotlib.pyplot as plot
%matplotlib inline  

look for the hotest spots in washington state.

In [140]:
%%sql
SELECT
  max, (max-32)*5/9 celsius, mo, da, state, stn, name
FROM (
  SELECT
    max, mo, da, state, stn, name
  FROM
    [bigquery-public-data:noaa_gsod.gsod2015] a
  JOIN
    [bigquery-public-data:noaa_gsod.stations] b
  ON
    a.stn=b.usaf
    AND a.wban=b.wban
  WHERE
    state="WA"
    AND max<1000
    AND country='US' )
ORDER BY
  max DESC
Out[140]:
maxcelsiusmodastatestnname
113.045.00629WA727846WALLA WALLA RGNL
113.045.00628WA727846WALLA WALLA RGNL
111.944.38888888890628WA727827MOSES LAKE/GRANT CO
111.944.38888888890629WA727827MOSES LAKE/GRANT CO
111.244.00924WA720272SKAGIT RGNL
111.244.00925WA720272SKAGIT RGNL
111.043.88888888890629WA727845TRI CITIES
111.043.88888888890628WA727845TRI CITIES
111.043.88888888890627WA727845TRI CITIES
109.943.27777777780628WA727890OMAK
109.943.27777777780629WA727890OMAK
109.042.77777777780629WA727825PANGBORN MEM
109.042.77777777780627WA727846WALLA WALLA RGNL
109.042.77777777780628WA727825PANGBORN MEM
108.042.22222222220731WA727845TRI CITIES
108.042.22222222220628WA727856FELTS FLD
108.042.22222222220628WA727883BOWERS FLD
108.042.22222222220627WA727883BOWERS FLD
108.042.22222222220801WA727845TRI CITIES
108.042.22222222220628WA727810YAKIMA AIR TERMINAL
108.042.22222222220629WA727856FELTS FLD
108.042.22222222220629WA727810YAKIMA AIR TERMINAL
108.042.22222222220627WA727810YAKIMA AIR TERMINAL
107.642.00917WA720272SKAGIT RGNL
107.141.72222222220703WA727810YAKIMA AIR TERMINAL

(rows: 16492, time: 1.8s, 131MB processed, job: job_TjZACksJIvtkZHaVK7jKaS6OKCo)

Skagit Rgnl is in a very cool costal community. this is an anomaly. let's look at two other nearby stations.

In [154]:
%%sql
SELECT
  usaf, name
FROM [bigquery-public-data:noaa_gsod.stations] 
WHERE
    name="BELLINGHAM INTL" OR name="PADILLA BAY RESERVE" OR name = "SKAGIT RGNL"
Out[154]:
usafname
720272SKAGIT RGNL
998007PADILLA BAY RESERVE
720272SKAGIT RGNL
727976BELLINGHAM INTL

(rows: 4, time: 0.5s, 688KB processed, job: job_w91UGysLNNSmwhFecSolFrI4liY)

Now pull out the weather data for the year for all three stations and plot them.

In [147]:
q = "SELECT max AS temperature, \
       TIMESTAMP(STRING(year) + '-' + STRING(mo) + '-' + STRING(da)) AS timestamp \
FROM [bigquery-public-data:noaa_gsod.gsod2015] \
WHERE stn = '%s' and max <500 \
ORDER BY year DESC, mo DESC, da DESC"
In [148]:
stationlist = ['720272','727930', '727976' ]
dflist = [bq.Query(q % station).to_dataframe() for station in stationlist]
In [151]:
from pylab import rcParams
rcParams['figure.figsize'] = 20, 5
with plot.style.context('fivethirtyeight'):
  for df in dflist:
    plot.plot(df['timestamp'], df['temperature'], linewidth=2)
plot.show()  

we can clearly see the skagit station is not always functioning properly. there is clearly another anomaly in March.

In [ ]: