This vignette explores the ws_monitor data model used throughout the PWFSLSmoke package to store and work with monitoring data.

The PWFSLSmoke package is designed to provide a compact, full-featured suite of utilities for working with PM 2.5 data used to monitor wildfire smoke. A uniform data model provides consistent data access across monitoring data available from different agencies. The core data model in this package is defined by the ws_monitor object used to store data associated with groups of individual monitors.

To work efficiently with the package it is important to understand the structure of this data object and which functions operate on it. Package functions whose names begin with monitor_ expect objects of class ws_monitor as their first argument. (The 'ws_' prefix stands for 'wildfire smoke'.)

Monitoring Data

Monitoring data will typically be obtained from an agency charged with archiving data acquired at monitoring sites. For wildfire smoke, the primary pollutant is PM 2.5, and the sources archiving this data include AirNow, WRCC and AIRSIS.

The data model for monitoring data consists of an R list with two dataframes: data and meta.

The data dataframe contains all hourly measurements organized with rows (the 'unlimited' dimension) as unique timesteps and columns as unique monitors. The very first column is always named datetime and contains the POSIXct datetime in Coordinated Universal Time (UTC).
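
As a rough illustration of this layout, the sketch below builds a tiny wide-format dataframe in base R. The monitor IDs and PM 2.5 values are invented, and "America/Los_Angeles" is only an assumed local timezone; nothing here comes from package data or package functions.

```r
# Toy 'data' dataframe: one row per hourly timestep, one column per monitor.
# The monitor IDs and values are made up for illustration.
datetime <- seq(
  from = as.POSIXct("2015-08-01 00:00:00", tz = "UTC"),
  by = "hour",
  length.out = 4
)
toy_data <- data.frame(
  datetime = datetime,
  "000000001" = c(5.1, 6.3, NA, 7.0),
  "000000002" = c(12.4, 11.9, 13.2, 14.8),
  check.names = FALSE  # keep the numeric-looking IDs as column names
)

# The time axis is stored in UTC
attr(toy_data$datetime, "tzone")

# Display the same instants in an assumed local timezone
format(toy_data$datetime, tz = "America/Los_Angeles", usetz = TRUE)
```

Keeping the time axis in UTC means monitors from different timezones can share the same rows; local times are only needed for display or for daily summaries.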

The meta dataframe contains all metadata associated with monitoring sites and is organized with rows as unique sites and columns as site attributes. The following columns are guaranteed to exist in the meta dataframe:

monitorID: unique ID for each monitor
longitude: decimal degrees E
latitude: decimal degrees N
elevation: height above sea level in meters
timezone: Olson timezone
countryCode: ISO 3166-1 alpha-2
stateCode: ISO 3166-2 alpha-2

(The MazamaSpatialUtils package is used to assign timezones and state and country codes.)

Additional columns may be available in the meta dataframe and these will depend on the source of the data.

It is important to note that the monitorID acts as a unique key that connects data with metadata. The monitorID is used for column names in the data dataframe and for row names in the meta dataframe. So the following will always be true:

rownames(ws_monitor$meta) == ws_monitor$meta$monitorID
colnames(ws_monitor$data) == c('datetime',ws_monitor$meta$monitorID)
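
To make the key relationship concrete, here is a minimal base R sketch that assembles a list with the same two-dataframe layout and checks both conditions. The names toy_meta, toy_data and toy_monitor, along with all IDs, coordinates and values, are invented for illustration. The list is deliberately not given the ws_monitor class, since the package may require more metadata columns than this toy example has.

```r
# Build a toy list with the same two-dataframe layout.
# All IDs, coordinates and values are invented for illustration.
ids <- c("000000001", "000000002")

toy_meta <- data.frame(
  monitorID = ids,
  longitude = c(-117.4, -122.3),
  latitude = c(47.7, 47.6),
  stringsAsFactors = FALSE
)
rownames(toy_meta) <- toy_meta$monitorID  # monitorID doubles as the row name

toy_data <- data.frame(
  datetime = seq(as.POSIXct("2015-08-01", tz = "UTC"), by = "hour", length.out = 3)
)
toy_data[[ids[1]]] <- c(5.1, 6.3, 7.0)    # one column per monitorID
toy_data[[ids[2]]] <- c(12.4, 11.9, 13.2)

toy_monitor <- list(meta = toy_meta, data = toy_data)

# Both checks should return TRUE
all(rownames(toy_monitor$meta) == toy_monitor$meta$monitorID)
all(colnames(toy_monitor$data) == c("datetime", toy_monitor$meta$monitorID))
```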

Example 1: Exploring ws_monitor Objects

We will use the built-in “Northwest_Megafires” dataset and the monitor_subset() function to subset a ws_monitor object which we can then explore.

library(PWFSLSmoke)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## Loading required package: maps
## Loading required package: MazamaSpatialUtils
## Loading required package: sp
# Get some AirNow data for Washington state in the summer of 2015
N_M <- monitor_subset(Northwest_Megafires, tlim=c(20150801,20150831))
# To work with AirNow data directly, uncomment the next line
#N_M <- airnow_load(startdate=20150801, enddate=20150831)
WA <- monitor_subset(N_M, stateCodes='WA')

# 'ws_monitor' objects can be identified by their class
class(WA)
## [1] "ws_monitor" "list"
# Examine the 'meta' dataframe
dim(WA$meta)
## [1] 55 24
rownames(WA$meta)
##  [1] "530330017" "530330080" "530050002" "530330024" "530330057"
##  [6] "530332004" "530530029" "530530031" "530610005" "530611007"
## [11] "530630047" "530670013" "530531018" "530272002" "530310003"
## [16] "530730015" "530251002" "530650004" "530010003" "530750006"
## [21] "530750003" "530331011" "530210002" "530330037" "530710005"
## [26] "530750005" "530150015" "530470009" "530370002" "530090013"
## [31] "530610020" "530070010" "530770015" "530650002" "530470010"
## [36] "530770009" "530570015" "530130002" "530030004" "530110022"
## [41] "530579999" "530639997" "530299999" "530639996" "530410004"
## [46] "530770016" "530090015" "530450007" "530470013" "530570011"
## [51] "530350007" "530070011" "530330030" "530110024" "530090017"
colnames(WA$meta)
##  [1] "AQSID"          "siteCode"       "siteName"       "status"        
##  [5] "agencyID"       "agencyName"     "EPARegion"      "latitude"      
##  [9] "longitude"      "elevation"      "GMTOffsetHours" "countryCode"   
## [13] "FIPSCMSACode"   "CMSAName"       "FIPSMSACode"    "MSAName"       
## [17] "FIPSStateCode"  "stateCode"      "GNISCountyCode" "countyName"    
## [21] "GNISCityCode"   "cityName"       "timezone"       "monitorID"
# Examine the 'data' dataframe
dim(WA$data)
## [1] 721  56
colnames(WA$data)
##  [1] "datetime"  "530330017" "530330080" "530050002" "530330024"
##  [6] "530330057" "530332004" "530530029" "530530031" "530610005"
## [11] "530611007" "530630047" "530670013" "530531018" "530272002"
## [16] "530310003" "530730015" "530251002" "530650004" "530010003"
## [21] "530750006" "530750003" "530331011" "530210002" "530330037"
## [26] "530710005" "530750005" "530150015" "530470009" "530370002"
## [31] "530090013" "530610020" "530070010" "530770015" "530650002"
## [36] "530470010" "530770009" "530570015" "530130002" "530030004"
## [41] "530110022" "530579999" "530639997" "530299999" "530639996"
## [46] "530410004" "530770016" "530090015" "530450007" "530470013"
## [51] "530570011" "530350007" "530070011" "530330030" "530110024"
## [56] "530090017"
# This should always be true
all(rownames(WA$meta) == colnames(WA$data[,-1]))
## [1] TRUE
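
Because monitorID links the two dataframes, the wide data dataframe can be reshaped into a long table and joined with site attributes when a tool expects one row per observation. The sketch below does this in base R on a toy object; the object toy, the column name pm25 and all values are assumptions made for illustration, not package conventions.

```r
# Toy object with the two-dataframe layout; every ID and value is invented.
ids <- c("000000001", "000000002")
toy <- list(
  meta = data.frame(
    monitorID = ids,
    longitude = c(-117.4, -122.3),
    latitude = c(47.7, 47.6),
    row.names = ids,
    stringsAsFactors = FALSE
  ),
  data = data.frame(
    datetime = seq(as.POSIXct("2015-08-01", tz = "UTC"), by = "hour", length.out = 3),
    m1 = c(5.1, 6.3, 7.0),
    m2 = c(12.4, 11.9, 13.2)
  )
)
colnames(toy$data) <- c("datetime", ids)

# Wide to long: one row per (datetime, monitorID) pair
values <- toy$data[, -1, drop = FALSE]  # exclude the 'datetime' column
long <- data.frame(
  datetime = rep(toy$data$datetime, times = ncol(values)),
  monitorID = rep(colnames(values), each = nrow(values)),
  pm25 = unlist(values, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Attach site attributes, using monitorID as the key
long <- merge(long, toy$meta, by = "monitorID")
head(long)
```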

Example 2: Manipulating ws_monitor Objects

The PWFSLSmoke package has numerous functions that work with ws_monitor objects, all of which begin with monitor_. If you need to do something that the package functions do not provide, you can manipulate ws_monitor objects directly as long as you retain the structure of the data model.
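
What "retaining the structure" means in practice is that any change to the set of monitors has to be applied to both dataframes at once. The sketch below shows one way to do that by hand on a toy object. The helper subset_by_id() is hypothetical and not part of PWFSLSmoke (in real use monitor_subset() covers this case), and all IDs and values are invented.

```r
# Toy object with the two-dataframe layout; every ID and value is invented.
ids <- c("000000001", "000000002", "000000003")
toy <- list(
  meta = data.frame(
    monitorID = ids,
    stateCode = c("WA", "WA", "OR"),
    row.names = ids,
    stringsAsFactors = FALSE
  ),
  data = data.frame(
    datetime = seq(as.POSIXct("2015-08-01", tz = "UTC"), by = "hour", length.out = 3),
    a = c(5, 6, 7), b = c(12, 11, 13), c = c(30, 28, 31)
  )
)
colnames(toy$data) <- c("datetime", ids)

# Hypothetical helper, not part of PWFSLSmoke: keep only the requested
# monitors in BOTH dataframes so the key relationship survives.
subset_by_id <- function(obj, keep) {
  keep <- obj$meta$monitorID[obj$meta$monitorID %in% keep]  # original order
  obj$meta <- obj$meta[keep, , drop = FALSE]                # rows of meta
  obj$data <- obj$data[, c("datetime", keep), drop = FALSE] # columns of data
  obj
}

oregon <- subset_by_id(toy, "000000003")

# The structural checks still hold after the manual subset
all(rownames(oregon$meta) == oregon$meta$monitorID)
all(colnames(oregon$data) == c("datetime", oregon$meta$monitorID))
```

Dropping a column from data without dropping the matching row from meta (or the reverse) would leave an object that later monitor_ functions may not handle correctly.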

The following code mixes use of package functions with direct manipulation of the ws_monitor object.

# Use special knowledge of AirNow IDs (Spokane County IDs begin with "53063") to subset the data
SpokaneCountyIDs <- N_M$meta$monitorID[stringr::str_detect(N_M$meta$monitorID, "^53063")]
Spokane <- monitor_subset(N_M, monitorIDs=SpokaneCountyIDs)

# Apply 3-hr rolling mean
Spokane_3hr <- monitor_rollingMean(Spokane, 3, align="center")

# 1) Replace data columns with their squares (exponentiation is not supplied by the package)
Spokane_3hr_squared <- Spokane_3hr
Spokane_3hr_squared$data[,-1] <- (Spokane_3hr$data[,-1])^2 # exclude the 'datetime' column

# NOTE:  Exponentiation is only used as an example. It does not generate a meaningful result.

# Create a daily averaged 'ws_monitor' object
Spokane_daily_3hr <- monitor_dailyStatistic(Spokane_3hr)

# 2) Check out the correlation between monitors (correlation is not supplied by the package)
data <- Spokane_daily_3hr$data[,-1] # exclude the 'datetime' column
cor(data, use='complete.obs')
##           530630047 530639997 530639996
## 530630047 1.0000000 0.9418159 0.9416837
## 530639997 0.9418159 1.0000000 0.9520286
## 530639996 0.9416837 0.9520286 1.0000000
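
For readers curious about what a centered 3-hour rolling mean does to a series, the following base R sketch applies one to made-up hourly values with stats::filter(). This is only an illustration of the idea; it is not claimed to match how monitor_rollingMean() treats missing values or the ends of the series. The helper roll3() and the column names m1 and m2 are hypothetical.

```r
# Hypothetical helper: centered 3-point moving average.
# stats::filter is called explicitly because dplyr masks filter().
roll3 <- function(x) {
  as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 2))  # sides = 2 centers the window
}

# Made-up hourly values for a single monitor
pm25 <- c(4, 6, 8, 10, 12, 14)
rolled <- roll3(pm25)
rolled  # the first and last values are NA because the window is incomplete there

# The same operation applied to every monitor column of a toy 'data' dataframe
toy_data <- data.frame(
  datetime = seq(as.POSIXct("2015-08-01", tz = "UTC"), by = "hour", length.out = 6),
  m1 = pm25,
  m2 = rev(pm25)
)
toy_data[-1] <- lapply(toy_data[-1], roll3)  # leave the 'datetime' column untouched
toy_data
```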

This introduction to the ws_monitor data model should be enough to get you started. Further details and examples are available in the package documentation.

Best of luck exploring and understanding PM 2.5 values associated with wildfire smoke!