AQAI Database Schema¶

This document provides an overview of the PostgreSQL table schemas used in the AQAI system. Each schema is described in detail with example tables.

Cohorts Table¶

The cohorts table contains all raw data for the specified pollutant in the location and time range required.

Schema:

locationId (integer): Unique identifier for the location.
location (text): Name of the location.
parameter (text): Type of pollutant.
value (float): Measured value of the pollutant.
date (json): Date and time of the measurement in UTC and local time.
unit (text): Unit of measurement.
coordinates (json): Coordinates of the measurement location.
country (text): Country of the measurement location.
city (text): City of the measurement location.
isMobile (boolean): Indicates if the measurement was taken from a mobile source.
isAnalysis (boolean): Indicates if the measurement is an analysis.
entity (text): Entity responsible for the measurement.
sensorType (text): Type of sensor used.
train_validation_set (text): Indicates if the data is for training or validation.
cohort (text): Cohort identifier.
cohort_type (text): Type of cohort.
timestamp_utc (timestamp): UTC timestamp of the measurement.
timestamp_local (timestamp): Local timestamp of the measurement.
y (float): Latitude of the measurement location.
x (float): Longitude of the measurement location.

Example:

locationId	location	parameter	value	date	unit	coordinates	country	city	isMobile	isAnalysis	entity	sensorType	train_validation_set	cohort	cohort_type	timestamp_utc	timestamp_local	y	x
23	Amgalan	pm25	181	{“utc”: “2019-03-13T22:30:00+00:00”, “local”: “2019-03-14T06:30:00+08:00”}	µg/m³	{“latitude”: 47.913429, “longitude”: 106.99790699999998}	MN	NULL	f	NULL	Governmental Organization	reference grade	0	0_2015-09-01T00:00:00Z_2023-11-30T00:00:00Z	training	2019-03-13T22:30:00.000000Z	2019-03-14T06:30:00+0800	47.913429	106.997907

Features Table¶

The features table gathers all satellite variables spatially and temporally co-located with the pollutant observation.

Schema:

sensor_datetime (timestamp): Date and time of the sensor reading.
sensor_longitude (float): Longitude of the sensor reading.
sensor_latitude (float): Latitude of the sensor reading.
location_id (text): Identifier for the location.
Optical_Depth_047 (float): Optical depth at 0.47 microns.
B4 (float): Band 4 reflectance.
B3 (float): Band 3 reflectance.
B2 (float): Band 2 reflectance.
avg_rad (float): Average radiance.
temperature_2m_above_ground (float): Temperature 2 meters above ground.
relative_humidity_2m_above_ground (float): Relative humidity 2 meters above ground.
total_precipitation_surface (float): Total precipitation at surface level.
total_cloud_cover_entire_atmosphere (float): Total cloud cover for the entire atmosphere.
u_component_of_wind_10m_above_ground (float): U component of wind at 10 meters above ground.
v_component_of_wind_10m_above_ground (float): V component of wind at 10 meters above ground.
discrete_classification (integer): Discrete classification value.

Example:

sensor_datetime	sensor_longitude	sensor_latitude	location_id	Optical_Depth_047	B4	B3	B2	avg_rad	temperature_2m_above_ground	relative_humidity_2m_above_ground	total_precipitation_surface	total_cloud_cover_entire_atmosphere	u_component_of_wind_10m_above_ground	v_component_of_wind_10m_above_ground	discrete_classification
2019-03-11 20:30:00	106.997907	47.913429	Amgalan	117.976936723832	NULL	NULL	NULL	25.4099998474121	-0.12999572753904	33.2400001525879	0.0300000004470348	99.8	-0.547999973595142	-0.589999985694885	30

Model Metadata Table¶

The model_metadata table stores all outputs of the machine learning pipeline. It includes transactions and codes data from the processed schema joined on the hash of the diagnosis note. This serves as the complete training and validation set, which is then split into temporal splits for cross validation.

Schema:

model_id (text): Unique identifier for the model.
model_set (text): Set identifier for the model.
features (json): Features used in the model.
labels (json): Labels used in the model.
hyperparameters (text): Hyperparameters used for the model.
run_date (timestamp): Date and time when the model was run.
timestamp (timestamp): Timestamp of the record.

Example:

model_id	model_set	features	labels	hyperparameters	run_date	timestamp
randomforestregressor_n_estimators500_max_depth10_8	randomforestregressor_n_estimators500_max_depth10	{Optical_Depth_047, SR_B4, SR_B3, SR_B2, avg_rad, temperature_2m_above_ground, relative_humidity_2m_above_ground, precipitable_water_entire_atmosphere, total_cloud_cover_entire_atmosphere, u_component_of_wind_10m_above_ground, v_component_of_wind_10m_above_ground}	{value}	n_estimators500_max_depth10	2024-05-31 14:48:41.217907	2024-05-31 14:48:41.217907

Results Table¶

The results table is used for model comparison and selection using Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE).

Schema:

model_id (text): Unique identifier for the model.
run_date (timestamp): Date and time when the model was run.
cohort (text): Cohort identifier.
mse (numeric): Mean Squared Error of the model.
mape (numeric): Mean Absolute Percentage Error of the model.

Example:

model_id	run_date	cohort	mse	mape
randomforestregressor_n_estimators500_max_depth10_5	NULL	5	2069.5940127034537	1846990659123641.2

AQAI Database Schema¶

Cohorts Table¶

Features Table¶

Model Metadata Table¶

Results Table¶

openaq-engine

Navigation

Related Topics