openaq_engine.src.preprocessing package

Submodules

openaq_engine.src.preprocessing.filter module

class openaq_engine.src.preprocessing.filter.Filter

Bases: object

static filter_cities(df: DataFrame, cities: List[str]) → DataFrame

Filters the DataFrame for specific cities.

Parameters:
  • df (pd.DataFrame) – The DataFrame containing location data.

  • cities (list of str) – The list of cities to filter for.

Returns:

The filtered DataFrame containing only rows from the specified cities.

Return type:

pd.DataFrame
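
A minimal sketch of this kind of membership filter, written in plain pandas rather than calling the library. The column name "city", the sample rows, and the helper name filter_cities_sketch are assumptions made for illustration; the actual implementation may differ.

```python
from typing import List

import pandas as pd

# Hypothetical sample data; the "city" column name is an assumption.
df = pd.DataFrame(
    {
        "city": ["Delhi", "Lima", "Delhi", "Accra"],
        "value": [88.0, 21.5, 95.2, 40.1],
    }
)


def filter_cities_sketch(df: pd.DataFrame, cities: List[str]) -> pd.DataFrame:
    """Keep only rows whose assumed "city" column appears in `cities`."""
    return df[df["city"].isin(cities)]


filtered = filter_cities_sketch(df, ["Delhi", "Accra"])
print(filtered)
```

The same pattern would apply to filter_countries with a country column in place of the city column.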

static filter_countries(df: DataFrame, countries: List[str]) → DataFrame

Filters the DataFrame for specific countries.

Parameters:
  • df (pd.DataFrame) – The DataFrame containing location data.

  • countries (list of str) – The list of countries to filter for.

Returns:

The filtered DataFrame containing only rows from the specified countries.

Return type:

pd.DataFrame

static filter_extreme_values(df: DataFrame) → DataFrame

Filters the DataFrame to remove rows with extreme PM2.5 values.

Parameters:

df (pd.DataFrame) – The DataFrame containing air quality data.

Returns:

The filtered DataFrame with rows containing extreme PM2.5 values removed.

Return type:

pd.DataFrame
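
The documentation above does not state which threshold counts as "extreme". The sketch below shows one way such a filter could look in plain pandas, assuming a "value" column holding PM2.5 readings and a purely hypothetical upper bound of 500; neither the column name nor the cutoff comes from the library.

```python
import pandas as pd

# Hypothetical cutoff chosen for illustration only; not taken from the library.
MAX_PM25 = 500.0

# Hypothetical sample data; the "value" column name is an assumption.
df = pd.DataFrame({"value": [12.0, 985.0, 47.3]})


def filter_extreme_values_sketch(
    df: pd.DataFrame, max_value: float = MAX_PM25
) -> pd.DataFrame:
    """Drop rows whose assumed "value" column exceeds `max_value`."""
    return df[df["value"] <= max_value]


trimmed = filter_extreme_values_sketch(df)
print(trimmed)
```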

static filter_no_coordinates(df: DataFrame) → DataFrame

Filters the DataFrame to remove rows with empty coordinates.

Parameters:

df (pd.DataFrame) – The DataFrame containing coordinate data.

Returns:

The filtered DataFrame with rows that have empty coordinates removed.

Return type:

pd.DataFrame
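
A minimal sketch of dropping rows without coordinates, assuming that "empty" means missing (NaN) and that coordinates live in hypothetical "latitude" and "longitude" columns. The real column names and the real definition of "empty" are not given above.

```python
import pandas as pd

# Hypothetical sample data; column names are assumptions.
df = pd.DataFrame(
    {
        "location": ["A", "B", "C"],
        "latitude": [28.61, None, -12.05],
        "longitude": [77.21, -77.04, None],
    }
)


def filter_no_coordinates_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows where either assumed coordinate column is missing."""
    return df.dropna(subset=["latitude", "longitude"])


with_coords = filter_no_coordinates_sketch(df)
print(with_coords)
```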

static filter_non_null_values(df: DataFrame) → DataFrame

Filters the DataFrame to remove rows with non-positive values.

Parameters:

df (pd.DataFrame) – The DataFrame containing air quality data.

Returns:

The filtered DataFrame with rows containing non-positive values removed.

Return type:

pd.DataFrame
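
A minimal sketch of removing non-positive readings in plain pandas, assuming a "value" column (the column name is not given above). In pandas, comparing NaN with 0 evaluates to False, so a strictly-positive mask of this form also drops missing values; whether the library relies on that behavior is not stated.

```python
import pandas as pd

# Hypothetical sample data; the "value" column name is an assumption.
df = pd.DataFrame({"value": [5.0, 0.0, -999.0, float("nan"), 12.5]})


def filter_positive_values_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows whose assumed "value" column is strictly positive."""
    # NaN > 0 is False, so rows with missing values are removed as well.
    return df[df["value"] > 0]


positive = filter_positive_values_sketch(df)
print(positive)
```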

static filter_pollutant(df: DataFrame, pollutant_to_predict: str) → DataFrame

Filters the DataFrame for rows containing the specified pollutant.

Parameters:
  • df (pd.DataFrame) – The DataFrame containing air quality data.

  • pollutant_to_predict (str) – The pollutant to filter for.

Returns:

The filtered DataFrame containing only rows with the specified pollutant.

Return type:

pd.DataFrame
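
A minimal sketch of selecting a single pollutant in plain pandas. It assumes the pollutant name is stored in a column called "parameter" and that values such as "pm25" are used; both are assumptions for illustration, not confirmed by the documentation above.

```python
import pandas as pd

# Hypothetical sample data; the "parameter" column name is an assumption.
df = pd.DataFrame(
    {
        "parameter": ["pm25", "o3", "pm25", "no2"],
        "value": [35.0, 60.2, 18.4, 22.9],
    }
)


def filter_pollutant_sketch(df: pd.DataFrame, pollutant_to_predict: str) -> pd.DataFrame:
    """Keep rows whose assumed "parameter" column equals the given pollutant."""
    return df[df["parameter"] == pollutant_to_predict]


pm25_only = filter_pollutant_sketch(df, "pm25")
print(pm25_only)
```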

Module contents