openaq_engine.src.preprocessing package
Submodules
openaq_engine.src.preprocessing.filter module
- class openaq_engine.src.preprocessing.filter.Filter
Bases: object
- static filter_cities(df: DataFrame, cities: List[str]) → DataFrame
Filters the DataFrame for specific cities.
- Parameters:
df (pd.DataFrame) – The DataFrame containing location data.
cities (list of str) – The list of cities to filter for.
- Returns:
The filtered DataFrame containing only rows from the specified cities.
- Return type:
pd.DataFrame
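A minimal stand-alone sketch of city filtering with pandas, assuming city names are stored in a column called `city` (a hypothetical name; the column the package actually uses is not documented here). It is an illustration, not the implementation of `Filter.filter_cities`.

```python
import pandas as pd
from typing import List


def filter_cities(df: pd.DataFrame, cities: List[str]) -> pd.DataFrame:
    # Assumption: the city name lives in a column named "city".
    # isin() builds a boolean mask of exact, case-sensitive matches.
    # .copy() returns an independent frame, avoiding SettingWithCopyWarning later.
    return df[df["city"].isin(cities)].copy()


locations = pd.DataFrame({
    "city": ["Delhi", "Lagos", "Lima", "Delhi"],
    "value": [120.0, 45.0, 30.0, 98.0],
})
filtered = filter_cities(locations, ["Delhi", "Lima"])
print(filtered["city"].tolist())  # ['Delhi', 'Lima', 'Delhi']
```

Because `isin` matches exactly, "delhi" and "Delhi" are treated as different cities; normalizing case before filtering is a choice the caller would have to make.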
- static filter_countries(df: DataFrame, countries: List[str]) → DataFrame
Filters the DataFrame for specific countries.
- Parameters:
df (pd.DataFrame) – The DataFrame containing location data.
countries (list of str) – The list of countries to filter for.
- Returns:
The filtered DataFrame containing only rows from the specified countries.
- Return type:
pd.DataFrame
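Country filtering can be sketched the same way. This stand-alone example assumes a hypothetical `country` column holding two-letter country codes; both the column name and the code format are illustrative assumptions, not details taken from the package.

```python
import pandas as pd
from typing import List


def filter_countries(df: pd.DataFrame, countries: List[str]) -> pd.DataFrame:
    # Assumption: the country code lives in a column named "country".
    # Rows whose country is not in the list are dropped; the original index is kept.
    return df[df["country"].isin(countries)].copy()


stations = pd.DataFrame({
    "location": ["Anand Vihar", "Ikeja", "Miraflores", "Punjabi Bagh"],
    "country": ["IN", "NG", "PE", "IN"],
})
indian = filter_countries(stations, ["IN"])
print(indian["location"].tolist())  # ['Anand Vihar', 'Punjabi Bagh']
```

The surviving rows keep their original index labels (0 and 3 here); call `.reset_index(drop=True)` afterwards if a contiguous index is needed.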
- static filter_extreme_values(df: DataFrame) → DataFrame
Filters the DataFrame to remove rows with extreme PM2.5 values.
- Parameters:
df (pd.DataFrame) – The DataFrame containing air quality data.
- Returns:
The filtered DataFrame with rows containing extreme PM2.5 values removed.
- Return type:
pd.DataFrame
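The documentation does not say what counts as "extreme". The sketch below assumes a fixed upper cutoff on a hypothetical `value` column; the 500.0 µg/m³ default and the `upper_bound` parameter are placeholders chosen for illustration, not the threshold `Filter.filter_extreme_values` actually applies.

```python
import pandas as pd

# Hypothetical cutoff in µg/m³; the package's real threshold is not documented here.
PM25_UPPER_BOUND = 500.0


def filter_extreme_values(df: pd.DataFrame, upper_bound: float = PM25_UPPER_BOUND) -> pd.DataFrame:
    # Assumption: PM2.5 readings are stored in a column named "value".
    # Keep readings at or below the cutoff; anything above it is treated as extreme.
    # NaN readings are dropped too, because NaN <= x evaluates to False.
    return df[df["value"] <= upper_bound].copy()


readings = pd.DataFrame({"value": [12.5, 48.0, 999.0, 500.0]})
print(filter_extreme_values(readings)["value"].tolist())  # [12.5, 48.0, 500.0]
```

A fixed cutoff is the simplest choice; a data-driven rule (for example, dropping values above a high quantile) would be a different technique and would need its own justification.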
- static filter_no_coordinates(df: DataFrame) → DataFrame
Filters the DataFrame to remove rows with empty coordinates.
- Parameters:
df (pd.DataFrame) – The DataFrame containing coordinate data.
- Returns:
The filtered DataFrame with rows that have empty coordinates removed.
- Return type:
pd.DataFrame
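A minimal sketch of dropping rows without coordinates, assuming the coordinates are stored in hypothetical `latitude` and `longitude` columns and that "empty" means a missing value (`NaN` or `None`). Blank strings or zero coordinates would need extra handling that this example does not attempt.

```python
import pandas as pd


def filter_no_coordinates(df: pd.DataFrame) -> pd.DataFrame:
    # Assumption: coordinates live in "latitude" and "longitude" columns.
    # dropna(subset=...) uses how="any" by default, so a row is removed
    # if either coordinate is missing. It returns a new DataFrame.
    return df.dropna(subset=["latitude", "longitude"])


sites = pd.DataFrame({
    "location": ["A", "B", "C"],
    "latitude": [28.6, None, -12.0],
    "longitude": [77.2, 3.4, None],
})
print(filter_no_coordinates(sites)["location"].tolist())  # ['A']
```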
- static filter_non_null_values(df: DataFrame) → DataFrame
Filters the DataFrame to remove rows with non-positive (zero or negative) values.
- Parameters:
df (pd.DataFrame) – The DataFrame containing air quality data.
- Returns:
The filtered DataFrame with rows containing non-positive values removed.
- Return type:
pd.DataFrame
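A stand-alone sketch of keeping only strictly positive readings, assuming measurements sit in a hypothetical `value` column. The -999.0 entry in the sample data is just an illustrative negative sentinel, not a documented OpenAQ convention.

```python
import pandas as pd


def filter_non_null_values(df: pd.DataFrame) -> pd.DataFrame:
    # Assumption: readings are stored in a column named "value".
    # Keep strictly positive readings: zero and negative values are dropped,
    # and so are NaN values, because NaN > 0 evaluates to False.
    return df[df["value"] > 0].copy()


readings = pd.DataFrame({"value": [15.2, 0.0, -999.0, float("nan"), 7.1]})
print(filter_non_null_values(readings)["value"].tolist())  # [15.2, 7.1]
```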
- static filter_pollutant(df: DataFrame, pollutant_to_predict: str) → DataFrame
Filters the DataFrame for rows containing the specified pollutant.
- Parameters:
df (pd.DataFrame) – The DataFrame containing air quality data.
pollutant_to_predict (str) – The pollutant to filter for.
- Returns:
The filtered DataFrame containing only rows with the specified pollutant.
- Return type:
pd.DataFrame
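Pollutant filtering reduces to an equality mask. This sketch assumes the pollutant name is stored in a hypothetical `parameter` column with lowercase identifiers such as `"pm25"`; the package's actual column and naming scheme may differ.

```python
import pandas as pd


def filter_pollutant(df: pd.DataFrame, pollutant_to_predict: str) -> pd.DataFrame:
    # Assumption: the pollutant identifier lives in a column named "parameter".
    # The comparison is exact, so "pm25" and "PM2.5" are different values.
    return df[df["parameter"] == pollutant_to_predict].copy()


measurements = pd.DataFrame({
    "parameter": ["pm25", "no2", "pm25", "o3"],
    "value": [35.0, 20.0, 41.0, 60.0],
})
pm25 = filter_pollutant(measurements, "pm25")
print(pm25["value"].tolist())  # [35.0, 41.0]
```

Since each filter takes and returns a DataFrame, such functions compose with `DataFrame.pipe`, e.g. `measurements.pipe(filter_pollutant, "pm25")`.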