MLflow Experiment Implementation
================================

Backend/Model Deployment
------------------------

Develop an MLOps tooling wrapper for the current implementation of the global model to enable model experimentation.

**Status**: Complete

Methodology
-----------

The team used MLflow to run each pipeline that gathers, processes, analyzes, and validates the results of the registered machine learning model.

1. **Module Options**: Each module has a set of options that parameterize its CLI command with the specified arguments (a sketch of such an options decorator follows this list).

   .. code-block:: python

      from mlflows.cli.cohort_builder import cohort_builder_options

2. **Module Configuration**: Each module has a configuration file where default settings are stored, since these change less frequently than CLI arguments (see the configuration sketch after this list).

   .. code-block:: python

      from config.model_settings import CohortBuilderConfig

3. **Flow Classes**: Each module has its own "flow class," which initializes the module with its configuration in a standard way.

   .. code-block:: python

      class CohortBuilderFlow:
          def __init__(self):
              self.config = CohortBuilderConfig()

          def execute(self):
              return CohortBuilder.from_dataclass_config(self.config)

4. **Command Execution**: Commands execute the modules and any dependent modules. Each experiment is named after the module and the datetime of execution. Artifact storage can be a hosted location (such as an S3 bucket) or a local path.

   .. code-block:: python

      @cohort_builder_options()
      @click.command("cohort-builder", help="Generate cohorts for time splits")
      def cohort_builder(country, source, pollutant, latest_date):
          # One experiment per invocation: module name + execution datetime.
          experiment_id = mlflow.create_experiment(
              f"cohort_builder_{str(datetime.now())}",
              os.getenv("MLFLOW_S3_BUCKET"),  # artifact store (S3 or local path)
          )
          with mlflow.start_run(experiment_id=experiment_id, nested=True):
              engine = get_dbengine()
              # Upstream dependency: build the train/validation time splits first.
              time_splitter = TimeSplitterFlow().execute()
              train_validation_dict = time_splitter.execute(
                  country, source, pollutant, latest_date
              )
              cohort_builder = CohortBuilderFlow().execute()
              cohort_builder.execute(
                  train_validation_dict, engine, country, source, pollutant
              )

5. **CLI Grouping**: Commands are added to the CLI group `openaq-engine`, which collects the modules into the package's single entry point.

   .. code-block:: python

      @click.group("openaq-engine", help="Library to query openaq data")
      @click.pass_context
      def cli(ctx):
          ...

      cli.add_command(time_splitter)
      cli.add_command(cohort_builder)
      cli.add_command(feature_builder)
      cli.add_command(run_pipeline)

      if __name__ == "__main__":
          cli()

6. **MLflow Tracking URI**: MLflow reads the tracking URI from an environment variable; set it to your tracking server's address, or to "localhost" for local runs.

   .. code-block:: python

      mlflow.set_tracking_uri(os.getenv("MLFLOW_TRACKING_URI"))
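The `cohort_builder_options` decorator imported in step 1 is never shown. A minimal sketch of how such an option bundle could be composed from `click.option`, assuming the option names match the `cohort-builder` command signature in step 4 (the body is illustrative, not the library's actual code):

.. code-block:: python

   # Hypothetical sketch of a per-module options decorator; the real
   # mlflows.cli.cohort_builder implementation may differ.
   import click


   def cohort_builder_options():
       """Bundle the CLI options shared by the cohort-builder command."""
       options = [
           click.option("--country", required=True, help="Country to query"),
           click.option("--source", required=True, help="Data source to query"),
           click.option("--pollutant", required=True, help="Pollutant to model"),
           click.option("--latest-date", "latest_date", help="Most recent date to include"),
       ]

       def decorator(func):
           # Apply in reverse so options display in declaration order in --help.
           for option in reversed(options):
               func = option(func)
           return func

       return decorator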
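Likewise, the configuration class from step 2 can be pictured as a plain dataclass consumed through the `from_dataclass_config` constructor seen in step 3; the field names here are assumptions for illustration:

.. code-block:: python

   # Hypothetical sketch; the real CohortBuilderConfig in
   # config.model_settings may define different fields.
   from dataclasses import dataclass


   @dataclass
   class CohortBuilderConfig:
       # Defaults live here because they change less often than CLI arguments.
       TABLE_NAME: str = "cohorts"
       SCHEMA_NAME: str = "model_output"


   class CohortBuilder:
       def __init__(self, table_name, schema_name):
           self.table_name = table_name
           self.schema_name = schema_name

       @classmethod
       def from_dataclass_config(cls, config):
           # The standard construction path used by the flow classes (step 3).
           return cls(
               table_name=config.TABLE_NAME,
               schema_name=config.SCHEMA_NAME,
           )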
Example: Extending the Library
------------------------------

If users want to extend the library with their own module, they can follow these steps:

1. **Create a Configuration Class**: Define the configuration for the new module in `config.model_settings`.

   .. code-block:: python

      class CustomModuleConfig:
          PARAM1 = "default_value"
          PARAM2 = 10

2. **Create a Flow Class**: Define the flow class for the new module.

   .. code-block:: python

      class CustomModuleFlow:
          def __init__(self):
              self.config = CustomModuleConfig()

          def execute(self):
              # Implement the execution logic here.
              pass

3. **Define a CLI Command**: Create a CLI command for the new module (a sketch of the `custom_module_options` decorator follows this list).

   .. code-block:: python

      import os
      from datetime import datetime

      import click
      import mlflow

      from mlflows.cli.custom_module import custom_module_options
      from config.model_settings import CustomModuleConfig


      @custom_module_options()
      @click.command("custom-module", help="Description of custom module")
      def custom_module(param1, param2):
          experiment_id = mlflow.create_experiment(
              f"custom_module_{str(datetime.now())}",
              os.getenv("MLFLOW_S3_BUCKET"),
          )
          with mlflow.start_run(experiment_id=experiment_id, nested=True):
              # CustomModuleFlow is the flow class defined in step 2.
              flow = CustomModuleFlow().execute()
              flow.execute(param1, param2)

4. **Add the Command to the CLI Group**: Register the new command with the `openaq-engine` group.

   .. code-block:: python

      cli.add_command(custom_module)

5. **Run the CLI**: Execute the CLI with the new command.

   .. code-block:: sh

      openaq-engine custom-module --param1 value1 --param2 value2

This approach lets users extend the library easily and integrate new modules into the existing MLflow experiment framework.
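The `custom_module_options` decorator imported in step 3 is assumed to mirror the per-module option decorators from the Methodology section; a minimal sketch with illustrative names and defaults:

.. code-block:: python

   # Hypothetical sketch of the options decorator referenced in step 3;
   # assumed to follow the same pattern as cohort_builder_options.
   import click


   def custom_module_options():
       def decorator(func):
           # Applied bottom-up, so --param1 appears first in --help output.
           func = click.option("--param2", type=int, default=10)(func)
           func = click.option("--param1", default="default_value")(func)
           return func

       return decorator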
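The run started in step 3 records nothing by default. Parameters and metrics can be logged inside the `mlflow.start_run` context so each execution is inspectable in the tracking UI; a minimal sketch, where the metric name and the returned value are assumptions:

.. code-block:: python

   import os
   from datetime import datetime

   import mlflow


   def run_custom_module(param1, param2):
       # Hypothetical variant of the step 3 command body with tracking calls.
       experiment_id = mlflow.create_experiment(
           f"custom_module_{str(datetime.now())}",
           os.getenv("MLFLOW_S3_BUCKET"),
       )
       with mlflow.start_run(experiment_id=experiment_id, nested=True):
           # Record the CLI arguments against the run.
           mlflow.log_param("param1", param1)
           mlflow.log_param("param2", param2)

           flow = CustomModuleFlow().execute()  # flow class from step 2
           result = flow.execute(param1, param2)

           # Track any quantitative output as a metric (illustrative name).
           mlflow.log_metric("rows_processed", len(result))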