Launcher

The launch Decorator

The launcher allows launching multiple experiments on a cluster using Hydra.

mlxp.launcher.launch(config_path: str = 'configs', seeding_function: Callable[[Any], None] | None = None) → Callable[[Callable[[Any], Any]], Any]

Create a decorator of the main function to be executed. launch allows composing configurations from multiple configuration files by leveraging Hydra (see the hydra-core package). This function behaves similarly to hydra.main from the hydra-core package: https://github.com/facebookresearch/hydra/blob/main/hydra/main.py. It expects a path to a configuration file named config.yaml contained in the directory config_path and returns a decorator. The returned decorator expects functions with the following signature: main(ctx: mlxp.Context).

Example:

import mlxp

@mlxp.launch(config_path='configs',
             seeding_function=set_seeds)  # set_seeds: a user-defined seeding function
def main(ctx: mlxp.Context) -> None:
    print(ctx.config)

if __name__ == "__main__":
    main()

Running the above python code will create an object ctx of type mlxp.Context on the fly and provide it to the function main. This object stores information about the run. In particular, the field ctx.config stores the options contained in the config file ‘config.yaml’. Additionally, ctx.logger provides a logger object of the class mlxp.Logger for logging results of the run. Just like in Hydra, it is also possible to override the configs from the command line and to sweep over multiple values of a given configuration when executing python code. See: https://hydra.cc/docs/intro/ for complete documentation on how to use Hydra.
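For instance, assuming ‘config.yaml’ defines the options optimizer.lr and seed, these can be overridden from the command line, or swept over by passing comma-separated values (the option names here are illustrative):

# Override options for a single run:
python main.py optimizer.lr=0.01 seed=1

# Sweep over several values of each option:
python main.py optimizer.lr=10.,1.,0.1 seed=1,2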

This function is necessary to enable MLXP’s functionalities, including:
  1. Multiple submissions to a cluster queue using mlxpsub

  2. Job versioning: creating a ‘safe’ working directory from which jobs are executed when submitted to a cluster queue, ensuring that each job runs with a specific version of the code.

Parameters:
  • config_path (str (default 'configs')) – The config path, a directory where the default user configuration and MLXP settings are stored.

  • seeding_function (Union[Callable[[Any], None],None] (default None)) – A callable for setting the seeds of random number generators. It is called with the value of ‘ctx.config.seed’. A minimal example is sketched below.
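A minimal seeding function might look as follows (a sketch that seeds only Python’s built-in random module; extend it to numpy, torch, etc. as your project requires):

import random

def set_seeds(seed: int) -> None:
    # MLXP calls this function with the value of ctx.config.seed.
    random.seed(seed)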

Returns:

A decorator of the main function to be executed.

Return type:

Callable[[TaskFunction], Any]

class mlxp.launcher.Context(config: DictConfig | None = None, mlxp: DictConfig | None = None, info: DictConfig | None = None, logger: Logger | None = None)

Bases: object

The context object passed to the decorated function when using the decorator mlxp.launch.

config: ConfigDict

A structure containing project-specific options provided by the user. These options are loaded from a yaml file ‘config.yaml’ contained in the directory ‘config_path’ provided as argument to the decorator mlxp.launch. Its content can be overridden from the command line.
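For instance, a ‘config.yaml’ consistent with the options used in the examples on this page could look as follows (an illustrative sketch, not a required schema):

seed: 1
optimizer:
  lr: 10.
model:
  num_units: 100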

mlxp: ConfigDict

A structure containing MLXP’s default settings for the project. Its content is loaded from a yaml file ‘mlxp.yaml’ located in the same directory as ‘config.yaml’.

info: ConfigDict

A structure containing information about the current run, such as its status, start time, and hostname.

logger: Logger | None

A logger object that can be used for logging variables (metrics, checkpoints, artifacts). When logging is enabled, these variables are all stored in a uniquely defined directory.
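The sketch below shows how these fields can be accessed inside the decorated function. The log_metrics call is an assumption about the mlxp.Logger interface and should be adapted to the actual Logger API:

import mlxp

@mlxp.launch(config_path='./configs')
def main(ctx: mlxp.Context) -> None:
    print(ctx.config)   # user options from config.yaml
    print(ctx.mlxp)     # MLXP settings from mlxp.yaml
    print(ctx.info)     # run metadata: status, start time, hostname, ...
    if ctx.logger is not None:
        # Assumed Logger method for storing a dictionary of metrics.
        ctx.logger.log_metrics({'loss': 0.5}, log_name='train')

if __name__ == "__main__":
    main()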

mlxp.launcher.instance_from_dict(class_name: str, arguments: Dict[str, Any]) → T

Create an instance of a class based on a dictionary of arguments.

Parameters:
  • class_name (str) – The name of the class

  • arguments (Dict[str,Any]) – A dictionary of arguments to the class constructor

Returns:

An instance of a class ‘class_name’ constructed using the arguments in ‘arguments’.

Return type:

T
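A usage sketch (‘my_module.MyModel’ and its constructor arguments are hypothetical):

from mlxp.launcher import instance_from_dict

# Equivalent to: my_module.MyModel(hidden_dim=128, dropout=0.1)
model = instance_from_dict('my_module.MyModel',
                           {'hidden_dim': 128, 'dropout': 0.1})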

mlxp.launcher.instantiate(class_name: str) → T | Callable

Dynamically imports a module and retrieves a class or function in it by name.

Given the fully qualified name of a class or function (in the form ‘module.submodule.ClassName’ or ‘module.submodule.function_name’), this function imports the module and returns a handle to the class or function.

Parameters:

class_name (str) – The fully qualified name of the class or function to retrieve. This should include the module path and the name, e.g., ‘module.submodule.ClassName’ or ‘module.submodule.function_name’.

Returns:

A handle (reference) to the class or function specified by class_name.

Return type:

Type or Callable

Raises:
  • ImportError – If the module cannot be imported.

  • AttributeError – If the class or function cannot be found in the module.

  • NameError – If the name cannot be evaluated after attempts to retrieve it.

Example:

>>> MyClass = instantiate('my_module.MyClass')
>>> my_instance = MyClass()
>>> my_function = instantiate('my_module.my_function')
>>> result = my_function()

The mlxpsub Command

mlxp.mlxpsub.mlxpsub()

A function for submitting a script to a job scheduler. Usage: mlxpsub <script.sh>

The ‘script.sh’ must contain the scheduler’s options defining the resource allocation for each individual job. Below is an example of ‘script.sh’:

Example:

#!/bin/bash

#OAR -l core=1, walltime=6:00:00
#OAR -t besteffort
#OAR -t idempotent
#OAR -p gpumem>'16000'

python main.py  optimizer.lr=10.,1.,0.1 seed=1,2,3,4
python main.py  model.num_units=100,200 seed=1,2,3,4

The command assumes the script contains at least one python command of the form: python <python_file_name.py> option_1=A,B,C option_2=X,Y, where <python_file_name.py> is a python file that uses MLXP for launching.

MLXP creates a script for each job corresponding to an option setting. Each script is located in a directory of the form parent_log_dir/log_id, where log_id is automatically assigned by MLXP for each job.

Here is an example of the first created script, ‘logs/1/script.sh’:

Example:

#!/bin/bash
#OAR -n logs/1
#OAR -E /root/logs/1/log.stderr
#OAR -O /root/logs/1/log.stdout
#OAR -l core=1, walltime=6:00:00
#OAR -t besteffort
#OAR -t idempotent
#OAR -p gpumem>'16000'

cd /root/workdir/
python main.py  optimizer.lr=10. seed=1

As you can see, MLXP automatically assigns values for the job’s name, stdout and stderr file paths, so there is no need to specify those in the original script ‘script.sh’. These scripts contain the same scheduler’s options as in ‘script.sh’ and a single python command using one specific option setting: optimizer.lr=10. seed=1. Additionally, MLXP pre-processes the python command to extract its working directory and sets it explicitly in the newly created script before the python command.

Note

It is also possible to have other commands in the ‘script.sh’, for instance to activate an environment (conda activate my_env). These commands will be copied from ‘script.sh’ to the newly created scripts and placed before the python command. Variable assignments and directory changes, however, are systematically ignored.
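For instance, an activation command in ‘script.sh’ (here my_env is a placeholder) is carried over to every generated job script:

#!/bin/bash

#OAR -l core=1, walltime=6:00:00

conda activate my_env
python main.py optimizer.lr=10.,1. seed=1,2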

To use mlxpsub, MLXP must be installed on both the head node and all compute nodes. However, application-specific modules do not need to be installed on the head node. You can avoid installing them on the head node by ensuring that these modules are only imported within the function that is decorated with the mlxp.launch decorator.

In the following example, the mlxp.launch decorator is used in the file main.py to decorate the function train. The version below of main.py requires torch to be installed on the head node:

main.py
import torch

import mlxp

@mlxp.launch(config_path='./configs')
def train(ctx: mlxp.Context) -> None:
    cfg = ctx.config
    logger = ctx.logger

    ...

if __name__ == "__main__":
    train()

To avoid installing torch on the head node, you can make the following simple modification to the main.py file:

main.py
import mlxp

@mlxp.launch(config_path='./configs')
def train(ctx: mlxp.Context) -> None:
    # Importing torch inside the decorated function means it only
    # needs to be installed on the compute nodes, not the head node.
    import torch

    cfg = ctx.config
    logger = ctx.logger

    ...

if __name__ == "__main__":
    train()