pypers.pipeline

class pypers.pipeline.Configurator(pipeline: Pipeline)

Bases: object

Automatically configures hyperparameters of a pipeline.

Parameters:

pipeline (Pipeline) – An instance of the Pipeline class.

configure(base_cfg, input)

Configure the hyperparameters of the pipeline.

Parameters:
  • base_cfg (Config) – The base configuration.

  • input (Any) – The input data.

Returns:

The configured hyperparameters.

Return type:

Config

first_differing_stage(config1: Config, config2: Config)

Find the first stage with differing configurations between two sets of hyperparameters.

Parameters:
  • config1 (Config) – The first set of hyperparameters.

  • config2 (Config) – The second set of hyperparameters.

Returns:

The first differing stage, or None if no differences are found.

Return type:

Stage or None
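
The comparison can be pictured with plain dictionaries. The following is a minimal sketch, not the library's implementation: it assumes each configuration is a dict keyed by stage ID, and the names first_differing and stage_ids are hypothetical.

```python
from typing import Any, Dict, List, Optional


def first_differing(stage_ids: List[str],
                    config1: Dict[str, Any],
                    config2: Dict[str, Any]) -> Optional[str]:
    """Return the ID of the first stage whose settings differ, or None."""
    for stage_id in stage_ids:  # stages in pipeline order
        if config1.get(stage_id) != config2.get(stage_id):
            return stage_id
    return None


# Hypothetical stage IDs and hyperparameters, for illustration only.
stage_ids = ["load", "blur", "threshold"]
cfg_a = {"blur": {"sigma": 1.0}, "threshold": {"value": 0.5}}
cfg_b = {"blur": {"sigma": 1.0}, "threshold": {"value": 0.7}}

differing = first_differing(stage_ids, cfg_a, cfg_b)
```

In this sketch, differing is the ID of the third stage, since the first two stages have identical settings. Such a result could plausibly serve as the first_stage argument of Pipeline.process() when re-running with changed hyperparameters.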

property pipeline

Get the pipeline associated with this configurator.

Returns:

The pipeline instance.

Return type:

Pipeline

class pypers.pipeline.Pipeline(configurator: Optional[Configurator] = None)

Bases: object

Defines a processing pipeline.

This class defines a processing pipeline that consists of multiple stages. Each stage performs a specific operation on the input data. The pipeline processes the input data by executing the process method of each stage successively.

Note that hyperparameters are not set automatically if the process() method is used directly. Hyperparameters are only set automatically if the configure() method or batch processing is used.

Parameters:

configurator (Configurator, optional) – An instance of the Configurator class used to automatically configure hyperparameters of the pipeline. If not provided, a default Configurator instance will be created.

append(stage: Stage, after: Optional[Union[int, str]] = None)

configure(base_cfg, *args, **kwargs)

Automatically configures hyperparameters.

property fields

find(stage_id, not_found_dummy=inf)

Returns the position of the stage identified by stage_id.

Returns not_found_dummy if the stage is not found.
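
A default of inf lets position comparisons work even when a stage is missing. The sketch below illustrates the idea with a plain list; it is an assumption about how the sentinel is meant to be used, and stage_ids is a hypothetical stand-in for the pipeline's stages.

```python
from math import inf

stage_ids = ["load", "blur", "threshold"]  # hypothetical stage IDs


def find(stage_id, not_found_dummy=inf):
    """Position of stage_id in stage_ids, or not_found_dummy if absent."""
    try:
        return stage_ids.index(stage_id)
    except ValueError:
        return not_found_dummy


# A missing stage compares as "after every real stage".
blur_before_missing = find("blur") < find("does-not-exist")
```

With this sentinel, callers can compare positions without a separate check for missing stages. Passing a different not_found_dummy, such as -1, swaps in another sentinel.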

get_extra_stages(first_stage, last_stage, available_inputs)

process(input, cfg, first_stage=None, last_stage=None, data=None, log_root_dir=None, out=None, **kwargs)

Processes the input.

The process() methods of the stages of the pipeline are executed successively.

Parameters:
  • input – The input to be processed (can be None if and only if data is not None).

  • cfg – A Config object which represents the hyperparameters.

  • first_stage – The name of the first stage to be executed.

  • last_stage – The name of the last stage to be executed.

  • data – The results of a previous execution.

  • log_root_dir – Path to a directory where log files should be written to.

  • out – An instance of an Output sub-class, 'muted' if no output should be produced, or None if the default output should be used.

Returns:

Tuple (data, cfg, timings), where data is the pipeline data object comprising all final and intermediate results, cfg holds the hyperparameters that were finally used, and timings is a dictionary containing the execution time of each individual pipeline stage (in seconds).

The parameter data is used if and only if first_stage is not None. In this case, the outputs produced by the stages of the pipeline which are being skipped must be fed in using the data parameter obtained from a previous execution of this method.
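
The interplay of first_stage and data can be illustrated with a toy runner. This is a sketch under stated assumptions, not pypers code: stages are modelled as (ID, function) pairs, cfg is a plain dict, and the function run is hypothetical. It only mirrors the documented contract: a (data, cfg, timings) result, with the outputs of skipped stages taken from a previous run.

```python
import time


def run(stages, cfg, input=None, first_stage=None, data=None):
    """Run toy stages in order, optionally resuming at first_stage."""
    if first_stage is None:
        data = {"input": input}   # fresh run: previous data is not used
        started = True
    else:
        assert data is not None, "resuming requires data from a previous run"
        data = dict(data)         # leave the caller's results untouched
        started = False
    timings = {}
    for stage_id, fn in stages:
        started = started or stage_id == first_stage
        if not started:
            continue              # skipped: its outputs come from data
        t0 = time.perf_counter()
        data.update(fn(data, cfg))
        timings[stage_id] = time.perf_counter() - t0
    return data, cfg, timings


# Two hypothetical stages; each maps (data, cfg) to new data entries.
stages = [
    ("double", lambda d, cfg: {"doubled": d["input"] * 2}),
    ("offset", lambda d, cfg: {"result": d["doubled"] + cfg["offset"]}),
]

data1, _, timings1 = run(stages, {"offset": 0}, input=5)
# Resume at "offset", reusing the stored output of "double".
data2, _, timings2 = run(stages, {"offset": 1}, first_stage="offset", data=data1)
```

In the second call only the offset stage is executed and timed; the value of doubled is read from data1.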

stage(stage_id)

class pypers.pipeline.ProcessingControl(first_stage: Optional[str] = None, last_stage: Optional[str] = None)

Bases: object

A class used to control the processing of stages in a pipeline.

This class keeps track of the first and last stages of a pipeline, and determines whether a given stage should be processed based on its position in the pipeline.

Parameters:
  • first_stage (str, optional) – The first stage of the pipeline. Processing starts from this stage. If None, processing starts from the beginning.

  • last_stage (str, optional) – The last stage of the pipeline. Processing stops after this stage. If None, processing goes until the end.

step(stage)

Determines whether the given stage should be processed.

If the stage is the first stage of the pipeline, processing starts. If the stage is the last stage of the pipeline, processing stops after this stage.

Parameters:

stage (str) – The stage to check.

Returns:

True if the stage should be processed, False otherwise.

Return type:

bool
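
The described behaviour can be sketched as a small state machine. The class below is a toy stand-in written from the description above, not the library's implementation; the name ControlSketch and its attributes are hypothetical.

```python
from typing import Optional


class ControlSketch:
    """Toy stand-in for the behaviour described above."""

    def __init__(self, first_stage: Optional[str] = None,
                 last_stage: Optional[str] = None):
        self.first_stage = first_stage
        self.last_stage = last_stage
        self.started = first_stage is None  # no first stage: start at once
        self.stopped = False

    def step(self, stage: str) -> bool:
        if not self.started and stage == self.first_stage:
            self.started = True
        process = self.started and not self.stopped
        if stage == self.last_stage:
            self.stopped = True  # stages after this one are not processed
        return process


ctrl = ControlSketch(first_stage="b", last_stage="c")
decisions = [ctrl.step(s) for s in ["a", "b", "c", "d"]]
```

Here step() is called once per stage, in pipeline order, and only stages b and c are marked for processing.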

class pypers.pipeline.Stage

Bases: object

A pipeline stage.

Each stage can be controlled by a separate set of hyperparameters. Refer to the documentation of the respective pipeline stages for details. Most hyperparameters reside in namespaces, which are uniquely associated with the corresponding pipeline stages.

Parameters:
  • name – Readable identifier of this stage.

  • id – The stage ID, used as the hyperparameter namespace. Defaults to the result of the suggest_stage_id() function if not specified.

  • inputs – List of inputs required by this stage.

  • outputs – List of outputs produced by this stage.

Automation

Hyperparameters can be set automatically using the configure() method.

Inputs and outputs

Each stage must declare its required inputs and the outputs it produces. These are used by create_pipeline() to automatically determine the stage order. The input named input is provided by the pipeline itself.

add_callback(name, cb)

configure(*args, **kwargs)

consumes = []

enabled_by_default = True

inputs = []

outputs = []

process(cfg: Optional[Config] = None, log_root_dir: Optional[str] = None, out: Optional[Output] = None, **inputs)

Executes the current pipeline stage.

This method runs the current stage of the pipeline with the provided inputs, configuration parameters, and logging settings. It then returns the outputs produced by this stage.

Parameters:
  • **inputs – The inputs required by this stage, passed as keyword arguments. Each keyword is an input name and its value is the corresponding input.

  • cfg (Config, optional) – The hyperparameters to be used by this stage, mapping each hyperparameter name to its value.

  • log_root_dir (str, optional) – The path to the directory where log files will be written. If this parameter is None, no log files will be written.

  • out (Output, 'muted', or None, optional) – An instance of a subclass of Output to handle the output of this stage. If this parameter is 'muted', no output will be produced. If this parameter is None, the default output handler will be used.

Returns:

A dictionary containing the outputs produced by this stage. Each key-value pair in the dictionary represents an output name and its corresponding value.

Return type:

dict

remove_callback(name, cb)

skip(data, out=None, **kwargs)

pypers.pipeline.create_pipeline(stages: Sequence)

Creates and returns a new Pipeline object configured for the given stages.

The stage order is determined automatically.
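
One way to derive an order from declared inputs and outputs is to repeatedly pick a stage whose inputs are already available. The sketch below shows that idea only; it is not the algorithm used by create_pipeline(), and the dict-based stage descriptions and the function order_stages are hypothetical.

```python
from typing import Dict, List, Sequence


def order_stages(stages: Sequence[Dict]) -> List[str]:
    """Order stages so that each one runs after its inputs are available."""
    available = {"input"}  # provided by the pipeline itself
    remaining = list(stages)
    ordered: List[str] = []
    while remaining:
        ready = [s for s in remaining if set(s["inputs"]) <= available]
        if not ready:
            raise ValueError("unsatisfiable inputs or cyclic dependency")
        nxt = ready[0]
        remaining.remove(nxt)
        ordered.append(nxt["id"])
        available |= set(nxt["outputs"])
    return ordered


# Hypothetical stages, deliberately listed in the wrong order.
stages = [
    {"id": "threshold", "inputs": ["smoothed"], "outputs": ["mask"]},
    {"id": "smooth", "inputs": ["input"], "outputs": ["smoothed"]},
]
order = order_stages(stages)
```

The smooth stage is placed first because it needs only input, and threshold follows once smoothed has been produced.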

pypers.pipeline.suggest_stage_id(class_name: str) → str

Suggest stage ID based on a class name.

This function validates the class name, then finds and groups tokens in the class name. Tokens are grouped if they are consecutive and alphanumeric, but do not start with numbers. The function then converts the tokens to lowercase, removes underscores, and joins them with hyphens.

Parameters:

class_name (str) – The name of the class to suggest a configuration namespace for.

Returns:

A string of hyphen-separated tokens from the class name.

Return type:

str

Raises:

AssertionError – If the class name is not valid.
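
The general idea of turning a class name into hyphen-separated lowercase tokens can be sketched as follows. This is a simplified illustration, not the library's implementation: the function suggest_id_sketch and its regular expression are assumptions, and its results may differ from those of suggest_stage_id() for some names.

```python
import re


def suggest_id_sketch(class_name: str) -> str:
    """Split a class name into tokens and join them with hyphens."""
    # Validate the name first (assumption: any Python identifier is accepted).
    assert class_name.isidentifier(), f"not a valid class name: {class_name!r}"
    # Tokens: a capitalized or lowercase word, an acronym, or a run of digits.
    # Underscores match none of the alternatives and are therefore dropped.
    tokens = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+", class_name)
    return "-".join(token.lower() for token in tokens)


camel_case_id = suggest_id_sketch("MyStage")
acronym_id = suggest_id_sketch("HTTPServer")
```

In this sketch, a name that is not a valid identifier, such as one starting with a digit, triggers the AssertionError.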