astronomer.providers.amazon.aws.operators.sagemaker

Module Contents

Classes

SageMakerProcessingOperatorAsync

SageMakerProcessingOperatorAsync is used to analyze data and evaluate machine learning models on Amazon SageMaker.

SageMakerTransformOperatorAsync

SageMakerTransformOperatorAsync starts a transform job and polls for the status asynchronously.

SageMakerTrainingOperatorAsync

SageMakerTrainingOperatorAsync starts a model training job and polls for the status asynchronously.

Functions

serialize(result)

Serializes any objects in a SageMaker API response to a JSON string.

astronomer.providers.amazon.aws.operators.sagemaker.serialize(result)[source]

Serializes any objects in a SageMaker API response to a JSON string.
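The following is a minimal sketch of what such serialization can look like, not the module's actual implementation. The name serialize_sketch and the fallback to str() are assumptions made for illustration; the only grounded point is that a response dictionary is turned into a JSON string.

```python
import json
from datetime import date, datetime
from typing import Any, Dict


def serialize_sketch(result: Dict[str, Any]) -> str:
    """Hypothetical stand-in for serialize(): dump a response dict to a JSON string."""

    def _default(obj: Any) -> str:
        # API responses often carry datetime values, which json cannot encode natively.
        if isinstance(obj, (date, datetime)):
            return obj.isoformat()
        # Assumption: fall back to the string form for any other non-JSON type.
        return str(obj)

    return json.dumps(result, default=_default)


# Placeholder response, shaped loosely like a SageMaker describe call.
response = {
    "ProcessingJobName": "example-job",
    "CreationTime": datetime(2023, 1, 2, 3, 4, 5),
}
print(serialize_sketch(response))
```

Converting to a string up front keeps the value safe to pass through channels that only accept JSON-compatible data.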

class astronomer.providers.amazon.aws.operators.sagemaker.SageMakerProcessingOperatorAsync(*, config, aws_conn_id=DEFAULT_CONN_ID, wait_for_completion=True, print_log=True, check_interval=CHECK_INTERVAL_SECOND, max_ingestion_time=None, action_if_job_exists='timestamp', **kwargs)[source]

Bases: airflow.providers.amazon.aws.operators.sagemaker.SageMakerProcessingOperator

SageMakerProcessingOperatorAsync is used to analyze data and evaluate machine learning models on Amazon SageMaker. With SageMakerProcessingOperatorAsync, you can use a simplified, managed experience on SageMaker to run your data processing workloads, such as feature engineering, data validation, model evaluation, and model interpretation.

See also

For more information on how to use this operator, take a look at the guide: Create an Amazon SageMaker processing job

Parameters:
  • config (dict) – The configuration necessary to start a processing job (templated). For details of the configuration parameters, see SageMaker.Client.create_processing_job.

  • aws_conn_id (str) – The AWS connection ID to use.

  • wait_for_completion (bool) – Even if wait_for_completion is set to False, the async operator defers and waits to check the status of the processing job.

  • print_log (bool) – Whether the operator prints the CloudWatch log during processing.

  • check_interval (int) – If wait_for_completion is True, the interval, in seconds, at which the operator checks the status of the processing job.

  • max_ingestion_time (int | None) – The operation fails if the processing job doesn’t finish within max_ingestion_time seconds. If you set this parameter to None, the operation does not time out.

  • action_if_job_exists (str) – Behavior if the job name already exists. The documented options are “increment” and “fail”; the default in the signature above is 'timestamp'.
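The parameters above can be put together as in the sketch below. All job names, ARNs, image URIs, and the task_id are placeholders, and "aws_default" is only an assumed connection ID. The config keys follow the request shape of SageMaker.Client.create_processing_job. The operator is constructed inside a function so that the configuration can be inspected without Airflow installed.

```python
from typing import Any, Dict

# Placeholder request body for create_processing_job; replace every value.
processing_config: Dict[str, Any] = {
    "ProcessingJobName": "example-processing-job",
    "RoleArn": "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    "AppSpecification": {
        "ImageUri": "123456789012.dkr.ecr.us-east-1.amazonaws.com/example:latest"
    },
    "ProcessingResources": {
        "ClusterConfig": {
            "InstanceCount": 1,
            "InstanceType": "ml.m5.large",
            "VolumeSizeInGB": 10,
        }
    },
}

# Keyword arguments matching the signature documented above.
processing_kwargs: Dict[str, Any] = {
    "task_id": "run_processing",       # hypothetical task name
    "config": processing_config,
    "aws_conn_id": "aws_default",      # assumed connection ID
    "print_log": True,
    "check_interval": 30,              # seconds between status checks
    "max_ingestion_time": 3600,        # fail after one hour
    "action_if_job_exists": "fail",
}


def build_processing_task():
    """Create the operator; requires the astronomer-providers package."""
    from astronomer.providers.amazon.aws.operators.sagemaker import (
        SageMakerProcessingOperatorAsync,
    )

    return SageMakerProcessingOperatorAsync(**processing_kwargs)
```

In a DAG file, build_processing_task() would typically be called inside a DAG context so the task is attached to that DAG.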

execute(context)[source]

Creates the processing job through the synchronous hook method create_processing_job, then passes control to the trigger, which polls the status of the processing job asynchronously.
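The create-then-poll flow can be illustrated with a small asyncio simulation. This is a sketch under stated assumptions rather than the trigger's real code: poll_until_done, make_fake_status_source, and the event dictionary shape are hypothetical, and the status strings are sample values. It shows how check_interval and max_ingestion_time could shape an asynchronous polling loop.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional


async def poll_until_done(
    get_status: Callable[[], Awaitable[str]],
    check_interval: float,
    max_ingestion_time: Optional[float] = None,
) -> Dict[str, str]:
    """Poll a job status without blocking the event loop (hypothetical helper)."""
    loop = asyncio.get_running_loop()
    started = loop.time()
    while True:
        status = await get_status()
        if status == "Completed":
            return {"status": "success", "message": status}
        if status in ("Failed", "Stopped"):
            return {"status": "error", "message": f"job ended with status {status}"}
        # None means no time limit, mirroring the parameter description.
        if max_ingestion_time is not None and loop.time() - started > max_ingestion_time:
            return {"status": "error", "message": "timed out"}
        await asyncio.sleep(check_interval)


def make_fake_status_source(statuses: List[str]) -> Callable[[], Awaitable[str]]:
    """Return a coroutine function that replays statuses, repeating the last one."""
    remaining = iter(statuses)

    async def get_status() -> str:
        return next(remaining, statuses[-1])

    return get_status


event = asyncio.run(
    poll_until_done(
        make_fake_status_source(["InProgress", "InProgress", "Completed"]),
        check_interval=0.01,
    )
)
print(event)
```

Because the loop awaits between checks, a single process can watch many jobs at once, which is the main reason to defer instead of blocking a worker.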

execute_complete(context, event=None)[source]

Callback for when the trigger fires. Returns immediately. Relies on the trigger to raise an exception; otherwise, it assumes execution was successful.

class astronomer.providers.amazon.aws.operators.sagemaker.SageMakerTransformOperatorAsync(*, config, aws_conn_id=DEFAULT_CONN_ID, wait_for_completion=True, check_interval=CHECK_INTERVAL_SECOND, max_ingestion_time=None, check_if_job_exists=True, action_if_job_exists='timestamp', **kwargs)[source]

Bases: airflow.providers.amazon.aws.operators.sagemaker.SageMakerTransformOperator

SageMakerTransformOperatorAsync starts a transform job and polls for the status asynchronously. A transform job uses a trained model to get inferences on a dataset and saves these results to an Amazon S3 location that you specify.

See also

For more information on how to use this operator, take a look at the SageMakerTransformOperator how-to guide.

Parameters:
  • config (dict) –

    The configuration necessary to start a transform job (templated).

    If you need to create a SageMaker transform job based on an existing SageMaker model:

    config = transform_config
    

    If you need to create both a SageMaker model and a SageMaker transform job:

    config = {
        'Model': model_config,
        'Transform': transform_config
    }
    

    For details of the configuration parameters of transform_config, see SageMaker.Client.create_transform_job.

    For details of the configuration parameters of model_config, see SageMaker.Client.create_model.

  • aws_conn_id (str) – The AWS connection ID to use.

  • check_interval (int) – If wait_for_completion is True, the interval, in seconds, at which the operator checks the status of the transform job.

  • max_ingestion_time (int | None) – The operation fails if the transform job doesn’t finish within max_ingestion_time seconds. If you set this parameter to None, the operation does not time out.

  • check_if_job_exists (bool) – If set to True, the operator checks whether a transform job already exists for the name in the config.

  • action_if_job_exists (str) – Behavior if the job name already exists. The documented options are “increment” and “fail”; the default in the signature above is 'timestamp'. This is only relevant if check_if_job_exists is True.
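To make the two config shapes described under config concrete, the sketch below spells out a possible model_config and transform_config. All names, ARNs, image URIs, and S3 paths are placeholders. The helper build_transform_kwargs and the task_id are hypothetical. The dictionary keys follow the request shapes of SageMaker.Client.create_model and SageMaker.Client.create_transform_job.

```python
from typing import Any, Dict

# Placeholder request body for create_model.
model_config: Dict[str, Any] = {
    "ModelName": "example-model",
    "ExecutionRoleArn": "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    "PrimaryContainer": {
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/example:latest",
        "ModelDataUrl": "s3://example-bucket/model/model.tar.gz",
    },
}

# Placeholder request body for create_transform_job.
transform_config: Dict[str, Any] = {
    "TransformJobName": "example-transform-job",
    "ModelName": model_config["ModelName"],
    "TransformInput": {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://example-bucket/input/",
            }
        }
    },
    # Inference results are written to this S3 location.
    "TransformOutput": {"S3OutputPath": "s3://example-bucket/output/"},
    "TransformResources": {"InstanceCount": 1, "InstanceType": "ml.m5.large"},
}


def build_transform_kwargs(create_model: bool) -> Dict[str, Any]:
    """Pick the config shape: transform only, or model plus transform."""
    if create_model:
        config: Dict[str, Any] = {"Model": model_config, "Transform": transform_config}
    else:
        config = transform_config
    return {
        "task_id": "run_transform",      # hypothetical task name
        "config": config,
        "check_interval": 30,
        "max_ingestion_time": None,      # no time limit
        "check_if_job_exists": True,
        "action_if_job_exists": "fail",
    }


def build_transform_task(create_model: bool = False):
    """Create the operator; requires the astronomer-providers package."""
    from astronomer.providers.amazon.aws.operators.sagemaker import (
        SageMakerTransformOperatorAsync,
    )

    return SageMakerTransformOperatorAsync(**build_transform_kwargs(create_model))
```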

execute(context)[source]

Creates the transform job through the synchronous hook method create_transform_job, then passes control to the trigger, which polls the status of the transform job asynchronously.

execute_complete(context, event)[source]

Callback for when the trigger fires. Returns immediately. Relies on the trigger to raise an exception; otherwise, it assumes execution was successful.

class astronomer.providers.amazon.aws.operators.sagemaker.SageMakerTrainingOperatorAsync(*, config, aws_conn_id=DEFAULT_CONN_ID, wait_for_completion=True, print_log=True, check_interval=CHECK_INTERVAL_SECOND, max_ingestion_time=None, check_if_job_exists=True, action_if_job_exists='timestamp', **kwargs)[source]

Bases: airflow.providers.amazon.aws.operators.sagemaker.SageMakerTrainingOperator

SageMakerTrainingOperatorAsync starts a model training job and polls for the status asynchronously. After training completes, Amazon SageMaker saves the resulting model artifacts to an Amazon S3 location that you specify.

See also

For more information on how to use this operator, take a look at the SageMakerTrainingOperator how-to guide.

Parameters:
  • config (dict) – The configuration necessary to start a training job (templated). For details of the configuration parameters, see SageMaker.Client.create_training_job.

  • aws_conn_id (str) – The AWS connection ID to use.

  • print_log (bool) – Whether the operator prints the CloudWatch log during training.

  • check_interval (int) – If wait_for_completion is True, the interval, in seconds, at which the operator checks the status of the training job.

  • max_ingestion_time (int | None) – The operation fails if the training job doesn’t finish within max_ingestion_time seconds. If you set this parameter to None, the operation does not time out.

  • check_if_job_exists (bool) – If set to True, the operator checks whether a training job already exists for the name in the config.

  • action_if_job_exists (str) – Behavior if the job name already exists. The documented options are “increment” and “fail”; the default in the signature above is 'timestamp'. This is only relevant if check_if_job_exists is True.
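A training configuration might look like the sketch below. Every name, ARN, image URI, and S3 path is a placeholder, and the task_id is hypothetical. The keys follow the request shape of SageMaker.Client.create_training_job. OutputDataConfig carries the S3 location where the model artifacts are saved after training.

```python
from typing import Any, Dict

# Placeholder request body for create_training_job; replace every value.
training_config: Dict[str, Any] = {
    "TrainingJobName": "example-training-job",
    "RoleArn": "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    "AlgorithmSpecification": {
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/example:latest",
        "TrainingInputMode": "File",
    },
    "InputDataConfig": [
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://example-bucket/train/",
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        }
    ],
    # Model artifacts are written here when training completes.
    "OutputDataConfig": {"S3OutputPath": "s3://example-bucket/model-artifacts/"},
    "ResourceConfig": {
        "InstanceCount": 1,
        "InstanceType": "ml.m5.large",
        "VolumeSizeInGB": 10,
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}

# Keyword arguments matching the signature documented above.
training_kwargs: Dict[str, Any] = {
    "task_id": "run_training",         # hypothetical task name
    "config": training_config,
    "print_log": True,
    "check_interval": 60,
    "max_ingestion_time": 2 * 3600,    # fail if not finished within two hours
    "check_if_job_exists": True,
    "action_if_job_exists": "fail",
}


def build_training_task():
    """Create the operator; requires the astronomer-providers package."""
    from astronomer.providers.amazon.aws.operators.sagemaker import (
        SageMakerTrainingOperatorAsync,
    )

    return SageMakerTrainingOperatorAsync(**training_kwargs)
```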

execute(context)[source]

Creates the SageMaker training job through the synchronous hook method create_training_job, then passes control to the trigger, which polls the status of the training job asynchronously.

execute_complete(context, event)[source]

Callback for when the trigger fires. Returns immediately. Relies on the trigger to raise an exception; otherwise, it assumes execution was successful.
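A completion callback of this kind can be sketched as follows. This is an illustration only, not the operator's actual code. The event shape (a dict with "status" and "message" keys), the function handle_trigger_event, and the exception class JobFailedError are all assumptions not taken from the documentation above. The sketch covers the case where the failure is reported through the event payload instead of an exception raised by the trigger.

```python
from typing import Any, Dict, Optional


class JobFailedError(Exception):
    """Hypothetical error type used only in this sketch."""


def handle_trigger_event(event: Optional[Dict[str, Any]]) -> Any:
    """Return the event message on success; raise if the event reports an error."""
    if not event:
        # Assumption: a missing event is treated as a failure.
        raise JobFailedError("no event received from the trigger")
    if event.get("status") == "error":
        raise JobFailedError(str(event.get("message")))
    # Anything else is treated as success, matching the "assumes success" wording.
    return event.get("message")


result = handle_trigger_event({"status": "success", "message": {"TrainingJobStatus": "Completed"}})
print(result)
```

Keeping the callback this small means it returns quickly once the trigger fires, with the waiting done entirely in the trigger.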