astronomer.providers.amazon.aws.operators.sagemaker
Module Contents
Classes
SageMakerProcessingOperatorAsync is used to analyze data and evaluate machine learning models on Amazon SageMaker.
SageMakerTransformOperatorAsync starts a transform job and polls for the status asynchronously.
SageMakerTrainingOperatorAsync starts a model training job and polls for the status asynchronously.
Functions
serialize(result) – Serializes any object coming from a SageMaker API response to a JSON string.
- astronomer.providers.amazon.aws.operators.sagemaker.serialize(result)
Serializes any object coming from a SageMaker API response to a JSON string.
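As an illustration of why such a helper is needed: SageMaker API responses typically contain values such as datetime objects that the standard json module cannot encode on its own. The sketch below uses only the standard library and a hypothetical function name, to_json_string; it is not the implementation of serialize in this module, and the sample response is invented.

```python
import json
from datetime import datetime, timezone
from typing import Any


def to_json_string(result: dict) -> str:
    """Hypothetical stand-in for a response serializer (not the library's code)."""

    def default(value: Any) -> str:
        # json.dumps calls this hook only for values it cannot encode itself.
        if isinstance(value, datetime):
            return value.isoformat()
        raise TypeError(f"Cannot serialize value of type {type(value).__name__}")

    return json.dumps(result, default=default)


# Invented sample that mimics the shape of a describe-job response.
response = {
    "ProcessingJobStatus": "Completed",
    "CreationTime": datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc),
}
payload = to_json_string(response)
```

Here payload is a plain string in which the timestamp has been converted to ISO 8601 text, so it can be passed through anything that expects JSON.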
- class astronomer.providers.amazon.aws.operators.sagemaker.SageMakerProcessingOperatorAsync(*, config, aws_conn_id=DEFAULT_CONN_ID, wait_for_completion=True, print_log=True, check_interval=CHECK_INTERVAL_SECOND, max_ingestion_time=None, action_if_job_exists='increment', **kwargs)
Bases: airflow.providers.amazon.aws.operators.sagemaker.SageMakerProcessingOperator
SageMakerProcessingOperatorAsync is used to analyze data and evaluate machine learning models on Amazon SageMaker. With SageMakerProcessingOperatorAsync, you can use a simplified, managed experience on SageMaker to run your data processing workloads, such as feature engineering, data validation, model evaluation, and model interpretation.
See also
For more information on how to use this operator, take a look at the guide: Create an Amazon SageMaker processing job
- Parameters:
config (dict) – The configuration necessary to start a processing job (templated). For details of the configuration parameters, see SageMaker.Client.create_processing_job.
aws_conn_id (str) – The AWS connection ID to use.
wait_for_completion (bool) – Even if this is set to False, the async operator defers and waits while it checks the status of the processing job.
print_log (bool) – Whether the operator should print the CloudWatch log during processing.
check_interval (int) – If wait_for_completion is set to True, the interval, in seconds, at which the operator checks the status of the processing job.
max_ingestion_time (int | None) – The operation fails if the processing job does not finish within max_ingestion_time seconds. If you set this parameter to None, the operation does not time out.
action_if_job_exists (str) – Behaviour if the job name already exists. Possible options are “increment” (default) and “fail”.
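The parameters above can be put together as in the following minimal sketch. The job name, role ARN, image URI and task_id are placeholders invented for illustration; the dictionary keys follow the boto3 create_processing_job request parameters. The operator import is deferred inside a function (a choice made for this example only) so that the config can be inspected without Airflow installed.

```python
# All values below are hypothetical placeholders, not working AWS resources.
processing_config = {
    "ProcessingJobName": "example-processing-job",
    "RoleArn": "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    "AppSpecification": {
        "ImageUri": "123456789012.dkr.ecr.us-east-1.amazonaws.com/example-image:latest",
    },
    "ProcessingResources": {
        "ClusterConfig": {
            "InstanceCount": 1,
            "InstanceType": "ml.m5.large",
            "VolumeSizeInGB": 10,
        }
    },
}


def make_processing_task():
    """Build the task; call this inside a DAG definition where Airflow is available."""
    from astronomer.providers.amazon.aws.operators.sagemaker import (
        SageMakerProcessingOperatorAsync,
    )

    return SageMakerProcessingOperatorAsync(
        task_id="process_data",  # hypothetical task name
        config=processing_config,
        aws_conn_id="aws_default",
        print_log=True,
        check_interval=30,  # check the job status every 30 seconds
        max_ingestion_time=3600,  # fail if the job runs longer than one hour
        action_if_job_exists="fail",  # do not rename the job if the name is taken
    )
```

Setting action_if_job_exists to "fail" here is only one of the two documented options; leaving it at the default "increment" lets the operator pick a different job name instead.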
- class astronomer.providers.amazon.aws.operators.sagemaker.SageMakerTransformOperatorAsync(*, config, aws_conn_id=DEFAULT_CONN_ID, wait_for_completion=True, check_interval=CHECK_INTERVAL_SECOND, max_ingestion_time=None, check_if_job_exists=True, action_if_job_exists='increment', **kwargs)
Bases: airflow.providers.amazon.aws.operators.sagemaker.SageMakerTransformOperator
SageMakerTransformOperatorAsync starts a transform job and polls for the status asynchronously. A transform job uses a trained model to get inferences on a dataset and saves these results to an Amazon S3 location that you specify.
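To make "polls for the status asynchronously" concrete, here is a rough sketch of an asyncio polling loop that honors a check interval and an optional time limit. It is an assumption-laden illustration, not the operator's or its trigger's actual code: poll_until_done and make_fake_status are hypothetical names, and the status source is a fake that replays a fixed sequence rather than calling AWS.

```python
import asyncio
import time
from typing import Awaitable, Callable, Iterable, Optional


async def poll_until_done(
    get_status: Callable[[], Awaitable[str]],
    check_interval: float,
    max_ingestion_time: Optional[float] = None,
) -> str:
    """Poll get_status until a terminal state is seen or the time limit passes."""
    terminal_states = {"Completed", "Failed", "Stopped"}
    started = time.monotonic()
    while True:
        status = await get_status()
        if status in terminal_states:
            return status
        elapsed = time.monotonic() - started
        if max_ingestion_time is not None and elapsed > max_ingestion_time:
            raise TimeoutError(f"job still {status!r} after {max_ingestion_time} seconds")
        # Sleeping with asyncio yields to the event loop instead of blocking it.
        await asyncio.sleep(check_interval)


def make_fake_status(statuses: Iterable[str]) -> Callable[[], Awaitable[str]]:
    """Hypothetical stand-in for a describe call; returns each status in turn."""
    remaining = iter(statuses)

    async def get_status() -> str:
        return next(remaining)

    return get_status


final_status = asyncio.run(
    poll_until_done(
        make_fake_status(["InProgress", "InProgress", "Completed"]),
        check_interval=0.01,
    )
)
```

With max_ingestion_time left at None the loop never times out, which mirrors the documented meaning of that parameter.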
See also
For more information on how to use this operator, take a look at the guide: howto/operator:SageMakerTransformOperator
- Parameters:
config (dict) – The configuration necessary to start a transform job (templated).
If you need to create a SageMaker transform job based on an existing SageMaker model:
config = transform_config
If you need to create both a SageMaker model and a SageMaker transform job:
config = { 'Model': model_config, 'Transform': transform_config }
For details of the configuration parameters of transform_config, see SageMaker.Client.create_transform_job.
For details of the configuration parameters of model_config, see SageMaker.Client.create_model.
aws_conn_id (str) – The AWS connection ID to use.
check_interval (int) – If wait_for_completion is set to True, the interval, in seconds, at which the operator checks the status of the transform job.
max_ingestion_time (int | None) – The operation fails if the transform job does not finish within max_ingestion_time seconds. If you set this parameter to None, the operation does not time out.
check_if_job_exists (bool) – If set to true, then the operator will check whether a transform job already exists for the name in the config.
action_if_job_exists (str) – Behaviour if the job name already exists. Possible options are “increment” (default) and “fail”. This is only relevant if check_if_job_exists is True.
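The two config shapes described for this operator can be sketched as follows. Every name, ARN, image URI and S3 path is a placeholder invented for illustration; the dictionary keys follow the boto3 create_transform_job and create_model request parameters. As an example-only choice, the operator import is deferred inside a function so the dictionaries can be inspected without Airflow installed.

```python
# All values below are hypothetical placeholders, not working AWS resources.
transform_config = {
    "TransformJobName": "example-transform-job",
    "ModelName": "example-model",
    "TransformInput": {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://example-bucket/input/",
            }
        },
        "ContentType": "text/csv",
    },
    "TransformOutput": {"S3OutputPath": "s3://example-bucket/output/"},
    "TransformResources": {"InstanceCount": 1, "InstanceType": "ml.m5.large"},
}

model_config = {
    "ModelName": "example-model",
    "PrimaryContainer": {
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/example-image:latest",
        "ModelDataUrl": "s3://example-bucket/model/model.tar.gz",
    },
    "ExecutionRoleArn": "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
}

# Shape 1: the model already exists, so only the transform job is described.
config_existing_model = transform_config

# Shape 2: create the model as well as the transform job that uses it.
config_with_model = {"Model": model_config, "Transform": transform_config}


def make_transform_task(config):
    """Build the task; call this inside a DAG definition where Airflow is available."""
    from astronomer.providers.amazon.aws.operators.sagemaker import (
        SageMakerTransformOperatorAsync,
    )

    return SageMakerTransformOperatorAsync(
        task_id="batch_transform",  # hypothetical task name
        config=config,
        aws_conn_id="aws_default",
        check_interval=30,
        max_ingestion_time=None,  # no time limit
        check_if_job_exists=True,
        action_if_job_exists="increment",
    )
```

Either config_existing_model or config_with_model can be passed to make_transform_task; in the second shape the ModelName in both dictionaries should agree.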
- class astronomer.providers.amazon.aws.operators.sagemaker.SageMakerTrainingOperatorAsync(*, config, aws_conn_id=DEFAULT_CONN_ID, wait_for_completion=True, print_log=True, check_interval=CHECK_INTERVAL_SECOND, max_ingestion_time=None, check_if_job_exists=True, action_if_job_exists='increment', **kwargs)
Bases: airflow.providers.amazon.aws.operators.sagemaker.SageMakerTrainingOperator
SageMakerTrainingOperatorAsync starts a model training job and polls for the status asynchronously. After training completes, Amazon SageMaker saves the resulting model artifacts to an Amazon S3 location that you specify.
See also
For more information on how to use this operator, take a look at the guide: howto/operator:SageMakerTrainingOperator
- Parameters:
config (dict) – The configuration necessary to start a training job (templated). For details of the configuration parameters, see SageMaker.Client.create_training_job.
aws_conn_id (str) – The AWS connection ID to use.
print_log (bool) – Whether the operator should print the CloudWatch log during training.
check_interval (int) – If wait_for_completion is set to True, the interval, in seconds, at which the operator checks the status of the training job.
max_ingestion_time (int | None) – The operation fails if the training job does not finish within max_ingestion_time seconds. If you set this parameter to None, the operation does not time out.
check_if_job_exists (bool) – If set to true, then the operator will check whether a training job already exists for the name in the config.
action_if_job_exists (str) – Behaviour if the job name already exists. Possible options are “increment” (default) and “fail”. This is only relevant if check_if_job_exists is True.
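A training task built from these parameters might look like the sketch below. The job name, role ARN, image URI, S3 paths and task_id are placeholders invented for illustration; the dictionary keys follow the boto3 create_training_job request parameters. The operator import is deferred inside a function, for this example only, so the config can be inspected without Airflow installed.

```python
# All values below are hypothetical placeholders, not working AWS resources.
training_config = {
    "TrainingJobName": "example-training-job",
    "AlgorithmSpecification": {
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/example-image:latest",
        "TrainingInputMode": "File",
    },
    "RoleArn": "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    "InputDataConfig": [
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://example-bucket/train/",
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        }
    ],
    # SageMaker writes the resulting model artifacts under this S3 location.
    "OutputDataConfig": {"S3OutputPath": "s3://example-bucket/model-artifacts/"},
    "ResourceConfig": {
        "InstanceCount": 1,
        "InstanceType": "ml.m5.large",
        "VolumeSizeInGB": 10,
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}


def make_training_task():
    """Build the task; call this inside a DAG definition where Airflow is available."""
    from astronomer.providers.amazon.aws.operators.sagemaker import (
        SageMakerTrainingOperatorAsync,
    )

    return SageMakerTrainingOperatorAsync(
        task_id="train_model",  # hypothetical task name
        config=training_config,
        aws_conn_id="aws_default",
        print_log=True,
        check_interval=60,
        max_ingestion_time=7200,  # fail if training runs longer than two hours
        check_if_job_exists=True,
        action_if_job_exists="increment",
    )
```

Note that StoppingCondition limits the job on the SageMaker side, while max_ingestion_time is the limit the operator itself applies while waiting.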