astronomer.providers.amazon.aws.operators.emr
¶
Module Contents¶
Classes¶
An async operator that submits jobs to EMR on EKS virtual clusters. |
- class astronomer.providers.amazon.aws.operators.emr.EmrContainerOperatorAsync(*, name, virtual_cluster_id, execution_role_arn, release_label, job_driver, configuration_overrides=None, client_request_token=None, aws_conn_id='aws_default', wait_for_completion=True, poll_interval=30, max_tries=None, tags=None, max_polling_attempts=None, **kwargs)[source]¶
Bases:
airflow.providers.amazon.aws.operators.emr.EmrContainerOperator
An async operator that submits jobs to EMR on EKS virtual clusters.
- Parameters:
name (str) – The name of the job run.
virtual_cluster_id (str) – The EMR on EKS virtual cluster ID
execution_role_arn (str) – The IAM role ARN associated with the job run.
release_label (str) – The Amazon EMR release version to use for the job run.
job_driver (dict) – Job configuration details, e.g. the Spark job parameters.
configuration_overrides (dict | None) – The configuration overrides for the job run, specifically either application configuration or monitoring configuration.
client_request_token (str | None) – The client idempotency token of the job run request. Use this if you want to specify a unique ID to prevent two jobs from getting started. If no token is provided, a UUIDv4 token will be generated for you.
aws_conn_id (str) – The Airflow connection used for AWS credentials.
poll_interval (int) – Time (in seconds) to wait between two consecutive calls to check query status on EMR
max_tries (int | None) – Deprecated - use max_polling_attempts instead.
max_polling_attempts (int | None) – Maximum number of times to wait for the job run to finish. Defaults to None, which will poll until the job is not in a pending, submitted, or running state.
tags (dict | None) – The tags assigned to job runs. Defaults to None