astronomer.providers.google.cloud.triggers.dataproc

Module Contents

Classes

DataprocCreateClusterTrigger

Asynchronously check the status of a cluster

DataprocDeleteClusterTrigger

Asynchronously check the status of a cluster

DataProcSubmitTrigger

Check for the state of a previously submitted Dataproc job.

class astronomer.providers.google.cloud.triggers.dataproc.DataprocCreateClusterTrigger(*, project_id=None, region=None, cluster_name, end_time, metadata=(), delete_on_error=True, cluster_config=None, labels=None, gcp_conn_id='google_cloud_default', polling_interval=5.0, **kwargs)

Bases: airflow.triggers.base.BaseTrigger

Asynchronously check the status of a cluster

Parameters:
  • project_id (Optional[str]) – The ID of the Google Cloud project the cluster belongs to

  • region (Optional[str]) – The Cloud Dataproc region in which to handle the request

  • cluster_name (str) – The name of the cluster

  • end_time (float) – Time in seconds left to check the cluster status

  • metadata (Sequence[Tuple[str, str]]) – Additional metadata that is provided to the method

  • gcp_conn_id (str) – The connection ID to use when fetching connection info.

  • polling_interval (float) – Time in seconds to sleep between checks of cluster status

serialize()

Serializes DataprocCreateClusterTrigger arguments and classpath.
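As a rough illustration of what "arguments and classpath" means: Airflow's `BaseTrigger.serialize()` contract is to return a tuple of the trigger's import path and a dict of keyword arguments from which the trigger can be rebuilt. The sketch below uses a hypothetical stand-in class (`SketchClusterTrigger`) rather than the real trigger, so it runs without Airflow installed; the exact set of keys the real class serializes is an assumption here.

```python
from typing import Any, Dict, Optional, Tuple


class SketchClusterTrigger:
    """Hypothetical stand-in, NOT the real DataprocCreateClusterTrigger."""

    def __init__(
        self,
        *,
        cluster_name: str,
        end_time: float,
        project_id: Optional[str] = None,
        region: Optional[str] = None,
        gcp_conn_id: str = "google_cloud_default",
        polling_interval: float = 5.0,
    ) -> None:
        self.cluster_name = cluster_name
        self.end_time = end_time
        self.project_id = project_id
        self.region = region
        self.gcp_conn_id = gcp_conn_id
        self.polling_interval = polling_interval

    def serialize(self) -> Tuple[str, Dict[str, Any]]:
        # First element: dotted path used to re-import the class.
        classpath = f"{type(self).__module__}.{type(self).__qualname__}"
        # Second element: kwargs that reproduce this instance (assumed key set).
        kwargs = {
            "cluster_name": self.cluster_name,
            "end_time": self.end_time,
            "project_id": self.project_id,
            "region": self.region,
            "gcp_conn_id": self.gcp_conn_id,
            "polling_interval": self.polling_interval,
        }
        return classpath, kwargs


def rebuild(cls: type, serialized: Tuple[str, Dict[str, Any]]) -> Any:
    """Recreate an instance from serialize() output (classpath lookup omitted)."""
    _classpath, kwargs = serialized
    return cls(**kwargs)
```

The round trip matters because a trigger is persisted and later re-instantiated in a separate process, so every constructor argument it needs has to survive in the kwargs dict.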

async run()

Check the status of the cluster until it reaches a terminal state.
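The polling behaviour described by `end_time` and `polling_interval` can be sketched as an async generator that checks the cluster state, sleeps, and gives up at a deadline. This is a minimal sketch, not the trigger's actual implementation: `poll_cluster`, `get_state`, and `deadline` are hypothetical names, the state strings are illustrative, and treating the time budget as a monotonic-clock deadline is an assumption about how `end_time` is interpreted.

```python
import asyncio
import time
from typing import Any, AsyncIterator, Awaitable, Callable, Dict, List

# Illustrative terminal states; the real trigger may distinguish more.
TERMINAL_OK = "RUNNING"
TERMINAL_FAILED = "ERROR"


async def poll_cluster(
    get_state: Callable[[], Awaitable[str]],
    deadline: float,
    polling_interval: float = 5.0,
) -> AsyncIterator[Dict[str, Any]]:
    """Yield exactly one event once the cluster settles or time runs out."""
    while time.monotonic() < deadline:
        state = await get_state()
        if state == TERMINAL_OK:
            yield {"status": "success", "state": state}
            return
        if state == TERMINAL_FAILED:
            yield {"status": "error", "state": state}
            return
        # Not terminal yet: yield control to the event loop before retrying.
        await asyncio.sleep(polling_interval)
    yield {"status": "error", "message": "timed out"}


async def demo() -> List[Dict[str, Any]]:
    # Fake status source standing in for a call to the Dataproc API.
    states = iter(["CREATING", "CREATING", "RUNNING"])

    async def fake_get_state() -> str:
        return next(states)

    deadline = time.monotonic() + 5.0
    return [event async for event in poll_cluster(fake_get_state, deadline, 0.01)]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Using `asyncio.sleep` rather than a blocking sleep is what lets many such triggers share a single event loop while they wait.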

class astronomer.providers.google.cloud.triggers.dataproc.DataprocDeleteClusterTrigger(cluster_name, end_time, project_id=None, region=None, metadata=(), gcp_conn_id='google_cloud_default', polling_interval=5.0, **kwargs)

Bases: airflow.triggers.base.BaseTrigger

Asynchronously check the status of a cluster

Parameters:
  • cluster_name (str) – The name of the cluster

  • end_time (float) – Time in seconds left to check the cluster status

  • project_id (Optional[str]) – The ID of the Google Cloud project the cluster belongs to

  • region (Optional[str]) – The Cloud Dataproc region in which to handle the request

  • metadata (Sequence[Tuple[str, str]]) – Additional metadata that is provided to the method

  • gcp_conn_id (str) – The connection ID to use when fetching connection info.

  • polling_interval (float) – Time in seconds to sleep between checks of cluster status

serialize()

Serializes DataprocDeleteClusterTrigger arguments and classpath.

async run()

Wait until the cluster is completely deleted.

class astronomer.providers.google.cloud.triggers.dataproc.DataProcSubmitTrigger(*, dataproc_job_id, region=None, project_id=None, gcp_conn_id='google_cloud_default', polling_interval=5.0)

Bases: airflow.triggers.base.BaseTrigger

Check for the state of a previously submitted Dataproc job.

Parameters:
  • dataproc_job_id (str) – The Dataproc job ID to poll. (templated)

  • region (Optional[str]) – Required. The Cloud Dataproc region in which to handle the request. (templated)

  • project_id (Optional[str]) – The ID of the Google Cloud project in which to create the cluster. (templated)

  • location – (To be deprecated). The Cloud Dataproc region in which to handle the request. (templated)

  • gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud Platform.

  • wait_timeout – How many seconds to wait for the job to be ready.

serialize()

Serializes DataProcSubmitTrigger arguments and classpath.

async run()

Poll in a simple loop until the job running on Google Cloud Dataproc finishes, whether it succeeds or not.
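One way to picture this loop is as a coroutine that maps each observed job state to "keep waiting", "success", or "error". The state names below (`DONE`, `ERROR`, `CANCELLED`) are values of the Dataproc `JobStatus.State` enum, but how the real trigger groups them, and the shape of the event it emits, are assumptions; `wait_for_job` and `get_job_state` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# Assumed grouping of Dataproc job states into outcomes.
SUCCESS_STATES = {"DONE"}
FAILURE_STATES = {"ERROR", "CANCELLED"}


async def wait_for_job(
    get_job_state: Callable[[], Awaitable[str]],
    polling_interval: float = 5.0,
) -> Dict[str, str]:
    """Loop until the job reaches a state counted as finished."""
    while True:
        state = await get_job_state()
        if state in SUCCESS_STATES:
            return {"status": "success", "state": state}
        if state in FAILURE_STATES:
            return {"status": "error", "state": state}
        # PENDING, RUNNING, etc.: wait and check again.
        await asyncio.sleep(polling_interval)
```

Unlike the cluster triggers, the signature above takes no `end_time`, so this sketch has no deadline and relies on the job eventually reaching a finished state.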