astronomer.providers.apache.livy.operators.livy

This module contains the Apache Livy operator async.

Module Contents

Classes

LivyOperatorAsync

This operator wraps the Apache Livy batch REST API, allowing to submit a Spark

class astronomer.providers.apache.livy.operators.livy.LivyOperatorAsync(*, file, class_name=None, args=None, conf=None, jars=None, py_files=None, files=None, driver_memory=None, driver_cores=None, executor_memory=None, executor_cores=None, num_executors=None, archives=None, queue=None, name=None, proxy_user=None, livy_conn_id='livy_default', livy_conn_auth_type=None, polling_interval=0, extra_options=None, extra_headers=None, retry_args=None, **kwargs)[source]

Bases: airflow.providers.apache.livy.operators.livy.LivyOperator

This operator wraps the Apache Livy batch REST API, allowing to submit a Spark application to the underlying cluster asynchronously.

Parameters:
  • file (str) – path of the file containing the application to execute (required).

  • class_name (str | None) – name of the application Java/Spark main class.

  • args (Sequence[str | int | float] | None) – application command line arguments.

  • jars (Sequence[str] | None) – jars to be used in this sessions.

  • py_files (Sequence[str] | None) – python files to be used in this session.

  • files (Sequence[str] | None) – files to be used in this session.

  • driver_memory (str | None) – amount of memory to use for the driver process.

  • driver_cores (int | str | None) – number of cores to use for the driver process.

  • executor_memory (str | None) – amount of memory to use per executor process.

  • executor_cores (int | str | None) – number of cores to use for each executor.

  • num_executors (int | str | None) – number of executors to launch for this session.

  • archives (Sequence[str] | None) – archives to be used in this session.

  • queue (str | None) – name of the YARN queue to which the application is submitted.

  • name (str | None) – name of this session.

  • conf (dict[Any, Any] | None) – Spark configuration properties.

  • proxy_user (str | None) – user to impersonate when running the job.

  • livy_conn_id (str) – reference to a pre-defined Livy Connection.

  • polling_interval (int) – time in seconds between polling for job completion. If poll_interval=0, in that case return the batch_id and if polling_interval > 0, poll the livy job for termination in the polling interval defined.

  • extra_options (dict[str, Any] | None) – Additional option can be passed when creating a request. For example, run(json=obj) is passed as aiohttp.ClientSession().get(json=obj)

  • extra_headers (dict[str, Any] | None) – A dictionary of headers passed to the HTTP request to livy.

  • retry_args (dict[str, Any] | None) – Arguments which define the retry behaviour. See Tenacity documentation at https://github.com/jd/tenacity

execute(context)[source]

Airflow runs this method on the worker and defers using the trigger. Submit the job and get the job_id using which we defer and poll in trigger

execute_complete(context, event)[source]

Callback for when the trigger fires - returns immediately. Relies on trigger to throw an exception, otherwise it assumes execution was successful.