astronomer.providers.apache.livy.operators.livy
¶
This module contains the Apache Livy operator async.
Module Contents¶
Classes¶
This operator wraps the Apache Livy batch REST API, allowing to submit a Spark |
- class astronomer.providers.apache.livy.operators.livy.LivyOperatorAsync(*, file, class_name=None, args=None, conf=None, jars=None, py_files=None, files=None, driver_memory=None, driver_cores=None, executor_memory=None, executor_cores=None, num_executors=None, archives=None, queue=None, name=None, proxy_user=None, livy_conn_id='livy_default', livy_conn_auth_type=None, polling_interval=0, extra_options=None, extra_headers=None, retry_args=None, **kwargs)[source]¶
Bases:
airflow.providers.apache.livy.operators.livy.LivyOperator
This operator wraps the Apache Livy batch REST API, allowing to submit a Spark application to the underlying cluster asynchronously.
- Parameters:
file (str) – path of the file containing the application to execute (required).
class_name (Optional[str]) – name of the application Java/Spark main class.
args (Optional[Sequence[Union[str, int, float]]]) – application command line arguments.
jars (Optional[Sequence[str]]) – jars to be used in this sessions.
py_files (Optional[Sequence[str]]) – python files to be used in this session.
files (Optional[Sequence[str]]) – files to be used in this session.
driver_memory (Optional[str]) – amount of memory to use for the driver process.
driver_cores (Optional[Union[int, str]]) – number of cores to use for the driver process.
executor_memory (Optional[str]) – amount of memory to use per executor process.
executor_cores (Optional[Union[int, str]]) – number of cores to use for each executor.
num_executors (Optional[Union[int, str]]) – number of executors to launch for this session.
archives (Optional[Sequence[str]]) – archives to be used in this session.
queue (Optional[str]) – name of the YARN queue to which the application is submitted.
name (Optional[str]) – name of this session.
conf (Optional[Dict[Any, Any]]) – Spark configuration properties.
proxy_user (Optional[str]) – user to impersonate when running the job.
livy_conn_id (str) – reference to a pre-defined Livy Connection.
polling_interval (int) – time in seconds between polling for job completion. If poll_interval=0, in that case return the batch_id and if polling_interval > 0, poll the livy job for termination in the polling interval defined.
extra_options (Optional[Dict[str, Any]]) – Additional option can be passed when creating a request. For example,
run(json=obj)
is passed asaiohttp.ClientSession().get(json=obj)
extra_headers (Optional[Dict[str, Any]]) – A dictionary of headers passed to the HTTP request to livy.
retry_args (Optional[Dict[str, Any]]) – Arguments which define the retry behaviour. See Tenacity documentation at https://github.com/jd/tenacity