astronomer.providers.apache.hive.sensors.named_hive_partition

Module Contents

Classes

NamedHivePartitionSensorAsync

Waits asynchronously for a set of partitions to show up in Hive.

class astronomer.providers.apache.hive.sensors.named_hive_partition.NamedHivePartitionSensorAsync(*, partition_names, metastore_conn_id='metastore_default', poke_interval=60 * 3, hook=None, **kwargs)[source]

Bases: airflow.providers.apache.hive.sensors.named_hive_partition.NamedHivePartitionSensor

Waits asynchronously for a set of partitions to show up in Hive.

Note

NamedHivePartitionSensorAsync uses the impyla library instead of PyHive. The sync version of this sensor uses PyHive (https://github.com/dropbox/PyHive).

Since the impyla library is used, set the connection to use port 10000 instead of 9083 (see the connection sketch after this note). For auth_mechanism='GSSAPI', ticket renewal happens through the airflow kerberos command on the worker/triggerer.

You may also need to allow network traffic from the Airflow worker/triggerer to the Hive instance, depending on where each is running. For example, you might add an entry to the /etc/hosts file on the Airflow worker/triggerer that maps the EMR master node's public IP address to its private DNS name.

The Hive and Hadoop library versions in the Dockerfile should match the versions running on the remote cluster.
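Below is a minimal sketch of such a connection, assuming Airflow 2.3+ JSON connection serialization via environment variables; the host, login, and schema are illustrative placeholders, and the same connection can equally be created through the Airflow UI or CLI:

    import json
    import os

    # Hypothetical metastore connection for the async sensor. impyla talks to
    # HiveServer2, so the port is 10000 rather than the metastore's 9083.
    os.environ["AIRFLOW_CONN_METASTORE_DEFAULT"] = json.dumps(
        {
            "conn_type": "hive_metastore",
            "host": "emr-master.example.com",  # e.g. the EMR master's private DNS name
            "login": "hive",
            "port": 10000,
            "schema": "default",
            "extra": {"auth_mechanism": "GSSAPI"},
        }
    )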

Parameters:
  • partition_names (list[str]) – List of fully qualified names of the partitions to wait for. A fully qualified name is of the form schema.table/pk1=pv1/pk2=pv2, for example, default.users/ds=2016-01-01.

  • metastore_conn_id (str) – Metastore thrift service connection id.
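A minimal usage sketch, assuming Airflow 2.4+ (the DAG id, task id, and dates are illustrative; the partition name reuses the example above):

    import pendulum
    from airflow import DAG

    from astronomer.providers.apache.hive.sensors.named_hive_partition import (
        NamedHivePartitionSensorAsync,
    )

    with DAG(
        dag_id="example_named_hive_partition_async",  # hypothetical DAG id
        start_date=pendulum.datetime(2022, 1, 1, tz="UTC"),
        schedule=None,
        catchup=False,
    ):
        wait_for_partition = NamedHivePartitionSensorAsync(
            task_id="wait_for_partition",
            partition_names=["default.users/ds=2016-01-01"],  # schema.table/pk=pv
            metastore_conn_id="metastore_default",
            poke_interval=60,  # seconds between partition checks by the trigger
        )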

execute(context)[source]

Submit a job to Hive and defer to the trigger.

execute_complete(context, event=None)[source]

Callback for when the trigger fires; returns immediately. Relies on the trigger to throw an exception; otherwise, execution is assumed to have been successful.