astronomer.providers.apache.hive.sensors.hive_partition
Classes
HivePartitionSensorAsync – Waits for a given partition to show up in a Hive table asynchronously.
Module Contents
- class astronomer.providers.apache.hive.sensors.hive_partition.HivePartitionSensorAsync(*, table, partition="ds='{{ ds }}'", metastore_conn_id='metastore_default', schema='default', poke_interval=60 * 3, **kwargs)
Bases: airflow.providers.apache.hive.sensors.hive_partition.HivePartitionSensor
Waits for a given partition to show up in a Hive table asynchronously.
Note
HivePartitionSensorAsync uses the impyla library instead of PyHive; the sync version of this sensor uses PyHive (https://github.com/dropbox/PyHive).
Because impyla is used, set the connection to use port 10000 (HiveServer2) instead of 9083 (the metastore port). For auth_mechanism='GSSAPI', Kerberos ticket renewal happens through the airflow kerberos command on the worker/triggerer.
You may also need to allow traffic from the Airflow worker/triggerer to the Hive instance, depending on where they are running. For example, you might add an entry to the /etc/hosts file on the Airflow worker/triggerer that maps the EMR master node's public IP address to its private DNS name.
The Hive and Hadoop library versions in the Dockerfile should match those of the remote cluster. See the connection and usage sketches after the parameter list below.
Parameters:
table (str) – The table in which the partition is present.
partition (str | None) – The partition clause to wait for, in notation such as "ds='2015-01-01'".
schema (str) – The Hive database to connect to. Defaults to 'default'.
metastore_conn_id (str) – The Airflow connection ID used to connect to Hive.
poke_interval – The interval, in seconds, to wait between checks for the partition.
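The note above requires the connection referenced by metastore_conn_id to point at port 10000. A minimal sketch of one way to do this, assuming Airflow 2.3+ and the JSON form of the AIRFLOW_CONN_<CONN_ID> environment variable; the conn_type, host, and login values below are placeholders and are not taken from this provider's documentation. In practice this would be configured in the deployment environment or the Airflow UI rather than in DAG code:

```python
import json
import os

# Hypothetical override of the ``metastore_default`` connection. The host and
# login are placeholders; the important part is port 10000 (HiveServer2)
# rather than 9083 (metastore), as required by the impyla-based async sensor.
os.environ["AIRFLOW_CONN_METASTORE_DEFAULT"] = json.dumps(
    {
        "conn_type": "hive_metastore",
        "host": "emr-master.example.internal",
        "login": "hive",
        "port": 10000,
        "schema": "default",
    }
)
```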
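A minimal usage sketch, not taken from the provider's examples; the DAG id, table name, and schedule are assumptions:

```python
from datetime import datetime

from airflow import DAG

from astronomer.providers.apache.hive.sensors.hive_partition import (
    HivePartitionSensorAsync,
)

with DAG(
    dag_id="example_hive_partition_sensor_async",  # hypothetical DAG id
    start_date=datetime(2022, 1, 1),
    schedule=None,  # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    # Defers until the given partition shows up in the Hive table.
    wait_for_partition = HivePartitionSensorAsync(
        task_id="wait_for_partition",
        table="my_table",                  # hypothetical table name
        partition="ds='{{ ds }}'",         # partition clause to wait for
        schema="default",
        metastore_conn_id="metastore_default",
        poke_interval=60 * 3,              # seconds between checks
    )
```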