astronomer.providers.snowflake.operators.snowflake

Module Contents

Classes

SnowflakeOperatorAsync

  • SnowflakeOperatorAsync uses the Snowflake Python connector's execute_async method to submit a database command

SnowflakeSqlApiOperatorAsync

Async Snowflake SQL API operator that supports submitting multiple SQL statements in a single request.

class astronomer.providers.snowflake.operators.snowflake.SnowflakeOperatorAsync(*, snowflake_conn_id='snowflake_default', warehouse=None, database=None, role=None, schema=None, authenticator=None, session_parameters=None, poll_interval=5, handler=fetch_all_snowflake_handler, return_last=True, **kwargs)[source]

Bases: airflow.providers.snowflake.operators.snowflake.SnowflakeOperator

  • SnowflakeOperatorAsync uses the Snowflake Python connector's execute_async method to submit a database command for asynchronous execution.

  • Submits multiple queries in parallel without waiting for each query to complete.

  • Accepts a list of queries, or multiple queries as a single semicolon-separated string, along with params. It loops through the queries and executes them in sequence, using the execute_async method to run each query.

  • Once a query is submitted, the operator executes it on one connection, gets the query ID from the response, passes it to the triggerer, and closes the connection (so that the worker slot can be freed up).

  • The trigger receives the list of query IDs as input, polls Snowflake every few seconds on a separate connection, and checks the status of each query by its query ID.
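The split-and-submit step described above can be sketched in plain Python. `StubCursor` here is a hypothetical stand-in for the Snowflake Python connector's cursor, whose real `execute_async` submits a statement without blocking and exposes the resulting query ID on the `sfqid` attribute:

```python
from itertools import count


class StubCursor:
    """Hypothetical stand-in for the Snowflake Python connector's cursor."""

    _ids = count(1)

    def __init__(self):
        self.sfqid = None

    def execute_async(self, sql):
        # The real connector submits the statement and returns immediately;
        # the query ID then becomes available on the cursor as `sfqid`.
        self.sfqid = f"query-{next(self._ids)}"


def submit_queries(cursor, sql):
    """Split a semicolon-separated SQL string and submit each statement
    asynchronously, collecting the query IDs for the trigger to poll."""
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    query_ids = []
    for statement in statements:
        cursor.execute_async(statement)
        query_ids.append(cursor.sfqid)
    return query_ids


query_ids = submit_queries(StubCursor(), "SELECT 1; SELECT 2; SELECT 3")
print(query_ids)  # one query ID per statement
```

The connection can be closed as soon as the IDs are collected; only the IDs are handed to the trigger.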

Where can this operator fit in?
  • Executing long-running queries that can run in parallel

  • Batch-based operations, such as copying or inserting data in parallel

Best practices:
  • Ensure that you know which queries are dependent upon other queries before you run any queries in parallel. Some queries are interdependent and order sensitive, and therefore not suitable for parallelizing. For example, an INSERT statement obviously should not start until after the corresponding CREATE TABLE statement has finished.

  • Ensure that you do not run too many queries for the memory that you have available. Running multiple queries in parallel typically consumes more memory, especially if more than one set of results is stored in memory at the same time.

  • Ensure that transaction control statements (BEGIN, COMMIT, and ROLLBACK) do not execute in parallel with other statements.

Parameters:
  • snowflake_conn_id (str) – Reference to Snowflake connection id

  • sql – the sql code to be executed. (templated)

  • autocommit – if True, each command is automatically committed. (default value: True)

  • parameters – (optional) the parameters to render the SQL query with.

  • warehouse (str | None) – name of warehouse (will overwrite any warehouse defined in the connection’s extra JSON)

  • database (str | None) – name of database (will overwrite database defined in connection)

  • schema (str | None) – name of schema (will overwrite schema defined in connection)

  • role (str | None) – name of role (will overwrite any role defined in connection’s extra JSON)

  • authenticator (str | None) – authenticator for Snowflake: ‘snowflake’ (default) to use the internal Snowflake authenticator; ‘externalbrowser’ to authenticate using your web browser and Okta, ADFS, or any other SAML 2.0-compliant identity provider (IdP) that has been defined for your account; ‘https://<your_okta_account_name>.okta.com’ to authenticate through native Okta.

  • session_parameters (dict[str, Any] | None) – You can set session-level parameters at the time you connect to Snowflake

  • handler (Callable[[Any], Any]) – (optional) the function that will be applied to the cursor (default: fetch_all_snowflake_handler).

  • return_last (bool) – (optional) whether to return the result of only the last statement (default: True).

  • poll_interval (int) – the interval in seconds at which to poll the query status
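The trigger-side polling governed by poll_interval amounts to a loop like the following. This is a sketch, not the provider's actual trigger code: `get_query_status` is a hypothetical stand-in for the hook's status check, and the terminal states mirror the description above rather than Snowflake's exact status values:

```python
import time

# Simplified status values for the sketch; the real trigger checks
# Snowflake's own query-status enum.
RUNNING, SUCCESS, FAILED = "RUNNING", "SUCCESS", "FAILED"


def poll_query_statuses(get_query_status, query_ids, poll_interval=5, sleep=time.sleep):
    """Poll each query ID until it reaches a terminal state, sleeping
    `poll_interval` seconds between rounds (mirrors the trigger's loop)."""
    pending = list(query_ids)
    results = {}
    while pending:
        still_pending = []
        for qid in pending:
            status = get_query_status(qid)
            if status == RUNNING:
                still_pending.append(qid)
            else:
                results[qid] = status
        pending = still_pending
        if pending:
            sleep(poll_interval)
    return results


# Simulated status source: q2 stays RUNNING for two polls, then succeeds.
calls = {"q1": iter([SUCCESS]), "q2": iter([RUNNING, RUNNING, SUCCESS])}
statuses = poll_query_statuses(lambda q: next(calls[q]), ["q1", "q2"], sleep=lambda s: None)
print(statuses)
```

Because this loop runs in the triggerer rather than in a worker, the worker slot stays free for the entire duration of the queries.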

get_db_hook()[source]

Get the Snowflake Hook

execute(context)[source]

Makes a synchronous connection to Snowflake, runs the query via the execute_async method, and closes the connection. The operator then defers, passing the query IDs to a SnowflakeTrigger, which polls for the status of the queries.

execute_complete(context, event=None)[source]

Callback for when the trigger fires - returns immediately. Relies on the trigger to throw an exception; otherwise it assumes execution was successful.

class astronomer.providers.snowflake.operators.snowflake.SnowflakeSqlApiOperatorAsync(*, snowflake_conn_id='snowflake_default', warehouse=None, database=None, role=None, schema=None, authenticator=None, session_parameters=None, poll_interval=5, statement_count=0, token_life_time=LIFETIME, token_renewal_delta=RENEWAL_DELTA, bindings=None, **kwargs)[source]

Bases: airflow.providers.snowflake.operators.snowflake.SnowflakeOperator

Implements an async Snowflake SQL API operator. Whereas the SnowflakeOperator supports multiple SQL statements only sequentially, the Snowflake SQL API allows submitting multiple SQL statements in a single request. In combination with aiohttp, the operator makes a POST request to submit SQL statements for execution, polls to check the status of each statement's execution, and fetches query results concurrently. This operator currently uses key pair authentication, so you need to provide the raw private key content or a private key file path in the Snowflake connection, along with the other connection details.

Where can this operator fit in?
  • To execute multiple SQL statements in a single request

  • To execute SQL statements asynchronously, including standard queries and most DDL and DML statements

  • To develop custom applications and integrations that perform queries

  • To provision users and roles, create tables, etc.

The following commands are not supported:
  • The PUT command (in Snowflake SQL)

  • The GET command (in Snowflake SQL)

  • The CALL command with stored procedures that return a table (stored procedures with the RETURNS TABLE clause).

Parameters:
  • snowflake_conn_id (str) – Reference to Snowflake connection id

  • sql – the sql code to be executed. (templated)

  • autocommit – if True, each command is automatically committed. (default value: True)

  • parameters – (optional) the parameters to render the SQL query with.

  • warehouse (str | None) – name of warehouse (will overwrite any warehouse defined in the connection’s extra JSON)

  • database (str | None) – name of database (will overwrite database defined in connection)

  • schema (str | None) – name of schema (will overwrite schema defined in connection)

  • role (str | None) – name of role (will overwrite any role defined in connection’s extra JSON)

  • authenticator (str | None) – authenticator for Snowflake: ‘snowflake’ (default) to use the internal Snowflake authenticator; ‘externalbrowser’ to authenticate using your web browser and Okta, ADFS, or any other SAML 2.0-compliant identity provider (IdP) that has been defined for your account; ‘https://<your_okta_account_name>.okta.com’ to authenticate through native Okta.

  • session_parameters (dict[str, Any] | None) – You can set session-level parameters at the time you connect to Snowflake

  • poll_interval (int) – the interval in seconds at which to poll the query status

  • statement_count (int) – number of SQL statements to be executed

  • token_life_time (datetime.timedelta) – lifetime of the JWT Token

  • token_renewal_delta (datetime.timedelta) – Renewal time of the JWT Token

  • bindings (dict[str, Any] | None) – (Optional) Values of bind variables in the SQL statement. When executing the statement, Snowflake replaces placeholders (? and :name) in the statement with these specified values.
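The relationship between statement_count and bindings can be illustrated by sketching the JSON body that Snowflake's SQL API expects on POST /api/v2/statements. The field names below (MULTI_STATEMENT_COUNT, positional binding keys with type/value pairs) follow the public SQL API documentation, not the provider's internal code, so treat this as an illustration rather than the operator's exact request:

```python
def build_sql_api_payload(sql, statement_count=0, bindings=None):
    """Sketch of the JSON body POSTed to Snowflake's SQL API.
    MULTI_STATEMENT_COUNT tells the API how many statements the request
    contains; bindings supply values for the `?` / `:name` placeholders."""
    payload = {"statement": sql}
    if statement_count:
        payload["parameters"] = {"MULTI_STATEMENT_COUNT": statement_count}
    if bindings:
        payload["bindings"] = bindings
    return payload


payload = build_sql_api_payload(
    "INSERT INTO t VALUES (?); SELECT COUNT(*) FROM t;",
    statement_count=2,
    bindings={"1": {"type": "FIXED", "value": "42"}},
)
print(payload)
```

Note that statement_count must match the number of statements actually present in the request; a mismatch is rejected by the API.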

LIFETIME
RENEWAL_DELTA
execute(context)[source]

Makes a POST request to Snowflake via the SQL API to execute the query and obtain the statement IDs. The operator then defers, passing the statement IDs to a SnowflakeSqlApiTrigger, which polls for the status of the execution.

execute_complete(context, event=None)[source]

Callback for when the trigger fires - returns immediately. Relies on the trigger to throw an exception; otherwise it assumes execution was successful.