astronomer.providers.google.cloud.operators.bigquery

This module contains async Google BigQuery operators.

Module Contents

Classes

BigQueryInsertJobOperatorAsync

Starts a BigQuery job asynchronously and returns the job id.

BigQueryCheckOperatorAsync

An asynchronous operator that submits the BigQuery job and checks its status using the job id.

BigQueryGetDataOperatorAsync

Fetches the data from a BigQuery table (alternatively fetch data for selected columns) and returns data in a python list.

BigQueryIntervalCheckOperatorAsync

Checks asynchronously that the values of metrics given as SQL expressions are within a certain tolerance of the ones from days_back before.

BigQueryValueCheckOperatorAsync

Performs a simple value check using SQL code.

Attributes

BIGQUERY_JOB_DETAILS_LINK_FMT

class astronomer.providers.google.cloud.operators.bigquery.BigQueryInsertJobOperatorAsync(configuration, project_id=None, location=None, job_id=None, force_rerun=True, reattach_states=None, gcp_conn_id='google_cloud_default', delegate_to=None, impersonation_chain=None, cancel_on_kill=True, result_retry=DEFAULT_RETRY, result_timeout=None, deferrable=False, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.bigquery.BigQueryInsertJobOperator, airflow.models.baseoperator.BaseOperator

Starts a BigQuery job asynchronously and returns the job id. This operator works in the following way:

  • it calculates a unique hash of the job using the job’s configuration, or a uuid if force_rerun is True

  • creates job_id in form of

    [provided_job_id | airflow_{dag_id}_{task_id}_{exec_date}]_{uniqueness_suffix}

  • submits a BigQuery job using the job_id

  • if a job with the given id already exists, it tries to reattach to the job when the job is not done and its state is in reattach_states; if the job is done, the operator raises AirflowException.

Using force_rerun will submit a new job every time without attaching to already existing ones.

For the job definition, see https://cloud.google.com/bigquery/docs/reference/v2/jobs
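A minimal usage sketch (the dag_id, task_id, query, and location below are illustrative, not prescribed by the operator):

from datetime import datetime

from airflow import DAG
from astronomer.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperatorAsync,
)

with DAG(
    dag_id='example_bigquery_insert_job',  # illustrative
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
) as dag:
    insert_job = BigQueryInsertJobOperatorAsync(
        task_id='insert_query_job',
        # Maps directly to the "configuration" field of the BigQuery jobs API.
        configuration={
            'query': {
                'query': 'SELECT 1',  # illustrative query
                'useLegacySql': False,
            }
        },
        location='US',  # illustrative
    )

The task submits the job, defers, and resumes in execute_complete once the trigger reports a terminal state.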

Parameters:
  • configuration (dict[str, Any]) – The configuration parameter maps directly to BigQuery’s configuration field in the job object. For more details see https://cloud.google.com/bigquery/docs/reference/v2/jobs

  • job_id (str | None) – The ID of the job. It will be suffixed with hash of job configuration unless force_rerun is True. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or dashes (-). The maximum length is 1,024 characters. If not provided then uuid will be generated.

  • force_rerun (bool) – If True then operator will use hash of uuid as job id suffix

  • reattach_states (set[str] | None) – Set of BigQuery job’s states in case of which we should reattach to the job. Should be other than final states.

  • project_id (str | None) – Google Cloud Project where the job is running

  • location (str | None) – location the job is running

  • gcp_conn_id (str) – The connection ID used to connect to Google Cloud.

  • delegate_to (str | None) – The account to impersonate using domain-wide delegation of authority, if any. For this to work, the service account making the request must have domain-wide delegation enabled.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

  • cancel_on_kill (bool) – Flag that indicates whether to cancel the hook’s job when on_kill is called

execute(context)[source]

This is the main method to derive when creating an operator. Context is the same dictionary used when rendering jinja templates.

Refer to get_template_context for more context.

execute_complete(context, event)[source]

Callback for when the trigger fires; returns immediately. Relies on the trigger to throw an exception; otherwise it assumes execution was successful.

class astronomer.providers.google.cloud.operators.bigquery.BigQueryCheckOperatorAsync(*, sql, gcp_conn_id='google_cloud_default', use_legacy_sql=True, location=None, impersonation_chain=None, labels=None, deferrable=False, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.bigquery.BigQueryCheckOperator

BigQueryCheckOperatorAsync is an asynchronous operator: it submits the BigQuery job and then checks its status in async mode using the job id
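A minimal usage sketch (the table name is illustrative; the check semantics are inherited from BigQueryCheckOperator, which fails the task if the first row of the result contains a falsy value):

from astronomer.providers.google.cloud.operators.bigquery import (
    BigQueryCheckOperatorAsync,
)

check_count = BigQueryCheckOperatorAsync(
    task_id='check_count',
    # Passes when every value in the first result row is truthy.
    sql='SELECT COUNT(*) FROM test_dataset.Transaction_partitions',
    use_legacy_sql=False,
)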

execute(context)[source]

This is the main method to derive when creating an operator. Context is the same dictionary used when rendering jinja templates.

Refer to get_template_context for more context.

execute_complete(context, event)[source]

Callback for when the trigger fires; returns immediately. Relies on the trigger to throw an exception; otherwise it assumes execution was successful.

class astronomer.providers.google.cloud.operators.bigquery.BigQueryGetDataOperatorAsync(*, dataset_id, table_id, project_id=None, max_results=100, selected_fields=None, gcp_conn_id='google_cloud_default', delegate_to=None, location=None, impersonation_chain=None, deferrable=False, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.bigquery.BigQueryGetDataOperator

Fetches the data from a BigQuery table (alternatively fetch data for selected columns) and returns data in a python list. The number of elements in the returned list will be equal to the number of rows fetched. Each element in the list will again be a list, where each element represents the column values for that row.

Example Result: [['Tony', '10'], ['Mike', '20'], ['Steve', '15']]

Note

If you pass fields to selected_fields in a different order than the order of the columns already in the BQ table, the data will still be in the order of the BQ table. For example, if the BQ table has 3 columns as [A,B,C] and you pass 'B,A' in selected_fields, the data will still be of the form 'A,B'.

Example:

get_data = BigQueryGetDataOperatorAsync(
    task_id='get_data_from_bq',
    dataset_id='test_dataset',
    table_id='Transaction_partitions',
    max_results=100,
    selected_fields='DATE',
    gcp_conn_id='airflow-conn-id',
)
Parameters:
  • dataset_id (str) – The dataset ID of the requested table. (templated)

  • table_id (str) – The table ID of the requested table. (templated)

  • max_results (int) – The maximum number of records (rows) to be fetched from the table. (templated)

  • selected_fields (str | None) – List of fields to return (comma-separated). If unspecified, all fields are returned.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud.

  • delegate_to (str | None) – The account to impersonate using domain-wide delegation of authority, if any. For this to work, the service account making the request must have domain-wide delegation enabled.

  • location (str | None) – The location used for the operation.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

generate_query()[source]

Generate a select query over the given dataset and table id, selecting selected_fields if given and * otherwise
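An illustrative sketch of the shape of the query this produces (an approximation, not the exact implementation):

def generate_query_sketch(dataset_id, table_id, selected_fields=None, max_results=100):
    # Illustrative only: select the requested fields (or everything)
    # from the table, capped at max_results rows.
    fields = selected_fields if selected_fields else '*'
    return f'select {fields} from {dataset_id}.{table_id} limit {max_results}'

# generate_query_sketch('test_dataset', 'Transaction_partitions', 'DATE')
# -> 'select DATE from test_dataset.Transaction_partitions limit 100'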

execute(context)[source]

This is the main method to derive when creating an operator. Context is the same dictionary used when rendering jinja templates.

Refer to get_template_context for more context.

execute_complete(context, event)[source]

Callback for when the trigger fires; returns immediately. Relies on the trigger to throw an exception; otherwise it assumes execution was successful.

class astronomer.providers.google.cloud.operators.bigquery.BigQueryIntervalCheckOperatorAsync(*, table, metrics_thresholds, date_filter_column='ds', days_back=-7, gcp_conn_id='google_cloud_default', use_legacy_sql=True, location=None, impersonation_chain=None, labels=None, deferrable=False, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.bigquery.BigQueryIntervalCheckOperator

Checks asynchronously that the values of metrics given as SQL expressions are within a certain tolerance of the ones from days_back before.

This operator constructs a query of the form:

SELECT {metrics_threshold_dict_key} FROM {table} WHERE {date_filter_column}=<date>
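A minimal usage sketch (the table name is illustrative; date_filter_column and days_back keep their defaults of 'ds' and -7):

from astronomer.providers.google.cloud.operators.bigquery import (
    BigQueryIntervalCheckOperatorAsync,
)

interval_check = BigQueryIntervalCheckOperatorAsync(
    task_id='interval_check',
    table='test_dataset.Transaction_partitions',  # illustrative
    # Require today's COUNT(*) to differ from the value days_back before
    # by no more than the given ratio.
    metrics_thresholds={'COUNT(*)': 1.5},
    use_legacy_sql=False,
)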

Parameters:
  • table (str) – the table name

  • days_back (SupportsAbs[int]) – number of days between ds and the ds we want to check against. Defaults to 7 days

  • metrics_thresholds (dict) – a dictionary of ratios indexed by metrics, for example ‘COUNT(*)’: 1.5 would require a 50 percent or less difference between the current day and the prior days_back.

  • use_legacy_sql (bool) – Whether to use legacy SQL (true) or standard SQL (false).

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud.

  • location (str | None) – The geographic location of the job. See details at: https://cloud.google.com/bigquery/docs/locations#specifying_your_location

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

  • labels (dict | None) – a dictionary containing labels for the table, passed to BigQuery

execute(context)[source]

Execute the job in sync mode, then defer the trigger with the job id to poll for the status

execute_complete(context, event)[source]

Callback for when the trigger fires; returns immediately. Relies on the trigger to throw an exception; otherwise it assumes execution was successful.

class astronomer.providers.google.cloud.operators.bigquery.BigQueryValueCheckOperatorAsync(*, sql, pass_value, tolerance=None, gcp_conn_id='google_cloud_default', use_legacy_sql=True, location=None, impersonation_chain=None, labels=None, deferrable=False, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.bigquery.BigQueryValueCheckOperator

Performs a simple value check using SQL code.

See also

For more information on how to use this operator, take a look at the guide: Compare query result to pass value
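A minimal usage sketch (the query, pass_value, and tolerance below are illustrative):

from astronomer.providers.google.cloud.operators.bigquery import (
    BigQueryValueCheckOperatorAsync,
)

value_check = BigQueryValueCheckOperatorAsync(
    task_id='value_check',
    sql='SELECT COUNT(*) FROM test_dataset.Transaction_partitions',  # illustrative
    pass_value=100,  # expected query result
    tolerance=0.1,   # accept results within 10 percent of pass_value
    use_legacy_sql=False,
)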

Parameters:
  • sql (str) – the sql to be executed

  • use_legacy_sql (bool) – Whether to use legacy SQL (true) or standard SQL (false).

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud.

  • location (str | None) – The geographic location of the job. See details at: https://cloud.google.com/bigquery/docs/locations#specifying_your_location

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

  • labels (dict | None) – a dictionary containing labels for the table, passed to BigQuery

  • deferrable (bool) – Run the operator in deferrable mode

execute(context)[source]

This is the main method to derive when creating an operator. Context is the same dictionary used when rendering jinja templates.

Refer to get_template_context for more context.

execute_complete(context, event)[source]

Callback for when the trigger fires; returns immediately. Relies on the trigger to throw an exception; otherwise it assumes execution was successful.