AWS Glue

2023/11/16 - AWS Glue - 5 new api methods

Changes: Introduces new column statistics APIs to support statistics generation for tables within the Glue Data Catalog.

GetColumnStatisticsTaskRuns (new)

Retrieves information about all runs associated with the specified table.

See also: AWS API Documentation

Request Syntax

client.get_column_statistics_task_runs(
    DatabaseName='string',
    TableName='string',
    MaxResults=123,
    NextToken='string'
)
type DatabaseName

string

param DatabaseName

[REQUIRED]

The name of the database where the table resides.

type TableName

string

param TableName

[REQUIRED]

The name of the table.

type MaxResults

integer

param MaxResults

The maximum number of task runs to return in a single response.

type NextToken

string

param NextToken

A continuation token, if this is a continuation call.

rtype

dict

returns

Response Syntax

{
    'ColumnStatisticsTaskRuns': [
        {
            'CustomerId': 'string',
            'ColumnStatisticsTaskRunId': 'string',
            'DatabaseName': 'string',
            'TableName': 'string',
            'ColumnNameList': [
                'string',
            ],
            'CatalogID': 'string',
            'Role': 'string',
            'SampleSize': 123.0,
            'SecurityConfiguration': 'string',
            'NumberOfWorkers': 123,
            'WorkerType': 'string',
            'Status': 'STARTING'|'RUNNING'|'SUCCEEDED'|'FAILED'|'STOPPED',
            'CreationTime': datetime(2015, 1, 1),
            'LastUpdated': datetime(2015, 1, 1),
            'StartTime': datetime(2015, 1, 1),
            'EndTime': datetime(2015, 1, 1),
            'ErrorMessage': 'string',
            'DPUSeconds': 123.0
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • ColumnStatisticsTaskRuns (list) --

      A list of column statistics task runs.

      • (dict) --

        The object that shows the details of the column stats run.

        • CustomerId (string) --

          The Amazon Web Services account ID.

        • ColumnStatisticsTaskRunId (string) --

          The identifier for the particular column statistics task run.

        • DatabaseName (string) --

          The database where the table resides.

        • TableName (string) --

          The name of the table for which column statistics are generated.

        • ColumnNameList (list) --

          A list of the column names. If none is supplied, all column names for the table will be used by default.

          • (string) --

        • CatalogID (string) --

          The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.

        • Role (string) --

          The IAM role that the service assumes to generate statistics.

        • SampleSize (float) --

          The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.

        • SecurityConfiguration (string) --

          Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.

        • NumberOfWorkers (integer) --

          The number of workers used to generate column statistics. The job is preconfigured to autoscale up to 25 instances.

        • WorkerType (string) --

          The type of workers being used for generating stats. The default is g.1x.

        • Status (string) --

          The status of the task run.

        • CreationTime (datetime) --

          The time that this task was created.

        • LastUpdated (datetime) --

          The last point in time when this task was modified.

        • StartTime (datetime) --

          The start time of the task.

        • EndTime (datetime) --

          The end time of the task.

        • ErrorMessage (string) --

          The error message for the job.

        • DPUSeconds (float) --

          The calculated DPU usage in seconds for all autoscaled workers.

    • NextToken (string) --

      A continuation token, if not all task runs have yet been returned.
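
The NextToken handling described above can be sketched as a small generator. This is a minimal illustration, not an official helper: the function name `iter_column_statistics_task_runs` and the `page_size` default are assumptions, and `client` is expected to be a Glue client such as the one returned by `boto3.client("glue")`.

```python
from typing import Any, Dict, Iterator


def iter_column_statistics_task_runs(
    client: Any,
    database_name: str,
    table_name: str,
    page_size: int = 50,  # hypothetical default; the accepted range is not stated here
) -> Iterator[Dict[str, Any]]:
    """Yield every task run for a table, following NextToken across pages."""
    kwargs: Dict[str, Any] = {
        "DatabaseName": database_name,
        "TableName": table_name,
        "MaxResults": page_size,
    }
    while True:
        response = client.get_column_statistics_task_runs(**kwargs)
        yield from response.get("ColumnStatisticsTaskRuns", [])
        token = response.get("NextToken")
        if not token:
            # No token means the last page has been returned.
            break
        # Pass the token back unchanged to request the next page.
        kwargs["NextToken"] = token
```

Passing the client in as an argument keeps the loop easy to exercise with a stub object that returns canned pages.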

GetColumnStatisticsTaskRun (new)

Gets the metadata for a task run, given a task run ID.

See also: AWS API Documentation

Request Syntax

client.get_column_statistics_task_run(
    ColumnStatisticsTaskRunId='string'
)
type ColumnStatisticsTaskRunId

string

param ColumnStatisticsTaskRunId

[REQUIRED]

The identifier for the particular column statistics task run.

rtype

dict

returns

Response Syntax

{
    'ColumnStatisticsTaskRun': {
        'CustomerId': 'string',
        'ColumnStatisticsTaskRunId': 'string',
        'DatabaseName': 'string',
        'TableName': 'string',
        'ColumnNameList': [
            'string',
        ],
        'CatalogID': 'string',
        'Role': 'string',
        'SampleSize': 123.0,
        'SecurityConfiguration': 'string',
        'NumberOfWorkers': 123,
        'WorkerType': 'string',
        'Status': 'STARTING'|'RUNNING'|'SUCCEEDED'|'FAILED'|'STOPPED',
        'CreationTime': datetime(2015, 1, 1),
        'LastUpdated': datetime(2015, 1, 1),
        'StartTime': datetime(2015, 1, 1),
        'EndTime': datetime(2015, 1, 1),
        'ErrorMessage': 'string',
        'DPUSeconds': 123.0
    }
}

Response Structure

  • (dict) --

    • ColumnStatisticsTaskRun (dict) --

      A ColumnStatisticsTaskRun object representing the details of the column stats run.

      • CustomerId (string) --

        The Amazon Web Services account ID.

      • ColumnStatisticsTaskRunId (string) --

        The identifier for the particular column statistics task run.

      • DatabaseName (string) --

        The database where the table resides.

      • TableName (string) --

        The name of the table for which column statistics are generated.

      • ColumnNameList (list) --

        A list of the column names. If none is supplied, all column names for the table will be used by default.

        • (string) --

      • CatalogID (string) --

        The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.

      • Role (string) --

        The IAM role that the service assumes to generate statistics.

      • SampleSize (float) --

        The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.

      • SecurityConfiguration (string) --

        Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.

      • NumberOfWorkers (integer) --

        The number of workers used to generate column statistics. The job is preconfigured to autoscale up to 25 instances.

      • WorkerType (string) --

        The type of workers being used for generating stats. The default is g.1x.

      • Status (string) --

        The status of the task run.

      • CreationTime (datetime) --

        The time that this task was created.

      • LastUpdated (datetime) --

        The last point in time when this task was modified.

      • StartTime (datetime) --

        The start time of the task.

      • EndTime (datetime) --

        The end time of the task.

      • ErrorMessage (string) --

        The error message for the job.

      • DPUSeconds (float) --

        The calculated DPU usage in seconds for all autoscaled workers.
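
Because Status moves through STARTING and RUNNING before reaching SUCCEEDED, FAILED, or STOPPED, a caller will usually poll this operation. The sketch below is one way to do that under stated assumptions: the helper names, the poll interval, and the poll limit are all hypothetical, and treating the last three statuses as terminal is inferred from the enum rather than stated by the API. `dpu_hours` simply divides the reported DPUSeconds by 3600.

```python
import time
from typing import Any, Callable, Dict

# Assumed to be terminal, based on the Status values listed above.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_column_statistics_task_run(
    client: Any,
    run_id: str,
    poll_seconds: float = 15.0,  # hypothetical interval
    max_polls: int = 240,        # hypothetical limit
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a task run until it reaches a terminal status, then return it."""
    for attempt in range(max_polls):
        response = client.get_column_statistics_task_run(
            ColumnStatisticsTaskRunId=run_id
        )
        run = response["ColumnStatisticsTaskRun"]
        if run.get("Status") in TERMINAL_STATUSES:
            return run
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"task run {run_id} did not finish after {max_polls} polls")


def dpu_hours(run: Dict[str, Any]) -> float:
    """Convert the DPUSeconds field of a task run to DPU-hours."""
    return run.get("DPUSeconds", 0.0) / 3600.0
```

The `sleep` parameter exists only so the loop can be tested without real delays.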

StopColumnStatisticsTaskRun (new)

Stops a task run for the specified table.

See also: AWS API Documentation

Request Syntax

client.stop_column_statistics_task_run(
    DatabaseName='string',
    TableName='string'
)
type DatabaseName

string

param DatabaseName

[REQUIRED]

The name of the database where the table resides.

type TableName

string

param TableName

[REQUIRED]

The name of the table.

rtype

dict

returns

Response Syntax

{}

Response Structure

  • (dict) --
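
Since the request identifies a table rather than a run ID, a caller may want to check for an active run before stopping. The following is a rough sketch only: `stop_if_active` is a hypothetical helper, it inspects just the first page returned by `get_column_statistics_task_runs`, and the assumption that only STARTING and RUNNING runs are worth stopping is not confirmed by this changelog.

```python
from typing import Any

# Assumed "still in progress" statuses, taken from the Status enum.
ACTIVE_STATUSES = {"STARTING", "RUNNING"}


def stop_if_active(client: Any, database_name: str, table_name: str) -> bool:
    """Stop the table's task run if one appears active; return True if a stop was sent."""
    response = client.get_column_statistics_task_runs(
        DatabaseName=database_name,
        TableName=table_name,
    )
    runs = response.get("ColumnStatisticsTaskRuns", [])
    if not any(run.get("Status") in ACTIVE_STATUSES for run in runs):
        return False
    # The response body is empty, so there is nothing to return from it.
    client.stop_column_statistics_task_run(
        DatabaseName=database_name,
        TableName=table_name,
    )
    return True
```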

StartColumnStatisticsTaskRun (new)

Starts a column statistics task run, for a specified table and columns.

See also: AWS API Documentation

Request Syntax

client.start_column_statistics_task_run(
    DatabaseName='string',
    TableName='string',
    ColumnNameList=[
        'string',
    ],
    Role='string',
    SampleSize=123.0,
    CatalogID='string',
    SecurityConfiguration='string'
)
type DatabaseName

string

param DatabaseName

[REQUIRED]

The name of the database where the table resides.

type TableName

string

param TableName

[REQUIRED]

The name of the table to generate statistics for.

type ColumnNameList

list

param ColumnNameList

A list of the column names to generate statistics for. If none is supplied, all column names for the table will be used by default.

  • (string) --

type Role

string

param Role

[REQUIRED]

The IAM role that the service assumes to generate statistics.

type SampleSize

float

param SampleSize

The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.

type CatalogID

string

param CatalogID

The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.

type SecurityConfiguration

string

param SecurityConfiguration

Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.

rtype

dict

returns

Response Syntax

{
    'ColumnStatisticsTaskRunId': 'string'
}

Response Structure

  • (dict) --

    • ColumnStatisticsTaskRunId (string) --

      The identifier for the column statistics task run.
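
Only DatabaseName, TableName, and Role are required, so a request is best built by leaving unset optional parameters out entirely and letting the documented defaults apply. A minimal sketch, assuming hypothetical helper names `build_start_request` and `start_column_statistics`:

```python
from typing import Any, Dict, Optional, Sequence


def build_start_request(
    database_name: str,
    table_name: str,
    role: str,
    column_names: Optional[Sequence[str]] = None,
    sample_size: Optional[float] = None,
    catalog_id: Optional[str] = None,
    security_configuration: Optional[str] = None,
) -> Dict[str, Any]:
    """Build StartColumnStatisticsTaskRun arguments, omitting unset optionals."""
    request: Dict[str, Any] = {
        "DatabaseName": database_name,
        "TableName": table_name,
        "Role": role,
    }
    optional = {
        # An empty or missing list means "all columns", so it is left out.
        "ColumnNameList": list(column_names) if column_names else None,
        # SampleSize is a percentage of rows; omitting it uses the whole table.
        "SampleSize": float(sample_size) if sample_size is not None else None,
        "CatalogID": catalog_id,
        "SecurityConfiguration": security_configuration,
    }
    request.update({key: value for key, value in optional.items() if value is not None})
    return request


def start_column_statistics(client: Any, **kwargs: Any) -> str:
    """Start a task run and return its ColumnStatisticsTaskRunId."""
    response = client.start_column_statistics_task_run(**build_start_request(**kwargs))
    return response["ColumnStatisticsTaskRunId"]
```

Separating request construction from the call makes the omitted-parameter behavior checkable without contacting the service.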

ListColumnStatisticsTaskRuns (new)

Lists all task runs for a particular account.

See also: AWS API Documentation

Request Syntax

client.list_column_statistics_task_runs(
    MaxResults=123,
    NextToken='string'
)
type MaxResults

integer

param MaxResults

The maximum number of task run IDs to return in a single response.

type NextToken

string

param NextToken

A continuation token, if this is a continuation call.

rtype

dict

returns

Response Syntax

{
    'ColumnStatisticsTaskRunIds': [
        'string',
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • ColumnStatisticsTaskRunIds (list) --

      A list of column statistics task run IDs.

      • (string) --

    • NextToken (string) --

      A continuation token, if not all task run IDs have yet been returned.
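
This operation returns only IDs, so details require a follow-up `get_column_statistics_task_run` call per ID. The sketch below pages through the IDs and tallies runs by Status. The helper names and the `page_size` default are assumptions, and making one call per run may be slow or throttled for accounts with many runs.

```python
from collections import Counter
from typing import Any, Dict, Iterator


def iter_task_run_ids(client: Any, page_size: int = 50) -> Iterator[str]:
    """Yield every task run ID in the account, following NextToken."""
    kwargs: Dict[str, Any] = {"MaxResults": page_size}
    while True:
        response = client.list_column_statistics_task_runs(**kwargs)
        yield from response.get("ColumnStatisticsTaskRunIds", [])
        token = response.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token


def count_runs_by_status(client: Any) -> Dict[str, int]:
    """Look up each listed run and count how many are in each Status."""
    counts: Counter = Counter()
    for run_id in iter_task_run_ids(client):
        response = client.get_column_statistics_task_run(
            ColumnStatisticsTaskRunId=run_id
        )
        run = response["ColumnStatisticsTaskRun"]
        # "UNKNOWN" is a placeholder used here if Status is absent.
        counts[run.get("Status", "UNKNOWN")] += 1
    return dict(counts)
```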