AWS Glue

2020/04/20 - AWS Glue - 4 updated API methods

Changes: Added a new ConnectionType "KAFKA" and a ConnectionProperty "KAFKA_BOOTSTRAP_SERVERS" to support Kafka connections.

CreateConnection (updated)
Changes (request)
{'ConnectionInput': {'ConnectionType': {'KAFKA'}}}

Creates a connection definition in the Data Catalog.

See also: AWS API Documentation

Request Syntax

client.create_connection(
    CatalogId='string',
    ConnectionInput={
        'Name': 'string',
        'Description': 'string',
        'ConnectionType': 'JDBC'|'SFTP'|'MONGODB'|'KAFKA',
        'MatchCriteria': [
            'string',
        ],
        'ConnectionProperties': {
            'string': 'string'
        },
        'PhysicalConnectionRequirements': {
            'SubnetId': 'string',
            'SecurityGroupIdList': [
                'string',
            ],
            'AvailabilityZone': 'string'
        }
    }
)
type CatalogId

string

param CatalogId

The ID of the Data Catalog in which to create the connection. If none is provided, the AWS account ID is used by default.

type ConnectionInput

dict

param ConnectionInput

[REQUIRED]

A ConnectionInput object defining the connection to create.

  • Name (string) -- [REQUIRED]

    The name of the connection.

  • Description (string) --

    The description of the connection.

  • ConnectionType (string) -- [REQUIRED]

    The type of the connection. Currently, these types are supported:

    • JDBC - Designates a connection to a database through Java Database Connectivity (JDBC).

    • KAFKA - Designates a connection to an Apache Kafka streaming platform.

    • MONGODB - Designates a connection to a MongoDB document database.

    SFTP is not supported.

  • MatchCriteria (list) --

    A list of criteria that can be used in selecting this connection.

    • (string) --

  • ConnectionProperties (dict) -- [REQUIRED]

    These key-value pairs define parameters for the connection.

    • (string) --

      • (string) --

  • PhysicalConnectionRequirements (dict) --

    A map of physical connection requirements, such as virtual private cloud (VPC) and SecurityGroup, that are needed to successfully make this connection.

    • SubnetId (string) --

      The subnet ID used by the connection.

    • SecurityGroupIdList (list) --

      The security group ID list used by the connection.

      • (string) --

    • AvailabilityZone (string) --

      The connection's Availability Zone. This field is redundant because the specified subnet implies the Availability Zone to be used. Currently the field must be populated, but it will be deprecated in the future.

rtype

dict

returns

Response Syntax

{}

Response Structure

  • (dict) --
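
As an illustration of the new KAFKA connection type, the following is a minimal sketch of a create_connection call. The helper names, the connection name, and the broker host names are hypothetical; the client argument is assumed to be a Glue client such as the one returned by boto3.client("glue").

```python
def build_kafka_connection_input(name, bootstrap_servers, description=""):
    """Build a ConnectionInput dict for a KAFKA connection.

    bootstrap_servers is an iterable of (host, port) pairs; they are joined
    into the comma-separated string stored under KAFKA_BOOTSTRAP_SERVERS.
    """
    servers = ",".join(f"{host}:{port}" for host, port in bootstrap_servers)
    connection_input = {
        "Name": name,
        "ConnectionType": "KAFKA",
        "ConnectionProperties": {"KAFKA_BOOTSTRAP_SERVERS": servers},
    }
    if description:
        connection_input["Description"] = description
    return connection_input


def create_kafka_connection(client, name, bootstrap_servers, description=""):
    """Create the connection in the caller's default Data Catalog."""
    return client.create_connection(
        ConnectionInput=build_kafka_connection_input(
            name, bootstrap_servers, description
        )
    )


# Usage against a real account (requires boto3 and AWS credentials):
#   import boto3
#   create_kafka_connection(
#       boto3.client("glue"),
#       "events-kafka",                          # hypothetical name
#       [("broker-1.example.com", 9092)],        # hypothetical broker
#   )
```

CatalogId is omitted here, so the AWS account ID is used by default, as described above.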

GetConnection (updated)
Changes (response)
{'Connection': {'ConnectionType': {'KAFKA'}}}

Retrieves a connection definition from the Data Catalog.

See also: AWS API Documentation

Request Syntax

client.get_connection(
    CatalogId='string',
    Name='string',
    HidePassword=True|False
)
type CatalogId

string

param CatalogId

The ID of the Data Catalog in which the connection resides. If none is provided, the AWS account ID is used by default.

type Name

string

param Name

[REQUIRED]

The name of the connection definition to retrieve.

type HidePassword

boolean

param HidePassword

Allows you to retrieve the connection metadata without returning the password. For instance, the AWS Glue console uses this flag to retrieve the connection, and does not display the password. Set this parameter when the caller might not have permission to use the AWS KMS key to decrypt the password, but it does have permission to access the rest of the connection properties.

rtype

dict

returns

Response Syntax

{
    'Connection': {
        'Name': 'string',
        'Description': 'string',
        'ConnectionType': 'JDBC'|'SFTP'|'MONGODB'|'KAFKA',
        'MatchCriteria': [
            'string',
        ],
        'ConnectionProperties': {
            'string': 'string'
        },
        'PhysicalConnectionRequirements': {
            'SubnetId': 'string',
            'SecurityGroupIdList': [
                'string',
            ],
            'AvailabilityZone': 'string'
        },
        'CreationTime': datetime(2015, 1, 1),
        'LastUpdatedTime': datetime(2015, 1, 1),
        'LastUpdatedBy': 'string'
    }
}

Response Structure

  • (dict) --

    • Connection (dict) --

      The requested connection definition.

      • Name (string) --

        The name of the connection definition.

      • Description (string) --

        The description of the connection.

      • ConnectionType (string) --

        The type of the connection. Currently, JDBC, KAFKA, and MONGODB are supported; SFTP is not supported.

      • MatchCriteria (list) --

        A list of criteria that can be used in selecting this connection.

        • (string) --

      • ConnectionProperties (dict) --

        These key-value pairs define parameters for the connection:

        • HOST - The host URI: either the fully qualified domain name (FQDN) or the IPv4 address of the database host.

        • PORT - The port number, between 1024 and 65535, of the port on which the database host is listening for database connections.

        • USER_NAME - The name under which to log in to the database. The value string for USER_NAME is "USERNAME".

        • PASSWORD - A password, if one is used, for the user name.

        • ENCRYPTED_PASSWORD - When you enable connection password protection by setting ConnectionPasswordEncryption in the Data Catalog encryption settings, this field stores the encrypted password.

        • JDBC_DRIVER_JAR_URI - The Amazon Simple Storage Service (Amazon S3) path of the JAR file that contains the JDBC driver to use.

        • JDBC_DRIVER_CLASS_NAME - The class name of the JDBC driver to use.

        • JDBC_ENGINE - The name of the JDBC engine to use.

        • JDBC_ENGINE_VERSION - The version of the JDBC engine to use.

        • CONFIG_FILES - (Reserved for future use.)

        • INSTANCE_ID - The instance ID to use.

        • JDBC_CONNECTION_URL - The URL for connecting to a JDBC data source.

        • JDBC_ENFORCE_SSL - A Boolean string (true, false) specifying whether Secure Sockets Layer (SSL) with hostname matching is enforced for the JDBC connection on the client. The default is false.

        • CUSTOM_JDBC_CERT - An Amazon S3 location specifying the customer's root certificate. AWS Glue uses this root certificate to validate the customer's certificate when connecting to the customer database. AWS Glue only handles X.509 certificates. The certificate provided must be DER-encoded and supplied in Base64-encoded PEM format.

        • SKIP_CUSTOM_JDBC_CERT_VALIDATION - By default, this is false. AWS Glue validates the Signature algorithm and Subject Public Key Algorithm for the customer certificate. The only permitted algorithms for the Signature algorithm are SHA256withRSA, SHA384withRSA, or SHA512withRSA. For the Subject Public Key Algorithm, the key length must be at least 2048. You can set the value of this property to true to skip AWS Glue's validation of the customer certificate.

        • CUSTOM_JDBC_CERT_STRING - A custom JDBC certificate string that is used for domain match or distinguished name match to prevent a man-in-the-middle attack. In Oracle Database, this is used as the SSL_SERVER_CERT_DN; in Microsoft SQL Server, this is used as the hostNameInCertificate.

        • CONNECTION_URL - The URL for connecting to a general (non-JDBC) data source.

        • KAFKA_BOOTSTRAP_SERVERS - A comma-separated list of host and port pairs that are the addresses of the Apache Kafka brokers in a Kafka cluster to which a Kafka client will connect and bootstrap itself.

        • (string) --

          • (string) --

      • PhysicalConnectionRequirements (dict) --

        A map of physical connection requirements, such as virtual private cloud (VPC) and SecurityGroup, that are needed to make this connection successfully.

        • SubnetId (string) --

          The subnet ID used by the connection.

        • SecurityGroupIdList (list) --

          The security group ID list used by the connection.

          • (string) --

        • AvailabilityZone (string) --

          The connection's Availability Zone. This field is redundant because the specified subnet implies the Availability Zone to be used. Currently the field must be populated, but it will be deprecated in the future.

      • CreationTime (datetime) --

        The time that this connection definition was created.

      • LastUpdatedTime (datetime) --

        The last time that this connection definition was updated.

      • LastUpdatedBy (string) --

        The user, group, or role that last updated this connection definition.
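
To show how the response above might be consumed, here is a minimal sketch that retrieves a KAFKA connection with HidePassword set and splits KAFKA_BOOTSTRAP_SERVERS into (host, port) pairs. The helper names are hypothetical, and the parsing assumes each entry has the form host:port; the client argument is assumed to be a Glue client such as boto3.client("glue").

```python
def parse_bootstrap_servers(value):
    """Split a comma-separated 'host:port' string into (host, port) tuples.

    Assumes every non-empty entry contains a port after the last colon.
    """
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, _, port = item.rpartition(":")
        pairs.append((host, int(port)))
    return pairs


def get_kafka_brokers(client, name):
    """Return the broker list of a KAFKA connection as (host, port) tuples."""
    # HidePassword=True avoids needing permission on the AWS KMS key.
    response = client.get_connection(Name=name, HidePassword=True)
    connection = response["Connection"]
    if connection.get("ConnectionType") != "KAFKA":
        raise ValueError(f"connection {name!r} is not a KAFKA connection")
    servers = connection["ConnectionProperties"]["KAFKA_BOOTSTRAP_SERVERS"]
    return parse_bootstrap_servers(servers)
```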

GetConnections (updated)
Changes (request, response)
Request
{'Filter': {'ConnectionType': {'KAFKA'}}}
Response
{'ConnectionList': {'ConnectionType': {'KAFKA'}}}

Retrieves a list of connection definitions from the Data Catalog.

See also: AWS API Documentation

Request Syntax

client.get_connections(
    CatalogId='string',
    Filter={
        'MatchCriteria': [
            'string',
        ],
        'ConnectionType': 'JDBC'|'SFTP'|'MONGODB'|'KAFKA'
    },
    HidePassword=True|False,
    NextToken='string',
    MaxResults=123
)
type CatalogId

string

param CatalogId

The ID of the Data Catalog in which the connections reside. If none is provided, the AWS account ID is used by default.

type Filter

dict

param Filter

A filter that controls which connections are returned.

  • MatchCriteria (list) --

    A criteria string that must match the criteria recorded in the connection definition for that connection definition to be returned.

    • (string) --

  • ConnectionType (string) --

    The type of connections to return. Currently, JDBC, KAFKA, and MONGODB are supported; SFTP is not supported.

type HidePassword

boolean

param HidePassword

Allows you to retrieve the connection metadata without returning the password. For instance, the AWS Glue console uses this flag to retrieve the connection, and does not display the password. Set this parameter when the caller might not have permission to use the AWS KMS key to decrypt the password, but it does have permission to access the rest of the connection properties.

type NextToken

string

param NextToken

A continuation token, if this is a continuation call.

type MaxResults

integer

param MaxResults

The maximum number of connections to return in one response.

rtype

dict

returns

Response Syntax

{
    'ConnectionList': [
        {
            'Name': 'string',
            'Description': 'string',
            'ConnectionType': 'JDBC'|'SFTP'|'MONGODB'|'KAFKA',
            'MatchCriteria': [
                'string',
            ],
            'ConnectionProperties': {
                'string': 'string'
            },
            'PhysicalConnectionRequirements': {
                'SubnetId': 'string',
                'SecurityGroupIdList': [
                    'string',
                ],
                'AvailabilityZone': 'string'
            },
            'CreationTime': datetime(2015, 1, 1),
            'LastUpdatedTime': datetime(2015, 1, 1),
            'LastUpdatedBy': 'string'
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • ConnectionList (list) --

      A list of requested connection definitions.

      • (dict) --

        Defines a connection to a data source.

        • Name (string) --

          The name of the connection definition.

        • Description (string) --

          The description of the connection.

        • ConnectionType (string) --

          The type of the connection. Currently, JDBC, KAFKA, and MONGODB are supported; SFTP is not supported.

        • MatchCriteria (list) --

          A list of criteria that can be used in selecting this connection.

          • (string) --

        • ConnectionProperties (dict) --

          These key-value pairs define parameters for the connection:

          • HOST - The host URI: either the fully qualified domain name (FQDN) or the IPv4 address of the database host.

          • PORT - The port number, between 1024 and 65535, of the port on which the database host is listening for database connections.

          • USER_NAME - The name under which to log in to the database. The value string for USER_NAME is "USERNAME".

          • PASSWORD - A password, if one is used, for the user name.

          • ENCRYPTED_PASSWORD - When you enable connection password protection by setting ConnectionPasswordEncryption in the Data Catalog encryption settings, this field stores the encrypted password.

          • JDBC_DRIVER_JAR_URI - The Amazon Simple Storage Service (Amazon S3) path of the JAR file that contains the JDBC driver to use.

          • JDBC_DRIVER_CLASS_NAME - The class name of the JDBC driver to use.

          • JDBC_ENGINE - The name of the JDBC engine to use.

          • JDBC_ENGINE_VERSION - The version of the JDBC engine to use.

          • CONFIG_FILES - (Reserved for future use.)

          • INSTANCE_ID - The instance ID to use.

          • JDBC_CONNECTION_URL - The URL for connecting to a JDBC data source.

          • JDBC_ENFORCE_SSL - A Boolean string (true, false) specifying whether Secure Sockets Layer (SSL) with hostname matching is enforced for the JDBC connection on the client. The default is false.

          • CUSTOM_JDBC_CERT - An Amazon S3 location specifying the customer's root certificate. AWS Glue uses this root certificate to validate the customer's certificate when connecting to the customer database. AWS Glue only handles X.509 certificates. The certificate provided must be DER-encoded and supplied in Base64-encoded PEM format.

          • SKIP_CUSTOM_JDBC_CERT_VALIDATION - By default, this is false. AWS Glue validates the Signature algorithm and Subject Public Key Algorithm for the customer certificate. The only permitted algorithms for the Signature algorithm are SHA256withRSA, SHA384withRSA, or SHA512withRSA. For the Subject Public Key Algorithm, the key length must be at least 2048. You can set the value of this property to true to skip AWS Glue's validation of the customer certificate.

          • CUSTOM_JDBC_CERT_STRING - A custom JDBC certificate string that is used for domain match or distinguished name match to prevent a man-in-the-middle attack. In Oracle Database, this is used as the SSL_SERVER_CERT_DN; in Microsoft SQL Server, this is used as the hostNameInCertificate.

          • CONNECTION_URL - The URL for connecting to a general (non-JDBC) data source.

          • KAFKA_BOOTSTRAP_SERVERS - A comma-separated list of host and port pairs that are the addresses of the Apache Kafka brokers in a Kafka cluster to which a Kafka client will connect and bootstrap itself.

          • (string) --

            • (string) --

        • PhysicalConnectionRequirements (dict) --

          A map of physical connection requirements, such as virtual private cloud (VPC) and SecurityGroup, that are needed to make this connection successfully.

          • SubnetId (string) --

            The subnet ID used by the connection.

          • SecurityGroupIdList (list) --

            The security group ID list used by the connection.

            • (string) --

          • AvailabilityZone (string) --

            The connection's Availability Zone. This field is redundant because the specified subnet implies the Availability Zone to be used. Currently the field must be populated, but it will be deprecated in the future.

        • CreationTime (datetime) --

          The time that this connection definition was created.

        • LastUpdatedTime (datetime) --

          The last time that this connection definition was updated.

        • LastUpdatedBy (string) --

          The user, group, or role that last updated this connection definition.

    • NextToken (string) --

      A continuation token, if the list of connections returned does not include the last of the filtered connections.
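
The NextToken handling described above can be sketched as a simple loop. This is a minimal illustration, not the only way to paginate; the function name and the default page size are hypothetical, and the client argument is assumed to be a Glue client such as boto3.client("glue").

```python
def iter_connections(client, connection_type="KAFKA", page_size=50):
    """Yield every connection definition of the given type, page by page."""
    kwargs = {
        "Filter": {"ConnectionType": connection_type},
        "HidePassword": True,
        "MaxResults": page_size,
    }
    while True:
        response = client.get_connections(**kwargs)
        yield from response.get("ConnectionList", [])
        token = response.get("NextToken")
        if not token:
            # No continuation token: the last page has been returned.
            break
        kwargs["NextToken"] = token


# Usage against a real account (requires boto3 and AWS credentials):
#   import boto3
#   for connection in iter_connections(boto3.client("glue")):
#       print(connection["Name"])
```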

UpdateConnection (updated)
Changes (request)
{'ConnectionInput': {'ConnectionType': {'KAFKA'}}}

Updates a connection definition in the Data Catalog.

See also: AWS API Documentation

Request Syntax

client.update_connection(
    CatalogId='string',
    Name='string',
    ConnectionInput={
        'Name': 'string',
        'Description': 'string',
        'ConnectionType': 'JDBC'|'SFTP'|'MONGODB'|'KAFKA',
        'MatchCriteria': [
            'string',
        ],
        'ConnectionProperties': {
            'string': 'string'
        },
        'PhysicalConnectionRequirements': {
            'SubnetId': 'string',
            'SecurityGroupIdList': [
                'string',
            ],
            'AvailabilityZone': 'string'
        }
    }
)
type CatalogId

string

param CatalogId

The ID of the Data Catalog in which the connection resides. If none is provided, the AWS account ID is used by default.

type Name

string

param Name

[REQUIRED]

The name of the connection definition to update.

type ConnectionInput

dict

param ConnectionInput

[REQUIRED]

A ConnectionInput object that redefines the connection in question.

  • Name (string) -- [REQUIRED]

    The name of the connection.

  • Description (string) --

    The description of the connection.

  • ConnectionType (string) -- [REQUIRED]

    The type of the connection. Currently, these types are supported:

    • JDBC - Designates a connection to a database through Java Database Connectivity (JDBC).

    • KAFKA - Designates a connection to an Apache Kafka streaming platform.

    • MONGODB - Designates a connection to a MongoDB document database.

    SFTP is not supported.

  • MatchCriteria (list) --

    A list of criteria that can be used in selecting this connection.

    • (string) --

  • ConnectionProperties (dict) -- [REQUIRED]

    These key-value pairs define parameters for the connection.

    • (string) --

      • (string) --

  • PhysicalConnectionRequirements (dict) --

    A map of physical connection requirements, such as virtual private cloud (VPC) and SecurityGroup, that are needed to successfully make this connection.

    • SubnetId (string) --

      The subnet ID used by the connection.

    • SecurityGroupIdList (list) --

      The security group ID list used by the connection.

      • (string) --

    • AvailabilityZone (string) --

      The connection's Availability Zone. This field is redundant because the specified subnet implies the Availability Zone to be used. Currently the field must be populated, but it will be deprecated in the future.

rtype

dict

returns

Response Syntax

{}

Response Structure

  • (dict) --
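
Because ConnectionInput redefines the connection, one plausible pattern is to read the current definition, carry over the fields that ConnectionInput accepts, and change only what is needed. The sketch below does this for KAFKA_BOOTSTRAP_SERVERS. The helper names are hypothetical, and it is an assumption (not confirmed by this page) that fields left out of ConnectionInput are not preserved by the service; the client argument is assumed to be a Glue client such as boto3.client("glue").

```python
# Keys that appear in both the Connection response and ConnectionInput.
INPUT_FIELDS = (
    "Name",
    "Description",
    "ConnectionType",
    "MatchCriteria",
    "ConnectionProperties",
    "PhysicalConnectionRequirements",
)


def connection_to_input(connection):
    """Drop response-only keys such as CreationTime and LastUpdatedBy."""
    return {key: connection[key] for key in INPUT_FIELDS if key in connection}


def set_bootstrap_servers(client, name, servers):
    """Replace KAFKA_BOOTSTRAP_SERVERS on an existing connection."""
    current = client.get_connection(Name=name)["Connection"]
    connection_input = connection_to_input(current)
    # Copy the properties so the fetched response is not mutated.
    properties = dict(connection_input.get("ConnectionProperties", {}))
    properties["KAFKA_BOOTSTRAP_SERVERS"] = servers
    connection_input["ConnectionProperties"] = properties
    client.update_connection(Name=name, ConnectionInput=connection_input)
    return connection_input
```

HidePassword is deliberately left unset in the read step so that any stored password is carried into the update; the caller then needs permission to use the AWS KMS key if password encryption is enabled.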