2020/11/19 - AWS Glue - 20 new 14 updated api methods
Changes Adding support for Glue Schema Registry. The AWS Glue Schema Registry is a new feature that allows you to centrally discover, control, and evolve data stream schemas.
Puts the metadata key value pair for a specified schema version ID. A maximum of 10 key value pairs will be allowed per schema version. They can be added over one or more calls.
See also: AWS API Documentation
Request Syntax
client.put_schema_version_metadata( SchemaId={ 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, SchemaVersionNumber={ 'LatestVersion': True|False, 'VersionNumber': 123 }, SchemaVersionId='string', MetadataKeyValue={ 'MetadataKey': 'string', 'MetadataValue': 'string' } )
dict
The unique ID for the schema.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
dict
The version number of the schema.
LatestVersion (boolean) --
VersionNumber (integer) --
string
The unique version ID of the schema version.
dict
[REQUIRED]
The metadata key's corresponding value.
MetadataKey (string) --
A metadata key.
MetadataValue (string) --
A metadata key’s corresponding value.
dict
Response Syntax
{ 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string', 'LatestVersion': True|False, 'VersionNumber': 123, 'SchemaVersionId': 'string', 'MetadataKey': 'string', 'MetadataValue': 'string' }
Response Structure
(dict) --
SchemaArn (string) --
The Amazon Resource Name (ARN) for the schema.
SchemaName (string) --
The name for the schema.
RegistryName (string) --
The name for the registry.
LatestVersion (boolean) --
The latest version of the schema.
VersionNumber (integer) --
The version number of the schema.
SchemaVersionId (string) --
The unique version ID of the schema version.
MetadataKey (string) --
The metadata key.
MetadataValue (string) --
The value of the metadata key.
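As a rough illustration of the call above, the following sketch attaches several key value pairs to one schema version, one call per pair. The helper name put_metadata_pairs and the local limit check are assumptions made for illustration. The service counts pairs across all calls, so the local check only catches the obvious case. client is assumed to be a boto3 Glue client, for example boto3.client('glue').

```python
MAX_PAIRS_PER_VERSION = 10  # limit stated in the description above


def put_metadata_pairs(client, schema_version_id, pairs):
    """Attach each key/value in `pairs` (a dict) to one schema version.

    Hypothetical helper, not part of the SDK. `client` is assumed to be
    a boto3 Glue client.
    """
    if len(pairs) > MAX_PAIRS_PER_VERSION:
        raise ValueError("at most 10 key value pairs per schema version")
    responses = []
    for key, value in pairs.items():
        # One call per pair: MetadataKeyValue holds a single key and value.
        responses.append(
            client.put_schema_version_metadata(
                SchemaVersionId=schema_version_id,
                MetadataKeyValue={"MetadataKey": key, "MetadataValue": value},
            )
        )
    return responses
```

Pairs already stored by earlier calls are not visible to this helper, so the service may still reject a call once the version holds 10 pairs.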
Remove versions from the specified schema. A version number or range may be supplied. If the compatibility mode forbids deleting of a version that is necessary, such as BACKWARDS_FULL, an error is returned. Calling the GetSchemaVersions API after this call will list the status of the deleted versions.
When the range of version numbers contains a checkpointed version, the API returns a 409 conflict and does not proceed with the deletion. You must first remove the checkpoint using the DeleteSchemaCheckpoint API before using this API.
You cannot use the DeleteSchemaVersions API to delete the first schema version in the schema set. The first schema version can only be deleted by the DeleteSchema API. This operation will also delete the attached SchemaVersionMetadata under the schema versions. Hard deletes will be enforced on the database.
See also: AWS API Documentation
Request Syntax
client.delete_schema_versions( SchemaId={ 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, Versions='string' )
dict
[REQUIRED]
This is a wrapper structure that may contain the schema name and Amazon Resource Name (ARN).
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
string
[REQUIRED]
A version range may be supplied which may be of the format:
a single version number, 5
a range, 5-8 : deletes versions 5, 6, 7, 8
dict
Response Syntax
{ 'SchemaVersionErrors': [ { 'VersionNumber': 123, 'ErrorDetails': { 'ErrorCode': 'string', 'ErrorMessage': 'string' } }, ] }
Response Structure
(dict) --
SchemaVersionErrors (list) --
A list of SchemaVersionErrorItem objects, each containing an error and schema version.
(dict) --
An object that contains the error details for an operation on a schema version.
VersionNumber (integer) --
The version number of the schema.
ErrorDetails (dict) --
The details of the error for the schema version.
ErrorCode (string) --
The error code for an error.
ErrorMessage (string) --
The error message for an error.
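The Versions string and the per-version error list can be handled with two small helpers. This is a minimal sketch: format_versions and delete_versions are hypothetical names, and client is assumed to be a boto3 Glue client.

```python
def format_versions(first, last=None):
    """Build the Versions string: "5" for one version, "5-8" for a range."""
    if last is not None and last < first:
        raise ValueError("last must not be smaller than first")
    if last is None or last == first:
        return str(first)
    return f"{first}-{last}"


def delete_versions(client, schema_id, first, last=None):
    """Delete versions; return {version_number: error_message} for failures.

    `schema_id` is a SchemaId dict such as
    {"SchemaName": "...", "RegistryName": "..."}.
    """
    response = client.delete_schema_versions(
        SchemaId=schema_id,
        Versions=format_versions(first, last),
    )
    return {
        item["VersionNumber"]: item["ErrorDetails"]["ErrorMessage"]
        for item in response.get("SchemaVersionErrors", [])
    }
```

An empty dict from delete_versions means the response reported no per-version errors.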
Deletes the entire schema set, including the schema set and all of its versions. To get the status of the delete operation, you can call the GetSchema API after the asynchronous call. Deleting a registry will disable all online operations for the schema, such as the GetSchemaByDefinition and RegisterSchemaVersion APIs.
See also: AWS API Documentation
Request Syntax
client.delete_schema( SchemaId={ 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' } )
dict
[REQUIRED]
This is a wrapper structure that may contain the schema name and Amazon Resource Name (ARN).
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
dict
Response Syntax
{ 'SchemaArn': 'string', 'SchemaName': 'string', 'Status': 'AVAILABLE'|'PENDING'|'DELETING' }
Response Structure
(dict) --
SchemaArn (string) --
The Amazon Resource Name (ARN) of the schema being deleted.
SchemaName (string) --
The name of the schema being deleted.
Status (string) --
The status of the schema.
Queries for the schema version metadata information.
See also: AWS API Documentation
Request Syntax
client.query_schema_version_metadata( SchemaId={ 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, SchemaVersionNumber={ 'LatestVersion': True|False, 'VersionNumber': 123 }, SchemaVersionId='string', MetadataList=[ { 'MetadataKey': 'string', 'MetadataValue': 'string' }, ], MaxResults=123, NextToken='string' )
dict
A wrapper structure that may contain the schema name and Amazon Resource Name (ARN).
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
dict
The version number of the schema.
LatestVersion (boolean) --
VersionNumber (integer) --
string
The unique version ID of the schema version.
list
Search key-value pairs for metadata. If they are not provided, all the metadata information will be fetched.
(dict) --
A structure containing a key value pair for metadata.
MetadataKey (string) --
A metadata key.
MetadataValue (string) --
A metadata key’s corresponding value.
integer
Maximum number of results required per page. If the value is not supplied, this will be defaulted to 25 per page.
string
A continuation token, if this is a continuation call.
dict
Response Syntax
{ 'MetadataInfoMap': { 'string': { 'MetadataValue': 'string', 'CreatedTime': 'string' } }, 'SchemaVersionId': 'string', 'NextToken': 'string' }
Response Structure
(dict) --
MetadataInfoMap (dict) --
A map of a metadata key and associated values.
(string) --
(dict) --
A structure containing metadata information for a schema version.
MetadataValue (string) --
The metadata key’s corresponding value.
CreatedTime (string) --
The time at which the entry was created.
SchemaVersionId (string) --
The unique version ID of the schema version.
NextToken (string) --
A continuation token for paginating the returned list of tokens, returned if the current segment of the list is not the last.
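The NextToken handling described above can be sketched as a simple loop that merges every page of MetadataInfoMap into one dictionary. fetch_all_metadata is a hypothetical helper, and client is assumed to be a boto3 Glue client.

```python
def fetch_all_metadata(client, schema_version_id, page_size=25):
    """Collect MetadataInfoMap entries from every page into one dict."""
    merged = {}
    kwargs = {"SchemaVersionId": schema_version_id, "MaxResults": page_size}
    while True:
        page = client.query_schema_version_metadata(**kwargs)
        merged.update(page.get("MetadataInfoMap", {}))
        token = page.get("NextToken")
        if not token:
            # No token means this was the last segment of the list.
            return merged
        kwargs["NextToken"] = token
```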
Describes the specified registry in detail.
See also: AWS API Documentation
Request Syntax
client.get_registry( RegistryId={ 'RegistryName': 'string', 'RegistryArn': 'string' } )
dict
[REQUIRED]
This is a wrapper structure that may contain the registry name and Amazon Resource Name (ARN).
RegistryName (string) --
Name of the registry. Used only for lookup. One of RegistryArn or RegistryName has to be provided.
RegistryArn (string) --
Arn of the registry to be updated. One of RegistryArn or RegistryName has to be provided.
dict
Response Syntax
{ 'RegistryName': 'string', 'RegistryArn': 'string', 'Description': 'string', 'Status': 'AVAILABLE'|'DELETING', 'CreatedTime': 'string', 'UpdatedTime': 'string' }
Response Structure
(dict) --
RegistryName (string) --
The name of the registry.
RegistryArn (string) --
The Amazon Resource Name (ARN) of the registry.
Description (string) --
A description of the registry.
Status (string) --
The status of the registry.
CreatedTime (string) --
The date and time the registry was created.
UpdatedTime (string) --
The date and time the registry was updated.
Updates an existing registry which is used to hold a collection of schemas. The updated properties relate to the registry, and do not modify any of the schemas within the registry.
See also: AWS API Documentation
Request Syntax
client.update_registry( RegistryId={ 'RegistryName': 'string', 'RegistryArn': 'string' }, Description='string' )
dict
[REQUIRED]
This is a wrapper structure that may contain the registry name and Amazon Resource Name (ARN).
RegistryName (string) --
Name of the registry. Used only for lookup. One of RegistryArn or RegistryName has to be provided.
RegistryArn (string) --
Arn of the registry to be updated. One of RegistryArn or RegistryName has to be provided.
string
[REQUIRED]
A description of the registry. If description is not provided, this field will not be updated.
dict
Response Syntax
{ 'RegistryName': 'string', 'RegistryArn': 'string' }
Response Structure
(dict) --
RegistryName (string) --
The name of the updated registry.
RegistryArn (string) --
The Amazon Resource name (ARN) of the updated registry.
Creates a new registry which may be used to hold a collection of schemas.
See also: AWS API Documentation
Request Syntax
client.create_registry( RegistryName='string', Description='string', Tags={ 'string': 'string' } )
string
[REQUIRED]
Name of the registry to be created. The maximum length is 255, and the name may only contain letters, numbers, hyphens, underscores, dollar signs, or hash marks. No whitespace.
string
A description of the registry. If description is not provided, there will not be any default value for this.
dict
AWS tags that contain a key value pair and may be searched by console, command line, or API.
(string) --
(string) --
dict
Response Syntax
{ 'RegistryArn': 'string', 'RegistryName': 'string', 'Description': 'string', 'Tags': { 'string': 'string' } }
Response Structure
(dict) --
RegistryArn (string) --
The Amazon Resource Name (ARN) of the newly created registry.
RegistryName (string) --
The name of the registry.
Description (string) --
A description of the registry.
Tags (dict) --
The tags for the registry.
(string) --
(string) --
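The naming rule for RegistryName can be checked locally before calling create_registry. The sketch below is an approximation: it assumes "letters" means ASCII letters, which the description does not state, and is_valid_registry_name is a hypothetical helper.

```python
import re

# Assumption: ASCII letters only. Allowed: letters, digits, hyphen,
# underscore, dollar sign, hash mark; length 1 to 255; no whitespace.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_$#-]{1,255}")


def is_valid_registry_name(name):
    """Return True if `name` matches the documented character set and length."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

The service remains the authority on what it accepts; a local check like this only avoids an obviously failing request.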
Validates the supplied schema. This call has no side effects; it simply validates the supplied schema using DataFormat as the format. Since it does not take a schema set name, no compatibility checks are performed.
See also: AWS API Documentation
Request Syntax
client.check_schema_version_validity( DataFormat='AVRO', SchemaDefinition='string' )
string
[REQUIRED]
The data format of the schema definition. Currently only AVRO is supported.
string
[REQUIRED]
The definition of the schema that has to be validated.
dict
Response Syntax
{ 'Valid': True|False, 'Error': 'string' }
Response Structure
(dict) --
Valid (boolean) --
Returns true if the schema is valid, and false otherwise.
Error (string) --
A validation failure error message.
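A validity check might look like the sketch below, which serializes an Avro schema held as a Python dict and returns the Valid and Error fields. check_avro_schema and EXAMPLE_SCHEMA are illustrative names, and client is assumed to be a boto3 Glue client.

```python
import json

# Illustrative Avro record schema; not taken from the source.
EXAMPLE_SCHEMA = {
    "type": "record",
    "name": "Order",
    "fields": [{"name": "id", "type": "string"}],
}


def check_avro_schema(client, schema):
    """Return (valid, error) for an Avro schema given as a dict."""
    response = client.check_schema_version_validity(
        DataFormat="AVRO",  # the only supported format per the text above
        SchemaDefinition=json.dumps(schema),
    )
    return response["Valid"], response.get("Error")
```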
Returns a list of schema versions that you have created, with minimal information. Schema versions in Deleted status will not be included in the results. Empty results will be returned if there are no schema versions available.
See also: AWS API Documentation
Request Syntax
client.list_schema_versions( SchemaId={ 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, MaxResults=123, NextToken='string' )
dict
[REQUIRED]
This is a wrapper structure to contain schema identity fields. The structure contains:
SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. Either SchemaArn or SchemaName and RegistryName has to be provided.
SchemaId$SchemaName: The name of the schema. Either SchemaArn or SchemaName and RegistryName has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
integer
Maximum number of results required per page. If the value is not supplied, this will be defaulted to 25 per page.
string
A continuation token, if this is a continuation call.
dict
Response Syntax
{ 'Schemas': [ { 'SchemaArn': 'string', 'SchemaVersionId': 'string', 'VersionNumber': 123, 'Status': 'AVAILABLE'|'PENDING'|'FAILURE'|'DELETING', 'CreatedTime': 'string' }, ], 'NextToken': 'string' }
Response Structure
(dict) --
Schemas (list) --
An array of SchemaVersionList objects containing details of each schema version.
(dict) --
An object containing the details about a schema version.
SchemaArn (string) --
The Amazon Resource Name (ARN) of the schema.
SchemaVersionId (string) --
The unique identifier of the schema version.
VersionNumber (integer) --
The version number of the schema.
Status (string) --
The status of the schema version.
CreatedTime (string) --
The date and time the schema version was created.
NextToken (string) --
A continuation token for paginating the returned list of tokens, returned if the current segment of the list is not the last.
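To work with the listed versions lazily, the pages can be wrapped in a generator. This is a sketch with hypothetical helper names; client is assumed to be a boto3 Glue client.

```python
def iter_schema_versions(client, schema_name, registry_name, page_size=25):
    """Yield each entry of 'Schemas' across all pages."""
    kwargs = {
        "SchemaId": {"SchemaName": schema_name, "RegistryName": registry_name},
        "MaxResults": page_size,
    }
    while True:
        page = client.list_schema_versions(**kwargs)
        yield from page.get("Schemas", [])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token


def latest_available_version(client, schema_name, registry_name):
    """Return the highest VersionNumber with Status AVAILABLE, or None."""
    numbers = [
        item["VersionNumber"]
        for item in iter_schema_versions(client, schema_name, registry_name)
        if item["Status"] == "AVAILABLE"
    ]
    return max(numbers) if numbers else None
```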
Removes a key value pair from the schema version metadata for the specified schema version ID.
See also: AWS API Documentation
Request Syntax
client.remove_schema_version_metadata( SchemaId={ 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, SchemaVersionNumber={ 'LatestVersion': True|False, 'VersionNumber': 123 }, SchemaVersionId='string', MetadataKeyValue={ 'MetadataKey': 'string', 'MetadataValue': 'string' } )
dict
A wrapper structure that may contain the schema name and Amazon Resource Name (ARN).
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
dict
The version number of the schema.
LatestVersion (boolean) --
VersionNumber (integer) --
string
The unique version ID of the schema version.
dict
[REQUIRED]
The value of the metadata key.
MetadataKey (string) --
A metadata key.
MetadataValue (string) --
A metadata key’s corresponding value.
dict
Response Syntax
{ 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string', 'LatestVersion': True|False, 'VersionNumber': 123, 'SchemaVersionId': 'string', 'MetadataKey': 'string', 'MetadataValue': 'string' }
Response Structure
(dict) --
SchemaArn (string) --
The Amazon Resource Name (ARN) of the schema.
SchemaName (string) --
The name of the schema.
RegistryName (string) --
The name of the registry.
LatestVersion (boolean) --
The latest version of the schema.
VersionNumber (integer) --
The version number of the schema.
SchemaVersionId (string) --
The version ID for the schema version.
MetadataKey (string) --
The metadata key.
MetadataValue (string) --
The value of the metadata key.
Get the specified schema by its unique ID assigned when a version of the schema is created or registered. Schema versions in Deleted status will not be included in the results.
See also: AWS API Documentation
Request Syntax
client.get_schema_version( SchemaId={ 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, SchemaVersionId='string', SchemaVersionNumber={ 'LatestVersion': True|False, 'VersionNumber': 123 } )
dict
This is a wrapper structure to contain schema identity fields. The structure contains:
SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. Either SchemaArn or SchemaName and RegistryName has to be provided.
SchemaId$SchemaName: The name of the schema. Either SchemaArn or SchemaName and RegistryName has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
string
The SchemaVersionId of the schema version. This field is required for fetching by schema ID. Either this or the SchemaId wrapper has to be provided.
dict
The version number of the schema.
LatestVersion (boolean) --
VersionNumber (integer) --
dict
Response Syntax
{ 'SchemaVersionId': 'string', 'SchemaDefinition': 'string', 'DataFormat': 'AVRO', 'SchemaArn': 'string', 'VersionNumber': 123, 'Status': 'AVAILABLE'|'PENDING'|'FAILURE'|'DELETING', 'CreatedTime': 'string' }
Response Structure
(dict) --
SchemaVersionId (string) --
The SchemaVersionId of the schema version.
SchemaDefinition (string) --
The schema definition for the schema ID.
DataFormat (string) --
The data format of the schema definition. Currently only AVRO is supported.
SchemaArn (string) --
The Amazon Resource Name (ARN) of the schema.
VersionNumber (integer) --
The version number of the schema.
Status (string) --
The status of the schema version.
CreatedTime (string) --
The date and time the schema version was created.
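The two ways of addressing a version (by SchemaVersionId, or by the SchemaId wrapper plus a version selector) can be captured in a small request builder. This is a sketch: schema_version_request is a hypothetical helper, and treating the two forms as mutually exclusive is an assumption, since the text only says one of them has to be provided.

```python
def schema_version_request(schema_version_id=None, schema_id=None, version_number=None):
    """Build keyword arguments for get_schema_version.

    Pass either schema_version_id, or schema_id (a SchemaId dict).
    With schema_id, version_number=None selects the latest version.
    """
    # Assumption: exactly one addressing form is used per request.
    if (schema_version_id is None) == (schema_id is None):
        raise ValueError("provide exactly one of schema_version_id or schema_id")
    if schema_version_id is not None:
        return {"SchemaVersionId": schema_version_id}
    if version_number is None:
        selector = {"LatestVersion": True}
    else:
        selector = {"VersionNumber": version_number}
    return {"SchemaId": schema_id, "SchemaVersionNumber": selector}
```

The result can be passed on as client.get_schema_version(**schema_version_request(...)).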
Returns a list of registries that you have created, with minimal registry information. Registries in the Deleting status will not be included in the results. Empty results will be returned if there are no registries available.
See also: AWS API Documentation
Request Syntax
client.list_registries( MaxResults=123, NextToken='string' )
integer
Maximum number of results required per page. If the value is not supplied, this will be defaulted to 25 per page.
string
A continuation token, if this is a continuation call.
dict
Response Syntax
{ 'Registries': [ { 'RegistryName': 'string', 'RegistryArn': 'string', 'Description': 'string', 'Status': 'AVAILABLE'|'DELETING', 'CreatedTime': 'string', 'UpdatedTime': 'string' }, ], 'NextToken': 'string' }
Response Structure
(dict) --
Registries (list) --
An array of RegistryDetailedListItem objects containing minimal details of each registry.
(dict) --
A structure containing the details for a registry.
RegistryName (string) --
The name of the registry.
RegistryArn (string) --
The Amazon Resource Name (ARN) of the registry.
Description (string) --
A description of the registry.
Status (string) --
The status of the registry.
CreatedTime (string) --
The date the registry was created.
UpdatedTime (string) --
The date the registry was updated.
NextToken (string) --
A continuation token for paginating the returned list of tokens, returned if the current segment of the list is not the last.
Retrieves a schema by the SchemaDefinition. The schema definition is sent to the Schema Registry, canonicalized, and hashed. If the hash is matched within the scope of the SchemaName or ARN (or the default registry, if none is supplied), that schema’s metadata is returned. Otherwise, a 404 or NotFound error is returned. Schema versions in Deleted status will not be included in the results.
See also: AWS API Documentation
Request Syntax
client.get_schema_by_definition( SchemaId={ 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, SchemaDefinition='string' )
dict
[REQUIRED]
This is a wrapper structure to contain schema identity fields. The structure contains:
SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.
SchemaId$SchemaName: The name of the schema. One of SchemaArn or SchemaName has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
string
[REQUIRED]
The definition of the schema for which schema details are required.
dict
Response Syntax
{ 'SchemaVersionId': 'string', 'SchemaArn': 'string', 'DataFormat': 'AVRO', 'Status': 'AVAILABLE'|'PENDING'|'FAILURE'|'DELETING', 'CreatedTime': 'string' }
Response Structure
(dict) --
SchemaVersionId (string) --
The schema ID of the schema version.
SchemaArn (string) --
The Amazon Resource Name (ARN) of the schema.
DataFormat (string) --
The data format of the schema definition. Currently only AVRO is supported.
Status (string) --
The status of the schema version.
CreatedTime (string) --
The date and time the schema was created.
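The idea of canonicalizing and hashing a definition can be illustrated locally. The sketch below is only an analogy: the service's actual canonical form and hash function are not described here, so sorted keys, compact separators, and SHA-256 are stand-ins chosen for illustration, and local_fingerprint is a hypothetical helper.

```python
import hashlib
import json


def local_fingerprint(schema_definition):
    """Hash a JSON schema definition after normalizing its formatting.

    Stand-in only: this is NOT the canonical form or hash used by the
    Schema Registry.
    """
    parsed = json.loads(schema_definition)
    canonical = json.dumps(parsed, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two definitions that differ only in whitespace or key order produce the same fingerprint under this scheme, which is the property that lets a lookup by definition find a previously registered version.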
Delete the entire registry including schema and all of its versions. To get the status of the delete operation, you can call the GetRegistry API after the asynchronous call. Deleting a registry will disable all online operations for the registry such as the UpdateRegistry, CreateSchema, UpdateSchema, and RegisterSchemaVersion APIs.
See also: AWS API Documentation
Request Syntax
client.delete_registry( RegistryId={ 'RegistryName': 'string', 'RegistryArn': 'string' } )
dict
[REQUIRED]
This is a wrapper structure that may contain the registry name and Amazon Resource Name (ARN).
RegistryName (string) --
Name of the registry. Used only for lookup. One of RegistryArn or RegistryName has to be provided.
RegistryArn (string) --
Arn of the registry to be updated. One of RegistryArn or RegistryName has to be provided.
dict
Response Syntax
{ 'RegistryName': 'string', 'RegistryArn': 'string', 'Status': 'AVAILABLE'|'DELETING' }
Response Structure
(dict) --
RegistryName (string) --
The name of the registry being deleted.
RegistryArn (string) --
The Amazon Resource Name (ARN) of the registry being deleted.
Status (string) --
The status of the registry. A successful operation will return the Deleting status.
Adds a new version to the existing schema. Returns an error if the new version of the schema does not meet the compatibility requirements of the schema set. This API will not create a new schema set and will return a 404 error if the schema set is not already present in the Schema Registry.
If this is the first schema definition to be registered in the Schema Registry, this API will store the schema version and return immediately. Otherwise, this call has the potential to run longer than other operations due to compatibility modes. You can call the GetSchemaVersion API with the SchemaVersionId to check compatibility modes.
If the same schema definition is already stored in Schema Registry as a version, the schema ID of the existing schema is returned to the caller.
See also: AWS API Documentation
Request Syntax
client.register_schema_version( SchemaId={ 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, SchemaDefinition='string' )
dict
[REQUIRED]
This is a wrapper structure to contain schema identity fields. The structure contains:
SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. Either SchemaArn or SchemaName and RegistryName has to be provided.
SchemaId$SchemaName: The name of the schema. Either SchemaArn or SchemaName and RegistryName has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
string
[REQUIRED]
The schema definition using the DataFormat setting for the SchemaName.
dict
Response Syntax
{ 'SchemaVersionId': 'string', 'VersionNumber': 123, 'Status': 'AVAILABLE'|'PENDING'|'FAILURE'|'DELETING' }
Response Structure
(dict) --
SchemaVersionId (string) --
The unique ID that represents the version of this schema.
VersionNumber (integer) --
The version of this schema (for sync flow only, in case this is the first version).
Status (string) --
The status of the schema version.
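Because registration can run longer while compatibility is evaluated, a caller may want to poll GetSchemaVersion until the status leaves PENDING. The sketch below shows one way to do that. register_and_wait, max_polls, and delay are hypothetical names and values, and client is assumed to be a boto3 Glue client.

```python
import time


def register_and_wait(client, schema_id, definition,
                      max_polls=30, delay=1.0, sleep=time.sleep):
    """Register a version, then poll until its status is no longer PENDING.

    Returns (schema_version_id, status). The status may still be PENDING
    if max_polls is exhausted.
    """
    registered = client.register_schema_version(
        SchemaId=schema_id,
        SchemaDefinition=definition,
    )
    version_id = registered["SchemaVersionId"]
    status = registered["Status"]
    polls = 0
    while status == "PENDING" and polls < max_polls:
        sleep(delay)
        status = client.get_schema_version(SchemaVersionId=version_id)["Status"]
        polls += 1
    return version_id, status
```

The sleep parameter is injectable so the loop can be exercised without real delays.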
Describes the specified schema in detail.
See also: AWS API Documentation
Request Syntax
client.get_schema( SchemaId={ 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' } )
dict
[REQUIRED]
This is a wrapper structure to contain schema identity fields. The structure contains:
SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. Either SchemaArn or SchemaName and RegistryName has to be provided.
SchemaId$SchemaName: The name of the schema. Either SchemaArn or SchemaName and RegistryName has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
dict
Response Syntax
{ 'RegistryName': 'string', 'RegistryArn': 'string', 'SchemaName': 'string', 'SchemaArn': 'string', 'Description': 'string', 'DataFormat': 'AVRO', 'Compatibility': 'NONE'|'DISABLED'|'BACKWARD'|'BACKWARD_ALL'|'FORWARD'|'FORWARD_ALL'|'FULL'|'FULL_ALL', 'SchemaCheckpoint': 123, 'LatestSchemaVersion': 123, 'NextSchemaVersion': 123, 'SchemaStatus': 'AVAILABLE'|'PENDING'|'DELETING', 'CreatedTime': 'string', 'UpdatedTime': 'string' }
Response Structure
(dict) --
RegistryName (string) --
The name of the registry.
RegistryArn (string) --
The Amazon Resource Name (ARN) of the registry.
SchemaName (string) --
The name of the schema.
SchemaArn (string) --
The Amazon Resource Name (ARN) of the schema.
Description (string) --
A description of the schema if specified when created.
DataFormat (string) --
The data format of the schema definition. Currently only AVRO is supported.
Compatibility (string) --
The compatibility mode of the schema.
SchemaCheckpoint (integer) --
The version number of the checkpoint (the last time the compatibility mode was changed).
LatestSchemaVersion (integer) --
The latest version of the schema associated with the returned schema definition.
NextSchemaVersion (integer) --
The next version of the schema associated with the returned schema definition.
SchemaStatus (string) --
The status of the schema.
CreatedTime (string) --
The date and time the schema was created.
UpdatedTime (string) --
The date and time the schema was updated.
Updates the description, compatibility setting, or version checkpoint for a schema set.
For updating the compatibility setting, the call will not validate compatibility for the entire set of schema versions with the new compatibility setting. If the value for Compatibility is provided, the VersionNumber (a checkpoint) is also required. The API will validate the checkpoint version number for consistency.
If the value for the VersionNumber (checkpoint) is provided, Compatibility is optional and this can be used to set/reset a checkpoint for the schema.
This update will happen only if the schema is in the AVAILABLE state.
See also: AWS API Documentation
Request Syntax
client.update_schema( SchemaId={ 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, SchemaVersionNumber={ 'LatestVersion': True|False, 'VersionNumber': 123 }, Compatibility='NONE'|'DISABLED'|'BACKWARD'|'BACKWARD_ALL'|'FORWARD'|'FORWARD_ALL'|'FULL'|'FULL_ALL', Description='string' )
dict
[REQUIRED]
This is a wrapper structure to contain schema identity fields. The structure contains:
SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.
SchemaId$SchemaName: The name of the schema. One of SchemaArn or SchemaName has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
dict
Version number required for checkpointing. One of VersionNumber or Compatibility has to be provided.
LatestVersion (boolean) --
VersionNumber (integer) --
string
The new compatibility setting for the schema.
string
The new description for the schema.
dict
Response Syntax
{ 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }
Response Structure
(dict) --
SchemaArn (string) --
The Amazon Resource Name (ARN) of the schema.
SchemaName (string) --
The name of the schema.
RegistryName (string) --
The name of the registry that contains the schema.
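The rule that a Compatibility change must be accompanied by a checkpoint VersionNumber can be enforced before the request is sent. This is a minimal sketch: update_schema_request is a hypothetical helper, and rejecting a call with nothing to update is an added local check, not documented service behavior.

```python
def update_schema_request(schema_id, compatibility=None,
                          checkpoint_version=None, description=None):
    """Build keyword arguments for update_schema."""
    if compatibility is not None and checkpoint_version is None:
        raise ValueError("Compatibility requires a checkpoint VersionNumber")
    if compatibility is None and checkpoint_version is None and description is None:
        # Local sanity check only (assumption, not from the source).
        raise ValueError("nothing to update")
    kwargs = {"SchemaId": schema_id}
    if checkpoint_version is not None:
        kwargs["SchemaVersionNumber"] = {"VersionNumber": checkpoint_version}
    if compatibility is not None:
        kwargs["Compatibility"] = compatibility
    if description is not None:
        kwargs["Description"] = description
    return kwargs
```

The result can be passed on as client.update_schema(**update_schema_request(...)), where client is assumed to be a boto3 Glue client.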
Returns a list of schemas with minimal details. Schemas in Deleting status will not be included in the results. Empty results will be returned if there are no schemas available.
When the RegistryId is not provided, all the schemas across registries will be part of the API response.
See also: AWS API Documentation
Request Syntax
client.list_schemas( RegistryId={ 'RegistryName': 'string', 'RegistryArn': 'string' }, MaxResults=123, NextToken='string' )
dict
A wrapper structure that may contain the registry name and Amazon Resource Name (ARN).
RegistryName (string) --
Name of the registry. Used only for lookup. One of RegistryArn or RegistryName has to be provided.
RegistryArn (string) --
Arn of the registry to be updated. One of RegistryArn or RegistryName has to be provided.
integer
Maximum number of results required per page. If the value is not supplied, this will be defaulted to 25 per page.
string
A continuation token, if this is a continuation call.
dict
Response Syntax
{ 'Schemas': [ { 'RegistryName': 'string', 'SchemaName': 'string', 'SchemaArn': 'string', 'Description': 'string', 'SchemaStatus': 'AVAILABLE'|'PENDING'|'DELETING', 'CreatedTime': 'string', 'UpdatedTime': 'string' }, ], 'NextToken': 'string' }
Response Structure
(dict) --
Schemas (list) --
An array of SchemaListItem objects containing details of each schema.
(dict) --
An object that contains minimal details for a schema.
RegistryName (string) --
The name of the registry where the schema resides.
SchemaName (string) --
The name of the schema.
SchemaArn (string) --
The Amazon Resource Name (ARN) for the schema.
Description (string) --
A description for the schema.
SchemaStatus (string) --
The status of the schema.
CreatedTime (string) --
The date and time that a schema was created.
UpdatedTime (string) --
The date and time that a schema was updated.
NextToken (string) --
A continuation token for paginating the returned list of tokens, returned if the current segment of the list is not the last.
Creates a new schema set and registers the schema definition. Returns an error if the schema set already exists without actually registering the version.
When the schema set is created, a version checkpoint will be set to the first version. Compatibility mode "DISABLED" restricts any additional schema versions from being added after the first schema version. For all other compatibility modes, validation of compatibility settings will be applied only from the second version onwards when the RegisterSchemaVersion API is used.
When this API is called without a RegistryId, this will create an entry for a "default-registry" in the registry database tables, if it is not already present.
See also: AWS API Documentation
Request Syntax
client.create_schema( RegistryId={ 'RegistryName': 'string', 'RegistryArn': 'string' }, SchemaName='string', DataFormat='AVRO', Compatibility='NONE'|'DISABLED'|'BACKWARD'|'BACKWARD_ALL'|'FORWARD'|'FORWARD_ALL'|'FULL'|'FULL_ALL', Description='string', Tags={ 'string': 'string' }, SchemaDefinition='string' )
dict
This is a wrapper shape to contain the registry identity fields. If this is not provided, the default registry will be used. The ARN format for the same will be: arn:aws:glue:us-east-2:<customer id>:registry/default-registry:random-5-letter-id.
RegistryName (string) --
Name of the registry. Used only for lookup. One of RegistryArn or RegistryName has to be provided.
RegistryArn (string) --
Arn of the registry to be updated. One of RegistryArn or RegistryName has to be provided.
string
[REQUIRED]
Name of the schema to be created. The maximum length is 255, and the name may only contain letters, numbers, hyphens, underscores, dollar signs, or hash marks. No whitespace.
string
[REQUIRED]
The data format of the schema definition. Currently only AVRO is supported.
string
The compatibility mode of the schema. The possible values are:
NONE : No compatibility mode applies. You can use this choice in development scenarios or if you do not know the compatibility mode that you want to apply to schemas. Any new version added will be accepted without undergoing a compatibility check.
DISABLED : This compatibility choice prevents versioning for a particular schema. You can use this choice to prevent future versioning of a schema.
BACKWARD : This compatibility choice is recommended as it allows data receivers to read both the current and one previous schema version. This means that for instance, a new schema version cannot drop data fields or change the type of these fields, so they can't be read by readers using the previous version.
BACKWARD_ALL : This compatibility choice allows data receivers to read both the current and all previous schema versions. You can use this choice when you need to delete fields or add optional fields, and check compatibility against all previous schema versions.
FORWARD : This compatibility choice allows data receivers to read both the current and one next schema version, but not necessarily later versions. You can use this choice when you need to add fields or delete optional fields, but only check compatibility against the last schema version.
FORWARD_ALL : This compatibility choice allows data receivers to read data written by producers of any new registered schema. You can use this choice when you need to add fields or delete optional fields, and check compatibility against all previous schema versions.
FULL : This compatibility choice allows data receivers to read data written by producers using the previous or next version of the schema, but not necessarily earlier or later versions. You can use this choice when you need to add or remove optional fields, but only check compatibility against the last schema version.
FULL_ALL : This compatibility choice allows data receivers to read data written by producers using all previous schema versions. You can use this choice when you need to add or remove optional fields, and check compatibility against all previous schema versions.
string
An optional description of the schema. If a description is not provided, no default value is assigned.
dict
AWS tags that contain a key value pair and may be searched by console, command line, or API. If specified, follows the AWS tags-on-create pattern.
(string) --
(string) --
string
The schema definition for SchemaName, written in the format given by the DataFormat setting.
dict
Response Syntax
{ 'RegistryName': 'string', 'RegistryArn': 'string', 'SchemaName': 'string', 'SchemaArn': 'string', 'Description': 'string', 'DataFormat': 'AVRO', 'Compatibility': 'NONE'|'DISABLED'|'BACKWARD'|'BACKWARD_ALL'|'FORWARD'|'FORWARD_ALL'|'FULL'|'FULL_ALL', 'SchemaCheckpoint': 123, 'LatestSchemaVersion': 123, 'NextSchemaVersion': 123, 'SchemaStatus': 'AVAILABLE'|'PENDING'|'DELETING', 'Tags': { 'string': 'string' }, 'SchemaVersionId': 'string', 'SchemaVersionStatus': 'AVAILABLE'|'PENDING'|'FAILURE'|'DELETING' }
Response Structure
(dict) --
RegistryName (string) --
The name of the registry.
RegistryArn (string) --
The Amazon Resource Name (ARN) of the registry.
SchemaName (string) --
The name of the schema.
SchemaArn (string) --
The Amazon Resource Name (ARN) of the schema.
Description (string) --
A description of the schema if specified when created.
DataFormat (string) --
The data format of the schema definition. Currently only AVRO is supported.
Compatibility (string) --
The schema compatibility mode.
SchemaCheckpoint (integer) --
The version number of the checkpoint (the last time the compatibility mode was changed).
LatestSchemaVersion (integer) --
The latest version of the schema associated with the returned schema definition.
NextSchemaVersion (integer) --
The next version of the schema associated with the returned schema definition.
SchemaStatus (string) --
The status of the schema.
Tags (dict) --
The tags for the schema.
(string) --
(string) --
SchemaVersionId (string) --
The unique identifier of the first schema version.
SchemaVersionStatus (string) --
The status of the first schema version created.
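As a minimal sketch of how the parameters above fit together, the helper below assembles keyword arguments for client.create_schema() and rejects compatibility modes that are not in the list above. The registry name, schema name, tag, and Avro record are hypothetical placeholders, and the helper itself is not part of the SDK. No network call is made; the commented lines at the end show where a real client would be used.

```python
import json

# Compatibility modes enumerated in the parameter description above.
COMPATIBILITY_MODES = frozenset({
    "NONE", "DISABLED", "BACKWARD", "BACKWARD_ALL",
    "FORWARD", "FORWARD_ALL", "FULL", "FULL_ALL",
})


def build_create_schema_request(registry_name, schema_name, avro_schema,
                                compatibility="BACKWARD",
                                description=None, tags=None):
    """Assemble keyword arguments for client.create_schema()."""
    if compatibility not in COMPATIBILITY_MODES:
        raise ValueError(f"unknown compatibility mode: {compatibility!r}")
    request = {
        "RegistryId": {"RegistryName": registry_name},
        "SchemaName": schema_name,
        "DataFormat": "AVRO",  # the only data format supported in this release
        "Compatibility": compatibility,
        # SchemaDefinition is a string, so the Avro schema is serialized.
        "SchemaDefinition": json.dumps(avro_schema),
    }
    # Optional parameters are left out entirely when not supplied.
    if description is not None:
        request["Description"] = description
    if tags:
        request["Tags"] = dict(tags)
    return request


# Hypothetical Avro record, used only for illustration.
ORDER_SCHEMA = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount", "type": "double"},
    ],
}

request = build_create_schema_request(
    "example-registry", "orders", ORDER_SCHEMA, tags={"team": "data"}
)

# With credentials configured, the call itself would be:
#   import boto3
#   response = boto3.client("glue").create_schema(**request)
#   print(response["SchemaVersionId"], response["SchemaVersionStatus"])
```

Validating the mode locally is a design choice for faster feedback; the service performs its own validation regardless.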
Fetches the difference, in the specified diff type, between two stored schema versions in the Schema Registry.
This API allows you to compare two versions of the same schema.
See also: AWS API Documentation
Request Syntax
client.get_schema_versions_diff( SchemaId={ 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, FirstSchemaVersionNumber={ 'LatestVersion': True|False, 'VersionNumber': 123 }, SecondSchemaVersionNumber={ 'LatestVersion': True|False, 'VersionNumber': 123 }, SchemaDiffType='SYNTAX_DIFF' )
dict
[REQUIRED]
This is a wrapper structure to contain schema identity fields. The structure contains:
SchemaArn: The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.
SchemaName: The name of the schema. One of SchemaArn or SchemaName has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
dict
[REQUIRED]
The first of the two schema versions to be compared.
LatestVersion (boolean) --
VersionNumber (integer) --
dict
[REQUIRED]
The second of the two schema versions to be compared.
LatestVersion (boolean) --
VersionNumber (integer) --
string
[REQUIRED]
The type of diff to compute. SYNTAX_DIFF is the only diff type currently supported.
dict
Response Syntax
{ 'Diff': 'string' }
Response Structure
(dict) --
Diff (string) --
The difference between schemas as a string in JsonPatch format.
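The sketch below shows one way to build a get_schema_versions_diff request and to read the returned Diff string. It assumes the string is a JSON array of JSON Patch operations, each with "op" and "path" members; the sample response is hand-written for illustration and is not real service output. Both helper functions and all names passed to them are hypothetical.

```python
import json


def build_diff_request(schema_name, registry_name, first_version,
                       second_version=None):
    """Assemble keyword arguments for client.get_schema_versions_diff().

    When second_version is None, the first version is compared against
    the latest version of the schema.
    """
    if second_version is None:
        second = {"LatestVersion": True}
    else:
        second = {"VersionNumber": second_version}
    return {
        "SchemaId": {"SchemaName": schema_name, "RegistryName": registry_name},
        "FirstSchemaVersionNumber": {"VersionNumber": first_version},
        "SecondSchemaVersionNumber": second,
        "SchemaDiffType": "SYNTAX_DIFF",
    }


def summarize_diff(response):
    """Return (op, path) pairs from the Diff string.

    Assumption: Diff parses as a JSON array of JSON Patch operations.
    """
    operations = json.loads(response["Diff"])
    return [(operation["op"], operation["path"]) for operation in operations]


request = build_diff_request("orders", "example-registry", first_version=1)

# Hand-written sample response, for illustration only.
sample_response = {
    "Diff": json.dumps([
        {"op": "add", "path": "/fields/2",
         "value": {"name": "currency", "type": "string", "default": "USD"}},
    ])
}
summary = summarize_diff(sample_response)
```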
{'PartitionInputList': {'StorageDescriptor': {'SchemaReference': {'SchemaId': {'RegistryName': 'string', 'SchemaArn': 'string', 'SchemaName': 'string'}, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 'long'}}}}
Creates one or more partitions in a batch operation.
See also: AWS API Documentation
Request Syntax
client.batch_create_partition( CatalogId='string', DatabaseName='string', TableName='string', PartitionInputList=[ { 'Values': [ 'string', ], 'LastAccessTime': datetime(2015, 1, 1), 'StorageDescriptor': { 'Columns': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'Location': 'string', 'InputFormat': 'string', 'OutputFormat': 'string', 'Compressed': True|False, 'NumberOfBuckets': 123, 'SerdeInfo': { 'Name': 'string', 'SerializationLibrary': 'string', 'Parameters': { 'string': 'string' } }, 'BucketColumns': [ 'string', ], 'SortColumns': [ { 'Column': 'string', 'SortOrder': 123 }, ], 'Parameters': { 'string': 'string' }, 'SkewedInfo': { 'SkewedColumnNames': [ 'string', ], 'SkewedColumnValues': [ 'string', ], 'SkewedColumnValueLocationMaps': { 'string': 'string' } }, 'StoredAsSubDirectories': True|False, 'SchemaReference': { 'SchemaId': { 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 123 } }, 'Parameters': { 'string': 'string' }, 'LastAnalyzedTime': datetime(2015, 1, 1) }, ] )
string
The ID of the catalog in which the partition is to be created. Currently, this should be the AWS account ID.
string
[REQUIRED]
The name of the metadata database in which the partition is to be created.
string
[REQUIRED]
The name of the metadata table in which the partition is to be created.
list
[REQUIRED]
A list of PartitionInput structures that define the partitions to be created.
(dict) --
The structure used to create and update a partition.
Values (list) --
The values of the partition. Although this parameter is not required by the SDK, you must specify this parameter for a valid input.
The values for the keys for the new partition must be passed as an array of String objects that must be ordered in the same order as the partition keys appearing in the Amazon S3 prefix. Otherwise AWS Glue will add the values to the wrong keys.
(string) --
LastAccessTime (datetime) --
The last time at which the partition was accessed.
StorageDescriptor (dict) --
Provides information about the physical location where the partition is stored.
Columns (list) --
A list of the Columns in the table.
(dict) --
A column in a Table .
Name (string) -- [REQUIRED]
The name of the Column .
Type (string) --
The data type of the Column .
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
Location (string) --
The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.
InputFormat (string) --
The input format: SequenceFileInputFormat (binary), or TextInputFormat , or a custom format.
OutputFormat (string) --
The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat , or a custom format.
Compressed (boolean) --
True if the data in the table is compressed, or False if not.
NumberOfBuckets (integer) --
Must be specified if the table contains any dimension columns.
SerdeInfo (dict) --
The serialization/deserialization (SerDe) information.
Name (string) --
Name of the SerDe.
SerializationLibrary (string) --
Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe .
Parameters (dict) --
These key-value pairs define initialization parameters for the SerDe.
(string) --
(string) --
BucketColumns (list) --
A list of reducer grouping columns, clustering columns, and bucketing columns in the table.
(string) --
SortColumns (list) --
A list specifying the sort order of each bucket in the table.
(dict) --
Specifies the sort order of a sorted column.
Column (string) -- [REQUIRED]
The name of the column.
SortOrder (integer) -- [REQUIRED]
Indicates that the column is sorted in ascending order (== 1), or in descending order (== 0).
Parameters (dict) --
The user-supplied properties in key-value form.
(string) --
(string) --
SkewedInfo (dict) --
The information about values that appear frequently in a column (skewed values).
SkewedColumnNames (list) --
A list of names of columns that contain skewed values.
(string) --
SkewedColumnValues (list) --
A list of values that appear so frequently as to be considered skewed.
(string) --
SkewedColumnValueLocationMaps (dict) --
A mapping of skewed values to the columns that contain them.
(string) --
(string) --
StoredAsSubDirectories (boolean) --
True if the table data is stored in subdirectories, or False if not.
SchemaReference (dict) --
An object that references a schema stored in the AWS Glue Schema Registry.
When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.
SchemaId (dict) --
A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
SchemaVersionId (string) --
The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.
SchemaVersionNumber (integer) --
The version number of the schema.
Parameters (dict) --
These key-value pairs define partition parameters.
(string) --
(string) --
LastAnalyzedTime (datetime) --
The last time at which column statistics were computed for this partition.
dict
Response Syntax
{ 'Errors': [ { 'PartitionValues': [ 'string', ], 'ErrorDetail': { 'ErrorCode': 'string', 'ErrorMessage': 'string' } }, ] }
Response Structure
(dict) --
Errors (list) --
The errors encountered when trying to create the requested partitions.
(dict) --
Contains information about a partition error.
PartitionValues (list) --
The values that define the partition.
(string) --
ErrorDetail (dict) --
The details about the partition error.
ErrorCode (string) --
The code associated with this error.
ErrorMessage (string) --
A message describing the error.
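A batch call can succeed overall while individual partitions fail, so the Errors list is worth inspecting after every call. The sketch below builds a small PartitionInputList and indexes the failures by partition values. The database, table, bucket, and sample response are hypothetical; the helpers are illustrations rather than SDK functions.

```python
def partition_input(values, location, schema_version_id=None):
    """Build one PartitionInput entry; values must follow partition-key order."""
    storage = {"Location": location}
    if schema_version_id is not None:
        # Reference a schema version in the Schema Registry by its ID.
        storage["SchemaReference"] = {"SchemaVersionId": schema_version_id}
    return {"Values": [str(v) for v in values], "StorageDescriptor": storage}


def failed_partitions(response):
    """Map each failed partition's values to (ErrorCode, ErrorMessage)."""
    failures = {}
    for error in response.get("Errors", []):
        detail = error.get("ErrorDetail", {})
        failures[tuple(error["PartitionValues"])] = (
            detail.get("ErrorCode"),
            detail.get("ErrorMessage"),
        )
    return failures


request = {
    "DatabaseName": "sales",
    "TableName": "orders",
    "PartitionInputList": [
        partition_input(["2020", "11"],
                        "s3://example-bucket/orders/year=2020/month=11/"),
        partition_input(["2020", "12"],
                        "s3://example-bucket/orders/year=2020/month=12/"),
    ],
}

# Hand-written sample response, for illustration only.
sample_response = {
    "Errors": [
        {
            "PartitionValues": ["2020", "11"],
            "ErrorDetail": {
                "ErrorCode": "AlreadyExistsException",
                "ErrorMessage": "Partition already exists.",
            },
        }
    ]
}
failures = failed_partitions(sample_response)

# With credentials configured:
#   import boto3
#   response = boto3.client("glue").batch_create_partition(**request)
```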
{'Partitions': {'StorageDescriptor': {'SchemaReference': {'SchemaId': {'RegistryName': 'string', 'SchemaArn': 'string', 'SchemaName': 'string'}, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 'long'}}}}
Retrieves partitions in a batch request.
See also: AWS API Documentation
Request Syntax
client.batch_get_partition( CatalogId='string', DatabaseName='string', TableName='string', PartitionsToGet=[ { 'Values': [ 'string', ] }, ] )
string
The ID of the Data Catalog where the partitions in question reside. If none is supplied, the AWS account ID is used by default.
string
[REQUIRED]
The name of the catalog database where the partitions reside.
string
[REQUIRED]
The name of the partitions' table.
list
[REQUIRED]
A list of partition values identifying the partitions to retrieve.
(dict) --
Contains a list of values defining partitions.
Values (list) -- [REQUIRED]
The list of values.
(string) --
dict
Response Syntax
{ 'Partitions': [ { 'Values': [ 'string', ], 'DatabaseName': 'string', 'TableName': 'string', 'CreationTime': datetime(2015, 1, 1), 'LastAccessTime': datetime(2015, 1, 1), 'StorageDescriptor': { 'Columns': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'Location': 'string', 'InputFormat': 'string', 'OutputFormat': 'string', 'Compressed': True|False, 'NumberOfBuckets': 123, 'SerdeInfo': { 'Name': 'string', 'SerializationLibrary': 'string', 'Parameters': { 'string': 'string' } }, 'BucketColumns': [ 'string', ], 'SortColumns': [ { 'Column': 'string', 'SortOrder': 123 }, ], 'Parameters': { 'string': 'string' }, 'SkewedInfo': { 'SkewedColumnNames': [ 'string', ], 'SkewedColumnValues': [ 'string', ], 'SkewedColumnValueLocationMaps': { 'string': 'string' } }, 'StoredAsSubDirectories': True|False, 'SchemaReference': { 'SchemaId': { 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 123 } }, 'Parameters': { 'string': 'string' }, 'LastAnalyzedTime': datetime(2015, 1, 1), 'CatalogId': 'string' }, ], 'UnprocessedKeys': [ { 'Values': [ 'string', ] }, ] }
Response Structure
(dict) --
Partitions (list) --
A list of the requested partitions.
(dict) --
Represents a slice of table data.
Values (list) --
The values of the partition.
(string) --
DatabaseName (string) --
The name of the catalog database in which the partition resides.
TableName (string) --
The name of the database table in which the partition resides.
CreationTime (datetime) --
The time at which the partition was created.
LastAccessTime (datetime) --
The last time at which the partition was accessed.
StorageDescriptor (dict) --
Provides information about the physical location where the partition is stored.
Columns (list) --
A list of the Columns in the table.
(dict) --
A column in a Table .
Name (string) --
The name of the Column .
Type (string) --
The data type of the Column .
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
Location (string) --
The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.
InputFormat (string) --
The input format: SequenceFileInputFormat (binary), or TextInputFormat , or a custom format.
OutputFormat (string) --
The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat , or a custom format.
Compressed (boolean) --
True if the data in the table is compressed, or False if not.
NumberOfBuckets (integer) --
Must be specified if the table contains any dimension columns.
SerdeInfo (dict) --
The serialization/deserialization (SerDe) information.
Name (string) --
Name of the SerDe.
SerializationLibrary (string) --
Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe .
Parameters (dict) --
These key-value pairs define initialization parameters for the SerDe.
(string) --
(string) --
BucketColumns (list) --
A list of reducer grouping columns, clustering columns, and bucketing columns in the table.
(string) --
SortColumns (list) --
A list specifying the sort order of each bucket in the table.
(dict) --
Specifies the sort order of a sorted column.
Column (string) --
The name of the column.
SortOrder (integer) --
Indicates that the column is sorted in ascending order (== 1), or in descending order (== 0).
Parameters (dict) --
The user-supplied properties in key-value form.
(string) --
(string) --
SkewedInfo (dict) --
The information about values that appear frequently in a column (skewed values).
SkewedColumnNames (list) --
A list of names of columns that contain skewed values.
(string) --
SkewedColumnValues (list) --
A list of values that appear so frequently as to be considered skewed.
(string) --
SkewedColumnValueLocationMaps (dict) --
A mapping of skewed values to the columns that contain them.
(string) --
(string) --
StoredAsSubDirectories (boolean) --
True if the table data is stored in subdirectories, or False if not.
SchemaReference (dict) --
An object that references a schema stored in the AWS Glue Schema Registry.
When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.
SchemaId (dict) --
A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
SchemaVersionId (string) --
The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.
SchemaVersionNumber (integer) --
The version number of the schema.
Parameters (dict) --
These key-value pairs define partition parameters.
(string) --
(string) --
LastAnalyzedTime (datetime) --
The last time at which column statistics were computed for this partition.
CatalogId (string) --
The ID of the Data Catalog in which the partition resides.
UnprocessedKeys (list) --
A list of the partition values in the request for which partitions were not returned.
(dict) --
Contains a list of values defining partitions.
Values (list) --
The list of values.
(string) --
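Because UnprocessedKeys lists the requested values for which no partition was returned, a caller may want to re-request them a bounded number of times. The sketch below does this with any object exposing batch_get_partition. StubGlue is a hypothetical stand-in that returns one partition per call so the loop can be exercised without AWS access. The loop is bounded because keys that name nonexistent partitions would presumably never be returned, however often they are retried.

```python
def fetch_partitions(client, database, table, value_lists, max_rounds=3):
    """Request partitions, re-sending UnprocessedKeys up to max_rounds times.

    Returns (partitions found, keys still unprocessed).
    """
    pending = [{"Values": list(values)} for values in value_lists]
    found = []
    for _ in range(max_rounds):
        if not pending:
            break
        response = client.batch_get_partition(
            DatabaseName=database,
            TableName=table,
            PartitionsToGet=pending,
        )
        found.extend(response.get("Partitions", []))
        pending = response.get("UnprocessedKeys", [])
    return found, pending


class StubGlue:
    """Hypothetical stand-in for a Glue client, used only for illustration.

    Each call returns the first requested partition and reports the
    remaining keys as unprocessed.
    """

    def batch_get_partition(self, DatabaseName, TableName, PartitionsToGet):
        head, rest = PartitionsToGet[:1], PartitionsToGet[1:]
        return {
            "Partitions": [
                {"Values": key["Values"],
                 "DatabaseName": DatabaseName,
                 "TableName": TableName}
                for key in head
            ],
            "UnprocessedKeys": rest,
        }


found, missing = fetch_partitions(
    StubGlue(), "sales", "orders", [["2020", "11"], ["2020", "12"]]
)
```

With a real client, `StubGlue()` would be replaced by `boto3.client("glue")`.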
{'Entries': {'PartitionInput': {'StorageDescriptor': {'SchemaReference': {'SchemaId': {'RegistryName': 'string', 'SchemaArn': 'string', 'SchemaName': 'string'}, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 'long'}}}}}
Updates one or more partitions in a batch operation.
See also: AWS API Documentation
Request Syntax
client.batch_update_partition( CatalogId='string', DatabaseName='string', TableName='string', Entries=[ { 'PartitionValueList': [ 'string', ], 'PartitionInput': { 'Values': [ 'string', ], 'LastAccessTime': datetime(2015, 1, 1), 'StorageDescriptor': { 'Columns': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'Location': 'string', 'InputFormat': 'string', 'OutputFormat': 'string', 'Compressed': True|False, 'NumberOfBuckets': 123, 'SerdeInfo': { 'Name': 'string', 'SerializationLibrary': 'string', 'Parameters': { 'string': 'string' } }, 'BucketColumns': [ 'string', ], 'SortColumns': [ { 'Column': 'string', 'SortOrder': 123 }, ], 'Parameters': { 'string': 'string' }, 'SkewedInfo': { 'SkewedColumnNames': [ 'string', ], 'SkewedColumnValues': [ 'string', ], 'SkewedColumnValueLocationMaps': { 'string': 'string' } }, 'StoredAsSubDirectories': True|False, 'SchemaReference': { 'SchemaId': { 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 123 } }, 'Parameters': { 'string': 'string' }, 'LastAnalyzedTime': datetime(2015, 1, 1) } }, ] )
string
The ID of the catalog in which the partition is to be updated. Currently, this should be the AWS account ID.
string
[REQUIRED]
The name of the metadata database in which the partition is to be updated.
string
[REQUIRED]
The name of the metadata table in which the partition is to be updated.
list
[REQUIRED]
A list of up to 100 BatchUpdatePartitionRequestEntry objects to update.
(dict) --
A structure that contains the values and structure used to update a partition.
PartitionValueList (list) -- [REQUIRED]
A list of values defining the partitions.
(string) --
PartitionInput (dict) -- [REQUIRED]
The structure used to update a partition.
Values (list) --
The values of the partition. Although this parameter is not required by the SDK, you must specify this parameter for a valid input.
The values for the keys for the new partition must be passed as an array of String objects that must be ordered in the same order as the partition keys appearing in the Amazon S3 prefix. Otherwise AWS Glue will add the values to the wrong keys.
(string) --
LastAccessTime (datetime) --
The last time at which the partition was accessed.
StorageDescriptor (dict) --
Provides information about the physical location where the partition is stored.
Columns (list) --
A list of the Columns in the table.
(dict) --
A column in a Table .
Name (string) -- [REQUIRED]
The name of the Column .
Type (string) --
The data type of the Column .
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
Location (string) --
The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.
InputFormat (string) --
The input format: SequenceFileInputFormat (binary), or TextInputFormat , or a custom format.
OutputFormat (string) --
The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat , or a custom format.
Compressed (boolean) --
True if the data in the table is compressed, or False if not.
NumberOfBuckets (integer) --
Must be specified if the table contains any dimension columns.
SerdeInfo (dict) --
The serialization/deserialization (SerDe) information.
Name (string) --
Name of the SerDe.
SerializationLibrary (string) --
Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe .
Parameters (dict) --
These key-value pairs define initialization parameters for the SerDe.
(string) --
(string) --
BucketColumns (list) --
A list of reducer grouping columns, clustering columns, and bucketing columns in the table.
(string) --
SortColumns (list) --
A list specifying the sort order of each bucket in the table.
(dict) --
Specifies the sort order of a sorted column.
Column (string) -- [REQUIRED]
The name of the column.
SortOrder (integer) -- [REQUIRED]
Indicates that the column is sorted in ascending order (== 1), or in descending order (== 0).
Parameters (dict) --
The user-supplied properties in key-value form.
(string) --
(string) --
SkewedInfo (dict) --
The information about values that appear frequently in a column (skewed values).
SkewedColumnNames (list) --
A list of names of columns that contain skewed values.
(string) --
SkewedColumnValues (list) --
A list of values that appear so frequently as to be considered skewed.
(string) --
SkewedColumnValueLocationMaps (dict) --
A mapping of skewed values to the columns that contain them.
(string) --
(string) --
StoredAsSubDirectories (boolean) --
True if the table data is stored in subdirectories, or False if not.
SchemaReference (dict) --
An object that references a schema stored in the AWS Glue Schema Registry.
When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.
SchemaId (dict) --
A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
SchemaVersionId (string) --
The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.
SchemaVersionNumber (integer) --
The version number of the schema.
Parameters (dict) --
These key-value pairs define partition parameters.
(string) --
(string) --
LastAnalyzedTime (datetime) --
The last time at which column statistics were computed for this partition.
dict
Response Syntax
{ 'Errors': [ { 'PartitionValueList': [ 'string', ], 'ErrorDetail': { 'ErrorCode': 'string', 'ErrorMessage': 'string' } }, ] }
Response Structure
(dict) --
Errors (list) --
The errors encountered when trying to update the requested partitions. A list of BatchUpdatePartitionFailureEntry objects.
(dict) --
Contains information about a batch update partition error.
PartitionValueList (list) --
A list of values defining the partitions.
(string) --
ErrorDetail (dict) --
The details about the batch update partition error.
ErrorCode (string) --
The code associated with this error.
ErrorMessage (string) --
A message describing the error.
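Since Entries accepts at most 100 objects per call, larger updates have to be split. The following sketch batches entries and gathers the failure entries from every response. The entry contents are hypothetical and the PartitionInput shown is abbreviated for illustration; a real update would normally carry the complete partition definition. The helper names are not part of the SDK.

```python
MAX_ENTRIES_PER_CALL = 100  # limit stated for the Entries parameter


def update_entry(current_values, partition_input):
    """Build one BatchUpdatePartitionRequestEntry."""
    return {
        "PartitionValueList": list(current_values),
        "PartitionInput": partition_input,
    }


def batched(entries, size=MAX_ENTRIES_PER_CALL):
    """Yield consecutive slices of at most `size` entries."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


def update_partitions(client, database, table, entries):
    """Send entries in batches and collect every failure entry returned."""
    errors = []
    for batch in batched(entries):
        response = client.batch_update_partition(
            DatabaseName=database,
            TableName=table,
            Entries=batch,
        )
        errors.extend(response.get("Errors", []))
    return errors


# 250 hypothetical entries, abbreviated for illustration.
entries = [
    update_entry(
        ["2020", str(n)],
        {"Values": ["2020", str(n)], "Parameters": {"reviewed": "true"}},
    )
    for n in range(1, 251)
]
batch_sizes = [len(batch) for batch in batched(entries)]
```

Here the 250 entries are split into three calls of 100, 100, and 50 entries.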
{'PartitionInput': {'StorageDescriptor': {'SchemaReference': {'SchemaId': {'RegistryName': 'string', 'SchemaArn': 'string', 'SchemaName': 'string'}, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 'long'}}}}
Creates a new partition.
See also: AWS API Documentation
Request Syntax
client.create_partition( CatalogId='string', DatabaseName='string', TableName='string', PartitionInput={ 'Values': [ 'string', ], 'LastAccessTime': datetime(2015, 1, 1), 'StorageDescriptor': { 'Columns': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'Location': 'string', 'InputFormat': 'string', 'OutputFormat': 'string', 'Compressed': True|False, 'NumberOfBuckets': 123, 'SerdeInfo': { 'Name': 'string', 'SerializationLibrary': 'string', 'Parameters': { 'string': 'string' } }, 'BucketColumns': [ 'string', ], 'SortColumns': [ { 'Column': 'string', 'SortOrder': 123 }, ], 'Parameters': { 'string': 'string' }, 'SkewedInfo': { 'SkewedColumnNames': [ 'string', ], 'SkewedColumnValues': [ 'string', ], 'SkewedColumnValueLocationMaps': { 'string': 'string' } }, 'StoredAsSubDirectories': True|False, 'SchemaReference': { 'SchemaId': { 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 123 } }, 'Parameters': { 'string': 'string' }, 'LastAnalyzedTime': datetime(2015, 1, 1) } )
string
The AWS account ID of the catalog in which the partition is to be created.
string
[REQUIRED]
The name of the metadata database in which the partition is to be created.
string
[REQUIRED]
The name of the metadata table in which the partition is to be created.
dict
[REQUIRED]
A PartitionInput structure defining the partition to be created.
Values (list) --
The values of the partition. Although this parameter is not required by the SDK, you must specify this parameter for a valid input.
The values for the keys for the new partition must be passed as an array of String objects that must be ordered in the same order as the partition keys appearing in the Amazon S3 prefix. Otherwise AWS Glue will add the values to the wrong keys.
(string) --
LastAccessTime (datetime) --
The last time at which the partition was accessed.
StorageDescriptor (dict) --
Provides information about the physical location where the partition is stored.
Columns (list) --
A list of the Columns in the table.
(dict) --
A column in a Table .
Name (string) -- [REQUIRED]
The name of the Column .
Type (string) --
The data type of the Column .
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
Location (string) --
The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.
InputFormat (string) --
The input format: SequenceFileInputFormat (binary), or TextInputFormat , or a custom format.
OutputFormat (string) --
The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat , or a custom format.
Compressed (boolean) --
True if the data in the table is compressed, or False if not.
NumberOfBuckets (integer) --
Must be specified if the table contains any dimension columns.
SerdeInfo (dict) --
The serialization/deserialization (SerDe) information.
Name (string) --
Name of the SerDe.
SerializationLibrary (string) --
Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe .
Parameters (dict) --
These key-value pairs define initialization parameters for the SerDe.
(string) --
(string) --
BucketColumns (list) --
A list of reducer grouping columns, clustering columns, and bucketing columns in the table.
(string) --
SortColumns (list) --
A list specifying the sort order of each bucket in the table.
(dict) --
Specifies the sort order of a sorted column.
Column (string) -- [REQUIRED]
The name of the column.
SortOrder (integer) -- [REQUIRED]
Indicates that the column is sorted in ascending order (== 1), or in descending order (== 0).
Parameters (dict) --
The user-supplied properties in key-value form.
(string) --
(string) --
SkewedInfo (dict) --
The information about values that appear frequently in a column (skewed values).
SkewedColumnNames (list) --
A list of names of columns that contain skewed values.
(string) --
SkewedColumnValues (list) --
A list of values that appear so frequently as to be considered skewed.
(string) --
SkewedColumnValueLocationMaps (dict) --
A mapping of skewed values to the columns that contain them.
(string) --
(string) --
StoredAsSubDirectories (boolean) --
True if the table data is stored in subdirectories, or False if not.
SchemaReference (dict) --
An object that references a schema stored in the AWS Glue Schema Registry.
When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.
SchemaId (dict) --
A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
SchemaVersionId (string) --
The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.
SchemaVersionNumber (integer) --
The version number of the schema.
Parameters (dict) --
These key-value pairs define partition parameters.
(string) --
(string) --
LastAnalyzedTime (datetime) --
The last time at which column statistics were computed for this partition.
dict
Response Syntax
{}
Response Structure
(dict) --
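The ordering requirement on Values is easy to violate when partition values are held in a dictionary. As a sketch, the helper below orders values by the table's partition keys before building the create_partition request. The key names, database, table, and location are hypothetical; in practice the key order would come from the table's PartitionKeys.

```python
def ordered_values(partition_keys, assignment):
    """Return partition values as strings, in partition-key order."""
    missing = [key for key in partition_keys if key not in assignment]
    if missing:
        raise ValueError(f"no value for partition key(s): {missing}")
    return [str(assignment[key]) for key in partition_keys]


def build_create_partition_request(database, table, partition_keys,
                                   assignment, location):
    """Assemble keyword arguments for client.create_partition()."""
    return {
        "DatabaseName": database,
        "TableName": table,
        "PartitionInput": {
            "Values": ordered_values(partition_keys, assignment),
            "StorageDescriptor": {"Location": location},
        },
    }


# The dictionary order differs from the key order on purpose.
request = build_create_partition_request(
    "sales",
    "orders",
    partition_keys=["year", "month"],
    assignment={"month": 11, "year": 2020},
    location="s3://example-bucket/orders/year=2020/month=11/",
)

# With credentials configured:
#   import boto3
#   boto3.client("glue").create_partition(**request)
```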
{'TableInput': {'StorageDescriptor': {'SchemaReference': {'SchemaId': {'RegistryName': 'string', 'SchemaArn': 'string', 'SchemaName': 'string'}, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 'long'}}}}
Creates a new table definition in the Data Catalog.
See also: AWS API Documentation
Request Syntax
client.create_table( CatalogId='string', DatabaseName='string', TableInput={ 'Name': 'string', 'Description': 'string', 'Owner': 'string', 'LastAccessTime': datetime(2015, 1, 1), 'LastAnalyzedTime': datetime(2015, 1, 1), 'Retention': 123, 'StorageDescriptor': { 'Columns': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'Location': 'string', 'InputFormat': 'string', 'OutputFormat': 'string', 'Compressed': True|False, 'NumberOfBuckets': 123, 'SerdeInfo': { 'Name': 'string', 'SerializationLibrary': 'string', 'Parameters': { 'string': 'string' } }, 'BucketColumns': [ 'string', ], 'SortColumns': [ { 'Column': 'string', 'SortOrder': 123 }, ], 'Parameters': { 'string': 'string' }, 'SkewedInfo': { 'SkewedColumnNames': [ 'string', ], 'SkewedColumnValues': [ 'string', ], 'SkewedColumnValueLocationMaps': { 'string': 'string' } }, 'StoredAsSubDirectories': True|False, 'SchemaReference': { 'SchemaId': { 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 123 } }, 'PartitionKeys': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'ViewOriginalText': 'string', 'ViewExpandedText': 'string', 'TableType': 'string', 'Parameters': { 'string': 'string' }, 'TargetTable': { 'CatalogId': 'string', 'DatabaseName': 'string', 'Name': 'string' } }, PartitionIndexes=[ { 'Keys': [ 'string', ], 'IndexName': 'string' }, ] )
string
The ID of the Data Catalog in which to create the Table . If none is supplied, the AWS account ID is used by default.
string
[REQUIRED]
The catalog database in which to create the new table. For Hive compatibility, this name is entirely lowercase.
dict
[REQUIRED]
The TableInput object that defines the metadata table to create in the catalog.
Name (string) -- [REQUIRED]
The table name. For Hive compatibility, this is folded to lowercase when it is stored.
Description (string) --
A description of the table.
Owner (string) --
The table owner.
LastAccessTime (datetime) --
The last time that the table was accessed.
LastAnalyzedTime (datetime) --
The last time that column statistics were computed for this table.
Retention (integer) --
The retention time for this table.
StorageDescriptor (dict) --
A storage descriptor containing information about the physical storage of this table.
Columns (list) --
A list of the Columns in the table.
(dict) --
A column in a Table .
Name (string) -- [REQUIRED]
The name of the Column .
Type (string) --
The data type of the Column .
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
Location (string) --
The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.
InputFormat (string) --
The input format: SequenceFileInputFormat (binary), TextInputFormat, or a custom format.
OutputFormat (string) --
The output format: SequenceFileOutputFormat (binary), IgnoreKeyTextOutputFormat, or a custom format.
Compressed (boolean) --
True if the data in the table is compressed, or False if not.
NumberOfBuckets (integer) --
The number of buckets. Must be specified if the table contains any dimension columns.
SerdeInfo (dict) --
The serialization/deserialization (SerDe) information.
Name (string) --
Name of the SerDe.
SerializationLibrary (string) --
Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.
Parameters (dict) --
These key-value pairs define initialization parameters for the SerDe.
(string) --
(string) --
BucketColumns (list) --
A list of reducer grouping columns, clustering columns, and bucketing columns in the table.
(string) --
SortColumns (list) --
A list specifying the sort order of each bucket in the table.
(dict) --
Specifies the sort order of a sorted column.
Column (string) -- [REQUIRED]
The name of the column.
SortOrder (integer) -- [REQUIRED]
Indicates whether the column is sorted in ascending order (1) or in descending order (0).
Parameters (dict) --
The user-supplied properties in key-value form.
(string) --
(string) --
SkewedInfo (dict) --
The information about values that appear frequently in a column (skewed values).
SkewedColumnNames (list) --
A list of names of columns that contain skewed values.
(string) --
SkewedColumnValues (list) --
A list of values that appear so frequently as to be considered skewed.
(string) --
SkewedColumnValueLocationMaps (dict) --
A mapping of skewed values to the columns that contain them.
(string) --
(string) --
StoredAsSubDirectories (boolean) --
True if the table data is stored in subdirectories, or False if not.
SchemaReference (dict) --
An object that references a schema stored in the AWS Glue Schema Registry.
When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.
SchemaId (dict) --
A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
SchemaVersionId (string) --
The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.
SchemaVersionNumber (integer) --
The version number of the schema.
PartitionKeys (list) --
A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.
When you create a table for use with Amazon Athena and do not specify any partition keys, you must still set PartitionKeys to an empty list. For example:
"PartitionKeys": []
(dict) --
A column in a Table.
Name (string) -- [REQUIRED]
The name of the Column.
Type (string) --
The data type of the Column.
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
ViewOriginalText (string) --
If the table is a view, the original text of the view; otherwise null.
ViewExpandedText (string) --
If the table is a view, the expanded text of the view; otherwise null.
TableType (string) --
The type of this table (EXTERNAL_TABLE, VIRTUAL_VIEW, etc.).
Parameters (dict) --
These key-value pairs define properties associated with the table.
(string) --
(string) --
TargetTable (dict) --
A TableIdentifier structure that describes a target table for resource linking.
CatalogId (string) --
The ID of the Data Catalog in which the table resides.
DatabaseName (string) --
The name of the catalog database that contains the target table.
Name (string) --
The name of the target table.
PartitionIndexes (list) --
A list of partition indexes (PartitionIndex structures) to create in the table.
(dict) --
A structure for a partition index.
Keys (list) -- [REQUIRED]
The keys for the partition index.
(string) --
IndexName (string) -- [REQUIRED]
The name of the partition index.
dict
Response Syntax
{}
Response Structure
(dict) --
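The SchemaReference and PartitionKeys notes above can be combined into one request payload. The following is a minimal sketch, not a definitive implementation: the helper name `build_table_input` and all example values (`orders`, `my-registry`, `orders-schema`) are hypothetical, and it assumes that identifying the schema by SchemaId plus SchemaVersionNumber is acceptable for your registry.

```python
def build_table_input(name, registry_name, schema_name, version_number, location=None):
    """Return a TableInput dict that points at a Schema Registry schema.

    Hypothetical helper: it only assembles the dictionary described above.
    """
    storage = {
        # Columns is left empty because the schema reference stands in for it.
        "Columns": [],
        "SchemaReference": {
            "SchemaId": {"RegistryName": registry_name, "SchemaName": schema_name},
            "SchemaVersionNumber": version_number,
        },
    }
    if location is not None:
        storage["Location"] = location
    return {
        "Name": name,
        "StorageDescriptor": storage,
        # Explicit empty list, as the Athena note above asks for.
        "PartitionKeys": [],
    }


table_input = build_table_input("orders", "my-registry", "orders-schema", 1)

# With credentials configured, the payload could then be sent, for example:
# import boto3
# boto3.client("glue").create_table(DatabaseName="sales", TableInput=table_input)
```

Building the payload in a plain function keeps it easy to inspect before any call is made against a real Data Catalog.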
{'Partition': {'StorageDescriptor': {'SchemaReference': {'SchemaId': {'RegistryName': 'string', 'SchemaArn': 'string', 'SchemaName': 'string'}, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 'long'}}}}
Retrieves information about a specified partition.
See also: AWS API Documentation
Request Syntax
client.get_partition( CatalogId='string', DatabaseName='string', TableName='string', PartitionValues=[ 'string', ] )
CatalogId (string) --
The ID of the Data Catalog where the partition in question resides. If none is provided, the AWS account ID is used by default.
DatabaseName (string) --
[REQUIRED]
The name of the catalog database where the partition resides.
TableName (string) --
[REQUIRED]
The name of the partition's table.
PartitionValues (list) --
[REQUIRED]
The values that define the partition.
(string) --
dict
Response Syntax
{ 'Partition': { 'Values': [ 'string', ], 'DatabaseName': 'string', 'TableName': 'string', 'CreationTime': datetime(2015, 1, 1), 'LastAccessTime': datetime(2015, 1, 1), 'StorageDescriptor': { 'Columns': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'Location': 'string', 'InputFormat': 'string', 'OutputFormat': 'string', 'Compressed': True|False, 'NumberOfBuckets': 123, 'SerdeInfo': { 'Name': 'string', 'SerializationLibrary': 'string', 'Parameters': { 'string': 'string' } }, 'BucketColumns': [ 'string', ], 'SortColumns': [ { 'Column': 'string', 'SortOrder': 123 }, ], 'Parameters': { 'string': 'string' }, 'SkewedInfo': { 'SkewedColumnNames': [ 'string', ], 'SkewedColumnValues': [ 'string', ], 'SkewedColumnValueLocationMaps': { 'string': 'string' } }, 'StoredAsSubDirectories': True|False, 'SchemaReference': { 'SchemaId': { 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 123 } }, 'Parameters': { 'string': 'string' }, 'LastAnalyzedTime': datetime(2015, 1, 1), 'CatalogId': 'string' } }
Response Structure
(dict) --
Partition (dict) --
The requested information, in the form of a Partition object.
Values (list) --
The values of the partition.
(string) --
DatabaseName (string) --
The name of the catalog database that contains the partition.
TableName (string) --
The name of the database table that contains the partition.
CreationTime (datetime) --
The time at which the partition was created.
LastAccessTime (datetime) --
The last time at which the partition was accessed.
StorageDescriptor (dict) --
Provides information about the physical location where the partition is stored.
Columns (list) --
A list of the Columns in the table.
(dict) --
A column in a Table.
Name (string) --
The name of the Column.
Type (string) --
The data type of the Column.
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
Location (string) --
The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.
InputFormat (string) --
The input format: SequenceFileInputFormat (binary), TextInputFormat, or a custom format.
OutputFormat (string) --
The output format: SequenceFileOutputFormat (binary), IgnoreKeyTextOutputFormat, or a custom format.
Compressed (boolean) --
True if the data in the table is compressed, or False if not.
NumberOfBuckets (integer) --
The number of buckets. Must be specified if the table contains any dimension columns.
SerdeInfo (dict) --
The serialization/deserialization (SerDe) information.
Name (string) --
Name of the SerDe.
SerializationLibrary (string) --
Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.
Parameters (dict) --
These key-value pairs define initialization parameters for the SerDe.
(string) --
(string) --
BucketColumns (list) --
A list of reducer grouping columns, clustering columns, and bucketing columns in the table.
(string) --
SortColumns (list) --
A list specifying the sort order of each bucket in the table.
(dict) --
Specifies the sort order of a sorted column.
Column (string) --
The name of the column.
SortOrder (integer) --
Indicates whether the column is sorted in ascending order (1) or in descending order (0).
Parameters (dict) --
The user-supplied properties in key-value form.
(string) --
(string) --
SkewedInfo (dict) --
The information about values that appear frequently in a column (skewed values).
SkewedColumnNames (list) --
A list of names of columns that contain skewed values.
(string) --
SkewedColumnValues (list) --
A list of values that appear so frequently as to be considered skewed.
(string) --
SkewedColumnValueLocationMaps (dict) --
A mapping of skewed values to the columns that contain them.
(string) --
(string) --
StoredAsSubDirectories (boolean) --
True if the table data is stored in subdirectories, or False if not.
SchemaReference (dict) --
An object that references a schema stored in the AWS Glue Schema Registry.
When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.
SchemaId (dict) --
A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
SchemaVersionId (string) --
The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.
SchemaVersionNumber (integer) --
The version number of the schema.
Parameters (dict) --
These key-value pairs define partition parameters.
(string) --
(string) --
LastAnalyzedTime (datetime) --
The last time at which column statistics were computed for this partition.
CatalogId (string) --
The ID of the Data Catalog in which the partition resides.
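Because SchemaReference is nested two levels below Partition and may be absent, reading it defensively can help. The sketch below assumes a response shaped like the Response Syntax above; `schema_reference_of` and the sample values are hypothetical names used only for illustration.

```python
def schema_reference_of(response):
    """Return the SchemaReference dict of a get_partition response, or None."""
    descriptor = response.get("Partition", {}).get("StorageDescriptor", {})
    return descriptor.get("SchemaReference")


# Hand-written sample that mirrors the documented response shape.
sample_response = {
    "Partition": {
        "Values": ["2020", "11"],
        "StorageDescriptor": {
            "Columns": [],
            "SchemaReference": {
                "SchemaVersionId": "example-version-id",
                "SchemaVersionNumber": 3,
            },
        },
    }
}

reference = schema_reference_of(sample_response)
missing = schema_reference_of({"Partition": {"StorageDescriptor": {}}})
```

With a real client, `sample_response` would be replaced by the value returned from `client.get_partition(...)`.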
{'Partitions': {'StorageDescriptor': {'SchemaReference': {'SchemaId': {'RegistryName': 'string', 'SchemaArn': 'string', 'SchemaName': 'string'}, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 'long'}}}}
Retrieves information about the partitions in a table.
See also: AWS API Documentation
Request Syntax
client.get_partitions( CatalogId='string', DatabaseName='string', TableName='string', Expression='string', NextToken='string', Segment={ 'SegmentNumber': 123, 'TotalSegments': 123 }, MaxResults=123 )
CatalogId (string) --
The ID of the Data Catalog where the partitions in question reside. If none is provided, the AWS account ID is used by default.
DatabaseName (string) --
[REQUIRED]
The name of the catalog database where the partitions reside.
TableName (string) --
[REQUIRED]
The name of the partitions' table.
Expression (string) --
An expression that filters the partitions to be returned.
The expression uses SQL syntax similar to the SQL WHERE filter clause. The SQL statement parser JSQLParser parses the expression.
Operators: The following operators can be used in the Expression parameter:
=
Checks whether the values of the two operands are equal; if yes, then the condition becomes true.
Example: Assume 'variable a' holds 10 and 'variable b' holds 20.
(a = b) is not true.
< >
Checks whether the values of the two operands are unequal; if the values are not equal, then the condition becomes true.
Example: (a < > b) is true.
>
Checks whether the value of the left operand is greater than the value of the right operand; if yes, then the condition becomes true.
Example: (a > b) is not true.
<
Checks whether the value of the left operand is less than the value of the right operand; if yes, then the condition becomes true.
Example: (a < b) is true.
>=
Checks whether the value of the left operand is greater than or equal to the value of the right operand; if yes, then the condition becomes true.
Example: (a >= b) is not true.
<=
Checks whether the value of the left operand is less than or equal to the value of the right operand; if yes, then the condition becomes true.
Example: (a <= b) is true.
AND, OR, IN, BETWEEN, LIKE, NOT, IS NULL
Logical operators.
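The comparison operators above can be assembled into an Expression string programmatically. This is a minimal sketch under stated assumptions: `build_expression` is a hypothetical helper, the partition keys `year` and `month` are invented, literals are passed through exactly as written by the caller, and the inequality operator is left out of the whitelist for brevity.

```python
# Comparison operators taken from the list above (inequality omitted here).
ALLOWED_OPERATORS = {"=", ">", "<", ">=", "<="}


def build_expression(conditions, joiner="AND"):
    """Join (key, operator, literal) triples into one filter string.

    Literals are inserted verbatim, so string values must already be quoted.
    """
    parts = []
    for key, operator, literal in conditions:
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator!r}")
        parts.append(f"{key} {operator} {literal}")
    return f" {joiner} ".join(parts)


expression = build_expression([("year", "=", "'2020'"), ("month", ">=", "6")])
# expression is "year = '2020' AND month >= 6"
```

The resulting string would be passed as the `Expression` argument of `get_partitions`; whether a given literal needs quoting depends on the partition key type.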
Supported Partition Key Types: The following partition key types are supported.
string
date
timestamp
int
bigint
long
tinyint
smallint
decimal
If an invalid type is encountered, an exception is thrown.
The following list shows the valid operators on each type. When you define a crawler, the partitionKey type is created as a STRING, to be compatible with the catalog partitions.
Sample API Call:
NextToken (string) --
A continuation token, if this is not the first call to retrieve these partitions.
Segment (dict) --
The segment of the table's partitions to scan in this request.
SegmentNumber (integer) -- [REQUIRED]
The zero-based index number of the segment. For example, if the total number of segments is 4, SegmentNumber values range from 0 through 3.
TotalSegments (integer) -- [REQUIRED]
The total number of segments.
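The zero-based numbering of segments can be sketched as follows. `segment_specs` is a hypothetical helper that only produces the Segment dictionaries; how the resulting requests are distributed across workers is left to the caller.

```python
def segment_specs(total_segments):
    """Return one Segment dict per segment, numbered 0 .. total_segments - 1."""
    if total_segments < 1:
        raise ValueError("total_segments must be at least 1")
    return [
        {"SegmentNumber": number, "TotalSegments": total_segments}
        for number in range(total_segments)
    ]


specs = segment_specs(4)
# SegmentNumber runs from 0 through 3 when TotalSegments is 4.
```

Each dictionary could be passed as the `Segment` argument of a separate `get_partitions` call so that the table's partitions are scanned in parallel.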
MaxResults (integer) --
The maximum number of partitions to return in a single response.
dict
Response Syntax
{ 'Partitions': [ { 'Values': [ 'string', ], 'DatabaseName': 'string', 'TableName': 'string', 'CreationTime': datetime(2015, 1, 1), 'LastAccessTime': datetime(2015, 1, 1), 'StorageDescriptor': { 'Columns': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'Location': 'string', 'InputFormat': 'string', 'OutputFormat': 'string', 'Compressed': True|False, 'NumberOfBuckets': 123, 'SerdeInfo': { 'Name': 'string', 'SerializationLibrary': 'string', 'Parameters': { 'string': 'string' } }, 'BucketColumns': [ 'string', ], 'SortColumns': [ { 'Column': 'string', 'SortOrder': 123 }, ], 'Parameters': { 'string': 'string' }, 'SkewedInfo': { 'SkewedColumnNames': [ 'string', ], 'SkewedColumnValues': [ 'string', ], 'SkewedColumnValueLocationMaps': { 'string': 'string' } }, 'StoredAsSubDirectories': True|False, 'SchemaReference': { 'SchemaId': { 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 123 } }, 'Parameters': { 'string': 'string' }, 'LastAnalyzedTime': datetime(2015, 1, 1), 'CatalogId': 'string' }, ], 'NextToken': 'string' }
Response Structure
(dict) --
Partitions (list) --
A list of requested partitions.
(dict) --
Represents a slice of table data.
Values (list) --
The values of the partition.
(string) --
DatabaseName (string) --
The name of the catalog database that contains the partition.
TableName (string) --
The name of the database table that contains the partition.
CreationTime (datetime) --
The time at which the partition was created.
LastAccessTime (datetime) --
The last time at which the partition was accessed.
StorageDescriptor (dict) --
Provides information about the physical location where the partition is stored.
Columns (list) --
A list of the Columns in the table.
(dict) --
A column in a Table.
Name (string) --
The name of the Column.
Type (string) --
The data type of the Column.
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
Location (string) --
The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.
InputFormat (string) --
The input format: SequenceFileInputFormat (binary), TextInputFormat, or a custom format.
OutputFormat (string) --
The output format: SequenceFileOutputFormat (binary), IgnoreKeyTextOutputFormat, or a custom format.
Compressed (boolean) --
True if the data in the table is compressed, or False if not.
NumberOfBuckets (integer) --
The number of buckets. Must be specified if the table contains any dimension columns.
SerdeInfo (dict) --
The serialization/deserialization (SerDe) information.
Name (string) --
Name of the SerDe.
SerializationLibrary (string) --
Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.
Parameters (dict) --
These key-value pairs define initialization parameters for the SerDe.
(string) --
(string) --
BucketColumns (list) --
A list of reducer grouping columns, clustering columns, and bucketing columns in the table.
(string) --
SortColumns (list) --
A list specifying the sort order of each bucket in the table.
(dict) --
Specifies the sort order of a sorted column.
Column (string) --
The name of the column.
SortOrder (integer) --
Indicates whether the column is sorted in ascending order (1) or in descending order (0).
Parameters (dict) --
The user-supplied properties in key-value form.
(string) --
(string) --
SkewedInfo (dict) --
The information about values that appear frequently in a column (skewed values).
SkewedColumnNames (list) --
A list of names of columns that contain skewed values.
(string) --
SkewedColumnValues (list) --
A list of values that appear so frequently as to be considered skewed.
(string) --
SkewedColumnValueLocationMaps (dict) --
A mapping of skewed values to the columns that contain them.
(string) --
(string) --
StoredAsSubDirectories (boolean) --
True if the table data is stored in subdirectories, or False if not.
SchemaReference (dict) --
An object that references a schema stored in the AWS Glue Schema Registry.
When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.
SchemaId (dict) --
A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
SchemaVersionId (string) --
The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.
SchemaVersionNumber (integer) --
The version number of the schema.
Parameters (dict) --
These key-value pairs define partition parameters.
(string) --
(string) --
LastAnalyzedTime (datetime) --
The last time at which column statistics were computed for this partition.
CatalogId (string) --
The ID of the Data Catalog in which the partition resides.
NextToken (string) --
A continuation token, if the returned list of partitions does not include the last one.
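The NextToken handshake described above amounts to a simple loop. The sketch below is illustrative only: `iter_partitions` is a hypothetical helper, and `FakeGlue` is a stand-in that serves two invented pages so the loop can run without AWS access.

```python
def iter_partitions(client, **request):
    """Yield every partition, following NextToken until none is returned."""
    while True:
        page = client.get_partitions(**request)
        yield from page.get("Partitions", [])
        token = page.get("NextToken")
        if not token:
            break
        # Ask for the page that follows the one just received.
        request["NextToken"] = token


class FakeGlue:
    """Hypothetical stand-in for a Glue client; returns two fixed pages."""

    def get_partitions(self, **request):
        if request.get("NextToken") == "page-2":
            return {"Partitions": [{"Values": ["2020", "12"]}]}
        return {"Partitions": [{"Values": ["2020", "11"]}], "NextToken": "page-2"}


values = [
    partition["Values"]
    for partition in iter_partitions(FakeGlue(), DatabaseName="sales", TableName="orders")
]
```

With a real client created through `boto3.client("glue")`, the same function could be used unchanged; boto3's built-in paginators are an alternative worth checking in the boto3 documentation.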
{'Table': {'StorageDescriptor': {'SchemaReference': {'SchemaId': {'RegistryName': 'string', 'SchemaArn': 'string', 'SchemaName': 'string'}, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 'long'}}}}
Retrieves the Table definition in a Data Catalog for a specified table.
See also: AWS API Documentation
Request Syntax
client.get_table( CatalogId='string', DatabaseName='string', Name='string' )
CatalogId (string) --
The ID of the Data Catalog where the table resides. If none is provided, the AWS account ID is used by default.
DatabaseName (string) --
[REQUIRED]
The name of the database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.
Name (string) --
[REQUIRED]
The name of the table for which to retrieve the definition. For Hive compatibility, this name is entirely lowercase.
dict
Response Syntax
{ 'Table': { 'Name': 'string', 'DatabaseName': 'string', 'Description': 'string', 'Owner': 'string', 'CreateTime': datetime(2015, 1, 1), 'UpdateTime': datetime(2015, 1, 1), 'LastAccessTime': datetime(2015, 1, 1), 'LastAnalyzedTime': datetime(2015, 1, 1), 'Retention': 123, 'StorageDescriptor': { 'Columns': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'Location': 'string', 'InputFormat': 'string', 'OutputFormat': 'string', 'Compressed': True|False, 'NumberOfBuckets': 123, 'SerdeInfo': { 'Name': 'string', 'SerializationLibrary': 'string', 'Parameters': { 'string': 'string' } }, 'BucketColumns': [ 'string', ], 'SortColumns': [ { 'Column': 'string', 'SortOrder': 123 }, ], 'Parameters': { 'string': 'string' }, 'SkewedInfo': { 'SkewedColumnNames': [ 'string', ], 'SkewedColumnValues': [ 'string', ], 'SkewedColumnValueLocationMaps': { 'string': 'string' } }, 'StoredAsSubDirectories': True|False, 'SchemaReference': { 'SchemaId': { 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 123 } }, 'PartitionKeys': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'ViewOriginalText': 'string', 'ViewExpandedText': 'string', 'TableType': 'string', 'Parameters': { 'string': 'string' }, 'CreatedBy': 'string', 'IsRegisteredWithLakeFormation': True|False, 'TargetTable': { 'CatalogId': 'string', 'DatabaseName': 'string', 'Name': 'string' }, 'CatalogId': 'string' } }
Response Structure
(dict) --
Table (dict) --
The Table object that defines the specified table.
Name (string) --
The table name. For Hive compatibility, this must be entirely lowercase.
DatabaseName (string) --
The name of the database where the table metadata resides. For Hive compatibility, this must be all lowercase.
Description (string) --
A description of the table.
Owner (string) --
The owner of the table.
CreateTime (datetime) --
The time when the table definition was created in the Data Catalog.
UpdateTime (datetime) --
The last time that the table was updated.
LastAccessTime (datetime) --
The last time that the table was accessed. This is usually taken from HDFS, and might not be reliable.
LastAnalyzedTime (datetime) --
The last time that column statistics were computed for this table.
Retention (integer) --
The retention time for this table.
StorageDescriptor (dict) --
A storage descriptor containing information about the physical storage of this table.
Columns (list) --
A list of the Columns in the table.
(dict) --
A column in a Table.
Name (string) --
The name of the Column.
Type (string) --
The data type of the Column.
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
Location (string) --
The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.
InputFormat (string) --
The input format: SequenceFileInputFormat (binary), TextInputFormat, or a custom format.
OutputFormat (string) --
The output format: SequenceFileOutputFormat (binary), IgnoreKeyTextOutputFormat, or a custom format.
Compressed (boolean) --
True if the data in the table is compressed, or False if not.
NumberOfBuckets (integer) --
The number of buckets. Must be specified if the table contains any dimension columns.
SerdeInfo (dict) --
The serialization/deserialization (SerDe) information.
Name (string) --
Name of the SerDe.
SerializationLibrary (string) --
Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.
Parameters (dict) --
These key-value pairs define initialization parameters for the SerDe.
(string) --
(string) --
BucketColumns (list) --
A list of reducer grouping columns, clustering columns, and bucketing columns in the table.
(string) --
SortColumns (list) --
A list specifying the sort order of each bucket in the table.
(dict) --
Specifies the sort order of a sorted column.
Column (string) --
The name of the column.
SortOrder (integer) --
Indicates whether the column is sorted in ascending order (1) or in descending order (0).
Parameters (dict) --
The user-supplied properties in key-value form.
(string) --
(string) --
SkewedInfo (dict) --
The information about values that appear frequently in a column (skewed values).
SkewedColumnNames (list) --
A list of names of columns that contain skewed values.
(string) --
SkewedColumnValues (list) --
A list of values that appear so frequently as to be considered skewed.
(string) --
SkewedColumnValueLocationMaps (dict) --
A mapping of skewed values to the columns that contain them.
(string) --
(string) --
StoredAsSubDirectories (boolean) --
True if the table data is stored in subdirectories, or False if not.
SchemaReference (dict) --
An object that references a schema stored in the AWS Glue Schema Registry.
When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.
SchemaId (dict) --
A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
SchemaVersionId (string) --
The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.
SchemaVersionNumber (integer) --
The version number of the schema.
PartitionKeys (list) --
A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.
When you create a table for use with Amazon Athena and do not specify any partition keys, you must still set PartitionKeys to an empty list. For example:
"PartitionKeys": []
(dict) --
A column in a Table.
Name (string) --
The name of the Column.
Type (string) --
The data type of the Column.
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
ViewOriginalText (string) --
If the table is a view, the original text of the view; otherwise null.
ViewExpandedText (string) --
If the table is a view, the expanded text of the view; otherwise null.
TableType (string) --
The type of this table (EXTERNAL_TABLE, VIRTUAL_VIEW, etc.).
Parameters (dict) --
These key-value pairs define properties associated with the table.
(string) --
(string) --
CreatedBy (string) --
The person or entity who created the table.
IsRegisteredWithLakeFormation (boolean) --
Indicates whether the table has been registered with AWS Lake Formation.
TargetTable (dict) --
A TableIdentifier structure that describes a target table for resource linking.
CatalogId (string) --
The ID of the Data Catalog in which the table resides.
DatabaseName (string) --
The name of the catalog database that contains the target table.
Name (string) --
The name of the target table.
CatalogId (string) --
The ID of the Data Catalog in which the table resides.
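When a table is a resource link, the TargetTable structure above names the table it points to. The following sketch follows one such hop; it assumes that a resource link is returned with TargetTable populated and that ordinary tables omit it. `resolve_table`, `FakeGlue`, and every database and table name are hypothetical.

```python
def resolve_table(client, database_name, name, catalog_id=None):
    """Fetch a table and, if it carries a TargetTable, fetch that target instead."""
    request = {"DatabaseName": database_name, "Name": name}
    if catalog_id:
        request["CatalogId"] = catalog_id
    table = client.get_table(**request)["Table"]

    target = table.get("TargetTable")
    if not target:
        return table  # not a resource link under this sketch's assumption

    target_request = {"DatabaseName": target["DatabaseName"], "Name": target["Name"]}
    if target.get("CatalogId"):
        target_request["CatalogId"] = target["CatalogId"]
    return client.get_table(**target_request)["Table"]


class FakeGlue:
    """Hypothetical stand-in for a Glue client with one link and one real table."""

    tables = {
        ("links", "orders_link"): {
            "Name": "orders_link",
            "TargetTable": {"DatabaseName": "sales", "Name": "orders"},
        },
        ("sales", "orders"): {"Name": "orders", "DatabaseName": "sales"},
    }

    def get_table(self, **request):
        return {"Table": self.tables[(request["DatabaseName"], request["Name"])]}


resolved = resolve_table(FakeGlue(), "links", "orders_link")
```

Only a single hop is followed here; chains of links, permissions, and error handling are deliberately out of scope.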
{'TableVersion': {'Table': {'StorageDescriptor': {'SchemaReference': {'SchemaId': {'RegistryName': 'string', 'SchemaArn': 'string', 'SchemaName': 'string'}, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 'long'}}}}}
Retrieves a specified version of a table.
See also: AWS API Documentation
Request Syntax
client.get_table_version( CatalogId='string', DatabaseName='string', TableName='string', VersionId='string' )
CatalogId (string) --
The ID of the Data Catalog where the table resides. If none is provided, the AWS account ID is used by default.
DatabaseName (string) --
[REQUIRED]
The database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.
TableName (string) --
[REQUIRED]
The name of the table. For Hive compatibility, this name is entirely lowercase.
VersionId (string) --
The ID value of the table version to be retrieved. A VersionId is a string representation of an integer. Each version is incremented by 1.
dict
Response Syntax
{ 'TableVersion': { 'Table': { 'Name': 'string', 'DatabaseName': 'string', 'Description': 'string', 'Owner': 'string', 'CreateTime': datetime(2015, 1, 1), 'UpdateTime': datetime(2015, 1, 1), 'LastAccessTime': datetime(2015, 1, 1), 'LastAnalyzedTime': datetime(2015, 1, 1), 'Retention': 123, 'StorageDescriptor': { 'Columns': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'Location': 'string', 'InputFormat': 'string', 'OutputFormat': 'string', 'Compressed': True|False, 'NumberOfBuckets': 123, 'SerdeInfo': { 'Name': 'string', 'SerializationLibrary': 'string', 'Parameters': { 'string': 'string' } }, 'BucketColumns': [ 'string', ], 'SortColumns': [ { 'Column': 'string', 'SortOrder': 123 }, ], 'Parameters': { 'string': 'string' }, 'SkewedInfo': { 'SkewedColumnNames': [ 'string', ], 'SkewedColumnValues': [ 'string', ], 'SkewedColumnValueLocationMaps': { 'string': 'string' } }, 'StoredAsSubDirectories': True|False, 'SchemaReference': { 'SchemaId': { 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 123 } }, 'PartitionKeys': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'ViewOriginalText': 'string', 'ViewExpandedText': 'string', 'TableType': 'string', 'Parameters': { 'string': 'string' }, 'CreatedBy': 'string', 'IsRegisteredWithLakeFormation': True|False, 'TargetTable': { 'CatalogId': 'string', 'DatabaseName': 'string', 'Name': 'string' }, 'CatalogId': 'string' }, 'VersionId': 'string' } }
Response Structure
(dict) --
TableVersion (dict) --
The requested table version.
Table (dict) --
The table in question.
Name (string) --
The table name. For Hive compatibility, this must be entirely lowercase.
DatabaseName (string) --
The name of the database where the table metadata resides. For Hive compatibility, this must be all lowercase.
Description (string) --
A description of the table.
Owner (string) --
The owner of the table.
CreateTime (datetime) --
The time when the table definition was created in the Data Catalog.
UpdateTime (datetime) --
The last time that the table was updated.
LastAccessTime (datetime) --
The last time that the table was accessed. This is usually taken from HDFS, and might not be reliable.
LastAnalyzedTime (datetime) --
The last time that column statistics were computed for this table.
Retention (integer) --
The retention time for this table.
StorageDescriptor (dict) --
A storage descriptor containing information about the physical storage of this table.
Columns (list) --
A list of the Columns in the table.
(dict) --
A column in a Table.
Name (string) --
The name of the Column.
Type (string) --
The data type of the Column.
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
Location (string) --
The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.
InputFormat (string) --
The input format: SequenceFileInputFormat (binary), TextInputFormat, or a custom format.
OutputFormat (string) --
The output format: SequenceFileOutputFormat (binary), IgnoreKeyTextOutputFormat, or a custom format.
Compressed (boolean) --
True if the data in the table is compressed, or False if not.
NumberOfBuckets (integer) --
The number of buckets. Must be specified if the table contains any dimension columns.
SerdeInfo (dict) --
The serialization/deserialization (SerDe) information.
Name (string) --
Name of the SerDe.
SerializationLibrary (string) --
Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.
Parameters (dict) --
These key-value pairs define initialization parameters for the SerDe.
(string) --
(string) --
BucketColumns (list) --
A list of reducer grouping columns, clustering columns, and bucketing columns in the table.
(string) --
SortColumns (list) --
A list specifying the sort order of each bucket in the table.
(dict) --
Specifies the sort order of a sorted column.
Column (string) --
The name of the column.
SortOrder (integer) --
Indicates whether the column is sorted in ascending order (1) or in descending order (0).
Parameters (dict) --
The user-supplied properties in key-value form.
(string) --
(string) --
SkewedInfo (dict) --
The information about values that appear frequently in a column (skewed values).
SkewedColumnNames (list) --
A list of names of columns that contain skewed values.
(string) --
SkewedColumnValues (list) --
A list of values that appear so frequently as to be considered skewed.
(string) --
SkewedColumnValueLocationMaps (dict) --
A mapping of skewed values to the columns that contain them.
(string) --
(string) --
StoredAsSubDirectories (boolean) --
True if the table data is stored in subdirectories, or False if not.
SchemaReference (dict) --
An object that references a schema stored in the AWS Glue Schema Registry.
When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.
SchemaId (dict) --
A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
SchemaVersionId (string) --
The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.
SchemaVersionNumber (integer) --
The version number of the schema.
PartitionKeys (list) --
A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.
When you create a table for use with Amazon Athena and the table has no partition keys, you must still set the value of PartitionKeys to an empty list. For example:
"PartitionKeys": []
(dict) --
A column in a Table .
Name (string) --
The name of the Column .
Type (string) --
The data type of the Column .
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
ViewOriginalText (string) --
If the table is a view, the original text of the view; otherwise null .
ViewExpandedText (string) --
If the table is a view, the expanded text of the view; otherwise null .
TableType (string) --
The type of this table (EXTERNAL_TABLE, VIRTUAL_VIEW, etc.).
Parameters (dict) --
These key-value pairs define properties associated with the table.
(string) --
(string) --
CreatedBy (string) --
The person or entity who created the table.
IsRegisteredWithLakeFormation (boolean) --
Indicates whether the table has been registered with AWS Lake Formation.
TargetTable (dict) --
A TableIdentifier structure that describes a target table for resource linking.
CatalogId (string) --
The ID of the Data Catalog in which the table resides.
DatabaseName (string) --
The name of the catalog database that contains the target table.
Name (string) --
The name of the target table.
CatalogId (string) --
The ID of the Data Catalog in which the table resides.
VersionId (string) --
The ID value that identifies this table version. A VersionId is a string representation of an integer. Each version is incremented by 1.
{'TableVersions': {'Table': {'StorageDescriptor': {'SchemaReference': {'SchemaId': {'RegistryName': 'string', 'SchemaArn': 'string', 'SchemaName': 'string'}, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 'long'}}}}}
Retrieves a list of strings that identify available versions of a specified table.
See also: AWS API Documentation
Request Syntax
client.get_table_versions( CatalogId='string', DatabaseName='string', TableName='string', NextToken='string', MaxResults=123 )
string
The ID of the Data Catalog where the tables reside. If none is provided, the AWS account ID is used by default.
string
[REQUIRED]
The database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.
string
[REQUIRED]
The name of the table. For Hive compatibility, this name is entirely lowercase.
string
A continuation token, if this is not the first call.
integer
The maximum number of table versions to return in one response.
dict
Response Syntax
{ 'TableVersions': [ { 'Table': { 'Name': 'string', 'DatabaseName': 'string', 'Description': 'string', 'Owner': 'string', 'CreateTime': datetime(2015, 1, 1), 'UpdateTime': datetime(2015, 1, 1), 'LastAccessTime': datetime(2015, 1, 1), 'LastAnalyzedTime': datetime(2015, 1, 1), 'Retention': 123, 'StorageDescriptor': { 'Columns': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'Location': 'string', 'InputFormat': 'string', 'OutputFormat': 'string', 'Compressed': True|False, 'NumberOfBuckets': 123, 'SerdeInfo': { 'Name': 'string', 'SerializationLibrary': 'string', 'Parameters': { 'string': 'string' } }, 'BucketColumns': [ 'string', ], 'SortColumns': [ { 'Column': 'string', 'SortOrder': 123 }, ], 'Parameters': { 'string': 'string' }, 'SkewedInfo': { 'SkewedColumnNames': [ 'string', ], 'SkewedColumnValues': [ 'string', ], 'SkewedColumnValueLocationMaps': { 'string': 'string' } }, 'StoredAsSubDirectories': True|False, 'SchemaReference': { 'SchemaId': { 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 123 } }, 'PartitionKeys': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'ViewOriginalText': 'string', 'ViewExpandedText': 'string', 'TableType': 'string', 'Parameters': { 'string': 'string' }, 'CreatedBy': 'string', 'IsRegisteredWithLakeFormation': True|False, 'TargetTable': { 'CatalogId': 'string', 'DatabaseName': 'string', 'Name': 'string' }, 'CatalogId': 'string' }, 'VersionId': 'string' }, ], 'NextToken': 'string' }
Response Structure
(dict) --
TableVersions (list) --
A list of the available versions of the specified table. Each entry is a structure that holds the table definition and its VersionId.
(dict) --
Specifies a version of a table.
Table (dict) --
The table in question.
Name (string) --
The table name. For Hive compatibility, this must be entirely lowercase.
DatabaseName (string) --
The name of the database where the table metadata resides. For Hive compatibility, this must be all lowercase.
Description (string) --
A description of the table.
Owner (string) --
The owner of the table.
CreateTime (datetime) --
The time when the table definition was created in the Data Catalog.
UpdateTime (datetime) --
The last time that the table was updated.
LastAccessTime (datetime) --
The last time that the table was accessed. This is usually taken from HDFS, and might not be reliable.
LastAnalyzedTime (datetime) --
The last time that column statistics were computed for this table.
Retention (integer) --
The retention time for this table.
StorageDescriptor (dict) --
A storage descriptor containing information about the physical storage of this table.
Columns (list) --
A list of the Columns in the table.
(dict) --
A column in a Table .
Name (string) --
The name of the Column .
Type (string) --
The data type of the Column .
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
Location (string) --
The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.
InputFormat (string) --
The input format: SequenceFileInputFormat (binary), or TextInputFormat , or a custom format.
OutputFormat (string) --
The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat , or a custom format.
Compressed (boolean) --
True if the data in the table is compressed, or False if not.
NumberOfBuckets (integer) --
Must be specified if the table contains any dimension columns.
SerdeInfo (dict) --
The serialization/deserialization (SerDe) information.
Name (string) --
Name of the SerDe.
SerializationLibrary (string) --
Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe .
Parameters (dict) --
These key-value pairs define initialization parameters for the SerDe.
(string) --
(string) --
BucketColumns (list) --
A list of reducer grouping columns, clustering columns, and bucketing columns in the table.
(string) --
SortColumns (list) --
A list specifying the sort order of each bucket in the table.
(dict) --
Specifies the sort order of a sorted column.
Column (string) --
The name of the column.
SortOrder (integer) --
Indicates whether the column is sorted in ascending order (== 1) or in descending order (== 0).
Parameters (dict) --
The user-supplied properties in key-value form.
(string) --
(string) --
SkewedInfo (dict) --
The information about values that appear frequently in a column (skewed values).
SkewedColumnNames (list) --
A list of names of columns that contain skewed values.
(string) --
SkewedColumnValues (list) --
A list of values that appear so frequently as to be considered skewed.
(string) --
SkewedColumnValueLocationMaps (dict) --
A mapping of skewed values to the columns that contain them.
(string) --
(string) --
StoredAsSubDirectories (boolean) --
True if the table data is stored in subdirectories, or False if not.
SchemaReference (dict) --
An object that references a schema stored in the AWS Glue Schema Registry.
When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.
SchemaId (dict) --
A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
SchemaVersionId (string) --
The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.
SchemaVersionNumber (integer) --
The version number of the schema.
PartitionKeys (list) --
A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.
When you create a table for use with Amazon Athena and the table has no partition keys, you must still set the value of PartitionKeys to an empty list. For example:
"PartitionKeys": []
(dict) --
A column in a Table .
Name (string) --
The name of the Column .
Type (string) --
The data type of the Column .
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
ViewOriginalText (string) --
If the table is a view, the original text of the view; otherwise null .
ViewExpandedText (string) --
If the table is a view, the expanded text of the view; otherwise null .
TableType (string) --
The type of this table (EXTERNAL_TABLE, VIRTUAL_VIEW, etc.).
Parameters (dict) --
These key-value pairs define properties associated with the table.
(string) --
(string) --
CreatedBy (string) --
The person or entity who created the table.
IsRegisteredWithLakeFormation (boolean) --
Indicates whether the table has been registered with AWS Lake Formation.
TargetTable (dict) --
A TableIdentifier structure that describes a target table for resource linking.
CatalogId (string) --
The ID of the Data Catalog in which the table resides.
DatabaseName (string) --
The name of the catalog database that contains the target table.
Name (string) --
The name of the target table.
CatalogId (string) --
The ID of the Data Catalog in which the table resides.
VersionId (string) --
The ID value that identifies this table version. A VersionId is a string representation of an integer. Each version is incremented by 1.
NextToken (string) --
A continuation token, if the list of available versions does not include the last one.
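The NextToken and VersionId behavior described above can be sketched as follows. This is a minimal illustration, not the library's own helper: the function names iter_table_versions and latest_table_version and the page_size default are assumptions, and client is expected to be a Glue client such as the one returned by boto3.client('glue'). Because VersionId is a string representation of an integer, the sketch compares versions numerically rather than as strings, and it assumes the highest number is the most recent version.

```python
def iter_table_versions(client, database_name, table_name, page_size=50):
    """Yield every TableVersion dict, following NextToken until it is absent."""
    kwargs = {
        "DatabaseName": database_name,
        "TableName": table_name,
        "MaxResults": page_size,
    }
    while True:
        response = client.get_table_versions(**kwargs)
        yield from response.get("TableVersions", [])
        token = response.get("NextToken")
        if not token:
            break
        # Pass the token back unchanged to request the next page.
        kwargs["NextToken"] = token


def latest_table_version(client, database_name, table_name):
    """Return the version with the numerically largest VersionId, or None."""
    versions = iter_table_versions(client, database_name, table_name)
    # int() matters here: as strings, "9" would sort after "10".
    return max(versions, key=lambda v: int(v["VersionId"]), default=None)
```

A call such as latest_table_version(boto3.client('glue'), 'mydb', 'mytable') would return one entry of TableVersions, where the database and table names are placeholders.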
{'TableList': {'StorageDescriptor': {'SchemaReference': {'SchemaId': {'RegistryName': 'string', 'SchemaArn': 'string', 'SchemaName': 'string'}, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 'long'}}}}
Retrieves the definitions of some or all of the tables in a given Database.
See also: AWS API Documentation
Request Syntax
client.get_tables( CatalogId='string', DatabaseName='string', Expression='string', NextToken='string', MaxResults=123 )
string
The ID of the Data Catalog where the tables reside. If none is provided, the AWS account ID is used by default.
string
[REQUIRED]
The database in the catalog whose tables to list. For Hive compatibility, this name is entirely lowercase.
string
A regular expression pattern. If present, only those tables whose names match the pattern are returned.
string
A continuation token, included if this is a continuation call.
integer
The maximum number of tables to return in a single response.
dict
Response Syntax
{ 'TableList': [ { 'Name': 'string', 'DatabaseName': 'string', 'Description': 'string', 'Owner': 'string', 'CreateTime': datetime(2015, 1, 1), 'UpdateTime': datetime(2015, 1, 1), 'LastAccessTime': datetime(2015, 1, 1), 'LastAnalyzedTime': datetime(2015, 1, 1), 'Retention': 123, 'StorageDescriptor': { 'Columns': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'Location': 'string', 'InputFormat': 'string', 'OutputFormat': 'string', 'Compressed': True|False, 'NumberOfBuckets': 123, 'SerdeInfo': { 'Name': 'string', 'SerializationLibrary': 'string', 'Parameters': { 'string': 'string' } }, 'BucketColumns': [ 'string', ], 'SortColumns': [ { 'Column': 'string', 'SortOrder': 123 }, ], 'Parameters': { 'string': 'string' }, 'SkewedInfo': { 'SkewedColumnNames': [ 'string', ], 'SkewedColumnValues': [ 'string', ], 'SkewedColumnValueLocationMaps': { 'string': 'string' } }, 'StoredAsSubDirectories': True|False, 'SchemaReference': { 'SchemaId': { 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 123 } }, 'PartitionKeys': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'ViewOriginalText': 'string', 'ViewExpandedText': 'string', 'TableType': 'string', 'Parameters': { 'string': 'string' }, 'CreatedBy': 'string', 'IsRegisteredWithLakeFormation': True|False, 'TargetTable': { 'CatalogId': 'string', 'DatabaseName': 'string', 'Name': 'string' }, 'CatalogId': 'string' }, ], 'NextToken': 'string' }
Response Structure
(dict) --
TableList (list) --
A list of the requested Table objects.
(dict) --
Represents a collection of related data organized in columns and rows.
Name (string) --
The table name. For Hive compatibility, this must be entirely lowercase.
DatabaseName (string) --
The name of the database where the table metadata resides. For Hive compatibility, this must be all lowercase.
Description (string) --
A description of the table.
Owner (string) --
The owner of the table.
CreateTime (datetime) --
The time when the table definition was created in the Data Catalog.
UpdateTime (datetime) --
The last time that the table was updated.
LastAccessTime (datetime) --
The last time that the table was accessed. This is usually taken from HDFS, and might not be reliable.
LastAnalyzedTime (datetime) --
The last time that column statistics were computed for this table.
Retention (integer) --
The retention time for this table.
StorageDescriptor (dict) --
A storage descriptor containing information about the physical storage of this table.
Columns (list) --
A list of the Columns in the table.
(dict) --
A column in a Table .
Name (string) --
The name of the Column .
Type (string) --
The data type of the Column .
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
Location (string) --
The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.
InputFormat (string) --
The input format: SequenceFileInputFormat (binary), or TextInputFormat , or a custom format.
OutputFormat (string) --
The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat , or a custom format.
Compressed (boolean) --
True if the data in the table is compressed, or False if not.
NumberOfBuckets (integer) --
Must be specified if the table contains any dimension columns.
SerdeInfo (dict) --
The serialization/deserialization (SerDe) information.
Name (string) --
Name of the SerDe.
SerializationLibrary (string) --
Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe .
Parameters (dict) --
These key-value pairs define initialization parameters for the SerDe.
(string) --
(string) --
BucketColumns (list) --
A list of reducer grouping columns, clustering columns, and bucketing columns in the table.
(string) --
SortColumns (list) --
A list specifying the sort order of each bucket in the table.
(dict) --
Specifies the sort order of a sorted column.
Column (string) --
The name of the column.
SortOrder (integer) --
Indicates whether the column is sorted in ascending order (== 1) or in descending order (== 0).
Parameters (dict) --
The user-supplied properties in key-value form.
(string) --
(string) --
SkewedInfo (dict) --
The information about values that appear frequently in a column (skewed values).
SkewedColumnNames (list) --
A list of names of columns that contain skewed values.
(string) --
SkewedColumnValues (list) --
A list of values that appear so frequently as to be considered skewed.
(string) --
SkewedColumnValueLocationMaps (dict) --
A mapping of skewed values to the columns that contain them.
(string) --
(string) --
StoredAsSubDirectories (boolean) --
True if the table data is stored in subdirectories, or False if not.
SchemaReference (dict) --
An object that references a schema stored in the AWS Glue Schema Registry.
When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.
SchemaId (dict) --
A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
SchemaVersionId (string) --
The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.
SchemaVersionNumber (integer) --
The version number of the schema.
PartitionKeys (list) --
A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.
When you create a table for use with Amazon Athena and the table has no partition keys, you must still set the value of PartitionKeys to an empty list. For example:
"PartitionKeys": []
(dict) --
A column in a Table .
Name (string) --
The name of the Column .
Type (string) --
The data type of the Column .
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
ViewOriginalText (string) --
If the table is a view, the original text of the view; otherwise null .
ViewExpandedText (string) --
If the table is a view, the expanded text of the view; otherwise null .
TableType (string) --
The type of this table (EXTERNAL_TABLE, VIRTUAL_VIEW, etc.).
Parameters (dict) --
These key-value pairs define properties associated with the table.
(string) --
(string) --
CreatedBy (string) --
The person or entity who created the table.
IsRegisteredWithLakeFormation (boolean) --
Indicates whether the table has been registered with AWS Lake Formation.
TargetTable (dict) --
A TableIdentifier structure that describes a target table for resource linking.
CatalogId (string) --
The ID of the Data Catalog in which the table resides.
DatabaseName (string) --
The name of the catalog database that contains the target table.
Name (string) --
The name of the target table.
CatalogId (string) --
The ID of the Data Catalog in which the table resides.
NextToken (string) --
A continuation token, present if the current list segment is not the last.
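Since this release adds SchemaReference to each table's StorageDescriptor, a caller may want to see which returned tables rely on the Schema Registry rather than on inline columns. The following is a small sketch under that assumption; the function name schema_reference_summary and the shape of its return value are illustrative only, and it reads nothing beyond the fields listed in the response structure above.

```python
def schema_reference_summary(table_list):
    """Map table name -> how its SchemaReference identifies the schema.

    Tables without a SchemaReference (columns defined inline) are skipped.
    """
    summary = {}
    for table in table_list:
        storage = table.get("StorageDescriptor") or {}
        ref = storage.get("SchemaReference")
        if not ref:
            continue
        # Either SchemaVersionId or SchemaId is expected to be present.
        if ref.get("SchemaVersionId"):
            identified_by = "SchemaVersionId"
        elif ref.get("SchemaId"):
            identified_by = "SchemaId"
        else:
            identified_by = None
        summary[table["Name"]] = {
            "identified_by": identified_by,
            "version_number": ref.get("SchemaVersionNumber"),
        }
    return summary
```

The input would typically be the TableList value of a get_tables response, for example schema_reference_summary(response['TableList']).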
{'TableList': {'StorageDescriptor': {'SchemaReference': {'SchemaId': {'RegistryName': 'string', 'SchemaArn': 'string', 'SchemaName': 'string'}, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 'long'}}}}
Searches a set of tables based on properties in the table metadata as well as on the parent database. You can search against text or filter conditions.
You can only get tables that you have access to based on the security policies defined in Lake Formation. You need at least read-only access to a table for it to be returned. If you do not have access to all the columns in the table, the columns you cannot access are not searched when the list of tables is returned. If you have access to the columns but not the data in the columns, those columns and their associated metadata are included in the search.
See also: AWS API Documentation
Request Syntax
client.search_tables( CatalogId='string', NextToken='string', Filters=[ { 'Key': 'string', 'Value': 'string', 'Comparator': 'EQUALS'|'GREATER_THAN'|'LESS_THAN'|'GREATER_THAN_EQUALS'|'LESS_THAN_EQUALS' }, ], SearchText='string', SortCriteria=[ { 'FieldName': 'string', 'Sort': 'ASC'|'DESC' }, ], MaxResults=123, ResourceShareType='FOREIGN'|'ALL' )
string
A unique identifier, consisting of account_id.
string
A continuation token, included if this is a continuation call.
list
A list of key-value pairs, and a comparator used to filter the search results. Returns all entities matching the predicate.
The Comparator member of the PropertyPredicate struct is used only for time fields and can be omitted for other field types. When comparing string values, such as when Key=Name, a fuzzy match algorithm is used. The value of the Key field (for example, the value of the Name field) is split into tokens on certain punctuation characters, such as -, :, and #. Each token is then exact-match compared with the Value member of PropertyPredicate. For example, if Key=Name and Value=link, tables named customer-link and xx-link-yy are returned, but xxlinkyy is not.
(dict) --
Defines a property predicate.
Key (string) --
The key of the property.
Value (string) --
The value of the property.
Comparator (string) --
The comparator used to compare this property to others.
string
A string used for a text search.
Specifying a value in quotes filters based on an exact match to the value.
list
A list of criteria for sorting the results by a field name, in an ascending or descending order.
(dict) --
Specifies a field to sort by and a sort order.
FieldName (string) --
The name of the field on which to sort.
Sort (string) --
An ascending or descending sort.
integer
The maximum number of tables to return in a single response.
string
Specifies whether to search the tables shared with your account. The allowable values are FOREIGN or ALL.
If set to FOREIGN, searches the tables shared with your account.
If set to ALL, searches the tables shared with your account as well as the tables in your local account.
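The token matching described for the Filters parameter can be approximated locally to predict which names a filter would return. This is only a sketch of the documented behavior, not the service's implementation: the functions name_tokens and matches_value are hypothetical, and the separator set is limited to the three characters the description names (-, :, #), whereas the service splits on further punctuation that is not listed here.

```python
import re

# Assumption: only the separators named in the description are used here.
_SEPARATORS = re.compile(r"[-:#]+")


def name_tokens(name):
    """Split a field value into tokens on the documented punctuation."""
    return [token for token in _SEPARATORS.split(name) if token]


def matches_value(name, value):
    """Return True if any token of name is exactly equal to value."""
    return value in name_tokens(name)
```

With Value=link, matches_value accepts customer-link and xx-link-yy and rejects xxlinkyy, which agrees with the example given for Key=Name.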
dict
Response Syntax
{ 'NextToken': 'string', 'TableList': [ { 'Name': 'string', 'DatabaseName': 'string', 'Description': 'string', 'Owner': 'string', 'CreateTime': datetime(2015, 1, 1), 'UpdateTime': datetime(2015, 1, 1), 'LastAccessTime': datetime(2015, 1, 1), 'LastAnalyzedTime': datetime(2015, 1, 1), 'Retention': 123, 'StorageDescriptor': { 'Columns': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'Location': 'string', 'InputFormat': 'string', 'OutputFormat': 'string', 'Compressed': True|False, 'NumberOfBuckets': 123, 'SerdeInfo': { 'Name': 'string', 'SerializationLibrary': 'string', 'Parameters': { 'string': 'string' } }, 'BucketColumns': [ 'string', ], 'SortColumns': [ { 'Column': 'string', 'SortOrder': 123 }, ], 'Parameters': { 'string': 'string' }, 'SkewedInfo': { 'SkewedColumnNames': [ 'string', ], 'SkewedColumnValues': [ 'string', ], 'SkewedColumnValueLocationMaps': { 'string': 'string' } }, 'StoredAsSubDirectories': True|False, 'SchemaReference': { 'SchemaId': { 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 123 } }, 'PartitionKeys': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'ViewOriginalText': 'string', 'ViewExpandedText': 'string', 'TableType': 'string', 'Parameters': { 'string': 'string' }, 'CreatedBy': 'string', 'IsRegisteredWithLakeFormation': True|False, 'TargetTable': { 'CatalogId': 'string', 'DatabaseName': 'string', 'Name': 'string' }, 'CatalogId': 'string' }, ] }
Response Structure
(dict) --
NextToken (string) --
A continuation token, present if the current list segment is not the last.
TableList (list) --
A list of the requested Table objects. The SearchTables response returns only the tables that you have access to.
(dict) --
Represents a collection of related data organized in columns and rows.
Name (string) --
The table name. For Hive compatibility, this must be entirely lowercase.
DatabaseName (string) --
The name of the database where the table metadata resides. For Hive compatibility, this must be all lowercase.
Description (string) --
A description of the table.
Owner (string) --
The owner of the table.
CreateTime (datetime) --
The time when the table definition was created in the Data Catalog.
UpdateTime (datetime) --
The last time that the table was updated.
LastAccessTime (datetime) --
The last time that the table was accessed. This is usually taken from HDFS, and might not be reliable.
LastAnalyzedTime (datetime) --
The last time that column statistics were computed for this table.
Retention (integer) --
The retention time for this table.
StorageDescriptor (dict) --
A storage descriptor containing information about the physical storage of this table.
Columns (list) --
A list of the Columns in the table.
(dict) --
A column in a Table .
Name (string) --
The name of the Column .
Type (string) --
The data type of the Column .
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
Location (string) --
The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.
InputFormat (string) --
The input format: SequenceFileInputFormat (binary), or TextInputFormat , or a custom format.
OutputFormat (string) --
The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat , or a custom format.
Compressed (boolean) --
True if the data in the table is compressed, or False if not.
NumberOfBuckets (integer) --
Must be specified if the table contains any dimension columns.
SerdeInfo (dict) --
The serialization/deserialization (SerDe) information.
Name (string) --
Name of the SerDe.
SerializationLibrary (string) --
Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe .
Parameters (dict) --
These key-value pairs define initialization parameters for the SerDe.
(string) --
(string) --
BucketColumns (list) --
A list of reducer grouping columns, clustering columns, and bucketing columns in the table.
(string) --
SortColumns (list) --
A list specifying the sort order of each bucket in the table.
(dict) --
Specifies the sort order of a sorted column.
Column (string) --
The name of the column.
SortOrder (integer) --
Indicates whether the column is sorted in ascending order (== 1) or in descending order (== 0).
Parameters (dict) --
The user-supplied properties in key-value form.
(string) --
(string) --
SkewedInfo (dict) --
The information about values that appear frequently in a column (skewed values).
SkewedColumnNames (list) --
A list of names of columns that contain skewed values.
(string) --
SkewedColumnValues (list) --
A list of values that appear so frequently as to be considered skewed.
(string) --
SkewedColumnValueLocationMaps (dict) --
A mapping of skewed values to the columns that contain them.
(string) --
(string) --
StoredAsSubDirectories (boolean) --
True if the table data is stored in subdirectories, or False if not.
SchemaReference (dict) --
An object that references a schema stored in the AWS Glue Schema Registry.
When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.
SchemaId (dict) --
A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
SchemaVersionId (string) --
The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.
SchemaVersionNumber (integer) --
The version number of the schema.
PartitionKeys (list) --
A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.
When you create a table for use with Amazon Athena and the table has no partition keys, you must still set the value of PartitionKeys to an empty list. For example:
"PartitionKeys": []
(dict) --
A column in a Table .
Name (string) --
The name of the Column .
Type (string) --
The data type of the Column .
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
ViewOriginalText (string) --
If the table is a view, the original text of the view; otherwise null .
ViewExpandedText (string) --
If the table is a view, the expanded text of the view; otherwise null .
TableType (string) --
The type of this table (EXTERNAL_TABLE, VIRTUAL_VIEW, etc.).
Parameters (dict) --
These key-value pairs define properties associated with the table.
(string) --
(string) --
CreatedBy (string) --
The person or entity who created the table.
IsRegisteredWithLakeFormation (boolean) --
Indicates whether the table has been registered with AWS Lake Formation.
TargetTable (dict) --
A TableIdentifier structure that describes a target table for resource linking.
CatalogId (string) --
The ID of the Data Catalog in which the table resides.
DatabaseName (string) --
The name of the catalog database that contains the target table.
Name (string) --
The name of the target table.
CatalogId (string) --
The ID of the Data Catalog in which the table resides.
{'PartitionInput': {'StorageDescriptor': {'SchemaReference': {'SchemaId': {'RegistryName': 'string', 'SchemaArn': 'string', 'SchemaName': 'string'}, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 'long'}}}}
Updates a partition.
See also: AWS API Documentation
Request Syntax
client.update_partition( CatalogId='string', DatabaseName='string', TableName='string', PartitionValueList=[ 'string', ], PartitionInput={ 'Values': [ 'string', ], 'LastAccessTime': datetime(2015, 1, 1), 'StorageDescriptor': { 'Columns': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'Location': 'string', 'InputFormat': 'string', 'OutputFormat': 'string', 'Compressed': True|False, 'NumberOfBuckets': 123, 'SerdeInfo': { 'Name': 'string', 'SerializationLibrary': 'string', 'Parameters': { 'string': 'string' } }, 'BucketColumns': [ 'string', ], 'SortColumns': [ { 'Column': 'string', 'SortOrder': 123 }, ], 'Parameters': { 'string': 'string' }, 'SkewedInfo': { 'SkewedColumnNames': [ 'string', ], 'SkewedColumnValues': [ 'string', ], 'SkewedColumnValueLocationMaps': { 'string': 'string' } }, 'StoredAsSubDirectories': True|False, 'SchemaReference': { 'SchemaId': { 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 123 } }, 'Parameters': { 'string': 'string' }, 'LastAnalyzedTime': datetime(2015, 1, 1) } )
string
The ID of the Data Catalog where the partition to be updated resides. If none is provided, the AWS account ID is used by default.
string
[REQUIRED]
The name of the catalog database in which the table in question resides.
string
[REQUIRED]
The name of the table in which the partition to be updated is located.
list
[REQUIRED]
List of partition key values that define the partition to update.
(string) --
dict
[REQUIRED]
The new partition object to update the partition to.
The Values property can't be changed. If you want to change the partition key values for a partition, delete and recreate the partition.
Values (list) --
The values of the partition. Although this parameter is not required by the SDK, you must specify this parameter for a valid input.
The values for the keys for the new partition must be passed as an array of String objects that must be ordered in the same order as the partition keys appearing in the Amazon S3 prefix. Otherwise AWS Glue will add the values to the wrong keys.
(string) --
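Because the order of Values must follow the order of the partition keys in the Amazon S3 prefix, it can help to derive the list from the prefix itself. The sketch below assumes Hive-style key=value path segments, which the text above does not state; the function name partition_values_from_prefix and the sample prefix are illustrative only.

```python
def partition_values_from_prefix(prefix):
    """Return (keys, values) in the order they appear in an S3 prefix.

    Assumes Hive-style segments such as year=2020/month=11; segments
    without '=' (for example the table folder) are ignored.
    """
    keys, values = [], []
    for segment in prefix.strip("/").split("/"):
        key, separator, value = segment.partition("=")
        if separator:
            keys.append(key)
            values.append(value)
    return keys, values
```

The second element of the result can be passed as both PartitionValueList and PartitionInput['Values'], and the first element can be compared against the table's PartitionKeys names to confirm that the order agrees.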
LastAccessTime (datetime) --
The last time at which the partition was accessed.
StorageDescriptor (dict) --
Provides information about the physical location where the partition is stored.
Columns (list) --
A list of the Columns in the table.
(dict) --
A column in a Table.
Name (string) -- [REQUIRED]
The name of the Column.
Type (string) --
The data type of the Column.
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
Location (string) --
The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.
InputFormat (string) --
The input format: SequenceFileInputFormat (binary), or TextInputFormat, or a custom format.
OutputFormat (string) --
The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat, or a custom format.
Compressed (boolean) --
True if the data in the table is compressed, or False if not.
NumberOfBuckets (integer) --
Must be specified if the table contains any dimension columns.
SerdeInfo (dict) --
The serialization/deserialization (SerDe) information.
Name (string) --
Name of the SerDe.
SerializationLibrary (string) --
Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.
Parameters (dict) --
These key-value pairs define initialization parameters for the SerDe.
(string) --
(string) --
BucketColumns (list) --
A list of reducer grouping columns, clustering columns, and bucketing columns in the table.
(string) --
SortColumns (list) --
A list specifying the sort order of each bucket in the table.
(dict) --
Specifies the sort order of a sorted column.
Column (string) -- [REQUIRED]
The name of the column.
SortOrder (integer) -- [REQUIRED]
Indicates that the column is sorted in ascending order (== 1), or in descending order (== 0).
Parameters (dict) --
The user-supplied properties in key-value form.
(string) --
(string) --
SkewedInfo (dict) --
The information about values that appear frequently in a column (skewed values).
SkewedColumnNames (list) --
A list of names of columns that contain skewed values.
(string) --
SkewedColumnValues (list) --
A list of values that appear so frequently as to be considered skewed.
(string) --
SkewedColumnValueLocationMaps (dict) --
A mapping of skewed values to the columns that contain them.
(string) --
(string) --
StoredAsSubDirectories (boolean) --
True if the table data is stored in subdirectories, or False if not.
SchemaReference (dict) --
An object that references a schema stored in the AWS Glue Schema Registry.
When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.
SchemaId (dict) --
A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
SchemaVersionId (string) --
The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.
SchemaVersionNumber (integer) --
The version number of the schema.
Parameters (dict) --
These key-value pairs define partition parameters.
(string) --
(string) --
LastAnalyzedTime (datetime) --
The last time at which column statistics were computed for this partition.
dict
Response Syntax
{}
Response Structure
(dict) --
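Because PartitionInput is described as the new partition object and its Values cannot be changed, one way to prepare a call is to start from an existing partition definition, keep only the keys that PartitionInput accepts, and reuse the same values for PartitionValueList. The following is a minimal sketch under those assumptions, not a definitive implementation. The helper name build_update_partition_request and the sample database, table, and S3 locations are hypothetical and used for illustration only.

```python
import copy

# Keys accepted by PartitionInput, taken from the request syntax above.
PARTITION_INPUT_KEYS = (
    "Values",
    "LastAccessTime",
    "StorageDescriptor",
    "Parameters",
    "LastAnalyzedTime",
)


def build_update_partition_request(database_name, table_name, partition,
                                   new_location, catalog_id=None):
    """Build keyword arguments for client.update_partition (hypothetical helper).

    `partition` is a dict describing the current partition. Keys that are
    not part of PartitionInput are dropped, and the storage location is
    replaced with `new_location`. A KeyError is raised if "Values" is absent.
    """
    partition_input = {
        key: copy.deepcopy(partition[key])
        for key in PARTITION_INPUT_KEYS
        if key in partition
    }
    partition_input.setdefault("StorageDescriptor", {})["Location"] = new_location

    request = {
        "DatabaseName": database_name,
        "TableName": table_name,
        # Values cannot be changed, so the same list identifies the partition.
        "PartitionValueList": list(partition_input["Values"]),
        "PartitionInput": partition_input,
    }
    if catalog_id is not None:
        request["CatalogId"] = catalog_id
    return request


# Sample data (hypothetical names and locations).
existing = {
    "Values": ["2020", "11"],
    "CreationTime": "not part of PartitionInput",
    "StorageDescriptor": {"Location": "s3://example-bucket/old/"},
}
request = build_update_partition_request(
    "sales_db", "orders", existing, "s3://example-bucket/new/"
)
```

With a Glue client created elsewhere, the result could then be passed as client.update_partition(**request).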
{'TableInput': {'StorageDescriptor': {'SchemaReference': {'SchemaId': {'RegistryName': 'string', 'SchemaArn': 'string', 'SchemaName': 'string'}, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 'long'}}}}
Updates a metadata table in the Data Catalog.
See also: AWS API Documentation
Request Syntax
client.update_table( CatalogId='string', DatabaseName='string', TableInput={ 'Name': 'string', 'Description': 'string', 'Owner': 'string', 'LastAccessTime': datetime(2015, 1, 1), 'LastAnalyzedTime': datetime(2015, 1, 1), 'Retention': 123, 'StorageDescriptor': { 'Columns': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'Location': 'string', 'InputFormat': 'string', 'OutputFormat': 'string', 'Compressed': True|False, 'NumberOfBuckets': 123, 'SerdeInfo': { 'Name': 'string', 'SerializationLibrary': 'string', 'Parameters': { 'string': 'string' } }, 'BucketColumns': [ 'string', ], 'SortColumns': [ { 'Column': 'string', 'SortOrder': 123 }, ], 'Parameters': { 'string': 'string' }, 'SkewedInfo': { 'SkewedColumnNames': [ 'string', ], 'SkewedColumnValues': [ 'string', ], 'SkewedColumnValueLocationMaps': { 'string': 'string' } }, 'StoredAsSubDirectories': True|False, 'SchemaReference': { 'SchemaId': { 'SchemaArn': 'string', 'SchemaName': 'string', 'RegistryName': 'string' }, 'SchemaVersionId': 'string', 'SchemaVersionNumber': 123 } }, 'PartitionKeys': [ { 'Name': 'string', 'Type': 'string', 'Comment': 'string', 'Parameters': { 'string': 'string' } }, ], 'ViewOriginalText': 'string', 'ViewExpandedText': 'string', 'TableType': 'string', 'Parameters': { 'string': 'string' }, 'TargetTable': { 'CatalogId': 'string', 'DatabaseName': 'string', 'Name': 'string' } }, SkipArchive=True|False )
string
The ID of the Data Catalog where the table resides. If none is provided, the AWS account ID is used by default.
string
[REQUIRED]
The name of the catalog database in which the table resides. For Hive compatibility, this name is entirely lowercase.
dict
[REQUIRED]
An updated TableInput object to define the metadata table in the catalog.
Name (string) -- [REQUIRED]
The table name. For Hive compatibility, this is folded to lowercase when it is stored.
Description (string) --
A description of the table.
Owner (string) --
The table owner.
LastAccessTime (datetime) --
The last time that the table was accessed.
LastAnalyzedTime (datetime) --
The last time that column statistics were computed for this table.
Retention (integer) --
The retention time for this table.
StorageDescriptor (dict) --
A storage descriptor containing information about the physical storage of this table.
Columns (list) --
A list of the Columns in the table.
(dict) --
A column in a Table.
Name (string) -- [REQUIRED]
The name of the Column.
Type (string) --
The data type of the Column.
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
Location (string) --
The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.
InputFormat (string) --
The input format: SequenceFileInputFormat (binary), or TextInputFormat, or a custom format.
OutputFormat (string) --
The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat, or a custom format.
Compressed (boolean) --
True if the data in the table is compressed, or False if not.
NumberOfBuckets (integer) --
Must be specified if the table contains any dimension columns.
SerdeInfo (dict) --
The serialization/deserialization (SerDe) information.
Name (string) --
Name of the SerDe.
SerializationLibrary (string) --
Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.
Parameters (dict) --
These key-value pairs define initialization parameters for the SerDe.
(string) --
(string) --
BucketColumns (list) --
A list of reducer grouping columns, clustering columns, and bucketing columns in the table.
(string) --
SortColumns (list) --
A list specifying the sort order of each bucket in the table.
(dict) --
Specifies the sort order of a sorted column.
Column (string) -- [REQUIRED]
The name of the column.
SortOrder (integer) -- [REQUIRED]
Indicates that the column is sorted in ascending order (== 1), or in descending order (== 0).
Parameters (dict) --
The user-supplied properties in key-value form.
(string) --
(string) --
SkewedInfo (dict) --
The information about values that appear frequently in a column (skewed values).
SkewedColumnNames (list) --
A list of names of columns that contain skewed values.
(string) --
SkewedColumnValues (list) --
A list of values that appear so frequently as to be considered skewed.
(string) --
SkewedColumnValueLocationMaps (dict) --
A mapping of skewed values to the columns that contain them.
(string) --
(string) --
StoredAsSubDirectories (boolean) --
True if the table data is stored in subdirectories, or False if not.
SchemaReference (dict) --
An object that references a schema stored in the AWS Glue Schema Registry.
When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.
SchemaId (dict) --
A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.
SchemaArn (string) --
SchemaName (string) --
RegistryName (string) --
SchemaVersionId (string) --
The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.
SchemaVersionNumber (integer) --
The version number of the schema.
PartitionKeys (list) --
A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.
When you create a table used by Amazon Athena and you do not specify any PartitionKeys, you must at least set the value of PartitionKeys to an empty list. For example:
"PartitionKeys": []
(dict) --
A column in a Table.
Name (string) -- [REQUIRED]
The name of the Column.
Type (string) --
The data type of the Column.
Comment (string) --
A free-form text comment.
Parameters (dict) --
These key-value pairs define properties associated with the column.
(string) --
(string) --
ViewOriginalText (string) --
If the table is a view, the original text of the view; otherwise null.
ViewExpandedText (string) --
If the table is a view, the expanded text of the view; otherwise null.
TableType (string) --
The type of this table (EXTERNAL_TABLE, VIRTUAL_VIEW, etc.).
Parameters (dict) --
These key-value pairs define properties associated with the table.
(string) --
(string) --
TargetTable (dict) --
A TableIdentifier structure that describes a target table for resource linking.
CatalogId (string) --
The ID of the Data Catalog in which the table resides.
DatabaseName (string) --
The name of the catalog database that contains the target table.
Name (string) --
The name of the target table.
boolean
By default, UpdateTable always creates an archived version of the table before updating it. However, if SkipArchive is set to True, UpdateTable does not create the archived version.
dict
Response Syntax
{}
Response Structure
(dict) --
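The SchemaReference description above says that a table can pass an empty list of columns and point at a schema in the AWS Glue Schema Registry instead, identified by either SchemaId or SchemaVersionId. The following is a minimal sketch of assembling such a request. The helper name build_update_table_request and the sample database, table, schema, and registry names are hypothetical. The sketch only checks that at least one of the two identifiers is given, and it builds a bare TableInput; since TableInput is described as the updated definition, a real call would likely need to carry over the table's other existing fields as well.

```python
def build_update_table_request(database_name, table_name,
                               schema_version_id=None, schema_id=None,
                               schema_version_number=None,
                               skip_archive=False):
    """Build keyword arguments for client.update_table (hypothetical helper).

    The table's columns are left empty and a SchemaReference is used
    instead. `schema_id` is a dict with keys such as SchemaArn, or
    SchemaName and RegistryName.
    """
    if schema_version_id is None and schema_id is None:
        raise ValueError("provide schema_version_id or schema_id")

    schema_reference = {}
    if schema_version_id is not None:
        schema_reference["SchemaVersionId"] = schema_version_id
    if schema_id is not None:
        schema_reference["SchemaId"] = dict(schema_id)
    if schema_version_number is not None:
        schema_reference["SchemaVersionNumber"] = schema_version_number

    return {
        "DatabaseName": database_name,
        "TableInput": {
            "Name": table_name,
            "StorageDescriptor": {
                "Columns": [],  # empty list; the schema comes from the registry
                "SchemaReference": schema_reference,
            },
            # An explicit empty list, as recommended above for Athena tables.
            "PartitionKeys": [],
        },
        # True skips creating the archived version before the update.
        "SkipArchive": skip_archive,
    }


# Sample data (hypothetical names).
table_request = build_update_table_request(
    "sales_db",
    "orders",
    schema_id={"SchemaName": "orders-schema", "RegistryName": "sales-registry"},
    schema_version_number=3,
    skip_archive=True,
)
```

With a Glue client created elsewhere, the result could then be passed as client.update_table(**table_request).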