AWS Glue

2024/04/02 - 6 updated api methods

Changes   Adding View related fields to responses of read-only Table APIs.

2024/02/05 - 2 updated api methods

Changes   Introduce Catalog Encryption Role within Glue Data Catalog Settings. Introduce SASL/PLAIN as an authentication method for Glue Kafka connections.

2023/12/22 - 3 updated api methods

Changes   This release adds additional configurations for Query Session Context on the following APIs: GetUnfilteredTableMetadata, GetUnfilteredPartitionMetadata, GetUnfilteredPartitionsMetadata.

2023/11/30 - 2 updated api methods

Changes   Adds observation and analyzer support to the GetDataQualityResult and BatchGetDataQualityResult APIs.

2023/11/16 - 5 new api methods

Changes   Introduces new column statistics APIs to support statistics generation for tables within the Glue Data Catalog.

2023/11/14 - 6 new api methods

Changes   Introduces new storage optimization APIs to support automatic compaction of Apache Iceberg tables.

2023/11/02 - 5 updated api methods

Changes   This release introduces Google BigQuery Source and Target in AWS Glue CodeGenConfigurationNode.

2023/10/12 - 7 updated api methods

Changes   Extends version control support in AWS Glue to GitLab and Bitbucket.

2023/08/24 - 3 updated api methods

Changes   Added API attributes that help in the monitoring of sessions.

2023/08/15 - 4 updated api methods

Changes   AWS Glue Crawlers can now accept SerDe overrides from a custom CSV classifier. The two SerDe options are LazySimpleSerDe and OpenCSVSerDe. If the user wants the crawler to make the selection, "None" can be selected instead.

2023/07/26 - 5 updated api methods

Changes   Release Glue Studio Snowflake Connector Node for SDK/CLI

2023/07/24 - 5 updated api methods

Changes   Added support for Data Preparation Recipe node in Glue Studio jobs

2023/07/21 - 5 updated api methods

Changes   This release adds support for AWS Glue Crawler with Apache Hudi Tables, allowing Crawlers to discover Hudi Tables in S3 and register them in Glue Data Catalog for query engines to query against.

2023/07/17 - 3 updated api methods

Changes   Adding new supported permission type flags to get-unfiltered endpoints that callers may pass to indicate support for enforcing Lake Formation fine-grained access control on nested column attributes.

2023/07/07 - 1 updated api methods

Changes   This release enables customers to create new Apache Iceberg tables and associated metadata in Amazon S3 by using native AWS Glue CreateTable operation.
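
As a rough sketch of what such a request could look like, the helper below assembles keyword arguments for a CreateTable call. The helper name `iceberg_create_table_kwargs` and all database, table, column, and bucket names are hypothetical placeholders; the example assumes the `OpenTableFormatInput` / `IcebergInput` request shape of the Glue API.

```python
def iceberg_create_table_kwargs(database, table, location, columns):
    """Assemble keyword arguments for a Glue CreateTable call for an Iceberg table.

    `columns` is a list of (name, type) pairs; every value here is a placeholder.
    """
    return {
        "DatabaseName": database,
        "TableInput": {
            "Name": table,
            "TableType": "EXTERNAL_TABLE",
            "StorageDescriptor": {
                "Columns": [{"Name": name, "Type": col_type} for name, col_type in columns],
                "Location": location,
            },
        },
        # Requests that Glue also create the Iceberg metadata at the table location.
        "OpenTableFormatInput": {
            "IcebergInput": {"MetadataOperation": "CREATE", "Version": "2"}
        },
    }


iceberg_request = iceberg_create_table_kwargs(
    "example_db",
    "example_table",
    "s3://example-bucket/example_table/",
    [("id", "bigint"), ("name", "string")],
)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("glue").create_table(**iceberg_request)
```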

2023/06/29 - 5 updated api methods

Changes   This release adds support for AWS Glue Crawler with Iceberg Tables, allowing Crawlers to discover Iceberg Tables in S3 and register them in Glue Data Catalog for query engines to query against.

2023/06/26 - 5 updated api methods

Changes   Timestamp Starting Position For Kinesis and Kafka Data Sources in a Glue Streaming Job

2023/06/19 - 12 updated api methods

Changes   This release adds support for creating cross region table/database resource links

2023/05/30 - 21 updated api methods

Changes   Added Runtime parameter to allow selection of Ray Runtime

2023/05/25 - 12 updated api methods

Changes   Added ability to create data quality rulesets for shared, cross-account Glue Data Catalog tables. Added support for dataset comparison rules through a new parameter called AdditionalDataSources. Enhanced the data quality results with a map containing profiled metric values.

2023/05/16 - 2 updated api methods

Changes   Add Support for Tags for Custom Entity Types

2023/05/09 - 5 updated api methods

Changes   This release adds AmazonRedshift Source and Target nodes in addition to DynamicTransform OutputSchemas

2023/05/08 - 21 updated api methods

Changes   We don't do release notes https://w.amazon.com/bin/view/AWSDocs/common-tasks/release-notes

2023/04/03 - 10 updated api methods

Changes   Add support for database-level federation

2023/02/17 - 5 updated api methods

Changes   Release of Delta Lake Data Lake Format for Glue Studio Service

2023/02/15 - 5 updated api methods

Changes   Fix DirectJDBCSource not showing up in CLI code gen

2023/02/08 - 5 updated api methods

Changes   DirectJDBCSource + Glue 4.0 streaming options

2023/01/19 - 5 updated api methods

Changes   Release Glue Studio Hudi Data Lake Format for SDK/CLI

2022/12/15 - 5 updated api methods

Changes   This release adds support for AWS Glue Crawler with native DeltaLake tables, allowing Crawlers to classify Delta Lake format tables and catalog them for query engines to query against.

2022/11/30 - 16 new 8 updated api methods

Changes   This release adds support for AWS Glue Data Quality, which helps you evaluate and monitor the quality of your data and includes the API for creating, deleting, or updating data quality rulesets, runs and evaluations.

2022/11/29 - 5 updated api methods

Changes   This release allows Custom Visual Transforms (Dynamic Transforms) to be created via the AWS Glue CLI/SDK.

2022/11/18 - 5 updated api methods

Changes   AWS Glue Crawler - Adding support for table-level and column-level comments with database-level datatypes for JDBC-based crawlers.

2022/10/27 - 4 updated api methods

Changes   Added support for custom datatypes when using custom csv classifier.

2022/10/05 - 2 new 5 updated api methods

Changes   This SDK release adds support for syncing Glue jobs with a source control provider. Additionally, a new parameter called SourceControlDetails is added to the Job model.

2022/09/22 - 5 updated api methods

Changes   Added support for S3 Event Notifications for Catalog Target Crawlers.

2022/08/08 - 17 updated api methods

Changes   Add an option to run non-urgent or non-time sensitive Glue Jobs on spare capacity
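
A minimal sketch of a job definition that opts into this behavior, assuming the option is exposed as the `ExecutionClass` job parameter with the value `FLEX`. The helper name, job name, role ARN, script path, and worker settings are illustrative placeholders only.

```python
def flex_job_kwargs(name, role_arn, script_location):
    """Assemble keyword arguments for a Glue CreateJob call using spare capacity."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {"Name": "glueetl", "ScriptLocation": script_location},
        "GlueVersion": "3.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
        # "STANDARD" is the other execution class; "FLEX" targets non-urgent runs.
        "ExecutionClass": "FLEX",
    }


flex_request = flex_job_kwargs(
    "example-nightly-job",
    "arn:aws:iam::123456789012:role/ExampleGlueRole",
    "s3://example-bucket/scripts/nightly.py",
)
# With boto3 installed and credentials configured:
#   boto3.client("glue").create_job(**flex_request)
```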

2022/07/14 - 21 updated api methods

Changes   This release adds an additional worker type for Glue Streaming jobs.

2022/06/30 - 1 updated api methods

Changes   This release adds tag as an input of CreateDatabase

2022/06/24 - 1 new api methods

Changes   This release enables the new ListCrawls API for viewing the AWS Glue Crawler run history.

2022/05/17 - 5 updated api methods

Changes   This release adds a new optional parameter called codeGenNodeConfiguration to CRUD job APIs that allows users to manage visual jobs via APIs. The updated CreateJob and UpdateJob will create jobs that can be viewed in Glue Studio as a visual graph. GetJob can be used to get codeGenNodeConfiguration.

2022/04/21 - 5 new api methods

Changes   This release adds APIs to create, read, delete, list, and batch read Glue custom entity types.

2022/04/14 - 6 updated api methods

Changes   Auto Scaling for Glue version 3.0 and later jobs to dynamically scale compute resources. This SDK change provides customers with the auto-scaled DPU usage

2022/03/18 - 9 new 3 updated api methods

Changes   Added 9 new APIs for AWS Glue Interactive Sessions: ListSessions, StopSession, CreateSession, GetSession, DeleteSession, RunStatement, GetStatement, ListStatements, CancelStatement

2022/02/16 - 7 updated api methods

Changes   Support for optimistic locking in UpdateTable
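
One way this might be used: read a table, modify it, and send the update back together with the version that was read, so that a concurrent change causes the update to be rejected rather than silently overwritten. The sketch below assumes the `VersionId` parameter of UpdateTable and the `VersionId` field returned by GetTable; the helper name and the list of writable fields are illustrative and may not be exhaustive.

```python
# Fields of a GetTable result that are also accepted in TableInput (assumed, not exhaustive).
WRITABLE_TABLE_FIELDS = (
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
)


def update_table_kwargs(table, extra_parameters):
    """Build UpdateTable arguments pinned to the version of `table` that was read."""
    # Drop read-only fields such as CreateTime that TableInput does not accept.
    table_input = {k: v for k, v in table.items() if k in WRITABLE_TABLE_FIELDS}
    table_input["Parameters"] = {**table.get("Parameters", {}), **extra_parameters}
    return {
        "DatabaseName": table["DatabaseName"],
        "TableInput": table_input,
        # The update should only succeed if the table is still at this version.
        "VersionId": table["VersionId"],
    }


# Stand-in for the "Table" element of a GetTable response (placeholder values).
fetched_table = {
    "Name": "example_table",
    "DatabaseName": "example_db",
    "VersionId": "7",
    "CreateTime": "2022-02-16",
    "Parameters": {"owner": "etl"},
}
update_request = update_table_kwargs(fetched_table, {"reviewed": "true"})
# With boto3: boto3.client("glue").update_table(**update_request)
```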

2022/02/02 - 5 updated api methods

Changes   Launch Protobuf support for AWS Glue Schema Registry

2022/01/13 - 1 updated api methods

Changes   This SDK release adds support to pass run properties when starting a workflow run

2022/01/05 - 3 new 19 updated api methods

Changes   Add Delta Lake target support for Glue Crawler and 3rd Party Support for Lake Formation

2021/11/30 - 7 updated api methods

Changes   Support for DataLake transactions

2021/10/15 - 5 updated api methods

Changes   Enable S3 event-based crawler API.

2021/10/05 - 1 updated api methods

Changes   This release adds tag as an input of CreateConnection

2021/08/23 - 9 new 2 updated api methods

Changes   Add support for Custom Blueprints

2021/07/14 - 9 updated api methods

Changes   Add support for Event Driven Workflows

2021/06/28 - 5 updated api methods

Changes   Add JSON Support for Glue Schema Registry

2021/06/07 - 5 updated api methods

Changes   Add SampleSize variable to S3Target to enable s3-sampling feature through API.
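
For illustration, a crawler `Targets` structure using this field might be built as below. The helper name and bucket paths are hypothetical; the positive-value check is a local sanity check, not a statement of the service's exact limits.

```python
def s3_sampling_targets(paths, sample_size):
    """Build a crawler Targets structure that samples files in each S3 path."""
    if sample_size < 1:
        raise ValueError("sample_size must be a positive integer")
    return {
        "S3Targets": [{"Path": path, "SampleSize": sample_size} for path in paths]
    }


sampling_targets = s3_sampling_targets(
    ["s3://example-bucket/logs/", "s3://example-bucket/events/"], 10
)
# With boto3, this would be passed as the Targets argument of create_crawler
# or update_crawler.
```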

2021/03/29 - 1 updated api methods

Changes   Allow Dots in Registry and Schema Names for CreateRegistry, CreateSchema; Fixed issue when duplicate keys are present and not returned as part of QuerySchemaVersionMetadata.

2021/02/23 - 1 updated api methods

Changes   Updating the page size for Glue catalog getter APIs.

2020/12/22 - 2 updated api methods

Changes   AWS Glue Find Matches machine learning transforms now support column importance scores.

2020/12/21 - 4 updated api methods

Changes   Add 4 connection properties: SECRET_ID, CONNECTOR_URL, CONNECTOR_TYPE, CONNECTOR_CLASS_NAME. Add two connection types: MARKETPLACE, CUSTOM

2020/11/23 - 2 new 6 updated api methods

Changes   Feature1 - Glue crawler adds data lineage configuration option. Feature2 - AWS Glue Data Catalog adds APIs for PartitionIndex creation and deletion as part of Enhancement Partition Management feature.

2020/11/19 - 20 new 14 updated api methods

Changes   Adding support for Glue Schema Registry. The AWS Glue Schema Registry is a new feature that allows you to centrally discover, control, and evolve data stream schemas.

2020/10/27 - 3 updated api methods

Changes   AWS Glue machine learning transforms now support encryption-at-rest for labels and trained models.

2020/10/21 - 5 updated api methods

Changes   AWS Glue crawlers now support incremental crawls for the Amazon Simple Storage Service (Amazon S3) data source.

2020/10/05 - 5 updated api methods

Changes   AWS Glue crawlers now support Amazon DocumentDB (with MongoDB compatibility) and MongoDB collections. You can choose to crawl the entire data set or only a small sample to reduce crawl time.

2020/10/01 - 1 updated api methods

Changes   Adding additional optional map parameter to get-plan api

2020/09/21 - 1 new api methods

Changes   Adding support to update multiple partitions of a table in a single request

2020/09/09 - 1 new 1 updated api methods

Changes   Adding support for partitionIndexes to improve GetPartitions performance.
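
A hedged sketch of declaring an index when a partitioned table is created, assuming the `PartitionIndexes` parameter of CreateTable with `Keys` and `IndexName` entries. The helper name and all table, key, and index names are placeholders; the subset check is a local convenience.

```python
def indexed_table_kwargs(database, table, partition_keys, index_name, index_keys):
    """Build CreateTable arguments with one partition index.

    `partition_keys` is a list of (name, type) pairs.
    """
    known = {name for name, _ in partition_keys}
    unknown = [key for key in index_keys if key not in known]
    if unknown:
        raise ValueError(f"index keys are not partition keys: {unknown}")
    return {
        "DatabaseName": database,
        "TableInput": {
            "Name": table,
            "PartitionKeys": [{"Name": n, "Type": t} for n, t in partition_keys],
        },
        "PartitionIndexes": [{"Keys": list(index_keys), "IndexName": index_name}],
    }


indexed_request = indexed_table_kwargs(
    "example_db",
    "example_events",
    [("year", "string"), ("month", "string"), ("day", "string")],
    "by_year_month",
    ["year", "month"],
)
# With boto3: boto3.client("glue").create_table(**indexed_request)
```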

2020/08/10 - 6 updated api methods

Changes   Starting today, you can further control orchestration of your ETL workloads in AWS Glue by specifying the maximum number of concurrent runs for a Glue workflow.

2020/08/07 - 9 updated api methods

Changes   AWS Glue now adds support for Network connection type enabling you to access resources inside your VPC using Glue crawlers and Glue ETL jobs.

2020/07/27 - 1 new 4 updated api methods

Changes   Add ability to manually resume workflows in AWS Glue providing customers further control over the orchestration of ETL workloads.

2020/07/07 - 1 new 19 updated api methods

Changes   AWS Glue Data Catalog supports cross account sharing of tables through AWS Lake Formation

2020/06/25 - 6 new api methods

Changes   This release adds new APIs to support column level statistics in AWS Glue Data Catalog

2020/06/12 - 5 updated api methods

Changes   You can now choose to crawl the entire table or just a sample of records in DynamoDB when using AWS Glue crawlers. Additionally, you can also specify a scanning rate for crawling DynamoDB tables.
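
As an illustration, a DynamoDB crawler target with sampling and a scanning rate might be expressed as follows. This assumes the `scanAll` and `scanRate` fields of the DynamoDB target structure (note the lower-case initial letters); the helper and table names are placeholders.

```python
def dynamodb_targets(table_name, scan_all=False, scan_rate=None):
    """Build a crawler Targets structure for a single DynamoDB table."""
    target = {"Path": table_name, "scanAll": scan_all}
    if scan_rate is not None:
        # Scanning rate for the crawl; omitted entirely when not specified.
        target["scanRate"] = scan_rate
    return {"DynamoDBTargets": [target]}


sampled = dynamodb_targets("example_orders", scan_all=False, scan_rate=0.5)
full_scan = dynamodb_targets("example_orders", scan_all=True)
# With boto3, either value would be passed as the Targets argument of create_crawler.
```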

2020/06/03 - 2 updated api methods

Changes   Adding databaseName in the response for GetUserDefinedFunctions() API.

2020/05/15 - 1 new 9 updated api methods

Changes   Starting today, you can stop the execution of Glue workflows that are running. AWS Glue workflows are directed acyclic graphs (DAGs) of Glue triggers, crawlers and jobs. Using a workflow, you can design a complex multi-job extract, transform, and load (ETL) activity that AWS Glue can execute and track as single entity.

2020/04/20 - 4 updated api methods

Changes   Added a new ConnectionType "KAFKA" and a ConnectionProperty "KAFKA_BOOTSTRAP_SERVERS" to support Kafka connection.

2020/03/31 - 4 updated api methods

Changes   Add two enums for MongoDB connection: Added "CONNECTION_URL" to "ConnectionPropertyKey" and added "MONGODB" to "ConnectionType"

2020/02/28 - 1 new 1 updated api methods

Changes   AWS Glue adds resource tagging support for Machine Learning Transforms and adds a new API, ListMLTransforms to support tag filtering. With this feature, customers can use tags in AWS Glue to organize and control access to Machine Learning Transforms.

2020/02/12 - 5 updated api methods

Changes   Adding ability to add arguments that cannot be overridden to AWS Glue jobs
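
A minimal sketch of a job definition that separates overridable defaults from fixed arguments, assuming the `NonOverridableArguments` job parameter. The helper name, argument keys, role ARN, and script path are illustrative placeholders.

```python
def locked_job_kwargs(name, role_arn, script_location, defaults, locked):
    """Build CreateJob arguments with defaults and non-overridable arguments."""
    overlap = sorted(set(defaults) & set(locked))
    if overlap:
        # Local sanity check: the same key in both maps would be ambiguous.
        raise ValueError(f"arguments defined twice: {overlap}")
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {"Name": "glueetl", "ScriptLocation": script_location},
        # Callers may replace these values when starting a job run.
        "DefaultArguments": dict(defaults),
        # These values are intended to stay fixed for every run.
        "NonOverridableArguments": dict(locked),
    }


locked_request = locked_job_kwargs(
    "example-job",
    "arn:aws:iam::123456789012:role/ExampleGlueRole",
    "s3://example-bucket/scripts/job.py",
    defaults={"--input_path": "s3://example-bucket/in/"},
    locked={"--output_path": "s3://example-bucket/out/"},
)
# With boto3: boto3.client("glue").create_job(**locked_request)
```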

2019/11/21 - 4 updated api methods

Changes   This release adds support for Glue 1.0 compatible ML Transforms.

2019/09/19 - 4 updated api methods

Changes   AWS Glue DevEndpoints now support GlueVersion, enabling you to choose Apache Spark 2.4.3 (in addition to Apache Spark 2.2.1). In addition to supporting the latest version of Spark, you can also choose between Python 2 and Python 3.

2019/08/08 - 13 new 16 updated api methods

Changes   You can now use AWS Glue to find matching records across a dataset, even without identifiers to join on, by using the new FindMatches ML Transform. Find related products, places, suppliers, customers, and more by teaching a custom machine learning transformation that you can use to identify matching records as part of your analysis, data cleaning, or master data management project by adding the FindMatches transformation to your Glue ETL Jobs. If your problem is more along the lines of deduplication, you can use FindMatches in much the same way to identify customers who have signed up more than once, products that have accidentally been added to your product catalog more than once, and so forth. Using the FindMatches MLTransform, you can teach a Transform your definition of a duplicate through examples, and it will use machine learning to identify other potential duplicates in your dataset. As with data integration, you can then use your new Transform in your deduplication projects by adding the FindMatches transformation to your Glue ETL Jobs. This release also contains additional APIs that support AWS Lake Formation.

2019/07/26 - 2 new 1 updated api methods

Changes   This release provides GetJobBookmark and GetJobBookmarks APIs. These APIs enable users to look at specific versions or all versions of the JobBookmark for a specific job. This release also enables resetting the job bookmark to a specific run via an enhancement of the ResetJobBookmark API.

2019/07/24 - 16 updated api methods

Changes   This release provides GlueVersion option for Job APIs and WorkerType option for DevEndpoint APIs. Job APIs enable users to pick specific GlueVersion for a specific job and pin the job to a specific runtime environment. DevEndpoint APIs enable users to pick different WorkerType for memory intensive workload.

2019/06/20 - 11 new 5 updated api methods

Changes   Starting today, you can now use workflows in AWS Glue to author directed acyclic graphs (DAGs) of Glue triggers, crawlers and jobs. Workflows enable orchestration of your ETL workloads by building dependencies between Glue entities (triggers, crawlers and jobs). You can visually track status of the different nodes in the workflows on the console making it easier to monitor progress and troubleshoot issues. Also, you can share parameters across entities in the workflow.

2019/06/05 - 5 updated api methods

Changes   Support specifying python version for Python shell jobs. A new parameter PythonVersion is added to the JobCommand data type.
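
For example, a Python shell job pinned to Python 3 might be described as below. The `PythonVersion` field of the job command comes from this entry; the helper name, job name, role ARN, script path, and capacity value are placeholders chosen for illustration.

```python
def python_shell_job_kwargs(name, role_arn, script_location, python_version="3"):
    """Build CreateJob arguments for a Python shell job with a chosen Python version."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "pythonshell",
            "ScriptLocation": script_location,
            # Passed as a string in the JobCommand structure.
            "PythonVersion": python_version,
        },
        "MaxCapacity": 0.0625,
    }


shell_request = python_shell_job_kwargs(
    "example-shell-job",
    "arn:aws:iam::123456789012:role/ExampleGlueRole",
    "s3://example-bucket/scripts/task.py",
)
# With boto3: boto3.client("glue").create_job(**shell_request)
```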

2019/05/10 - 5 updated api methods

Changes   AWS Glue now supports specifying existing catalog tables for a crawler to examine as a data source. A new parameter CatalogTargets is added to the CrawlerTargets data type.

2019/04/05 - 8 updated api methods

Changes   AWS Glue now supports workerType choices in the CreateJob, UpdateJob, and StartJobRun APIs, to be used for memory-intensive jobs.

2019/03/26 - 4 updated api methods

Changes   This feature allows customers to add a custom CSV classifier through the classifier API. They can specify a custom delimiter and quote symbol, and control other behavior they'd like crawlers to have when recognizing CSV files.
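
A hedged sketch of such a classifier definition, assuming the `CsvClassifier` request structure of CreateClassifier. The helper name, classifier name, and column names are hypothetical.

```python
def csv_classifier_kwargs(name, delimiter, quote_symbol, header):
    """Build CreateClassifier arguments for a custom CSV classifier."""
    return {
        "CsvClassifier": {
            "Name": name,
            "Delimiter": delimiter,
            "QuoteSymbol": quote_symbol,
            # Declares that files carry a header row with the names listed below.
            "ContainsHeader": "PRESENT",
            "Header": list(header),
            "DisableValueTrimming": False,
            "AllowSingleColumn": False,
        }
    }


csv_request = csv_classifier_kwargs(
    "example-pipe-delimited", "|", "'", ["id", "name", "created_at"]
)
# With boto3: boto3.client("glue").create_classifier(**csv_request)
```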

2019/03/11 - 5 updated api methods

Changes   CreateDevEndpoint and UpdateDevEndpoint now support Arguments to configure the DevEndpoint.

2019/02/22 - 11 new 4 updated api methods

Changes   AWS Glue adds support for assigning AWS resource tags to jobs, triggers, development endpoints, and crawlers. Each tag consists of a key and an optional value, both of which you define. With this capability, customers can use tags in AWS Glue to easily organize and identify their resources, create cost allocation reports, and control access to resources.

2019/01/18 - 7 updated api methods

Changes   AllocatedCapacity field is being deprecated and replaced with MaxCapacity field

2018/12/12 - 4 updated api methods

Changes   API Update for Glue: this update enables encryption of password inside connection objects stored in AWS Glue Data Catalog using DataCatalogEncryptionSettings. In addition, a new "HidePassword" flag is added to GetConnection and GetConnections to return connections without passwords.

2018/10/16 - 3 new api methods

Changes   New Glue APIs for creating, updating, reading and deleting Data Catalog resource-based policies.

2018/09/26 - 1 new api methods

Changes   AWS Glue now supports data encryption at rest for ETL jobs and development endpoints. With encryption enabled, when you run ETL jobs, or development endpoints, Glue will use AWS KMS keys to write encrypted data at rest. You can also encrypt the metadata stored in the Glue Data Catalog using keys that you manage with AWS KMS. Additionally, you can use AWS KMS keys to encrypt the logs generated by crawlers and ETL jobs as well as encrypt ETL job bookmarks. Encryption settings for Glue crawlers, ETL jobs, and development endpoints can be configured using the security configurations in Glue. Glue Data Catalog encryption can be enabled via the settings for the Glue Data Catalog.

2018/08/28 - 3 new api methods

Changes   New Glue APIs for creating, updating, reading and deleting Data Catalog resource-based policies.

2018/08/25 - 5 new 18 updated api methods

Changes   AWS Glue now supports data encryption at rest for ETL jobs and development endpoints. With encryption enabled, when you run ETL jobs, or development endpoints, Glue will use AWS KMS keys to write encrypted data at rest. You can also encrypt the metadata stored in the Glue Data Catalog using keys that you manage with AWS KMS. Additionally, you can use AWS KMS keys to encrypt the logs generated by crawlers and ETL jobs as well as encrypt ETL job bookmarks. Encryption settings for Glue crawlers, ETL jobs, and development endpoints can be configured using the security configurations in Glue. Glue Data Catalog encryption can be enabled via the settings for the Glue Data Catalog.

2018/07/30 - 4 updated api methods

Changes   Glue Development Endpoints now support association of multiple SSH public keys with a development endpoint.

2018/07/10 - 6 updated api methods

Changes   AWS Glue adds the ability to crawl DynamoDB tables.

2018/05/25 - 11 updated api methods

Changes   AWS Glue now sends a delay notification to Amazon CloudWatch Events when an ETL job runs longer than the specified delay notification threshold.

2018/04/10 - 11 updated api methods

Changes   AWS Glue now supports timeout values for ETL jobs. With this release, all new ETL jobs have a default timeout value of 48 hours. AWS Glue also now supports the ability to start a schedule or job events trigger when it is created.

2018/03/20 - 2 updated api methods

Changes   API Updates for DevEndpoint: PublicKey is now optional for CreateDevEndpoint. The new DevEndpoint field PrivateAddress will be populated for DevEndpoints associated with a VPC.

2018/02/06 - 4 updated api methods

Changes   This new feature will now allow customers to add a customized json classifier. They can specify a json path to indicate the object, array or field of the json documents they'd like crawlers to inspect when they crawl json files.
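
As an illustration, a JSON classifier that points crawlers at each element of a nested array might be defined as follows. This assumes the `JsonClassifier` request structure with its `JsonPath` field; the helper name, classifier name, and path are placeholders.

```python
def json_classifier_kwargs(name, json_path):
    """Build CreateClassifier arguments for a custom JSON classifier."""
    if not json_path.startswith("$"):
        # Local sanity check: JSON paths conventionally start at the root "$".
        raise ValueError("json_path should start with '$'")
    return {"JsonClassifier": {"Name": name, "JsonPath": json_path}}


json_request = json_classifier_kwargs("example-records", "$.records[*]")
# With boto3: boto3.client("glue").create_classifier(**json_request)
```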

2018/01/19 - 3 new 1 updated api methods

Changes   New AWS Glue DataCatalog APIs to manage table versions and a new feature to skip archiving of the old table version when updating table.

2018/01/12 - 6 updated api methods

Changes   Support is added to generate ETL scripts in Scala which can now be run by AWS Glue ETL jobs. In addition, the trigger API now supports firing when any conditions are met (in addition to all conditions). Also, jobs can be triggered based on a "failed" or "stopped" job run (in addition to a "succeeded" job run).
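
The trigger changes can be sketched as a conditional trigger whose predicate fires when any listed condition is met, including failed or stopped runs. The example assumes the `Predicate` structure (`Logical`, `Conditions`) of the trigger API; the helper, trigger, and job names are hypothetical.

```python
def any_condition_trigger_kwargs(name, watched, job_to_start):
    """Build CreateTrigger arguments that fire when any watched condition is met.

    `watched` is a list of (job_name, state) pairs, e.g. ("load", "FAILED").
    """
    conditions = [
        {"LogicalOperator": "EQUALS", "JobName": job, "State": state}
        for job, state in watched
    ]
    return {
        "Name": name,
        "Type": "CONDITIONAL",
        # "ANY" fires on the first matching condition; "AND" requires all of them.
        "Predicate": {"Logical": "ANY", "Conditions": conditions},
        "Actions": [{"JobName": job_to_start}],
    }


trigger_request = any_condition_trigger_kwargs(
    "example-on-problem",
    [("example-load", "FAILED"), ("example-load", "STOPPED")],
    "example-cleanup",
)
# With boto3: boto3.client("glue").create_trigger(**trigger_request)
```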

2017/11/16 - 8 updated api methods

Changes   API update for AWS Glue. New crawler configuration attribute enables customers to specify crawler behavior. New XML classifier enables classification of XML data.

2017/10/24 - 1 new 4 updated api methods

Changes   AWS Glue: Adding a new API, BatchStopJobRun, to stop one or more job runs for a specified Job.

2017/08/14 - 74 new api methods

Changes   AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL. AWS Glue generates the code to execute your data transformations and data loading processes. AWS Glue generates Python code that is entirely customizable, reusable, and portable. Once your ETL job is ready, you can schedule it to run on AWS Glue's fully managed, scale-out Spark environment. AWS Glue provides a flexible scheduler with dependency resolution, job monitoring, and alerting. AWS Glue is serverless, so there is no infrastructure to buy, set up, or manage. It automatically provisions the environment needed to complete the job, and customers pay only for the compute resources consumed while running ETL jobs. With AWS Glue, data can be available for analytics in minutes.