AWS Glue

2020/07/27 - 1 new 4 updated api methods

Changes   Adds the ability to manually resume workflows in AWS Glue, giving customers further control over the orchestration of ETL workloads.
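
A resume request might be assembled as in the sketch below. It assumes the boto3 Glue client exposes this operation as `resume_workflow_run` with `Name`, `RunId`, and `NodeIds` parameters; the helper name and the sample identifiers are hypothetical, and no call to AWS is made.

```python
def build_resume_request(workflow_name, run_id, node_ids):
    """Build keyword arguments for resuming a workflow run.

    Assumption: the result is intended for
    glue_client.resume_workflow_run(**request) on a boto3 Glue client.
    """
    if not node_ids:
        # Helper-level check (our own choice): resuming needs nodes to restart.
        raise ValueError("at least one node ID is required to resume a run")
    return {
        "Name": workflow_name,
        "RunId": run_id,
        "NodeIds": list(node_ids),
    }


# Hypothetical identifiers, for illustration only.
request = build_resume_request("nightly-etl", "wr_example", ["wnode_example"])
```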

2020/07/07 - 1 new 19 updated api methods

Changes   AWS Glue Data Catalog supports cross account sharing of tables through AWS Lake Formation

2020/06/25 - 6 new api methods

Changes   This release adds new APIs to support column level statistics in AWS Glue Data Catalog

2020/06/12 - 5 updated api methods

Changes   You can now choose to crawl the entire table or just a sample of records in DynamoDB when using AWS Glue crawlers. You can also specify a scanning rate for crawling DynamoDB tables.
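
The two options could appear in a crawler target as sketched here. The field names `scanAll` and `scanRate` are assumed to match the boto3 `DynamoDBTargets` shape; the helper and the table name are hypothetical.

```python
def dynamodb_target(table_name, scan_all=True, scan_rate=None):
    """Describe one DynamoDB table for a crawler's target list.

    Assumption: 'scanAll' and 'scanRate' are the field names used by the
    crawler APIs for full-versus-sample crawling and the scanning rate.
    """
    target = {"Path": table_name, "scanAll": scan_all}
    if scan_rate is not None:
        if scan_rate <= 0:
            raise ValueError("scan_rate must be positive")
        target["scanRate"] = float(scan_rate)
    return target


# Sample the table instead of scanning all of it, at a reduced scanning rate.
targets = {"DynamoDBTargets": [dynamodb_target("orders", scan_all=False, scan_rate=0.5)]}
```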

2020/06/03 - 2 updated api methods

Changes   Adding databaseName in the response for GetUserDefinedFunctions() API.

2020/05/15 - 1 new 9 updated api methods

Changes   Starting today, you can stop the execution of Glue workflows that are running. AWS Glue workflows are directed acyclic graphs (DAGs) of Glue triggers, crawlers and jobs. Using a workflow, you can design a complex multi-job extract, transform, and load (ETL) activity that AWS Glue can execute and track as a single entity.

2020/04/20 - 4 updated api methods

Changes   Added a new ConnectionType "KAFKA" and a ConnectionProperty "KAFKA_BOOTSTRAP_SERVERS" to support Kafka connection.
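
A Kafka connection definition could look roughly like this. The sketch assumes the `ConnectionInput` shape used by the connection APIs (`Name`, `ConnectionType`, `ConnectionProperties`); the helper name and broker addresses are hypothetical.

```python
def kafka_connection_input(name, bootstrap_servers):
    """Build a ConnectionInput-style dict for a Kafka connection.

    Assumption: the broker list is passed as one comma-separated string of
    host:port pairs, following the usual Kafka convention.
    """
    if not bootstrap_servers:
        raise ValueError("at least one bootstrap server is required")
    return {
        "Name": name,
        "ConnectionType": "KAFKA",
        "ConnectionProperties": {
            "KAFKA_BOOTSTRAP_SERVERS": ",".join(bootstrap_servers),
        },
    }


# Hypothetical broker addresses.
connection = kafka_connection_input(
    "events-kafka", ["broker-1.example.com:9092", "broker-2.example.com:9092"]
)
```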

2020/03/31 - 4 updated api methods

Changes   Add two enums for MongoDB connection: Added "CONNECTION_URL" to "ConnectionPropertyKey" and added "MONGODB" to "ConnectionType"

2020/02/28 - 1 new 1 updated api methods

Changes   AWS Glue adds resource tagging support for Machine Learning Transforms and adds a new API, ListMLTransforms to support tag filtering. With this feature, customers can use tags in AWS Glue to organize and control access to Machine Learning Transforms.

2020/02/12 - 5 updated api methods

Changes   Adding ability to add arguments that cannot be overridden to AWS Glue jobs
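
One way to picture this is a job definition that separates overridable defaults from locked arguments. The sketch assumes the job APIs accept a `NonOverridableArguments` map alongside `DefaultArguments`; the helper, role ARN, script path, and argument names are hypothetical.

```python
def job_with_locked_arguments(name, role_arn, script_location, defaults, locked):
    """Build CreateJob-style parameters with locked arguments.

    Assumption: 'NonOverridableArguments' holds the arguments that a job run
    is not allowed to override; 'DefaultArguments' holds the rest.
    """
    overlap = set(defaults) & set(locked)
    if overlap:
        # Helper-level check (our own choice): a key should live in one map only.
        raise ValueError("arguments defined twice: %s" % sorted(overlap))
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {"Name": "glueetl", "ScriptLocation": script_location},
        "DefaultArguments": dict(defaults),
        "NonOverridableArguments": dict(locked),
    }


job = job_with_locked_arguments(
    "load-orders",
    "arn:aws:iam::123456789012:role/ExampleGlueRole",  # hypothetical role
    "s3://example-bucket/scripts/load_orders.py",      # hypothetical script
    defaults={"--batch-size": "500"},
    locked={"--target-database": "prod"},
)
```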

2019/11/21 - 4 updated api methods

Changes   This release adds support for Glue 1.0 compatible ML Transforms.

2019/09/19 - 4 updated api methods

Changes   AWS Glue DevEndpoints now supports GlueVersion, enabling you to choose Apache Spark 2.4.3 (in addition to Apache Spark 2.2.1). In addition to supporting the latest version of Spark, you will also have the ability to choose between Python 2 and Python 3.

2019/08/08 - 13 new 16 updated api methods

Changes   You can now use AWS Glue to find matching records across datasets, even without identifiers to join on, by using the new FindMatches ML Transform. Find related products, places, suppliers, customers, and more by teaching a custom machine learning transformation that you can use to identify matching records as part of your analysis, data cleaning, or master data management project by adding the FindMatches transformation to your Glue ETL Jobs. If your problem is more along the lines of deduplication, you can use FindMatches in much the same way to identify customers who have signed up more than once, products that have accidentally been added to your product catalog more than once, and so forth. Using the FindMatches MLTransform, you can teach a Transform your definition of a duplicate through examples, and it will use machine learning to identify other potential duplicates in your dataset. As with data integration, you can then use your new Transform in your deduplication projects by adding the FindMatches transformation to your Glue ETL Jobs. This release also contains additional APIs that support AWS Lake Formation.

2019/07/26 - 2 new 1 updated api methods

Changes   This release provides GetJobBookmark and GetJobBookmarks APIs. These APIs enable users to look at specific versions or all versions of the JobBookmark for a specific job. This release also enables resetting the job bookmark to a specific run via an enhancement of the ResetJobBookmark API.

2019/07/24 - 16 updated api methods

Changes   This release provides a GlueVersion option for Job APIs and a WorkerType option for DevEndpoint APIs. Job APIs enable users to pick a specific GlueVersion for a job and pin the job to a specific runtime environment. DevEndpoint APIs enable users to pick a different WorkerType for memory-intensive workloads.

2019/06/20 - 11 new 5 updated api methods

Changes   Starting today, you can use workflows in AWS Glue to author directed acyclic graphs (DAGs) of Glue triggers, crawlers and jobs. Workflows enable orchestration of your ETL workloads by building dependencies between Glue entities (triggers, crawlers and jobs). You can visually track the status of the different nodes in a workflow on the console, making it easier to monitor progress and troubleshoot issues. You can also share parameters across entities in the workflow.

2019/06/05 - 5 updated api methods

Changes   Adds support for specifying the Python version for Python shell jobs. A new parameter PythonVersion is added to the JobCommand data type.
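
A JobCommand carrying the new parameter might be built as follows. The sketch assumes Python shell jobs use the command name `pythonshell` and that PythonVersion takes the strings "2" or "3" as of this release; the helper and script path are hypothetical.

```python
# Assumption: the values accepted for PythonVersion at the time of this release.
SUPPORTED_PYTHON_VERSIONS = ("2", "3")


def python_shell_command(script_location, python_version="3"):
    """Build a JobCommand-style dict for a Python shell job."""
    if python_version not in SUPPORTED_PYTHON_VERSIONS:
        raise ValueError("unsupported Python version: %r" % (python_version,))
    return {
        "Name": "pythonshell",
        "ScriptLocation": script_location,
        "PythonVersion": python_version,
    }


# Hypothetical script location.
command = python_shell_command("s3://example-bucket/scripts/cleanup.py")
```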

2019/05/10 - 5 updated api methods

Changes   AWS Glue now supports specifying existing catalog tables for a crawler to examine as a data source. A new parameter CatalogTargets is added to the CrawlerTargets data type.
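
The new parameter could be populated as in this sketch, which assumes each CatalogTargets entry names a database and a list of its tables (`DatabaseName`, `Tables`). The helper, database, and table names are hypothetical.

```python
def catalog_targets(tables_by_database):
    """Build a CrawlerTargets-style dict that points at existing catalog tables.

    tables_by_database maps a database name to the table names to examine.
    """
    entries = []
    for database, tables in tables_by_database.items():
        if not tables:
            raise ValueError("no tables listed for database %r" % (database,))
        entries.append({"DatabaseName": database, "Tables": list(tables)})
    return {"CatalogTargets": entries}


# Hypothetical database and table names.
targets = catalog_targets({"sales": ["orders", "refunds"]})
```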

2019/04/05 - 8 updated api methods

Changes   AWS Glue now supports workerType choices in the CreateJob, UpdateJob, and StartJobRun APIs, to be used for memory-intensive jobs.

2019/03/26 - 4 updated api methods

Changes   This new feature allows customers to add a customized CSV classifier with the classifier API. They can specify a custom delimiter and quote symbol, and control other behavior they'd like crawlers to have while recognizing CSV files.
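
A custom CSV classifier definition might be assembled like this. The field names (`Delimiter`, `QuoteSymbol`, `ContainsHeader`) and the header values are assumptions based on the boto3 `CsvClassifier` shape; the helper and classifier name are hypothetical.

```python
# Assumption: the header-detection values accepted by the classifier API.
HEADER_MODES = ("UNKNOWN", "PRESENT", "ABSENT")


def csv_classifier(name, delimiter=",", quote_symbol='"', contains_header="UNKNOWN"):
    """Build a CsvClassifier-style dict for a create-classifier request."""
    if len(delimiter) != 1 or len(quote_symbol) != 1:
        raise ValueError("delimiter and quote symbol must be single characters")
    if delimiter == quote_symbol:
        raise ValueError("delimiter and quote symbol must differ")
    if contains_header not in HEADER_MODES:
        raise ValueError("unknown header mode: %r" % (contains_header,))
    return {
        "CsvClassifier": {
            "Name": name,
            "Delimiter": delimiter,
            "QuoteSymbol": quote_symbol,
            "ContainsHeader": contains_header,
        }
    }


# Pipe-delimited files with single-quoted fields and a header row.
request = csv_classifier("pipe-delimited", delimiter="|", quote_symbol="'", contains_header="PRESENT")
```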

2019/03/11 - 5 updated api methods

Changes   CreateDevEndpoint and UpdateDevEndpoint now support Arguments to configure the DevEndpoint.

2019/02/22 - 11 new 4 updated api methods

Changes   AWS Glue adds support for assigning AWS resource tags to jobs, triggers, development endpoints, and crawlers. Each tag consists of a key and an optional value, both of which you define. With this capability, you can use tags in AWS Glue to easily organize and identify your resources, create cost allocation reports, and control access to resources.

2019/01/18 - 7 updated api methods

Changes   AllocatedCapacity field is being deprecated and replaced with MaxCapacity field
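
Existing job definitions could be moved to the new field with a small helper like the one below. It is a sketch that works on plain dicts and assumes MaxCapacity is a floating-point value while AllocatedCapacity was an integer; the helper name and sample job are hypothetical.

```python
def migrate_capacity(job_definition):
    """Return a copy of a job definition that uses MaxCapacity only.

    If both fields are present, the existing MaxCapacity wins and the
    deprecated AllocatedCapacity is dropped.
    """
    migrated = dict(job_definition)
    if "AllocatedCapacity" in migrated:
        allocated = migrated.pop("AllocatedCapacity")
        migrated.setdefault("MaxCapacity", float(allocated))
    return migrated


old_job = {"Name": "load-orders", "AllocatedCapacity": 10}
new_job = migrate_capacity(old_job)
```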

2018/12/12 - 4 updated api methods

Changes   API Update for Glue: this update enables encryption of password inside connection objects stored in AWS Glue Data Catalog using DataCatalogEncryptionSettings. In addition, a new "HidePassword" flag is added to GetConnection and GetConnections to return connections without passwords.

2018/10/16 - 3 new api methods

Changes   New Glue APIs for creating, updating, reading and deleting Data Catalog resource-based policies.

2018/09/26 - 1 new api methods

Changes   AWS Glue now supports data encryption at rest for ETL jobs and development endpoints. With encryption enabled, when you run ETL jobs, or development endpoints, Glue will use AWS KMS keys to write encrypted data at rest. You can also encrypt the metadata stored in the Glue Data Catalog using keys that you manage with AWS KMS. Additionally, you can use AWS KMS keys to encrypt the logs generated by crawlers and ETL jobs as well as encrypt ETL job bookmarks. Encryption settings for Glue crawlers, ETL jobs, and development endpoints can be configured using the security configurations in Glue. Glue Data Catalog encryption can be enabled via the settings for the Glue Data Catalog.

2018/08/28 - 3 new api methods

Changes   New Glue APIs for creating, updating, reading and deleting Data Catalog resource-based policies.

2018/08/25 - 5 new 18 updated api methods

Changes   AWS Glue now supports data encryption at rest for ETL jobs and development endpoints. With encryption enabled, when you run ETL jobs, or development endpoints, Glue will use AWS KMS keys to write encrypted data at rest. You can also encrypt the metadata stored in the Glue Data Catalog using keys that you manage with AWS KMS. Additionally, you can use AWS KMS keys to encrypt the logs generated by crawlers and ETL jobs as well as encrypt ETL job bookmarks. Encryption settings for Glue crawlers, ETL jobs, and development endpoints can be configured using the security configurations in Glue. Glue Data Catalog encryption can be enabled via the settings for the Glue Data Catalog.

2018/07/30 - 4 updated api methods

Changes   Glue Development Endpoints now support association of multiple SSH public keys with a development endpoint.

2018/07/10 - 6 updated api methods

Changes   AWS Glue adds the ability to crawl DynamoDB tables.

2018/05/25 - 11 updated api methods

Changes   AWS Glue now sends a delay notification to Amazon CloudWatch Events when an ETL job runs longer than the specified delay notification threshold.

2018/04/10 - 11 updated api methods

Changes   AWS Glue now supports timeout values for ETL jobs. With this release, all new ETL jobs have a default timeout value of 48 hours. AWS Glue also now supports the ability to start a schedule or job events trigger when it is created.
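
For reference, the 48-hour default can be expressed in the unit the job APIs use. This sketch assumes the `Timeout` field is given in minutes; the helper name is hypothetical.

```python
DEFAULT_TIMEOUT_HOURS = 48  # default stated in this release note


def timeout_minutes(hours=DEFAULT_TIMEOUT_HOURS):
    """Convert a timeout in hours to whole minutes.

    Assumption: the job APIs take the Timeout value in minutes.
    """
    if hours <= 0:
        raise ValueError("timeout must be positive")
    return int(hours * 60)


default_timeout = timeout_minutes()   # 48 hours -> 2880 minutes
short_timeout = timeout_minutes(1.5)  # 90 minutes
```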

2018/03/20 - 2 updated api methods

Changes   API Updates for DevEndpoint: PublicKey is now optional for CreateDevEndpoint. The new DevEndpoint field PrivateAddress will be populated for DevEndpoints associated with a VPC.

2018/02/06 - 4 updated api methods

Changes   This new feature allows customers to add a customized JSON classifier. They can specify a JSON path to indicate the object, array or field of the JSON documents they'd like crawlers to inspect when they crawl JSON files.
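
A JSON classifier definition might look like the sketch below, assuming the classifier takes a `Name` and a `JsonPath` string. The helper, classifier name, and path are hypothetical.

```python
def json_classifier(name, json_path):
    """Build a JsonClassifier-style dict for a create-classifier request."""
    if not json_path.startswith("$"):
        # Helper-level check (our own choice): JSON paths start at the root '$'.
        raise ValueError("a JSON path is expected to start with '$'")
    return {"JsonClassifier": {"Name": name, "JsonPath": json_path}}


# Ask crawlers to inspect each element of the top-level 'records' array.
request = json_classifier("records-array", "$.records[*]")
```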

2018/01/19 - 3 new 1 updated api methods

Changes   New AWS Glue DataCatalog APIs to manage table versions and a new feature to skip archiving of the old table version when updating a table.

2018/01/12 - 6 updated api methods

Changes   Support is added to generate ETL scripts in Scala which can now be run by AWS Glue ETL jobs. In addition, the trigger API now supports firing when any conditions are met (in addition to all conditions). Also, jobs can be triggered based on a "failed" or "stopped" job run (in addition to a "succeeded" job run).
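
The new trigger options could be combined as sketched here. The shape assumes the trigger API's `Predicate` takes a `Logical` value of `AND` or `ANY` and a list of conditions with `JobName` and `State`; the helper, trigger, and job names are hypothetical.

```python
def conditional_trigger(name, watched_jobs, job_to_start, logical="ANY"):
    """Build CreateTrigger-style parameters for a conditional trigger.

    watched_jobs maps a job name to the run state that should fire the
    trigger, for example "SUCCEEDED", "FAILED", or "STOPPED".
    """
    if logical not in ("AND", "ANY"):
        raise ValueError("logical must be 'AND' or 'ANY'")
    conditions = [
        {"LogicalOperator": "EQUALS", "JobName": job, "State": state}
        for job, state in watched_jobs.items()
    ]
    return {
        "Name": name,
        "Type": "CONDITIONAL",
        "Predicate": {"Logical": logical, "Conditions": conditions},
        "Actions": [{"JobName": job_to_start}],
    }


# Start a cleanup job as soon as either upstream job fails or is stopped.
trigger = conditional_trigger(
    "on-upstream-problem",
    {"extract-orders": "FAILED", "extract-customers": "STOPPED"},
    "cleanup-staging",
)
```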

2017/11/16 - 8 updated api methods

Changes   API update for AWS Glue. New crawler configuration attribute enables customers to specify crawler behavior. New XML classifier enables classification of XML data.

2017/10/24 - 1 new 4 updated api methods

Changes   AWS Glue: Adding a new API, BatchStopJobRun, to stop one or more job runs for a specified Job.

2017/08/14 - 74 new api methods

Changes   AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL. AWS Glue generates the code to execute your data transformations and data loading processes. AWS Glue generates Python code that is entirely customizable, reusable, and portable. Once your ETL job is ready, you can schedule it to run on AWS Glue's fully managed, scale-out Spark environment. AWS Glue provides a flexible scheduler with dependency resolution, job monitoring, and alerting. AWS Glue is serverless, so there is no infrastructure to buy, set up, or manage. It automatically provisions the environment needed to complete the job, and customers pay only for the compute resources consumed while running ETL jobs. With AWS Glue, data can be available for analytics in minutes.