2018/02/06 - AWS Glue - 4 updated api methods
Changes: Customers can now add a custom JSON classifier. It takes a JSON path that indicates the object, array, or field of the JSON documents that crawlers should inspect when they crawl JSON files.
{'JsonClassifier': {'JsonPath': 'string', 'Name': 'string'}}
Creates a classifier in the user's account. This may be a GrokClassifier, an XMLClassifier, or a JsonClassifier, depending on which field of the request is present.
See also: AWS API Documentation
Request Syntax
client.create_classifier( GrokClassifier={ 'Classification': 'string', 'Name': 'string', 'GrokPattern': 'string', 'CustomPatterns': 'string' }, XMLClassifier={ 'Classification': 'string', 'Name': 'string', 'RowTag': 'string' }, JsonClassifier={ 'Name': 'string', 'JsonPath': 'string' } )
GrokClassifier (dict) --
A GrokClassifier object specifying the classifier to create.
Classification (string) -- [REQUIRED]
An identifier of the data format that the classifier matches, such as Twitter, JSON, Omniture logs, Amazon CloudWatch Logs, and so on.
Name (string) -- [REQUIRED]
The name of the new classifier.
GrokPattern (string) -- [REQUIRED]
The grok pattern used by this classifier.
CustomPatterns (string) --
Optional custom grok patterns used by this classifier.
XMLClassifier (dict) --
An XMLClassifier object specifying the classifier to create.
Classification (string) -- [REQUIRED]
An identifier of the data format that the classifier matches.
Name (string) -- [REQUIRED]
The name of the classifier.
RowTag (string) --
The XML tag designating the element that contains each record in an XML document being parsed. Note that this cannot identify a self-closing element (closed by /> ). An empty row element that contains only attributes can be parsed as long as it ends with a closing tag (for example, <row item_a="A" item_b="B"></row> is okay, but <row item_a="A" item_b="B" /> is not).
JsonClassifier (dict) --
A JsonClassifier object specifying the classifier to create.
Name (string) -- [REQUIRED]
The name of the classifier.
JsonPath (string) -- [REQUIRED]
A JsonPath string defining the JSON data for the classifier to classify. AWS Glue supports a subset of JsonPath, as described in Writing JsonPath Custom Classifiers .
Return type: dict
Response Syntax
{}
Response Structure
(dict) --
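The request shape above can be sketched as a small helper. This is a minimal illustration, not part of the Glue API: the function names and the sample values are hypothetical, and `client` is assumed to be a Glue client such as the one returned by `boto3.client('glue')`.

```python
def build_json_classifier_request(name, json_path):
    """Build the keyword arguments for client.create_classifier().

    Both fields are marked [REQUIRED] for JsonClassifier, so reject
    empty values before any request is sent.
    """
    if not name or not json_path:
        raise ValueError("Name and JsonPath are both required")
    return {"JsonClassifier": {"Name": name, "JsonPath": json_path}}


def create_json_classifier(client, name, json_path):
    """Create a JSON classifier using an already constructed Glue client.

    `client` is assumed to expose create_classifier(**kwargs).
    """
    return client.create_classifier(
        **build_json_classifier_request(name, json_path)
    )
```

Only the JsonClassifier field is populated here, since the kind of classifier created depends on which field of the request is present.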
{'Classifier': {'JsonClassifier': {'CreationTime': 'timestamp', 'JsonPath': 'string', 'LastUpdated': 'timestamp', 'Name': 'string', 'Version': 'long'}}}
Retrieve a classifier by name.
See also: AWS API Documentation
Request Syntax
client.get_classifier( Name='string' )
Name (string) -- [REQUIRED]
Name of the classifier to retrieve.
Return type: dict
Response Syntax
{ 'Classifier': { 'GrokClassifier': { 'Name': 'string', 'Classification': 'string', 'CreationTime': datetime(2015, 1, 1), 'LastUpdated': datetime(2015, 1, 1), 'Version': 123, 'GrokPattern': 'string', 'CustomPatterns': 'string' }, 'XMLClassifier': { 'Name': 'string', 'Classification': 'string', 'CreationTime': datetime(2015, 1, 1), 'LastUpdated': datetime(2015, 1, 1), 'Version': 123, 'RowTag': 'string' }, 'JsonClassifier': { 'Name': 'string', 'CreationTime': datetime(2015, 1, 1), 'LastUpdated': datetime(2015, 1, 1), 'Version': 123, 'JsonPath': 'string' } } }
Response Structure
(dict) --
Classifier (dict) --
The requested classifier.
GrokClassifier (dict) --
A GrokClassifier object.
Name (string) --
The name of the classifier.
Classification (string) --
An identifier of the data format that the classifier matches, such as Twitter, JSON, Omniture logs, and so on.
CreationTime (datetime) --
The time this classifier was registered.
LastUpdated (datetime) --
The time this classifier was last updated.
Version (integer) --
The version of this classifier.
GrokPattern (string) --
The grok pattern applied to a data store by this classifier. For more information, see built-in patterns in Writing Custom Classifiers.
CustomPatterns (string) --
Optional custom grok patterns defined by this classifier. For more information, see custom patterns in Writing Custom Classifiers.
XMLClassifier (dict) --
An XMLClassifier object.
Name (string) --
The name of the classifier.
Classification (string) --
An identifier of the data format that the classifier matches.
CreationTime (datetime) --
The time this classifier was registered.
LastUpdated (datetime) --
The time this classifier was last updated.
Version (integer) --
The version of this classifier.
RowTag (string) --
The XML tag designating the element that contains each record in an XML document being parsed. Note that this cannot identify a self-closing element (closed by /> ). An empty row element that contains only attributes can be parsed as long as it ends with a closing tag (for example, <row item_a="A" item_b="B"></row> is okay, but <row item_a="A" item_b="B" /> is not).
JsonClassifier (dict) --
A JsonClassifier object.
Name (string) --
The name of the classifier.
CreationTime (datetime) --
The time this classifier was registered.
LastUpdated (datetime) --
The time this classifier was last updated.
Version (integer) --
The version of this classifier.
JsonPath (string) --
A JsonPath string defining the JSON data for the classifier to classify. AWS Glue supports a subset of JsonPath, as described in Writing JsonPath Custom Classifiers .
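Because the Classifier object carries its details under one of three fields, a caller usually needs to find out which one is present. The following is a minimal sketch; `classifier_kind` is a hypothetical helper name, and it assumes the `Classifier` dict has already been taken from a `get_classifier` response.

```python
# Field names taken from the response syntax above.
CLASSIFIER_KINDS = ("GrokClassifier", "XMLClassifier", "JsonClassifier")


def classifier_kind(classifier):
    """Return (kind, details) for a Classifier dict.

    `kind` is the name of the first known field found in the dict and
    `details` is the nested dict stored under it.
    """
    for kind in CLASSIFIER_KINDS:
        if kind in classifier:
            return kind, classifier[kind]
    raise ValueError("no known classifier field is present")
```

With a real client this might be used as `classifier_kind(client.get_classifier(Name='my-classifier')['Classifier'])`, where the classifier name is a placeholder.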
{'Classifiers': {'JsonClassifier': {'CreationTime': 'timestamp', 'JsonPath': 'string', 'LastUpdated': 'timestamp', 'Name': 'string', 'Version': 'long'}}}
Lists all classifier objects in the Data Catalog.
See also: AWS API Documentation
Request Syntax
client.get_classifiers( MaxResults=123, NextToken='string' )
MaxResults (integer) --
Size of the list to return (optional).
NextToken (string) --
An optional continuation token.
Return type: dict
Response Syntax
{ 'Classifiers': [ { 'GrokClassifier': { 'Name': 'string', 'Classification': 'string', 'CreationTime': datetime(2015, 1, 1), 'LastUpdated': datetime(2015, 1, 1), 'Version': 123, 'GrokPattern': 'string', 'CustomPatterns': 'string' }, 'XMLClassifier': { 'Name': 'string', 'Classification': 'string', 'CreationTime': datetime(2015, 1, 1), 'LastUpdated': datetime(2015, 1, 1), 'Version': 123, 'RowTag': 'string' }, 'JsonClassifier': { 'Name': 'string', 'CreationTime': datetime(2015, 1, 1), 'LastUpdated': datetime(2015, 1, 1), 'Version': 123, 'JsonPath': 'string' } }, ], 'NextToken': 'string' }
Response Structure
(dict) --
Classifiers (list) --
The requested list of classifier objects.
(dict) --
Classifiers are written in Python and triggered during a crawl task. You can write your own classifiers to best categorize your data sources and specify the appropriate schemas to use for them. A classifier checks whether a given file is in a format it can handle, and if it is, the classifier creates a schema in the form of a StructType object that matches that data format.
A classifier can be a grok classifier, an XML classifier, or a JSON classifier, as specified in one of the fields in the Classifier object.
GrokClassifier (dict) --
A GrokClassifier object.
Name (string) --
The name of the classifier.
Classification (string) --
An identifier of the data format that the classifier matches, such as Twitter, JSON, Omniture logs, and so on.
CreationTime (datetime) --
The time this classifier was registered.
LastUpdated (datetime) --
The time this classifier was last updated.
Version (integer) --
The version of this classifier.
GrokPattern (string) --
The grok pattern applied to a data store by this classifier. For more information, see built-in patterns in Writing Custom Classifiers.
CustomPatterns (string) --
Optional custom grok patterns defined by this classifier. For more information, see custom patterns in Writing Custom Classifiers.
XMLClassifier (dict) --
An XMLClassifier object.
Name (string) --
The name of the classifier.
Classification (string) --
An identifier of the data format that the classifier matches.
CreationTime (datetime) --
The time this classifier was registered.
LastUpdated (datetime) --
The time this classifier was last updated.
Version (integer) --
The version of this classifier.
RowTag (string) --
The XML tag designating the element that contains each record in an XML document being parsed. Note that this cannot identify a self-closing element (closed by /> ). An empty row element that contains only attributes can be parsed as long as it ends with a closing tag (for example, <row item_a="A" item_b="B"></row> is okay, but <row item_a="A" item_b="B" /> is not).
JsonClassifier (dict) --
A JsonClassifier object.
Name (string) --
The name of the classifier.
CreationTime (datetime) --
The time this classifier was registered.
LastUpdated (datetime) --
The time this classifier was last updated.
Version (integer) --
The version of this classifier.
JsonPath (string) --
A JsonPath string defining the JSON data for the classifier to classify. AWS Glue supports a subset of JsonPath, as described in Writing JsonPath Custom Classifiers .
NextToken (string) --
A continuation token.
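The MaxResults and NextToken parameters suggest the usual continuation-token loop. The sketch below assumes that a response without a NextToken marks the last page; `iter_classifiers` and `page_size` are hypothetical names, and `client` is assumed to be a Glue client exposing `get_classifiers`.

```python
def iter_classifiers(client, page_size=None):
    """Yield every classifier dict, following NextToken across pages."""
    kwargs = {}
    if page_size is not None:
        kwargs["MaxResults"] = page_size
    while True:
        page = client.get_classifiers(**kwargs)
        for classifier in page.get("Classifiers", []):
            yield classifier
        token = page.get("NextToken")
        if not token:
            # Assumption: a missing or empty token means no more pages.
            break
        kwargs["NextToken"] = token
```

Each yielded item has the same shape as the Classifier object described above, with one of GrokClassifier, XMLClassifier, or JsonClassifier present.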
{'JsonClassifier': {'JsonPath': 'string', 'Name': 'string'}}
Modifies an existing classifier (a GrokClassifier , XMLClassifier , or JsonClassifier , depending on which field is present).
See also: AWS API Documentation
Request Syntax
client.update_classifier( GrokClassifier={ 'Name': 'string', 'Classification': 'string', 'GrokPattern': 'string', 'CustomPatterns': 'string' }, XMLClassifier={ 'Name': 'string', 'Classification': 'string', 'RowTag': 'string' }, JsonClassifier={ 'Name': 'string', 'JsonPath': 'string' } )
GrokClassifier (dict) --
A GrokClassifier object with updated fields.
Name (string) -- [REQUIRED]
The name of the GrokClassifier .
Classification (string) --
An identifier of the data format that the classifier matches, such as Twitter, JSON, Omniture logs, Amazon CloudWatch Logs, and so on.
GrokPattern (string) --
The grok pattern used by this classifier.
CustomPatterns (string) --
Optional custom grok patterns used by this classifier.
XMLClassifier (dict) --
An XMLClassifier object with updated fields.
Name (string) -- [REQUIRED]
The name of the classifier.
Classification (string) --
An identifier of the data format that the classifier matches.
RowTag (string) --
The XML tag designating the element that contains each record in an XML document being parsed. Note that this cannot identify a self-closing element (closed by /> ). An empty row element that contains only attributes can be parsed as long as it ends with a closing tag (for example, <row item_a="A" item_b="B"></row> is okay, but <row item_a="A" item_b="B" /> is not).
JsonClassifier (dict) --
A JsonClassifier object with updated fields.
Name (string) -- [REQUIRED]
The name of the classifier.
JsonPath (string) --
A JsonPath string defining the JSON data for the classifier to classify. AWS Glue supports a subset of JsonPath, as described in Writing JsonPath Custom Classifiers .
Return type: dict
Response Syntax
{}
Response Structure
(dict) --
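Unlike creation, an update marks only Name as required, so optional fields can be left out of the request. A minimal sketch under that reading follows; `build_json_classifier_update` is a hypothetical helper, and the way the service treats omitted fields is not stated in this changelog.

```python
def build_json_classifier_update(name, json_path=None):
    """Build the keyword arguments for client.update_classifier().

    Name is required; JsonPath is included only when a value is given.
    """
    if not name:
        raise ValueError("Name is required")
    body = {"Name": name}
    if json_path is not None:
        body["JsonPath"] = json_path
    return {"JsonClassifier": body}
```

The result can be passed on as `client.update_classifier(**build_json_classifier_update('my-classifier', '$[*]'))`, where both argument values are placeholders.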