2022/04/27 - Amazon Lookout for Equipment - 1 new and 4 updated API methods
Changes: This release adds the following new features: 1) introduces an option for automatic schema creation; 2) allows ingestion of data containing the most common errors and enables automatic data cleaning; 3) introduces the new ListSensorStatistics API, which gives further information about the ingested data.
Lists statistics about the data collected for each of the sensors that have been successfully ingested in the particular dataset. Can also be used to retrieve sensor statistics for a previous ingestion job.
See also: AWS API Documentation
Request Syntax
client.list_sensor_statistics( DatasetName='string', IngestionJobId='string', MaxResults=123, NextToken='string' )
string
[REQUIRED]
The name of the dataset associated with the list of Sensor Statistics.
string
The ingestion job id associated with the list of Sensor Statistics. To get sensor statistics for a particular ingestion job id, both dataset name and ingestion job id must be submitted as inputs.
integer
Specifies the maximum number of sensors for which to retrieve statistics.
string
An opaque pagination token indicating where to continue the listing of sensor statistics.
dict
Response Syntax
{
    'SensorStatisticsSummaries': [
        {
            'ComponentName': 'string',
            'SensorName': 'string',
            'DataExists': True|False,
            'MissingValues': {
                'Count': 123,
                'Percentage': ...
            },
            'InvalidValues': {
                'Count': 123,
                'Percentage': ...
            },
            'InvalidDateEntries': {
                'Count': 123,
                'Percentage': ...
            },
            'DuplicateTimestamps': {
                'Count': 123,
                'Percentage': ...
            },
            'CategoricalValues': {
                'Status': 'POTENTIAL_ISSUE_DETECTED'|'NO_ISSUE_DETECTED',
                'NumberOfCategory': 123
            },
            'MultipleOperatingModes': {
                'Status': 'POTENTIAL_ISSUE_DETECTED'|'NO_ISSUE_DETECTED'
            },
            'LargeTimestampGaps': {
                'Status': 'POTENTIAL_ISSUE_DETECTED'|'NO_ISSUE_DETECTED',
                'NumberOfLargeTimestampGaps': 123,
                'MaxTimestampGapInDays': 123
            },
            'MonotonicValues': {
                'Status': 'POTENTIAL_ISSUE_DETECTED'|'NO_ISSUE_DETECTED',
                'Monotonicity': 'DECREASING'|'INCREASING'|'STATIC'
            },
            'DataStartTime': datetime(2015, 1, 1),
            'DataEndTime': datetime(2015, 1, 1)
        },
    ],
    'NextToken': 'string'
}
Response Structure
(dict) --
SensorStatisticsSummaries (list) --
Provides ingestion-based statistics regarding the specified sensor with respect to various validation types, such as whether data exists, the number and percentage of missing values, and the number and percentage of duplicate timestamps.
(dict) --
Summary of ingestion statistics like whether data exists, number of missing values, number of invalid values and so on related to the particular sensor.
ComponentName (string) --
Name of the component that the sensor reporting these statistics belongs to.
SensorName (string) --
Name of the sensor that the statistics belong to.
DataExists (boolean) --
Parameter that indicates whether data exists for the sensor that the statistics belong to.
MissingValues (dict) --
Parameter that describes the total number of, and percentage of, values that are missing for the sensor that the statistics belong to.
Count (integer) --
Indicates the count of occurrences of the given statistic.
Percentage (float) --
Indicates the percentage of occurrences of the given statistic.
InvalidValues (dict) --
Parameter that describes the total number of, and percentage of, values that are invalid for the sensor that the statistics belong to.
Count (integer) --
Indicates the count of occurrences of the given statistic.
Percentage (float) --
Indicates the percentage of occurrences of the given statistic.
InvalidDateEntries (dict) --
Parameter that describes the total number of invalid date entries associated with the sensor that the statistics belong to.
Count (integer) --
Indicates the count of occurrences of the given statistic.
Percentage (float) --
Indicates the percentage of occurrences of the given statistic.
DuplicateTimestamps (dict) --
Parameter that describes the total number of duplicate timestamp records associated with the sensor that the statistics belong to.
Count (integer) --
Indicates the count of occurrences of the given statistic.
Percentage (float) --
Indicates the percentage of occurrences of the given statistic.
CategoricalValues (dict) --
Parameter that describes the potential risk that the data associated with the sensor is categorical.
Status (string) --
Indicates whether there is a potential data issue related to categorical values.
NumberOfCategory (integer) --
Indicates the number of categories in the data.
MultipleOperatingModes (dict) --
Parameter that describes the potential risk that the data associated with the sensor has more than one operating mode.
Status (string) --
Indicates whether there is a potential data issue related to having multiple operating modes.
LargeTimestampGaps (dict) --
Parameter that describes the potential risk that the data associated with the sensor contains one or more large gaps between consecutive timestamps.
Status (string) --
Indicates whether there is a potential data issue related to large gaps in timestamps.
NumberOfLargeTimestampGaps (integer) --
Indicates the number of large timestamp gaps, if there are any.
MaxTimestampGapInDays (integer) --
Indicates the size of the largest timestamp gap, in days.
MonotonicValues (dict) --
Parameter that describes the potential risk that the data associated with the sensor is mostly monotonic.
Status (string) --
Indicates whether there is a potential data issue related to having monotonic values.
Monotonicity (string) --
Indicates the monotonicity of values. Can be INCREASING, DECREASING, or STATIC.
DataStartTime (datetime) --
Indicates the start of the time range of valid data for the sensor that the statistics belong to.
DataEndTime (datetime) --
Indicates the end of the time range of valid data for the sensor that the statistics belong to.
NextToken (string) --
An opaque pagination token indicating where to continue the listing of sensor statistics.
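As a usage sketch (not part of the official reference above), the following boto3 snippet pages through ListSensorStatistics for a dataset and prints a per-sensor summary. The dataset name is a placeholder.

import boto3

# Minimal sketch: page through sensor statistics for one dataset.
# 'my-dataset' is a hypothetical dataset name, not taken from the docs above.
client = boto3.client('lookoutequipment')

kwargs = {'DatasetName': 'my-dataset', 'MaxResults': 50}
while True:
    page = client.list_sensor_statistics(**kwargs)
    for stats in page['SensorStatisticsSummaries']:
        print(stats['ComponentName'], stats['SensorName'],
              'data_exists=', stats['DataExists'],
              'missing_count=', stats['MissingValues']['Count'],
              'large_gaps=', stats['LargeTimestampGaps']['Status'])
    token = page.get('NextToken')
    if not token:
        break
    kwargs['NextToken'] = token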
{'DataEndTime': 'timestamp', 'DataQualitySummary': {'DuplicateTimestamps': {'TotalNumberOfDuplicateTimestamps': 'integer'}, 'InsufficientSensorData': {'MissingCompleteSensorData': {'AffectedSensorCount': 'integer'}, 'SensorsWithShortDateRange': {'AffectedSensorCount': 'integer'}}, 'InvalidSensorData': {'AffectedSensorCount': 'integer', 'TotalNumberOfInvalidValues': 'integer'}, 'MissingSensorData': {'AffectedSensorCount': 'integer', 'TotalNumberOfMissingValues': 'integer'}, 'UnsupportedTimestamps': {'TotalNumberOfUnsupportedTimestamps': 'integer'}}, 'DataStartTime': 'timestamp', 'IngestedDataSize': 'long', 'IngestedFilesSummary': {'DiscardedFiles': [{'Bucket': 'string', 'Key': 'string'}], 'IngestedNumberOfFiles': 'integer', 'TotalNumberOfFiles': 'integer'}, 'IngestionInputConfiguration': {'S3InputConfiguration': {'KeyPattern': 'string'}}, 'StatusDetail': 'string'}
Provides information on a specific data ingestion job such as creation time, dataset ARN, and status.
See also: AWS API Documentation
Request Syntax
client.describe_data_ingestion_job( JobId='string' )
string
[REQUIRED]
The job ID of the data ingestion job.
dict
Response Syntax
{
    'JobId': 'string',
    'DatasetArn': 'string',
    'IngestionInputConfiguration': {
        'S3InputConfiguration': {
            'Bucket': 'string',
            'Prefix': 'string',
            'KeyPattern': 'string'
        }
    },
    'RoleArn': 'string',
    'CreatedAt': datetime(2015, 1, 1),
    'Status': 'IN_PROGRESS'|'SUCCESS'|'FAILED',
    'FailedReason': 'string',
    'DataQualitySummary': {
        'InsufficientSensorData': {
            'MissingCompleteSensorData': {
                'AffectedSensorCount': 123
            },
            'SensorsWithShortDateRange': {
                'AffectedSensorCount': 123
            }
        },
        'MissingSensorData': {
            'AffectedSensorCount': 123,
            'TotalNumberOfMissingValues': 123
        },
        'InvalidSensorData': {
            'AffectedSensorCount': 123,
            'TotalNumberOfInvalidValues': 123
        },
        'UnsupportedTimestamps': {
            'TotalNumberOfUnsupportedTimestamps': 123
        },
        'DuplicateTimestamps': {
            'TotalNumberOfDuplicateTimestamps': 123
        }
    },
    'IngestedFilesSummary': {
        'TotalNumberOfFiles': 123,
        'IngestedNumberOfFiles': 123,
        'DiscardedFiles': [
            {
                'Bucket': 'string',
                'Key': 'string'
            },
        ]
    },
    'StatusDetail': 'string',
    'IngestedDataSize': 123,
    'DataStartTime': datetime(2015, 1, 1),
    'DataEndTime': datetime(2015, 1, 1)
}
Response Structure
(dict) --
JobId (string) --
Indicates the job ID of the data ingestion job.
DatasetArn (string) --
The Amazon Resource Name (ARN) of the dataset being used in the data ingestion job.
IngestionInputConfiguration (dict) --
Specifies the S3 location configuration for the data input for the data ingestion job.
S3InputConfiguration (dict) --
The location information for the S3 bucket used for input data for the data ingestion.
Bucket (string) --
The name of the S3 bucket used for the input data for the data ingestion.
Prefix (string) --
The prefix for the S3 location being used for the input data for the data ingestion.
KeyPattern (string) --
Pattern for matching the Amazon S3 files to be used for ingestion. If no KeyPattern is provided, the default hierarchical file structure is used, which is the same as the KeyPattern {prefix}/{component_name}/*.
RoleArn (string) --
The Amazon Resource Name (ARN) of an IAM role with permission to access the data source being ingested.
CreatedAt (datetime) --
The time at which the data ingestion job was created.
Status (string) --
Indicates the status of the DataIngestionJob operation.
FailedReason (string) --
Specifies the reason for failure when a data ingestion job has failed.
DataQualitySummary (dict) --
Gives statistics about a completed ingestion job. These statistics primarily relate to quantifying incorrect data, such as MissingCompleteSensorData, MissingSensorData, UnsupportedTimestamps, InsufficientSensorData, and DuplicateTimestamps.
InsufficientSensorData (dict) --
Parameter that gives information about insufficient data for sensors in the dataset. This includes information about those sensors that have complete data missing and those with a short date range.
MissingCompleteSensorData (dict) --
Parameter that describes the total number of sensors that have data completely missing.
AffectedSensorCount (integer) --
Indicates the number of sensors that have data missing completely.
SensorsWithShortDateRange (dict) --
Parameter that describes the total number of sensors that have a short date range of less than 90 days of data overall.
AffectedSensorCount (integer) --
Indicates the number of sensors that have less than 90 days of data.
MissingSensorData (dict) --
Parameter that gives information about data that is missing over all the sensors in the input data.
AffectedSensorCount (integer) --
Indicates the number of sensors that have at least some data missing.
TotalNumberOfMissingValues (integer) --
Indicates the total number of missing values across all the sensors.
InvalidSensorData (dict) --
Parameter that gives information about data that is invalid over all the sensors in the input data.
AffectedSensorCount (integer) --
Indicates the number of sensors that have at least some invalid values.
TotalNumberOfInvalidValues (integer) --
Indicates the total number of invalid values across all the sensors.
UnsupportedTimestamps (dict) --
Parameter that gives information about unsupported timestamps in the input data.
TotalNumberOfUnsupportedTimestamps (integer) --
Indicates the total number of unsupported timestamps across the ingested data.
DuplicateTimestamps (dict) --
Parameter that gives information about duplicate timestamps in the input data.
TotalNumberOfDuplicateTimestamps (integer) --
Indicates the total number of duplicate timestamps.
IngestedFilesSummary (dict) --
Gives statistics about how many files have been ingested, and which files have not been ingested, for a particular ingestion job.
TotalNumberOfFiles (integer) --
Indicates the total number of files that were submitted for ingestion.
IngestedNumberOfFiles (integer) --
Indicates the number of files that were successfully ingested.
DiscardedFiles (list) --
Lists the files that were discarded. A file could be discarded because its format is invalid (for example, a jpg or pdf) or not readable.
(dict) --
Contains information about an S3 bucket.
Bucket (string) --
The name of the specific S3 bucket.
Key (string) --
The AWS Key Management Service (AWS KMS) key being used to encrypt the S3 object. Without this key, data in the bucket is not accessible.
StatusDetail (string) --
Provides details about the status of the ingestion job that is currently in progress.
IngestedDataSize (integer) --
Indicates the size of the ingested dataset.
DataStartTime (datetime) --
Indicates the earliest timestamp corresponding to data that was successfully ingested during this specific ingestion job.
DataEndTime (datetime) --
Indicates the latest timestamp corresponding to data that was successfully ingested during this specific ingestion job.
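A minimal polling sketch, assuming a job ID returned by a prior StartDataIngestionJob call (the ID and polling interval below are placeholders, not from the reference above):

import time
import boto3

# Minimal sketch: wait for an ingestion job to finish, then inspect data quality.
# The job ID is a hypothetical value returned by StartDataIngestionJob.
client = boto3.client('lookoutequipment')
job_id = 'example-ingestion-job-id'

while True:
    resp = client.describe_data_ingestion_job(JobId=job_id)
    if resp['Status'] != 'IN_PROGRESS':
        break
    time.sleep(30)

if resp['Status'] == 'FAILED':
    print('Ingestion failed:', resp.get('FailedReason'))
else:
    quality = resp['DataQualitySummary']
    print('Missing values:', quality['MissingSensorData']['TotalNumberOfMissingValues'])
    print('Invalid values:', quality['InvalidSensorData']['TotalNumberOfInvalidValues'])
    print('Duplicate timestamps:', quality['DuplicateTimestamps']['TotalNumberOfDuplicateTimestamps'])
    print('Discarded files:', len(resp['IngestedFilesSummary']['DiscardedFiles']))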
{'DataEndTime': 'timestamp', 'DataQualitySummary': {'DuplicateTimestamps': {'TotalNumberOfDuplicateTimestamps': 'integer'}, 'InsufficientSensorData': {'MissingCompleteSensorData': {'AffectedSensorCount': 'integer'}, 'SensorsWithShortDateRange': {'AffectedSensorCount': 'integer'}}, 'InvalidSensorData': {'AffectedSensorCount': 'integer', 'TotalNumberOfInvalidValues': 'integer'}, 'MissingSensorData': {'AffectedSensorCount': 'integer', 'TotalNumberOfMissingValues': 'integer'}, 'UnsupportedTimestamps': {'TotalNumberOfUnsupportedTimestamps': 'integer'}}, 'DataStartTime': 'timestamp', 'IngestedFilesSummary': {'DiscardedFiles': [{'Bucket': 'string', 'Key': 'string'}], 'IngestedNumberOfFiles': 'integer', 'TotalNumberOfFiles': 'integer'}, 'IngestionInputConfiguration': {'S3InputConfiguration': {'KeyPattern': 'string'}}, 'RoleArn': 'string'}
Provides a JSON description of the data in each time series dataset, including names, column names, and data types.
See also: AWS API Documentation
Request Syntax
client.describe_dataset( DatasetName='string' )
string
[REQUIRED]
The name of the dataset to be described.
dict
Response Syntax
{
    'DatasetName': 'string',
    'DatasetArn': 'string',
    'CreatedAt': datetime(2015, 1, 1),
    'LastUpdatedAt': datetime(2015, 1, 1),
    'Status': 'CREATED'|'INGESTION_IN_PROGRESS'|'ACTIVE',
    'Schema': 'string',
    'ServerSideKmsKeyId': 'string',
    'IngestionInputConfiguration': {
        'S3InputConfiguration': {
            'Bucket': 'string',
            'Prefix': 'string',
            'KeyPattern': 'string'
        }
    },
    'DataQualitySummary': {
        'InsufficientSensorData': {
            'MissingCompleteSensorData': {
                'AffectedSensorCount': 123
            },
            'SensorsWithShortDateRange': {
                'AffectedSensorCount': 123
            }
        },
        'MissingSensorData': {
            'AffectedSensorCount': 123,
            'TotalNumberOfMissingValues': 123
        },
        'InvalidSensorData': {
            'AffectedSensorCount': 123,
            'TotalNumberOfInvalidValues': 123
        },
        'UnsupportedTimestamps': {
            'TotalNumberOfUnsupportedTimestamps': 123
        },
        'DuplicateTimestamps': {
            'TotalNumberOfDuplicateTimestamps': 123
        }
    },
    'IngestedFilesSummary': {
        'TotalNumberOfFiles': 123,
        'IngestedNumberOfFiles': 123,
        'DiscardedFiles': [
            {
                'Bucket': 'string',
                'Key': 'string'
            },
        ]
    },
    'RoleArn': 'string',
    'DataStartTime': datetime(2015, 1, 1),
    'DataEndTime': datetime(2015, 1, 1)
}
Response Structure
(dict) --
DatasetName (string) --
The name of the dataset being described.
DatasetArn (string) --
The Amazon Resource Name (ARN) of the dataset being described.
CreatedAt (datetime) --
Specifies the time the dataset was created in Amazon Lookout for Equipment.
LastUpdatedAt (datetime) --
Specifies the time the dataset was last updated, if it was.
Status (string) --
Indicates the status of the dataset.
Schema (string) --
A JSON description of the data that is in each time series dataset, including names, column names, and data types.
ServerSideKmsKeyId (string) --
Provides the identifier of the KMS key used to encrypt dataset data by Amazon Lookout for Equipment.
IngestionInputConfiguration (dict) --
Specifies the S3 location configuration for the data input for the data ingestion job.
S3InputConfiguration (dict) --
The location information for the S3 bucket used for input data for the data ingestion.
Bucket (string) --
The name of the S3 bucket used for the input data for the data ingestion.
Prefix (string) --
The prefix for the S3 location being used for the input data for the data ingestion.
KeyPattern (string) --
Pattern for matching the Amazon S3 files to be used for ingestion. If no KeyPattern is provided, the default hierarchical file structure is used, which is the same as the KeyPattern {prefix}/{component_name}/*.
DataQualitySummary (dict) --
Gives statistics associated with the given dataset for the latest successful ingestion job. These statistics primarily relate to quantifying incorrect data, such as MissingCompleteSensorData, MissingSensorData, UnsupportedTimestamps, InsufficientSensorData, and DuplicateTimestamps.
InsufficientSensorData (dict) --
Parameter that gives information about insufficient data for sensors in the dataset. This includes information about those sensors that have complete data missing and those with a short date range.
MissingCompleteSensorData (dict) --
Parameter that describes the total number of sensors that have data completely missing.
AffectedSensorCount (integer) --
Indicates the number of sensors that have data missing completely.
SensorsWithShortDateRange (dict) --
Parameter that describes the total number of sensors that have a short date range of less than 90 days of data overall.
AffectedSensorCount (integer) --
Indicates the number of sensors that have less than 90 days of data.
MissingSensorData (dict) --
Parameter that gives information about data that is missing over all the sensors in the input data.
AffectedSensorCount (integer) --
Indicates the number of sensors that have at least some data missing.
TotalNumberOfMissingValues (integer) --
Indicates the total number of missing values across all the sensors.
InvalidSensorData (dict) --
Parameter that gives information about data that is invalid over all the sensors in the input data.
AffectedSensorCount (integer) --
Indicates the number of sensors that have at least some invalid values.
TotalNumberOfInvalidValues (integer) --
Indicates the total number of invalid values across all the sensors.
UnsupportedTimestamps (dict) --
Parameter that gives information about unsupported timestamps in the input data.
TotalNumberOfUnsupportedTimestamps (integer) --
Indicates the total number of unsupported timestamps across the ingested data.
DuplicateTimestamps (dict) --
Parameter that gives information about duplicate timestamps in the input data.
TotalNumberOfDuplicateTimestamps (integer) --
Indicates the total number of duplicate timestamps.
IngestedFilesSummary (dict) --
IngestedFilesSummary associated with the given dataset for the latest successful ingestion job.
TotalNumberOfFiles (integer) --
Indicates the total number of files that were submitted for ingestion.
IngestedNumberOfFiles (integer) --
Indicates the number of files that were successfully ingested.
DiscardedFiles (list) --
Lists the files that were discarded. A file could be discarded because its format is invalid (for example, a jpg or pdf) or not readable.
(dict) --
Contains information about an S3 bucket.
Bucket (string) --
The name of the specific S3 bucket.
Key (string) --
The AWS Key Management Service (AWS KMS) key being used to encrypt the S3 object. Without this key, data in the bucket is not accessible.
RoleArn (string) --
The Amazon Resource Name (ARN) of the IAM role that you are using for the data ingestion job.
DataStartTime (datetime) --
Indicates the earliest timestamp corresponding to data that was successfully ingested during the most recent ingestion of this particular dataset.
DataEndTime (datetime) --
Indicates the latest timestamp corresponding to data that was successfully ingested during the most recent ingestion of this particular dataset.
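A sketch of describing a dataset and reading its schema. The dataset name is a placeholder, and the assumption that the Schema string parses to a JSON object with a top-level 'Components' list is based on the service's inline data schema format, not on the reference above.

import json
import boto3

# Minimal sketch: describe a dataset and list its components and columns.
client = boto3.client('lookoutequipment')

resp = client.describe_dataset(DatasetName='my-dataset')
print('Status:', resp['Status'])

# Assumption: the Schema JSON has a top-level 'Components' list with
# 'ComponentName' and 'Columns' entries; adjust to your actual schema.
schema = json.loads(resp['Schema'])
for component in schema.get('Components', []):
    columns = [col['Name'] for col in component.get('Columns', [])]
    print(component.get('ComponentName'), columns)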
{'DataIngestionJobSummaries': {'IngestionInputConfiguration': {'S3InputConfiguration': {'KeyPattern': 'string'}}}}
Provides a list of all data ingestion jobs, including dataset name and ARN, S3 location of the input data, status, and so on.
See also: AWS API Documentation
Request Syntax
client.list_data_ingestion_jobs( DatasetName='string', NextToken='string', MaxResults=123, Status='IN_PROGRESS'|'SUCCESS'|'FAILED' )
string
The name of the dataset being used for the data ingestion job.
string
An opaque pagination token indicating where to continue the listing of data ingestion jobs.
integer
Specifies the maximum number of data ingestion jobs to list.
string
Indicates the status of the data ingestion job.
dict
Response Syntax
{
    'NextToken': 'string',
    'DataIngestionJobSummaries': [
        {
            'JobId': 'string',
            'DatasetName': 'string',
            'DatasetArn': 'string',
            'IngestionInputConfiguration': {
                'S3InputConfiguration': {
                    'Bucket': 'string',
                    'Prefix': 'string',
                    'KeyPattern': 'string'
                }
            },
            'Status': 'IN_PROGRESS'|'SUCCESS'|'FAILED'
        },
    ]
}
Response Structure
(dict) --
NextToken (string) --
An opaque pagination token indicating where to continue the listing of data ingestion jobs.
DataIngestionJobSummaries (list) --
Specifies information about the specific data ingestion job, including dataset name and status.
(dict) --
Provides information about a specified data ingestion job, including dataset information, data ingestion configuration, and status.
JobId (string) --
Indicates the job ID of the data ingestion job.
DatasetName (string) --
The name of the dataset used for the data ingestion job.
DatasetArn (string) --
The Amazon Resource Name (ARN) of the dataset used in the data ingestion job.
IngestionInputConfiguration (dict) --
Specifies information about the input data for the data ingestion job, including Amazon S3 location parameters.
S3InputConfiguration (dict) --
The location information for the S3 bucket used for input data for the data ingestion.
Bucket (string) --
The name of the S3 bucket used for the input data for the data ingestion.
Prefix (string) --
The prefix for the S3 location being used for the input data for the data ingestion.
KeyPattern (string) --
Pattern for matching the Amazon S3 files to be used for ingestion. If no KeyPattern is provided, the default hierarchical file structure is used, which is the same as the KeyPattern {prefix}/{component_name}/*.
Status (string) --
Indicates the status of the data ingestion job.
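As a sketch (placeholder dataset name), the following lists failed ingestion jobs for a dataset and the S3 locations they read from, following NextToken manually:

import boto3

# Minimal sketch: list failed ingestion jobs for a hypothetical dataset.
client = boto3.client('lookoutequipment')

kwargs = {'DatasetName': 'my-dataset', 'Status': 'FAILED', 'MaxResults': 50}
while True:
    page = client.list_data_ingestion_jobs(**kwargs)
    for job in page['DataIngestionJobSummaries']:
        s3 = job['IngestionInputConfiguration']['S3InputConfiguration']
        print(job['JobId'], 's3://{}/{}'.format(s3['Bucket'], s3.get('Prefix', '')))
    token = page.get('NextToken')
    if not token:
        break
    kwargs['NextToken'] = token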
{'IngestionInputConfiguration': {'S3InputConfiguration': {'KeyPattern': 'string'}}}
Starts a data ingestion job. Amazon Lookout for Equipment returns the job status.
See also: AWS API Documentation
Request Syntax
client.start_data_ingestion_job(
    DatasetName='string',
    IngestionInputConfiguration={
        'S3InputConfiguration': {
            'Bucket': 'string',
            'Prefix': 'string',
            'KeyPattern': 'string'
        }
    },
    RoleArn='string',
    ClientToken='string'
)
string
[REQUIRED]
The name of the dataset being used by the data ingestion job.
dict
[REQUIRED]
Specifies information for the input data for the data ingestion job, including dataset S3 location.
S3InputConfiguration (dict) -- [REQUIRED]
The location information for the S3 bucket used for input data for the data ingestion.
Bucket (string) -- [REQUIRED]
The name of the S3 bucket used for the input data for the data ingestion.
Prefix (string) --
The prefix for the S3 location being used for the input data for the data ingestion.
KeyPattern (string) --
Pattern for matching the Amazon S3 files to be used for ingestion. If no KeyPattern is provided, the default hierarchical file structure is used, which is the same as the KeyPattern {prefix}/{component_name}/*.
string
[REQUIRED]
The Amazon Resource Name (ARN) of a role with permission to access the data source for the data ingestion job.
string
[REQUIRED]
A unique identifier for the request. If you do not set the client request token, Amazon Lookout for Equipment generates one.
This field is autopopulated if not provided.
dict
Response Syntax
{ 'JobId': 'string', 'Status': 'IN_PROGRESS'|'SUCCESS'|'FAILED' }
Response Structure
(dict) --
JobId (string) --
Indicates the job ID of the data ingestion job.
Status (string) --
Indicates the status of the StartDataIngestionJob operation.
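A sketch of starting an ingestion job with an explicit KeyPattern. The dataset name, bucket, prefix, and role ARN below are placeholders, not values from the reference above.

import uuid
import boto3

# Minimal sketch: start an ingestion job. All resource names below are
# hypothetical; replace them with your own dataset, bucket, and role.
client = boto3.client('lookoutequipment')

resp = client.start_data_ingestion_job(
    DatasetName='my-dataset',
    IngestionInputConfiguration={
        'S3InputConfiguration': {
            'Bucket': 'my-sensor-data-bucket',
            'Prefix': 'plant-a/',
            'KeyPattern': '{prefix}/{component_name}/*'
        }
    },
    RoleArn='arn:aws:iam::111122223333:role/MyLookoutEquipmentRole',
    ClientToken=str(uuid.uuid4())
)
print(resp['JobId'], resp['Status'])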