2018/08/01 - Amazon Transcribe Service - 2 updated api methods
Changes: With this update, Amazon Transcribe now supports channel identification. It transcribes audio from separate channels and combines them into a single transcription.
{'TranscriptionJob': {'Settings': {'ChannelIdentification': 'boolean'}}}
Returns information about a transcription job. To see the status of the job, check the TranscriptionJobStatus field. If the status is COMPLETED, the job is finished and you can find the results at the location specified in the TranscriptFileUri field.
See also: AWS API Documentation
Request Syntax
client.get_transcription_job( TranscriptionJobName='string' )
TranscriptionJobName (string) --
[REQUIRED]
The name of the job.
dict
Response Syntax
{
    'TranscriptionJob': {
        'TranscriptionJobName': 'string',
        'TranscriptionJobStatus': 'IN_PROGRESS'|'FAILED'|'COMPLETED',
        'LanguageCode': 'en-US'|'es-US',
        'MediaSampleRateHertz': 123,
        'MediaFormat': 'mp3'|'mp4'|'wav'|'flac',
        'Media': {
            'MediaFileUri': 'string'
        },
        'Transcript': {
            'TranscriptFileUri': 'string'
        },
        'CreationTime': datetime(2015, 1, 1),
        'CompletionTime': datetime(2015, 1, 1),
        'FailureReason': 'string',
        'Settings': {
            'VocabularyName': 'string',
            'ShowSpeakerLabels': True|False,
            'MaxSpeakerLabels': 123,
            'ChannelIdentification': True|False
        }
    }
}
Response Structure
(dict) --
TranscriptionJob (dict) --
An object that contains the results of the transcription job.
TranscriptionJobName (string) --
The name of the transcription job.
TranscriptionJobStatus (string) --
The status of the transcription job.
LanguageCode (string) --
The language code for the input speech.
MediaSampleRateHertz (integer) --
The sample rate, in Hertz, of the audio track in the input media file.
MediaFormat (string) --
The format of the input media file.
Media (dict) --
An object that describes the input media for the transcription job.
MediaFileUri (string) --
The S3 location of the input media file. The URI must be in the same region as the API endpoint that you are calling. The general form is:
https://s3-<aws-region>.amazonaws.com/<bucket-name>/<keyprefix>/<objectkey>
For example:
https://s3-us-east-1.amazonaws.com/examplebucket/example.mp4
https://s3-us-east-1.amazonaws.com/examplebucket/mediadocs/example.mp4
For more information about S3 object names, see Object Keys in the Amazon S3 Developer Guide.
Transcript (dict) --
An object that describes the output of the transcription job.
TranscriptFileUri (string) --
The location where the transcription is stored.
Use this URI to access the transcription. If you specified an S3 bucket in the OutputBucketName field when you created the job, this is the URI of that bucket. If you chose to store the transcription in Amazon Transcribe, this is a shareable URL that provides secure access to that location.
CreationTime (datetime) --
A timestamp that shows when the job was created.
CompletionTime (datetime) --
A timestamp that shows when the job was completed.
FailureReason (string) --
If the TranscriptionJobStatus field is FAILED, this field contains information about why the job failed.
Settings (dict) --
Optional settings for the transcription job. Use these settings to turn on speaker recognition, to set the maximum number of speakers that should be identified, and to specify a custom vocabulary to use when processing the transcription job.
VocabularyName (string) --
The name of a vocabulary to use when processing the transcription job.
ShowSpeakerLabels (boolean) --
Determines whether the transcription job uses speaker recognition to identify different speakers in the input audio. Speaker recognition labels individual speakers in the audio file. If you set the ShowSpeakerLabels field to true, you must also set the maximum number of speaker labels in the MaxSpeakerLabels field.
You can't set both ShowSpeakerLabels and ChannelIdentification in the same request. If you set both, your request returns a BadRequestException.
MaxSpeakerLabels (integer) --
The maximum number of speakers to identify in the input audio. If there are more speakers in the audio than this number, multiple speakers will be identified as a single speaker. If you specify the MaxSpeakerLabels field, you must set the ShowSpeakerLabels field to true.
ChannelIdentification (boolean) --
Instructs Amazon Transcribe to process each audio channel separately and then merge the transcription output of each channel into a single transcription.
Amazon Transcribe also produces a transcription of each item detected on an audio channel. Each item includes its start time and end time, along with alternative transcriptions and the confidence that Amazon Transcribe has in each one.
You can't set both ShowSpeakerLabels and ChannelIdentification in the same request. If you set both, your request returns a BadRequestException.
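The status check described for get_transcription_job can be sketched as a small polling loop. This is a minimal illustration, not part of the SDK: the helper name wait_for_transcription_job, the polling interval, and the attempt limit are assumptions chosen for the example. It relies only on the TranscriptionJobStatus, FailureReason, and TranscriptFileUri fields documented above.

```python
import time


def wait_for_transcription_job(client, job_name, poll_seconds=5.0,
                               max_attempts=120, sleep=time.sleep):
    """Poll get_transcription_job until the job is COMPLETED or FAILED.

    `client` is any object exposing get_transcription_job, for example
    boto3.client("transcribe"). The helper name, interval and attempt
    limit are hypothetical choices for this sketch.
    """
    for _ in range(max_attempts):
        response = client.get_transcription_job(TranscriptionJobName=job_name)
        job = response["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            # The results are at the location given in TranscriptFileUri.
            return job["Transcript"]["TranscriptFileUri"]
        if status == "FAILED":
            # FailureReason is populated when the status is FAILED.
            raise RuntimeError(job.get("FailureReason", "unknown failure"))
        sleep(poll_seconds)  # still IN_PROGRESS, wait and ask again
    raise TimeoutError(f"job {job_name!r} did not finish in time")
```

With a real client, the call would look like wait_for_transcription_job(boto3.client("transcribe"), "my-job"). The sleep function is injectable so the loop can be exercised without waiting.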
{'Settings': {'ChannelIdentification': 'boolean'}}
Response
{'TranscriptionJob': {'Settings': {'ChannelIdentification': 'boolean'}}}
Starts an asynchronous job to transcribe speech to text.
See also: AWS API Documentation
Request Syntax
client.start_transcription_job(
    TranscriptionJobName='string',
    LanguageCode='en-US'|'es-US',
    MediaSampleRateHertz=123,
    MediaFormat='mp3'|'mp4'|'wav'|'flac',
    Media={
        'MediaFileUri': 'string'
    },
    OutputBucketName='string',
    Settings={
        'VocabularyName': 'string',
        'ShowSpeakerLabels': True|False,
        'MaxSpeakerLabels': 123,
        'ChannelIdentification': True|False
    }
)
TranscriptionJobName (string) --
[REQUIRED]
The name of the job. You can't use the strings "." or ".." in the job name. The name must be unique within an AWS account.
LanguageCode (string) --
[REQUIRED]
The language code for the language used in the input media file.
MediaSampleRateHertz (integer) --
The sample rate, in Hertz, of the audio track in the input media file.
MediaFormat (string) --
[REQUIRED]
The format of the input media file.
Media (dict) --
[REQUIRED]
An object that describes the input media for a transcription job.
MediaFileUri (string) --
The S3 location of the input media file. The URI must be in the same region as the API endpoint that you are calling. The general form is:
https://s3-<aws-region>.amazonaws.com/<bucket-name>/<keyprefix>/<objectkey>
For example:
https://s3-us-east-1.amazonaws.com/examplebucket/example.mp4
https://s3-us-east-1.amazonaws.com/examplebucket/mediadocs/example.mp4
For more information about S3 object names, see Object Keys in the Amazon S3 Developer Guide.
OutputBucketName (string) --
The location where the transcription is stored.
If you set the OutputBucketName, Amazon Transcribe puts the transcription in the specified S3 bucket. When you call the GetTranscriptionJob operation, the operation returns this location in the TranscriptFileUri field. The S3 bucket must have permissions that allow Amazon Transcribe to put files in the bucket. For more information, see Permissions Required for IAM User Roles.
If you don't set the OutputBucketName, Amazon Transcribe generates a pre-signed URL, a shareable URL that provides secure access to your transcription, and returns it in the TranscriptFileUri field. Use this URL to download the transcription.
Settings (dict) --
A Settings object that provides optional settings for a transcription job.
VocabularyName (string) --
The name of a vocabulary to use when processing the transcription job.
ShowSpeakerLabels (boolean) --
Determines whether the transcription job uses speaker recognition to identify different speakers in the input audio. Speaker recognition labels individual speakers in the audio file. If you set the ShowSpeakerLabels field to true, you must also set the maximum number of speaker labels in the MaxSpeakerLabels field.
You can't set both ShowSpeakerLabels and ChannelIdentification in the same request. If you set both, your request returns a BadRequestException.
MaxSpeakerLabels (integer) --
The maximum number of speakers to identify in the input audio. If there are more speakers in the audio than this number, multiple speakers will be identified as a single speaker. If you specify the MaxSpeakerLabels field, you must set the ShowSpeakerLabels field to true.
ChannelIdentification (boolean) --
Instructs Amazon Transcribe to process each audio channel separately and then merge the transcription output of each channel into a single transcription.
Amazon Transcribe also produces a transcription of each item detected on an audio channel. Each item includes its start time and end time, along with alternative transcriptions and the confidence that Amazon Transcribe has in each one.
You can't set both ShowSpeakerLabels and ChannelIdentification in the same request. If you set both, your request returns a BadRequestException.
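The constraints on the Settings object can be checked on the client side before calling start_transcription_job. The sketch below is one possible way to do that. The helper names build_settings and build_start_request are hypothetical, and the checks only mirror the rules stated above; the service remains the authority and may still reject a request. The job-name check treats "." and ".." as disallowed whole names, which is one reading of the rule above.

```python
def build_settings(vocabulary_name=None, show_speaker_labels=False,
                   max_speaker_labels=None, channel_identification=False):
    """Build a Settings dict, enforcing the documented field rules.

    Hypothetical helper for illustration; not part of the SDK.
    """
    if show_speaker_labels and channel_identification:
        # Documented: setting both returns a BadRequestException.
        raise ValueError(
            "ShowSpeakerLabels and ChannelIdentification are mutually exclusive")
    if show_speaker_labels and max_speaker_labels is None:
        raise ValueError("ShowSpeakerLabels requires MaxSpeakerLabels")
    if max_speaker_labels is not None and not show_speaker_labels:
        raise ValueError("MaxSpeakerLabels requires ShowSpeakerLabels=True")

    settings = {}
    if vocabulary_name is not None:
        settings["VocabularyName"] = vocabulary_name
    if show_speaker_labels:
        settings["ShowSpeakerLabels"] = True
        settings["MaxSpeakerLabels"] = max_speaker_labels
    if channel_identification:
        settings["ChannelIdentification"] = True
    return settings


def build_start_request(job_name, media_file_uri, media_format,
                        language_code="en-US", **settings_kwargs):
    """Assemble keyword arguments for client.start_transcription_job."""
    if job_name in (".", ".."):
        # Assumption: the rule is read as forbidding these exact names.
        raise ValueError('job name cannot be "." or ".."')
    request = {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "MediaFormat": media_format,
        "Media": {"MediaFileUri": media_file_uri},
    }
    settings = build_settings(**settings_kwargs)
    if settings:  # Settings is optional, so omit it when empty
        request["Settings"] = settings
    return request
```

A request built this way can be passed as client.start_transcription_job(**build_start_request(...)), where client is a Transcribe client such as boto3.client("transcribe").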
dict
Response Syntax
{
    'TranscriptionJob': {
        'TranscriptionJobName': 'string',
        'TranscriptionJobStatus': 'IN_PROGRESS'|'FAILED'|'COMPLETED',
        'LanguageCode': 'en-US'|'es-US',
        'MediaSampleRateHertz': 123,
        'MediaFormat': 'mp3'|'mp4'|'wav'|'flac',
        'Media': {
            'MediaFileUri': 'string'
        },
        'Transcript': {
            'TranscriptFileUri': 'string'
        },
        'CreationTime': datetime(2015, 1, 1),
        'CompletionTime': datetime(2015, 1, 1),
        'FailureReason': 'string',
        'Settings': {
            'VocabularyName': 'string',
            'ShowSpeakerLabels': True|False,
            'MaxSpeakerLabels': 123,
            'ChannelIdentification': True|False
        }
    }
}
Response Structure
(dict) --
TranscriptionJob (dict) --
An object containing details of the asynchronous transcription job.
TranscriptionJobName (string) --
The name of the transcription job.
TranscriptionJobStatus (string) --
The status of the transcription job.
LanguageCode (string) --
The language code for the input speech.
MediaSampleRateHertz (integer) --
The sample rate, in Hertz, of the audio track in the input media file.
MediaFormat (string) --
The format of the input media file.
Media (dict) --
An object that describes the input media for the transcription job.
MediaFileUri (string) --
The S3 location of the input media file. The URI must be in the same region as the API endpoint that you are calling. The general form is:
https://s3-<aws-region>.amazonaws.com/<bucket-name>/<keyprefix>/<objectkey>
For example:
https://s3-us-east-1.amazonaws.com/examplebucket/example.mp4
https://s3-us-east-1.amazonaws.com/examplebucket/mediadocs/example.mp4
For more information about S3 object names, see Object Keys in the Amazon S3 Developer Guide.
Transcript (dict) --
An object that describes the output of the transcription job.
TranscriptFileUri (string) --
The location where the transcription is stored.
Use this URI to access the transcription. If you specified an S3 bucket in the OutputBucketName field when you created the job, this is the URI of that bucket. If you chose to store the transcription in Amazon Transcribe, this is a shareable URL that provides secure access to that location.
CreationTime (datetime) --
A timestamp that shows when the job was created.
CompletionTime (datetime) --
A timestamp that shows when the job was completed.
FailureReason (string) --
If the TranscriptionJobStatus field is FAILED, this field contains information about why the job failed.
Settings (dict) --
Optional settings for the transcription job. Use these settings to turn on speaker recognition, to set the maximum number of speakers that should be identified, and to specify a custom vocabulary to use when processing the transcription job.
VocabularyName (string) --
The name of a vocabulary to use when processing the transcription job.
ShowSpeakerLabels (boolean) --
Determines whether the transcription job uses speaker recognition to identify different speakers in the input audio. Speaker recognition labels individual speakers in the audio file. If you set the ShowSpeakerLabels field to true, you must also set the maximum number of speaker labels in the MaxSpeakerLabels field.
You can't set both ShowSpeakerLabels and ChannelIdentification in the same request. If you set both, your request returns a BadRequestException.
MaxSpeakerLabels (integer) --
The maximum number of speakers to identify in the input audio. If there are more speakers in the audio than this number, multiple speakers will be identified as a single speaker. If you specify the MaxSpeakerLabels field, you must set the ShowSpeakerLabels field to true.
ChannelIdentification (boolean) --
Instructs Amazon Transcribe to process each audio channel separately and then merge the transcription output of each channel into a single transcription.
Amazon Transcribe also produces a transcription of each item detected on an audio channel. Each item includes its start time and end time, along with alternative transcriptions and the confidence that Amazon Transcribe has in each one.
You can't set both ShowSpeakerLabels and ChannelIdentification in the same request. If you set both, your request returns a BadRequestException.
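The per-channel items described for ChannelIdentification (start time, end time, alternatives with a confidence) could be read out of a downloaded transcript roughly as follows. This changelog does not define the transcript file layout, so the JSON keys used here (results, channel_labels, channels, channel_label, items, alternatives, content, confidence, start_time, end_time) are an assumption about the output format, and items_by_channel is a hypothetical helper. Verify the keys against an actual transcript file before relying on them.

```python
def items_by_channel(transcript_doc):
    """Group transcript items per channel, keeping the best alternative.

    Assumes (not confirmed by this changelog) a layout like:
    {"results": {"channel_labels": {"channels": [
        {"channel_label": "ch_0", "items": [...]}]}}}
    Returns {channel_label: [(start_time, end_time, content, confidence)]}.
    """
    channels = (transcript_doc.get("results", {})
                .get("channel_labels", {})
                .get("channels", []))
    grouped = {}
    for channel in channels:
        rows = []
        for item in channel.get("items", []):
            alternatives = item.get("alternatives", [])
            if not alternatives:
                continue
            # Pick the alternative with the highest confidence; a missing
            # or null confidence is treated as 0.0.
            best = max(alternatives,
                       key=lambda alt: float(alt.get("confidence") or 0.0))
            rows.append((
                item.get("start_time"),   # may be absent for some items
                item.get("end_time"),
                best.get("content"),
                float(best.get("confidence") or 0.0),
            ))
        grouped[channel.get("channel_label")] = rows
    return grouped
```

The input is the parsed JSON document fetched from the TranscriptFileUri location, for example the result of json.load on the downloaded file.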