Amazon Simple Storage Service

2018/06/12 - Amazon Simple Storage Service - 1 new API method

Changes: Adds support for S3 Select.

SelectObjectContent (new)

This operation filters the contents of an Amazon S3 object based on a simple Structured Query Language (SQL) statement. In the request, along with the SQL expression, you must also specify a data serialization format (JSON or CSV) of the object. Amazon S3 uses this to parse object data into records, and returns only records that match the specified SQL expression. You must also specify the data serialization format for the response.

See also: AWS API Documentation
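
As a rough illustration of the operation described above, the sketch below runs a SQL expression against a CSV object and concatenates the record bytes that come back. The helper name `select_csv` is hypothetical, and `client` is assumed to be a boto3 S3 client (for example `boto3.client('s3')`); the serialization settings are one plausible choice, not the only one.

```python
def select_csv(client, bucket, key, expression):
    """Run an S3 Select query against a CSV object and return the result as text.

    `client` is assumed to be a boto3 S3 client; this helper is illustrative only.
    """
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        Expression=expression,
        ExpressionType='SQL',
        # Treat the first line of the object as column names.
        InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},
        # Ask for CSV back; JSON is the other documented option.
        OutputSerialization={'CSV': {}},
    )
    chunks = []
    for event in response['Payload']:
        # Each event carries exactly one top-level key; only Records holds data.
        if 'Records' in event:
            chunks.append(event['Records']['Payload'])
    return b''.join(chunks).decode('utf-8')
```

A call might look like `select_csv(client, 'my-bucket', 'data/people.csv', 'SELECT * FROM S3Object s')`, where the bucket and key are placeholders.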

Request Syntax

client.select_object_content(
    Bucket='string',
    Key='string',
    SSECustomerAlgorithm='string',
    SSECustomerKey=b'bytes',
    SSECustomerKeyMD5='string',
    Expression='string',
    ExpressionType='SQL',
    RequestProgress={
        'Enabled': True|False
    },
    InputSerialization={
        'CSV': {
            'FileHeaderInfo': 'USE'|'IGNORE'|'NONE',
            'Comments': 'string',
            'QuoteEscapeCharacter': 'string',
            'RecordDelimiter': 'string',
            'FieldDelimiter': 'string',
            'QuoteCharacter': 'string'
        },
        'CompressionType': 'NONE'|'GZIP',
        'JSON': {
            'Type': 'DOCUMENT'|'LINES'
        }
    },
    OutputSerialization={
        'CSV': {
            'QuoteFields': 'ALWAYS'|'ASNEEDED',
            'QuoteEscapeCharacter': 'string',
            'RecordDelimiter': 'string',
            'FieldDelimiter': 'string',
            'QuoteCharacter': 'string'
        },
        'JSON': {
            'RecordDelimiter': 'string'
        }
    }
)

type Bucket

string

param Bucket

[REQUIRED] The S3 Bucket.

type Key

string

param Key

[REQUIRED] The Object Key.

type SSECustomerAlgorithm

string

param SSECustomerAlgorithm

The SSE Algorithm used to encrypt the object. For more information, see Server-Side Encryption (Using Customer-Provided Encryption Keys).

type SSECustomerKey

bytes

param SSECustomerKey

The SSE Customer Key. For more information, see Server-Side Encryption (Using Customer-Provided Encryption Keys).

type SSECustomerKeyMD5

string

param SSECustomerKeyMD5

The SSE Customer Key MD5. For more information, see Server-Side Encryption (Using Customer-Provided Encryption Keys).
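
The three SSE parameters only apply to objects encrypted with a customer-provided key. A minimal sketch of assembling them follows, assuming the AES256 algorithm and a raw 32-byte key. Both helper names are hypothetical. The sketch leaves out SSECustomerKeyMD5 on the assumption that the SDK derives it when omitted, which is worth verifying against your SDK version; `key_md5` only shows how the base64-encoded digest that S3 expects can be computed.

```python
import base64
import hashlib


def sse_c_params(customer_key):
    """Build the SSE-C keyword arguments for select_object_content.

    Assumes AES256 and a raw 32-byte key; hypothetical helper.
    """
    if len(customer_key) != 32:
        raise ValueError('an AES256 customer key must be exactly 32 bytes')
    return {
        'SSECustomerAlgorithm': 'AES256',
        'SSECustomerKey': customer_key,
    }


def key_md5(customer_key):
    """Return the base64-encoded MD5 digest of the raw key bytes."""
    return base64.b64encode(hashlib.md5(customer_key).digest()).decode('ascii')
```

The resulting dict can be merged into the other arguments, e.g. `client.select_object_content(Bucket=..., Key=..., **sse_c_params(key), ...)`.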

type Expression

string

param Expression

[REQUIRED] The expression that is used to query the object.

type ExpressionType

string

param ExpressionType

[REQUIRED] The type of the provided expression (e.g., SQL).
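
To make the Expression and ExpressionType pair concrete, here is a small sketch that builds an equality filter. The helper names and the column used in the usage note are hypothetical. It assumes the queried object is addressed as `S3Object` with an alias, and that string literals follow the usual SQL convention of doubling single quotes. The column name is inserted as-is, so it should come from trusted input.

```python
def sql_literal(value):
    """Quote a Python string as a SQL string literal (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def expression_params(column, value):
    """Return the Expression/ExpressionType pair for a simple equality filter.

    The `S3Object s` alias and the filter shape are illustrative.
    """
    expression = (
        'SELECT * FROM S3Object s '
        'WHERE s."{0}" = {1}'.format(column, sql_literal(value))
    )
    return {'Expression': expression, 'ExpressionType': 'SQL'}
```

For example, `expression_params('city', 'Seattle')` produces the expression `SELECT * FROM S3Object s WHERE s."city" = 'Seattle'`.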

type RequestProgress

dict

param RequestProgress

Specifies if periodic request progress information should be enabled.

  • Enabled (boolean) -- Specifies whether periodic QueryProgress frames should be sent. Valid values: True, False. Default value: False.
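
When progress reporting is enabled, the event stream may include Progress events alongside the records. The sketch below shows one way to turn such an event into a status line. `describe_progress` and `REQUEST_PROGRESS` are hypothetical names, and the object size is assumed to be known to the caller (for instance from an earlier metadata lookup), since it is not part of the event.

```python
# Value to pass as the RequestProgress argument.
REQUEST_PROGRESS = {'Enabled': True}


def describe_progress(event, object_size):
    """Format a Progress event as a readable line, or return None for other events.

    `object_size` is the object's size in bytes, supplied by the caller.
    """
    if 'Progress' not in event:
        return None
    details = event['Progress']['Details']
    scanned = details['BytesScanned']
    percent = 100.0 * scanned / object_size if object_size else 0.0
    return 'scanned {0} of {1} bytes ({2:.1f}%), returned {3} bytes'.format(
        scanned, object_size, percent, details['BytesReturned'])
```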

type InputSerialization

dict

param InputSerialization

[REQUIRED] Describes the format of the data in the object that is being queried.

  • CSV (dict) -- Describes the serialization of a CSV-encoded object.

    • FileHeaderInfo (string) -- Describes the first line of input. Valid values: NONE, IGNORE, USE.

    • Comments (string) -- Single character used to indicate a row should be ignored when present at the start of a row.

    • QuoteEscapeCharacter (string) -- Single character used for escaping the quote character inside an already escaped value.

    • RecordDelimiter (string) -- Value used to separate individual records.

    • FieldDelimiter (string) -- Value used to separate individual fields in a record.

    • QuoteCharacter (string) -- Value used for escaping where the field delimiter is part of the value.

  • CompressionType (string) -- Specifies the object's compression format. Valid values: NONE, GZIP. Default value: NONE.

  • JSON (dict) -- Specifies JSON as the object's input serialization format.

    • Type (string) -- The type of JSON. Valid values: DOCUMENT, LINES.
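
The InputSerialization structure above can be assembled with small builder functions. This is a sketch only: `csv_input` and `json_input` are hypothetical helpers, and the defaults chosen here (header row in use, comma delimiter, newline-delimited JSON) are assumptions about the data rather than service defaults.

```python
def csv_input(header='USE', delimiter=',', gzipped=False):
    """Build an InputSerialization dict for a CSV object."""
    if header not in ('USE', 'IGNORE', 'NONE'):
        raise ValueError('header must be USE, IGNORE or NONE')
    return {
        'CSV': {'FileHeaderInfo': header, 'FieldDelimiter': delimiter},
        'CompressionType': 'GZIP' if gzipped else 'NONE',
    }


def json_input(lines=True, gzipped=False):
    """Build an InputSerialization dict for a JSON object.

    `lines=True` selects the LINES type; `lines=False` selects DOCUMENT.
    """
    return {
        'JSON': {'Type': 'LINES' if lines else 'DOCUMENT'},
        'CompressionType': 'GZIP' if gzipped else 'NONE',
    }
```

Only one of the CSV and JSON keys is populated per request in this sketch, matching the fact that an object has a single format.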

type OutputSerialization

dict

param OutputSerialization

[REQUIRED] Describes the format of the data that you want Amazon S3 to return in response.

  • CSV (dict) -- Describes the serialization of CSV-encoded Select results.

    • QuoteFields (string) -- Indicates whether or not all output fields should be quoted. Valid values: ALWAYS, ASNEEDED.

    • QuoteEscapeCharacter (string) -- Single character used for escaping the quote character inside an already escaped value.

    • RecordDelimiter (string) -- Value used to separate individual records.

    • FieldDelimiter (string) -- Value used to separate individual fields in a record.

    • QuoteCharacter (string) -- Value used for escaping where the field delimiter is part of the value.

  • JSON (dict) -- Specifies JSON as the request's output serialization format.

    • RecordDelimiter (string) -- The value used to separate individual records in the output.
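
Requesting JSON output with an explicit record delimiter makes the result easy to parse on the client. The following sketch pairs such an OutputSerialization value with a parser for the fully assembled result bytes. `JSON_OUTPUT`, `CSV_OUTPUT` and `parse_json_records` are hypothetical names, and the newline delimiter is a choice made for this example.

```python
import json

# Candidate OutputSerialization values; pass one of them in the request.
JSON_OUTPUT = {'JSON': {'RecordDelimiter': '\n'}}
CSV_OUTPUT = {'CSV': {'QuoteFields': 'ASNEEDED', 'FieldDelimiter': ','}}


def parse_json_records(payload, delimiter='\n'):
    """Decode a complete JSON result payload into a list of Python objects.

    `payload` must be the concatenation of all Records payloads, because a
    single Records event may end in the middle of a record.
    """
    text = payload.decode('utf-8')
    return [json.loads(part) for part in text.split(delimiter) if part.strip()]
```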

rtype

dict

returns

The response of this operation contains an :class:`.EventStream` member. When iterated, the :class:`.EventStream` yields events based on the structure below, where only one of the top-level keys is present for any given event.

Response Syntax

{
    'Payload': EventStream({
        'Records': {
            'Payload': b'bytes'
        },
        'Stats': {
            'Details': {
                'BytesScanned': 123,
                'BytesProcessed': 123,
                'BytesReturned': 123
            }
        },
        'Progress': {
            'Details': {
                'BytesScanned': 123,
                'BytesProcessed': 123,
                'BytesReturned': 123
            }
        },
        'Cont': {},
        'End': {}
    })
}

Response Structure

  • (dict) --

    • Payload (:class:`.EventStream`) --

      • Records (dict) -- The Records Event.

        • Payload (bytes) -- A byte array containing one or more result records, the last of which may be partial.

      • Stats (dict) -- The Stats Event.

        • Details (dict) -- The Stats event details.

          • BytesScanned (integer) -- Total number of object bytes scanned.

          • BytesProcessed (integer) -- Total number of uncompressed object bytes processed.

          • BytesReturned (integer) -- Total number of bytes of records payload data returned.

      • Progress (dict) -- The Progress Event.

        • Details (dict) -- The Progress event details.

          • BytesScanned (integer) -- Current number of object bytes scanned.

          • BytesProcessed (integer) -- Current number of uncompressed object bytes processed.

          • BytesReturned (integer) -- Current number of bytes of records payload data returned.

      • Cont (dict) -- The Continuation Event.

      • End (dict) -- The End Event.
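
Because a Records payload can stop in the middle of a record, a consumer typically buffers bytes across events. The sketch below is one way to do that: it splits on a record delimiter, keeps the final Stats details, and treats a stream without an End event as incomplete, in line with the S3 guidance not to assume completion before End arrives. `consume_events` is a hypothetical helper, and the newline delimiter is assumed to match the OutputSerialization used in the request.

```python
def consume_events(event_stream, record_delimiter=b'\n'):
    """Collect complete records and final stats from a SelectObjectContent stream.

    Returns a (records, stats) tuple, where records is a list of bytes and
    stats is the Stats event's Details dict (or None if no Stats event came).
    Raises RuntimeError if the stream finishes without an End event.
    """
    records = []
    stats = None
    buffer = b''
    ended = False
    for event in event_stream:
        if 'Records' in event:
            buffer += event['Records']['Payload']
            # Keep the unfinished tail in the buffer for the next event.
            *complete, buffer = buffer.split(record_delimiter)
            records.extend(complete)
        elif 'Stats' in event:
            stats = event['Stats']['Details']
        elif 'End' in event:
            ended = True
        # Progress and Cont events are ignored in this sketch.
    if not ended:
        raise RuntimeError('stream closed before the End event; result may be incomplete')
    if buffer:
        # Final record that was not followed by a delimiter.
        records.append(buffer)
    return records, stats
```

Typical usage would be `records, stats = consume_events(response['Payload'])`, with `response` being the return value of `select_object_content`.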