Amazon S3 Tables has introduced a major update by embedding an Apache Iceberg REST endpoint, addressing earlier concerns about vendor lock-in and integration challenges. This enhancement simplifies interoperability with open-source query engines and strengthens AWS’s commitment to open data standards.
Key Improvements:
- Iceberg REST Catalog Integration: Enables seamless interaction with S3 Tables using standard Iceberg APIs.
- Better Ecosystem Support: Facilitates integration with tools like PyIceberg, Trino, and Spark.
- Reduced Vendor Lock-in: Aligns with open-source standards, making data portability easier.
🔗 Docs: Accessing Amazon S3 Tables from Open-Source Query Engines
You Should Know:
1. Setting Up Iceberg REST Catalog with AWS S3
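To connect to S3 Tables via Iceberg REST, use the following Python (PyIceberg) example. The region, account ID, bucket name, and namespace are placeholders:
from pyiceberg.catalog import load_catalog

# Placeholders: replace the region and table bucket ARN with your own.
catalog = load_catalog(
    "s3_tables",
    **{
        "type": "rest",
        "uri": "https://s3tables.us-east-1.amazonaws.com/iceberg",
        "warehouse": "arn:aws:s3tables:us-east-1:111122223333:bucket/my-table-bucket",
        "rest.sigv4-enabled": "true",  # sign requests with AWS SigV4
        "rest.signing-name": "s3tables",
        "rest.signing-region": "us-east-1",
    },
)

# list_tables() needs a namespace; "my_namespace" is a placeholder.
tables = catalog.list_tables("my_namespace")
print(tables)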
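PyIceberg's `list_tables` takes a namespace, so enumerating every table means walking the namespaces first. The sketch below shows that loop against any object exposing `list_namespaces()` and `list_tables(namespace)`. The names `CatalogLike`, `all_table_names`, and `StubCatalog` are hypothetical, and the stub exists only so the example runs without AWS credentials; it covers top-level namespaces only.

```python
from typing import Iterable, Protocol, Tuple


class CatalogLike(Protocol):
    """The two catalog methods this sketch relies on."""

    def list_namespaces(self) -> Iterable[Tuple[str, ...]]: ...

    def list_tables(self, namespace: Tuple[str, ...]) -> Iterable[Tuple[str, ...]]: ...


def all_table_names(catalog: CatalogLike) -> list:
    """Return sorted dotted names for every table in each top-level namespace."""
    names = []
    for namespace in catalog.list_namespaces():
        for identifier in catalog.list_tables(namespace):
            names.append(".".join(identifier))
    return sorted(names)


class StubCatalog:
    """Hypothetical in-memory stand-in so the sketch runs without AWS access."""

    def __init__(self, tables_by_namespace: dict):
        self._tables = tables_by_namespace

    def list_namespaces(self):
        return [(name,) for name in self._tables]

    def list_tables(self, namespace):
        return [namespace + (table,) for table in self._tables[namespace[0]]]


if __name__ == "__main__":
    stub = StubCatalog({"sales": ["orders", "refunds"], "ops": ["events"]})
    print(all_table_names(stub))
```

With a real catalog object, the same call would be `all_table_names(catalog)`, assuming the catalog was loaded with working credentials.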
2. Querying S3 Tables with Trino
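Configure Trino to use the Iceberg REST catalog in `etc/catalog/iceberg.properties`. The region and table bucket ARN are placeholders, and the SigV4 property names differ between Trino releases, so check them against the Iceberg connector documentation for your version:
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=https://s3tables.us-east-1.amazonaws.com/iceberg
iceberg.rest-catalog.warehouse=arn:aws:s3tables:us-east-1:111122223333:bucket/my-table-bucket
iceberg.rest-catalog.security=SIGV4
iceberg.rest-catalog.signing-name=s3tables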
Then query directly:
SELECT * FROM iceberg.s3_tables.my_table;
3. Managing S3 Tables via AWS CLI
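Inspect buckets and fetch Iceberg metadata with `aws s3api`. These commands target general purpose buckets; table buckets are managed through the separate `aws s3tables` command group:
# List available buckets
aws s3api list-buckets
# Fetch Iceberg metadata into a local file (get-object requires an output path)
aws s3api get-object --bucket my-bucket --key metadata/metadata.json metadata.json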
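Once a metadata file has been downloaded, a few top-level fields are usually enough for a first look. The sketch below summarizes an Iceberg table metadata document using field names from the Iceberg table spec (`format-version`, `location`, `current-snapshot-id`, `schemas`, `snapshots`). The function name `summarize_metadata` and the sample document are made up for illustration.

```python
import json

# Hypothetical, heavily trimmed metadata document for demonstration only.
SAMPLE_METADATA = """
{
  "format-version": 2,
  "location": "s3://example-bucket/warehouse/my_table",
  "current-schema-id": 0,
  "schemas": [
    {"schema-id": 0, "type": "struct", "fields": [
      {"id": 1, "name": "id", "required": true, "type": "long"},
      {"id": 2, "name": "created_at", "required": false, "type": "timestamp"}
    ]}
  ],
  "current-snapshot-id": 101,
  "snapshots": [
    {"snapshot-id": 100, "timestamp-ms": 1700000000000},
    {"snapshot-id": 101, "timestamp-ms": 1700000600000}
  ]
}
"""


def summarize_metadata(metadata: dict) -> dict:
    """Pull out the fields most useful when checking a table by hand."""
    schemas = {schema["schema-id"]: schema for schema in metadata.get("schemas", [])}
    current_schema = schemas.get(metadata.get("current-schema-id"), {})
    return {
        "format_version": metadata.get("format-version"),
        "location": metadata.get("location"),
        "current_snapshot_id": metadata.get("current-snapshot-id"),
        "snapshot_count": len(metadata.get("snapshots", [])),
        "columns": [field["name"] for field in current_schema.get("fields", [])],
    }


if __name__ == "__main__":
    summary = summarize_metadata(json.loads(SAMPLE_METADATA))
    for key, value in summary.items():
        print(f"{key}: {value}")
```

To run it against a downloaded file, replace the sample with `json.load(open("metadata.json"))`, assuming that is the path the file was saved to.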
4. Spark Integration
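Use Spark to read/write Iceberg tables in S3. The REST catalog is registered through Spark configuration rather than per-read options:
// Assumes a catalog named "s3_tables" is already registered, e.g.
//   spark.sql.catalog.s3_tables      = org.apache.iceberg.spark.SparkCatalog
//   spark.sql.catalog.s3_tables.type = rest
//   spark.sql.catalog.s3_tables.uri  = <S3 Tables Iceberg REST endpoint>
// "my_namespace" is a placeholder namespace.
val df = spark.read
  .format("iceberg")
  .load("s3_tables.my_namespace.my_table")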
What Undercode Say:
The addition of Iceberg REST support in S3 Tables is a game-changer for data engineers. It bridges the gap between AWS services and open-source tools, reducing friction in data workflows. However, challenges remain:
- Hidden File Management: S3 Tables still obscure underlying files, complicating debugging.
- Permission Handling: Fine-grained access control requires deeper AWS IAM integration.
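For Linux/IT practitioners, the following commands help with day-to-day operations:
# Look up the IAM user whose permissions you want to review
aws iam get-user --user-name data_engineer
# Load table metadata via the REST API; snapshots appear under "metadata" in the response
# ({prefix}, {namespace}, {table} are placeholders; requests must be SigV4-signed)
curl -X GET "https://s3tables.us-east-1.amazonaws.com/iceberg/v1/{prefix}/namespaces/{namespace}/tables/{table}"
# Show the access logging configuration for a bucket
aws s3api get-bucket-logging --bucket my-data-lake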
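Here is a sketch of reading snapshot information from the load-table response defined by the Iceberg REST specification (`GET /v1/{prefix}/namespaces/{namespace}/tables/{table}`), whose `metadata` object carries a `snapshots` list. The function names and the sample response are hypothetical. Treating the prefix as the URL-encoded table bucket ARN is an assumption to verify against the AWS documentation, and request signing is left out.

```python
from urllib.parse import quote


def load_table_url(base_uri: str, prefix: str, namespace: str, table: str) -> str:
    """Build the load-table URL, percent-encoding each path segment."""
    encoded = [quote(part, safe="") for part in (prefix, namespace, table)]
    return (
        f"{base_uri.rstrip('/')}/v1/{encoded[0]}"
        f"/namespaces/{encoded[1]}/tables/{encoded[2]}"
    )


def snapshots_from_response(response: dict) -> list:
    """Return (snapshot_id, timestamp_ms, operation) tuples, oldest first."""
    snapshots = response.get("metadata", {}).get("snapshots", [])
    rows = [
        (
            snap["snapshot-id"],
            snap["timestamp-ms"],
            snap.get("summary", {}).get("operation", "unknown"),
        )
        for snap in snapshots
    ]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    # Placeholder ARN and names; substitute your own values.
    url = load_table_url(
        "https://s3tables.us-east-1.amazonaws.com/iceberg",
        "arn:aws:s3tables:us-east-1:111122223333:bucket/my-table-bucket",
        "my_namespace",
        "my_table",
    )
    print(url)

    # Hypothetical response body, trimmed to the fields used here.
    sample = {
        "metadata": {
            "snapshots": [
                {"snapshot-id": 2, "timestamp-ms": 200, "summary": {"operation": "overwrite"}},
                {"snapshot-id": 1, "timestamp-ms": 100, "summary": {"operation": "append"}},
            ]
        }
    }
    for row in snapshots_from_response(sample):
        print(row)
```

Keeping URL construction separate from response parsing means the parsing half can be exercised offline, while the HTTP call and SigV4 signing are handled by whichever client you already use.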
Expected Output:
A unified, open-standard approach to managing S3 Tables, fostering interoperability across data platforms. 🚀
References:
Reported By: Royhasson When – Hackers Feeds