Hosting data lakes in the cloud is a common practice, and optimizing storage costs while ensuring data accessibility is crucial. Storing data in a compressed format saves space, but decompressing files on-demand can improve usability. AWS Lambda, combined with S3 event triggers, provides a serverless solution for automatic decompression.
You Should Know:
1. Setting Up S3 Bucket and Lambda Function
First, create an S3 bucket and configure it to trigger a Lambda function upon file upload.
AWS CLI Commands:
# Create an S3 bucket
aws s3 mb s3://your-data-lake-bucket

# Create a Lambda deployment package (Python example)
zip lambda_function.zip lambda_function.py

# Create the Lambda function (python3.8 has been deprecated by Lambda, so a current runtime is used)
aws lambda create-function \
  --function-name UnzipFiles \
  --runtime python3.12 \
  --handler lambda_function.handler \
  --role arn:aws:iam::123456789012:role/lambda-s3-role \
  --zip-file fileb://lambda_function.zip

# Add S3 trigger to Lambda
aws lambda add-permission \
  --function-name UnzipFiles \
  --statement-id s3-trigger \
  --action "lambda:InvokeFunction" \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::your-data-lake-bucket

aws s3api put-bucket-notification-configuration \
  --bucket your-data-lake-bucket \
  --notification-configuration file://notification.json
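The last command reads a `notification.json` file that the post does not show. A minimal sketch of how such a file might be generated follows. The function ARN reuses the placeholder account, region, and function name from this post, and the `.zip` suffix filter and the helper name `build_notification_config` are illustrative assumptions rather than part of the original setup.

```python
import json

# Assumed placeholder ARN, assembled from the account ID, region, and
# function name used elsewhere in this post.
FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:UnzipFiles"


def build_notification_config(function_arn, suffix=".zip"):
    """Return an S3 notification configuration that invokes the Lambda
    function for newly created objects whose key ends with `suffix`."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": function_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "suffix", "Value": suffix}]
                    }
                },
            }
        ]
    }


if __name__ == "__main__":
    # Write the file that put-bucket-notification-configuration expects.
    with open("notification.json", "w") as fh:
        json.dump(build_notification_config(FUNCTION_ARN), fh, indent=2)
```

Filtering on the suffix means that extracted files that are not archives do not invoke the function a second time.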
2. Lambda Function Code (Python)
Here's a sample Python script that decompresses ZIP files automatically:
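import io
import zipfile
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client('s3')


def handler(event, context):
    # A single S3 notification can carry several records; handle each one.
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 event payloads.
        key = unquote_plus(record['s3']['object']['key'])
        if not key.endswith('.zip'):
            continue
        # The whole archive is read into memory, so it must fit in the
        # function's configured memory.
        zip_obj = s3.get_object(Bucket=bucket, Key=key)
        buffer = io.BytesIO(zip_obj['Body'].read())
        with zipfile.ZipFile(buffer) as zip_ref:
            for name in zip_ref.namelist():
                if name.endswith('/'):
                    continue  # skip directory entries
                # Stream each member to the extracted/ prefix.
                with zip_ref.open(name) as member:
                    s3.upload_fileobj(member, bucket, f"extracted/{name}")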
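The extraction step can be exercised locally without AWS. The following is a rough sketch, not the post's code: `plan_uploads` and `make_sample_zip` are hypothetical helpers. `plan_uploads` maps archive members to target keys the same way the handler does, and it additionally skips members whose names would escape the `extracted/` prefix, a safeguard the original script does not include.

```python
import io
import posixpath
import zipfile


def plan_uploads(zip_bytes, prefix="extracted/"):
    """Return {target_key: content} for every regular file in the archive."""
    uploads = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            name = posixpath.normpath(info.filename)
            # Added safeguard: ignore absolute paths and parent-directory escapes.
            if name == ".." or name.startswith("../") or name.startswith("/"):
                continue
            uploads[prefix + name] = archive.read(info)
    return uploads


def make_sample_zip():
    """Build a small in-memory archive for local testing."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("data/a.csv", "id,value\n1,10\n")
        archive.writestr("b.txt", "hello")
        archive.writestr("../evil.txt", "should be skipped")
    return buf.getvalue()


if __name__ == "__main__":
    for key in sorted(plan_uploads(make_sample_zip())):
        print(key)
```

In a real handler, each returned key and payload would be passed to an S3 upload call instead of being collected in a dictionary.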
3. Cost Considerations
- S3 Costs: ~$0.023 per GB per month (Standard Storage)
- Lambda Costs: $0.0000166667 per GB-second of compute; the rate depends on configured memory and duration, not on the Python runtime
- Tradeoffs: For frequent small files, Lambda costs remain low. For large-scale operations, monitor execution time.
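To make the tradeoff concrete, here is a back-of-the-envelope sketch that uses only the two rates listed above. The workload figures (100,000 invocations, 2 seconds each, 512 MB of memory, 500 GB stored) are made-up illustrations, and the estimate leaves out request charges, data transfer, and any free tier.

```python
# Rates taken from the list above.
LAMBDA_PRICE_PER_GB_SECOND = 0.0000166667
S3_PRICE_PER_GB_MONTH = 0.023


def lambda_compute_cost(invocations, avg_seconds, memory_mb):
    """Compute cost = invocations x duration x memory (in GB) x rate."""
    gb_seconds = invocations * avg_seconds * (memory_mb / 1024)
    return gb_seconds * LAMBDA_PRICE_PER_GB_SECOND


def s3_storage_cost(gigabytes, months=1):
    """Storage cost = size x monthly rate x number of months."""
    return gigabytes * S3_PRICE_PER_GB_MONTH * months


if __name__ == "__main__":
    # Hypothetical workload, for illustration only.
    print(f"Lambda compute: ${lambda_compute_cost(100_000, 2, 512):.2f}")
    print(f"S3 storage:     ${s3_storage_cost(500):.2f}")
```

With these assumed numbers, compute comes to roughly $1.67 and a month of storage to $11.50, which suggests that keeping both the archive and its extracted copy is likely to cost more than the decompression itself.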
4. Automating with EventBridge (Advanced)
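For more flexible routing, S3 events can be delivered through Amazon EventBridge. The commands below create the rule and its target; the bucket must also have EventBridge notifications enabled, and the function needs a permission that allows `events.amazonaws.com` to invoke it.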
aws events put-rule \
--name "S3-Zip-Processing" \
--event-pattern "{\"source\":[\"aws.s3\"],\"detail-type\":[\"Object Created\"]}"
aws events put-targets \
--rule S3-Zip-Processing \
--targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:123456789012:function:UnzipFiles"
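Events that arrive through EventBridge have a different shape from direct S3 notifications: the bucket and key sit under `detail` instead of `Records`, so a handler that reads `event['Records']` would need adjusting. A minimal sketch of a hypothetical helper, `iter_s3_objects`, that accepts both shapes is shown here. Treating the EventBridge key as already decoded is an assumption worth verifying against a real event.

```python
from urllib.parse import unquote_plus


def iter_s3_objects(event):
    """Yield (bucket, key) pairs from an S3 notification or an
    EventBridge "Object Created" event."""
    if "Records" in event:
        # Direct S3 notification: keys are URL-encoded.
        for record in event["Records"]:
            yield (
                record["s3"]["bucket"]["name"],
                unquote_plus(record["s3"]["object"]["key"]),
            )
    elif event.get("source") == "aws.s3" and "detail" in event:
        detail = event["detail"]
        # Assumption: the key is used as delivered, without URL decoding.
        yield detail["bucket"]["name"], detail["object"]["key"]


if __name__ == "__main__":
    sample = {
        "source": "aws.s3",
        "detail-type": "Object Created",
        "detail": {
            "bucket": {"name": "your-data-lake-bucket"},
            "object": {"key": "incoming/archive.zip"},
        },
    }
    print(list(iter_s3_objects(sample)))
```

The handler body can then loop over `iter_s3_objects(event)` regardless of which trigger delivered the event.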
What Undercode Say
Automating file extraction in S3 using Lambda is efficient but requires monitoring:
– Use AWS CloudWatch to track Lambda invocations:
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Invocations \
  --dimensions Name=FunctionName,Value=UnzipFiles \
  --start-time 2023-10-01T00:00:00Z \
  --end-time 2023-10-02T00:00:00Z \
  --period 3600 \
  --statistics Sum
– Optimize Lambda Memory: Adjust memory settings for faster decompression:
aws lambda update-function-configuration \
  --function-name UnzipFiles \
  --memory-size 512
– Clean Up Extracted Files: Schedule S3 lifecycle policies:
aws s3api put-bucket-lifecycle-configuration \
  --bucket your-data-lake-bucket \
  --lifecycle-configuration file://lifecycle.json
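The `lifecycle.json` file referenced in the last command is not shown in the post. One possible shape, sketched under assumptions: the rule ID, the `extracted/` prefix scope, and the 30-day expiration are illustrative choices, and `build_lifecycle_config` is a hypothetical helper.

```python
import json


def build_lifecycle_config(prefix="extracted/", expire_after_days=30):
    """Return a lifecycle configuration that expires objects under
    `prefix` after `expire_after_days` days."""
    return {
        "Rules": [
            {
                "ID": "expire-extracted-files",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


if __name__ == "__main__":
    # Write the file that put-bucket-lifecycle-configuration expects.
    with open("lifecycle.json", "w") as fh:
        json.dump(build_lifecycle_config(), fh, indent=2)
```

Scoping the rule to the `extracted/` prefix leaves the original compressed archives untouched while the decompressed copies age out.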
For large-scale data lakes, consider AWS Glue or EMR for batch processing.
Expected Output:
- Decompressed files in `s3://your-data-lake-bucket/extracted/`
- CloudWatch logs for Lambda executions
- Cost-optimized storage with lifecycle policies
Reference: How to Extract ZIP Files in an Amazon S3 Data Lake with AWS Lambda
References:
Reported By: Darryl Ruggles – Hackers Feeds