Listing AWS S3 Buckets with Lambda and OpenGraph

Using managed services on AWS allows you to create powerful applications with minimal provisioning. This example demonstrates how to use AWS Lambda to build a sitemap of files in an S3 bucket using an event-driven approach. The process is automated with Terraform for easy deployment and cleanup.

Key Components:

  • AWS Lambda – Serverless compute to process S3 events.
  • Amazon S3 – Storage for files and generated sitemap.
  • OpenGraph Protocol – Defines metadata for web content.
  • Terraform – Infrastructure as Code (IaC) for deployment.

You Should Know:

1. Terraform Setup for AWS Lambda & S3

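provider "aws" {
  region = "us-east-1"
}

# Assumption: the original snippet references aws_iam_role.lambda_role without
# defining it. This minimal role only lets the Lambda service assume it; policies
# for S3 read/write and CloudWatch Logs still have to be attached separately.
resource "aws_iam_role" "lambda_role" {
  name = "s3_sitemap_generator_role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

resource "aws_lambda_function" "sitemap_generator" {
  filename      = "lambda_function.zip"
  function_name = "s3_sitemap_generator"
  role          = aws_iam_role.lambda_role.arn
  handler       = "lambda_function.lambda_handler"
  runtime       = "python3.12" # python3.8 is no longer a supported Lambda runtime
}

resource "aws_s3_bucket" "data_bucket" {
  # Bucket names are globally unique, so replace this placeholder.
  bucket = "my-sitemap-bucket"
  # The inline acl argument is deprecated since AWS provider v4;
  # new buckets are private by default.
}

resource "aws_lambda_permission" "allow_s3" {
  statement_id  = "AllowExecutionFromS3"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.sitemap_generator.function_name
  principal     = "s3.amazonaws.com"
  source_arn    = aws_s3_bucket.data_bucket.arn
}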

2. Python Lambda Function for Sitemap Generation

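import boto3
from urllib.parse import unquote_plus
from opengraph import OpenGraph  # third-party; must be bundled in lambda_function.zip

s3 = boto3.client('s3')

SITEMAP_KEY = "sitemap.xml"

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 event notifications
        key = unquote_plus(record['s3']['object']['key'])

        # Skip the sitemap itself so writing it cannot re-trigger this function
        if key == SITEMAP_KEY:
            continue

        # Fetch OpenGraph metadata
        obj = s3.get_object(Bucket=bucket, Key=key)
        html_content = obj['Body'].read().decode('utf-8')
        og_data = OpenGraph(html=html_content)

        # Generate sitemap entry
        sitemap_entry = f"<url><loc>{og_data.url}</loc><title>{og_data.title}</title></url>"

        # Append to sitemap.xml: put_object overwrites the object,
        # so read the existing body first and write both back
        try:
            existing = s3.get_object(Bucket=bucket, Key=SITEMAP_KEY)['Body'].read().decode('utf-8')
        except s3.exceptions.NoSuchKey:
            existing = ""

        s3.put_object(
            Bucket=bucket,
            Key=SITEMAP_KEY,
            Body=existing + sitemap_entry + "\n",
            ContentType="application/xml"
        )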
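
The handler imports the third-party opengraph package, which is not part of the Lambda Python runtime and has to be shipped inside the deployment zip. Where that is inconvenient, the same tags can be read with the standard library. The following is a minimal sketch, assuming pages carry `<meta property="og:...">` tags; the names `OGParser` and `extract_og` are hypothetical and do not come from the original example.

```python
from html.parser import HTMLParser


class OGParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first occurrence of each property, without the "og:" prefix
            self.tags.setdefault(prop[3:], content)


def extract_og(html):
    """Return a dict such as {"title": ..., "url": ...} from an HTML string."""
    parser = OGParser()
    parser.feed(html)
    parser.close()
    return parser.tags
```

A page without OpenGraph tags yields an empty dict, so the caller can decide whether to skip the file or fall back to the object key.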

3. Triggering Lambda on S3 Upload

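resource "aws_s3_bucket_notification" "bucket_notification" {
  bucket = aws_s3_bucket.data_bucket.id

  lambda_function {
    lambda_function_arn = aws_lambda_function.sitemap_generator.arn
    events              = ["s3:ObjectCreated:*"]
    # Assumption: pages end in .html. The filter keeps the function's own
    # write of sitemap.xml from triggering it again.
    filter_suffix       = ".html"
  }

  # S3 checks the invoke permission when the notification is created.
  depends_on = [aws_lambda_permission.allow_s3]
}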
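
On the Lambda side, each upload arrives as a JSON event with one or more records. The sketch below shows how the bucket and key can be pulled out of that payload; the helper name `objects_from_event` is hypothetical, and the sample event is trimmed to the fields used here (real events carry more).

```python
from urllib.parse import unquote_plus


def objects_from_event(event):
    """Yield (bucket, key) pairs from an S3 event notification payload."""
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Keys are URL-encoded in the event (a space arrives as '+')
        key = unquote_plus(s3_info["object"]["key"])
        yield bucket, key


# Trimmed example payload for local experimentation
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-sitemap-bucket"},
                "object": {"key": "posts/hello+world.html"},
            },
        }
    ]
}

print(list(objects_from_event(sample_event)))
# [('my-sitemap-bucket', 'posts/hello world.html')]
```

Decoding the key matters because passing the encoded form to `get_object` fails for any object whose name contains spaces or other escaped characters.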

4. Deploying with Terraform

terraform init 
terraform plan 
terraform apply -auto-approve 

5. Destroying Resources

terraform destroy -auto-approve 

What Undercode Say:

This approach uses AWS serverless services to process files as they are uploaded, with no servers to manage. Terraform makes the deployment reproducible, and the OpenGraph tags in each page supply the URL and title for its sitemap entry. Possible improvements:
  • Add error handling in Lambda for malformed HTML.
  • Use DynamoDB to track processed files.
  • Check CloudWatch Logs when debugging; the function's execution role needs permission to write them.
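
The first and last suggestions can be combined. This is a minimal sketch, assuming one bad file should be logged and skipped rather than failing the whole invocation; `process_records` and its `process_one` callback are hypothetical names, and the list of caught exceptions is a guess at what malformed input might raise.

```python
import logging

# Records written through the logging module in Lambda end up in CloudWatch Logs
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def process_records(records, process_one):
    """Run process_one on each record; log and collect failures instead of aborting."""
    failed = []
    for record in records:
        key = record.get("s3", {}).get("object", {}).get("key", "<unknown>")
        try:
            process_one(record)
            logger.info("processed %s", key)
        except (UnicodeDecodeError, KeyError, ValueError) as exc:
            # Malformed or non-HTML input: note it and move on to the next record
            logger.warning("skipping %s: %s", key, exc)
            failed.append(key)
    return failed
```

Returning the failed keys leaves room for a later step, such as writing them to a tracking table, without changing the loop.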

Expected Output:

A dynamically updated `sitemap.xml` in your S3 bucket, listing all processed files with OpenGraph metadata.
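
For crawlers to accept the file, it has to be a well-formed document with a single `<urlset>` root in the sitemaps.org namespace; the protocol's `<url>` element defines `<loc>`, `<lastmod>`, `<changefreq>` and `<priority>`, but no `<title>`. The sketch below shows one way to merge a new URL into an existing sitemap with the standard library. The function name `add_url` is hypothetical, and it assumes the whole sitemap fits in memory.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", NS)  # serialize as the default namespace, without a prefix


def add_url(existing_xml, loc):
    """Return sitemap XML containing loc, creating the document if needed."""
    if existing_xml:
        root = ET.fromstring(existing_xml)
    else:
        root = ET.Element(f"{{{NS}}}urlset")

    # Skip URLs that are already listed so re-uploads do not create duplicates
    known = {el.text for el in root.iter(f"{{{NS}}}loc")}
    if loc not in known:
        url = ET.SubElement(root, f"{{{NS}}}url")
        ET.SubElement(url, f"{{{NS}}}loc").text = loc

    return ET.tostring(root, encoding="unicode")
```

Because this is a read-modify-write cycle on a single S3 object, two uploads processed at the same moment can still overwrite each other's entry; limiting the function's concurrency is one way to reduce that risk.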

Prediction:

As serverless adoption grows, more enterprises will shift to event-driven architectures for real-time data processing, reducing operational overhead.

Reference: AWS S3 & Lambda Example

Reported By: Darryl Ruggles – Hackers Feeds