How to Query a CSV File Stored in AWS S Using Athena

Listen to this Post

AWS Athena is a powerful serverless query service that enables you to analyze data directly in Amazon S3 using standard SQL. It eliminates the need for infrastructure management and allows quick querying of structured and semi-structured data, including CSV files.

You Should Know:

1. Setting Up Athena for CSV Querying

Before querying a CSV file in S3, ensure you have:
– An AWS account with S3 and Athena access.
– The CSV file uploaded to an S3 bucket.
– Proper IAM permissions for Athena and S3.

2. Creating a Database and Table in Athena

Run the following SQL in the Athena Query Editor:

CREATE DATABASE my_athena_db;

Next, define a table schema for your CSV:

CREATE EXTERNAL TABLE my_athena_db.my_csv_table (
column1 STRING,
column2 INT,
column3 DOUBLE
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = ',',
'quoteChar' = '"',
'escapeChar' = '\'
)
STORED AS TEXTFILE
LOCATION 's3://your-bucket-name/path-to-csv/';

3. Querying the CSV File

Once the table is created, run SQL queries:

SELECT  FROM my_athena_db.my_csv_table LIMIT 10;

For filtering:

SELECT column1, column2 FROM my_athena_db.my_csv_table WHERE column3 > 100;

4. Optimizing Athena Queries

  • Partitioning: Improves query performance by reducing scanned data.
    PARTITIONED BY (year STRING, month STRING)
    
  • Compression: Use GZIP or Snappy for large files.
  • File Formats: Convert CSV to Parquet for better performance.

5. Useful AWS CLI Commands

Upload a CSV to S3:

aws s3 cp local-file.csv s3://your-bucket-name/path/

List Athena tables:

aws athena list-table-metadata --catalog-name AwsDataCatalog --database-name my_athena_db

6. Automating Queries with Lambda

Trigger Athena queries via AWS Lambda (Python example):

import boto3

client = boto3.client('athena')

response = client.start_query_execution(
QueryString='SELECT  FROM my_athena_db.my_csv_table',
QueryExecutionContext={'Database': 'my_athena_db'},
ResultConfiguration={'OutputLocation': 's3://query-results-bucket/'}
)

What Undercode Say:

AWS Athena is a game-changer for quick, serverless SQL analytics on S3 data. By leveraging partitioning, efficient file formats, and automation, you can optimize costs and performance. For large-scale datasets, consider integrating AWS Glue for schema management.

Expected Output:

  • A queryable Athena table linked to your S3 CSV.
  • Faster analytics without managing servers.
  • Seamless integration with AWS data ecosystem.

Reference:

How to Query a CSV File Stored in AWS S3 Using Athena | Built In

References:

Reported By: Darryl Ruggles – Hackers Feeds
Extra Hub: Undercode MoN
Basic Verification: Pass ✅

Join Our Cyber World:

💬 Whatsapp | 💬 TelegramFeatured Image