Research Finds 12,000 ‘Live’ API Keys and Passwords in DeepSeek’s Training Data

https://lnkd.in/gRdMATQa

Truffle Security Co. recently reported finding roughly 12,000 live API keys and passwords in Common Crawl, a public web archive used to train large language models, including DeepSeek's. The finding shows how easily hardcoded credentials end up in training data, and why secrets must be kept out of code and datasets before a model is trained or deployed.

Practice-Verified Code and Commands

1. Scanning for Secrets in Code Repositories

Use `trufflehog` to detect secrets in your codebase:

trufflehog git https://github.com/your-repo.git 

2. Securing API Keys with Environment Variables

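Keep API keys out of source code by reading them from environment variables. Confirm that a variable is set without printing its value to the terminal:

export API_KEY="your_api_key_here" 
[ -n "$API_KEY" ] && echo "API_KEY is set" 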
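
Typing `export` by hand leaves the key in shell history, so a common alternative is to keep keys in a local file that is excluded from version control and to fail fast when a required variable is missing. The sketch below assumes a file of `NAME=value` lines (named `.env` here) and two hypothetical helpers, `load_env_file` and `require_env`; none of these names come from the original post.

```shell
# Sketch: load NAME=value pairs from a local file and check required variables
# without ever printing their values. File and helper names are illustrative.

# Export every variable assigned in the given file (default: ./.env).
load_env_file() {
  env_file="${1:-./.env}"
  # A bare file name is resolved against the current directory, not PATH.
  case "$env_file" in
    */*) ;;
    *) env_file="./$env_file" ;;
  esac
  [ -f "$env_file" ] || { echo "missing env file: $env_file" >&2; return 1; }
  set -a          # auto-export every variable assigned while sourcing
  . "$env_file"
  set +a
}

# Return 0 if the named variable is set and non-empty; report only its name.
# The name must be a trusted identifier written by the script author.
require_env() {
  eval "_value=\${$1:-}"
  if [ -z "$_value" ]; then
    unset _value
    echo "required variable $1 is not set" >&2
    return 1
  fi
  unset _value
}
```

A script might then start with `load_env_file .env && require_env API_KEY || exit 1`. The file itself should be listed in `.gitignore` and readable only by its owner (`chmod 600 .env`).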

3. Using GPG to Encrypt Sensitive Data

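Encrypt files containing sensitive information with a passphrase (symmetric encryption). The command prompts for the passphrase and writes `sensitive_data.txt.gpg`; the plaintext original is left in place, so remove it once the encrypted copy is verified: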

gpg -c sensitive_data.txt 

4. Checking for Exposed Secrets in Linux

Use `grep` to search files recursively for strings that suggest hardcoded secrets (`-n` prints line numbers, `-i` ignores case):

grep -rni "api_key" /path/to/codebase 
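
A single keyword misses most real credentials, which rarely sit next to the literal text `api_key`. As a rough sketch, the search can be widened to a few well-known shapes: AWS access key IDs, PEM private key headers, and generic `name = value` assignments. The function name `scan_for_secrets` and the pattern list are illustrative assumptions, not a complete rule set, and the flags assume GNU grep.

```shell
# Sketch: recursively search a directory for a few common credential shapes.
# The patterns are illustrative and far from exhaustive.
scan_for_secrets() {
  scan_dir="${1:-.}"
  # -r recurse, -n line numbers, -I skip binary files, -i ignore case,
  # -E extended regular expressions. Ignoring case trades precision for recall.
  grep -rnIiE \
    --exclude-dir=.git \
    -e 'AKIA[0-9A-Z]{16}' \
    -e '-----BEGIN [A-Z ]*PRIVATE KEY-----' \
    -e '(api[_-]?key|secret|passw(or)?d|token)[[:space:]]*[:=][[:space:]]*[^[:space:]]+' \
    "$scan_dir"
}
```

Running `scan_for_secrets /path/to/codebase` prints each match as `file:line:text` and exits non-zero when nothing is found. Pattern matching of this kind cannot tell whether a credential is still live, which is the gap that verifying scanners such as `trufflehog` are meant to close.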

5. Auditing AWS S3 Buckets for Public Access

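Check whether a bucket's ACL grants access to everyone, and whether its public access block settings are enabled:

aws s3api get-bucket-acl --bucket your-bucket-name 
aws s3api get-public-access-block --bucket your-bucket-name 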
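
Reading ACL output by eye does not scale past a handful of buckets. The sketch below loops over every bucket in the account and flags any ACL that grants access to the `AllUsers` or `AuthenticatedUsers` groups. The helper names `acl_is_public` and `audit_bucket_acls` are hypothetical, and the check covers ACLs only; a bucket can also be exposed through its bucket policy, which this sketch does not inspect.

```shell
# Sketch: read the JSON printed by `aws s3api get-bucket-acl` on stdin and
# succeed (exit 0) if any grant targets one of the public S3 groups.
acl_is_public() {
  grep -Eq 'acs\.amazonaws\.com/groups/global/(AllUsers|AuthenticatedUsers)'
}

# Walk every bucket visible to the current credentials and flag public ACLs.
audit_bucket_acls() {
  aws s3api list-buckets --query 'Buckets[].Name' --output text |
    tr '\t' '\n' |
    while read -r bucket; do
      [ -n "$bucket" ] || continue
      # stdin is redirected so the CLI cannot consume the bucket list.
      if aws s3api get-bucket-acl --bucket "$bucket" < /dev/null | acl_is_public; then
        echo "PUBLIC ACL: $bucket"
      fi
    done
}
```

Calling `audit_bucket_acls` requires the AWS CLI and credentials allowed to list buckets and read their ACLs.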

6. Using Git Secrets to Prevent Leaks

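With `git-secrets` already installed on the machine, add its hooks to the current repository, register the built-in AWS patterns, and scan the working tree for sensitive data: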

git secrets --install 
git secrets --register-aws 
git secrets --scan 
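
To make the idea behind such a hook concrete, here is a minimal sketch of a pre-commit check that inspects only the lines a commit would add. It is not how `git-secrets` is implemented; the function name `diff_adds_secret` and its two patterns are assumptions chosen for illustration.

```shell
# Sketch: read a unified diff on stdin and succeed (exit 0) if any added line
# contains something shaped like an AWS access key ID or a PEM private key.
diff_adds_secret() {
  grep -Eq '^\+.*(AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----)'
}

# Intended use inside .git/hooks/pre-commit (shown as comments, not run here):
#   if git diff --cached -U0 | diff_adds_secret; then
#     echo "Possible secret in staged changes; commit aborted." >&2
#     exit 1
#   fi
```

Checking the staged diff rather than the whole tree keeps the hook fast and avoids blocking commits over lines that are being removed.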

What Undercode Says

The discovery of 12,000 live API keys and passwords in DeepSeek’s training data underscores the need for robust security practices in AI development. Ensuring sensitive data is not inadvertently included in training datasets is crucial. Tools like `trufflehog` and `git-secrets` can help identify and prevent such leaks. Additionally, encrypting sensitive data using GPG and storing credentials in environment variables are essential steps. Regularly auditing cloud resources, such as AWS S3 buckets, for public access is also critical.

For Linux users, commands like `grep` can be invaluable for scanning codebases for exposed secrets. Windows users can leverage PowerShell to achieve similar results:

Get-ChildItem -Path "C:\path\to\files" -Recurse -File | Select-String -Pattern "api_key" 

In conclusion, securing sensitive data requires a combination of tools, best practices, and vigilance. By integrating these practices into your workflow, you can mitigate the risk of data leaks and protect your systems from potential breaches.

For further reading on securing AI training data, visit:
Truffle Security Co.
OWASP API Security Top 10
Git Secrets Documentation

References:

initially reported by: https://www.linkedin.com/posts/mthomasson_over-a-decade-ago-i-was-introduced-to-the-activity-7301976510033182720-Echp – Hackers Feeds