2025-02-11
Struggling with Git conflicts, Jenkins build failures, Docker crashes, or Kubernetes pod errors? Don't worry, we've got you covered. This playbook walks through fixes for the most common DevOps issues across Git, Jenkins, Docker, Kubernetes, Terraform, Ansible, Prometheus, ELK, and AWS.
Git – Fix Merge Conflicts & Permission Issues
Merge conflicts are common in collaborative environments. To resolve them:
    # Check the status of your repository
    git status
    # Pull the latest changes from the remote repository
    git pull origin main
    # Resolve conflicts manually in the conflicted files
    # After resolving conflicts, add the files and commit
    git add <file-name>
    git commit -m "Resolved merge conflict"
    # Push the changes
    git push origin main
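Before committing, it can help to confirm that no conflict markers are left in the files you edited. The following is a minimal sketch: `has_conflict_markers` is a hypothetical helper name (not a Git command) and relies only on `grep`. Git itself can list files that are still unmerged with `git diff --name-only --diff-filter=U`.

```shell
# has_conflict_markers is a hypothetical helper, not part of Git.
# It prints every line that looks like a conflict marker (with its line
# number) and returns 0 if any were found, 1 otherwise.
has_conflict_markers() {
  local file=$1
  grep -nE '^(<<<<<<< |=======$|>>>>>>> )' "$file"
}
```

For example, `has_conflict_markers src/app.py && echo "still conflicted"` would flag a file that has not been fully resolved. A line consisting of exactly seven `=` characters in ordinary text would also match, so treat a hit as a prompt to look, not as proof.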
For permission issues:
    # Check current permissions
    ls -l .git
    # Update permissions
    chmod -R 755 .git
Jenkins – Resolve Build Stuck & Plugin Failures
If a Jenkins build is stuck:
    # Check Jenkins logs
    tail -f /var/log/jenkins/jenkins.log
    # Restart Jenkins
    sudo systemctl restart jenkins
For plugin failures:
    # Update Jenkins plugins
    # Go to Jenkins Dashboard > Manage Jenkins > Manage Plugins > Update Center
Docker – Debug Daemon Connection & Port Conflicts
If Docker daemon fails to start:
    # Check Docker daemon status
    sudo systemctl status docker
    # Restart Docker
    sudo systemctl restart docker
    # Check logs for errors
    journalctl -u docker.service
For port conflicts:
    # List running containers and their ports
    docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Ports}}"
    # Stop the conflicting container
    docker stop <container-id>
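When many containers are running, scanning the port column by eye gets tedious. As a rough sketch, the hypothetical helper `containers_on_port` below filters tab-separated `docker ps` output for a given host port. It assumes simple mappings such as `0.0.0.0:8080->80/tcp` and does not handle port ranges.

```shell
# containers_on_port is a hypothetical helper. It reads tab-separated
# "ID<TAB>NAME<TAB>PORTS" lines on stdin and prints the ID and name of each
# container that publishes the given host port.
containers_on_port() {
  local port=$1
  awk -F '\t' -v port="$port" '
    # Published ports look like "0.0.0.0:8080->80/tcp"; match ":<port>->".
    index($3, ":" port "->") > 0 { print $1, $2 }
  '
}
```

It could be fed with `docker ps --format '{{.ID}}\t{{.Names}}\t{{.Ports}}' | containers_on_port 8080`. Dropping `table` from the format string omits the header row, which keeps the parsing simple.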
Kubernetes – Solve CrashLoopBackOff & Node Failures
For CrashLoopBackOff errors:
    # Describe the pod to get more details
    kubectl describe pod <pod-name> -n <namespace>
    # Check logs of the failing container
    kubectl logs <pod-name> -n <namespace> -c <container-name>
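With CrashLoopBackOff, the current container has often just restarted, so its logs may not show the crash. The sketch below bundles two checks that usually help: the restart count and last termination reason per container, and the logs of the previous container instance. `crashloop_report` is a hypothetical helper name, and the `default` namespace fallback is an assumption.

```shell
# crashloop_report is a hypothetical helper name.
# Usage: crashloop_report <pod-name> [namespace]
crashloop_report() {
  local pod=$1 ns=${2:-default}
  # Name, restart count, and last termination reason for each container.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
  # Logs from the previous (crashed) container instance, not the current one.
  kubectl logs "$pod" -n "$ns" --previous --tail=50
}
```

For pods with more than one container, `kubectl logs` also needs `-c <container-name>`, as in the command above.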
For node failures:
    # Check node status
    kubectl get nodes
    # Drain the node for maintenance
    # (kubectl older than 1.20 uses --delete-local-data instead of --delete-emptydir-data)
    kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
    # After fixing the issue, uncordon the node
    kubectl uncordon <node-name>
Terraform – Debug Plan and Apply Errors
For plan errors:
    # Validate Terraform configuration
    terraform validate
    # Format Terraform files
    terraform fmt
For apply errors:
    # Check Terraform state
    terraform show
    # Force a resource to be recreated
    # (Terraform older than v0.15.2 uses: terraform taint <resource-address>)
    terraform apply -replace="<resource-address>"
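When the error message alone is not enough, Terraform's own debug logging often shows which provider call failed. The sketch below wraps any Terraform command with the `TF_LOG` and `TF_LOG_PATH` environment variables. The wrapper name `tf_debug` and the log file name are assumptions.

```shell
# tf_debug is a hypothetical wrapper name; TF_LOG and TF_LOG_PATH are
# Terraform's own environment variables. The log file name is an assumption.
# Usage: tf_debug plan    or    tf_debug apply
tf_debug() {
  TF_LOG=DEBUG TF_LOG_PATH=./terraform-debug.log terraform "$@"
}
```

Running `tf_debug plan` would then write detailed logs to `terraform-debug.log` in the current directory while leaving the normal console output unchanged.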
Ansible – Fix Playbook Failures
For playbook errors:
    # Check syntax
    ansible-playbook --syntax-check playbook.yml
    # Run playbook in verbose mode for debugging
    ansible-playbook playbook.yml -vvv
Prometheus – Resolve Scraping Issues
For scraping errors:
    # Check Prometheus logs
    journalctl -u prometheus
    # Reload Prometheus configuration
    # (requires Prometheus to be started with --web.enable-lifecycle)
    curl -X POST http://localhost:9090/-/reload
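Scraping problems are often caused by a bad configuration edit, so it may be worth validating the file before asking Prometheus to reload it. The sketch below uses `promtool check config`, which ships with Prometheus. The helper name `reload_prometheus`, the default config path, and the default URL are assumptions.

```shell
# reload_prometheus is a hypothetical helper. The default config path and
# URL are assumptions; adjust them for your installation.
# Usage: reload_prometheus [config-file] [base-url]
reload_prometheus() {
  local config=${1:-/etc/prometheus/prometheus.yml}
  local url=${2:-http://localhost:9090}
  # Refuse to reload if the configuration does not validate.
  if ! promtool check config "$config"; then
    echo "config check failed, not reloading" >&2
    return 1
  fi
  # The reload endpoint is only enabled with --web.enable-lifecycle.
  curl -fsS -X POST "$url/-/reload"
}
```

The `-f` flag makes `curl` return a non-zero exit code on an HTTP error, so a failed reload is visible to the calling script.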
ELK – Debug Logstash Pipeline Failures
For Logstash errors:
    # Check Logstash logs
    tail -f /var/log/logstash/logstash-plain.log
    # Test Logstash configuration
    /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/
AWS – Troubleshoot EC2 and S3 Issues
For EC2 instance issues:
    # Check instance status
    aws ec2 describe-instance-status --instance-ids <instance-id>
    # Reboot instance
    aws ec2 reboot-instances --instance-ids <instance-id>
For S3 bucket issues:
    # List S3 buckets
    aws s3 ls
    # Sync local files to S3
    aws s3 sync . s3://<bucket-name>
What Undercode Say
Mastering DevOps troubleshooting requires a deep understanding of the tools and their common pitfalls. Here are some additional Linux and DevOps commands to enhance your troubleshooting skills:
1. Linux System Monitoring:
    # Check CPU usage
    top
    # Check memory usage
    free -h
    # Check disk usage
    df -h
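The disk check can be turned into a simple threshold filter. As a sketch, the hypothetical helper `disks_over` reads `df -P` output (the POSIX format, which is easier to parse than `df -h`) and prints only the file systems at or above a given usage percentage. It assumes mount points without spaces.

```shell
# disks_over is a hypothetical helper. It reads `df -P` output on stdin and
# prints the mount point and usage of each file system at or above the
# given percentage. Assumes mount points contain no spaces.
disks_over() {
  local threshold=$1
  awk -v t="$threshold" '
    NR > 1 {                       # skip the header row
      use = $5
      sub(/%$/, "", use)           # "83%" -> "83"
      if (use + 0 >= t + 0) print $6, $5
    }
  '
}
```

A typical call would be `df -P | disks_over 80`, which prints nothing when every file system is below 80% full.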
2. Network Troubleshooting:
    # Check network interfaces (use `ip addr` where ifconfig is not installed)
    ifconfig
    # Test connectivity
    ping <hostname>
    # Trace network route
    traceroute <hostname>
3. Log Analysis:
    # Search logs for errors
    grep -i "error" /var/log/syslog
    # Monitor logs in real-time
    tail -f /var/log/syslog
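A raw `grep` can return thousands of lines, so a per-source count is often a quicker first look. The sketch below groups matching lines by the program that logged them. `errors_by_source` is a hypothetical helper, and it assumes the traditional syslog layout (`Feb 11 10:00:01 host program[pid]: message`), where the program is the fifth field. Logs with ISO timestamps would need a different field number.

```shell
# errors_by_source is a hypothetical helper. It assumes the traditional
# syslog line layout: "Mon DD HH:MM:SS host program[pid]: message".
errors_by_source() {
  local log_file=$1
  grep -i "error" "$log_file" \
    | awk '{
        sub(/(\[[0-9]+\])?:$/, "", $5)   # "sshd[101]:" -> "sshd"
        count[$5]++
      }
      END { for (s in count) print count[s], s }' \
    | sort -rn
}
```

Calling `errors_by_source /var/log/syslog` prints one line per program, with the count first and the noisiest source at the top.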
4. Process Management:
    # List running processes
    ps aux
    # Kill a process
    kill <pid>
5. File System Checks:
    # Check file system for errors (run only on an unmounted file system)
    sudo fsck /dev/sda1
6. Package Management:
    # Update package list
    sudo apt-get update
    # Upgrade installed packages
    sudo apt-get upgrade
7. Firewall Management:
    # Check firewall status
    sudo ufw status
    # Allow a port
    sudo ufw allow 8080
8. Service Management:
    # Check service status
    sudo systemctl status <service-name>
    # Restart a service
    sudo systemctl restart <service-name>
9. User Management:
    # Add a new user
    sudo adduser <username>
    # Change user password
    sudo passwd <username>
10. File Permissions:
    # Change file permissions
    chmod 755 <file-name>
    # Change file ownership
    sudo chown <user>:<group> <file-name>
By mastering these commands and techniques, you can efficiently troubleshoot and resolve DevOps issues, ensuring smooth and reliable operations. For further reading, check out the official documentation of the tools mentioned:
- Git Documentation
- Jenkins Documentation
- Docker Documentation
- Kubernetes Documentation
- Terraform Documentation
- Ansible Documentation
- Prometheus Documentation
- ELK Stack Documentation
- AWS Documentation
Keep practicing and exploring new tools to stay ahead in the ever-evolving DevOps landscape.
References:
Hackers Feeds, Undercode AI