SSH Connection Failed: Complete Troubleshooting Guide for Linux Servers
Learn how to systematically diagnose and fix SSH connection issues on Linux servers and cloud instances. Discover the most common causes and effective resolution strategies.
You're trying to connect to your Linux server via SSH, but instead of the familiar login prompt, you're getting connection errors. This is one of the most critical issues a system administrator can face: being locked out of your server means you can't perform maintenance, deploy updates, or troubleshoot other issues. Whether it's a cloud instance, a physical server, or a virtual machine, SSH connection failures can have multiple root causes. Let's systematically diagnose and resolve this issue.
Understanding SSH Connection Failures
Common SSH Error Messages
SSH connection failures typically manifest as one of these error messages:
- Permission denied (publickey) - Authentication failed
- Connection refused - SSH service not running or port blocked
- Operation timed out - Network connectivity issues
- Host key verification failed - SSH host key mismatch
- No route to host - Network routing problems
Root Causes of SSH Failures
- Network issues - Firewall rules, security groups, routing problems
- SSH service problems - Service down, configuration errors
- Authentication issues - Wrong keys, incorrect usernames, key permissions
- Instance problems - Server crashed, out of resources, stopped
- Security policies - SELinux, AppArmor, or other security restrictions
Systematic Troubleshooting Approach
Step 1: Confirm the Error Message
First, attempt the SSH connection and note the exact error:
# Basic SSH connection attempt
ssh -i my-key.pem ec2-user@<instance-public-ip>
# With verbose output for more details
ssh -vvv -i my-key.pem ec2-user@<instance-public-ip>
# Test with different usernames
ssh -i my-key.pem ubuntu@<instance-public-ip>
ssh -i my-key.pem centos@<instance-public-ip>
Common error patterns:
# Authentication failure
Permission denied (publickey)
# Service not running
Connection refused
# Network timeout
Operation timed out
# Host key mismatch
Host key verification failed
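The patterns above can be turned into a small triage helper. This is only a sketch: `classify_ssh_error` is a hypothetical function name, and the hint text is a condensed version of the causes listed in this guide, not output produced by OpenSSH itself.

```shell
# classify_ssh_error is a hypothetical helper, not part of OpenSSH.
# It maps the text of an SSH error to the likely cause described in this guide.
classify_ssh_error() {
    case "$1" in
        *"Permission denied"*)            echo "authentication: check key, username, and key permissions" ;;
        *"Connection refused"*)           echo "service: sshd not running or port blocked" ;;
        *"timed out"*)                    echo "network: check security groups, firewalls, and NACLs" ;;
        *"Host key verification failed"*) echo "host-key: host key mismatch" ;;
        *"No route to host"*)             echo "routing: check route tables and subnet configuration" ;;
        *)                                echo "unknown: rerun with ssh -vvv" ;;
    esac
}

# Usage (substitute your own key, user, and address):
#   err=$(ssh -o BatchMode=yes -o ConnectTimeout=10 -i my-key.pem ec2-user@<instance-public-ip> true 2>&1)
#   classify_ssh_error "$err"
```

Matching on "timed out" rather than the full phrase covers both "Operation timed out" and "Connection timed out", since the wording varies by platform.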
Step 2: Check Instance Health and Reachability
Verify that your server is running and reachable:
For Cloud Instances (AWS, GCP, Azure)
# AWS CLI - Check instance status
aws ec2 describe-instance-status --instance-ids <instance-id>
# Check instance state
aws ec2 describe-instances --instance-ids <instance-id> --query 'Reservations[0].Instances[0].State.Name'
# Get public IP
aws ec2 describe-instances --instance-ids <instance-id> --query 'Reservations[0].Instances[0].PublicIpAddress'
Basic Connectivity Tests
# Ping test
ping <public-ip>
# Port connectivity test
telnet <public-ip> 22
# Network trace
traceroute <public-ip>
# Check if port is open
nmap -p 22 <public-ip>
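The state that nmap reports for port 22 usually narrows the problem down. The following is a minimal sketch, assuming the standard `PORT STATE SERVICE` table that nmap prints; `interpret_port_state` is a hypothetical helper name and the explanations are likely causes, not certainties.

```shell
# interpret_port_state is a hypothetical helper. It reads nmap output on stdin
# and explains the state reported on the 22/tcp line.
interpret_port_state() {
    state=$(awk '$1 == "22/tcp" { print $2; exit }')
    case "$state" in
        open)     echo "open: sshd is reachable, move on to authentication checks" ;;
        closed)   echo "closed: host reachable but nothing is listening on port 22" ;;
        filtered) echo "filtered: a firewall, security group, or NACL is dropping packets" ;;
        *)        echo "unknown: no 22/tcp line found in the nmap output" ;;
    esac
}

# Usage (-Pn skips the ping probe, which many cloud hosts do not answer):
#   nmap -Pn -p 22 <public-ip> | interpret_port_state
```

A "closed" result tends to go with "Connection refused", while "filtered" tends to go with timeouts.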
Step 3: Verify Security Group and Firewall Rules
Check that port 22 (SSH) is open and accessible:
AWS Security Groups
# Check security group rules
aws ec2 describe-security-groups --group-ids <security-group-id>
# Check instance security groups
aws ec2 describe-instances --instance-ids <instance-id> --query 'Reservations[0].Instances[0].SecurityGroups'
Required security group rule:
{
"Type": "SSH",
"Protocol": "tcp",
"Port": 22,
"Source": "your-ip/32"
}
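A rule only helps if your current public IP actually falls inside its source range. As a rough sketch, the check can be done with shell arithmetic; `ip_to_int` and `in_cidr` are hypothetical helper names, and the code assumes plain IPv4 dotted-quad input with an explicit `/prefix` on the CIDR.

```shell
# Hypothetical helpers: test whether an IPv4 address is inside a CIDR block.
# Assumes well-formed input such as 203.0.113.7 and 203.0.113.0/24.
ip_to_int() {
    # Split the dotted quad on "." and combine the four octets into one integer.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

in_cidr() {
    ip=$1
    net=${2%/*}     # part before the slash
    bits=${2#*/}    # prefix length after the slash
    # Build the netmask: the top $bits bits set, limited to 32 bits.
    mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
    [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Usage:
#   in_cidr 203.0.113.7 203.0.113.0/24 && echo "covered by rule"
```

The addresses in the usage line come from documentation-reserved ranges and are placeholders only.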
Local Firewall Rules
# Check UFW status
sudo ufw status
# Check iptables rules (-n prints numeric ports instead of service names)
sudo iptables -L -n
# Check for SSH port rules
sudo iptables -L -n | grep -w 'dpt:22'
Step 4: Check Network ACLs and Route Tables
Verify network-level access controls:
AWS Network ACLs
# Check NACL rules
aws ec2 describe-network-acls --network-acl-ids <nacl-id>
# Check route tables
aws ec2 describe-route-tables --route-table-ids <route-table-id>
Common issues:
- NACLs blocking port 22
- Missing route to internet gateway
- Incorrect subnet configuration
Step 5: Verify IP Address and DNS
Ensure you're connecting to the correct address:
# Check if using Elastic IP vs Public IP
aws ec2 describe-addresses --filters "Name=instance-id,Values=<instance-id>"
# Test DNS resolution
nslookup your-domain.com
dig your-domain.com
# Check if IP changed after restart
aws ec2 describe-instances --instance-ids <instance-id> --query 'Reservations[0].Instances[0].PublicIpAddress'
Step 6: Validate SSH Key and User Credentials
Check your authentication credentials:
Key File Permissions
# Check key file permissions
ls -la my-key.pem
# Set correct permissions
chmod 400 my-key.pem
# Verify key format
file my-key.pem
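The SSH client refuses a private key that is readable by group or others. A minimal sketch of that check follows; `key_perms_ok` is a hypothetical helper name, and the `stat` fallback assumes either GNU or BSD/macOS `stat` is installed.

```shell
# key_perms_ok is a hypothetical helper. It succeeds only when the key file
# grants no permissions to group or others (for example mode 400 or 600).
key_perms_ok() {
    # GNU stat uses -c '%a'; BSD/macOS stat uses -f '%Lp'.
    mode=$(stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1")
    case "$mode" in
        *00) return 0 ;;
        *)   echo "mode $mode on $1 is too open; run: chmod 400 $1" >&2
             return 1 ;;
    esac
}

# Usage:
#   key_perms_ok my-key.pem && ssh -i my-key.pem ec2-user@<instance-public-ip>
```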
Correct Username
Different Linux distributions use different default usernames:
# Amazon Linux
ssh -i my-key.pem ec2-user@<ip>
# Ubuntu
ssh -i my-key.pem ubuntu@<ip>
# CentOS (RHEL images typically use ec2-user instead)
ssh -i my-key.pem centos@<ip>
# Debian
ssh -i my-key.pem admin@<ip>
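If you script connections to several hosts, the defaults above can live in one lookup. This is a sketch only: `default_ssh_user` is a hypothetical helper and the distribution labels it accepts are my own choice, not an official naming scheme. Custom images may use a different user entirely.

```shell
# default_ssh_user is a hypothetical helper that returns the default login
# user listed in this guide for a given distribution label.
default_ssh_user() {
    case "$1" in
        amazon|amzn) echo ec2-user ;;
        ubuntu)      echo ubuntu ;;
        centos)      echo centos ;;
        debian)      echo admin ;;
        *)           return 1 ;;   # unknown label: let the caller decide
    esac
}

# Usage:
#   ssh -i my-key.pem "$(default_ssh_user ubuntu)@<ip>"
```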
Key Pair Verification
# Check if key pair matches instance
aws ec2 describe-instances --instance-ids <instance-id> --query 'Reservations[0].Instances[0].KeyName'
# List available key pairs
aws ec2 describe-key-pairs
Step 7: Use Alternative Access Methods
If SSH is completely broken, use alternative access methods:
AWS EC2 Instance Connect
# Use EC2 Instance Connect via AWS Console
# Navigate to EC2 Console → Instances → Connect → EC2 Instance Connect
# Or use AWS CLI
aws ec2-instance-connect send-ssh-public-key \
--instance-id <instance-id> \
--availability-zone <az> \
--instance-os-user ec2-user \
--ssh-public-key file://~/.ssh/id_rsa.pub
Console Access
# Use cloud provider console access
# AWS: EC2 Console → Instances → Connect → EC2 Serial Console
# GCP: Compute Engine → VM Instances → SSH
# Azure: Virtual Machines → Connect → Serial Console
Step 8: Advanced Recovery Methods
If all else fails, use advanced recovery techniques:
EBS Volume Rescue (AWS)
# Stop the instance and wait until it has fully stopped
aws ec2 stop-instances --instance-ids <instance-id>
aws ec2 wait instance-stopped --instance-ids <instance-id>
# Detach the root volume
aws ec2 detach-volume --volume-id <volume-id>
# Attach to another instance
aws ec2 attach-volume --volume-id <volume-id> --instance-id <rescue-instance-id> --device /dev/sdf
# Mount and repair on rescue instance
# (run lsblk first to confirm the device name; it may differ from /dev/xvdf1)
sudo mkdir /mnt/rescue
sudo mount /dev/xvdf1 /mnt/rescue
sudo chroot /mnt/rescue
# Fix SSH configuration
nano /etc/ssh/sshd_config
# Ensure: PermitRootLogin yes (temporarily)
# Ensure: PasswordAuthentication yes (temporarily)
# Reset root password
passwd root
# Exit chroot and unmount
exit
sudo umount /mnt/rescue
# Detach and reattach to original instance
# (use the instance's original root device name, commonly /dev/xvda or /dev/sda1)
aws ec2 detach-volume --volume-id <volume-id>
aws ec2 attach-volume --volume-id <volume-id> --instance-id <original-instance-id> --device /dev/xvda
aws ec2 start-instances --instance-ids <original-instance-id>
Common Root Causes and Solutions
Cause 1: Wrong Security Group Rules
Symptoms: Connection timeout, port unreachable
Solution:
# Update security group to allow SSH from your IP
aws ec2 authorize-security-group-ingress \
--group-id <security-group-id> \
--protocol tcp \
--port 22 \
--cidr <your-ip>/32
Cause 2: SSH Service Not Running
Symptoms: Connection refused
Solution:
# Use console access to start SSH service
# (on Debian and Ubuntu the unit is usually named ssh rather than sshd)
sudo systemctl start sshd
sudo systemctl enable sshd
sudo systemctl status sshd
Cause 3: Wrong Username
Symptoms: Permission denied (publickey)
Solution:
# Try different usernames based on OS
ssh -i my-key.pem ec2-user@<ip> # Amazon Linux
ssh -i my-key.pem ubuntu@<ip> # Ubuntu
ssh -i my-key.pem centos@<ip> # CentOS
Cause 4: Key File Permissions
Symptoms: "WARNING: UNPROTECTED PRIVATE KEY FILE!" followed by Permission denied (publickey)
Solution:
# Fix key file permissions
chmod 400 my-key.pem
# Make sure you own the key file
chown "$USER" my-key.pem
Cause 5: SSH Configuration Issues
Symptoms: Various SSH errors
Solution:
# Check SSH configuration
sudo nano /etc/ssh/sshd_config
# Ensure these settings:
# Port 22
# PermitRootLogin no
# PasswordAuthentication no
# PubkeyAuthentication yes
# Validate the configuration before restarting
sudo sshd -t
# Restart SSH service
sudo systemctl restart sshd
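To confirm what a config file actually sets, rather than reading it by eye, a short lookup can help. The sketch below assumes a simple file with whitespace-separated `Keyword value` lines and ignores `Match` blocks and `Include` files; `sshd_setting` is a hypothetical helper name. On the server itself, `sudo sshd -T` prints the effective configuration and is the more reliable source.

```shell
# sshd_setting is a hypothetical helper. It prints the first uncommented
# value for a directive, since sshd uses the first value it obtains.
#   $1 = path to the config file, $2 = directive name (case-insensitive)
sshd_setting() {
    awk -v key="$2" 'tolower($1) == tolower(key) { print $2; exit }' "$1"
}

# Usage:
#   sshd_setting /etc/ssh/sshd_config PasswordAuthentication
```

Empty output means the directive is not set explicitly in that file, so the compiled-in default applies.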
Cause 6: Instance Out of Resources
Symptoms: Connection timeout, unresponsive
Solution:
# Check instance status and metrics
aws cloudwatch get-metric-statistics \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=<instance-id> \
--start-time 2024-01-01T00:00:00Z \
--end-time 2024-01-01T23:59:59Z \
--period 3600 \
--statistics Average
Prevention and Best Practices
1. Multiple Access Methods
# Always have backup access methods
# - Console access enabled
# - Multiple SSH keys
# - Bastion host setup
# - VPN access
2. Security Group Management
# Use specific IP ranges instead of 0.0.0.0/0
# Regularly review and update security group rules
# Use separate security groups for different environments
3. Key Management
# Store SSH keys securely
# Use different keys for different environments
# Regularly rotate SSH keys
# Use SSH agent for key management
4. Monitoring and Alerting
# Set up CloudWatch alarms for instance health
# Monitor SSH connection attempts
# Set up alerts for failed login attempts
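As a starting point for monitoring failed logins, the sshd log can be summarized per source address. This is a minimal sketch that assumes the usual "Failed password for ... from <ip> port ..." log line; `failed_logins_by_ip` is a hypothetical helper name, and a real alerting setup would feed a proper monitoring tool rather than a one-off pipeline.

```shell
# failed_logins_by_ip is a hypothetical helper. It reads sshd log lines on
# stdin and prints "<count> <ip>" for failed password attempts, busiest first.
failed_logins_by_ip() {
    awk '/Failed password/ {
             # The source address is the field right after the word "from".
             for (i = 1; i < NF; i++)
                 if ($i == "from") { count[$(i + 1)]++; break }
         }
         END { for (ip in count) print count[ip], ip }' | sort -rn
}

# Usage:
#   sudo journalctl -u sshd --no-pager | failed_logins_by_ip | head
```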
Real-World Scenarios
Scenario 1: Security Group Misconfiguration
Problem: SSH worked yesterday, fails today
Solution:
# Check recent security group changes
aws ec2 describe-security-groups --group-ids <sg-id>
# Restore correct rule
aws ec2 authorize-security-group-ingress \
--group-id <sg-id> \
--protocol tcp \
--port 22 \
--cidr <your-ip>/32
Scenario 2: Instance Restart with New IP
Problem: SSH fails after instance restart
Solution:
# Get new public IP
aws ec2 describe-instances --instance-ids <instance-id> --query 'Reservations[0].Instances[0].PublicIpAddress'
# Update DNS records
# Or use Elastic IP for persistent addressing
Scenario 3: SSH Service Crash
Problem: Connection refused, instance running
Solution:
# Use console access to restart SSH
sudo systemctl start sshd
sudo systemctl enable sshd
# Check SSH logs
sudo journalctl -u sshd -f
Conclusion
SSH connection failures can be frustrating and potentially dangerous if they leave you locked out of critical systems. The key to resolving these issues is systematic diagnosis:
- Identify the error - Note the exact error message
- Check instance health - Verify the server is running and reachable
- Verify network rules - Check security groups, firewalls, and routing
- Validate credentials - Ensure correct keys, usernames, and permissions
- Use alternative access - Console access, EC2 Instance Connect, or rescue mode
- Implement recovery - Fix the root cause and restore access
Remember:
- Always have backup access methods - Don't rely solely on SSH
- Test connectivity systematically - Don't skip network-level checks
- Document your findings - For future reference and team knowledge
- Implement monitoring - To catch issues before they become critical