Networking: Absolute Basics for DevOps
OSI, TCP/IP, DNS, HTTP, HTTPS, load balancing, and the debugging tools you'll use daily
Why Networking Matters
Every bug in a distributed system is either a code bug or a networking bug. "It works on my machine" very often turns out to be a networking problem. You don't need to be a network engineer, but you need to know how data travels and where it can fail.
OSI vs TCP/IP (Practical Mapping)
Forget memorizing seven layers; know what each layer means operationally.
| OSI Layer | TCP/IP Layer | What it is | Examples |
|---|---|---|---|
| 7 Application | Application | Your app's protocol | HTTP, DNS, SSH, SMTP |
| 6 Presentation | Application | Encoding, encryption | TLS, JSON serialization |
| 5 Session | Application | Connection management | TLS sessions |
| 4 Transport | Transport | End-to-end delivery | TCP, UDP |
| 3 Network | Internet | Routing between networks | IP, ICMP |
| 2 Data Link | Network Access | Delivery on one network | Ethernet, WiFi, ARP |
| 1 Physical | Network Access | Cables, signals | Fiber, copper, radio |
DevOps practical view:
- L4 = ports and connections (TCP/UDP): ss, netstat, firewalls
- L7 = application protocols: curl, dig, app logs
IP Addressing & CIDR

    IP Address:   192.168.1.100
    Subnet Mask:  255.255.255.0 = /24
    Network:      192.168.1.0
    Broadcast:    192.168.1.255
    Hosts:        192.168.1.1 - 192.168.1.254 (254 usable)

CIDR Notation
| CIDR | Hosts | Common Use |
|---|---|---|
| /32 | 1 | Single host (security group rules) |
| /30 | 2 | Point-to-point links |
| /28 | 14 | Small subnet |
| /24 | 254 | Standard LAN, small VPC subnet |
| /16 | 65,534 | Large VPC CIDR block |
| /8 | 16,777,214 | Largest RFC 1918 private range (10.0.0.0/8) |
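
The host counts in the table follow from the prefix length: a /n IPv4 network holds 2^(32-n) addresses, and an ordinary subnet loses two of them to the network and broadcast addresses. A minimal sketch using Python's standard `ipaddress` module; the helper name `describe` is made up for illustration:

```python
import ipaddress


def describe(cidr: str) -> dict:
    """Summarize an IPv4 interface such as '192.168.1.100/24'."""
    iface = ipaddress.ip_interface(cidr)
    net = iface.network
    # Ordinary subnets lose two addresses (network + broadcast);
    # /31 and /32 are special cases where every address is usable.
    usable = net.num_addresses - 2 if net.prefixlen < 31 else net.num_addresses
    return {
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
        "netmask": str(net.netmask),
        "usable_hosts": usable,
    }


if __name__ == "__main__":
    print(describe("192.168.1.100/24"))
```

Cloud providers may reserve additional addresses per subnet, so treat the count as an upper bound there.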
Private (RFC 1918) Ranges

    10.0.0.0/8       (10.x.x.x)
    172.16.0.0/12    (172.16.x.x - 172.31.x.x)
    192.168.0.0/16   (192.168.x.x)

    # Quick CIDR calculations
    ipcalc 192.168.1.0/24    # detailed breakdown
    python3 -c "import ipaddress; print(list(ipaddress.IPv4Network('10.0.1.0/28')))"

Ports & Sockets
A socket is an IP address + port + protocol. It uniquely identifies a network endpoint.
Well-known ports:
| Port | Protocol | Service |
|---|---|---|
| 22 | TCP | SSH |
| 25 | TCP | SMTP |
| 53 | TCP/UDP | DNS |
| 80 | TCP | HTTP |
| 443 | TCP | HTTPS |
| 3306 | TCP | MySQL |
| 5432 | TCP | PostgreSQL |
| 6379 | TCP | Redis |
| 8080 | TCP | Alt HTTP / app servers |
| 27017 | TCP | MongoDB |
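
To make the "IP address + port + protocol" idea concrete, the sketch below opens a TCP connection over loopback and returns the endpoints each side sees. It assumes loopback networking is available; `loopback_pair` is a hypothetical helper name.

```python
import socket


def loopback_pair():
    """Open a TCP connection over loopback and return the endpoints seen."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 = let the OS pick a free port
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()
    try:
        # Each endpoint is an (IP, port) pair. The connection is identified
        # by the client endpoint together with the server endpoint.
        return client.getsockname(), client.getpeername(), conn.getpeername()
    finally:
        for s in (client, conn, server):
            s.close()


if __name__ == "__main__":
    local, remote, seen_by_server = loopback_pair()
    print("client endpoint:", local)
    print("server endpoint:", remote)
    print("client as seen by server:", seen_by_server)
```

The client's port is an ephemeral one chosen by the OS, which is why many clients can talk to the same server port at once.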

    # What's listening on which ports?
    ss -tlnp    # TCP, listening, numeric, with processes
    ss -ulnp    # UDP equivalent

    # Who is using port 8080?
    ss -tlnp 'sport = :8080'
    lsof -i :8080

    # Check if a remote port is reachable
    nc -zv 10.0.0.5 5432     # TCP
    nc -zuv 10.0.0.5 53      # UDP (no handshake, so "open" only means no ICMP error came back)
    timeout 3 bash -c "echo > /dev/tcp/10.0.0.5/80" && echo "open"

TCP Handshake & Common Failures
TCP requires a 3-way handshake before any data flows:

    Client                       Server
      |--- SYN ----------------->|   "I want to connect"
      |<-- SYN-ACK --------------|   "OK, I'm ready"
      |--- ACK ----------------->|   "Acknowledged, let's go"
      |                          |
      |<-- DATA ---------------->|   (now data can flow)
      |                          |
      |--- FIN ----------------->|   "I'm done"
      |<-- FIN-ACK --------------|   "OK, closing"

TCP State Machine (the states you'll see)
| State | Meaning |
|---|---|
| LISTEN | Server waiting for connections |
| ESTABLISHED | Active connection |
| TIME_WAIT | Waiting after close (2x MSL, ~60s) |
| CLOSE_WAIT | Remote closed, local hasn't yet |
| SYN_SENT | Client sent SYN, waiting for SYN-ACK |
| SYN_RECV | Server received SYN, sent SYN-ACK |
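
A quick way to see which of these states dominate on a host is to count them from `ss -tan` output. The sketch below parses a captured sample; the addresses in `SAMPLE` are invented for illustration. Note that `ss` abbreviates ESTABLISHED as ESTAB and writes state names with hyphens.

```python
from collections import Counter

# Illustrative `ss -tan` output; the addresses are made up.
SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN     0      128    0.0.0.0:22          0.0.0.0:*
ESTAB      0      0      10.0.0.4:22         10.0.0.9:51514
ESTAB      0      0      10.0.0.4:48210      10.0.0.5:5432
TIME-WAIT  0      0      10.0.0.4:48190      10.0.0.5:5432
CLOSE-WAIT 1      0      10.0.0.4:48102      10.0.0.7:8080
"""


def count_states(ss_output: str) -> Counter:
    """Count connections per TCP state, skipping the header row."""
    lines = ss_output.strip().splitlines()[1:]
    return Counter(line.split()[0] for line in lines if line.strip())


if __name__ == "__main__":
    for state, n in count_states(SAMPLE).most_common():
        print(f"{state:<12}{n}")
```

A steadily growing CLOSE-WAIT count is usually the one worth alerting on, since it points at the local application rather than the network.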

    # See connection states
    ss -tn state established
    ss -tn state time-wait | wc -l    # too many = high traffic or slow close (count includes the header line)
    ss -tn state close-wait           # app not closing connections = socket (file descriptor) leak

Common TCP Failures
| Symptom | Likely Cause |
|---|---|
| Connection refused | Nothing listening on that port |
| Connection timed out | Firewall dropping packets, or host unreachable |
| SYN_SENT stuck | SYN or SYN-ACK dropped along the path (firewall, security group, routing) |
| Many TIME_WAIT | Normal at high traffic; also a sign of many short-lived connections without keep-alive or pooling |
| Many CLOSE_WAIT | Application not closing sockets |
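
The first two rows of the table can be told apart programmatically: a refused connection fails immediately, while a dropped one fails only when the timeout expires. A minimal sketch, assuming a plain TCP connect is enough for the check; `probe` is a hypothetical helper name.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connect and map the outcome to the symptoms above."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"        # host answered with RST: nothing is listening
    except socket.timeout:
        return "timed out"      # no answer: packets dropped or host unreachable
    except OSError as exc:
        return f"error: {exc}"  # e.g. DNS failure or no route to host


if __name__ == "__main__":
    print(probe("127.0.0.1", 22, timeout=1.0))
```

"Refused" proves the host is reachable and narrows the problem to the service; "timed out" points at firewalls, security groups, or routing.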
DNS
Resolution Flow

    Browser/App
      |
      +- 1. Check local cache (/etc/hosts, OS cache)
      |
      +- 2. Ask recursive resolver (usually your router or ISP)
      |     +- Resolver checks its cache
      |
      +- 3. Resolver asks root servers -> "who handles .com?"
      |
      +- 4. Resolver asks TLD servers -> "who handles example.com?"
      |
      +- 5. Resolver asks authoritative NS -> "what's www.example.com?"
      |
      +- 6. Returns IP, caches result with TTL

    # Check /etc/hosts first (consulted before DNS by default)
    cat /etc/hosts

    # DNS resolution order configured in:
    grep hosts /etc/nsswitch.conf

    # Which DNS server am I using?
    cat /etc/resolv.conf
    resolvectl status    # systemd-resolved systems

    # Force DNS resolution (bypass OS cache)
    dig www.example.com
    nslookup www.example.com

DNS Record Types
| Type | Purpose | Example |
|---|---|---|
| A | IPv4 address | example.com -> 93.184.216.34 |
| AAAA | IPv6 address | example.com -> 2606:2800:... |
| CNAME | Alias to another name | www -> example.com |
| MX | Mail server | example.com -> mail.example.com |
| TXT | Arbitrary text | SPF, DKIM, verification records |
| NS | Nameserver | Who handles this domain |
| PTR | Reverse DNS (IP -> name) | 93.184.216.34 -> example.com |
| SOA | Zone authority | Primary NS, contact, refresh |
Caching & TTL

    # Check TTL on a record
    dig www.example.com | grep -A1 "ANSWER SECTION"
    # ;; ANSWER SECTION:
    # www.example.com.   300   IN   A   93.184.216.34
    #                    ^^^ TTL in seconds

    # Query specific DNS server
    dig @8.8.8.8 www.example.com

    # Query all record types (many servers refuse or minimize ANY; query types individually if little comes back)
    dig example.com ANY

    # Reverse lookup (IP -> hostname)
    dig -x 93.184.216.34

    # Trace full resolution path
    dig +trace www.example.com

TTL debugging: If you changed a DNS record and it's not propagating, check the old TTL: that's how long resolvers were told to cache it.
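
When checking TTLs across many records it helps to parse the answer lines instead of reading them by eye. A small sketch, assuming the five-column layout that `dig` prints in its ANSWER section (name, TTL, class, type, data); `Answer` and `parse_answer` are illustrative names.

```python
from typing import NamedTuple


class Answer(NamedTuple):
    name: str
    ttl: int
    rclass: str
    rtype: str
    data: str


def parse_answer(line: str) -> Answer:
    """Split one line of a dig ANSWER section into its five fields."""
    # maxsplit=4 keeps record data containing spaces (e.g. TXT) in one piece.
    name, ttl, rclass, rtype, data = line.split(None, 4)
    return Answer(name, int(ttl), rclass, rtype, data.strip())


if __name__ == "__main__":
    print(parse_answer("www.example.com.  300  IN  A  93.184.216.34"))
```

The TTL returned by a recursive resolver counts down as its cached copy ages; to see the configured value, query one of the domain's authoritative nameservers directly.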
HTTP
Request Lifecycle
1. DNS resolution: get server IP
2. TCP handshake: establish connection
3. (HTTPS) TLS handshake: negotiate encryption
4. HTTP request: client sends request headers + body
5. Server processes request
6. HTTP response: server sends status + headers + body
7. Connection close or keep-alive for next request

Request Structure

    GET /api/users?page=1 HTTP/1.1
    Host: api.example.com
    Accept: application/json
    Authorization: Bearer eyJ...
    User-Agent: curl/7.88.0

Response Status Codes
| Range | Meaning | Common ones |
|---|---|---|
| 2xx | Success | 200 OK, 201 Created, 204 No Content |
| 3xx | Redirect | 301 Moved Permanently, 302 Found (temporary), 304 Not Modified |
| 4xx | Client error | 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found |
| 5xx | Server error | 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout |
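
On the wire, an HTTP/1.1 request is plain text with CRLF line endings, and the response starts with a status line. The sketch below builds a body-less request and classifies a status code by range; the function names are illustrative, and real clients handle far more (bodies, chunking, header validation).

```python
def build_request(method: str, path: str, host: str, headers: dict) -> bytes:
    """Serialize an HTTP/1.1 request with no body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Lines end with CRLF; an empty line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_status_line(line: str) -> tuple:
    """Return (version, code, reason) from e.g. 'HTTP/1.1 404 Not Found'."""
    version, code, *reason = line.strip().split(" ", 2)
    return version, int(code), reason[0] if reason else ""


def status_class(code: int) -> str:
    """Map a status code to the ranges in the table above."""
    classes = {2: "success", 3: "redirect", 4: "client error", 5: "server error"}
    return classes.get(code // 100, "other")


if __name__ == "__main__":
    print(build_request("GET", "/api/users?page=1", "api.example.com",
                        {"Accept": "application/json"}).decode())
    print(status_class(parse_status_line("HTTP/1.1 502 Bad Gateway")[1]))
```

For triage, the class matters more than the exact code: 4xx means look at the request, 5xx means look at the server or whatever sits in front of it.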
Key Headers

    # Content negotiation
    Accept: application/json
    Content-Type: application/json

    # Authentication
    Authorization: Bearer <token>
    Authorization: Basic <base64(user:pass)>

    # Caching
    Cache-Control: max-age=3600
    ETag: "33a64df551425fcc55e4d42a148795d9f25f89d"

    # Security
    Strict-Transport-Security: max-age=31536000
    X-Content-Type-Options: nosniff
    X-Frame-Options: DENY

HTTPS: Certificates & TLS
TLS Termination
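
    Client -> [HTTPS] -> Load Balancer -> [HTTP] -> App Server
                         (TLS terminated here)

TLS termination decrypts traffic at the load balancer/reverse proxy so your app servers don't need to handle encryption. The hop from the load balancer to the app is then plain HTTP, so keep it on a trusted network.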
Certificates
A TLS certificate:
- Proves server identity (signed by a trusted CA)
- Contains the server's public key
- Has an expiry date

    # Check a site's certificate
    curl -v https://example.com 2>&1 | grep -A5 "Server certificate"

    # Detailed cert info
    openssl s_client -connect example.com:443 -servername example.com < /dev/null

    # Check cert expiry (-servername sends SNI so the right cert is returned)
    echo | openssl s_client -connect example.com:443 -servername example.com 2>/dev/null | \
      openssl x509 -noout -dates

    # Check local cert file
    openssl x509 -in cert.pem -noout -text
    openssl x509 -in cert.pem -noout -dates

Common certificate errors:
| Error | Cause |
|---|---|
| Certificate expired | Past notAfter date |
| Certificate not trusted | Self-signed or unknown CA |
| Hostname mismatch | CN/SAN doesn't match the domain |
| Incomplete chain | Missing intermediate certificates |
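
Expired certificates are the easiest of these errors to catch ahead of time. The sketch below turns a notAfter timestamp, in the format printed by `openssl x509 -noout -dates`, into days remaining. The helper name and the example date are illustrative; `now` is passed in so the function can be checked without a live certificate.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` and a certificate's notAfter timestamp.

    `not_after` looks like 'Jun  1 12:00:00 2030 GMT'.
    A negative result means the certificate has already expired.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


if __name__ == "__main__":
    remaining = days_until_expiry("Jun  1 12:00:00 2030 GMT",
                                  datetime.now(timezone.utc))
    print(f"{remaining} days remaining")
```

A monitoring job would typically alert when the result drops below some threshold, such as 14 or 30 days.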
Load Balancing
L4 vs L7
| | Layer 4 (Transport) | Layer 7 (Application) |
|---|---|---|
| Sees | IP + TCP/UDP | HTTP headers, URL, cookies |
| Routes by | IP:port | URL path, host header, content |
| TLS | Pass-through or terminate | Terminate (usually) |
| Speed | Faster | Slightly slower |
| Examples | AWS NLB, HAProxy TCP | AWS ALB, nginx, HAProxy HTTP |
| Use case | Raw TCP, non-HTTP | HTTP services, A/B testing |
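
The difference in the table comes down to what information each balancer can base its decision on. A toy sketch, with hypothetical pool names, hostnames, and ports chosen only for illustration:

```python
# Hypothetical listeners and routing rules, for illustration only.
L4_LISTENERS = {443: "web", 8443: "api"}


def l4_route(dst_port: int):
    """An L4 balancer sees only addresses and ports, never the URL."""
    return L4_LISTENERS.get(dst_port)


def l7_route(host: str, path: str) -> str:
    """An L7 balancer can pick a pool from the Host header and URL path."""
    if host == "api.example.com" or path.startswith("/api/"):
        return "api"
    if path.startswith("/static/"):
        return "static"
    return "web"


if __name__ == "__main__":
    print(l4_route(443))
    print(l7_route("www.example.com", "/api/users"))
    print(l7_route("www.example.com", "/static/app.js"))
```

At L4, splitting API traffic from web traffic requires separate ports or IPs; at L7 the same port can serve both because the balancer parses the request first.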
Health Checks
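
    # nginx upstream; open-source nginx checks health passively:
    # a server is skipped for fail_timeout after max_fails failed attempts
    upstream backend {
        server 10.0.0.1:8080 max_fails=3 fail_timeout=30s;
        server 10.0.0.2:8080 max_fails=3 fail_timeout=30s;
        server 10.0.0.3:8080 backup;    # only used if others fail

        keepalive 32;
    }

Health check types: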
| Type | What it checks |
|---|---|
| TCP | Port is accepting connections |
| HTTP | Returns 2xx status code |
| HTTPS | TLS + 2xx status code |
| Custom | Specific response body, latency threshold |
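
To show how a health check feeds into backend selection, here is a minimal round-robin sketch that skips backends failing a TCP check. The class and function names are made up, and the check runs inline per request to keep the example short; production balancers run checks in the background on an interval.

```python
import itertools
import socket


def tcp_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """TCP health check: healthy if the port accepts a connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class RoundRobin:
    """Rotate through backends, skipping any that fail the check."""

    def __init__(self, backends, check=tcp_check):
        self.backends = list(backends)
        self.check = check
        self._cycle = itertools.cycle(self.backends)

    def pick(self):
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            host, port = next(self._cycle)
            if self.check(host, port):
                return host, port
        return None  # every backend is unhealthy
```

Passing a different `check` function (for example one that requires an HTTP 2xx response) changes the health check type without touching the rotation logic.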
Debugging Tools
curl (Advanced Usage)

    # Full timing breakdown (each value is cumulative seconds since the request started)
    curl -w "\ndns:%{time_namelookup}s connect:%{time_connect}s tls:%{time_appconnect}s transfer:%{time_starttransfer}s total:%{time_total}s\n" \
      -s -o /dev/null https://example.com

    # Follow redirects
    curl -L https://example.com

    # Custom headers
    curl -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      https://api.example.com/endpoint

    # POST with JSON body
    curl -X POST \
      -H "Content-Type: application/json" \
      -d '{"key":"value"}' \
      https://api.example.com/data

    # Save response with headers
    curl -D headers.txt -o response.body https://example.com

    # Test against a specific IP (bypass DNS for this host)
    curl --resolve example.com:443:93.184.216.34 https://example.com

    # Verbose (see headers, TLS handshake)
    curl -v https://example.com

    # Ignore TLS cert errors (for testing ONLY)
    curl -k https://self-signed.example.com

dig / nslookup

    # Basic lookup
    dig example.com

    # Short output (just IP)
    dig +short example.com

    # Specific record type
    dig example.com MX
    dig example.com TXT
    dig example.com NS

    # Query specific DNS server
    dig @1.1.1.1 example.com

    # Reverse lookup
    dig -x 93.184.216.34

    # Trace full resolution
    dig +trace example.com

    # Check all authoritative servers for a domain
    dig +nssearch example.com

ss / netstat

    # ss: modern replacement for netstat
    ss -tlnp           # TCP, listening, numeric ports, with processes
    ss -tuln           # TCP + UDP, listening
    ss -tn dst :443    # connections to port 443
    ss -s              # summary statistics

    # Filter by state
    ss -tn state established
    ss -tn state time-wait | wc -l

    # netstat (older, still on many systems)
    netstat -tlnp
    netstat -an | grep ESTABLISHED | wc -l

traceroute

    # Trace the network path to a host
    traceroute example.com

    # Use TCP (more firewall-friendly; usually needs root)
    traceroute -T -p 443 example.com

    # mtr: continuous traceroute (better for debugging)
    mtr example.com
    mtr --report example.com    # run for 10 cycles then print report

tcpdump (Basic)

    # Capture on interface
    tcpdump -i eth0

    # Filter by host
    tcpdump -i eth0 host 10.0.0.5

    # Filter by port
    tcpdump -i eth0 port 80

    # Capture to file for Wireshark analysis
    tcpdump -i eth0 -w capture.pcap

    # Read capture file
    tcpdump -r capture.pcap

    # Useful filters
    tcpdump -i eth0 'tcp port 443 and host api.example.com'
    tcpdump -i eth0 'tcp[tcpflags] & tcp-rst != 0'    # capture RSTs