Linux: Thinking Like a System Engineer
File systems, processes, permissions, systemd, SSH, and the failures you'll actually face
Why Linux First?
Almost every server, container, CI runner, and cloud VM runs Linux. If you can't navigate a broken Linux system at 2am, you're not ready for on-call. This isn't about memorizing commands; it's about building a mental model of how the OS works.
File System Hierarchy
Linux organizes everything under a single root /. Understanding what lives where prevents a lot of confusion.
| Path | Purpose |
|---|---|
| / | Root of the entire file system |
| /etc | System-wide configuration files |
| /var | Variable data: logs, spool files, databases |
| /tmp | Temporary files, often cleared on reboot |
| /home | User home directories |
| /usr | Installed programs, libraries, and shared read-only data |
| /proc | Virtual filesystem: live kernel and process data |
| /sys | Virtual filesystem: hardware and driver info |
| /dev | Device files (disks, terminals, null) |
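To make the idea of a virtual filesystem concrete, here is a minimal sketch that reads fields from /proc/<pid>/status, a file the kernel generates on every read. It assumes a Linux system with /proc mounted, and proc_field is a hypothetical helper name, not a standard tool.

```shell
# Print one field from /proc/<pid>/status (generated by the kernel, not stored on disk).
# proc_field is a hypothetical helper; it assumes Linux with /proc mounted.
proc_field() {
    pid=$1
    field=$2
    # Lines look like "Name:<TAB>bash", so split on the tab character.
    awk -F'\t' -v key="$field" '$1 == (key ":") { print $2 }' "/proc/$pid/status"
}

if [ -r "/proc/$$/status" ]; then
    echo "shell name:  $(proc_field $$ Name)"
    echo "shell state: $(proc_field $$ State)"
    echo "parent PID:  $(proc_field $$ PPid)"
else
    echo "/proc is not available on this system" >&2
fi
```

Nothing here touches a disk: the values reflect the process at the moment of the read, which is why /proc is useful for live debugging.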

    # Browse the filesystem hierarchy
    ls /
    ls /etc | head -20
    ls /var/log

    # Check disk usage per directory
    du -sh /* 2>/dev/null | sort -h

File Types, Inodes, Links
Every file in Linux has an inode, a data structure that stores its metadata (permissions, timestamps, owner, size, pointers to data blocks). The filename is just a directory entry that points to an inode.

    # See inode number
    ls -i file.txt

    # Check inode usage on a filesystem
    df -i

Hard Links vs Symbolic Links
| | Hard Link | Symbolic Link (symlink) |
|---|---|---|
| Points to | Inode directly | Another path |
| Works across filesystems? | No | Yes |
| Survives original deletion? | Yes | No (dangling link) |
| Works on directories? | No (usually) | Yes |
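The rows above can be checked with a small experiment in a scratch directory. This is a sketch under stated assumptions: the file names are illustrative, inode_of is a hypothetical helper built on ls -di, and the scratch filesystem is assumed to support hard links.

```shell
# Return the inode number of a path (first field of `ls -di`).
# inode_of is a hypothetical helper name used only for this illustration.
inode_of() {
    ls -di "$1" | awk '{ print $1 }'
}

workdir=$(mktemp -d)   # scratch directory; file names below are illustrative
echo "hello" > "$workdir/original.txt"
ln    "$workdir/original.txt" "$workdir/hardlink.txt"   # second name for the same inode
ln -s "$workdir/original.txt" "$workdir/symlink.txt"    # separate inode that stores a path

original_inode=$(inode_of "$workdir/original.txt")
hardlink_inode=$(inode_of "$workdir/hardlink.txt")
symlink_inode=$(inode_of "$workdir/symlink.txt")
echo "original=$original_inode hardlink=$hardlink_inode symlink=$symlink_inode"

# Delete the original name: the data survives through the hard link,
# while the symlink now points at a path that no longer exists.
rm "$workdir/original.txt"
hardlink_content=$(cat "$workdir/hardlink.txt")
if [ -L "$workdir/symlink.txt" ] && [ ! -e "$workdir/symlink.txt" ]; then
    symlink_state="dangling"
else
    symlink_state="ok"
fi
echo "hard link still reads: $hardlink_content; symlink is: $symlink_state"

rm -rf "$workdir"
```

The hard link and the original report the same inode number, which is why deleting one name leaves the data reachable through the other.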

    # Create a hard link
    ln original.txt hardlink.txt

    # Create a symbolic link
    ln -s /etc/nginx/nginx.conf nginx.conf

    # Find all broken symlinks
    find /etc -type l ! -exec test -e {} \; -print

Users, Groups, Permissions, umask
Permission Model
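
    -rwxr-xr-- 1 alice devs 4096 Jan 1 file.txt
    |[-][-][-]
    | |  |  +--- other: r--
    | |  +------ group: r-x
    | +--------- user:  rwx
    +----------- file type (- means regular file)

Each of the three sections (user, group, other) combines read (r=4), write (w=2), and execute (x=1) bits.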
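As a worked example of the r=4, w=2, x=1 arithmetic, the sketch below converts a symbolic permission string into its octal form. perm_to_octal is a hypothetical helper name, and it assumes a plain nine-character string with no setuid, setgid, or sticky bits.

```shell
# Convert a 9-character symbolic permission string (e.g. rwxr-xr--) to octal.
# perm_to_octal is a hypothetical helper; special bits (s, t) are not handled.
perm_to_octal() {
    perms=$1
    result=""
    while [ -n "$perms" ]; do
        triplet=$(printf '%s' "$perms" | cut -c1-3)   # next user/group/other section
        perms=$(printf '%s' "$perms" | cut -c4-)      # remainder of the string
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac   # read
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac   # write
        case $triplet in ??x) digit=$((digit + 1)) ;; esac   # execute
        result="$result$digit"
    done
    printf '%s\n' "$result"
}

perm_to_octal rwxr-xr--   # prints 754 (user=4+2+1, group=4+1, other=4)
perm_to_octal rw-r--r--   # prints 644
```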

    # Change permissions
    chmod 755 script.sh              # rwxr-xr-x
    chmod u+x,g-w script.sh          # symbolic mode

    # Change ownership
    chown alice:devs file.txt
    chown -R alice:devs /var/www/

    # View effective permissions
    stat file.txt
    namei -l /path/to/file           # trace permissions along path

umask
umask is a mask of permission bits that are cleared from the default mode of newly created files (0666) and directories (0777).
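The effect of a umask can be sketched as bit arithmetic. Strictly speaking the mask bits are cleared rather than subtracted; the two give the same answer for common values such as 0022, but not for every mask. apply_umask is a hypothetical helper name used only for this illustration.

```shell
# Compute the resulting mode for a base mode and a umask (both written in octal).
# apply_umask is a hypothetical helper name, not a standard command.
apply_umask() {
    base=$1
    mask=$2
    # Clear every bit that is set in the mask: base AND (NOT mask).
    printf '%04o\n' $(( $base & ~$mask ))
}

apply_umask 0666 0022   # prints 0644 (new file with umask 022)
apply_umask 0777 0022   # prints 0755 (new directory with umask 022)
apply_umask 0666 0027   # prints 0640
apply_umask 0666 0033   # prints 0644, whereas plain subtraction would suggest 0633
```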

    # Check current umask
    umask                    # e.g. 0022

    # New file permissions = 0666 with the 0022 bits cleared = 0644 (rw-r--r--)
    # New dir permissions  = 0777 with the 0022 bits cleared = 0755 (rwxr-xr-x)

    # Set umask in /etc/profile or ~/.bashrc
    umask 027                # more restrictive: group can read, others get nothing

sudo Internals & Security Implications
sudo doesn't just run a command as root: it authenticates through PAM, checks /etc/sudoers for authorization, and logs each invocation to syslog.

    # Edit sudoers safely (validates syntax before saving)
    visudo

    # Sudoers rule: allow a user to run one specific command without a password
    alice ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart nginx

    # Check what you're allowed to run
    sudo -l

    # Run as a different user (not root)
    sudo -u www-data id

Security gotchas:
- NOPASSWD: ALL is almost always wrong; scope it to specific commands
- sudo bash or sudo -i gives a full root shell; avoid in scripts
- Check /var/log/auth.log or journalctl _COMM=sudo for sudo abuse
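One way to spot the first gotcha is to scan sudoers-style text for blanket passwordless rules. The sketch below is a heuristic, not a sudoers parser: find_broad_nopasswd is a hypothetical helper name, the two sample rules are made up for illustration, and aliases or line continuations are not handled.

```shell
# Print sudoers-style lines that grant passwordless access to ALL commands.
# find_broad_nopasswd is a hypothetical helper; the pattern is a heuristic only.
find_broad_nopasswd() {
    # Skip commented lines, then look for "NOPASSWD: ALL" as the whole command spec.
    grep -nE '^[^#]*NOPASSWD:[[:space:]]*ALL[[:space:]]*(,|$)' "$1"
}

# Illustrative input only: two made-up rules written to a scratch file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
alice ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart nginx
bob ALL=(ALL) NOPASSWD: ALL
EOF

find_broad_nopasswd "$sample"   # reports only the rule for bob, prefixed by its line number
rm -f "$sample"
```

Pointing it at /etc/sudoers or the files under /etc/sudoers.d would require root, since those files are not world-readable.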
Process Lifecycle: fork, exec, signals
Every process except PID 1 is created by another process: the parent calls fork() to clone itself, and the child usually calls exec() to load a new program.

    init/systemd (PID 1)
    └─ bash (fork)
       └─ ls (exec replaces bash image)

    # View process tree
    pstree -p
    ps auxf

    # Find process by name
    pgrep -a nginx
    pidof nginx

    # Send signals
    kill -15 1234    # SIGTERM: graceful shutdown
    kill -9 1234     # SIGKILL: force kill (no cleanup)
    kill -1 1234     # SIGHUP: reload config (for many daemons)
    kill -0 1234     # check if process exists (no signal sent)

Common signals:
| Signal | Number | Meaning |
|---|---|---|
| SIGTERM | 15 | Graceful termination |
| SIGKILL | 9 | Force kill (can't be caught) |
| SIGHUP | 1 | Reload config |
| SIGINT | 2 | Interrupt (Ctrl+C) |
| SIGSTOP | 19 | Pause (can't be caught) |
| SIGCONT | 18 | Resume |
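The fork, signal, and exit-status pieces fit together in a few lines of shell. This is a minimal sketch that uses a throwaway sleep process as the child; the variable names are illustrative.

```shell
# Start a child in the background: the shell forks, and the child execs sleep.
sleep 60 &
child_pid=$!

# Signal 0 sends nothing; it only reports whether the PID can be signalled.
if kill -0 "$child_pid" 2>/dev/null; then
    state_before="running"
else
    state_before="gone"
fi

# Ask for a graceful shutdown, then collect the child's exit status.
kill -TERM "$child_pid"
exit_status=0
wait "$child_pid" || exit_status=$?

# The shell reports a process killed by signal N as 128 + N, so SIGTERM (15) gives 143.
echo "before: $state_before, exit status: $exit_status"

if kill -0 "$child_pid" 2>/dev/null; then
    state_after="running"
else
    state_after="gone"
fi
echo "after: $state_after"
```

The same 128 + N convention is what makes an exit status of 137 a strong hint that something sent SIGKILL.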
systemd
systemd is PID 1 on most modern Linux distros. It manages services, mounts, timers, sockets, and boot targets.
Units & Targets

    # List all running services
    systemctl list-units --type=service --state=running

    # Start / stop / restart / reload
    systemctl start nginx
    systemctl stop nginx
    systemctl restart nginx
    systemctl reload nginx           # graceful config reload

    # Enable at boot / disable
    systemctl enable nginx
    systemctl disable nginx

    # Check status with logs
    systemctl status nginx -l

    # Show dependencies
    systemctl list-dependencies nginx

Service Unit File Anatomy

    [Unit]
    # Human-readable name in status listings
    Description=Python app (example)
    # Start after the network is reachable (optional; drop Wants= if you do not need it)
    After=network-online.target
    Wants=network-online.target
    # Uncomment if this app must wait for Postgres
    # Requires=postgresql.service
    # After=postgresql.service

    [Service]
    Type=simple
    User=myapp
    Group=myapp
    WorkingDirectory=/opt/myapp
    # Ensure logs (or runtime files) exist before the process starts; must exit 0 or the unit fails
    ExecStartPre=/bin/mkdir -p /opt/myapp/logs
    # Main process: use the venv's Python so deps match production (adjust script/module as needed)
    ExecStart=/opt/myapp/venv/bin/python /opt/myapp/app.py
    # Runs once the main command has been invoked (not after the app exits); keep this quick
    ExecStartPost=/bin/sh -c 'echo "$(date -Is) myapp started" >> /opt/myapp/logs/boot.log'
    Restart=on-failure
    RestartSec=5s
    Environment=PYTHONUNBUFFERED=1
    Environment=PORT=8080

    [Install]
    WantedBy=multi-user.target

    # After editing unit files
    systemctl daemon-reload
    systemctl restart myapp

Service Failures & Restart Loops

    # Check why a service failed
    systemctl status myapp
    journalctl -u myapp -n 50 --no-pager

    # Check restart count
    systemctl show myapp --property=NRestarts

    # Reset failure counter
    systemctl reset-failed myapp

Debugging a restart loop:
- systemctl status myapp: look at the exit code
- journalctl -u myapp -f: tail live logs
- Run the ExecStart command manually as the service user
- Check file permissions, missing env vars, port conflicts
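When you run the ExecStart command by hand, the shell leaves its exit status in $?, and reading that number correctly is half the diagnosis. The sketch below interprets it using the shell convention that a process killed by signal N is reported as 128 + N. describe_exit is a hypothetical helper name, and systemd's own status output reports signals separately, so this applies to manual runs from a shell.

```shell
# Translate a shell exit status into a short description.
# describe_exit is a hypothetical helper name, not a standard command.
describe_exit() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "exited successfully"
    elif [ "$code" -gt 128 ] && [ "$code" -le 192 ]; then
        # Statuses above 128 mean the process was killed by signal (status - 128).
        signal=$((code - 128))
        echo "killed by signal $signal ($(kill -l "$signal"))"
    else
        echo "exited with error code $code"
    fi
}

describe_exit 0
describe_exit 1
describe_exit 137   # 128 + 9, i.e. SIGKILL
```

A typical use is to run the service command as the service user and then call describe_exit "$?" on the next line.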
Logs
journalctl Usage

    # All logs, newest first
    journalctl -r

    # Follow live (like tail -f)
    journalctl -f

    # Since last boot
    journalctl -b

    # Specific service
    journalctl -u nginx -n 100

    # Time range
    journalctl --since "2024-01-01 10:00" --until "2024-01-01 11:00"

    # Priority levels (err and above)
    journalctl -p err -b

    # Show kernel messages only
    journalctl -k

    # Disk usage of journal
    journalctl --disk-usage

    # Vacuum old logs
    journalctl --vacuum-size=500M

App Logs vs System Logs
| Type | Location | Tool |
|---|---|---|
| System/kernel | journald | journalctl |
| Nginx access | /var/log/nginx/access.log | tail, grep, awk |
| App custom | /var/log/myapp/ or stdout | depends on app |
| Auth events | /var/log/auth.log (Debian/Ubuntu) or /var/log/secure (RHEL family) | grep, journalctl |
| Cron | /var/log/cron or journal | journalctl -u cron (crond on RHEL family) |
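For file-based logs such as the nginx access log, awk is often enough for a first look. The sketch below counts responses per HTTP status code. It assumes nginx's default combined log format, where the status is the ninth whitespace-separated field; count_statuses is a hypothetical helper name, and the sample requests are made up for illustration.

```shell
# Count responses per HTTP status code in an access log.
# Assumes the default "combined" format (status code in field 9).
# count_statuses is a hypothetical helper name.
count_statuses() {
    awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' "$1" | sort
}

# Illustrative input only: made-up requests written to a scratch file.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
203.0.113.7 - - [01/Jan/2024:10:00:01 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0"
203.0.113.7 - - [01/Jan/2024:10:00:02 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/8.0"
203.0.113.9 - - [01/Jan/2024:10:00:03 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0"
EOF

count_statuses "$sample_log"   # one line per status code: the code, then its count
rm -f "$sample_log"
```

If the server uses a custom log_format, the field number would need to change to match.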
Disk
Mount Points & fstab

    # Show all mounted filesystems
    df -h
    lsblk
    mount | column -t

    # Check /etc/fstab (filesystems to mount at boot)
    cat /etc/fstab

    # Mount manually
    mount /dev/sdb1 /mnt/data

    # Mount with options
    mount -o ro,noexec /dev/sdc1 /mnt/backup

fstab entry structure:

    DEVICE       MOUNTPOINT  TYPE  OPTIONS          DUMP  PASS
    UUID=abc123  /           ext4  defaults         0     1
    /dev/sdb1    /data       xfs   defaults,nofail  0     2

Disk Pressure & Inode Exhaustion

    # Check disk space
    df -h

    # Check inode usage: can be full even when space is free!
    df -i

    # Find what's eating space
    du -sh /var/* 2>/dev/null | sort -rh | head -20
    ncdu /var                        # interactive (install with apt/yum)

    # Find files larger than 100MB
    find / -type f -size +100M 2>/dev/null

    # Find directories with most files (inode issue)
    find /tmp -type d -exec sh -c 'echo "$(ls -A "$1" | wc -l) $1"' _ {} \; | sort -rn | head

Memory & CPU

    # Memory overview
    free -h
    vmstat 1 5                       # 5 samples, 1 second apart

    # Detailed memory stats
    cat /proc/meminfo

    # Top memory consumers
    ps aux --sort=-%mem | head -10

    # CPU load vs CPU usage
    uptime                           # load average: 1, 5, 15 min
    top                              # interactive
    htop                             # better interactive

    # Per-CPU stats
    mpstat -P ALL 1                  # (from sysstat package)

CPU Load vs CPU Usage:
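- CPU usage = percentage of time the CPU is busy (0-100% per core)
- CPU load average = average number of processes running or waiting for a CPU; on Linux it also counts processes in uninterruptible sleep, which usually means waiting on I/O
- On a 4-core system, a load of 4.0 means the cores are roughly fully utilized; a load of 8.0 means about twice as much work is queued as the cores can run (overloaded)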
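The 4-core example above amounts to dividing the load average by the number of cores. Here is a minimal sketch of that calculation; load_per_core is a hypothetical helper name, and the live reading at the end assumes a Linux system with /proc/loadavg.

```shell
# Divide a load average by the number of cores to get a per-core figure.
# load_per_core is a hypothetical helper name; awk handles the floating point.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

load_per_core 4.0 4   # prints 1.00: roughly one runnable task per core
load_per_core 8.0 4   # prints 2.00: about twice as many tasks as cores

# On a live Linux system, use the 1-minute load average and the online core count.
if [ -r /proc/loadavg ]; then
    read -r load1 _rest < /proc/loadavg
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || nproc)
    echo "1-minute load per core: $(load_per_core "$load1" "$cores")"
fi
```

A per-core value that stays well above 1.0 suggests work is queuing, though on Linux part of that queue may be waiting on I/O rather than CPU.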
SSH

    # Generate key pair
    ssh-keygen -t ed25519 -C "your-email@example.com"

    # Copy public key to server
    ssh-copy-id -i ~/.ssh/id_ed25519.pub user@server

    # Manual copy (if ssh-copy-id unavailable)
    cat ~/.ssh/id_ed25519.pub | ssh user@server "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"

    # Connect with specific key
    ssh -i ~/.ssh/id_ed25519 user@server

    # Connect with verbose output (debugging)
    ssh -v user@server

Port Forwarding

    # Local forwarding: access a remote service locally
    # Access server's port 5432 (postgres) as localhost:5432
    ssh -L 5432:localhost:5432 user@server

    # Remote forwarding: expose a local port on the remote server
    # Expose local :8080 as server's :9090
    ssh -R 9090:localhost:8080 user@server

    # Dynamic forwarding (SOCKS proxy)
    ssh -D 1080 user@server

    # Jump host / bastion
    ssh -J bastion-user@bastion.example.com target-user@private-server

~/.ssh/config for convenience:

    Host myserver
        HostName 1.2.3.4
        User ubuntu
        IdentityFile ~/.ssh/id_ed25519
        Port 22

    Host private-db
        HostName 10.0.0.5
        User admin
        ProxyJump myserver

Common Failures You'll Actually Face
Service Not Starting

    systemctl status myapp
    journalctl -u myapp --since "5 minutes ago"
    # Run ExecStart command manually
    # Check: ports in use, missing files, wrong user
    ss -tlnp | grep :8080

Permission Denied

    # Who owns the file?
    ls -la /path/to/file

    # What's the process running as?
    ps aux | grep myapp

    # Trace permission along path
    namei -l /path/to/file

    # Check SELinux/AppArmor if permissions look fine
    getenforce                       # SELinux
    aa-status                        # AppArmor
    ausearch -m avc -ts recent       # SELinux audit log

Disk Full

    df -h                            # which filesystem is full?
    df -i                            # check inodes too

    # Find large files
    du -sh /var/* | sort -rh | head
    find /var/log -name "*.log" -size +100M

    # Quick fixes
    journalctl --vacuum-size=100M
    truncate -s 0 /var/log/huge.log  # zero out (don't delete if open)

High Load

    uptime                           # load average
    top                              # interactive; press 1 for per-CPU
    ps aux --sort=-%cpu | head -10

    # Is it CPU-bound or I/O-bound?
    vmstat 1 5
    # procs r column = processes waiting for CPU
    # io wa column = I/O wait percentage
    # High wa + low cpu = I/O bottleneck
    # High cpu = CPU bottleneck

    iostat -x 1 5                    # per-disk I/O stats
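The CPU-bound versus I/O-bound rule of thumb can be written down as a tiny classifier over the us, sy, and wa columns of a vmstat sample. This is only a sketch: classify_cpu is a hypothetical helper name, and the 80% and 20% thresholds are illustrative assumptions rather than standard values.

```shell
# Classify one vmstat sample as CPU-bound or I/O-bound.
# Arguments: user %, system %, I/O wait % (the us, sy and wa columns).
# classify_cpu is a hypothetical helper; the thresholds are illustrative assumptions.
classify_cpu() {
    user=$1
    system=$2
    iowait=$3
    busy=$((user + system))
    if [ "$iowait" -ge 20 ] && [ "$busy" -lt 80 ]; then
        echo "I/O-bound"            # high wa with low CPU time
    elif [ "$busy" -ge 80 ]; then
        echo "CPU-bound"            # most time spent in user + system
    else
        echo "no clear bottleneck"
    fi
}

classify_cpu 10 5 60   # prints I/O-bound
classify_cpu 85 10 1   # prints CPU-bound
classify_cpu 20 5 2    # prints no clear bottleneck
```

In practice you would look at several samples rather than one, since a single second of vmstat output can be misleading.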