
Cloud Fundamentals (AWS / Azure / GCP)

Shared responsibility model, IAM, EC2, VPC, S3, RDS, ECR, and cost awareness


Shared Responsibility Model

Cloud doesn't mean "no security concerns." It means security responsibilities are split.

| Layer | AWS Responsible | You Responsible |
| --- | --- | --- |
| Physical | Data centers, hardware | — |
| Network | Backbone network | VPC config, security groups |
| Compute | Hypervisor | OS patching (EC2), code |
| Storage | Physical disks, S3 durability | Bucket policies, encryption |
| Database | RDS infrastructure | DB user permissions, data |
| IAM | IAM service | Who has what permissions |

The mistake most teams make: assuming "it's in the cloud so it's secure." S3 bucket misconfigurations and overly permissive IAM are the most common causes of cloud breaches.


IAM — Identity and Access Management

Users vs Roles

|  | IAM User | IAM Role |
| --- | --- | --- |
| Who uses it | Humans with long-term credentials | Services, EC2 instances, Lambda |
| Credentials | Access key + secret (long-lived) | Temporary tokens (STS) |
| Use case | Developers, CI/CD service accounts | EC2 to access S3, Lambda execution |
| Best practice | Avoid long-lived keys; use SSO | Prefer over users for services |
```shell
# Check your current identity
aws sts get-caller-identity

# List IAM users
aws iam list-users

# List IAM roles
aws iam list-roles

# Who has access to what (for a user)
aws iam list-attached-user-policies --user-name alice
aws iam list-user-policies --user-name alice
```

Policies

IAM policies are JSON documents defining Allow/Deny for actions on resources.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/*"
    },
    {
      "Effect": "Deny",
      "Action": "s3:DeleteObject",
      "Resource": "*"
    }
  ]
}
```
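The evaluation order matters: an explicit Deny overrides any Allow, and anything not explicitly allowed is implicitly denied. A minimal Python sketch of that logic, using `fnmatch` as a stand-in for IAM's wildcard matching (the real evaluator also handles principals, conditions, and NotAction, which this toy ignores):

```python
import fnmatch

def is_allowed(policy, action, resource):
    """Toy IAM evaluation: explicit Deny wins; default is deny."""
    allowed = False
    for stmt in policy["Statement"]:
        actions = stmt["Action"]
        if isinstance(actions, str):
            actions = [actions]
        resources = stmt["Resource"]
        if isinstance(resources, str):
            resources = [resources]
        action_match = any(fnmatch.fnmatch(action, a) for a in actions)
        resource_match = any(fnmatch.fnmatch(resource, r) for r in resources)
        if action_match and resource_match:
            if stmt["Effect"] == "Deny":
                return False          # explicit deny short-circuits everything
            allowed = True            # remember the allow, keep scanning for denies
    return allowed

policy = {
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"],
         "Resource": "arn:aws:s3:::my-bucket/*"},
        {"Effect": "Deny", "Action": "s3:DeleteObject", "Resource": "*"},
    ]
}
print(is_allowed(policy, "s3:GetObject", "arn:aws:s3:::my-bucket/a.txt"))     # True
print(is_allowed(policy, "s3:DeleteObject", "arn:aws:s3:::my-bucket/a.txt"))  # False
```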

Policy types:

| Type | Scope | Example |
| --- | --- | --- |
| Managed (AWS) | AWS-maintained | AmazonS3ReadOnlyAccess |
| Managed (Customer) | Your account | Your custom policies |
| Inline | Attached directly to user/role | Specific one-off permissions |
| Resource-based | On the resource itself | S3 bucket policy |
| Permission boundary | Max permissions a role can have | Limiting what devs can grant |

Access Boundaries & Least Privilege

```shell
# Check effective permissions (IAM Policy Simulator via CLI)
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:user/alice \
  --action-names s3:GetObject \
  --resource-arns arn:aws:s3:::my-bucket/file.txt

# Generate a least-privilege policy from CloudTrail activity;
# use IAM Access Analyzer to find unused permissions
```

Least privilege checklist:

  • No * actions unless absolutely necessary
  • No * resources unless justified
  • Separate roles per service (EC2 role ≠ Lambda role)
  • Rotate access keys regularly or switch to roles/SSO
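The first two checklist items are mechanical enough to lint. A rough sketch (the function name and the simplified statement parsing are assumptions for illustration, not a real AWS tool):

```python
def wildcard_findings(policy):
    """Flag Allow statements with '*' actions or '*' resources."""
    findings = []
    for i, stmt in enumerate(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue                        # Deny statements are not the risk here
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append(f"statement {i}: wildcard action")
        if "*" in resources:
            findings.append(f"statement {i}: wildcard resource")
    return findings

risky = {"Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}]}
print(wildcard_findings(risky))
# ['statement 0: wildcard action', 'statement 0: wildcard resource']
```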

EC2 — Elastic Compute Cloud

Instance Lifecycle

```text
stopped → pending → running → stopping → stopped
                    running → shutting-down → terminated
```
```shell
# Launch instance (AWS CLI)
aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type t3.medium \
  --key-name my-key-pair \
  --security-group-ids sg-12345 \
  --subnet-id subnet-12345 \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=my-server}]'

# List instances
aws ec2 describe-instances \
  --query 'Reservations[].Instances[].[Tags[?Key==`Name`].Value|[0],InstanceId,State.Name,PublicIpAddress]' \
  --output table

# Stop / start / terminate
aws ec2 stop-instances --instance-ids i-1234567890abcdef0
aws ec2 start-instances --instance-ids i-1234567890abcdef0
aws ec2 terminate-instances --instance-ids i-1234567890abcdef0
```

Security Groups

Security groups are stateful firewalls at the instance/ENI level.

```shell
# Create security group
aws ec2 create-security-group \
  --group-name web-sg \
  --description "Web server security group" \
  --vpc-id vpc-12345

# Allow inbound HTTPS from anywhere
aws ec2 authorize-security-group-ingress \
  --group-id sg-12345 \
  --protocol tcp \
  --port 443 \
  --cidr 0.0.0.0/0

# Allow SSH from a specific IP only
aws ec2 authorize-security-group-ingress \
  --group-id sg-12345 \
  --protocol tcp \
  --port 22 \
  --cidr 203.0.113.10/32

# Allow traffic from another security group (e.g., from the load balancer SG)
aws ec2 authorize-security-group-ingress \
  --group-id sg-app \
  --protocol tcp \
  --port 8080 \
  --source-group sg-alb
```

Security group rules:

  • Inbound: controls what traffic can reach your instance
  • Outbound: controls what traffic your instance can send (default: all allowed)
  • Stateful: if inbound is allowed, the response is automatically allowed
  • No explicit deny — only allow rules exist; everything else is denied
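Those semantics can be sketched as a tiny evaluator: rules are additive allows, and a packet that matches no rule is dropped. The rule format below is a simplification of the real API for illustration:

```python
import ipaddress

def inbound_allowed(rules, protocol, port, source_ip):
    """Security-group model: any matching allow rule admits; default deny."""
    for rule in rules:                       # rules are additive allows only
        if rule["protocol"] != protocol:
            continue
        if not (rule["from_port"] <= port <= rule["to_port"]):
            continue
        if ipaddress.ip_address(source_ip) in ipaddress.ip_network(rule["cidr"]):
            return True
    return False                             # nothing matched -> implicit deny

# The two ingress rules created above: HTTPS from anywhere, SSH from one IP
web_sg = [
    {"protocol": "tcp", "from_port": 443, "to_port": 443, "cidr": "0.0.0.0/0"},
    {"protocol": "tcp", "from_port": 22, "to_port": 22, "cidr": "203.0.113.10/32"},
]
print(inbound_allowed(web_sg, "tcp", 443, "198.51.100.7"))  # True
print(inbound_allowed(web_sg, "tcp", 22, "198.51.100.7"))   # False (wrong source IP)
```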

VPC — Virtual Private Cloud

CIDR Planning

```text
VPC CIDR: 10.0.0.0/16 (65,536 addresses)
├── AZ-1a: 10.0.1.0/24 (256 addresses; AWS reserves 5 per subnet)
│   ├── Public subnet:  10.0.1.0/25   (web servers, load balancers)
│   └── Private subnet: 10.0.1.128/25 (app servers)
├── AZ-1b: 10.0.2.0/24
│   ├── Public subnet:  10.0.2.0/25
│   └── Private subnet: 10.0.2.128/25
└── AZ-1c: 10.0.3.0/24
    ├── Public subnet:  10.0.3.0/25
    └── Private subnet: 10.0.3.128/25
```

Planning rules:

  • Reserve enough space for growth (use /16 for VPC)
  • Don’t overlap with on-prem networks (needed for VPN/Direct Connect)
  • Separate AZs for high availability
  • Separate subnets for different tiers (public/private)
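The layout above can be sanity-checked with Python's standard ipaddress module; the on-prem range here is an assumed example:

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
on_prem = ipaddress.ip_network("192.168.0.0/16")   # assumed on-prem range

# Carve one /24 per AZ out of the /16 (skipping 10.0.0.0/24),
# then halve each /24 into a public and a private /25.
az_blocks = list(vpc.subnets(new_prefix=24))[1:4]
for block in az_blocks:
    assert not block.overlaps(on_prem)             # safe for VPN/Direct Connect
    public, private = block.subnets(new_prefix=25)
    print(block, "->", public, "(public),", private, "(private)")
```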

Public vs Private Subnets

|  | Public Subnet | Private Subnet |
| --- | --- | --- |
| Internet route | Internet Gateway route | NAT Gateway route (optional) |
| Instances get | Public IP possible | Private IP only |
| Accessible from internet | Yes (with security group) | No |
| Examples | Load balancers, bastion hosts | App servers, databases |
Key components:

  • Internet Gateway (IGW) — allows public subnet traffic to/from the internet
  • NAT Gateway — allows private subnet instances to reach the internet (outbound only)
  • Route table — rules for where to send traffic

```shell
# Check route tables
aws ec2 describe-route-tables --filters "Name=vpc-id,Values=vpc-12345"
```
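Route tables resolve a destination by longest-prefix match: the most specific matching route wins, which is why the local 10.0.0.0/16 route beats the 0.0.0.0/0 default for intra-VPC traffic. A sketch (the route targets are hypothetical):

```python
import ipaddress

# A typical public-subnet route table: local route plus a default route
routes = [
    ("10.0.0.0/16", "local"),        # intra-VPC traffic stays inside
    ("0.0.0.0/0", "igw-12345"),      # everything else -> internet gateway
]

def pick_route(dest_ip):
    """Longest-prefix match: among matching routes, largest prefixlen wins."""
    candidates = [(ipaddress.ip_network(cidr), target)
                  for cidr, target in routes
                  if ipaddress.ip_address(dest_ip) in ipaddress.ip_network(cidr)]
    return max(candidates, key=lambda c: c[0].prefixlen)[1]

print(pick_route("10.0.1.50"))      # local
print(pick_route("93.184.216.34"))  # igw-12345
```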

S3 — Simple Storage Service

```shell
# Create bucket
aws s3 mb s3://my-unique-bucket-name --region us-east-1

# Upload file
aws s3 cp localfile.txt s3://my-bucket/prefix/file.txt

# Upload directory
aws s3 sync ./local-dir s3://my-bucket/prefix/

# Download
aws s3 cp s3://my-bucket/file.txt ./

# List contents
aws s3 ls s3://my-bucket/
aws s3 ls s3://my-bucket/ --recursive

# Delete
aws s3 rm s3://my-bucket/file.txt
aws s3 rm s3://my-bucket/prefix/ --recursive
```

S3 Bucket Policies

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReadFromCDN",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/*"
    },
    {
      "Sid": "DenyUnencryptedUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    }
  ]
}
```

Security checklist:

  • Block public access unless intentionally hosting public content
  • Enable versioning for important buckets
  • Enable access logging
  • Use bucket policies to limit who can read/write
  • Never grant public s3:ListBucket; it exposes every object key in the bucket

RDS — Relational Database Service

```shell
# List RDS instances
aws rds describe-db-instances \
  --query 'DBInstances[].[DBInstanceIdentifier,DBInstanceStatus,Engine,DBInstanceClass]' \
  --output table

# Take a manual DB snapshot (before changing DB parameters)
aws rds create-db-snapshot \
  --db-instance-identifier my-db \
  --db-snapshot-identifier pre-migration-snapshot-$(date +%Y%m%d)
```

RDS awareness checklist:

  • Multi-AZ for production (automatic failover ~1-2 min)
  • Read replicas for read scaling
  • Automated backups with appropriate retention period
  • Parameter groups for DB tuning (some parameter changes require a reboot to apply)
  • Security groups: only allow from app server security group, not 0.0.0.0/0
  • Storage auto-scaling to prevent disk full
  • Monitoring: FreeStorageSpace, DatabaseConnections, CPUUtilization

ECR — Elastic Container Registry

```shell
# Authenticate Docker to ECR
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin \
  123456789012.dkr.ecr.us-east-1.amazonaws.com

# Create repository
aws ecr create-repository \
  --repository-name myapp \
  --image-scanning-configuration scanOnPush=true \
  --encryption-configuration encryptionType=AES256

# Tag and push image
docker build -t myapp:latest .
docker tag myapp:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:1.0
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:1.0

# List images
aws ecr list-images --repository-name myapp

# Lifecycle policy (auto-delete old images)
aws ecr put-lifecycle-policy \
  --repository-name myapp \
  --lifecycle-policy-text '{
    "rules": [
      {
        "rulePriority": 1,
        "description": "Keep last 10 images",
        "selection": {
          "tagStatus": "any",
          "countType": "imageCountMoreThan",
          "countNumber": 10
        },
        "action": { "type": "expire" }
      }
    ]
  }'
```
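What that rule does, as a sketch: with countType imageCountMoreThan and countNumber 10, any image beyond the 10 most recently pushed expires. The tag/timestamp representation below is an assumption for illustration:

```python
def images_to_expire(images, keep):
    """images: list of (tag, pushed_at) tuples; keep the `keep` newest."""
    ordered = sorted(images, key=lambda i: i[1], reverse=True)  # newest first
    return [tag for tag, _ in ordered[keep:]]                   # rest expire

# 15 pushes; push time increases with the minor version number
images = [(f"1.{n}", n) for n in range(15)]
print(images_to_expire(images, keep=10))   # the 5 oldest: 1.4 down to 1.0
```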

Cost Awareness — Common Mistakes

The Expensive Surprises

| Mistake | Cost Impact | Fix |
| --- | --- | --- |
| Running EC2 without stopping (even idle) | Full hourly cost 24/7 | Stop dev instances at night; use auto-scaling |
| NAT Gateway data processing charges | $0.045/GB processed | Minimize traffic through NAT; use VPC endpoints for S3 |
| Forgotten snapshots/AMIs | Accumulates over time | Tag + lifecycle policies |
| S3 data transfer out | $0.09/GB out to internet | Use CloudFront; minimize cross-region transfers |
| RDS Multi-AZ in dev | 2x base cost | Single-AZ for dev, Multi-AZ for prod only |
| Oversized instances | Full cost of unused capacity | Right-size with CloudWatch metrics |
| Unattached EBS volumes | Continuous cost | Clean up after instance termination |
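A quick back-of-envelope for the two per-GB lines in the table (ballpark us-east-1 rates; verify current pricing before relying on them):

```python
NAT_PER_GB = 0.045       # NAT Gateway data processing, $/GB (ballpark)
EGRESS_PER_GB = 0.09     # data transfer out to internet, $/GB (ballpark)

def monthly_surprise(nat_gb, egress_gb):
    """Estimated monthly cost in dollars for NAT processing plus egress."""
    return round(nat_gb * NAT_PER_GB + egress_gb * EGRESS_PER_GB, 2)

# 500 GB/month through the NAT Gateway plus 200 GB/month of egress:
print(monthly_surprise(500, 200))   # 40.5 -- small numbers add up quietly
```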
```shell
# Find unattached EBS volumes
aws ec2 describe-volumes \
  --filters "Name=status,Values=available" \
  --query 'Volumes[].[VolumeId,Size,CreateTime]' \
  --output table

# Find old snapshots
aws ec2 describe-snapshots --owner-ids self \
  --query 'Snapshots[].[SnapshotId,VolumeSize,StartTime,Description]' \
  --output table

# Check for idle EC2 instances (e.g., CPU < 5% over 2 weeks):
# use AWS Cost Explorer and Compute Optimizer
aws compute-optimizer get-ec2-instance-recommendations
```

Cost hygiene habits:

  • Enable billing alerts at $10, $50, $100
  • Tag every resource with Project, Environment, Owner
  • Use AWS Cost Explorer weekly
  • Enable S3 Intelligent-Tiering for infrequently accessed data
  • Use Reserved Instances or Savings Plans for stable workloads