Yudi Nugraha

Running microservices on your local machine is straightforward — each service starts on a different port, and they talk to each other over localhost. The moment you move to AWS, everything changes.

Services are spread across containers, availability zones, and private subnets. IP addresses are ephemeral. A container that crashes and restarts gets a new IP. And every network path that isn't explicitly allowed is blocked by default.

This post works through a concrete scenario: 5 microservices deployed on AWS ECS Fargate, one of which acts as the API Gateway. We'll cover how service discovery solves the ephemeral IP problem, and how security groups enforce the communication rules that keep your infrastructure from becoming an open network.

---

The Problem

Suppose you've built five services:

| Service | Role |
| --- | --- |
| `api-gateway` | Public entry point. Routes requests to internal services. |
| `user-service` | Handles authentication and user profiles. |
| `product-service` | Manages product catalogue and stock. |
| `order-service` | Creates and tracks orders. |
| `notification-service` | Sends emails and push notifications. |

On localhost, order-service calls user-service at http://localhost:3001. In AWS, user-service runs inside a private subnet on a container whose IP address changes every time it restarts. There is no localhost.

Three questions immediately surface:

How does order-service know where user-service is? (Service Discovery)

Who is allowed to call whom? (Security Groups)

How does external traffic reach the system without exposing every service directly? (API Gateway)

These aren't separate concerns — they are deeply connected. Getting one wrong breaks the others.

---

Fundamental Concepts

The API Gateway Pattern

An API Gateway is the single entry point for all external traffic. Clients never call user-service or order-service directly. They call the gateway, and the gateway routes the request to the right internal service.

This pattern gives you one place to handle:

TLS termination — HTTPS handled at the edge, HTTP inside the private network

Authentication — validate JWT tokens before forwarding requests

Rate limiting — protect backends from abuse

Request routing — /api/users/** goes to user-service, /api/orders/** goes to order-service

In our setup, api-gateway is itself a microservice: a Node.js (or Nginx, or Kong) container that proxies traffic. It sits behind an Application Load Balancer (ALB), which is what the internet actually talks to and, in this setup, where TLS is terminated.

Internet → ALB (public) → api-gateway (private) → internal services

Internal services are never reachable from the internet.
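As a rough illustration of the request-routing responsibility, the sketch below maps a path prefix to an internal base URL. It is not the gateway's actual implementation: the `ROUTES` table and the `resolve_upstream` helper are hypothetical, the `/api/products/` prefix is an assumption, and the hostnames and ports are borrowed from the examples later in this post.

```python
from typing import Optional

# Hypothetical routing table: URL path prefix -> internal base URL.
# Hostnames and ports follow this post's examples; the table itself is an assumption.
ROUTES = {
    "/api/users/": "http://user-service.internal:3001",
    "/api/products/": "http://product-service.internal:3002",
    "/api/orders/": "http://order-service.internal:3003",
}


def resolve_upstream(path: str) -> Optional[str]:
    """Return the upstream URL for a request path, or None if no route matches."""
    # Check longer prefixes first so a more specific route wins over a shorter one.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            return ROUTES[prefix] + path
    return None


print(resolve_upstream("/api/orders/42"))  # http://order-service.internal:3003/api/orders/42
print(resolve_upstream("/metrics"))        # None
```

There is deliberately no route for notification-service: in this design the gateway never calls it, so a request for an unknown path gets no upstream at all.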

Service Discovery

Service discovery is the mechanism that answers: "Where is user-service right now?"

Without it, you'd hard-code IP addresses. That breaks the moment a container restarts.

There are two approaches:

Client-side discovery — the calling service queries a registry directly, gets an IP, and makes the call itself. The service registry (e.g., Consul, Eureka) must be maintained separately.

Server-side discovery — the calling service calls a known DNS name (user-service.internal). A load balancer or DNS resolver handles routing to a healthy instance. The caller doesn't need to know the registry exists.

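On AWS, server-side discovery is the practical choice. AWS Cloud Map provides a private DNS namespace. When an ECS service is configured with a Cloud Map service registry, ECS registers each task as it starts and deregisters it as it stops. A service calls http://user-service.internal:3001 and DNS resolves the name to the IP of a currently registered task. Records can lag by up to the DNS TTL (10 seconds in this setup), so callers should tolerate an occasional failed connection and retry.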
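From the caller's point of view, discovery is just a DNS lookup performed at call time. The following is a minimal sketch using Python's standard resolver. The `resolve_ipv4` helper is a hypothetical name, and the hostname in the usage note only resolves from inside the VPC.

```python
import socket
from typing import List


def resolve_ipv4(hostname: str, port: int) -> List[str]:
    """Ask the system resolver for the current IPv4 addresses behind a hostname."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})
```

Inside a task, `resolve_ipv4("user-service.internal", 3001)` would return the private IPs currently registered for user-service. Calling it again after a task restarts should return the new address once the old record's TTL has expired.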

Security Groups

A Security Group is a stateful firewall attached to a network interface (in ECS Fargate, each task gets its own ENI). Rules are evaluated on every connection.

Key properties:

Stateful — if outbound traffic is allowed, the response is automatically allowed back in, regardless of inbound rules.

Default deny — anything not explicitly allowed is blocked.

Reference other security groups — instead of specifying a CIDR range, you can say "allow traffic from the security group attached to api-gateway tasks." This is the correct approach for internal service-to-service communication, because it remains accurate even as IP addresses change.

Best practice: give each service its own security group, and write inbound rules that reference the caller's security group rather than a CIDR block. The rules then form a machine-readable map of your intended communication graph.

---

Architecture Overview

  Internet (HTTPS, port 443)
         │
         ▼
  ┌────────────────────────────────────────────────────────────────┐
  │                            AWS VPC                             │
  │                                                                │
  │  Public subnets                                                │
  │    ALB (SG: sg-alb)                                            │
  │      │                                                         │
  │  Private subnets                                               │
  │      ▼                                                         │
  │    api-gateway (SG: sg-gateway)                                │
  │      ├──► user-service (SG: sg-users)                          │
  │      ├──► product-service (SG: sg-products)                    │
  │      └──► order-service (SG: sg-orders)                        │
  │             ├──► product-service (stock check)                 │
  │             └──► notification-service (SG: sg-notifications)   │
  │                                                                │
  └────────────────────────────────────────────────────────────────┘

The ALB lives in a public subnet. All ECS tasks run in private subnets with no direct internet access. Communication between services happens over private DNS names provided by AWS Cloud Map.

---

Practical Implementation

Step 1: Create the VPC and Subnets

Use an existing VPC or create a dedicated one. At minimum, you need:

2 public subnets in different Availability Zones (an ALB must be attached to subnets in at least two AZs)

2 private subnets (for ECS tasks)

A NAT Gateway in a public subnet (so private tasks can pull images and make outbound calls)

# Using AWS CLI — create a VPC
aws ec2 create-vpc --cidr-block 10.0.0.0/16 --tag-specifications \
  'ResourceType=vpc,Tags=[{Key=Name,Value=microservices-vpc}]'

# Public subnets (replace with your AZs and VPC ID)
aws ec2 create-subnet --vpc-id vpc-XXXXX \
  --cidr-block 10.0.1.0/24 --availability-zone us-east-1a \
  --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=public-1a}]'

aws ec2 create-subnet --vpc-id vpc-XXXXX \
  --cidr-block 10.0.2.0/24 --availability-zone us-east-1b \
  --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=public-1b}]'

# Private subnets
aws ec2 create-subnet --vpc-id vpc-XXXXX \
  --cidr-block 10.0.11.0/24 --availability-zone us-east-1a \
  --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=private-1a}]'

aws ec2 create-subnet --vpc-id vpc-XXXXX \
  --cidr-block 10.0.12.0/24 --availability-zone us-east-1b \
  --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=private-1b}]'
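Before running the commands above, it can help to sanity-check the address plan. The sketch below is one way to do that with Python's standard `ipaddress` module: it verifies that every subnet sits inside the VPC range and that no two subnets overlap. The `find_subnet_problems` helper and the `PLAN` dictionary are hypothetical names; the CIDR blocks are the ones used in this step.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List


def find_subnet_problems(vpc_cidr: str, subnets: Dict[str, str]) -> List[str]:
    """Return human-readable problems with a subnet plan; an empty list means it is consistent."""
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    problems = []
    # Every subnet must be fully contained in the VPC's CIDR block.
    for name, net in nets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside {vpc}")
    # No two subnets in the same VPC may overlap.
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} ({net_a}) overlaps {b} ({net_b})")
    return problems


# The plan from this step (subnet name -> CIDR block).
PLAN = {
    "public-1a": "10.0.1.0/24",
    "public-1b": "10.0.2.0/24",
    "private-1a": "10.0.11.0/24",
    "private-1b": "10.0.12.0/24",
}

print(find_subnet_problems("10.0.0.0/16", PLAN))  # []
```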

Step 2: Create the ECS Cluster

aws ecs create-cluster \
  --cluster-name microservices-cluster \
  --capacity-providers FARGATE \
  --default-capacity-provider-strategy capacityProvider=FARGATE,weight=1

Step 3: Set Up AWS Cloud Map for Service Discovery

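Create a private DNS namespace. ECS will register each internal service's tasks here automatically. The VPC must have DNS resolution and DNS hostnames enabled for the private names to resolve.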

aws servicediscovery create-private-dns-namespace \
  --name internal \
  --vpc vpc-XXXXX \
  --description "Private namespace for microservices"

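This creates a private namespace named internal. Services will be reachable at <service-name>.internal. The call is asynchronous and returns an operation ID rather than the namespace itself; once it completes, look up the namespace ID (ns-XXXXX below) with aws servicediscovery list-namespaces.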

Create a service discovery entry for each microservice:

# user-service
aws servicediscovery create-service \
  --name user-service \
  --dns-config "NamespaceId=ns-XXXXX,DnsRecords=[{Type=A,TTL=10}]" \
  --health-check-custom-config FailureThreshold=1

# product-service
aws servicediscovery create-service \
  --name product-service \
  --dns-config "NamespaceId=ns-XXXXX,DnsRecords=[{Type=A,TTL=10}]" \
  --health-check-custom-config FailureThreshold=1

# order-service
aws servicediscovery create-service \
  --name order-service \
  --dns-config "NamespaceId=ns-XXXXX,DnsRecords=[{Type=A,TTL=10}]" \
  --health-check-custom-config FailureThreshold=1

# notification-service
aws servicediscovery create-service \
  --name notification-service \
  --dns-config "NamespaceId=ns-XXXXX,DnsRecords=[{Type=A,TTL=10}]" \
  --health-check-custom-config FailureThreshold=1

With this in place, order-service calls http://user-service.internal:3001 and DNS resolves to a healthy task. No hard-coded IPs. No custom discovery client in your application code.
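The four create-service commands differ only in the service name, so they can be generated from a list. The sketch below builds the argument vectors in Python rather than running them. `SERVICES` and `create_service_command` are hypothetical names, `ns-XXXXX` is the same placeholder used above, and `shlex.join` assumes Python 3.8 or newer.

```python
import shlex
from typing import List

# Internal services that need a Cloud Map entry (api-gateway is reached through the ALB).
SERVICES = ["user-service", "product-service", "order-service", "notification-service"]


def create_service_command(name: str, namespace_id: str, ttl: int = 10) -> List[str]:
    """Build the argv for `aws servicediscovery create-service`, mirroring the commands above."""
    return [
        "aws", "servicediscovery", "create-service",
        "--name", name,
        "--dns-config", f"NamespaceId={namespace_id},DnsRecords=[{{Type=A,TTL={ttl}}}]",
        "--health-check-custom-config", "FailureThreshold=1",
    ]


for service in SERVICES:
    # shlex.join quotes each argument so the printed line is safe to paste into a shell.
    print(shlex.join(create_service_command(service, "ns-XXXXX")))
```

To execute the commands instead of printing them, each list could be passed to `subprocess.run(cmd, check=True)`, assuming the AWS CLI is installed and credentials are configured.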

Step 4: Create Security Groups

This step defines the communication rules for the entire system. Create one security group per service.

VPC_ID=vpc-XXXXX

# EC2 rejects group names that start with "sg-", so the groups are named
# <service>-sg below. The text keeps calling them sg-alb, sg-gateway, and so on,
# and later commands use the returned GroupId values as sg-alb-ID, sg-gateway-ID, etc.

# ALB security group: accepts HTTPS from the internet
aws ec2 create-security-group \
  --group-name alb-sg \
  --description "ALB: accept HTTPS from internet" \
  --vpc-id "$VPC_ID"

# api-gateway security group
aws ec2 create-security-group \
  --group-name gateway-sg \
  --description "api-gateway service" \
  --vpc-id "$VPC_ID"

# user-service security group
aws ec2 create-security-group \
  --group-name users-sg \
  --description "user-service" \
  --vpc-id "$VPC_ID"

# product-service security group
aws ec2 create-security-group \
  --group-name products-sg \
  --description "product-service" \
  --vpc-id "$VPC_ID"

# order-service security group
aws ec2 create-security-group \
  --group-name orders-sg \
  --description "order-service" \
  --vpc-id "$VPC_ID"

# notification-service security group
aws ec2 create-security-group \
  --group-name notifications-sg \
  --description "notification-service" \
  --vpc-id "$VPC_ID"

Now define the inbound rules. The key principle: each service only allows inbound traffic from the specific security group of its caller.

# ALB: accept port 443 from the internet
aws ec2 authorize-security-group-ingress \
  --group-id sg-alb-ID \
  --protocol tcp --port 443 --cidr 0.0.0.0/0

# api-gateway: accept traffic only from the ALB
aws ec2 authorize-security-group-ingress \
  --group-id sg-gateway-ID \
  --protocol tcp --port 3000 \
  --source-group sg-alb-ID

# user-service: accept traffic only from api-gateway
aws ec2 authorize-security-group-ingress \
  --group-id sg-users-ID \
  --protocol tcp --port 3001 \
  --source-group sg-gateway-ID

# product-service: accept traffic from api-gateway and order-service
# (order-service checks stock before placing an order)
aws ec2 authorize-security-group-ingress \
  --group-id sg-products-ID \
  --protocol tcp --port 3002 \
  --source-group sg-gateway-ID

aws ec2 authorize-security-group-ingress \
  --group-id sg-products-ID \
  --protocol tcp --port 3002 \
  --source-group sg-orders-ID

# order-service: accept traffic only from api-gateway
aws ec2 authorize-security-group-ingress \
  --group-id sg-orders-ID \
  --protocol tcp --port 3003 \
  --source-group sg-gateway-ID

# notification-service: accept traffic only from order-service
# (only orders trigger notifications in this setup)
aws ec2 authorize-security-group-ingress \
  --group-id sg-notifications-ID \
  --protocol tcp --port 3004 \
  --source-group sg-orders-ID

The resulting communication map looks like this:

ALB → api-gateway → user-service
                 → product-service ← order-service
                 → order-service   → notification-service

notification-service cannot be reached from api-gateway directly — not because of application logic, but because the security group rule does not exist. The network itself enforces the architecture.
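The same rules can be written down as data, which makes the default-deny behaviour easy to reason about and to check in a review. This is only a model of the inbound rules defined in this step, not a call to any AWS API. The `ALLOWED_INBOUND` table and `is_allowed` helper are hypothetical names.

```python
from typing import Dict, Set

# Inbound rules from this step, expressed as: callee -> set of callers allowed in.
ALLOWED_INBOUND: Dict[str, Set[str]] = {
    "api-gateway": {"alb"},
    "user-service": {"api-gateway"},
    "product-service": {"api-gateway", "order-service"},
    "order-service": {"api-gateway"},
    "notification-service": {"order-service"},
}


def is_allowed(caller: str, callee: str) -> bool:
    """Default deny: a call is allowed only if an explicit inbound rule names the caller."""
    return caller in ALLOWED_INBOUND.get(callee, set())


print(is_allowed("order-service", "notification-service"))  # True
print(is_allowed("api-gateway", "notification-service"))    # False
```

A table like this could also serve as the single source from which the authorize-security-group-ingress calls are generated, so the documented graph and the deployed rules cannot drift apart.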

Step 5: Create ECS Task Definitions

Each service needs a task definition. Here's the one for api-gateway:

{
  "family": "api-gateway",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "api-gateway",
      "image": "ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/api-gateway:latest",
      "portMappings": [
        {
          "containerPort": 3000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        { "name": "USER_SERVICE_URL", "value": "http://user-service.internal:3001" },
        { "name": "PRODUCT_SERVICE_URL", "value": "http://product-service.internal:3002" },
        { "name": "ORDER_SERVICE_URL", "value": "http://order-service.internal:3003" }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/api-gateway",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}

The service URLs use the Cloud Map DNS names. These are static configuration — the actual IP resolution happens at request time, always pointing to a healthy container.

Create the same structure for each service, adjusting the image, port, and environment variables accordingly.

aws ecs register-task-definition --cli-input-json file://task-def-api-gateway.json
aws ecs register-task-definition --cli-input-json file://task-def-user-service.json
aws ecs register-task-definition --cli-input-json file://task-def-product-service.json
aws ecs register-task-definition --cli-input-json file://task-def-order-service.json
aws ecs register-task-definition --cli-input-json file://task-def-notification-service.json
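Since the five task definitions share almost everything, one option is to generate them from a small template. The sketch below mirrors the api-gateway JSON shown earlier. `build_task_definition` is a hypothetical helper, `ACCOUNT` is the same placeholder used above, and the environment variables given to order-service are an assumption based on the communication map, not something this post specifies.

```python
import json
from typing import Any, Dict

ACCOUNT = "ACCOUNT"   # placeholder account ID, as in the JSON above
REGION = "us-east-1"


def build_task_definition(name: str, port: int, environment: Dict[str, str]) -> Dict[str, Any]:
    """Return a Fargate task definition with the same shape as the api-gateway example."""
    return {
        "family": name,
        "networkMode": "awsvpc",
        "requiresCompatibilities": ["FARGATE"],
        "cpu": "256",
        "memory": "512",
        "executionRoleArn": f"arn:aws:iam::{ACCOUNT}:role/ecsTaskExecutionRole",
        "containerDefinitions": [
            {
                "name": name,
                "image": f"{ACCOUNT}.dkr.ecr.{REGION}.amazonaws.com/{name}:latest",
                "portMappings": [{"containerPort": port, "protocol": "tcp"}],
                "environment": [
                    {"name": key, "value": value} for key, value in environment.items()
                ],
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": f"/ecs/{name}",
                        "awslogs-region": REGION,
                        "awslogs-stream-prefix": "ecs",
                    },
                },
            }
        ],
    }


# Assumed environment for order-service: it calls product-service and notification-service.
order_service = build_task_definition(
    "order-service",
    3003,
    {
        "PRODUCT_SERVICE_URL": "http://product-service.internal:3002",
        "NOTIFICATION_SERVICE_URL": "http://notification-service.internal:3004",
    },
)
print(json.dumps(order_service, indent=2))
```

Writing each result to a file such as task-def-order-service.json would produce the inputs expected by the register-task-definition commands above.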

Step 6: Create ECS Services

Each ECS service links a task definition to a security group, subnets, and either a Cloud Map service registry (for the internal services) or an ALB target group (for api-gateway).

# Internal services — no ALB, just Cloud Map registration
aws ecs create-service \
  --cluster microservices-cluster \
  --service-name user-service \
  --task-definition user-service:1 \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={
    subnets=[subnet-private-1a,subnet-private-1b],
    securityGroups=[sg-users-ID],
    assignPublicIp=DISABLED
  }" \
  --service-registries "registryArn=arn:aws:servicediscovery:us-east-1:ACCOUNT:service/srv-users-ID"

Repeat for product-service, order-service, and notification-service, substituting their respective security group IDs and Cloud Map service ARNs.

For api-gateway, attach it to the ALB instead:

# Create target group for the ALB
aws elbv2 create-target-group \
  --name tg-api-gateway \
  --protocol HTTP \
  --port 3000 \
  --vpc-id $VPC_ID \
  --target-type ip \
  --health-check-path /health

# Create the ALB
aws elbv2 create-load-balancer \
  --name alb-microservices \
  --subnets subnet-public-1a subnet-public-1b \
  --security-groups sg-alb-ID \
  --scheme internet-facing \
  --type application

# Create HTTPS listener (assumes ACM certificate already exists)
aws elbv2 create-listener \
  --load-balancer-arn arn:aws:elasticloadbalancing:...:loadbalancer/app/alb-microservices/... \
  --protocol HTTPS \
  --port 443 \
  --certificates CertificateArn=arn:aws:acm:us-east-1:ACCOUNT:certificate/CERT-ID \
  --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:...:targetgroup/tg-api-gateway/...

# Create the api-gateway ECS service
aws ecs create-service \
  --cluster microservices-cluster \
  --service-name api-gateway \
  --task-definition api-gateway:1 \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={
    subnets=[subnet-private-1a,subnet-private-1b],
    securityGroups=[sg-gateway-ID],
    assignPublicIp=DISABLED
  }" \
  --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:...:targetgroup/tg-api-gateway/...,containerName=api-gateway,containerPort=3000"

Note that api-gateway still runs in a private subnet — the ALB is the only resource in the public subnet. Traffic from the internet hits the ALB, which forwards it to api-gateway over the private network. The gateway tasks never have a public IP.

---

Verifying the Setup

Check service discovery registration

After each ECS service starts, its tasks register with Cloud Map automatically. Verify:

aws servicediscovery list-instances \
  --service-id srv-users-ID

You should see one entry per running task, each with a private IP address.

Test internal DNS resolution

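Connect to a running container with ECS Exec and resolve the internal DNS name. This assumes the ECS service was created or updated with --enable-execute-command, the task role grants the SSM permissions that ECS Exec needs, and the container image includes nslookup: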

aws ecs execute-command \
  --cluster microservices-cluster \
  --task TASK-ID \
  --container api-gateway \
  --interactive \
  --command "nslookup user-service.internal"

The response should list the private IPs of your user-service tasks.

Confirm security group enforcement

Open a shell in an api-gateway task (for example with ECS Exec) and try calling http://notification-service.internal:3004. The request should time out, because no rule on sg-notifications allows traffic from sg-gateway. The block happens at the network layer, before any application code runs.
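One way to observe a blocked path is a plain TCP connection attempt with a timeout, run from inside a task. The sketch below assumes Python is available in the container; `can_connect` is a hypothetical helper, and the hostname and port in the usage note come from this post's examples.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (typical when a security group drops the traffic),
        # refused connections, and DNS resolution failures.
        return False
```

From an api-gateway task, `can_connect("notification-service.internal", 3004)` would be expected to return False after the timeout, while the same call from an order-service task should return True.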

---

Common Mistakes

Allowing 0.0.0.0/0 on internal services. It's tempting when debugging — but once you open a security group to the world, it's easy to forget to close it. Internal services should never accept traffic from the internet.

Using CIDR ranges instead of security group references for internal rules. A rule written for a task's IP breaks as soon as the task restarts with a new address, and a subnet-wide range admits every workload in that subnet. Reference security groups by ID instead; they stay accurate regardless of which IPs your containers happen to have.

One security group for all internal services. This is convenient to set up but destroys the communication map. notification-service would accept calls from api-gateway, even though it shouldn't. Separate security groups make your intended architecture visible and enforceable.

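Forgetting health check endpoints. The ALB target group for api-gateway polls /health, and tasks that fail the check are taken out of rotation. For the internal services, the Cloud Map entries in this setup use a custom health check configuration, so Cloud Map does not probe your containers itself; ECS reports each task's health to it, based on task state and any container health check in the task definition. If a service has no working health check, unhealthy tasks can stay in the registry, and a misconfigured one can take healthy tasks out of it.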
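For reference, a /health endpoint can be very small. The sketch below uses only Python's standard library and is an illustration rather than a recommendation for any particular framework: `HealthHandler` and `make_server` are hypothetical names, and the default port 3001 is borrowed from the user-service example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health with 200 and a tiny JSON body; everything else is 404."""

    def do_GET(self):
        if self.path == "/health":
            body = b'{"status":"ok"}'
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        # Keep health-check polling out of the logs.
        pass


def make_server(port: int = 3001) -> HTTPServer:
    """Create (but do not start) an HTTP server listening on all interfaces."""
    return HTTPServer(("0.0.0.0", port), HealthHandler)


# To run it: make_server(3001).serve_forever()
```

In a real service the handler would usually also check its own dependencies (for example a database connection) before answering 200, so that the check reflects whether the task can actually serve requests.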

---

Summary

| Layer | What it solves |
| --- | --- |
| ALB | Accepts public HTTPS traffic, terminates TLS, routes to `api-gateway` |
| `api-gateway` (ECS service) | Single entry point, routes internal requests by path |
| AWS Cloud Map | Resolves `service-name.internal` to live container IPs, with no hard-coded addresses |
| Security Groups (per service) | Enforces who can call whom at the network level, independent of application logic |
| Private subnets | Internal services are unreachable from the internet by default |

The setup takes more effort than a single server, but the result is a system where the architecture is enforced by infrastructure — not convention. Adding a new service means creating a task definition, a Cloud Map entry, and a security group with explicit rules about who may call it. Everything else stays the same.

Deploying Microservices to AWS: API Gateway, Service Discovery, and Security Groups in Practice