Running microservices on your local machine is straightforward — each service starts on a different port, and they talk to each other over localhost. The moment you move to AWS, everything changes.
Services are spread across containers, availability zones, and private subnets. IP addresses are ephemeral. A container that crashes and restarts gets a new IP. And every network path that isn't explicitly allowed is blocked by default.
This post works through a concrete scenario: 5 microservices deployed on AWS ECS Fargate, one of which acts as the API Gateway. We'll cover how service discovery solves the ephemeral IP problem, and how security groups enforce the communication rules that keep your infrastructure from becoming an open network.
---
The Problem
Suppose you've built five services:
| Service | Role |
|---|---|
| api-gateway | Public entry point. Routes requests to internal services. |
| user-service | Handles authentication and user profiles. |
| product-service | Manages product catalogue and stock. |
| order-service | Creates and tracks orders. |
| notification-service | Sends emails and push notifications. |
On your machine, order-service calls user-service at http://localhost:3001. In AWS, user-service runs inside a private subnet on a container whose IP address changes every time it restarts. There is no localhost.
Three questions immediately surface:
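- How does traffic from the internet reach the right service? (API Gateway)
- Which services are allowed to call which? (Security Groups)
- How does order-service know where user-service is? (Service Discovery)
These aren't separate concerns; they are deeply connected. Getting one wrong breaks the others.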
---
Fundamental Concepts
The API Gateway Pattern
An API Gateway is the single entry point for all external traffic. Clients never call user-service or order-service directly. They call the gateway, and the gateway routes the request to the right internal service.
This pattern gives you one place to handle:
- Routing: /api/users/** goes to user-service, /api/orders/** goes to order-service
In our setup, api-gateway is itself a microservice: a Node.js (or Nginx, or Kong) container that proxies traffic. It sits behind an Application Load Balancer (ALB), which is what the internet actually talks to.
Internet → ALB (public) → api-gateway (private) → internal services
Internal services are never reachable from the internet.
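The routing rule can be sketched as a small shell function. This is only an illustration of the idea, not how a production gateway is built: the function name `route_upstream` and the `/api/products` prefix are assumptions, and the upstream URLs reuse the names and ports used later in this post.

```shell
# Map a request path to the internal service that should handle it.
# Hypothetical helper; a real gateway (Node.js, Nginx, Kong) does this in its own config.
route_upstream() {
  case "$1" in
    /api/users|/api/users/*)       echo "http://user-service.internal:3001" ;;
    /api/products|/api/products/*) echo "http://product-service.internal:3002" ;;  # assumed prefix
    /api/orders|/api/orders/*)     echo "http://order-service.internal:3003" ;;
    *)                             return 1 ;;  # unknown path: the gateway would answer 404
  esac
}
```

Calling `route_upstream /api/users/42` prints the user-service URL, while an unknown path makes the function fail so the caller can reject the request.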
Service Discovery
Service discovery is the mechanism that answers: "Where is user-service right now?"
Without it, you'd hard-code IP addresses. That breaks the moment a container restarts.
There are two approaches:
Client-side discovery — the calling service queries a registry directly, gets an IP, and makes the call itself. The service registry (e.g., Consul, Eureka) must be maintained separately.
Server-side discovery — the calling service calls a known DNS name (user-service.internal). A load balancer or DNS resolver handles routing to a healthy instance. The caller doesn't need to know the registry exists.
On AWS, server-side discovery is the practical choice. AWS Cloud Map provides a private DNS namespace. ECS automatically registers containers into Cloud Map when they start and deregisters them when they stop. A service calls http://user-service.internal:3001 and DNS resolves it to a healthy container's IP — always up to date.
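From the caller's side, server-side discovery boils down to building a URL from a stable name and letting DNS do the rest. A minimal sketch, assuming the namespace is called internal as in this post; the helper names `service_url` and `resolve_ips` are hypothetical, and `getent` has to be present in the container image.

```shell
NAMESPACE="${NAMESPACE:-internal}"   # Cloud Map private DNS namespace

# Build the base URL for a service from its name and port.
service_url() {
  printf 'http://%s.%s:%s\n' "$1" "$NAMESPACE" "$2"
}

# Print the IPs the service name currently resolves to, one per line.
# Only meaningful from inside the VPC, where the private namespace is visible.
resolve_ips() {
  getent hosts "$1.$NAMESPACE" | awk '{ print $1 }'
}
```

The application only ever sees the output of `service_url user-service 3001`; which task answers is decided at resolution time.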
Security Groups
A Security Group is a stateful firewall attached to a network interface (in ECS Fargate, each task gets its own ENI). Rules are evaluated on every connection.
Key properties:
- A rule can reference another security group instead of an IP range, for example: "allow inbound traffic from api-gateway tasks." This is the correct approach for internal service-to-service communication, because it remains accurate even as IP addresses change.
Best practice: one security group per service. Inbound rules reference the security group of the caller, not a CIDR block. This creates a machine-readable map of your intended communication graph.
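Because rules can reference security groups rather than addresses, the communication graph can be read back from AWS. A hedged sketch: `allowed_callers` is a hypothetical helper name, and the group ID passed to it is a placeholder for one of your own.

```shell
# List the security groups allowed to send inbound traffic to the given group.
# $1 = security group ID of the callee (placeholder, e.g. the ID of the user-service group)
allowed_callers() {
  aws ec2 describe-security-groups \
    --group-ids "$1" \
    --query 'SecurityGroups[].IpPermissions[].UserIdGroupPairs[].GroupId' \
    --output text
}
```

Running it against each service's group yields the "who may call whom" edges without inspecting a single IP address.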
---
Architecture Overview
            ┌──────────────────────────────────────────────────────────────┐
            │ AWS VPC                                                      │
            │                                                              │
            │ Public subnet                                                │
Internet ───┼─► ALB (SG: sg-alb)                                           │
            │    │                                                         │
            │    │   Private subnets                                       │
            │    ▼                                                         │
            │   api-gateway (SG: sg-gateway)                               │
            │    ├──► user-service (SG: sg-users)                          │
            │    ├──► product-service (SG: sg-products)                    │
            │    └──► order-service (SG: sg-orders)                        │
            │          ├──► product-service                                │
            │          └──► notification-service (SG: sg-notifications)    │
            │                                                              │
            └──────────────────────────────────────────────────────────────┘
The ALB lives in a public subnet. All ECS tasks run in private subnets with no direct internet access. Communication between services happens over private DNS names provided by AWS Cloud Map.
---
Practical Implementation
Step 1: Create the VPC and Subnets
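Use an existing VPC or create a dedicated one. At minimum, you need two public subnets (for the ALB) and two private subnets (for the ECS tasks), spread across two availability zones. The public subnets also need a route to an internet gateway, and the private subnets need a path to ECR and CloudWatch Logs (a NAT gateway or VPC endpoints) so that Fargate can pull images and ship logs; those resources are not shown here.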
# Using AWS CLI — create a VPC
aws ec2 create-vpc --cidr-block 10.0.0.0/16 --tag-specifications \
'ResourceType=vpc,Tags=[{Key=Name,Value=microservices-vpc}]'
# Public subnets (replace with your AZs and VPC ID)
aws ec2 create-subnet --vpc-id vpc-XXXXX \
--cidr-block 10.0.1.0/24 --availability-zone us-east-1a \
--tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=public-1a}]'
aws ec2 create-subnet --vpc-id vpc-XXXXX \
--cidr-block 10.0.2.0/24 --availability-zone us-east-1b \
--tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=public-1b}]'
# Private subnets
aws ec2 create-subnet --vpc-id vpc-XXXXX \
--cidr-block 10.0.11.0/24 --availability-zone us-east-1a \
--tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=private-1a}]'
aws ec2 create-subnet --vpc-id vpc-XXXXX \
--cidr-block 10.0.12.0/24 --availability-zone us-east-1b \
--tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=private-1b}]'
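The commands above rely on placeholder IDs. One way to avoid copying IDs by hand is to capture them with `--query`. The sketch below assumes a hypothetical helper name, `create_subnet`, and reuses the CIDR blocks from above.

```shell
# Create a subnet and print only its ID.
# $1 = VPC ID, $2 = CIDR block, $3 = availability zone, $4 = Name tag
create_subnet() {
  aws ec2 create-subnet \
    --vpc-id "$1" --cidr-block "$2" --availability-zone "$3" \
    --tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=$4}]" \
    --query 'Subnet.SubnetId' --output text
}

# Example (not run here):
#   PRIVATE_1A=$(create_subnet "$VPC_ID" 10.0.11.0/24 us-east-1a private-1a)
```

Later steps can then use variables such as `$PRIVATE_1A` instead of literal subnet IDs.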
Step 2: Create the ECS Cluster
aws ecs create-cluster \
--cluster-name microservices-cluster \
--capacity-providers FARGATE \
--default-capacity-provider-strategy capacityProvider=FARGATE,weight=1
Step 3: Set Up AWS Cloud Map for Service Discovery
Create a private DNS namespace. Every service will register itself here automatically.
aws servicediscovery create-private-dns-namespace \
--name internal \
--vpc vpc-XXXXX \
--description "Private namespace for microservices"
This creates a namespace internal. Services will be reachable at <service-name>.internal.
Create a service discovery entry for each microservice:
# user-service
aws servicediscovery create-service \
--name user-service \
--dns-config "NamespaceId=ns-XXXXX,DnsRecords=[{Type=A,TTL=10}]" \
--health-check-custom-config FailureThreshold=1
# product-service
aws servicediscovery create-service \
--name product-service \
--dns-config "NamespaceId=ns-XXXXX,DnsRecords=[{Type=A,TTL=10}]" \
--health-check-custom-config FailureThreshold=1
# order-service
aws servicediscovery create-service \
--name order-service \
--dns-config "NamespaceId=ns-XXXXX,DnsRecords=[{Type=A,TTL=10}]" \
--health-check-custom-config FailureThreshold=1
# notification-service
aws servicediscovery create-service \
--name notification-service \
--dns-config "NamespaceId=ns-XXXXX,DnsRecords=[{Type=A,TTL=10}]" \
--health-check-custom-config FailureThreshold=1
With this in place, order-service calls http://user-service.internal:3001 and DNS resolves to a healthy task. No hard-coded IPs. No custom discovery client in your application code.
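The four create-service calls differ only in the service name, so they can be collapsed into a loop. A sketch under assumptions: `register_discovery_services` is a hypothetical helper, and the namespace ID is passed in as an argument.

```shell
# Create one Cloud Map service per internal microservice and print "name arn" lines.
# $1 = namespace ID (the ns-... value of the "internal" namespace)
register_discovery_services() {
  for svc in user-service product-service order-service notification-service; do
    arn=$(aws servicediscovery create-service \
      --name "$svc" \
      --dns-config "NamespaceId=$1,DnsRecords=[{Type=A,TTL=10}]" \
      --health-check-custom-config FailureThreshold=1 \
      --query 'Service.Arn' --output text) || return 1
    echo "$svc $arn"
  done
}
```

The printed ARNs are the values that the ECS services need later for their service registry setting.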
Step 4: Create Security Groups
This step defines the communication rules for the entire system. Create one security group per service.
VPC_ID=vpc-XXXXX
# Note: group names cannot start with "sg-" (that prefix is reserved for group IDs),
# so the groups are named alb-sg, gateway-sg, and so on. Labels such as sg-gateway
# elsewhere in this post refer to these groups; sg-gateway-ID stands for the returned ID.
# ALB security group: accepts HTTPS from the internet
aws ec2 create-security-group \
--group-name alb-sg \
--description "ALB: accept HTTPS from internet" \
--vpc-id "$VPC_ID"
# api-gateway security group
aws ec2 create-security-group \
--group-name gateway-sg \
--description "api-gateway service" \
--vpc-id "$VPC_ID"
# user-service security group
aws ec2 create-security-group \
--group-name users-sg \
--description "user-service" \
--vpc-id "$VPC_ID"
# product-service security group
aws ec2 create-security-group \
--group-name products-sg \
--description "product-service" \
--vpc-id "$VPC_ID"
# order-service security group
aws ec2 create-security-group \
--group-name orders-sg \
--description "order-service" \
--vpc-id "$VPC_ID"
# notification-service security group
aws ec2 create-security-group \
--group-name notifications-sg \
--description "notification-service" \
--vpc-id "$VPC_ID"
Now define the inbound rules. The key principle: each service only allows inbound traffic from the specific security group of its caller.
# ALB: accept port 443 from the internet
aws ec2 authorize-security-group-ingress \
--group-id sg-alb-ID \
--protocol tcp --port 443 --cidr 0.0.0.0/0
# api-gateway: accept traffic only from the ALB
aws ec2 authorize-security-group-ingress \
--group-id sg-gateway-ID \
--protocol tcp --port 3000 \
--source-group sg-alb-ID
# user-service: accept traffic only from api-gateway
aws ec2 authorize-security-group-ingress \
--group-id sg-users-ID \
--protocol tcp --port 3001 \
--source-group sg-gateway-ID
# product-service: accept traffic from api-gateway and order-service
# (order-service checks stock before placing an order)
aws ec2 authorize-security-group-ingress \
--group-id sg-products-ID \
--protocol tcp --port 3002 \
--source-group sg-gateway-ID
aws ec2 authorize-security-group-ingress \
--group-id sg-products-ID \
--protocol tcp --port 3002 \
--source-group sg-orders-ID
# order-service: accept traffic only from api-gateway
aws ec2 authorize-security-group-ingress \
--group-id sg-orders-ID \
--protocol tcp --port 3003 \
--source-group sg-gateway-ID
# notification-service: accept traffic only from order-service
# (only orders trigger notifications in this setup)
aws ec2 authorize-security-group-ingress \
--group-id sg-notifications-ID \
--protocol tcp --port 3004 \
--source-group sg-orders-ID
The resulting communication map looks like this:
ALB → api-gateway → user-service
                  → product-service ← order-service
                  → order-service → notification-service
notification-service cannot be reached from api-gateway directly — not because of application logic, but because the security group rule does not exist. The network itself enforces the architecture.
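The same map can be written down as data, which is handy for documentation or for a check that compares intent against the real rules. A sketch: the variable and function names are hypothetical, and the pairs simply restate the ingress rules above.

```shell
# Allowed caller:callee pairs, mirroring the ingress rules above.
ALLOWED_PATHS="
alb:api-gateway
api-gateway:user-service
api-gateway:product-service
api-gateway:order-service
order-service:product-service
order-service:notification-service
"

# Succeeds (exit 0) if the caller may open a connection to the callee.
# $1 = caller, $2 = callee
is_allowed() {
  printf '%s\n' "$ALLOWED_PATHS" | grep -qxF "$1:$2"
}
```

For example, `is_allowed order-service notification-service` succeeds, while `is_allowed api-gateway notification-service` fails, matching what the security groups enforce.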
Step 5: Create ECS Task Definitions
Each service needs a task definition. Here's the one for api-gateway:
{
  "family": "api-gateway",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "api-gateway",
      "image": "ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/api-gateway:latest",
      "portMappings": [
        {
          "containerPort": 3000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        { "name": "USER_SERVICE_URL", "value": "http://user-service.internal:3001" },
        { "name": "PRODUCT_SERVICE_URL", "value": "http://product-service.internal:3002" },
        { "name": "ORDER_SERVICE_URL", "value": "http://order-service.internal:3003" }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/api-gateway",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
The service URLs use the Cloud Map DNS names. These are static configuration — the actual IP resolution happens at request time, always pointing to a healthy container.
Create the same structure for each service, adjusting the image, port, and environment variables accordingly.
aws ecs register-task-definition --cli-input-json file://task-def-api-gateway.json
aws ecs register-task-definition --cli-input-json file://task-def-user-service.json
aws ecs register-task-definition --cli-input-json file://task-def-product-service.json
aws ecs register-task-definition --cli-input-json file://task-def-order-service.json
aws ecs register-task-definition --cli-input-json file://task-def-notification-service.json
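Since the task definitions differ mainly in name, image and port, the per-service files registered above could be generated from one template. A minimal sketch, assuming a hypothetical helper `render_task_def`; it omits environment variables and logging, and ACCOUNT is the same placeholder as in the api-gateway definition.

```shell
# Print a minimal Fargate task definition for one service.
# $1 = service name, $2 = container port
render_task_def() {
  cat <<EOF
{
  "family": "$1",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "$1",
      "image": "ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/$1:latest",
      "portMappings": [{ "containerPort": $2, "protocol": "tcp" }]
    }
  ]
}
EOF
}
```

For instance, `render_task_def user-service 3001 > task-def-user-service.json` would produce the file used by the corresponding register command.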
Step 6: Create ECS Services
Each ECS service links a task definition to a security group, subnets, and either the Cloud Map service registry (for the internal services) or the ALB target group (for api-gateway).
# Internal services — no ALB, just Cloud Map registration
aws ecs create-service \
--cluster microservices-cluster \
--service-name user-service \
--task-definition user-service:1 \
--desired-count 2 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={
subnets=[subnet-private-1a,subnet-private-1b],
securityGroups=[sg-users-ID],
assignPublicIp=DISABLED
}" \
--service-registries "registryArn=arn:aws:servicediscovery:us-east-1:ACCOUNT:service/srv-users-ID"
Repeat for product-service, order-service, and notification-service, substituting their respective security group IDs and Cloud Map service ARNs.
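The repetition can be wrapped in a function that takes the values that change per service. This is a sketch under assumptions: `create_internal_service` is a hypothetical helper, the subnet IDs are the same placeholders as above, and the task definition is referenced by family name only, which makes ECS use the latest active revision.

```shell
# Comma-separated private subnet IDs (placeholders).
SUBNETS="${SUBNETS:-subnet-private-1a,subnet-private-1b}"

# Create one internal ECS service registered in Cloud Map.
# $1 = service name, $2 = security group ID, $3 = Cloud Map service ARN
create_internal_service() {
  aws ecs create-service \
    --cluster microservices-cluster \
    --service-name "$1" \
    --task-definition "$1" \
    --desired-count 2 \
    --launch-type FARGATE \
    --network-configuration "awsvpcConfiguration={subnets=[$SUBNETS],securityGroups=[$2],assignPublicIp=DISABLED}" \
    --service-registries "registryArn=$3"
}
```

Each remaining service then needs a single call with its own security group ID and registry ARN.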
For api-gateway, attach it to the ALB instead:
# Create target group for the ALB
aws elbv2 create-target-group \
--name tg-api-gateway \
--protocol HTTP \
--port 3000 \
--vpc-id $VPC_ID \
--target-type ip \
--health-check-path /health
# Create the ALB
aws elbv2 create-load-balancer \
--name alb-microservices \
--subnets subnet-public-1a subnet-public-1b \
--security-groups sg-alb-ID \
--scheme internet-facing \
--type application
# Create HTTPS listener (assumes ACM certificate already exists)
aws elbv2 create-listener \
--load-balancer-arn arn:aws:elasticloadbalancing:...:loadbalancer/app/alb-microservices/... \
--protocol HTTPS \
--port 443 \
--certificates CertificateArn=arn:aws:acm:us-east-1:ACCOUNT:certificate/CERT-ID \
--default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:...:targetgroup/tg-api-gateway/...
# Create the api-gateway ECS service
aws ecs create-service \
--cluster microservices-cluster \
--service-name api-gateway \
--task-definition api-gateway:1 \
--desired-count 2 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={
subnets=[subnet-private-1a,subnet-private-1b],
securityGroups=[sg-gateway-ID],
assignPublicIp=DISABLED
}" \
--load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:...:targetgroup/tg-api-gateway/...,containerName=api-gateway,containerPort=3000"
Note that api-gateway still runs in a private subnet — the ALB is the only resource in the public subnet. Traffic from the internet hits the ALB, which forwards it to api-gateway over the private network. The gateway tasks never have a public IP.
---
Verifying the Setup
Check service discovery registration
After each ECS service starts, its tasks register with Cloud Map automatically. Verify:
aws servicediscovery list-instances \
--service-id srv-users-ID
You should see one entry per running task, each with a private IP address.
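To check registration from a script rather than by eye, the instance list can be reduced to IP addresses and counted. A hedged sketch: the helper names are hypothetical, and it assumes the instances carry the `AWS_INSTANCE_IPV4` attribute that Cloud Map uses for A records.

```shell
# Print the private IPs currently registered for a Cloud Map service, one per line.
# $1 = Cloud Map service ID (srv-...)
registered_ips() {
  aws servicediscovery list-instances \
    --service-id "$1" \
    --query 'Instances[].Attributes.AWS_INSTANCE_IPV4' \
    --output text | tr '\t' '\n'
}

# Succeeds if at least $2 instances are registered for service $1.
has_min_instances() {
  [ "$(registered_ips "$1" | grep -c '^[0-9]')" -ge "$2" ]
}
```

With a desired count of 2, `has_min_instances srv-users-ID 2` should succeed once both tasks are running.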
Test internal DNS resolution
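Connect to a running task with ECS Exec and resolve the internal DNS name. This requires ECS Exec to be enabled on the service (--enable-execute-command), a task role with the SSM permissions ECS Exec needs, and an image that contains nslookup: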
aws ecs execute-command \
--cluster microservices-cluster \
--task TASK-ID \
--container api-gateway \
--interactive \
--command "nslookup user-service.internal"
The response should list the private IPs of your user-service tasks.
Confirm security group enforcement
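From inside an api-gateway task, try calling notification-service directly (for example, http://notification-service.internal:3004 through ECS Exec). The connection should time out, because sg-notifications has no rule allowing traffic from sg-gateway, or from sg-alb for that matter. The block happens at the network layer, before any application code runs.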
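A quick way to probe a network path from inside a task is a TCP connection attempt with a timeout. The sketch below is an assumption-heavy helper: `can_connect` is a hypothetical name, and it relies on bash's `/dev/tcp` and the coreutils `timeout` command being present in the container image.

```shell
# Succeeds if a TCP connection to host:port can be opened within the timeout.
# $1 = host, $2 = port, $3 = timeout in seconds (default 3)
can_connect() {
  timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

Run from an api-gateway task, `can_connect user-service.internal 3001` should succeed, while `can_connect notification-service.internal 3004` should fail once the timeout expires.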
---
Common Mistakes
Allowing 0.0.0.0/0 on internal services. It's tempting when debugging — but once you open a security group to the world, it's easy to forget to close it. Internal services should never accept traffic from the internet.
Using CIDR ranges instead of security group references for internal rules. A rule pinned to individual task IPs breaks as soon as a task restarts, and a rule covering a whole subnet range admits every workload in that subnet. Reference security groups by ID; they stay accurate regardless of which IPs your containers happen to have.
One security group for all internal services. This is convenient to set up but destroys the communication map. notification-service would accept calls from api-gateway, even though it shouldn't. Separate security groups make your intended architecture visible and enforceable.
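Forgetting health checks. The ALB target group probes /health on api-gateway, so that endpoint must exist and return 200. For the internal services, the custom health check configuration used above means Cloud Map does not probe your containers itself; ECS reports task health to Cloud Map based on the task's state and any container health check in the task definition. Without a container health check, a task that is running but no longer serving requests can stay in DNS.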
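The first mistake in this list is easy to audit for. A hedged sketch: `world_open_groups` is a hypothetical helper that lists every security group in the VPC with an inbound rule for 0.0.0.0/0, skipping the one group that is expected to be public (the ALB's).

```shell
# List security groups in a VPC that accept inbound traffic from 0.0.0.0/0,
# ignoring the ALB's group. Output: "group-id<TAB>group-name" per line.
# $1 = VPC ID, $2 = security group ID of the ALB
world_open_groups() {
  aws ec2 describe-security-groups \
    --filters "Name=vpc-id,Values=$1" "Name=ip-permission.cidr,Values=0.0.0.0/0" \
    --query 'SecurityGroups[].[GroupId,GroupName]' \
    --output text | awk -v alb="$2" '$1 != alb'
}
```

An empty result means no internal service is open to the internet; anything it prints deserves a second look.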
---
Summary
| Layer | What it solves |
|---|---|
| ALB | Accepts public HTTPS traffic, terminates TLS, routes to api-gateway |
| api-gateway (ECS service) | Single entry point, routes internal requests by path |
| AWS Cloud Map | Resolves service-name.internal to live container IPs — no hard-coded addresses |
| Security Groups (per service) | Enforces who can call whom at the network level, independent of application logic |
| Private subnets | Internal services are unreachable from the internet by default |