HeadlinesBriefing favicon HeadlinesBriefing.com

Cutting AWS Onboarding from 7 Days to 1 Hour

DEV Community •
×

A two-person platform team re-architected their AWS infrastructure, moving from manual provisioning to Terragrunt. The payoff was dramatic: new environment setup time dropped from ~7 days to ~1 hour. This shift addressed scaling pain points where every new client environment required copying hundreds of configuration lines, a process that couldn't sustain their multi-account, multi-region disaster recovery needs.

The team faced specific challenges the official documentation often misses. A major hurdle was a circular dependency created by S3 Cross-Region Replication, where buckets, IAM policies, and replication rules all referenced each other, choking Terraform's dependency graph. They also dealt with Terragrunt's cache bloating to 75-80 GB, crashing CI/CD agents. Their solution involved splitting resources into "physical" and "logical" layers to untangle these cycles.

Key to their success was a pilot-light DR strategy that balances cost with a one-hour recovery time target. Critical data like Aurora Global DB and S3 CRR stays warm, while compute resources like EKS are provisioned on demand. For their small team, they controversially skipped complex DynamoDB locking in favor of human coordination. They emphasize designing for your team's size and regularly testing disaster recovery plans, not just assuming they work.