Knowledge Hub

Practical field notes for cloud database delivery.

A focused technical library for AWS database, PostgreSQL, Oracle modernisation, migration, HA/DR, FinOps and production reliability. Use it as a fast signal of how I assess risk, plan cutovers and communicate production decisions — without overloading the homepage.

How to use this page

Start with the field guide, then browse by topic.

The articles are written as short delivery briefs for engineering managers, recruiters and partner teams: what matters, what can go wrong, and what a practical next step looks like in production.

1. The replication instance is your bottleneck, not the source. DMS task throughput is bounded by the replication instance class. For multi-TB loads, start with dms.r5.4xlarge or larger. Undersizing here wastes days.

2. Supplemental logging is the #1 Oracle gotcha. For Oracle sources with CDC, enable supplemental logging at the database level AND primary-key supplemental logging on each table. If you skip the table-level step, UPDATEs can reach DMS without the key columns needed to apply them, and CDC can silently fail to apply those rows.

3. LOB mode matters enormously. Full LOB mode is correct but slow (one round trip per LOB). Limited LOB truncates above your configured size. Inline LOB (DMS 3.4.7+) gives best performance for LOBs under 8 KB. Choose deliberately — don't accept the default.

4. Table mapping is more powerful than people think. You can filter by schema, exclude audit tables, transform case (UPPER → lower), and add calculated columns — all in the JSON mapping rules. Read the documentation before writing application-side ETL.

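5. Parallel load by column ranges beats parallel tables. For tables over 100 GB, add a table-settings rule to the table mapping with a "parallel-load" of type "ranges" and explicit column boundaries, and make sure MaxFullLoadSubTasks allows the segments to run concurrently. Running many tables in parallel doesn't help when one giant table, loaded on a single thread, is the long pole. Splitting that table spreads its I/O across several threads.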

6. Validation is optional but invaluable. With EnableValidation, DMS compares source and target rows once the full load finishes and keeps validating during CDC. It adds 30–50% to migration time but catches silent data drift. For huge tables, validate a representative sample.

7. CDC lag has three places to look. Source latency (reading redo/WAL), target latency (writing to destination), and network. Check CDCLatencySource and CDCLatencyTarget CloudWatch metrics separately — they need different fixes.

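8. Large transactions stall everything. DMS buffers each transaction until its COMMIT, spilling to the replication instance's storage when it outgrows memory, so one 50 GB transaction stalls the entire task. Batch large operations at the source. BatchApplyEnabled helps the target keep up with high change volume, but it will not split a single giant transaction.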

9. Schema conversion comes first, always. DMS moves data. AWS SCT converts schema and code. Run SCT first, fix conversion issues, create the target schema, then start DMS. The other order creates chaos.

10. Keep a reverse-CDC option for 48 hours post-cutover. A DMS task running in reverse (new target → old source) gives you rollback capability during the first critical hours. It's cheap insurance.
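Items 4 and 5 above both live in the table-mapping JSON rather than in application code. The Python sketch below builds such a document and rests on several assumptions. The HR schema, the AUDIT_ prefix and the PAYROLL_HISTORY table with an integer PAYROLL_ID key are hypothetical, and the ranges are evenly spaced. The rule layout uses selection rules, transformation rules with convert-lowercase, and a table-settings rule with a "parallel-load" of type "ranges". Check it against the current DMS table-mapping documentation before use.

```python
import json


def selection_rules(schema: str, exclude_prefix: str) -> list[dict]:
    """Include every table in `schema`, exclude tables whose names start with
    `exclude_prefix`, and lower-case schema, table and column names
    (Oracle UPPER -> PostgreSQL lower)."""
    rules = [
        {"rule-type": "selection", "rule-id": "1", "rule-name": "include-schema",
         "object-locator": {"schema-name": schema, "table-name": "%"},
         "rule-action": "include"},
        {"rule-type": "selection", "rule-id": "2", "rule-name": "exclude-audit",
         "object-locator": {"schema-name": schema, "table-name": exclude_prefix + "%"},
         "rule-action": "exclude"},
    ]
    for rule_id, target in enumerate(("schema", "table", "column"), start=3):
        locator = {"schema-name": schema}
        if target != "schema":
            locator["table-name"] = "%"
        if target == "column":
            locator["column-name"] = "%"
        rules.append({"rule-type": "transformation", "rule-id": str(rule_id),
                      "rule-name": f"lower-{target}", "rule-target": target,
                      "object-locator": locator, "rule-action": "convert-lowercase"})
    return rules


def parallel_load_rule(schema: str, table: str, column: str,
                       lo: int, hi: int, segments: int, rule_id: int) -> dict:
    """Split one large table into `segments` ranges on an integer key.
    N segments need N-1 boundaries; each boundary holds one value per column."""
    if segments < 2 or hi - lo < segments:
        raise ValueError("need at least 2 segments and a key range wider than that")
    step = (hi - lo) // segments
    boundaries = [[str(lo + step * i)] for i in range(1, segments)]
    return {"rule-type": "table-settings", "rule-id": str(rule_id),
            "rule-name": f"parallel-{table}",
            "object-locator": {"schema-name": schema, "table-name": table},
            "parallel-load": {"type": "ranges", "columns": [column],
                              "boundaries": boundaries}}


if __name__ == "__main__":
    # Hypothetical schema, table and key range, for illustration only.
    rules = selection_rules("HR", "AUDIT_")
    rules.append(parallel_load_rule("HR", "PAYROLL_HISTORY", "PAYROLL_ID",
                                    0, 8_000_000, 4, rule_id=len(rules) + 1))
    print(json.dumps({"rules": rules}, indent=2))
```

Evenly spaced boundaries only balance the load when keys are roughly uniform. For skewed keys, take the boundaries from percentiles of the real key distribution instead.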

Bottom line: Size the replication instance for the load, enable supplemental logging before CDC, choose LOB mode deliberately, and run SCT first. Most "DMS is slow/dropping data" incidents trace back to one of these four, not to DMS itself.
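Two of the DMS checks above can be scripted: supplemental logging before CDC, and separating source-side from target-side CDC lag. The Python sketch below uses hypothetical table names and a hypothetical 60-second lag threshold. The DDL follows the statements commonly used for DMS Oracle sources, with the rdsadmin package standing in for ALTER DATABASE on RDS for Oracle. The triage assumes CDCLatencyTarget is measured end to end and therefore already includes source latency. Network time is not reported as its own metric, so it surfaces inside one of the two numbers.

```python
def supplemental_logging_ddl(tables: list[tuple[str, str]], on_rds: bool = False) -> list[str]:
    """Build statements enabling database-level minimal supplemental logging
    plus primary-key supplemental logging on each (schema, table) pair."""
    if on_rds:
        # RDS for Oracle expects the rdsadmin package here instead of ALTER DATABASE.
        statements = ["EXEC rdsadmin.rdsadmin_util.alter_supplemental_logging('ADD');"]
    else:
        statements = ["ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;"]
    for schema, table in tables:
        statements.append(
            f"ALTER TABLE {schema}.{table} ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;"
        )
    return statements


def cdc_lag_hint(source_latency_s: float, target_latency_s: float,
                 threshold_s: float = 60.0) -> str:
    """Rough triage from the CDCLatencySource and CDCLatencyTarget metrics (seconds).
    Assumes target latency includes source latency, so the difference approximates
    the time spent after a change was read (transfer to and apply on the target)."""
    if source_latency_s >= threshold_s:
        return "source"  # slow redo/WAL reading: log access, source load, source network
    if target_latency_s - source_latency_s >= threshold_s:
        return "target"  # slow apply: target indexes, locks, batch-apply settings
    return "ok"


if __name__ == "__main__":
    # Hypothetical tables, for illustration only.
    for stmt in supplemental_logging_ddl([("HR", "EMPLOYEES"), ("HR", "PAYROLL_HISTORY")]):
        print(stmt)
    print(cdc_lag_hint(source_latency_s=5, target_latency_s=900))  # target
```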

Logical replication slots in PostgreSQL are the most common cause of silent disk exhaustion I've seen in production RDS environments. The mechanism is simple: a replication slot tells PostgreSQL to retain WAL (Write-Ahead Log) files until the consumer has confirmed receipt. If the consumer stops, disconnects, or falls behind, WAL files accumulate indefinitely — until pg_wal fills the disk and PostgreSQL halts with a fatal error.

How to detect it early

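Query pg_replication_slots and compare each slot's restart_lsn (the oldest WAL it still pins) and confirmed_flush_lsn (what the consumer has acknowledged) against pg_current_wal_lsn(). pg_wal_lsn_diff() returns the gap in bytes. Any slot where active = false for more than an hour is a ticking bomb. On RDS, monitor the TransactionLogsDiskUsage and OldestReplicationSlotLag CloudWatch metrics.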
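pg_wal_lsn_diff() does this arithmetic server-side, but it helps to see what an LSN is: a 64-bit WAL byte position printed as two hexadecimal halves. The Python sketch below converts LSNs to bytes and grades a slot's retained WAL against the 1 GB warning and 5 GB critical thresholds from the Prevention notes. It reads those thresholds as GiB, which is an assumption. SLOT_QUERY is an illustrative query whose text columns would feed the function.

```python
from typing import Optional

# Illustrative query; casting the LSNs to text yields strings such as '16/B374D848'.
SLOT_QUERY = """
SELECT slot_name, active,
       pg_current_wal_lsn()::text AS current_lsn,
       restart_lsn::text AS restart_lsn,
       confirmed_flush_lsn::text AS confirmed_flush_lsn
FROM pg_replication_slots;
"""

GIB = 1 << 30


def lsn_to_int(lsn: str) -> int:
    """'16/B374D848' -> byte position: high 32 bits before the slash, low 32 after."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def slot_status(current_lsn: str, restart_lsn: Optional[str],
                warn_bytes: int = 1 * GIB, crit_bytes: int = 5 * GIB) -> tuple[str, int]:
    """Grade how much WAL a slot pins (current position minus restart_lsn)."""
    if restart_lsn is None:  # NULL: no WAL reserved, or the slot was already invalidated
        return "unknown", 0
    retained = lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)
    if retained >= crit_bytes:
        return "critical", retained
    if retained >= warn_bytes:
        return "warning", retained
    return "ok", retained


if __name__ == "__main__":
    print(slot_status("1/40000000", "1/0"))  # ('warning', 1073741824)
```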

The three culprits behind replication lag

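  1. Slow consumer: the subscriber can't apply changes fast enough. This is often because the target lacks an index on the replica identity (every UPDATE and DELETE must locate its row) or, for non-native consumers, because of trigger and FK constraint checks.
  2. Large transactions: by default PostgreSQL ships logical changes at COMMIT, so a single 10 GB transaction arrives as one burst (PostgreSQL 14+ subscriptions can stream in-progress transactions with the streaming option).
  3. TOAST expansion: toasted columns (large text/JSON) expand the decoded payload dramatically, sometimes 10× the on-disk size.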

Fixing it

For slow consumers, add indexes on the subscriber's replicated tables. Native logical replication already applies changes with session_replication_role = replica, so FK triggers don't fire there. For other consumers, consider disabling FK checks during bulk catch-up. For large transactions, batch your writes. For TOAST overhead, filter unnecessary large columns from the publication using ALTER PUBLICATION ... SET TABLE ... (column_list) (PostgreSQL 15+). For emergency WAL pressure, the nuclear option is pg_drop_replication_slot(), but you lose your replication position and need to re-initialise.
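Batching writes means committing in small chunks, so logical decoding ships many modest transactions instead of one burst. The sketch uses Python's built-in sqlite3 only so that it runs anywhere. The events table, id key and predicate are hypothetical. Against PostgreSQL, the same loop would run through your driver using the table's primary key.

```python
import sqlite3


def delete_in_batches(conn: sqlite3.Connection, table: str, key: str,
                      where_sql: str, batch_size: int = 1000) -> int:
    """Delete rows matching `where_sql` in committed batches of `batch_size`.
    `table`, `key` and `where_sql` are interpolated into SQL: trusted values only."""
    total = 0
    while True:
        cur = conn.execute(
            f"DELETE FROM {table} WHERE {key} IN "
            f"(SELECT {key} FROM {table} WHERE {where_sql} LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # each batch is its own small transaction for the decoder
        if cur.rowcount <= 0:
            return total
        total += cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created INTEGER)")
    conn.executemany("INSERT INTO events (created) VALUES (?)", [(i,) for i in range(5000)])
    conn.commit()
    print(delete_in_batches(conn, "events", "id", "created < 3000"))  # 3000
```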

Prevention

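Set max_slot_wal_keep_size (PostgreSQL 13+) to cap how much WAL any slot can retain. A slot that exceeds the cap is invalidated (wal_status = 'lost') and its consumer must be re-initialised, which is a far cheaper failure than a full disk. Monitor slot lag with alerting at 1 GB (warning) and 5 GB (critical).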

Bottom line: Treat inactive replication slots as incidents, not warnings. Set max_slot_wal_keep_size and alert on slot lag before pg_wal ever fills. The fix is almost always cheap; the outage from ignoring it is not.

The headline difference is storage architecture. Aurora PostgreSQL uses a distributed, shared-storage layer — 6 copies across 3 Availability Zones, with replicas reading from the same storage as the writer. RDS PostgreSQL is vanilla PostgreSQL on EBS volumes, with read replicas streaming WAL from the primary.

Choose Aurora when…

  • You need fast failover — typically under 30 seconds, vs RDS's 60–120 seconds.
  • You need multiple read replicas with minimal lag — Aurora replicas see sub-20 ms lag since they share storage; RDS replicas lag by seconds.
  • You want auto-scaling storage without pre-provisioning.
  • Seconds of downtime matter financially.

Choose RDS PostgreSQL when…

  • You need specific extensions, or extension versions, not yet supported on your Aurora version. Compare both engines' supported-extension lists for your exact release rather than assuming parity; pg_cron, for example, is now available on both.
  • You want exact version control: RDS tracks community PostgreSQL releases more closely.
  • Your workload is smaller and cost-sensitive: RDS's base price is lower on smaller instance classes.

The cost trap

Aurora's I/O-Optimized configuration includes I/O but charges higher instance and storage rates. For write-heavy workloads (high WAL generation), Aurora I/O-Optimized often wins. For read-heavy workloads with modest storage, RDS gp3 with provisioned IOPS can be cheaper. Model your specific workload, and don't assume Aurora is always more expensive.
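Modelling that trade is simple arithmetic. The Python sketch below compares one month of Aurora Standard against I/O-Optimized. Every price and the 30% compute uplift are placeholders, not quoted AWS prices; replace them with current figures for your Region. HOURS_PER_MONTH = 730 is a common approximation.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation used for monthly pricing


@dataclass(frozen=True)
class Usage:
    instance_hourly: float  # Aurora Standard on-demand price per instance-hour
    instances: int
    storage_gb: float
    io_millions: float      # billed I/O requests per month, in millions


@dataclass(frozen=True)
class Prices:
    storage_gb_month: float         # Standard storage, per GB-month
    io_per_million: float           # Standard I/O, per million requests
    io_opt_compute_uplift: float    # 0.30 means instances cost 30% more
    io_opt_storage_gb_month: float  # I/O-Optimized storage, per GB-month


def monthly_costs(u: Usage, p: Prices) -> dict[str, float]:
    compute = u.instance_hourly * u.instances * HOURS_PER_MONTH
    io_cost = u.io_millions * p.io_per_million
    standard = compute + u.storage_gb * p.storage_gb_month + io_cost
    io_optimized = (compute * (1 + p.io_opt_compute_uplift)
                    + u.storage_gb * p.io_opt_storage_gb_month)  # no per-I/O charge
    return {"standard": standard, "io_optimized": io_optimized,
            "io_share": io_cost / standard}


if __name__ == "__main__":
    # Placeholder figures only; substitute current pricing and your own usage.
    usage = Usage(instance_hourly=0.52, instances=2, storage_gb=1000, io_millions=2000)
    prices = Prices(storage_gb_month=0.10, io_per_million=0.20,
                    io_opt_compute_uplift=0.30, io_opt_storage_gb_month=0.225)
    for name, value in monthly_costs(usage, prices).items():
        print(f"{name}: {value:,.2f}")
```

With these placeholder inputs, I/O is about 32% of Standard spend and I/O-Optimized comes out cheaper. That is consistent with the 25% rule of thumb in the FinOps notes. Halve the I/O volume and the answer flips.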

The migration reality

Moving from RDS PostgreSQL to Aurora is straightforward (snapshot restore). Moving back is harder — once on Aurora, you're committed to its storage model.

Bottom line: Pick Aurora for HA and low-lag reads, and RDS for extension flexibility and cost-sensitive smaller workloads. Make the call before you build your HA architecture around Aurora's endpoint model, because the move back is the hard direction.

Enterprises moving from Oracle Data Guard to AWS face a conceptual shift. Data Guard provides a physical standby (block-for-block copy via redo apply) or a logical standby (SQL-level apply). Both are flexible — you control switchover timing, protection modes, and can open standbys for read-only queries (Active Data Guard).

On AWS, the equivalent depends on the target engine. A classic RDS Multi-AZ DB instance keeps a synchronous standby in another AZ that cannot serve reads; the newer Multi-AZ DB cluster deployment (MySQL and PostgreSQL) adds two readable standbys. Aurora gives you up to 15 read replicas sharing the same storage layer, with automatic failover.

What Oracle shops miss most

  1. Control over switchover timing: unplanned failover is automatic and opinionated. You can trigger a planned failover, but you cannot choose protection modes or overrule the automation's decision to fail over.
  2. Observer / FSFO equivalent: Data Guard's Fast-Start Failover with Observer has no direct AWS analogue; you get CloudWatch alarms and automatic failover instead.
  3. Cross-region DR: Data Guard replicates to any connected site, while AWS needs Aurora Global Database or RDS cross-region replicas, each with different lag characteristics.

The mapping table

  • Oracle Physical Standby → RDS Multi-AZ (automatic failover)
  • Oracle Active Data Guard → Aurora Read Replica (readable, sub-20 ms lag)
  • Oracle Data Guard Far Sync → Aurora Global Database (cross-region, typically under 1 s lag)
  • Oracle GoldenGate active-active → not directly available; consider DMS or application-level conflict resolution

The honest gap

Oracle Data Guard gives DBAs more control. AWS managed services give less control but more reliability for the common cases. The real question: do you trust Amazon's automation, or do you need to own the failover decision?

Bottom line: Map each Data Guard role to its AWS equivalent before you migrate, and accept the trade: you give up manual switchover control in exchange for automation that's more reliable for the common cases. For most enterprises, trusting the automation frees real engineering time.

1. gp2 → gp3 storage migration. gp3 gives independent IOPS and throughput scaling at lower base cost. For any RDS workload under 16 TiB, this is a one-click modify with no downtime. Most teams save 15–20% on storage costs immediately.

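2. Reserved Instance size flexibility. RIs are applied in normalized units within an instance family, so a db.r6g.xlarge RI can cover two db.r6g.large instances. Plan RIs by family, not by exact size; this is flexibility most finance teams don't know exists. Size flexibility applies to Aurora, MySQL, MariaDB, PostgreSQL and Oracle BYOL, but not to SQL Server or license-included Oracle.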

3. Stop paying for idle dev/test databases. RDS instances can be stopped for up to 7 days (auto-restart after 7 days). For dev/test environments used only during business hours, schedule stop/start with Lambda or AWS Instance Scheduler. Savings: ~65% on instance hours.

4. Right-size before reserving. Performance Insights (free for 7-day retention) shows actual CPU and memory utilisation. If your db.r6g.2xlarge averages 15% CPU, downsize to db.r6g.large before committing to a 1-year RI on the wrong size.

5. Aurora I/O-Optimized for write-heavy workloads. If your Aurora cluster's I/O costs exceed 25% of total database spend, I/O-Optimized pricing (higher instance and storage rates, zero I/O charges) usually saves money. Run the calculation; it takes 5 minutes.

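6. Cross-region read replicas vs Global Database. If you only need cross-region reads (not fast, managed regional failover), an RDS cross-region read replica is cheaper than Aurora Global Database. On Aurora PostgreSQL, Global Database is the cross-region replica option, so size it against the RPO/RTO you actually need. Match the architecture to the actual DR requirement, not the aspirational one.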

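7. Snapshot and backup lifecycle. Backup storage up to 100% of your provisioned database storage in a Region is included at no extra charge. However, manual snapshots count against that allowance and never expire. Once you exceed the allowance, or delete the source instance and so shrink it, those snapshots are billed. Audit your manual snapshots quarterly: stale snapshots from old projects accumulate silently and cost real money.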
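The quarterly snapshot audit is easy to script. The Python sketch below assumes records shaped like the RDS DescribeDBSnapshots response (DBSnapshotIdentifier, SnapshotType, SnapshotCreateTime, AllocatedStorage). You might gather them with boto3's describe_db_snapshots paginator and SnapshotType="manual". The 90-day cutoff and the sample records are hypothetical. AllocatedStorage is the source volume size, so it only ranks candidates; it is not the billed snapshot size.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_manual_snapshots(snapshots: list[dict], max_age_days: int = 90,
                           now: Optional[datetime] = None) -> list[dict]:
    """Manual snapshots older than max_age_days, largest source volume first.
    Records use DescribeDBSnapshots key names; SnapshotCreateTime must be tz-aware."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    stale = [s for s in snapshots
             if s["SnapshotType"] == "manual" and s["SnapshotCreateTime"] < cutoff]
    return sorted(stale, key=lambda s: s["AllocatedStorage"], reverse=True)


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    sample = [  # hypothetical records
        {"DBSnapshotIdentifier": "legacy-crm-final", "SnapshotType": "manual",
         "SnapshotCreateTime": now - timedelta(days=400), "AllocatedStorage": 500},
        {"DBSnapshotIdentifier": "pre-upgrade", "SnapshotType": "manual",
         "SnapshotCreateTime": now - timedelta(days=10), "AllocatedStorage": 200},
        {"DBSnapshotIdentifier": "rds:nightly", "SnapshotType": "automated",
         "SnapshotCreateTime": now - timedelta(days=120), "AllocatedStorage": 200},
    ]
    for snap in stale_manual_snapshots(sample, now=now):
        print(snap["DBSnapshotIdentifier"], snap["AllocatedStorage"])
```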

Bottom line: The fastest wins need no architecture change: move gp2→gp3, stop idle dev/test instances, and right-size before you reserve. Those three alone typically cut a database bill by a quarter; the rest is matching architecture to the real requirement, not the aspirational one.
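Two of those quick wins, plus the RI size-flexibility check, reduce to arithmetic. The Python sketch below makes three explicit simplifications:

- The 12-hours-a-day, weekday-only schedule is hypothetical.
- projected_cpu assumes CPU scales linearly with vCPU count (db.r6g.2xlarge has 8 vCPUs, db.r6g.large has 2). This ignores peaks and the smaller memory.
- uncovered_units uses a subset of the RDS normalization factors and ignores engine, Region and deployment-option matching.

```python
HOURS_PER_WEEK = 7 * 24

# Subset of RDS size-flexibility normalization factors, keyed by size suffix.
NORMALIZATION = {"micro": 0.5, "small": 1, "medium": 2, "large": 4, "xlarge": 8,
                 "2xlarge": 16, "4xlarge": 32, "8xlarge": 64, "12xlarge": 96,
                 "16xlarge": 128}


def stop_start_savings(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of instance-hours saved by running only on the given schedule."""
    running = hours_per_day * days_per_week
    if not 0 <= running <= HOURS_PER_WEEK:
        raise ValueError("schedule does not fit in a week")
    return 1 - running / HOURS_PER_WEEK


def projected_cpu(avg_cpu_pct: float, current_vcpus: int, target_vcpus: int) -> float:
    """Naive projection: the same work spread over fewer vCPUs."""
    return avg_cpu_pct * current_vcpus / target_vcpus


def uncovered_units(reserved: list[str], running: list[str]) -> dict[str, float]:
    """Normalized units per family that running instances use beyond the RIs."""
    totals: dict[str, float] = {}
    for classes, sign in ((running, 1), (reserved, -1)):
        for cls in classes:
            _, family, size = cls.split(".")  # e.g. "db.r6g.xlarge"
            totals[family] = totals.get(family, 0) + sign * NORMALIZATION[size]
    return {family: units for family, units in totals.items() if units > 0}


if __name__ == "__main__":
    print(f"{stop_start_savings(12, 5):.0%}")  # 64%
    print(projected_cpu(15, current_vcpus=8, target_vcpus=2))  # 60.0
    print(uncovered_units(["db.r6g.xlarge"],
                          ["db.r6g.large", "db.r6g.large", "db.r6g.large"]))  # {'r6g': 4}
```

A projected 60% average leaves limited headroom for peaks. Check peak CPU and FreeableMemory before committing to the smaller class.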

Production help / hiring signal

Turn field notes into a delivery plan.

These notes show the practical thinking behind my database work. If you are hiring a senior cloud database engineer, need AWS partner/subcontract support, or want an independent review before migration, cutover, performance rescue or HA/DR change, this page gives you evidence of how I structure production decisions.

Assess: Review the current database estate, risks, dependencies and production constraints.
Plan: Convert the right note into a migration, tuning, FinOps or HA/DR action plan.
Deliver: Support execution with runbooks, validation steps, stakeholder updates and clean handover material.