Infrastructure 7 min read

Data Center Migration: A Step-by-Step Approach

DC migrations are routinely underestimated in scope, and how smoothly they will go is routinely overestimated. Having led a three-month migration across multiple enterprise facilities, I want to give the honest version: what worked, what didn't, and what the project plan didn't warn us about.

Every DC migration I've been involved in starts with the same optimism: this one will be different, we have the inventory, we have the timeline, we've done the dependency mapping. It never goes entirely to plan. The question is whether the surprises are contained or catastrophic.

The migration I'm documenting here involved moving production infrastructure for a large enterprise group — hundreds of physical and virtual servers across multiple business units — from an aging facility to a new, purpose-built data center. Three months from start to complete cutover, with a hard deadline driven by the lease expiry on the old facility.

Phase 1: Discovery and Inventory (Weeks 1–3)

The first thing we discovered was that the asset inventory was wrong. Not dramatically wrong, but wrong enough to matter. Systems listed in the CMDB as production had been decommissioned. Systems that were in production didn't appear in the CMDB at all. We found physical servers that no one on the current IT team knew about. One of them turned out to be running a critical business process for a subsidiary; nobody knew it existed until we started physically inventorying the racks.

The discovery phase is not glamorous. It involves going through every rack, every server, every network connection, and building an accurate picture of what you have. Tools help — network discovery scanning, vSphere inventory exports, AD computer account enumeration — but they don't replace physical inspection. Virtual machines have a particular tendency to be forgotten.
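The reconciliation step lends itself to a small script. What follows is a minimal sketch, not the tooling we actually used: it assumes you can export hostnames from the CMDB and from your discovery sources (scan results, vSphere exports, AD enumeration) into plain lists. The function name and the sample hostnames are hypothetical.

```python
def reconcile_inventory(cmdb_hosts, discovered_hosts):
    """Compare CMDB records against hosts actually observed.

    Hostnames are normalized (trimmed, lowercased) so that "APP01"
    and "app01" are treated as the same system.
    """
    cmdb = {h.strip().lower() for h in cmdb_hosts}
    seen = {h.strip().lower() for h in discovered_hosts}
    return {
        # In the CMDB but never observed: possibly decommissioned.
        "stale_in_cmdb": sorted(cmdb - seen),
        # Observed but not in the CMDB: the systems nobody knew about.
        "missing_from_cmdb": sorted(seen - cmdb),
        # Present in both sources.
        "confirmed": sorted(cmdb & seen),
    }


# Hypothetical sample data for illustration only.
cmdb = ["app01", "db01", "legacy-fin01"]
discovered = ["APP01", "db01", "sub-batch07"]
report = reconcile_inventory(cmdb, discovered)
for category, hosts in report.items():
    print(category, hosts)
```

Anything in the first two categories still needs a human to walk to the rack or open the console. The script only tells you where to look.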

Dependency mapping

Once the inventory is accurate, dependency mapping is the hardest part of the preparation phase. For each system, you need to know: what depends on it, and what does it depend on? Move a database server without moving its application server first, and you have an outage. Move the application server first, and you have an outage because the database is still at the old site.

We used network traffic analysis (NetFlow data from the core switches) to validate application dependencies — what was actually talking to what — against what the documentation said. The discrepancies were significant. Applications talking to services that were supposed to have been decommissioned. Undocumented integrations between business unit systems. Direct dependencies on external vendor systems that the business didn't know about.
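The comparison between documented and observed dependencies can be sketched in a few lines. This is an illustration under assumptions: it presumes the flow data has already been reduced to (source, destination) hostname pairs, which is not how raw NetFlow records arrive, and every name in it is hypothetical.

```python
def compare_dependencies(documented, observed_flows, min_flows=1):
    """Diff documented dependency edges against edges seen in traffic.

    documented:     iterable of (source, destination) pairs from the docs
    observed_flows: iterable of (source, destination) pairs, one per flow
    min_flows:      ignore edges seen fewer than this many times
    """
    counts = {}
    for src, dst in observed_flows:
        counts[(src, dst)] = counts.get((src, dst), 0) + 1

    observed = {edge for edge, n in counts.items() if n >= min_flows}
    documented = set(documented)
    return {
        # Traffic exists, documentation does not.
        "undocumented": sorted(observed - documented),
        # Documentation exists, no traffic seen in the capture window.
        "documented_but_unseen": sorted(documented - observed),
    }


# Hypothetical sample data for illustration only.
documented = [("app01", "db01"), ("app01", "ldap01")]
flows = [
    ("app01", "db01"),
    ("app01", "db01"),
    ("app01", "old-ftp01"),
    ("hr-app", "fin-db"),
]
diff = compare_dependencies(documented, flows)
print(diff["undocumented"])
print(diff["documented_but_unseen"])
```

A documented edge with no observed traffic is not proof that the dependency is dead. It may only fire outside the capture window, which is worth remembering when batch jobs are involved.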

Phase 2: Migration Waves (Weeks 4–10)

Based on the dependency mapping, we organized systems into migration waves with defined sequence constraints. Systems with no inbound dependencies migrated in early waves. Systems that other systems depended on migrated last. Within each wave, we defined the migration method for each system.
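The wave ordering rule (systems nothing depends on go first, systems everything depends on go last) amounts to peeling layers off the dependency graph. Here is a minimal sketch of that idea, assuming the dependency map is available as a dictionary. The function name and hostnames are hypothetical, and a real plan also has to account for business constraints that no graph captures.

```python
def plan_waves(depends_on):
    """Group systems into migration waves.

    depends_on maps each system to the set of systems it depends on.
    A system is eligible for the next wave once every system that
    depends on it has already been scheduled.
    """
    systems = set(depends_on)
    for deps in depends_on.values():
        systems |= set(deps)

    # dependents[s] = systems that depend on s (its inbound dependencies).
    dependents = {s: set() for s in systems}
    for system, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(system)

    waves = []
    remaining = set(systems)
    while remaining:
        # Eligible: no unscheduled system still depends on it.
        wave = sorted(s for s in remaining if not (dependents[s] & remaining))
        if not wave:
            # Every remaining system has an unscheduled dependent: a cycle.
            raise ValueError(f"circular dependency among: {sorted(remaining)}")
        waves.append(wave)
        remaining -= set(wave)
    return waves


# Hypothetical sample data for illustration only.
depends_on = {
    "web01": {"app01"},
    "app01": {"db01", "ldap01"},
    "report01": {"db01"},
    "db01": set(),
}
waves = plan_waves(depends_on)
for number, wave in enumerate(waves, start=1):
    print(f"Wave {number}: {wave}")
```

The cycle check matters in practice. Circular dependencies between systems do exist, and those systems have to move together in a single window rather than in sequence.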

"The maintenance windows always take longer than planned. Build in three times the estimated cutover time, not two. The third time is when you need it."

The rollback requirement

Every migration cutover had a defined rollback procedure and a decision point: if we're not confident at T+30 minutes, we roll back. This sounds obvious, but in practice there is always pressure to push through — "we've come this far," "the business is waiting." Having a pre-agreed rollback threshold and the technical capability to execute it prevented several marginal situations from becoming extended outages.
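Writing the threshold down as code, or at least as an explicit rule, removes the room for negotiation in the moment. The sketch below is one way to express it, assuming that T is the start of the cutover and that "confident" means every agreed validation check passes. Both are my simplifications, and the check names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Pre-agreed decision point: T+30 minutes after cutover start.
ROLLBACK_DECISION_AFTER = timedelta(minutes=30)


def cutover_decision(cutover_start, now, checks):
    """Return (decision, failed_checks) for a cutover in progress.

    checks maps a validation check name to True (passing) or False.
    Decisions: "proceed" if all checks pass, "rollback" if any check
    is failing at or after the decision point, otherwise "continue".
    """
    failed = sorted(name for name, ok in checks.items() if not ok)
    if not failed:
        return "proceed", failed
    if now - cutover_start >= ROLLBACK_DECISION_AFTER:
        return "rollback", failed
    return "continue", failed


# Arbitrary illustrative timestamps and check names.
start = datetime(2024, 1, 6, 22, 0)
checks = {"db_reachable": True, "app_login": False}

early = cutover_decision(start, start + timedelta(minutes=10), checks)
late = cutover_decision(start, start + timedelta(minutes=30), checks)
print(early)
print(late)
```

The point of the sketch is that the decision takes no input for "we've come this far."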

Phase 3: Network Cutover (Week 11)

The network cutover, meaning the change to IP routing that sends traffic to the new facility, was the highest-risk phase. By this point every system was at the new site. The question was whether the new network infrastructure would handle the traffic exactly as the old one did.

We ran the new and old environments in parallel for two weeks before network cutover, with systems replicated to both sites and traffic analysis confirming that all application paths were working at the new site. The cutover itself happened over a weekend, with all business unit representatives available to validate their systems immediately after.

What We Got Wrong

Underestimating the application testing requirement was the most significant mistake. We allocated two days per migration wave for application validation. For simple systems, two days was adequate. For complex, multi-tier applications with downstream reporting systems and scheduled jobs, two days was insufficient. Some issues only surfaced a week after cutover when a monthly batch process ran for the first time in the new environment.

The lesson: for each system, explicitly map every scheduled process, batch job, and time-triggered integration before migration. Test those specifically, not just the interactive paths.
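One way to act on this is to list every scheduled job alongside its next run time and flag the ones that will not fire on their own inside the validation window. This is a minimal sketch under assumptions: it presumes you have already collected next run times from cron, the job scheduler, or the application itself, and the job names and dates are hypothetical.

```python
from datetime import datetime, timedelta


def jobs_outside_validation(jobs, cutover, validation_window=timedelta(days=2)):
    """Return jobs whose next run falls after the validation window.

    jobs maps a job name to the datetime of its next scheduled run.
    These jobs will not be exercised by normal operation during
    validation, so they need to be triggered and tested explicitly.
    """
    window_end = cutover + validation_window
    return sorted(name for name, next_run in jobs.items() if next_run > window_end)


# Arbitrary illustrative schedule.
cutover = datetime(2024, 3, 9, 0, 0)
jobs = {
    "nightly-backup": datetime(2024, 3, 9, 1, 0),
    "weekly-report": datetime(2024, 3, 15, 6, 0),
    "month-end-close": datetime(2024, 3, 31, 23, 0),
}
needs_explicit_test = jobs_outside_validation(jobs, cutover)
print(needs_explicit_test)
```

With a two-day window, only the nightly job is exercised naturally. Everything else on the list has to be run by hand in the new environment before sign-off.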

Key Takeaways
