Realm

Network Engineer - Datacenter Operations

Realm United States

Base pay range

$150,000.00/yr - $260,000.00/yr

🛜 Network Engineer - Datacenter Operations

🤖 High-Growth AI Infrastructure

🇺🇸 United States - 30% Travel

💵 $150,000 - $260,000 + Equity + Benefits


Description:

A rapidly scaling AI-infrastructure company, backing many of the world’s leading research labs and next-generation AI builders, is seeking a Network Engineer focused on Operations and Repair.


They’re building colossal GPU clusters in the US - think 100k+ GPUs, liquid cooling, multi-GW power draw. This is the infrastructure that literally determines how fast the future gets built.

This role is for an experienced network operations engineer who wants true ownership. You’ll be the primary operator for a datacenter region, responsible for keeping large-scale network fabrics healthy, responding to complex incidents, and coordinating repair and recovery when things go wrong.


This is not a NOC role and not a design-only position. You’ll work closely with centralized monitoring teams, deployment engineers, and onsite operations to ensure production networks stay available and performant.


What you’ll do

  • Own network operations for an assigned datacenter region, supporting datacenter deployments, turn-ups, and expansions
  • Act as Tier 2/3 escalation point for network incidents
  • Troubleshoot complex L1–L3 and fabric-level issues
  • Coordinate network break-fix with onsite teams and vendors
  • Manage RMAs and vendor escalations
  • Build and maintain regional/network observability dashboards
  • Validate production readiness and operational handover


Requirements:

  • 4+ years of network engineering with heavy production ops exposure
  • Proven experience running and troubleshooting live datacenter networks
  • Strong incident response and outage leadership experience
  • Hands-on with EVPN/VXLAN, BGP, Clos fabrics, high-radix switching
  • Confident in troubleshooting L2/L3, routing, fabric, and physical faults
  • Experience with SQL-backed dashboards (Grafana, Tableau, similar)
  • Working knowledge of Python for ops, analysis, or scripting
  • Pragmatic operator: prioritizes impact, documents as they go
  • Comfortable with ~30–40% travel


Nice to have

  • AI/ML or HPC network operations (RDMA, RoCEv2, lossless Ethernet)
  • Previous site, campus, or regional ops ownership
  • Hands-on hardware break-fix and RMA coordination
  • Experience with network monitoring, alerting, and telemetry
  • Follow-the-sun or globally distributed ops experience


Compensation:

  • $150k–$260k + meaningful equity
  • Generous PTO policy
  • Remote flexibility available, though in-office presence is encouraged.

  • Seniority level: Mid-Senior level
  • Employment type: Full-time
  • Job function: Information Technology
  • Industries: Software Development
