Build the operating system for clusters

We build infrastructure for a world where clusters are personal, sovereign, and developer-controlled.

Open Roles

Frontend Engineer (ClusterdOS)

We're looking for a frontend engineer to own the interface through which the world will interact with distributed systems. This isn't just another dashboard; it's an entire OS. Who invented the mouse and cursor? Who thought of the "x" button to close a window? Sparks of inspiration like these reframed our relationship with the hardware, turning it from something scary and complex into something accessible and useful to everyone, not just a select few highly technical users. The personal cluster revolution will require frontend innovation on the same level!

What You'll Do

Design and build the ClusterdOS interface, including cluster provisioning flows, natural language interfaces, and real-time proactive system insights.
Create visualization systems for distributed state that make complex infrastructure legible at a glance.
Work directly with Kubernetes APIs, GitOps controllers, gRPC/cRPC backends, and distributed system primitives to surface the right information at the right time. (No prior experience required—just curiosity and a willingness to learn.)
Collaborate with backend and platform teams to design elegant protocols for the frontend codebase to use.
Build real-time, reactive UIs with Convex that work equally well for individual tinkerers and for large teams shipping to production.
Prototype new interaction patterns for infrastructure management, testing assumptions quickly and iterating based on feedback.
Establish frontend architecture patterns and best practices as an early engineering team member.

What We're Looking For

You write maintainable, well-structured frontend code.
You have basic experience with or knowledge of Kubernetes and understand pods, deployments, services, and how distributed systems behave.
You have a deep understanding of performance, accessibility, and frontend best practices.
You are comfortable working with complex state management in systems that reflect real-world distributed infrastructure.
You can translate technical complexity into clear user interfaces without oversimplifying.
You have experience working with APIs and backend services, with bonus points if you’re comfortable contributing to backend logic when needed.
You have strong communication skills and the ability to collaborate across product, design, and backend teams.
You are self-directed with strong prioritization instincts and can identify what matters and execute accordingly.
You are able to build an end-to-end Next.js application using React, Tailwind, and shadcn/ui.

Nice to Have

Background in systems programming or distributed systems.
Experience working with Convex.
Experience with PostHog or similar analytics tooling.
Experience with observability tools such as Grafana or other monitoring and visualization platforms.
Experience designing and implementing elegant, compute-efficient UI animations.
A track record of shipping user-facing infrastructure or developer tools.
Experience with GitOps workflows or infrastructure-as-code tooling.


Platform Engineer

We're looking for a Platform Engineer who will be instrumental in building and evolving ClusterdOS. You'll work directly with our founding team to design systems that abstract away Kubernetes complexity while preserving its power. You'll design GitOps workflows, build Kubernetes operators and controllers, and create the automation that makes cluster management invisible to end users.

What You'll Do

Build and extend ClusterdOS core features using Go, including custom Kubernetes operators and controllers
Design and implement GitOps workflows with ArgoCD that make continuous deployment feel automatic
Develop infrastructure-as-code patterns using Terraform and Helm that provision and manage clusters seamlessly
Work on distributed storage solutions using Ceph and WEKA for high-performance, scalable cluster storage
Create observability and monitoring systems using Prometheus and Grafana to surface cluster health and performance
Build and optimize container networking with Cilium for network security and observability
Design and implement federated Kubernetes architectures for multi-cluster management
Build automation tooling that reduces operational overhead for developers running production workloads
Contribute to open source components
Establish platform architecture patterns and practices as an early engineering team member

What We're Looking For

Go — Strong proficiency building system-level tooling, controllers, and distributed systems
ArgoCD — Deep experience with GitOps workflows, declarative infrastructure, and continuous deployment patterns
Kubernetes — Production experience with Kubernetes internals, custom resources (CRDs), operators, and cluster architecture, as well as etcd clusters, including backup/restore procedures and troubleshooting cluster health issues
Infrastructure as Code — Experience with Terraform, Helm, or similar tools for automating infrastructure provisioning, plus Ansible, Kubespray, or similar orchestration frameworks
Observability — Familiarity with monitoring and logging systems like Prometheus, Grafana, or similar platforms
Strong understanding of containerization, networking, and cloud-native architectures
Experience with CI/CD pipelines and automation workflows
Ability to design systems that are both powerful and simple to use
Self-directed with good prioritization instincts — you can identify what matters and execute accordingly

Nice to Have

Experience building developer tools or infrastructure products
Background with service mesh technologies (Istio, Linkerd) or CNI plugins
Contributions to open source projects or Kubernetes ecosystem tools
Familiarity with cloud platforms (AWS, GCP, Azure) and their managed Kubernetes offerings
Experience with policy-as-code and security tooling for Kubernetes
Track record shipping infrastructure products that developers love
Understanding of FinOps and infrastructure cost optimization

What Makes You Successful

You stay calm under pressure and know when to escalate versus dig deeper yourself
You communicate clearly with both engineers and non-technical stakeholders, especially when things break
You're curious about root causes and genuinely interested in preventing problems before they happen
You work with low ego


Senior Infrastructure/Site Reliability Engineer (On-Call)

We're looking for a Senior Infrastructure/SRE Engineer to join our on-call team for enterprise clients. You'll be the technical expert our clients rely on when things go wrong with extremely valuable clusters: diagnosing complex infrastructure issues, resolving production incidents, and ensuring zero downtime for critical AI workloads. This role requires deep technical knowledge, excellent troubleshooting skills, and the ability to stay calm under pressure.

What You'll Do

Respond to and resolve production incidents across our clients' infrastructure, including Kubernetes clusters, Ceph storage systems, and bare metal servers
Diagnose complex issues ranging from pod scheduling problems and CNI networking failures to distributed storage performance degradation and hardware issues
Handle escalations requiring deep expertise in distributed systems, including etcd cluster problems, Ceph RGW authentication issues, and custom networking setups with Cilium and bare metal load balancers
Work directly with enterprise clients during incidents, providing clear communication about status, timeline, and resolution steps
Participate in a follow-the-sun on-call rotation with engineers across time zones
Document incidents thoroughly and improve runbooks based on recurring patterns
Collaborate with our infrastructure team on long-term reliability improvements and automation to reduce incident frequency

Preferred Experience

5+ years of production experience with Kubernetes in enterprise environments, including cluster operations, troubleshooting bare metal Kubernetes issues, working with admission controllers, and a deep understanding of the control plane architecture
Strong experience with distributed storage systems: Ceph is highly preferred, but deep experience with WEKA, VAST, or similar systems is also highly valuable
Strong modern Linux systems administration skills
Experience building and managing bare metal infrastructure, not just cloud environments. You should be comfortable with IPMI, hardware troubleshooting, networking, and the like.
Experience with at least one CNI plugin in production, preferably Cilium or Calico
Strong troubleshooting methodology—you know how to systematically narrow down issues across complex distributed systems and can work effectively under pressure during outages

Highly Valued Experience

Production experience with etcd clusters, including backup/restore procedures and troubleshooting cluster health issues
Experience with GPU infrastructure for AI/ML workloads, including the NVIDIA GPU Operator for Kubernetes and the broader GPU software stack
Knowledge of infrastructure as code tools like Ansible, Kubespray, or similar orchestration frameworks
Experience with high-availability load balancers, service mesh technologies, or IPAM systems
Background working with AI inference or training companies and understanding their unique infrastructure requirements

What Makes You Successful

You're calm and methodical during incidents, with good judgment about when to escalate versus when to dig deeper yourself
You communicate clearly with both technical and non-technical audiences, especially during high-stakes situations
You have intellectual curiosity about root causes and want to prevent problems in advance


About Aranya

We're building ClusterdOS to make production-grade Kubernetes accessible to anyone. Our GitOps-native distributed OS removes operational complexity so that developers and teams can run serious infrastructure without a dedicated platform engineering team.

At Aranya, different perspectives aren't just welcome; they're essential. The best infrastructure comes from people with diverse experiences and ideas.

01. Work on open-source infrastructure used in production
02. Shape the future of Kubernetes and distributed systems
03. In-person, async-friendly team
04. Research-driven engineering culture
05. Focus on autonomy, security, and simplicity


Don't see your role but think you'd be a great fit?