DevRev

At DevRev, we’re building the future of work with Computer – your AI teammate.

Computer is not just another tool. It’s built on the belief that the future of work should be about genuine human connection and collaboration – not piling on more apps.
Computer is the best kind of teammate: it amplifies your strengths, takes repetition and frustration out of your day, and gives you more time and energy to do your best work.

How?

Easy: it’s the only platform capable of…

Complete data unification
Most AI products focus on either structured data (like CRM records and support tickets) or unstructured data (like documents and emails). Computer AirSync connects everything, unifying all your data sources (like Google Workspace, Jira, Notion) into one AI-ready source of truth: Computer Memory.

Powerful search, reasoning, and action
Once connected to all your tools and apps, Computer is embedded in your full business context. It can find and summarize, sure. Even more impressive: it offers employees insights, strategic and proactive suggestions, plus powerful agentic actions.

Extensions for your teams and customers
Computer doesn’t make you choose between new software and old. Its AI-native platform lets you extend existing tools with sophisticated apps and agents. So your teams – and your customers – can take action, seamlessly. These agents work alongside you: updating workflows, coordinating across teams, and syncing back to your systems.

This isn’t just software. Computer brings people back together, breaking down silos and ushering in the future of teamwork, through human-AI collaboration. Stop managing software. Stop wasting time. Start solving bigger problems, building better products, and making your customers happier.

We call this Team Intelligence. It’s why DevRev exists.

Trusted by global companies across multiple industries, DevRev is backed by Khosla Ventures and Mayfield, with $150M+ raised. We are 650+ people, across eight global offices.

What You’ll Do

  • Design and implement agent evaluation pipelines that benchmark AI capabilities across real-world enterprise use cases.
  • Build domain-specific benchmarks for product support, engineering ops, GTM insights, and other verticals relevant to modern SaaS.
  • Develop performance benchmarks that measure and optimize for latency, safety, cost-efficiency, and user-perceived quality.
  • Create search- and retrieval-oriented benchmarks, including multilingual query handling, annotation-aware scoring, and context relevance.
  • Partner with AI and infra teams to instrument models and agents with detailed telemetry for outcome-based evaluation.
  • Drive human-in-the-loop and programmatic testing methodologies for fuzzy metrics like helpfulness, intent alignment, and resolution effectiveness.
  • Contribute to DevRev’s open evaluation tooling and benchmarking frameworks, shaping how the broader ecosystem thinks about SaaS AI performance.

What We’re Looking For

  • 3–7 years of experience in systems, infra, or performance engineering roles with strong ownership of metrics and benchmarking.
  • Fluency in Python and comfort working across full-stack and backend services.
  • Experience building or using LLMs, vector-based search, or agentic frameworks in production environments.
  • Familiarity with LLM serving infrastructure (e.g., vLLM, Triton, Ray, or custom Kubernetes-based deployments), including observability, autoscaling, and token streaming.
  • Experience working with model tuning workflows, including prompt engineering, fine-tuning (e.g., LoRA, DPO), or evaluation loops for post-training optimization.
  • Deep appreciation for measuring what matters, whether it’s latency under load, degradation in retrieval precision, or regression in AI output quality.
  • Familiarity with evaluation techniques in NLP, information retrieval, or human-centered AI (e.g., RAGAS, Recall@K, BLEU).
  • Strong product and user intuition: you care about what the benchmark represents, not just what it measures.

Bonus: experience contributing to academic or open-source benchmarking projects.

Why This Role Matters

  • Agents are not APIs — they reason, adapt, and learn. But with that power comes ambiguity in how we measure success. At DevRev, we believe the benchmarks of the past aren’t enough for the software of the future.
  • This role is your opportunity to design the KPIs of the AI-native enterprise — to bring rigor to systems that reason, and structure to software that thinks.
  • Join us to shape how intelligence is measured in SaaS 2.0.

Culture

The foundation of DevRev is its culture: our commitment to those who are hungry, humble, honest, and who act with heart. Our vision is to help build the earth’s most customer-centric companies. Our mission is to leverage design, data engineering, and machine intelligence to empower engineers to embrace their customers.

That is DevRev! 

Location

Remote - Philippines

Job Overview
Job Posted: 3 weeks ago
Job Type: Full Time
