Staff Software Engineer

Refinitiv

Canada · On-site · Full-time · Lead

Overview of the Role

Advanced Content Engineering (ACE) is seeking a Staff Software Engineer to lead the design and delivery of the search platform’s control-plane API and cloud infrastructure. The platform’s core promise is self-service: internal client teams must be able to create a search system, configure an ingestion topology, promote a new index to production, and monitor system health entirely through APIs, without direct involvement from the platform team. Building, operating, and continuously improving that self-service experience is the heart of this role.

This is a high-ownership, high-leverage position at the intersection of platform engineering, API design, and cloud infrastructure. Staff Engineers on this team define, build, test, deploy, scale, and operate what they ship; full-stack ownership is the baseline, not a bonus. Delivery friction is treated as an urgent engineering problem: the team ships to production constantly, AI-assisted development is the norm, and removing obstacles to fast, safe delivery is everyone’s responsibility.

The successful candidate brings enterprise-grade security instincts, deep AWS expertise, and a product-minded approach to developer experience, treating the platform’s API as a product in its own right.

About the Role

In this position, you will focus on:

Platform Control-Plane API

Plan, design, develop, and own the platform’s management API, the self-service interface through which client teams create and configure search systems, manage ingestion topologies, register reusable components, promote index versions, and monitor system health; resolve problems of diverse scope through innovative thinking, often with little or no precedent to guide the solution
Architect the platform’s multi-tenant access model: implement strict data isolation between client tenants, integrate with enterprise identity providers, establish role-based access control across all API endpoints, and define the governance framework that ensures the platform can make credible security commitments to enterprise customers
Establish API strategy and cross-system integration patterns — designing versioned, backward-compatible interfaces with clear contracts, comprehensive documentation, and developer-experience patterns drawn from best-in-class search platform providers — and set governance standards that the team follows for all future API surface
Design and expose the API surface required to support the platform’s evaluation and experimentation workflows — including endpoints that enable the search grading tool to consume experiment run outputs, query/result pairs, and relevance judgments, and that allow client teams to configure and trigger A/B search experiments through self-service interfaces
Design the configuration data model and persistence layer (DynamoDB and related services) that stores search system definitions, component registry entries, index lifecycle state, and audit logs — applying architectural patterns that scale to the platform’s multi-tenant and multi-region ambitions
Break down complex business requirements into functional and technical requirements with consideration for security, ethical AI implementation, and operational efficiency; contribute to recommendations where technology transformation can spark business growth

Cloud Infrastructure & DevOps

Own the platform’s AWS infrastructure as code — defining, provisioning, and maintaining ECS services, MSK clusters, OpenSearch/Vespa deployments, DynamoDB tables, networking (VPC, security groups, NAT), and IAM roles using Terraform or AWS CDK — establishing infrastructure governance standards and a cloud strategy for multi-environment and eventual multi-region operation
Design and own the CI/CD pipeline for platform services — establishing DevOps culture and toolchain strategy for the team, with a clear mandate to eliminate delivery friction: the team ships to production constantly, and any obstacle to doing so safely is an engineering problem to be solved, not a process to be accepted
Drive adoption of AI-assisted development practices across the team’s infrastructure and API work — establishing the tooling, patterns, and norms that enable engineers to leverage AI to move faster while maintaining the quality and reliability bar the platform demands
Own infrastructure cost management: monitor AWS spend across platform components, evaluate architectural trade-offs at the system level, and implement an enterprise performance and optimization framework that keeps the platform’s economics sustainable as it scales — including compute cost governance for inference workloads as custom model serving is introduced
Implement and operate customer-managed key (CMK) support, applying security strategy, risk assessment frameworks, and security governance to give enterprise clients control over their encryption keys while preserving multi-tenant reliability

Reliability Engineering

Define and own platform-level SLOs covering API availability, query latency, ingestion throughput, and end-to-end document freshness — and build the monitoring infrastructure (CloudWatch, distributed tracing, alerting) that makes SLO compliance continuously visible to the team and to client teams
Design the observability infrastructure for agentic retrieval paths, where standard request/response logging is insufficient: implement trace-level instrumentation that captures tool invocation sequences, per-hop latency, and retrieval inputs, enabling reliable diagnosis of failures and quality regressions in non-deterministic agent workflows
Take full operational responsibility for platform API and infrastructure — you built it, you own it, you run it: triage and resolve incidents, write thorough post-mortems, and drive systematic improvements that prevent recurrence
Design enterprise performance strategy for the platform’s API layer: load testing, capacity planning, performance profiling, and system-level optimization — ensuring the platform can handle planned growth in tenants, content volumes, and query traffic
Embed security architecture throughout the platform’s infrastructure: least-privilege IAM, secrets management, encryption at rest and in transit, audit logging, and compliance implementation aligned with TR’s enterprise security requirements

Technical Leadership

Establish architectural principles and cross-system design patterns for the platform’s control plane and infrastructure — functioning as the technical authority that other engineers and teams turn to for API and infrastructure guidance
Lead significant projects and business initiatives that span multiple engineers and interact with partner teams; determine work priorities and make adjustments to short-term priorities while maintaining strategic focus; provide specialist advice to senior management on complex infrastructure and security issues
Mentor and develop Senior and mid-level engineers — providing coaching, technical direction, and educational opportunities in cloud infrastructure, platform API design, reliability engineering, and AI-assisted development practices
Engage with client teams as a technical partner — understanding their integration experience and pain points, feeding structured requirements back into the platform API roadmap, and proactively reducing time-to-value for new platform adopters
Deliver effective presentations on complex infrastructure and security concepts to technical and non-technical stakeholders; champion ethical AI practices and responsible technology deployment across the team’s work

About You

You’re an ideal fit if you have:

Required Experience

Bachelor’s or Master’s degree

Skills

AWS CDK, DynamoDB, ECS, IAM, MSK, OpenSearch, Terraform, Vespa
