AI Compiler Engineer
Blumind
About Blumind
Blumind is a deep-tech startup at the forefront of the AI revolution. We are building a new class of semiconductor: ultra-low-power analog AI processors designed for edge inferencing. Our technology enables complex AI workloads, from "always-on" keyword spotting to advanced vision and language models, to run at a fraction of the power of traditional digital chips. We are on a mission to make high-performance, on-device AI ubiquitous, from wearables and mobile to smart home devices, automotive, and industrial IoT.
Role Overview
We are looking for an AI Compiler Engineer to help bridge modern machine learning frameworks with our novel analog hardware platform.
As an AI Compiler Engineer at Blumind, you will design and build the compiler stack that maps neural network models onto our analog compute architecture. You will work at the intersection of machine learning, compilers, and hardware, translating high-level models into efficient, hardware-aware execution.
Key Responsibilities
- Design and develop compiler infrastructure for Blumind’s analog AI chips
- Build lowering pipelines from ML frameworks (e.g., PyTorch, TensorFlow) to hardware-specific representations
- Leverage and extend MLIR to represent, optimize, and schedule computations
- Develop custom dialects, passes, and transformations for analog compute primitives
- Implement quantization, approximation, and noise-aware optimizations tailored to analog hardware
- Collaborate closely with hardware, firmware, and ML teams to co-design efficient execution strategies
- Optimize performance, power efficiency, and accuracy trade-offs across the stack
- Contribute to runtime integration and execution tooling
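To give a flavor of the quantization and accuracy trade-offs mentioned above, here is a minimal, self-contained sketch of symmetric per-tensor int8 quantization in Python. This is an illustrative toy example only, not Blumind's toolchain; analog hardware introduces additional noise models beyond simple rounding error.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: the scale is chosen so the
    largest-magnitude weight maps to +/-127."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to float32 for error analysis."""
    return q.astype(np.float32) * scale

weights = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale = quantize_int8(weights)
error = np.abs(weights - dequantize(q, scale)).max()
# With this scheme the round-trip error is bounded by half a quantization step.
assert error <= scale / 2 + 1e-6
```

A compiler for analog hardware has to reason about exactly this kind of bound, plus device noise, when deciding precision per layer.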
What We’re Looking For
Required
- 8+ years of industry software development experience with a strong background in compiler design and implementation
- Experience with LLVM and/or MLIR (MLIR strongly preferred)
- Proficiency in C++ (and/or Rust)
- Solid understanding of machine learning fundamentals and neural network architectures
- Experience building or working with deep learning frameworks (e.g., PyTorch, TensorFlow, ONNX)
- Familiarity with graph-level and tensor-level optimizations
Preferred
- Experience developing MLIR dialects and passes
- Background in hardware-aware compilation (e.g., GPUs, TPUs, NPUs, or custom accelerators)
- Understanding of quantization, mixed precision, and model compression techniques
- Exposure to analog or in-memory compute paradigms
- Experience with scheduling, tiling, and memory optimization strategies
- Familiarity with Python for ML tooling and prototyping
- Knowledge of compute-in-memory (CIM), analog/mixed-signal architectures, or neuromorphic compiler stacks
Education
- Degree in Computer Science, Electrical Engineering, or a related technical or scientific field with a focus on Machine Learning preferred.
What Makes This Role Unique
- Work on first-of-its-kind analog AI compute systems
- Define the compiler layer for a completely new hardware paradigm
- High-impact role shaping performance, efficiency, and developer experience
- Tight collaboration across silicon, systems, and AI teams
- Opportunity to contribute to open-source ecosystems (e.g., MLIR/LLVM)
Location
- Preferred locations are Toronto/Ottawa or Silicon Valley. Candidates willing to relocate to Canada will also be considered.
We thank all applicants for their interest. Only candidates being considered for the role will be contacted.