Palo Alto, CA
Full-time
On-site
Role Overview:
As a Machine Learning Engineer, you will play a central role in translating cutting-edge machine learning research into scalable, production-ready solutions. You will collaborate closely with cross-functional teams to identify opportunities where ML can drive product value, architect robust model-centric systems, and ensure their seamless integration into real-world applications. The role requires a strong balance between theoretical understanding and engineering execution, with a focus on building reliable, maintainable, and high-impact AI-driven features that align with Nace.AI’s strategic objectives.
Key Responsibilities:
Design, build, and maintain end-to-end ML systems, including synthetic data pipelines, model training, debugging, and performance evaluation.
Fine-tune large language models (LLMs) and implement meta-learning methods to enhance model generalization and efficiency.
Improve existing Nace.AI models by incorporating advancements from recent ML research.
Qualifications:
Hands-on experience training and fine-tuning large language models (LLMs) and vision-language models (VLMs), including practical work with pre-training, instruction tuning, and alignment techniques (GRPO, RLHF/DPO/PPO).
Hands-on experience with deep learning models, especially Transformers.
Ability to translate cutting-edge research papers into clean, production-ready code (paper to code).
Proven experience scaling inference infrastructure for LLMs/VLMs, including expertise in model serving frameworks such as vLLM and TGI.
Proficient in Python with a strong track record of building substantial projects.
Solid foundation in computer science fundamentals (data structures, algorithms, design patterns).
BS degree in CS or a related technical field.
Solid experience with ML frameworks and libraries (PyTorch, TensorFlow).
Self-starter comfortable working in a fast-paced, dynamic environment.
Preferred Qualifications:
MS/PhD in CS or a related technical field.
Familiarity with data processing stacks such as Spark and Airflow.
Experience with multi-node GPU training.
Contributions to open-source ML projects.
Deep knowledge of linear programming.
Experience with advanced NLP and multimodal post-training techniques (e.g., model distillation, quantization, deployment optimization).
Experience with inference-time optimization and a deep understanding of serving optimizations for LLMs/VLMs.
Hands-on experience with quantization techniques (AWQ, GPTQ, FP8/GGUF).