Our client, an international AI development company based in New York, is seeking an LLM Scientist (Multimodal Models & Reinforcement Learning) to lead the development of multimodal foundation models and reinforcement learning pipelines.
This role will focus on expanding AI capabilities through vision-language integration, reward modeling, and optimization techniques such as RLHF, RLAIF, and Direct Preference Optimization (DPO). Experience in model distillation is also key to optimizing performance and scalability.
Key Responsibilities
Multimodal Model Development:
Research and build multimodal foundation models (e.g., vision-language, audio-text)
Implement and evaluate architectures such as CLIP, Flamingo, or GPT-4V
Reinforcement Learning & Preference Optimization:
Design and implement RLHF (Reinforcement Learning from Human Feedback), RLAIF (Reinforcement Learning from AI Feedback), and DPO pipelines
Fine-tune reward models and train policies that align with human or enterprise preferences
Model Compression & Evaluation:
Apply model distillation techniques to reduce model size while preserving performance
Develop benchmarks and metrics for both multimodal and RL-based systems
Governance & Alignment:
Contribute to alignment research, interpretability techniques, and bias mitigation strategies
Ensure ethical and responsible development of deployed AI systems
Qualifications & Skills
Strong experience with multimodal models (CLIP, BLIP, Flamingo, etc.)
Hands-on expertise with RLHF, RLAIF, and DPO
Proven track record in model distillation and efficiency optimization
Proficiency in Python, PyTorch, Hugging Face, and distributed training environments
Passion for advancing the frontier of AI safety and alignment
Strong communication skills, especially for cross-functional collaboration
Self-starter with a research-driven, detail-oriented mindset
Very strong English communication skills, both written and verbal (essential for global collaboration)