Student Researcher [Seed Vision – Multimodal Joint Modeling] – 2026 Start (PhD)
Overview
Student Researcher [Seed Vision – Multimodal Joint Modeling] – 2026 Start (PhD) at ByteDance.
Responsibilities
Conduct research on joint training of vision, language, and video models under a unified architecture.
Develop scalable and efficient methods for autoregressive-style multimodal pretraining, supporting both understanding and generation.
Explore cross-modal tokenization, alignment, and shared representation strategies.
Investigate instruction tuning, captioning, and open-ended generation capabilities across modalities.
Contribute to system-level improvements in data curation, model optimization, and evaluation pipelines.
Qualifications
Minimum Qualifications:
Currently pursuing a PhD in Computer Vision, Machine Learning, NLP, or a related field.
Research experience in multimodal learning, large-scale pretraining, or vision-language modeling.
Proficiency in deep learning frameworks such as PyTorch or JAX.
Demonstrated ability to conduct independent research, with publications at top-tier conferences such as CVPR, ICCV, ECCV, NeurIPS, ICML, and ICLR.
Preferred Qualifications:
Experience with autoregressive LLM training, especially in multimodal or unified modeling settings.
Familiarity with instruction tuning, vision-language generation, or unified token space design.
Background in model scaling, efficient training, or data mixture strategies.
Ability to work closely with infrastructure teams to deploy large-scale training workflows.
About Doubao (Seed)
Founded in 2023, the ByteDance Doubao (Seed) Team is dedicated to pioneering advanced AI foundation models. Our goal is to lead in cutting-edge research and drive technological and societal advancements. Our research areas span deep learning, reinforcement learning, Language, Vision, Audio, AI Infra and AI Safety. Our team has labs and research positions across China, Singapore, and the US.
Why Join ByteDance
Inspiring creativity is at the core of ByteDance's mission. Our products help people express themselves, discover and connect. Our diverse teams make that possible. We create value for our communities and strive for meaningful breakthroughs for ourselves, our Company, and our users.
Diversity & Inclusion
ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and perspectives. We celebrate diverse voices and aim to reflect the communities we reach.
Reasonable Accommodation
ByteDance is committed to providing reasonable accommodations in our recruitment processes. If you need assistance, please reach out to us at https://tinyurl.com/RA-request
Job Information
Compensation Description (Hourly) - Campus Intern. The hourly rate range for this position in the selected city is $65-$65. Benefits may vary by location. Interns have day one access to health insurance, life insurance, wellbeing benefits and more, with paid holidays and sick time. Housing allowance may be available for non-remote roles. The Company reserves the right to modify benefits programs at any time, with or without notice.
For Los Angeles County (unincorporated) Candidates: Qualified applicants with arrest or conviction records will be considered in accordance with applicable laws, including the Los Angeles County Fair Chance Ordinance and California Fair Chance Act. Potential duties may include client interactions, handling confidential information, and exercising sound judgment.
Seniority level: Internship
Employment type: Internship
Job function: Research, Analyst, and Information Technology
Software Development
- Location: San Jose, CA, United States
- Job Type: Full-time
- Category: Other