Jun Wang

Jun is a Senior Machine Learning Engineer at Salesforce Research, working on multimodal LLMs. He earned his Ph.D. from the University of Maryland, College Park, where he was honored to be co-advised by Prof. Larry S. Davis and Prof. Joseph F. JaJa. His research primarily focuses on multimodal learning, object detection, and 3D scene understanding. Currently, he is on the job market.

Recently, Jun has been fortunate to work with Dr. Kishore Prahallad (Apple), Dr. Mingfei Gao and Dr. Ran Xu (Salesforce Research), and Dr. Siheng Chen (Mitsubishi Electric Research Labs). Prior to that, he obtained his M.S. degree in Electrical and Computer Engineering from the University of Michigan, Ann Arbor, in 2017 and his B.S. degree from Beijing Institute of Technology, China, in 2015.

Email: junwong [AT] terpmail [DOT] umd [DOT] edu

Google Scholar / Semantic Scholar / DBLP / LinkedIn / GitHub

News

[Jun. 2025][New] BLIP3-KALE won Best Paper at the Synthetic Data for Computer Vision Workshop, CVPR 2025.
[Dec. 2024] ProVision is now released. Excited to contribute to our open-source instruction data generation pipeline for training multimodal LLMs.
[Aug. 2024] xGen-MM (BLIP-3) model is officially released. Thrilled to contribute to advancing open-source multimodal LLMs. Don't miss our BLIP3-tailored datasets, BLIP3-OCR-200M and BLIP3-Grounding-50M. Check them out!
[Aug. 2024] xGen-VideoSyn-1 is now available! Excited to contribute to our open-source video generation model. Stay tuned for the upcoming release!
[Feb. 2024] Started working as a senior machine learning engineer on GenAI at Salesforce Research, Palo Alto, CA.
[Dec. 2023] Two patent applications for scene flow estimation in autonomous driving are filed.
[July 2023] Started working as a senior machine learning engineer on automated driving at Qualcomm, Novi, MI.
[Mar. 2023] Their work A2Summ, on multimodal summarization, is accepted by CVPR 2023. Hello Vancouver!
[Sept. 2022] Their work TAG, a generic text-aware question-answer generation approach for Text-related VQA, is accepted by BMVC 2022.
[Aug. 2022] Their work NAPL, a novel prototype learning paradigm for 3D LiDAR point cloud semantic segmentation, will be presented at the Computer Vision for Metaverse Workshop, ECCV 2022.
[June 2022] The work ESSumm, an unsupervised speech summarization framework employing Wav2Vec, is accepted by INTERSPEECH 2022. Annyeong haseyo, Incheon.
[Apr. 2022] A patent application for motion prediction in autonomous driving is filed.
[Apr. 2022] Their paper "PointMotionNet: Point-Wise Motion Learning for Large-Scale LiDAR Point Clouds Sequences" will be presented at the Workshop on Autonomous Driving (WAD), CVPR 2022. Where y'at, New Orleans.
[Jan. 2022] The code for M3DETR is released. He gave a presentation of M3DETR at WACV 2022, Waikoloa, Hawaii.
[Nov. 2021] Their work PointMotionNet, a framework for 3D motion learning on LiDAR point clouds, achieves 4th place out of 85 teams on the SemanticKITTI Multiscan Semantic Segmentation leaderboard.
[Aug. 2021] Passed his Ph.D. research proposal examination and advanced to candidacy.
[July 2021] Their paper "M3DETR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers" is accepted by WACV 2022 in the first round. Aloha, Hawaii.
[May 2021] Started his machine learning internship with Dr. Kishore Prahallad at Apple, Cupertino, CA.
[Feb. 2021] Started his research internship with Dr. Mingfei Gao and Dr. Ran Xu at Salesforce Research, Palo Alto, CA.
[Nov. 2020] A manuscript on 3D motion learning in LiDAR point clouds is under review. Fingers crossed.
[Sept. 2020] Started his research internship with Prof. Siheng Chen (now Shanghai Jiao Tong University) at Mitsubishi Electric Research Labs, Cambridge, MA.
[July 2020] Their paper "InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling" is accepted by ECCV 2020.

Selected Publications

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Le Xue, Manli Shu, Anas Awadalla, Jun Wang, An Yan, Senthil Purushwalkam, Honglu Zhou, Viraj Prabhu, Yutong Dai, Michael S Ryoo, Shrikant Kendre, Jieyu Zhang, Can Qin, Shu Zhang, Chia-Chih Chen, Ning Yu, Juntao Tan, Tulika Manoj Awalgaonkar, Shelby Heinecke, Huan Wang, Yejin Choi, Ludwig Schmidt, Zeyuan Chen, Silvio Savarese, Juan Carlos Niebles, Caiming Xiong, Ran Xu
arXiv, 2024
arXiv / code / dataset / VentureBeat coverage

Open-sourced multimodal Large Language Models (MLLM).

ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models
Jieyu Zhang, Le Xue, Linxin Song, Jun Wang, Weikai Huang, Manli Shu, An Yan, Zixian Ma, Juan Carlos Niebles, Silvio Savarese, Caiming Xiong, Zeyuan Chen, Ranjay Krishna, Ran Xu
arXiv, 2024
arXiv / code / dataset / VentureBeat coverage

A scalable system generating 10M+ vision-centric instructions, improving multimodal benchmark by 8%.

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Can Qin, Congying Xia, Krithika Ramakrishnan, Michael Ryoo, Lifu Tu, Yihao Feng, Manli Shu, Honglu Zhou, Anas Awadalla, Jun Wang, Senthil Purushwalkam, Le Xue, Yingbo Zhou, Huan Wang, Silvio Savarese, Juan Carlos Niebles, Zeyuan Chen, Ran Xu, Caiming Xiong
arXiv, 2024
arXiv / code

A T2V model leveraging VideoVAE compression and Diffusion Transformer.

Align and Attend: Multimodal Summarization with Dual Contrastive Losses
Bo He, Jun Wang, Jielin Qiu, Trung Bui, Abhinav Shrivastava, Zhaowen Wang
CVPR, 2023
arXiv / code / project / bibtex

Multimodal summarization that summarizes video frames and text sentences with time correspondence.

TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation
Jun Wang, Mingfei Gao, Yuqian Hu, Ramprasaath R. Selvaraju, Chetan Ramaiah, Ran Xu, Joseph F. JaJa, Larry S. Davis
BMVC, 2022
arXiv / code / poster / bibtex

The first generic text-aware question-answer generation approach for Text-related VQA.

ESSumm: Extractive Speech Summarization from Untranscribed Meeting
Jun Wang
INTERSPEECH, 2022
arXiv / code / slides / bibtex

The first automatic speech summarization system with Wav2vec 2.0.

PointMotionNet: Point-Wise Motion Learning for Large-Scale LiDAR Point Clouds Sequences
Jun Wang, Xiaolong Li, Alan Sullivan, Lynn Abbott, Siheng Chen
* denotes equal contribution.
WAD, CVPR, 2022
arXiv / bibtex

3D motion learning with a novel point-based spatiotemporal convolution operation module.

M3DETR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers
Jun Wang, Tianrui Guan, Shiyi Lan, Rohan Chandra, Zuxuan Wu, Larry S. Davis, Dinesh Manocha
* denotes equal contribution.
WACV, 2022
arXiv / code / slides / bibtex

The multi-representation, multi-scale, mutual-relation 3D object detector with transformers.

InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling
Jun Wang, Shiyi Lan, Mingfei Gao, Larry S. Davis
* denotes equal contribution.
ECCV, 2020
arXiv / slides / bibtex

3D Object Detection with the effective dynamic attention module.

Misc Projects

RTL Design of R10K Out-of-Order 2-way Superscalar Processor with Simultaneous Multithreading
Ruobai Feng, Wang Cao, Jun Wang, Yujun Yan, Jiapeng Zhao
EECS 470 Computer Architecture, 2016

A two-way superscalar SMT processor design based on the MIPS R10K out-of-order execution architecture.

Design and Layout of a 16-bit RISC Pipelined Processor
Farzad Asgarian, Harsha Chawla, Isaac Jarman, Cody Piekarz, Jun Wang
EECS 427 VLSI Design I, 2015

A baseline processor design with a customized Kogge-Stone adder, based on a 16-bit RISC architecture and using IBM’s 130nm CMOS process.

Academic Services

Program Committee: AAAI'25
Reviewer: CVPR, ICCV, ECCV, ICML, NeurIPS, ICLR, AAAI, BMVC, WACV, ACM MM
Student Volunteer: INTERSPEECH'22

Awards

[2023] Outstanding Overseas Student Scholarship - Government of China
[2022] International Conference Student Support Award - University of Maryland
[2022] CVPR Travel Grant
[2022] Jacob K. Goldhaber Travel Grant - University of Maryland
[2019] Teaching Assistant Training and Development (TATD) Fellow - University of Maryland
[2015] Outstanding Graduates - Beijing Institute of Technology
[2014] Honorable Mention - Mathematical Contest in Modeling

Experience