Experience

HUAWEI Shanghai, China Oct. 2023 – Aug. 2024 Research Intern Supervisors: Duyu Tang and Prof. Xiaocheng Feng

Developed data construction and alignment training methods for visual language models, incorporating reinforcement learning techniques to enhance OCR and complex reasoning capabilities on text-rich images.
Constructed XT-VQA, a cross-lingual text-rich visual QA benchmark, revealing performance gaps in LVLMs due to insufficient visual information activation across languages.
Developed MVCL-MI, a policy-inspired approach maximizing vision-language mutual information, reducing cross-lingual disparities through knowledge distillation from monolingual to multilingual contexts.

Mila - Quebec AI Institute Quebec, Canada Jun. 2023 – Oct. 2023 Research Intern Supervisor: PhD. Meng Qu

Developed GraphAgent, a novel approach reframing text-attributed graph learning as an agent planning problem, leveraging Large Language Models to explore both structural and textual features in graphs.
Implemented a policy-driven framework where the LLM-parameterized agent takes actions tailored for text-attributed graphs, achieving improved performance and interpretability through a process analogous to state-action planning in reinforcement learning.

University of Southern California GLAMOR Lab Los Angeles, U.S. May 2023 – Present Research Intern Supervisors: Prof. Jesse Thomason and PhD. Ishika Singh

Scene Graph Construction: Developed an automated scene graph pipeline that enabled robots to autonomously construct and refine 3D maps from interactions, enhancing scene accuracy and robot autonomy.
Task Planning Innovation: Integrated a Chain-of-Thought methodology in task planning, significantly improving planning precision and efficiency.

HIT SCIR Lab Harbin, China Jun. 2022 – Jan. 2023 Master's Candidate Supervisors: Prof. Xiaocheng Feng and PhD. Zhangyin Feng

Developed a visual story generation model with adaptive context modeling, addressing the limitation of treating all historical images equally when conditioning on earlier frames.
Implemented a novel guidance mechanism in the sampling stage, enhancing global consistency and achieving SOTA FID scores on story visualization and continuation tasks on the PororoSV and FlintstonesSV datasets.

Xinmiao (Mia) Yu
