About me
Glad to see you here. My name is Xinmiao (Mia) Yu. I’m currently a double-degree master’s student at Politecnico di Milano and Harbin Institute of Technology, supervised by Prof. Bing Qin and Prof. Xiaocheng Feng. I have collaborated with Huawei on vision-language models under the supervision of Duyu Tang. I worked as a research intern at MILA, where I was supervised by [Meng Qu](https://mengqu.github.io/) on the GraphAgent project. Additionally, I work closely with Ishika Singh and Wang Zhu at the USC GLAMOR Lab. My current research focuses on developing advanced multimodal models and LLM agents that can enhance their intelligence through interaction with the real world. More specifically:
- Vision-language comprehension
  - Enhancing vision-language models through methods such as reinforcement learning, cross-lingual data alignment, and mutual information maximization.
  - Exploring challenges in text-rich visual comprehension, such as improving OCR capabilities, addressing cross-lingual disparities, and activating visual information effectively across languages.
- Generative agents
  - Designing an agent-based approach for learning on text-attributed graphs with large language models (LLMs). The agent explores graph structures and text features, improving performance and interpretability through state-action planning.
  - Investigating how LLM-powered agents can improve task planning by enabling robots to create and refine 3D scene maps. Using techniques like Chain-of-Thought reasoning, these agents aim to enhance task precision, efficiency, and autonomy in real-world scenarios.
Education
- M.S. in Computer Science and Engineering, Politecnico di Milano, 2024.09-2025.09
- M.S. in Computer Science and Technology, Harbin Institute of Technology, 2023.09-2025.09
- Visiting student, University of California, Los Angeles, 2023.03-2023.06
- B.S., School of Computer Science (CS), Harbin Institute of Technology, 2019.09-2023.06
Publications
- Xinmiao Yu, Xiaocheng Feng, Yun Li, Minghui Liao, Ya-Qi Yu, Xiachong Feng, Weihong Zhong, Ruihan Chen, Mengkang Hu, Jihao Wu, Duyu Tang, Dandan Tu, Bing Qin. Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective. Submitted to AAAI 2024
- Mengkang Hu, Yao Mu, Xinmiao Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, Ping Luo. Tree-Planner: Efficient Close-loop Task Planning with Large Language Models. arXiv, ICLR 2024
- Xinmiao Yu, Meng Qu, Xiaocheng Feng, Bing Qin. GraphAgent: Exploiting Large Language Models for Interpretable Learning on Text-attributed Graphs. arXiv
- Zhangyin Feng, Yuchen Ren, Xinmiao Yu, Xiaocheng Feng, Duyu Tang, Shuming Shi, Bing Qin. Improved Visual Story Generation with Adaptive Context Modeling. ACL 2023
- Xuehui Yu, Jingchi Jiang, Xinmiao Yu, Yi Guan*, Xue Li. Causal Coupled Mechanisms: A Control Method with Cooperation and Competition for Complex System. BIBM 2022
Honors and awards
- First-Class Graduate Scholarship (Top 20%) 2024
- Deloitte Digital Elite Challenge Runner-up (Top 0.2%) 2024
- Graduate Entrance Outstanding Scholarship (Top 20%) 2023
- Outstanding Graduate Award (Top 1%) 2023
- 1st Prize, HUAWEI Cup (the only one awarded), National Undergraduate IoT Design Contest 2022
- National Scholarship (Top 1%) 2022
- People’s Scholarship 2020, 2021, 2022
- Excellent Student Cadre (Top 5%) 2020, 2021, 2022
More about me
I’m fascinated by reading, traveling, and skiing. I also love spending my free time cooking (it kind of feels like doing fun experiments in the kitchen). Here are some snapshots from my life. Hope you enjoy!
Coding Skills
- Programming Languages: C, Python, Java
- Platforms: PyTorch, TensorFlow, Spring Boot, JavaScript