R&D Researcher at Sansan, Inc.

Research Interests

Vision Language Model, Information Extraction, Low-Resource Languages, Linguistics

Work Experience

Apr 2024 - Present Researcher at Sansan, Inc.
Aug 2023 - Mar 2024 Member of TokyoTech-LLM
Dec 2021 - Mar 2024 Research Assistant at Tokyo Institute of Technology
Jan 2023 - Dec 2023 Chair of IT Committee at Cambodian Students’ Association in Japan
Nov 2020 - Aug 2023 Part-time Technical Staff at Novitas, Inc.
Oct 2022 - Dec 2022 Teaching Assistant at Tokyo Institute of Technology

Education

Apr 2022 - Mar 2024 Master of Engineering in Artificial Intelligence, Tokyo Institute of Technology (Okazaki Laboratory)
Apr 2020 - Mar 2022 Bachelor of Engineering in Computer Science, Tokyo Institute of Technology
Apr 2017 - Mar 2020 Associate of Engineering in Information Engineering, National Institute of Technology, Kagoshima College

Publications

Journal

Masanari Ohi, Masahiro Kaneko, Ryuto Koike, Mengsay Loem, and Naoaki Okazaki. Likelihood-based Mitigation of Evaluation Bias in Large Language Models. Journal of Natural Language Processing, 2025, Volume 32, Issue 2, Pages 480-496. [paper]
Mengsay Loem, Sho Takase, Masahiro Kaneko, Naoaki Okazaki. ExtraPhrase: Efficient Data Augmentation for Abstractive Summarization. Journal of Natural Language Processing, 2023, Volume 30, Issue 2, Pages 489-506. [paper]

(Refereed) International Conference and Workshop Papers

Naoaki Okazaki, Kakeru Hattori, Hirai Shota, Hiroki Iida, Masanari Ohi, Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Rio Yokota, and Sakae Mizuki. Building a Large Japanese Web Corpus for Large Language Models. In Proceedings of the First Conference on Language Modeling (COLM), University of Pennsylvania, USA, October 2024.
Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Hiroki Iida, Masanari Ohi, Kakeru Hattori, Hirai Shota, Sakae Mizuki, Rio Yokota, and Naoaki Okazaki. Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities. In Proceedings of the First Conference on Language Modeling (COLM), University of Pennsylvania, USA, October 2024.
Mengsay Loem, Masahiro Kaneko, and Naoaki Okazaki. 2024. SAIE Framework: Support Alone Isn’t Enough - Advancing LLM Training with Adversarial Remarks. In Proceedings of the 27th European Conference on Artificial Intelligence (ECAI-2024), Santiago de Compostela, Spain (to appear). [arXiv]
Masanari Ohi, Masahiro Kaneko, Ryuto Koike, Mengsay Loem, Naoaki Okazaki. Likelihood-based Mitigation of Evaluation Bias in Large Language Models. The 62nd Annual Meeting of the Association for Computational Linguistics (Findings: ACL). [arXiv]
Mengsay Loem, Masahiro Kaneko, Sho Takase, and Naoaki Okazaki. 2023. Exploring Effectiveness of GPT-3 in Grammatical Error Correction: A Study on Performance and Controllability in Prompt-Based Methods. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), pages 205–219, Toronto, Canada. Association for Computational Linguistics. [paper]
Mengsay Loem, Sho Takase, Masahiro Kaneko, Naoaki Okazaki. ExtraPhrase: Efficient Data Augmentation for Abstractive Summarization. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop (NAACL:SRW), Seattle, July 2022. [paper]
Mengsay Loem, David Taingngin, Chan Oeurn Chey, Meak Kamerane. A Design of Low Cost and High-Performance Speed Detector Combining Digital Camera and Photo-resistive Sensor to Contribute to Traffic Accidents Reduction in Cambodia. Youth Innovation for Sustainability, SEAMEO RECSAM, 2016.

Pre-print

Ahmed Sabir, Azinovič Gasper, Mengsay Loem, Rajesh Sharma. Contrasting Cognitive Styles in Vision-Language Models: Holistic Attention in Japanese Versus Analytical Focus in English. arXiv, 2025. [paper]
Mengsay Loem, Sho Takase, Masahiro Kaneko, Naoaki Okazaki. Are Neighbors Enough? Multi-Head Neural n-gram can be Alternative to Self-attention. arXiv, 2022. [paper]

Domestic Conference and Symposium (in Japanese)

佐藤勇元, Mengsay Loem. 因果追跡に基づくVLMのモジュール重要度推定とLoRA適用先選択の検証. 言語処理学会第32回年次大会 (NLP2026), pp. 1531-1536, 2026年3月. [paper]
Mengsay Loem, 橋本航. いつ考え，いつ即答するか。文書理解視覚言語モデルにおける推論ルーティングの評価. 言語処理学会第32回年次大会 (NLP2026), pp. 1226-1231, 2026年3月. [paper]
Mengsay Loem, 保坂大樹. 視覚的質問応答による文書情報抽出における同時多項目推論. 言語処理学会第31回年次大会 (NLP2025), pp. 4227-4231, 2025年3月. [paper]
Mengsay Loem, 金子正弘, 岡崎直観. 敵対的発言を取り入れた議論による言語モデルの学習強化と推論力の向上. 言語処理学会第30回年次大会 (NLP2024), pp. 2750–2755, 2024年3月. [paper]
大井聖也, 金子正弘, 小池隆斗, Mengsay Loem, 岡崎直観. 大規模言語モデルにおける評価バイアスの尤度に基づく緩和. 言語処理学会第30回年次大会 (NLP2024), pp. 3021–3026, 2024年3月. [paper]
岡崎直観, 服部翔, 平井翔太, 飯田大貴, 大井聖也, 藤井一喜, 中村泰士, Mengsay Loem, 横田理央, 水木栄. Swallowコーパス: 日本語大規模ウェブコーパス. 言語処理学会第30回年次大会 (NLP2024), pp. 1498–1503, 2024年3月. [paper]
水木栄, 飯田大貴, 藤井一喜, 中村泰士, Mengsay Loem, 大井聖也, 服部翔, 平井翔太, 横田理央, 岡崎直観. 大規模言語モデルの日本語能力の効率的な強化: 継続事前学習における語彙拡張と対訳コーパスの活用. 言語処理学会第30回年次大会 (NLP2024), pp. 1514–1519, 2024年3月. [paper]
藤井一喜, 中村泰士, Mengsay Loem, 飯田大貴, 大井聖也, 服部翔, 平井翔太, 水木栄, 横田理央, 岡崎直観. 継続事前学習による日本語に強い大規模言語モデルの構築. 言語処理学会第30回年次大会 (NLP2024), pp. 2102–2107, 2024年3月. [paper]
Mengsay Loem, 高瀬翔, 金子正弘, 岡崎直観. マルチヘッドニューラルN-gramによる自己注意機構の代替. 言語処理学会第29回年次大会 (NLP2023), A9-1, pp. 2094–2099, 2023年3月. [paper]
Mengsay Loem, 高瀬翔, 岡崎直観. Are Neighbors Enough? Multi-Head Neural n-gram can be Alternative to Self-attention. 第17回NLP若手の会シンポジウム, P5-07, 2022年8月.
Mengsay Loem, 高瀬翔, 金子正弘, 岡崎直観. 抽出型要約と言い換えによる生成型要約の訓練データ拡張. 言語処理学会第28回年次大会 (NLP2022), pp. 1996–2001, 2022年3月. [paper]

Talks

Enhancing LLMs with Interactive Feedback: Advancing Learning and Reasoning. Tokyo AI - Advanced AI #3, August 2024. [slide]
A Journey of Generative Model. Invited Talk at AI Webinar by Forum for Pushing the Boundary, July 2024. [note]
Showcasing Experiences in a Master’s Education in Computer Science. Panel Discussion at TechTalk Throwdown by Cambodian Student Association in the United States of America, January 2024. [note]
Beyond the Symbols: A 30-minute Overview of NLP. Invited Talk at Technical Seminar by Cambodian MEXT Scholar IT Team, June 2023. [slide]
Deep Learning based Natural Language Processing. Invited Talk at Natural Language Processing Webinar by Forum for Pushing the Boundary, December 2021. [slide]

Awards and Participation

Best Journal Paper Award ([Paper]) at the 32nd Annual Meeting of the Association for Natural Language Processing (Japan, 2026)
Language Resources Award at the 31st Annual Meeting of the Association for Natural Language Processing (NLP2025), Swallow LLM (Japan, 2025)
Young Researcher’s Encouragement Award ([Paper]) at the 30th Annual Meeting of the Association for Natural Language Processing (Japan, 2024)
Best Paper Awards ([Paper-1], [Paper-2]) at the 30th Annual Meeting of the Association for Natural Language Processing (Japan, 2024)
Sato Yo International Scholarship (Apr 2022 - Mar 2024)
Japanese Government (MEXT) Scholarship (Apr 2020 - Mar 2022)
IEICE Kyushu Section Award for Academic Excellence (Japan, 2020)
President’s Award for Best Graduation Project, Kagoshima College of Technology (Japan, 2020)
Japanese Government (MEXT) Scholarship: College of Technology (Apr 2016 - Mar 2020)
Outstanding Award for Exhibit Presentation at the 10th Regional Congress Search for SEAMEO Young Scientists (Malaysia, 2016)
56th International Mathematical Olympiad Participation (Thailand, 2015)
Gold Medal in Cambodian National Mathematical Olympiad (Cambodia, 2015)