R&D Researcher at Sansan, Inc.
Research Interests
Large Language Models, Low-resource Languages, Educational Applications
Education
- Apr 2022 - Mar 2024 Master of Engineering in Artificial Intelligence, Tokyo Institute of Technology (Okazaki Laboratory)
- Apr 2020 - Mar 2022 Bachelor of Engineering in Computer Science, Tokyo Institute of Technology
- Apr 2017 - Mar 2020 Associate of Engineering in Information Engineering, National Institute of Technology, Kagoshima College
Work Experience
- Apr 2024 - Present Researcher at Sansan, Inc.
- Aug 2023 - Mar 2024 Member of TokyoTech-LLM
- Dec 2021 - Mar 2024 Research Assistant at Tokyo Institute of Technology
- Jan 2023 - Dec 2023 Chair of IT Committee at Cambodian Students’ Association in Japan
- Nov 2020 - Aug 2023 Part-time Technical Staff at Novitas, Inc.
- Oct 2022 - Dec 2022 Teaching Assistant at Tokyo Institute of Technology
Publications
Journal
- Mengsay Loem, Sho Takase, Masahiro Kaneko, Naoaki Okazaki, ExtraPhrase: Efficient Data Augmentation for Abstractive Summarization, Journal of Natural Language Processing, 2023, Volume 30, Issue 2, Pages 489-506. [paper]
(Refereed) International Conference and Workshop Papers
- Naoaki Okazaki, Kakeru Hattori, Shota Hirai, Hiroki Iida, Masanari Ohi, Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Rio Yokota, and Sakae Mizuki. Building a Large Japanese Web Corpus for Large Language Models. In Proceedings of the First Conference on Language Modeling (COLM), pages (to appear), University of Pennsylvania, USA, October 2024.
- Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Hiroki Iida, Masanari Ohi, Kakeru Hattori, Shota Hirai, Sakae Mizuki, Rio Yokota, and Naoaki Okazaki. Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities. In Proceedings of the First Conference on Language Modeling (COLM), pages (to appear), University of Pennsylvania, USA, October 2024.
- Mengsay Loem, Masahiro Kaneko, and Naoaki Okazaki. 2024. SAIE Framework: Support Alone Isn’t Enough - Advancing LLM Training with Adversarial Remarks. In Proceedings of the 27th European Conference on Artificial Intelligence (ECAI-2024), pages (to appear), Santiago de Compostela, Spain. [arXiv]
- Masanari Ohi, Masahiro Kaneko, Ryuto Koike, Mengsay Loem, Naoaki Okazaki. Likelihood-based Mitigation of Evaluation Bias in Large Language Models. In Findings of the 62nd Annual Meeting of the Association for Computational Linguistics (Findings: ACL), 2024. [arXiv]
- Mengsay Loem, Masahiro Kaneko, Sho Takase, and Naoaki Okazaki. 2023. Exploring Effectiveness of GPT-3 in Grammatical Error Correction: A Study on Performance and Controllability in Prompt-Based Methods. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), pages 205–219, Toronto, Canada. Association for Computational Linguistics. [paper]
- Mengsay Loem, Sho Takase, Masahiro Kaneko, Naoaki Okazaki. ExtraPhrase: Efficient Data Augmentation for Abstractive Summarization. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop (NAACL:SRW), Seattle, July 2022. [paper]
- Mengsay Loem, David Taingngin, Chan Oeurn Chey, Meak Kamerane. A Design of Low Cost and High-Performance Speed Detector Combining Digital Camera and Photo-resistive Sensor to Contribute to Traffic Accidents Reduction in Cambodia. Youth Innovation for Sustainability, SEAMEO RECSAM, 2016.
Pre-print
- Mengsay Loem, Sho Takase, Masahiro Kaneko, Naoaki Okazaki. Are Neighbors Enough? Multi-Head Neural n-gram can be Alternative to Self-attention. arXiv, 2022. [paper]
Domestic Conference and Symposium (in Japanese)
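- Mengsay Loem, Masahiro Kaneko, Naoaki Okazaki. Enhancing Language Model Training and Reasoning Ability through Discussion Incorporating Adversarial Remarks. The 30th Annual Meeting of the Association for Natural Language Processing (NLP2024), B10-6, pp. 2750–2755, March 2024. [paper]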
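- Masanari Ohi, Masahiro Kaneko, Ryuto Koike, Mengsay Loem, Naoaki Okazaki. Likelihood-based Mitigation of Evaluation Bias in Large Language Models. The 30th Annual Meeting of the Association for Natural Language Processing (NLP2024), A11-4, pp. 3021–3026, March 2024. [paper]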
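- Naoaki Okazaki, Kakeru Hattori, Shota Hirai, Hiroki Iida, Masanari Ohi, Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Rio Yokota, Sakae Mizuki. Swallow Corpus: A Large Japanese Web Corpus. The 30th Annual Meeting of the Association for Natural Language Processing (NLP2024), A6-1, pp. 1498–1503, March 2024. [paper]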
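- Sakae Mizuki, Hiroki Iida, Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Masanari Ohi, Kakeru Hattori, Shota Hirai, Rio Yokota, Naoaki Okazaki. Efficiently Enhancing the Japanese Capabilities of Large Language Models: Vocabulary Expansion and Use of Parallel Corpora in Continual Pre-Training. The 30th Annual Meeting of the Association for Natural Language Processing (NLP2024), A6-4, pp. 1514–1519, March 2024. [paper]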
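- Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Hiroki Iida, Masanari Ohi, Kakeru Hattori, Shota Hirai, Sakae Mizuki, Rio Yokota, Naoaki Okazaki. Building a Large Language Model Strong in Japanese through Continual Pre-Training. The 30th Annual Meeting of the Association for Natural Language Processing (NLP2024), A8-5, pp. 2102–2107, March 2024. [paper]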
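- Mengsay Loem, Sho Takase, Masahiro Kaneko, Naoaki Okazaki. Replacing Self-Attention with Multi-Head Neural N-gram. The 29th Annual Meeting of the Association for Natural Language Processing (NLP2023), A9-1, pp. 2094–2099, March 2023. [paper]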
- Mengsay Loem, Sho Takase, Naoaki Okazaki. Are Neighbors Enough? Multi-Head Neural n-gram can be Alternative to Self-attention. The 17th Symposium of the Young Researcher Association for NLP Studies (YANS), P5-07, August 2022.
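- Mengsay Loem, Sho Takase, Masahiro Kaneko, Naoaki Okazaki. Training Data Augmentation for Abstractive Summarization Using Extractive Summarization and Paraphrasing. The 28th Annual Meeting of the Association for Natural Language Processing (NLP2022), pp. 1996–2001, March 2022. [paper]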
Talks
- Enhancing LLMs with Interactive Feedback: Advancing Learning and Reasoning. Tokyo AI - Advanced AI #3, August 2024. [slide]
- A Journey of Generative Model. Invited Talk at AI Webinar by Forum for Pushing the Boundary, July 2024. [note]
- Showcasing Experiences in a Master’s Education in Computer Science. Panel Discussion at TechTalk Throwdown by Cambodian Student Association in the United States of America, January 2024. [note]
- Beyond the Symbols: A 30-minute Overview of NLP. Invited Talk at Technical Seminar by Cambodian MEXT Scholar IT Team, June 2023. [slide]
- Deep Learning based Natural Language Processing. Invited Talk at Natural Language Processing Webinar by Forum for Pushing the Boundary, December 2021. [slide]
Awards and Participation
- Young Researcher’s Encouragement Award (Paper: [1]) at the 30th Annual Meeting of the Association for Natural Language Processing (Japan, 2024)
- Best Paper Award (Papers: [1], [2]) at the 30th Annual Meeting of the Association for Natural Language Processing (Japan, 2024)
- Sato Yo International Scholarship (Apr 2022 - Mar 2024)
- Japanese Government (MEXT) Scholarship (Apr 2020 - Mar 2022)
- IEICE Kyushu Section Award for Academic Excellence (Japan, 2020)
- President’s Award for Best Graduation Project, Kagoshima College of Technology (Japan, 2020)
- Japanese Government (MEXT) Scholarship: College of Technology (Apr 2016 - Mar 2020)
- Outstanding Award for the Presentation of Exhibit at the 10th Regional Congress Search for SEAMEO Young Scientists (Malaysia, 2016)
- Participation in the 56th International Mathematical Olympiad (Thailand, 2015)
- Gold Medal in Cambodian National Mathematical Olympiad (Cambodia, 2015)