Hao Wang / 王 昊
Hi, my name is Hao Wang, and I am a Ph.D. candidate in the Department of Computer Science and Communications Engineering at Waseda University, where I research natural language processing at Kawahara Lab.
Research interests: natural language processing, multimodal learning, machine translation.
Email: conan1024hao[at]akane.waseda.jp (replace [at] with @)
Education
Doctor of Engineering, Department of Computer Science and Communications Engineering, Waseda University. (Sep. 2024 - Sep. 2027, Sup: Daisuke Kawahara)
Master of Engineering, Department of Computer Science and Communications Engineering, Waseda University. (Apr. 2023 - Sep. 2024, Sup: Daisuke Kawahara)
Bachelor of Engineering, Department of Computer Science and Engineering, Waseda University. (Apr. 2019 - Mar. 2023, Sup: Daisuke Kawahara)
Experience
Research Intern / Visiting Researcher, New York University Courant Institute of Mathematical Sciences. (Oct. 2024 - Mar. 2025, NYC, Sup: Saining Xie)
Research Intern, OMRON SINIC X Corp. (Aug. 2023 - Jan. 2024, Tokyo, Sup: Yoshitaka Ushiku, Shohei Tanaka)
Trainee, RIKEN AIP. (Apr. 2023 - Mar. 2024, Tokyo, Sup: Shuhei Kurita)
Research Intern, CyberAgent AI Lab. (Mar. 2023 - Oct. 2023, Tokyo, Sup: Tetsuro Morimura, Ukyo Honda)
Research Assistant, Waseda University. (Nov. 2022 - Mar. 2023, Tokyo)
Software Engineer Intern, Citadel AI, Inc. (Jun. 2024 - Sep. 2024, Tokyo)
Software Engineer Intern, LegalOn Technologies, Inc. (Feb. 2024 - Mar. 2024, Tokyo)
Software Engineer Intern, CyberAgent, Inc. (Feb. 2023 - Feb. 2023, Tokyo)
Software Engineer Intern, LINE Corp. (Aug. 2022 - Oct. 2022, Tokyo) [blog]
Software Engineer Intern, Fixstars Corp. (May 2021 - Jul. 2021, Tokyo)
Software Engineer Intern, Morpho, Inc. (Oct. 2020 - Dec. 2020, Tokyo)
Data Science Intern, MC Digital, Inc. (Jul. 2021 - Jan. 2022, Tokyo)
Data Science Hackathon First Place, P&G Japan. (Sep. 2021, Tokyo)
Publications
International Conference (Refereed)
Ziqi Yin, Hao Wang, Kaito Horio, Daisuke Kawahara, Satoshi Sekine. 2024. Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance. The Second Workshop on Social Influence in Conversations (SICon 2024) @ EMNLP 2024, Miami, United States. [paper] [github]
Shohei Tanaka, Hao Wang, Yoshitaka Ushiku. 2024. SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters. The 35th British Machine Vision Conference (BMVC 2024), Glasgow, United Kingdom. [paper] [dataset] [github]
Hao Wang, Shuhei Kurita, Shuichiro Shimizu, Daisuke Kawahara. 2024. SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition. The Third Workshop on Advances in Language and Vision Research (ALVR) @ ACL 2024, Bangkok, Thailand. [paper] [github] [poster]
Hao Wang, Tetsuro Morimura, Ukyo Honda, Daisuke Kawahara. 2024. Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation. The 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop (NAACL SRW 2024), Mexico City, Mexico. [paper] [poster]
Takuya Uematsu, Hao Wang, Daisuke Kawahara, Tomohide Shibata. 2024. A Benchmark Suite of Japanese Natural Questions. The 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024) @ NAACL 2024, Mexico City, Mexico. [paper]
Hao Wang, Hirofumi Shimizu, and Daisuke Kawahara. 2023. Kanbun-LM: Reading and Translating Classical Chinese in Japanese Methods by Language Models. Findings of the Association for Computational Linguistics: ACL 2023 (Findings of ACL 2023), Toronto, Canada. [paper] [github] [demo] [poster]
Domestic Journal (Refereed)
Hao Wang, Hirofumi Shimizu, Daisuke Kawahara. Kaeriten Annotation and Kakikudashi Generation for Classical Chinese Poetry and Prose Using Language Models. Journal of Natural Language Processing (自然言語処理), 2024, Vol. 31, No. 1, pp. 135-154. [paper]
Domestic Conference (Non-Refereed)
Shohei Tanaka, Hao Wang, Yoshitaka Ushiku. SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters. The 27th Meeting on Image Recognition and Understanding (MIRU2024).
Hao Wang, Shogo Fujita, Shunsuke Kanda. Building a Sentence Embedding Model Specialized for Contract Clauses. The 260th IPSJ SIG Meeting on Natural Language Processing (NL260). [paper] [slide]
Hao Wang, Shuhei Kurita, Shuichiro Shimizu, Daisuke Kawahara. SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition. The 30th Annual Meeting of the Association for Natural Language Processing (NLP2024). [paper]
Ziqi Yin, Hao Wang, Kaito Horio, Daisuke Kawahara, Satoshi Sekine. Investigating the Relationship between Prompt Politeness and the Performance of Large Language Models. The 30th Annual Meeting of the Association for Natural Language Processing (NLP2024). Received a Sponsor Award (Mercari Award). [paper]
Takuya Uematsu, Hao Wang, Daisuke Kawahara, Tomohide Shibata. Construction of Japanese Natural Questions and BoolQ. The 30th Annual Meeting of the Association for Natural Language Processing (NLP2024). Received the Young Researcher Encouragement Award (first authors only). [paper]
Arseny Tolmachev, Masayoshi Hayashi, Takuro Niitsuma, Rintaro Enomoto, Hao Wang, Shuhei Kurita, Daisuke Kawahara, Kazuma Takaoka, Yoshitaka Uchida. Uzushio: A Distributed Huge Corpus Processor for the LLM Era. The 30th Annual Meeting of the Association for Natural Language Processing (NLP2024). [paper]
Kaito Horio, Eiki Murata, Hao Wang, Tatsuya Ide, Daisuke Kawahara, Takato Yamazaki, Kenta Shinzato, Ayafumi Nakamachi, Shengzhe Li, Toshinori Sato. Verification of Chain-of-Thought Prompting in Japanese. The 36th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI2023). [paper] [github] [slide]
Hao Wang, Hirofumi Shimizu, Daisuke Kawahara. Kaeriten Annotation and Kakikudashi Generation for Kanbun Using Language Models. The 29th Annual Meeting of the Association for Natural Language Processing (NLP2023). [paper] [poster]
Hao Wang, Ayafumi Nakamachi, Toshinori Sato. LoRA Tuning of Large-Scale Japanese Foundation Models. The 29th Annual Meeting of the Association for Natural Language Processing (NLP2023). [paper] [slide]
Other Presentations
Hao Wang, Daisuke Kawahara. Building Video Generation Models for Language Learning Support. The 19th Symposium of the Young Researcher Association for NLP (YANS2024).
Hao Wang, Tetsuro Morimura, Ukyo Honda, Daisuke Kawahara. Applying Reinforcement Learning to Non-Autoregressive Language Models. The 18th Symposium of the Young Researcher Association for NLP (YANS2023). [poster]
Hao Wang. Kanbun and Language Models. The 36th Research Seminar on Computer Applications to Oriental Studies. [slide]
Mizuki Kondo, Hao Wang, Tatsuya Ide, Shuntaro Ito, Ritvik Choudhary, Kentaro Kurihara, Daisuke Kawahara. Construction of Japanese BigBird. Workshop on Construction and Utilization of Japanese Language Resources (JLR2023, co-located with NLP2023). [slide]
Others
Scholarships & Funding
Google Gemma Academic Program for JP/KR 2024. 5,000 USD (GCP credits). 2024.
SPRING, Japan Science and Technology Agency. 200,000 JPY (monthly salary) + 500,000 JPY (research funds per year). 2024 - 2027.
Overseas Fellowship Program, Waseda University Future Robotics Organization. 2,200,000 JPY. 2024 - 2025.
Overseas Research Travel Grant Program, Waseda University. 110,000 JPY. 2024 - 2025.
Isao Okawa Scholarship for Information Technology Science, Waseda University. 200,000 JPY. 2024.
Azusa Ono Memorial Scholarship, Waseda University. 400,000 JPY. 2023.
Azusa Ono Memorial Scholarship, Waseda University. 400,000 JPY. 2022.
The Monbukagakusho Honors Scholarship for Privately-Financed International Students, Japan Student Services Organization (JASSO). 576,000 JPY. 2019 - 2020.
Services
Volunteer for ICRA 2024
Reviewer for ACL SRW 2024
Competitive Programming
AtCoder highest rating: 1513
Codeforces highest rating: 1937
TopCoder highest rating: 1292
Data Science Competitions (Kaggle)
M5 Forecasting – Accuracy: Estimate the unit sales of Walmart retail goods. (271/5558, Top 5%, Silver Medal)
Halite by Two Sigma: Collect the most halite during your match in space. (64/1139, Top 10%, Bronze Medal)
Google AI4Code – Understand Code in Python Notebooks. (28/1135, Top 5%, Silver Medal)
U.S. Patent Phrase to Phrase Matching. (128/1889, Top 10%, Bronze Medal)
Open Source Pre-trained Models