Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models
Under review, 2023
Recommended citation: Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Jungo Kasai, David R. Mortensen, Noah A. Smith, and Yulia Tsvetkov. 2023. "Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models." Under review. https://arxiv.org/abs/2305.13707