Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models

Under review, 2023

Recommended citation: Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Jungo Kasai, David R. Mortensen, Noah A. Smith, and Yulia Tsvetkov. 2023. "Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models." Under review. https://arxiv.org/abs/2305.13707
