2024-07-29 07:23:41

Leon Brocard on Nostr

Do large language models break up text into smaller fragments similar to how humans do? To investigate this, I used ChatGPT-4o's tokenizer (o200k_base) to tokenize the first paragraph from https://en.wikipedia.org/wiki/Lexical_analysis. The boxes are tokens. Common (lower-valued) tokens are coloured red and rarer (higher-valued) tokens are coloured grey. #chatgpt
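A minimal sketch of how this kind of tokenization can be reproduced with the tiktoken library, which exposes the o200k_base encoding named in the post. The sample text and the numeric cutoff used to split "common" from "rare" token ids are assumptions for illustration, not details from the original note.

import tiktoken

# Load the o200k_base encoding (the tokenizer used by GPT-4o).
enc = tiktoken.get_encoding("o200k_base")

# Stand-in text; the post used the first paragraph of the Wikipedia
# article on lexical analysis.
text = "Lexical tokenization is conversion of a text into meaningful lexical tokens."

token_ids = enc.encode(text)

# Print each token's id and its decoded text. Lower-valued ids generally
# correspond to more common strings; the threshold below is an assumed
# cutoff purely for illustration.
for tid in token_ids:
    piece = enc.decode([tid])
    label = "common" if tid < 50000 else "rare"
    print(f"{tid:>7}  {label:<6}  {piece!r}")

Running this prints one line per token, which makes it easy to see where the tokenizer splits words and how frequent each fragment is relative to the rest of the vocabulary.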

Author Public Key
npub16s8m2f5kx4u66d2m688z67p325ls38v7jh2lvp4ns8wvet4jwwjs9tdmc0