When working with Apple's foundation models, we often want to provide as much context as possible. However, the model has a context size limit of 4096 tokens. Is there a way to estimate the number of tokens beforehand?
How to Estimate Token Count Before Passing Context to Apple’s Foundation Model?
While the Foundation Models framework currently doesn't have a direct API for estimating tokens, you can use the Foundation Models Xcode Instrument to measure tokens.
Alternatively, to fit the maximum number of tokens, this is the trick I usually use:
- Send the maximum amount of "typical for your use case" content you'd like to send the model. Catch the exceededContextWindowSize error and print it to the Xcode console.
- The error printout will give you the token count of your content, something like:
  Content contains 9056 tokens, which exceeds the maximum allowed context size of 4096.
- Start cutting down on your input content until you get under the limit and no longer hit the error.
- Use your under-the-limit content length to estimate the allowable character length for your future content.
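To make the last two steps concrete, here is a rough Swift sketch of the arithmetic. Both helper names (reportedTokenCount and estimatedCharacterLimit) are hypothetical and not part of the Foundation Models framework. The sketch assumes the error text has the shape quoted above, with the token count as the first number, and that characters per token stays roughly constant for your kind of content, which is only an approximation.

```swift
import Foundation

/// Hypothetical helper: pulls the first integer out of a message such as
/// "Content contains 9056 tokens, which exceeds the maximum allowed context size of 4096."
/// Assumes the token count is the first number in the text.
func reportedTokenCount(in message: String) -> Int? {
    let asciiDigits: ClosedRange<Character> = "0"..."9"
    // Split on every non-digit character, leaving only runs of digits.
    let digitRuns = message.split(whereSeparator: { !asciiDigits.contains($0) })
    return digitRuns.first.flatMap { Int($0) }
}

/// Hypothetical helper: scales one measured sample (characters and tokens)
/// to a character limit for a given token budget.
/// Assumes a roughly constant characters-per-token ratio.
func estimatedCharacterLimit(sampleCharacters: Int, sampleTokens: Int, tokenBudget: Int) -> Int {
    guard sampleTokens > 0 else { return 0 }
    // Integer math: characters per token times the budget, rounded down.
    return sampleCharacters * tokenBudget / sampleTokens
}

let message = "Content contains 9056 tokens, which exceeds the maximum allowed context size of 4096."
if let tokens = reportedTokenCount(in: message) {
    // The character count here is a made-up sample measurement.
    let limit = estimatedCharacterLimit(sampleCharacters: 36_224, sampleTokens: tokens, tokenBudget: 4096)
    print("Measured \(tokens) tokens, estimated character limit: \(limit)")
}
```

Treat the result as an upper bound and trim a bit more in practice, since the ratio shifts with language and formatting.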
Note there's a common gotcha: you also need to leave enough of the context window for the model's response. As you get close to the context limit, you may see an exceededContextWindowSize error like:
Content contains 4092 tokens, which exceeds the maximum allowed context size of 4096.
While 4092 is less than 4096, this error means that the model can't finish its response without going over the limit. The solution is to cut down your input content length further until the model has enough context to generate its response.
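As a rough illustration of that budgeting, the sketch below subtracts a reserve for the response from the 4096-token limit before checking the input. The function name inputTokenBudget and the 1000-token reserve are assumptions for illustration only; the right reserve depends on how long your responses tend to be.

```swift
/// Hypothetical helper: tokens left for input once part of the
/// context window is set aside for the model's response.
func inputTokenBudget(contextWindowSize: Int, reservedForResponse: Int) -> Int {
    // Never return a negative budget if the reserve exceeds the window.
    return max(0, contextWindowSize - reservedForResponse)
}

let contextWindowSize = 4096      // limit quoted in the error message
let reservedForResponse = 1000    // assumed reserve, tune for your use case
let budget = inputTokenBudget(contextWindowSize: contextWindowSize,
                              reservedForResponse: reservedForResponse)

// An input of 4092 tokens is under 4096 but leaves no room for a response.
let inputTokens = 4092
print(inputTokens <= budget ? "Input fits" : "Input too long, trim to \(budget) tokens or fewer")
```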
Thank you!
Many thanks for your feedback!