1 post tagged llm-internals.
Sparse autoencoders turn an LLM's black-box internals into interpretable features you can actually use.