#interpretability

2 posts tagged interpretability.

Thoughts

We're not making AI more transparent; we're just building better debugging tools for black boxes.

Sparse autoencoders that turn LLM black-box internals into interpretable features you can actually use.