Introducing a new blog post on interpretable data fusion for distributed learning via gradient matching. Unlike traditional methods, our approach offers human interpretability and maintains privacy. Check it out at: bit.ly/3UQeCok #machinelearning #datascience
*A Primer on the Inner Workings of Transformer LMs*
by Javier Ferrando, Gabriele Sarti, Arianna Bisazza, Marta R. Costa-jussa
I was waiting for this! Cool comprehensive survey on interpretability methods for LLMs, with a focus on recent techniques (e.g., logit lens).
arxiv.org/abs/2405.00208
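For readers unfamiliar with the logit lens mentioned above: the core idea is to take an intermediate hidden state from the residual stream and decode it with the model's final normalization and unembedding, as if it were the last layer's output. Here is a minimal numpy sketch under the assumption of a LayerNorm-then-unembed head; the function name and parameters (`ln_gamma`, `ln_beta`, `W_U`) are illustrative, not from any specific library.

```python
import numpy as np

def logit_lens(hidden_state, ln_gamma, ln_beta, W_U, eps=1e-5):
    """Project an intermediate hidden state to vocabulary logits.

    Applies the final LayerNorm (gain ln_gamma, bias ln_beta) and the
    unembedding matrix W_U, mimicking what the model does at its last layer.
    """
    mu = hidden_state.mean(axis=-1, keepdims=True)
    var = hidden_state.var(axis=-1, keepdims=True)
    normed = (hidden_state - mu) / np.sqrt(var + eps) * ln_gamma + ln_beta
    return normed @ W_U  # logits over the vocabulary

# Toy example with random weights, just to show the shapes involved
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 16
h = rng.normal(size=d_model)                      # hidden state at some layer
W_U = rng.normal(size=(d_model, vocab_size))      # unembedding matrix
logits = logit_lens(h, np.ones(d_model), np.zeros(d_model), W_U)
print(logits.shape)  # (16,)
```

Running this per layer and inspecting the top-scoring tokens shows how the model's "current guess" evolves through depth, which is exactly the kind of analysis the survey covers.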
If you're at #ICLR2024, drop by poster #263 to hear about pitfalls of using subspace activation patching for interpretability! Joint work with Georg Lange, Atticus Geiger, and Neel Nanda.
Focus on interpretability to build trust in AI systems. Making your models transparent and understandable helps stakeholders see how decisions are made, boosting confidence in AI solutions. Clear, explainable AI is essential for widespread adoption. #TrustInAI #ExplainableAI