Tanat Tonguthaisri (@gastronomy)

Introducing a new blog post on interpretable data fusion for distributed learning via gradient matching. Unlike traditional methods, our approach offers human interpretability and maintains privacy. Check it out at: bit.ly/3UQeCok
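
The post itself isn't excerpted here, so as a rough illustration of what "gradient matching" typically means in this setting (in the spirit of dataset-condensation methods, not necessarily the blog's actual algorithm), here is a minimal PyTorch sketch: a small synthetic batch is optimised so that the gradients it induces on a shared model line up with the gradients computed from the real, private data, and only the synthetic batch would ever be shared.

```python
# Illustrative gradient-matching sketch; all names and details are assumptions,
# not the blog post's actual method.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Shared model whose gradients we try to match.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

# A real (private) batch and a small synthetic batch to be optimised.
x_real, y_real = torch.randn(128, 20), torch.randint(0, 2, (128,))
x_syn = torch.randn(10, 20, requires_grad=True)
y_syn = torch.randint(0, 2, (10,))
opt = torch.optim.Adam([x_syn], lr=0.05)

def flat_grad(loss, create_graph):
    # Gradient of the loss w.r.t. all model parameters, flattened into one vector.
    grads = torch.autograd.grad(loss, model.parameters(), create_graph=create_graph)
    return torch.cat([g.reshape(-1) for g in grads])

for step in range(200):
    g_real = flat_grad(F.cross_entropy(model(x_real), y_real), create_graph=False)
    g_syn = flat_grad(F.cross_entropy(model(x_syn), y_syn), create_graph=True)
    loss = 1 - F.cosine_similarity(g_real, g_syn, dim=0)  # align gradient directions
    opt.zero_grad()
    loss.backward()
    opt.step()
```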

Yifei Wang (@yifeiwang77)

I'll present our #ICLR2024 poster #167, 'Non-negative Contrastive Learning', tomorrow at 10:45am. A simple (one-line) technique that gives much better feature interpretability while improving (or at least maintaining) performance.

Welcome to drop by our poster and have a chat!
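
As a rough sketch of what the "one-line" change could look like, assuming it amounts to a ReLU-style non-negativity constraint on the contrastive features (see the paper for the exact formulation):

```python
# Sketch only: non-negativity assumed to be a ReLU on the features before InfoNCE.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.5):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature        # pairwise similarities between views
    labels = torch.arange(z1.size(0))         # matching views sit on the diagonal
    return F.cross_entropy(logits, labels)

encoder = torch.nn.Sequential(
    torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 16)
)
x1, x2 = torch.randn(8, 32), torch.randn(8, 32)   # two augmented views of a batch

loss_cl = info_nce(encoder(x1), encoder(x2))                    # standard contrastive loss
loss_ncl = info_nce(F.relu(encoder(x1)), F.relu(encoder(x2)))   # non-negative variant
```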
ちろひ (@chihiro_ribbon)

Zimmermann+, openreview.net/forum?id=OZ7aI…
Mechanistic interpretability is the attempt to decompose a DNN into fine-grained units and interpret how its internals work through reverse engineering. When interpretability was scored by how accurately humans could predict each unit's behaviour, no differences were observed across datasets or model sizes.

fly51fly (@fly51fly)

[CL] A Primer on the Inner Workings of Transformer-based Language Models
arxiv.org/abs/2405.00208
- The paper provides a concise technical introduction to interpretability techniques used to analyze Transformer-based language models, focusing on the generative decoder-only…

Harrison Kinsley (@Sentdex)

Okay, first pass over KAN: Kolmogorov–Arnold Networks, it looks very interesting!

Interpretability of the KAN model:
These days it may be treated mostly as a safety issue, but it can also serve as a form of interaction between the user and a model, as this paper argues and I…

Technical AI Safety Conference (TAIS) (@tais_2024)

In his talk at #TAIS, Stan van Wingerden shared the discoveries of singular learning theory and how they pave the way for fresh prospects in interpretability, mechanistic anomaly detection, and the exploration of inductive biases. He elaborated on his vision for the field's…
Simone Scardapane (@s_scardapane)

*A Primer on the Inner Workings of Transformer LMs*
by @javifer_96 @gsarti_ @AriannaBisazza @costajussamarta 

I was waiting for this! Cool comprehensive survey on interpretability methods for LLMs, with a focus on recent techniques (e.g., logit lens).

arxiv.org/abs/2405.00208
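
For readers who haven't met the logit lens mentioned above, the idea is to decode intermediate hidden states through the model's final layer norm and unembedding matrix, showing which token each layer currently favours. A minimal sketch using Hugging Face GPT-2 for concreteness (the attribute names are specific to that implementation):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

inputs = tok("The Eiffel Tower is located in the city of", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

for layer, h in enumerate(out.hidden_states):
    # Project the last position's hidden state through ln_f and the unembedding matrix.
    logits = model.lm_head(model.transformer.ln_f(h[:, -1]))
    print(f"layer {layer:2d} -> {tok.decode(logits.argmax(-1))!r}")
```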
John (@johnrachwan)

Just came across this paper introducing Kolmogorov-Arnold Networks (KANs): a breakthrough in deep learning that replaces fixed activation functions in MLPs with learnable activations on edges, achieving superior accuracy and interpretability compared to classic MLPs!

AI 2.0?
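
As a toy illustration of the learnable-activations-on-edges idea (not the paper's actual B-spline parameterisation), a KAN-style layer might look roughly like this, with one small learnable 1-D function per input-output edge and no fixed nonlinearity afterwards:

```python
# Toy KAN-style layer; the basis functions and shapes here are illustrative assumptions.
import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    def __init__(self, in_dim, out_dim, n_basis=8):
        super().__init__()
        # One set of coefficients per edge (input i -> output j): [out_dim, in_dim, n_basis]
        self.coef = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, n_basis))
        self.register_buffer("centers", torch.linspace(-2.0, 2.0, n_basis))

    def forward(self, x):                                    # x: [batch, in_dim]
        # Gaussian bumps of each scalar input: [batch, in_dim, n_basis]
        phi = torch.exp(-((x.unsqueeze(-1) - self.centers) ** 2))
        # Learnable 1-D function on every edge, then sum over incoming edges.
        edge_vals = torch.einsum("bin,oin->boi", phi, self.coef)
        return edge_vals.sum(dim=-1)                         # [batch, out_dim]

model = nn.Sequential(ToyKANLayer(2, 5), ToyKANLayer(5, 1))
print(model(torch.randn(4, 2)).shape)                        # torch.Size([4, 1])
```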

Javier Ferrando (@javifer_96)

[1/4] Introducing “A Primer on the Inner Workings of Transformer-based Language Models”, a comprehensive survey of interpretability methods and of the findings about how language models work that these methods have led to.

ArXiv: arxiv.org/pdf/2405.00208

Burny — Effective Omni (@burny_tech)

Researchers at MIT Propose ‘MAIA’: An Artificial Intelligence System that Uses Neural Network Models to Automate Neural Model Understanding Tasks
A Multimodal Automated Interpretability Agent

Alex Makelov (@AMakelov)

If you're at #ICLR2024, drop by poster #263 to hear about pitfalls of using subspace activation patching for interpretability! Joint work with @_georg_lange, Atticus Geiger, @NeelNanda5
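
For context, plain activation patching (the technique whose subspace variant the poster examines) works roughly as follows: run the model on a clean and a corrupted prompt, overwrite one layer's activation in the corrupted run with the cached clean activation, and measure how much of the clean behaviour is restored. A bare-bones sketch, using Hugging Face GPT-2 for concreteness and patching only the final token position:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

clean = tok("The Eiffel Tower is in", return_tensors="pt")
corrupt = tok("The Colosseum is in", return_tensors="pt")
layer = 6                                      # arbitrary middle layer
paris_id = tok(" Paris")["input_ids"][0]

# 1) Cache the clean run's activation at the chosen transformer block.
cache = {}
def save_hook(module, inputs, output):
    cache["clean"] = output[0].detach()
handle = model.transformer.h[layer].register_forward_hook(save_hook)
with torch.no_grad():
    model(**clean)
handle.remove()

# 2) Re-run the corrupted prompt, patching the clean activation into the last position.
def patch_hook(module, inputs, output):
    patched = output[0].clone()
    patched[:, -1] = cache["clean"][:, -1]
    return (patched,) + output[1:]
handle = model.transformer.h[layer].register_forward_hook(patch_hook)
with torch.no_grad():
    logits = model(**corrupt).logits[0, -1]
handle.remove()
print("logit for ' Paris' after patching:", logits[paris_id].item())
```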
AI Coffee Break with Letitia (@AICoffeeBreak)

Ever wondered how to interpret your ML models? 🤔
We explain a powerful interpretability technique: Shapley Values – can be used to explain any model, including LLMs!
💻 We show simple code for how to use them and 📖 dive into the theory behind them.
📺 youtu.be/5-1lKFvV1i0
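
As a flavour of the kind of code the video walks through, here is a small example using the shap package with a scikit-learn model (exact API details vary between shap versions, so treat this as a sketch):

```python
# Sketch of Shapley-value attributions with the shap library; dataset and model
# are arbitrary choices, not taken from the video.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)            # exact Shapley values for tree ensembles
shap_values = explainer.shap_values(X.iloc[:100])

# Summary plot: which features push predictions up or down, and by how much.
shap.summary_plot(shap_values, X.iloc[:100])
```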

Gašper Beguš (@begusgasper)

This amazing NSF workshop on New Horizons in Language Science will be streamed online!

My abstract is on the topic “What key future scientific opportunities lie at the interface between the study of human language and large language model development?”

Interpretability techniques…

UKP Lab (@UKPLab)

A warm welcome to Subhabrata Dutta, who has just started as a Postdoctoral Researcher at @UKPLab! 👋 Subhabrata joins us from @lcs2lab; his areas of interest are Mechanistic Interpretability and #LLM Security.

Find out more about him on GitHub: subha0009.github.io
Ziming Liu (@ZimingLiu11)

17/N Given our empirical results, we believe that KANs will be a useful model/tool for AI + Science due to their accuracy, parameter efficiency and interpretability. The usefulness of KANs for machine learning-related tasks is more speculative and left for future work.

Euan Ong (@euan_ong)

'Could a Mechanistic Interpretability Researcher Understand the Linux Kernel?'

(journals.plos.org/ploscompbiol/a…)

Dana Roemling (@danaroemling)

Together with Yves Scherrer and Aleksandra Miletić, I had a look at the explainability/interpretability of a machine learning approach in a forensic linguistic context. Our preprint is now out on arXiv: arxiv.org/abs/2404.18510

Jack Lindsey (@Jack_W_Lindsey)

I joined the mechanistic interpretability team at Anthropic recently, and it's been really exciting. To any comp. neuroscience followers -- I highly recommend following this literature, or getting involved yourself! You can keep up with our research here: transformer-circuits.pub

Chintan Parekh (@ChintanParekhAI)

Focus on interpretability to build trust in AI systems. Making your models transparent and understandable helps stakeholders see how decisions are made, boosting confidence in AI solutions. Clear, explainable AI is essential for widespread adoption.
