Introducing a new blog post on interpretable data fusion for distributed learning via gradient matching. Unlike traditional methods, our approach offers human interpretability and maintains privacy. Check it out at: bit.ly/3UQeCok #machinelearning #datascience
*A Primer on the Inner Workings of Transformer LMs*
by Javier Ferrando, Gabriele Sarti, Arianna Bisazza, Marta R. Costa-jussa
I was waiting for this! Cool comprehensive survey on interpretability methods for LLMs, with a focus on recent techniques (e.g., logit lens).
arxiv.org/abs/2405.00208
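For readers unfamiliar with the logit lens mentioned above: the core idea is to take an intermediate hidden state from the residual stream and decode it with the model's final normalization and unembedding, as if it were the last layer's output. Here is a minimal numpy sketch under the assumption of a LayerNorm-then-unembed head; the function name and parameters (`ln_gamma`, `ln_beta`, `W_U`) are illustrative, not from any specific library.

```python
import numpy as np

def logit_lens(hidden_state, ln_gamma, ln_beta, W_U, eps=1e-5):
    """Project an intermediate hidden state to vocabulary logits.

    Applies the final LayerNorm (gain ln_gamma, bias ln_beta) and the
    unembedding matrix W_U, mimicking what the model does at its last layer.
    """
    mu = hidden_state.mean(axis=-1, keepdims=True)
    var = hidden_state.var(axis=-1, keepdims=True)
    normed = (hidden_state - mu) / np.sqrt(var + eps) * ln_gamma + ln_beta
    return normed @ W_U  # logits over the vocabulary

# Toy example with random weights, just to show the shapes involved
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 16
h = rng.normal(size=d_model)                      # hidden state at some layer
W_U = rng.normal(size=(d_model, vocab_size))      # unembedding matrix
logits = logit_lens(h, np.ones(d_model), np.zeros(d_model), W_U)
print(logits.shape)  # (16,)
```

Running this per layer and inspecting the top-scoring tokens shows how the model's "current guess" evolves through depth, which is exactly the kind of analysis the survey covers.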
If you're at #ICLR2024, drop by poster #263 to hear about pitfalls of using subspace activation patching for interpretability! Joint work with Georg Lange, Atticus Geiger, and Neel Nanda.
Focus on interpretability to build trust in AI systems. Making your models transparent and understandable helps stakeholders see how decisions are made, boosting confidence in AI solutions. Clear, explainable AI is essential for widespread adoption. #TrustInAI #ExplainableAI