John Schulman (@johnschulman2)'s Twitter Profile
John Schulman

@johnschulman2

Cofounder @openai, lead post-training for ChatGPT and the API. Interested in reinforcement learning, alignment, birds, jazz music

ID:1388977636618080256

02-05-2021 22:05:23

90 Tweets

38.7K Followers

609 Following

John Schulman (@johnschulman2)

I'd like to see some research on where the political and moral ideologies of RLHF'd language models come from. Make some questionnaires that measure a model's ideology. Create a variety of models with few-shot prompting, SFT, and RL; look at the ideology at each stage and how it

John Schulman (@johnschulman2)

'Trust region utilitarianism': there is a sensible utility function to maximize, but it's only valid locally around the current state of the world, where the intuitions that produced it are grounded.
'Repugnant conclusion' is outside trust region -- not a problem

John Schulman (@johnschulman2)

Coming soon to your favorite word processor
Ctrl-alt-V: 'paste and paraphrase'
also, 'paste and match writing style'

John Schulman (@johnschulman2)

A compelling intuition is that deep learning does approximate Solomonoff induction, finding a mixture of the programs that explain the data, weighted by complexity. Finding a more precise version of this claim that's actually true would help us understand why deep learning works

John Schulman (@johnschulman2)

I've been enjoying Richard Ngo's sci-fi writing at narrativeark dot xyz. It's a rare feat to combine these three properties: (1) about post-AGI worlds (2) plausible (3) actually fun to read.

John Schulman (@johnschulman2)

Stumbled upon this charming short story, 'Someday', by Isaac Asimov: nyc3.digitaloceanspaces.com/sffaudio-usa/m…. Features a language model called Bard, which the boys fine-tune on some recent data discussing itself and other LMs...

John Schulman (@johnschulman2)

Certain software skills are exceptionally useful for machine learning. In a previous era, it was GPU programming. Now in the era of pretrained models, it's front-end development -- to quickly whip up a UI to collect a fine-tuning or eval dataset.

John Schulman (@johnschulman2)

Handy trick: if you say something dumb, follow with 'that was just a temperature=1 sample, don't take it seriously'

Leo Gao (@nabla_theta)

Excited to share what I've been working on with John Schulman and Jacob Hilton!

We find that overoptimization of reward models can be modelled by simple functional forms with coefficients that scale smoothly with reward model size.

Paper: arxiv.org/abs/2210.10760

John Schulman (@johnschulman2)

Got access to Cruise driverless ride service today -- flawless pickup + 30 min drive + dropoff. A bit slow at intersections, but still very impressive!

TalkRL Podcast (@TalkRLPodcast)

Episode 38
OpenAI cofounder and inventor of PPO/TRPO John Schulman on RL from human feedback, tuning GPT-3 to follow instructions (InstructGPT) and answer long-form questions using the internet (WebGPT), AI alignment, AGI timelines, and more!
podcasts.apple.com/us/podcast/joh…

John Schulman (@johnschulman2)

This morning a couple local kids rang my doorbell and ran away. Glad kids are still playing outside and not spending all day on homework and Roblox

John Schulman (@johnschulman2)

An ML modeling problem that occurred to me while driving (maybe a good interview question): describe how to design a speech recognition system that preferentially decodes entities that are nearby (say, within 50 miles).

TalkRL Podcast (@TalkRLPodcast)

Glad to share that next episode we will be featuring OpenAI founder and researcher John Schulman!
With a focus on his recent work on RL from human feedback.
DM or reply with suggested questions

John Schulman (@johnschulman2)

A series of 4-month internships at companies and academic research groups could be a good replacement for an undergrad degree. Students would still go through coursework (perhaps online) but only as needed for job and interview prep.
