Beren Millidge (@BerenMillidge)

Understanding Intelligence.

ID:1258771704995815424

https://www.beren.io · Joined 08-05-2020 14:52:20

235 Tweets · 1.2K Followers · 151 Following

Quentin Anthony (@QuentinAnthon15)

State-space models (SSMs) like Mamba and mixture-of-experts (MoE) models like Mixtral both seek to reduce the computational cost to train/infer compared to transformers, while maintaining generation quality.

Learn more in our paper: zyphra.com/blackmamba

Tommaso Salvatori (@TommSalvatori)

New preprint :)
Predictive coding networks allow us to perform Bayesian inference on continuous state variables. In this work, we go beyond Bayesian inference, and show how to perform interventional and counterfactual inference.

Stella Biderman (@BlancheMinerva)

New work from Conjecture explores the evolution of LLM internals over the course of training using EleutherAI's Pythia suite! Very cool work with some surprising results.

lesswrong.com/posts/2JJtxitp…

Tommaso Salvatori (@TommSalvatori)

New preprint :)

“INCREMENTAL PREDICTIVE CODING: A PARALLEL AND FULLY AUTOMATIC LEARNING ALGORITHM”

Efficiency problems when training predictive coding networks? Bad local minima?

We provide a simple trick to address these:

arxiv.org/pdf/2212.00720…

(A thread)

Tommaso Salvatori (@TommSalvatori)

Preprint :)

biorxiv.org/content/10.110……

“Recurrent predictive coding models for associative memory employing covariance learning”

Here, we present a family of predictive coding models that also learn the statistical information needed for associative memory.
A Thread:

Tommaso Salvatori (@TommSalvatori)

Preprint :)
arxiv.org/pdf/2211.03481…

'Predictive Coding Beyond Gaussian Distributions'

Have you ever tried to train a transformer model using Rao and Ballard's predictive coding (PC) framework?

It doesn't work. Why? Because Gaussian assumptions are too limiting!
A thread 1/n

Tommaso Salvatori (@TommSalvatori)

Happy to announce that the paper “Learning on Arbitrary Graph Topologies via Predictive Coding” has been accepted to NeurIPS ’22! In this paper, we show how to train networks with ANY topology and perform MANY tasks simultaneously. arxiv.org/pdf/2201.13180…

q(Alex Kiefer | everything else) (@exilefaker)

Preprint! I’m excited to share a new paper from the VERSES Research Lab, 'Capsule Networks as Generative Models', written with Beren Millidge, Alec Tschantz, and Chris L Buckley.
arxiv.org/abs/2209.02567 1/2

Maxwell Ramstead (@mjdramstead)

🚨 Preprint alert 🚨 “On the Map-Territory Fallacy Fallacy,” by Dalton A R Sakthivadivel, Karl Friston, and yours truly (that’s right—you read the title correctly) 1/10
arxiv.org/abs/2208.06924

Akseli Ilmanen (@akseli_ilmanen)

If you have some spare time (or 'free energy' :=)), listen to the latest episode with Beren Millidge about the Free Energy Principle, Active Inference, and Reinforcement Learning!

anchor.fm/the-embodied-a…
