CLISS (@CLISS2024)

It's that time of year! Learn all about Harmful Algal Blooms and who's tracking them in Clear Lake at CLISS 2024! Until then, check out the Clear Lake Water Quality page, which posts all current HAB results.



Ofir Press (@OfirPress)

DeepMind's Gopher and BigScience's BLOOM already use relative position embeddings, but most other language models don't. I believe we should all start using relative positioning.

In this new post, I discuss the use case for relative position methods:
ofir.io/The-Use-Case-f…

松xR (@matsu_vr)

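I tried running bloomz.cpp, which you could call the BLOOMZ counterpart of the tools for running LLMs locally, on a first-generation M1 MacBook Air (16GB of memory). It worked! I tested the 7-billion-parameter bigscience/bloomz-7b1. When I asked who the US president is, it answered G.W. Bush, though. Responses are also fast at 7B.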
github.com/NouamaneTazi/b…

AK (@_akhaliq)

BigScience: A Case Study in the Social Construction of a Multilingual Large Language Model
abs: arxiv.org/abs/2212.04960

Saulnier Lucile (@LucileSaulnier)

Wondering how one can create a dataset of several TB of text data to train a language model?📚

With the BigScience Research Workshop (@BigscienceW), we have been through this exercise and shared everything in our NeurIPS 2022 paper 'The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset'

🧵

Edouard d'Archimbaud (@edarchimbaud)

With the ability to train self-supervised models, research is scaling up both data and model size at an impressive rate. Hugging Face BigScience is a new effort to establish good practices in data curation.
bit.ly/3xog6uR

Taiga (@tg3517)

An open-source LLM with 170 billion parameters. Performance looks good too. It seems to be around 300GB.
bigscience/bloom · Hugging Face huggingface.co/bigscience/blo…

BigScience Research Workshop (@BigscienceW)

Excited to announce the BigScience Biomedical Hackathon! Together we're creating an open source, community resource of over 150 biomedical datasets. Join us! 🙌

🌸 Our mission: hfbigbio.github.io
🚀 Contribute: github.com/bigscience-wor…

yoheLab (@LabYohe)

See the Yohe Lab featured here for its contributions to the new Center for Computational Intelligence to Predict Health and Environmental Risks (CIPHER), with @CLT_CCI, @UNCC_BIGScience, and @UNCCBiology! 🦇👩‍🔬🦠🔬👩‍💻📊
features.charlotte.edu/laurel-yohe

Aran Komatsuzaki (@arankomatsuzaki)

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Documents the data creation and curation efforts of ROOTS corpus, a 1.6TB dataset used to train BLOOM

Releases a large initial subset of the corpus

data: huggingface.co/bigscience-data
abs: arxiv.org/abs/2303.03915

ISIS Neutron and Muon Source (@isisneutronmuon)

🎊 15 years ago today, Target Station 2 detected its first neutrons! 🎉
To celebrate, here is a collection of some of our favourite science highlights from each of our fantastic TS2 instruments across the years... isis.stfc.ac.uk/Pages/15-Years…

Science and Technology Facilities Council

Louis Maddox (@permutans)

Another LLM not instruction tuned on OpenAI-derived data, but on P3 huggingface.co/datasets/bigsc… (came from BigScience T0 arxiv.org/abs/2110.08207)

After Dolly v2: Pythia-12b instruction tuned on Databricks' in-house dataset (v1 was Alpaca, derived from OpenAI) databricks.com/blog/2023/04/1…
