TweetDeck : (((ل()(ل() yoav)))) This is a deliberately obtuse misreading of our paper. We aren't talking about using LMs to study the terabytes of data produced by Hemingway nor about LMs used to study reddit. We ground our discussion in the particular trend of large "general purpose" LMs overviewed in Sec 2.

TweetDeck : Emiel van Miltenburg He seems to have backed off to "the data as it is", which isn't even "the language that was used" but "the language that was collected".

I totally agree that there are uses of LM technology in studying what people say, but that's different from "general purpose" LMs.

TweetDeck : (((ل()(ل() yoav)))) Also: you're the one who chose to cite Lissack in your piece, despite the fact that he has been harassing Timnit and me on Twitter and that his premise seems to be that it is unscientific to leave it as unspoken that white supremacy is bad. So yeah, I'm not gonna let that slide.

TweetDeck : (((ل()(ل() yoav)))) But "the data as it is" is completely incoherent. Data sets only exist when they have been collected, representing a whole series of decisions in terms of what to collect & how to filter, etc. There's no there there to represent!

Twitter Web App : The claim that this kind of scholarship is "political" and "non-scientific" is precisely the kind of gate-keeping move set up to maintain "science" as the domain of people of privilege only. /fin

Twitter Web App : We draw on scholarship from a range of fields that looks at understanding how systems of power and oppression work in society. >>

Twitter Web App : And lastly, miss me with the claim that our work is "political" and therefore has a responsibility to "present the alternative views".

Twitter Web App : Furthermore, the "debate" you would like us to acknowledge is based on a false premise. As we lay out in detail in Sec 4, the training data emphatically do NOT represent "the world as it is". >>