Introduction
Post AwJFG4DDdeKTJiu0fo by [email protected]
Post #AwJFG3nhAXue2YxctE by [email protected]
0 likes, 0 repeats
My naive understanding of LLMs is that they start by converting a token into an…
Post #AwJFG3vqgER0Rqm8bA by [email protected]
0 likes, 0 repeats
I asked this question as @[email protected] but I realize it's more of a M…
Post #AwJFG440BuxMr8aeJ6 by [email protected]
0 likes, 0 repeats
@dneary @fclc @[email protected] I don't know the details of the embe…
Post #AwJFG4DDdeKTJiu0fo by [email protected]
0 likes, 0 repeats
@dneary @fclc (3) because of the composition of many ReLUs to get their (or wha…
Post #AwJFG4JxEbiVec3OAi by [email protected]
0 likes, 1 repeat
@dwmalone @dneary great place to start is the original attention paper: https:/…
Post #AwJFG4iPlfHasTUvIW by [email protected]
0 likes, 0 repeats
My very naive question is why we make M and N 4096 and 14554 (or whatever) inst…
You are viewing proxied material from pleroma.anduin.net. The copyright of proxied material belongs to its original authors. Any comments or complaints in relation to proxied material should be directed to the original authors of the content concerned. Please see the disclaimer for more details.