Post AwJFG440BuxMr8aeJ6 by [email protected]
Post #AwJFG3nhAXue2YxctE by [email protected]
0 likes, 0 repeats
My naive understanding of LLMs is that they start by converting a token into an…
Post #AwJFG3vqgER0Rqm8bA by [email protected]
0 likes, 0 repeats
I asked this question as @[email protected] but I realize it's more of a M…
Post #AwJFG440BuxMr8aeJ6 by [email protected]
0 likes, 0 repeats
@dneary @fclc @[email protected] I don't know the details of the embe…
Post #AwJFG4DDdeKTJiu0fo by [email protected]
0 likes, 0 repeats
@dneary @fclc (3) because of the composition of many ReLUs to get their (or wha…
Post #AwJFG4JxEbiVec3OAi by [email protected]
0 likes, 1 repeats
@dwmalone @dneary great place to start is the original attention paper: https:/…
Post #AwJFG4iPlfHasTUvIW by [email protected]
0 likes, 0 repeats
My very naive question is why we make M and N 4096 and 14554 (or whatever) inst…