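Systematic search over 1B-2B parameter mixture-of-experts (MoE) models. Best configuration: batch size 1 (bs=1) with context length 2048 (ctx=2048), reaching a loss of 0.32. Top-8 expert routing outperforms top-2.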
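To make "top-8 vs top-2 routing" concrete, here is a minimal plain-Python sketch of top-k gating. It is not the author's implementation. It assumes 16 experts, which is one reading of the "16x8" in the checkpoint names. It also assumes the common scheme where the router keeps the k highest logits and softmax-normalizes only those. The names `top_k_route` and `moe_forward` are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(logits, k):
    """Hypothetical router: keep the k highest-scoring experts and
    renormalize their gate weights so they sum to 1."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([logits[i] for i in chosen])
    return dict(zip(chosen, weights))

def moe_forward(x, experts, router_logits, k):
    """Output is the gate-weighted sum of the k selected experts' outputs;
    the remaining experts are never evaluated."""
    gates = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in gates.items())

# Assumed setup: 16 experts (one reading of "16x8"), toy router logits.
logits = [0.1 * i for i in range(16)]
print(sorted(top_k_route(logits, 2)))  # experts used under top-2
print(sorted(top_k_route(logits, 8)))  # experts used under top-8
```

With k=8, each token mixes eight experts instead of two. That costs more compute per token, while the total parameter count stays the same.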
- kshitijthakkar/moe-1083m-781m-16x8-8L-large-moe-1.3b-bs4-ctx1024 (Updated • 25)
- kshitijthakkar/moe-1083m-781m-16x8-8L-large-moe-1.3b-bs2-ctx2048 (Updated • 26)
- kshitijthakkar/moe-1083m-781m-16x8-8L-large-moe-1.3b-bs2-ctx1024 (Updated • 23)
- kshitijthakkar/moe-1083m-781m-16x8-8L-large-moe-1.3b-bs1-ctx2048 (Updated • 34)
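The four checkpoints differ only in their `bsN-ctxN` suffix. The sketch below parses that suffix and computes the tokens processed per optimizer step for each run. It assumes that bs is the per-step batch size, that ctx is the sequence length, and that no gradient accumulation is used, so tokens per step = bs × ctx. These are assumptions, not details confirmed by the collection. `parse_run` is a hypothetical helper.

```python
import re

CHECKPOINTS = [
    "kshitijthakkar/moe-1083m-781m-16x8-8L-large-moe-1.3b-bs4-ctx1024",
    "kshitijthakkar/moe-1083m-781m-16x8-8L-large-moe-1.3b-bs2-ctx2048",
    "kshitijthakkar/moe-1083m-781m-16x8-8L-large-moe-1.3b-bs2-ctx1024",
    "kshitijthakkar/moe-1083m-781m-16x8-8L-large-moe-1.3b-bs1-ctx2048",
]

# Matches the trailing "-bs<N>-ctx<N>" part of a repo name.
SUFFIX = re.compile(r"-bs(?P<bs>\d+)-ctx(?P<ctx>\d+)$")

def parse_run(name):
    """Hypothetical helper: extract batch size and context length from a
    checkpoint name and compute tokens per step, assuming no gradient
    accumulation (tokens_per_step = bs * ctx)."""
    m = SUFFIX.search(name)
    if m is None:
        raise ValueError(f"no bs/ctx suffix in {name!r}")
    bs, ctx = int(m["bs"]), int(m["ctx"])
    return {"name": name, "bs": bs, "ctx": ctx, "tokens_per_step": bs * ctx}

runs = sorted((parse_run(n) for n in CHECKPOINTS), key=lambda r: r["tokens_per_step"])
for run in runs:
    print(f'bs={run["bs"]} ctx={run["ctx"]} tokens/step={run["tokens_per_step"]}')
```

Under this assumption, the best run (bs=1, ctx=2048) and the bs=2, ctx=1024 run both process 2048 tokens per step. The other two runs process 4096.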