admarcosai's Collections: Model Architectures
togethercomputer/StripedHyena-Hessian-7B
Text Generation • 8B
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Paper • 2312.08618
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Paper • 2312.07987
LLM360: Towards Fully Transparent Open-Source LLMs
Paper • 2312.06550
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Paper • 2312.12742
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper • 2312.15166
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Paper • 2401.04658
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
Paper • 2401.17377
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Paper • 2311.12351
H2O-Danube-1.8B Technical Report
Paper • 2401.16818
TinyLlama: An Open-Source Small Language Model
Paper • 2401.02385
Learning and Leveraging World Models in Visual Representation Learning
Paper • 2403.00504