---
title: README
emoji: 🐨
colorFrom: pink
colorTo: indigo
sdk: static
pinned: false
---
# The Stack v2 Training Data
This organization contains the full datasets used to train StarCoder2:
- `the-stack-v2-train-full`: the training data covering 600+ programming languages, used to train StarCoder2-15B, with files concatenated per repository
- `the-stack-v2-train-full-files`: same as `the-stack-v2-train-full` but without repository concatenation, which makes it easier to filter by file or license
- `the-stack-v2-train-smol`: the training data covering 17 programming languages, used to train StarCoder2-3B and StarCoder2-7B, with files concatenated per repository
- `the-stack-v2-train-smol-files`: same as `the-stack-v2-train-smol` but without repository concatenation, which makes it easier to filter by file or license
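
The four names follow one pattern: `full` or `smol` for the language coverage, plus an optional `-files` suffix for the non-concatenated variant. The sketch below shows one way to pick a variant and stream it with the Hugging Face `datasets` library. The helper names, the `org` argument (this README does not state the organization id), and the `train` split are assumptions made for illustration; access may also require authentication.

```python
def dataset_name(smol: bool, per_file: bool) -> str:
    """Return one of the four dataset names listed in this README."""
    name = "the-stack-v2-train-" + ("smol" if smol else "full")
    return (name + "-files") if per_file else name


def stream_dataset(org: str, smol: bool = True, per_file: bool = True):
    """Stream the chosen variant without downloading it in full.

    `org` is a placeholder for the organization id, which the caller
    must supply. The "train" split name is an assumption.
    """
    # Imported lazily so dataset_name() works without the library installed.
    from datasets import load_dataset

    repo_id = f"{org}/{dataset_name(smol, per_file)}"
    return load_dataset(repo_id, split="train", streaming=True)


if __name__ == "__main__":
    # 17-language subset, one row per file (easier to filter).
    print(dataset_name(smol=True, per_file=True))
```

Streaming is used here because the per-file variants are intended for filtering, which can be done row by row without storing the whole dataset locally.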