AI & ML interests

Character Database of Bangumis (If you need character LoRAs, see: https://huggingface.co/CyberHarem)

Recent Activity

AbstractPhil posted an update about 6 hours ago
I'll attempt to expand geolip-clip to a full-sequence context window to support sequential learning.
AbstractPhil/geolip-clip-vit-large-patch14-ctx576
The memory pod is specifically meant to tune everything based on final-state pooling, which is fine if you aren't trying to actually use the sequence itself.
HOWEVER, many elemental biases present themselves when you try to USE the standard 77-token sequence in conjunction with this final pooled state. Even though the standard 77 is predominantly noise past token 10, it still houses a considerable amount of useful information, so it should be handled carefully. Zero-shot structures are tricky to analyze, especially ones built on attention mechanisms rather than true sequential accumulation; I've noticed I need to watch them for quite a while before the real bugs show up.

As it stands the token pool is essentially [B, 7+8, 768] for pools. This contains a robust and highly complex representation of useful accumulated bidirectional attention data, so it's quite powerful.

I'll build a few prototypes and tap into some papers. I'll either come up with something or a reason why I didn't. The end result will either produce an anchor bank of tokens shaped [B, 15, 768] for pooling, or ideally [B, 15, 77, 768] - which should expand the CLIP's sequence to 1,155 if successful. That doesn't necessarily mean this sequence will be more useful than the [B, 15, 768] variant, but it will be representationally valid for the context-window expansion.

I wouldn't hold out for a single full-sequence option in a single day, that's a lot of moving parts to analyze, not to mention highly impractical to train with. A smaller dose of this information would be necessary for rapid prototyping so it'll likely be packaged as such.
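The shape arithmetic above can be sketched with dummy tensors (NumPy stand-ins; all variable names are illustrative, not from the repo):

```python
import numpy as np

B, EXPERTS, SEQ, DIM = 2, 15, 77, 768  # batch, anchor-bank slots, CLIP sequence, hidden dim

# Pooled anchor bank: one 768-dim vector per slot.
pooled_bank = np.zeros((B, EXPERTS, DIM))

# Full-sequence anchor bank: every slot keeps its whole 77-token sequence.
full_bank = np.zeros((B, EXPERTS, SEQ, DIM))

# Flattening the slot and sequence axes gives the expanded context window: 15 * 77 = 1,155.
expanded = full_bank.reshape(B, EXPERTS * SEQ, DIM)
assert expanded.shape == (B, 1155, DIM)
```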
  • 3 replies

AbstractPhil posted an update 3 days ago
geolip-bertenstein-v1 - 5 experts chosen. A collective of shared, transformer-aligned experts - similar to a mixture of experts (MoE), but not quite. This first prototype won't have the full mailing projection relay system afforded by the geofractal router, but it will definitely be a solid prototype.

It is not production ready yet; a few upstream and downstream tools are still needed to consume and process the outputs into useful representations.

This model will be able to respond to text, use whisper, see with dinolip, code with codebert, and process proteins using esm2_t33_650m_ur50.

Our experts for the prototype are:
google-bert/bert-large-uncased
facebook/dinov2-large
microsoft/codebert-base
openai/whisper-large-v3
facebook/esm2_t33_650M_UR50

Not the smartest text model, but more than enough for this preliminary test setup. Text is predominantly meant to align and orient downstream function; the entire machine is meant to be operated unilaterally as a collective, or independently through individual pair requests via special-token access.
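A toy sketch of what "individual pair requests via special token access" could look like (the token names and routing function are hypothetical; only the expert IDs come from the list above):

```python
# Hypothetical special-token routing table. The expert IDs are from the post;
# the token names themselves are made up for illustration.
EXPERT_ROUTES = {
    "<text>":    "google-bert/bert-large-uncased",
    "<vision>":  "facebook/dinov2-large",
    "<code>":    "microsoft/codebert-base",
    "<audio>":   "openai/whisper-large-v3",
    "<protein>": "facebook/esm2_t33_650M_UR50",
}

def route(prompt: str) -> list:
    """Return expert IDs whose special tokens appear in the prompt;
    with no special tokens present, fall back to the whole collective."""
    hits = [model for token, model in EXPERT_ROUTES.items() if token in prompt]
    return hits or list(EXPERT_ROUTES.values())
```

A prompt containing "<code>" would engage only the CodeBERT expert, while a bare prompt runs the full collective.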

This model will be capable of substantial power and feats as a prototype. It will be capable of seeing and processing differential equations utilizing dinov2 and esm2 data simultaneously, which can be used for downstream analysis - and I WILL use that data to create a more powerful connection between dinov2 tokens, protein tokens, video tokens, code tokens, and audio tokens.

This is the FIRST prototype of this case, and I will introduce video, genetics, shape analysis, pattern recognition processing, and a much more powerful and reusable text model.

The tests show the models can communicate differentially through the geolip transformers after pairwise Procrustes analysis and pentachoron CV protective measures.

Whitening Procrustes for precalculation and center-aligning allows faster convergence, so that should help too.
  • 2 replies

AbstractPhil posted an update 4 days ago
I've... done it. This, with experts, achieves near-100% R@1 retrieval accuracy on an adjacent dataset - unseen by the fusion transformer - after around 40k steps on the seen dataset. This means the models' languages are genuinely fused within the constraints, not just projected or estimated.
AbstractPhil/geolip-procrustes

I encourage EVERYONE who is curious to check my work. Check it, double check it, and triple check it.

These were aligned using COCO and then validated with Flickr. Entirely different datasets. The experts arbitrated and the alignment yielded the correct answers. Preliminary tests show that with almost no alignment requirement, the models can reach 100% R1 retrieval accuracy.

Not to be confused with validation accuracy for a classification model or a text encoder's responses: this allows multispectral communication between entirely different models for direct downstream consumption, with almost no training for the chosen models.
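For anyone double-checking, R@1 retrieval accuracy here means the paired item is the single nearest neighbor under cosine similarity. A minimal sketch on synthetic embeddings (illustrative only):

```python
import numpy as np

def recall_at_1(img_emb, txt_emb):
    """Fraction of rows whose highest-cosine match in the other set is its own pair."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sim = img @ txt.T                                   # [N, N] similarity matrix
    return float((sim.argmax(axis=1) == np.arange(len(sim))).mean())

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 32))
perfect = recall_at_1(emb, emb)                         # identical spaces
chance = recall_at_1(emb, rng.normal(size=(100, 32)))   # unrelated spaces, near 1/N
```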

I have a working Procrustes experiment that learns adjacent manifolds within a reasonable spectrum, and the speed is... well, 1 epoch on COCO with BERT-Large and DINOv2 lets the models align nearly perfectly. For some scales in the experiment, the 3 scheduled epochs aren't quite enough to push R@1 to its peak, while many scales align nearly immediately.

These two were an obvious pair to pick, 60% similarity and >90% spectral similarity.

The trainer transfers layers, learns embeddings, and more - all by sticking strictly to geometric boundaries and procrustes informational accumulation within a modulation model's constraints.

I have many experiments to run.
  • 1 reply

AbstractPhil posted an update 6 days ago
The small projection-based approximator model for the geolip patchwork did not breach a certain level of accuracy as required by my specifications, so I've defaulted to harvesting direct geometric information from AI models until I get the comparative bounds required for a useful topology.

I must sincerely apologize for not solving this problem quickly.

This will take time. Without the approximator it's going to be considerably slower, but the model I'm beginning to train will provide the approximations a different way over time. As iterations progress, the system will conform to a huge array of geometric potentials and become capable of predicting them, but it won't be as powerful as the full patchmaker up front, and training will be slow.

If I can get my hands on a cluster of A100's or H100's for a measure I'll make a post immediately, until then I must default to the slower process.

I really banked on the smaller version working, but it simply couldn't hold complex topological shape while simultaneously keeping the correct boundaries learnable AND enduring entropic decay. The only way to have a real shot at a full geometric shared language is to make those boundaries learnable across the full spectrum of potentials, or at least more of it than I allowed.

I'll be refining my process in the coming days further, and I do apologize for pre-emptively announcing a potential that I have yet to fully explore.

There will be a fully upgraded 38-shape geolip patchwork trained ASAP to fully encompass the Flux 1 AE spectrum, and another trained for SD15, SDXL, and Flux 2's VAE as well. These will accommodate DIRECT complex geometric patchwork learning, but not yet at the scale promised. Autoregression is a complex mistress, as many of you know, and I will be spending a great deal of time and compute analyzing the information required to build a uniformly useful and powerful autoregression patchwork to use as an invariance for teaching.
  • 2 replies

AbstractPhil posted an update 12 days ago
GLIP - Geometric Linear Interpolative Patchwork aka geolip.
https://github.com/AbstractEyes/glip-autoencoder

To tinker with the topology directly you can play with it here - I admit it's imperfect in this form, but it's quite the tinker toy for seeing the effects of patching.
https://claude.ai/public/artifacts/697287e4-fa18-4753-8b57-904d5e2022ed



This is the repo that will contain the next experimental stage, which is based entirely on the research and structural boundaries applied by said research. It'll be a little rigid while I get Claude set up.

In order to directly train these layered topological response patchworks you must install and use the geovocab2, geofractal, and wide_compiler repos.

This is because wide_compiler supplies wide_linear's high-speed ensemble processing, geovocab2 supplies the factory structure with multiple formulas (including highly efficient designs meant for kernel compilation), and geofractal supplies a series of reusable utilities, including some of the more complex losses and the hard-to-tune gate structures surrounding them.

Many of the underlying formulas are outlined here:
AbstractPhil/geometric-experiment-history

Utilization and training USING the pretrained or untrained geolip patchwork will be as simple as loading the model in PyTorch, and will require no external dependencies beyond the geolip package, NumPy, or PyTorch, depending on the task. It will come packaged with recommended losses, but I encourage experimentation because I simply cannot cover every spectrum.

More details to come as development progresses. The system is coming together and the state of the utilizable autoencoder will be ready within a couple weeks. The entire system is built for convenience and reusability, so the structure will be built similarly to autoencoder systems that currently exist, with a few tweaks here and there for important elements - so the interface will be familiar to those who use it.
  • 6 replies

AbstractPhil posted an update 20 days ago
The Rosetta Stone geometric vocabulary and the ramping up capacity.

What makes this particular invariant special is that it exists within all structures I've tested so far. I had Claude write up the article directly from what we built together, but I've tested it on many substructures. It is flawed, and I have a series of answers for making it more accurate.

First, a reconstruction from the ground up. Each shape is built upward from the substructure to the point of inductive deviance. This will be slower at first and then gain speed as I optimize, like the last system did.

The "saddle" problem: the system detected saddles because there wasn't enough deviance in the shapes to attenuate toward higher cardinality and better-aligned substructures. The blobs were around 30-40% of the overall patches, which, interpolated into the others, produced a fair approximation.
It MOST DEFINITELY did see those shapes in their voxel complexity. This is real.

https://claude.ai/public/artifacts/bf1256c7-726d-4943-88ad-d6addb263b8b
You can play with a public claude artifact dedicated to viewing the current shape spectrum - and with that know exactly why it's flawed.

The flawed and repetitive shapes: I rapid-prototyped, and there are multiple redundant shapes that simply don't classify well or at all. Not to mention the rotation simply doesn't help much of the time, or doesn't exist for many shapes. This will be rectified in the next variation.

Projecting to shared latent space as a catalyst to allow growing subjective geoflow matched step variance, rather than simply direct classification. This will theoretically allow for full channel-to-channel invariant features to be mapped from structure to structure, and the very formula that encapsulated them to be directly baked into the math rather than classified as a substructure analysis.

There are many challenges between here and there, so stay tuned my friends as I plot the geometric language of pretrained AI.
  • 2 replies

AbstractPhil posted an update about 1 month ago
GeoFlow update - two training runs on the pentachoron geometric prior (4.8M params modulating frozen SD1.5).

10k ImageNet run fixed fragmented anatomy and spatial coherence in 7 minutes.

50k object-relations run taught actual compositional reasoning - "red cup on top of blue book" goes from a floating cup to correctly placed on the book.

Most interesting finding: learning happens in two phases. Object association locks first (~500 steps), spatial arrangement crystallizes after. You can watch it happen - "three candles in a triangle on a wooden tray" starts as candles side by side, then reorganizes into proper triangular formation. The tray itself rendered as a pentagon. Five vertices in, five sides out. The simplex is thinking in its own geometry.

Loss sits around 0.4 the entire time yet composition steadily improves. The prior nudges conditioning, it doesn't overwrite it.
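The "nudges, doesn't overwrite" behavior can be sketched as a small residual on top of frozen conditioning (NumPy stand-in; the actual prior is the 4.8M-param model above, not this toy transform):

```python
import numpy as np

rng = np.random.default_rng(0)
cond = rng.normal(size=(1, 77, 768))   # stand-in for frozen SD1.5 text conditioning

def prior_nudge(c, scale=0.05):
    """Toy residual prior: adds a bounded delta instead of replacing the conditioning."""
    delta = np.tanh(0.01 * (c @ rng.normal(size=(768, 768))))  # each entry in (-1, 1)
    return c + scale * delta

nudged = prior_nudge(cond)
drift = np.linalg.norm(nudged - cond) / np.linalg.norm(cond)
# drift stays below `scale`, so the base conditioning survives intact
```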

Weights:
AbstractPhil/sd15-geoflow-object-association
Dataset:
AbstractPhil/synthetic-object-relations
Formulas:
AbstractPhil/sd15-rectified-geometric-matching

Next up - measuring the exact entropy decay inflection point across layers to enable branching the simplex into parallel paths with different anchor deviations. Geometric ensemble attention where the branches disagree on purpose.
AbstractPhil posted an update about 1 month ago
AbstractPhil/tinyflux-experts
Introducing the "blot" expert, sd15-flow-sol. The twin-sister flow-matching experts for tinyflux-lailah, sd15-flow-lune AND sd15-flow-sol, will be used in tandem to train tinyflux-Lailah. sd15-flow-sol never managed to reach full flow-matching prediction, so an epsilon-to-vpred conversion is required. All experts will live in the tinyflux-experts repo, including all the critical checkpoint sets.
Lune was heavily finetuned in the sd3-style, adapted shift-timestep system after David's interpolation converted sd15 into a geometric basis.
Sol was left abandoned after 50 epochs with David and was considered overcooked and rigid, until I noticed its geometric structure today. Lune doesn't produce geometric structure as solid as Sol's, not even close. Lune produces improved fidelity and detail, but Sol produces something very different: aligned to sd15's behavior, and fully representative of the 5-point 4-simplex structure that David brought to the table.

Sol is essentially a nearly perfect blob-forming geometric blotter. Sol is SD15, and yet SOL was trained using a specific pattern recognizing and timestep aligned David model. David was tasked with classifying timesteps and patterns using complex deep-recognition structural analysis layer-by-layer, determining full-scale opinions after watching the entirety of sd15's structure during training.

Even though the sd15-flow-sol was left abandoned, the structure of Sol is HIGHLY effective at understanding TIMESTEP blotting interpolation. I didn't realize how crucially important this was until Lailah started to show rigidity and compartmentalized behavior with sequence - which likely happens to ALL flow matching models.

AbstractPhil/sd15-flow-matching

AbstractPhil/geo-david-collective-sd15-distilled
AbstractPhil/geo-david-collective-sd15-base-e40
  • 3 replies

AbstractPhil posted an update about 2 months ago
Meet FluxLailah: AbstractPhil/tiny-flux-deep, a 220M Flux variant currently pretraining at BF16. She is experimental and does not produce solid images yet - and yet she is producing. There are both EMA and raw weights producing different images; the EMA is particularly interesting at times.
Lailah uses flan-t5-base, clip-vit-l-14, and BlackForestLabs Flux1s VAE.
SEQ limit 128, images 512x512 for now. Lailah's early form is based on three variants. TinyFlux's weights were carefully planted into a deeper structure and trained again - dubbed TinyFlux-Deep. This variant has 15 dual-stream blocks and 25 single-stream blocks, with nearly identical weight code to Flux and a similar attention mechanism - but intentionally deviant and compacted, with careful consideration to scaling and the purpose of each mechanism.
She went through quite a few growing pains with her earlier attention mechanism which required a reimagining today and careful consideration of the consequences, and now I present to you the preliminary look into Lailah.
Preliminary training is still heavily underway, the mechanisms are still being augmented, and her stability is being measured. The potential for fidelity, depth, and quality is also still being measured, so I will shift attention and pivot utility based on needs over time.
  • 2 replies

AbstractPhil posted an update about 2 months ago
pytorch-parallel-compiler v0.5.0 upgrades:
* Complex benchmarking for wide primitive objects is now supported. This includes multiple presets for quick tests on hardware.
* All supported primitives either have validity checks or will have them.
* 6 new wide layers are supported directly, and will be a key part of the autotuner before v1.0.
* WideTracedModel is a preliminary auto-builder, so the user doesn't need to build models manually by gathering layers.

https://github.com/AbstractEyes/pytorch-parallel-compiler

New Layers for 0.5.0:
WideGRU, WideLSTM, WideGroupNorm, WideMultiheadedAttention, WideInstancenorm1/2d, WideConv3d

Upcoming for 1.0:
* WideTracedModel fully building any supported layer patterns, with multiple autotune potentials for autoselection.
* Module cherry-picking for your use case only; e.g., apply only the WideLinear replacement when it benefits your case by 35% while the attention replacement would cost 10%, leaving attention untouched.
* All (roughly 32 more) commonly used PyTorch layer systems supported in one form or another, with wide-batched kernels benefiting both eager and compiled modes; many require reworks or complete remakes.
* Autotuning wide formats based on hardware response to the kernels: kernel chunking for big slow processes such as LSTM, kernel fusion for small processes with excess overhead, expanding kernels with masking to fit specific use-case paradigms per hardware, and a series of smaller but important optimizations along the way.
* Full transformer and RoPE support, with wide-batched optimizations throughout the structures to allow more robust autoregression throughput.
* Additional Conv1d, Conv2d, and Conv3d optimizations.

Beyond version 1.0:
* Entire diffusion structures specifically kernelized for high-efficiency utilization with eager and compilation.
* Video diffusion specific targets meant to heavily reduce computation costs on the gpu and increase computation throughput on the gpu.
AbstractPhil posted an update about 2 months ago
The long version: this is a proof of concept. The ensemble-compilation vmap prototype is functional and can increase throughput for wider batches on FFN, MLP, LLM, and other models, not just ensembles. The system traces your model and creates stages of functional activation; based on the stage, it combines or removes stages so your layers map to batched functional calls that pressure your GPU with fewer loops, with directly curated cudagraph compliance where applicable. Identical weights yield identical results, at the cost of hardware and VRAM.

TLDR:
This is an ensemble optimization adapted to standard models. It yields high-capacity speed improvements through increased throughput for inference and training alike, using carefully traced, staged vmap structures.

https://github.com/AbstractEyes/pytorch-parallel-compiler

The early list of layers isn't fully represented yet, so this is a preliminary look into the potentials of this structure when fully fleshed out.

MLP (N=100, batch=32, CUDA):
Eager:    2-3x speedup
Compiled: 35-40x speedup


ResBlock (N=20, batch=8, CUDA):
Eager:    ~5x speedup  
Compiled: ~10x speedup


This is early testing, and so far the yields indicate that WIDENING your model with adjacent shared batched vmaps, for uniformly staged models, yields considerably higher inference output at the cost of additional hardware utilization.

This is akin to lining up all your systems and uniformly passing the necessary implications through a shared frozen representation gate.
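The "lining up your systems" idea can be sketched as stacking N same-shape linear members into a single batched matmul (NumPy here instead of torch.vmap; illustrative only, not the library's actual code):

```python
import numpy as np

N, BATCH, D_IN, D_OUT = 100, 32, 64, 64
rng = np.random.default_rng(0)

# N independent linear layers, each with its own weights and inputs.
weights = rng.normal(size=(N, D_IN, D_OUT))
biases = rng.normal(size=(N, D_OUT))
inputs = rng.normal(size=(N, BATCH, D_IN))

# Looped ("eager") form: one matmul per ensemble member.
looped = np.stack([inputs[i] @ weights[i] + biases[i] for i in range(N)])

# Widened form: a single batched matmul over the stacked member axis.
wide = np.einsum("nbi,nio->nbo", inputs, weights) + biases[:, None, :]

assert np.allclose(looped, wide)  # identical weights yield identical results
```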

Training for this is not tested nor supported yet, use at your own risk.
  • 1 reply

AbstractPhil posted an update 2 months ago
Happy Holidays all! geofractal architectural expansions: timm is now a core component. As it stands, the system is growing rapidly in one direction, and timm brings a whole lot to the table in another, rapid-prototyping direction, so it is now included for ease of use.

BaseUtil is a new core component: src.geofractal.router.base_util inherits BaseComponent's behavior, so it allows device movement for util operations, which will direct device-to-device behavior for the upcoming accelerate integration.

I'm trying to mitigate the base component structure as much as possible, but the need to chain components in specific orders presented a unique problem. By compartmentalizing utils into structures that can be delegated and moved, these structures can be repurposed, expanded autonomously, reduced autonomously, and more.

ChainComponent inherits a subsystem specifically designed to organize multi-system multi-device formulas designated for inception and synchronization purposes. This is meant to allow distributed tasking to multiple-devices in chained utilization. This also enables ease-of-integration into nn.ModuleList with a few other caveats that will be ironed out meant to target wide-distributed models.

FusionComponent is specifically dedicated to the new fusion processing system meant for experimental expansion. This includes sub-module schedule control, Component and Tower functional control, device-movement, and will be packaged under the term "gfu.UtilType" as a standard naming convention.
"gfc.ComponentTypeName"
"gfr.RouterTypeName"
"gfu.UtilityTypeName"
"gft.TowerTypeName"
All of which are basically just import-as aliases.
"gf.AnythingTopLevelPackaged" will include the core.

Better debugging for compilation:
I'm in the prototyping phase of better debugging for compiled wide models, and will prepare a baseline component readout structure by the end of today or tomorrow.
AbstractPhil posted an update 3 months ago
geofractal getting-started guide available, plus bulk ablation for fusion, simple towers, oscillator capacity, and substructure systemic associative capacity.
Many formulas were tested - 92 tests for collectives, bulk oscillation experiments, and more. All of them either coalesce into the correct behavior or fail visibly, which means the system is robust enough to declare some tools functionally valid, though not yet scalable.

AI crash course available:
https://github.com/AbstractEyes/geofractal/blob/main/ai_helpers/v101_claude_helpers.txt
Feed it to GPT, Claude, or Grok and they will assist.

getting-started guide:
https://github.com/AbstractEyes/geofractal/blob/main/src/geofractal/router/GETTING_STARTED.md

geofractal router architecture is in prototype phases:
https://github.com/AbstractEyes/geofractal

This is likely one of its final growing phases before full production capacity is ramped up. The architecture is not for the novice; it's meant for experts to get ideas, borrow code, utilize library capacity, or simply tell an AI what to do. MOST files in current production have good descriptions for AI integration.

Transfer learning notebook available here:
https://github.com/AbstractEyes/geofractal/blob/main/src/geofractal/router/Router_Transfer_Learning-12_19_25.ipynb

Stress test and multiple diagnostics available here:
https://github.com/AbstractEyes/geofractal/blob/main/src/geofractal/router/components/diagnostics/

WideRouter compilation capacity available:
https://github.com/AbstractEyes/geofractal/blob/main/src/geofractal/router/wide_router.py

The wide router compiler organizes similar towers into stacked, staged combinations before compiling with torch.compile. This is experimental, but has shown increased speed with multiple wide-model structures and will serve its purpose in the future.
  • 1 reply

AbstractPhil posted an update 3 months ago
Many updates: Cantor route experiments; GeoViT-david-beans, a standalone 30M geofractal encoder at 75% test accuracy on CIFAR-100; MultiHeaded Cantor Attention heavily optimized. The migration between geofractal and geovocab2 is mostly complete.
https://github.com/AbstractEyes/geofractal/blob/main/src/geofractal/model/david_beans/model.py
Cantor route staircase and wormhole excavation findings posted. A full article will follow representing the findings on Cantor routing and the potential for self-learning fractals through loss.
https://github.com/AbstractEyes/lattice_vocabulary/blob/master/src/geovocab2/proofs/cantor_steps_experiments.md
The steps experiments show profoundly important implications for cross-contamination problems between fractal and linear spaces, with some already assessed as useful utilities as of today.
Today the classification experiment continues by applying mini-experts to patches within a miniature david-beans. The mini-experts were an accident that improved fidelity rather than destroying it, so those experiments will continue. The geovit-david-beans trainer was added to the first repo.
  • 1 reply

AbstractPhil posted an update 4 months ago
For those using my geovocab2 repo for SimplexFactory, CantorRouteFactory, fusion modulations, model code imports, training weights and models, or specific extraction systems: I will be refactoring in the coming days.
The new repo for all geometric, cantor, and fractal-based trainings will be:
https://github.com/AbstractEyes/geofractal
The change is due to MY own excessive abuse of the vocabulary repo and the overuse of subfolders attached to a working PyCharm project. These behaviors should be decoupled, and I apologize for creating such code bloat through experimentation.

Directly installing the geofractal repo will install geovocab2 as a sidecar. However, there will be a clause within the geovocab2 to warn the user.

You have my deepest and most sincere apologies if I break your active working code. I know this is difficult work, so please bear with my efforts as I progress the codebase to its next state of truth vs. experimentation.

Please, reach out to me directly if you have problems converting.

It is meant to be a DIRECT, UTILIZABLE, pain-free conversion that enables the same interface from both geovocab2 and all future model code changes applied to geofractal - once the geofractal module is imported.
The original geovocab2 will contain outdated train code with a direct warning, rather than full deprecation - and the geovocab2 repo will fold geovocab and geovocab2 into matching aliased systems, allowing the factory and extraction structures to live in geovocab2 and training to live in geofractal, by design.

I will be introducing a direct alias system that will hopefully allow a smooth transition system to the new codebase, but there's never a way to account for those you don't know are using your work. This will include pyi files for the aliases and some necessary elemental additions that may break current functionality in systems I'm unaware of. Please reach out if I break something crucial that you require.
AbstractPhil posted an update 4 months ago
Lyra, Lune, Cantor, k-simplex, and many relational experiments.
AbstractPhil/sd15-flow-matching-lune
Today I will be updating the space to support all three forms of lyra to enable tinkertoying with various other models like flux-schnell and sdxl.

It should be noted, I didn't know nvidia actually released a model named LYRA. This model has no association with NVIDIA's LYRA model. This LYRA is full MIT licensed. If necessary I'll rename this model, but I don't think it'll matter.

Unlike a NORMAL VAE, this VAE was intentionally meant to introduce incorrectness into the correctness that already exists. The concept was to pull toward a goal - t5-xl being the primary one.

AbstractPhil/vae-lyra - Lyra is a multimodal MM-VAE prototype meant to fuse multiple types of encodings together. Tested with circle-of-fifths audio and text, multiple text encoders, vision and text encoders, and a few other smaller prototypes that yielded results.
Lyra has a few direct clip_l and t5_xl prototypes that learned to associate clip_l with t5-base. This version worked, so version 2 expanded the concept.

AbstractPhil/vae-lyra-sdxl-t5xl is another prototype using CLIP_L and CLIP_G fused with T5_XL for the first version, directly utilizing projection with minimal geometric and cantor assistance. The shared layers ended up teaching CLIP_L how to be CLIP_G and the output ended up warping too much for SDXL or SD15 to understand.

AbstractPhil/vae-lyra-xl-adaptive-cantor
Utilizing adaptive cantor is the successful prototype, where CLIP_L and CLIP_G learned independent internal structures: CLIP_L and T5_XL learned a route in parallel conjunction with CLIP_G and T5_XL. This enabled two entirely divergent opinions, and thus lets the t5-xl manipulate either the clip_l or the clip_g for models like FLUX-SCHNELL or SDXL.

Each lyra has a purpose, and each purpose matters.
adamm-hf posted an update 4 months ago
💸🤑 You don't need 100 GPUs to train something amazing!

Our Smol Training Playbook teaches you a better path to world-class LLMs, for free!

Check out the #1 trending space on 🤗:
HuggingFaceTB/smol-training-playbook