I think the current architecture should be consistent with Ministral3, but why is it different?
#5 opened by win10
I think the current architecture should be consistent with Ministral3, but why is it different?
Ministral3: Mistral3ForConditionalGeneration
Devstral-2: Ministral3ForCausalLM
I'm curious about the reason for this design.
Mistral3ForConditionalGeneration is a wrapper around a causal LM (for Ministral, that is Ministral3ForCausalLM) and a vision backbone (I forget which one Mistral uses).
Devstral 2 uses that causal LM architecture directly, without the wrapper.
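To make the relationship concrete, here is a rough toy sketch of the composition described above. The classes `ToyCausalLM`, `ToyVisionBackbone`, and `ToyConditionalGeneration` are hypothetical stand-ins I made up for illustration; they are not the real transformers classes, and the attribute names are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical stand-in for a text-only causal LM
# (the role Ministral3ForCausalLM plays in this thread).
@dataclass
class ToyCausalLM:
    name: str = "Ministral3ForCausalLM"

    def modalities(self) -> list[str]:
        return ["text"]


# Hypothetical stand-in for the vision backbone; its real name is not given here.
@dataclass
class ToyVisionBackbone:
    name: str = "vision-backbone"


# Hypothetical stand-in for the wrapper
# (the role Mistral3ForConditionalGeneration plays in this thread).
@dataclass
class ToyConditionalGeneration:
    language_model: ToyCausalLM = field(default_factory=ToyCausalLM)
    vision_backbone: ToyVisionBackbone = field(default_factory=ToyVisionBackbone)
    name: str = "Mistral3ForConditionalGeneration"

    def modalities(self) -> list[str]:
        # The wrapper adds image input on top of whatever the inner LM accepts.
        return ["image"] + self.language_model.modalities()


ministral3 = ToyConditionalGeneration()  # wrapper: causal LM plus vision backbone
devstral2 = ToyCausalLM()                # the causal LM on its own

print(ministral3.name, ministral3.modalities())
print(devstral2.name, devstral2.modalities())
print(ministral3.language_model.name == devstral2.name)
```

In this sketch the wrapper's inner language model is the same class that the text-only model uses directly, which is why the two repos list different architecture names.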
juliendenize changed discussion status to closed