I think the current architecture should be consistent with Ministral3, but why is it different?

#5
by win10 - opened

I expected this model's architecture to match Ministral3's, but it is different. Why?

Ministral3:

Mistral3ForConditionalGeneration

Devstral-2:

Ministral3ForCausalLM

I'm curious about the reason for this design.

Mistral3ForConditionalGeneration is actually a wrapper around a causal LM (for Ministral, that's Ministral3ForCausalLM) and a vision backbone (I forget which one Mistral uses).
Devstral 2 is text-only, so it just uses that causal LM architecture directly, without the wrapper.
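
For anyone reading this later, here is a rough sketch of how the difference tends to show up in a model's config.json. The two dicts are trimmed, illustrative stand-ins rather than copies of the real files: the `text_config` / `vision_config` nesting and the `model_type` values are assumptions based on how composite configs are usually laid out, and `describe` is a hypothetical helper, not part of any library.

```python
# Illustrative, trimmed configs. Only the "architectures" values come from
# this thread; every other key and value is an assumption for the sketch.
multimodal_config = {
    "architectures": ["Mistral3ForConditionalGeneration"],
    "text_config": {"model_type": "ministral3"},            # assumed value
    "vision_config": {"model_type": "placeholder_vision"},  # placeholder, not the real name
}

text_only_config = {
    "architectures": ["Ministral3ForCausalLM"],
    "model_type": "ministral3",  # assumed value
}


def describe(config: dict) -> str:
    """Hypothetical helper: say whether a config looks like a wrapper or a bare causal LM."""
    arch = config["architectures"][0]
    # A wrapper config nests one sub-config per component it holds.
    if "text_config" in config and "vision_config" in config:
        return f"{arch}: wrapper (causal LM + vision backbone)"
    return f"{arch}: standalone causal LM"


print(describe(multimodal_config))
print(describe(text_only_config))
```

The point is only that the wrapper carries two sub-models, while the text-only checkpoint carries just the language model that the wrapper would otherwise contain.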

juliendenize changed discussion status to closed
