
How to train AIFS Single v1.1: Version mismatch?

#9
by andiknurpsel - opened

After successfully running AIFS Single v1.1 using anemoi-inference 0.6.3 as described in the notebook, I'm trying to reproduce the training procedure for my master's thesis using the config_pretraining.yaml from this repo.

I installed the exact versions as described in the Readme:

- anemoi-inference[huggingface]==0.6.3
- anemoi-training==0.4.0
- anemoi-models==0.5.0
- anemoi-graphs==0.5.2
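
To rule out a partially upgraded environment, the installed versions can be compared against these pins. This is a minimal sketch using only the Python standard library; the helper names `installed_version` and `compare_versions` are made up for illustration and are not part of Anemoi.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the list above.
EXPECTED = {
    "anemoi-inference": "0.6.3",
    "anemoi-training": "0.4.0",
    "anemoi-models": "0.5.0",
    "anemoi-graphs": "0.5.2",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def compare_versions(expected, lookup=installed_version):
    """Map each package name to (expected, installed, matches)."""
    report = {}
    for package, wanted in expected.items():
        found = lookup(package)
        report[package] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for package, (wanted, found, ok) in compare_versions(EXPECTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{package}: expected {wanted}, installed {found} [{status}]")
```

The comparison is a plain string match, so a development or local build (for example a version with a suffix) would be reported as a mismatch even if it is functionally the same release.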

However, I get a lot of config validation errors when running anemoi-training train --config-name=config_pretraining.yaml.

  • At first, it claimed that the property config_validation was missing, which was first introduced as no_validation and then renamed to config_validation in anemoi-training 0.4.0(?).
  • When I add the missing config_validation: True at the end of config_pretraining.yaml, I still get a lot of hydra/pydantic schema errors, see below.

This makes me suspect that config_pretraining.yaml was meant for a different version of anemoi-training. So I have the following questions:

  1. Which version do I need to follow the training procedure from the Readme?
  2. What is the idea behind the versioning of anemoi-inference, anemoi-training, anemoi-models, and anemoi-graphs? I cannot find any information on which versions work together.
  3. Is there a way to migrate config files to newer versions of Anemoi, similar to the migration feature for checkpoints? (This would also be interesting with regard to new features added to Anemoi, like Model Freezing etc.)
  4. Did someone from the community get the training procedure running and would like to share some advice?

Thanks a lot for your help and for the amazing work on both the models and the framework!


Errors when running anemoi-training train --config-name=config_pretraining.yaml:

pydantic_core._pydantic_core.ValidationError: 14 validation errors for BaseSchema
data.processors.imputer.config.NormalizerSchema.minimum
  Extra inputs are not permitted [type=extra_forbidden, input_value=['swvl1', 'swvl2', 'ro'], input_type=ListConfig]
data.processors.imputer.config.NormalizerSchema.mean
  Extra inputs are not permitted [type=extra_forbidden, input_value=['stl1', 'stl2'], input_type=ListConfig]
data.processors.imputer.config.ImputerSchema.maximum
  Field required [type=missing, input_value={'default': 'none', 'mini...mean': ['stl1', 'stl2']}, input_type=DictConfig]
data.processors.imputer.config.ImputerSchema.mean
  Extra inputs are not permitted [type=extra_forbidden, input_value=['stl1', 'stl2'], input_type=ListConfig]
data.processors.imputer.config.RemapperSchema.minimum
  Extra inputs are not permitted [type=extra_forbidden, input_value=['swvl1', 'swvl2', 'ro'], input_type=ListConfig]
data.processors.imputer.config.RemapperSchema.mean
  Extra inputs are not permitted [type=extra_forbidden, input_value=['stl1', 'stl2'], input_type=ListConfig]
data.processors.imputer._convert_
  Extra inputs are not permitted [type=extra_forbidden, input_value='all', input_type=str]
dataloader.num_workers.predict
  Extra inputs are not permitted [type=extra_forbidden, input_value=1, input_type=int]
dataloader.batch_size.predict
  Extra inputs are not permitted [type=extra_forbidden, input_value=4, input_type=int]
dataloader.limit_batches.predict
  Extra inputs are not permitted [type=extra_forbidden, input_value=20, input_type=int]
dataloader.validation_rollout
  Field required [type=missing, input_value={'prefetch_factor': 2, 'p...end': None, 'drop': []}}, input_type=DictConfig]
datamodule
  Field required [type=missing, input_value={'data': {'format': 'zarr...onfig_validation': True}, input_type=dict]
diagnostics.log.wandb
  Input should be a valid mapping, error: MissingMandatoryValue: Missing mandatory value: diagnostics.log.wandb.entity
    full_key: diagnostics.log.wandb.entity
    object_type=dict [type=mapping_type, input_value={'enabled': False, 'offli...se, 'parameters': False}, input_type=DictConfig]
diagnostics.log.mlflow
  Input should be a valid mapping, error: MissingMandatoryValue: Missing mandatory value: diagnostics.log.mlflow.tracking_uri
    full_key: diagnostics.log.mlflow.tracking_uri
    object_type=dict [type=mapping_type, input_value={'enabled': False, 'offli... 'http_max_retries': 35}, input_type=DictConfig]
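
To see which parts of the config diverge most from the installed schema, the errors can be grouped by their top-level config section. The sketch below is only a text parser for the output format shown above, not an Anemoi or pydantic API. It assumes that each error location sits on an unindented line and that the message lines under it are indented, as in this log. The function name `count_errors_by_section` is hypothetical.

```python
from collections import Counter

# A few entries copied verbatim from the log above.
SAMPLE = """pydantic_core._pydantic_core.ValidationError: 14 validation errors for BaseSchema
data.processors.imputer._convert_
  Extra inputs are not permitted [type=extra_forbidden, input_value='all', input_type=str]
dataloader.num_workers.predict
  Extra inputs are not permitted [type=extra_forbidden, input_value=1, input_type=int]
datamodule
  Field required [type=missing, input_value={'data': {'format': 'zarr...onfig_validation': True}, input_type=dict]
"""


def count_errors_by_section(error_text):
    """Count validation errors per top-level config key (data, dataloader, ...)."""
    counts = Counter()
    for line in error_text.splitlines():
        # Skip blank lines and indented message/detail lines.
        if not line.strip() or line[0].isspace():
            continue
        # Skip the summary header ("N validation errors for ...").
        if "validation error" in line:
            continue
        # The remaining lines are dotted locations; keep the first component.
        counts[line.split(".")[0].strip()] += 1
    return counts


if __name__ == "__main__":
    for section, n in count_errors_by_section(SAMPLE).most_common():
        print(f"{section}: {n}")
```

Run over the full log above, this grouping attributes the 14 errors to data, dataloader, datamodule, and diagnostics, with half of them under data.processors.imputer.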
ECMWF org

Can you try with the configs as listed here? I suspect maybe the config uploaded here was incorrect?
Regarding your questions:
2. They are released on an ad-hoc basis, with limited compatibility matrices available.
3. At the moment, there is no plan for config migration.
