Need help with TerraMind/TerraTorch: crop output not available, NDVI/LULC working, next steps
Hello everyone,
We are working with TerraMind/TerraTorch to build a multimodal pipeline that combines Sentinel‑2 L2A, NDVI, and DEM. Our goal is to generate land cover (LULC) maps and move towards crop classification and pest/disease detection.
What we have achieved so far
Successfully loaded and ran TerraMind v1 base and large from local checkpoints (.pt), avoiding Hub downloads (setup sketched after this list).
Generated LULC and NDVI outputs, calculated class percentages, and visualized colored maps.
Integrated multiple input modalities (S2L2A + NDVI + DEM) in a single forward pass.
Automated exporting of results (PNG + TXT) for auditing and reproducibility.
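For reference, this is roughly how we build the model and run the multimodal forward pass. It is only a sketch: the checkpoint path is a placeholder, and the registry/argument names (`FULL_MODEL_REGISTRY`, `output_modalities`, `timesteps`, etc.) follow our reading of the TerraMind examples in TerraTorch, so please correct us if any of them are off:

```python
import torch
from terratorch.registry import FULL_MODEL_REGISTRY

# Build the generation variant of TerraMind v1 base without touching the Hub,
# then load our local .pt checkpoint (placeholder path).
model = FULL_MODEL_REGISTRY.build(
    "terramind_v1_base_generate",
    modalities=["S2L2A", "NDVI", "DEM"],   # input modalities
    output_modalities=["LULC", "NDVI"],    # generated outputs
    pretrained=False,                      # skip Hub download
    standardize=True,
)
state = torch.load("checkpoints/TerraMind_v1_base.pt", map_location="cpu")
model.load_state_dict(state, strict=False)
model.eval()

# One 224x224 patch per modality: 12 S2 L2A bands, single-band NDVI and DEM.
# Dummy tensors here, just to test the plumbing; real patches are the open
# question further down.
inputs = {
    "S2L2A": torch.randn(1, 12, 224, 224),
    "NDVI": torch.randn(1, 1, 224, 224),
    "DEM": torch.randn(1, 1, 224, 224),
}

with torch.no_grad():
    generated = model(inputs, timesteps=10)  # dict keyed by output modality

lulc = generated["LULC"]  # land-cover output; we reduce to classes downstream
```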
Current problems
Missing “Crop” output modality:
When we request tok_crop@224 as an output modality, we get a NotImplementedError (see the snippet below).
Documentation lists supported outputs as S2L2A, S1GRD, S1RTC, DEM, LULC, NDVI, and Coords.
We cannot find any official reference to a crop output in the current registry.
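For completeness, this is roughly the call that fails. The name tok_crop@224 is our guess based on the pre-training token naming, and it does not appear in the documented output list:

```python
from terratorch.registry import FULL_MODEL_REGISTRY

# Requesting the guessed crop modality is where we hit NotImplementedError,
# consistent with it not being listed among the supported output modalities.
model = FULL_MODEL_REGISTRY.build(
    "terramind_v1_base_generate",
    modalities=["S2L2A"],
    output_modalities=["tok_crop@224"],  # -> NotImplementedError
    pretrained=False,
)
```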
Unrealistic results with dummy inputs:
Using random tensors produces improbable classes (e.g., snow/ice in Cuba).
We need to confirm how to properly prepare real Sentinel‑2 L2A and NDVI patches for inference; our current loading code is sketched below.
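This is how we currently load a real patch. The band order, the decision to keep raw digital numbers (relying on standardize=True), and the NDVI band choice are our assumptions, and confirming them is part of this question:

```python
import numpy as np
import rasterio
import torch

S2_BANDS = ["B01", "B02", "B03", "B04", "B05", "B06",
            "B07", "B08", "B8A", "B09", "B11", "B12"]

def load_s2l2a_patch(band_paths, size=224):
    """Read one 224x224 Sentinel-2 L2A patch, stacking the 12 bands in the
    order above (one GeoTIFF per band, already resampled to 10 m).
    Whether TerraMind expects raw DNs or reflectance (DN / 10000) is exactly
    what we need to confirm; we currently keep raw DNs."""
    bands = []
    for path in band_paths:
        with rasterio.open(path) as src:
            bands.append(src.read(1)[:size, :size].astype(np.float32))
    return torch.from_numpy(np.stack(bands))  # (12, 224, 224)

def ndvi_from_s2(s2):
    """NDVI from the B08 (NIR) and B04 (red) channels of the stacked patch."""
    nir, red = s2[S2_BANDS.index("B08")], s2[S2_BANDS.index("B04")]
    return ((nir - red) / (nir + red + 1e-6)).unsqueeze(0)  # (1, 224, 224)
```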
Specialized tokenizers:
We have local files like TerraMind_Tokenizer_S2L2A.pt, TerraMind_Tokenizer_NDVI.pt, TerraMind_Tokenizer_DEM.pt, etc.
It’s unclear how to integrate them for inference (are they only for fine‑tuning, or also for preparing inputs in generation?).
What we need to move forward
Confirmation of whether there is an official crop output head (CropType, Crop, etc.) and, if so, how to invoke it.
Practical examples of how to use the local tokenizers to prepare real Sentinel‑2/NDVI data.
Best practices to avoid unrealistic outputs and ensure contextually valid results (e.g., no snow in tropical regions).
Guidance on how to extend TerraMind towards pest and disease detection: does this require a different checkpoint, or fine‑tuning with new labels?
Hi, we have not trained TerraMind with a crop modality. If you want to predict crops, you have to fine-tune the model. Here is one example for crop segmentation: https://github.com/IBM/terramind/blob/main/notebooks/terramind_v1_small_multitemporal_crop.ipynb
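In Python, the fine-tuning setup looks roughly like the sketch below. It is a sketch only, assuming TerraTorch's SemanticSegmentationTask and EncoderDecoderFactory; the decoder choice, class count, and data module are placeholders you should take from the notebook and your own label set:

```python
from terratorch.tasks import SemanticSegmentationTask

# Fine-tuning sketch: TerraMind backbone plus a segmentation decoder trained
# on a new crop label set. Dataset wiring (a LightningDataModule with your
# crop labels) is omitted; see the linked notebook for the full configuration.
task = SemanticSegmentationTask(
    model_factory="EncoderDecoderFactory",
    model_args={
        "backbone": "terramind_v1_base",
        "backbone_pretrained": True,
        "backbone_modalities": ["S2L2A"],
        # A ViT backbone typically also needs necks (e.g. ReshapeTokensToImage)
        # to turn tokens into feature maps for the decoder; see the notebook.
        "decoder": "UNetDecoder",
        "decoder_channels": [512, 256, 128, 64],
        "num_classes": 13,        # number of crop classes in your labels
    },
    loss="ce",
    lr=1e-4,
    freeze_backbone=False,
)

# Then train with Lightning, e.g.:
# import lightning.pytorch as pl
# pl.Trainer(max_epochs=50).fit(task, datamodule=your_crop_datamodule)
```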
The tokenizers are used for generating the pre-training tokens and for multimodal generation. I would avoid using them in fine-tuning, as that just adds a large single-modal encoder to the model with a significant bottleneck. What exactly do you plan to do with the tokenizers?
"pest and disease detection" -> Yes, you have to fine-tune the model.