Post History

Q&A How can I export an encoder-decoder PyTorch model into a single ONNX file?


0 answers · posted 1d ago by Franck Dernoncourt · edited 18h ago by Franck Dernoncourt

Question · Tags: python, onnx, pytorch, machine-translation
#7: Post edited by Franck Dernoncourt · 2025-04-18T21:46:40Z (about 18 hours ago). Change: added a redd.it link to the crosspost list. The revised post follows.
I converted the PyTorch model [`Helsinki-NLP/opus-mt-fr-en`](https://huggingface.co/Helsinki-NLP/opus-mt-fr-en) (HuggingFace), which is an encoder-decoder model for machine translation, to ONNX using this script:

```python
import os
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer, AutoConfig

hf_model_id = "Helsinki-NLP/opus-mt-fr-en"
onnx_save_directory = "./onnx_model_fr_en"

os.makedirs(onnx_save_directory, exist_ok=True)

print(f"Starting conversion for model: {hf_model_id}")
print(f"ONNX model will be saved to: {onnx_save_directory}")

print("Loading tokenizer and config...")
tokenizer = AutoTokenizer.from_pretrained(hf_model_id)
config = AutoConfig.from_pretrained(hf_model_id)

model = ORTModelForSeq2SeqLM.from_pretrained(
    hf_model_id,
    export=True,
    from_transformers=True,
    # Pass the loaded config explicitly during export
    config=config
)

print("Saving ONNX model components, tokenizer and configuration...")
model.save_pretrained(onnx_save_directory)
tokenizer.save_pretrained(onnx_save_directory)

print("-" * 30)
print(f"Successfully converted '{hf_model_id}' to ONNX.")
print(f"Files saved in: {onnx_save_directory}")
if os.path.exists(onnx_save_directory):
    print("Generated files:", os.listdir(onnx_save_directory))
else:
    print("Warning: Save directory not found after saving.")
print("-" * 30)

print("Loading ONNX model and tokenizer for testing...")
onnx_tokenizer = AutoTokenizer.from_pretrained(onnx_save_directory)
onnx_model = ORTModelForSeq2SeqLM.from_pretrained(onnx_save_directory)

french_text = "je regarde la tele"
print(f"Input (French): {french_text}")
inputs = onnx_tokenizer(french_text, return_tensors="pt")  # Use PyTorch tensors

print("Generating translation using the ONNX model...")
generated_ids = onnx_model.generate(**inputs)
english_translation = onnx_tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(f"Output (English): {english_translation}")
print("--- Test complete ---")
```
The output folder containing the ONNX files is:

```
franck@server:~/tests/onnx_model_fr_en$ ls -la
total 860968
drwxr-xr-x 2 franck users      4096 Apr 16 17:29 .
drwxr-xr-x 5 franck users      4096 Apr 17 23:54 ..
-rw-r--r-- 1 franck users      1360 Apr 17 04:38 config.json
-rw-r--r-- 1 franck users 346250804 Apr 17 04:38 decoder_model.onnx
-rw-r--r-- 1 franck users 333594274 Apr 17 04:38 decoder_with_past_model.onnx
-rw-r--r-- 1 franck users 198711098 Apr 17 04:38 encoder_model.onnx
-rw-r--r-- 1 franck users       288 Apr 17 04:38 generation_config.json
-rw-r--r-- 1 franck users    802397 Apr 17 04:38 source.spm
-rw-r--r-- 1 franck users        74 Apr 17 04:38 special_tokens_map.json
-rw-r--r-- 1 franck users    778395 Apr 17 04:38 target.spm
-rw-r--r-- 1 franck users       847 Apr 17 04:38 tokenizer_config.json
-rw-r--r-- 1 franck users   1458196 Apr 17 04:38 vocab.json
```
How can I export an opus-mt-fr-en PyTorch model into a single ONNX file?

Having several ONNX files is an issue because:

1. The PyTorch model shares its embedding layer between the encoder and the decoder, so the export script above duplicates that layer into both `encoder_model.onnx` and `decoder_model.onnx`. This is costly because the embedding layer is large (about 40% of the PyTorch model size); a quick way to verify the sharing is sketched after this list.
2. Having both a `decoder_model.onnx` and a `decoder_with_past_model.onnx` duplicates many parameters.
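
The sharing claimed in point 1 can be verified by comparing the encoder's and decoder's embedding tensors in the original PyTorch model. A minimal sketch, assuming only the standard `transformers` seq2seq API:

```python
from transformers import AutoModelForSeq2SeqLM

# Load the original PyTorch model (not the ONNX export).
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-fr-en")

enc_emb = model.get_encoder().embed_tokens.weight
dec_emb = model.get_decoder().embed_tokens.weight

# Tied embeddings share one underlying storage, so the data pointers match.
print("Encoder/decoder embeddings tied:", enc_emb.data_ptr() == dec_emb.data_ptr())

# Rough share of the model's parameters taken by the embedding matrix
# (model.parameters() counts tied parameters once).
total_params = sum(p.numel() for p in model.parameters())
print(f"Embedding share of parameters: {enc_emb.numel() / total_params:.1%}")
```

If the pointers match, each exported graph that references the embedding necessarily gets its own serialized copy of that matrix.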
The total size of the three ONNX files is:

* `decoder_model.onnx`: **346,250,804 bytes**
* `decoder_with_past_model.onnx`: **333,594,274 bytes**
* `encoder_model.onnx`: **198,711,098 bytes**

**Total size** = 346,250,804 + 333,594,274 + 198,711,098 = **878,556,176 bytes**, or about **838 MiB**, which is almost 3 times the size of the original PyTorch model (~300 MB).
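
The totals can be double-checked against the export directory, and the duplicated embedding made visible, with a short sketch (it assumes only the directory layout shown above; `onnx` is the ONNX Python package):

```python
import os

import onnx

onnx_dir = "./onnx_model_fr_en"

# Sum the sizes of the exported ONNX graphs only.
total = sum(
    os.path.getsize(os.path.join(onnx_dir, name))
    for name in sorted(os.listdir(onnx_dir))
    if name.endswith(".onnx")
)
print(f"total: {total:,} bytes (~{total / 2**20:.2f} MiB)")

# Largest initializer in each graph; for both the encoder and the decoder
# this is expected to be the (duplicated) embedding matrix.
for part in ("encoder_model", "decoder_model", "decoder_with_past_model"):
    graph = onnx.load(os.path.join(onnx_dir, f"{part}.onnx")).graph
    largest = max(graph.initializer, key=lambda t: t.ByteSize())
    print(f"{part}: {largest.name}, {largest.ByteSize():,} bytes")
```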
---

Crossposts:

- https://stackoverflow.com/q/79580250/395857
- https://old.reddit.com/r/mlops/comments/1k2c6tl/how_can_i_export_an_encoderdecoder_pytorch_model/?ref=share&ref_source=link
- https://old.reddit.com/r/pytorch/comments/1k2c78e/how_can_i_export_an_encoderdecoder_pytorch_model/?ref=share&ref_source=link
- https://old.reddit.com/r/learnmachinelearning/comments/1k2c6nw/how_can_i_export_an_encoderdecoder_pytorch_model/?ref=share&ref_source=link
- https://old.reddit.com/r/huggingface/comments/1k2c8p2/how_can_i_export_an_encoderdecoder_huggingface/?ref=share&ref_source=link
- https://old.reddit.com/r/learnpython/comments/1k2c89a/how_can_i_export_an_encoderdecoder_pytorch_model/?ref=share&ref_source=link
- https://old.reddit.com/r/pythonhelp/comments/1k2c7wz/how_can_i_export_an_encoderdecoder_pytorch_model/?ref=share&ref_source=link
- https://old.reddit.com/r/MachineLearning/comments/1k2c6ii/how_can_i_export_an_encoderdecoder_pytorch_model/?ref=share&ref_source=link
- https://old.reddit.com/r/LanguageTechnology/comments/1k2c71l/how_can_i_export_an_encoderdecoder_pytorch_model/?ref=share&ref_source=link
- https://old.reddit.com/r/LocalLLaMA/comments/1k2c68a/how_can_i_export_an_encoderdecoder_pytorch_model/?ref=share&ref_source=link
- https://github.com/huggingface/transformers/issues/16006#issuecomment-2815889377
- https://redd.it/1k2gjz2
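
One avenue that may address the single-file requirement, though it is not verified here, is `optimum`'s monolith export option, which is meant to force one ONNX file instead of separate encoder/decoder graphs. A minimal sketch, assuming a recent `optimum` version where `main_export` mirrors the `optimum-cli export onnx --monolith` flag (whether generation with past-key-value caching still works in that layout is untested):

```python
from optimum.exporters.onnx import main_export

# Assumption: monolith=True forces a single ONNX file, mirroring the
# `optimum-cli export onnx --monolith` command-line flag.
main_export(
    model_name_or_path="Helsinki-NLP/opus-mt-fr-en",
    output="./onnx_model_fr_en_single",
    task="text2text-generation",
    monolith=True,
)
```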
#6: Post edited by Franck Dernoncourt · 2025-04-18T21:44:13Z (about 18 hours ago). Change: added the Hugging Face Transformers GitHub issue link to the crosspost list.
#5: Post edited by Franck Dernoncourt · 2025-04-18T21:43:32Z (about 18 hours ago). Change: restored the crosspost list and cleaned up the formatting of the file-size breakdown.
#4: Post edited by Franck Dernoncourt · 2025-04-18T18:30:56Z (about 21 hours ago). Change: minor wording tweak ("to ONNX. I used this script:" became "to ONNX using this script:").
#3: Post edited by Franck Dernoncourt · 2025-04-18T18:29:48Z (about 21 hours ago). Change: retitled the question from "an opus-mt-fr-en PyTorch model" to "an encoder-decoder PyTorch model", noted that the model is an encoder-decoder machine-translation model, expanded the explanation of why multiple ONNX files are a problem (adding the file-size breakdown), and dropped the earlier crosspost list.
#2: Post edited by Franck Dernoncourt · 2025-04-18T00:57:24Z (1 day ago). Change: added a crosspost list (Quora and Stack Overflow).
#1: Initial revision by Franck Dernoncourt · 2025-04-18T00:56:18Z (1 day ago). Original title: "How can I export an opus-mt-fr-en PyTorch model into a single ONNX file?"; the body matched the current revision apart from the later title change, the expanded list of problems, and the crosspost list.