NVIDIA/Megatron-LM: Ongoing research training transformer models at scale

Stars: 3243
Watchers: 89
Forks: 667
Issues: 146

Megatron-LM Recent Issues

Issue Title | State | Comments | Created Date | Updated Date | Closed Date
The model can converge normally when mp=1, but does not converge when mp=2. Is there any good analysis method? | closed | 0 | 2022-09-18 | 2022-09-15 | 2022-09-21
Numerical difference between ColumnParallelLinear and torch.nn.Linear | closed | 0 | 2022-09-10 | 2022-09-15 | 2022-09-10
Does Megatron support or will support unpad strategy for variable sequence length? Just like MLPerf Bert training did. | open | 0 | 2022-08-30 | 2022-09-15 | -
Variable usage problem | open | 0 | 2022-08-17 | 2022-09-15 | -
is there a pretrained GPT model of tensor parallel? | open | 0 | 2022-08-15 | 2022-09-15 | -
How to use a sequence length longer than 2048 | open | 1 | 2022-08-04 | 2022-09-15 | -
OPT-175B fp16 or bfloat16? | closed | 0 | 2022-07-10 | 2022-09-15 | 2022-07-10
Suggestion for merge (and split into) model parallel partitions | open | 0 | 2022-07-08 | 2022-09-15 | -
about the "Multi-Stage Prompting for Knowledgeable Dialogue Generation" | open | 0 | 2022-07-05 | 2022-09-15 | -
When running GPT trainning, the program quit due to torch.distributed.elastic.agent.server.api:Received 1 death signal, shutting down workers | open | 1 | 2022-07-03 | 2022-09-15 | -
Can we get the scripts for merging pipeline parallel model partitions? | closed | 2 | 2022-06-30 | 2022-09-15 | 2022-07-10
When complied helper.cpp failed, you can install ninja-build | closed | 0 | 2022-06-30 | 2022-09-15 | 2022-06-30
Is this really necesarry? (might improve performance) | open | 0 | 2022-06-28 | 2022-09-15 | -
Can I debate it like it debated itself at the Oxford Union? | open | 0 | 2022-06-22 | 2022-09-15 | -
save and load checkpoint | open | 1 | 2022-06-21 | 2022-09-15 | -
Error:size mismatch for weight | open | 0 | 2022-06-15 | 2022-09-26 | -
unintelligible token generation in GPT text generation | closed | 1 | 2022-06-14 | 2022-09-15 | 2022-06-14
Confused about the comment of adding Encoder | open | 0 | 2022-06-08 | 2022-09-27 | -
Finetune T5 on translation task | open | 0 | 2022-05-28 | 2022-09-15 | -
Inference on multi GPU | open | 1 | 2022-05-23 | 2022-09-15 | -
Delays and async_op in sequence parallel | open | 0 | 2022-05-23 | 2022-09-15 | -
Bug for load_checkpoint while merging checkpoint partitions. | open | 0 | 2022-05-12 | 2022-09-15 | -
Pytorch distributed runtime check failure when using pipeline parallelism | open | 0 | 2022-05-12 | 2022-09-15 | -
txt_generation_cli.py imports urllib2 | closed | 2 | 2022-05-09 | 2022-09-15 | 2022-05-16
msdp gpt pretrained model max support 1024? | open | 0 | 2022-05-04 | 2022-09-15 | -
Why would one use --accumulate-allreduce-grads-in-fp32? | open | 0 | 2022-04-04 | 2022-09-15 | -
How to adjust training args to adapt our server without many gpus? | open | 2 | 2022-03-27 | 2022-09-29 | -
Training recipe/example for vit(visual transformer) | open | 0 | 2022-03-17 | 2022-09-15 | -
is that possible running on macOS M1? | open | 0 | 2022-03-08 | 2022-09-15 | -
how to load state checkpoints from different MP settings? | open | 1 | 2022-03-05 | 2022-09-26 | -
Transformer layers Division | open | 0 | 2022-02-26 | 2022-09-15 | -
SImple question on forward_step in pretrain_gpt.py | closed | 0 | 2022-02-25 | 2022-09-15 | 2022-03-03
ModuleNotFoundError: No module named 'fused_mix_prec_layer_norm_cuda' | open | 3 | 2022-02-25 | 2022-09-15 | -
Discussion about softmax operating range | open | 11 | 2022-02-23 | 2022-09-26 | -
Error: scaled_upper_triang_masked_softmax.o: file not recognized | open | 1 | 2022-02-12 | 2022-09-15 | -
Why don't use exp in crossentropy.py | open | 0 | 2022-02-01 | 2022-09-15 | -
Unable to load checkpoint | open | 0 | 2022-02-01 | 2022-09-15 | -
T5 pretraining grows slower if micro batch size > 1 | open | 1 | 2022-01-29 | 2022-09-29 | -
Further pretraining using BERT-base weights | open | 0 | 2022-01-25 | 2022-09-15 | -
What is the difference between F1 and KF1 evaluation in eval_resp_generation.sh? | closed | 2 | 2022-01-22 | 2022-09-15 | 2022-08-11
How to load finetuned DPR model for MSDP preprocessing? | closed | 2 | 2022-01-20 | 2022-09-15 | 2022-08-16
How to convert megatron T5 model to huggingface T5 model? | open | 0 | 2022-01-20 | 2022-09-15 | -
How to finetune T5 with Megatron? | open | 0 | 2022-01-20 | 2022-09-26 | -
Is there any problem in my pretrain_gpt.sh? The AssertionError: last epoch number of samples exceeded max value. | closed | 0 | 2022-01-18 | 2022-09-23 | 2022-01-21
Does scaled softmax of qkv perform scaling operations duplicately? | open | 0 | 2022-01-12 | 2022-09-15 | -
Optimizer will hold unloaded model weights when model weights but not optimizer states are loaded from a checkpoint. | open | 0 | 2022-01-03 | 2022-09-15 | -
srun: error: s_p_parse_file | open | 0 | 2021-12-09 | 2022-09-15 | -
Error: 'DistributedDataParallel' object has no attribute 'set_input_tensor' on text generation. need to unwrap model | open | 0 | 2021-12-09 | 2022-09-27 | -
Add support of lazy dataloader? | closed | 2 | 2021-12-04 | 2022-09-15 | 2021-12-04
There is a difference in the calculation of num_warmup_microbatches | closed | 11 | 2021-11-30 | 2022-09-15 | 2021-12-06