root@22318190fe62:~/axolotl# pwd
/home/axolotl
root@22318190fe62:~/axolotl# accelerate launch -m axolotl.cli.train examples/tiny-llama/lora.yml
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_processes` was set to a value of `1`
        `--num_machines` was set to a value of `1`
        `--mixed_precision` was set to a value of `'no'`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
WARNING: BNB_CUDA_VERSION=118 environment variable detected; loading libbitsandbytes_cuda118.so.
This can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64>
[...:58] [PID:96] PyTorch version 2.1.2+cu118 available.
[2024-05-25 11:14:42,465] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
df: /home/.triton/autotune: No such file or directory
[2024-05-25 11:14:42,539] [INFO] [root.spawn:38] [PID:96] gcc -pthread -B /root/miniconda3/envs/py3.10/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /root/miniconda3/envs/py3.10/include -fPIC -O2 -isystem /root/miniconda3/envs/py3.10/include -fPIC -c /tmp/tmp4lzmy1iw/test.c -o /tmp/tmp4lzmy1iw/test.o
[2024-05-25 11:14:42,561] [INFO] [root.spawn:38] [PID:96] gcc -pthread -B /root/miniconda3/envs/py3.10/compiler_compat /tmp/tmp4lzmy1iw/test.o -laio -o /tmp/tmp4lzmy1iw/a.out
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
 [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible
[2024-05-25 11:14:44,099] [DEBUG] [axolotl.normalize_config:79] [PID:96] [RANK:0] bf16 support detected, enabling for this configuration.
/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
config.json: 100%|███████████████████████████████████████████████| 560/560 [00:00<00:00, 3.09MB/s]
[2024-05-25 11:14:44,621] [INFO] [axolotl.normalize_config:182] [PID:96] [RANK:0] GPU memory usage baseline: 0.000GB (+0.313GB misc)

     dP            dP   dP
     88            88   88
.d8888b. dP.  .dP .d8888b. 88 .d8888b. d8888P 88
88'  `88  `8bd8'  88'  `88 88 88'  `88   88   88
88.  .88  .d88b.  88.  .88 88 88.  .88   88   88
`88888P8 dP'  `dP `88888P' dP `88888P'   dP   dP

****************************************
**** Axolotl Dependency Versions *****
  accelerate: 0.30.1
        peft: 0.10.0
transformers: 4.40.2
         trl: 0.8.5
       torch: 2.1.2+cu118
bitsandbytes: 0.43.1
****************************************
[2024-05-25 11:14:44,657] [WARNING] [axolotl.scripts.check_user_token:487] [PID:96] [RANK:0] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from https://huggingface.co/settings/tokens if you want to use gated models or datasets.
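Two of the warnings above are worth acting on. The accelerate message is harmless for this single-GPU run, but it can be silenced by passing the defaults explicitly; a minimal sketch, simply mirroring the values the log reports:

    accelerate launch --num_processes 1 --num_machines 1 --mixed_precision no --dynamo_backend no -m axolotl.cli.train examples/tiny-llama/lora.yml

The HuggingFace token warning only matters for gated models or datasets; running `huggingface-cli login` with a token from https://huggingface.co/settings/tokens clears it.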
/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
tokenizer_config.json: 100%|█████████████████████████████████████| 776/776 [00:00<00:00, 2.61MB/s]
tokenizer.model: 100%|█████████████████████████████████████████| 500k/500k [00:00<00:00, 15.9MB/s]
special_tokens_map.json: 100%|███████████████████████████████████| 414/414 [00:00<00:00, 1.41MB/s]
tokenizer.json: 100%|████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 8.50MB/s]
[2024-05-25 11:14:47,333] [DEBUG] [axolotl.load_tokenizer:280] [PID:96] [RANK:0] EOS: 2 / </s>
[2024-05-25 11:14:47,333] [DEBUG] [axolotl.load_tokenizer:281] [PID:96] [RANK:0] BOS: 1 / <s>
[2024-05-25 11:14:47,334] [DEBUG] [axolotl.load_tokenizer:282] [PID:96] [RANK:0] PAD: 2 / </s>
[2024-05-25 11:14:47,334] [DEBUG] [axolotl.load_tokenizer:283] [PID:96] [RANK:0] UNK: 0 / <unk>
[2024-05-25 11:14:47,334] [INFO] [axolotl.load_tokenizer:294] [PID:96] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2024-05-25 11:14:47,334] [INFO] [axolotl.load_tokenized_prepared_datasets:183] [PID:96] [RANK:0] Unable to find prepared dataset in last_run_prepared/b679ea8e13fdec9db52fe0332ca58c81
[2024-05-25 11:14:47,334] [INFO] [axolotl.load_tokenized_prepared_datasets:184] [PID:96] [RANK:0] Loading raw datasets...
[2024-05-25 11:14:47,334] [WARNING] [axolotl.load_tokenized_prepared_datasets:186] [PID:96] [RANK:0] Processing datasets during training can lead to VRAM instability. Please pre-process your dataset.
[2024-05-25 11:14:47,334] [INFO] [axolotl.load_tokenized_prepared_datasets:193] [PID:96] [RANK:0] No seed provided, using default seed of 42
Downloading readme: 100%|██████████████████████████████████████| 28.0/28.0 [00:00<00:00, 93.1kB/s]
Downloading data: 100%|██████████████████████████████████████| 1.76M/1.76M [00:00<00:00, 5.00MB/s]
Generating train split: 100%|███████████████████████| 2000/2000 [00:00<00:00, 62978.96 examples/s]
Tokenizing Prompts (num_proc=64): 100%|██████████████| 2000/2000 [00:01<00:00, 1451.75 examples/s]
[2024-05-25 11:14:59,020] [INFO] [axolotl.load_tokenized_prepared_datasets:410] [PID:96] [RANK:0] merging datasets
Dropping Long Sequences (num_proc=64): 100%|█████████| 2000/2000 [00:00<00:00, 4458.76 examples/s]
Add position_id column (Sample Packing) (num_proc=64): 100%|█| 2000/2000 [00:00<00:00, 2937.27 examples/s]
[2024-05-25 11:15:02,316] [INFO] [axolotl.load_tokenized_prepared_datasets:423] [PID:96] [RANK:0] Saving merged prepared dataset to disk... last_run_prepared/b679ea8e13fdec9db52fe0332ca58c81
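The VRAM-instability warning is avoidable: axolotl ships a preprocess entry point that tokenizes and packs the dataset up front, writing it to last_run_prepared/ so the training run finds it at the cache-lookup step above. A sketch, run before training:

    CUDA_VISIBLE_DEVICES="" python -m axolotl.cli.preprocess examples/tiny-llama/lora.yml

The "No Chat template selected" notice can likewise be cleared by setting `chat_template:` in the YAML config (for example `chat_template: chatml`), assuming that template matches how the dataset is formatted.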
Saving the dataset (1/1 shards): 100%|███████████████| 2000/2000 [00:00<00:00, 71343.83 examples/s]
[2024-05-25 11:15:02,368] [DEBUG] [axolotl.calculate_total_num_steps:299] [PID:96] [RANK:0] total_num_tokens: 414_041
[2024-05-25 11:15:02,386] [DEBUG] [axolotl.calculate_total_num_steps:312] [PID:96] [RANK:0] `total_supervised_tokens: 294_246`
[2024-05-25 11:15:07,864] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:96] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 414041
[2024-05-25 11:15:07,864] [DEBUG] [axolotl.calculate_total_num_steps:365] [PID:96] [RANK:0] data_loader_len: 12
[2024-05-25 11:15:07,864] [INFO] [axolotl.calc_sample_packing_eff_est:371] [PID:96] [RANK:0] sample_packing_eff_est across ranks: [0.9719637357271634]
[2024-05-25 11:15:07,864] [DEBUG] [axolotl.calculate_total_num_steps:383] [PID:96] [RANK:0] sample_packing_eff_est: 0.98
[2024-05-25 11:15:07,864] [DEBUG] [axolotl.calculate_total_num_steps:391] [PID:96] [RANK:0] total_num_steps: 48
[2024-05-25 11:15:07,873] [DEBUG] [axolotl.train.train:56] [PID:96] [RANK:0] loading tokenizer... TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
[2024-05-25 11:15:08,464] [DEBUG] [axolotl.load_tokenizer:280] [PID:96] [RANK:0] EOS: 2 / </s>
[2024-05-25 11:15:08,464] [DEBUG] [axolotl.load_tokenizer:281] [PID:96] [RANK:0] BOS: 1 / <s>
[2024-05-25 11:15:08,464] [DEBUG] [axolotl.load_tokenizer:282] [PID:96] [RANK:0] PAD: 2 / </s>
[2024-05-25 11:15:08,464] [DEBUG] [axolotl.load_tokenizer:283] [PID:96] [RANK:0] UNK: 0 / <unk>
[2024-05-25 11:15:08,464] [INFO] [axolotl.load_tokenizer:294] [PID:96] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2024-05-25 11:15:08,464] [DEBUG] [axolotl.train.train:85] [PID:96] [RANK:0] loading model and peft_config...
[2024-05-25 11:15:08,762] [INFO] [axolotl.load_model:360] [PID:96] [RANK:0] patching with flash attention for sample packing
[2024-05-25 11:15:08,762] [INFO] [axolotl.load_model:419] [PID:96] [RANK:0] patching _expand_mask
`low_cpu_mem_usage` was None, now set to True since model is quantized.
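The step arithmetic above is worth unpacking: sample packing collapses the 2,000 examples into 12 packed batches per pass (data_loader_len: 12), and total_num_steps: 48 is consistent with four passes over the data:

    data_loader_len x num_epochs = 12 x 4 = 48

The epoch count is an assumption read from examples/tiny-llama/lora.yml (num_epochs: 4); it is not printed anywhere in this log.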
model.safetensors: 100%|█████████████████████████████████████| 4.40G/4.40G [00:44<00:00, 99.4MB/s]
generation_config.json: 100%|████████████████████████████████████| 129/129 [00:00<00:00, 1.07MB/s]
[2024-05-25 11:15:58,242] [WARNING] [axolotl.load_model:712] [PID:96] [RANK:0] increasing model.config.max_position_embeddings from 2048 to 4096
[2024-05-25 11:15:58,242] [INFO] [axolotl.load_model:734] [PID:96] [RANK:0] GPU memory usage after model load: 1.225GB (+0.025GB cache, +0.546GB misc)
[2024-05-25 11:15:58,251] [INFO] [axolotl.load_model:785] [PID:96] [RANK:0] converting PEFT model w/ prepare_model_for_kbit_training
[2024-05-25 11:15:58,253] [INFO] [axolotl.load_model:794] [PID:96] [RANK:0] converting modules to torch.bfloat16 for flash attention
[2024-05-25 11:15:58,256] [INFO] [axolotl.load_lora:951] [PID:96] [RANK:0] found linear modules: ['q_proj', 'up_proj', 'v_proj', 'down_proj', 'o_proj', 'k_proj', 'gate_proj']
trainable params: 25,231,360 || all params: 1,125,279,744 || trainable%: 2.2422299996542017
[2024-05-25 11:15:58,674] [INFO] [axolotl.load_model:843] [PID:96] [RANK:0] GPU memory usage after adapters: 1.319GB (+0.515GB cache, +0.546GB misc)
[2024-05-25 11:15:58,703] [INFO] [axolotl.train.train:119] [PID:96] [RANK:0] Pre-saving adapter config to ./outputs/lora-out
[2024-05-25 11:15:58,706] [INFO] [axolotl.train.train:156] [PID:96] [RANK:0] Starting trainer...
[2024-05-25 11:15:58,941] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:96] [RANK:0] packing_efficiency_estimate: 0.98 total_num_tokens per device: 414041
[2024-05-25 11:15:58,943] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:96] [RANK:0] packing_efficiency_estimate: 0.98 total_num_tokens per device: 414041
  0%|          | 0/48 [00:00<?, ?it/s]
[...]
..., special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '</s>'}, clean_up_tokenization_spaces=False), added_tokens_decoder={
        0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
})
root@22318190fe62:~/axolotl#
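The trainable params line is just the LoRA fraction: 25,231,360 / 1,125,279,744 ≈ 0.0224, so about 2.24% of the weights receive gradients while the 1.1B base model stays frozen. Once the run finishes, the adapter saved to ./outputs/lora-out can be smoke-tested with axolotl's inference entry point; a sketch, assuming the same config file:

    accelerate launch -m axolotl.cli.inference examples/tiny-llama/lora.yml --lora_model_dir ./outputs/lora-out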