# How to Build a World Model by Learning with Language?

Blog: https://www.cnblogs.com/zylyehuo/

Paper: Learning to Model the World with Language

Project: dynalang (work through the project README, and refer back to these notes when you hit problems)

## Running Results

## Deployment Steps

### Step 1: Configure CUDA, cuDNN, and PyTorch

```shell
# Check the CUDA toolkit, installed versions, and driver
nvcc -V
ls -l /usr/local/ | grep cuda
nvidia-smi

# Remove user-site NVIDIA wheels that could shadow the environment's libraries
ls ~/.local/lib/python3.8/site-packages | grep nvidia
rm -rf ~/.local/lib/python3.8/site-packages/nvidia*
ls ~/.local/lib/python3.8/site-packages | grep nvidia
export PYTHONNOUSERSITE=1

# Install PyTorch built against CUDA 11.7 and verify it sees the GPU
pip install torch==2.0.1+cu117 torchvision==0.15.2+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
python -c "import torch; print('PyTorch CUDA:', torch.version.cuda); print('GPU Available:', torch.cuda.is_available())"

# Install cuDNN 8.9: register the local repo first, then install from it
sudo dpkg -i cudnn-local-repo-ubuntu2004-8.9.7.29_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2004-8.9.7.29/cudnn-local-CD2C2DD4-keyring.gpg /usr/share/keyrings/
sudo apt update
sudo apt install libcudnn8 libcudnn8-dev
sudo ldconfig
ldconfig -p | grep cudnn
conda install -c conda-forge "cudnn>=8.9.1"
```

### Step 2: Install Dependencies and Source Code

```shell
mkdir VLN_learning
cd VLN_learning
git clone git@github.com:zylyehuo/dynalang.git
cd dynalang
conda create -n dynalang-vln python=3.8
conda activate dynalang-vln
pip install "jax[cuda11_cudnn82]==0.4.8" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

# Verify JAX sees the GPU
python3 -c "import jax; print('JAX Devices:', jax.devices())"
# Verify PyTorch sees the GPU
python3 -c "import torch; print('Torch CUDA available:', torch.cuda.is_available())"
# Verify the IPython dependency imports without errors
python3 -c "import IPython; print('IPython loaded successfully')"

pip install homegrid
```

Pinned Python dependencies:

```shell
cd VLN_learning/dynalang
conda activate dynalang-vln
pip install optax==0.1.7
pip install ruamel.yaml==0.17.40
pip install gym==0.26.2
pip install absl-py rich chex optax
pip install tensorflow-cpu==2.13.1 tensorboard==2.13.0 tensorflow-probability==0.21.0
pip install "typing-extensions<4.6.0"
pip install pandas python-dateutil click platformdirs importlib-resources
```

Messenger dependencies:

```shell
sudo apt-get install \
  libsdl-image1.2-dev libsdl-mixer1.2-dev libsdl-ttf2.0-dev \
  libsdl1.2-dev libsmpeg-dev subversion libportmidi-dev ffmpeg \
  libswscale-dev libavformat-dev libavcodec-dev libfreetype6-dev
git clone https://github.com/ahjwang/messenger-emma
pip install pbr -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
pip install -e messenger-emma
```

VLN dependencies:

```shell
cd VLN_learning/dynalang
conda activate dynalang-vln
conda install absl-py==1.4.0
conda env update -f env_vln.yml
conda install -c aihabitat -c conda-forge habitat-sim=0.1.7 headless
git clone https://github.com/jlin816/VLN-CE VLN_CE
git clone https://github.com/jlin816/habitat-lab habitat_lab
pip install colored
pip install datasets
```

### Step 3: Download the Datasets

Datasets: huggingface

Download the Matterport3D data into the VLN-CE directory (the download script requires Python 2.7):

```shell
cd /home/yehuo/VLN_learning/dynalang/VLN_CE
conda create -n py27 python=2.7
conda activate py27
python scripts/download_mp.py --task habitat -o VLN_CE/data/scene_datasets/mp3d/
cd VLN_CE/data/scene_datasets
unzip mp3d/v1/tasks/mp3d_habitat.zip
conda deactivate
```

### Step 4: Modify the Source Code

Files to modify:

- /home/yehuo/VLN_learning/dynalang/dynalang/train.py
- /home/yehuo/VLN_learning/dynalang/dynalang/embodied/envs/vln.py
- /home/yehuo/VLN_learning/dynalang/habitat_lab/habitat/tasks/vln/vln.py

```shell
cd VLN_learning/dynalang
sed -i 's/spaces.Discrete(0)/spaces.Discrete(1)/g' /home/yehuo/VLN_learning/dynalang/habitat_lab/habitat/tasks/vln/vln.py
```

## Run Commands

### ① HomeGrid

CPU run command:

```shell
cd VLN_learning/dynalang
conda activate dynalang-vln
WANDB_MODE=disabled JAX_PLATFORMS=cpu CUDA_VISIBLE_DEVICES="" python dynalang/train.py \
  --configs defaults debug \
  --task homegrid_task \
  --logdir ~/logdir/homegrid/debug_run_cpu \
  --jax.platform cpu \
  --jax.policy_devices 0 \
  --jax.train_devices 0 \
  --batch_size 1 \
  --batch_length 64 \
  --encoder.mlp_keys '^token$' \
  --decoder.mlp_keys '^token$'
```

GPU run command (use when GPU memory is insufficient; if it is sufficient, just use the project's original command):

```shell
cd VLN_learning/dynalang
conda activate dynalang-vln
export HF_ENDPOINT=https://hf-mirror.com
export LD_LIBRARY_PATH=/home/yehuo/anaconda3/envs/dynalang-vln/lib:$LD_LIBRARY_PATH
export XLA_PYTHON_CLIENT_ALLOCATOR=platform
unset XLA_PYTHON_CLIENT_PREALLOCATE
unset XLA_PYTHON_CLIENT_MEM_FRACTION
export XLA_FLAGS=--xla_gpu_strict_conv_algorithm_picker=false
export JAX_PLATFORMS=cuda
WANDB_MODE=disabled \
CUDA_VISIBLE_DEVICES=0 \
python dynalang/train.py \
  --configs defaults debug \
  --task homegrid_task \
  --logdir ~/logdir/homegrid/debug_run_gpu \
  --jax.platform cuda \
  --jax.policy_devices 0 \
  --jax.train_devices 0 \
  --batch_size 1 \
  --batch_length 6 \
  --encoder.mlp_keys '^token$' \
  --decoder.mlp_keys '^token$' \
  --run.log_every 999999 \
  --run.eval_every 999999 \
  --run.save_every 999999
```

TensorBoard visualization:

```shell
conda activate dynalang-vln
tensorboard --logdir /home/yehuo/logdir/homegrid/debug_run_cpu_0 --port 6006
# or, for the GPU run:
tensorboard --logdir /home/yehuo/logdir/homegrid/debug_run_gpu_0 --port 6006
```

### ② Messenger

CPU run command:

```shell
cd VLN_learning/dynalang
conda activate dynalang-vln
WANDB_MODE=disabled \
JAX_PLATFORMS=cpu \
CUDA_VISIBLE_DEVICES="" \
sh scripts/run_messenger_s1.sh test_run 0 42 \
  --configs defaults debug \
  --jax.platform cpu \
  --jax.policy_devices 0 \
  --jax.train_devices 0 \
  --batch_size 1 \
  --batch_length 64 \
  --encoder.mlp_keys '^token$' \
  --decoder.mlp_keys '^token$'
```

GPU run command (use when GPU memory is insufficient; if it is sufficient, just use the project's original command):

```shell
cd VLN_learning/dynalang
conda activate dynalang-vln
# 1. Stop JAX from grabbing all GPU memory up front; allocate on demand instead
export XLA_PYTHON_CLIENT_PREALLOCATE=false
export XLA_PYTHON_CLIENT_ALLOCATOR=platform
# 2. As the error message suggests, disable strict conv algorithm picking (saves memory)
export XLA_FLAGS=--xla_gpu_strict_conv_algorithm_picker=false
# 3. Give the virtual environment's library path top priority
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
WANDB_MODE=disabled \
sh scripts/run_messenger_s2.sh train_from_scratch 0 42 \
  --configs debug \
  --batch_size 1 \
  --batch_length 8 \
  --jax.prealloc False
```

TensorBoard visualization:

```shell
conda activate dynalang-vln
tensorboard --logdir /home/yehuo/logdir/messenger/s1_test_run_42 --port 6006
# or, for stage 2:
tensorboard --logdir /home/yehuo/logdir/messenger/s2_train_from_scratch_42 --port 6006
```

### ③ VLN

CPU run command:

```shell
cd VLN_learning/dynalang
conda activate dynalang-vln
# 1. Hide the physical GPUs entirely so JAX cannot get confused about devices
export CUDA_VISIBLE_DEVICES=""
export JAX_PLATFORMS=cpu
unset XLA_PYTHON_CLIENT_PREALLOCATE
unset XLA_PYTHON_CLIENT_MEM_FRACTION
# 2. Force Habitat to fall back to pure software rendering (computed entirely on the CPU)
export HABITAT_SIM_BACKEND=cpu
export LIBGL_ALWAYS_SOFTWARE=1
export GALLIUM_DRIVER=llvmpipe
export MESA_GL_VERSION_OVERRIDE=3.3
export MESA_GLSL_VERSION_OVERRIDE=330
export PYTHONPATH=$PYTHONPATH:$(pwd)/VLN_CE
WANDB_MODE=disabled \
sh scripts/run_vln.sh vln_task debug_cpu 0 \
  --jax.platform cpu \
  --envs.amount 1 \
  --run.actor_batch 1 \
  --envs.parallel none \
  --run.script train
```

TensorBoard visualization:

```shell
conda activate dynalang-vln
tensorboard --logdir /home/yehuo/logdir/vln_task_0 --port 6006
```

### ④ LangRoom

Switch to the langroom branch first:

```shell
git stash
git checkout langroom
# to return to the previous branch later:
git checkout -
git stash pop
```

CPU run command:

```shell
cd VLN_learning/dynalang
conda activate dynalang-vln
# 1. Hide the physical GPUs so lower-level scripts cannot call them by mistake
export CUDA_VISIBLE_DEVICES=""
export JAX_PLATFORMS=cpu
# 2. Launch pure-CPU single-process training
#    Arguments: EXP_NAME="debug_langroom_cpu" | GPU_IDS="" (left empty) | SEED="0"
WANDB_MODE=disabled \
sh run_langroom.sh debug_langroom_cpu "" 0 \
  --jax.platform cpu \
  --envs.amount 1 \
  --run.actor_batch 1 \
  --envs.parallel none
```

TensorBoard visualization:

```shell
conda activate dynalang-vln
tensorboard --logdir /home/yehuo/logdir/langroom --port 6006
```

### ⑤ Text Pretraining

First run scripts/run_messenger_s2.sh from step ② Messenger; text pretraining reads the episodes it saves.

GPU run command (use when GPU memory is insufficient; if it is sufficient, just use the project's original command):

```shell
cd VLN_learning/dynalang
conda activate dynalang-vln
# Environment variables
export HF_ENDPOINT=https://hf-mirror.com
export LD_LIBRARY_PATH=/home/yehuo/anaconda3/envs/dynalang-vln/lib:$LD_LIBRARY_PATH
export XLA_PYTHON_CLIENT_PREALLOCATE=false
export XLA_PYTHON_CLIENT_ALLOCATOR=platform
export XLA_FLAGS=--xla_gpu_strict_conv_algorithm_picker=false
# Append --jax.platform cuda at the end
WANDB_MODE=disabled \
CUDA_VISIBLE_DEVICES=0 \
sh scripts/pretrain_text.sh text_pretrain_debug 0 42 \
  roneneldan/TinyStories \
  /home/yehuo/logdir/messenger/s2_train_from_scratch_42/episodes \
  --configs debug \
  --batch_size 1 \
  --batch_length 16 \
  --jax.prealloc False \
  --jax.platform cuda
```

TensorBoard visualization:

```shell
conda activate dynalang-vln
tensorboard --logdir /home/yehuo/logdir/textpt/text_pretrain_debug_42 --port 6006
```

### ⑥ Text Finetuning

GPU run command (use when GPU memory is insufficient; if it is sufficient, just use the project's original command):

```shell
cd VLN_learning/dynalang
conda activate dynalang-vln
# 1. GPU and network
export HF_ENDPOINT=https://hf-mirror.com
export LD_LIBRARY_PATH=/home/yehuo/anaconda3/envs/dynalang-vln/lib:$LD_LIBRARY_PATH
export XLA_PYTHON_CLIENT_PREALLOCATE=false
export XLA_PYTHON_CLIENT_ALLOCATOR=platform
export XLA_FLAGS=--xla_gpu_strict_conv_algorithm_picker=false
export JAX_PLATFORMS=cuda
# 2. Launch text pretraining (append --jax.platform cuda to override the debug default)
WANDB_MODE=disabled \
CUDA_VISIBLE_DEVICES=0 \
sh scripts/pretrain_text.sh text_pretrain_debug 0 42 \
  roneneldan/TinyStories \
  /home/yehuo/logdir/messenger/s2_train_from_scratch_42/episodes \
  --configs debug \
  --batch_size 1 \
  --batch_length 16 \
  --jax.prealloc False \
  --jax.platform cuda
```

TensorBoard visualization:

```shell
conda activate dynalang-vln
tensorboard --logdir /home/yehuo/logdir/textpt/text_pretrain_debug_42 --port 6006
```
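## Appendix: Small Verification Sketches

Step 1 relies on the CUDA toolkit version matching the `+cu117` PyTorch wheels. A small stdlib-only sketch for pulling the toolkit release out of `nvcc -V` output, so the check can be scripted instead of eyeballed; the helper name `parse_cuda_version` and the sample output string are mine, not part of the project:

```python
import re

# Example `nvcc -V` output for CUDA 11.7 (the version Step 1 installs)
SAMPLE = """nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0"""

def parse_cuda_version(nvcc_output: str) -> str:
    """Extract the toolkit release (e.g. '11.7') from `nvcc -V` output."""
    m = re.search(r"release (\d+\.\d+)", nvcc_output)
    if m is None:
        raise ValueError("no CUDA release version found in nvcc output")
    return m.group(1)

print(parse_cuda_version(SAMPLE))  # 11.7
```

In a live shell you would feed it the captured output of `nvcc -V` and compare the result against the `cu117` suffix of the installed wheels.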
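Step 4's `sed` one-liner is a plain string substitution, so the same patch can be applied from Python when `sed` is unavailable. A minimal sketch; the function name `patch_discrete_zero` is mine, and only the substitution itself comes from the notes above:

```python
def patch_discrete_zero(source: str) -> str:
    """Replace spaces.Discrete(0) with spaces.Discrete(1),
    the same edit as the sed command in Step 4."""
    return source.replace("spaces.Discrete(0)", "spaces.Discrete(1)")

# Illustrative line of the kind found in habitat_lab/habitat/tasks/vln/vln.py
snippet = "self.observation_space = spaces.Discrete(0)"
print(patch_discrete_zero(snippet))  # self.observation_space = spaces.Discrete(1)
```

To patch the actual file, read it with `open(...).read()`, pass the text through the function, and write it back.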
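The low-memory GPU commands above all set the same three XLA variables and drop `XLA_PYTHON_CLIENT_MEM_FRACTION`. A sketch of collecting them into one environment dict for launching training as a subprocess, so the settings live in one place; the helper name `low_memory_xla_env` is mine, while the variable names and values are exactly those used in the run commands:

```python
import os
from typing import Dict, Optional

def low_memory_xla_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build an environment with the memory-saving XLA settings
    used by the GPU run commands in these notes."""
    env = dict(os.environ if base is None else base)
    env["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"   # allocate on demand, not up front
    env["XLA_PYTHON_CLIENT_ALLOCATOR"] = "platform"  # use the platform allocator
    env["XLA_FLAGS"] = "--xla_gpu_strict_conv_algorithm_picker=false"
    env.pop("XLA_PYTHON_CLIENT_MEM_FRACTION", None)  # let the allocator decide
    return env

env = low_memory_xla_env({"XLA_PYTHON_CLIENT_MEM_FRACTION": "0.9"})
print(env["XLA_PYTHON_CLIENT_ALLOCATOR"])  # platform
```

The resulting dict can be passed as `env=` to `subprocess.run([...])` when wrapping the training scripts.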