QMUL EECS GPU Service
Instruction
Changjae Oh
Updates
• 29 Jun 2022
• Combined the existing document for 21/22 Final Project and the current GPU option (JP module)
• Added useful tips from former JP students (Huaixi Tang (19/20), Hengyi Wang, Chaoran Zhu (20/21), Yehan Fan
(21/22))
• 04 Jul 2022
• Conda create revised
• 11 May 2023
• Conda create revised
0. Spec
• NVIDIA A40 (48 GB) x 4
• Storage: 50 GB
• This space is not meant for large datasets
• We have separate storage that is ONLY for downloaded datasets
• To download a dataset there, email me with the dataset details (why you need it, its size) and the download instructions
1. Set up your EECS account
• You first need to set up an EECS account (different from the QMUL/QMPlus account). The EECS
account gives you access to our EECS JupyterHub, where you can create a GPU-enabled virtual
server for your project.
• You should have received your username and a one-time account unlocking code in your QMUL
email. Now set up your own password as follows:
• Visit https://www.eecs.qmul.ac.uk/password (Firefox or Chrome recommended)
• Read the instructions on screen carefully
• Enter your username and unlocking code
• Think of a new EECS password and enter it twice (please make sure that you will remember it, not just save it
in the browser)
2. Log in to EECS JupyterHub
• TO WATCH A DEMONSTRATION VIDEO:
• https://media.qmplus.qmul.ac.uk/media/BBC6521+EECS+JuypterHub+ML+Server+Usage/1_lnslk2n3
• Visit https://jhub.eecs.qmul.ac.uk/ (Firefox or Chrome recommended)
• Enter your EECS username and password. You will then be directed to the Hub Control Panel,
where you can find a list of servers. Locate "Project (JP)" and click "Start".
• The server takes a short moment to start. Please be patient and wait.
• You will then be directed to a workspace with the file browser, Launcher and various menus.
3. Transferring files to/from servers
• To upload files, simply drag and drop them into the file browser of the
workspace. To download a file, right-click it and choose "Download". This is similar
to most online Integrated Development Environments (IDEs).
• BUT this can be slow if you are outside the UK.
• If you need any large data, download it directly onto the server from a terminal instead of uploading it
• You can use wget (or other commands) (google "wget command linux")
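A minimal sketch of the direct-download step, assuming wget is available on the server. The helper name download_dataset, the URL and the target folder are all placeholders for illustration, not real locations:

```shell
# Sketch: download a dataset straight onto the server.
# download_dataset is a hypothetical helper; the URL and folder in the
# usage line at the bottom are placeholders.
download_dataset() {
    if [ "$#" -ne 2 ]; then
        echo "usage: download_dataset URL TARGET_DIR" >&2
        return 1
    fi
    mkdir -p "$2" || return 1
    # -c resumes a partially downloaded file; -P sets the destination folder.
    wget -c -P "$2" "$1"
}

# Usage (placeholders):
# download_dataset "https://example.com/my_dataset.tar.gz" "$HOME/datasets"
```

Because of -c, running the same command again after a dropped connection should continue the download rather than start over. Remember the 50G storage limit before fetching anything large.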
4. Test your GPU
• You can now import the basic usage notebook "bbc6521_gpu.ipynb". Then go through the
notebook and learn how to work with the server.
• Also, check out: BBC6521_GPU_Notebook.pdf
• Detailed documentation for JupyterHub can be found at: https://jupyter.org/documentation
• The example code may not work if the Notebook version has changed! If so, skip this step.
5. A more practical way
• For deep learning, using the terminal can be easier than using Jupyter Notebook
• Disconnections happen very often if you use Jupyter Notebook.
• Use the terminal for setting up your environment (e.g. anaconda, installing libraries with pip install) and
for training your model
• Go to Terminal and type nvidia-smi to check if you can see the GPU status
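As a rough sketch of that first check, the snippet below prints one line per GPU with its memory use, which makes it easier to spot an idle card than the full nvidia-smi table. The variable GPU_TOOL_FOUND is only an illustrative name:

```shell
# Sketch: list each GPU with its current memory use.
if command -v nvidia-smi >/dev/null 2>&1; then
    GPU_TOOL_FOUND=yes
    # One CSV row per GPU: index, model name, memory used, memory total.
    nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv \
        || echo "nvidia-smi is installed but could not query the GPUs"
else
    GPU_TOOL_FOUND=no
    echo "nvidia-smi not found: check that this terminal is running on the GPU server"
fi
```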
Tips: Some useful Linux commands
• Check the GPU usage
• nvidia-smi
• top # To check which users are running processes
• Keep your job running after you log out
• nohup python -u [path-to-your-python-code] > [path-to-your-output].out 2>&1 &
• Choose the GPUs you want to use
• export CUDA_VISIBLE_DEVICES=0
• or 1 or 2 or 3 (the GPU numbers shown by nvidia-smi). Do not put a space after the "=".
• Do not use all GPUs at once
• Note that sudo commands are not available on the server
• If you need something installed with sudo, discuss it with me first,
then ask Dr Matthew Tang (cc me)
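Putting the tips above together, a typical session might look like the sketch below. The file names train.py and train.out and the choice of GPU 1 are assumptions for illustration; use your own script and whichever GPU looks idle in nvidia-smi:

```shell
# Sketch: run a training script on one GPU and keep it alive after logout.
# train.py and train.out are hypothetical file names.

# Expose only GPU 1 to anything started from this shell.
export CUDA_VISIBLE_DEVICES=1

# nohup ignores the hangup signal sent at logout; -u makes Python write its
# output unbuffered so the log updates promptly; 2>&1 sends errors to the same file.
nohup python -u train.py > train.out 2>&1 &
JOB_PID=$!
echo "Training started with PID $JOB_PID"

# Follow the log (Ctrl+C stops following, not the job):  tail -f train.out
# Stop the job early if needed:                          kill "$JOB_PID"
```

Note that inside the job the selected card is renumbered from 0, so with CUDA_VISIBLE_DEVICES=1 your code sees a single GPU as device 0.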
Tips: Conda environment
• You may have heard about Anaconda, a platform for managing data science packages.
• You can create a virtual environment which includes the libraries with the version you want to install
• See this webpage: https://towardsdatascience.com/setting-up-a-new-pytorch-deep-learning-environment-313d8d1c2df0 (You don't need to install Anaconda on QM GPU as it is already installed, so start reading the above
website from "Create a virtual environment")
• On QM GPU, conda create -n test_env is not enough, because an environment created this way can
be reset when the Jupyter server restarts! (This happened last year.)
• So, please make a folder inside your user folder and install the conda environment there. Commands
like these should work:
• Make a folder to store conda environments: mkdir test_cj_conda_env
• Create a conda env.: conda create --prefix ./test_cj_conda_env/testenv python=3.6
• Here the folder is named test_cj_conda_env and the conda environment installed in it is named testenv.
It won't be deleted even if the server is reset.
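The two commands above, plus activation, can be sketched as follows. This assumes the folder lives directly in your home directory. ENV_DIR is just an illustrative variable name, and python=3.6 simply follows the example above, so pick the version your project needs:

```shell
# Sketch: create and use a conda environment stored in your own folder.
ENV_DIR="$HOME/test_cj_conda_env/testenv"
mkdir -p "$HOME/test_cj_conda_env"

if command -v conda >/dev/null 2>&1; then
    # -y answers the confirmation prompt automatically.
    conda create -y --prefix "$ENV_DIR" python=3.6
    # Environments created with --prefix are activated by path, not by name.
    conda activate "$ENV_DIR"
    python --version
else
    echo "conda not found in this shell"
fi
```

After a server restart the environment folder should still be there, so only the conda activate line needs to be run again. If conda activate complains that the shell has not been configured, running conda init bash once and reopening the terminal may help.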
More tips welcome!
• Note that some of these instructions come from your seniors (JP final project in 20/21, 21/22) and from me, after
we struggled many times
• (Thanks Huaixi Tang, Hengyi Wang, Chaoran Zhu, Yehan Fan!)
• Please share your tips for using QM GPUs more easily and contribute to this manual!
• E.g. I hope one of you comes back to me with "HOW_TO_USE_TENSORBOARD" ☺