A caveat of running TensorFlow on a multi-GPU system such as Hydra is that, by default, a TensorFlow session will allocate all GPU memory on all GPUs, even if you only use a single GPU! A better usage pattern is to launch multiple jobs in parallel, each one using a subset of the available GPUs.
Before you start any TensorFlow session, you should first run nvidia-smi to see which GPUs are being utilized, then select an idle GPU and target it with the environment variable CUDA_VISIBLE_DEVICES.
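A quick way to check this is to query per-GPU load and memory usage directly; the query fields below are just one possible selection:

```bash
# Show index, utilization and memory usage for every GPU; pick one that is idle.
nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv
```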
For example, to select GPU 2 (/dev/nvidia2) for your TensorFlow session, first set the environment variable before launching your job (train.py below stands in for your own script):
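```bash
# Make only physical GPU 2 visible to this shell and its child processes;
# inside TensorFlow it will be renumbered as GPU 0.
export CUDA_VISIBLE_DEVICES=2
# Reduce TensorFlow's log verbosity (explained below).
export TF_CPP_MIN_LOG_LEVEL=1
# train.py stands in for your own training script.
python train.py
```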
Here we also use the environment variable TF_CPP_MIN_LOG_LEVEL to filter TensorFlow logs. It defaults to 0 (all logs shown), but can be set to 1 to filter out INFO logs, 2 to additionally filter out WARNING logs, and 3 to additionally filter out ERROR logs.
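Both variables can also be set inline for a single command instead of being exported; again, the script name is only a placeholder:

```bash
# Same effect as the exports above, but scoped to this one command.
CUDA_VISIBLE_DEVICES=2 TF_CPP_MIN_LOG_LEVEL=1 python train.py
```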
If you use nvidia-docker, you can use the environment variable NV_GPU to restrict which GPUs are exposed to the container. For example (the image name below is only an illustration):
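```bash
# Expose only GPU 2 to the container; inside it, nvidia-smi reports a single GPU.
NV_GPU=2 nvidia-docker run --rm tensorflow/tensorflow:latest-gpu nvidia-smi
```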