Base

monitor_gpu_temperature(threshold:int, sleep_seconds:int, gpu_id:int)

Checks the GPU temperature and sleeps if it exceeds a threshold.

Parameters:

threshold (int): Temperature in Celsius above which the function sleeps.
sleep_seconds (int): Number of seconds to sleep when the threshold is exceeded.
gpu_id (int): ID of the GPU to monitor.

Returns:

None: If the temperature exceeds the threshold, the function prints a warning and sleeps for sleep_seconds seconds.

Labels:

src_PyThon_NeuralNetwork_trainer_Base_monitor_gpu_temperature
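A minimal sketch of this check, assuming the temperature is read through nvidia-smi. The read_gpu_temperature helper and the read_temp and sleep_fn parameters are hypothetical additions, not part of the documented signature; they let the logic run without a GPU.

```python
import subprocess
import time
from typing import Callable


def read_gpu_temperature(gpu_id: int) -> int:
    """Hypothetical helper: query one GPU's core temperature (Celsius) via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "-i", str(gpu_id),
         "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())


def monitor_gpu_temperature(
    threshold: int,
    sleep_seconds: int,
    gpu_id: int,
    read_temp: Callable[[int], int] = read_gpu_temperature,  # hypothetical hook
    sleep_fn: Callable[[float], None] = time.sleep,          # hypothetical hook
) -> None:
    temperature = read_temp(gpu_id)
    if temperature > threshold:
        # Warn, then pause so the GPU can cool down before training resumes.
        print(f"Warning: GPU {gpu_id} at {temperature} C exceeds {threshold} C; "
              f"sleeping for {sleep_seconds} s.")
        sleep_fn(sleep_seconds)
```

Calling it once per batch or per epoch inside a training loop throttles training only while the GPU is running hot.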

class AverageMeter

Computes and stores the average and current value.

Author:

- Farshad Sangari
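A sketch of the common AverageMeter pattern, for illustration. The reset and update method names and the val, sum, count, avg attributes are assumptions based on how such meters are usually written, not confirmed by this reference.

```python
class AverageMeter:
    """Tracks the most recent value and the running (weighted) average."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.val = 0.0    # most recent value
        self.sum = 0.0    # weighted sum of all values seen
        self.count = 0    # total weight, e.g. number of samples
        self.avg = 0.0    # running average = sum / count

    def update(self, val: float, n: int = 1) -> None:
        # `val` is assumed to be an average over `n` samples (e.g. a batch loss),
        # so it is weighted by `n` when accumulated.
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count
```

For example, two batches of 4 samples with mean losses 2.0 and 4.0 give avg == 3.0 and val == 4.0.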

Methods:

create_save_dir()

Create a timestamped directory for saving model checkpoints and reports.

Labels:

src_PyThon_NeuralNetwork_trainer_Base_create_save_dir
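A minimal sketch, assuming a YYYYMMDD_HHMMSS timestamp and separate ckpt and report subfolders. The base_dir parameter, the timestamp format, and the subfolder names are all hypothetical; the documented function takes no arguments.

```python
import os
from datetime import datetime


def create_save_dir(base_dir: str = "checkpoints") -> str:  # `base_dir` is hypothetical
    # One folder per run, named by the current local time.
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    save_dir = os.path.join(base_dir, timestamp)
    # Assumed layout: checkpoints and reports kept in separate subfolders.
    os.makedirs(os.path.join(save_dir, "ckpt"), exist_ok=True)
    os.makedirs(os.path.join(save_dir, "report"), exist_ok=True)
    return save_dir
```

Two runs started within the same second would share a directory under this naming scheme; exist_ok=True keeps that from raising.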

save_model(file_path:str, file_name:str, model:nn.Module, optimizer:Optional[nn.Module])

Save model and optimizer state.

Parameters:

file_path (str): Directory to save the model
file_name (str): Name of the file to save the model
model (nn.Module): PyTorch model to save
optimizer (Optional[nn.Module]): Optimizer to save (if available)

Returns:

None: Saves the model state (and the optimizer state, if provided) to the specified file.

Author:

- Yassin Riyazi

- Farshad Sangari

Labels:

src_PyThon_NeuralNetwork_trainer_Base_save_model
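A minimal sketch of saving both state dicts in one checkpoint file. The dictionary keys "model_state_dict" and "optimizer_state_dict" are assumptions. The sketch annotates the optimizer as torch.optim.Optimizer, since PyTorch optimizers are not nn.Module subclasses.

```python
import os
from typing import Optional

import torch
import torch.nn as nn


def save_model(file_path: str, file_name: str, model: nn.Module,
               optimizer: Optional[torch.optim.Optimizer] = None) -> None:
    os.makedirs(file_path, exist_ok=True)
    # Assumed key names; load_model must read the same keys back.
    checkpoint = {"model_state_dict": model.state_dict()}
    if optimizer is not None:
        checkpoint["optimizer_state_dict"] = optimizer.state_dict()
    torch.save(checkpoint, os.path.join(file_path, file_name))
```

Saving state dicts rather than whole objects keeps the checkpoint independent of the class's import path.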

load_model(ckpt_path:Union[str, os.PathLike], model:nn.Module, optimizer:Optional[nn.Module])

Load model and optimizer state from a checkpoint.

Parameters:

ckpt_path (Union[str, os.PathLike]): Path to the checkpoint file
model (nn.Module): PyTorch model to load state into
optimizer (Optional[nn.Module]): Optimizer to load state into (if available)

Returns:

optimizer (Optional[nn.Module]): Optimizer with loaded state (if provided)

Labels:

src_PyThon_NeuralNetwork_trainer_Base_load_model
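A matching sketch for restoring a checkpoint, assuming it stores "model_state_dict" and, optionally, "optimizer_state_dict" keys (hypothetical names). Loading with map_location="cpu" means the checkpoint does not require the device it was saved from.

```python
import os
from typing import Optional, Union

import torch
import torch.nn as nn


def load_model(ckpt_path: Union[str, os.PathLike], model: nn.Module,
               optimizer: Optional[torch.optim.Optimizer] = None
               ) -> Optional[torch.optim.Optimizer]:
    # Load tensors onto the CPU first; move the model to its device afterwards.
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state_dict"])  # assumed key name
    if optimizer is not None and "optimizer_state_dict" in checkpoint:
        optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return optimizer
```

The model is updated in place, so only the optimizer (or None) needs to be returned.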

normal_accuracy(pred:torch.Tensor, labels:torch.Tensor)

Calculate the accuracy of predictions against true labels.

Parameters:

pred (torch.Tensor): Predictions from the model
labels (torch.Tensor): True labels

Returns:

float: Accuracy as a percentage

Labels:

src_PyThon_NeuralNetwork_trainer_Base_normal_accuracy
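A minimal sketch, assuming pred holds class scores of shape (N, C) and labels holds integer class indices of shape (N,). The reference does not state the expected shapes.

```python
import torch


def normal_accuracy(pred: torch.Tensor, labels: torch.Tensor) -> float:
    # Assumed shapes: pred (N, C) scores or logits, labels (N,) class indices.
    predicted = pred.argmax(dim=1)                  # highest-scoring class per row
    correct = (predicted == labels).sum().item()    # number of matches
    return 100.0 * correct / labels.size(0)         # percentage
```

With four samples of which three are predicted correctly, the result is 75.0.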

teacher_forcing_decay(epoch:int, num_epochs:int)

Calculate the teacher forcing ratio for a given epoch.

Parameters:

epoch (int): Current epoch number
num_epochs (int): Total number of epochs

Returns:

float: Teacher forcing ratio for the current epoch

Labels:

src_PyThon_NeuralNetwork_trainer_Base_teacher_forcing_decay
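The decay schedule itself is not documented. A plausible sketch, assuming a linear decay from 1.0 at epoch 0 to 0.0 at num_epochs, clamped to [0, 1]:

```python
def teacher_forcing_decay(epoch: int, num_epochs: int) -> float:
    # Assumed schedule: linear decay, 1.0 at epoch 0 down to 0.0 at num_epochs.
    if num_epochs <= 0:
        raise ValueError("num_epochs must be positive")
    ratio = 1.0 - epoch / num_epochs
    return min(1.0, max(0.0, ratio))  # clamp for epochs outside [0, num_epochs]
```

In a sequence decoder, the ratio is typically compared against random.random() at each step to decide whether to feed the ground-truth token or the model's own previous prediction.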

HardNegativeMiningPostHandler(args:tuple[torch.Tensor, ...])

Post-processing handler for hard negative mining. It can be customized to save or visualize hard negative samples. Currently it does neither; it only converts the first tensor in args to a NumPy array and returns it.

Parameters:

args (tuple[torch.Tensor, ...]): Tuple containing the data and possibly other tensors

Returns:

np.ndarray: Processed data, currently just returns the first tensor in args as a numpy array

Labels:

src_PyThon_NeuralNetwork_trainer_Base_HardNegativeMiningPostHandler
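A sketch of the documented current behavior, returning the first tensor as a NumPy array. The detach().cpu() chain is an assumption about how the conversion is done; .numpy() fails on tensors that require grad or live on a GPU without it.

```python
import numpy as np
import torch


def HardNegativeMiningPostHandler(args: tuple[torch.Tensor, ...]) -> np.ndarray:
    # Extension point: saving or visualizing hard samples would go here.
    # detach() drops autograd history and cpu() moves CUDA tensors to host
    # memory, both required before calling .numpy().
    return args[0].detach().cpu().numpy()
```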

hard_negative_mining(model:nn.Module, dataloader:torch.utils.data.DataLoader, criterion:nn.Module, device:str, num_hard_samples:int)

Select the hardest examples (those with the highest loss) from the dataset and return a new DataLoader containing only those examples.

Parameters:

model (nn.Module): The trained model to evaluate
dataloader (torch.utils.data.DataLoader): DataLoader for the dataset
criterion (nn.Module): Loss function to compute the loss
device (str): Device to run the model on ('cuda' or 'cpu')
num_hard_samples (int): Number of hard examples to select

Returns:

torch.utils.data.DataLoader: DataLoader containing only the hard examples

Todo:

- Add handler for different model types (e.g., CNN, LSTM)

Labels:

src_PyThon_NeuralNetwork_trainer_Base_hard_negative_mining
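A minimal sketch of loss-based selection, assuming each batch is an (inputs, targets) pair (for an autoencoder the target is usually the input itself) and that the loader visits the dataset in order (shuffle=False, no drop_last), so that running sample positions match dataset indices. Per-sample losses are computed by slicing, which avoids depending on the criterion's reduction setting.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset


def hard_negative_mining(model: nn.Module, dataloader: DataLoader,
                         criterion: nn.Module, device: str,
                         num_hard_samples: int) -> DataLoader:
    model.eval()
    losses = []
    with torch.no_grad():
        for inputs, targets in dataloader:
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            # Score each sample on its own slice so the loss is per-sample.
            for i in range(inputs.size(0)):
                losses.append(criterion(outputs[i:i + 1], targets[i:i + 1]).item())
    k = min(num_hard_samples, len(losses))
    # Dataset indices of the k highest-loss samples (valid only for an
    # in-order loader, as assumed above).
    hard_indices = torch.tensor(losses).topk(k).indices.tolist()
    return DataLoader(Subset(dataloader.dataset, hard_indices),
                      batch_size=dataloader.batch_size, shuffle=True)
```

Looping per sample is slow for large datasets; passing a criterion built with reduction="none" and reducing over the non-batch dimensions would score a whole batch at once.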

train(model:nn.Module, train_loader:torch.utils.data.DataLoader, val_loader:torch.utils.data.DataLoader, criterion:nn.Module, optimizer:nn.Module, epochs:int, device:str, model_name:str, ckpt_save_freq:int, ckpt_save_path:Union[str, os.PathLike], ckpt_path:Union[str, os.PathLike], report_path:Union[str, os.PathLike], lr_scheduler:torch.optim.lr_scheduler, Validation_save_threshold:float, use_hard_negative_mining:bool, hard_mining_freq:int, num_hard_samples:int, GPU_temperature:int, GPU_overheat_sleep:float)

Standard training loop for autoencoder models, with optional hard negative mining.

Parameters:

model (nn.Module): PyTorch model
train_loader (torch.utils.data.DataLoader): DataLoader for training data
val_loader (torch.utils.data.DataLoader): DataLoader for validation data
criterion (nn.Module): Loss function
optimizer (nn.Module): Optimizer
epochs (int): Number of training epochs
device (str): Device to train on ('cuda' or 'cpu')
model_name (str): Name of the model for saving checkpoints
ckpt_save_freq (int): Frequency of checkpoint saving (in epochs)
ckpt_save_path (Union[str, os.PathLike]): Path to save checkpoints
ckpt_path (Union[str, os.PathLike]): Path to load checkpoint from (if resuming training)
report_path (Union[str, os.PathLike]): Path to save training report
lr_scheduler (torch.optim.lr_scheduler): Learning rate scheduler
Validation_save_threshold (float): Threshold for saving best validation model
use_hard_negative_mining (bool): Whether to use hard negative mining
hard_mining_freq (int): Frequency of hard negative mining (in epochs)
num_hard_samples (int): Number of hard examples to select
GPU_temperature (int): Temperature threshold (in Celsius) for GPU monitoring
GPU_overheat_sleep (float): Sleep time in seconds if GPU temperature exceeds threshold

Returns:

report (pd.DataFrame): Training report with metrics

Todo:

- Plot training loss over epochs in real time, either in the terminal or in a separate window

Author:

- Yassin Riyazi

- Farshad Sangari

Labels:

src_PyThon_NeuralNetwork_trainer_Base_train
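A reduced sketch of the loop's core, covering only a subset of the documented parameters. GPU monitoring, checkpoint resuming, Validation_save_threshold, and hard negative mining are omitted and marked by a comment. The run_epoch helper, the checkpoint file names, and the report columns are hypothetical, and lr_scheduler is assumed to be epoch-based (e.g. StepLR) rather than ReduceLROnPlateau, which needs a metric.

```python
import os

import pandas as pd
import torch
import torch.nn as nn
from torch.utils.data import DataLoader


def run_epoch(model, loader, criterion, device, optimizer=None) -> float:
    """Hypothetical helper: one pass over `loader`; trains if an optimizer is given."""
    training = optimizer is not None
    model.train(training)
    total, count = 0.0, 0
    with torch.set_grad_enabled(training):
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            loss = criterion(model(inputs), targets)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total += loss.item() * inputs.size(0)   # sample-weighted sum
            count += inputs.size(0)
    return total / count


def train(model: nn.Module, train_loader: DataLoader, val_loader: DataLoader,
          criterion: nn.Module, optimizer: torch.optim.Optimizer, epochs: int,
          device: str, model_name: str, ckpt_save_freq: int,
          ckpt_save_path: str, lr_scheduler=None) -> pd.DataFrame:
    model.to(device)
    os.makedirs(ckpt_save_path, exist_ok=True)
    rows, best_val = [], float("inf")
    for epoch in range(1, epochs + 1):
        # Omitted here: GPU temperature check and, every hard_mining_freq
        # epochs, swapping in a loader of hard samples.
        train_loss = run_epoch(model, train_loader, criterion, device, optimizer)
        val_loss = run_epoch(model, val_loader, criterion, device)
        if lr_scheduler is not None:
            lr_scheduler.step()
        if epoch % ckpt_save_freq == 0:   # periodic checkpoint
            torch.save({"epoch": epoch, "model_state_dict": model.state_dict(),
                        "optimizer_state_dict": optimizer.state_dict()},
                       os.path.join(ckpt_save_path, f"{model_name}_epoch{epoch}.pt"))
        if val_loss < best_val:           # keep the best validation model
            best_val = val_loss
            torch.save({"model_state_dict": model.state_dict()},
                       os.path.join(ckpt_save_path, f"{model_name}_best.pt"))
        rows.append({"epoch": epoch, "train_loss": train_loss, "val_loss": val_loss})
    return pd.DataFrame(rows)
```

For an autoencoder, passing a dataset whose targets equal its inputs (e.g. TensorDataset(x, x)) makes the same loop train reconstruction.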