Base
monitor_gpu_temperature(threshold:int, sleep_seconds:int, gpu_id:int)
Checks the GPU temperature and sleeps if it exceeds a threshold.
Parameters:
- threshold (int): Temperature in Celsius above which the function sleeps.
- sleep_seconds (int): Number of seconds to sleep when the threshold is exceeded.
- gpu_id (int): ID of the GPU to monitor.
Returns:
- None: The function will print a warning and sleep if the temperature exceeds the threshold.
Labels:
src_PyThon_NeuralNetwork_trainer_Base_monitor_gpu_temperature
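The check-and-sleep behaviour described for monitor_gpu_temperature can be sketched as follows. This page does not say how the library reads the temperature, so the sketch assumes the `nvidia-smi` command-line tool is installed. The `read_temp` and `sleep_fn` arguments are hypothetical hooks, not part of the documented signature; they are added only so the logic can be exercised on a machine without a GPU.

```python
import subprocess
import time
from typing import Callable, Optional


def read_gpu_temperature(gpu_id: int) -> int:
    """Return the GPU temperature in Celsius (assumes nvidia-smi is on PATH)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits", "-i", str(gpu_id)],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())


def monitor_gpu_temperature_sketch(
    threshold: int,
    sleep_seconds: int,
    gpu_id: int,
    read_temp: Optional[Callable[[int], int]] = None,  # hypothetical test hook
    sleep_fn: Callable[[float], None] = time.sleep,    # hypothetical test hook
) -> None:
    """Print a warning and sleep once if the GPU is hotter than `threshold`."""
    temp = (read_temp or read_gpu_temperature)(gpu_id)
    if temp > threshold:
        print(f"Warning: GPU {gpu_id} is at {temp} C "
              f"(threshold {threshold} C); sleeping {sleep_seconds} s.")
        sleep_fn(sleep_seconds)
```

Calling it once per batch or per epoch keeps the overhead small, since each call spawns one `nvidia-smi` process.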
class AverageMeter
Computes and stores the average and current value.
Author:
- Farshad Sangari
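A running-average tracker of the kind AverageMeter describes is commonly written as below. This is a sketch of that common pattern, not the library's own code: the attribute names (`val`, `avg`, `sum`, `count`) and the `reset`/`update` methods are assumptions.

```python
class AverageMeterSketch:
    """Track the most recent value and the weighted running average."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.val = 0.0    # most recent value
        self.sum = 0.0    # weighted sum of all values seen so far
        self.count = 0    # total weight (e.g. number of samples)
        self.avg = 0.0    # running average = sum / count

    def update(self, val: float, n: int = 1) -> None:
        """Record `val`, observed over `n` samples (e.g. a batch mean)."""
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count
```

Passing the batch size as `n` keeps the average correct when the last batch of an epoch is smaller than the others.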
Methods:
create_save_dir()
Create a timestamped directory for saving model checkpoints and reports
Labels:
src_PyThon_NeuralNetwork_trainer_Base_create_save_dir
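One way to build such a timestamped directory is sketched below. The documented create_save_dir() takes no arguments, so the `base_dir` parameter, the timestamp format, and the returned path are all assumptions made for illustration.

```python
import os
from datetime import datetime


def create_save_dir_sketch(base_dir: str = "checkpoints") -> str:
    """Create <base_dir>/<YYYYmmdd_HHMMSS> and return its path."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    save_dir = os.path.join(base_dir, stamp)
    # exist_ok avoids an error if two runs start within the same second
    os.makedirs(save_dir, exist_ok=True)
    return save_dir
```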
save_model(file_path:str, file_name:str, model:nn.Module, optimizer:Optional[nn.Module])
Save model and optimizer state
Parameters:
- file_path (str): Directory to save the model
- file_name (str): Name of the file to save the model
- model (nn.Module): PyTorch model to save
- optimizer (Optional[nn.Module]): Optimizer to save (if available)
Returns:
- None: Saves the model state to the specified file
Author:
- Yassin Riyazi
- Farshad Sangari
Labels:
src_PyThon_NeuralNetwork_trainer_Base_save_model
load_model(ckpt_path:Union[str, os.PathLike], model:nn.Module, optimizer:Optional[nn.Module])
Load model and optimizer state from checkpoint
Parameters:
- ckpt_path (Union[str, os.PathLike]): Path to the checkpoint file
- model (nn.Module): PyTorch model to load state into
- optimizer (Optional[nn.Module]): Optimizer to load state into (if available)
Returns:
- optimizer (Optional[nn.Module]): Optimizer with loaded state (if provided)
Labels:
src_PyThon_NeuralNetwork_trainer_Base_load_model
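A save/load round trip along the lines of save_model and load_model might look like the sketch below. The checkpoint layout (a dict with "model" and "optimizer" keys) is an assumption, since this page does not document the file format, and the `_sketch` functions are illustrative stand-ins rather than the library's own code.

```python
import os

import torch
import torch.nn as nn


def save_model_sketch(file_path, file_name, model, optimizer=None):
    """Write model (and optionally optimizer) state dicts to one file."""
    os.makedirs(file_path, exist_ok=True)
    state = {"model": model.state_dict()}          # assumed key name
    if optimizer is not None:
        state["optimizer"] = optimizer.state_dict()  # assumed key name
    torch.save(state, os.path.join(file_path, file_name))


def load_model_sketch(ckpt_path, model, optimizer=None):
    """Restore state in place; return the optimizer (or None)."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    if optimizer is not None and "optimizer" in ckpt:
        optimizer.load_state_dict(ckpt["optimizer"])
    return optimizer
```

Saving state dicts rather than whole pickled objects keeps the checkpoint usable after the model class is moved or renamed.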
normal_accuracy(pred:torch.Tensor, labels:torch.Tensor)
Calculate the accuracy of predictions against true labels.
Parameters:
- pred (torch.Tensor): Predictions from the model
- labels (torch.Tensor): True labels
Returns:
- float: Accuracy as a percentage
Labels:
src_PyThon_NeuralNetwork_trainer_Base_normal_accuracy
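As an illustration of the accuracy computation, the sketch below assumes `pred` holds per-class scores of shape (N, C) and `labels` holds integer class indices of shape (N,). The page does not state the expected shapes, so treat both as assumptions.

```python
import torch


def normal_accuracy_sketch(pred: torch.Tensor, labels: torch.Tensor) -> float:
    """Return the percentage of rows whose highest-scoring class matches the label."""
    predicted = pred.argmax(dim=1)                 # assumed: scores along dim 1
    correct = (predicted == labels).sum().item()   # number of exact matches
    return 100.0 * correct / labels.size(0)
```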
teacher_forcing_decay(epoch:int, num_epochs:int)
Calculate the teacher forcing ratio for a given epoch.
Parameters:
- epoch (int): Current epoch number
- num_epochs (int): Total number of epochs
Returns:
- float: Teacher forcing ratio for the current epoch
Labels:
src_PyThon_NeuralNetwork_trainer_Base_teacher_forcing_decay
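The decay schedule itself is not specified on this page. Purely as an illustration, the sketch below assumes a linear decay from 1.0 at the first epoch (epoch 0) to 0.0 at the last; the library may use a different curve, such as an exponential or inverse-sigmoid schedule.

```python
def teacher_forcing_decay_sketch(epoch: int, num_epochs: int) -> float:
    """Linearly decay the teacher forcing ratio from 1.0 to 0.0 (assumed schedule)."""
    if num_epochs <= 1:
        return 1.0  # nothing to decay over
    ratio = 1.0 - epoch / (num_epochs - 1)
    return max(0.0, min(1.0, ratio))  # clamp in case epoch is out of range
```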
HardNegativeMiningPostHandler(args:tuple[torch.Tensor, ...])
Post-processing handler for hard negative mining. This function can be customized to save or visualize hard negative samples. Currently it only returns the first tensor in args as a NumPy array, but it can be extended as needed.
Parameters:
- args (tuple[torch.Tensor, ...]): Tuple containing the data and possibly other tensors
Returns:
- np.ndarray: Processed data, currently just returns the first tensor in args as a numpy array
Labels:
src_PyThon_NeuralNetwork_trainer_Base_HardNegativeMiningPostHandler
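The documented behaviour (return the first tensor as a NumPy array) can be sketched in a few lines. The `detach()` and `cpu()` calls are assumptions about what a safe conversion needs when the tensor lives on a GPU or carries gradients; the function name is illustrative.

```python
import numpy as np
import torch


def post_handler_sketch(args: tuple) -> np.ndarray:
    """Return the first tensor in `args` as a NumPy array."""
    data = args[0]
    # detach from the autograd graph and move to host memory before converting
    return data.detach().cpu().numpy()
```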
hard_negative_mining(model:nn.Module, dataloader:torch.utils.data.DataLoader, criterion:nn.Module, device:str, num_hard_samples:int)
Select the hardest examples (those with the highest loss) from the dataset and return a new DataLoader containing only those examples.
Parameters:
- model (nn.Module): The trained model to evaluate
- dataloader (torch.utils.data.DataLoader): DataLoader for the dataset
- criterion (nn.Module): Loss function to compute the loss
- device (str): Device to run the model on ('cuda' or 'cpu')
- num_hard_samples (int): Number of hard examples to select
Returns:
- torch.utils.data.DataLoader: DataLoader containing only the hard examples
Todo:
- Add handler for different model types (e.g., CNN, LSTM)
Labels:
src_PyThon_NeuralNetwork_trainer_Base_hard_negative_mining
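The selection step can be sketched as below. The sketch rests on several assumptions that this page does not confirm: batches are (input, target) pairs, the criterion was built with reduction="none" so it yields per-element losses, and the incoming loader iterates in dataset order (shuffle=False) so that loss positions map to dataset indices.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset, TensorDataset


def hard_negative_mining_sketch(model, dataloader, criterion, device, num_hard_samples):
    """Return a DataLoader over the `num_hard_samples` highest-loss examples."""
    model.eval()
    losses = []
    with torch.no_grad():
        for inputs, targets in dataloader:  # assumed batch layout
            inputs, targets = inputs.to(device), targets.to(device)
            per_elem = criterion(model(inputs), targets)  # assumed reduction="none"
            # collapse every non-batch dimension into one loss per sample
            losses.append(per_elem.reshape(per_elem.size(0), -1).mean(dim=1).cpu())
    losses = torch.cat(losses)
    k = min(num_hard_samples, losses.numel())
    hard_idx = torch.topk(losses, k).indices.tolist()  # positions of the largest losses
    return DataLoader(Subset(dataloader.dataset, hard_idx),
                      batch_size=dataloader.batch_size, shuffle=True)
```

Wrapping the original dataset in a `Subset` avoids copying data; only the list of selected indices is stored.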
train(model:nn.Module, train_loader:torch.utils.data.DataLoader, val_loader:torch.utils.data.DataLoader, criterion:nn.Module, optimizer:nn.Module, epochs:int, device:str, model_name:str, ckpt_save_freq:int, ckpt_save_path:Union[str, os.PathLike], ckpt_path:Union[str, os.PathLike], report_path:Union[str, os.PathLike], lr_scheduler:torch.optim.lr_scheduler, Validation_save_threshold:float, use_hard_negative_mining:bool, hard_mining_freq:int, num_hard_samples:int, GPU_temperature:int, GPU_overheat_sleep:float)
Standard training loop for autoencoder models with hard negative mining
Parameters:
- model (nn.Module): PyTorch model
- train_loader (torch.utils.data.DataLoader): DataLoader for training data
- val_loader (torch.utils.data.DataLoader): DataLoader for validation data
- criterion (nn.Module): Loss function
- optimizer (nn.Module): Optimizer
- epochs (int): Number of training epochs
- device (str): Device to train on ('cuda' or 'cpu')
- model_name (str): Name of the model for saving checkpoints
- ckpt_save_freq (int): Frequency of checkpoint saving (in epochs)
- ckpt_save_path (Union[str, os.PathLike]): Path to save checkpoints
- ckpt_path (Union[str, os.PathLike]): Path to load checkpoint from (if resuming training)
- report_path (Union[str, os.PathLike]): Path to save training report
- lr_scheduler (torch.optim.lr_scheduler): Learning rate scheduler
- Validation_save_threshold (float): Threshold for saving best validation model
- use_hard_negative_mining (bool): Whether to use hard negative mining
- hard_mining_freq (int): Frequency of hard negative mining (in epochs)
- num_hard_samples (int): Number of hard examples to select
- GPU_temperature (int): Temperature threshold for GPU monitoring
- GPU_overheat_sleep (float): Sleep time in seconds if GPU temperature exceeds threshold
Returns:
- report (pd.DataFrame): Training report with metrics
Todo:
- Plot training loss over epochs real time in the terminal or a window
Author:
- Yassin Riyazi
- Farshad Sangari
Labels:
src_PyThon_NeuralNetwork_trainer_Base_train