PossitionalImageGenerator

sinusoidal_positional_encoding()
Task:
This is part of the viscosity estimation project.
Sub-task:
Implement positional encoding for the Markov chain model.
Description:
I will probably use a CNN and a Transformer for this project. I will treat each frame of the video as
a Markov state and the video as a Markov chain.
But I need to provide the position of the object and its velocity as input features to the model.
I want to use positional encoding to encode the position and velocity of the object in the video.
Previously I used padding to keep the scale of objects consistent across frames.
I was thinking that by adding a polar/arc-like positional encoding I might be able to keep the scale of objects in frames,
but I am not sure whether it is a good idea.
I will implement it and see how it works. Worst case, I will use padding again.
Why positional encoding?
In the context of transformers, every word in a sentence is mapped to a vector through embeddings. The transformer then uses these vectors to generate keys and queries for its self-attention mechanism.
Self-attention, however, does not inherently understand the order of the input data, so positional information must be injected explicitly; the effectiveness of this process hinges on how well the positional encoding adapts to shifts in position.
In a Markov chain model, especially when dealing with time series or sequential data, positional encoding helps the model to:
- Understand the temporal relationships between states.
- Capture the dynamics of the system over time.
- Maintain the order of states in the Markov chain, which is essential for accurate predictions.
- Facilitate the model's ability to generalize from seen to unseen positions in the sequence.
Steps:
1. Decide on the positional encoding scheme.
Positional encoding is one of the most important concepts in a transformer model.
The authors of “Attention Is All You Need” chose a combination of sinusoidal curves for the positional encoding vectors.
The encoding must identify the position of a data point in a time series uniquely.
It should extend to an arbitrary number of data points in a time series.
It should be compact and efficient. [Not one-hot encoding]
Binary encoding is not good because it is not smooth and does not generalize well.
Meaningful encoding:
we want small changes in position to correspond to small changes in representation.
The binary method, however, produces large jumps in encoding for adjacent positions (for example, going from position 7 = 0111 to 8 = 1000 flips every bit), which can confuse the model as it tries to interpret these jumps as meaningful patterns.
Continuity:
Continuity in positional encoding helps the model generalize to positions it hasn’t seen. A continuous encoding scheme means that positions that are numerically close receive similar representations, which aids the model in understanding patterns that span multiple positions. A lack of smooth transitions also limits the model’s ability to interpolate and extract useful positional information.
The encoding scheme should not blow up the size of the input data, so a periodic function is a good choice.
To extend the encoding to an arbitrary number of data points in a time series, we can add a perpendicular sine wave with a different frequency component to the encoding.
The first sine wave provides a base positional signal, and by adding a second sine wave at a different frequency, we allow the encoding to oscillate differently as we move along the sequence.
This additional dimension effectively creates a more complex positional landscape.
Each position is now represented by a point in this two-dimensional space, where each dimension corresponds to a sine wave of a different frequency.
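A minimal NumPy sketch of this scheme, in the spirit of the sinusoidal encoding from “Attention Is All You Need”; the signature, dimensions, and base frequency are my assumptions rather than the project's final values:

```python
import numpy as np

def sinusoidal_positional_encoding(n_positions: int, d_model: int, base: float = 10000.0) -> np.ndarray:
    """Return an (n_positions, d_model) array of sinusoidal positional encodings (d_model assumed even)."""
    positions = np.arange(n_positions)[:, np.newaxis]   # (n_positions, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]      # even dimension indices
    freqs = 1.0 / (base ** (dims / d_model))            # one frequency per sin/cos pair
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(positions * freqs)             # even dimensions: sine
    pe[:, 1::2] = np.cos(positions * freqs)             # odd dimensions: cosine
    return pe

pe = sinusoidal_positional_encoding(n_positions=130, d_model=64)
print(pe.shape)  # (130, 64)
```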
* My initial thought was to use the row as the position, but I think it is better to use time, or a dimensionless measure of time, as the position.
For a small positional shift δx, the positional encoding P(x + δx) should be a linear function of P(x).
Such a transformation ensures that the relationship captured between different parts of the sentence remains consistent, even as inputs vary slightly.
This consistency is vital for the transformer to maintain syntactic relationships within a sentence across different positions.
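A quick numerical check of this property for a single sin/cos pair: a shift by δx acts as a fixed rotation that depends only on δx, not on the position itself (the frequency and shift values are arbitrary):

```python
import numpy as np

omega, x, dx = 0.3, 5.0, 0.7                       # arbitrary frequency, position, and shift
pe = np.array([np.sin(omega * x), np.cos(omega * x)])
pe_shifted = np.array([np.sin(omega * (x + dx)), np.cos(omega * (x + dx))])

# Rotation matrix that depends only on the shift dx
rotation = np.array([[np.cos(omega * dx),  np.sin(omega * dx)],
                     [-np.sin(omega * dx), np.cos(omega * dx)]])

print(np.allclose(rotation @ pe, pe_shifted))      # True: the shift is a linear map of the encoding
```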
x: Length / Embedding dimension
y: Speed / Position
Find the maximum and minimum values of the frame lengths and speeds.
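A small sketch of how those min/max values could be used to normalize the quantities before mapping them onto the encoding axes (the arrays and names here are hypothetical):

```python
import numpy as np

def min_max_normalize(values: np.ndarray) -> np.ndarray:
    """Scale values to the [0, 1] range using the observed minimum and maximum."""
    v_min, v_max = values.min(), values.max()
    return (values - v_min) / (v_max - v_min)

frame_lengths = np.array([1180.0, 1210.0, 1246.0])  # hypothetical frame lengths in pixels
speeds = np.array([0.8, 1.3, 2.1])                   # hypothetical object speeds

x = min_max_normalize(frame_lengths)  # x-axis: length / embedding dimension
y = min_max_normalize(speeds)         # y-axis: speed / position
```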
2. Implement positional encoding for the Markov chain model.
3. Test the positional encoding with a simple model.
Analogies to NLP:
A sentence: a frame of the video
A token embedding vector: a row of pixels in a frame
References:
https://medium.com/@gunjassingh/positional-encoding-in-transformers-a-visual-and-intuitive-guide-0761e655cea7
https://medium.com/data-science/master-positional-encoding-part-i-63c05d90a0c3
Labels:

src_PyThon_Viscosity_PositionalEncoding_PossitionalImageGenerator_sinusoidal_positional_encoding

cropped_position_encoding()
Assumptions: Images are unified in size, e.g. (130, 1248).
Adding position encoding to the images.
[Image: results with erode]
[Image: results without erode]
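A minimal sketch of adding a row-wise positional encoding to a frame of that unified size, treating each pixel row as a token as in the analogy in the previous section; blending by simple addition and the 8-bit normalization are my assumptions:

```python
import numpy as np

H, W = 130, 1248                          # unified frame size from the assumption above

# Sinusoidal encoding: one encoding vector per pixel row, W values per row
rows = np.arange(H)[:, np.newaxis]
cols = np.arange(0, W, 2)[np.newaxis, :]
freqs = 1.0 / (10000.0 ** (cols / W))
pe = np.zeros((H, W))
pe[:, 0::2] = np.sin(rows * freqs)
pe[:, 1::2] = np.cos(rows * freqs)

# Scale to 0..255 so the encoding can be blended with an 8-bit grayscale frame
pe_norm = ((pe - pe.min()) / (pe.max() - pe.min()) * 255).astype(np.uint8)

frame = np.zeros((H, W), dtype=np.uint8)  # stand-in for a real, unified-size frame
encoded = np.clip(frame.astype(np.int16) + pe_norm, 0, 255).astype(np.uint8)
```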
Labels:

src_PyThon_Viscosity_PositionalEncoding_PossitionalImageGenerator_cropped_position_encoding

main_Visualizer()
Main function to generate sinusoidal positional encoding.
Labels:

src_PyThon_Viscosity_PositionalEncoding_PossitionalImageGenerator_main_Visualizer

PE_Generator(numberOfImages:int, save_address:os.PathLike, PE_height:int, velocity_encoding:bool, positional_encoding:bool, default_image_size:tuple)
Parameters:
Returns:
pe_norm (cv2.Mat): Normalized positional encoding image.
Raises:
Todo:
Check whether position encoding or velocity encoding yields better results.
Position encoding fixes the width of the PE to 1246, while velocity encoding calculates the length of the images and then resizes the width to 1245.
Generate positional encodings for a set of images.
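A hypothetical call, assuming the module import path from the label below; every argument value here is only illustrative:

```python
import os
from PossitionalImageGenerator import PE_Generator  # module name inferred from the label below

pe_norm = PE_Generator(
    numberOfImages=10,
    save_address=os.path.join("output", "pe_images"),  # hypothetical output folder
    PE_height=130,
    velocity_encoding=False,
    positional_encoding=True,
    default_image_size=(130, 1246),                    # assumed to match the fixed PE width
)
```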
Labels:

src_PyThon_Viscosity_PositionalEncoding_PossitionalImageGenerator_PE_Generator

make_PE_image(source_img:np.ndarray, fill_img:np.ndarray, threshold_activation:int)
Generates a fill image based on the source image by replacing pixels in the fill image with those from the source image where the source image is below a certain threshold. This function modifies the fill image in place.
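A minimal sketch of how this masking could look, assuming 8-bit grayscale arrays (the threshold value below is only illustrative):

```python
import numpy as np

def make_PE_image(source_img: np.ndarray, fill_img: np.ndarray, threshold_activation: int) -> np.ndarray:
    """Copy source pixels into fill_img wherever the source is below the threshold (modifies fill_img in place)."""
    mask = source_img < threshold_activation
    fill_img[mask] = source_img[mask]
    return fill_img

src = np.full((4, 4), 50, dtype=np.uint8)    # dark source pixels
fill = np.full((4, 4), 200, dtype=np.uint8)  # e.g. a positional encoding image
make_PE_image(src, fill, threshold_activation=100)  # fill now holds 50 wherever src < 100
```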
Parameters:
Returns:
fill_img (np.ndarray): The function modifies the fill image in place.
Raises:
Labels:

src_PyThon_Viscosity_PositionalEncoding_PossitionalImageGenerator_make_PE_image

make_PE_image_Folder(address:os.PathLike, verbose:bool, extension:str, remove_Previous_Dir:bool, velocity_encoding:bool, positional_encoding:bool)
Generate positional encoding images for a folder of images.
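A hypothetical call, with the folder path and keyword values chosen only for illustration:

```python
from PossitionalImageGenerator import make_PE_image_Folder  # module name inferred from the label below

make_PE_image_Folder(
    address="data/frames",    # hypothetical input folder of frame images
    verbose=True,
    extension=".png",         # assumed image extension
    remove_Previous_Dir=False,
    velocity_encoding=False,
    positional_encoding=True,
)
```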
Parameters:
Returns:
None: No return value, the function saves the positional encoding images to the specified address.
Raises:
Todo:
- [ ] Test with super resolution images.
- [ ] Test with padding images. Right now I am going to prototype by resizing the result image and feeding it to the model.
- [ ] Later test with velocity encoding.
Labels:

src_PyThon_Viscosity_PositionalEncoding_PossitionalImageGenerator_make_PE_image_Folder