Residual Neural Network

Outline: Creating a residual neural network

Skip Connections

In theory, a network cannot have too many layers. For example, if a perfect network has 3 layers, then any extra layer could simply learn to do nothing (the identity function). So a deeper network should always perform at least as well as a shallower one…

In practice, this is not the case. The usual explanation is that layers struggle to learn the identity function, so extra layers can actually hurt performance. To address this, we can explicitly add an identity path that bypasses the layers, known as a skip connection. The diagram and the short sketch below show the idea.

---
title: RES
---
flowchart LR
  D(Input) --> E[Layer]
  E --> F(($$+$$))
  D --> F
  F --> G(Output)
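
In code, the diagram amounts to adding the layer's input back onto its output. Here is a minimal sketch, assuming a single linear layer whose input and output sizes match so the two tensors can be added:

import torch
import torch.nn as nn

layer = nn.Linear(16, 16)   # input and output sizes must match for the addition
x = torch.randn(8, 16)      # a batch of 8 example inputs

output = layer(x) + x       # skip connection: layer output plus the original input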

Residual Networks

Let’s build a basic linear residual network. In essence, it is just an MLP with a skip connection.

import torch
import torch.nn as nn

class ResidualModel(nn.Module):
    def __init__(self,
                 in_neurons: int,
                 hidden_neurons: int,
                 out_neurons: int,
                 dropout: float = 0.,
                ) -> None:

        super().__init__()

        # Dense path: a small MLP
        layers = [
                nn.Linear(in_neurons, hidden_neurons),
                nn.ReLU(),
                nn.Dropout(dropout),
                nn.Linear(hidden_neurons, out_neurons),
                ]

        self.dense = nn.Sequential(*layers)
        # Skip path: a single linear layer so the shapes match for the addition
        self.skip = nn.Linear(in_neurons, out_neurons)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum of the dense path and the skip path
        x = self.dense(x) + self.skip(x)
        return x
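
As a quick sanity check, we can pass a dummy batch through the model (the sizes here are arbitrary, chosen only for illustration):

model = ResidualModel(in_neurons=10, hidden_neurons=32, out_neurons=4)
x = torch.randn(5, 10)   # batch of 5 samples with 10 features each
y = model(x)
print(y.shape)           # torch.Size([5, 4])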

This is a completely viable network, but it isn’t really how residual connections are used. The idea of a residual block is for many of them to be combined in sequence to form a “deep” network. To do this, we could just copy-paste our layers multiple times, or we can create a reusable block.

class ResidualBlock(nn.Module):
    def __init__(self,
                 in_neurons: int,
                 hidden_neurons: int,
                 out_neurons: int,
                 dropout: float = 0.,
                 use_layer_norm: bool = False,
                ) -> None:

        super().__init__()

        # Dense path; dropout is now applied at the end of the block
        layers = [
                nn.Linear(in_neurons, hidden_neurons),
                nn.ReLU(),
                nn.Linear(hidden_neurons, out_neurons),
                nn.Dropout(dropout),
                ]

        self.dense = nn.Sequential(*layers)
        # Skip path so the shapes match for the addition
        self.skip = nn.Linear(in_neurons, out_neurons)
        # Optional layer norm applied to the whole block's output
        self.layer_norm = nn.LayerNorm(out_neurons) if use_layer_norm else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.dense(x) + self.skip(x)

        if self.layer_norm is not None:
            x = self.layer_norm(x)
        return x

Notice that we moved the dropout to the end of the block and added the option to apply a layer norm to the entire block.
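
A single block can also be used on its own. As a quick check (again with arbitrary sizes, and layer norm enabled just for illustration):

block = ResidualBlock(in_neurons=10, hidden_neurons=32, out_neurons=10,
                      dropout=0.1, use_layer_norm=True)
x = torch.randn(5, 10)
print(block(x).shape)    # torch.Size([5, 10])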

We can now build a model out of our residual blocks, in basically the same way we made our original MLP.

class ResNet(nn.Module):
    def __init__(self,
                 in_neurons: int,
                 hidden_neurons: int,
                 out_neurons: int,
                 no_blocks: int,
                 dropout: float = 0.,
                 use_layer_norm: bool = False,
                ) -> None:

        super().__init__()

        # Input block: maps in_neurons up to the hidden size
        layers = [ResidualBlock(in_neurons, hidden_neurons, hidden_neurons, dropout, use_layer_norm)]

        # Hidden blocks: keep the hidden size throughout
        for _ in range(no_blocks):
            layers.append(ResidualBlock(hidden_neurons, hidden_neurons, hidden_neurons, dropout, use_layer_norm))

        # Output block: maps the hidden size down to out_neurons
        layers.append(ResidualBlock(hidden_neurons, hidden_neurons, out_neurons, dropout, use_layer_norm))
        self.model = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.model(x)
        return x
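
As before, we can sanity-check the full model with a dummy batch (the sizes and block count here are arbitrary):

net = ResNet(in_neurons=10, hidden_neurons=64, out_neurons=3,
             no_blocks=4, dropout=0.1, use_layer_norm=True)
x = torch.randn(8, 10)
print(net(x).shape)      # torch.Size([8, 3])

Note that the model contains no_blocks hidden blocks plus the input and output blocks, so no_blocks + 2 residual blocks in total.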