

In recent years, deep learning models have achieved impressive results in image inpainting. The main idea is to encode the image into latent features, fill the missing regions at the feature level, and then decode the completed features back into an image. These models can be divided into Convolutional Neural Network (CNN) and Generative Adversarial Network (GAN) approaches. The first work to apply CNNs to image inpainting framed the task as a statistical regression problem rather than a density estimation problem. However, that approach was limited to greyscale (monochromatic channel) images and incurred significant computational costs. Later work suggested a deep generative network for predicting the missing content of arbitrary image regions, named Context-Encoders, which was used to inpaint a 64 × 64 central square region in a 128 × 128 image.

Our work focuses on deep learning approaches that predict the pixels of missing regions by training deep convolutional neural networks. It is based on the Pyramid-Context Encoding Network (PEN-NET), proposed in 2019, whose U-Net backbone has shallower layers and fewer parameters than many current networks and is therefore prone to underfitting during training. On the one hand, we can obtain more high-level semantic features by increasing the depth of the network; on the other hand, increasing the depth is not always feasible, because vanishing gradients can prevent the network from converging. Therefore, our model introduces residual blocks to counter the loss of accuracy caused by greater network depth, and uses instance normalization to accelerate convergence (a minimal sketch is given below).

To refine the inpainting results further, we add multi-scale discriminators that judge whether the completed image is consistent with the ground truth. The multi-scale discriminators comprise a global discriminator and a local discriminator: the global discriminator takes the complete image as input to assess its global consistency, whereas the local discriminator takes the filled missing regions as input to judge their local consistency.

To obtain visually realistic and semantically consistent images, we therefore propose the Semantic Residual Pyramid Network (SRPNet), which fills the missing regions of an image at both the image and feature levels. Experiments on four different datasets demonstrate that this design recovers richer image details and produces more realistic images, achieving strong performance for both regular and irregular missing regions.
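To make the residual-block-plus-instance-normalization idea concrete, the following is a minimal PyTorch sketch. The layer widths, kernel sizes, and module names are our own illustrative assumptions, not the exact configuration used in SRPNet.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block with instance normalization (illustrative sketch).

    The identity shortcut lets gradients bypass the convolutions, which
    counters vanishing gradients in deeper encoders; InstanceNorm2d
    normalizes each feature map per sample, which tends to speed up
    convergence.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = ReLU(x + F(x)): the skip connection preserves the input signal
        return self.act(x + self.body(x))


# Example: a 64-channel feature map passes through unchanged in shape.
features = torch.randn(1, 64, 32, 32)
out = ResBlock(64)(features)  # -> (1, 64, 32, 32)
```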
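The multi-scale discriminators can likewise be sketched as two small convolutional classifiers, one over the full image and one over the filled region. This is a hedged sketch under our own assumptions (PatchGAN-style score maps and a fixed bounding box for the missing region); the paper's exact discriminator architecture may differ.

```python
import torch
import torch.nn as nn

def conv_block(cin: int, cout: int) -> nn.Sequential:
    # Strided convolution halves the spatial resolution at each stage.
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
    )

class PatchDiscriminator(nn.Module):
    """Small convolutional discriminator producing a real/fake score map."""

    def __init__(self, in_ch: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(in_ch, 64),
            conv_block(64, 128),
            conv_block(128, 256),
            nn.Conv2d(256, 1, kernel_size=4, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class MultiScaleDiscriminator(nn.Module):
    """Global branch scores the whole image; the local branch scores only
    the region that was filled in (here given as a bounding box)."""

    def __init__(self):
        super().__init__()
        self.global_d = PatchDiscriminator()
        self.local_d = PatchDiscriminator()

    def forward(self, image: torch.Tensor, bbox: tuple):
        top, left, h, w = bbox  # location of the inpainted region
        patch = image[:, :, top:top + h, left:left + w]
        return self.global_d(image), self.local_d(patch)


# Example: score a 128 x 128 completion whose central 64 x 64 was filled.
d = MultiScaleDiscriminator()
g_score, l_score = d(torch.randn(1, 3, 128, 128), (32, 32, 64, 64))
```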

Existing image inpainting methods based on deep learning have made great progress. However, these methods tend to generate either contextually semantically consistent images or visually excellent images, overlooking the fact that both semantic and visual quality should be achieved. In this article, we propose a Semantic Residual Pyramid Network (SRPNet), based on a deep generative model, for image inpainting at both the image and feature levels. The method encodes a masked image with a residual semantic pyramid encoder and then decodes the encoded features into an inpainted image with a multi-layer decoder. At this stage, a multi-layer attention transfer network is used to gradually fill in the missing regions of the image.
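To make the cross-layer filling step concrete, here is a minimal PyTorch sketch of one attention-transfer step: similarity scores are computed on the deeper, coarser feature map and reused to fill the masked region one pyramid level up. This is our simplified reading of the idea; the actual attention transfer in PEN-NET/SRPNet uses learned patch-based attention and differs in detail.

```python
import torch
import torch.nn.functional as F

def attention_transfer(deep, shallow, mask):
    """Fill the masked area of `shallow` using attention computed on `deep`.

    deep:    (B, C, H, W)     coarser pyramid level (mostly already filled)
    shallow: (B, C2, 2H, 2W)  next pyramid level up
    mask:    (B, 1, H, W)     1 = missing, 0 = known, at the deep resolution
    """
    B, C, H, W = deep.shape
    # Cosine similarity between every pair of spatial locations in `deep`.
    flat = F.normalize(deep.flatten(2), dim=1)            # (B, C, HW)
    scores = torch.bmm(flat.transpose(1, 2), flat)        # (B, HW, HW)
    # Only allow attention onto known (unmasked) key locations.
    known = (1.0 - mask).flatten(2).squeeze(1)            # (B, HW)
    scores = scores.masked_fill(known.unsqueeze(1) < 0.5, -1e9)
    attn = scores.softmax(dim=-1)                         # (B, HW, HW)
    # Reuse the scores to reconstruct the shallower features.
    values = F.avg_pool2d(shallow, 2).flatten(2)          # (B, C2, HW)
    filled = torch.bmm(values, attn.transpose(1, 2))      # (B, C2, HW)
    filled = F.interpolate(filled.view(B, -1, H, W), scale_factor=2)
    m_up = F.interpolate(mask, scale_factor=2)
    # Keep known pixels, paste attended features into the hole.
    return shallow * (1 - m_up) + filled * m_up


# Example shapes for two adjacent pyramid levels:
deep = torch.randn(1, 256, 16, 16)
shallow = torch.randn(1, 128, 32, 32)
mask = torch.zeros(1, 1, 16, 16)
mask[:, :, 4:12, 4:12] = 1.0  # central hole
out = attention_transfer(deep, shallow, mask)  # -> (1, 128, 32, 32)
```

Applied from the deepest level upward, this kind of step gradually propagates reliable context into the hole at progressively finer resolutions, which is the intuition behind the multi-layer attention transfer network described above.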
