The goal of this project is to allow users to easily remove distracting objects from an image so that the result can be used as a Zoom background. The motivation comes from how difficult it can be to find an area with no distracting objects in the background to use in professional Zoom meetings. To do this, we use adaptive Gaussian thresholding to divide the image into foreground and background. We then use connected components labeling to identify distinct objects as blobs and allow the user to click on the blobs they would like to remove. Our program then removes seams from the portion of the image containing the object and adds minimal seams back so the picture remains the same size but the object is no longer visible. Images containing objects with either high or low contrast against their background produced successful results and showed the least visible interruptions in the output. The change in thresholding technique from the first iteration made multicolored or irregularly-shaped objects more difficult to detect; however, the algorithm now detects touching objects as distinct objects.
The figure below shows an ideal result where the user chooses to remove the television and cups from the dresser and obtains the image on the right to use as a professional-looking Zoom background.
The goal of our project is to remove distracting items from a scene so that the scene can be used as a Zoom background in meetings. We use adaptive thresholding to divide the image into foreground and background, a connected components segmentation algorithm to identify the distracting objects as blobs in the foreground, and seam carving algorithms to remove the blobs. A user inputs an image and our system allows the user to select objects to be removed. After the user chooses the blobs to remove, our system outputs a final image that can be used as a clean background. The motivation behind our idea comes from the struggle of finding a blank wall or clean room whenever we need to join a Zoom meeting. With remote interviews and important meetings, our backgrounds have become part of how we present ourselves. The system we present solves that problem by providing a quickly and easily accessible background that is not distracting (e.g., no textbooks or clothes behind you and no preset ocean background). The domain we are working in is regular RGB photographs.
To remove unwanted objects from images, we first use connected component labeling to detect the distinct blobs in the image. One of the other segmentation techniques we attempted early on was mean shift, but it proved ineffective because objects in different parts of the image with roughly the same color were grouped into a single object. This made it especially hard to correctly segment multicolored objects and to separate different objects with similar colors. We also attempted deformable contours, but this method made it much harder to correctly identify objects of certain shapes.
Connected components therefore became our chosen segmentation technique because it can recognize objects of any shape, including multicolored ones. The algorithm applies binary thresholding to a grayscale version of the image to separate it into foreground and background based on intensity, then labels the distinct blobs in the foreground. We drew inspiration from https://iq.opengenus.org/connected-component-labeling/ as the basis for detecting the connected components in the image, modifying the thresholding parameters for each image. In our first iteration, we used global binary thresholding to determine the foreground and background of images: images with light backgrounds used inverse binary thresholding, images with dark backgrounds used binary thresholding, and the threshold level was adjusted per image depending on the level of contrast. In our second iteration, we switched to an adaptive Gaussian threshold. Adaptive thresholding determines which parts of the image belong to the foreground and background without a single global threshold value, which removed issues caused by shadows and differences in lighting. Instead, there is a window size that sets the neighborhood for the Gaussian weighting, and a constant subtracted from the weighted mean that suppresses extraneous edges. From there, the user is presented with an image containing only the detected blobs with the background removed, as shown below.
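A minimal sketch of this thresholding and labeling step is below, assuming OpenCV and NumPy. The file name, window size, and constant are illustrative values, not the exact parameters we used; we tuned them per image.

```python
import cv2
import numpy as np

img = cv2.imread("room.jpg")                      # hypothetical input path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

block_size = 51   # neighborhood size for the Gaussian window (must be odd)
C = 10            # constant subtracted to suppress extraneous edges

# Adaptive Gaussian threshold: each pixel is compared against a Gaussian-weighted
# mean of its neighborhood rather than a single global threshold value.
binary = cv2.adaptiveThreshold(
    gray, 255,
    cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
    cv2.THRESH_BINARY_INV,
    block_size, C,
)

# Label connected foreground blobs; label 0 is the background.
num_labels, labels = cv2.connectedComponents(binary)

# Show the user only the detected blobs, keeping the original pixel values.
blobs_only = img.copy()
blobs_only[labels == 0] = 0
```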
We chose to keep the original pixel values when displaying the blobs to the user rather than display each blob in a uniform color, in an effort to keep the objects recognizable. Originally, the candidate objects were going to be highlighted in a uniform color, but some images appeared very cluttered with many different objects highlighted, and it was hard for a user to recognize objects in the picture. The original pixel values from the image are therefore displayed to the user, who selects the blobs to remove by clicking on a point within each object. The system determines which blobs those points belong to and marks all pixels in those blobs for removal. The segmentation outputs a numpy array the same size as the input image, containing 1's where the object(s) to be removed are located and 0's everywhere else; this array serves as a mask, as shown below.
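A hedged sketch of turning user clicks into the removal mask, reusing the `labels` array from the previous snippet; the click coordinates here are hypothetical stand-ins for the values our mouse callback would supply.

```python
clicked_points = [(120, 340), (80, 500)]   # hypothetical (row, col) user clicks

mask = np.zeros(labels.shape, dtype=np.uint8)
for r, c in clicked_points:
    blob_id = labels[r, c]
    if blob_id != 0:                       # ignore clicks on the background
        mask[labels == blob_id] = 1        # mark every pixel of that blob
```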
After discovering that our segmentation often left the outer edges of an object behind, we implemented a buffer that slightly enlarges the object to ensure all of it gets removed. To do this, we looped over the pixels forming the border of the blob, determined in which direction each border pixel transitions from blob to background, and flipped the background pixels within distance neighborhood_size / 2 of the border pixel from 0 to 1 in the mask. This produced a buffer of width neighborhood_size / 2 around the entire object that helped ensure the entire object gets removed.
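The border-walking loop described above can be approximated with a single morphological dilation; the sketch below is that equivalent formulation, not our exact loop, and assumes the `mask` array from the previous snippet.

```python
neighborhood_size = 51                      # same window size as the thresholding
pad = neighborhood_size // 2

# A (2*pad + 1) x (2*pad + 1) kernel grows the mask by pad pixels
# in every direction, mimicking the buffer of width neighborhood_size / 2.
kernel = np.ones((2 * pad + 1, 2 * pad + 1), dtype=np.uint8)
buffered_mask = cv2.dilate(mask, kernel)
```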
We then use seam carving to remove the unwanted objects found during segmentation. We first read in the mask output by the segmentation and use it to set the energy of the pixels containing the object to a very large negative number, so those pixels are guaranteed to lie on minimal-energy seams. For each image, we compute the energy image and the cumulative minimum energy map. For our project midterm update we computed both the vertical and horizontal minimal seams and removed whichever had lower energy. To make our final update more efficient, we instead compute the total height and width of the object(s) from the mask and transpose the image if the object is wider than it is tall, so we only ever remove vertical seams. In addition, in the final update we changed our seam carving algorithm to recompute the cumulative energy map between each seam removal, to more accurately find the next minimal seam. The image with enough seams removed to completely eliminate the object is shown below.
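Below is a minimal sketch of this mask-weighted vertical seam removal, assuming Sobel gradients for the energy image and assuming the image has already been transposed if needed; the function names (energy_map, min_vertical_seam, remove_seam) are ours for illustration.

```python
def energy_map(gray, mask):
    dx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
    dy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    energy = np.abs(dx) + np.abs(dy)
    energy[mask == 1] = -1e9               # masked pixels get minimal energy
    return energy

def min_vertical_seam(energy):
    h, w = energy.shape
    M = energy.copy()                      # cumulative minimum energy map
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 2, w)
            M[i, j] += M[i - 1, lo:hi].min()
    # Backtrack from the bottom row to recover the seam's column in each row.
    seam = np.zeros(h, dtype=int)
    seam[-1] = int(np.argmin(M[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(M[i, lo:hi]))
    return seam

def remove_seam(img, mask, seam):
    h, w = mask.shape
    keep = np.ones((h, w), dtype=bool)
    keep[np.arange(h), seam] = False       # drop one pixel per row
    return img[keep].reshape(h, w - 1, -1), mask[keep].reshape(h, w - 1)

# Recompute the energy and cumulative map between removals, as in our final update.
while mask.any():
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float64)
    seam = min_vertical_seam(energy_map(gray, mask))
    img, mask = remove_seam(img, mask, seam)
```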
Then, to replace the seams removed along with the object, we took an approach similar to seam removal: we found the minimal seam in the narrowed image and inserted a copy of it over the gap. This completely filled in the gap created by the object removal in the previous step. Recalculating the cumulative energy map after each insertion also helped smooth the results and disrupted the image less. Finally, after adding back enough seams to return the image to its original dimensions, we transposed the image back if it had been transposed, giving the final result shown below.
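A sketch of the insertion step follows, reusing energy_map and min_vertical_seam from the previous snippet. It assumes original_width was saved before removal began; duplicating the seam and averaging with the right-hand neighbor is one plausible blend, not necessarily our exact one.

```python
def insert_seam(img, seam):
    h, w, c = img.shape
    out = np.zeros((h, w + 1, c), dtype=img.dtype)
    for i in range(h):
        j = seam[i]
        out[i, :j] = img[i, :j]
        out[i, j] = img[i, j]
        # New pixel averages the seam pixel with its right-hand neighbor.
        right = img[i, min(j + 1, w - 1)].astype(np.int32)
        out[i, j + 1] = ((img[i, j].astype(np.int32) + right) // 2).astype(img.dtype)
        out[i, j + 2:] = img[i, j + 1:]
    return out

# Insert minimal seams, recomputing the energy map after each addition,
# until the image is back to its original width.
empty_mask = np.zeros(img.shape[:2], dtype=np.uint8)
while img.shape[1] < original_width:       # original_width saved before removal
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float64)
    seam = min_vertical_seam(energy_map(gray, empty_mask))
    img = insert_seam(img, seam)
    empty_mask = np.zeros(img.shape[:2], dtype=np.uint8)
```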
We performed our procedure on 12 different images of varying predicted compatibility with our current algorithm. The properties that we varied across the images in our dataset were:
- the contrast between the object and its background (high vs. low)
- the lightness of the background (light vs. dark)
- the complexity of the background (solid colored vs. patterned)
- the number of objects to be removed (single vs. multiple)
- whether the object to be removed was touching other objects
In order to evaluate how well our approach was working, we had people rate how effective our program was at removing objects from an image without disrupting it. We chose this evaluation metric because the point of the project is to remove objects to make suitable professional Zoom backgrounds, so the goal is to produce pictures that appear suitable to humans; humans are ultimately the only ones capable of judging what humans will view as acceptable images.
For each image on which the program was run, humans were asked to rate whether the program removed a) less than expected, b) exactly what they expected, or c) more than expected from the image. The point of this question was to determine whether the program was grouping things together as one object that humans would consider to be multiple distinct objects, or whether it was considering multiple objects where a human would perceive one object. The goal is to reach results where a user can click an object in the image and the program can consistently remove the entire object that the user was hoping to remove and nothing else.
Humans were also asked whether the resulting image appeared a) not disrupted, b) somewhat disrupted, or c) significantly disrupted after the object was removed and seams were added back in. The point of this question was to determine whether the removal and replacement of an object had effects that were noticeable to the human eye. The goal is to reach results where a user is not able to detect any disruptions in the resulting image, which would mean the object has been effectively removed and replaced to create an image that a user could use as a Zoom background without appearing out of the ordinary.
Finally, we asked whether they were happy with the results. The point of this question was to determine whether the user believed the program had done a good enough job removing the objects that they would be willing to use the resulting image as their Zoom background.
The naive approach to our problem is to detect the blobs in the image, determine which blobs the user would like to remove, and then replace each blob's pixels with a color sampled from the background. Below you can see an example of the original image and the naive approach applied to it. It is clear that the pumpkin and the ghost were selected for removal, but the objects are still visible even though they are now a solid background color.
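For concreteness, a minimal sketch of this baseline, reusing img and mask from the earlier snippets; sampling the top-left corner as the background color is an assumption for illustration.

```python
background_color = img[0, 0].copy()        # assumes the corner is background
naive = img.copy()
naive[mask == 1] = background_color        # paint the blob a flat background color
```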
For trends in the accuracy of object removal, 58% of our images (an increase of 16.6 percentage points since phase one) had the objects removed exactly as expected after we implemented adaptive thresholding, object edge buffers, and recalculation of the cumulative energy map after each seam operation. We still had 25% of our results (a decrease of 8 percentage points since phase one) removing less of an object than expected, likely due to over-segmentation of the image causing single objects to be detected as multiple distinct objects. Many of the images that did not remove enough contained objects with irregular shapes, whereas images containing simply shaped objects (squares/rectangles) did not have this issue. Low contrast between the object and the background may also contribute: to detect such objects, we had to increase the neighborhood size during adaptive thresholding, which decreases how much of the object is detected. Additionally, 16.7% of our images (a decrease of 8 percentage points since phase one) had more than expected removed. This occurred when our segmentation could not fully distinguish the object from its background, so portions of the background were removed along with the object.
For trends in image disruption, 50% of our images (an increase of 33.3 percentage points since phase one) appeared not disrupted once we added adaptive thresholding, object edge buffers, and recalculation of the cumulative energy map after each seam operation. This left only 50% of images displaying some or significant disruption. In low contrast images, the adaptive threshold allowed objects with little contrast against the background to be removed with little to no disruption. In high contrast images, results that had significant disruption in the first iteration improved under the new approach to somewhat disrupted or not disrupted.
In the images below, the input to the program is displayed on the left side, the resulting output from the first project iteration is in the middle, and the resulting output from the second iteration with some object(s) removed is displayed on the right side.
The image above shows an example of a solid colored object being removed from a dark background with which it has high contrast. In both iterations of our project, the portion of the image where the object used to be was replaced effectively: the outlines of the couch cushions were filled in in a way that does not immediately appear unusual and fits with the rest of the couch. However, the bottom of the couch did appear somewhat disrupted by the inserted seams. This is likely because many seams from the couch were removed to get rid of the pillow, and a majority of the seams were added back at the bottom of the couch instead of where the pillow used to be. The second iteration improved the disruption to the couch cushions: because the cumulative minimum energy map was recalculated before each removal, more optimal seams were removed, and the couch cushions are no longer slanted.
The picture above shows an example of an object on top of a dark background with which it has high contrast. In both iterations of our project, the object was removed and replaced in a way that caused relatively little disruption to the background, making this one of the better results from the experiment. The changes implemented in the second phase made the artifact left behind by the removal slightly less noticeable and made the inserted seams look slightly more natural, eliminating the diagonal disruption in the upper left portion of the image.
The image above shows an example of an object (a mask on a wall) on top of a light background with which it has high contrast. In the first iteration, the object was removed and replaced with some noticeable disruption, but not to an extent that makes the image too unusual or unusable as a Zoom background. In the second iteration, most of the object was removed, but the straps remained behind and were elongated, leaving some disruption in the result. This is likely because the finer segmentation that lets us detect two touching objects as distinct caused the straps to be detected as a separate object from the rest of the mask. The straps were therefore not removed, and since seams from the pillar were inserted back into the image, some of those seams contained the straps, producing the thin line between the two straps in the output above.
The image above shows an example of an object (a mask on a wall) on top of a light background with which it has low contrast. In the first iteration, the results turned out very differently than expected, producing an image with so much disruption that it is likely unusable as a Zoom background. This happened because the object and its background are too similar in color: the mask is slightly lighter than its background, and with the threshold set for a light background in the connected components step, an object lighter than its background is not seen as distinct. Instead, the program perceived the entire pillar, including the mask, as one object and attempted to remove the whole pillar. In the second iteration, the adaptive threshold was able to detect the light colored mask on the light background despite the low contrast. Since the thresholding could differentiate the mask from the background, the pillar was not removed and the image was far less disrupted.
The image above shows an example of multiple objects to remove on a dark background: an attempt to remove the ghost and the pumpkin. The first iteration successfully removes the entire ghost and pumpkin without leaving any pieces of them behind. However, the resulting image with all of the seams inserted appears severely distorted, and an object we did not want affected (the tree) was distorted as well. The second iteration left behind some parts of the ghost and the pumpkin because the buffer was not wide enough to cover all of the objects' borders. However, the final output with seams inserted distorted the rest of the image far less, as the tree and much of the background remained untouched.
The pictures above show a good example of how a background with highly contrasting colors can disrupt the image. In the first iteration, this image was run as if the sticky note were on a light background. Because the blue portions of the background and the sticky note were both above the threshold, they were detected as blobs. Since the sticky note touches a portion of the blue background spanning the entire width of the image, that whole section was considered one blob, and almost all of the image was removed along with it; only one seam was left for the entire resulting image, which is why this is one of the worst results. In the second iteration, the adaptive thresholding treats each diamond in the background as a different object. When the sticky note is selected, it only merges with the diamond on the left-hand side of the background, so a much smaller portion of the image is disrupted than in the first iteration, where most of the image was removed.
The picture above shows an example of objects that are touching in front of a light background. The program was supposed to remove the TV from the image, but in our midterm update the entire TV and dresser ended up being removed. This is because the program perceived the TV and dresser as a single object rather than two distinct objects, since they are touching and have relatively similar colors. Some of the other objects sitting on top of the dresser were not removed because they were light enough to be perceived as separate from the TV and dresser. With adaptive thresholding, our new program was able to distinguish between the objects on the dresser and the dresser itself, and removed only the TV in the right-hand image.
The image above shows an example of an object on top of a solid colored, dark background with which it has high contrast. In the first iteration, the sticky note was removed but left a pink mark, and the inserted seams somewhat disrupted the flow of the image. In the second iteration, we were able to completely remove the pink note without any residue, and the new seams were added toward one end of the image, where they did not disrupt the flow. This made it probably the best result from the experiment, with the object removed and replaced with no clear disruption to the rest of the image.
We started this project wanting to help users remove certain objects from their Zoom backgrounds. With the increased use of video chat for school and work, it makes sense that some people might not want certain objects visible on the video displayed to other users. We used adaptive thresholding to determine which portions of the image are foreground and background, then a connected components algorithm to separate distinct objects from the background. The user selects the objects they would like to remove, and we perform seam carving: the mask of the selected object forces the minimal seams through it, removing the object from the image altogether, and minimal seams are then inserted to restore the original size. We found that we greatly improved results for objects touching other objects, and that our program can now detect objects regardless of their contrast with the background. The program works best on images with relatively regularly shaped objects; for example, the square pillow, square earbud case, square sticky note, and rectangular mask were all removed well. However, our segmentation struggled to detect finer details of shapes, such as the mask straps, as part of the same object, so the program tended to leave behind portions of irregularly shaped objects. It also has trouble detecting multicolored objects as a single object, since our segmentation uses color as one factor in distinguishing objects. When removing multicolored objects, the program would often remove only the portion of the object matching the color of the selected point, so users would have to click on every differently colored portion to ensure the entire object is removed.
As a result of these efforts, we have images, shown previously, that can be used as Zoom backgrounds. However, many of the resulting images have streaks or textures where the object was removed, or portions of the objects that were supposed to be removed were left behind. Therefore, we can improve our approach to make the backgrounds better and less distracting.
For the future, we should:
By using these techniques, we can improve our approach so that objects in an image can be removed seamlessly and without a trace.