Segmentation – identifying which image pixels belong to an object – helps with tasks like analyzing scientific imagery or editing photos. Our original Segment Anything Model, released last year, inspired new AI-enabled image editing tools in our apps, such as Backdrop and Cutouts on Instagram. SAM has also catalyzed diverse applications in science, medicine, and numerous other industries. For example, SAM has been used in marine science to segment sonar images and analyze coral reefs, in satellite imagery analysis for disaster relief, and in medicine to segment cellular images and aid in detecting skin cancer.
Our new Segment Anything Model 2 (SAM 2) extends these capabilities to video. SAM 2 can segment any object in an image or video and consistently follow it across all frames of a video in real time. Existing models have fallen short here, as segmentation in video is significantly more challenging than in images: objects can move quickly, change in appearance, and be occluded by other objects or parts of the scene. We solved many of these challenges in building SAM 2.
We believe this research can unlock new possibilities, such as easier video editing and generation, and allow new experiences to be created in mixed reality. SAM 2 could also be used to track a target object in a video to speed up annotation of visual data for training computer vision systems, including those used in autonomous vehicles. It could also enable creative ways of selecting and interacting with objects in real time or in live video.
In keeping with our open science approach, we’re sharing our research on SAM 2 so others can explore new capabilities and use cases. We’re excited to see what the AI community does with this research.