Dec 14, 2021
Thanks for your question.
The spectrogram mask is a binary 0/1 mask: each time-frequency bin is either kept (1) or discarded (0). Like any binary mask, it splits a signal (audio or image) into two pieces. In image segmentation, for example, a binary mask splits the image into background and foreground.
In this paper, the input spectrogram describes audio coming from multiple sources, e.g., piano and guitar. To split this signal into its parts (piano and guitar), one can use a binary 0/1 mask. Multiplying the input spectrogram element-wise by the mask keeps the bins assigned to one source, and multiplying by the complement (1 - mask) keeps the rest, so we obtain two output spectrograms: one for the piano and another for the guitar.
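To make this concrete, here is a minimal sketch in Python with NumPy. It is only an illustration, not the code from the paper: the spectrogram values, the mask, and the variable names are made up, and I assume a magnitude spectrogram with frequency bins as rows and time frames as columns.

```python
import numpy as np

# Toy magnitude spectrogram of the mixture (values are made up).
# Rows are frequency bins, columns are time frames.
mixture = np.array([
    [0.9, 0.8, 0.1],
    [0.2, 0.7, 0.6],
    [0.1, 0.3, 0.9],
])

# Hypothetical binary mask: 1 where a bin is assigned to the piano,
# 0 where it is assigned to the guitar.
mask = np.array([
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
])

# Element-wise multiplication keeps the bins selected by the mask.
piano = mask * mixture

# The complement of the mask selects the remaining bins.
guitar = (1 - mask) * mixture

print(piano)
print(guitar)
```

Because every bin goes to exactly one of the two outputs, adding the piano and guitar spectrograms in this sketch gives back the input spectrogram.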
I hope this helps.
Thanks