It should be possible to subtract the system audio playing in the background from the recorded audio. This feature would give users clearer and more accurate transcriptions by isolating the speaker's voice from any music or other sound the system is playing.
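This might be doable via some kind of spectral subtraction: capture the system output as a reference signal and subtract its magnitude spectrum from that of the recording.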
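To make the idea concrete, here is a rough sketch of magnitude spectral subtraction using only NumPy. It is not an implementation proposal, just an illustration. The function name `spectral_subtract` and the parameters `frame_len`, `alpha`, and `floor` are made up for this example. It also assumes the system audio is available as a separate reference signal that is already time-aligned and level-matched with the recording, which a real microphone capture would not give for free.

```python
import numpy as np


def spectral_subtract(recorded, reference, frame_len=1024, alpha=1.0, floor=0.02):
    """Hypothetical helper: subtract the magnitude spectrum of `reference`
    (the system audio) from `recorded` (mic input), frame by frame.

    Assumes both signals are mono, share a sample rate, and are time-aligned.
    """
    n = min(len(recorded), len(reference))
    hop = frame_len // 2
    # Periodic Hann window: copies overlapping at a 50% hop sum to 1,
    # so plain overlap-add reconstructs the signal.
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(frame_len) / frame_len)

    # Zero-pad so every input sample is covered by two full frames.
    total = -(-(hop + n + frame_len) // hop) * hop
    rec = np.zeros(total)
    ref = np.zeros(total)
    rec[hop:hop + n] = recorded[:n]
    ref[hop:hop + n] = reference[:n]

    out = np.zeros(total)
    for start in range(0, total - frame_len + 1, hop):
        rec_spec = np.fft.rfft(window * rec[start:start + frame_len])
        ref_spec = np.fft.rfft(window * ref[start:start + frame_len])
        rec_mag = np.abs(rec_spec)
        # Subtract the reference magnitude, but keep a small spectral floor
        # so bins are never driven negative.
        mag = np.maximum(rec_mag - alpha * np.abs(ref_spec), floor * rec_mag)
        # Reuse the phase of the recording; only magnitudes are modified.
        cleaned = mag * np.exp(1j * np.angle(rec_spec))
        out[start:start + frame_len] += np.fft.irfft(cleaned, n=frame_len)
    return out[hop:hop + n]
```

Called as `spectral_subtract(mic_samples, system_samples)`, this returns an array of the same length as the shorter input. In a real setup the reference would reach the microphone delayed and coloured by the speakers and the room, so some alignment or adaptive filtering step would probably be needed before a subtraction like this works well.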
Perhaps this would be useful: