Even though the transcription correctly identifies who is speaking, the generated video sometimes highlights the wrong person as the speaker.
I've attached a screenshot of the problem happening in https://www.flowjin.com/studio/100079
At the moment captured in the screenshot, the word "wasn't" is being spoken. In the transcript on the left, the word "wasn't" (in pink) is part of Stephanie's dialogue. However, in the video preview on the right, there's a white glow-ring around Nebu, implying that he is the one speaking.
For most of the video the correct circle is highlighted, but occasionally the wrong one is highlighted instead.