Duplicate caption lines with zero length in ts files

Started by jaydear, April 15, 2022, 07:27:12


jaydear

Lately I'm getting duplicate caption lines with zero length in some of the ts files output from TSD. The srt files generated by TSD seem to be almost but not totally cleared of the duplicate lines. I edit the ts files from TSD in another program which keeps the captions in sync when ads are removed, but the duplicate lines are still present in the ts files and they cause an ugly unreadable mess when they are muxed into the final mkv files by MKVToolnix!

1. Is there some good reason for having duplicate text lines with zero duration in ts files?
2. Can the duplicates be removed from the ts files by The Doctor when recordings are processed?
The.Doctor is the ants' pants :)

Mam

I vaguely remember that the BBC once played with "stuttering subtitles". They sent them letter by letter, each with a minimal length of time.
like
"
A
Ab
Abr
Abra
"
(and so on).
this looked cute when done with DVB subs (fixed position) but was a pain in the ass for SRTs because they are centered again for every new line.

And I think, I've made a tool to collect these stutters and convert them to a single line with summed up length (long time ago...)
And I also think the Doc took over the idea and produces such collections too, maybe still today.
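
The collecting idea described above can be sketched roughly as follows. This is only an illustration, not the old tool or what TS-Doctor actually does: the cue layout `(start_ms, end_ms, text)`, the function name `merge_stutter` and the `max_gap_ms` threshold are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical cue layout for this sketch: (start_ms, end_ms, text)
Cue = Tuple[int, int, str]


def merge_stutter(cues: List[Cue], max_gap_ms: int = 100) -> List[Cue]:
    """Collapse runs like "A", "Ab", "Abr", "Abra" into one cue.

    A cue is folded into the previous one when its text starts with the
    previous (non-empty) text and it begins at most max_gap_ms after the
    previous cue ended. The merged cue keeps the first start time, the
    last end time and the longest text.
    """
    merged: List[Cue] = []
    for start, end, text in cues:
        if merged:
            prev_start, prev_end, prev_text = merged[-1]
            extends_prev = bool(prev_text) and text.startswith(prev_text)
            if extends_prev and start - prev_end <= max_gap_ms:
                merged[-1] = (prev_start, end, text)
                continue
        merged.append((start, end, text))
    return merged


if __name__ == "__main__":
    stutter = [
        (0, 40, "A"),
        (40, 80, "Ab"),
        (80, 120, "Abr"),
        (120, 2000, "Abra"),
        (2500, 4000, "Next line"),
    ]
    print(merge_stutter(stutter))
```

One caveat of this simple prefix test: a genuinely new subtitle that happens to begin with the previous text and follows it closely would be merged as well.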

But then, I am old, my memories are blurred...  8)


jaydear

Here's an example:

00:00:52,879 --> 00:00:52,879
Well, you will, so...
00:00:52,919 --> 00:00:55,919
Well, you will, so...
00:00:56,119 --> 00:00:56,119
..be ready for it.
00:00:56,159 --> 00:00:57,319
..be ready for it.
00:00:57,519 --> 00:00:57,519
<font color="#ffff00">Ma'am?</font>
00:00:57,559 --> 00:00:58,719
<font color="#ffff00">Ma'am?</font>
00:01:00,079 --> 00:01:00,079
<font color="#ffff00">Er...I do know this car.</font>
00:01:00,119 --> 00:01:02,479
<font color="#ffff00">Er...I do know this car.</font>
00:01:02,679 --> 00:01:02,679
<font color="#ffff00">I spoke to the driver</font>
<font color="#ffff00">the other night, and, erm...</font>
00:01:02,719 --> 00:01:05,599
<font color="#ffff00">I spoke to the driver</font>
<font color="#ffff00">the other night, and, erm...</font>

I can send you a ts file with the problem if you like. I can't use the srt files TSD makes because they still contain all the unwanted ads and promos, etc. Usually, the edited ts files I make in the video editor I use contain just the wanted subs, still in sync with the audio, and it's easy to mux them into an mkv file when I'm finished editing.

The problem I'm seeing recently is that there are duplicate sub lines within the ts file from TSD. The duplicate lines have zero durations which causes huge problems for VLC, MPC-HCx64 and media players in e.g. Sony BD/DVD players.

TSD does remove most of the duplicate lines from its srt file, which is great, but editing the subs after editing the ts files would be a MEGA-task.

If the broadcasters are doing this on purpose, I would suggest that they are unfairly targeting people with a hearing disability.

Mam

your example is perfectly ok, there are no duplicates  8)

What you see and do not like are the lines with the doubled text, but they have different times and do not overlap. So this is not a bug. Most likely it's broadcast this way, so you may ask your TV station why they are sending out subs with a flicker in the middle.
But that is nothing the Doc produces. Maybe my old tool that I've already mentioned could help a bit by collecting those texts together and modifying the end time. Then there would be no flicker (if it is a visible flicker at all; the gaps are seriously small).

What puzzles me a bit with these lines is that the 2nd line (with "zero" length) does not immediately follow the "real" one. There is a small gap between them, perfectly 40ms (2 frames).
Maybe this is to trigger a special device, we have not heard about (yet) ?
But it is too symmetric, to be accidental. You better ask your station.
Maybe there is a special box sold in your area that looks for these lines and reacts to them? The lines should not disturb normal devices; a zero length is unlikely to be shown at all, it would be canceled by the optimizer.
So, even if this may look strange to you, I'm sure, nothing bad will happen when using these files...
 :-*

(But even if, building a small script that eliminates them is a piece of cake, just look for lines with same start and end times and delete them and the following lines until a new timecode is at the beginning of the line (subs can contain more than one line))
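
A minimal sketch of such a script, following the rule described above: when a timecode line has identical start and end times, drop it together with the text lines that follow, until the next timecode line. The function name `drop_zero_length` is made up for this example, and the sketch assumes the number-less layout shown in jaydear's example; in a standard numbered srt file the index line in front of a dropped cue would be left behind and would need extra handling.

```python
import re

# Matches an SRT timing line such as "00:00:52,879 --> 00:00:52,879"
TIMECODE = re.compile(
    r"^(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\s*$"
)


def drop_zero_length(lines):
    """Return the lines with all zero-length cues removed."""
    kept = []
    skipping = False
    for line in lines:
        match = TIMECODE.match(line)
        if match:
            # A new cue starts here: skip it only if start == end.
            skipping = match.group(1) == match.group(2)
        if not skipping:
            kept.append(line)
    return kept


if __name__ == "__main__":
    sample = [
        "00:00:52,879 --> 00:00:52,879",
        "Well, you will, so...",
        "00:00:52,919 --> 00:00:55,919",
        "Well, you will, so...",
        "00:01:02,679 --> 00:01:02,679",
        "I spoke to the driver",
        "the other night, and, erm...",
        "00:01:02,719 --> 00:01:05,599",
        "I spoke to the driver",
        "the other night, and, erm...",
    ]
    print("\n".join(drop_zero_length(sample)))
```

Note that this only works on extracted srt text; it does not touch the subtitle stream inside a ts file.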


Cypheros

Quote from: jaydear on April 16, 2022, 07:58:25
I can send you a ts file with the problem if you like. I can't use the srt files TSD makes because they still contain all the unwanted ads and promos, etc.

Would be great. Maybe I can merge the duplicated lines and fix the problem with the ads.

jaydear

Quote from: Cypheros on April 16, 2022, 16:39:46
Would be great. Maybe I can merge the duplicated lines and fix the problem with the ads.

Done  :D

jaydear

Quote from: Mam on April 16, 2022, 08:39:02
your example is perfectly ok, there are no duplicates

Incorrect!

1. The first of each pair of lines has zero length:
00:00:52,879 --> 00:00:52,879 = 0 ms difference
Well, you will, so...
00:00:52,919 --> 00:00:55,919 = 3,000 ms (3 seconds)
Well, you will, so...

2. It definitely causes major display problems for the players I mentioned
3. It's not a flicker, it's a jumbled mess of text that stacks up on-screen. I have seen flicker before, but only on live-captioned programs.
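
For anyone who wants to check the numbers in point 1, here is a small sketch that converts srt timestamps to milliseconds and computes the length of a cue. The helper names `srt_time_to_ms` and `cue_duration_ms` are made up for this example.

```python
import re


def srt_time_to_ms(stamp: str) -> int:
    """Convert "HH:MM:SS,mmm" to milliseconds."""
    hours, minutes, seconds, millis = map(int, re.split(r"[:,]", stamp.strip()))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def cue_duration_ms(timing: str) -> int:
    """Length of a cue given its "start --> end" timing line."""
    start, end = timing.split("-->")
    return srt_time_to_ms(end) - srt_time_to_ms(start)


if __name__ == "__main__":
    print(cue_duration_ms("00:00:52,879 --> 00:00:52,879"))  # 0
    print(cue_duration_ms("00:00:52,919 --> 00:00:55,919"))  # 3000
    # Offset between the zero-length line and the real one:
    print(srt_time_to_ms("00:00:52,919") - srt_time_to_ms("00:00:52,879"))  # 40
```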

Cypheros

The lines are indeed broadcast doubled.


Cypheros

Strange. I see no duplicated lines in the srt file.
What version of TS-Doctor are you using?

Teletext subtitle output of TS-Doctor 3.2.29 of the same passage looks like this:

10
00:00:52,920 --> 00:00:55,960
Well, you will, so...

11
00:00:56,160 --> 00:00:57,360
..be ready for it.

12
00:00:57,560 --> 00:00:58,760
<font color="yellow">Ma'am?</font>

13
00:01:00,120 --> 00:01:02,520
<font color="yellow">Er...I do know this car.</font>

14
00:01:02,760 --> 00:01:05,640
<font color="yellow">I spoke to the driver</font>
<font color="yellow">the other night, and, erm...</font>

jaydear

Yes, as I said "...there are duplicate sub lines within the ts file from TSD...". I don't edit within TSD because it can only cut on I frames. I also said "...I edit the ts files from TSD in another program...".

So, I'm guessing there is no way for TSD to output a processed ts file containing the repaired subs - perhaps because of codec issues? 

Cypheros

Hi, the subs are not wrong and don't need to be repaired. If you watch the file with subs activated in common media players, you can see that they are OK and within the specs. The problem is the wrong/lazy subtitle interpretation of your "frame accurate video editor".
What if you process the file with TS-Doctor, remove the teletext and let it create srt subtitles? Is your "frame accurate video editor" perhaps able to process external srt subtitle files?

jaydear

It was all working just fine until they started putting those zero-length lines in. I've found a way around it. Thanks for trying.  :)


www.cypheros.de