Feature idea: detecting duplicate audio streams

Started by molnarg007, August 08, 2015, 03:50:16


molnarg007

Hi !

An idea for the future:
TS-Doctor is very good at deleting junk/unneeded data to save space; a smaller video size is always great.  :D

When there are several audio streams and I can activate/deactivate any of them, it would be wonderful to have some kind of "checksum" shown below PID Layer Check; if it matches the checksum of another audio stream, the two are exactly the same.

If the audio format/type/codec is not the same, such a comparison is irrelevant (maybe not even possible), but when the format/type is the same, some streams are exactly identical, so the duplicates can be trashed to save space.
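The proposed checksum could be sketched roughly as follows. This is only an illustration and not how TS-Doctor works internally: it assumes plain 188-byte transport stream packets and simply hashes the payload bytes of every PID. The names `hash_payload_per_pid` and `make_packet` are made up for this example. Note that the payload still includes the PES headers, so two streams with identical audio but different timestamps or stream IDs would still get different hashes.

```python
import hashlib
from collections import defaultdict

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def hash_payload_per_pid(ts_bytes):
    """Return {pid: SHA-256 hex digest} over the payload bytes of each PID."""
    hashers = defaultdict(hashlib.sha256)
    for off in range(0, len(ts_bytes) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = ts_bytes[off:off + TS_PACKET_SIZE]
        if pkt[0] != SYNC_BYTE:
            continue  # a real tool would resynchronise; this sketch just skips
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]  # 13-bit packet identifier
        afc = (pkt[3] >> 4) & 0x3  # adaptation_field_control
        if afc in (0, 2):
            continue  # reserved value or adaptation field only: no payload
        start = 4
        if afc == 3:
            start += 1 + pkt[4]  # skip the length byte and the adaptation field
        if start < TS_PACKET_SIZE:
            hashers[pid].update(pkt[start:])
    return {pid: h.hexdigest() for pid, h in hashers.items()}


def make_packet(pid, payload):
    """Build one synthetic payload-only packet (hypothetical demo helper)."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + payload.ljust(TS_PACKET_SIZE - 4, b"\xff")
```

Two PIDs whose digests match carried byte-identical payloads; any difference at all, however small, gives a different digest.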

Or some kind of indication/warning if some audio streams are the same (like "Duplicate audio streams detected. Do you wish to deactivate them?").

Or at least a settings option: "Automatically deactivate duplicate audio streams."

Thanks !
Gabor

Djfe

I'm sorry for destroying your illusions, but the chance that there are audio streams which are exactly the same is pretty much zero
of course you can take checksums, but they won't be the same

the streams (video and audio) are encoded live by the hardware, so there will always be slight differences in the result even if the audio content is the same
of course they could take the file they just encoded and stream it twice, but that wouldn't make any sense at all

I mean, what reason would broadcasters have to send the same audio twice? (bitrate on a satellite isn't cheap)


the only reason why two audio streams might sound the same is that broadcasters carry several audio streams, which sometimes contain another language but normally carry the same language that is also sent over the default audio stream

the reason for that is probably that the broadcaster doesn't want to confuse receivers by suddenly dropping an audio stream, and wants to keep the bitrate constant (maybe they would have to fill the stream with zeros anyway if they left it out, so they simply keep it in). last but not least, it's easier to keep it in and doesn't make things more complicated


of course you could decode the audio and compare the waveforms, but first you would need a decoder for that audio stream and a license (which costs quite a bit, and TSD doesn't have one because it doesn't need it -> it works on the packet level and doesn't decode anything itself -> it uses the decoders installed on your system for the cutting window), and in the end there might still be some false positives or false negatives
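For illustration, a waveform comparison could look something like the sketch below. It assumes the audio has already been decoded to PCM sample values by some external decoder and that both tracks are time-aligned; the function names and the 0.98 threshold are arbitrary choices for this example. A time offset between the tracks would need a search over lags, which is left out here.

```python
import math


def normalized_correlation(a, b):
    """Pearson correlation of two equally aligned PCM sample sequences."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    a, b = a[:n], b[:n]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0


def probably_same_audio(a, b, threshold=0.98):
    """True if the two waveforms correlate at or above the (arbitrary) threshold."""
    return normalized_correlation(a, b) >= threshold
```

The threshold is exactly where the false positives and false negatives mentioned above come from: set it too low and different tracks pass as equal, too high and two encodings of the same audio are reported as different.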

it's easier and less time-consuming to simply compare the audio streams yourself ;)

Mam

yeah, I am afraid your idea won't work either  ::)

First of all, to be able to calculate a checksum (or better, a more reliable hash value), you have to feed all bytes of that stream through your hash routine. Of course, this means you have to read ahead ALL DATA OF THE STREAM!
This is something the Doc only does during its final correction run at the end of processing. It takes a long time, so it's not a really bright idea to run it at startup just to calculate some values you might not even want to see. You would have to wait some MINUTES before the Doc becomes responsive.

Second, unless it has been cut, the input file may contain audio tracks that are not identical in certain regions. I can think of a channel that uses different languages for the ads but the same one for the film. Obviously your checksums would be different although the net output would be the same.

And finally, many stations use different audio formats / different bitrates for the same audio. German public-service (ÖR) channels, for instance, usually have one track with MPEG2/256 kbit/s and another with MPEG2/192 kbit/s. When available, the latter carries the audio description for blind people; otherwise both contain the same audio. But obviously, because of the different encoder settings, the two tracks would never result in the same checksum.

So I think the time spent on this calculation would be wasted, sorry.

Djfe

you don't have to check the whole stream
first off, you can bisect, or do the same thing Cypheros already does with the AC3 ad detection -> check only a limited number of times at different positions in the movie, which can save a lot of time in detecting that the audio isn't the same
so in the end you could say either that they are likely the same or that they definitely aren't
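A minimal sketch of that sampling idea, assuming both audio streams have already been extracted into byte strings. The function name, the 16 sample positions and the 4096-byte window are arbitrary choices for this example, not anything TS-Doctor is known to use.

```python
def sample_compare(a, b, samples=16, window=4096):
    """Compare two byte strings at a few evenly spaced positions.

    Returns "different" as soon as any sampled window differs,
    otherwise "likely same" (unsampled regions are never looked at).
    """
    if len(a) != len(b):
        return "different"
    window = min(window, len(a))
    last = len(a) - window  # highest valid window start offset
    for i in range(samples):
        off = last * i // max(samples - 1, 1)
        if a[off:off + window] != b[off:off + window]:
            return "different"
    return "likely same"
```

A "different" result is certain, while "likely same" only says that none of the sampled windows showed a difference.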

it still wouldn't be useful (at least not for me)

there are a lot of other features that Cypheros could develop anyways

Derrick

..the ultimate feature would be autowatching the content. It could save a huge amount of time. Auto timer -> correction and cutting -> auto watching -> auto delete :D

Mam

Quote from: Djfe on August 08, 2015, 16:17:35
you don't have to check the whole stream

As usual my young hotshot friend is wrong  ;D
If you compare only random samples, you have a big chance of missing some important things, like audio description for instance. The sound of both streams is ALMOST alike; only now and then does the AD stream contain additional info.
Taking 18 short samples of less than 100 ms is almost a guarantee of missing them...
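To put a rough number on this, here is a simple model. It assumes the sample positions are independent and uniformly spread, and that the samples are so short that their own length can be ignored. The 2% share of differing runtime is an invented figure for illustration only, and `miss_probability` is a made-up helper name.

```python
def miss_probability(diff_fraction, samples):
    """Chance that none of the samples lands in a region where the streams differ.

    diff_fraction: share of the runtime in which the two streams differ (0..1).
    samples: number of independent, uniformly placed sample points.
    """
    if not 0.0 <= diff_fraction <= 1.0:
        raise ValueError("diff_fraction must be between 0 and 1")
    return (1.0 - diff_fraction) ** samples


# Assumed example: the streams differ during 2% of the runtime, 18 samples taken.
p_miss = miss_probability(0.02, 18)
```

Under these assumptions the differing passages are missed in roughly 70% of cases, and the rarer they are, the closer that gets to 100%.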


www.cypheros.de