Cypheros Transportstream Forum

English-speaking Support => TS-Doctor 3.x => Topic started by: Anders2 on February 01, 2020, 23:21:20

Title: Possible to replace character during OCR scan?
Post by: Anders2 on February 01, 2020, 23:21:20
First I just want to say thanks for a great application!

When I cut and OCR scan my recordings, I would like to apply a global replacement for a particular character. Is that possible?
I tried adding it to the Swedish list of known words, but that doesn't seem to work.

The OCR engine converts the dash (0x2D, HYPHEN-MINUS) into a different, longer character (UTF-8 bytes 0xE2 0x80 0x94, i.e. U+2014 EM DASH). It also always causes the first word to be flagged as misspelled, so on these channels I have to replace it manually, which adds extra time when processing subtitles.
Is it possible to set up a global replacement so the OCR engine does that job for me?
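A global replacement of this kind amounts to a lookup table applied to the recognized text after OCR. The following is a minimal Python sketch, not TS-Doctor's code; the table `REPLACEMENTS` and the function `apply_replacements` are hypothetical names. It also confirms that the bytes 0xE2 0x80 0x94 are the UTF-8 encoding of U+2014 EM DASH.

```python
# Hypothetical post-OCR replacement table: maps the character the OCR
# produced to the character wanted in the final subtitle.
REPLACEMENTS = {
    "\u2014": "-",  # EM DASH (UTF-8 bytes E2 80 94) -> HYPHEN-MINUS (0x2D)
}

# str.maketrans builds a lookup table that str.translate applies in one pass
_TABLE = str.maketrans(REPLACEMENTS)


def apply_replacements(text: str) -> str:
    """Replace every occurrence of each key in REPLACEMENTS."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    # The UTF-8 encoding of U+2014 is exactly the byte sequence from the post
    assert "\u2014".encode("utf-8") == b"\xe2\x80\x94"
    print(apply_replacements("\u2014Vart ska du?"))  # prints "-Vart ska du?"
```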
Title: Re: Possible to replace character during OCR scan?
Post by: Cypheros on February 09, 2020, 10:33:55
We will check that and try to find a solution.

Which channels are affected?
Title: Re: Possible to replace character during OCR scan?
Post by: Anders2 on February 10, 2020, 18:41:32
C More channels
Title: Re: Possible to replace character during OCR scan?
Post by: Cypheros on February 19, 2020, 00:44:17
That is a problem. I checked the source code, and the dash is excluded on purpose, because in most cases it gets mixed up with the characters l, 1, i and I.
Since dashes are not that common in subtitles, we exclude that character.
If we change that, we will get a lot of mistakes where a dash appears in the wrong place.
Title: Re: Possible to replace character during OCR scan?
Post by: Anders2 on February 19, 2020, 18:58:37
OK, I understand that.
The problem as it stands (at least with the Swedish word list) is that when the character is 0xE28094, the first word is always marked as incorrect, even if it is spelled correctly and already in the word list.
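One possible explanation for the flagged first word, assumed here and not confirmed in the thread: if the spell check strips only ASCII punctuation (which includes 0x2D) from a token before looking it up, an em dash stays glued to the word and the lookup fails. The hypothetical functions below illustrate that behaviour and the fix of treating U+2014 as punctuation too.

```python
import string

ASCII_PUNCT = string.punctuation          # contains "-" (0x2D) but not U+2014
EXTENDED_PUNCT = ASCII_PUNCT + "\u2014"   # also treat the em dash as punctuation


def first_word(line: str, punct: str) -> str:
    """Return the first whitespace-separated token, lowercased,
    with any characters from `punct` stripped from both ends."""
    words = line.split()
    return words[0].strip(punct).lower() if words else ""


def first_word_flagged(line: str, word_list: set[str], punct: str) -> bool:
    """True if the first word is non-empty and missing from the word list."""
    word = first_word(line, punct)
    return bool(word) and word not in word_list
```

With `ASCII_PUNCT`, `"-Hej, hur mår du?"` passes while `"\u2014Hej, hur mår du?"` is flagged because the token becomes `"\u2014hej"`; passing `EXTENDED_PUNCT` makes both pass.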

Could it be handled separately from the per-language word lists?
For example, in a file in the installation directory where I specify global character code replacements that apply only to my own setup?
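A per-user replacement file like the one proposed could be as simple as one mapping per line. The format below (hexadecimal code points written as `FROM = TO`, with `#` comments), the file name `char_replacements.txt` and the function names are all hypothetical; TS-Doctor does not necessarily support any such file. The parsed table plugs directly into Python's `str.translate`.

```python
from pathlib import Path


def load_replacements(text: str) -> dict[int, str]:
    """Parse 'FROM = TO' lines (hex code points) into a str.translate table.

    Everything after '#' is a comment; blank lines are skipped.
    """
    table: dict[int, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if "=" not in line:
            raise ValueError(f"line {lineno}: expected 'FROM = TO', got {raw!r}")
        src, dst = (part.strip() for part in line.split("=", 1))
        table[int(src, 16)] = chr(int(dst, 16))
    return table


def load_replacements_file(path: Path) -> dict[int, str]:
    """Read a hypothetical per-user file such as 'char_replacements.txt'."""
    return load_replacements(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    config = "2014 = 002D   # EM DASH -> HYPHEN-MINUS\n"
    table = load_replacements(config)
    print("\u2014Hej".translate(table))  # prints "-Hej"
```

Keeping the mappings in a separate file, rather than in a language word list, means they apply regardless of which OCR language is selected.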

Another thing I noticed on the same channels (maybe others too, I don't remember now) is that the display time of the last subtitle line is always set to 1.0 seconds, regardless of its text length.
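For comparison with the fixed 1.0 s, a display time that scales with text length could be estimated as sketched below. The reading speed of 15 characters per second and the clamp of 1 to 7 seconds are illustrative assumptions, not values taken from TS-Doctor or any subtitle standard, and `display_duration` is a hypothetical name.

```python
def display_duration(text: str, chars_per_second: float = 15.0,
                     min_s: float = 1.0, max_s: float = 7.0) -> float:
    """Estimate how long a subtitle should stay on screen from its length.

    All three parameters are illustrative assumptions.
    """
    # Count visible characters only; line breaks need no reading time
    visible = len(text.replace("\n", ""))
    # Clamp so very short lines still stay readable and long ones don't linger
    return min(max_s, max(min_s, visible / chars_per_second))
```

A short line such as `"Hej"` still gets the 1.0 s minimum, while a 30-character line gets 2.0 s.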
Has anyone else seen that or is it only me?