Wrong subtitle letters

Started by Webshows2002, June 14, 2021, 19:24:42


Webshows2002

When I cut a .TS file and rip the subtitles from it, TS Doctor shows the wrong letters: "å" shows as "a", "ö" shows as "6", and "ä" shows as "a". What am I doing wrong? It does not seem to understand Swedish, even though I have chosen UTF8...

Cypheros

#1
You left out the most important part of the screenshot. The subtitle language is Swedish but what language is the OCR engine using?


To get good results, the OCR engine should use the correct language-specific training data. I guess in your case the wrong OCR language is used.
Swedish training data is not part of the standard installation, but TS-Doctor should try to download missing training data for your language if you are connected to the internet.

Webshows2002

Is this the way? I have chosen Swedish this way, and I said yes to downloading the training data.

Webshows2002

#3
Now I think I know what "OCR engine" means... is this the picture you want?
There is no "swe" there. How can I fix that?
Everything in the settings is set to Swedish (I think).

I see you have another post on this matter, and that solution
is not working for me, so I am going to try uninstalling and reinstalling TS Doctor.
Instead of Ä I get "a" (and without downloading the training data... is that an é?).

Or maybe I will go back to an earlier version (maybe the December version was working...).

Cypheros

As you can see, the "swe" OCR data is not available.

As soon as you save a file (Spara som ny fil) with subtitles and the file saving is finished, the OCR engine starts up. If the DVB language of the subtitles is not supported by the available OCR languages, you get a dialog where you can download the missing files. Just click on "Ja" and the needed files will be downloaded. After the download the OCR should continue and give you far better results.
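
To check which OCR languages are actually present before saving, a small script can list the training files in the tessdata folder. This is only a sketch: the function names are made up for illustration, and the folder location is an assumption that you have to pass in yourself. It relies only on the language files being named like "swe.traineddata".

```python
from pathlib import Path


def available_ocr_languages(tessdata_dir):
    """Return the sorted language codes that have a .traineddata file.

    tessdata_dir is whatever folder your installation uses for OCR data
    (an assumption supplied by the caller, not detected automatically).
    """
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return []
    # "swe.traineddata" -> "swe"
    return sorted(p.stem for p in folder.glob("*.traineddata") if p.is_file())


def has_language(tessdata_dir, code="swe"):
    """True if training data for the given language code is present."""
    return code in available_ocr_languages(tessdata_dir)
```

If has_language(your_folder, "swe") returns False, the Swedish data is missing and the download dialog described above should appear after saving.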

Webshows2002

#5
I am giving up. Even though I chose "yes" and it does download the training file, I get no Swedish å, ä, ö. Anyway, I give up.
Do you want to try? The file is 3.5 GB; I can cut it down to 0.5 GB and send it to you so you can test.

Cypheros

#6
No, the Swedish OCR file is just 7 MBytes.

Please send the application report via email to support(at)cypheros.de .
You can find the report in the menu under Help/Application Report.

Webshows2002

I meant the TS file I use. That is 3.5 GB, nothing else.

Cypheros

Sorry, misunderstanding. Yes, if you can upload a sample, I can check the Swedish subtitle conversion.

Webshows2002

#9
Solution to this (I solved it):
Download "swe.traineddata" from GitHub and put it in the ocr/tessdata folder in
programdata/cypheros/TsDoc3....
The training file is 13 MB.
It is at https://github.com/tesseract-ocr/tessdata/blob/master/swe.traineddata

Then start TS Doctor and choose DVB Swedish subtitles in the preferences.

It seems it does not work when TS Doctor downloads the file itself when it is needed!
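
The manual copy step can be sketched as a small script like this. It is only an illustration: install_traineddata is a made-up helper name, it does not download anything, and both paths are assumptions you pass in yourself (the file you already downloaded, and your own ocr/tessdata folder).

```python
import shutil
from pathlib import Path


def install_traineddata(downloaded_file, tessdata_dir):
    """Copy an already downloaded .traineddata file into the tessdata folder.

    Both arguments are supplied by the caller; nothing is guessed about
    where TS Doctor keeps its data.
    """
    src = Path(downloaded_file)
    if src.suffix != ".traineddata":
        raise ValueError(f"not a .traineddata file: {src.name}")
    if src.stat().st_size == 0:
        raise ValueError(f"downloaded file is empty: {src}")

    dest_dir = Path(tessdata_dir)
    if not dest_dir.is_dir():
        raise FileNotFoundError(f"tessdata folder not found: {dest_dir}")

    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # overwrites an existing file with the same name
    return dest
```

Close TS Doctor before copying and start it again afterwards, as in the steps above. Writing into a folder under programdata may need administrator rights.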

Cypheros

I did a test with your sample file, and the download of the Swedish training data worked.

Either you have no internet connection, have blocked TS-Doctor with your firewall, or have disabled the download dialog by accident.

Webshows2002

I have internet and it is downloading the swe.traineddata file, but the file is not being picked up by TS Doctor, so I do it the other way and put the file in the folder myself, and that works. (I have also found a file that is just under 7 MB, and it works excellently.) Sometime I will install Windows 10 again, and maybe then it will work properly. Thanks ;)

