TS-Doctor 3.0 www.cypheros.de

Author Topic: Wrong subtitles letters  (Read 501 times)

Webshows2002

  • Newbie
  • *
  • Posts: 11
  • DVB User
Wrong subtitles letters
« on: June 14, 2021, 19:24:42 »
When I cut a .TS file and rip the subtitles from it, TS Doctor shows the wrong letters: "å" shows as "a", "ö" shows as "6", and "ä" shows as "a". What am I doing wrong? It doesn't seem to understand Swedish even though I have chosen UTF8...

Cypheros

  • Administrator
  • Hero Member
  • *****
  • Posts: 8445
    • Cypheros Software website
Re: Wrong subtitles letters
« Reply #1 on: June 14, 2021, 22:48:28 »
You left out the most important part of the screenshot. The subtitle language is Swedish, but what language is the OCR engine using?

To get good results, the OCR engine should use the correct language-specific training data. I guess in your case the wrong OCR language is used.
Swedish training data is not part of the standard installation, but TS-Doctor should try to download missing training data for your language if you are connected to the internet.
« Last edit: June 14, 2021, 22:50:48 by Cypheros »

Webshows2002

  • Newbie
  • *
  • Posts: 11
  • DVB User
Re: Wrong subtitles letters
« Reply #2 on: June 15, 2021, 00:35:48 »
Is this the way? I have chosen Swedish this way, and I have said yes to the "training" download.

Webshows2002

  • Newbie
  • *
  • Posts: 11
  • DVB User
Re: Wrong subtitles letters
« Reply #3 on: June 15, 2021, 12:25:26 »
Now I think I know what the "OCR engine" is... is this the pic you want?
And there is no "swe" there. How can I fix that?
Everything in the settings is set to Swedish (I think).

I see you have another post on this matter, and that solution
is not working for me, so I am going to try uninstalling and reinstalling TS Doctor.
Instead of Ä I get a (and without download traine... is that an é

Or maybe I will go back to an earlier version (that maybe was working in the December.... version)
« Last edit: June 15, 2021, 12:41:28 by Webshows2002 »

Cypheros

  • Administrator
  • Hero Member
  • *****
  • Posts: 8445
    • Cypheros Software website
Re: Wrong subtitles letters
« Reply #4 on: June 15, 2021, 14:18:08 »
You see that the "swe" OCR data is not available.

As soon as you save a file (Spara som ny fil) with subtitles and the file saving is finished, the OCR engine starts up. If the DVB language of the subtitles is not supported by the available OCR languages, you get a dialog where you can download the missing files. Just click on "Ja" and the needed files will be downloaded. After the download, the OCR should continue and give you far better results.
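
The availability check described in this reply can be illustrated with a small Python sketch. It is not TS-Doctor's actual code. It only assumes the Tesseract convention that each OCR language is stored as a file named <code>.traineddata inside a tessdata folder. The function names and the folder you pass in are hypothetical.

```python
from pathlib import Path


def available_ocr_languages(tessdata_dir):
    """Return the sorted language codes that have a <code>.traineddata file."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


def needs_download(tessdata_dir, language_code):
    """Return True if no training data for language_code is in the folder."""
    return language_code not in available_ocr_languages(tessdata_dir)
```

Calling needs_download(folder, "swe") on a folder that has no swe.traineddata returns True, which corresponds to the situation where the download dialog would be expected to appear.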

Webshows2002

  • Newbie
  • *
  • Posts: 11
  • DVB User
Re: Wrong subtitles letters
« Reply #5 on: June 15, 2021, 15:57:08 »
I am giving up. Even when I choose "yes" and it downloads the training file, I get no Swedish å-ä-ö. Anyway, I give up.
Do you want to try? (The file is 3.5 GB; I can cut it to 0.5 GB and send it to you...?) Then you can test.
« Last edit: June 15, 2021, 16:00:23 by Webshows2002 »

Cypheros

  • Administrator
  • Hero Member
  • *****
  • Posts: 8445
    • Cypheros Software website
Re: Wrong subtitles letters
« Reply #6 on: June 15, 2021, 19:14:44 »
No, the Swedish OCR file is just 7 MBytes.

Please send the application report via email to support(at)cypheros.de .
You can find the report in the menu under Help/Application Report.
« Last edit: June 15, 2021, 19:17:26 by Cypheros »

Webshows2002

  • Newbie
  • *
  • Posts: 11
  • DVB User
Re: Wrong subtitles letters
« Reply #7 on: June 15, 2021, 19:56:22 »
The TS file I use is 3.5 GB, nothing else.

Cypheros

  • Administrator
  • Hero Member
  • *****
  • Posts: 8445
    • Cypheros Software website
Re: Wrong subtitles letters
« Reply #8 on: June 15, 2021, 20:50:37 »
Sorry, misunderstanding. Yes, if you can upload a sample, I can check the Swedish subtitle conversion.

Webshows2002

  • Newbie
  • *
  • Posts: 11
  • DVB User
Re: Wrong subtitles letters
« Reply #9 on: June 17, 2021, 19:04:59 »
Solution to this (I solved it):
Download "swe.traineddata" from GitHub and put it in the ocr/tessdata folder in
programdata/cypheros/TsDoc3....
The training file is 13 MB in size.
It is at https://github.com/tesseract-ocr/tessdata/blob/master/swe.traineddata

Then start TS Doctor and choose DVB Swedish subs in the preferences.

It seems it is not working if TS Doctor downloads it when it is needed!
« Last edit: June 17, 2021, 19:07:14 by Webshows2002 »
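
For anyone who wants to script the manual workaround from this post, here is a minimal Python sketch. It assumes you have already downloaded the .traineddata file yourself and that you know the full path of your own ocr/tessdata folder (the path in the post is truncated, so it is left as a parameter here). The function name install_traineddata is hypothetical. The HTML check is only a heuristic, added because saving a GitHub "blob" page link directly can give you a web page instead of the binary file.

```python
import shutil
from pathlib import Path


def install_traineddata(downloaded_file, tessdata_dir):
    """Copy a manually downloaded .traineddata file into the tessdata folder."""
    src = Path(downloaded_file)
    dest_dir = Path(tessdata_dir)

    if src.suffix != ".traineddata":
        raise ValueError(f"{src.name} is not a .traineddata file")
    if not dest_dir.is_dir():
        raise FileNotFoundError(f"tessdata folder not found: {dest_dir}")

    # Heuristic: reject empty files and saved web pages.
    with src.open("rb") as f:
        head = f.read(64).lstrip().lower()
    if not head or head.startswith((b"<!doctype", b"<html")):
        raise ValueError(f"{src.name} does not look like training data")

    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # keeps the original file name, e.g. swe.traineddata
    return dest
```

After copying, restart TS Doctor so it can see the new file, as described in the post.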

Cypheros

  • Administrator
  • Hero Member
  • *****
  • Posts: 8445
    • Cypheros Software website
Re: Wrong subtitles letters
« Reply #10 on: June 18, 2021, 01:21:18 »
I did a test with your sample file, and the download of the Swedish training data worked.

Either you have no internet connection, have blocked TS-Doctor with your firewall, or have disabled the download dialog by accident.

Webshows2002

  • Newbie
  • *
  • Posts: 11
  • DVB User
Re: Wrong subtitles letters
« Reply #11 on: July 01, 2021, 11:43:29 »
I have internet and it is downloading the swe.train file, but it is not getting picked up by TS Doc. So I do it the other way and put the file in the folder myself, and that works (and I have found a file that is just under 7 MB that works excellently). Sometime I am going to install Win 10 again, and maybe it will start working properly! THX  ;)

 

