different accuracy between paper and competition website #13
Comments
Good question! Three main factors explain the accuracy difference between our paper and the ICDAR challenge.
Best, |
@ku21fan Understood. Thanks for the clarification. |
@brianliu3650 [Bigger model configuration] P.S. We used a different character set from the one in our paper ( deep-text-recognition-benchmark/train.py Line 251 in 16c3aad
--sensitive mode results in
Best |
@ku21fan |
@tjdevWorks Thank you for your interest in our work. We followed MJSynth and SynthText, and their code, to generate the synthetic data. Best |
@ku21fan |
@hoainamken Thank you for your interest in our work :) Best |
@ku21fan
Should I generate more synthetic data, or reduce the complexity of the model instead? May I ask what you would do in this situation? |
@hoainamken
Hope it helps :) |
@ku21fan |
@hoainamken |
@ku21fan |
What is the learning rate for Adam? In addition, I noticed that learning rate decay is not used during training. |
You mentioned that opt.character = 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ However, the output of your method on the competition website seems to contain characters that are not in this list, such as Chinese characters. Did you also train the model on multi-language datasets? |
@klarajanouskova Hello, No, I did not train this model on a multi-language dataset. The character that you mentioned is '몲'. I simply replace the [UNK] token with '몲', because [UNK] would otherwise be counted as 5 characters. Best |
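The reply above can be illustrated with a minimal sketch. It assumes the character set is exactly the 94-character list quoted in the question and that unknown symbols appear in a prediction as the literal string `[UNK]`. The names `CHARSET`, `UNK_TOKEN`, `PLACEHOLDER`, and `replace_unk` are hypothetical and are not taken from the repository.

```python
import string

# The 94 characters quoted in the question: digits, lowercase, uppercase, punctuation.
CHARSET = (
    string.digits
    + string.ascii_lowercase
    + string.ascii_uppercase
    + string.punctuation
)

UNK_TOKEN = "[UNK]"   # assumed literal form of the unknown token in a prediction
PLACEHOLDER = "몲"    # single character mentioned in the reply; not part of CHARSET


def replace_unk(text: str, placeholder: str = PLACEHOLDER) -> str:
    """Swap every literal "[UNK]" for one placeholder character."""
    return text.replace(UNK_TOKEN, placeholder)


raw = "caf[UNK]"
fixed = replace_unk(raw)

# The literal token adds five characters to the string; the placeholder adds one.
print(len(raw), len(fixed))
```

Because `[` and `]` are themselves in the character set, a literal `[UNK]` is indistinguishable from five ordinary predicted characters in a character-level metric, while a placeholder outside the set counts as a single unmatched character.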
@ku21fan Thanks a lot for the explanation! |
Hi, Thanks for your awesome work. I'd really like to know where I can find the generation code for MJSynth and SynthText. Could you also share how you modified the code to generate the training data that you used in the ICDAR contest? Best. |
Hello @ku21fan, I have scanned images of electronic theses and dissertations (ETDs) that contain typewritten text. I used this repository (https://github.com/clovaai/deep-text-recognition-benchmark) to perform OCR. Based on the instructions, it seems to work only on ICDAR and Imdb datasets. Correct me if I am wrong. I tried demo.py on the scanned ETDs, and it returns one word per ETD with a low confidence score. If I am not using the right URL, could you please point me to the link for general OCR? I also found this website (https://clova.ai/ocr), which does general OCR. So, is the general OCR not released yet? |
Hi Author,
Great work, and thanks for sharing.
I noticed that the accuracy of your best model on IC13 is 93.6% in the paper, while it is 95.98% on the Robust Reading Competition website.
Could you please explain this difference?
Thanks.