Why is OCR for handwritten content still that bad?
It seems like with the current progress in ML models, doing OCR should be an easy task. After all, recognizing handwritten numbers was one of the prime benchmarks for image recognition (MNIST was released in 1994).
Yet, when I try to OCR any of my handwritten notes all I ever get is a jumbled mess of nonsense. Am I missing something, is my handwriting really that atrocious or is it the models?
Here's a quick example, a random passage from a scientific article:
I tried EasyOCR, Tesseract, PPOCR and a few online tools. Only PPOCR was able to correctly identify the numbers and the words "J." and "Chem.". The rest is just a random mess of characters.
Edit: thank you all for shitting on my handwriting. That was not asked for, and also not helpful. That sample was intentionally "not nice" but is how I would write a note for myself. (You should see what my notes look like when I don't need to read them again, lol)
ChatGPT can transcribe it perfectly, and also works on a slightly larger sample. DeepSeek works ok-ish but made some mistakes, and Gemini is apparently not available in my country atm. I guess the context awareness is what makes those models better at transcription, and also why I can read it back without problems.
This is challenging to read as a human. And I know I'm not the only one. So if we can't work out all the letters... no way a computer could either. I liken it to the idea that if I type out "detialed", spell check can suggest "detailed", but if I write "ditaled" it's not going to know.
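To make the spell-check analogy a bit more concrete, here is a minimal Python sketch based on edit distance, where an adjacent swap counts as a single edit. The function name `osa_distance` and the idea of a strict one-edit budget are assumptions for illustration only; actual spell checkers may allow more edits or use other heuristics.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance where insert, delete, substitute, and adjacent swap each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ia" <-> "ai"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("detialed", "detailed"))  # 1 (one swap)
print(osa_distance("ditaled", "detailed"))   # 2 (one substitution plus one insertion)
```

Under that assumed one-edit budget, "detialed" would be caught and "ditaled" would not, which is roughly the point being made: the further the written shapes drift from the intended letters, the less there is for any corrector to hold on to.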
I mean no offense at all, but your handwriting is not good. It's somewhat legible but that's the highest opinion I have of it. That said, maybe the dot paper is interfering with the scan?
Well, I haven't had any issues at exams with my handwriting. But if I write something for myself, and fast, then it'll look somewhat like this. If I took my time it would be better, but that's not the point.
And that's totally fine. I didn't say you're not good. Perfect writing isn't necessary, I'm just giving my opinion since you did ask in the post whether you had bad writing.
At the end of the day, a lot of OCR models were mostly trained on typeset text, so it makes sense that a general purpose model wouldn't be very good at recognizing handwriting that looks non-standard, so to speak.
I'm pretty good at reading terrible cursive, and this is my best attempt using the letters as written
Dime stabilization for enrjies were also determined from thermodynamih integsalion of the MM-GBSA results.
I think the first one in italics should be energies, but wouldn't assume OCR would know the context to fill in the missing letters. Not sure what word that starts with thermo ends in an h or maybe a k. No idea on the one that starts with inte. I might have been able to determine those words if I was familiar with the context, but OCR doesn't work that way.
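For what it's worth, a very crude form of "knowing the context" can be bolted on after recognition by snapping each token to the closest word in a domain vocabulary. The sketch below uses Python's standard `difflib.get_close_matches`; the vocabulary list, the `snap_to_vocab` helper, and the 0.75 cutoff are all hypothetical choices for illustration, not something any of the tools mentioned in this thread is known to do.

```python
import difflib

# Hypothetical domain vocabulary; a real one would need to be far larger.
VOCAB = [
    "dimer", "stabilization", "free", "energies", "were", "also",
    "determined", "from", "thermodynamic", "integration", "results",
]


def snap_to_vocab(word, vocab=VOCAB, cutoff=0.75):
    """Return the closest vocabulary word, or the word unchanged if none is close enough."""
    matches = difflib.get_close_matches(word.lower(), vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else word


# First part of the letter-by-letter reading given above.
guess = "Dime stabilization for enrjies were also determined from thermodynamih integsalion"
corrected = " ".join(snap_to_vocab(w) for w in guess.split())
print(corrected)
# dimer stabilization for energies were also determined from thermodynamic integration
```

Note that "for" is left alone: it is a perfectly good word on its own, so only sentence-level context would tell you it should be "free". That gap is plausibly where a language model has the advantage over per-word matching.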
As many others are saying, I can't read that handwriting. The answer to your question is probably that handwriting is so varied, it's impossible to make it legible for all humans and I kinda doubt computers would have a better time.
You seriously need to work on your handwriting. I'm impressed OCR can make out anything at all from that.
This isn't an OCR problem. This is a you problem. I'm human and I can only make out a few words.
Edit. Assuming it's yours. Or is this from the scientific article? Regardless. Whoever wrote that needs to go back to third grade and redo their writing exercises.
You took the time to spell your post correctly and use correct grammar.
I used to have very sloppy handwriting. I've come to realize that if you want other people to understand you, you do need to make an effort to be understandable.
Shortcuts in communication do not show superiority. Too many shortcuts devalue your communication, just like poor spelling and grammar would devalue your post.
I'm writing notes for myself and I can read them. When I'm writing for someone else (which rarely happens for handwritten notes) I take the time and effort to write nicer.
Also, I specifically didn't write the example carefully because the use case for me would specifically be handwritten notes I made for myself.
How else do you write them? Worth mentioning that I learned cursive in school, and we had to write in cursive until around middle school, when I mostly transitioned to a happy mix of cursive and non-cursive.
In a single (but not smooth) stroke, like how one would write a (mirrored) h, but where you would end the h normally, you connect it back to the bottom of the stem instead.
I learned cursive
That's even weirder that you'd do ol for d then. I'd expect you to do a single stroke o, starting at the right hand side, but upon completing the o, continue straight up to make the stem of the d.
IMO a hallmark of messy writing should be the shortcuts taken to reduce the amount of lifts of the stylus for efficiency's sake. You need to improve the efficiency of your sloppiness, to make things worse so it gets better 😂
When I write them, I do the loop anticlockwise until I reach the ascender, continue the stroke straight up to draw the ascender, then back down to put the little tail down to the baseline or continue on to the next letter.
"Dimer stabilization free energies were also determined from thermodynamic integration (TI, see methods), which provide a direct validation of the MM-GBSA results."
That's perfect. Now I'm just wondering why ChatGPT is apparently much better at OCR than a dedicated OCR model like EasyOCR or Tesseract.
Btw, DeepSeek did a good job but not perfect. I also fed ChatGPT a full page of notes and the transcription to markdown worked quite well, although not perfectly. However, if I supply the same note as part of a larger PDF, it will refuse to transcribe it, stating that it's unreadable.
If I had to guess, I'd say it was the dot paper confusing the OCR reader. I suppose the LLM has some way to cancel out the dots and thereby gets a better scan of it.
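If the dots really are the culprit, one thing that could be tried before handing the scan to an OCR tool is a despeckling pass that erases tiny isolated blobs of ink. This is only a sketch of that idea on a toy 0/1 grid in plain Python, not a claim about what any LLM or OCR engine does internally; the function name `remove_small_specks` and the `max_size` threshold are made up for illustration.

```python
from collections import deque


def remove_small_specks(img, max_size=2):
    """Return a copy of a 0/1 grid with ink blobs of at most max_size pixels erased."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Flood fill (4-connectivity) to collect one connected blob of ink.
            blob, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) <= max_size:
                for by, bx in blob:
                    out[by][bx] = 0  # erase the speck
    return out


# Toy page: four grid dots in the corners and one three-pixel pen stroke.
page = [
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1],
]
cleaned = remove_small_specks(page, max_size=1)
for row in cleaned:
    print(row)
```

The obvious catch is that a size threshold cannot tell a paper dot from the dot on an i or a period, and dots that touch a pen stroke survive because they are part of a larger blob.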
They aren't just general-purpose tools like Tesseract is; they can be additionally trained to recognize handwritten text and become much, much better at the task. For example, my Kobo reader has built-in offline OCR and it works incredibly well, almost too well.
Also I can't recognize half of the text as well. 😄
I like dotted paper, the dots are less distracting than grids, lined paper sucks for sketches/etc. and with plain paper I'm missing guides. But I agree that on this particular one, the dots are a bit too prominent.
Here's what I got with Google Lens. Certainly some mistakes, but not "jumbled mess of nonsense."
Dimes stabilization fire einiges were also delirmed. from thermodinamik integration (I), see methods), which provide a dimict, validation of the MM. GBSA results
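To put a number on "some mistakes" versus "jumbled mess", one option is a word error rate against a reference transcription. The sketch below is a plain Levenshtein distance over whitespace-separated words, using the corrected sentence quoted earlier in the thread as the reference; the helper names are assumptions, and punctuation is deliberately left in, so "results" versus "results." counts as an error.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


reference = ("Dimer stabilization free energies were also determined from "
             "thermodynamic integration (TI, see methods), which provide a "
             "direct validation of the MM-GBSA results.")
lens = ("Dimes stabilization fire einiges were also delirmed. from "
        "thermodinamik integration (I), see methods), which provide a "
        "dimict, validation of the MM. GBSA results")

wer = word_error_rate(reference, lens)
print(f"word error rate: {wer:.2f}")
```

The same function could be run on the output of each tool to compare them on equal footing, though a single sentence is far too small a sample to rank anything.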