Use Image Classification to ID Handwritten documents?

Postby bollingm@saic.com » Tue Dec 29, 2009 3:53 pm

I have an issue where I have handwritten correspondence (think letters on notebook paper or plain paper). All I want is for KTM to identify the document as a handwritten (HW) page. I do not need or want to extract data, if that matters. The HW docs are also set up as a subclass of a parent class.

Is this possible? I have examples to train on, but how many is too many? (I would be curious about the answer in general, for both content-based and image-based classification.)

Thanks for any ideas,

MikeB.

Re: Use Image Classification to ID Handwritten documents?

Postby bollingm@saic.com » Thu Jan 07, 2010 2:13 am

bump.

Re: Use Image Classification to ID Handwritten documents?

Postby Hando Penu » Fri Jan 08, 2010 5:42 am

Hi!

One idea is to check the OCR results, and if they are complete garbage, classify the page as a handwritten document?
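A minimal sketch of that "garbage" check, written in Python only to show the logic (in KTM it would live in the project script). It assumes you already have the full-page OCR text as a plain string; the is_word_like rule and the 0.6 cut-off are hypothetical choices that would need tuning on your own samples.

```python
def is_word_like(token: str) -> bool:
    """Hypothetical rule: after stripping trailing punctuation, a token looks
    like real text if it is purely alphabetic or purely numeric."""
    core = token.rstrip(".,;:!?")
    return core.isalpha() or core.isdigit()


def word_like_ratio(ocr_text: str) -> float:
    """Fraction of whitespace-separated OCR tokens that look like real words."""
    tokens = ocr_text.split()
    if not tokens:
        return 0.0  # nothing recognized at all
    return sum(is_word_like(t) for t in tokens) / len(tokens)


def looks_handwritten(ocr_text: str, min_ratio: float = 0.6) -> bool:
    """Treat the page as handwritten (or blank) when too few tokens look like text."""
    return word_like_ratio(ocr_text) < min_ratio


if __name__ == "__main__":
    print(looks_handwritten("Dear Sir, please find the invoice attached."))  # False
    print(looks_handwritten("~l/ ,.;' Wr#1 %&i| _-~"))  # True
```

Note that a blank page also ends up as "handwritten" here, so you may want to check for an empty result separately.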

Hando

Re: Use Image Classification to ID Handwritten documents?

Postby dkekesi » Tue Jan 12, 2010 9:38 am

Check the OCR confidence level. Well-printed documents give results above a certain threshold (e.g. 90%). If a document falls below that, you can assume it is handwritten (or empty apart from some noise).
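The threshold rule can be sketched as follows, again in Python purely for illustration. The word confidences are assumed to be values between 0 and 1, such as those read from pXDoc.Words(w).Confidence; averaging them is just one possible aggregation, and the 0.90 default mirrors the "like 90%" figure mentioned here.

```python
from statistics import mean
from typing import Sequence


def is_probably_handwritten(word_confidences: Sequence[float], threshold: float = 0.90) -> bool:
    """Return True when the mean OCR word confidence is below the threshold.

    word_confidences: one value per recognized word, assumed to be on a 0..1
    scale (e.g. copied from pXDoc.Words(w).Confidence in a KTM script).
    A page with no recognized words is also treated as "not machine print".
    """
    if not word_confidences:
        return True
    return mean(word_confidences) < threshold


if __name__ == "__main__":
    print(is_probably_handwritten([0.97, 0.95, 0.99]))  # False: clean machine print
    print(is_probably_handwritten([0.42, 0.61, 0.30]))  # True: low-confidence text
```

A median instead of the mean would be less sensitive to a few badly recognized words on an otherwise clean printed page.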
Best Regards,

Daniel Kekesi
DocSoft Hungary

Re: Use Image Classification to ID Handwritten documents?

Postby radanmi » Wed Aug 20, 2014 1:27 am

Where can I find the OCR confidence level? Can anybody help? I need it for functionality similar to what is described above.
Thanks a lot

Re: Use Image Classification to ID Handwritten documents?

Postby David Wright » Wed Aug 20, 2014 7:02 am

Radan, which confidence are you looking for?
The latest fix packs for KTM 6 now provide page-level OCR confidences in pXDoc.Words(w).Confidence.
Are you looking for the word-level confidence from an Advanced Zone Locator?
pXDoc.Locators.ItemByName("AZL").Alternatives(a).Subfields.ItemByName("FirstName").Confidence

Are you looking for the character-level confidence from an Advanced Zone Locator?
pXDoc.Locators.ItemByName("AZL").Alternatives(a).Subfields.ItemByName("FirstName").Chars(c).Confidence


Why are you looking for this confidence? What problem do you hope to solve?

If you are trying to detect whether the document has a lot of handwriting, you are better off using "Mixed Print" Full Page Recognition Profiles.
The Mixed Print profile creates boxes around all handwritten words on the document; these boxes are then sent to a handwriting engine, and the rest of the document to a machine-print engine.

So if
pXDoc.Representations.ItemByName("Mixed").Boxes.Count
is very small, there is little handwriting on the document.
You can also use this to find a signature on the document if you expect it to be the only handwritten mark on the page.
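One way to turn that box count into a decision, sketched in Python: box_count stands in for the value of pXDoc.Representations.ItemByName("Mixed").Boxes.Count, while the three buckets and the little_max cut-off of 5 are hypothetical choices, not KTM settings.

```python
from enum import Enum


class HandwritingLevel(Enum):
    NONE = "none"      # no handwritten boxes at all
    LITTLE = "little"  # a few boxes, e.g. a signature or initials
    MOSTLY = "mostly"  # many boxes, e.g. a handwritten letter


def handwriting_level(box_count: int, little_max: int = 5) -> HandwritingLevel:
    """Bucket the number of handwritten boxes found by a Mixed Print profile.

    box_count: assumed to be the Boxes.Count of the Mixed Print representation.
    little_max: hypothetical cut-off; tune it on your own sample documents.
    """
    if box_count < 0:
        raise ValueError("box_count cannot be negative")
    if box_count == 0:
        return HandwritingLevel.NONE
    if box_count <= little_max:
        return HandwritingLevel.LITTLE
    return HandwritingLevel.MOSTLY


if __name__ == "__main__":
    for count in (0, 2, 40):
        print(count, handwriting_level(count).value)
```

A handwritten letter would normally land in MOSTLY and a signed printed form in LITTLE. Because the raw count depends on how much text is on the page, dividing it by the total number of words may give a more stable measure across documents.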

How to see the handwritten boxes:
-Right-click on an XDoc and select "Recognize/MixedPrint".
-Right-click on an XDoc and select "Open in XDoc Browser".
-Expand CDoc/Representations/Representation0:MixedPrint
-Click on "Boxes" and you will SEE the boxes on the image to the right.


Return to Kofax Transformation Modules General Discussion
