problems with ANSI support?

Postby David » Fri May 04, 2012 9:39 am

Hello!
KC 9, SP3, fix pack 3.
We just discovered a problem in one Text Export connector implementation. At this stage, it seems the Export/Release does not handle ANSI in the same way as the previously used Ascent Capture Text Plus. National characters get messed up in the index file, including characters in the index field names themselves.

Using UTF-8 or UTF-16 is most probably not an option. The default, ANSI, is expected to work the same way as the previously used release script, and the receiving systems can't be changed.
Has anyone noticed a change in the default setting of the Text connector? As I remember it, ANSI was used by default in the Text and Text Plus release scripts.
Funny thing is, it seems Text Export connectors used in other batch classes with (ANSI) on the same site and the same clients work as expected.
If you make it idiot-proof, someone will make a better idiot.
David
Participant
 
Posts: 1512
Joined: Wed Dec 07, 2005 4:08 am

Re: problems with ANSI support?

Postby russell@centuryc.com » Fri May 04, 2012 11:10 am

David wrote:Funny thing is, it seems Text Export connectors used in other batch classes with (ANSI) on the same site and the same clients work as expected.


I just looked at the Kofax Text export script. There's a setting in there for "Encoding". (Choices ANSI, UTF8, UTF16) Are you sure you have it set the same way for the batch in question?
Russell
russell@centuryc.com
Participant
 
Posts: 3374
Joined: Wed May 17, 2006 12:53 pm
Location: USA

Re: problems with ANSI support?

Postby David » Tue May 08, 2012 9:26 am

Thanks for your reply.

ANSI is the default setting as far as I can tell, but I did double-check this specific setting while troubleshooting and ANSI was indeed still there.

I installed a Text Plus release script (unmodified) on the same document class as the one having problems now, and the index file from this release script seems fine.

I can't understand how an instance of the Text Export connector can create a different result with the same ANSI setting as another instance of the Text Export connector on other batch classes. I don't understand how or why this can happen. The working batch classes and the faulty one were probably made at different times and may have been at different Kofax SP levels when built... I have no idea if this matters or not.
We may end up with an interim solution where the Text Plus release script provides the index file and the Text Export connector does the rest (image, full text OCR etc.).

Hm, what else...?
This particular batch class actively uses rubber band OCR (with national settings in the profile instead of the default setting). This feature is not used elsewhere.
And, I now notice, the Text Export is used for both the index file and the full text OCR output. When I think about it, I believe I used to have those separated into two Text Export instances on the very same document class, one of them configured to create no index file and no image output, only the full text OCR. This faulty one does it all: image, index file and full text OCR output.

Hm, can national settings on either rubber band OCR or Text Export full text OCR 'pollute'/mess up the character tables used for the ANSI setting in the given Text Export instance? Again, the old Text Plus release script worked fine with the index file on the same batch class/document class.

Re: problems with ANSI support?

Postby russell@centuryc.com » Tue May 08, 2012 11:20 am

David wrote:ANSI is the default setting as far as I can tell, but I did double-check this specific setting while troubleshooting and ANSI was indeed still there.
[...]
I can't understand how an instance of the Text Export connector can create a different result with the same ANSI setting as another instance of the Text Export connector on other batch classes. I don't understand how or why this can happen.


You are far too trusting. ;) I've had programs lie to me before. (Although I can't remember any Kofax programs doing that.) I think you're assuming that the GUI is an accurate indicator of what's going on inside the program. You may want to "toggle" the setting and see if that fixes it.

You might want to dig through the XML of the exported batch class to see what the internal settings are.

Also, I'm not sure you can change the encoding of an existing file. If for some reason the file it's appending to is UTF, it may have to go along with UTF even if the settings are ANSI.
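
One way to test the append theory is to look at the first bytes of the existing index file for a byte order mark (BOM). The following is a minimal Python sketch, not anything taken from Kofax; the function name `sniff_bom` and the sample strings are made up for illustration.

```python
import codecs

def sniff_bom(data: bytes) -> str:
    """Classify the start of a file by its byte order mark, if any."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # UTF-32 is ignored here; its little-endian BOM also begins with FF FE.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return "no BOM"

# Illustrative data only; in practice read the first few bytes of the index file.
print(sniff_bom("Name".encode("utf-8-sig")))  # utf-8-sig
print(sniff_bom("Name".encode("utf-16")))     # utf-16
print(sniff_bom("Name".encode("cp1252")))     # no BOM
```

A file without a BOM can still be UTF-8, so "no BOM" only rules out the marked variants.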

Re: problems with ANSI support?

Postby russell@centuryc.com » Tue May 08, 2012 11:23 am

Also, what is the source of the index values? Is it from a database and contains UTF characters? It's possible the export script won't convert certain types.

Re: problems with ANSI support?

Postby David » Wed May 09, 2012 5:26 am

russell@centuryc.com wrote:
David wrote:ANSI is the default setting as far as I can tell, but I did double-check this specific setting while troubleshooting and ANSI was indeed still there.
[...]
I can't understand how an instance of the Text Export connector can create a different result with the same ANSI setting as another instance of the Text Export connector on other batch classes. I don't understand how or why this can happen.


You are far too trusting. ;) I've had programs lie to me before.

:D true. But they should obey, should they not. Not lie.
russell@centuryc.com wrote:You may want to "toggle" the setting and see if that fixes it.
Sorry, I didn't mention that I had already tested that. No dice. :cry:

russell@centuryc.com wrote:You might want to dig through the XML of the exported batch class to see what the internal settings are.
Good idea. Will try to do that. I haven't looked at details like that for some time, though. And never in KC 9.
russell@centuryc.com wrote: If for some reason the file it's appending to is UTF, it may have to go along with UTF even if the settings are ANSI.
Good thought, maybe correct, but I'm not sure. The very first index file should have been written using ANSI. Or at least the GUI boldly declared an ANSI setting to us, the hapless users... :roll:

russell@centuryc.com wrote:Also, what is the source of the index values? Is it from a database and contains UTF characters?
Most index fields should be plain VARCHAR in KC, with data from automatic recognition. No database validation is used and no scripting to external data sources. Some VARCHAR fields use force match to value lists, though.

Re: problems with ANSI support?

Postby David » Wed May 09, 2012 7:45 am

It seems to be
<CustomProperty Name="Encoding" Value="US-ASCII"/>
in the admin.xml file used.
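
For anyone wanting to check this property across several exported batch classes, here is a minimal Python sketch. Only the `CustomProperty` element with its `Name` and `Value` attributes comes from the line quoted above; the surrounding element names in the sample (`BatchClass`, `ReleaseScript`) and the second property are hypothetical, since the real nesting of admin.xml is not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; only the CustomProperty line is taken from the thread.
SAMPLE = """<BatchClass Name="Example">
  <ReleaseScript Name="Text Export">
    <CustomProperty Name="Encoding" Value="US-ASCII"/>
    <CustomProperty Name="HypotheticalOther" Value="x"/>
  </ReleaseScript>
</BatchClass>"""

def encoding_values(xml_text: str) -> list:
    """Return the Value of every CustomProperty named 'Encoding', at any depth."""
    root = ET.fromstring(xml_text)
    return [
        el.get("Value")
        for el in root.iter("CustomProperty")
        if el.get("Name") == "Encoding"
    ]

print(encoding_values(SAMPLE))  # ['US-ASCII']
```

Using `iter()` avoids depending on where the property sits in the tree, which is the part that is assumed here.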

Hm. I know some can talk for ages about ANSI not being a real industry standard, but nevertheless, I wonder if this current "US-ASCII" implementation maps correctly to our Windows character tables, reflecting the de facto Windows ANSI usage... As Kofax themselves say in an old KB article: "The ANSI character set differs from ASCII in the extended character range (128-255)."
http://knowledgebase.kofax.com/faqsearch/results.aspx?QAID=34
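
To make the 128-255 difference concrete, here is a small Python sketch comparing Windows-1252 (the usual Western European "ANSI" code page) with strict 7-bit ASCII. The sample text is invented, and whether the connector actually behaves like the lossy ASCII case is an assumption, not something confirmed in this thread.

```python
# Invented sample with national characters.
text = "Æble Øl Århus"

# Windows-1252 has a single byte for each of these characters.
ansi = text.encode("cp1252")

# Strict ASCII has no code for them; errors="replace" substitutes "?".
ascii_lossy = text.encode("ascii", errors="replace")

print(ansi)         # b'\xc6ble \xd8l \xc5rhus'
print(ascii_lossy)  # b'?ble ?l ?rhus'
```

If the index file shows "?" or other substitutes exactly where the national characters should be, that would point toward a 7-bit conversion somewhere in the chain.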

The old release script from Kofax obviously does the trick correctly on the same system and on the same setting.

I have made a quick review of the national settings on the server and client just for the sake of it, but couldn't see anything obviously wrong. I also tried running the Export module manually on the client side, and then running all steps on the server side only. I also set the rubber band setting to default, and set up the full text OCR output in a separate Text Export. No difference.

A funny thing is ofc that the Text Export connector does work as wished on other batch classes/document classes on the very same clients and servers. And yeah, while looking in the admin.xml of a working batch class, I found
<CustomProperty Name="Encoding" Value="US-ASCII"/>
. :|

Re: problems with ANSI support?

Postby russell@centuryc.com » Thu May 10, 2012 11:46 am

Were the older batch classes imported or "inherited"? Any other changes in the XML that you can spot?

This is starting to sound like a bug. And I know these can be hard to track down. I don't think I've used anything but 7-bit ASCII (codes 0-127), so I'd probably never notice this.

Re: problems with ANSI support?

Postby David » Fri May 11, 2012 10:27 am

russell@centuryc.com wrote:Were the older batch classes imported or "inherited"? Any other changes in the XML that you can spot?

Good questions... This might have been the only one created after SP3 was added. Not sure tho, need to check. There have been others involved. Pity one can't see batch class history (like creation date) within the Admin GUI. :x You can not do that, can you?

I will try to look more into the XML, but I don't have high hopes.
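
Comparing the exports by eye is tedious, so a short Python sketch of one way to list the custom properties that differ between a working and a faulty batch class may help. It assumes both exports use `CustomProperty` elements with `Name` and `Value` attributes as quoted earlier in the thread; the root element and the `HypotheticalFlag` property in the samples are invented.

```python
import xml.etree.ElementTree as ET

def custom_properties(xml_text: str) -> dict:
    """Map CustomProperty Name -> Value (a repeated Name keeps the last Value)."""
    root = ET.fromstring(xml_text)
    return {el.get("Name"): el.get("Value") for el in root.iter("CustomProperty")}

def diff_properties(working_xml: str, faulty_xml: str) -> dict:
    """Return {name: (working_value, faulty_value)} for properties that differ."""
    a = custom_properties(working_xml)
    b = custom_properties(faulty_xml)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }

# Invented samples; real exports would be read from the two admin.xml files.
WORKING = ('<Root><CustomProperty Name="Encoding" Value="US-ASCII"/>'
           '<CustomProperty Name="HypotheticalFlag" Value="1"/></Root>')
FAULTY = ('<Root><CustomProperty Name="Encoding" Value="US-ASCII"/>'
          '<CustomProperty Name="HypotheticalFlag" Value="2"/></Root>')

print(diff_properties(WORKING, FAULTY))  # {'HypotheticalFlag': ('1', '2')}
```

If a batch class has several export connectors, the same property name may appear more than once, and this simple dictionary would need to be keyed per connector instead.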


I believe the full text OCR text files created all use ANSI, by the way. If, say, 1 in 100 full text OCR files gets a very odd character due to a very dirty or odd image, my current text editor informs me that the file contains characters not found in Code Page 1252 Windows Latin 1 (ANSI) and then converts them. The 99 other files it loads and displays as ANSI.
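
The check the text editor performs can be approximated in a few lines of Python. This is only a sketch of the idea, assuming the text has already been decoded to a string; the function name `not_in_cp1252` and the sample strings are invented.

```python
def not_in_cp1252(text: str) -> list:
    """Return the characters in text that Windows-1252 cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            bad.append(ch)
    return bad

# National characters and the euro sign are all present in Windows-1252.
print(not_in_cp1252("Ålesund €"))  # []

# A stray symbol from a bad OCR read (here U+2603) is not.
print(not_in_cp1252("Å\u2603"))    # ['☃']
```

An empty result means the text could be written as ANSI without loss.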

Re: problems with ANSI support?

Postby russell@centuryc.com » Fri May 11, 2012 11:21 am

David wrote:Pity one can't see batch class history (like creation date) within the Admin GUI. :x You can not do that, can you?


That would be nice. I can find the "last published" date. But that's all I know how to find short of going back through the logs. (Log_YYMM.TXT)