Text Output with RegEx builder

Postby mirodonie.george » Tue Apr 24, 2012 2:50 am

Hello,

I want to make a script that recognizes only 12 characters.

Here is my string:
SIGURARE PRIMERA nr. PRM002768564 ic5tc ccrcrcii | | 1 | | I I I I I I I oie5ic polila | 1 | 1 I ! I ! : I I I CNP \J^J'yfjMi£l3-^^A2\M. ™e\!\OAA/ Af?,fi^^ I M i I I I Nr Sector

Now I want the output to contain only the number "PRM002768564".

I have already written a script, but it doesn't work.

Here is the script:
Code: Select all
REM ========================================================================
REM
REM   Recognition Script: asd
REM
REM   Recognition scripts can be utilized to augment or replace the default
REM   recognition capabilities provided by Ascent Capture.  These scripts provide
REM   function calls before and after Kofax recognition.  Each of these
REM   functions can examine and/or modify the recognition results.
REM
REM ------------------------------------------------------------------------

Option Explicit

Function RegexSearch(strSearch, strPattern) As String
   Dim oRegEx As Object
   Dim oMatches As Object
   
   Set oRegEx = CreateObject("VBScript.RegExp")
   oRegEx.IgnoreCase = False
   strPattern="\w{3}\d{9}"
   oRegEx.Pattern = strPattern
   Set oMatches = oRegEx.Execute(strSearch)

   If oMatches.Count > 0 Then
      RegexSearch = oMatches(0).Value
   End If
End Function


REM ========================================================================
REM Kofax script return codes. Do not modify or pass back a different value.
REM ------------------------------------------------------------------------

   ' Indicates fatal error.  Batch is set to error state.

const FatalError           = -1

   ' Indicates successful operation.  When returned by the KfxPreRecognition
   ' function, the application continues to perform Kofax recognition
   ' (if enabled) and then call the KfxPostRecognition function.  When returned
   ' by the KfxPostRecognition function, processing continues with the next
   ' recognition area.

const NoError              = 0
 
   ' When returned from the KfxPreRecognition function, the application saves
   ' the recognition information and skips the Kofax recognition.  The
   ' KfxPostRecognition function is still called in this case.

const SaveAndSkipOperation = 80

REM ========================================================================
REM Function handling initialization for this module.
REM This function is called the first time the script is referenced in a
REM batch.  The script stays loaded until the batch is complete.
REM ------------------------------------------------------------------------

Function KfxLoad(  ) As Integer
   On Error GoTo Failure

   ' Insert initialization code here
   
   KfxLoad= NoError
   Exit Function

Failure:
   KfxLoad= FatalError
   Exit Function
End Function


REM ========================================================================
REM Function handling termination of this module.
REM This function is called upon end of processing for the batch.  The
REM function is called once per batch and is the last function to be
REM called in this module.
REM ------------------------------------------------------------------------

Function KfxUnload ( ) As Integer
   On Error GoTo Failure

   ' Insert cleanup code here
   
   KfxUnload = NoError
   Exit Function

Failure:
   KfxUnload = FatalError
   Exit Function
End Function

REM ========================================================================
REM Module variables specific to PreRecognition and PostRecognition functions
REM
REM NOTE:
REM   These variables are not valid within routines above this point
REM   in the module.
REM ------------------------------------------------------------------------

   ' The TIFF image file representing the zone snippet to be recognized.
   ' This parameter is read-only.  This is a temporary file which is only valid
   ' while processing this zone.  The image contains the zone being processed only
   ' (not the whole page).  Any image processing for the recognition type has already
   ' been performed.  The user should NOT update this image file.  Doing so is not
   ' supported and may have unpredictable consequences.
   
   ' Note: This variable is commented for performance reasons.  Generating
   ' the zone tiff file accounts for about 10% of zone execution time.  In most
   ' cases, the zone image does not need to be generated for built-in engine
   ' execution.  Uncommenting and using this variable forces the zone image
   ' to be created every time.  Therefore, do not uncomment this variable
   ' unless you actually need access to the image.
   
'Dim KfxImageFile as String


   ' The current recognized value.  This value will be null in the PreRecognition
   ' function.  It may be changed in PreRecognition, but will be overwritten
   ' by the Kofax recognition engine (if any) unless the KfxPreRecognition returns
   ' SaveAndSkipOperation.  The KfxPostRecognition function receives the value
   ' generated by the KfxPreRecognition function, or the value that has been
   ' overwritten by the Kofax recognition (if Kofax recognition is enabled).

   
Dim KfxValue as String


   ' The confidence as a number between 0 and 100 with 100 being most
   ' confident.  If KfxValue is changed, then the confidence must be set
   ' to some value as well representing the engine's confidence level
   ' of the value.
   ' If the engine has no metric like this, then always set this value to 100.

 
Dim KfxConfidence as Long

   ' The search text is the expected search string set in the Administration module
   ' for a separation, form identification, or registration zone.

   ' Additionally, if the field type for an index zone has exactly
   ' one suggested value, and is marked as Force Match, then the search
   ' text exposes that value.  Changes to the KfxSearchText variable are ignored.
 
Dim KfxSearchText as String

   ' The following two variables can be used to retrieve and/or modify the registration
   ' offsets returned by the default engine.  These variables define the offset of the lower-left
   ' pixel for the first character in the search text found in the zone.  The offsets are defined
   ' from the upper-left of the zone and are in 1/1200'ths of an inch (BMU's). 

   ' Registration offsets are only produced by the Kofax OCR and Kofax High Performance OCR/ICR engines.
   ' If you want to design a custom profile that produces registration offsets, then you must
   ' derive your custom profile from the profile of one of those engines (if desired, you can skip the default
   ' engine logic by using the SaveAndSkipOperation return result from the KfxPreRecognition function).

   ' Registration offsets may be generated with any type of coordinates
   ' but the values are ignored unless the zone is a registration zone.

   ' If you are customizing a registration zone and wanting to guarantee success, the script must return
   ' with the KfxValue set to KfxSearchText, KfxConfidence set to 100, and the two
   ' offset variables set to a value >= 0.

   ' Note: These values default to -1 if the default engine has not yet been run, or does not generate
   ' offsets
 
Dim KfxRegistrationHorizBMU as Long
Dim KfxRegistrationVertBMU as Long


REM ========================================================================
REM Function handling recognition prior to Kofax recognition
REM
REM NOTE:
REM   We recommend returning SaveAndSkipOperation if you actually perform
REM   your recognition at this stage.  This will cause Kofax recognition to
REM   be skipped.  The KfxPostRecognition function is still called.
REM ------------------------------------------------------------------------

Function KfxPreRecognition As Integer
   On Error GoTo Failure

   ' Insert user recognition engine here if you want it to be executed
   ' before the Kofax recognition engine.
   
   KfxPreRecognition = NoError
   Exit Function

Failure:
   KfxPreRecognition = FatalError
   Exit Function
End Function


REM ========================================================================
REM Function handling recognition after Kofax recognition
REM ------------------------------------------------------------------------


Function KfxPostRecognition As Integer
   On Error GoTo Failure

   ' Insert user recognition engine here if you want it to be executed
   ' after the Kofax recognition engine.
 
   KfxPostRecognition = NoError
   Exit Function

Failure:
   KfxPostRecognition = FatalError
   Exit Function
End Function



Can anyone help me fix the problem?

Thanks,

George Mirodonie

Re: Text Output with RegEx builder

Postby aadslingerland » Tue Apr 24, 2012 4:35 am

The function you have defined (RegexSearch)... where is it called?
With Regards, Aad Slingerland

Re: Text Output with RegEx builder

Postby dkekesi » Wed Apr 25, 2012 1:57 am

The RegEx you're looking for is
\b[A-Z]{3}\d{9,11}\b
This will match any whole word that consists of 3 uppercase letters followed by at least 9 but not more than 11 digits. Your RegEx will also produce false positives, so be careful (e.g. your RegEx will return "fgH123456789" from the string "ABCDEfgH123456789012345").
Are you aware that you can enter this RegEx into the Recognition profile window in which case only the match will be returned? You can save yourself some scripting this way.
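
To compare the two patterns side by side, both can be run against the false-positive string in the standalone SBL editor. This is only a minimal sketch: it assumes the VBScript.RegExp object can be created on the machine, and the RegexSearch helper shown is the one from the first post with the hard-coded pattern line left out, so that the pattern passed in is actually used.

```basic
' Returns the first match of strPattern in strSearch, or "" if there is none.
Function RegexSearch(strSearch, strPattern) As String
   Dim oRegEx As Object
   Dim oMatches As Object

   Set oRegEx = CreateObject("VBScript.RegExp")
   oRegEx.IgnoreCase = False
   oRegEx.Pattern = strPattern
   Set oMatches = oRegEx.Execute(strSearch)

   If oMatches.Count > 0 Then
      RegexSearch = oMatches(0).Value
   End If
End Function

Sub Main
   Dim strTest As String
   Dim strLoose As String
   Dim strStrict As String

   strTest = "ABCDEfgH123456789012345"

   ' \w matches letters of either case, digits and underscore, and nothing
   ' ties the match to word boundaries, so a fragment inside the long word
   ' is returned.
   strLoose = RegexSearch(strTest, "\w{3}\d{9}")

   ' \b ... \b forces the whole word to fit the pattern, so nothing matches
   ' here and the function returns an empty string.
   strStrict = RegexSearch(strTest, "\b[A-Z]{3}\d{9,11}\b")

   MsgBox "Loose:  [" & strLoose & "]" & Chr(13) & "Strict: [" & strStrict & "]"
End Sub
```

The brackets in the message box make an empty result visible: the loose pattern shows the fragment "fgH123456789", while the strict pattern shows nothing between the brackets.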
Best Regards,

Daniel Kekesi
DocSoft Hungary

Re: Text Output with RegEx builder

Postby rpapa » Wed Apr 25, 2012 2:41 pm

dkekesi wrote:Are you aware that you can enter this RegEx into the Recognition profile window in which case only the match will be returned? You can save yourself some scripting this way.


where? I thought regex were only available in the separation and form, for barcodes?

Re: Text Output with RegEx builder

Postby dkekesi » Wed Apr 25, 2012 10:34 pm

rpapa wrote:
dkekesi wrote:Are you aware that you can enter this RegEx into the Recognition profile window in which case only the match will be returned? You can save yourself some scripting this way.


where? I thought regex were only available in the separation and form, for barcodes?

My bad... sorry, you're right!

Re: Text Output with RegEx builder

Postby mirodonie.george » Tue Jun 12, 2012 2:27 am

I was asking where to put the main in my SBL script so that it gives me the answer I'm looking for.

Re: Text Output with RegEx builder

Postby David » Tue Jun 12, 2012 6:53 am

mirodonie.george wrote: I was asking where to put the main in my SBL script so that it gives me the answer I'm looking for.

The main? That sounds odd. There is no 'main' in the SBL validation code framework.
If you make it idiot-proof, someone will make a better idiot.

Re: Text Output with RegEx builder

Postby mirodonie.george » Wed Jun 13, 2012 12:54 am

The main or the Sub Main, so that it gives me only the value that I want.

Re: Text Output with RegEx builder

Postby Hando Penu » Wed Jun 13, 2012 4:25 am

Hi!

Sub Main is usable only for debugging; it is never launched by the validation client. You need to use the events that are present in the automatically generated code. There are events for batch open and close, document open and close, and pre- and post-field events. You have to choose the one that suits you, possibly a pre- or post-document event. And do not forget to move all field variable definitions to the beginning of the code, so that they are accessible everywhere.

Hando

Re: Text Output with RegEx builder

Postby mirodonie.george » Fri Jun 15, 2012 5:05 am

Hello,

Thanks for the reply. I can see that you know how to do it; can you help me with some code? I used what I found on the forum, but nothing is happening yet.
All the code is up there. I tried to write another script, but it is worse.
Maybe you can help me, please?

Re: Text Output with RegEx builder

Postby David » Fri Jun 15, 2012 6:05 am

Hi
I think this sticky thread is very educational on regex and how to use it in SBL:
viewtopic.php?f=65&t=2542&start=0

Stephan Mayer did a great job on that one.

Aad Slingerland asked where you called the RegexSearch function. I wondered that too. As far as I can see in the code snippet posted, you do not call this function anywhere, so it is never used (it is only defined, and it is essentially dead code as long as nothing calls it).

As Hando wrote: "You have to choose one, that suits You".

Your RegexSearch function is declared to return a string.
I would guess you could try adding a call in an index field post event, for example. But I think it is safest and best that you figure out which event suits your case.

Something like this perhaps:
Code: Select all
strNewString = RegexSearch (strOldstring,"\w{3}\d{9}")


I noticed you set a fixed pattern inside the RegexSearch function even though you have a parameter for it in the function call. Not an error per se, but it is asking for trouble. In my example, I pass the pattern in the function call (it is still hard-coded, yes, but at least not in the function itself).
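
For the recognition script posted at the top of the thread, one place such a call could go is KfxPostRecognition, which according to the template comments receives the recognized text in KfxValue. The following is only a sketch under that assumption: it replaces the empty KfxPostRecognition in that script, uses only names already declared there (KfxValue, KfxConfidence, NoError, FatalError), and expects the hard-coded strPattern line inside RegexSearch to be removed so that the pattern passed here takes effect.

```basic
Function KfxPostRecognition As Integer
   Dim strMatch As String

   On Error GoTo Failure

   ' Search the raw OCR result for the first whole word made of
   ' 3 uppercase letters followed by 9 to 11 digits.
   strMatch = RegexSearch(KfxValue, "\b[A-Z]{3}\d{9,11}\b")

   ' Only overwrite the zone value when something was found.
   ' The template comments say the confidence must be set whenever
   ' KfxValue is changed, and suggest 100 when no real metric exists.
   If strMatch <> "" Then
      KfxValue = strMatch
      KfxConfidence = 100
   End If

   KfxPostRecognition = NoError
   Exit Function

Failure:
   KfxPostRecognition = FatalError
   Exit Function
End Function
```

Leaving KfxValue untouched when there is no match is a design choice for this sketch; clearing the value instead may be preferable if downstream steps should never see the raw OCR text.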
Last edited by David on Fri Jun 15, 2012 6:40 am, edited 1 time in total.

Re: Text Output with RegEx builder

Postby David » Fri Jun 15, 2012 6:31 am

Must RegEx be used?

If "SIGURARE PRIMERA nr. " is always present as the header, you can get PRM002768564 simply by using MID.
Skip the first 21 characters and take the 12 characters that follow.

This only works if you get the same positions all the time.

Here is a small piece of code I wrote just now:

You can paste it into the standalone SBL editor, save, compile and then run it. As I run it standalone here, Sub Main is used.
Using SBL like this makes step-by-step debugging possible. Press F8 instead of F5 to follow the program instructions (the code) one instruction at a time. I think it is very educational for learning SBL Basic.

F5 runs everything at once; F8 allows debugging and inspecting variables on the fly (be sure to check Variables in the Windows menu inside the SBL Editor).


Code: Select all
Function RegexSearch(strSearch, strPattern) As String
   Dim oRegEx As Object
   Dim oMatches As Object
   
   Set oRegEx = CreateObject("VBScript.RegExp")
   oRegEx.IgnoreCase = False

   oRegEx.Pattern = strPattern
   Set oMatches = oRegEx.Execute(strSearch)

   If oMatches.Count > 0 Then
      RegexSearch = oMatches(0).Value
   End If
End Function


sub main

   dim strOldValue as string
   
   dim strNewValue1 as string
   dim strNewValue2 as string
   
   dim strPattern as string
   
   
   strPattern="\b[A-Z]{3}\d{9,11}\b"   
   strOldValue ="SIGURARE PRIMERA nr. PRM002768564 ic5tc ccrcrcii "
   
   
   strNewValue1 =trim(strOldValue )
   strNewValue1 =mid(strNewValue1 ,22,12)

   
   strNewValue2 =trim(strOldValue )
   strNewValue2 = RegexSearch(strNewValue2,strPattern )

   msgbox "Mid:    " & strNewValue1 & chr(13) & "Regex: " &  strNewValue2 ,64, "Mid result & Regex result"
   
end sub


Thanks to Daniel Kekesi for providing the Regex pattern.

Re: Text Output with RegEx builder

Postby mirodonie.george » Thu Jun 21, 2012 12:47 am

Thank you very much for the reply.

I'm very new to this and I don't know yet how to run a script with a regex.
I don't know where in my SBL script to put all the code so that it works. The problem I'm asking about is that the text is variable: the OCR zone does not always return the same text when I run a test.
I have a regex builder, I made a regex for what I want, and now I want to implement it in the SBL.
I tried to put the regex in the field and the output is:

"CERERE DE ASIGURARE PRIMERA inlocuic5tc ccrcrca I I 1 I Reinnoie5ic polila | 1 | 1 nr. PRM002768564 -J- Preiiuine\I\04A/ L^/?/^^^^ M I 1 I M r Localitaie i//j/V/ I I Nr Al Sector cara 'i.^L3LUJ Etaj Apananieiu Telefon fix: I Adresa de doiniciliu esie diferiiS de cea de corespondenta IGURATE mai sus meniionaia ^ Da [j Nu (se va completa mai jos) I' Localitaie ' | ^ i i i .11 I Nr Sector :ara Etaj Apanamciii I Apartament \y<S Casa f Suprafaia uliia desfSsurata (mp) {^-^ | | Nr. caniere | Regimul de baipme P+ I din uniiaioarele tipuri de cladiri; a) consmiiie inainie de anul 1940; n C111 Akn miiniS/r-himioi- Inf<iHMtf> in j-l<iCAlf> Af^ rice cpicmif I 11 111 na rSa Nil"

All I want is to output "PRM002768564".

With the regex put into the "Content" field, it gives me:

"CERERE DE ASIGURARE PRIMERA I I I R PRM J CNP JV A\ J\ M P\I\A L M I I M L V I I N A S LLUJ E A T I Ad d d d d d d IGURATE D [ N I L I N S E A I A \S C S dS N R d P I d d d d C A S IHM CA A I S N"

Thanks again

Re: Text Output with RegEx builder

Postby David » Wed Jun 27, 2012 12:46 pm

Hi!
Instead of doing this in the Recognition script, try it in the Validation script, on the index field in question (post or pre event). If you are already using the Validation module, this is very easy to test without any noticeable change to the workflow.
I may be wrong, but I believe it is easier for a beginner to try this before attempting to optimize in the recognition script (you wrote yourself that you are new to this).

Add your general-purpose regex function to the validation script. Be sure to add it early on (before the point where the function is called), as SBL compiles the code top down.

Try using the Kofax index field value as the input to the function call, and assign the return value of the function call back to the index field. I showed how to call the function in my earlier post; just use the Kofax index field variables instead when doing this inside the Kofax Capture Validation script framework. You must call (trigger) the regex function; otherwise it is just dead code and never does anything.
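
As a rough illustration of that idea, the regex call can be wrapped in a small helper that is easy to call from whichever event is chosen. ExtractPolicyNumber and strRaw are hypothetical names made up for this sketch, not part of the Kofax framework, and the code assumes the RegexSearch function from the earlier post is defined above it in the script. Sub Main is included only so the helper can be tried in the standalone SBL editor with a shortened piece of the sample text.

```basic
' Hypothetical helper: returns the first policy number found in strRaw,
' or strRaw unchanged when nothing matches.
' Assumes RegexSearch (from the earlier post) is defined above this point.
Function ExtractPolicyNumber(strRaw As String) As String
   Dim strMatch As String

   strMatch = RegexSearch(Trim(strRaw), "\b[A-Z]{3}\d{9,11}\b")

   If strMatch = "" Then
      ExtractPolicyNumber = strRaw
   Else
      ExtractPolicyNumber = strMatch
   End If
End Function

' For standalone testing only; the validation client never runs Sub Main.
Sub Main
   Dim strSample As String

   strSample = "CERERE DE ASIGURARE PRIMERA inlocuic5tc ccrcrca I I 1 I Reinnoie5ic polila | 1 | 1 nr. PRM002768564 -J- Preiiuine"

   MsgBox "[" & ExtractPolicyNumber(strSample) & "]"
End Sub
```

Inside the validation script, the chosen field event would then pass the current index field value to ExtractPolicyNumber and assign the result back to the same field, using whatever variable name the generated script exposes for that field.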

I tried your latest sample data on the very same code snippet of mine, and the regex result was perfect. Obviously, using MID is a bad idea in your case.
My guess is that you never run the regex function...