Design Simulation of Optical Character Recognition System Using Machine Learning
________________________________________________________________________________________________________
Abstract: The purpose of this work is to review existing approaches to the handwritten character recognition problem using machine learning algorithms and to implement one of them in an easy-to-use graphical user interface (GUI) application. The principal tasks the application addresses are handwriting recognition from touch input, handwriting recognition from live camera frames or an image file, learning new characters, and adapting intelligently to the user's input. The recognition model we have chosen is a multilayer perceptron, a feed-forward artificial neural network, selected in particular for its performance on nonlinearly separable problems. It has also proven effective in OCR and ICR systems, which could be seen as a further extension of this work. We evaluated the perceptron's performance and tuned its parameters, after which we implemented the application using the same perceptron architecture, learning parameters and optimization algorithms. The application was then tested on a training set containing digits, with the ability to learn alphabetic or special characters.
Index Terms: Dynamic Features, Offline Signature, Online Signature, Pre-processing, Signature, Static Features, Verification
________________________________________________________________________________________________________
I. INTRODUCTION
In enterprises, institutions and offices, the overwhelming volume of paper-based information strains their capacity to manage documents and records. Most fields are becoming computerized because computers can work faster and more efficiently than people; paperwork is steadily being reduced and replaced by computer systems. But what about papers already stored? All early paperwork is now moving toward paperless form. To convert any handwritten, printed or archival document (for example, a book) into a text document, we must either type the entire document or scan it. If the document is very large, typing the whole of it becomes cumbersome and time-consuming, and typing errors are likely. The other option is scanning the document. Whenever the document is very large, scanning is preferable to typing, and forms can be captured with a scanner. Scanning, however, is simply an image capture of the original document, so the result cannot be edited or searched in any way, and it also consumes more space because it is stored as an image. To edit or search the document, it must be converted to a text (doc) format. Optical Character Recognition (OCR) is the process of converting a scanned document into a text document so that it can be easily edited if necessary and becomes searchable. OCR is the mechanical or electronic translation of images of handwritten or printed text into machine-editable text. The recognition engine of the OCR system interprets the scanned images and transforms images of handwritten or printed characters into ASCII data (machine-readable characters). This consumes far less space than the image of the document: a document converted to a text file using OCR occupies 2.35 KB, while the same document stored as an image occupies 5.38 MB. So whenever the document is large, OCR is preferable to plain scanning.
Character recognition techniques associate a symbolic identity with the image of a character. Character recognition systems are classified into two types, based on data acquisition and text type: online and offline (Figure 2.1). Online character recognition systems use a digitizer that directly captures writing along with the order of the strokes, speed, and pen-up and pen-down information [Mohamed et al., 2007]. Offline character recognition captures the data from paper through an optical scanner or camera. Offline character recognition is also known as optical character recognition because the image of the text is converted into a bit pattern by an optical digitizing device. In online handwritten character recognition, the handwriting is captured and stored in digital form by various means. Usually, a special pen is used in conjunction with an electronic surface; as the pen moves over the surface, the two-dimensional coordinates of successive points are represented as a function of time and stored. It is generally accepted that the online method of recognizing handwritten text achieves better results than its offline counterpart. This may be attributed to the fact that more information can be captured in the online case, such as the direction, speed and order of the strokes of the handwriting.
Offline character recognition can be further grouped into two kinds:
Linear and non-linear filtering [Gonzalez & Woods, 2002] as well as morphological operations are widely employed in noise reduction.
Character Normalization
Character normalization is considered to be the most important preprocessing operation for character recognition. Characters can have different sizes, positions and orientations. The goal of character normalization is to reduce the within-class variation of the shapes of the characters in order to facilitate the feature extraction process and also improve classification accuracy. Basically, there are two different approaches to character normalization: linear methods and nonlinear methods [Mohamed et al., 2007]. In this work the segmented character images obtained from the document image are cropped to fit into a minimum rectangle and subsequently normalized to a standard size of 72x72 (Figure 4.1) using the nearest neighbor interpolation method [Gonzalez & Woods, 2002]. The standard size is chosen considering maximum size reduction with minimum loss of information and shape distortion, and is fixed by empirical analysis.
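As an illustration, the cropping and nearest-neighbor normalization described above can be sketched in a few lines of NumPy; the function name and the toy glyph are our own illustrative assumptions, not code from this work:

```python
import numpy as np

def normalize_character(img, size=72):
    """Crop a binary character image to its minimum bounding rectangle and
    resize it to size x size using nearest-neighbor interpolation."""
    ys, xs = np.nonzero(img)                              # foreground pixels
    crop = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = crop.shape
    # Nearest neighbor: map each output pixel to the closest input pixel.
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    return crop[np.ix_(rows, cols)]

# A toy glyph on a 10x10 canvas becomes a 72x72 normalized array.
glyph = np.zeros((10, 10), dtype=np.uint8)
glyph[3:7, 2:8] = 1
print(normalize_character(glyph).shape)  # (72, 72)
```

Nearest-neighbor interpolation keeps the image strictly binary (no new gray values are introduced), which is why it is a common choice before binar feature extraction.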
A common method for binarization is to select a proper intensity threshold for the image and then to convert all the intensity values above the threshold to one intensity value, and all intensity values below the threshold to another chosen intensity. Thresholding (binarization) methods can be classified into two categories: global and local (adaptive) thresholding. Global methods apply one threshold value to the entire image. Local or adaptive thresholding methods are performed in one pass using local information obtained from the image [11]; the threshold value is determined by the neighborhood of the pixel to which the thresholding is being applied. Among global techniques, Otsu's thresholding technique [Otsu, 1979] has been cited as an efficient and frequently used technique [23]. [19], [14], [11], and [10] used local methods.
3. Compute W0 with the following formula:

W0 = Σ_{q=0}^{k-1} P(r_q)

4. Compute W1, u0, u1 and ut with the following formulas:

W1 = Σ_{q=k}^{L-1} P(r_q)

u0 = Σ_{q=0}^{k-1} q P(r_q) / W0

u1 = Σ_{q=k}^{L-1} q P(r_q) / W1

ut = Σ_{q=0}^{L-1} q P(r_q)

5. Find the between-class variability λ² between C0 and C1, where

λ² = W0 (u0 − ut)² + W1 (u1 − ut)²

6. Repeat the above steps (for different k) until the maximum between-class variability is obtained.
7. Choose Otsu's threshold value k with maximum between-class variance.
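Steps 3-7 above amount to an exhaustive search over candidate thresholds. They can be sketched in NumPy as follows; this is an illustrative implementation of Otsu's method, not the code used in this work, and the toy image is our own:

```python
import numpy as np

def otsu_threshold(img, levels=256):
    """Search for the threshold k that maximizes the between-class
    variance W0*(u0-ut)^2 + W1*(u1-ut)^2."""
    hist = np.bincount(img.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()              # P(r_q): gray-level probabilities
    q = np.arange(levels)
    ut = (q * p).sum()                 # global mean intensity
    best_k, best_var = 0, -1.0
    for k in range(1, levels):
        w0, w1 = p[:k].sum(), p[k:].sum()
        if w0 == 0 or w1 == 0:         # one class empty: skip this k
            continue
        u0 = (q[:k] * p[:k]).sum() / w0
        u1 = (q[k:] * p[k:]).sum() / w1
        var = w0 * (u0 - ut) ** 2 + w1 * (u1 - ut) ** 2
        if var > best_var:
            best_k, best_var = k, var
    return best_k

# A bimodal toy image: dark background (level 30), bright strokes (level 200).
img = np.concatenate([np.full(500, 30), np.full(500, 200)]).reshape(25, 40)
k = otsu_threshold(img.astype(np.uint8))
print(30 < k <= 200)  # True: the chosen threshold separates the two modes
```

Because the search examines every gray level, the method is global: one threshold k is applied to the whole image, in contrast to the adaptive methods of [11].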
Thinning has been used in different applications such as medical image analysis, bubble-chamber image analysis (a device for viewing microscopic particles), text and handwriting recognition, materials analysis, fingerprint classification, printed circuit board design and robot vision. Thinning is the process of reducing an object in a digital image to the minimum size necessary for machine recognition of that object. After thinning, analysis can be performed on the size-reduced image. Thinning is an important step in designing an efficient handwritten character recognition (HCR) system: offline handwritten characters show great variations in stroke thickness, and thinning eliminates the influence of character thickness on feature extraction. Non-iterative thinning methods are not based on examining individual pixels. Some popular non-pixel-based methods include medial axis transforms, distance transforms, and determination of centerlines by line following. In line following methods, midpoints of black spaces in the image are determined and then joined to form a skeleton. This method is fast to compute but tends to produce noisy skeletons; it has been conjectured that human beings naturally perform thinning in a manner similar to this. Another method of centerline determination is by following the contours of objects: by simultaneously following contours on either side of the object, a continuous centerline can be computed, and the skeleton of the image is formed from these connected centerlines. Medial axis transforms often use gray-level images where pixel intensity represents distance to the boundary of the object; the pixel intensities are calculated using distance transforms.
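The distance-transform idea behind medial axis methods can be illustrated with a simple two-pass chamfer approximation. This is a rough pure-NumPy sketch of our own, not the paper's method; real systems would typically use an exact Euclidean distance transform:

```python
import numpy as np

def chamfer_distance(binary):
    """Two-pass chamfer approximation of the distance transform: each
    foreground pixel receives its approximate city-block distance to the
    nearest background pixel."""
    h, w = binary.shape
    INF = h + w
    d = np.where(binary > 0, INF, 0).astype(float)
    # Forward pass: propagate distances from the top-left neighbors.
    for y in range(h):
        for x in range(w):
            if d[y, x]:
                if y > 0:
                    d[y, x] = min(d[y, x], d[y - 1, x] + 1)
                if x > 0:
                    d[y, x] = min(d[y, x], d[y, x - 1] + 1)
    # Backward pass: propagate distances from the bottom-right neighbors.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                d[y, x] = min(d[y, x], d[y + 1, x] + 1)
            if x < w - 1:
                d[y, x] = min(d[y, x], d[y, x + 1] + 1)
    return d

# A 5-pixel-thick horizontal stroke: the center row gets the largest
# distances, i.e. the ridge a medial axis transform would keep.
stroke = np.zeros((7, 15), dtype=np.uint8)
stroke[1:6, :] = 1
d = chamfer_distance(stroke)
print(int(d[3].max()), int(d[1].max()))  # 3 1
```

The ridge of high distance values is exactly the gray-level "intensity represents distance to the boundary" picture described above; thresholding or ridge-following on d yields the skeleton.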
Artificial neural networks (ANN) have been successfully applied in many fields. An ANN can be modeled on the human brain [30]: the brain's basic processing unit is the neuron, and the same holds in an artificial neural network. Neural networks consist of a group of neurons interconnected by synaptic weights, which store the knowledge gained during the learning phase. The number of neurons and the weights of the synapses can vary depending on the desired design [37], [30]. A multilayer network consists of the following layers:
i) Input layer: The input layer receives data from the source nodes; it captures the feature pattern to be classified. The number of nodes in this layer depends on the dimension of the feature vector used at the input.
ii) Hidden layer: This layer is located between the input and the output layer. There can be one or more hidden layers, each with a specific number of nodes, and the number of hidden nodes can be changed to obtain the desired performance. These hidden neurons play an important role in carrying out higher-order computations; the output of this layer is provided to the next level.
iii) Output layer: The output layer is the end of the neural network; the input is characterized through the output the network produces. The set of outputs of this layer determines the overall response of the neural network to the input provided.
The typical structure of a neural network is shown in Figure 3; it is composed of m input neurons and n neurons in a single hidden layer, while the output layer has only three neurons. When all neurons are connected with the neurons of the neighboring layers, the network is called a fully connected network. Neurons are the information-processing units that form the basis of a neural network's operation. We define three basic elements of a neuron model:
i) A set of synapses: the signal x_j at the input of synapse j of neuron k is multiplied by the synaptic weight w_kj. If the associated synapse is excitatory, the weight w_kj is positive; if the synapse is inhibitory, it is negative.
ii) An adder that sums the input signals, weighted by the corresponding synapses of the neuron.
iii) An activation function for limiting the amplitude of the neuron's output. Usually, the normalized amplitude range of the neuron output is written as [0,1] or, alternatively, [-1,1]. The model of the neuron also includes an externally applied bias (threshold) w_k0 = b_k, which lowers or raises the net input of the activation function.
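The three elements of the neuron model can be sketched as follows; the sigmoid activation and the sample input values are our own illustrative assumptions:

```python
import numpy as np

def neuron_output(x, w, b):
    """Synaptic weights w (element i), an adder with bias b (element ii),
    and a sigmoid activation squashing the result into [0, 1] (element iii)."""
    net = np.dot(w, x) + b               # adder: net_k = sum_j w_kj x_j + b_k
    return 1.0 / (1.0 + np.exp(-net))    # activation function

x = np.array([0.5, -1.0, 2.0])           # input signals x_j
w = np.array([0.4, 0.3, -0.2])           # excitatory (+) and inhibitory (-) weights
print(round(float(neuron_output(x, w, b=0.1)), 3))  # 0.401
```

Here net = 0.2 - 0.3 - 0.4 + 0.1 = -0.4, and the sigmoid maps this negative net input to an output below 0.5, within the normalized [0,1] range described above.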
The error function in a backpropagation neural network is given as

E(w) = (1/2) Σ_{d∈D} Σ_{k∈outputs} (t_kd − o_kd)²

where E is defined as the sum of the squared errors over all the output units k for all the training examples d. The error surface can have multiple local minima; gradient descent is guaranteed to converge only toward some local minimum, with no guarantee of reaching the global minimum.
Termination Conditions for BP:
The weight update loop may be iterated thousands of times in a typical application. The choice of termination condition is important because too few iterations can fail to reduce the error sufficiently, while too many iterations can lead to overfitting the training data. Common termination criteria are:
i) after a fixed number of iterations (epochs);
ii) once the error falls below some threshold;
iii) once the validation error meets some criterion.
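A toy loop combining the three criteria might look like this; the decaying training error and the rising validation error are synthetic stand-ins of our own, not measurements from this work:

```python
def train_until(stop_epochs=1000, error_threshold=1e-3, patience=5):
    """Illustrates the three termination criteria: an epoch budget,
    an error threshold, and early stopping on rising validation error."""
    best_val, bad_epochs = float("inf"), 0
    for epoch in range(stop_epochs):
        train_err = 1.0 / (epoch + 1)                     # pretend error decays
        val_err = train_err + 0.002 * max(0, epoch - 50)  # pretend overfitting
        if train_err < error_threshold:
            return epoch, "error below threshold"
        if val_err < best_val:                            # validation improving
            best_val, bad_epochs = val_err, 0
        else:                                             # validation worsening
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, "validation error rising"
    return stop_epochs, "epoch budget exhausted"

print(train_until())  # stops once validation error has risen for 5 epochs
```

With these synthetic curves the validation criterion fires first, shortly after the simulated overfitting begins at epoch 50, which is exactly the behavior early stopping is meant to catch.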
Step 3: since o_j = σ(net_j),

∂o_j/∂net_j = o_j (1 − o_j)

All together:

∂E_d/∂w_ji = −(t_j − o_j) o_j (1 − o_j) x_ji

Rule for Hidden Unit Weights Δw_ji:

Step 1:

Δw_ji = η δ_j x_ji, where δ_j = o_j (1 − o_j) Σ_{k∈Downstream(j)} δ_k w_kj

Thus:

∂E_d/∂net_j = Σ_{k∈Downstream(j)} (∂E_d/∂net_k)(∂net_k/∂o_j)(∂o_j/∂net_j)
            = Σ_{k∈Downstream(j)} (−δ_k)(∂net_k/∂o_j)(∂o_j/∂net_j)
            = Σ_{k∈Downstream(j)} (−δ_k) w_kj (∂o_j/∂net_j)
            = Σ_{k∈Downstream(j)} (−δ_k) w_kj o_j (1 − o_j)
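The output and hidden unit update rules above can be sketched as a minimal batch backpropagation loop. The 2-4-1 architecture and the XOR data are illustrative choices of ours (XOR being the classic nonlinearly separable problem), not the setup used in this work:

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR training data: not linearly separable, so a hidden layer is required.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)   # hidden -> output
eta = 0.5

def forward(X):
    h = sigmoid(X @ W1 + b1)                      # hidden activations o_j
    return h, sigmoid(h @ W2 + b2)                # output activations o_k

mse_before = np.mean((T - forward(X)[1]) ** 2)
for _ in range(20000):
    h, o = forward(X)
    delta_k = (T - o) * o * (1 - o)               # output deltas
    delta_j = h * (1 - h) * (delta_k @ W2.T)      # hidden deltas (backpropagated)
    W2 += eta * h.T @ delta_k                     # Delta w = eta * delta * input
    b2 += eta * delta_k.sum(0)
    W1 += eta * X.T @ delta_j
    b1 += eta * delta_j.sum(0)
mse_after = np.mean((T - forward(X)[1]) ** 2)
print(mse_after < mse_before)  # True: the squared error E(w) decreases
```

Each loop iteration applies exactly the two rules derived above: delta_k for output units, delta_j = o_j(1 − o_j) Σ δ_k w_kj for hidden units, then weight updates proportional to delta times the unit's input.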
III. CONCLUSION
This paper presents an ANN system that uses standard templates of uppercase letters and numbers for training; show 100%
recognition of the set of data, it is trained. The system is tested against untrained fonts. It shows the recognition rate 85.83% of the
10 simple fonts are available, which means that it correctly recognizes 309 of the 360 characters. Consider 10 other fonts
belonging to the Low style to Medium category style, the recognition rate drops to 75%, that is, the system efficiency drops to
identify 540, 720 characters are correct. Due to the similarity between ambiguities, this surge in performance is due to
misidentification. I&J, K&R, O&Q, B&8 and other characters. Another thing to note is that the style of letters or 'Cursivity'
increases, and the degree of blurring between characters becomes higher protruding. The recognition rate table shown inalso
displays the fonts in decreasing order. Admit. Times New Roman and Arial performs best and recognizes all characters except "I"
(35/36). Microsoft Sans Serif performs sub-optimally, and another font has no errors in identifying numbers. Highest The error
recognition cases appear in the order "I", "O", "Q", "1" and "3".
IV. REFERENCES
[1] T. S. Şentürk, E. Özgündüz, and E. Karslıgil (2005), "Handwritten Signature Verification Using Image Invariants and Dynamic Features," Proceedings of the 13th European Signal Processing Conference (EUSIPCO 2005), Antalya, Turkey, 4-8 September 2005.
[2] R. Sabourin, G. Genest, and F. J. Preteux (1997), "Off-line Signature Verification by Local Granulometric Size Distributions," IEEE Trans. Pattern Anal. Mach. Intell., 19 (9).
[3] R. Abbas (2003), "Back Propagation Neural Network Prototype for Off-line Signature Verification," thesis submitted to RMIT.
[4] M. Hanmandlu, M. H. M. Yusof, and V. K. Madasu (2005), "Off-line Signature Verification and Forgery Detection Using Fuzzy Modeling," Pattern Recognition, vol. 38, pp. 341-356.
[5] R. Plamondon and S. N. Srihari (Jan. 2000), "Online and Off-line Handwriting Recognition: A Comprehensive Survey," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 63-84.
[6] A. C. Ramachandra and Jyoti Shrinivas Rao (2009), "Robust Offline Signature Verification Based on Global Features," IEEE International Advance Computing Conference.
[7] Anu Rathi, Divya Rathi, and Parmanand Astya (2012), "Offline Handwritten Signature Verification by Using Pixel Based Method," International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, vol. 1, issue 7, September 2012.
[8] K. Raghuwanshi, N. Dubey, R. Nema, and R. Sharma (2013), "Signature Verification through MATLAB Using Image Processing," International Journal on Emerging Trends in Electronics and Computer Science, vol. 2, issue 4, April 2013.
[9] S. Srihari, K. M. Kalera, and A. Xu (2004), "Offline Signature Verification and Identification Using Distance Statistics," International Journal of Pattern Recognition and Artificial Intelligence, vol. 18, no. 7, pp. 1339-1360.
[10] Ibrahim S. I. Abuhaiba (2007), "Offline Signature Verification Using Graph Matching," Turk J Elec Engin, vol. 15, no. 1.
[11] N. Christofides (1977), Graph Theory: An Algorithmic Approach (New York, Academic Press Inc.).
[12] C. R. Prashanth and K. B. Raja (2012), "Off-line Signature Verification Based on Angular Features," International Journal of Modeling and Optimization, vol. 2, no. 4, August 2012.
[13] S. Uchida and M. Liwicki (2010), "Analysis of Local Features for Handwritten Character Recognition," in Proc. ICPR 2010, pp. 1945-1948.
[14] K. Frank (2009), "Analysis of Authentic Signatures and Forgeries," in Proc. IWCF, pp. 150-164.
[15] Pradeep Kumar, Shekhar Singh, and Ashwani Garg (2013), "Hand Written Signature Recognition & Verification Using Neural Network," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, issue 3, March 2013.