FPGA Contrast Enhancement Algorithm
FPGA Implementation of a Contrast Enhancement Algorithm with Discriminative Filtering
Concordia University and Universitat Politècnica de Catalunya
Index
Collaborations
Appreciation
Resum del treball
Resumen del proyecto
Abstract
1. Introduction
1.1 Context
1.2 Motivation and objectives
1.3 Report structure
2. Background in the contrast enhancement algorithm
2.1 General overview
2.2 Block descriptions
Histogram equalization
Low pass filtering
Classification: binarization of images
Mask correction
3. Hardware description
3.1 Requirements and specifications
3.2 System structure
3.3 Detailed block structure
CLAHE block
Binary masks block
Filtering block
4. Implementation results
4.1 Test pictures
Picture 1
Picture 2
Picture 3
4.2 Summary
5. Conclusions
5.1 Project results
5.2 Future work
6. Annexes
A. Matlab codes
Algorithm Matlab Implementation (Author: Badrun Nahar)
[Link] (suitable for recording into ROM with Xilinx Coregen)
Script to read and show image from RAM dump (.mem Modelsim file)
B. Project VHDL code
Binary_correction_int.vhd
binary_generator_int2.vhd
clahe_complete4.vhd
clhe_clipping_int4.vhd
clipping_wrapper_int2.vhd
filter_system_int2.vhd
[Link]
histogram_int3.vhd
histogram_wrapper_int2.vhd
[Link]
median_filter2.vhd
tiling_int3.vhd
transform_interp17.vhd
C. Modelsim simulations
Global timeline: main entity view
Global timeline: CLAHE block
Global timeline view: binary mask generation
Global timeline: filter block view
7. References
Index of figures
[Link]
[Link].[3]
[Link], green zones are just linearly interpolated and red zones are left untouched
[Link], the excess is distributed uniformly across the histogram
[Link][3]
[Link]: center pixel, WD and WO [3]
[Link] the boundaries of the classification region [3]
Figure 8. Group one pattern examples. The rest of the patterns can be obtained shifting or rotating them
Figure 9. Group two pattern examples. The rest of the patterns can be obtained shifting or rotating them
Figure 10. Group 3 patterns
[Link] pipelined or set up to run in parallel
[Link] (the top level entity)
[Link]
Figure 14. Simplified diagram of the histogram_wrapper entity structure
[Link]
Figure 16. Representation of the 100 histogram RAMs arranged according to their spatial position in the image. Tiles with the same indexes represent duplicated/quadruplicated tiles (light/dark blue and red respectively). When interpolating, in the places where the tile is duplicated in one direction, there will not be a visible interpolation
[Link][10]
Figure 18. Group 4 patterns
[Link] inputs and outputs are zero padded. Bear in mind that the filter_testbench entity has some logic not represented in this diagram
[Link]=8 and n=[Link], because 8 is not an odd number, the median in that case is the average of the 2 central values
Figure 21. Original picture 1 and its histogram
[Link]; at right using the Matlab script
Figure 23. Picture 1 final selectively filtered result. At left, using the hardware design; at right using Matlab
Figure 24. At left, histogram of the hardware output for picture 1; at right, histogram of the Matlab script's output obtained with the same image
Figure 25. Original picture 2 and its histogram
[Link]; at right using the Matlab script
Figure 27. Picture 2 final selectively filtered result. At left, using the hardware design; at right using Matlab
Figure 28. At left, histogram of the hardware output for picture 2; at right, histogram of the Matlab script's output obtained with the same image
Figure 29. Original picture 3 and its histogram
[Link]; at right using the Matlab script
Figure 31. Picture 3 final selectively filtered result. At left, using the hardware design; at right using Matlab
Figure 32. At left, histogram of the hardware output for picture 3; at right, histogram of the Matlab script's output obtained with the same image
Collaborations
Appreciation
[Link]
[Link]
gave me the last impulse to jump in and live this experience. Without them, this project would probably not exist.
Also, thanks to Ted Obuchowicz for his valuable help with VHDL and to Badrun Nahar for orienting me in order to
[Link], but not least, I want to mention my parents Albert and Pilar, my family in general and
new and old friends, who helped me, put up with me and showed me their support when I really needed it.
Resum del treball (summary of the work, translated from Catalan)
This project is the continuation of research focused on improving the results provided by popular image contrast enhancement techniques. Some snapshots are taken under very poor acquisition conditions, such as scenes with a very large dynamic range or medical images that require these [Link] these algorithms (such as contrast limited adaptive histogram equalization, or CLAHE) can reveal not only the detail of the image but also the noise hidden in it, making it hard to distinguish what is relevant from what is information invented by the sensor.
As part of the image processing research activity at Concordia University, an algorithm able to improve these results under certain conditions was developed by a student as her final master's thesis. This algorithm generates binary masks that try to detect the noise of the image from the original. Then, a version of the image with contrast enhanced by CLAHE is low-pass filtered only at the pixels detected as candidates for having noise. The work presented here is based on that thesis and studies the behavior, [Link] chosen was VHDL.
To do so, a top-down bottom-up design methodology was chosen. The first step was the documentation and learning process to understand the algorithm, its initial Matlab implementation and the image processing concepts behind it (low-pass filtering, histogram equalization, classification, etc.).
After that, and following the design approach mentioned, the system was divided into isolated parts that were [Link] [Link] design was targeted to a Xilinx Virtex 6 FPGA board.
The results gave an image with a contrast enhancement very similar to the one provided by the Matlab code [Link], probably because of certain design problems, the equalization of the image gives a result slightly darker than expected, and without using the gray levels closest to white. The filtering, on the other hand, seems to work as expected, and the final overall result is hard to distinguish from the original without a [Link], the theoretical processing times with the hardware design are far ahead of the original code, and it can be a viable alternative for real-time video processing.
Resumen del proyecto (project summary, translated from Spanish)
This project is the continuation of research focused on improving the results provided by popular image contrast enhancement techniques. Some snapshots are taken under very poor acquisition conditions, such as scenes with a very large dynamic range or medical images that require these techniques to reveal certain details that would otherwise remain hidden from the human eye. The problem is that these algorithms (such as contrast limited adaptive histogram equalization, or CLAHE) can reveal not only the detail of the image but also the noise hidden in it, making it hard to distinguish what is relevant from what is information invented by the sensor.
As part of the image processing research activity at Concordia University, an algorithm able to improve these results under certain conditions was developed by a student as her final [Link]. Then, a version of the image with contrast enhanced by CLAHE is low-pass filtered only at the pixels detected as candidates for having noise. The work presented here is based on that thesis and studies the behavior, performance and possibilities of the algorithm as an FPGA implementation. The description language chosen was VHDL.
To do so, a top-down bottom-up design methodology was chosen. The first step was the documentation and learning process to understand the algorithm, its initial Matlab implementation and the image processing concepts behind it (low-pass filtering, histogram equalization, classification, etc.).
After that, and following the design approach mentioned, the system was divided into isolated parts [Link] higher-level ones using those components, and finally the top-level entity was built to bring everything together. The design was targeted to a Xilinx Virtex 6 FPGA board.
The results gave an image with a contrast enhancement very similar to the one provided by the [Link], probably due to certain design problems, the equalization of the image gives a result somewhat darker than expected, and without using the gray levels closest to [Link], on the other hand, seems to work as expected, and the final overall result is hard to distinguish [Link], the theoretical processing times with the hardware design are far ahead of the original code, and it can be a viable alternative for real-time video processing.
Abstract
This project is the continuation of a research work focused on improving the current popular techniques for [Link], such as high dynamic range images or medical images that require those techniques in order to reveal details that otherwise [Link], those contrast enhancement algorithms (like Contrast Limited Adaptive Histogram Equalization, CLAHE) can reveal not just the detail of the image but also the noise hidden in it, making it hard to distinguish between the relevant information and the invented detail.
As part of the research activity in image processing of Concordia University, an algorithm able to improve [Link] [Link], a version of the image with [Link] project is based on that thesis, and it studies the behavior, performance and possibilities of the algorithm as a hardware (FPGA) [Link].
To do so, [Link] learning process to understand the algorithm, its initial Matlab implementation and the image processing concepts behind it (low pass filtering, histogram equalization, classification, etc.).
After that, following the design approach, the system was divided into isolated parts that were implemented and tested separately, [Link], higher level blocks were designed using those components and finally the top level entity was built as well. The design was targeted to a Xilinx Virtex 6 FPGA board.
The results gave an image with a visually very similar contrast enhancement to the one provided by the [Link], likely due to some design flaw(s), the equalization of the image provides a result a little bit darker than expected, [Link], on the other hand, seems to work just as expected, and the overall result is hard to distinguish from the original without a side-by-side comparison. Also, the theoretical processing times with the hardware design are far ahead of the original software code, and it can be a viable alternative for real-time video processing applications.
1. Introduction
In this chapter, the work done in this project and the process followed in its elaboration are introduced, as well as the reasons and motivations from which the project was born. The report structure and its content are also briefly detailed.
1.1 Context
In modern society, [Link], some of them hard to imagine a few years ago, and have improved our lives in equally unexpected ways: entertainment, medicine, security, industry, productivity in general... But in order to create, manage, improve and distribute these multimedia resources, a wide variety of specialised hardware and software components have to interoperate, forming a complex chain from the content source to the user's [Link] cornerstones around which all this technology is built, [Link] in the present day, and its importance is still growing.
One of the most usual operations in image processing is contrast enhancement. Contrast enhancement algorithms are a powerful tool to reveal details in a low contrast image hidden in a very small range of grey/colour levels. There are various ways to enhance the contrast of an image. One of the most popular algorithms is Histogram Equalization (HE), which has several variants that add some improvements, like Adaptive Histogram Equalization (AHE) or Contrast Limited Adaptive Histogram Equalization (CLAHE) [1]. However, that procedure, in any of its variations, also reveals noise hidden in the picture, as it cannot by itself distinguish between the noise and the picture detail.
These kinds of algorithms are usually implemented [2] using standard programming languages like C, C++, Java or Matlab, to give just some examples, [Link] easier way and enough for certain cases, but this severely limits the achievable performance and efficiency of the design. This is important since certain image processing operations are computationally very intensive. Computers have increased their power dramatically, making them suitable for certain isolated operations in small [Link], however, can optimize the performance per watt much more and get better results with a fraction of the processing power thanks to parallelization, [Link], [Link] final design and longer design and manufacturing process (especially if it is an ASIC instead of an FPGA) are [Link], they are still a preferable, very interesting choice for power-sensitive applications such as embedded systems, specialised devices or even just as a co-processing module able to assist a general purpose processor in order to increase the overall processing speed, eliminating bottlenecks.
1.2 Motivation and objectives
Given the boost image processing is receiving and how it will likely still be given a very important role in the near future, it is a very active field that is seeing a great number of technical advances that were unimaginable a few years ago. To bring this new technology to the masses, new waves of hardware able to keep up with the advancements and fulfill those visions are necessary, and designing that hardware with the tools available [Link], [Link] able to explore new fields that I had barely studied before, basic courses aside, and connect them to my degree's speciality was a very good opportunity to have a new point of view and learn new things, in this case image processing algorithms.
Last, but not least, the kind of image enhancement studied in the project was an intriguing field as it tries to [Link] question whose answer I was willing to check by myself, which is: how far can one go while trying to make something look better without manipulating or distorting the source material to the point of making the enhancement pointless?
The main objective of this project is the implementation in an FPGA of an advanced contrast enhancement algorithm with selective noise filtering, as described in Badrun Nahar's Master Thesis (Concordia University) [3], using VHDL as the hardware description language. This is done in order to achieve a high-performance yet efficient execution of that algorithm and evaluate its viability in time-sensitive applications such as real-time video processing, [Link] is targeted to a real board, made entirely with synthesizable code. Also, it is expected to acquire a good knowledge of image processing and of the implementation of this kind of algorithm in hardware.
In order to accomplish those goals, the plan to face this project consisted of two clearly differentiated long [Link] focus was put mainly on histogram equalization, [Link] [Link] implement each component of the design and implementing and testing each individual part and the whole system with Modelsim, using a top-down bottom-up design approach.
[Link].
1.3 Report structure
[Link], the introduction, gives a brief explanation of the contrast enhancement field in particular and image processing in general, as well as other factors that lead to the [Link], it provides basic information about the objectives and the contents of the rest of the [Link] to the first phase of algorithm study, the second chapter provides a description of the implemented mathematical algorithm, detailing its separable parts step by step and briefly discussing the image processing concepts associated with them. Chapter 3 is associated with the second phase, the VHDL hardware description. With a structure similar to the one in chapter 2, the hardware implementation of each block and separable component mentioned [Link], in chapter 4, consists of discussing the results of the design with different tests and images. Finally, to close the report, a fifth chapter with the conclusions takes stock of the work and results and gives some ideas on how it could be improved and/or expanded.
As additional information, 3 annexes with the description code, Matlab scripts and Modelsim simulations are included.
2. Background in the contrast enhancement algorithm
Good contrast is an essential property in most image processing tasks. However, the conditions in which [Link], the sensor limitations, lighting or the photographed object itself influence the final result in ways that are not always desirable, leading to a lack [Link], contrast enhancement becomes a good preprocessing tool for a wide range of image processing cases.
There are many different image contrast enhancement techniques, but one of the most popular is histogram equalization (HE), and the algorithm implemented in this project is built around it. There are various versions/variations including the basic histogram equalization, but also improved variants like Adaptive Histogram Equalization (AHE) or Contrast Limited Adaptive Histogram Equalization (CLAHE) [1].
However, HE and its variants not only increase the contrast of the real detail, but also the imperfections introduced during the acquisition of the picture, [Link] can be troublesome in some contexts, such as when the image's contrast is extremely low or when the relevant data can be easily confused with the noise.
For this reason, some design efforts in that field are now concentrated on reducing the appearance of that undesired data. There are mainly two points where the problem can be faced: right before or right after the [Link], relevant information can be lost together with the removed noise, and thus it cannot be detected and enhanced during the equalization. On the other hand, if the noise reduction takes place after the equalization, it is harder to remove because the enhancement makes it more visible and relevant.
In this project, an algorithm [3] based on CLAHE is evaluated and implemented that, following the trend indicated in the previous paragraphs, [Link] algorithm was chosen to work in its FPGA implementation is that it can be interesting to see how well it can perform in terms of speed and at what cost, [Link] include examples like an image or a video stream of a medical image, like an echography, where it would be valuable not just as an aesthetic improvement, but also as a way to make diagnosis easier for a doctor, who could adjust the parameters on the fly, see the improved image in real time, etcetera. CLAHE is already being widely used for this kind of purpose. [1][4][5]
In this chapter, the original mathematical algorithm and its strategy to face the noise problem will be described, and the main theoretical concepts behind its parts and blocks will be introduced as well.
2.1 General overview
As has already been said, the enhancement algorithm implemented is based on CLAHE, with some extra processing to improve the end results, focusing specifically on noise reduction.
In order to partially overcome the noise problems, the route followed by the implemented algorithm has been to selectively filter key areas more likely to have noise in the enhanced picture, which are detected by the [Link], according to the following scheme:
[Link].[3]
I is the original source low-contrast input image. The HE-based enhancement block is what contains specifically the CLAHE algorithm, where the contrast enhancement itself takes place. I is also used to generate some binary masks that will indicate in which areas of the enhanced image the selective filtering must take place and which part must be left untouched.
After the CLAHE step, the pre-filtering block applies a soft low-pass filter to eliminate some high-frequency noise in the whole enhanced bitmap. A common Gaussian filter is enough for our needs, and also allows some [Link] regions, the idea is to keep filtering low in non-homogeneous regions, which will be affected only by this pre [Link].
Finally, the different layers of selective filtering (LPn blocks) are applied to the image to get the final result. Depending on the binary masks generated with the classification of the pixels on the original image, it is decided [Link] have different levels of filtering strength: the pixels more likely to have noise (homogeneous) will be classified as such in more masks and thus have more filtering applied than those that are more likely to be misclassified but still included in a single filtering step. For these steps, the filter of choice has been a bidirectional multistage median filter, thanks to its capabilities when it comes to preserving the edges after the filtering process.
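The layered selective filtering described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: a plain 3x3 median stands in for the bidirectional multistage median filter chosen in the project, image borders are handled by clamping coordinates, and the names `median3x3` and `selective_filter` are hypothetical rather than taken from the thesis or its VHDL code.

```python
from statistics import median


def median3x3(image, y, x):
    """Median of the 3x3 neighbourhood of (y, x); coordinates are clamped at the borders."""
    height, width = len(image), len(image[0])
    window = [
        image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
    ]
    return median(window)


def selective_filter(image, masks):
    """Run one filtering pass per binary mask, touching only pixels whose mask bit is 1.

    A pixel flagged in several masks is filtered several times, so it
    receives stronger smoothing than a pixel flagged in a single mask.
    """
    result = [row[:] for row in image]
    for mask in masks:
        source = [row[:] for row in result]  # each pass reads the previous pass's output
        for y, mask_row in enumerate(mask):
            for x, flag in enumerate(mask_row):
                if flag:
                    result[y][x] = median3x3(source, y, x)
    return result
```

Calling `selective_filter(enhanced, [mask1, mask2])` with two masks that both flag the same pixel filters that pixel twice, while pixels flagged in neither mask keep their enhanced value.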
2.2 Block descriptions
Histogram equalization
[Link], it adjusts the gray scale of the image so that the histogram of the original image is mapped onto a uniform histogram using a transformation [Link]. [Link], the function must be single valued, monotonically increasing and lying between 0 and 1. For discrete values, if we translate the equations of the continuous domain from probability density functions and integrals to probabilities and summations [6]:
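Probability of occurrence of a gray level:
$$p_r(r_k) = \frac{n_k}{n}, \qquad k = 0, 1, \dots, L-1 \tag{2.1}$$
[Link] image and $n_k$ [Link] the following summation:
$$s_k = T(r_k) = \sum_{j=0}^{k} p_r(r_j) = \sum_{j=0}^{k} \frac{n_j}{n} \tag{2.2}$$
The equalized image, then, can be obtained by mapping each pixel's level $r_k$ with its corresponding new level $s_k$, which represents a cumulative distribution function (cdf).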
Unlike the continuous version, it cannot be demonstrated that it will produce the discrete equivalent of a [Link], it does tend to spread the histogram of the input image in a way it uses a wider range of the gray spectrum.
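As an illustration of the mapping described above, the following minimal Python sketch builds the cumulative histogram of an image and uses it as a lookup table. It assumes an 8-bit grayscale image stored as a list of rows and scales the cumulative sums back to the 0 to 255 range with integer arithmetic; the function name `equalize` and the rounding choice are assumptions for the example, not the Matlab or VHDL code of the project.

```python
def equalize(image, levels=256):
    """Histogram-equalize a grayscale image given as a list of rows of integers."""
    # Histogram: hist[k] is the number of pixels with gray level k.
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)

    # Lookup table from the cumulative histogram. The cumulative fraction
    # running / total lies in [0, 1]; it is scaled to [0, levels - 1] and
    # rounded to the nearest integer (halves round up) using integers only.
    lut = []
    running = 0
    for count in hist:
        running += count
        lut.append((running * (levels - 1) + total // 2) // total)

    # Map every pixel through the lookup table.
    return [[lut[value] for value in row] for row in image]
```

For instance, an image whose pixels only use levels 100 to 103 is spread over most of the available range, with its brightest level mapped to 255.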
[Link], it must be noted that it does not guarantee that, if the value of pixel a is greater than the value of pixel b, this relationship will be preserved after the equalization.
Moreover, in practical terms, computing a histogram for each pixel is not viable because of its computational [Link], in most cases this approach is scrapped and instead [2], the image is divided into a limited number of tiles, and for each of them a histogram is computed. In order to prevent the boundaries of the tiles from becoming visible when applying the transformation to the different pixels, bilinear interpolation is used to make the transitions in the final picture smoother.
The other variant mentioned, contrast limited adaptive histogram equalization (CLAHE), adds another layer to the
AHE in order to limit the amount of contrast enhancement
[Link]
example, the excess is distributed uniformly across the histogram.
CLAHE is useful to limit the appearance of certain noise content in zones of low gray level variability by limiting
[Link], the reduced contrast enhancement in certain zones of this alternative could hide the
presence of some significant data in the image.
The reason why the variant chosen is CLAHE is its ability to control the degree of enhancement, which can be
useful as a tweaking parameter, while maintaining all the improvements present in AHE regarding better contrast
enhancement. [3]
Low pass filtering
Low pass filters are useful to eliminate high frequency noise present in an image, as the equalization itself
cannot discriminate the noise.
There are various types of low pass filters, depending on the main purpose of their application. Some are
[Link], the
mathematical complexity is another important characteristic to take into account.
In the following lines, the different low pass filters used in different stages of the noise removal will be
described, and their strengths and weaknesses will be discussed as well.
Gaussian filtering
Gaussian filters have always been very popular and relevant because of their simplicity: they are easily
specified and both the forward and inverse Fourier transforms are real Gaussian functions.
[Link], it also blurs the edges and the
[Link] (σ) will allow a control of the blurriness level
applied in the prefiltering stage:
$$F(x,y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2+y^2}{2\sigma^2}} \qquad (2.3)$$
The higher the sigma is, the stronger the blurring effect will become.
To compute a filtering operation with a Gaussian filter, a good hardware friendly approach is to use a Gaussian
kernel and a discrete convolution operation:
$$y(m,n) = \sum_{i}\sum_{j} x(i,j)\, h(m-i,\, n-j) \qquad (2.4)$$
Where x is the input image and h the Gaussian kernel, which for our needs can be just the sampled version of
the continuous Gaussian kernel, obtained by sampling F(x,y).
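As an illustration of the sampled kernel and the discrete convolution, the following Python sketch builds a small kernel and applies it to an image stored as a list of rows. The function names, the normalization of the kernel to unit sum and the zero padding at the borders are assumptions of this example, not details confirmed by the text.

```python
import math

def gaussian_kernel(size=5, sigma=0.5):
    """Sample F(x, y) on a size x size grid and normalize it to sum 1."""
    half = size // 2
    raw = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            for x in range(-half, half + 1)]
           for y in range(-half, half + 1)]
    total = sum(map(sum, raw))  # the 1/(2*pi*sigma^2) factor cancels out here
    return [[v / total for v in row] for row in raw]

def convolve(image, kernel):
    """Discrete 2D convolution, treating pixels outside the image as zero."""
    rows, cols = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            acc = 0.0
            for i in range(-half, half + 1):
                for j in range(-half, half + 1):
                    r, c = m - i, n - j
                    if 0 <= r < rows and 0 <= c < cols:
                        acc += image[r][c] * kernel[i + half][j + half]
            out[m][n] = acc
    return out

flat = [[100] * 7 for _ in range(7)]
print(convolve(flat, gaussian_kernel())[3][3])
```

A flat region should pass through the filter unchanged away from the borders, which is a quick sanity check for any kernel produced this way.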
Median filtering
Medianfiltersarenonlinearfilterswithgoodsignalvariationpreservationqualitieswhilesmoothingthenoise
[Link]
as the important information in the revealed detail is not as likely to be lost compared to alternatives like the
[Link],theprinciplebehindmedianfilteringissortingthepixelsinsideawindow
[Link]
[Link],itis
[Link],variousadvancedversionsofthemedian
filteringhavebeendeveloped[7].
[Link]
[Link]:centerpixel,WDandWO[3].
[Link][3].
In our case, [Link], the bidirectional multistage
median filter (BMM) [3]. Multistage median filters use several stages of median filters instead of a single median
for the entire window. BMM filters operate in two steps: first, they find a median of the diagonal pixels and
another of the orthogonal pixels, [Link], they take the median of the subset formed by the
values calculated in the previous stage and the central pixel.
Put in proper mathematical terms:
$$y(i,j) = \mathrm{median}\big(\mathrm{median}(W_D),\ \mathrm{median}(W_O),\ x(i,j)\big) \qquad (2.5)$$
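The two steps of the BMM filter can be sketched in Python for a single 5x5 window. This is only an illustration: the function name is hypothetical, and it assumes that WD holds the 8 diagonal pixels and WO the 8 pixels of the central row and column, both without the central pixel, and that an even sized median is the average of the two central values.

```python
from statistics import median

def bmm_pixel(window):
    """Bidirectional multistage median of a 5x5 window (list of 5 rows)."""
    c = 2  # index of the central row and column
    # WD: the 8 pixels on the two diagonals, central pixel excluded (assumption)
    w_d = ([window[c + k][c + k] for k in (-2, -1, 1, 2)]
           + [window[c + k][c - k] for k in (-2, -1, 1, 2)])
    # WO: the 8 pixels on the central row and column, central pixel excluded
    w_o = ([window[c][c + k] for k in (-2, -1, 1, 2)]
           + [window[c + k][c] for k in (-2, -1, 1, 2)])
    # First stage: one median per direction; second stage: median of 3 values
    return median([median(w_d), median(w_o), window[c][c]])

# An isolated bright pixel in a flat area is removed
noisy = [[50] * 5 for _ in range(5)]
noisy[2][2] = 255
print(bmm_pixel(noisy))

# A vertical edge passing through the centre column is kept
edge = [[0, 0, 100, 100, 100] for _ in range(5)]
print(bmm_pixel(edge))
```

The second case shows the edge preserving behavior mentioned in the text: the orthogonal median and the central pixel outvote the diagonal median.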
Classification: binarization of images
Classification allows separation of an image into different regions, [Link]
are various classification methods, but in this contrast enhancement algorithm, specifically, a classification based
on clustering is employed: [Link]
histogram, which indicates the presence of a large amount of similar pixels. Then, the image is divided in two
regions: one containing all the values around the peak delimited by the selected thresholds, which should include
[Link], outside the
[Link], we can generate a binary mask.
[Link]
boundaries of the classification region [3].
This method is useful in low contrast images because they have important peaks in their histograms, as a
consequence of this lack of contrast, so big areas of homogeneous pixels can be defined with this classification
method.
(2.6)
However, [Link]
to partially solve that, as we will see in the next subchapter.
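The threshold based classification can be sketched in a few lines of Python. The function name is hypothetical and treating both limits as inclusive is an assumption; the limits 67 and 79 are borrowed from one of the gray ranges reported in the results chapter.

```python
def binary_mask(pixels, bottom, top):
    """Return 1 for pixels inside [bottom, top] (homogeneous), 0 otherwise."""
    return [1 if bottom <= p <= top else 0 for p in pixels]

row = [10, 66, 67, 72, 79, 80, 200]
print(binary_mask(row, bottom=67, top=79))
```

Running the classification once per gray zone gives one mask per zone, which is how several filtering strengths can later be selected.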
Mask correction
The correction applied to the generated binary masks consists in checking the similarity of the central pixel
with its neighbors. If a certain pattern likely to be a misclassification is detected, the pixels in the mask can be
corrected (their value can be changed).
Using 3x3 windows, the patterns considered as indicators of nonhomogeneous pixels misclassified as
homogeneous pixels have been divided into different categories according to their detection method [3]:
Group 1: [Link]
pass filtering (this means, corrected in all the masks).
[Link].
Group 2: they can also be seen as one pixel wide, but have more singular forms and more variations than
group 1. Compared to group 1, they are not as likely to be misclassified, and probably are located near non
[Link], they will not be included in the masks oriented to strong filtering.
[Link].
Group 3: one pixel wide patterns drawing a cross, as shown in the figure below.
Figure 10. Group 3 patterns.
When any of these patterns or their shifted/rotated variants is detected, the value of the central pixel should
[Link], the correction will be applied to all
[Link], just the mask for broader filtering will have the change applied.
3. Hardware description
Now that the different parts of the original algorithm have been presented, it is time to see how it has been
ported to a VHDL description.
Programming algorithms in a general purpose or embedded device using standard programming languages has
proven to be a good enough solution for quick, [Link], there is a serious amount
[Link], they are
[Link], because the algorithm is likely implemented using a high level programming
language to speed up development, the translation process to the executable binary will add more overhead as
[Link], the presence of other layers such as an operating system can make things even more redundant.
Dedicated hardware implementations have a much lower degree of flexibility, but require less power to run
[Link], it provides possibilities
related to parallel, customized design that are not possible with a traditional programming language executed on
top of a processor, which helps increase the algorithm execution speed even more.
In order to implement the algorithm detailed in chapter 2, a top down and bottom up strategy was chosen to
[Link]
that can work autonomously. Then, inside each big block, all the smallest separable parts were identified and
studied in order to find a good way to translate them to hardware with the available resources, which were
studied as well. Each identified part was implemented and tested separately with a test bench and next, the
tested parts were used to assemble bigger VHDL entities and recreate the big blocks. Each block was tested
separately using Modelsim and finally, the top level entity was designed in order to connect the big blocks.
This chapter describes how the whole algorithm has been redesigned in VHDL code, taking
advantage of the possibilities it gives to define the level of concurrency: parallel segments, sequential parts, etc.
The first step will be defining the requirements and specifications of the implementation, and next proceed with
[Link]
the cases, the reasons behind each design decision will be addressed as well.
3.1 Requirements and specifications
Before going into depth about how the design has been made, it is important to have an idea of what the
different requirements and specifications of the design are, and also how they have been targeted in the design.
The main requirements of the design include:
The ability to process grayscale (8 bit) images with arbitrary precision/[Link]
the image is specified thanks to different inputs that include width, length and number of pixels.
Tweaking options or easy modification of the main parameters of the design. At the end, those
include:
o CLAHE clip limit, as a percentage.
o Definition of 2 gray zones (by entering an upper and lower limit for each one) for the
generation of binary masks, as they will specify the thresholds of the classification process.
o Ease to modify the Gaussian prefilter VHDL code to change its strength.
Performance has been a priority over area and power consumption.
After some tests with the Matlab code, [Link]
benefit of making it bigger than that does not seem to be worth the extra resources.
The design has been targeted to a Virtex 6 based board: specifically, the ML605 [8] development board (listed
with a $1795 price at time of writing this report). The specifications that are more relevant to this project include
600 MHz maximum clock frequency, 14976 Kbit of block RAM distributed across the board and 768 DSP slices to
[Link], if needed, it has 512 MB of regular DDR3 RAM. The reason to choose the ML605
board is that, given the lack of memory or area constraints for this first design, it did provide a very comfortable
environment that does not set very strict physical limits [Link], it is easier to focus on just
trying to get the maximum performance by trying to parallelize as much as possible, which comes at the expense
of more area and memory slices.
Regarding libraries, the IEEE standard library numeric_std will handle all the needed arithmetic variable types
(like signed and unsigned) and operations (addition, subtraction, multiplication, division, comparisons). However,
some division operations cause problems in the synthesis step using Precision RTL and hence, the design in its
current state cannot be synthesized yet and would need some modifications to build properly.
3.2 System structure
In order to design the FPGA implementation, the first step is defining what parts of the algorithm have to be
[Link]
look at the algorithm diagram in chapter 2 [Figure 2] we can clearly divide it in 3 big blocks, represented in Figure
11: CLAHE computation (1), masks generation (2) and noise filtering (3).
The CLAHE and masks blocks are clearly independent as both depend only on the source image to operate.
Consequently, both blocks can be run in parallel, concurrently, during what in Figure 11 is identified as sequence
[Link], block 3 needs both the enhanced CLAHE image and the binary masks to select the zones that need to
be filtered, so it must not become active until blocks 1 and 2 finish their work, during sequence 2.
Another aspect to be taken into account is the latency added by each block and how this affects the execution
of the other steps. The main bottleneck in this regard is the histogram equalization step. This is caused by the
need of computing the histograms for all the image pixels before applying the transformation to them, which is
[Link], as well as the different filter and mask
generation steps, [Link], it is negligible compared to the part
caused by the histogram generation.
[Figure 11: block diagram of the system, marking each step as pipelined or non pipelined.
CLAHE computation (1): tile generation (1.1), histogram computing (1.2), histogram clipping & CDF generation (1.3), equalization & interpolation of the image (1.4). The equalization function is computed for each tile (64 iterations).
Generation of the masks (2): zero padding, window generation, binary classification (2.1), pixel correction (2.2).
Filtering (3): zero padding, window generation x3 (3.1 + 3.2 + 3.3), Gaussian prefiltering (3.1), discriminative median filtering 1 (3.2), discriminative median filtering 2 (3.3).
Blocks 1 and 2 read the image stream during sequence 1; block 3 reads the CLAHE stream during sequence 2.
Approximate cycle counts given in the figure:
Aprox cycles = (5 x 64 + 7 x 64 + numpixels) + (3 x 64 + (256 + 2) x 2 x 64) + (5 + numpixels)
Aprox cycles = (x_size + 2) x 2 + 1 + 9 + numpixels + y_size x 2
[Link] = (x_size + 2) x 2 x 3 + y_size x 2 + 6 + 17 x 2 + numpixels]
[Link]
pipelined or set up to run in parallel.
Also, the filtering and mask generation steps in general output one pixel per clock, except when jumping to the
next image line, because of the implementation of zero padding. The generation and discard process of those
extra pixels in the boundaries of the image adds some small incremental delays directly tied to the image's
[Link], the zero padding operations are only necessary at the beginning and at the end of
the whole filtering/mask generation process; it is not necessary to repeat it before and after each filter (pre
filtering, discriminative 1, discriminative 2). For more details on this process, read the section corresponding to
the noise filtering block.
The top level entity is called main, and is stored inside the file [Link]. According to all the mentioned
superblocks and algorithm parts, its structure is shown in Figure 12.
[Figure 12: block diagram of the top level entity. It connects the CLAHE computation, binary masks generation and discriminative filtering blocks with the source image ROM, the CLAHE RAM and the two binary RAMs. External signals shown: start_cntr, x_size, y_size, numpixels, clip_limit, limit1_t, limit1_b, limit2_t, limit2_b and end_flag.]
[Link] (the top level entity).
As the diagram shows, the inputs are:
1. Image number of pixels (numpixels): necessary to know at what point the system must stop reading
the ROM because it has reached the end of the image. It is also necessary to compute certain
parameters used internally by certain blocks, such as image tiling, equalization, etc.
2. Image width (x_size) and height (y_size): needed in certain steps where knowing when a line or row
ends and the aspect ratio of the image is critical, mainly the tiling and equalization steps.
3. Clip limit (clip_limit): [Link].
4. Top and bottom limits 1 and 2 (limit1_t, limit1_b, limit2_t, limit2_b): used to manually define the
[Link]
generates the binary masks employed to determine what is filtered and what not.
5. Start signal to trigger the process (start_cntr).
Ideally, there should be another input to stream the input image and put it into a RAM instead of the ROM of
[Link], due to time constraints and because it is enough for testing purposes, a ROM preloaded
[Link], RAM and FIFO entities instantiated in the
various parts of the design have been generated with the Xilinx Core Generator tool using the faster integrated
block RAM (BRAM) instead of the higher capacity and slower DDR3 RAM, as there is enough BRAM capacity for the
[Link], they are reduced to the end flag (end_flag output) that indicates the end of the
[Link], an output to stream the output image from the filtering block would be available if the design
was adapted to connect to another component, but it was not added due to time constraints.
Note that all the blocks are synchronous and controlled with the same clock signal, and share a global reset
[Link]/enable signal in each block to activate it when the previous
[Link].
3.3 Detailed block structure
The big blocks of the enhancement system are divided into various components that perform different
sequential tasks, described in their own VHDL files. Each big block has its own internal top level entity that
wraps all the different subcomponents and the system's global top level entity (main) connects them and provides
access to the external inputs and shared memory resources.
CLAHE block
[Link]
[Link]:
a) The generation of the addresses for each individual tile when reading them from the source picture
and decision of where the boundaries for each tile lie.
b) [Link], each tile needs a memory pool to store
histogram data.
c) Reuse of the histogram computation components and ROM input.
d) Related to points b) and c), an important amount of multiplexing is needed to manage the access to
storage blocks.
The clahe entity is instantiated as clahe_generator on the main entity and described in clahe_complete4.[Link]
[Link]
[Link]:
Switching the image ROM I/O between the histogram generation and image transformation blocks
when the processing of the tiles finishes.
Switching the histogram RAMs array read interface between clipping and image transform blocks when
the processing of all the tiles finishes.
Switching the histogram RAMs array write interface between histogram computing and image transform
blocks when the tiles' histogram computing finishes.
Both when processing the different tiles and when generating the CLAHE image, it also has to manage
choosing between the different tile RAM pools depending on the one that is being
computed/accessed.
The instances of other entities found in the clahe entity are:
tiler (entity image_tiling): sequentially outputs the tiles into which the image is divided, from left to
right and top to bottom of the source picture.
histo_wrapper (entity histogram_wrapper): contains the histogram generation block.
histo_clipper (entity clipping_wrapper): clips the histogram bins that exceed the specified limit and
replaces the histogram RAM content with the cumulative distribution function (cdf) needed by the
transformation function.
equalizer (entity histogram_equalizer): using the source image and all the computed cdf it generates
the CLAHE image while simultaneously applying bilinear interpolation.
tiles (0 to 99) (entity histo_ram2): array with all the needed memory pools to store the histograms of the
different tiles. Despite the image being divided in 8x8 tiles, there is a total amount of 100 tiles to
make the implementation of interpolation easier by duplicating the tiles in the sides and corners.
Consequently, we end up with a 10x10 = [Link]
the interpolation section.
When the CLAHE execution is triggered, tiler starts accessing the addresses of the top left tile, and outputs
them to the histogram generation block, which will deliver the resulting histogram to the corresponding
[Link], the RAM data input signals are switched to connect
them to the clipper block. When histo_clipper's end flag rises, the RAM data input signals are switched back to
their original position and the tiler block is activated again to begin the next histogram.
[Figure: the tile generator (tiler), the histogram generation block (histo_wrapper), the histogram clipping and cdf generation block (histo_clipper) and the image transformation block (equalizer) around the histogram RAMs tiles(0) to tiles(99). One switch acts during the cycle of a tile and another when there is a change of tile.]
[Link].
When all the cdf are ready, the tiler's and clipper's end_flag outputs trigger the image transformation. In that
moment, [Link]
[Link], the histogram RAM
pools accessed simultaneously change dynamically depending on what is requested by the transformation
component according to the current pixel.
Now, see the inner structure of the different blocks.
Tiler
The behavior of the instance tiler is described in the file tiling_int3.vhd under the entity name image_tiling.
Its operation principle is simple from an algorithmic point of view. Making use of the board DSP blocks, it
calculates the addresses corresponding to the current tile and outputs the pixel values corresponding to those
addresses row per row, [Link]:
Computation of the position in the x/y axis referenced to the original image:
;
(3.1)
Where xpos and ypos are the current coordinates referenced to the top left corner of the tile and numx numy the
[Link].
With the previous results, it is easy to compute the memory address corresponding to that pixel:
(3.2)
Where xsize is the width of the source image.
The block also has a couple of small counters that are only reset when the global circuit reset is employed, but
[Link]
and y position in the grid of tiles (numx and numy).
The size of the tile is computed in real time by another counter every time, [Link]
on the right and bottom corners of the image can be smaller than the rest when processing certain image sizes.
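The address generation of the tiler can be illustrated with a short Python sketch. The names xpos, ypos, numx and numy follow the text, but the way the tile size is chosen (ceiling division, so that the last tile in each direction may be smaller) and the row major address formula are assumptions of this example, since equations 3.1 and 3.2 did not survive in the text.

```python
def tile_bounds(index, size, tiles=8):
    """Start offset and length of tile `index` along an axis of `size` pixels."""
    step = -(-size // tiles)  # ceiling division: nominal tile length (assumption)
    start = index * step
    return start, max(0, min(step, size - start))

def tile_addresses(numx, numy, x_size, y_size, tiles=8):
    """Yield the memory addresses of one tile, row by row."""
    x0, width = tile_bounds(numx, x_size, tiles)
    y0, height = tile_bounds(numy, y_size, tiles)
    for ypos in range(height):
        for xpos in range(width):
            x = x0 + xpos  # position referenced to the original image
            y = y0 + ypos
            yield y * x_size + x  # row major address (assumption)

# Second tile of the first tile row in a 16x16 image
print(list(tile_addresses(1, 0, 16, 16)))
# Last tile of a 100 pixel wide image is narrower than the others
print(tile_bounds(7, 100))
```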
Histo_wrapper
This instance, whose entity (histogram_wrapper) is described in the file histogram_wrapper_int2.vhd,
computes the histogram of any input image with the aid of its internal histogram component (instance
histogram_generator), described in histogram_int3.vhd and which includes part of the computation functionality.
This component is capable of calculating each histogram in the amount of cycles it takes to read a streamed
image, just with a few cycles of initial latency at the beginning of the computing process. [8]
Every time a pixel is read, its corresponding bin in the histogram storage memory is read and overwritten with
[Link], the bins are gradually incremented according to the inputs until the
image reaches its end and the reads/writes stop:
$$\mathrm{histo\_ram}[\mathrm{device\_data}] = \mathrm{histo\_datain} + 1 \qquad (3.3)$$
Where histo_datain is the value of histo_ram[device_data] during the previous clock cycle.
[Figure: the Histogram_int component and its limit, connected to histogram RAM(x) through the signals Device_data, Start_cntr, Cntr_value, Histo_datain, Ram_raddr, Ram_wraddr, Ram_wren, Ram_datain and Ram_douta.]
Figure 14. Simplified diagram of the histogram_wrapper entity structure.
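The read and overwrite cycle of the histogram memory can be modelled in software as follows. The variable names histo_ram, device_data and histo_datain follow the text; the function name is hypothetical, and the sketch ignores the clock level timing of the real component.

```python
def stream_histogram(pixel_stream, bins=256):
    """Build a histogram with one read and one write per streamed pixel."""
    histo_ram = [0] * bins
    for device_data in pixel_stream:
        histo_datain = histo_ram[device_data]      # read the current bin
        histo_ram[device_data] = histo_datain + 1  # write it back incremented
    return histo_ram

hist = stream_histogram([3, 3, 7, 3, 255])
print(hist[3], hist[7], hist[255])
```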
Histo_clipper
As mentioned before, despite its name, the instance histo_clipper does not only clip the histogram, but also
generates the cdf, which is used to apply the transformation later. Its entity (clipping_wrapper) is stored in
clipping_wrapperc_int2.vhd. However, the real functionality is stored in another component inside
clipping_wrapper: the instance described in clhe_clipping_int4.vhd.
[Link].
_
_
(3.4)
Where x[n] is the input histogram bin (number of pixels with that gray value), excess the variable that
gradually accumulates the total excess of pixels and clip_limit the maximum tolerated value in the histogram.
Next, the second sweep reads again the contents of the memory sequentially, but this time accumulates the
value read plus a fraction of the excess in another register. Then, the read address is overwritten with the
accumulator value, thus generating the cdf:
1
(3.5)
Where y[n] is the histogram bin value as calculated in the previous step, n the RAM position (bin, grey level,
between 0 and 255), excess represents the total clipped excess of pixels and numpixels the number of pixels of
the histogram's input image: in this context, the tile size.
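The two sweeps described above can be sketched in Python. This is a minimal illustration under stated assumptions: clip_limit is given here as an absolute pixel count rather than a percentage, the excess is shared equally between all bins, and the cdf is left unnormalized. The exact form of equations 3.4 and 3.5 is not recoverable from the text, so the sketch should not be read as the original formulation.

```python
def clip_and_cdf(hist, clip_limit):
    """Two sweeps over a histogram: clip the bins, then build the cdf."""
    # First sweep: clip each bin and accumulate the total excess
    excess = 0
    clipped = []
    for x in hist:
        if x > clip_limit:
            excess += x - clip_limit
            x = clip_limit
        clipped.append(x)
    # Second sweep: accumulate each bin plus an equal share of the excess
    share = excess / len(hist)
    cdf, acc = [], 0.0
    for y in clipped:
        acc += y + share
        cdf.append(acc)
    return cdf

print(clip_and_cdf([10, 0, 2, 0], clip_limit=4))
```

Because the clipped pixels are redistributed rather than discarded, the last cdf value still equals the number of pixels of the tile.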
255
(3.6)
Then, interpolate the different results:
(3.7)
Theoretically, the borders of the image should be just linearly interpolated (in a single direction) and the
[Link]
differentiate between the bilinear, linear and not interpolated cases. To avoid that problem, duplicated border
tiles were introduced. This way, when interpolating a pixel in one direction that uses the same tile twice, the
result is as if there were no interpolation at all, avoiding the implementation of special cases.
0,0 0,0 1,0 2,0 3,0 4,0 5,0 6,0 7,0 7,0
0,0 0,0 1,0 2,0 3,0 4,0 5,0 6,0 7,0 7,0
0,1 0,1 1,1 2,1 3,1 4,1 5,1 6,1 7,1 7,1
0,2 0,2 1,2 2,2 3,2 4,2 5,2 6,2 7,2 7,2
0,3 0,3 1,3 2,3 3,3 4,3 5,3 6,3 7,3 7,3
0,4 0,4 1,4 2,4 3,4 4,4 5,4 6,4 7,4 7,4
0,5 0,5 1,5 2,5 3,5 4,5 5,5 6,5 7,5 7,5
0,6 0,6 1,6 2,6 3,6 4,6 5,6 6,6 7,6 7,6
0,7 0,7 1,7 2,7 3,7 4,7 5,7 6,7 7,7 7,7
0,7 0,7 1,7 2,7 3,7 4,7 5,7 6,7 7,7 7,7
Figure 16. Representation of the 100 histogram RAMs arranged
[Link]
indexes represent duplicated/quadruplicated tiles (light/dark blue
and red respectively). When interpolating, in the places where the
tile is duplicated in one direction, there will not be a visible
interpolation.
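The effect of the duplicated border tiles can be checked with a small Python sketch. The index mapping is read off the 10x10 arrangement of Figure 16; the function names and the interpolation weights tx and ty are hypothetical and only serve to show that interpolating between two copies of the same tile returns that tile's value.

```python
def source_tile(grid_index, tiles=8):
    """Map an index of the 10 wide grid to the index of the real tile."""
    return min(max(grid_index - 1, 0), tiles - 1)

def lerp(a, b, t):
    """Linear interpolation between a and b with weight t in [0, 1]."""
    return a + (b - a) * t

def bilinear(top_left, top_right, bottom_left, bottom_right, tx, ty):
    """Bilinear interpolation of the values given by four neighbouring tiles."""
    top = lerp(top_left, top_right, tx)
    bottom = lerp(bottom_left, bottom_right, tx)
    return lerp(top, bottom, ty)

print([source_tile(g) for g in range(10)])
# Left border: both columns are copies of the same tile, so only the
# vertical direction is effectively interpolated
print(bilinear(5, 5, 9, 9, tx=0.3, ty=0.5))
# Corner: the four neighbours are the same tile, so nothing is interpolated
print(bilinear(4, 4, 4, 4, tx=0.2, ty=0.9))
```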
Binary masks block
The binary masks generation is handled by the component binarizer, an instantiation of the entity
mask_generator contained in the file binary_generator_int2.[Link]
to sequentially read the image and classifies the pixels with different comparators that take as a reference the
values entered externally in the main top level entity. The output is 1 if the pixel falls inside the limits
(homogeneous) or 0 otherwise (equation 2.6).
The next step is applying the correction that will output 2 different masks as a result, stored in 2 different RAM
blocks waiting for the beginning of the filtering process. This is done by corrector_l, contained in
binary_correction_int.vhd as the binary_correction_less entity. However, what this component does not handle is
the addition of zero padding, [Link]
[Link], it
just has to be discarded when receiving the output stream before writing to the RAM.
Corrector_l
[Link], it is necessary to have a 3x3
[Link], the component can use its algorithms
to detect the patterns susceptible to correction. Depending on the result, it will give a changed output for one
mask, both or leave the binary value unchanged in both masks.
To get this 3x3 window, the structure employed [9] is the same that will be seen in the filtering section.
[Link]
sequentially the first row of registers and, after that, [Link]
first line of registers, [Link]
[Link], the
[Link].
[Link][10].
To detect the different patterns introduced in chapter 2, the followed procedure is:
Group 1: if a zero is detected in the central pixel and the sum of the other zeros in the window is 2 or
less, [Link], the pixel is corrected in both final masks.
Group 2: if a zero is detected in the central pixel and the sum of the other zeros in the window is 3, it
is likely to be a group 2 pattern, but there are some specific cases known as group 4 that must be
discarded. The detection of these group 4 patterns is done by checking that the distribution of the 0
does not match them. Because group 2 patterns are not as likely to be misclassified as group 1, only
[Link], the zero is left unchanged.
Figure 18. Group 4 patterns.
Group 3: if any of the 2 exact patterns in Figure 10 is detected by checking the values in all positions
individually, the central pixel is changed in the mask for broader filtering.
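The zero counting part of this procedure can be sketched in Python. This is a partial illustration only: the function name and the returned labels are hypothetical, and the group 3 and group 4 checks are left out because their exact patterns are given in figures that are not available in the text.

```python
def classify_center(window):
    """window: 3x3 list of 0/1 mask values. Returns a label for the centre."""
    if window[1][1] != 0:
        return "unchanged"
    # Count the zeros among the 8 neighbours (the centre itself is excluded)
    zeros = sum(v == 0 for row in window for v in row) - 1
    if zeros <= 2:
        return "group 1 candidate"
    if zeros == 3:
        return "group 2 candidate"  # group 4 exclusion not modelled here
    return "unchanged"

isolated = [[1, 1, 1],
            [1, 0, 1],
            [1, 1, 1]]
print(classify_center(isolated))
```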
Filtering block
Finally, [Link]
entity filter_testbench, which acts as the local top level entity and is described in the file filter_system_int2.vhd.
Similarly to the system employed in the binary correction section, all 3 filtering blocks employ an equivalent
system to get the filtering windows, [Link], which needs to be 2 pixels
wide for a 5x5 window, is added at the beginning of the first filtering stage and removed right before writing the
final image to the output RAM after the last filter. It pipelines the 3 filtering steps without any intermediate
buffer, which helps minimize the latency introduced by this block of the system.
The only extra steps involved between filters are the discriminations between filtered and non filtered pixels.
[Link]
more efficient to just compute the whole filtered image and replace parts of the output with unfiltered pixels than
selectively choosing the pixels that have to be filtered. Selective filtering would not save time (the current system can already
output one pixel per clock after the initial delay) and would greatly increase the complexity of the design.
To do so, the source image stream is not just put inside the first filter, but also copied to a FIFO used as a
[Link]
median filter and its write enable output rises, [Link],
the filter output stream is aligned with the binary mask and the unfiltered image. Then, the decision can take
place: [Link], the multiplexer will
choose the filtered bit.
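Once the three streams are aligned, the decision reduces to a per pixel multiplexer. The Python sketch below assumes that a mask value of 1 (homogeneous) selects the filtered pixel and 0 keeps the unfiltered one; the function name is hypothetical.

```python
def discriminate(filtered, unfiltered, mask):
    """Per pixel multiplexer: keep the filtered value only where mask is 1."""
    return [f if m == 1 else u for f, u, m in zip(filtered, unfiltered, mask)]

filtered = [50, 50, 50, 50]
unfiltered = [48, 90, 52, 10]
mask = [1, 0, 1, 0]
print(discriminate(filtered, unfiltered, mask))
```

This mirrors the design choice explained above: every pixel is filtered, and the mask only decides which result is written out.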
[Figure: the CLAHE image goes through Gaussian prefiltering, discriminative median filtering 1 and discriminative median filtering 2 to produce the filtered image. Image buffering 1 and 2 (FIFO) hold the image streams, and mask 1 and mask 2 buffering (FIFO) hold binary masks 1 and 2.]
[Link]
inputs and outputs are zero padded. Bear in mind that the filter_testbench entity has some logic not represented
in this diagram.
But how do the filters operate internally?
Gaussian filter
As said before, the windowing process for the Gaussian filter is the same used in the binary correction
component, but expanded to a 5x5 window. It is described in the filter_system_int2.vhd file, as the entity
smooth_filter.
Using the elements of the window and a pregenerated Gaussian kernel, it computes the output for the
central pixel using a discrete convolution as presented in chapter 2 (equation 2.4), which employs all the pixels in
the 5x5 window. The kernel present in the current version of the description was generated using the fspecial
function in Matlab and a standard deviation σ = 0.5:
0   0.0028   0.0208   0.0028   0   (3.8)
The kernel can be changed in the code to compile a new filter with a different blurring. The values are
multiplied by 100 and rounded to operate with natural numbers. The output is divided by 100 again to get the
gray value between 0 and 255.
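The scaling to natural numbers can be reproduced with a short Python sketch. It assumes that the kernel is a normalized 5x5 Gaussian with σ = 0.5, that each coefficient is multiplied by 100 and rounded to the nearest integer, and that the final division by 100 truncates; the function names are hypothetical and the actual values in the VHDL description may differ.

```python
import math

def integer_kernel(size=5, sigma=0.5, scale=100):
    """Normalized Gaussian kernel with coefficients scaled and rounded."""
    half = size // 2
    raw = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            for x in range(-half, half + 1)]
           for y in range(-half, half + 1)]
    total = sum(map(sum, raw))
    return [[round(scale * v / total) for v in row] for row in raw]

def filter_pixel(window, kernel, scale=100):
    """Integer multiply and accumulate, then divide by the scale factor."""
    acc = sum(w * k for wr, kr in zip(window, kernel) for w, k in zip(wr, kr))
    return acc // scale  # truncating division (assumption)

kernel = integer_kernel()
for row in kernel:
    print(row)
print(filter_pixel([[200] * 5 for _ in range(5)], kernel))
```

Under these assumptions the rounded coefficients add up to 98 rather than 100, so a flat region of value 200 comes out as 196. Whether the original description compensates for this kind of rounding loss is not stated in the text.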
Median filter
The median filter, implemented as the entity median_filter in the file median_filter2.vhd, is the same for both
stages of discriminative filtering. The code structure is almost identical to the one of the Gaussian filter, just
changing the computation method of the output pixel and the alignment of the output signals, as the
computation has some cycles of latency.
The cause of this latency is the sorting process of the values in the WD and WO masks, as well as the final
[Link].
However, the sorting of WD and WO involves eight different integers for each mask, and consequently the
algorithm is not trivial.
Number sorting is, in general, a complex problem that has been subject to a lot of study in order to improve the
[Link], there are very few particular cases where a sorting network that is
[Link]
9 so fortunately, there
is an optimal solution for this implementation. [11][12]
[Figure: the two sorting networks employed.
n = 8 network: cycle 1, 4 comparisons; cycle 2, 4 comparisons; cycle 3, 4 comparisons; cycle 4, 2 comparisons; cycle 5, 3 comparisons; cycle 6, 2 comparisons. Total: 6 clock cycles, 19 comparisons.
n = 3 network: cycle 1, 1 comparison; cycle 2, 1 comparison; cycle 3, 1 comparison. Total: 3 clock cycles, 3 comparisons.]
[Link]=8 and n=[Link], because 8 is not an odd
number, the median in that case is the average of the 2 central values.
Using the n=8 network for WD and WO, and delaying the write enable signal by the same amount of cycles, the
implementation of the first stage of the median filter is complete (see Figure 5). The second stage is easier, as it
involves just 3 numbers. Three cycles, with one comparison per cycle, are enough to sort the values. The
implementation of equation 2.5 is finished.
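The second stage can be modelled as a three input sorting network with one compare and exchange operation per cycle. The Python sketch below shows one possible ordering of the three comparisons; the ordering used in the VHDL description is not given in the text, and the function names are hypothetical.

```python
from itertools import permutations

def compare_exchange(values, i, j):
    """Swap positions i and j if they are out of order (one comparator)."""
    if values[i] > values[j]:
        values[i], values[j] = values[j], values[i]

def median_of_three(a, b, c):
    """Sort 3 values with 3 comparators and return the middle one."""
    v = [a, b, c]
    compare_exchange(v, 0, 1)  # cycle 1
    compare_exchange(v, 1, 2)  # cycle 2: the maximum is now in position 2
    compare_exchange(v, 0, 1)  # cycle 3: the minimum is now in position 0
    return v[1]

# Every ordering of three distinct values gives the same median
print({median_of_three(*p) for p in permutations((10, 20, 30))})
```

The same compare and exchange primitive is the building block of the larger n=8 network, which only differs in the number of comparators and in which of them can run in the same cycle.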
4. Implementation results
This chapter presents the results of the implementation described in chapter 3. To do so, some sample
images will be shown in their original, Matlab processed and hardware processed forms, as well as their resulting
[Link], some key points will be highlighted.
4.1 Test pictures
[Link]
[Link]
of pixels and aspect ratios in order to test the pursued ability of the system to deal with arbitrary sized pictures.
Also, [Link]
in mind that the scale of the Matlab processed images is different because of how it operates with the images,
giving a final result consisting of real values between 0 and 1 instead of an integer value in the 0-255 range.
Picture 1
Figure 21. Original picture 1 and its histogram.
[Link]; at right using the Matlab script.
[Link], using the hardware design; at right using Matlab.
[Link], histogram of the hardware output for picture 1; at right, histogram of the Matlab script's
output obtained with the same image.
[Link]
[Link], [Link],
the standard deviation of the Gaussian prefilter is σ = 0.5, the clip limit is set at 3% and the gray ranges of the pixel
classification are 67-79 and 208-[Link], which can be easily
associated with the lack of pixels in the brighter bins of its histogram, compared to the Matlab results. It is a
deviation already visible on the CLAHE images, prior to the noise removal process, so it can be concluded that the
[Link]
are used to make the synthesis less complex and/or other nonidentified design bugs. On the other hand, the
hardware implementation is as good as the software one at eliminating noise. Running on top of a multicore
desktop processor clocked at more than 1 GHz, [Link]
simulation estimates that 28.5 milliseconds are needed to finish the operations with a 25 MHz clock, which is well
below the maximum 600 MHz clock of the board.
Picture 2
[Plot: input image histogram]
Figure 25. Original picture 2 and its histogram.
[Link]; at right using the Matlab script.
[Link], using the hardware design; at right using Matlab.
[Link], histogram of the hardware output for picture 2; at right, histogram of the Matlab script's
output obtained with the same image.
In the second example, [Link]
[Link]
darker images, [Link]: the equalization does
not make use of the brighter levels of gray. Still, the output of the design has a good contrast enhancement
compared to the original and the differences in filtering are indistinguishable. The processing times are 28.36
[Link]
is similar to the previous image, which has a similar amount of pixels.
Picture 3
[Plot: input image histogram]
Figure 29. Original picture 3 and its histogram.
[Link]; at right using the Matlab script.
[Link],usingthehardwaredesign;atrightusingMatlab.
2000
2000
1500
1500
1000
1000
500
500
0
0
50
100
150
200
250
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
[Link],histogramofthehardwareoutputforpicture3;atright,histogramoftheMatlabscript's
outputobtainedwiththesameimage.
[Link] σ = 0.5 for the Gaussian
prefilter, a clip limit of 8% and the gray ranges included in the classification to generate the masks are 0–20
and 210 [Link]: visually very similar, just a
[Link], [Link], the Matlab script needs 8.85 seconds to
[Link]
in both cases are expected since the amount of pixels to process is notably lower than in the previous pictures.
4.2 Summary
Overall, the results are very similar, [Link]
some minor imperfections that prevent it from being on par with the software results in terms of output quality,
[Link] probably due to some truncations in certain operations
where rounding should be implemented, or other hidden small implementation mistakes in the CLAHE
components, [Link], the selective filtering seems to work very well in all the pictures.
In terms of efficiency, the hardware implementation clearly stands out, with processing times that are several
orders of magnitude shorter and an output that is hard to distinguish from the software one without a
side-by-side comparison and access to the histograms.
5. Conclusions
In this last chapter some conclusions and final thoughts about the work are presented, along with possible future
[Link], important events during development, decisions and the final
outcome of the design are discussed, amongst other experiences and lessons learned.
5.1 Project results
According to the results presented in chapter 4, the results of the original algorithm have
been almost matched, [Link]
and in line with the results seen in the original implementation, with some minor imperfections. Also, the
selective filtering matches the original implementation pixel per pixel, giving a good smoothing effect if calibrated
correctly, but also the expected weaker results if the adjustments make it show up in undesired places.
It is capable of correctly processing (without glitches) images with arbitrary resolutions up to 512x512 pixels
without modifying or recompiling the description and, if necessary, the design can be easily scaled to
[Link]
[Link].
In terms of efficiency and speed, the improvements are also remarkable, in line with what one would expect from
the shift from a high-level preliminary software implementation in Matlab to a specific FPGA implementation.
While the original supplied code needed more than 30 seconds to process a single 512x439 image on top of a
multicore processor clocked at several GHz, the hardware design can potentially process the same picture in less
than 30 milliseconds with a theoretical 25 MHz clock speed while getting near-identical visual results, which is a
very welcome improvement. With a faster clock, which is achievable with the target board, the results can be
[Link], these numbers definitely situate it as a viable tool for
real-time video processing, [Link], the RAM
use has been quite low, [Link]
[Link], as will be
noted later in this chapter.
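As a rough cross-check of the figures quoted above, the following Python sketch (not part of the project code)
derives the clock-cycle budget per pixel, the speed-up and the frame rate. It treats the stated bounds ("more
than 30 seconds", "less than 30 milliseconds") as exact values, which is an assumption made only for this
arithmetic, so the outputs are themselves bounds rather than measurements.

```python
# Back-of-the-envelope check of the timing figures quoted in section 5.1.
# All constants are taken from the text; the bounds are treated as exact values.
WIDTH, HEIGHT = 512, 439      # image size quoted in the text
CLOCK_HZ = 25_000_000         # theoretical 25 MHz clock
HW_TIME_S = 0.030             # "less than 30 milliseconds" (upper bound)
SW_TIME_S = 30.0              # "more than 30 seconds" (lower bound)

pixels = WIDTH * HEIGHT                        # pixels per frame
cycle_budget = round(HW_TIME_S * CLOCK_HZ)     # clock cycles available per frame
cycles_per_pixel = cycle_budget / pixels       # upper bound on cycles spent per pixel
speedup = SW_TIME_S / HW_TIME_S                # lower bound on the speed-up
frames_per_second = 1.0 / HW_TIME_S            # lower bound on the frame rate

print(f"pixels per frame    : {pixels}")
print(f"cycle budget        : {cycle_budget}")
print(f"cycles per pixel    : {cycles_per_pixel:.2f}")
print(f"speed-up (at least) : {speedup:.0f}x")
print(f"frame rate (at least): {frames_per_second:.1f} fps")
```

Under these assumptions the whole pipeline spends only a few clock cycles per pixel on average, and the speed-up
is about three orders of magnitude even at 25 MHz.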
However, due to time constraints, the design could not be tested in the physical target board as it was initially
[Link], [Link]
concepts and revising old ones, fully understanding all the details of an advanced image processing algorithm and
[Link]
those was very valuable and will be very useful in future projects, they also took much more time than expected
(almost two of the four available months of work), [Link], due
to some implementation difficulties found while working on the CLAHE logic (mainly the tile generation and
interpolation steps), this schedule rapidly became too tight to fully realize the initial plan in just four months.
5.2 Future work
Nevertheless, the final outcome of the project is satisfactory overall. There is a fully working simulation and the
code could potentially be synthesized and tried in real hardware after spending just a few weeks tweaking the code for
the problematic divisions (probably with suitable IP cores) [Link]
[Link], to make the implementation suitable for
integration in more complex systems, it needs interfaces to acquire the input picture and deliver the equalized
output.
Another welcome addition would be a straightforward method to change the Gaussian prefilter without
recompiling the component. It would involve making a component able to generate a kernel according to a
certain standard deviation and changing the filtering block to retrieve the kernel generated by the new block.
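To illustrate what such a kernel generator would have to compute, the following Python sketch builds a
normalized 5x5 Gaussian kernel from a standard deviation, in the same spirit as the fspecial('gaussian', [5 5], 0.5)
call used in the Matlab script of annex A. The function names and the 8-bit fixed-point quantization step are
hypothetical and are not taken from the project code; a hardware block would need its own number format.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel whose coefficients sum to 1."""
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    half = size // 2
    # Unnormalized samples of exp(-(x^2 + y^2) / (2 * sigma^2)) on the grid.
    raw = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            for x in range(-half, half + 1)]
           for y in range(-half, half + 1)]
    total = sum(sum(row) for row in raw)
    return [[value / total for value in row] for row in raw]


def quantize_kernel(kernel, frac_bits=8):
    """Hypothetical fixed-point step: scale by 2**frac_bits and round."""
    scale = 1 << frac_bits
    return [[round(value * scale) for value in row] for row in kernel]


kernel = gaussian_kernel(5, 0.5)
fixed = quantize_kernel(kernel)
for row in fixed:
    print(row)
```

With a small standard deviation such as 0.5 almost all of the weight sits on the center coefficient and its four
direct neighbours, so the outer ring of the 5x5 kernel quantizes to zero at 8 fractional bits.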
Additions aside, [Link]
that can be changed to get better performance, decrease the amount of area used or improve the output image.
First, as noted in chapter 4, the output images are a bit darker in the hardware implementation than with the Matlab
code. Analyzing the histograms, it can be easily seen that this is because the histogram equalization does not
relocate pixels in the highest gray levels, the ones closest to white, whereas the Matlab implementation does
distribute them better. This is probably related to certain truncations during the CLAHE step, mainly in the
equalization and interpolation block. Rounding was not initially implemented to simplify both code and logic in
those sections of the system. There might be other factors related to potential differences between the
mathematical algorithms used by Matlab and the hardware design as well.
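The effect of truncating instead of rounding can be illustrated with a small Python sketch. It uses the textbook
histogram-equalization mapping with integer arithmetic only; the tile size and the minimum CDF value are
hypothetical, and the sketch does not reproduce the actual datapath of the design. It only shows that truncation
biases the mapped gray levels downwards by about half a level on average, while rounding (adding half the divisor
before dividing) removes that bias.

```python
def equalize_level(cdf_value, cdf_min, num_pixels, rounding):
    """Map a CDF value to an 8-bit gray level using integer arithmetic only."""
    numerator = (cdf_value - cdf_min) * 255
    denominator = num_pixels - cdf_min
    if rounding:
        numerator += denominator // 2   # add half the divisor: round to nearest
    return numerator // denominator     # integer division truncates


NUM_PIXELS = 3512   # hypothetical number of pixels in one tile
CDF_MIN = 12        # hypothetical lowest non-zero CDF value

cdf_samples = range(CDF_MIN, NUM_PIXELS + 1, 7)
truncated = [equalize_level(c, CDF_MIN, NUM_PIXELS, False) for c in cdf_samples]
rounded = [equalize_level(c, CDF_MIN, NUM_PIXELS, True) for c in cdf_samples]

# Average number of gray levels lost by truncating instead of rounding.
mean_shift = sum(r - t for r, t in zip(rounded, truncated)) / len(truncated)
print(f"mean downward shift caused by truncation: {mean_shift:.3f} gray levels")
```

A single truncation of this kind costs at most one gray level per pixel, so on its own it would not empty the
brightest bins; several truncations chained through the equalization and interpolation steps could accumulate,
which is consistent with, but does not prove, the explanation given above.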
Another point in the CLAHE block that can be improved is the storage and access of the block RAMs that store
the histogram data of the different tiles. With some work, the duplicated tile RAMs could be scrapped and
[Link], the
total RAM usage of the system can be lowered.
Also, in order to improve the latency and reduce even more the amount of used RAM, the memory that stores
the CLAHE-processed image could be replaced by a small FIFO that could act as a buffer and begin the filtering
process when the first CLAHE-processed pixels appear (pipelining). The same is possible with the binary masks, but
[Link], not just the RAM usage would
be lower but also the latency would be considerably reduced and the area corresponding to certain address
counters would be cut too, [Link]
way because the address counters and the RAMs were inherited from the initial test benches for the isolated
components of the implementation.
[Link],
depending on the final application, it might be desirable to modify other parts of the algorithm in order to
prioritize either area or performance in places where they can conflict.
6. Annexes
A. Matlab codes
Algorithm Matlab Implementation (Author: Badrun Nahar)
tic;
input_image= imread('C:\Users\Roger\Documents\UPC\TFG\imatges\microscopic_merge2.jpg');
%input_image= imread('/home/roger/Documents/UPC/TFG/imatges/[Link]');
A1=rgb2gray(input_image);
cliplimit_cla=0.08;
grid_size=[8 8];
figure, imshow(A1),title('input image');
figure, imhist(A1),title('input image histogram');
% CLAHE Enhanced Image
enhanced_A=adapthisteq(A1,'ClipLimit',cliplimit_cla,'NumTiles',grid_size);
figure,imshow(enhanced_A),title('After CLAHE');
enhanced_A1=double(enhanced_A)/255;
filt_gaussian=fspecial('gaussian', [5 5], 0.5);
enhanced_A12=imfilter(enhanced_A1,filt_gaussian,'conv','replicate');
for i=1:m*n
if (A1(i)>=0 && A1(i)<=20)||(A1(i)>=210 && A1(i)<=250)
d(i)=1;
else d(i)=0;
end
end
figure,imshow(d),title('gray level thresholding');
%Region Correction for low-pass2
count1=8*ones(m+4,n+4);
count2=8*ones(m+4,n+4);
d1=zeros(m+4,n+4);
d1(3:m+2,3:n+2)=d(:,:);
d2=d1;
d3=d1;
for i=3:m+2
for j=3:n+2
if d1(i,j)==0
count1(i,j)=count1(i,j)-(d1(i-1,j-1)+d1(i-1,j)+d1(i-1,j+1)+d1(i,j-1)+d1(i,j+1)+d1(i+1,j-1)+d1(i+1,j)+d1(i+1,j+1));
if count1(i,j)<=1
d2(i,j)=1;
%elseif ((d1(i-1,j-1)==d1(i-1,j+1)==d1(i+1,j+1)==d1(i+1,j-1)) && (d1(i-1,j)==d1(i,j+1)==d1(i+1,j)==d1(i,j-1)))&& (d1(i-1,j-1)~=d1(i-1,j))
% d2(i,j)=1;
end
end
end
end
%figure,imshow(d2(3:m+2,3:n+2)),title('Group-1 & Group-3 corrected for low-pass2');
for i=3:m+2
for j=3:n+2
if d1(i,j)==0
count2(i,j)=count2(i,j)-(d1(i-1,j-1)+d1(i-1,j)+d1(i-1,j+1)+d1(i,j-1)+d1(i,j+1)+d1(i+1,j-1)+d1(i+1,j)+d1(i+1,j+1));
if count2(i,j)<=3
%if ((d1(i+1,j)==d1(i+1,j+1)==d1(i,j+1)==0)||(d1(i,j+1)==d1(i-1,j+1)==d1(i-1,j)==0)...
% ||(d1(i,j-1)==d1(i-1,j-1)==d1(i-1,j)==0)||(d1(i,j-1)==d1(i+1,j-1)==d1(i+1,j)==0))
%d3(i,j)=0;
%else d3(i,j)=1;
%end
%elseif count2(i,j)<=2
d3(i,j)=1;
elseif ((d1(i-1,j-1)==d1(i-1,j+1)==d1(i+1,j+1)==d1(i+1,j-1)) && (d1(i-1,j)==d1(i,j+1)==d1(i+1,j)==d1(i,j-1)))&& (d1(i-1,j-1)~=d1(i-1,j))
d3(i,j)=1;
end
end
end
end
%figure,imshow(d3(3:m+2,3:n+2)),title('Group-1,Group-3 & Group-2 with some preservation corrected: R. C. for low-pass1');
% Discriminative filtering
filtered_A123=zeros(m+4,n+4);
filtered_A1234=zeros(m+4,n+4);
%low-pass 2, binary mask d2(3:m+2,3:n+2)
for i=3:m+2
for j=3:n+2
if d2(i,j)==1
hor_ver_data1= [filtered_A123(i,j-2) filtered_A123(i,j-1) filtered_A123(i,j+1) ...
filtered_A123(i,j+2) filtered_A123(i-2,j) filtered_A123(i-1,j) filtered_A123(i+1,j) ...
filtered_A123(i+2,j) ]; Mr1=median(hor_ver_data1);
diag_data1 = [filtered_A123(i-2, j-2) filtered_A123(i-1, j-1) ...
filtered_A123(i+1,j+1) filtered_A123(i+2, j+2) filtered_A123(i+1,j-1) filtered_A123(i+2, j-2) ...
filtered_A123(i-1,j+1) filtered_A123(i-2, j+2) ]; Md1=median(diag_data1);
vect2=[Mr1 Md1 filtered_A123(i,j)];
filtered_A1234(i,j)= median(vect2);
else filtered_A1234(i,j)= filtered_A123(i,j);
end
end
end
filtered_A12345=zeros(m+4,n+4);
%low-pass 3, binary mask d2(3:m+2,3:n+2)
for i=3:m+2
for j=3:n+2
if d2(i,j)==1
hor_ver_data2= [filtered_A1234(i,j-2) filtered_A1234(i,j-1) filtered_A1234(i,j+1) ...
filtered_A1234(i,j+2) filtered_A1234(i-2,j) filtered_A1234(i-1,j) filtered_A1234(i+1,j) ...
filtered_A1234(i+2,j)]; Mr2=median(hor_ver_data2);
diag_data2 = [filtered_A1234(i-2, j-2) filtered_A1234(i-1, j-1) ...
filtered_A1234(i+1,j+1) filtered_A1234(i+2,j+2) filtered_A1234(i+1,j-1) filtered_A1234(i+2,j-2) ...
filtered_A1234(i-1,j+1) filtered_A1234(i-2,j+2)]; Md2=median(diag_data2);
vect3=[Mr2 Md2 filtered_A1234(i,j)];
filtered_A12345(i,j)= median(vect3);
else filtered_A12345(i,j)= filtered_A1234(i,j);
end
end
end
figure,imshow(d3(3:m+2,3:n+2)),title('region corrected mask for low-pass1');
figure,imshow(d2(3:m+2,3:n+2)),title('region corrected mask for low-pass2');
%figure,imshow(d3-d2);
figure,imshow(filtered_A123(3:m+2,3:n+2)); title('output of 1st stage of discriminative filtering by 5x5 BMM');
figure,imshow(filtered_A1234(3:m+2,3:n+2)); title('output of 2nd stage of discriminative filtering by 5x5 BMM')
toc;
%figure,imshow(filtered_A12345(3:m+2,3:n+2));%title('output of 3rd stage by 5X5 mult med');
%toc;
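For readers who prefer a compact view of the masked filtering stage in the listing above, the following Python
sketch reproduces the same per-pixel rule: where the mask is 1, the output is the median of three values (the
median of the eight horizontal and vertical neighbours, the median of the eight diagonal neighbours, and the pixel
itself); elsewhere the pixel is copied. It is an illustrative rewrite, not the author's code. In particular, it
skips the two-pixel border instead of zero-padding the image as the Matlab script does, and the function name is
hypothetical.

```python
from statistics import median


def multistage_median_5x5(img, mask):
    """Apply the masked 5x5 multistage median rule to a 2-D list of gray values."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]            # unmasked pixels are copied unchanged
    for i in range(2, rows - 2):             # border pixels are skipped (assumption)
        for j in range(2, cols - 2):
            if mask[i][j] != 1:
                continue
            # Eight neighbours along the row and the column of the pixel.
            hor_ver = [img[i][j - 2], img[i][j - 1], img[i][j + 1], img[i][j + 2],
                       img[i - 2][j], img[i - 1][j], img[i + 1][j], img[i + 2][j]]
            # Eight neighbours along the two diagonals.
            diag = [img[i - 2][j - 2], img[i - 1][j - 1], img[i + 1][j + 1], img[i + 2][j + 2],
                    img[i + 1][j - 1], img[i + 2][j - 2], img[i - 1][j + 1], img[i - 2][j + 2]]
            out[i][j] = median([median(hor_ver), median(diag), img[i][j]])
    return out


# Flat 5x5 patch with a single bright outlier in the center.
patch = [[0.5] * 5 for _ in range(5)]
patch[2][2] = 1.0
mask_on = [[1] * 5 for _ in range(5)]
mask_off = [[0] * 5 for _ in range(5)]

filtered = multistage_median_5x5(patch, mask_on)
untouched = multistage_median_5x5(patch, mask_off)
print(filtered[2][2], untouched[2][2])
```

The outlier is removed only where the mask selects the pixel, which is the discriminative behaviour the binary
masks are meant to provide.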
Script to convert bitmaps to .coe format (suitable for recording into ROM with Xilinx Coregen)
Adapted from [14].
%A1=rgb2gray(input_image);
Script to read and show image from RAM dump (.mem Modelsim file)
clear
clc
A=fopen('[Link]');
B =fgetl(A);
B =fgetl(A);
B =fgetl(A);
width=415;
height=265;
%width=512;
%height=512;
empty=512*512-width*height;
rubbish = 0;
for i=1:(empty)
rubbish = rubbish + fscanf(A, '%u\n', 1);
end
for i=1:height
i2=height+1-i;
for j=1:width
j2=width+1-j;
C(i2,j2)=uint8(fscanf(A, '%u\n', 1));
end
end
figure;
imshow(C);
fclose(A)
B. Project VHDL code
Binary_correction_int.vhd
------------------------------------------------------------------------
-- Original smooth_filter: Nria Ordua
-- Modified by: Roger Oliv
-- Concordia University
-- 2012-2013
------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
--use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.NUMERIC_STD.ALL;
entity binary_correction_less is
port ( clk, clearn : in std_logic;
su_flag : in std_logic;
binary_in : in std_logic_vector(0 downto 0);
binary_out1 : out std_logic_vector(0 downto 0);
binary_out2 : out std_logic_vector(0 downto 0);
set_up_flag : out std_logic;
im_width : in unsigned(10 downto 0);
numpixels : in unsigned(18 downto 0)
);
end binary_correction_less;
component binary_fifo
port (
clk: IN std_logic;
rst: IN std_logic;
din: IN std_logic_VECTOR(0 downto 0);
wr_en: IN std_logic;
rd_en: IN std_logic;
dout: OUT std_logic_VECTOR(0 downto 0);
full: OUT std_logic;
empty: OUT std_logic;
data_count: OUT std_logic_VECTOR(8 downto 0));
end component;
------------------------------------------------------------------------
-- Signal Declarations
------------------------------------------------------------------------
row1 : data_win;
row2 : data_win;
row3 : data_win;
row4 : data_win;
row5 : data_win;
sbinary_in, sbinary_out1, sbinary_out2
data_in1 : unsigned (0 downto 0);
data_out1 : std_logic_vector (0 downto 0);
data_in2 : unsigned (0 downto 0);
data_out2 : std_logic_vector (0 downto 0);
data_in3 : unsigned (0 downto 0);
data_out3 : std_logic_vector (0 downto 0);
data_in4 : unsigned (0 downto 0);
data_out4 : std_logic_vector (0 downto 0);
begin
fifo1 : binary_fifo
port map (
clk => clk,
rst => rst,
din => std_logic_vector(data_in1),
wr_en => wr_en,
rd_en => rd_en1,
dout => data_out1,
full => open,
empty => open,
data_count => data_count1);
fifo2 : binary_fifo
port map (
clk => clk,
rst => rst,
din => std_logic_vector(data_in2),
wr_en => wr_en,
rd_en => rd_en2,
dout => data_out2,
full => open,
empty => open,
data_count => data_count2);
fifo3 : binary_fifo
port map (
clk => clk,
rst => rst,
din => std_logic_vector(data_in3),
wr_en => wr_en,
rd_en => rd_en3,
dout => data_out3,
full => open,
empty => open,
data_count => data_count3);
fifo4 : binary_fifo
port map (
clk => clk,
rst => rst,
din => std_logic_vector(data_in4),
wr_en => wr_en,
rd_en => rd_en4,
dout => data_out4,
full => open,
empty => open,
data_count => data_count4);
process (clk)
begin
if (clk'event and clk = '1') then
if (clearn = '0') then
t_setup <= (others => '0');
else
if su_flag = '1' and t_setup < (numpixels+im_width*2+3) then
t_setup <= t_setup + 1;
end if;
end if;
end if;
end process;
process (clk)
variable sync_cnt : integer range 0 to 6 := 0;
begin
if (clk'event and clk = '1') then --initialization
not((row2(1)=0
and row3(1)=0
and row4(1)=1
and row2(2)=0
and row2(3)=1
and row3(3)=1
and row4(2)=1
and row4(3)=1)
or (row2(1)=1
and row3(1)=1
and row4(1)=1
and row2(2)=0
and row2(3)=0
and row3(3)=0
and row4(2)=1
and row4(3)=1)
or (row2(1)=1
and row3(1)=1
and row4(1)=1
and row2(2)=1
and row2(3)=1
and row3(3)=0
and row4(2)=0
and row4(3)=0)
or (row2(1)=1
and row3(1)=0
and row4(1)=0
and row2(2)=1
and row2(3)=1
and row3(3)=1
and row4(2)=0
and row4(3)=1)))
or (row2(1)=0
and row3(1)=1
and row4(1)=0
and row2(2)=1
and row2(3)=0
and row3(3)=1
and row4(2)=1
and row4(3)=0)
or (row2(1)=1
and row3(1)=0
and row4(1)=1
and row2(2)=0
and row2(3)=1
and row3(3)=0
and row4(2)=0
and row4(3)=1))) then
then
6 then
sync_cnt + 1;
<= '0'; --Finish
end if;
end if;
end if;
end process;
wr_en <= su_flag;
fifo_size <= im_width - 8; -- Added a -1 initially not forecasted
rst <= not(clearn);
sbinary_in <= unsigned(binary_in);
binary_out1 <= std_logic_vector(sbinary_out1);
binary_out2 <= std_logic_vector(sbinary_out2);
end behavior;
binary_generator_int2.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity mask_generator is
port(
--global clock signal, active with its rising edge
clk: in std_logic;
--reset signal, synchronous and active high
reset: in std_logic;
--Trigger to start the generation
pulse_start_input: in std_logic;
limit1_t : in unsigned(7 downto 0);
limit1_b : in unsigned(7 downto 0);
limit2_t : in unsigned(7 downto 0);
limit2_b : in unsigned(7 downto 0);
end_flag : out std_logic;
rom_addrb : out std_logic_vector(17 downto 0);
rom_doutb : in std_logic_vector(7 downto 0);
im_width : in unsigned(9 downto 0);
numpixels : in unsigned(18 downto 0);
binary_wea : out std_logic_vector(0 downto 0);
binary_addra : out std_logic_vector(17 downto 0);
binary1_dina : out std_logic_vector(0 downto 0);
binary2_dina : out std_logic_vector(0 downto 0)
);
end mask_generator;
architecture bench of mask_generator is
signal device_data, data_in, filter_out1, filter_out2 : std_logic_vector(0 downto 0); --current pixel value
signal ram_wr_addr, ram_wr_addr2 : unsigned(17 downto 0); --address to be accessed in the RAM containing the histogram
signal pulse_out, pulse_out2, pulse_out3, end_flag_signal: std_logic;
signal new_width, width_counter, width_counter2 : unsigned(10 downto 0);
signal
signal
signal
signal
signal
component binary_correction_less is
port ( clk, clearn : in std_logic;
su_flag : in std_logic;
binary_in : in std_logic_vector(0 downto 0);
binary_out1 : out std_logic_vector(0 downto 0);
binary_out2 : out std_logic_vector(0 downto 0);
set_up_flag : out std_logic;
im_width : in unsigned(10 downto 0);
numpixels : in unsigned(18 downto 0)
);
end component;
begin
corrector_l : binary_correction_less
port map(
clk => clk,
clearn => nrst,
end bench;
clahe_complete4.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity clahe is
port (
clk : in std_logic;
rst : in std_logic;
start_cntr : in std_logic;
end_flag : out std_logic; --marks the end of the histogram calculation
numpixels : in unsigned(18 downto 0);
x_size : in unsigned(9 downto 0);
y_size : in unsigned(9 downto 0);
clip_limit : in unsigned(6 downto 0);
rom_addra : out std_logic_vector(17 downto 0);
rom_douta : in std_logic_vector(7 downto 0);
ram_wea : out std_logic_vector(0 downto 0);
ram_addra : out std_logic_vector(17 downto 0);
ram_dina : out std_logic_vector(7 downto 0)
);
end clahe;
architecture wrapper of clahe is
component clipping_wrapper
port (
clk : in std_logic;
rst : in std_logic;
start_cntr : in std_logic; --triggers the beginning of the operation
--dataout : out std_logic_vector(17 downto 0); -- RAM data in
end_flag : out std_logic; --marks the end of the histogram clipping
numpixels : in unsigned(18 downto 0); --Total number of pixels in the image
clip_limit : in unsigned(6 downto 0); --Tolerated bin limit
histo_wea : out std_logic_vector(0 downto 0);
histo_addra : out std_logic_vector(7 downto 0);
histo_dina : out std_logic_vector(18 downto 0);
histo_addrb : out std_logic_vector(7 downto 0);
histo_doutb : in std_logic_vector(18 downto 0);
cdf_min : out unsigned(18 downto 0)
);
end component;
component histogram_wrapper
port(
--global clock signal, active with its rising edge
clk: in std_logic;
--reset signal, synchronous and active high
reset: in std_logic;
--number of pixels of the image
cntr_value: in std_logic_vector(18 downto 0);
--Trigger to start the histogram generation
pulse_start_input: in std_logic;
--Output of the histogram data
--histogram_out: out std_logic_vector(17 downto 0);
im_douta : in std_logic_vector(7 downto 0);
histo_dina : out std_logic_vector(18 downto 0);
histo_addra : out std_logic_vector(7 downto 0);
type
type
type
type
begin
--Connections between all the memory blocks and the computation block
histo_wrapper : histogram_wrapper
port map(
--global clock signal, active with its rising edge
clk => clk,
--reset signal, synchronous and active high
reset => rst,
--number of pixels of the image
cntr_value => std_logic_vector(numpixels_tile_pre),
--Trigger to start the histogram generation
pulse_start_input => histo_start_cntr,--tiler_wren... IMPORTANT! TODO!
--Output of the histogram data
im_douta => tiler_dataout_a,
histo_dina => histo_dina1,
histo_addra => histo_addra1,
histo_wea => histo_wea1,
histo_addrb => histo_addrb1,
histo_doutb => histo_doutb1,
end_flag => start_clipping_pre2
);
histo_clipper : clipping_wrapper
port map (
clk => clk,
rst => rstclip,
start_cntr => start_clipping,
end_flag => end_flag_clipper, --marks the end of the histogram clipping
numpixels => numpixels_tile_pre,
clip_limit => clip_limit, --Tolerated bin limit
histo_wea => histo_wea2,
histo_addra => histo_addra2,
histo_dina => histo_dina2,
histo_addrb => histo_addrb2,
histo_doutb => histo_doutb2,
cdf_min => cdfmin
);
tiler : image_tiling
port map (
romraddr => tiler_ramraddr,
datain => tiler_datain,
clk => clk,
ramwraddr => open,
dataout => tiler_dataout_a,
numx => numx,
numy => numy,
rst => rst,
start_cntr => tiler_start_cntr,
wren => tiler_wren,
end_flag => tiler_end_flag,
numpixels => numpixels,
tile_numpixels => numpixels_tile_pre,
x_size => x_size,
y_size => y_size,
xtile => x_size_sub,
ytile => y_size_sub
);
equalizer : histogram_equalizer
port map (
histo_raddr => histo_raddr_trans, -- device data as address for RAM
histo_in_ul => histo_in_ul_trans, -- histogram CDF value
histo_in_ur => histo_in_ur_trans, -- histogram CDF value
histo_in_ll => histo_in_ll_trans, -- histogram CDF value
histo_in_lr => histo_in_lr_trans, -- histogram CDF value
rom_raddr => rom_raddr_trans, --image pixel address
rom_in => rom_in_trans,--image pixel value
clk => clk,
clhe_wraddr => clhe_wraddr_trans, --Address for the transformed pixel to write
rst => rst_trans,
start_cntr => start_cntr_trans, --Triggers the transformation operations
wren => wren_trans(0),
clhe_out => clhe_out_trans, -- Output for transformed pixel value
end_flag => end_flag_trans, --marks the end of the histogram calculation
numpixels => numpixels,
im_width => x_size,
im_height => y_size,
x_size => x_size_sub,--Subimage
y_size => y_size_sub,--Subimage
ul_id => ul_id_trans,
ur_id => ur_id_trans,
ll_id => ll_id_trans,
lr_id => lr_id_trans,
numpixels_ul => numpixels_ul_trans, --number of pixels of the image
histo_min_ul => histo_min_ul_trans, --Lowest CDF value of the histogram
numpixels_ur => numpixels_ur_trans, --number of pixels of the image
histo_min_ur => histo_min_ur_trans, --Lowest CDF value of the histogram
numpixels_ll => numpixels_ll_trans, --number of pixels of the image
histo_min_ll => histo_min_ll_trans, --Lowest CDF value of the histogram
numpixels_lr => numpixels_lr_trans, --number of pixels of the image
histo_min_lr => histo_min_lr_trans); --Lowest CDF value of the histogram
process(clk)
begin
if (CLK'EVENT AND CLK = '1') then
if rst='1' then
start_clipping_pre <= '0';
start_clipping <= '0';
else
start_clipping_pre <= start_clipping_pre2;
start_clipping <= not(start_clipping_pre) and start_clipping_pre2;
end if;
end if;
end process;
process(rst, start_clipping_pre, histo_dina1, histo_dina2, histo_addra1, histo_addra2,
histo_wea1, histo_wea2, histo_addrb1, histo_addrb2, histo_doutb, cdfmin, numpixels_tile,
<=
<=
<=
<=
(others=>'0');
(others=>'0');
(others=>'0');
(others=>'0');
histo_min_ul_trans <= (others=>'0');
histo_min_ur_trans <= (others=>'0');
histo_min_ll_trans <= (others=>'0');
histo_min_lr_trans <= (others=>'0');
histo_in_ul_trans <= (others=>'0');
histo_in_ur_trans <= (others=>'0');
histo_in_ll_trans <= (others=>'0');
histo_in_lr_trans <= (others=>'0');
for p in 1 to 8 loop
for q in 1 to 8 loop
if (p*10+q)=numtile_int then
if start_clipping_pre = '1' then
histo_dina(p*10+q) <= histo_dina2;
histo_addra(p*10+q) <= histo_addra2;
histo_wea(p*10+q) <= histo_wea2;
histo_addrb(p*10+q) <= histo_addrb2;
tile_numpixels(p*10+q) <= tile_numpixels(p*10+q);
cdf_min(p*10+q) <= cdfmin;
else
histo_dina(p*10+q) <= histo_dina1;
histo_addra(p*10+q) <= histo_addra1;
histo_wea(p*10+q) <= histo_wea1;
histo_addrb(p*10+q) <= histo_addrb1;
tile_numpixels(p*10+q) <= numpixels_tile;
cdf_min(p*10+q) <= cdfmin;
end if;
else
histo_dina(p*10+q) <= (others=>'0');
histo_addra(p*10+q) <= (others=>'0');
histo_wea(p*10+q) <= (others=>'0');
histo_addrb(p*10+q) <= (others=>'0');
tile_numpixels(p*10+q) <= tile_numpixels(p*10+q);
cdf_min(p*10+q) <= cdf_min(p*10+q);
end if;
end loop;
end loop;
if start_clipping_pre = '1' then
histo_doutb1 <= (others=>'0');
histo_doutb2 <= histo_doutb(numtile_int);
else
histo_doutb1 <= histo_doutb(numtile_int);
histo_doutb2 <= (others=>'0');
end if;
numpixels_ul_trans <= (others=>'0');
numpixels_ur_trans <= (others=>'0');
numpixels_ll_trans <= (others=>'0');
numpixels_lr_trans <= (others=>'0');
histo_min_ul_trans <= (others=>'0');
histo_min_ur_trans <= (others=>'0');
histo_min_ll_trans <= (others=>'0');
histo_min_lr_trans <= (others=>'0');
histo_in_ul_trans <= (others=>'0');
histo_in_ur_trans <= (others=>'0');
histo_in_ll_trans <= (others=>'0');
histo_in_lr_trans <= (others=>'0');
for p in 0 to 9 loop
for q in 0 to 9 loop
end loop;
end loop;
numpixels_ul_trans <= tile_numpixels(to_integer(ul_id_trans));
numpixels_ur_trans <= tile_numpixels(to_integer(ur_id_trans));
numpixels_ll_trans <= tile_numpixels(to_integer(ll_id_trans));
numpixels_lr_trans <= tile_numpixels(to_integer(lr_id_trans));
histo_min_ul_trans <= cdf_min(to_integer(ul_id_trans));
histo_min_ur_trans <= cdf_min(to_integer(ur_id_trans));
histo_min_ll_trans <= cdf_min(to_integer(ll_id_trans));
histo_min_lr_trans <= cdf_min(to_integer(lr_id_trans));
histo_in_ul_trans <= histo_doutb(to_integer(ul_id_trans));
histo_in_ur_trans <= histo_doutb(to_integer(ur_id_trans));
histo_in_ll_trans <= histo_doutb(to_integer(ll_id_trans));
histo_in_lr_trans <= histo_doutb(to_integer(lr_id_trans));
if rst='1' then
numtile_int <= 11;
elsif histo_start_cntr = '1' then
numtile_int <= to_integer(numx)+11+to_integer(numy)*10;
else
numtile_int <= numtile_int;
end if;
if rst='1' then
numpixels_tile <= (others=>'0');
transform <= '0';
start_cntr_trans <= '0';
else
numpixels_tile <= numpixels_tile_pre +1;
transform <= start_cntr_trans or transform;
start_cntr_trans <= (end_flag_clipper and tiler_end_flag);
end if;
end if;
end process;
clhe_clipping_int4.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity histogram_clipper is
port ( ramraddr : out std_logic_vector(7 downto 0) ; -- device data as address for RAM
datain : in std_logic_vector (18 downto 0); -- RAM data out
clk : in std_logic;
ramwraddr : out std_logic_vector( 7 downto 0); -- written histogram bin as address for RAM
rst : in std_logic;
start_cntr : in std_logic;--triggers the beginning of the operation
wren : out std_logic;--write enable output for the ram
dataout : out std_logic_vector(18 downto 0); -- RAM data in
end_flag : out std_logic; --marks the end of the histogram calculation
numpixels : in unsigned(18 downto 0); --number of pixels of the histogram
clip_limit : in unsigned(6 downto 0); -- Clip limit in %
cdf_min : out unsigned(18 downto 0)
);
end histogram_clipper;
architecture clipper of histogram_clipper is
--Signal list
signal start_add, pre_start_add, pre_wren, pre_wren2 : std_logic;
signal
signal
signal
signal
signal
begin
--Process to read all the histogram, calculate the amount of absolute clipping and
--generate the CDF for the transformation.
-- This is done in 2 sweeps:
--The first sweep calculates the amount of pixels that exceed the clip limit and clips
--them from the corresponding bins.
--The second sweep adds to each bin the increase calculated from those counted excess
--pixels and substitutes the bin value
--with the corresponding CDF value.
process(clk)
begin
if (CLK'EVENT AND CLK = '1') then
if rst = '1' then
start_add <='0';
pre_start_add <= '0';
else
start_add <= start2;
pre_start_add <= start_add; --Delay the beginning of the operation start signal until
--it is aligned with input data
end if;
----------------------------beginning of reading block, shared amongst the 2 sweeps----------------------------
if rst = '1' or start2='0' then --initialization of variables
ramraddru <= to_unsigned(0, 8);
ramwraddru <= to_unsigned(0, 8);
ramwraddru2 <= to_unsigned(0, 8);
pre_wren <= '0';
pre_wren2 <= '0';
wren1 <= '0';
incr_trigger <= '0';
end_flag_signal <= '0';
elsif ramraddru = 255 then --Don't let it keep writing if the whole memory has been swept.
ramraddru <= to_unsigned(255, 8);
ramwraddru2 <= ramraddru;
ramwraddru <= ramwraddru2;--align the write addresses and write enable with the output data
pre_wren2 <= '0';
pre_wren <= pre_wren2;
wren1 <= pre_wren;
incr_trigger <= operation; --will turn to 1 only the first time we do the memory sweep
end_flag_signal <= not(operation);
else
ramraddru <= ramraddru + 1; --Sweep the addresses
ramwraddru2 <= ramraddru;--align the write addresses and write enable with the output data
ramwraddru <= ramwraddru2;
pre_wren2 <= '1';
pre_wren <= '1';
wren1 <= pre_wren;
incr_trigger <= '0';
end_flag_signal <= '0';
end if;
----------------------------end of reading block, beginning of writing block----------------------------
if (start_add='0' and operation='1') or rst='1' then --Clipping block: excess variable counter management
excess <= to_unsigned(0, 19);
elsif operation = '1' then
if difference >= to_signed(1,19) and pre_wren = '1' then --wren condition here is to avoid increases in
excess <= excess + resize(unsigned(difference), 18); --the excess variable after finishing the sweep.
else
excess <= excess;
end if;
else
excess<=excess;
end if;
if start_add='0' or rst='1' then --Clipping block: data output management (clipped output or CDF value, according to
data_out <= (others=>'0');--the sweep number.
elsif operation = '1' then
if difference >= to_signed(1,19) and pre_wren = '1' then --wren condition here is to avoid increases in
data_out <= abs_limit; --the excess variable after finishing the sweep.
else
data_out <= unsigned(datain);
end if;
else
data_out <= (data_out+resize((unsigned(datain) + binIncr),19));
end if;
if rst='1' then --Management of the start signal, used to trigger the shared logic of the 2 sweeps
start <= '0';
elsif (start_cntr = '1' or incr_trigger = '1') then
start <= '1';
elsif (ramraddru = 255) then
start <= '0';
else
start<=start;
end if;
if rst='1' then
(others=>'0');
and ramwraddru = 0 and wren1='1' then
unsigned(data_out);
cdf_min_signal;
end if;
end process;
wren <= wren1;
ramraddr <= std_logic_vector(ramraddru);
ramwraddr <= std_logic_vector(ramwraddru);
---debug---
op1 <=(numpixels*clip_limit);
op2 <=op1/to_unsigned(100, 7);
abs_limit <= resize(op2, 19);
---end debug---
difference <= signed('0' & datain) - signed('0'&abs_limit); --Computes the difference between bin value and clipping limit
binIncr <= resize(excess, 27)/256;
dataout <= std_logic_vector(data_out);
cdf_min <= cdf_min_signal;
end clipper;
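The two sweeps described in the comments of this listing can be summarized with a short Python model. It is a
behavioural sketch written for this annex, not a translation of the VHDL: it ignores timing, address alignment
and the write-enable pipeline, and the function name and the example histogram are hypothetical. The absolute
limit and the per-bin increase follow the expressions visible in the listing (numpixels*clip_limit/100 and
excess/256, both with truncating integer division).

```python
def clip_and_accumulate(hist, num_pixels, clip_percent):
    """Clip a histogram at a percentage of num_pixels and return (cdf, lost_pixels)."""
    abs_limit = (num_pixels * clip_percent) // 100   # absolute clip limit, truncated
    # First sweep: clip every bin at abs_limit and count the clipped pixels.
    excess = sum(max(0, count - abs_limit) for count in hist)
    clipped = [min(count, abs_limit) for count in hist]
    bin_incr = excess // len(hist)                   # equal share per bin, truncated
    # Second sweep: add the share to each bin and replace it by the running CDF.
    cdf, running = [], 0
    for count in clipped:
        running += count + bin_incr
        cdf.append(running)
    lost_pixels = num_pixels - cdf[-1]               # remainder never redistributed
    return cdf, lost_pixels


# Hypothetical 256-bin tile histogram with two dominant gray levels.
NUM_PIXELS = 3512
hist = [0] * 256
hist[50] = 1512
hist[100] = 2000

cdf, lost_pixels = clip_and_accumulate(hist, NUM_PIXELS, 8)
print(f"last CDF value: {cdf[-1]} of {NUM_PIXELS} pixels, lost: {lost_pixels}")
```

In this model the remainder of the truncating division is never redistributed, so the last CDF value can fall
short of the pixel count by up to 255. Whether this contributes to the unused bright gray levels reported in
chapter 4 is not established here; it is only one place where rounding or remainder handling could be examined.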
clipping_wrapper_int2.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity clipping_wrapper is
port (
clk : in std_logic;
rst : in std_logic;
start_cntr : in std_logic; --triggers the beginning of the operation
--dataout : out std_logic_vector(17 downto 0); -- RAM data in
end_flag : out std_logic; --marks the end of the histogram clipping
numpixels : in unsigned(18 downto 0); --Total number of pixels in the image
clip_limit : in unsigned(6 downto 0); --Tolerated bin limit
histo_wea : out std_logic_vector(0 downto 0);
histo_addra : out std_logic_vector(7 downto 0);
histo_dina : out std_logic_vector(18 downto 0);
histo_addrb : out std_logic_vector(7 downto 0);
histo_doutb : in std_logic_vector(18 downto 0);
cdf_min : out unsigned(18 downto 0)
);
end clipping_wrapper;
architecture wrapper of clipping_wrapper is
component histogram_clipper
port ( ramraddr : out std_logic_vector(7 downto 0) ; -- accessed histogram bin as address for RAM
datain : in std_logic_vector (18 downto 0); -- RAM data out
clk : in std_logic;
ramwraddr : out std_logic_vector( 7 downto 0); -- written histogram bin as address for RAM
rst : in std_logic;
start_cntr : in std_logic;--triggers the beginning of the operation
wren : out std_logic;--write enable output for the ram
dataout : out std_logic_vector(18 downto 0); -- RAM data in
end_flag : out std_logic; --marks the end of the histogram calculation
numpixels : in unsigned(18 downto 0); --number of pixels of the histogram
clip_limit : in unsigned(6 downto 0); -- Clip limit in %
cdf_min : out unsigned(18 downto 0)
);
end component;
signal ramraddr, ramwraddr : std_logic_vector(7 downto 0);
signal wren : std_logic_vector(0 downto 0);
signal datain, dataout_a : std_logic_vector (18 downto 0);
begin
--Process to read all the histogram and calculate the amount of absolute clipping
histo_clipper : histogram_clipper
port map (
ramraddr => ramraddr,
datain => datain,
clk => clk,
ramwraddr => ramwraddr,
rst => rst,
start_cntr => start_cntr,
wren => wren(0),
dataout => dataout_a,
end_flag => end_flag,
numpixels => numpixels,
clip_limit => clip_limit,
cdf_min => cdf_min);
filter_system_int2.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity filter_testbench is
port(
--global clock signal, active with its rising edge
clk: in std_logic;
--reset signal, synchronous and active high
reset: in std_logic;
--Trigger to start the generation
pulse_start_input: in std_logic;
--Output of the data
dataout : out std_logic_vector(7 downto 0);
memout : out std_logic_vector(7 downto 0);
end_flag : out std_logic;
im_width : in unsigned(10 downto 0);
numpixels : in unsigned(18 downto 0);
b1_doutb : in std_logic_vector(0 downto 0);
b2_doutb : in std_logic_vector(0 downto 0);
input_ram_addr : out std_logic_vector(17 downto 0);
clahe_ram_doutb : in std_logic_vector(7 downto 0)
);
end filter_testbench;
architecture bench of filter_testbench is
signal device_data, device_data_2, filter_out_s, filter_out_sf, filter_out_sf1,
filter_out_sf2, filter_out_sf3, filter_out_sf4, filter_out_sf5, filter_out_sf6,
filter_out_sf7, filter_out_sf8, filter_out_sf9, filter_out_1, filter_out_1_2, filter_out_1f,
filter_out_2 : std_logic_vector(7 downto 0); --current pixel value
signal ram_wr_addr, ram_wr_addr2 : unsigned(17 downto 0); --address to be accessed in the RAM
--containing the histogram
signal pulse_out, pulse_out2, pulse_out3, end_flag_signal: std_logic;
signal new_width, width_counter, width_counter2 : unsigned(10 downto 0);
--[...] (further signal declarations are missing from the source listing)
component smooth_filter is
port ( clk, clearn : in std_logic;
su_flag : in std_logic;
smooth_filter_in : in std_logic_vector(7 downto 0);
smooth_filter_out : out std_logic_vector (7 downto 0);
set_up_flag : out std_logic;
im_width : in unsigned(10 downto 0);
numpixels : in unsigned(18 downto 0)
);
end component;
component clhe_ram2
port (
clka: IN std_logic;
wea: IN std_logic_VECTOR(0 downto 0);
addra: IN std_logic_VECTOR(17 downto 0);
dina: IN std_logic_VECTOR(7 downto 0);
douta: OUT std_logic_VECTOR(7 downto 0));
end component;
component median_filter is
port ( clk, clearn : in std_logic;
su_flag : in std_logic;
smooth_filter_in : in std_logic_vector(7 downto 0);
smooth_filter_out : out std_logic_vector (7 downto 0);
set_up_flag : out std_logic;
im_width : in unsigned(10 downto 0);
numpixels : in unsigned(18 downto 0)
);
end component;
component wait_fifo
port (
clk: IN std_logic;
rst: IN std_logic;
din: IN std_logic_VECTOR(7 downto 0);
wr_en: IN std_logic;
rd_en: IN std_logic;
dout: OUT std_logic_VECTOR(7 downto 0);
full: OUT std_logic;
empty: OUT std_logic;
data_count: OUT std_logic_VECTOR(11 downto 0));
end component;
component binary1_wait_fifo
port (
clk: IN std_logic;
rst: IN std_logic;
din: IN std_logic_VECTOR(0 downto 0);
wr_en: IN std_logic;
rd_en: IN std_logic;
dout: OUT std_logic_VECTOR(0 downto 0);
full: OUT std_logic;
empty: OUT std_logic;
data_count: OUT std_logic_VECTOR(11 downto 0));
end component;
component binary2_wait_fifo
port (
clk: IN std_logic;
rst: IN std_logic;
din: IN std_logic_VECTOR(0 downto 0);
wr_en: IN std_logic;
rd_en: IN std_logic;
dout: OUT std_logic_VECTOR(0 downto 0);
full: OUT std_logic;
empty: OUT std_logic;
data_count: OUT std_logic_VECTOR(11 downto 0));
end component;
begin
smoother : smooth_filter
port map(
clk => clk,
clearn => nrst,
su_flag => pulse_out3,
smooth_filter_in => device_data,
smooth_filter_out => filter_out_s,
set_up_flag => wren_s(0),
im_width => new_width,
numpixels => new_numpixels);
discriminative1 : median_filter
port map(
clk => clk,
clearn => nrst,
su_flag => wren_s(0),
smooth_filter_in => filter_out_s,
smooth_filter_out => filter_out_1,
set_up_flag => wren_1(0),
im_width => new_width,
numpixels => new_numpixels);
discriminative2 : median_filter
port map(
clk => clk,
clearn => nrst,
su_flag => wren_1_2(0),
smooth_filter_in => device_data_2,
smooth_filter_out => filter_out_2,
set_up_flag => wren_2(0),
im_width => new_width,
numpixels => new_numpixels);
waitf_1 : wait_fifo
port map (
clk => clk,
rst => reset,
din => filter_out_s,
wr_en => wren_s(0),
rd_en => wren_1(0),
dout => filter_out_sf,
full => open,
empty => open,
data_count => open);
waitf_2 : wait_fifo
port map (
clk => clk,
rst => reset,
din => device_data_2,
wr_en => wren_1_2(0),
rd_en => wren_2(0),
dout => filter_out_1f,
full => open,
empty => open,
data_count => open);
binary_wait1 : binary1_wait_fifo
port map (
clk => clk,
rst => reset,
din => b1_doutb,
wr_en => pulse_out3,
rd_en => wren_1(0),
dout => binary1_data,
full => open,
empty => open,
data_count => open);
binary_wait2 : binary2_wait_fifo
port map (
clk => clk,
rst => reset,
din => b2_doutb,
wr_en => pulse_out3,
rd_en => wren_2(0),
dout => binary2_data,
full => open,
empty => open,
data_count => open);
ram : clhe_ram2
port map (
clka => clk,
wea => wren_2,
addra => std_logic_vector(ram_wr_addr2),
dina => data_ram,
douta => memout);
--Part to write the filtered image. It also discards the extra pixels added in the
--edges for zero padding, as they contain no useful information.
if (CLK'EVENT AND CLK = '1') then
if reset = '1' or wren_2="0" then
ram_wr_addr <= to_unsigned(0, 18);
width_counter2 <= (others=>'0');
data_out <= (others=>'0');
end_flag_signal <= '0';
end_flag <= '0';
elsif unsigned(ram_wr_addr) >= (unsigned(numpixels) - 1) then
ram_wr_addr <= ram_wr_addr;
width_counter2 <= width_counter2;
data_out <= filter_out_2;
end_flag_signal <= '1';
end_flag <= end_flag_signal;
elsif width_counter2 = (im_width) or width_counter2 = (im_width + 1) then
ram_wr_addr <= ram_wr_addr;
width_counter2 <= width_counter2 + 1;
data_out <= filter_out_2;
end_flag <= end_flag_signal;
elsif width_counter2 = (im_width + 2) then
ram_wr_addr <= ram_wr_addr+1;
width_counter2 <= "00000000001";
data_out <= filter_out_2;
end_flag <= end_flag_signal;
else
ram_wr_addr <= ram_wr_addr + 1;
width_counter2 <= width_counter2 + 1;
data_out <= filter_out_2;
end_flag <= end_flag_signal;
end if;
ram_wr_addr2 <= ram_wr_addr;
if reset = '1' then
filter_out_sf1 <= (others=>'0');
filter_out_sf2 <= (others=>'0');
filter_out_sf3 <= (others=>'0');
filter_out_sf4 <= (others=>'0');
filter_out_sf5 <= (others=>'0');
filter_out_sf6 <= (others=>'0');
filter_out_sf7 <= (others=>'0');
filter_out_sf8 <= (others=>'0');
filter_out_sf9 <= (others=>'0');
else
filter_out_sf1 <= filter_out_sf;
filter_out_sf2 <= filter_out_sf1;
filter_out_sf3 <= filter_out_sf2;
filter_out_sf4 <= filter_out_sf3;
filter_out_sf5 <= filter_out_sf4;
filter_out_sf6 <= filter_out_sf5;
filter_out_sf7 <= filter_out_sf6;
filter_out_sf8 <= filter_out_sf7;
filter_out_sf9 <= filter_out_sf8;
end bench;
[Link]
-----------------------------------------------------------------------
-- Original smooth_filter: Núria Orduña
-- Modified by: Roger Olivé
-- Concordia University
-- 2012-2013
-----------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
--use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.NUMERIC_STD.ALL;
entity smooth_filter is
port ( clk, clearn : in std_logic;
su_flag : in std_logic;
smooth_filter_in : in std_logic_vector(7 downto 0);
smooth_filter_out : out std_logic_vector (7 downto 0);
set_up_flag : out std_logic;
im_width : in unsigned(10 downto 0);
numpixels : in unsigned(18 downto 0)
);
end smooth_filter;
component filter_fifo
port (
clk: IN std_logic;
rst: IN std_logic;
din: IN std_logic_VECTOR(7 downto 0);
wr_en: IN std_logic;
rd_en: IN std_logic;
dout: OUT std_logic_VECTOR(7 downto 0);
full: OUT std_logic;
empty: OUT std_logic;
data_count: OUT std_logic_VECTOR(8 downto 0));
end component;
------------------------------------------------------------------------
-- Signal Declarations
------------------------------------------------------------------------
begin
fifo1 : filter_fifo
port map (
process (clk)
begin
if (clk'event and clk = '1') then
if (clearn = '0') then
t_setup <= (others => '0');
else
if su_flag = '1' and t_setup < (numpixels+im_width*2+3) then
t_setup <= t_setup + 1;
end if;
end if;
end if;
end process;
process (clk)
variable sync_cnt : integer range 0 to 6 := 0;
begin
if (clk'event and clk = '1') then --initialization
if (clearn = '0') then
set_up_flag <= '0';
activated <= '0';
for j in 0 to 4 loop -- fifo reset
row1(j) <= (others => '0');
row2(j) <= (others => '0');
row3(j) <= (others => '0');
row4(j) <= (others => '0');
smooth_filter_out <=
std_logic_vector(resize(((("0000000"&row2(1)) +
("0000"&row2(2)&"000") + ("0000000"&row2(3)) + ("0000"&row3(1)&"000") + row3(2)*62 +
("0000"&row3(3)&"000")+("0000000"&row4(1)) + ("0000"&row4(2)&"000") +
("0000000"&row4(3)))/100),8));
else
if activated = '1' then
--if sync_cnt < 6 then
-- sync_cnt := sync_cnt + 1;
--else
--set_up_flag <= '0'; --Finish
--end if;
end if;
end if;
end if;
end if;
end process;
wr_en <= su_flag;
fifo_size <= im_width - 8; --Added a -1 not initially forecasted
rst <= not(clearn);
end behavior;
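The weighted sum assigned to `smooth_filter_out` above can be checked against a software model. The Python sketch below is an illustration rather than part of the original design. It reads the weights directly from the expression in the listing (1 for the corners, 8 for the edge neighbours via the three-bit left shift, 62 for the centre) and applies the same integer division by 100. The names `KERNEL` and `smooth_pixel` are hypothetical.

```python
# Weights taken from the smooth_filter_out expression:
# corners * 1, edge neighbours * 8 (shift left by 3), centre * 62.
KERNEL = [
    [1, 8, 1],
    [8, 62, 8],
    [1, 8, 1],
]


def smooth_pixel(window):
    """Return the smoothed value for a 3x3 window of 8-bit pixels."""
    acc = 0
    for r in range(3):
        for c in range(3):
            acc += KERNEL[r][c] * window[r][c]
    # Integer division by 100, then keep the 8 least significant bits,
    # mirroring resize(..., 8) on an unsigned value.
    return (acc // 100) & 0xFF
```

The weights add up to 98 rather than 100, so in this model a flat region of value 100 comes out as 98.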
histogram_int3.vhd
--2013/04/20--Forked from the description available at:
--[Link]
--------------------------------------------------------------------------------------------
--2013/05/01-- The code has been simplified to remove unneeded functionality and make
--interfacing easier.
--------------------------------------------------------------------------------------------
--2013/05/03-- Fixed a bug that prevented the histogram count from incrementing properly when
--two or more consecutive pixels with the same exact value appeared.
--2013/05/04-- Comments added for clarity and future reference.
--2013/05/07-- More bugfixes, related to component reset.
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;
entity histogram is
port ( addrin : in std_logic_vector(7 downto 0) ; -- device data as address for RAM
datain : in std_logic_vector (18 downto 0); -- RAM data out
clk : in std_logic; --Synchronous rising edge clock
cntr_value : in std_logic_vector (18 downto 0); --Number of pixels of the input image
ramwraddr : out std_logic_vector(7 downto 0); --Address where the updated histogram value must be
--[Link] to the grey value (from 0 to 255)
rst : in std_logic; --global reset
start_cntr : in std_logic; --triggers the start of the histogram calculation
wren : out std_logic; --write enable output for the ram containing the histogram
data_out : out std_logic_vector(18 downto 0); -- RAM data in
end_flag : out std_logic
);
end histogram;
architecture hlsm of histogram is
signal wr_addr, wr_addr1 : std_logic_vector(7 downto 0);
signal pre_cntr, next_cntr, pre_dout, dout : std_logic_vector(18 downto 0); -- number of
-- samples for which the histogram is to be computed.
signal addr, pre_addr : std_logic_vector(7 downto 0);
signal end_flag_signal, wren_signal, wren_next, wren_next1, wren_next2, addrpreaddr :
std_logic;
begin
addr <= addrin;
process(clk,rst)
begin
if(clk'event and clk = '1') then
if(rst = '1' or start_cntr='1') then --restart all the procedure
pre_cntr <= (others => '0');
wren_next1 <= '0';
wren_next <= '0';
wren_signal <= '0';
wren <= '0';
pre_addr <= (others=>'0');
addrpreaddr <= '0';
wr_addr1 <= (others => '0');
wr_addr <= (others => '0');
end_flag_signal <= '0';
else
pre_cntr <= next_cntr;
wren_next1 <= wren_next2;
wren_next <= wren_next1;--delay write enable changes to sync it
--with the output of valid values
wren <= wren_next;
wren_signal <= wren_next;
pre_addr <= addrin; --store current pixel gray value and its associated
--counter for use if next pixel's gray value is equal
pre_dout <= dout;
if wren_signal='1' and wren_next='0' then
end_flag_signal <= '1';
else
histogram_wrapper_int2.vhd
--Adds transformation function to complete CLHE functionality.
--2013/05/04--Comments added for clarity and future reference.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity histogram_wrapper is
port(
--global clock signal, active with its rising edge
clk: in std_logic;
--reset signal, synchronous and active high
reset: in std_logic;
--number of pixels of the image
cntr_value: in std_logic_vector(18 downto 0);
--Trigger to start the histogram generation
pulse_start_input: in std_logic;
--Output of the histogram data
--histogram_out: out std_logic_vector(17 downto 0);
--im_addra : out std_logic_vector(17 downto 0);
--enable : in std_logic;
im_douta : in std_logic_vector(7 downto 0);
histo_dina : out std_logic_vector(18 downto 0);
histo_addra : out std_logic_vector(7 downto 0);
histo_wea : out std_logic_vector(0 downto 0);
histo_addrb : out std_logic_vector(7 downto 0);
histo_doutb : in std_logic_vector(18 downto 0);
--histo_rstb : out std_logic;
end_flag : out std_logic
);
end histogram_wrapper;
architecture wrapper of histogram_wrapper is
signal device_data : std_logic_vector(7 downto 0); --current pixel value
--signal sel_data_input : std_logic; --selector between histogram generation/reading modes
signal ram_wr_addr : std_logic_vector(7 downto 0); --address to be accessed in the RAM
--containing the histogram
signal wren : std_logic_vector(0 downto 0); --histogram ram write enable
signal dataout : std_logic_vector(18 downto 0); --output of the histogram counters values
--signal pulse_out, pulse_out_2, pulse_out_3: std_logic;
signal pulse_out, rstb, end_flag_signal : std_logic;
begin
--image_rom, histogram_generator and histogram_ram are interconnected according to the design
--principle proposed in:
--[Link]
--However, some functionality was simplified or removed because it was not needed.
histogram_generator : histogram
port map(
addrin => device_data,
datain => histo_out,
clk => clk,
cntr_value => cntr_value,
ramwraddr => ram_wr_addr,
rst => reset,
start_cntr => pulse_out,
wren => wren(0),
data_out => dataout,
end_flag => end_flag_signal
);
[Link]
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity main is
port (
clk : in std_logic;
rst : in std_logic;
start_cntr : in std_logic;
end_flag : out std_logic; --marks the end of the histogram calculation
numpixels : in unsigned(18 downto 0);
x_size : in unsigned(9 downto 0);
y_size : in unsigned(9 downto 0);
clip_limit : in unsigned(6 downto 0);
limit1_t : in unsigned(7 downto 0);
limit1_b : in unsigned(7 downto 0);
limit2_t : in unsigned(7 downto 0);
limit2_b : in unsigned(7 downto 0)
);
end main;
architecture system of main is
signal rom_addrb, binary_addra, rom_addra, clahe_addra, binary_addrb : std_logic_vector(17
downto 0);
signal binary1_dina, binary2_dina, binary_wea, clahe_wea, binary1_doutb, binary2_doutb :
std_logic_vector(0 downto 0);
signal rom_doutb, rom_douta, clahe_dina, clahe_doutb : std_logic_vector(7 downto 0);
signal start_filters, end_flag_masks, end_flag_clahe : std_logic;
signal x_size2 : unsigned(10 downto 0);
component clahe_ram_dual
port (
clka: IN std_logic;
wea: IN std_logic_VECTOR(0 downto 0);
addra: IN std_logic_VECTOR(17 downto 0);
dina: IN std_logic_VECTOR(7 downto 0);
clkb: IN std_logic;
addrb: IN std_logic_VECTOR(17 downto 0);
doutb: OUT std_logic_VECTOR(7 downto 0));
end component;
component prova_grisa_rom3 --contains the source image
port (
clka: IN std_logic;
addra: IN std_logic_VECTOR(17 downto 0);
douta: OUT std_logic_VECTOR(7 downto 0);
clkb: IN std_logic;
addrb: IN std_logic_VECTOR(17 downto 0);
doutb: OUT std_logic_VECTOR(7 downto 0));
end component;
component binary_ram_dual
port (
clka: IN std_logic;
wea: IN std_logic_VECTOR(0 downto 0);
addra: IN std_logic_VECTOR(17 downto 0);
dina: IN std_logic_VECTOR(0 downto 0);
clkb: IN std_logic;
addrb: IN std_logic_VECTOR(17 downto 0);
doutb: OUT std_logic_VECTOR(0 downto 0));
end component;
component clahe is
port (
clk : in std_logic;
rst : in std_logic;
start_cntr : in std_logic;
end_flag : out std_logic; --marks the end of the histogram calculation
numpixels : in unsigned(18 downto 0);
x_size : in unsigned(9 downto 0);
y_size : in unsigned(9 downto 0);
clip_limit : in unsigned(6 downto 0);
rom_addra : out std_logic_vector(17 downto 0);
rom_douta : in std_logic_vector(7 downto 0);
ram_wea : out std_logic_vector(0 downto 0);
ram_addra : out std_logic_vector(17 downto 0);
ram_dina : out std_logic_vector(7 downto 0)
);
end component;
begin
clahe_generator : clahe
port map (
clk => clk,
rst => rst,
start_cntr => start_cntr,
end_flag => end_flag_clahe,
numpixels => numpixels,
x_size => x_size,
y_size => y_size,
clip_limit => clip_limit,
rom_addra => rom_addra,
rom_douta => rom_douta,
ram_wea => clahe_wea,
ram_addra => clahe_addra,
ram_dina => clahe_dina
);
image_rom : prova_grisa_rom3
port map (
clka => clk,
addra => rom_addra,
douta => rom_douta,
clkb => clk,
addrb => rom_addrb,
doutb => rom_doutb);
bram1 : binary_ram_dual
port map (
clka => clk,
wea => binary_wea,
addra => binary_addra,
dina => binary1_dina,
clkb => clk,
addrb => binary_addrb,
doutb => binary1_doutb);
bram2 : binary_ram_dual
port map (
clka => clk,
wea => binary_wea,
addra => binary_addra,
dina => binary2_dina,
clkb => clk,
addrb => binary_addrb,
doutb => binary2_doutb);
binarizer : mask_generator
port map(
clk => clk,
reset => rst,
pulse_start_input => start_cntr,
limit1_t => limit1_t,
limit1_b => limit1_b,
limit2_t => limit2_t,
limit2_b => limit2_b,
end_flag => end_flag_masks,
rom_addrb => rom_addrb,
rom_doutb => rom_doutb,
im_width => x_size,
numpixels => numpixels,
binary_wea => binary_wea,
binary_addra => binary_addra,
binary1_dina => binary1_dina,
binary2_dina => binary2_dina
);
output_clahe : clahe_ram_dual
port map (
clka => clk,
wea => clahe_wea,
addra => clahe_addra,
dina => clahe_dina,
clkb => clk,
addrb => binary_addrb,
doutb => clahe_doutb);
filters : filter_testbench
port map(
--global clock signal, active with its rising edge
clk => clk,
--reset signal, synchronous and active high
reset => rst,
--Trigger to start the generation
pulse_start_input => start_filters,
--Output of the data
dataout => open,
memout => open,
end_flag => end_flag,
--addrb : in std_logic_vector(17 downto 0);
--doutb : out std_logic_vector(7 downto 0);
im_width => x_size2,
numpixels => numpixels,
--ram_wea : out std_logic_vector(0 downto 0);
--ram_addra : out std_logic_vector(17 downto 0);
--ram_dina : out std_logic_vector(7 downto 0);
b1_doutb => binary1_doutb,
b2_doutb => binary2_doutb,
input_ram_addr => binary_addrb,
clahe_ram_doutb => clahe_doutb
);
end system;
median_filter2.vhd
-----------------------------------------------------------------------
-- Contrast enhancement algorithm with noise removal
-- 2nd level of hierarchy - smooth_filter
-- Original: Núria Orduña
-- Modified by: Roger Olivé
-- Concordia University
-- 2012-2013
-----------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
--use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.NUMERIC_STD.ALL;
entity median_filter is
port ( clk, clearn : in std_logic;
su_flag : in std_logic;
smooth_filter_in : in std_logic_vector(7 downto 0);
smooth_filter_out : out std_logic_vector (7 downto 0);
set_up_flag : out std_logic;
im_width : in unsigned(10 downto 0);
numpixels : in unsigned(18 downto 0)
);
end median_filter;
component filter_fifo
port (
clk: IN std_logic;
rst: IN std_logic;
din: IN std_logic_VECTOR(7 downto 0);
wr_en: IN std_logic;
rd_en: IN std_logic;
dout: OUT std_logic_VECTOR(7 downto 0);
full: OUT std_logic;
empty: OUT std_logic;
data_count: OUT std_logic_VECTOR(8 downto 0));
end component;
------------------------------------------------------------------------
-- Signal Declarations
------------------------------------------------------------------------
begin
fifo1 : filter_fifo
port map (
clk => clk,
rst => rst,
din => std_logic_vector(data_in1),
wr_en => wr_en,
rd_en => rd_en1,
dout => data_out1,
full => open,
empty => open,
data_count => data_count1);
fifo2 : filter_fifo
port map (
clk => clk,
rst => rst,
din => std_logic_vector(data_in2),
wr_en => wr_en,
rd_en => rd_en2,
dout => data_out2,
full => open,
empty => open,
data_count => data_count2);
fifo3 : filter_fifo
port map (
clk => clk,
rst => rst,
din => std_logic_vector(data_in3),
wr_en => wr_en,
rd_en => rd_en3,
dout => data_out3,
full => open,
empty => open,
data_count => data_count3);
fifo4 : filter_fifo
port map (
clk => clk,
rst => rst,
din => std_logic_vector(data_in4),
wr_en => wr_en,
rd_en => rd_en4,
dout => data_out4,
full => open,
empty => open,
data_count => data_count4);
process (clk)
begin
if (clk'event and clk = '1') then
if (clearn = '0') then
t_setup <= (others => '0');
else
if su_flag = '1' and t_setup < (numpixels+im_width*2+3+9) then
t_setup <= t_setup + 1;
end if;
end if;
end if;
end process;
process (clk)
variable sync_cnt : integer range 0 to 6 := 0;
begin
--[...] (six reset assignments of the form "<= (others => '0');"; their target names are missing from the source listing)
end loop;
for n in 0 to 2 loop
wf0(n) <= (others => '0');
wf1(n) <= (others => '0');
wf2(n) <= (others => '0');
end loop;
for o in 0 to 5 loop
wc(o) <= (others => '0');
end loop;
smooth_filter_out <= (others => '0');
data_in1 <= (others => '0');
data_in2 <= (others => '0');
data_in3 <= (others => '0');
data_in4 <= (others => '0');
rd_en1 <= '0';
rd_en2 <= '0';
rd_en3 <= '0';
rd_en4 <= '0';
elsif su_flag = '1' then --Shifts all the registers and fifo's data one position
row1(0) <= unsigned(smooth_filter_in);
row1(1 to 4) <= row1(0 to 3);
data_in1 <= row1(4);
if (unsigned(data_count1) >= fifo_size) then --Maintain a constant amount of data in the fifo
--components depending on the image size
rd_en1 <= '1';
row2(0) <= unsigned(data_out1);
else
rd_en1 <= '0';
row2(0) <= (others=>'0');
end if;
row2(1 to 4) <= row2(0 to 3);
data_in2 <= row2(4);
if (unsigned(data_count2) >= fifo_size) then
rd_en2 <= '1';
row3(0) <= unsigned(data_out2);
else
rd_en2 <= '0';
row3(0) <= (others=>'0');
end if;
row3(1 to 4) <= row3(0 to 3);
data_in3 <= row3(4);
if (unsigned(data_count3) >= fifo_size) then
rd_en3 <= '1';
row4(0) <= unsigned(data_out3);
else
end if;
wc(0) <= row3(2);
for o in 0 to 4 loop
wc(o+1) <= wc(o);
end loop;
--Calculation of the wd kernel median
--(the ">" comparisons and the swap branches are assumed; the source listing lost them)
if (wd0(0) > wd0(7)) then
wd1(0) <= wd0(7);
wd1(7) <= wd0(0);
else
wd1(0) <= wd0(0);
wd1(7) <= wd0(7);
end if;
if (wd0(1) > wd0(6)) then
wd1(1) <= wd0(6);
wd1(6) <= wd0(1);
else
wd1(1) <= wd0(1);
wd1(6) <= wd0(6);
end if;
if (wd0(2) > wd0(5)) then
wd1(2) <= wd0(5);
wd1(5) <= wd0(2);
else
wd1(2) <= wd0(2);
wd1(5) <= wd0(5);
end if;
if (wd0(3) > wd0(4)) then
wd1(3) <= wd0(4);
wd1(4) <= wd0(3);
else
wd1(3) <= wd0(3);
wd1(4) <= wd0(4);
end if;
--Stage 2
if (wd1(0) > wd1(3)) then
wd2(0) <= wd1(3);
wd2(3) <= wd1(0);
else
wd2(0) <= wd1(0);
wd2(3) <= wd1(3);
end if;
if (wd1(4) > wd1(7)) then
wd2(4) <= wd1(7);
wd2(7) <= wd1(4);
else
wd2(4) <= wd1(4);
wd2(7) <= wd1(7);
end if;
if (wd1(1) > wd1(2)) then
wd2(1) <= wd1(2);
wd2(2) <= wd1(1);
else
wd2(1) <= wd1(1);
wd2(2) <= wd1(2);
end if;
if (wd1(5) > wd1(6)) then
wd2(5) <= wd1(6);
wd2(6) <= wd1(5);
else
wd2(5) <= wd1(5);
wd2(6) <= wd1(6);
end if;
--Stage 3
if (wd2(0) > wd2(1)) then
wd3(0) <= wd2(1);
wd3(1) <= wd2(0);
else
wd3(0) <= wd2(0);
wd3(1) <= wd2(1);
end if;
if (wd2(2) > wd2(3)) then
wd3(2) <= wd2(3);
wd3(3) <= wd2(2);
else
wd3(2) <= wd2(2);
wd3(3) <= wd2(3);
end if;
if (wd2(4) > wd2(5)) then
wd3(4) <= wd2(5);
wd3(5) <= wd2(4);
else
wd3(4) <= wd2(4);
wd3(5) <= wd2(5);
end if;
if (wd2(6) > wd2(7)) then
wd3(6) <= wd2(7);
wd3(7) <= wd2(6);
else
wd3(6) <= wd2(6);
wd3(7) <= wd2(7);
end if;
--Stage 4
wd4(0) <= wd3(0);
wd4(1) <= wd3(1);
wd4(6) <= wd3(6);
wd4(7) <= wd3(7);
if (wd3(2) > wd3(4)) then
wd4(2) <= wd3(4);
wd4(4) <= wd3(2);
else
wd4(2) <= wd3(2);
wd4(4) <= wd3(4);
end if;
if (wd3(3) > wd3(5)) then
wd4(3) <= wd3(5);
wd4(5) <= wd3(3);
else
wd4(3) <= wd3(3);
wd4(5) <= wd3(5);
end if;
--Stage 5
wd5(0) <= wd4(0);
wd5(7) <= wd4(7);
if (wd4(1) > wd4(2)) then
wd5(1) <= wd4(2);
wd5(2) <= wd4(1);
else
wd5(1) <= wd4(1);
wd5(2) <= wd4(2);
end if;
if (wd4(3) > wd4(4)) then
wd5(3) <= wd4(4);
wd5(4) <= wd4(3);
else
wd5(3) <= wd4(3);
wd5(4) <= wd4(4);
end if;
if (wd4(5) > wd4(6)) then
wd5(5) <= wd4(6);
wd5(6) <= wd4(5);
else
wd5(5) <= wd4(5);
wd5(6) <= wd4(6);
end if;
--Stage 6
wd6(0) <= wd5(0);
wd6(1) <= wd5(1);
wd6(6) <= wd5(6);
wd6(7) <= wd5(7);
if (wd5(2) > wd5(3)) then
wd6(2) <= wd5(3);
wd6(3) <= wd5(2);
else
wd6(2) <= wd5(2);
wd6(3) <= wd5(3);
end if;
if (wd5(4) > wd5(5)) then
wd6(4) <= wd5(5);
wd6(5) <= wd5(4);
else
wd6(4) <= wd5(4);
wd6(5) <= wd5(5);
end if;
--(the ">" comparisons and the swap branches are assumed; the source listing lost them)
if (wo0(1) > wo0(6)) then
wo1(1) <= wo0(6);
wo1(6) <= wo0(1);
else
wo1(1) <= wo0(1);
wo1(6) <= wo0(6);
end if;
if (wo0(2) > wo0(5)) then
wo1(2) <= wo0(5);
wo1(5) <= wo0(2);
else
wo1(2) <= wo0(2);
wo1(5) <= wo0(5);
end if;
if (wo0(3) > wo0(4)) then
wo1(3) <= wo0(4);
wo1(4) <= wo0(3);
else
wo1(3) <= wo0(3);
wo1(4) <= wo0(4);
end if;
--Stage 2
if (wo1(0) > wo1(3)) then
wo2(0) <= wo1(3);
wo2(3) <= wo1(0);
else
wo2(0) <= wo1(0);
wo2(3) <= wo1(3);
end if;
if (wo1(4) > wo1(7)) then
wo2(4) <= wo1(7);
wo2(7) <= wo1(4);
else
wo2(4) <= wo1(4);
wo2(7) <= wo1(7);
end if;
if (wo1(1) > wo1(2)) then
wo2(1) <= wo1(2);
wo2(2) <= wo1(1);
else
wo2(1) <= wo1(1);
wo2(2) <= wo1(2);
end if;
if (wo1(5) > wo1(6)) then
wo2(5) <= wo1(6);
wo2(6) <= wo1(5);
else
wo2(5) <= wo1(5);
wo2(6) <= wo1(6);
end if;
--Stage 3
if (wo2(0) > wo2(1)) then
wo3(0) <= wo2(1);
wo3(1) <= wo2(0);
else
wo3(0) <= wo2(0);
wo3(1) <= wo2(1);
end if;
if (wo2(2) > wo2(3)) then
wo3(2) <= wo2(3);
wo3(3) <= wo2(2);
else
wo3(2) <= wo2(2);
wo3(3) <= wo2(3);
end if;
if (wo2(4) > wo2(5)) then
wo3(4) <= wo2(5);
wo3(5) <= wo2(4);
else
wo3(4) <= wo2(4);
wo3(5) <= wo2(5);
end if;
if (wo2(6) > wo2(7)) then
wo3(6) <= wo2(7);
wo3(7) <= wo2(6);
else
wo3(6) <= wo2(6);
wo3(7) <= wo2(7);
end if;
--Stage 4
wo4(0) <= wo3(0);
wo4(1) <= wo3(1);
wo4(6) <= wo3(6);
wo4(7) <= wo3(7);
if (wo3(2) > wo3(4)) then
wo4(2) <= wo3(4);
wo4(4) <= wo3(2);
else
wo4(2) <= wo3(2);
wo4(4) <= wo3(4);
end if;
if (wo3(3) > wo3(5)) then
wo4(3) <= wo3(5);
wo4(5) <= wo3(3);
else
wo4(3) <= wo3(3);
wo4(5) <= wo3(5);
end if;
--Stage 5
wo5(0) <= wo4(0);
wo5(7) <= wo4(7);
if (wo4(1) > wo4(2)) then
wo5(1) <= wo4(2);
wo5(2) <= wo4(1);
else
wo5(1) <= wo4(1);
wo5(2) <= wo4(2);
end if;
if (wo4(3) > wo4(4)) then
wo5(3) <= wo4(4);
wo5(4) <= wo4(3);
else
wo5(3) <= wo4(3);
wo5(4) <= wo4(4);
end if;
if (wo4(5) > wo4(6)) then
wo5(5) <= wo4(6);
wo5(6) <= wo4(5);
else
wo5(5) <= wo4(5);
wo5(6) <= wo4(6);
end if;
--Stage 6
wo6(0) <= wo5(0);
wo6(1) <= wo5(1);
wo6(6) <= wo5(6);
wo6(7) <= wo5(7);
if (wo5(2) > wo5(3)) then
wo6(2) <= wo5(3);
wo6(3) <= wo5(2);
else
wo6(2) <= wo5(2);
wo6(3) <= wo5(3);
end if;
if (wo5(4) > wo5(5)) then
wo6(4) <= wo5(5);
wo6(5) <= wo5(4);
else
wo6(4) <= wo5(4);
wo6(5) <= wo5(5);
end if;
then
wd;
wo;
wo;
wd;
if (wf1(1)
wf2(2)
wf2(1)
else
wf2(1)
wf2(2)
end if;
<= wf0(0);
<= wf0(1);
<= wf1(1);
<= wf1(2);
wd0(0) <= row1(0);
wd0(1) <= row1(4);
wd0(2) <= row2(1);
wd0(3) <= row2(3);
wd0(4) <= row4(1);
wd0(5) <= row4(3);
wd0(6) <= row5(0);
wd0(7) <= row5(4);
wo0(0) <= row1(2);
wo0(1) <= row2(2);
wo0(2) <= row3(0);
wo0(3) <= row3(1);
wo0(4) <= row3(3);
wo0(5) <= row3(4);
wo0(6) <= row4(2);
wo0(7) <= row5(2);
wo <= resize(((resize(wo6(3),9)+wo6(4))/2),8);
wd <= resize(((resize(wd6(3),9)+wd6(4))/2),8);
end behavior;
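The six comparison stages of the median filter can be modelled in software to check what they deliver. The Python sketch below is not part of the original design. The pair indices are taken from the stage structure of the listing, and the direction of each comparison (smaller value to the lower index) is an assumption, since the operators are not legible in the source. The names `STAGES` and `kernel_median` are hypothetical.

```python
# Compare-and-swap pairs per stage, read from the index pattern in the
# listing (e.g. stage 1 pairs element 0 with 7, 1 with 6, and so on).
STAGES = [
    [(0, 7), (1, 6), (2, 5), (3, 4)],
    [(0, 3), (4, 7), (1, 2), (5, 6)],
    [(0, 1), (2, 3), (4, 5), (6, 7)],
    [(2, 4), (3, 5)],
    [(1, 2), (3, 4), (5, 6)],
    [(2, 3), (4, 5)],
]


def kernel_median(values):
    """Median of 8 values as the mean of positions 3 and 4 after 6 stages."""
    w = list(values)
    if len(w) != 8:
        raise ValueError("expected exactly 8 values")
    for stage in STAGES:
        # Pairs within a stage are disjoint, so applying them one after
        # another matches the parallel hardware update.
        for low, high in stage:
            if w[low] > w[high]:  # assumed direction: smaller value first
                w[low], w[high] = w[high], w[low]
    # Mirrors resize((resize(w6(3), 9) + w6(4)) / 2, 8) in the listing.
    return (w[3] + w[4]) // 2
```

With this pair pattern the eight values are not always left fully sorted: the input `[1, 0, 0, 1, 1, 0, 0, 1]` ends as `[0, 0, 0, 1, 0, 1, 1, 1]`. Positions 3 and 4 nevertheless always hold the two middle values, in either order, which is all the averaging step needs.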
tiling_int3.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity image_tiling is
port ( romraddr : out std_logic_vector(17 downto 0) ; -- device data as address for RAM
datain : in std_logic_vector (7 downto 0); -- RAM data out
clk : in std_logic;
ramwraddr : out std_logic_vector(17 downto 0);
dataout : out std_logic_vector(7 downto 0); -- RAM data in
numx : out unsigned(2 downto 0);
numy : out unsigned(2 downto 0);
rst : in std_logic;
start_cntr : in std_logic;
wren : out std_logic;
active_flag : out std_logic;
end_flag : out std_logic; --marks the end of the distribution
numpixels : in unsigned(18 downto 0); --number of pixels of the histogram
tile_numpixels : out unsigned(18 downto 0);
x_size : in unsigned(9 downto 0);
y_size : in unsigned(9 downto 0);
xtile : out unsigned(9 downto 0);
ytile : out unsigned(9 downto 0)
);
end image_tiling;
architecture tiler of image_tiling is
--List of signals
--[...] (signal declarations are missing from the source listing)
--List of components?
begin
--Code code code (processes and non-processes)
--Process to read all the histogram and calculate the amount of absolute clipping
process(clk)
begin
if (CLK'EVENT AND CLK = '1') then --calculate tile size
if rst = '1' then
x_tile <= (others=>'0');
y_tile <= (others=>'0');
start <= '0';
pre_start <= '0';
else
x_tile <= resize(shift_right((x_size),3)+1,8);
--x_tile <= x_tile1;
y_tile <= resize(shift_right((y_size),3)+1,8);
start <= start_cntr and not(finish);
pre_start <= start;
end if;
end if;
if (CLK'EVENT AND CLK = '1') then
end if;
--if (y_condition1 = '1') then
--finish <= '1';
--else
--finish <= '0';
--end if;
else
--finish <= '0';
--stop <= '0';
x_pos <= x_pos_calc;
y_pos <= y_pos;
end if;
end if;
if rst='1' then
finish<='0';
else
if romraddr_signal>= (numpixels-1) then
finish<='1';
else
finish<=finish;
end if;
end if;
end process;
x_pos_calc <= x_pos + 1;
y_pos_calc <= y_pos + 1;
x_coord <= x_pos + (num_x-1)*x_tile;
y_coord <= y_pos + num_y*y_tile;
x_condition <= '1' when (x_pos_calc >= x_tile or x_coord >= (x_size-1))
else '0';
y_condition1 <= '1' when (y_coord >= (y_size-1))
else '0';
y_condition2 <= '1' when (y_pos_calc >= y_tile)
else '0';
stop <= '1' when (x_condition = '1' and (y_condition1 = '1' or y_condition2='1'))
else '0';
active_flag <= active;
numx <= resize((num_x-1),3);
numy <= num_y;
tile_numpixels <= numpixels_tile;
romraddr <= std_logic_vector(romraddr_signal);
end_flag <= finish;
xtile <= resize(x_tile,10);
ytile <= resize(y_tile,10);
--debug
--x_tile1 <= resize(x_tile2,8);
--x_tile2 <= shift_right(x_tile3,2);
--x_tile3 <= shift_left((x_tile4+4),2)/8;
--x_tile4 <= resize(x_size,12);
end tiler;
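The tile dimensions in the listing above come from `shift_right(size, 3) + 1`, that is, the image dimension divided by eight (rounded down) plus one. The Python sketch below mirrors that expression; it is an illustration rather than part of the original design. It assumes eight tiles per dimension, which is suggested by the shift of three bits and the 3-bit `numx` and `numy` outputs, and the function name `tile_size` is hypothetical.

```python
def tile_size(image_size):
    """Tile edge length as in the listing: shift_right(size, 3) + 1."""
    return (image_size >> 3) + 1


def tiles_cover_image(image_size, tiles_per_dimension=8):
    """Check that the tiles span the whole dimension (assumed 8 tiles)."""
    return tiles_per_dimension * tile_size(image_size) >= image_size
```

For a 512-pixel dimension this gives tiles of 65 pixels. For any 10-bit input the result is at most 128, so it fits the 8-bit `x_tile` and `y_tile` width used in the `resize` calls.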
transform_interp17.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity histogram_equalizer is
port ( histo_raddr : out std_logic_vector(7 downto 0) ; -- device data as address for RAM
histo_in_ul : in std_logic_vector (18 downto 0); -- histogram CDF value
histo_in_ur : in std_logic_vector (18 downto 0); -- histogram CDF value
histo_in_ll : in std_logic_vector (18 downto 0); -- histogram CDF value
histo_in_lr : in std_logic_vector (18 downto 0); -- histogram CDF value
rom_raddr : out std_logic_vector(17 downto 0); --image pixel address
rom_in : in std_logic_vector(7 downto 0);--image pixel value
clk : in std_logic;
clhe_wraddr : out std_logic_vector( 17 downto 0); --Address for the transformed pixel to write
rst : in std_logic;
start_cntr : in std_logic; --Triggers the transformation operations
wren : out std_logic;
clhe_out : out std_logic_vector(7 downto 0); -- Output for transformed pixel value
end_flag : out std_logic; --marks the end of the histogram calculation
numpixels : in unsigned(18 downto 0);
im_width : in unsigned(9 downto 0);
im_height : in unsigned(9 downto 0);
x_size : in unsigned(9 downto 0);--Subimage
y_size : in unsigned(9 downto 0);--Subimage
ul_id : out unsigned(7 downto 0);
ur_id : out unsigned(7 downto 0);
ll_id : out unsigned(7 downto 0);
lr_id : out unsigned(7 downto 0);
numpixels_ul : in unsigned(18 downto 0); --number of pixels of the image
histo_min_ul : in unsigned(18 downto 0); --Lowest CDF value of the histogram
numpixels_ur : in unsigned(18 downto 0); --number of pixels of the image
histo_min_ur : in unsigned(18 downto 0); --Lowest CDF value of the histogram
numpixels_ll : in unsigned(18 downto 0); --number of pixels of the image
histo_min_ll : in unsigned(18 downto 0); --Lowest CDF value of the histogram
numpixels_lr : in unsigned(18 downto 0); --number of pixels of the image
histo_min_lr : in unsigned(18 downto 0) --Lowest CDF value of the histogram
);
end histogram_equalizer;
architecture transformation of histogram_equalizer is
signal ramwraddru1, ramwraddru2, ramwraddru3, ramwraddru4, im_raddr : unsigned(17 downto 0);
signal pixel, pixel_pre, transformed_ul, transformed_ur, transformed_ll, transformed_lr,
ul_id_pre : unsigned(7 downto 0);
signal start_add, start, pre_wren, pre_wren2, pre_wren3, pre_wren4, wren1 : std_logic;
signal x_pos, x_pos_pre1, x_pos_pre2, x_pos_pre3, x_pos_pre4, x_pos_pre5, y_pos, y_pos_pre1,
y_pos_pre2, y_pos_pre3, y_pos_pre4, y_pos_pre5 : signed(13 downto 0);
signal x_ref, x1, x1_pre1, x1_pre2, x2, x2_pre1, x2_pre2, y_ref, y1, y1_pre1, y1_pre2, y2,
y2_pre1, y2_pre2 : signed(13 downto 0);
signal num_x, num_x1, num_x2, num_y, num_y1, num_y2 : unsigned(3 downto 0);
signal numpixels2 : unsigned(18 downto 0);
signal numpixels_ul1, numpixels_ul2, histo_min_ul1, histo_min_ul2, numpixels_ur1,
numpixels_ur2, histo_min_ur1, histo_min_ur2, numpixels_ll1, numpixels_ll2, histo_min_ll1,
histo_min_ll2, numpixels_lr1, numpixels_lr2, histo_min_lr1, histo_min_lr2 : unsigned(18
downto 0);
begin
process(clk)
begin
if (CLK'EVENT AND CLK = '1') then
if rst = '1' then
start_add <='0';
else
start_add <= start;
end if;
------------------------------beginning of reading block------------------------------
if rst = '1' or start='1' then --Initialization of variables
else
num_x <= num_x;
end if;
end_flag <= '0';
end if;
if rst='1' then
y_pos_pre1 <= (others=>'0');
y_pos_pre2 <= (others=>'0');
y_pos_pre3 <= (others=>'0');
y_pos_pre4 <= (others=>'0');
y_pos_pre5 <= (others=>'0');
x_pos_pre1 <= (others=>'0');
x_pos_pre2 <= (others=>'0');
x_pos_pre3 <= (others=>'0');
x_pos_pre4 <= (others=>'0');
x_pos_pre5 <= (others=>'0');
x1_pre1 <= (others=>'0');
x1_pre2 <= (others=>'0');
x2_pre1 <= (others=>'0');
x2_pre2 <= (others=>'0');
y1_pre1 <= (others=>'0');
y1_pre2 <= (others=>'0');
y2_pre1 <= (others=>'0');
y2_pre2 <= (others=>'0');
num_x1 <= (others=>'0');
num_x2 <= (others=>'0');
num_y1 <= (others=>'0');
num_y2 <= (others=>'0');
transformed_ul <= (others=>'0');
transformed_ur <= (others=>'0');
transformed_ll <= (others=>'0');
transformed_lr <= (others=>'0');
else
else
y_pos_pre1 <= y_pos;
y_pos_pre2 <= y_pos_pre1;
y_pos_pre3 <= y_pos_pre2;
y_pos_pre4 <= y_pos_pre3;
y_pos_pre5 <= y_pos_pre4;
x_pos_pre1 <= x_pos;
x_pos_pre2 <= x_pos_pre1;
x_pos_pre3 <= x_pos_pre2;
x_pos_pre4 <= x_pos_pre3;
x_pos_pre5<= x_pos_pre4;
x1_pre1 <= x1;
x1_pre2 <= x1_pre1;
x2_pre1 <= x2;
x2_pre2 <= x2_pre1;
y1_pre1 <= y1;
y1_pre2 <= y1_pre1;
y2_pre1 <= y2;
y2_pre2 <= y2_pre1;
num_x1 <= num_x;
num_x2 <= num_x1;
num_y1 <= num_y;
num_y2 <= num_y1;
end if;
end process;
x1 <= x_ref-signed(x_size)-1;
x2 <= x_ref;
x_ref <=
signed(shift_right(((shift_left((resize(x_size,11)),1)/2)+1),1))+signed(x_size*num_x);
y_ref <=
signed(shift_right(((shift_left((resize(y_size,11)),1)/2)+1),1))+signed(y_size*num_y);
y1 <= y_ref-signed(y_size)-1;
y2 <= y_ref;
ul_id_pre <= num_x+num_y*10;
ul_id <= ul_id_pre;
ur_id <= ul_id_pre + 1;
ll_id <= ul_id_pre + 10;
lr_id <= ul_id_pre + 11;
histo_raddr <= std_logic_vector(pixel);
--Computation of the equalized pixel
rom_raddr <= std_logic_vector(im_raddr);
wren <= wren1;
numpixels2 <= numpixels-1;
end transformation;
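
The listing above derives the reference coordinates (x1, x2, y1, y2) and the identifiers of the four neighbouring tiles, but the step marked "Computation of the equalized pixel" is not reproduced in this excerpt. The following is a minimal sketch of how the four per-tile results could be blended by bilinear interpolation. It is an assumption for illustration only, not the author's implementation. The package name clahe_interp_sketch and the function bilinear_blend are hypothetical, and the sketch assumes x1 <= x <= x2 and y1 <= y <= y2, with reference spacings of at most 1024 pixels so that the integer arithmetic cannot overflow.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper package: NOT part of the original design.
package clahe_interp_sketch is
  function bilinear_blend (
    t_ul, t_ur, t_ll, t_lr : unsigned(7 downto 0);  -- equalized value from each of the four tiles
    x, x1, x2              : signed(13 downto 0);   -- pixel column, left and right reference columns
    y, y1, y2              : signed(13 downto 0))   -- pixel row, upper and lower reference rows
    return unsigned;
end package clahe_interp_sketch;

package body clahe_interp_sketch is
  function bilinear_blend (
    t_ul, t_ur, t_ll, t_lr : unsigned(7 downto 0);
    x, x1, x2              : signed(13 downto 0);
    y, y1, y2              : signed(13 downto 0))
    return unsigned is
    variable w_left, w_right, w_up, w_down : integer;
    variable den, acc                      : integer;
  begin
    -- Each weight is the distance to the OPPOSITE reference, so the
    -- nearer tile contributes more (assumes x1 <= x <= x2, y1 <= y <= y2).
    w_right := to_integer(x - x1);
    w_left  := to_integer(x2 - x);
    w_down  := to_integer(y - y1);
    w_up    := to_integer(y2 - y);
    den     := (w_left + w_right) * (w_up + w_down);
    -- Degenerate case (coincident references): fall back to one tile.
    if den = 0 then
      return t_ul;
    end if;
    -- Weighted sum of the four tile results; spacings <= 1024 keep
    -- acc below 2**31 (1024 * 1024 * 255 is about 2.7e8).
    acc := w_up   * (w_left * to_integer(t_ul) + w_right * to_integer(t_ur))
         + w_down * (w_left * to_integer(t_ll) + w_right * to_integer(t_lr));
    -- Integer division truncates; a hardware version would more likely
    -- use a multiplier and a shift or a pipelined divider.
    return to_unsigned(acc / den, 8);
  end function bilinear_blend;
end package body clahe_interp_sketch;
```

For example, with x1 = 0, x2 = 10 and x = 5, left tiles that return 0 and right tiles that return 100, the function returns 50 whatever the row position. When all four tiles return the same value, that value is passed through unchanged. The integer-based formulation is meant for simulation and as a reference model; it says nothing about how the original design schedules the computation across its pipeline registers.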
C. ModelSim simulations
Global timeline: main entity view
This figure shows the signals of the top-level entity: the input parameters, which are
constant, [...] access are distinguishable: the shorter ones at the beginning are accesses
by the binary mask generation block, [...] controlled by the filtering block.
Global timeline: CLAHE block
In the first capture the signals are mostly memory reads and writes. Two main phases of the CLAHE
computation are visible here: all the computation needed to prepare a transformation function and the
[...], the control signals of the tile computation are visible as well: one can see the start pulses
of each tile and CLAHE sub-block, as well as the end flags and the signals that store which tile is
being computed at the moment.
Global timeline view: binary mask generation
[...] inputs and outputs of the mask correction block can be seen, both memories and parameters,
as well as the image sizes recalculated to account for the added zero padding.
Global timeline: filter block view
The filtering block closely resembles the previous one because it uses the same kind of windowing
implementation, with different operations. The difference is that there are more filters, which are
pipelined; hence there are more high-activity signals, which carry the image before and after each
stage, [...], as said in main, [...] signal that is red (unknown status) most of the time is the
output of the first address of the memory that stores the CLAHE image. This is not a problem: once
the CLAHE is computed, the image is overwritten with the desired known value, and the unspecified
value does not affect any part of the filtering system.
7. References
[1]
[2]
[Link],GraphicsGemsIV,SanDiego:AcademicPress,1994.
[3]
B. Nahar, "Contrast enhancement with the noise removal by a discriminative filtering process,"
ConcordiaUniversity,Montreal,2012.
[4]
[Link],[Link],[Link],[Link],[Link],[Link],[Link]
S. M. Pizer, "Contrast Limited Adaptive Histogram Equalization Image Processing to Improve the
DetectionofSimulatedSpiculationsinDenseMammograms,"JournalofDigitalImaging,vol.11,no.
4,pp.193200,1998.
[5]
[6]
R. C. Gonzalez and R. E. Woods, Digital Image Processing, Third ed., Upper Saddle River, New
Jersey:PearsonPrenticeHall,2008.
[7]
[Link],"DetailPreservingRankedOrderBasedFiltersforImageProcessing,"
IEEETransactionsonacoustics,Speech,andSignalProcessing,vol.37,no.1,pp.8398,1989.
[8]
"Virtex
FPGA
ML605
Evaluation
Kit,"
Xilinx,
2013.
[Online].
Available:
[Link]
[9]
[Link],"ComputeahistograminanFPGAwithoneclock ElectronicsDesignNetwork(EDN),"
UBM Tech, 3 February 2011. [Online]. Available: [Link]
design/4363979/ComputeahistograminanFPGAwithoneclock.[AccessedApril2013].
[10]
[11]
N. Ordua Just, Estudi, modelaci en Matlab i sntesi sobre FPGA d'un sistema de detecci de
contornsperaimatgesHDR,UniversitatpolitcnicadeCatalunya,Barcelona,2012.
ConcordiaUniversityandUniversitatPolitcnicadeCatalunya
FPGAImplementationofaContrastEnhancementAlgorithm
[12]
D. E. Knuth, The Art of Computer Programming, Second ed., vol. 3: Sorting and Searching,
Stanford:AddisonWesleyLongman,1998.
[13]
R. Zeno, "A reference of the bestknown sorting networks for up to 16 inputs," 11 May 2002.
[Online]. Available: [Link]
[AccessedMay2013].
[14]
111