
FPGA Contrast Enhancement Algorithm

This document describes a degree final project involving the FPGA implementation of a contrast enhancement algorithm with discriminative filtering. The algorithm includes histogram equalization, low-pass filtering, image binarization, and mask correction. The hardware implementation is designed to meet requirements for speed and resource usage. Test images are processed to evaluate the implementation and demonstrate the algorithm's effectiveness at enhancing image contrast. Conclusions discuss the results of the project and potential areas for future work.


DEGREE FINAL PROJECT

FPGA Implementation of a Contrast Enhancement
Algorithm with Discriminative Filtering

Degree Programme: Grau en Enginyeria de Sistemes Electrònics


Author: Roger Olivé Muñiz
Directors: Chunyan Wang and Jordi Madrenas Boadas
Year: 2013

Concordia University and Universitat Politècnica de Catalunya
FPGA Implementation of a Contrast Enhancement Algorithm

Index
Collaborations
Appreciation
Resum del treball
Resumen del proyecto
Abstract
1. Introduction
1.1 Context
1.2 Motivation and objectives
1.3 Report structure
2. Background in the contrast enhancement algorithm
2.1 General overview
2.2 Block descriptions
Histogram equalization
Low-pass filtering
Classification: binarization of images
Mask correction
3. Hardware description
3.1 Requirements and specifications
3.2 System structure
3.3 Detailed block structure
CLAHE block
Binary masks block
Filtering block
4. Implementation results
4.1 Test pictures
Picture 1
Picture 2
Picture 3


4.2 Summary
5. Conclusions
5.1 Project results
5.2 Future work
6. Annexes
A. Matlab codes
Algorithm Matlab Implementation (Author: Badrun Nahar)
[Link] (suitable for recording into ROM with Xilinx Coregen)
Script to read and show image from RAM dump (.mem Modelsim file)
B. Project VHDL code
Binary_correction_int.vhd
binary_generator_int2.vhd
clahe_complete4.vhd
clhe_clipping_int4.vhd
clipping_wrapper_int2.vhd
filter_system_int2.vhd
[Link]
histogram_int3.vhd
histogram_wrapper_int2.vhd
[Link]
median_filter2.vhd
tiling_int3.vhd
transform_interp17.vhd
C. Modelsim simulations
Global timeline: main entity view
Global timeline: CLAHE block
Global timeline view: binary mask generation
Global timeline: filter block view
7. References


Index of figures
[Link]
[Link]. [3]
[Link], green zones are just linearly interpolated and red zones are left untouched.
[Link], the excess is distributed uniformly across the histogram.
[Link] [3]
[Link]: center pixel, WD and WO [3]
[Link] the boundaries of the classification region [3]
Figure 8. Group one pattern examples. The rest of the patterns can be obtained by shifting or rotating them.
Figure 9. Group two pattern examples. The rest of the patterns can be obtained by shifting or rotating them.
Figure 10. Group 3 patterns.
[Link] pipelined or set up to run in parallel.
[Link] (the top level entity)
[Link]
Figure 14. Simplified diagram of the histogram_wrapper entity structure.
[Link]
Figure 16. Representation of the 100 histogram RAMs arranged according to their spatial position in the image. Tiles with the same indexes represent duplicated/quadruplicated tiles (light/dark blue and red respectively). When interpolating, there will not be a visible interpolation in the places where the tile is duplicated in one direction.
[Link] [10]
Figure 18. Group 4 patterns.
[Link] inputs and outputs are zero padded. Bear in mind that the filter_testbench entity has some logic not represented in this diagram.
[Link] = 8 and n = [Link], because 8 is not an odd number, the median in that case is the average of the 2 central values.
Figure 21. Original picture 1 and its histogram.
[Link]; at right, using the Matlab script.
Figure 23. Picture 1 final selectively filtered result. At left, using the hardware design; at right, using Matlab.


Figure 24. At left, histogram of the hardware output for picture 1; at right, histogram of the Matlab script's output obtained with the same image.
Figure 25. Original picture 2 and its histogram.
[Link]; at right, using the Matlab script.
Figure 27. Picture 2 final selectively filtered result. At left, using the hardware design; at right, using Matlab.
Figure 28. At left, histogram of the hardware output for picture 2; at right, histogram of the Matlab script's output obtained with the same image.
Figure 29. Original picture 3 and its histogram.
[Link]; at right, using the Matlab script.
Figure 31. Picture 3 final selectively filtered result. At left, using the hardware design; at right, using Matlab.
Figure 32. At left, histogram of the hardware output for picture 3; at right, histogram of the Matlab script's output obtained with the same image.

ConcordiaUniversityandUniversitatPolitcnicadeCatalunya
FPGAImplementationofaContrastEnhancementAlgorithm

Collaborations


Appreciation
[Link]
[Link]
give me the last impulse to jump in and live this experience. Without them, this project would probably not exist.
Also, thanks to Ted Obuchowicz for his valuable help with VHDL and to Badrun Nahar for orienting me in order to
[Link], but not least, I want to mention my parents Albert and Pilar, my family in general and
new and old friends, who helped me, put up with me and showed me their support when I really needed it.


Resum del treball
This project is the continuation of research focused on improving the results provided by popular image contrast enhancement techniques. Some snapshots are taken under very poor acquisition conditions, such as scenes with a very large dynamic range or medical images that require these [Link] these algorithms (such as contrast limited adaptive histogram equalization, or CLAHE by its English acronym) can reveal not only the detail of the image but also the noise hidden in it, making it difficult to distinguish what is relevant from information invented by the sensor.
As part of the image processing research activity at Concordia University, an algorithm able to improve these results under certain conditions was developed by a student as her master's thesis. This algorithm generates binary masks that try to detect the noise of the image from the original. Then, a version of the image with its contrast enhanced by CLAHE is low-pass filtered only at the pixels detected as noise candidates. The work presented here is based on that thesis and studies the behaviour, [Link] chosen was VHDL.
To do so, a top-down bottom-up design methodology was chosen. The first step was the documentation and learning process needed to understand the algorithm, its initial Matlab implementation and the image processing concepts behind it (low-pass filtering, histogram equalization, classification, etc.).
After that, following the mentioned design approach, the system was divided into isolated parts that were [Link] [Link] design has been targeted to a Xilinx Virtex 6 FPGA board.
The results gave an image with a contrast enhancement very similar to the one provided by the Matlab code [Link], probably because of certain design problems, the equalization of the image gives a slightly darker result than expected, without using the grey levels closest to white. The filtering, on the other hand, seems to work as expected, and the overall final result is hard to distinguish from the original without a [Link], the theoretical processing times of the hardware design are far ahead of the original code, and it may be a viable alternative for real-time video processing.


Resumen del proyecto
This project is the continuation of research focused on improving the results provided by popular image contrast enhancement techniques. Some snapshots are taken under very poor acquisition conditions, such as scenes with a very large dynamic range or medical images that require these techniques to reveal certain details that would otherwise remain hidden to the human eye. The problem is that these algorithms (such as contrast limited adaptive histogram equalization, or CLAHE by its English acronym) can reveal not only the detail of the image but also the noise hidden in it, making it difficult to distinguish what is relevant from information invented by the sensor.
As part of the image processing research activity at Concordia University, an algorithm able to improve these results under certain conditions was developed by a student as her final [Link].
Then, a version of the image with its contrast enhanced by CLAHE is low-pass filtered only at the pixels detected as noise candidates. The work presented here is based on that thesis and studies the behaviour, performance and possibilities of the algorithm as an FPGA implementation. The description language chosen was VHDL.
To do so, a top-down bottom-up design methodology was chosen. The first step was the documentation and learning process needed to understand the algorithm, its initial Matlab implementation and the image processing concepts behind it (low-pass filtering, histogram equalization, classification, etc.).
After that, following the mentioned design approach, the system was divided into isolated parts [Link] higher level using those components, and finally the top-level entity was built to put everything together. The design has been targeted to a Xilinx Virtex 6 FPGA board.
The results gave an image with a contrast enhancement very similar to the one provided by the [Link], probably due to certain design problems, the equalization of the image gives a somewhat darker result than expected, without using the grey levels closest to [Link], on the other hand, seems to work as expected, and the overall final result is hard to distinguish [Link], the theoretical processing times of the hardware design are far ahead of the original code, and it may be a viable alternative for real-time video processing.


Abstract
This project is the continuation of a research work focused on improving the current popular techniques for [Link], such as high dynamic range images or medical images that require those techniques in order to reveal details that otherwise [Link], those contrast enhancement algorithms (like Contrast Limited Adaptive Histogram Equalization, CLAHE) can reveal not just the detail of the image but also the noise hidden in it, making it hard to distinguish between the relevant information and the invented detail.
As part of the image processing research activity at Concordia University, an algorithm able to improve [Link] [Link], a version of the image with [Link] project is based on that thesis, and it studies the behavior, performance and possibilities of the algorithm as a hardware (FPGA) [Link].
To do so, [Link] learning process to understand the algorithm, its initial Matlab implementation and the image processing concepts behind it (low-pass filtering, histogram equalization, classification, etc.).
After that, following the design approach, the system was divided into isolated parts that were implemented and tested separately, [Link], higher-level blocks were designed using those components, and finally the top-level entity was built as well. The design was targeted to a Xilinx Virtex 6 FPGA board.
The results gave an image with a contrast enhancement visually very similar to the one provided by the [Link], likely due to some design flaw(s), the equalization of the image provides a slightly darker result than expected, [Link], on the other hand, seems to work just as expected, and the overall result is hard to distinguish from the original without a side-by-side comparison. Also, the theoretical processing times of the hardware design are far ahead of the original software code, and it can be a viable alternative for real-time video processing applications.


1. Introduction
This chapter introduces the work done in this project, the process of its elaboration, and the reasons and motivations from which the project was born. It also briefly describes the report structure and its contents.

1.1 Context

In modern society, [Link], some of them hard to imagine a few years ago, and they have improved our lives in equally unexpected ways: entertainment, medicine, security, industry, productivity in general... But in order to create, manage, improve and distribute these multimedia resources, a wide variety of specialised hardware and software components have to interoperate, forming a complex chain from the content source to the user'[Link] cornerstones around which all this technology is built, [Link] in the present day, and its importance is still growing.
One of the most common operations in image processing is contrast enhancement. Contrast enhancement algorithms are powerful tools for revealing details of a low-contrast image that are hidden in a very small range of grey/colour levels. There are various ways to enhance the contrast of an image. One of the most popular algorithms is Histogram Equalization (HE), which has several improved variants such as Adaptive Histogram Equalization (AHE) and Contrast Limited Adaptive Histogram Equalization (CLAHE) [1]. However, that procedure, in any of its variations, also reveals noise hidden in the picture, as it cannot by itself distinguish between noise and picture detail.
These kinds of algorithms are usually implemented [2] in standard programming languages like C, C++, Java or Matlab, to give just some examples, [Link] easier way and enough for certain cases, but this severely limits the achievable performance and efficiency of the design. This matters because certain image processing operations are computationally very intensive. Computers have dramatically increased their power, making them suitable for certain isolated operations in small [Link], however, can achieve a much better performance per watt and get better results with a fraction of the processing power thanks to parallelization, [Link], [Link] final design and longer design and manufacturing process (especially for an ASIC rather than an FPGA) are [Link], they are still a preferable, very interesting choice for power-sensitive applications such as embedded systems, specialised devices, or even just as a co-processing module that assists a general-purpose processor to increase the overall processing speed by eliminating bottlenecks.


1.2 Motivation and objectives

Given the boost image processing is receiving and the very important role it will likely keep playing in the near future, it is a very active field that is seeing a great number of technical advances that were unimaginable a few years ago. To bring this new technology to the masses, new waves of hardware able to keep up with the advancements and fulfil those visions are necessary, and designing that hardware with the tools available [Link], [Link] able to explore new fields that I had barely studied before, basic courses aside, and to connect them to my degree's speciality was a very good opportunity to gain a new point of view and learn new things, in this case image processing algorithms.
Last, but not least, the kind of image enhancement studied in the project was an intriguing field as it tries to [Link] question whose answer I was willing to check by myself: how far can one go while trying to make something look better without manipulating or distorting the source material to the point of making the enhancement pointless?
The main objective of this project is the FPGA implementation of an advanced contrast enhancement algorithm with selective noise filtering, as described in Badrun Nahar's Master Thesis (Concordia University) [3], using VHDL as the hardware description language. The aim is to achieve a high-performance, efficient execution of that algorithm and to evaluate its viability in time-sensitive applications such as real-time video processing, [Link] is targeted to a real board and made entirely of synthesizable code. A further goal is to acquire a good knowledge of image processing and of the hardware implementation of this kind of algorithm.
To accomplish those goals, the plan for this project consisted of two clearly differentiated long [Link] focus was put mainly on histogram equalization, [Link] [Link] implement each component of the design, and implementing and testing each individual part and the whole system with Modelsim, using a top-down bottom-up design approach.


[Link].

1.3 Report structure

[Link], the introduction, gives a brief explanation of the contrast enhancement field in particular and image processing in general, as well as other factors that led to the [Link], it provides basic information about the objectives and the contents of the rest of the [Link] to the first phase of algorithm study, the second chapter describes the implemented mathematical algorithm, detailing its separable parts step by step and briefly discussing the image processing concepts associated with them. Chapter 3 is associated with the second phase, the VHDL hardware description. With a structure similar to that of chapter 2, it explains the hardware implementation of each block and separable component mentioned [Link], in chapter 4, discusses the results of the design with different tests and images. Finally, to close the report, a fifth chapter with the conclusions takes stock of the work and results and gives some ideas on how the design could be improved and/or expanded.
As additional information, three annexes with the VHDL description code, the Matlab scripts and the Modelsim simulations are included.


2. Background in the contrast enhancement algorithm
Good contrast is an essential property in most image processing tasks. However, the conditions in which [Link], the sensor limitations, the lighting or the photographed object itself influence the final result in ways that are not always desirable, leading to a lack [Link], contrast enhancement becomes a good preprocessing tool for a wide range of image processing cases.
There are many different image contrast enhancement techniques, but one of the most popular is histogram equalization (HE), and the algorithm implemented in this project is built around it. There are several versions of it: basic histogram equalization, but also improved variants like Adaptive Histogram Equalization (AHE) or Contrast Limited Adaptive Histogram Equalization (CLAHE) [1].
However, HE and its variants increase the contrast not only of the real detail but also of the imperfections introduced during the acquisition of the picture, [Link] can be troublesome in some contexts, such as when the image's contrast is extremely low or when the relevant data can easily be confused with the noise.
For this reason, some design efforts in that field now concentrate on reducing the appearance of that undesired data. There are mainly two points where the problem can be tackled: right before or right after the [Link], relevant information can be lost together with the removed noise, and thus it cannot be detected and enhanced during the equalization. On the other hand, if the noise reduction takes place after the equalization, the noise is harder to remove because the enhancement makes it more visible and more prominent.
In this project, an algorithm [3] based on CLAHE is evaluated and implemented that, following the trend indicated in the previous paragraphs, [Link] algorithm was chosen for an FPGA implementation is that it is interesting to see how well it can perform in terms of speed and at what cost, [Link] include examples like an image or a video stream of a medical image, such as an echography, where it would be valuable not just as an aesthetic improvement but also as a way to make diagnosis easier for a doctor, who could adjust the parameters on the fly, see the improved image in real time, etcetera. CLAHE is already widely used for this kind of purpose [1][4][5].
This chapter describes the original mathematical algorithm and its strategy to face the noise problem, and introduces the main theoretical concepts behind its parts and blocks.


2.1 General overview

As already mentioned, the implemented enhancement algorithm is based on CLAHE, with some extra processing to improve the end results that focuses specifically on noise reduction.
To partially overcome the noise problems, the implemented algorithm selectively filters the key areas of the enhanced picture that are more likely to contain noise, which are detected by the [Link], according to the following scheme:

[Link].[3]
I is the original low-contrast input image. The HE-based enhancement block is the one that contains the CLAHE algorithm, where the contrast enhancement itself takes place. I is also used to generate binary masks that indicate in which areas of the enhanced image the selective filtering must take place and which parts must be left untouched.
After the CLAHE step, the pre-filtering block applies a soft low-pass filter to eliminate some high-frequency noise in the whole enhanced bitmap. A common Gaussian filter is enough for our needs, and it also allows some [Link] regions, the idea is to keep filtering low in non-homogeneous regions, which will be affected only by this pre-[Link].
Finally, the different layers of selective filtering (LPn blocks) are applied to the image to get the final result. Depending on the binary masks generated from the classification of the pixels of the original image, it is decided [Link] have different levels of filtering strength: the pixels more likely to contain noise (homogeneous) are classified as such in more masks, and thus have more filtering applied, than those that are more likely to be misclassified but are still included in a single filtering step. For these steps, the filter of choice has been a bidirectional multistage median filter, thanks to its ability to preserve edges during the filtering process.

Concordia University and Universitat Politècnica de Catalunya

2.2 Block descriptions

Histogram equalization
[Link], it adjusts the gray scale of the image so that the histogram of the original image is mapped onto a uniform histogram using a transformation [Link].
[Link], the function must be single-valued, monotonically increasing and lie between 0 and 1. For discrete values, the equations of the continuous domain translate from probability density functions and integrals to probabilities and summations [6]:

Probability of occurrence of a gray level:
$$p_r(r_k) = \frac{n_k}{n} \tag{2.1}$$

[Link] image and $n_k$ [Link] the following summation:
$$s_k = T(r_k) = \sum_{j=0}^{k} p_r(r_j) = \sum_{j=0}^{k} \frac{n_j}{n} \tag{2.2}$$

The equalized image can then be obtained by mapping each pixel's level $r_k$ to its corresponding new level $s_k$, which is the value of the cumulative distribution function (cdf) at $r_k$.
Unlike in the continuous case, it cannot be proven that this mapping produces the discrete equivalent of a [Link], it does tend to spread the histogram of the input image so that it uses a wider range of the gray spectrum.
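As an illustration of equations 2.1 and 2.2, the following Python sketch builds the histogram of an 8-bit image, accumulates it into the cdf and uses the result as a lookup table. It is a minimal software model rather than the hardware design; the function name is illustrative, and scaling $s_k$ to 0..255 with rounding is an assumption made here so that the output is again an 8-bit image.

```python
def equalize(pixels, levels=256):
    """Global histogram equalization of a flat list of gray levels."""
    n = len(pixels)
    # n_k: number of pixels with gray level r_k (eq. 2.1 is n_k / n)
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    # s_k = sum_{j<=k} n_j / n (eq. 2.2), scaled from [0, 1] to 0..levels-1
    lut, running = [], 0
    for n_k in counts:
        running += n_k
        lut.append(round(running / n * (levels - 1)))
    # map every pixel r_k to its new level s_k
    return [lut[p] for p in pixels]


print(equalize([100, 100, 101, 101, 102, 102, 103, 103]))
# [64, 64, 128, 128, 191, 191, 255, 255]
```

The four adjacent input levels end up spread between 64 and 255. The lowest level does not reach 0, because its own pixels already contribute to the cdf; this is what the minimum cdf value used later by the equalizer compensates for.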

Adaptive histogram equalization and contrast limited adaptive histogram equalization: improving the original
Adaptive histogram equalization (AHE) and contrast limited adaptive histogram equalization (CLAHE) are more complex, improved versions of standard histogram equalization.
The standard histogram equalization algorithm has the problem that the contrast enhancement is based on [Link], some levels will be used to depict parts of the image of low interest.
Adaptive histogram equalization tries to minimize this problem by using a different histogram for each pixel in the image, calculated over a window of the intensity values immediately surrounding that pixel, called the contextual region. This produces an image in which the objects with different intensity values which lie in [Link], it must be noted that it does not guarantee that, if the value of pixel a is greater than that of pixel b, this relationship will be preserved after the equalization.
Moreover, in practical terms, computing a histogram for each pixel is not viable because of its computational [Link], in most cases this approach is dropped and instead [2] the image is divided into a limited number of tiles, and a histogram is computed for each of them. In order to prevent the tile boundaries from becoming visible when the transformation is applied to the different pixels, bilinear interpolation is used to make the transitions in the final picture smoother.

Figure 3. Bilinear interpolation applied to AHE. Blue zones are bilinearly interpolated, green zones are just linearly interpolated and red zones are left untouched.

The other variant mentioned, contrast limited adaptive histogram equalization (CLAHE), adds another layer to AHE in order to limit the amount of contrast enhancement in areas of the image with low variability. This is done by clipping the highest bins of the histogram and redistributing the clipped excess across the rest of the bins.

[Link] example, the excess is distributed uniformly across the histogram.

CLAHE is useful to limit the appearance of certain noise content in zones of low gray level variability by limiting [Link], the reduced contrast enhancement of this alternative in certain zones could hide the presence of some significant data in the image.
CLAHE was chosen because of its ability to control the degree of enhancement, which can be useful as a tweaking parameter, while keeping all the improvements that AHE brings in terms of contrast enhancement. [3]

Low-pass filtering
Low-pass filters are useful to eliminate the high-frequency noise present in an image, as the equalization itself cannot discriminate noise.
There are various types of low-pass filters, depending on the main purpose of their application. Some are [Link], the mathematical complexity is another important characteristic to take into account.
In the following lines, the different low-pass filters used in the different stages of the noise removal are described, and their strengths and weaknesses are discussed.

Gaussian filtering
Gaussian filters have always been very popular and relevant because of their simplicity: they are easily specified, and both the forward and inverse Fourier transforms of a Gaussian function are real Gaussian functions.
[Link], it also blurs the edges and the [Link] (σ) allows control over the blurriness level applied in the prefiltering stage:
$$F(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{2.3}$$

The higher the sigma, the stronger the blurring effect becomes.
To compute a filtering operation with a Gaussian filter, a good hardware-friendly approach is to use a Gaussian kernel and a discrete convolution operation:
$$y[m, n] = \sum_{i}\sum_{j} h[i, j]\, x[m - i, n - j] \tag{2.4}$$
where x is the input image and h the Gaussian kernel, which for our needs can simply be the sampled version of the continuous Gaussian, obtained by sampling F(x, y).
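A minimal Python sketch of equations 2.3 and 2.4 follows. It samples F(x, y) on a square grid centred on the origin, normalizes the weights so they sum to 1 (which makes the 1/(2πσ²) factor irrelevant), and convolves an image, treating pixels outside it as zero. The function names are illustrative; for a 5x5 grid and σ = 0.5 the normalized centre weight comes out at about 0.6187, which matches the 61.8694 (×100) centre value of the kernel in equation 3.8.

```python
import math


def gaussian_kernel(size, sigma):
    """Sample exp(-(x^2 + y^2) / (2 sigma^2)) on a size x size grid and
    normalise it so that the weights sum to 1."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def convolve(img, h):
    """Discrete 2D convolution (eq. 2.4); pixels outside the image count as 0."""
    rows, cols = len(img), len(img[0])
    half = len(h) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            acc = 0.0
            for i in range(-half, half + 1):
                for j in range(-half, half + 1):
                    mm, nn = m - i, n - j
                    if 0 <= mm < rows and 0 <= nn < cols:
                        acc += h[i + half][j + half] * img[mm][nn]
            out[m][n] = acc
    return out


h = gaussian_kernel(5, 0.5)
print(round(h[2][2], 4))  # 0.6187
```

Because the weights are normalized, a flat region keeps its value away from the borders, while pixels near the edges are darkened by the implicit zero padding.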

Median filtering
Median filters are nonlinear filters with good signal-variation preservation qualities while smoothing the noise [Link] as the important information in the revealed detail is not as likely to be lost compared to alternatives like the [Link], the principle behind median filtering is sorting the pixels inside a window [Link] [Link], it is [Link], various advanced versions of median filtering have been developed [7].

[Link] [Link]: center pixel, WD and WO [3].

[Link] [3].

In our case, [Link], the bidirectional multistage median filter (BMM) [3]. Multistage median filters use several stages of median filters instead of a single median over the entire window. BMM filters operate in two steps: first, they find a median of the diagonal pixels and another of the orthogonal pixels, [Link], they take the median of the subset formed by the values calculated in the previous stage and the central pixel.
Put in proper mathematical terms:
$$y[m, n] = \operatorname{med}\{\operatorname{med}(W_D),\ \operatorname{med}(W_O),\ x[m, n]\} \tag{2.5}$$
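The two-stage structure of equation 2.5 can be sketched in Python as follows. The exact shapes of WD and WO are only shown in the figures, so this sketch assumes a 5x5 window in which WD holds the 8 pixels on the two diagonals and WO the 8 pixels on the central row and column, the centre excluded in both; it also assumes that the median of an even-sized set is the floor of the mean of its two central values. The names median and bmm are illustrative.

```python
def median(values):
    """Median; for an even count, the floor of the mean of the two central
    values (the rounding is an assumption)."""
    s = sorted(values)
    k = len(s)
    if k % 2:
        return s[k // 2]
    return (s[k // 2 - 1] + s[k // 2]) // 2


def bmm(window):
    """Bidirectional multistage median of a 5x5 window (list of 5 rows)."""
    c = 2  # index of the central row and column
    # W_D (assumed layout): 8 pixels on the two diagonals, centre excluded
    wd = [window[c + d * sy][c + d * sx]
          for d in (1, 2) for sy in (-1, 1) for sx in (-1, 1)]
    # W_O (assumed layout): 8 pixels on the central row and column
    wo = ([window[c][c + d * s] for d in (1, 2) for s in (-1, 1)] +
          [window[c + d * s][c] for d in (1, 2) for s in (-1, 1)])
    # second stage: median of the two partial medians and the centre pixel
    return median([median(wd), median(wo), window[c][c]])
```

For a window whose two left columns are 10 and the rest 200, bmm returns 200, so the vertical edge through the centre is kept; for a flat 10-valued window with a single 200 in the centre it returns 10, so the impulse is removed.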

Classification: binarization of images
Classification allows the separation of an image into different regions, [Link] are various classification methods, but this contrast enhancement algorithm specifically employs a classification based on clustering: [Link] histogram, which indicates the presence of a large number of similar pixels. Then, the image is divided into two regions: one containing all the values around the peak delimited by the selected thresholds, which should include [Link], outside the [Link], we can generate a binary mask.

[Link] boundaries of the classification region [3].
This method is useful for low-contrast images because, as a consequence of their lack of contrast, they have prominent peaks in their histograms, so large areas of homogeneous pixels can be defined with this classification method.

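$$b[m, n] = \begin{cases} 1, & t_b \le x[m, n] \le t_t \\ 0, & \text{otherwise} \end{cases} \tag{2.6}$$
where $t_b$ and $t_t$ are the bottom and top boundaries of the classification region.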

However, [Link] to partially solve that, as we will see in the next subchapter.
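Equation 2.6 reduces to a pair of comparisons per pixel, as in the Python sketch below. In the hardware the limits are entered manually (see the inputs limit1_t, limit1_b, limit2_t and limit2_b in chapter 3), so the peak_window helper, which centres the limits on the highest histogram bin, is only a hypothetical convenience for experimenting. Treating both bounds as inclusive is also an assumption.

```python
def binary_mask(pixels, t_bottom, t_top):
    """Eq. 2.6: 1 (homogeneous) inside [t_bottom, t_top], 0 otherwise."""
    return [1 if t_bottom <= p <= t_top else 0 for p in pixels]


def peak_window(pixels, half_width, levels=256):
    """Hypothetical helper: thresholds centred on the histogram peak."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    peak = max(range(levels), key=hist.__getitem__)
    return max(0, peak - half_width), min(levels - 1, peak + half_width)


pixels = [50, 51, 52, 51, 50, 120, 200, 51]
t_b, t_t = peak_window(pixels, 3)      # (48, 54): the peak is at level 51
print(binary_mask(pixels, t_b, t_t))   # [1, 1, 1, 1, 1, 0, 0, 1]
```

Two such masks, one per gray zone, give the two levels of filtering strength described in section 2.1.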

Mask correction
The correction applied to the generated binary masks consists of checking the similarity of the central pixel with its neighbors. If a pattern that is likely to be a misclassification is detected, the pixels in the mask can be corrected (their value can be changed).
Using 3x3 windows, the patterns considered as indicators of non-homogeneous pixels misclassified as homogeneous pixels have been divided into different categories according to their detection method [3]:
Group 1: [Link] pass filtering (that is, corrected in all the masks).

[Link].
Group 2: they can also be seen as one-pixel-wide patterns, but have more singular forms and more variations than group 1. Compared to group 1, they are not as likely to be misclassified, and are probably located near non-[Link], they will not be included in the masks oriented to strong filtering.

[Link].
Group 3: one-pixel-wide patterns drawing a cross, as shown in the figure below.

Figure 10. Group 3 patterns.
When any of these patterns or their shifted/rotated variants is detected, the value of the central pixel should [Link], the correction will be applied to all [Link], just the mask for broader filtering will have the change applied.

3. Hardware description
Now that the different parts of the original algorithm have been presented, it is time to see how they have been ported to a VHDL description.
Programming algorithms on a general-purpose or embedded device using standard programming languages has proven to be a good enough solution for quick, [Link], there is a serious amount [Link], they are [Link], because the algorithm is likely implemented in a high-level programming language to speed up development, the translation process to the executable binary adds more overhead as [Link], the presence of other layers, such as an operating system, can make things even more redundant.
Dedicated hardware implementations have a much lower degree of flexibility, but require less power to run [Link], it provides possibilities related to parallel, customized design that are not possible with a traditional programming language executed on top of a processor, which helps increase the algorithm execution speed even more.
In order to implement the algorithm detailed in chapter 2, a top-down and bottom-up strategy was chosen [Link] that can work autonomously. Then, inside each big block, all the smallest separable parts were identified and studied in order to find a good way to translate them to hardware with the available resources, which were studied as well. Each identified part was implemented and tested separately with a test bench; next, the tested parts were used to assemble bigger VHDL entities and recreate the big blocks. Each block was tested separately using ModelSim and, finally, the top level entity was designed in order to connect the big blocks.
This chapter describes how the whole algorithm has been redesigned in VHDL code, taking advantage of the possibilities it offers to define the level of concurrency: parallel segments, sequential parts, etc. The first step is defining the requirements and specifications of the implementation, and then proceeding [Link] the cases, the reasons behind each design decision are addressed as well.

3.1 Requirements and specifications

Before going into depth about how the design has been made, it is important to have an idea of the different requirements and specifications of the design, and of how they have been targeted.
The main requirements of the design include:

The ability to process grayscale (8-bit) images with arbitrary precision/[Link] the image is specified through different inputs that include width, length and number of pixels.

Tweaking options or easy modification of the main parameters of the design. In the end, those include:
o CLAHE clip limit, as a percentage.
o Definition of 2 gray zones (by entering an upper and a lower limit for each one) for the generation of the binary masks, as they specify the thresholds of the classification process.
o Ease of modifying the Gaussian prefilter VHDL code to change its strength.

Performance has been a priority over area and power consumption.

After some tests with the Matlab code, [Link] benefit of making it bigger than that does not seem to be worth the extra resources.

The design has been targeted to a Virtex-6-based board: specifically, the ML605 [8] development board (listed at $1795 at the time of writing this report). The specifications most relevant to this project include a 600 MHz maximum clock frequency, 14976 Kbit of block RAM distributed across the board and 768 DSP slices to [Link], if needed, it has 512 MB of regular DDR3 RAM. The reason for choosing the ML605 board is that, given the lack of memory or area constraints for this first design, it provided a very comfortable environment that does not set very strict physical limits [Link], it is easier to focus on just trying to get the maximum performance by parallelizing as much as possible, which comes at the expense of more area and memory slices.
Regarding libraries, the IEEE standard library numeric_std handles all the needed arithmetic types (like signed and unsigned) and operations (addition, subtraction, multiplication, division, comparisons). However, some division operations cause problems in the synthesis step using Precision RTL, and hence the design in its current state cannot be synthesized yet and would need some modifications to build properly.

3.2 System structure

In order to design the FPGA implementation, the first step is defining what parts of the algorithm have to be [Link] look at the algorithm diagram in chapter 2 [Figure 2], we can clearly divide it into 3 big blocks, represented in Figure 11: CLAHE computation (1), mask generation (2) and noise filtering (3).
The CLAHE and mask blocks are clearly independent, as both depend only on the source image to operate. Consequently, both blocks can be run in parallel, concurrently, during what is identified in Figure 11 as sequence [Link], block 3 needs both the enhanced CLAHE image and the binary masks to select the zones that need to be filtered, so it must not become active until blocks 1 and 2 finish their work, during sequence 2.
Another aspect to take into account is the latency added by each block and how it affects the execution of the other steps. The main bottleneck in this regard is the histogram equalization step. This is caused by the need to compute the histograms for all the image pixels before applying the transformation to them, which is [Link], as well as the different filter and mask generation steps, [Link], it is negligible compared to the part caused by the histogram generation.

[Figure 11 diagram. CLAHE computation (1): tile generation (1.1), histogram computing (1.2), histogram clipping & CDF generation (1.3) and equalization & interpolation of the image (1.4), with the computation of the equalization function for each tile repeated 64 times. Generation of the masks (2): zero padding, window generation, binary classification (2.1) and pixel correction (2.2). Filtering (3): zero padding, window generation x3, Gaussian prefiltering (3.1), discriminative median filtering 1 (3.2) and discriminative median filtering 2 (3.3). The blocks are grouped into Sequence 1 / Sequence 2, and each step is marked as pipelined or non-pipelined. Approximate cycle counts given in the figure:
Approx. cycles = (5x64 + 7x64 + numpixels) + (3x64 + (256+2)x2x64) + (5 + numpixels)
Approx. cycles = (x_size+2)x2 + 1 + 9 + numpixels + y_size x2
[Link] = (x_size+2)x2x3 + y_size x2 + 6 + 17x2 + numpixels]
[Link] pipelined or set up to run in parallel.
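The approximate cycle counts of Figure 11 can be evaluated for a given image size. The Python sketch below copies the three formulas as printed; assigning them to the CLAHE, mask and filtering blocks is an inference from their terms (the 64-tile factors for CLAHE, the x3 window generation for the three filters), not something the figure states explicitly. Combining them as the longer of the two concurrent blocks of sequence 1 plus the filtering of sequence 2 follows the scheduling described above. Clock frequency is deliberately left out.

```python
def clahe_cycles(numpixels):
    # formula with the 64-tile terms (assumed to be the CLAHE block)
    return (5 * 64 + 7 * 64 + numpixels) + (3 * 64 + (256 + 2) * 2 * 64) + (5 + numpixels)


def masks_cycles(x_size, y_size, numpixels):
    # single-window formula (assumed to be the mask generation block)
    return (x_size + 2) * 2 + 1 + 9 + numpixels + y_size * 2


def filtering_cycles(x_size, y_size, numpixels):
    # formula with three window generators (assumed to be the filtering block)
    return (x_size + 2) * 2 * 3 + y_size * 2 + 6 + 17 * 2 + numpixels


def total_cycles(x_size, y_size):
    n = x_size * y_size
    # sequence 1: CLAHE and mask generation run concurrently
    seq1 = max(clahe_cycles(n), masks_cycles(x_size, y_size, n))
    # sequence 2: filtering starts once both have finished
    return seq1 + filtering_cycles(x_size, y_size, n)


print(total_cycles(512, 512))  # 824569
```

For a 512x512 image this gives 558277 cycles for CLAHE, 264206 for the masks and 266292 for filtering. CLAHE dominates because its formula contains numpixels twice, presumably one pass for the histograms and one for the equalization, which agrees with the bottleneck discussed above.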

Also, the filtering and mask generation steps in general output one pixel per clock, except when jumping to the next image line, because of the implementation of zero padding. The generation and discarding of those extra pixels at the boundaries of the image adds some small incremental delays directly tied to the image's [Link], the zero padding operations are only necessary at the beginning and at the end of the whole filtering/mask generation process; they do not need to be repeated before and after each filter (prefiltering, discriminative 1, discriminative 2). For more details on this process, read the section corresponding to the noise filtering block.
The top level entity is called main, and is stored in the file [Link]. According to all the mentioned superblocks and algorithm parts, its structure is shown in Figure 12.

[Figure 12 diagram: the CLAHE computation, binary masks generation and discriminative filtering blocks, connected to the source image ROM, the CLAHE RAM and two binary RAMs; external inputs Start_cntr, X_size, Y_size, Clip_limit, Numpixels, Limit1_t, Limit1_b, Limit2_t and Limit2_b; output End_flag.]
[Link] (the top level entity).
As the diagram shows, the inputs are:
1. Image number of pixels (numpixels): necessary to know at what point the system must stop reading the ROM because it has reached the end of the image. It is also needed to compute certain parameters used internally by some blocks, such as image tiling, equalization, etc.

2. Image width (x_size) and height (y_size): needed in the steps where knowing when a line or row ends, and the aspect ratio of the image, is critical, mainly the tiling and equalization steps.

3. Clip limit (clip_limit): [Link].

4. Top and bottom limits 1 and 2 (limit1_t, limit1_b, limit2_t, limit2_b): used to manually define the [Link] generates the binary masks employed to determine what is filtered and what is not.

5. Start signal to trigger the process (start_cntr).

Ideally, there should be another input to stream the input image and put it into a RAM instead of the ROM of [Link], due to time constraints and because it is enough for testing purposes, a ROM preloaded [Link], RAM and FIFO entities instantiated in the various parts of the design have been generated with the Xilinx Core Generator tool, using the faster integrated block RAM (BRAM) instead of the higher-capacity and slower DDR3 RAM, as there is enough BRAM capacity for the [Link], they are reduced to the end flag (end_flag output) that indicates the end of the [Link], an output to stream the output image from the filtering block would be available if the design were adapted to connect to another component, but it was not added due to time constraints.
Note that all the blocks are synchronous, controlled by the same clock signal, and share a global reset [Link]/enable signal in each block to activate it when the previous [Link].

3.3 Detailed block structure

The big blocks of the enhancement system are divided into various components that perform different sequential tasks, each described in its own VHDL file. Each big block has its own internal top level entity that wraps all its subcomponents, and the system's global top level entity (main) connects them and provides access to the external inputs and shared memory resources.

CLAHE block
[Link] [Link]:
a) The generation of the addresses for each individual tile when reading it from the source picture, and the decision of where the boundaries of each tile lie.
b) [Link], each tile needs a memory pool to store histogram data.
c) Reuse of the histogram computation components and ROM input.
d) Related to points b) and c), a significant amount of multiplexing is needed to manage access to the storage blocks.
The clahe entity is instantiated as clahe_generator in the main entity and described in clahe_complete4.[Link] [Link] [Link]:

Switching the image ROM I/O between the histogram generation and image transformation blocks when the processing of the tiles finishes.

Switching the read interface of the histogram RAM array between the clipping and image transformation blocks when the processing of all the tiles finishes.

Switching the write interface of the histogram RAM array between the histogram computing and image transformation blocks when the computation of the tile histograms finishes.

When both processing the different tiles and generating the CLAHE image, it also has to manage the choice between the different tile RAM pools, depending on the one that is being computed/accessed.

The instances of other entities found in the clahe entity are:

tiler (entity image_tiling): sequentially outputs the tiles into which the image is divided, from left to right and top to bottom of the source picture.

histo_wrapper (entity histogram_wrapper): contains the histogram generation block.

histo_clipper (entity clipping_wrapper): clips the histogram bins that exceed the specified limit and replaces the histogram RAM content with the cumulative distribution function (cdf) needed by the transformation function.

equalizer (entity histogram_equalizer): using the source image and all the computed cdfs, it generates the CLAHE image while simultaneously applying bilinear interpolation.

tiles(0-99) (entity histo_ram2): array with all the memory pools needed to store the histograms of the different tiles. Although the image is divided into 8x8 tiles, there is a total of 100 tiles, to make the implementation of the interpolation easier by duplicating the tiles at the sides and corners. Consequently, we end up with 10x10 = [Link] the interpolation section.

When the CLAHE execution is triggered, tiler starts accessing the addresses of the top-left tile and outputs them to the histogram generation block, which delivers the resulting histogram to the corresponding [Link], the RAM data input signals are switched to connect them to the clipper block. When histo_clipper's end flag rises, the RAM data input signals are switched back to their original position and the tiler block is activated again to begin the next histogram.
[Figure 13 diagram. Blocks shown: tile generator (tiler), histogram generation (histo_wrapper), histogram clipping and cdf generation (histo_clipper), image transformation (equalizer) and the histogram RAMs tiles(0) to tiles(99). Legend: switch when there is a change of tile; switch during the cycle of a tile.]

[Link].
When all the cdfs are ready, the tiler's and clipper's end_flag outputs trigger the image transformation. At that moment, [Link] [Link], the histogram RAM pools accessed simultaneously change dynamically depending on what is requested by the transformation component according to the current pixel.
Now, let us look at the inner structure of the different blocks.

Tiler
The behavior of the instance tiler is described in the file tiling_int3.vhd under the entity name image_tiling.
Its operation principle is simple from an algorithmic point of view. Making use of the board's DSP blocks, it calculates the addresses corresponding to the current tile and outputs the pixel values corresponding to those addresses row by row, [Link]:
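Computation of the position along the x/y axis referenced to the original image:
$$x = num_x \cdot T_x + x_{pos}; \qquad y = num_y \cdot T_y + y_{pos} \tag{3.1}$$
where xpos and ypos are the current coordinates referenced to the top-left corner of the tile, and numx, numy the [Link]. $T_x$ and $T_y$ stand for the nominal tile width and height.
With the previous results, it is easy to compute the memory address corresponding to that pixel:
$$address = y \cdot x_{size} + x \tag{3.2}$$
where xsize is the width of the source image.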
The block also has a couple of small counters that are only reset when the global circuit reset is employed, but [Link] and y position in the grid of tiles (numx and numy).
The size of the tile is computed in real time by another counter every time, [Link] on the right and bottom borders of the image can be smaller than the rest when processing certain image sizes.
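A behavioural Python sketch of the tiler's address generation (equations 3.1 and 3.2) is shown below. The nominal tile size is assumed to be the image size divided by 8 and rounded up, so that the last column and row of tiles can be smaller, as described above; the original VHDL may compute it differently, and the function name is illustrative.

```python
def tile_addresses(x_size, y_size, numx, numy, tiles=8):
    """Row-major ROM addresses of tile (numx, numy) in a tiles x tiles grid."""
    # nominal tile size (assumption): image size / tiles, rounded up
    tx = -(-x_size // tiles)
    ty = -(-y_size // tiles)
    # the last column/row of tiles may be smaller than the nominal size
    width = max(0, min(tx, x_size - numx * tx))
    height = max(0, min(ty, y_size - numy * ty))
    for ypos in range(height):
        for xpos in range(width):
            x = numx * tx + xpos          # eq. 3.1
            y = numy * ty + ypos
            yield y * x_size + x          # eq. 3.2


print(list(tile_addresses(100, 100, 7, 0))[:3])  # [91, 92, 93]
```

For a 100x100 image the nominal tile is 13x13 and the last tile of each row is only 9 pixels wide; the second row of tile (7, 0) starts at address 191, and the 64 tiles together cover every address exactly once.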

Histo_wrapper
This instance, whose entity (histogram_wrapper) is described in the file histogram_wrapper_int2.vhd, computes the histogram of any input image with the aid of its internal histogram component (instance histogram_generator), described in histogram_int3.vhd, which includes part of the computation functionality.
This component is capable of calculating each histogram in the number of cycles it takes to read a streamed image, with just a few cycles of initial latency at the beginning of the computing process. [8]
Every time a pixel is read, its corresponding bin in the histogram storage memory is read and overwritten with [Link], the bins are gradually incremented according to the inputs until the image reaches its end and the reads/writes stop:

$$histo\_ram[device\_data] = histo\_datain + 1 \tag{3.3}$$
where histo_datain is the value of histo_ram[device_data] during the previous clock cycle.
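Equation 3.3 is a read-modify-write on the histogram RAM with one clock cycle between the read and the write. The source does not describe how two consecutive pixels with the same gray level are handled. In the cycle model below, which assumes that a RAM read returns the contents from before the same cycle's write, the forward option (a hypothetical bypass, not taken from histogram_int3.vhd) shows one common way to avoid losing counts. The function name is illustrative.

```python
def histogram_pipelined(stream, levels=256, forward=True):
    """Cycle model of histo_ram[device_data] = histo_datain + 1 (eq. 3.3)."""
    ram = [0] * levels
    prev_bin, prev_data = None, 0         # address and data read one cycle ago
    for pixel in list(stream) + [None]:   # one extra cycle drains the pipeline
        # read port: sees the RAM contents from before this cycle's write
        data = ram[pixel] if pixel is not None else 0
        # write port: store the incremented value read in the previous cycle
        if prev_bin is not None:
            ram[prev_bin] = prev_data + 1
            if forward and pixel == prev_bin:
                data = prev_data + 1      # bypass the stale read
        prev_bin, prev_data = pixel, data
    return ram
```

With forwarding, the stream [5, 5, 5] produces a count of 3 in bin 5; with forward=False the stale read loses one increment and the count is only 2.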

[Figure 14 diagram: the Histogram_int component and a histogram RAM(x); the signals shown include Device_data, Start_cntr, Component limit, Cntr_value, Ram_raddr, Ram_wraddr, Ram_wren, Ram_datain, Ram_douta and Histo_datain.]

Figure 14. Simplified diagram of the histogram_wrapper entity structure.

Histo_clipper
As mentioned before, despite its name, the instance histo_clipper does not only clip the histogram but also generates the cdf, which is used to apply the transformation later. Its entity (clipping_wrapper) is stored in clipping_wrapperc_int2.vhd. However, the real functionality is in another component inside clipping_wrapper: the instance histo_clipper (not to be confused with the previous one) of the entity histogram_clipper, described in clhe_clipping_int4.vhd.

[Link].

The block operation can be divided into two parts: the two sweeps it performs to clip (1) and then to generate the cdf (2). To make this possible, multiplexing to select between the logic of [Link], there has to be a counter to generate the access addresses in each sweep.
During the first sweep, the excess detected in the bins that are too high is accumulated in a register that increases each cycle by the amount of detected excess. Meanwhile, if the detected excess is 0, the original [Link], the written value is the clip limit.
$$y[n] = \min\left(x[n],\ clip\_limit\right), \qquad excess = \sum_{k=0}^{n} \max\left(0,\ x[k] - clip\_limit\right) \tag{3.4}$$

where x[n] is the input histogram bin (number of pixels with that gray value), excess the variable that gradually accumulates the total excess of pixels, and clip_limit the maximum tolerated value in the histogram.
Next, the second sweep reads the contents of the memory sequentially again, but this time accumulates the value read plus a fraction of the excess in another register. Then the read address is overwritten with the accumulator value, thus generating the cdf:
$$cdf[n] = cdf[n-1] + y[n] + \frac{excess}{256}, \qquad cdf[-1] = 0 \tag{3.5}$$
where y[n] is the histogram bin value as calculated in the previous step, n the RAM position (bin, gray level, between 0 and 255), excess the total clipped excess of pixels and numpixels the number of pixels of the histogram's input image: in this context, the tile size.
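The two sweeps of the clipper (equations 3.4 and 3.5) can be modelled in a few lines of Python. This sketch takes clip_limit as an absolute count (the design receives it as a percentage, and that conversion is not shown here) and assumes the per-bin share of the excess is computed with integer division, so any remainder is simply dropped. The function name is illustrative.

```python
def clip_and_cdf(hist, clip_limit):
    """Sweep 1 clips the bins (eq. 3.4); sweep 2 overwrites each bin with
    the running sum of the clipped bins plus an equal share of the excess."""
    bins = len(hist)
    y = list(hist)
    excess = 0
    # sweep 1: clip every bin and accumulate what was cut off
    for n in range(bins):
        if y[n] > clip_limit:
            excess += y[n] - clip_limit
            y[n] = clip_limit
    # sweep 2: cdf[n] = cdf[n-1] + y[n] + excess / bins (integer share)
    share = excess // bins
    acc = 0
    for n in range(bins):
        acc += y[n] + share
        y[n] = acc
    return y, excess
```

For a 1024-pixel tile with 768 pixels at level 100 and 256 at level 101, a clip limit of 512 produces an excess of 256, that is, one extra count per bin: the cdf is 100 at level 99, 613 at level 100 and ends at 1024 at level 255.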

Equalizer (equalization and interpolation)
In order to compute the equalized and interpolated image, the equalizer instance, of the histogram_equalizer entity included in the file transform_interp17.vhd, needs access to all the histogram RAM pools and the source image.
In this case, the block reads the source image RAM sequentially, [Link], a part of the block computes the pixel position along the x/y axis and determines which the 4 neighboring tiles are, the position of their central pixels, and the distances between the central pixels and between them and the current pixel; it also retrieves the corresponding cdf value, the minimum cdf value and the tile size. Doing so requires a few clock cycles of latency in order to compute and retrieve all the involved data, but the block still sustains a throughput of one pixel per clock. With this data, adequate timing and the concepts presented in chapter 2.2.1 regarding equalization and interpolation, the equalizer block computes the final value of the pixel.
To reach the full dynamic range, [Link]:
$$T_{i,j}(v) = \frac{cdf_{i,j}(v) - cdf_{min}}{numpixels - cdf_{min}} \cdot 255 \tag{3.6}$$

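Then, the different results are interpolated:
$$p[m, n] = \frac{y_2\left(x_2\,T_{i,j}(v) + x_1\,T_{i+1,j}(v)\right) + y_1\left(x_2\,T_{i,j+1}(v) + x_1\,T_{i+1,j+1}(v)\right)}{(x_1 + x_2)(y_1 + y_2)}, \qquad v = I[m, n] \tag{3.7}$$
where x1, x2, y1, y2 are the horizontal and vertical distances between [m, n] and the central pixels of each tile (x1 and y1 measured to the tiles on the left and above, x2 and y2 to those on the right and below).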


Next, it stores the pixel at the corresponding address of a RAM prepared to contain the image, where it waits for the start of the filtering process.
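A sketch of the per-tile mapping and the bilinear blend described by equations 3.6 and 3.7 follows, as two small integer functions. The tile naming, the integer (floor) divisions and the pass-through for a tile whose cdf_min equals its size are assumptions of this sketch, not details taken from transform_interp17.vhd.

```python
def tile_mapping(cdf, v, cdf_min, numpixels):
    """Eq. 3.6: map gray level v through one tile's cdf onto 0..255."""
    if numpixels == cdf_min:
        # degenerate tile: no stretch is possible, keep the level (assumption)
        return v
    return (cdf[v] - cdf_min) * 255 // (numpixels - cdf_min)


def interpolate(t_ul, t_ur, t_ll, t_lr, x1, x2, y1, y2):
    """Eq. 3.7: bilinear blend of the four neighbouring tile mappings.

    t_ul, t_ur, t_ll, t_lr: values mapped by the upper-left, upper-right,
    lower-left and lower-right tiles. x1/x2: horizontal distances from the
    pixel to the left/right tile centres; y1/y2: vertical distances to the
    upper/lower tile centres. The nearer tile gets the larger weight."""
    top = x2 * t_ul + x1 * t_ur
    bottom = x2 * t_ll + x1 * t_lr
    return (y2 * top + y1 * bottom) // ((x1 + x2) * (y1 + y2))


print(interpolate(100, 200, 100, 200, 1, 1, 1, 1))  # 150, halfway between tiles
```

Because duplicated border tiles hold identical mappings, interpolate(80, 80, 80, 80, 3, 5, 2, 6) returns 80: blending a tile with itself leaves the value unchanged, which is why the duplicated tiles remove the need for special border cases.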

Theoretically, the borders of the image should be just linearly interpolated (in a single direction) and the [Link] differentiate between the bilinear, linear and non-interpolated cases. To avoid that problem, duplicated border tiles were introduced. This way, when a pixel is interpolated in a direction that uses the same tile twice, the result is as if there were no interpolation at all, which avoids the implementation of special cases.
0,0 0,0 1,0 2,0 3,0 4,0 5,0 6,0 7,0 7,0
0,0 0,0 1,0 2,0 3,0 4,0 5,0 6,0 7,0 7,0
0,1 0,1 1,1 2,1 3,1 4,1 5,1 6,1 7,1 7,1
0,2 0,2 1,2 2,2 3,2 4,2 5,2 6,2 7,2 7,2
0,3 0,3 1,3 2,3 3,3 4,3 5,3 6,3 7,3 7,3
0,4 0,4 1,4 2,4 3,4 4,4 5,4 6,4 7,4 7,4
0,5 0,5 1,5 2,5 3,5 4,5 5,5 6,5 7,5 7,5
0,6 0,6 1,6 2,6 3,6 4,6 5,6 6,6 7,6 7,6
0,7 0,7 1,7 2,7 3,7 4,7 5,7 6,7 7,7 7,7
0,7 0,7 1,7 2,7 3,7 4,7 5,7 6,7 7,7 7,7

Figure 16. Representation of the 100 histogram RAMs arranged [Link] indexes represent duplicated/quadruplicated tiles (light/dark blue and red respectively). When interpolating, in the places where the tile is duplicated in one direction, there will not be a visible interpolation.

Binary masks block
The binary mask generation is handled by the component binarizer, an instantiation of the entity mask_generator contained in the file binary_generator_int2.[Link] to sequentially read the image and classifies the pixels with different comparators that take as a reference the values entered externally in the main top level entity. The output is 1 if the pixel falls inside the limits (homogeneous) or 0 otherwise (equation 2.6).
The next step is applying the correction, which outputs 2 different masks as a result, stored in 2 different RAM blocks waiting for the beginning of the filtering process. This is done by corrector_l, contained in binary_correction_int.vhd as the binary_correction_less entity. However, this component does not handle the addition of zero padding, [Link] [Link], it just has to be discarded when receiving the output stream, before writing to the RAM.

Corrector_l
[Link], it is necessary to have a 3x3 [Link], the component can use its algorithms to detect the patterns susceptible to correction. Depending on the result, it will give a changed output for one mask or for both, or leave the binary value unchanged in both masks.
To get this 3x3 window, the structure employed [9] is the same one that will be seen in the filtering section. [Link] sequentially the first row of registers and, after that, [Link] first line of registers, [Link] [Link], the [Link].
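The windowing idea can be modelled generically as a single shift register of 2·width + 3 pixels fed in raster order: the three taps at the start of each row-length segment form the three rows of the window. The Python sketch below is such a generic model, not the structure of reference [9]; windows that straddle a line end are produced as well and would be discarded, which is where the zero padding comes in. The function name is illustrative.

```python
from collections import deque


def sliding_windows(stream, width):
    """Yield 3x3 windows from a raster-order pixel stream of a given width,
    using one shift register of 2 * width + 3 pixels."""
    depth = 2 * width + 3
    buf = deque(maxlen=depth)         # oldest pixel on the left
    for pixel in stream:
        buf.append(pixel)
        if len(buf) == depth:         # register full: a window is available
            b = list(buf)
            yield [b[0:3], b[width:width + 3], b[2 * width:2 * width + 3]]


for w in sliding_windows(range(12), 4):   # a 4x3 image streamed as 0..11
    print(w)
# [[0, 1, 2], [4, 5, 6], [8, 9, 10]]
# [[1, 2, 3], [5, 6, 7], [9, 10, 11]]
```

After the initial fill of 2·width + 3 pixels, one window is produced per input pixel, matching the one-pixel-per-clock behaviour described for the mask and filtering blocks.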

[Link] [10].
To detect the different patterns introduced in chapter 2, the procedure followed is:

Group 1: if a zero is detected in the central pixel and the number of other zeros in the window is 2 or less, [Link], the pixel is corrected in both final masks.

Group 2: if a zero is detected in the central pixel and the number of other zeros in the window is 3, it is likely to be a group 2 pattern, but there are some specific cases, known as group 4, that must be discarded. A pattern is only accepted as group 2 after checking that the distribution of its zeros does not match any of the group 4 patterns. Because group 2 patterns are not as likely to be misclassified as group 1, only [Link], the zero is left unchanged.

Figure 18. Group 4 patterns.

Group 3: if either of the 2 exact patterns in Figure 10 is detected by checking the values in all positions individually, the central pixel is changed in the mask for broader filtering.
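The group 1 and group 2 rules above can be sketched in Python as follows. Several details are assumptions: a correction changes the mask value from 0 to 1, the window is zero padded at the borders, all decisions use the uncorrected input mask, and the group 4 patterns (Figure 18) are passed in by the caller as 3x3 tuples because they are not reproduced in the text. Group 3 is omitted for the same reason. The names correct_masks, strong and broad are illustrative.

```python
def correct_masks(mask, group4_patterns=()):
    """Apply the group 1 and group 2 rules to a binary mask (1 = homogeneous).

    Returns (strong, broad): the masks for strong and for broader filtering."""
    rows, cols = len(mask), len(mask[0])
    strong = [row[:] for row in mask]
    broad = [row[:] for row in mask]
    for m in range(rows):
        for n in range(cols):
            if mask[m][n] != 0:
                continue
            # 3x3 window around (m, n), zero padded outside the image
            win = tuple(tuple(mask[m + i][n + j]
                              if 0 <= m + i < rows and 0 <= n + j < cols else 0
                              for j in (-1, 0, 1))
                        for i in (-1, 0, 1))
            zeros = sum(v == 0 for row in win for v in row) - 1  # centre excluded
            if zeros <= 2:
                # group 1: corrected in both masks
                strong[m][n] = broad[m][n] = 1
            elif zeros == 3 and win not in group4_patterns:
                # group 2: corrected only in the mask for broader filtering
                broad[m][n] = 1
    return strong, broad
```

An isolated zero surrounded by ones is corrected in both masks; a zero with three zero neighbours, such as a short line above it, is corrected only in the broad mask unless its window is listed as a group 4 pattern.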

Filtering block
Finally, [Link] entity filter_testbench, which acts as the local top level entity and is described in the file filter_system_int2.vhd.
Similarly to the system employed in the binary correction section, all 3 filtering blocks employ an equivalent system to get the filtering windows, [Link], which needs to be 2 pixels wide for a 5x5 window, is added at the beginning of the first filtering stage and removed right before writing the final image to the output RAM after the last filter. The 3 filtering steps are pipelined without any intermediate buffer, which helps minimize the latency introduced by this block of the system.
The only extra steps involved between filters are the discriminations between filtered and non-filtered pixels. [Link] more efficient to just compute the whole filtered image and replace parts of the output with unfiltered pixels than to selectively choose the pixels that have to be filtered. Selective filtering would not save time (the current system can already output one pixel per clock after the initial delay) and would greatly increase the complexity of the design.
To do so, the source image stream is not just fed into the first filter, but also copied to a FIFO used as a [Link] median filter and its write enable output rises, [Link], the filter output stream is aligned with the binary mask and the unfiltered image. Then the decision can take place: [Link], the multiplexer will choose the filtered pixel.
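The alignment between the filter output, the buffered unfiltered pixels and the mask bits can be modelled as below. This is a simplified, hypothetical model: the filter is treated as a black box whose output for pixel k appears latency cycles after its input, the FIFOs are Python deques, and the function name is illustrative. It shows that the FIFOs only have to hold the pixels that are still in flight inside the filter.

```python
from collections import deque


def discriminative_stage(pixels, mask, filt, latency):
    """Model of the FIFO alignment and the output multiplexer.

    filt(pixels) returns the filtered stream; its k-th value is assumed to
    leave the pipelined filter `latency` cycles after pixel k enters it."""
    filtered = filt(pixels)
    pix_fifo, mask_fifo, out = deque(), deque(), []
    for cycle in range(len(pixels) + latency):
        if cycle < len(pixels):              # input side: push into the FIFOs
            pix_fifo.append(pixels[cycle])
            mask_fifo.append(mask[cycle])
        if cycle >= latency:                 # the filter write enable is high
            f = filtered[cycle - latency]
            u = pix_fifo.popleft()
            b = mask_fifo.popleft()
            out.append(f if b else u)        # 1 (homogeneous): filtered pixel
    return out


print(discriminative_stage([5, 6, 7], [1, 0, 1], lambda s: [0] * len(s), 3))
# [0, 6, 0]
```

In the real block the same alignment is obtained by starting to read the FIFOs when the median filter's write enable rises, as described above.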
[Figure diagram. Blocks shown: CLAHE image out, Gaussian prefiltering, discriminative median filtering 1, discriminative median filtering 2, image buffering 1 and 2 (FIFO), binary mask 1 out and binary mask 2 out, mask 1 and mask 2 buffering (FIFO), filtered image.]

[Link] inputs and outputs are zero padded. Bear in mind that the filter_testbench entity has some logic not represented in this diagram.
But how do the filters operate internally?

Gaussian filter
As said before, the windowing process for the Gaussian filter is the same one used in the binary correction component, but expanded to a 5x5 window. It is described in the file filter_system_int2.vhd as the entity smooth_filter.
Using the elements of the window and a pregenerated Gaussian kernel, it computes the output for the central pixel using a discrete convolution as presented in chapter 2 (equation 2.4), which employs all the pixels in the 5x5 window. The kernel present in the current version of the description was generated using the fspecial function in Matlab with a standard deviation σ = 0.5:
$$
10^{-2}
\begin{pmatrix}
0 & 0.0028 & 0.0208 & 0.0028 & 0\\
0.0028 & 1.1332 & 8.3731 & 1.1332 & 0.0028\\
0.0208 & 8.3731 & 61.8694 & 8.3731 & 0.0208\\
0.0028 & 1.1332 & 8.3731 & 1.1332 & 0.0028\\
0 & 0.0028 & 0.0208 & 0.0028 & 0
\end{pmatrix}
\tag{3.8}
$$

The kernel can be changed in the code to compile a new filter with a different amount of blurring. The values are
multiplied by 100 and rounded to operate with natural numbers. The output is divided by 100 again to get the
gray value between 0 and 255.
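A minimal fixed-point sketch of this filter follows. It assumes plain rounding of the scaled weights and truncating integer division, neither of which is confirmed by the VHDL shown here; the helper names are hypothetical.

```python
import math


def gaussian_kernel(size=5, sigma=0.5):
    """Normalised size x size Gaussian, using the same formula as Matlab's
    fspecial('gaussian', [size size], sigma)."""
    c = (size - 1) / 2
    k = [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
          for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]


def integer_kernel(kernel, scale=100):
    """Multiply the weights by `scale` and round them to natural numbers."""
    return [[round(v * scale) for v in row] for row in kernel]


def filter_pixel(window, ikernel, scale=100):
    """Output for the central pixel of a 5x5 window: weighted sum with the
    integer kernel (symmetric, so convolution equals correlation), then
    divided by `scale` again with truncating integer division."""
    acc = sum(p * k for prow, krow in zip(window, ikernel)
              for p, k in zip(prow, krow))
    return acc // scale


print(integer_kernel(gaussian_kernel()))
# [[0, 0, 0, 0, 0], [0, 1, 8, 1, 0], [0, 8, 62, 8, 0], [0, 1, 8, 1, 0], [0, 0, 0, 0, 0]]
```

With plain rounding the integer weights add up to 98 rather than 100, so in this sketch a uniform region of value 255 comes out as 249. A different rounding of the weights would avoid that loss.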


Median filter
The median filter, implemented as the entity median_filter in the file median_filter2.vhd, is the same for both
stages of discriminative filtering. The code structure is almost identical to that of the Gaussian filter, changing
only the computation method of the output pixel and the alignment of the output signals, as the
computation has some cycles of latency.
The cause of this latency is the sorting process of the values in the WD and WO masks, as well as the final
[Link].
However, the sorting of WD and WO involves eight integers for each mask, and consequently the
algorithm is not trivial.
Number sorting is, in general, a complex problem that has been the subject of much study in order to improve the
[Link], there are very few particular cases where a sorting network that is
[Link] 9, so fortunately, there
is an optimal solution for this implementation. [11][12]
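For reference, one well-known 8-input network with 19 comparators and depth 6 is Batcher's odd-even merge sort. The sketch below groups its comparators into six layers of independent compare-and-swap operations (4, 4, 4, 2, 2 and 3 per layer), so it is not necessarily the exact network used in the design, whose per-cycle counts in the figure that follows are 4, 4, 4, 2, 3 and 2. The truncating average in `median8` is an assumption about the hardware.

```python
# Batcher's odd-even merge network for 8 inputs. Comparators inside a layer
# touch disjoint wires, so each layer can run in one pipelined clock cycle.
LAYERS_8 = [
    [(0, 1), (2, 3), (4, 5), (6, 7)],
    [(0, 2), (1, 3), (4, 6), (5, 7)],
    [(1, 2), (5, 6), (0, 4), (3, 7)],
    [(1, 5), (2, 6)],
    [(2, 4), (3, 5)],
    [(1, 2), (3, 4), (5, 6)],
]


def sort8(values):
    """Sort exactly 8 values in ascending order with the network above."""
    v = list(values)
    if len(v) != 8:
        raise ValueError("sort8 expects exactly 8 values")
    for layer in LAYERS_8:
        for a, b in layer:              # compare-and-swap, a < b
            if v[a] > v[b]:
                v[a], v[b] = v[b], v[a]
    return v


def median8(values):
    """Even count: mean of the two central values, truncated to an integer."""
    s = sort8(values)
    return (s[3] + s[4]) // 2


print(sort8([8, 1, 7, 2, 6, 3, 5, 4]))   # [1, 2, 3, 4, 5, 6, 7, 8]
```

By the zero-one principle, a comparator network sorts every input if it sorts all 2^8 = 256 binary inputs, which gives a cheap exhaustive check for a candidate network.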

[Figure: n = 8 network, 6 clock cycles with 4, 4, 4, 2, 3 and 2 comparisons (19 comparisons in total); n = 3 network, 3 clock cycles with 1 comparison each (3 comparisons in total).]

[Link] = 8 and n = [Link], because 8 is not an odd
number, the median in that case is the average of the 2 central values.
Using the n = 8 network for WD and WO, and delaying the write enable signal by the same number of cycles, the
implementation of the first stage of the median filter is complete (see Figure 5). The second stage is easier, as it
involves just 3 numbers: three cycles, with one comparison per cycle, are enough to sort the values. This completes
the implementation of equation 2.5.
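Putting both stages together, a per-pixel model of the two-stage median could look like the sketch below. The neighbour sets follow the hor_ver_data and diag_data vectors of the Matlab listing in Annex A (assumed here to correspond to WO and WD), with eight values each as described in this section; note that the first pass of that listing also appends the central pixel. `sorted()` stands in for the 19-comparator network, and the truncated average is an assumption.

```python
def median3(a, b, c):
    """Median of three with the comparator order (0,1), (1,2), (0,1):
    one compare-and-swap per clock cycle in hardware."""
    if a > b:
        a, b = b, a
    if b > c:
        b, c = c, b
    if a > b:
        a, b = b, a
    return b


def median8(values):
    """Median of 8 values: truncated mean of the two central ones.
    sorted() stands in for the 19-comparator sorting network."""
    s = sorted(values)
    return (s[3] + s[4]) // 2


def bmm_pixel(w):
    """Two-stage median for the centre of a 5x5 window w (5 lists of 5)."""
    # Horizontal/vertical neighbours at distance 1 and 2 (centre excluded)
    ortho = [w[2][0], w[2][1], w[2][3], w[2][4],
             w[0][2], w[1][2], w[3][2], w[4][2]]
    # Diagonal and anti-diagonal neighbours at distance 1 and 2
    diag = [w[0][0], w[1][1], w[3][3], w[4][4],
            w[3][1], w[4][0], w[1][3], w[0][4]]
    # Stage 2: median of both stage-1 results and the central pixel
    return median3(median8(ortho), median8(diag), w[2][2])


flat = [[10] * 5 for _ in range(5)]
flat[2][2] = 200                 # an isolated bright impulse
print(bmm_pixel(flat))           # 10
```

The impulse is removed because both directional medians ignore the centre and outvote it in the final median of three.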


4. Implementation results
This chapter presents the results of the implementation described in chapter 3. To do so, some sample
images will be shown in their original, Matlab-processed and hardware-processed forms, as well as their resulting
[Link], some key points will be highlighted.

4.1 Test pictures

[Link]
[Link]
of pixels and aspect ratios in order to test the intended ability of the system to deal with arbitrarily sized pictures.
Also, [Link]
in mind that the scale of the Matlab-processed images is different because of how Matlab operates with images,
giving a final result consisting of real values between 0 and 1 instead of an integer value in the 0-255 range.

Picture 1

Figure 21. Original picture 1 and its histogram.

[Link]; at right using the Matlab script.


[Link], using the hardware design; at right using Matlab.

[Link], histogram of the hardware output for picture 1; at right, histogram of the Matlab script's
output obtained with the same image.
[Link]
[Link], [Link],
the standard deviation of the Gaussian pre-filter is σ = 0.5, the clip limit is set at 3% and the gray ranges of the pixel
classification are 67–79 and 208–224. The differences between the two processed images are
[Link], which can be easily
associated with the lack of pixels in the brighter bins of its histogram, compared to the Matlab results. It is a
deviation already visible in the CLAHE images, prior to the noise removal process, so it can be concluded that the
[Link]
are used to make the synthesis less complex and/or other unidentified design bugs. On the other hand, the
hardware implementation is as good as the software one at eliminating noise. Running on top of a multicore
desktop processor clocked at more than 1 GHz, [Link]
simulation estimates that 28.5 milliseconds are needed to finish the operations with a 25 MHz clock, which is well
below the maximum 600 MHz clock of the board.
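As a quick check of what the 28.5 ms estimate means in clock cycles, the simulated time can be multiplied by the clock frequency. The per-pixel figure assumes that picture 1 is the 512x439 image mentioned in chapter 5; that pairing is an assumption, not something stated in this section.

$$
N_{\text{cycles}} = t \cdot f_{\text{clk}} = 28.5\times10^{-3}\,\text{s} \times 25\times10^{6}\,\text{Hz} = 7.125\times10^{5}
$$

$$
\frac{N_{\text{cycles}}}{512 \times 439} = \frac{712\,500}{224\,768} \approx 3.17\ \text{cycles per pixel}
$$

Under that assumption, the whole chain (CLAHE, mask generation and filtering) spends roughly three clock cycles per pixel.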


Picture 2
[Plot: input image histogram]

Figure 25. Original picture 2 and its histogram.

[Link]; at right using the Matlab script.

[Link], using the hardware design; at right using Matlab.



[Link], histogram of the hardware output for picture 2; at right, histogram of the Matlab script's
output obtained with the same image.
In the second example, [Link]
[Link]
darker images, [Link]: the equalization does
not make use of the brighter levels of gray. Still, the output of the design shows a good contrast enhancement
compared to the original, and the differences in filtering are indistinguishable. The processing times are 28.36
[Link]
is similar to the previous image, which has a similar number of pixels.

Picture 3
[Plot: input image histogram]

Figure 29. Original picture 3 and its histogram.

[Link]; at right using the Matlab script.


[Link], using the hardware design; at right using Matlab.


[Link], histogram of the hardware output for picture 3; at right, histogram of the Matlab script's
output obtained with the same image.
[Link] = 0.5 for the Gaussian
pre-filter, a clip limit of 8% and the gray ranges included in the classification to generate the masks are 0–20 and 210
[Link]: visually very similar, just a
[Link], [Link], the Matlab script needs 8.85 seconds to
[Link]
in both cases are expected, since the number of pixels to process is notably lower than in the previous pictures.

4.2 Summary

Overall, the results are very similar, [Link]
some minor imperfections that prevent it from being on par with the software results in terms of output quality,
[Link] probably due to some truncations in certain operations
where rounding should be implemented, or other hidden small implementation mistakes in the CLAHE
components, [Link], the selective filtering seems to work very well in all the pictures.
In terms of efficiency, the hardware implementation clearly stands out, with processing times that are several orders
of magnitude shorter and an output that is hard to distinguish without a side-by-side comparison and access to the
histograms.


5. Conclusions
In this last chapter, some conclusions and final thoughts about the work are presented and possible future
[Link], important events during development, decisions and the final
outcome of the design will be discussed, among other experiences and lessons learned.

5.1 Project results

According to the results presented in chapter 4, the results of the original algorithm have
been almost matched, [Link]
and in line with the results seen in the original implementation, with some minor imperfections. Also, the
selective filtering matches the original implementation pixel by pixel, giving a good smoothing effect if calibrated
correctly, but also the expected weaker results if the adjustments make it show up in undesired places.
It is capable of correctly processing (without glitches) images with arbitrary resolutions up to 512x512 pixels
without modifying or recompiling the description and, if necessary, the design can be easily scaled to
[Link]
[Link].
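The 512x512 limit follows from the widths of the memory ports. As a sketch, assuming one pixel per RAM/ROM word, the address width and pixel-count width for a given frame can be computed as below; for 512x512 they give the 18-bit addresses and 19-bit `numpixels` used in the VHDL of Annex B. The function name is hypothetical and 1920x1080 is only an example resolution.

```python
def memory_widths(width, height):
    """Return (address_bits, count_bits) for a width x height frame stored
    one pixel per word: address_bits indexes pixels 0..N-1, count_bits can
    also represent N itself (e.g. a pixel-count port)."""
    n = width * height
    if n <= 0:
        raise ValueError("frame must contain at least one pixel")
    return max(1, (n - 1).bit_length()), n.bit_length()


for w, h in [(512, 512), (1024, 1024), (1920, 1080)]:
    print(w, h, memory_widths(w, h))
# 512 512 (18, 19)
# 1024 1024 (20, 21)
# 1920 1080 (21, 21)
```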
In terms of efficiency versus speed, the improvements are also remarkable, in line with what one would expect from
the shift from a high-level preliminary software implementation in Matlab to a specific FPGA implementation.
While the original supplied code needed more than 30 seconds to process a single 512x439 image on a
multicore processor clocked at several GHz, the hardware design can potentially process the same picture in less
than 30 milliseconds with a theoretical 25 MHz clock speed while obtaining near-identical visual results, a speed-up of
more than 1000 times, which is a very welcome improvement. With a faster clock, which is achievable with the target
board, the results can be [Link], these numbers definitely make it a viable tool for
real-time video processing, [Link], the RAM
use has been quite low, [Link]
[Link], as will be
noted later in this chapter.
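The real-time claim can be made concrete with the 28.5 ms frame time estimated in chapter 4 for picture 1. This is a rough estimate that assumes the cycle count per frame stays fixed when the clock changes and that the design would meet timing at the higher frequency; the 100 MHz value is purely illustrative, not a measured result.

$$
\text{fps}_{25\,\text{MHz}} \approx \frac{1}{28.5\ \text{ms}} \approx 35\ \text{frames/s},
\qquad
\text{fps}_{100\,\text{MHz}} \approx 4 \times 35 = 140\ \text{frames/s}
$$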
However, due to time constraints, the design could not be tested on the physical target board as it was initially
[Link], [Link]
concepts and revising old ones, fully understanding all the details of an advanced image processing algorithm and
[Link]
those was very valuable and will be very useful in future projects, they also took much more time than expected
(almost two of the four available months of work), [Link], due
to some implementation difficulties found while working on the CLAHE logic (mainly the tile generation and
interpolation steps), this schedule rapidly became too tight to fully realize the initial plan in just four months.


5.2 Future work

Nevertheless, the final outcome of the project is overall satisfactory. There is a fully working simulation, and the
code could potentially be synthesized and tried in real hardware after spending just a few weeks tweaking the code for
the problematic divisions (probably with suitable IP cores) [Link]
[Link], to make the implementation suitable for
integration in more complex systems, it needs interfaces to acquire the input picture and deliver the equalized
output.
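Regarding the problematic divisions, a sequential divider is one common alternative to a combinational one. The sketch below models a radix-2 restoring divider that produces one quotient bit per iteration, which would be one clock cycle in hardware. It illustrates the technique only and is not the IP core the text refers to; the 19-bit default width mirrors the `numpixels` port in Annex B and is an assumption.

```python
def restoring_divide(dividend, divisor, width=19):
    """Radix-2 restoring division of unsigned integers.

    Each loop iteration produces one quotient bit, which corresponds to one
    clock cycle of a sequential hardware divider. Returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in `width` bits")
    remainder, quotient = 0, 0
    for i in range(width - 1, -1, -1):
        # Shift in the next dividend bit (MSB first)
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Trial subtraction: keep it only if the result is non-negative
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder


print(restoring_divide(224768, 439))  # (512, 0)
```

The trade-off is latency for area: a 19-bit quotient takes 19 cycles but needs only one subtractor and a few registers.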
Another welcome addition would be a straightforward method to change the Gaussian pre-filter without
recompiling the component. It would involve creating a component able to generate a kernel for a given
standard deviation and changing the filtering block to retrieve the kernel generated by the new block.
Additions aside, [Link]
that can be changed to get better performance, decrease the amount of area used or improve the output image.
First, as noted in chapter 4, the output images are a bit darker in the hardware implementation than with the Matlab
code. Analyzing the histograms, it can be easily seen that this is because the histogram equalization does not
relocate pixels to the highest gray levels, the ones closest to white, whereas the Matlab implementation
distributes them better. This is probably related to certain truncations during the CLAHE step, mainly in the
equalization and interpolation block. Rounding was not initially implemented, in order to simplify both code and logic
in those sections of the system. There might also be other factors related to potential differences between the
mathematical algorithms used by Matlab and the hardware design.
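A minimal sketch of why truncation biases results downwards: truncating integer division loses almost half a gray level on average, while adding half the divisor before dividing (round half up, one extra adder) removes nearly all of that bias. The divisor 100 is borrowed from the Gaussian filter as an example; which divisions the CLAHE blocks actually perform is not shown in this section, and the helper names are hypothetical.

```python
def div_trunc(a, b):
    """Plain integer division, as a simple divider returns it (truncation)."""
    return a // b


def div_round(a, b):
    """Round-half-up division: add half the divisor before dividing."""
    return (a + b // 2) // b


def mean_bias(divide, b, n):
    """Average of (exact quotient - integer quotient) over numerators 0..n-1."""
    return sum(a / b - divide(a, b) for a in range(n)) / n


# Divisor 100; the numerators cover 256 full periods of the remainder.
print(mean_bias(div_trunc, 100, 25600))  # about +0.495: results too low
print(mean_bias(div_round, 100, 25600))  # about -0.005: nearly unbiased
```

When several truncating steps are chained, these per-step losses add up, which is consistent with the missing pixels in the brightest histogram bins.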
Another point in the CLAHE block that can be improved is the storage and access of the block RAMs that store
the histogram data of the different tiles. With some work, the duplicated tile RAMs could be scrapped and
[Link], the
total RAM usage of the system can be lowered.
Also, in order to improve the latency and further reduce the amount of RAM used, the memory that stores
the CLAHE-processed image could be replaced by a small FIFO that acts as a buffer and begins the filtering
process when the first CLAHE-processed pixels appear (pipelining). The same is possible with the binary masks, but
[Link], not just the RAM usage would
be lower, but also the latency would be considerably reduced and the area corresponding to certain address
counters would be cut too, [Link]
way because the address counters and the RAMs were inherited from the initial test benches for the isolated
components of the implementation.
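To give an idea of the possible saving, here is a rough comparison of a full-frame buffer with a line buffer for a 5x5 window. It assumes 512-pixel rows, 8-bit gray levels and CLAHE output produced in raster order; all three are assumptions made for this estimate only.

$$
\underbrace{512 \times 512 \times 8}_{\text{frame RAM}} = 2\,097\,152\ \text{bit}
\qquad\text{vs.}\qquad
\underbrace{4 \times 512 \times 8}_{\text{4 line FIFOs}} = 16\,384\ \text{bit}
$$

That is a factor of 128. For each 1-bit binary mask, the same comparison gives 262 144 bit against 2 048 bit.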
[Link],
depending on the final application, it might be desirable to modify other parts of the algorithm in order to
prioritize either area or performance in places where they can conflict.


6. Annexes
A. Matlab codes
Algorithm Matlab Implementation (Author: Badrun Nahar)
tic;
input_image= imread('C:\Users\Roger\Documents\UPC\TFG\imatges\microscopic_merge2.jpg');
%input_image= imread('/home/roger/Documents/UPC/TFG/imatges/[Link]');
A1=rgb2gray(input_image);

cliplimit_cla=0.08;
grid_size=[8 8];
figure, imshow(A1),title('input image');
figure, imhist(A1),title('input image histogram');
% CLAHE Enhanced Image
enhanced_A=adapthisteq(A1,'ClipLimit',cliplimit_cla,'NumTiles',grid_size);
figure,imshow(enhanced_A),title('After CLAHE');
enhanced_A1=double(enhanced_A)/255;
filt_gaussian=fspecial('gaussian', [5 5], 0.5);
enhanced_A12=imfilter(enhanced_A1,filt_gaussian,'conv','replicate');

figure, imshow(enhanced_A12),title('pre-filtered output');


%size adjustment: 2-pixel border built by replicating the edge pixels
[m n]=size(A1);
x1=zeros(m+4,n+4);
x1(3:m+2,3:n+2)=enhanced_A12(:,:);
x1(1,3:n+2)=enhanced_A12(1,1:n);
x1(2,3:n+2)=enhanced_A12(1,1:n);
x1(m+3,3:n+2)=enhanced_A12(m,1:n);
x1(m+4,3:n+2)=enhanced_A12(m,1:n);
x1(3:m+2,1)=enhanced_A12(1:m,1);
x1(3:m+2,2)=enhanced_A12(1:m,1);
x1(3:m+2,n+3)=enhanced_A12(1:m,n);
x1(3:m+2,n+4)=enhanced_A12(1:m,n);
x1(1,1)=enhanced_A12(1,1);
x1(1,2)=enhanced_A12(1,1);
x1(2,1)=enhanced_A12(1,1);
x1(2,2)=enhanced_A12(1,1);
x1(1,n+3)=enhanced_A12(1,n);
x1(1,n+4)=enhanced_A12(1,n);
x1(2,n+3)=enhanced_A12(1,n);
x1(2,n+4)=enhanced_A12(1,n);
x1(m+3,1)=enhanced_A12(m,1);
x1(m+3,2)=enhanced_A12(m,1);
x1(m+4,1)=enhanced_A12(m,1);
x1(m+4,2)=enhanced_A12(m,1);
x1(m+3,n+3)=enhanced_A12(m,n);
x1(m+3,n+4)=enhanced_A12(m,n);
x1(m+4,n+3)=enhanced_A12(m,n);
x1(m+4,n+4)=enhanced_A12(m,n);

% Clustering original image


d=zeros(size(A1));


for i=1:m*n
if (A1(i)>=0 && A1(i)<=20)||(A1(i)>=210 && A1(i)<=250)
d(i)=1;
else d(i)=0;
end
end
figure,imshow(d),title('gray level thresholding');
%Region Correction for low-pass2
count1=8*ones(m+4,n+4);
count2=8*ones(m+4,n+4);
d1=zeros(m+4,n+4);
d1(3:m+2,3:n+2)=d(:,:);
d2=d1;
d3=d1;
for i=3:m+2
for j=3:n+2
if d1(i,j)==0
count1(i,j)=count1(i,j)-(d1(i-1,j-1)+d1(i-1,j)+d1(i-1,j+1)+d1(i,j-1)+d1(i,j+1)+d1(i+1,j-1)+d1(i+1,j)+d1(i+1,j+1));
if count1(i,j)<=1
d2(i,j)=1;
%elseif ((d1(i-1,j-1)==d1(i-1,j+1)==d1(i+1,j+1)==d1(i+1,j-1)) && (d1(i-1,j)==d1(i,j+1)==d1(i+1,j)==d1(i,j-1)))&& (d1(i-1,j-1)~=d1(i-1,j))
% d2(i,j)=1;
end
end
end
end
%figure,imshow(d2(3:m+2,3:n+2)),title('Group-1 & Group-3 corrected for low-pass2');

%%% Region correction for low-pass1


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Group-1,group-3 & Group-2 with some preservation correction

for i=3:m+2
for j=3:n+2
if d1(i,j)==0
count2(i,j)=count2(i,j)-(d1(i-1,j-1)+d1(i-1,j)+d1(i-1,j+1)+d1(i,j-1)+d1(i,j+1)+d1(i+1,j-1)+d1(i+1,j)+d1(i+1,j+1));
if count2(i,j)<=3
%if ((d1(i+1,j)==d1(i+1,j+1)==d1(i,j+1)==0)||(d1(i,j+1)==d1(i-1,j+1)==d1(i-1,j)==0)...
%    ||(d1(i,j-1)==d1(i-1,j-1)==d1(i-1,j)==0)||(d1(i,j-1)==d1(i+1,j-1)==d1(i+1,j)==0))
%d3(i,j)=0;
%else d3(i,j)=1;
%end
%elseif count2(i,j)<=2
d3(i,j)=1;
elseif ((d1(i-1,j-1)==d1(i-1,j+1)==d1(i+1,j+1)==d1(i+1,j-1)) && (d1(i-1,j)==d1(i,j+1)==d1(i+1,j)==d1(i,j-1)))&& (d1(i-1,j-1)~=d1(i-1,j))
d3(i,j)=1;
end
end
end
end
%figure,imshow(d3(3:m+2,3:n+2)),title('Group-1,Group-3 & Group-2 with some preservation corrected: R. C. for low-pass1');

% Discriminative filtering

filtered_A123=zeros(m+4,n+4);


%low-pass 1, binary mask d3(3:m+2,3:n+2)


for i=3:m+2
for j=3:n+2
if d3(i,j)==1
hor_ver_data= [x1(i,j-2) x1(i,j-1) x1(i,j+1) x1(i,j+2) x1(i-2,j) x1(i-1,j) ...
    x1(i+1,j) x1(i+2,j) x1(i,j)]; Mr=median(hor_ver_data);
diag_data = [x1(i-2, j-2) x1(i-1, j-1) x1(i+1,j+1) x1(i+2,j+2) x1(i+1,j-1) ...
    x1(i+2,j-2) x1(i-1,j+1) x1(i-2,j+2) x1(i,j)]; Md=median(diag_data);
vect1=[Mr Md x1(i,j)];
filtered_A123(i,j)= median(vect1);
else filtered_A123(i,j)=x1(i,j);
end
end
end

filtered_A1234=zeros(m+4,n+4);
%low-pass 2, binary mask d2(3:m+2,3:n+2)
for i=3:m+2
for j=3:n+2
if d2(i,j)==1
hor_ver_data1= [filtered_A123(i,j-2) filtered_A123(i,j-1) filtered_A123(i,j+1) ...
    filtered_A123(i,j+2) filtered_A123(i-2,j) filtered_A123(i-1,j) filtered_A123(i+1,j) ...
    filtered_A123(i+2,j) ]; Mr1=median(hor_ver_data1);
diag_data1 = [filtered_A123(i-2, j-2) filtered_A123(i-1, j-1) ...
    filtered_A123(i+1,j+1) filtered_A123(i+2, j+2) filtered_A123(i+1,j-1) filtered_A123(i+2, j-2) ...
    filtered_A123(i-1,j+1) filtered_A123(i-2, j+2) ]; Md1=median(diag_data1);
vect2=[Mr1 Md1 filtered_A123(i,j)];
filtered_A1234(i,j)= median(vect2);
else filtered_A1234(i,j)= filtered_A123(i,j);
end
end
end

filtered_A12345=zeros(m+4,n+4);
%low-pass 3, binary mask d2(3:m+2,3:n+2)
for i=3:m+2
for j=3:n+2
if d2(i,j)==1
hor_ver_data2= [filtered_A1234(i,j-2) filtered_A1234(i,j-1) filtered_A1234(i,j+1) ...
    filtered_A1234(i,j+2) filtered_A1234(i-2,j) filtered_A1234(i-1,j) filtered_A1234(i+1,j) ...
    filtered_A1234(i+2,j)]; Mr2=median(hor_ver_data2);
diag_data2 = [filtered_A1234(i-2, j-2) filtered_A1234(i-1, j-1) ...
    filtered_A1234(i+1,j+1) filtered_A1234(i+2,j+2) filtered_A1234(i+1,j-1) filtered_A1234(i+2,j-2) filtered_A1234(i-1,j+1) filtered_A1234(i-2,j+2)]; Md2=median(diag_data2);
vect3=[Mr2 Md2 filtered_A1234(i,j)];
filtered_A12345(i,j)= median(vect3);
else filtered_A12345(i,j)= filtered_A1234(i,j);
end
end
end
figure,imshow(d3(3:m+2,3:n+2)),title('region corrected mask for low-pass1');
figure,imshow(d2(3:m+2,3:n+2)),title('region corrected mask for low-pass2');
%figure,imshow(d3-d2);
figure,imshow(filtered_A123(3:m+2,3:n+2)); title('output of 1st stage of discriminative filtering by 5x5 BMM');
figure,imshow(filtered_A1234(3:m+2,3:n+2)); title('output of 2nd stage of discriminative filtering by 5x5 BMM')
toc;
%figure,imshow(filtered_A12345(3:m+2,3:n+2));%title('output of 3rd stage by 5X5 mult med');
%toc;


Script to convert bitmaps to .coe format (suitable for recording into ROM with Xilinx Coregen)
Adapted from [14].

%A1=rgb2gray(input_image);

%read bmp data in and display it to the screen


%image_name='C:\Users\Roger\Documents\UPC\TFG\imatges\im_grisa.bmp';
image_name='/home/roger/Documents/UPC/TFG/imatges/[Link]';
input_image= imread(image_name);
imdata=rgb2gray(input_image);
%imdata=input_image;
image(imdata);
numpixels=numel(imdata);
%create .COE file
COE_file=image_name;
COE_file(end-2:end)='coe';
fid=fopen(COE_file,'w');
%write header information
fprintf(fid,';******************************************************************\n');
fprintf(fid,';****            BMP file in .COE Format            *****\n');
fprintf(fid,';******************************************************************\n');
fprintf(fid,'; This .COE file specifies initialization values for a\n');
fprintf(fid,'; block memory of depth= %d, and width=8. In this case,\n',numpixels);
fprintf(fid,'; values are specified in hexadecimal format.\n');
%start writing data to the file
fprintf(fid,'memory_initialization_radix=16;\n');
fprintf(fid,'memory_initialization_vector=\n');
%convert image data to row major
newimdata=transpose(double(imdata));
%write image data to file
for j=1:(numpixels-1)
fprintf(fid,'%s,\n',dec2hex(newimdata(j),2));
end
%last data value supposed to have a semicolon instead of a comma
fprintf(fid,'%s;\n',dec2hex(newimdata(numpixels)));
%clean shutdown
fclose(fid)


Script to read and show image from RAM dump (.mem Modelsim file)
clear
clc
A=fopen('[Link]');
B =fgetl(A);
B =fgetl(A);
B =fgetl(A);
width=415;
height=265;
%width=512;
%height=512;
empty=512*512-width*height;
rubbish = 0;
for i=1:(empty)
rubbish = rubbish + fscanf(A, '%u\n', 1);
end
for i=1:height
i2=height+1-i;
for j=1:width
j2=width+1-j;
C(i2,j2)=uint8(fscanf(A, '%u\n', 1));
end
end
figure;
imshow(C);
fclose(A)


B. Project VHDL code
Binary_correction_int.vhd
------------------------------------------------------------------------
-- Original smooth_filter: Nria Ordua
-- Modified by: Roger Oliv
-- Concordia University
-- 2012-2013
------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
--use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.NUMERIC_STD.ALL;

entity binary_correction_less is
port ( clk, clearn : in std_logic;
su_flag : in std_logic;
binary_in : in std_logic_vector(0 downto 0);
binary_out1 : out std_logic_vector(0 downto 0);
binary_out2 : out std_logic_vector(0 downto 0);
set_up_flag : out std_logic;
im_width : in unsigned(10 downto 0);
numpixels : in unsigned(18 downto 0)
);
end binary_correction_less;

architecture behavior of binary_correction_less is

component binary_fifo
port (
clk: IN std_logic;
rst: IN std_logic;
din: IN std_logic_VECTOR(0 downto 0);
wr_en: IN std_logic;
rd_en: IN std_logic;
dout: OUT std_logic_VECTOR(0 downto 0);
full: OUT std_logic;
empty: OUT std_logic;
data_count: OUT std_logic_VECTOR(8 downto 0));
end component;
------------------------------------------------------------------------
-- Signal Declarations
------------------------------------------------------------------------

type data_win is array (0 to 4) of unsigned (0 downto 0);

signal row1 : data_win;
signal row2 : data_win;
signal row3 : data_win;
signal row4 : data_win;
signal row5 : data_win;
signal sbinary_in, sbinary_out1, sbinary_out2 : unsigned(0 downto 0);
signal data_in1 : unsigned (0 downto 0);
signal data_out1 : std_logic_vector (0 downto 0);
signal data_in2 : unsigned (0 downto 0);
signal data_out2 : std_logic_vector (0 downto 0);
signal data_in3 : unsigned (0 downto 0);
signal data_out3 : std_logic_vector (0 downto 0);
signal data_in4 : unsigned (0 downto 0);
signal data_out4 : std_logic_vector (0 downto 0);

signal t_setup : unsigned (18 downto 0);


signal activated : std_logic;


signal data_count1, data_count2, data_count3, data_count4 : std_logic_vector(8 downto 0);


signal fifo_size : unsigned(10 downto 0);
signal wr_en, rd_en1, rd_en2, rd_en3, rd_en4, rst : std_logic;

begin
fifo1 : binary_fifo
port map (
clk => clk,
rst => rst,
din => std_logic_vector(data_in1),
wr_en => wr_en,
rd_en => rd_en1,
dout => data_out1,
full => open,
empty => open,
data_count => data_count1);
fifo2 : binary_fifo
port map (
clk => clk,
rst => rst,
din => std_logic_vector(data_in2),
wr_en => wr_en,
rd_en => rd_en2,
dout => data_out2,
full => open,
empty => open,
data_count => data_count2);
fifo3 : binary_fifo
port map (
clk => clk,
rst => rst,
din => std_logic_vector(data_in3),
wr_en => wr_en,
rd_en => rd_en3,
dout => data_out3,
full => open,
empty => open,
data_count => data_count3);
fifo4 : binary_fifo
port map (
clk => clk,
rst => rst,
din => std_logic_vector(data_in4),
wr_en => wr_en,
rd_en => rd_en4,
dout => data_out4,
full => open,
empty => open,
data_count => data_count4);

------------------------------------------------------------------------
-- Module Implementation
------------------------------------------------------------------------

process (clk)
begin
if (clk'event and clk = '1') then
if (clearn = '0') then
t_setup <= (others => '0');
else
if su_flag = '1' and t_setup < (numpixels+im_width*2+3) then
t_setup <= t_setup + 1;
end if;
end if;
end if;
end process;

process (clk)
variable sync_cnt : integer range 0 to 6 := 0;
begin
if (clk'event and clk = '1') then --initialization

if (clearn = '0') then


set_up_flag <= '0';
activated <= '0';
for j in 0 to 4 loop -- fifo reset
row1(j) <= (others => '0');
row2(j) <= (others => '0');
row3(j) <= (others => '0');
row4(j) <= (others => '0');
row5(j) <= (others => '0');
end loop;

sbinary_out1 <= (others => '0');


sbinary_out2 <= (others => '0');
data_in1 <= (others => '0');
data_in2 <= (others => '0');
data_in3 <= (others => '0');
data_in4 <= (others => '0');
rd_en1 <= '0';
rd_en2 <= '0';
rd_en3 <= '0';
rd_en4 <= '0';
elsif su_flag = '1' then --Shifts all the registers and fifo's data one position
row1(0) <= sbinary_in;
row1(1 to 4) <= row1(0 to 3);
data_in1 <= row1(4);
if (unsigned(data_count1) >= fifo_size) then --Maintain a constant amount of data in the fifo
rd_en1 <= '1';
--components depending on the image size
row2(0) <= unsigned(data_out1);
else
rd_en1 <= '0';
row2(0) <= (others=>'0');
end if;
row2(1 to 4) <= row2(0 to 3);
data_in2 <= row2(4);
if (unsigned(data_count2) >= fifo_size) then
rd_en2 <= '1';
row3(0) <= unsigned(data_out2);
else
rd_en2 <= '0';
row3(0) <= (others=>'0');
end if;
row3(1 to 4) <= row3(0 to 3);
data_in3 <= row3(4);
if (unsigned(data_count3) >= fifo_size) then
rd_en3 <= '1';
row4(0) <= unsigned(data_out3);
else
rd_en3 <= '0';
row4(0) <= (others=>'0');
end if;
row4(1 to 4) <= row4(0 to 3);
data_in4 <= row4(4);
if (unsigned(data_count4) >= fifo_size) then
rd_en4 <= '1';
row5(0) <= unsigned(data_out4);
else
rd_en4 <= '0';
row5(0) <= (others=>'0');
end if;
row5(1 to 4) <= row5(0 to 3);
if t_setup >= (im_width*2+3) and t_setup < (numpixels+im_width*2+2) then -- actually +31+9
set_up_flag <= '1';
activated <= '1'; --Apply the convolution operation for the current pixel and its window.
sync_cnt := 0;

--Mask with broader filtering--
if (row3(2)= 0 and ((((resize(not(row2(1)),4)+not(row2(2))+not(row2(3))+
not(row3(1))+not(row3(3))+
not(row4(1))+not(row4(2))+not(row4(3))) < 4) and

not((row2(1)=0
and row3(1)=0
and row4(1)=1

and row2(2)=0
and row2(3)=1
and row3(3)=1
and row4(2)=1
and row4(3)=1)

or (row2(1)=1
and row3(1)=1
and row4(1)=1

and row2(2)=0
and row2(3)=0
and row3(3)=0
and row4(2)=1
and row4(3)=1)

or (row2(1)=1
and row3(1)=1
and row4(1)=1

and row2(2)=1
and row2(3)=1
and row3(3)=0
and row4(2)=0
and row4(3)=0)

or (row2(1)=1
and row3(1)=0
and row4(1)=0

and row2(2)=1
and row2(3)=1
and row3(3)=1
and row4(2)=0
and row4(3)=1)))

or (row2(1)=0
and row3(1)=1
and row4(1)=0

and row2(2)=1
and row2(3)=0
and row3(3)=1
and row4(2)=1
and row4(3)=0)

or (row2(1)=1
and row3(1)=0
and row4(1)=1

and row2(2)=0
and row2(3)=1
and row3(3)=0
and row4(2)=0
and row4(3)=1))) then

sbinary_out1 <= "1";


else
sbinary_out1 <= row3(2);
end if;
--Mask with more restricted filtering--
if (row3(2)= 0 and ((resize(not(row2(1)),4)+not(row2(2))+not(row2(3))+
not(row3(1))+not(row3(3))+
not(row4(1))+not(row4(2))+not(row4(3))) < 3)) then
sbinary_out2 <= "1";
else
sbinary_out2 <= row3(2);
end if;
else
if activated <= '1' then
--if sync_cnt < 6 then
-- sync_cnt := sync_cnt + 1;
--else
set_up_flag <= '0'; --Finish
--end if;
end if;
end if;

end if;
end if;
end process;
wr_en <= su_flag;
fifo_size <= im_width - 8; -- Added a -1 initially not forecasted
rst <= not(clearn);
sbinary_in <= unsigned(binary_in);
binary_out1 <= std_logic_vector(sbinary_out1);
binary_out2 <= std_logic_vector(sbinary_out2);

end behavior;

binary_generator_int2.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mask_generator is
port(
--global clock signal, active with its rising edge
clk: in std_logic;
--reset signal, synchronous and active high
reset: in std_logic;
--Trigger to start the generation
pulse_start_input: in std_logic;
limit1_t : in unsigned(7 downto 0);
limit1_b : in unsigned(7 downto 0);
limit2_t : in unsigned(7 downto 0);
limit2_b : in unsigned(7 downto 0);
end_flag : out std_logic;
rom_addrb : out std_logic_vector(17 downto 0);
rom_doutb : in std_logic_vector(7 downto 0);
im_width : in unsigned(9 downto 0);
numpixels : in unsigned(18 downto 0);
binary_wea : out std_logic_vector(0 downto 0);
binary_addra : out std_logic_vector(17 downto 0);
binary1_dina : out std_logic_vector(0 downto 0);
binary2_dina : out std_logic_vector(0 downto 0)
);
end mask_generator;
architecture bench of mask_generator is
signal device_data, data_in, filter_out1, filter_out2 : std_logic_vector(0 downto 0); --current pixel value
signal ram_wr_addr, ram_wr_addr2 : unsigned(17 downto 0); --address to be accessed in the RAM containing the histogram
signal pulse_out, pulse_out2, pulse_out3, end_flag_signal: std_logic;
signal new_width, width_counter, width_counter2 : unsigned(10 downto 0);
signal image_addr: unsigned(18 downto 0);
signal pre_binary : std_logic_vector(7 downto 0);
signal nrst : std_logic;
signal wren : std_logic_vector(0 downto 0);
signal new_numpixels : unsigned(18 downto 0);

component binary_correction_less is
port ( clk, clearn : in std_logic;
su_flag : in std_logic;
binary_in : in std_logic_vector(0 downto 0);
binary_out1 : out std_logic_vector(0 downto 0);
binary_out2 : out std_logic_vector(0 downto 0);
set_up_flag : out std_logic;
im_width : in unsigned(10 downto 0);
numpixels : in unsigned(18 downto 0)
);
end component;

begin

corrector_l : binary_correction_less
port map(
clk => clk,
clearn => nrst,

su_flag => pulse_out3,


binary_in => device_data,
binary_out1 => filter_out1,
binary_out2 => filter_out2,
set_up_flag => wren(0),
im_width => new_width,
numpixels => new_numpixels);

process(clk) --Process containing an address counter to read the image in the
--ROM memory sequentially and compute its histogram or transformed version.
begin
if (CLK'EVENT AND CLK = '1') then
if reset = '1' or pulse_out='0' then
image_addr <= to_unsigned(0, 19);
width_counter <= (others=>'0');
device_data <= (others=>'0');
elsif unsigned(image_addr) >= (unsigned(numpixels)+1) then
image_addr <= image_addr;
width_counter <= width_counter;
device_data <= (others=>'0');
elsif width_counter = (im_width) then
image_addr <= image_addr;
width_counter <= width_counter + 1;
device_data <= data_in;
elsif width_counter = (im_width+1) then
image_addr <= image_addr;
width_counter <= width_counter + 1;
device_data <= (others=>'0');
elsif width_counter = (new_width) then
image_addr <= image_addr+1;
width_counter <= width_counter + 1;
width_counter <= "00000000001";
device_data <= (others=>'0');
else
image_addr <= image_addr + 1;
width_counter <= width_counter + 1;
device_data <= data_in;
end if;
if reset='1' then
pulse_out <= '0';
pulse_out2 <= '0';
pulse_out3 <= '0';
else
pulse_out<=pulse_start_input or pulse_out;
pulse_out2 <= pulse_out;
pulse_out3 <= pulse_out2;
end if;
end if;

if (CLK'EVENT AND CLK = '1') then


if reset = '1' or wren="0" then
ram_wr_addr <= to_unsigned(0, 18);
width_counter2 <= (others=>'0');
binary1_dina <= (others=>'0');
binary2_dina <= (others=>'0');
elsif unsigned(ram_wr_addr) >= (unsigned(numpixels) - 1) then
ram_wr_addr <= ram_wr_addr;
width_counter2 <= width_counter2;
binary1_dina <= filter_out1;
binary2_dina <= filter_out2;
elsif width_counter2 = (im_width) or width_counter2 = (im_width + 1) then
ram_wr_addr <= ram_wr_addr;
width_counter2 <= width_counter2 + 1;
binary1_dina <= filter_out1;
binary2_dina <= filter_out2;
elsif width_counter2 = (im_width + 2) then

ram_wr_addr <= ram_wr_addr+1;


width_counter2 <= "00000000001";
binary1_dina <= filter_out1;
binary2_dina <= filter_out2;
else
ram_wr_addr <= ram_wr_addr + 1;
width_counter2 <= width_counter2 + 1;
binary1_dina <= filter_out1;
binary2_dina <= filter_out2;
end if;
if reset = '1' then
end_flag_signal <= '0';
elsif unsigned(ram_wr_addr) >= (unsigned(numpixels) - 1) then
end_flag_signal <= '1';
else
end_flag_signal<=end_flag_signal;
end if;

ram_wr_addr2 <= ram_wr_addr;


end if;
end process;

nrst <= not(reset);


new_width <= resize(im_width, 11)+2;
new_numpixels <= numpixels+resize(2*numpixels/im_width, 19);
data_in(0) <= '1' when ((unsigned(pre_binary) >= limit1_b) and (limit1_t >=
unsigned(pre_binary))) or ((unsigned(pre_binary) >= limit2_b) and (limit2_t >=
unsigned(pre_binary)))
else '0';
pre_binary <= rom_doutb;
rom_addrb <= std_logic_vector(image_addr(17 downto 0));
binary_wea <= wren;
binary_addra <= std_logic_vector(ram_wr_addr2);
end_flag <= end_flag_signal;

end bench;


clahe_complete4.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity clahe is
port (
clk : in std_logic;
rst : in std_logic;
start_cntr : in std_logic;
end_flag : out std_logic; --marks the end of the histogram calculation
numpixels : in unsigned(18 downto 0);
x_size : in unsigned(9 downto 0);
y_size : in unsigned(9 downto 0);
clip_limit : in unsigned(6 downto 0);
rom_addra : out std_logic_vector(17 downto 0);
rom_douta : in std_logic_vector(7 downto 0);
ram_wea : out std_logic_vector(0 downto 0);
ram_addra : out std_logic_vector(17 downto 0);
ram_dina : out std_logic_vector(7 downto 0)
);
end clahe;
architecture wrapper of clahe is

component histo_ram2 --Memory to store the resulting histogram


port (
clka: IN std_logic;
wea: IN std_logic_VECTOR(0 downto 0);
addra: IN std_logic_VECTOR(7 downto 0);
dina: IN std_logic_VECTOR(18 downto 0);
clkb: IN std_logic;
rstb: IN std_logic;
addrb: IN std_logic_VECTOR(7 downto 0);
doutb: OUT std_logic_VECTOR(18 downto 0));
end component;

component clipping_wrapper
port (
clk : in std_logic;
rst : in std_logic;
start_cntr : in std_logic; --triggers the beginning of the operation
--dataout : out std_logic_vector(17 downto 0); -- RAM data in
end_flag : out std_logic; --marks the end of the histogram clipping
numpixels : in unsigned(18 downto 0); --Total number of pixels in the image
clip_limit : in unsigned(6 downto 0); --Tolerated bin limit
histo_wea : out std_logic_vector(0 downto 0);
histo_addra : out std_logic_vector(7 downto 0);
histo_dina : out std_logic_vector(18 downto 0);
histo_addrb : out std_logic_vector(7 downto 0);
histo_doutb : in std_logic_vector(18 downto 0);
cdf_min : out unsigned(18 downto 0)
);
end component;
component histogram_wrapper
port(
--global clock signal, active with its rising edge
clk: in std_logic;
--reset signal, synchronous and active high
reset: in std_logic;
--number of pixels of the image
cntr_value: in std_logic_vector(18 downto 0);
--Trigger to start the histogram generation
pulse_start_input: in std_logic;
--Output of the histogram data
--histogram_out: out std_logic_vector(17 downto 0);
im_douta : in std_logic_vector(7 downto 0);
histo_dina : out std_logic_vector(18 downto 0);
histo_addra : out std_logic_vector(7 downto 0);

histo_wea : out std_logic_vector(0 downto 0);
histo_addrb : out std_logic_vector(7 downto 0);
histo_doutb : in std_logic_vector(18 downto 0);
end_flag : out std_logic
);
end component;
component image_tiling
port ( romraddr : out std_logic_vector(17 downto 0) ; -- device data as address for RAM
datain : in std_logic_vector (7 downto 0); -- RAM data out
clk : in std_logic;
ramwraddr : out std_logic_vector(17 downto 0);
dataout : out std_logic_vector(7 downto 0); -- RAM data in
numx : out unsigned(2 downto 0);
numy : out unsigned(2 downto 0);
rst : in std_logic;
start_cntr : in std_logic;
wren : out std_logic;
end_flag : out std_logic; --marks the end of the distribution
numpixels : in unsigned(18 downto 0); --number of pixels of the image
tile_numpixels : out unsigned(18 downto 0); --number of pixels of the tile
x_size : in unsigned(9 downto 0);
y_size : in unsigned(9 downto 0);
xtile : out unsigned(9 downto 0);
ytile : out unsigned(9 downto 0)
);
end component;
component histogram_equalizer is
port ( histo_raddr : out std_logic_vector(7 downto 0) ; -- device data as address for RAM
histo_in_ul : in std_logic_vector (18 downto 0); -- histogram CDF value
histo_in_ur : in std_logic_vector (18 downto 0); -- histogram CDF value
histo_in_ll : in std_logic_vector (18 downto 0); -- histogram CDF value
histo_in_lr : in std_logic_vector (18 downto 0); -- histogram CDF value
rom_raddr : out std_logic_vector(17 downto 0); --image pixel address
rom_in : std_logic_vector(7 downto 0);--image pixel value
clk : in std_logic;
clhe_wraddr : out std_logic_vector( 17 downto 0); --Address for the transformed pixel to write
rst : in std_logic;
start_cntr : in std_logic; --Triggers the transformation operations
wren : out std_logic;
clhe_out : out std_logic_vector(7 downto 0); -- Output for transformed pixel value
end_flag : out std_logic; --marks the end of the histogram calculation
numpixels : in unsigned(18 downto 0);
im_width : in unsigned(9 downto 0);
im_height : in unsigned(9 downto 0);
x_size : in unsigned(9 downto 0);--Subimage
y_size : in unsigned(9 downto 0);--Subimage
ul_id : out unsigned(7 downto 0);
ur_id : out unsigned(7 downto 0);
ll_id : out unsigned(7 downto 0);
lr_id : out unsigned(7 downto 0);
numpixels_ul : in unsigned(18 downto 0); --number of pixels of the image
histo_min_ul : in unsigned(18 downto 0); --Lowest CDF value of the histogram
numpixels_ur : in unsigned(18 downto 0); --number of pixels of the image
histo_min_ur : in unsigned(18 downto 0); --Lowest CDF value of the histogram
numpixels_ll : in unsigned(18 downto 0); --number of pixels of the image
histo_min_ll : in unsigned(18 downto 0); --Lowest CDF value of the histogram
numpixels_lr : in unsigned(18 downto 0); --number of pixels of the image
histo_min_lr : in unsigned(18 downto 0) --Lowest CDF value of the histogram
);
end component;

type HISTO_IO is array (0 to 99) of std_logic_vector(18 downto 0);
type HISTO_ADDR is array (0 to 99) of std_logic_vector(7 downto 0);
type HISTO_WEA_T is array (0 to 99) of std_logic_vector(0 downto 0);
type TILESIZE is array (0 to 99) of unsigned(18 downto 0);

signal histo_wea1, histo_wea2, wren_trans : std_logic_vector(0 downto 0);


signal histo_wea : HISTO_WEA_T;
signal histo_addra1, histo_addra2, histo_addrb1, histo_addrb2, tiler_datain, tiler_dataout_a,
histo_raddr_trans, rom_in_trans, clhe_out_trans : std_logic_vector (7 downto 0);
signal histo_addra, histo_addrb : HISTO_ADDR;

signal tiler_ramraddr, rom_raddr_trans, clhe_wraddr_trans : std_logic_vector (17 downto 0);


signal histo_dina1, histo_dina2, histo_doutb1, histo_doutb2, histo_in_ul_trans,
histo_in_ur_trans, histo_in_ll_trans, histo_in_lr_trans : std_logic_vector (18 downto 0);
signal histo_dina, histo_doutb : HISTO_IO;
signal start_clipping, start_clipping_pre, start_clipping_pre2, rstclip, start_cntr_trans :
std_logic;
signal histo_start_cntr_pre, histo_start_cntr, end_flag_clipper, tiler_start_cntr, tiler_wren,
tiler_wren_pre, tiler_end_flag, end_flag_trans, transform, rst_trans : std_logic;
signal numx, numy : unsigned(2 downto 0);
signal numtile_int : integer range 0 to 99;
signal numpixels_tile, numpixels_tile_pre, cdfmin, numpixels_ul_trans, numpixels_ur_trans,
numpixels_ll_trans, numpixels_lr_trans, histo_min_ul_trans, histo_min_ur_trans,
histo_min_ll_trans, histo_min_lr_trans : unsigned(18 downto 0);
signal tile_numpixels, cdf_min : TILESIZE;
signal x_size_sub, y_size_sub : unsigned(9 downto 0);
signal ul_id_trans, ur_id_trans, ll_id_trans, lr_id_trans : unsigned(7 downto 0);

begin
--Connections between all the memory blocks and the computation block

tiles : for I in 0 to 99 generate


histogram_ram : histo_ram2
port map (
clka => clk,
dina => histo_dina(I),
addra => histo_addra(I),
wea => histo_wea(I),
clkb => clk,
rstb => '0',
addrb => histo_addrb(I),
doutb => histo_doutb(I));
end generate;

histo_wrapper : histogram_wrapper
port map(
--global clock signal, active with its rising edge
clk => clk,
--reset signal, synchronous and active high
reset => rst,
--number of pixels of the image
cntr_value => std_logic_vector(numpixels_tile_pre),
--Trigger to start the histogram generation
pulse_start_input => histo_start_cntr,--tiler_wren... IMPORTANT! TODO!
--Output of the histogram data
im_douta => tiler_dataout_a,
histo_dina => histo_dina1,
histo_addra => histo_addra1,
histo_wea => histo_wea1,
histo_addrb => histo_addrb1,
histo_doutb => histo_doutb1,
end_flag => start_clipping_pre2
);
histo_clipper : clipping_wrapper
port map (
clk => clk,
rst => rstclip,
start_cntr => start_clipping,
end_flag => end_flag_clipper, --marks the end of the histogram clipping
numpixels => numpixels_tile_pre,
clip_limit => clip_limit, --Tolerated bin limit
histo_wea => histo_wea2,
histo_addra => histo_addra2,
histo_dina => histo_dina2,
histo_addrb => histo_addrb2,
histo_doutb => histo_doutb2,
cdf_min => cdfmin
);

tiler : image_tiling
port map (
romraddr => tiler_ramraddr,
datain => tiler_datain,
clk => clk,
ramwraddr => open,
dataout => tiler_dataout_a,
numx => numx,
numy => numy,
rst => rst,
start_cntr => tiler_start_cntr,
wren => tiler_wren,
end_flag => tiler_end_flag,
numpixels => numpixels,
tile_numpixels => numpixels_tile_pre,
x_size => x_size,
y_size => y_size,
xtile => x_size_sub,
ytile => y_size_sub
);
equalizer : histogram_equalizer
port map (
histo_raddr => histo_raddr_trans, -- device data as address for RAM
histo_in_ul => histo_in_ul_trans, -- histogram CDF value
histo_in_ur => histo_in_ur_trans, -- histogram CDF value
histo_in_ll => histo_in_ll_trans, -- histogram CDF value
histo_in_lr => histo_in_lr_trans, -- histogram CDF value
rom_raddr => rom_raddr_trans, --image pixel address
rom_in => rom_in_trans,--image pixel value
clk => clk,
clhe_wraddr => clhe_wraddr_trans, --Address for the transformed pixel to write
rst => rst_trans,
start_cntr => start_cntr_trans, --Triggers the transformation operations
wren => wren_trans(0),
clhe_out => clhe_out_trans, -- Output for transformed pixel value
end_flag => end_flag_trans, --marks the end of the histogram calculation
numpixels => numpixels,
im_width => x_size,
im_height => y_size,
x_size => x_size_sub,--Subimage
y_size => y_size_sub,--Subimage
ul_id => ul_id_trans,
ur_id => ur_id_trans,
ll_id => ll_id_trans,
lr_id => lr_id_trans,
numpixels_ul => numpixels_ul_trans, --number of pixels of the image
histo_min_ul => histo_min_ul_trans, --Lowest CDF value of the histogram
numpixels_ur => numpixels_ur_trans, --number of pixels of the image
histo_min_ur => histo_min_ur_trans, --Lowest CDF value of the histogram
numpixels_ll => numpixels_ll_trans, --number of pixels of the image
histo_min_ll => histo_min_ll_trans, --Lowest CDF value of the histogram
numpixels_lr => numpixels_lr_trans, --number of pixels of the image
histo_min_lr => histo_min_lr_trans); --Lowest CDF value of the histogram

process(clk)
begin
if (CLK'EVENT AND CLK = '1') then
if rst='1' then
start_clipping_pre <= '0';
start_clipping <= '0';
else
start_clipping_pre <= start_clipping_pre2;
start_clipping <= not(start_clipping_pre) and start_clipping_pre2;
end if;
end if;
end process;
process(rst, transform, start_clipping_pre, histo_dina1, histo_dina2, histo_addra1, histo_addra2,
histo_wea1, histo_wea2, histo_addrb1, histo_addrb2, histo_doutb, cdfmin, numpixels_tile,
ul_id_trans, ur_id_trans, ll_id_trans, lr_id_trans, tiler_ramraddr, rom_raddr_trans,
numtile_int, histo_raddr_trans, rom_douta)
begin

if rst = '1' then


for j in 0 to 99 loop
histo_dina(j) <= (others=>'0');
histo_addra(j) <= (others=>'0');
histo_wea(j) <= (others=>'0');
histo_addrb(j) <= (others=>'0');
tile_numpixels(j) <= (others=>'0');
cdf_min(j) <= (others=>'0');
--histo_rstb <= histo_rstb1;
end loop;
histo_doutb1 <= (others=>'0');
histo_doutb2 <= (others=>'0');
numpixels_ul_trans <= (others=>'0');
numpixels_ur_trans <= (others=>'0');
numpixels_ll_trans <= (others=>'0');
numpixels_lr_trans <= (others=>'0');

histo_min_ul_trans <= (others=>'0');
histo_min_ur_trans <= (others=>'0');
histo_min_ll_trans <= (others=>'0');
histo_min_lr_trans <= (others=>'0');

histo_in_ul_trans <= (others=>'0');
histo_in_ur_trans <= (others=>'0');
histo_in_ll_trans <= (others=>'0');
histo_in_lr_trans <= (others=>'0');

rom_addra <= (others=>'0');

elsif transform = '0' then


for k in 0 to 9 loop --TODO: make these be written at the same time as the rest (or not)
histo_dina(k) <= histo_dina(k+10);
histo_addra(k) <= histo_addra(k+10);
histo_wea(k) <= histo_wea(k+10);
histo_addrb(k) <= histo_addrb(k+10);
tile_numpixels(k) <= tile_numpixels(k+10);
cdf_min(k) <= cdf_min(k+10);
end loop;
for l in 90 to 99 loop
histo_dina(l) <= histo_dina(l-10);
histo_addra(l) <= histo_addra(l-10);
histo_wea(l) <= histo_wea(l-10);
histo_addrb(l) <= histo_addrb(l-10);
tile_numpixels(l) <= tile_numpixels(l-10);
cdf_min(l) <= cdf_min(l-10);
end loop;
for m in 1 to 8 loop
histo_dina(10*m) <= histo_dina(10*m+1);
histo_addra(10*m) <= histo_addra(10*m+1);
histo_wea(10*m) <= histo_wea(10*m+1);
histo_addrb(10*m) <= histo_addrb(10*m+1);
tile_numpixels(10*m) <= tile_numpixels(10*m+1);
cdf_min(10*m) <= cdf_min(10*m+1);
end loop;
for n in 2 to 9 loop
histo_dina(10*n-1) <= histo_dina(10*n-2);
histo_addra(10*n-1) <= histo_addra(10*n-2);
histo_wea(10*n-1) <= histo_wea(10*n-2);
histo_addrb(10*n-1) <= histo_addrb(10*n-2);
tile_numpixels(10*n-1) <= tile_numpixels(10*n-2);
cdf_min(10*n-1) <= cdf_min(10*n-2);
end loop;

for p in 1 to 8 loop
for q in 1 to 8 loop
if (p*10+q)=numtile_int then
if start_clipping_pre = '1' then
histo_dina(p*10+q) <= histo_dina2;
histo_addra(p*10+q) <= histo_addra2;
histo_wea(p*10+q) <= histo_wea2;
histo_addrb(p*10+q) <= histo_addrb2;
tile_numpixels(p*10+q) <= tile_numpixels(p*10+q);
cdf_min(p*10+q) <= cdfmin;
else
histo_dina(p*10+q) <= histo_dina1;
histo_addra(p*10+q) <= histo_addra1;
histo_wea(p*10+q) <= histo_wea1;
histo_addrb(p*10+q) <= histo_addrb1;
tile_numpixels(p*10+q) <= numpixels_tile;
cdf_min(p*10+q) <= cdfmin;
end if;
else
histo_dina(p*10+q) <= (others=>'0');
histo_addra(p*10+q) <= (others=>'0');
histo_wea(p*10+q) <= (others=>'0');
histo_addrb(p*10+q) <= (others=>'0');
tile_numpixels(p*10+q) <= tile_numpixels(p*10+q);
cdf_min(p*10+q) <= cdf_min(p*10+q);
end if;
end loop;
end loop;
if start_clipping_pre = '1' then
histo_doutb1 <= (others=>'0');
histo_doutb2 <= histo_doutb(numtile_int);
else
histo_doutb1 <= histo_doutb(numtile_int);
histo_doutb2 <= (others=>'0');
end if;
numpixels_ul_trans <= (others=>'0');
numpixels_ur_trans <= (others=>'0');
numpixels_ll_trans <= (others=>'0');
numpixels_lr_trans <= (others=>'0');

histo_min_ul_trans <= (others=>'0');
histo_min_ur_trans <= (others=>'0');
histo_min_ll_trans <= (others=>'0');
histo_min_lr_trans <= (others=>'0');

histo_in_ul_trans <= (others=>'0');
histo_in_ur_trans <= (others=>'0');
histo_in_ll_trans <= (others=>'0');
histo_in_lr_trans <= (others=>'0');

rom_in_trans <= (others=>'0');

tiler_datain <= rom_douta;


rom_addra <= tiler_ramraddr;
else

for p in 0 to 9 loop
for q in 0 to 9 loop

histo_dina(p*10+q) <= (others=> '0');


histo_addra(p*10+q) <= (others=>'0');
histo_wea(p*10+q) <= (others=>'0');
histo_addrb(p*10+q) <= histo_raddr_trans;
tile_numpixels(p*10+q) <= tile_numpixels(p*10+q);
cdf_min(p*10+q) <= cdf_min(p*10+q);

end loop;
end loop;
numpixels_ul_trans <= tile_numpixels(to_integer(ul_id_trans));
numpixels_ur_trans <= tile_numpixels(to_integer(ur_id_trans));
numpixels_ll_trans <= tile_numpixels(to_integer(ll_id_trans));
numpixels_lr_trans <= tile_numpixels(to_integer(lr_id_trans));

histo_min_ul_trans <= cdf_min(to_integer(ul_id_trans));
histo_min_ur_trans <= cdf_min(to_integer(ur_id_trans));
histo_min_ll_trans <= cdf_min(to_integer(ll_id_trans));
histo_min_lr_trans <= cdf_min(to_integer(lr_id_trans));

histo_in_ul_trans <= histo_doutb(to_integer(ul_id_trans));
histo_in_ur_trans <= histo_doutb(to_integer(ur_id_trans));
histo_in_ll_trans <= histo_doutb(to_integer(ll_id_trans));
histo_in_lr_trans <= histo_doutb(to_integer(lr_id_trans));

histo_doutb1 <= (others=>'0');


histo_doutb2 <= (others=>'0');

rom_in_trans <= rom_douta;

tiler_datain <= (others=>'0');


rom_addra <= rom_raddr_trans;
end if;
end process;
----Processes for tiler
process(clk)
begin
if (CLK'EVENT AND CLK = '1') then
if rst='1' then
tiler_start_cntr <= '0';
histo_start_cntr_pre <= '0';
histo_start_cntr <= '0';
tiler_wren_pre <= '0';
elsif tiler_end_flag='1' then
tiler_start_cntr <= '0';
histo_start_cntr_pre <= tiler_start_cntr;
histo_start_cntr <= histo_start_cntr_pre;
else
tiler_start_cntr <= start_cntr or (end_flag_clipper and start_clipping_pre);
histo_start_cntr_pre <= tiler_start_cntr;
histo_start_cntr <= histo_start_cntr_pre;
end if;

if rst='1' then
numtile_int <= 11;
elsif histo_start_cntr = '1' then
numtile_int <= to_integer(numx)+11+to_integer(numy)*10;
else
numtile_int <= numtile_int;
end if;
if rst='1' then
numpixels_tile <= (others=>'0');
transform <= '0';
start_cntr_trans <= '0';
else
numpixels_tile <= numpixels_tile_pre +1;
transform <= start_cntr_trans or transform;
start_cntr_trans <= (end_flag_clipper and tiler_end_flag);
end if;

end if;
end process;

---End of tiler processes

rstclip <= rst;

rst_trans <= rst or not(transform);


end_flag <= end_flag_trans;

ram_wea <= wren_trans;


ram_addra <= clhe_wraddr_trans;
ram_dina <= clhe_out_trans;
end wrapper;
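The index arithmetic of clahe_complete4.vhd is easier to follow in a small model. The 100 `histo_ram2` instances form a 10 x 10 grid. The tile currently being processed is written at `numtile_int = numx + 11 + numy*10`, that is, inside interior rows and columns 1 to 8. While `transform = '0'`, each border RAM is driven with the same write signals as its inward neighbour, and corners chain through two copies (for example 0 from 10, which itself copies 11). The Python sketch below returns which interior tile's histogram each RAM ends up replicating. The names `GRID`, `tile_index` and `mirrored_source` are hypothetical, and it assumes `numx` and `numy` range over 0 to 7, as their 3-bit ports allow. Reading the border as a way to give the equalizer four valid neighbours (`ul`, `ur`, `ll`, `lr`) at the image edges is our interpretation; the listing does not state it.

```python
GRID = 10  # 10 x 10 array of histogram RAMs, indices 0..99


def tile_index(numx, numy):
    """RAM index of tile (numx, numy), mirroring numtile_int."""
    return numx + 11 + numy * GRID


def mirrored_source(idx):
    """Interior RAM whose contents the RAM at idx replicates.

    Interior RAMs (row and column 1..8) map to themselves. Border RAMs copy
    their inward neighbour, so clamping the row and column to 1..8 gives the
    tile they end up holding.
    """
    row, col = divmod(idx, GRID)
    row = min(max(row, 1), GRID - 2)
    col = min(max(col, 1), GRID - 2)
    return row * GRID + col
```

For instance, `mirrored_source(0)` returns 11 and `mirrored_source(49)` returns 48, matching the border loops that drive RAM 0 from RAM 10 (itself driven from RAM 11) and RAM 49 from RAM 48.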

clhe_clipping_int4.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity histogram_clipper is
port ( ramraddr : out std_logic_vector(7 downto 0) ; -- device data as address for RAM
datain : in std_logic_vector (18 downto 0); -- RAM data out
clk : in std_logic;
ramwraddr : out std_logic_vector( 7 downto 0); -- written histogram bin as address for RAM
rst : in std_logic;
start_cntr : in std_logic;--triggers the beginning of the operation
wren : out std_logic;--write enable output for the ram
dataout : out std_logic_vector(18 downto 0); -- RAM data in
end_flag : out std_logic; --marks the end of the histogram calculation
numpixels : in unsigned(18 downto 0); --number of pixels of the histogram
clip_limit : in unsigned(6 downto 0); -- Clip limit in %
cdf_min : out unsigned(18 downto 0)
);
end histogram_clipper;
architecture clipper of histogram_clipper is
--Signal list
signal start_add, pre_start_add, pre_wren, pre_wren2 : std_logic;
signal excess, abs_limit, data_out, cdf_min_signal : unsigned(18 downto 0);
signal binIncr : unsigned(26 downto 0); --Perhaps the size could be reduced?
signal ramraddru, ramwraddru, ramwraddru2 : unsigned(7 downto 0);
signal difference : signed(19 downto 0);
signal incr_trigger, start, start2, operation, wren1, end_flag_signal : std_logic;

-------tests------------
signal op1, op2 : unsigned(25 downto 0);

begin

--Process to read the whole histogram, calculate the total amount of clipping and
--generate the CDF for the transformation.
--This is done in 2 sweeps:
--The first sweep counts the pixels that exceed the clip limit and clips
--them from the corresponding bins.
--The second sweep adds to each bin the increment calculated from those excess
--pixels and replaces the bin value with the corresponding CDF value.
process(clk)
begin
if (CLK'EVENT AND CLK = '1') then
if rst = '1' then
start_add <='0';
pre_start_add <= '0';
else
start_add <= start2;
pre_start_add <= start_add; --Delay the operation start signal until
--it is aligned with the input data
end if;
------------beginning of reading block, shared amongst the 2 sweeps------------
if rst = '1' or start2='0' then --initialization of variables
ramraddru <= to_unsigned(0, 8);
ramwraddru <= to_unsigned(0, 8);
ramwraddru2 <= to_unsigned(0, 8);
pre_wren <= '0';
pre_wren2 <= '0';
wren1 <= '0';
incr_trigger <= '0';
end_flag_signal <= '0';

elsif ramraddru = 255 then --Stop writing once the whole memory has been swept.
ramraddru <= to_unsigned(255, 8);
ramwraddru2 <= ramraddru;
ramwraddru <= ramwraddru2;--align the write addresses and write enable with the output data
pre_wren2 <= '0';
pre_wren <= pre_wren2;
wren1 <= pre_wren;
incr_trigger <= operation; --will turn to 1 only after the first memory sweep
end_flag_signal <= not(operation);
else
ramraddru <= ramraddru + 1; --Sweep the addresses
ramwraddru2 <= ramraddru;--align the write addresses and write enable with the output data
ramwraddru <= ramwraddru2;
pre_wren2 <= '1';
pre_wren <= '1';
wren1 <= pre_wren;
incr_trigger <= '0';
end_flag_signal <= '0';
end if;
------------end of reading block, beginning of writing block------------
if (start_add='0' and operation='1') or rst='1' then --Clipping block: excess variable counter management
excess <= to_unsigned(0, 19);
elsif operation = '1' then
if difference >= to_signed(1,19) and pre_wren = '1' then --the wren condition here avoids increases in
excess <= excess + resize(unsigned(difference), 18); --the excess variable after finishing the sweep.
else
excess <= excess;
end if;
else
excess<=excess;
end if;
if start_add='0' or rst='1' then --Clipping block: data output management
--(clipped output or CDF value, according to the sweep number)
data_out <= (others=>'0');
elsif operation = '1' then
if difference >= to_signed(1,19) and pre_wren = '1' then --the wren condition here avoids increases in
data_out <= abs_limit; --the excess variable after finishing the sweep.
else
data_out <= unsigned(datain);
end if;
else
data_out <= (data_out+resize((unsigned(datain) + binIncr),19));
end if;

if rst='1' then --Management of the start signal, used to trigger the shared logic of the 2 sweeps
start <= '0';
elsif (start_cntr = '1' or incr_trigger = '1') then
start <= '1';
elsif (ramraddru = 255) then
start <= '0';
else
start<=start;
end if;
if rst='1' then

start2 <= '0';
else
start2<=start;
end if;
if rst='1' or start_cntr = '1' then --Management of the operation flag. Selects between the 2 sweeps
operation <= '1';
elsif incr_trigger = '1' then
operation <= '0';
else
operation <= operation;
end if;
if rst='1' then
end_flag <= '0';
else
end_flag <= end_flag_signal;
end if;
if rst='1' then
cdf_min_signal <= (others=>'0');
elsif operation = '0' and ramwraddru = 0 and wren1='1' then
cdf_min_signal <= unsigned(data_out);
else
cdf_min_signal <= cdf_min_signal;
end if;

end if;
end process;
wren <= wren1;
ramraddr <= std_logic_vector(ramraddru);
ramwraddr <= std_logic_vector(ramwraddru);
---debug---
op1 <= (numpixels*clip_limit);
op2 <= op1/to_unsigned(100, 7);
abs_limit <= resize(op2, 19);
---end debug---
difference <= signed('0' & datain) - signed('0'&abs_limit); --Computes the difference
--between bin value and clipping limit
binIncr <= resize(excess, 27)/256;
dataout <= std_logic_vector(data_out);
cdf_min <= cdf_min_signal;
end clipper;
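The two sweeps described in the comments of clhe_clipping_int4.vhd can be summarised with a Python behavioural model. It assumes a 256-bin histogram held as a list of integers and ignores pipeline latency and bit widths (for instance, the 18-bit resize of `difference` is not modelled). Following the listing as we read it, the clip limit is `numpixels*clip_limit/100`, a bin is clipped only when it exceeds that limit, the excess is spread as `excess/256` per bin with the remainder discarded, and `cdf_min` is the CDF value written at bin 0. The function name `clip_and_cdf` is hypothetical.

```python
def clip_and_cdf(hist, numpixels, clip_limit):
    """Two-sweep clipping and CDF generation, modelled on histogram_clipper.

    hist: 256 bin counts; numpixels: pixels in the tile;
    clip_limit: tolerated bin height as a percentage of numpixels.
    Returns (cdf, cdf_min). Bit widths and latency are not modelled.
    """
    abs_limit = (numpixels * clip_limit) // 100  # op1 = numpixels*clip_limit; op2 = op1/100

    # Sweep 1: clip every bin above abs_limit and accumulate the excess.
    excess = 0
    clipped = []
    for count in hist:
        if count > abs_limit:  # difference >= 1
            excess += count - abs_limit
            clipped.append(abs_limit)
        else:
            clipped.append(count)

    # Uniform increment per bin; the remainder of excess / 256 is dropped.
    bin_incr = excess // 256

    # Sweep 2: running sum of clipped bins plus the increment gives the CDF.
    cdf = []
    running = 0
    for count in clipped:
        running += count + bin_incr
        cdf.append(running)
    return cdf, cdf[0]  # cdf_min: CDF value written at bin 0
```

With `numpixels = 1000` and `clip_limit = 10` the limit is 100 pixels. A histogram with 600 pixels in bin 0 and 400 in bin 1 yields an excess of 800 and an increment of 3 per bin, so the CDF starts at 103 and ends at 968 rather than 1000, because the 32 leftover excess pixels are dropped.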

clipping_wrapper_int2.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity clipping_wrapper is
port (
clk : in std_logic;
rst : in std_logic;
start_cntr : in std_logic; --triggers the beginning of the operation
--dataout : out std_logic_vector(17 downto 0); -- RAM data in
end_flag : out std_logic; --marks the end of the histogram clipping
numpixels : in unsigned(18 downto 0); --Total number of pixels in the image
clip_limit : in unsigned(6 downto 0); --Tolerated bin limit
histo_wea : out std_logic_vector(0 downto 0);
histo_addra : out std_logic_vector(7 downto 0);
histo_dina : out std_logic_vector(18 downto 0);
histo_addrb : out std_logic_vector(7 downto 0);
histo_doutb : in std_logic_vector(18 downto 0);
cdf_min : out unsigned(18 downto 0)
);
end clipping_wrapper;
architecture wrapper of clipping_wrapper is

component histogram_clipper
port ( ramraddr : out std_logic_vector(7 downto 0) ; -- accessed histogram bin as address for RAM
datain : in std_logic_vector (18 downto 0); -- RAM data out
clk : in std_logic;
ramwraddr : out std_logic_vector( 7 downto 0); -- written histogram bin as address for RAM
rst : in std_logic;
start_cntr : in std_logic;--triggers the beginning of the operation
wren : out std_logic;--write enable output for the ram
dataout : out std_logic_vector(18 downto 0); -- RAM data in
end_flag : out std_logic; --marks the end of the histogram calculation
numpixels : in unsigned(18 downto 0); --number of pixels of the histogram
clip_limit : in unsigned(6 downto 0); -- Clip limit in %
cdf_min : out unsigned(18 downto 0)
);
end component;
signal ramraddr, ramwraddr : std_logic_vector(7 downto 0);
signal wren : std_logic_vector(0 downto 0);
signal datain, dataout_a : std_logic_vector (18 downto 0);

begin
--Process to read all the histogram and calculate the amount of absolute clipping

histo_clipper : histogram_clipper
port map (
ramraddr => ramraddr,
datain => datain,
clk => clk,
ramwraddr => ramwraddr,
rst => rst,
start_cntr => start_cntr,
wren => wren(0),
dataout => dataout_a,
end_flag => end_flag,
numpixels => numpixels,
clip_limit => clip_limit,
cdf_min => cdf_min);

histo_dina <= dataout_a;


histo_addra <= ramwraddr;
histo_wea <= wren;
histo_addrb <= ramraddr;
datain <= histo_doutb;
--dataout <= dataout_a;
end wrapper;
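The discriminative selection performed by the filter system listed next can be seen in isolation. Reading the two conditional assignments near the end of filter_system_int2.vhd, the first median stage is used only where the first binary mask is 1 (`device_data_2`), and the second median stage only where the second mask is 1 (`data_ram`); elsewhere the earlier value passes through unchanged. The Python sketch below models that per-pixel selection on 2-D lists. The 3 x 3 window and zero padding of `median3x3` are assumptions for illustration, since the median_filter entity is not part of this listing; `smoothed` stands for the output of `smooth_filter`, whose kernel is not modelled, and the wait FIFOs that align the data are ignored. All function names are hypothetical.

```python
from statistics import median


def median3x3(img):
    """Hypothetical 3x3 median filter; pixels outside the image count as 0."""
    h, w = len(img), len(img[0])

    def window(y, x):
        return [img[j][i] if 0 <= j < h and 0 <= i < w else 0
                for j in range(y - 1, y + 2) for i in range(x - 1, x + 2)]

    return [[int(median(window(y, x))) for x in range(w)] for y in range(h)]


def select(mask, filtered, bypass):
    """Per-pixel multiplexer: filtered value where mask is 1, bypass elsewhere."""
    return [[f if m == 1 else b for m, f, b in zip(m_row, f_row, b_row)]
            for m_row, f_row, b_row in zip(mask, filtered, bypass)]


def discriminative_filter(smoothed, mask1, mask2, median_filter=median3x3):
    """Two masked median stages, following device_data_2 and data_ram."""
    stage1 = select(mask1, median_filter(smoothed), smoothed)  # device_data_2
    return select(mask2, median_filter(stage1), stage1)        # data_ram
```

For instance, with a single bright pixel (255) in a 3 x 3 block of 10s and `mask1` set only at the centre, the centre is replaced by the median value 10 while the unmasked pixels keep their smoothed values.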

filter_system_int2.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity filter_testbench is
port(
--global clock signal, active with its rising edge
clk: in std_logic;
--reset signal, synchronous and active high
reset: in std_logic;
--Trigger to start the generation
pulse_start_input: in std_logic;
--Output of the data
dataout : out std_logic_vector(7 downto 0);
memout : out std_logic_vector(7 downto 0);
end_flag : out std_logic;
im_width : in unsigned(10 downto 0);
numpixels : in unsigned(18 downto 0);
b1_doutb : in std_logic_vector(0 downto 0);
b2_doutb : in std_logic_vector(0 downto 0);
input_ram_addr : out std_logic_vector(17 downto 0);
clahe_ram_doutb : in std_logic_vector(7 downto 0)

);
end filter_testbench;
architecture bench of filter_testbench is
signal device_data, device_data_2, filter_out_s, filter_out_sf, filter_out_sf1,
filter_out_sf2, filter_out_sf3, filter_out_sf4, filter_out_sf5, filter_out_sf6,
filter_out_sf7, filter_out_sf8, filter_out_sf9, filter_out_1, filter_out_1_2, filter_out_1f,
filter_out_2 : std_logic_vector(7 downto 0); --current pixel value
signal ram_wr_addr, ram_wr_addr2 : unsigned(17 downto 0); --address to be accessed in the RAM containing the filtered image
signal pulse_out, pulse_out2, pulse_out3, end_flag_signal: std_logic;
signal new_width, width_counter, width_counter2 : unsigned(10 downto 0);
signal image_addr: unsigned(18 downto 0);
signal data_out, data_in, data_ram : std_logic_vector(7 downto 0);
signal nrst : std_logic;
signal wren_s, wren_1, wren_1_2, wren_2, binary1_data, binary2_data : std_logic_vector(0 downto 0);
signal new_numpixels : unsigned(18 downto 0);

component smooth_filter is
port ( clk, clearn : in std_logic;
su_flag : in std_logic;
smooth_filter_in : in std_logic_vector(7 downto 0);
smooth_filter_out : out std_logic_vector (7 downto 0);
set_up_flag : out std_logic;
im_width : in unsigned(10 downto 0);
numpixels : in unsigned(18 downto 0)
);
end component;
component clhe_ram2
port (
clka: IN std_logic;
wea: IN std_logic_VECTOR(0 downto 0);
addra: IN std_logic_VECTOR(17 downto 0);
dina: IN std_logic_VECTOR(7 downto 0);
douta: OUT std_logic_VECTOR(7 downto 0));
end component;

component median_filter is
port ( clk, clearn : in std_logic;
su_flag : in std_logic;
smooth_filter_in : in std_logic_vector(7 downto 0);
smooth_filter_out : out std_logic_vector (7 downto 0);
set_up_flag : out std_logic;
im_width : in unsigned(10 downto 0);
numpixels : in unsigned(18 downto 0)
);
end component;
component wait_fifo
port (
clk: IN std_logic;
rst: IN std_logic;
din: IN std_logic_VECTOR(7 downto 0);
wr_en: IN std_logic;
rd_en: IN std_logic;
dout: OUT std_logic_VECTOR(7 downto 0);
full: OUT std_logic;
empty: OUT std_logic;
data_count: OUT std_logic_VECTOR(11 downto 0));
end component;
component binary1_wait_fifo
port (
clk: IN std_logic;
rst: IN std_logic;
din: IN std_logic_VECTOR(0 downto 0);
wr_en: IN std_logic;
rd_en: IN std_logic;
dout: OUT std_logic_VECTOR(0 downto 0);
full: OUT std_logic;
empty: OUT std_logic;
data_count: OUT std_logic_VECTOR(11 downto 0));
end component;
component binary2_wait_fifo
port (
clk: IN std_logic;
rst: IN std_logic;
din: IN std_logic_VECTOR(0 downto 0);
wr_en: IN std_logic;
rd_en: IN std_logic;
dout: OUT std_logic_VECTOR(0 downto 0);
full: OUT std_logic;
empty: OUT std_logic;
data_count: OUT std_logic_VECTOR(11 downto 0));
end component;
begin

smoother : smooth_filter
port map(
clk => clk,
clearn => nrst,
su_flag => pulse_out3,
smooth_filter_in => device_data,
smooth_filter_out => filter_out_s,
set_up_flag => wren_s(0),
im_width => new_width,
numpixels => new_numpixels);
discriminative1 : median_filter
port map(
clk => clk,
clearn => nrst,
su_flag => wren_s(0),
smooth_filter_in => filter_out_s,
smooth_filter_out => filter_out_1,
set_up_flag => wren_1(0),
im_width => new_width,
numpixels => new_numpixels);
discriminative2 : median_filter


port map(
clk => clk,
clearn => nrst,
su_flag => wren_1_2(0),
smooth_filter_in => device_data_2,
smooth_filter_out => filter_out_2,
set_up_flag => wren_2(0),
im_width => new_width,
numpixels => new_numpixels);
waitf_1 : wait_fifo
port map (
clk => clk,
rst => reset,
din => filter_out_s,
wr_en => wren_s(0),
rd_en => wren_1(0),
dout => filter_out_sf,
full => open,
empty => open,
data_count => open);
waitf_2 : wait_fifo
port map (
clk => clk,
rst => reset,
din => device_data_2,
wr_en => wren_1_2(0),
rd_en => wren_2(0),
dout => filter_out_1f,
full => open,
empty => open,
data_count => open);

binary_wait1 : binary1_wait_fifo
port map (
clk => clk,
rst => reset,
din => b1_doutb,
wr_en => pulse_out3,
rd_en => wren_1(0),
dout => binary1_data,
full => open,
empty => open,
data_count => open);
binary_wait2 : binary2_wait_fifo
port map (
clk => clk,
rst => reset,
din => b2_doutb,
wr_en => pulse_out3,
rd_en => wren_2(0),
dout => binary2_data,
full => open,
empty => open,
data_count => open);
ram : clhe_ram2
port map (
clka => clk,
wea => wren_2,
addra => std_logic_vector(ram_wr_addr2),
dina => data_ram,
douta => memout);

process(clk) --Process containing an address counter to read the image in the
--ROM memory sequentially and compute its histogram or transformed version.
begin
if (CLK'EVENT AND CLK = '1') then --Part to read the source image. Adds zero padding in the process.
if reset = '1' or pulse_out='0' then
image_addr <= to_unsigned(0, 19);
width_counter <= (others=>'0');

device_data <= (others=>'0'); --zeros for zero padding


elsif unsigned(image_addr) >= (unsigned(numpixels)+1) then --if end of the image
image_addr <= image_addr;
width_counter <= width_counter;
device_data <= (others=>'0'); --zeros for zero padding
elsif width_counter = (im_width) then --if end of the row, stop the address counter increase.
image_addr <= image_addr; --Next 2 cycles will add zero padding.
width_counter <= width_counter + 1;
device_data <= data_in;
elsif width_counter = (im_width+1) then
image_addr <= image_addr;
width_counter <= width_counter + 1;
device_data <= (others=>'0'); --zero padding
elsif width_counter = (new_width) then
image_addr <= image_addr+1;
--width_counter <= width_counter + 1;
width_counter <= "00000000001"; --reset width counter to continue with image
addresses during the next cycles
device_data <= (others=>'0'); --zero padding
else
image_addr <= image_addr + 1;
width_counter <= width_counter + 1;
device_data <= data_in; --next image value
end if;
pulse_out <= pulse_start_input;
pulse_out2 <= pulse_out;
pulse_out3 <= pulse_out2;
end if;

if (CLK'EVENT AND CLK = '1') then --Part to write the filtered image. It also discards the extra
if reset = '1' or wren_2="0" then --pixels added at the edges for zero padding, as they contain no useful information.
ram_wr_addr <= to_unsigned(0, 18);
width_counter2 <= (others=>'0');
data_out <= (others=>'0');
end_flag_signal <= '0';
end_flag <= '0';
elsif unsigned(ram_wr_addr) >= (unsigned(numpixels) - 1) then
ram_wr_addr <= ram_wr_addr;
width_counter2 <= width_counter2;
data_out <= filter_out_2;
end_flag_signal <= '1';
end_flag <= end_flag_signal;
elsif width_counter2 = (im_width) or width_counter2 = (im_width + 1) then
ram_wr_addr <= ram_wr_addr;
width_counter2 <= width_counter2 + 1;
data_out <= filter_out_2;
end_flag <= end_flag_signal;
elsif width_counter2 = (im_width + 2) then
ram_wr_addr <= ram_wr_addr+1;
width_counter2 <= "00000000001";
data_out <= filter_out_2;
end_flag <= end_flag_signal;
else
ram_wr_addr <= ram_wr_addr + 1;
width_counter2 <= width_counter2 + 1;
data_out <= filter_out_2;
end_flag <= end_flag_signal;
end if;
ram_wr_addr2 <= ram_wr_addr;
if reset = '1' then
filter_out_sf1 <= (others=>'0');
filter_out_sf2 <= (others=>'0');
filter_out_sf3 <= (others=>'0');
filter_out_sf4 <= (others=>'0');
filter_out_sf5 <= (others=>'0');
filter_out_sf6 <= (others=>'0');
filter_out_sf7 <= (others=>'0');
filter_out_sf8 <= (others=>'0');
filter_out_sf9 <= (others=>'0');
filter_out_1_2 <= (others=>'0');
wren_1_2 <= (others=>'0');
else
filter_out_sf1 <= filter_out_sf;
filter_out_sf2 <= filter_out_sf1;
filter_out_sf3 <= filter_out_sf2;
filter_out_sf4 <= filter_out_sf3;
filter_out_sf5 <= filter_out_sf4;
filter_out_sf6 <= filter_out_sf5;
filter_out_sf7 <= filter_out_sf6;
filter_out_sf8 <= filter_out_sf7;
filter_out_sf9 <= filter_out_sf8;
filter_out_1_2 <= filter_out_1;
wren_1_2 <= wren_1;
end if;
end if;
end process;
data_ram <= data_out when (binary2_data = "1") else filter_out_1f;
device_data_2 <= filter_out_1_2 when (binary1_data = "1") else filter_out_sf;
nrst <= not(reset);
new_width <= im_width+2;
new_numpixels <= numpixels+resize(2*numpixels/im_width, 19);

input_ram_addr <= std_logic_vector(image_addr(17 downto 0));


data_in <= clahe_ram_doutb;
dataout <= data_ram;

end bench;
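In terms of data layout, the read process above streams each image row followed by two zero samples, and the write process drops those two extra samples again before storing the result. The package below is a minimal behavioural sketch of that layout, assuming a row-major image and ignoring the start-up cycle and the RAM read latency; `padding_ref_pkg`, `padded_pixels` and `is_padding` are hypothetical names, not part of the original design. `padded_pixels` mirrors the `new_numpixels` expression, which equals `numpixels + 2*height` when `numpixels` is a multiple of `im_width`.

```vhdl
-- Hypothetical reference package (not part of the original design):
-- behavioural model of the zero-padded pixel stream used by filter_testbench,
-- ignoring the start-up cycle and the RAM read latency.
package padding_ref_pkg is
  -- Number of samples in the padded stream: every row gets two extra zeros.
  function padded_pixels(numpixels, im_width : positive) return natural;
  -- True when position idx (0-based) of the padded stream is a padding zero.
  function is_padding(idx : natural; im_width : positive) return boolean;
end package padding_ref_pkg;

package body padding_ref_pkg is
  function padded_pixels(numpixels, im_width : positive) return natural is
  begin
    return numpixels + 2 * (numpixels / im_width);
  end function padded_pixels;

  function is_padding(idx : natural; im_width : positive) return boolean is
  begin
    -- Each padded row holds im_width pixels followed by 2 zeros.
    return (idx mod (im_width + 2)) >= im_width;
  end function is_padding;
end package body padding_ref_pkg;
```

For a 512x512 image this gives 262144 + 1024 = 263168 samples, and in a 3-pixel-wide image the padded positions are 3, 4, 8, 9 and so on.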

[Link]
------------------------------------------------------------------------
-- Original smooth_filter: Núria Orduña
-- Modified by: Roger Olivé
-- Concordia University
-- 2012-2013
------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
--use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.NUMERIC_STD.ALL;

entity smooth_filter is
port ( clk, clearn : in std_logic;
su_flag : in std_logic;
smooth_filter_in : in std_logic_vector(7 downto 0);
smooth_filter_out : out std_logic_vector (7 downto 0);
set_up_flag : out std_logic;
im_width : in unsigned(10 downto 0);
numpixels : in unsigned(18 downto 0)
);
end smooth_filter;

architecture behavior of smooth_filter is

component filter_fifo
port (
clk: IN std_logic;
rst: IN std_logic;
din: IN std_logic_VECTOR(7 downto 0);
wr_en: IN std_logic;
rd_en: IN std_logic;
dout: OUT std_logic_VECTOR(7 downto 0);
full: OUT std_logic;
empty: OUT std_logic;
data_count: OUT std_logic_VECTOR(8 downto 0));
end component;
------------------------------------------------------------------------
-- Signal Declarations
------------------------------------------------------------------------

type data_win is array (0 to 4) of unsigned (7 downto 0);


signal row1 : data_win;
signal row2 : data_win;
signal row3 : data_win;
signal row4 : data_win;
signal row5 : data_win;
signal data_in1 : unsigned (7 downto 0);
signal data_out1 : std_logic_vector (7 downto 0);
signal data_in2 : unsigned (7 downto 0);
signal data_out2 : std_logic_vector (7 downto 0);
signal data_in3 : unsigned (7 downto 0);
signal data_out3 : std_logic_vector (7 downto 0);
signal data_in4 : unsigned (7 downto 0);
signal data_out4 : std_logic_vector (7 downto 0);
signal t_setup : unsigned (18 downto 0);
signal activated : std_logic;
signal data_count1, data_count2, data_count3, data_count4 : std_logic_vector(8 downto 0);
signal fifo_size : unsigned(10 downto 0);
signal wr_en, rd_en1, rd_en2, rd_en3, rd_en4, rst : std_logic;

begin
fifo1 : filter_fifo
port map (

clk => clk,
rst => rst,
din => std_logic_vector(data_in1),
wr_en => wr_en,
rd_en => rd_en1,
dout => data_out1,
full => open,
empty => open,
data_count => data_count1);
fifo2 : filter_fifo
port map (
clk => clk,
rst => rst,
din => std_logic_vector(data_in2),
wr_en => wr_en,
rd_en => rd_en2,
dout => data_out2,
full => open,
empty => open,
data_count => data_count2);
fifo3 : filter_fifo
port map (
clk => clk,
rst => rst,
din => std_logic_vector(data_in3),
wr_en => wr_en,
rd_en => rd_en3,
dout => data_out3,
full => open,
empty => open,
data_count => data_count3);
fifo4 : filter_fifo
port map (
clk => clk,
rst => rst,
din => std_logic_vector(data_in4),
wr_en => wr_en,
rd_en => rd_en4,
dout => data_out4,
full => open,
empty => open,
data_count => data_count4);

------------------------------------------------------------------------
-- Module Implementation
------------------------------------------------------------------------

process (clk)
begin
if (clk'event and clk = '1') then
if (clearn = '0') then
t_setup <= (others => '0');
else
if su_flag = '1' and t_setup < (numpixels+im_width*2+3) then
t_setup <= t_setup + 1;
end if;
end if;
end if;
end process;

process (clk)
variable sync_cnt : integer range 0 to 6 := 0;
begin
if (clk'event and clk = '1') then --initialization
if (clearn = '0') then
set_up_flag <= '0';
activated <= '0';
for j in 0 to 4 loop -- fifo reset
row1(j) <= (others => '0');
row2(j) <= (others => '0');
row3(j) <= (others => '0');
row4(j) <= (others => '0');

row5(j) <= (others => '0');
end loop;
smooth_filter_out <= (others => '0');
data_in1 <= (others => '0');
data_in2 <= (others => '0');
data_in3 <= (others => '0');
data_in4 <= (others => '0');
rd_en1 <= '0';
rd_en2 <= '0';
rd_en3 <= '0';
rd_en4 <= '0';
elsif su_flag = '1' then --Shifts all the registers and fifo's data one position
row1(0) <= unsigned(smooth_filter_in);
row1(1 to 4) <= row1(0 to 3);
data_in1 <= row1(4);
if (unsigned(data_count1) >= fifo_size) then --Maintain a constant amount of data
--in the fifo components depending on the image size
rd_en1 <= '1';
row2(0) <= unsigned(data_out1);
else
rd_en1 <= '0';
row2(0) <= (others=>'0');
end if;
row2(1 to 4) <= row2(0 to 3);
data_in2 <= row2(4);
if (unsigned(data_count2) >= fifo_size) then
rd_en2 <= '1';
row3(0) <= unsigned(data_out2);
else
rd_en2 <= '0';
row3(0) <= (others=>'0');
end if;
row3(1 to 4) <= row3(0 to 3);
data_in3 <= row3(4);
if (unsigned(data_count3) >= fifo_size) then
rd_en3 <= '1';
row4(0) <= unsigned(data_out3);
else
rd_en3 <= '0';
row4(0) <= (others=>'0');
end if;
row4(1 to 4) <= row4(0 to 3);
data_in4 <= row4(4);
if (unsigned(data_count4) >= fifo_size) then
rd_en4 <= '1';
row5(0) <= unsigned(data_out4);
else
rd_en4 <= '0';
row5(0) <= (others=>'0');
end if;
row5(1 to 4) <= row5(0 to 3);
if t_setup >= (im_width*2+3) and t_setup < (numpixels+im_width*2+2) then -- actually +3-1
set_up_flag <= '1';
activated <= '1'; --Apply the convolution operation for the current pixel and
--its window.
sync_cnt := 0;
--smooth_filter_out <= std_logic_vector(resize(((("0000000"&row1(1)) +
--("000000"&row1(2)&'0') + ("0000000"&row1(3)) + ("0000000"&row2(0)) +
--("00000"&row2(1)&"00") + ("0000000"&row2(1)) +
--("0000"&row2(2)&"000") + ("0000000"&row2(2)) +
--("00000"&row2(3)&"00") + ("0000000"&row2(3)) +
--("0000000"&row2(4)) + ("000000"&row3(0)&'0') +
--("0000"&row3(1)&"000") + ("0000000"&row3(1)) +
--("000"&row3(2)&"0000") + ("0000"&row3(3)&"000")
--+ ("0000000"&row3(3)) + ("000000"&row3(4)&'0') +
--("0000000"&row4(0)) +
--("00000"&row4(1)&"00") + ("0000000"&row4(1)) +
--("0000"&row4(2)&"000") + ("0000000"&row4(2)) +
--("00000"&row4(3)&"00") + ("0000000"&row4(3)) +
--("0000000"&row4(4)) +
--("0000000"&row5(1)) + ("000000"&row5(2)&'0') +
--("0000000"&row5(3)))/100),8));
smooth_filter_out <=
std_logic_vector(resize(((("0000000"&row2(1)) +
("0000"&row2(2)&"000") + ("0000000"&row2(3)) + ("0000"&row3(1)&"000") + row3(2)*62 +
("0000"&row3(3)&"000")+("0000000"&row4(1)) + ("0000"&row4(2)&"000") +
("0000000"&row4(3)))/100),8));
else
if activated = '1' then
--if sync_cnt < 6 then
-- sync_cnt := sync_cnt + 1;
--else
--set_up_flag <= '0'; --Finish
--end if;
end if;
end if;
end if;
end if;
end process;
wr_en <= su_flag;
fifo_size <= im_width - 8; --Added a -1 not initially forecasted
rst <= not(clearn);
end behavior;
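The active kernel expression above can be read as a weighted 3x3 sum over the window centred on `row3(2)`: weight 1 on the corners, 8 on the edge neighbours and 62 on the centre (`row3(2)*62`), divided by 100. These weights add up to 98, not 100, so a flat region of grey level v is output as floor(98v/100); for example, 255 becomes 249. The package below is a minimal integer reference model of that expression for checking simulation results, not part of the original design; `smooth_ref_pkg`, `window3` and `smooth3x3` are hypothetical names.

```vhdl
-- Hypothetical reference model (not part of the original design) of the
-- active 3x3 smoothing expression in smooth_filter.
package smooth_ref_pkg is
  subtype pixel is natural range 0 to 255;
  -- win(r, c): r = 0..2 maps to row2..row4, c = 0..2 maps to columns 1..3.
  type window3 is array (0 to 2, 0 to 2) of pixel;
  function smooth3x3(win : window3) return pixel;
end package smooth_ref_pkg;

package body smooth_ref_pkg is
  type weights3 is array (0 to 2, 0 to 2) of natural;
  -- Weights read from the listing: corners 1, edges 8, centre 62 (sum = 98).
  constant KERNEL : weights3 := ((1, 8, 1), (8, 62, 8), (1, 8, 1));

  function smooth3x3(win : window3) return pixel is
    variable acc : natural := 0;
  begin
    for r in 0 to 2 loop
      for c in 0 to 2 loop
        acc := acc + KERNEL(r, c) * win(r, c);
      end loop;
    end loop;
    -- Integer division truncates, like the unsigned "/100" in the hardware.
    return acc / 100;
  end function smooth3x3;
end package body smooth_ref_pkg;
```

Because the kernel is symmetric, the row and column orientation of `win` does not affect the result.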

histogram_int3.vhd
--2013/04/20-- Forked from the description available at:
--[Link]
---------------------------------------------------------------------------------------------
--2013/05/01-- The code has been simplified to remove unneeded functionality and make
--interfacing easier.
---------------------------------------------------------------------------------------------
--2013/05/03-- Fixed a bug that prevented the histogram count from increasing properly when
--two or more consecutive pixels with exactly the same value appeared.
--2013/05/04-- Comments added for clarity and future reference.
--2013/05/07-- More bugfixes, related to component reset.
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;
entity histogram is
port ( addrin : in std_logic_vector(7 downto 0) ; -- device data as address for RAM
datain : in std_logic_vector (18 downto 0); -- RAM data out
clk : in std_logic; --Synchronous rising edge clock
cntr_value : in std_logic_vector (18 downto 0); --Number of pixels of the input image
ramwraddr : out std_logic_vector(7 downto 0); --Address where the updated histogram
--value must be [Link] to the grey value (from 0 to 255)
rst : in std_logic; --global reset
start_cntr : in std_logic; --triggers the start of the histogram calculation
wren : out std_logic; --write enable output for the ram containing the histogram
data_out : out std_logic_vector(18 downto 0); -- RAM data in
end_flag : out std_logic
);
end histogram;
architecture hlsm of histogram is
signal wr_addr, wr_addr1 : std_logic_vector(7 downto 0);
signal pre_cntr, next_cntr, pre_dout, dout : std_logic_vector(18 downto 0); -- count no. of
--samples for which the histogram is to be computed.
signal addr, pre_addr : std_logic_vector(7 downto 0);
signal end_flag_signal, wren_signal, wren_next, wren_next1, wren_next2, addrpreaddr :
std_logic;

begin
addr <= addrin;
process(clk,rst)
begin
if(clk'event and clk = '1') then
if(rst = '1' or start_cntr='1') then --restart all the procedure
pre_cntr <= (others => '0');
wren_next1 <= '0';
wren_next <= '0';
wren_signal <= '0';
wren <= '0';
pre_addr <= (others=>'0');
addrpreaddr <= '0';
wr_addr1 <= (others => '0');
wr_addr <= (others => '0');
end_flag_signal <= '0';
else
pre_cntr <= next_cntr;
wren_next1 <= wren_next2;
wren_next <= wren_next1; --delay write enable changes to sync it
--with the output of valid values
wren <= wren_next;
wren_signal <= wren_next;
pre_addr <= addrin; --store current pixel gray value and its associated
--counter for use if next pixel's gray value is equal
pre_dout <= dout;
if wren_signal='1' and wren_next='0' then
end_flag_signal <= '1';
else

end_flag_signal <= end_flag_signal;
end if;
if (addr=pre_addr) then --if the gray value is the same as in the
--previous clock, load pre_dout instead of the RAM data
addrpreaddr <= '1';
else
addrpreaddr <= '0';
end if;
end if;

wr_addr1 <= addr;
wr_addr <= wr_addr1; -- delay write address by 2 clock cycles
end if;
end process;
process(datain, addrpreaddr, rst, pre_cntr, cntr_value, pre_dout, wr_addr)
begin
if((pre_cntr >= cntr_value)) then --finish if the internal counter reaches the total
--amount of pixels in the photo
next_cntr <= pre_cntr;
wren_next2 <= '0';
else --else: keep calculating and writing.
wren_next2 <= '1';
next_cntr <= pre_cntr + '1';
end if;
if(rst = '1') then
dout <= (others => '0');
else
if(datain = "111111111111111110") then -- prevent overflow
dout <= datain;
elsif addrpreaddr='1' then --See previous process for addrpreaddr's "if/else"
--functionality
dout <= pre_dout + 1; --the updated value is not yet available in the RAM
else
dout <= datain + '1';
end if;
end if;
ramwraddr <= wr_addr;
data_out <= dout;
end process;
end_flag <= end_flag_signal;
end hlsm;
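The 2013/05/03 fix noted in the header concerns runs of consecutive pixels with the same grey level: the RAM read returns the count before the previous increment has been written back, so when `addrpreaddr` flags two consecutive equal grey levels the module increments `pre_dout` instead of the RAM output. When verifying this in simulation, it helps to compare the histogram RAM contents after `end_flag` rises against a plain behavioural count of a test image that contains such runs. The sketch below is one way to write that reference; `histo_ref_pkg`, `grey`, `pixel_array`, `histo_array` and `histo_ref` are hypothetical names, and the model deliberately ignores the pipeline, the forwarding path and the overflow guard on `datain`.

```vhdl
-- Hypothetical golden model (not part of the original design): a plain
-- behavioural histogram used to check the histogram RAM contents in simulation.
package histo_ref_pkg is
  subtype grey is natural range 0 to 255;
  type pixel_array is array (natural range <>) of grey;
  type histo_array is array (grey) of natural;
  function histo_ref(pixels : pixel_array) return histo_array;
end package histo_ref_pkg;

package body histo_ref_pkg is
  function histo_ref(pixels : pixel_array) return histo_array is
    variable h : histo_array := (others => 0);
  begin
    -- One increment per pixel; runs of equal grey levels need no special
    -- handling here, unlike the pipelined read-modify-write in hardware.
    for i in pixels'range loop
      h(pixels(i)) := h(pixels(i)) + 1;
    end loop;
    return h;
  end function histo_ref;
end package body histo_ref_pkg;
```

A pixel sequence such as 3, 3, 3, 7, 3 exercises exactly the consecutive-equal case and should leave 4 counts at address 3 and 1 count at address 7.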

histogram_wrapper_int2.vhd
--Adds transformation function to complete CLHE functionality.
--2013/05/04--Comments added for clarity and future reference.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity histogram_wrapper is
port(
--global clock signal, active with its rising edge
clk: in std_logic;
--reset signal, synchronous and active high
reset: in std_logic;
--number of pixels of the image
cntr_value: in std_logic_vector(18 downto 0);
--Trigger to start the histogram generation
pulse_start_input: in std_logic;
--Output of the histogram data
--histogram_out: out std_logic_vector(17 downto 0);
--im_addra : out std_logic_vector(17 downto 0);
--enable : in std_logic;
im_douta : in std_logic_vector(7 downto 0);
histo_dina : out std_logic_vector(18 downto 0);
histo_addra : out std_logic_vector(7 downto 0);
histo_wea : out std_logic_vector(0 downto 0);
histo_addrb : out std_logic_vector(7 downto 0);
histo_doutb : in std_logic_vector(18 downto 0);
--histo_rstb : out std_logic;
end_flag : out std_logic
);
end histogram_wrapper;
architecture wrapper of histogram_wrapper is
signal device_data : std_logic_vector(7 downto 0); --current pixel value
--signal sel_data_input : std_logic; --selector between histogram generation/reading modes
signal ram_wr_addr : std_logic_vector(7 downto 0); --address to be accessed in the RAM
--containing the histogram
signal wren : std_logic_vector(0 downto 0); --histogram ram write enable
signal dataout : std_logic_vector(18 downto 0); --output of the histogram counters values
--signal pulse_out, pulse_out_2, pulse_out_3: std_logic;
signal pulse_out, rstb, end_flag_signal : std_logic;

--signal image_addr : std_logic_vector(3 downto 0);


signal image_addr : unsigned(17 downto 0);
signal histo_out : std_logic_vector(18 downto 0);

constant IMAGE_PIXELS : integer :=262143;

component histogram --computes the histogram


port ( addrin : in std_logic_vector(7 downto 0) ; -- device data as address for RAM
datain : in std_logic_vector (18 downto 0); -- RAM data out
clk : in std_logic;
cntr_value : in std_logic_vector (18 downto 0);
ramwraddr : out std_logic_vector( 7 downto 0);
--rstcntr : in std_logic;
rst : in std_logic;
start_cntr : in std_logic;
wren : out std_logic;
data_out : out std_logic_vector(18 downto 0); -- RAM data in
end_flag : out std_logic
);
end component;

begin
--image_rom, histogram_generator and histogram_ram are interconnected according to the design
--principle proposed in:
--[Link]
--However, some functionality was simplified or removed because it was not needed.
histogram_generator : histogram
port map(
addrin => device_data,
datain => histo_out,
clk => clk,
cntr_value => cntr_value,
ramwraddr => ram_wr_addr,
rst => reset,
start_cntr => pulse_out,
wren => wren(0),
data_out => dataout,
end_flag => end_flag_signal
);

process(clk) --Process containing an address counter to read the image in the
--ROM memory sequentially and compute its histogram
begin
if (CLK'EVENT AND CLK = '1') then
--if reset = '1' or pulse_out='1' then
--image_addr <= to_unsigned(0, 18);
--else
--image_addr <= image_addr + 1;
--end if;
if reset = '1' then
pulse_out <= '0';
else
pulse_out<=pulse_start_input;
end if;
end if;
end process;
--histogram_out<=histo_out;
--rstb <= reset or pulse_out;
--im_addra <= std_logic_vector(image_addr);
device_data <= im_douta;
histo_dina <= dataout;
histo_addra <= ram_wr_addr;
histo_wea <= wren;
--histo_rstb <= rstb;
histo_addrb <= device_data;
histo_out <= histo_doutb;
end_flag <= end_flag_signal;
--end_flag <= '0';
end wrapper;

[Link]

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity main is
port (
clk : in std_logic;
rst : in std_logic;
start_cntr : in std_logic;
end_flag : out std_logic; --marks the end of the histogram calculation
numpixels : in unsigned(18 downto 0);
x_size : in unsigned(9 downto 0);
y_size : in unsigned(9 downto 0);
clip_limit : in unsigned(6 downto 0);
limit1_t : in unsigned(7 downto 0);
limit1_b : in unsigned(7 downto 0);
limit2_t : in unsigned(7 downto 0);
limit2_b : in unsigned(7 downto 0)
);
end main;
architecture system of main is
signal rom_addrb, binary_addra, rom_addra, clahe_addra, binary_addrb : std_logic_vector(17
downto 0);
signal binary1_dina, binary2_dina, binary_wea, clahe_wea, binary1_doutb, binary2_doutb :
std_logic_vector(0 downto 0);
signal rom_doutb, rom_douta, clahe_dina, clahe_doutb : std_logic_vector(7 downto 0);
signal start_filters, end_flag_masks, end_flag_clahe : std_logic;
signal x_size2 : unsigned(10 downto 0);
component clahe_ram_dual
port (
clka: IN std_logic;
wea: IN std_logic_VECTOR(0 downto 0);
addra: IN std_logic_VECTOR(17 downto 0);
dina: IN std_logic_VECTOR(7 downto 0);
clkb: IN std_logic;
addrb: IN std_logic_VECTOR(17 downto 0);
doutb: OUT std_logic_VECTOR(7 downto 0));
end component;
component prova_grisa_rom3 --contains the source image
port (
clka: IN std_logic;
addra: IN std_logic_VECTOR(17 downto 0);
douta: OUT std_logic_VECTOR(7 downto 0);
clkb: IN std_logic;
addrb: IN std_logic_VECTOR(17 downto 0);
doutb: OUT std_logic_VECTOR(7 downto 0));
end component;

component binary_ram_dual
port (
clka: IN std_logic;
wea: IN std_logic_VECTOR(0 downto 0);
addra: IN std_logic_VECTOR(17 downto 0);
dina: IN std_logic_VECTOR(0 downto 0);
clkb: IN std_logic;
addrb: IN std_logic_VECTOR(17 downto 0);
doutb: OUT std_logic_VECTOR(0 downto 0));
end component;
component clahe is
port (

clk : in std_logic;
rst : in std_logic;
start_cntr : in std_logic;
end_flag : out std_logic; --marks the end of the histogram calculation
numpixels : in unsigned(18 downto 0);
x_size : in unsigned(9 downto 0);
y_size : in unsigned(9 downto 0);
clip_limit : in unsigned(6 downto 0);
rom_addra : out std_logic_vector(17 downto 0);
rom_douta : in std_logic_vector(7 downto 0);
ram_wea : out std_logic_vector(0 downto 0);
ram_addra : out std_logic_vector(17 downto 0);
ram_dina : out std_logic_vector(7 downto 0)
);
end component;

component mask_generator --binarizes the image to generate the two masks
port(
--global clock signal, active with its rising edge
clk: in std_logic;
--reset signal, synchronous and active high
reset: in std_logic;
--Trigger to start the generation
pulse_start_input: in std_logic;
limit1_t : in unsigned(7 downto 0);
limit1_b : in unsigned(7 downto 0);
limit2_t : in unsigned(7 downto 0);
limit2_b : in unsigned(7 downto 0);
--Output of the data
--dataout : out std_logic_vector(0 downto 0);
--memout : out std_logic_vector(0 downto 0);
end_flag : out std_logic;
rom_addrb : out std_logic_vector(17 downto 0);
rom_doutb : in std_logic_vector(7 downto 0);
im_width : in unsigned(9 downto 0);
numpixels : in unsigned(18 downto 0);
binary_wea : out std_logic_vector(0 downto 0);
binary_addra : out std_logic_vector(17 downto 0);
binary1_dina : out std_logic_vector(0 downto 0);
binary2_dina : out std_logic_vector(0 downto 0)
);
end component;
component filter_testbench is
port(
--global clock signal, active with its rising edge
clk: in std_logic;
--reset signal, synchronous and active high
reset: in std_logic;
--Trigger to start the generation
pulse_start_input: in std_logic;
--Output of the data
dataout : out std_logic_vector(7 downto 0);
memout : out std_logic_vector(7 downto 0);
end_flag : out std_logic;
--addrb : in std_logic_vector(17 downto 0);
--doutb : out std_logic_vector(7 downto 0);
im_width : in unsigned(10 downto 0);
numpixels : in unsigned(18 downto 0);
--ram_wea : out std_logic_vector(0 downto 0);
--ram_addra : out std_logic_vector(17 downto 0);
--ram_dina : out std_logic_vector(7 downto 0);
b1_doutb : in std_logic_vector(0 downto 0);
b2_doutb : in std_logic_vector(0 downto 0);
input_ram_addr : out std_logic_vector(17 downto 0);
clahe_ram_doutb : in std_logic_vector(7 downto 0)
);
end component;

begin
clahe_generator : clahe

port map (
clk => clk,
rst => rst,
start_cntr => start_cntr,
end_flag => end_flag_clahe,
numpixels => numpixels,
x_size => x_size,
y_size => y_size,
clip_limit => clip_limit,
rom_addra => rom_addra,
rom_douta => rom_douta,
ram_wea => clahe_wea,
ram_addra => clahe_addra,
ram_dina => clahe_dina
);
image_rom : prova_grisa_rom3
port map (
clka => clk,
addra => rom_addra,
douta => rom_douta,
clkb => clk,
addrb => rom_addrb,
doutb => rom_doutb);
bram1 : binary_ram_dual
port map (
clka => clk,
wea => binary_wea,
addra => binary_addra,
dina => binary1_dina,
clkb => clk,
addrb => binary_addrb,
doutb => binary1_doutb);
bram2 : binary_ram_dual
port map (
clka => clk,
wea => binary_wea,
addra => binary_addra,
dina => binary2_dina,
clkb => clk,
addrb => binary_addrb,
doutb => binary2_doutb);

binarizer : mask_generator
port map(
clk => clk,
reset => rst,
pulse_start_input => start_cntr,
limit1_t => limit1_t,
limit1_b => limit1_b,
limit2_t => limit2_t,
limit2_b => limit2_b,
end_flag => end_flag_masks,
rom_addrb => rom_addrb,
rom_doutb => rom_doutb,
im_width => x_size,
numpixels => numpixels,
binary_wea => binary_wea,
binary_addra => binary_addra,
binary1_dina => binary1_dina,
binary2_dina => binary2_dina
);
output_clahe : clahe_ram_dual
port map (
clka => clk,
wea => clahe_wea,
addra => clahe_addra,
dina => clahe_dina,
clkb => clk,
addrb => binary_addrb,
doutb => clahe_doutb);

filters : filter_testbench
port map(
--global clock signal, active with its rising edge
clk => clk,
--reset signal, synchronous and active high
reset => rst,
--Trigger to start the generation
pulse_start_input => start_filters,
--Output of the data
dataout => open,
memout => open,
end_flag => end_flag,
--addrb : in std_logic_vector(17 downto 0);
--doutb : out std_logic_vector(7 downto 0);
im_width => x_size2,
numpixels => numpixels,
--ram_wea : out std_logic_vector(0 downto 0);
--ram_addra : out std_logic_vector(17 downto 0);
--ram_dina : out std_logic_vector(7 downto 0);
b1_doutb => binary1_doutb,
b2_doutb => binary2_doutb,
input_ram_addr => binary_addrb,
clahe_ram_doutb => clahe_doutb
);

start_filters <= end_flag_clahe and end_flag_masks;


x_size2 <= resize(x_size,11);

end system;

median_filter2.vhd
------------------------------------------------------------------------
-- Contrast enhancement algorithm with noise removal
-- 2nd level of hierarchy - smooth_filter
-- Original: Núria Orduña
-- Modified by: Roger Olivé
-- Concordia University
-- 2012-2013
------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
--use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.NUMERIC_STD.ALL;

entity median_filter is
port ( clk, clearn : in std_logic;
su_flag : in std_logic;
smooth_filter_in : in std_logic_vector(7 downto 0);
smooth_filter_out : out std_logic_vector (7 downto 0);
set_up_flag : out std_logic;
im_width : in unsigned(10 downto 0);
numpixels : in unsigned(18 downto 0)
);
end median_filter;

architecture behavior of median_filter is

component filter_fifo
port (
clk: IN std_logic;
rst: IN std_logic;
din: IN std_logic_VECTOR(7 downto 0);
wr_en: IN std_logic;
rd_en: IN std_logic;
dout: OUT std_logic_VECTOR(7 downto 0);
full: OUT std_logic;
empty: OUT std_logic;
data_count: OUT std_logic_VECTOR(8 downto 0));
end component;
------------------------------------------------------------------------
-- Signal Declarations
------------------------------------------------------------------------

type data_win is array (0 to 4) of unsigned (7 downto 0);


type median_vector is array (0 to 7) of unsigned (7 downto 0);
type final_median_vector is array (0 to 2) of unsigned (7 downto 0);
type center_pixel_wait is array (0 to 5) of unsigned (7 downto 0);
signal wc : center_pixel_wait;
signal wf0, wf1, wf2 : final_median_vector;
signal wd0, wd1, wd2, wd3, wd4, wd5, wd6, wo0, wo1, wo2, wo3, wo4, wo5, wo6 : median_vector;
signal wd, wo : unsigned(7 downto 0);
signal row1 : data_win;
signal row2 : data_win;
signal row3 : data_win;
signal row4 : data_win;
signal row5 : data_win;
signal data_in1 : unsigned (7 downto 0);
signal data_out1 : std_logic_vector (7 downto 0);
signal data_in2 : unsigned (7 downto 0);
signal data_out2 : std_logic_vector (7 downto 0);
signal data_in3 : unsigned (7 downto 0);
signal data_out3 : std_logic_vector (7 downto 0);
signal data_in4 : unsigned (7 downto 0);
signal data_out4 : std_logic_vector (7 downto 0);
signal t_setup : unsigned (18 downto 0);
signal activated : std_logic;

signal data_count1, data_count2, data_count3, data_count4 : std_logic_vector(8 downto 0);
signal fifo_size : unsigned(10 downto 0);
signal wr_en, rd_en1, rd_en2, rd_en3, rd_en4, rst : std_logic;

begin
fifo1 : filter_fifo
port map (
clk => clk,
rst => rst,
din => std_logic_vector(data_in1),
wr_en => wr_en,
rd_en => rd_en1,
dout => data_out1,
full => open,
empty => open,
data_count => data_count1);
fifo2 : filter_fifo
port map (
clk => clk,
rst => rst,
din => std_logic_vector(data_in2),
wr_en => wr_en,
rd_en => rd_en2,
dout => data_out2,
full => open,
empty => open,
data_count => data_count2);
fifo3 : filter_fifo
port map (
clk => clk,
rst => rst,
din => std_logic_vector(data_in3),
wr_en => wr_en,
rd_en => rd_en3,
dout => data_out3,
full => open,
empty => open,
data_count => data_count3);
fifo4 : filter_fifo
port map (
clk => clk,
rst => rst,
din => std_logic_vector(data_in4),
wr_en => wr_en,
rd_en => rd_en4,
dout => data_out4,
full => open,
empty => open,
data_count => data_count4);

------------------------------------------------------------------------
-- Module Implementation
------------------------------------------------------------------------

process (clk)
begin
if (clk'event and clk = '1') then
if (clearn = '0') then
t_setup <= (others => '0');
else
if su_flag = '1' and t_setup < (numpixels+im_width*2+3+9) then
t_setup <= t_setup + 1;
end if;
end if;
end if;
end process;

process (clk)
variable sync_cnt : integer range 0 to 6 := 0;
begin

if (clk'event and clk = '1') then --initialization
if (clearn = '0') then
set_up_flag <= '0';
activated <= '0';
for j in 0 to 4 loop -- fifo reset
row1(j) <= (others => '0');
row2(j) <= (others => '0');
row3(j) <= (others => '0');
row4(j) <= (others => '0');
row5(j) <= (others => '0');
end loop;
--wd <= (others => '0');
--wo <= (others => '0');
for m in 0 to 7 loop
wd6(m) <= (others => '0');
wd1(m) <= (others => '0');
wd2(m) <= (others => '0');
wd3(m) <= (others => '0');
wd4(m) <= (others => '0');
wd5(m) <= (others => '0');
wo6(m) <= (others => '0');
wo1(m) <= (others => '0');
wo2(m) <= (others => '0');
wo3(m) <= (others => '0');
wo4(m) <= (others => '0');
wo5(m) <= (others => '0');

end loop;
for n in 0 to 2 loop
wf0(n) <= (others => '0');
wf1(n) <= (others => '0');
wf2(n) <= (others => '0');
end loop;

for o in 0 to 5 loop
wc(o) <= (others => '0');
end loop;
smooth_filter_out <= (others => '0');
data_in1 <= (others => '0');
data_in2 <= (others => '0');
data_in3 <= (others => '0');
data_in4 <= (others => '0');
rd_en1 <= '0';
rd_en2 <= '0';
rd_en3 <= '0';
rd_en4 <= '0';
elsif su_flag = '1' then --Shifts all the registers and fifo's data one position
row1(0) <= unsigned(smooth_filter_in);
row1(1 to 4) <= row1(0 to 3);
data_in1 <= row1(4);
if (unsigned(data_count1) >= fifo_size) then --Maintain a constant amount of data
--in the fifo components depending on the image size
rd_en1 <= '1';
row2(0) <= unsigned(data_out1);
else
rd_en1 <= '0';
row2(0) <= (others=>'0');
end if;
row2(1 to 4) <= row2(0 to 3);
data_in2 <= row2(4);
if (unsigned(data_count2) >= fifo_size) then
rd_en2 <= '1';
row3(0) <= unsigned(data_out2);
else
rd_en2 <= '0';
row3(0) <= (others=>'0');
end if;
row3(1 to 4) <= row3(0 to 3);
data_in3 <= row3(4);
if (unsigned(data_count3) >= fifo_size) then
rd_en3 <= '1';
row4(0) <= unsigned(data_out3);
else

rd_en3 <= '0';
row4(0) <= (others=>'0');
end if;
row4(1 to 4) <= row4(0 to 3);
data_in4 <= row4(4);
if (unsigned(data_count4) >= fifo_size) then
rd_en4 <= '1';
row5(0) <= unsigned(data_out4);
else
rd_en4 <= '0';
row5(0) <= (others=>'0');
end if;
row5(1 to 4) <= row5(0 to 3);
if t_setup >= (im_width*2+3+9) and t_setup < (numpixels+im_width*2+2+9) then -- actually +3-1
set_up_flag <= '1';
activated <= '1'; --Apply the filtering operation for the current pixel and
--its window.
sync_cnt := 0;
smooth_filter_out <= std_logic_vector(wf2(1));
else
if activated = '1' then
--if sync_cnt < 6 then
-- sync_cnt := sync_cnt + 1;
--else
--set_up_flag <= '0'; --Finish
--end if;
end if;

end if;
wc(0) <= row3(2);
for o in 0 to 4 loop
wc(o+1) <= wc(o);
end loop;
--Calculation of the wd kernel median
if (wd0(0) >= wd0(7)) then
wd1(0) <= wd0(7);
wd1(7) <= wd0(0);
else
wd1(0) <= wd0(0);
wd1(7) <= wd0(7);
end if;
if (wd0(1) >= wd0(6)) then
wd1(1) <= wd0(6);
wd1(6) <= wd0(1);
else
wd1(1) <= wd0(1);
wd1(6) <= wd0(6);
end if;
if (wd0(2) >= wd0(5)) then
wd1(2) <= wd0(5);
wd1(5) <= wd0(2);
else
wd1(2) <= wd0(2);
wd1(5) <= wd0(5);
end if;
if (wd0(3) >= wd0(4)) then
wd1(3) <= wd0(4);
wd1(4) <= wd0(3);
else
wd1(3) <= wd0(3);
wd1(4) <= wd0(4);
end if;
--Stage 2
if (wd1(0) >= wd1(3)) then
wd2(0) <= wd1(3);
wd2(3) <= wd1(0);
else
wd2(0) <= wd1(0);
wd2(3) <= wd1(3);
end if;
if (wd1(4) >= wd1(7)) then
wd2(4) <= wd1(7);
wd2(7) <= wd1(4);
else
wd2(4) <= wd1(4);
wd2(7) <= wd1(7);
end if;
if (wd1(1) >= wd1(2)) then
wd2(1) <= wd1(2);
wd2(2) <= wd1(1);
else
wd2(1) <= wd1(1);
wd2(2) <= wd1(2);
end if;
if (wd1(5) >= wd1(6)) then
wd2(5) <= wd1(6);
wd2(6) <= wd1(5);
else
wd2(5) <= wd1(5);
wd2(6) <= wd1(6);
end if;
--Stage 3
if (wd2(0) >= wd2(1)) then
wd3(0) <= wd2(1);
wd3(1) <= wd2(0);
else
wd3(0) <= wd2(0);
wd3(1) <= wd2(1);
end if;
if (wd2(2) >= wd2(3)) then
wd3(2) <= wd2(3);
wd3(3) <= wd2(2);
else
wd3(2) <= wd2(2);
wd3(3) <= wd2(3);
end if;
if (wd2(4) >= wd2(5)) then
wd3(4) <= wd2(5);
wd3(5) <= wd2(4);
else
wd3(4) <= wd2(4);
wd3(5) <= wd2(5);
end if;
if (wd2(6) >= wd2(7)) then
wd3(6) <= wd2(7);
wd3(7) <= wd2(6);
else
wd3(6) <= wd2(6);
wd3(7) <= wd2(7);
end if;
--Stage 4
wd4(0) <= wd3(0);
wd4(1) <= wd3(1);
wd4(6) <= wd3(6);
wd4(7) <= wd3(7);
if (wd3(2) >= wd3(4)) then
wd4(2) <= wd3(4);
wd4(4) <= wd3(2);
else
wd4(2) <= wd3(2);
wd4(4) <= wd3(4);
end if;
if (wd3(3) >= wd3(5)) then
wd4(3) <= wd3(5);
wd4(5) <= wd3(3);
else
wd4(3) <= wd3(3);
wd4(5) <= wd3(5);
end if;
--Stage 5
wd5(0) <= wd4(0);
wd5(7) <= wd4(7);
if (wd4(1) >= wd4(2)) then
wd5(1) <= wd4(2);
wd5(2) <= wd4(1);
else
wd5(1) <= wd4(1);
wd5(2) <= wd4(2);
end if;
if (wd4(3) >= wd4(4)) then
wd5(3) <= wd4(4);
wd5(4) <= wd4(3);
else
wd5(3) <= wd4(3);
wd5(4) <= wd4(4);
end if;
if (wd4(5) >= wd4(6)) then
wd5(5) <= wd4(6);
wd5(6) <= wd4(5);
else
wd5(5) <= wd4(5);
wd5(6) <= wd4(6);
end if;
--Stage 6
wd6(0) <= wd5(0);
wd6(1) <= wd5(1);
wd6(6) <= wd5(6);
wd6(7) <= wd5(7);

if (wd5(2) >= wd5(3)) then
  wd6(2) <= wd5(3);
  wd6(3) <= wd5(2);
else
  wd6(2) <= wd5(2);
  wd6(3) <= wd5(3);
end if;

if (wd5(4) >= wd5(5)) then
  wd6(4) <= wd5(5);
  wd6(5) <= wd5(4);
else
  wd6(4) <= wd5(4);
  wd6(5) <= wd5(5);
end if;

--Calculation of the wo kernel median
--Stage 1
if (wo0(0) >= wo0(7)) then
  wo1(0) <= wo0(7);
  wo1(7) <= wo0(0);
else
  wo1(0) <= wo0(0);
  wo1(7) <= wo0(7);
end if;

if (wo0(1) >= wo0(6)) then
  wo1(1) <= wo0(6);
  wo1(6) <= wo0(1);
else
  wo1(1) <= wo0(1);
  wo1(6) <= wo0(6);
end if;

if (wo0(2) >= wo0(5)) then
  wo1(2) <= wo0(5);
  wo1(5) <= wo0(2);
else
  wo1(2) <= wo0(2);
  wo1(5) <= wo0(5);
end if;

if (wo0(3) >= wo0(4)) then
  wo1(3) <= wo0(4);
  wo1(4) <= wo0(3);
else
  wo1(3) <= wo0(3);
  wo1(4) <= wo0(4);
end if;

--Stage 2
if (wo1(0) >= wo1(3)) then
  wo2(0) <= wo1(3);
  wo2(3) <= wo1(0);
else
  wo2(0) <= wo1(0);
  wo2(3) <= wo1(3);
end if;

if (wo1(4) >= wo1(7)) then
  wo2(4) <= wo1(7);
  wo2(7) <= wo1(4);
else
  wo2(4) <= wo1(4);
  wo2(7) <= wo1(7);
end if;

if (wo1(1) >= wo1(2)) then
  wo2(1) <= wo1(2);
  wo2(2) <= wo1(1);
else
  wo2(1) <= wo1(1);
  wo2(2) <= wo1(2);
end if;

if (wo1(5) >= wo1(6)) then
  wo2(5) <= wo1(6);
  wo2(6) <= wo1(5);
else
  wo2(5) <= wo1(5);
  wo2(6) <= wo1(6);
end if;

--Stage 3
if (wo2(0) >= wo2(1)) then
  wo3(0) <= wo2(1);
  wo3(1) <= wo2(0);
else
  wo3(0) <= wo2(0);
  wo3(1) <= wo2(1);
end if;

if (wo2(2) >= wo2(3)) then
  wo3(2) <= wo2(3);
  wo3(3) <= wo2(2);
else
  wo3(2) <= wo2(2);
  wo3(3) <= wo2(3);
end if;

if (wo2(4) >= wo2(5)) then
  wo3(4) <= wo2(5);
  wo3(5) <= wo2(4);
else
  wo3(4) <= wo2(4);
  wo3(5) <= wo2(5);
end if;

if (wo2(6) >= wo2(7)) then
  wo3(6) <= wo2(7);
  wo3(7) <= wo2(6);
else
  wo3(6) <= wo2(6);
  wo3(7) <= wo2(7);
end if;

--Stage 4
wo4(0) <= wo3(0);
wo4(1) <= wo3(1);
wo4(6) <= wo3(6);
wo4(7) <= wo3(7);

if (wo3(2) >= wo3(4)) then
  wo4(2) <= wo3(4);
  wo4(4) <= wo3(2);
else
  wo4(2) <= wo3(2);
  wo4(4) <= wo3(4);
end if;

if (wo3(3) >= wo3(5)) then
  wo4(3) <= wo3(5);
  wo4(5) <= wo3(3);
else
  wo4(3) <= wo3(3);
  wo4(5) <= wo3(5);
end if;

--Stage 5
wo5(0) <= wo4(0);
wo5(7) <= wo4(7);
if (wo4(1) >= wo4(2)) then
  wo5(1) <= wo4(2);
  wo5(2) <= wo4(1);
else
  wo5(1) <= wo4(1);
  wo5(2) <= wo4(2);
end if;

if (wo4(3) >= wo4(4)) then
  wo5(3) <= wo4(4);
  wo5(4) <= wo4(3);
else
  wo5(3) <= wo4(3);
  wo5(4) <= wo4(4);
end if;

if (wo4(5) >= wo4(6)) then
  wo5(5) <= wo4(6);
  wo5(6) <= wo4(5);
else
  wo5(5) <= wo4(5);
  wo5(6) <= wo4(6);
end if;

--Stage 6
wo6(0) <= wo5(0);
wo6(1) <= wo5(1);
wo6(6) <= wo5(6);
wo6(7) <= wo5(7);

if (wo5(2) >= wo5(3)) then
  wo6(2) <= wo5(3);
  wo6(3) <= wo5(2);
else
  wo6(2) <= wo5(2);
  wo6(3) <= wo5(3);
end if;

if (wo5(4) >= wo5(5)) then
  wo6(4) <= wo5(5);
  wo6(5) <= wo5(4);
else
  wo6(4) <= wo5(4);
  wo6(5) <= wo5(5);
end if;

--Final median calculation
wf0(0) <= wc(5);
if (wo >= wd) then
  wf0(1) <= wd;
  wf0(2) <= wo;
else
  wf0(1) <= wo;
  wf0(2) <= wd;
end if;

wf1(2) <= wf0(2);
if (wf0(0) >= wf0(1)) then
  wf1(0) <= wf0(1);
  wf1(1) <= wf0(0);
else
  wf1(0) <= wf0(0);
  wf1(1) <= wf0(1);
end if;

if (wf1(1) >= wf1(2)) then
  wf2(2) <= wf1(1);
  wf2(1) <= wf1(2);
else
  wf2(1) <= wf1(1);
  wf2(2) <= wf1(2);
end if;

wf2(0) <= wf1(0);


end if;
end if;
end process;
wr_en <= su_flag;
fifo_size <= im_width - 8; --Added a -1 that was not initially planned
rst <= not(clearn);
wd0(0) <= row1(0);
wd0(1) <= row1(4);
wd0(2) <= row2(1);
wd0(3) <= row2(3);
wd0(4) <= row4(1);
wd0(5) <= row4(3);
wd0(6) <= row5(0);
wd0(7) <= row5(4);

wo0(0) <= row1(2);
wo0(1) <= row2(2);
wo0(2) <= row3(0);
wo0(3) <= row3(1);
wo0(4) <= row3(3);
wo0(5) <= row3(4);
wo0(6) <= row4(2);
wo0(7) <= row5(2);


wo <= resize(((resize(wo6(3),9)+wo6(4))/2),8);
wd <= resize(((resize(wd6(3),9)+wd6(4))/2),8);

end behavior;
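The hand-unrolled comparators above (wd1 to wd6 and wo1 to wo6) all follow one 19-comparator, six-stage network, which is easier to audit when it is written with a compare-exchange helper. The package below is a hypothetical behavioral sketch (the names median8_pkg, cmp_swap, median8 and median3 are not part of the project): it applies the same comparators in the same order and the same final averaging, but combinationally and without the per-stage registers, so it is meant as a testbench reference model rather than a replacement for the pipelined process.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical reference model of the wd/wo median network and the final median of three
package median8_pkg is
  subtype pixel_t is unsigned(7 downto 0);
  type pixel_vec8 is array (0 to 7) of pixel_t;
  function cmp_swap(v : pixel_vec8; i, j : natural) return pixel_vec8;
  function median8(v : pixel_vec8) return pixel_t;
  function median3(center, m1, m2 : pixel_t) return pixel_t;
end package median8_pkg;

package body median8_pkg is
  -- Compare-exchange: leaves the smaller value at index i and the larger at index j
  function cmp_swap(v : pixel_vec8; i, j : natural) return pixel_vec8 is
    variable r : pixel_vec8 := v;
  begin
    if v(i) >= v(j) then
      r(i) := v(j);
      r(j) := v(i);
    end if;
    return r;
  end function cmp_swap;

  function median8(v : pixel_vec8) return pixel_t is
    variable s : pixel_vec8 := v;
  begin
    -- Stage 1
    s := cmp_swap(s, 0, 7); s := cmp_swap(s, 1, 6);
    s := cmp_swap(s, 2, 5); s := cmp_swap(s, 3, 4);
    -- Stage 2
    s := cmp_swap(s, 0, 3); s := cmp_swap(s, 4, 7);
    s := cmp_swap(s, 1, 2); s := cmp_swap(s, 5, 6);
    -- Stage 3
    s := cmp_swap(s, 0, 1); s := cmp_swap(s, 2, 3);
    s := cmp_swap(s, 4, 5); s := cmp_swap(s, 6, 7);
    -- Stage 4
    s := cmp_swap(s, 2, 4); s := cmp_swap(s, 3, 5);
    -- Stage 5
    s := cmp_swap(s, 1, 2); s := cmp_swap(s, 3, 4); s := cmp_swap(s, 5, 6);
    -- Stage 6
    s := cmp_swap(s, 2, 3); s := cmp_swap(s, 4, 5);
    -- Even-length median: mean of the two middle positions, summed on 9 bits (as wo/wd)
    return resize((resize(s(3), 9) + s(4)) / 2, 8);
  end function median8;

  -- Median of the center pixel and the two kernel medians (wf0 to wf2 stages)
  function median3(center, m1, m2 : pixel_t) return pixel_t is
    variable lo, hi : pixel_t;
  begin
    -- Order the two kernel medians, then clamp the center pixel between them
    if m1 >= m2 then
      lo := m2;
      hi := m1;
    else
      lo := m1;
      hi := m2;
    end if;
    if center <= lo then
      return lo;
    elsif center >= hi then
      return hi;
    else
      return center;
    end if;
  end function median3;
end package body median8_pkg;
```

Because the comparators inside each stage touch disjoint indices, applying them one after another in a function gives the same result as the parallel hardware stage. This network does not always leave positions 3 and 4 in ascending order (the 0/1 input 1,0,0,1,1,0,0,1 ends as 0,0,0,1,0,1,1,1), but those two positions always hold the fourth and fifth smallest inputs, so averaging them still yields the median of the eight pixels.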


tiling_int3.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity image_tiling is
port ( romraddr : out std_logic_vector(17 downto 0) ; -- device data as address for RAM
datain : in std_logic_vector (7 downto 0); -- RAM data out
clk : in std_logic;
ramwraddr : out std_logic_vector(17 downto 0);
dataout : out std_logic_vector(7 downto 0); -- RAM data in
numx : out unsigned(2 downto 0);
numy : out unsigned(2 downto 0);
rst : in std_logic;
start_cntr : in std_logic;
wren : out std_logic;
active_flag : out std_logic;
end_flag : out std_logic; --marks the end of the distribution
numpixels : in unsigned(18 downto 0); --number of pixels of the histogram
tile_numpixels : out unsigned(18 downto 0);
x_size : in unsigned(9 downto 0);
y_size : in unsigned(9 downto 0);
xtile : out unsigned(9 downto 0);
ytile : out unsigned(9 downto 0)
);
end image_tiling;
architecture tiler of image_tiling is
--Signal list
signal x_pos, y_pos, x_pos_calc, y_pos_calc : unsigned(7 downto 0);
signal x_coord : unsigned(11 downto 0);
signal y_coord : unsigned(10 downto 0);
signal x_tile, x_tile1, y_tile : unsigned(7 downto 0);
signal x_tile2, x_tile3, x_tile4 : unsigned(11 downto 0);
signal start, pre_start, active : std_logic;
signal over_num : std_logic_vector(1 downto 0);
signal num_x : unsigned(3 downto 0);
signal num_y : unsigned(2 downto 0);
signal pixel_count, pixel_count2, pixel_count3 : unsigned(18 downto 0);
signal romraddr_signal : unsigned(17 downto 0);
signal wren1, wren2, finish, stop : std_logic;
signal x_condition, y_condition, y_condition1, y_condition2 : std_logic;
signal numpixels_tile : unsigned(18 downto 0);

--Component list?
begin
--Code (processes and concurrent statements)
--Process to read all the histogram and calculate the amount of absolute clipping

process(clk)
begin
if (CLK'EVENT AND CLK = '1') then --calculate tile size
if rst = '1' then
x_tile <= (others=>'0');
y_tile <= (others=>'0');
start <= '0';
pre_start <= '0';
else
x_tile <= resize(shift_right((x_size),3)+1,8);
--x_tile <= x_tile1;
y_tile <= resize(shift_right((y_size),3)+1,8);
start <= start_cntr and not(finish);
pre_start <= start;
end if;
end if;
if (CLK'EVENT AND CLK = '1') then


if rst = '1' then


active <= '0';
num_x <= (others =>'0');
num_y <= (others =>'0');
elsif start='1' and pre_start = '0' then
active <= '1';
if num_x = 8 then
num_x <= to_unsigned(1,4);
if num_y = 7 then
num_y <= to_unsigned(7,3);
else
num_y <= num_y+1;
end if;
else
num_x <= num_x+1;
end if;
elsif stop='1' then
active <= '0';
num_x <= num_x;
num_y <= num_y;
else
active <= active;
num_x <= num_x;
num_y <= num_y;
end if;
end if;
if (CLK'EVENT AND CLK = '1') then --romraddr section
if rst = '1' then
romraddr_signal <= (others=>'0');
dataout <= (others=>'0');
ramwraddr <= (others=>'0');
wren <= '0';
numpixels_tile <= (others=>'0');
--pre_active <= '0';
else
--romraddr <= x_size*(y_pos wren+ y_tile*num_y) + x_pos + num_x*x_tile;
romraddr_signal <= (resize((x_size*y_coord + x_coord),18));
dataout <= datain;
ramwraddr <= std_logic_vector(pixel_count3(17 downto 0));
wren <= wren1;
if active='1' then
numpixels_tile <= pixel_count;
else
numpixels_tile <= numpixels_tile;
end if;
--pre_active <= active;
end if;
end if;
if (CLK'EVENT AND CLK = '1') then --x_pos and y_pos section
if rst = '1' or active = '0' then
pixel_count <= (others => '0');
pixel_count2 <= (others => '0');
--pixel_count3 <= (others => '0');
x_pos <= (others => '0');
y_pos <= (others => '0');
finish <= '0';
--wren1 <= wren2; --This should be fixed to avoid glitches, so that it is set to zero on reset.
wren2 <= '0';
else
pixel_count <= pixel_count + 1;
pixel_count2 <= pixel_count;
--wren1 <= wren2;
wren2 <= '1';
if (x_condition='1') then
x_pos <= (others=>'0');
if (y_condition2 = '1' or y_condition1 = '1') then
y_pos <= (others=>'0');
--stop <= '1';
else
y_pos <= y_pos_calc;
--stop <= '0';


end if;
--if (y_condition1 = '1') then
--finish <= '1';
--else
--finish <= '0';
--end if;
else
--finish <= '0';
--stop <= '0';
x_pos <= x_pos_calc;
y_pos <= y_pos;
end if;
end if;
if rst='1' then
finish<='0';
else
if romraddr_signal>= (numpixels-1) then
finish<='1';
else
finish<=finish;
end if;
end if;

if rst = '1' then


wren1 <= '0'; --This should be fixed to avoid glitches, so that it is set to zero on reset.
else
wren1 <= wren2;
end if;

pixel_count3 <= pixel_count2;


end if;

end process;
x_pos_calc <= x_pos + 1;
y_pos_calc <= y_pos + 1;
x_coord <= x_pos + (num_x-1)*x_tile;
y_coord <= y_pos + num_y*y_tile;
x_condition <= '1' when (x_pos_calc >= x_tile or x_coord >= (x_size-1))
else '0';
y_condition1 <= '1' when (y_coord >= (y_size-1))
else '0';
y_condition2 <= '1' when (y_pos_calc >= y_tile)
else '0';
stop <= '1' when (x_condition = '1' and (y_condition1 = '1' or y_condition2='1'))
else '0';
active_flag <= active;
numx <= resize((num_x-1),3);
numy <= num_y;
tile_numpixels <= numpixels_tile;
romraddr <= std_logic_vector(romraddr_signal);
end_flag <= finish;
xtile <= resize(x_tile,10);
ytile <= resize(y_tile,10);
--debug
--x_tile1 <= resize(x_tile2,8);
--x_tile2 <= shift_right(x_tile3,2);
--x_tile3 <= shift_left((x_tile4+4),2)/8;
--x_tile4 <= resize(x_size,12);

end tiler;
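The address arithmetic of image_tiling can be summarized by a small behavioral model, which may be handy in a testbench for checking romraddr. The sketch below is an assumption-based model (tile_geom_pkg, tile_len and tile_pixel_addr are hypothetical names): it uses 0-based tile indices tx and ty, whereas the listing counts num_x from 1 and subtracts one, and it ignores both the pipeline latency and the early end of a tile row at the image border.

```vhdl
-- Hypothetical behavioral model of the image_tiling address arithmetic
package tile_geom_pkg is
  constant TILES_PER_SIDE : natural := 8;  -- 8 x 8 tile grid, as in image_tiling

  -- Tile edge length: shift_right(size, 3) + 1 in the listing
  function tile_len(size : natural) return natural;

  -- Linear address of pixel (x_pos, y_pos) inside tile (tx, ty), both 0-based
  function tile_pixel_addr(x_size, y_size, tx, ty, x_pos, y_pos : natural)
    return natural;
end package tile_geom_pkg;

package body tile_geom_pkg is
  function tile_len(size : natural) return natural is
  begin
    return size / TILES_PER_SIDE + 1;
  end function tile_len;

  function tile_pixel_addr(x_size, y_size, tx, ty, x_pos, y_pos : natural)
    return natural is
    variable x_coord, y_coord : natural;
  begin
    -- Same as x_coord and y_coord in the listing, with tx = num_x - 1 and ty = num_y
    x_coord := x_pos + tx * tile_len(x_size);
    y_coord := y_pos + ty * tile_len(y_size);
    -- Row-major address, as computed for romraddr_signal
    return x_size * y_coord + x_coord;
  end function tile_pixel_addr;
end package body tile_geom_pkg;
```

For instance, with x_size = 320 the model gives tile_len(320) = 41, and tile_pixel_addr(320, 240, 1, 0, 0, 0) = 41, the first pixel of the second tile in the top row. Since 8 × 41 = 328 exceeds 320, the last tile column is cut short; the listing handles this through x_condition (x_coord >= x_size-1), which the model does not check.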


transform_interp17.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity histogram_equalizer is
port ( histo_raddr : out std_logic_vector(7 downto 0) ; -- device data as address for RAM
histo_in_ul : in std_logic_vector (18 downto 0); -- histogram CDF value
histo_in_ur : in std_logic_vector (18 downto 0); -- histogram CDF value
histo_in_ll : in std_logic_vector (18 downto 0); -- histogram CDF value
histo_in_lr : in std_logic_vector (18 downto 0); -- histogram CDF value
rom_raddr : out std_logic_vector(17 downto 0); --image pixel address
rom_in : std_logic_vector(7 downto 0);--image pixel value
clk : in std_logic;
clhe_wraddr : out std_logic_vector( 17 downto 0); --Address for the transformed pixel to write
rst : in std_logic;
start_cntr : in std_logic; --Triggers the transformation operations
wren : out std_logic;
clhe_out : out std_logic_vector(7 downto 0); -- Output for transformed pixel value
end_flag : out std_logic; --marks the end of the histogram calculation
numpixels : in unsigned(18 downto 0);
im_width : in unsigned(9 downto 0);
im_height : in unsigned(9 downto 0);
x_size : in unsigned(9 downto 0);--Subimage
y_size : in unsigned(9 downto 0);--Subimage
ul_id : out unsigned(7 downto 0);
ur_id : out unsigned(7 downto 0);
ll_id : out unsigned(7 downto 0);
lr_id : out unsigned(7 downto 0);
numpixels_ul : in unsigned(18 downto 0); --number of pixels of the image
histo_min_ul : in unsigned(18 downto 0); --Lowest CDF value of the histogram
numpixels_ur : in unsigned(18 downto 0); --number of pixels of the image
histo_min_ur : in unsigned(18 downto 0); --Lowest CDF value of the histogram
numpixels_ll : in unsigned(18 downto 0); --number of pixels of the image
histo_min_ll : in unsigned(18 downto 0); --Lowest CDF value of the histogram
numpixels_lr : in unsigned(18 downto 0); --number of pixels of the image
histo_min_lr : in unsigned(18 downto 0) --Lowest CDF value of the histogram
);
end histogram_equalizer;
architecture transformation of histogram_equalizer is
signal ramwraddru1, ramwraddru2, ramwraddru3, ramwraddru4, im_raddr : unsigned(17 downto 0);
signal pixel, pixel_pre, transformed_ul, transformed_ur, transformed_ll, transformed_lr,
ul_id_pre : unsigned(7 downto 0);
signal start_add, start, pre_wren, pre_wren2, pre_wren3, pre_wren4, wren1 : std_logic;
signal x_pos, x_pos_pre1, x_pos_pre2, x_pos_pre3, x_pos_pre4, x_pos_pre5, y_pos, y_pos_pre1,
y_pos_pre2, y_pos_pre3, y_pos_pre4, y_pos_pre5 : signed(13 downto 0);
signal x_ref, x1, x1_pre1, x1_pre2, x2, x2_pre1, x2_pre2, y_ref, y1, y1_pre1, y1_pre2, y2,
y2_pre1, y2_pre2 : signed(13 downto 0);
signal num_x, num_x1, num_x2, num_y, num_y1, num_y2 : unsigned(3 downto 0);
signal numpixels2 : unsigned(18 downto 0);
signal numpixels_ul1, numpixels_ul2, histo_min_ul1, histo_min_ul2, numpixels_ur1,
numpixels_ur2, histo_min_ur1, histo_min_ur2, numpixels_ll1, numpixels_ll2, histo_min_ll1,
histo_min_ll2, numpixels_lr1, numpixels_lr2, histo_min_lr1, histo_min_lr2 : unsigned(18
downto 0);
begin

process(clk)
begin
if (CLK'EVENT AND CLK = '1') then
if rst = '1' then
start_add <='0';
else
start_add <= start;
end if;
------------------------------beginning of reading block------------------------------
if rst = '1' or start='1' then --Initialization of variables
im_raddr <= to_unsigned(0, 18);


clhe_wraddr <= (others => '0');
ramwraddru1 <= to_unsigned(0, 18);
ramwraddru2 <= to_unsigned(0, 18);
ramwraddru3 <= to_unsigned(0, 18);
ramwraddru4 <= to_unsigned(0, 18);
pixel <= to_unsigned(0, 8);
pixel_pre <= to_unsigned(0,8);
pre_wren <= '0';
pre_wren2 <= '0';
pre_wren3 <= '0';
pre_wren4 <= '0';
wren1 <= '0';
x_pos <= (others=>'0');
y_pos <= (others=>'0');
num_x <= (others=>'0');
num_y <= (others=>'0');
end_flag <= '0';
elsif unsigned(im_raddr) >= (numpixels2) then --Stop writing once the whole memory has been swept
im_raddr <= numpixels2(17 downto 0);
ramwraddru4 <= im_raddr;
ramwraddru3 <= ramwraddru4;
ramwraddru2 <= ramwraddru3; --align the write addresses and write enable with the output data
ramwraddru1 <= ramwraddru2;
clhe_wraddr <= std_logic_vector(ramwraddru1);
--clhe_wraddr <= std_logic_vector(im_raddr);
pixel <= unsigned(rom_in);
pixel_pre <= pixel;
pre_wren4 <= '0';
pre_wren3 <= pre_wren4;
pre_wren2 <= pre_wren3;
pre_wren <= pre_wren2;
wren1 <= pre_wren;
x_pos <= x_pos;
y_pos <= y_pos;
num_x <= num_x;
num_y <= num_y;
end_flag <= '1';
else
im_raddr <= im_raddr + 1; --Sweep the addresses
ramwraddru4 <= im_raddr;
ramwraddru3 <= ramwraddru4;
ramwraddru2 <= ramwraddru3; --align the write addresses and write enable with the output data
ramwraddru1 <= ramwraddru2;
--clhe_wraddr <= std_logic_vector(im_raddr);
clhe_wraddr <= std_logic_vector(ramwraddru1);
pixel <= unsigned(rom_in);
pixel_pre <= pixel;
pre_wren4 <= '1';
pre_wren3 <= '1';
pre_wren2 <= pre_wren3;
pre_wren <= pre_wren2;
wren1 <= pre_wren;
if unsigned(x_pos) < (im_width-1) then
x_pos <= x_pos + 1;
y_pos <= y_pos;
else
x_pos <= (others=>'0');
y_pos <= y_pos + 1;
end if;
if unsigned(y_pos_pre1) >=
(shift_right(shift_left(((resize(y_size,11)/2)+1),1),1) + y_size*num_y-1) then
num_y <= num_y + 1;
else
num_y <= num_y;
end if;
if unsigned(x_pos_pre2) >= (im_width-1) then
num_x <= (others=>'0');
elsif unsigned(x_pos) >=
(shift_right(shift_left(((resize(x_size,11)/2)+1),1),1)+num_x*x_size) then
num_x <= num_x+1;


else
num_x <= num_x;
end if;
end_flag <= '0';
end if;
if rst='1' then
y_pos_pre1 <= (others=>'0');
y_pos_pre2 <= (others=>'0');
y_pos_pre3 <= (others=>'0');
y_pos_pre4 <= (others=>'0');
y_pos_pre5 <= (others=>'0');
x_pos_pre1 <= (others=>'0');
x_pos_pre2 <= (others=>'0');
x_pos_pre3 <= (others=>'0');
x_pos_pre4 <= (others=>'0');
x_pos_pre5 <= (others=>'0');

x1_pre1 <= (others=>'0');
x1_pre2 <= (others=>'0');
x2_pre1 <= (others=>'0');
x2_pre2 <= (others=>'0');
y1_pre1 <= (others=>'0');
y1_pre2 <= (others=>'0');
y2_pre1 <= (others=>'0');
y2_pre2 <= (others=>'0');
num_x1 <= (others=>'0');
num_x2 <= (others=>'0');
num_y1 <= (others=>'0');
num_y2 <= (others=>'0');

transformed_ul <= (others=>'0');
transformed_ur <= (others=>'0');
transformed_ll <= (others=>'0');
transformed_lr <= (others=>'0');

else
y_pos_pre1 <= y_pos;
y_pos_pre2 <= y_pos_pre1;
y_pos_pre3 <= y_pos_pre2;
y_pos_pre4 <= y_pos_pre3;
y_pos_pre5 <= y_pos_pre4;
x_pos_pre1 <= x_pos;
x_pos_pre2 <= x_pos_pre1;
x_pos_pre3 <= x_pos_pre2;
x_pos_pre4 <= x_pos_pre3;
x_pos_pre5<= x_pos_pre4;

x1_pre1 <= x1;
x1_pre2 <= x1_pre1;
x2_pre1 <= x2;
x2_pre2 <= x2_pre1;
y1_pre1 <= y1;
y1_pre2 <= y1_pre1;
y2_pre1 <= y2;
y2_pre2 <= y2_pre1;
num_x1 <= num_x;
num_x2 <= num_x1;
num_y1 <= num_y;
num_y2 <= num_y1;

transformed_ul <= resize(((unsigned(histo_in_ul)-unsigned(histo_min_ul))*to_unsigned(255,8))/(unsigned(numpixels_ul)-unsigned(histo_min_ul)), 8);
transformed_ur <= resize(((unsigned(histo_in_ur)-unsigned(histo_min_ur))*to_unsigned(255,8))/(unsigned(numpixels_ur)-unsigned(histo_min_ur)), 8);
transformed_ll <= resize(((unsigned(histo_in_ll)-unsigned(histo_min_ll))*to_unsigned(255,8))/(unsigned(numpixels_ll)-unsigned(histo_min_ll)), 8);
transformed_lr <= resize(((unsigned(histo_in_lr)-unsigned(histo_min_lr))*to_unsigned(255,8))/(unsigned(numpixels_lr)-unsigned(histo_min_lr)), 8);
end if;

----------------------------end of reading block, beginning of writing block----------------------------
if (start_add='1' or rst='1') then


clhe_out <= (others=>'0');
else --Transforming routine
clhe_out <= std_logic_vector(resize((
    transformed_ul*(unsigned(x2_pre1-x_pos_pre3))*(unsigned(y2_pre1-y_pos_pre3))
  + transformed_ur*(unsigned(x_pos_pre3-x1_pre1))*(unsigned(y2_pre1-y_pos_pre3))
  + transformed_ll*(unsigned(x2_pre1-x_pos_pre3))*(unsigned(y_pos_pre3-y1_pre1))
  + transformed_lr*(unsigned(x_pos_pre3-x1_pre1))*(unsigned(y_pos_pre3-y1_pre1))
  )/(x_size*y_size),8));
end if;
if rst='1' or start_cntr='1' then
start <= '1';
else
start <= '0';
end if;

end if;
end process;
x1 <= x_ref-signed(x_size)-1;
x2 <= x_ref;
x_ref <=
signed(shift_right(((shift_left((resize(x_size,11)),1)/2)+1),1))+signed(x_size*num_x);
y_ref <=
signed(shift_right(((shift_left((resize(y_size,11)),1)/2)+1),1))+signed(y_size*num_y);
y1 <= y_ref-signed(y_size)-1;
y2 <= y_ref;
ul_id_pre <= num_x+num_y*10;
ul_id <= ul_id_pre;
ur_id <= ul_id_pre + 1;
ll_id <= ul_id_pre + 10;
lr_id <= ul_id_pre + 11;
histo_raddr <= std_logic_vector(pixel);
--Computation of the equalized pixel
rom_raddr <= std_logic_vector(im_raddr);
wren <= wren1;
numpixels2 <= numpixels-1;

end transformation;
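For reference, the arithmetic implemented by the histogram_equalizer architecture can be written out as follows. This is a transcription of the listing rather than a formula quoted from the thesis text: k ranges over the four neighbouring tiles (ul, ur, ll, lr), CDF_k is the histogram CDF value read for the current pixel (histo_in_k), CDF_k,min is histo_min_k, N_k is numpixels_k, n_x and n_y are num_x and num_y, x and y stand for the pipeline-aligned coordinates x_pos_pre3 and y_pos_pre3, and the final resize to 8 bits, which keeps only the low eight bits, is written as mod 256.

```latex
\begin{aligned}
T_k &= \left\lfloor \frac{255\,(\mathrm{CDF}_k - \mathrm{CDF}_{k,\min})}{N_k - \mathrm{CDF}_{k,\min}} \right\rfloor,
  \qquad k \in \{\mathrm{ul},\mathrm{ur},\mathrm{ll},\mathrm{lr}\} \\
x_{\mathrm{ref}} &= \left\lfloor \frac{x_{\mathrm{size}}+1}{2} \right\rfloor + n_x\,x_{\mathrm{size}},
  \qquad x_1 = x_{\mathrm{ref}} - x_{\mathrm{size}} - 1, \qquad x_2 = x_{\mathrm{ref}} \\
y_{\mathrm{ref}} &= \left\lfloor \frac{y_{\mathrm{size}}+1}{2} \right\rfloor + n_y\,y_{\mathrm{size}},
  \qquad y_1 = y_{\mathrm{ref}} - y_{\mathrm{size}} - 1, \qquad y_2 = y_{\mathrm{ref}} \\
\mathrm{clhe\_out} &= \left\lfloor \frac{T_{\mathrm{ul}}(x_2-x)(y_2-y) + T_{\mathrm{ur}}(x-x_1)(y_2-y) + T_{\mathrm{ll}}(x_2-x)(y-y_1) + T_{\mathrm{lr}}(x-x_1)(y-y_1)}{x_{\mathrm{size}}\,y_{\mathrm{size}}} \right\rfloor \bmod 256
\end{aligned}
```

With these definitions x_2 - x_1 = x_size + 1 and y_2 - y_1 = y_size + 1, so as written the four bilinear weights add up to (x_size + 1)(y_size + 1) rather than exactly to the divisor x_size · y_size.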


C. Modelsim simulations
Global timeline: main entity view


In this figure the signals corresponding to the top-level entity are shown: the input parameters, which are
constant, [Link]
access are distinguishable: the shorter ones at the beginning are accesses by the binary mask generation block,
[Link]
controlled by the filtering block.


Global timeline: CLAHE block


In the first capture the signals are mostly memory reads and writes. Two main phases of the CLAHE
computation are visible here: all the computation needed to have a transformation function ready, and the
[Link], the control signals of the tile computation are visible as
well: one can see the start pulses of each tile and CLAHE sub-block, as well as the end flags and the
signals that store which tile is currently being computed.


Global timeline view: binary mask generation


[Link]
inputs and outputs of the mask correction block can be seen, both memories and parameters, as well as
the image sizes recalculated to account for the added zero padding.


Global timeline: filter block view


The filtering block is broadly similar to the previous one, since it consists essentially of the same
kind of windowing implementation, with different operations. The difference is that there are more filters
and they are pipelined, hence the larger number of high-activity signals that carry the image before and
after each stage, [Link], as said in main, [Link]
signal that is red (unknown status) for most of the time is the output of the first address of the memory
that stores the CLAHE image. This is not a problem: once the CLAHE is computed, the image is
overwritten with the desired known value, and the unspecified value does not affect any part of the filtering
system.
