Character Recognition
06/07/06 02:10 [Thursday]
I have been preparing - in my mind and in a diagram (Fig. 1 below) - an explanation of my ideas about the way neural structures recognise patterns.

Fig. 1(a) on the left shows a computer model of a structure of neurons, and Fig. 1(b) shows more nearly the way it would be embodied in a living organism. The red spots are to be thought of as the terminations, in a pre-processing part of the brain, of dendritic links from sense organs. You could think of the array of red spots as cells in the retina. The green spots represent the same thing stored up from a past occasion. You could think of the array of green spots as a memory trace (from one particular sense organ, eg a recording of the state of the retina on some past occasion).
What I allege happens is that the signal at a red spot - the excitement of a cell in the retina - is transmitted outwards to surrounding green spots, with decay as the signal travels further. (In Fig. 1(b) I have shown an indication of three linking dendrites between present excitement and past excitement.) The green spots receiving the signal respond according to
(1) the signal received (which depends on the excitement of the sense organ at that particular spot and inversely on the dissimilarity in position between red spot and green spot)
(2) the strength of the memory trace - how excited the green spot was at the time the memory trace was recorded.
In the computer model - Fig. 1(a) - the ‘dissimilarity in position’ can be taken to be the distance apart of the red cell and the green cell in question. In the program I have been writing recently the ‘excitement’ of the cells is taken to be either black or white (for purposes of character recognition), and the response of a green cell with a memory of past black excitement to a present black signal in a red cell is taken to be
distancemeasure = Exp(-distanceparam * ((a1 - b1) * (a1 - b1) + (a2 - b2) * (a2 - b2)))
where the red cell is at (a1,a2) and the green cell is at (b1,b2).
distanceparam is a parameter controlling the decay of the signal as it gets further from the site of present excitement. In fact I have found it useful to have two different distancemeasures, corresponding to a low distanceparam giving more significance to the general overall shape of the characters being compared and to a high distanceparam giving more significance to local dissimilarities (but I am still looking into the matter).
The overall response of the green array corresponding to each memory trace is calculated (simply by adding up the distancemeasure for all pairings of red cells and green cells, although I am finding it [as I say] necessary further to look into subtracting a factor for local dissimilarities) and the memory trace corresponding to the highest responding green array (the ‘best match’) is selected. This works a lot of the time in recognising characters.
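The calculation just described can be sketched as follows. This is a minimal illustration in Python, not the VB of the prog itself; the function names, the toy 3x3 characters and the value of distanceparam are assumptions made for the example. Characters are represented as sets of (row, column) positions of black cells.

```python
import math

def distancemeasure(a, b, distanceparam):
    """Response of a green cell at b to a red cell at a (both black)."""
    (a1, a2), (b1, b2) = a, b
    return math.exp(-distanceparam * ((a1 - b1) * (a1 - b1) + (a2 - b2) * (a2 - b2)))

def response(red_cells, green_cells, distanceparam):
    """Add up distancemeasure over all pairings of red and green black cells."""
    return sum(distancemeasure(a, b, distanceparam)
               for a in red_cells for b in green_cells)

def best_match(red_cells, memory, distanceparam=0.5):
    """Label of the memory trace whose green array responds most strongly."""
    return max(memory,
               key=lambda label: response(red_cells, memory[label], distanceparam))

# Hypothetical toy data: two stored exemplars and one character to recognise.
memory = {
    "bar": {(0, 1), (1, 1), (2, 1)},   # vertical stroke
    "dash": {(1, 0), (1, 1), (1, 2)},  # horizontal stroke
}
seen = {(0, 1), (1, 1), (2, 1)}
print(best_match(seen, memory))
```

Because the response is a plain sum, an exemplar with more black cells tends to score higher whatever its shape, which is the weakness the later entries on thick lines and blackening address.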
The reason I am using the above formula for distancemeasure is that the obvious measure of dissimilarity in position is the Cartesian distance, and to convert to the exponential form without (say) taking the square root is easy and has the advantages of the ‘bell curve’ shape of the graph. That is the response is reasonable up to a certain dissimilarity in position and then rapidly falls off. However there are considerations involving the relationship between the number of distancemeasures being added up (which varies as the square of the number of red and green cells) and the sort of value each distancemeasure takes in relation to the dissimilarity in position (the number of cells in a given area varying as the square of the Cartesian radius from the central cell).
Increasing the resolution - by increasing the number of red and green cells so that each one corresponds more closely to a signal from the sense organ at a particular position - would increase the response as the square of the cell density. On the other hand there must be a discounting effect - eg from inhibitory neurons or from habituation - as those of us with higher resolution do not necessarily respond more.
06/07/06 10:31 Initial results

Screenshot1 shows a seemingly irretrievable flaw with scale1=800 (too big evidently). (scale1 scales the measure of local mismatching and sets which black cells are counted as mismatches.)

Screenshot2 shows there are 15 mismatches when scale1=800.

Screenshot3 shows a situation seemingly equivalent to Screenshot1, but subtly different in that the mismatched black cells are in charA on the left.

Screenshot4 shows a clear case where the only real difference between charB on the right (up for recognition) and charA on the left (the best match found) is the extra black element in charA.
06/07/06 11:00 Thoughts on reducing the count of mis-recognitions
The reason the mistake in Screenshot1 arises is that the “B” which is there mismatched by an “8” has thicker lines - more blackness - than the stored exemplar of “B” which ideally it would have matched, and consequently there are unavoidably more local mismatches (where two black cells have to be recognised by one black cell in the exemplar, at least one of them must mismatch slightly). One way out of this would be to randomly decimate characters with too much blackness, to reduce all characters to much the same number of black cells. Alternatively all lines in characters could be thinned to a thickness of one black cell. The problem with the latter is that slightly narrower drawn characters would not match their equivalents drawn slightly stretched out.
I shall try getting rid of the code I added to give more weight to the number of local mismatches as against making gross local mismatches have a major impact.
… This does not change the result at all. (I may as well get rid of the code involving scale2 completely then, to speed things up.)
Another solution which occurs to me is to arrange some interaction between local mismatches in charB (that is black cells in charB which do not closely correspond to black cells in charA) so that they discount each other. If a lot of black cells in charB in a cluster mismatch, then if it is a gross mismatch reducing their effective number will not spoil things, but if they are all only slight mismatches possibly they should not all count additively.
… Having stimulated myself with coffee, I now realise the following:
If there are nearby mismatches in charB which are not gross mismatches, it follows there are nearby black cells in charA. Therefore the way to proceed is to fiddle with distanceparam1 to make the localisation of local mismatches less tight. (All this is saying is that reducing distanceparam1 means the thickness of lines in charB is of less consequence as each black cell is more spread out in its effect. This gives a less ad hoc solution, as indeed does getting rid of scale2 and associated code.)
06/07/06 15:44

Fifteen mismatches is the best result I can achieve without some solution involving thinning characters which are too black. All the mismatches of the fifteen consist of charA (the mistaken best match) having extra black elements compared to charB, except the one case (Mismatch 14) where charB is a “B” which has very thick lines. The result is that its potential match in another “B” exemplar (in stored experience) leaves a lot of the thick “B” cells locally mismatched so that the latter better matches an “8” which is almost wholly enclosed by the “B”. There are no stored exemplars of which the thick “B” is a partial element, and as I say the “B” it should match is considerably thinner and does not correspond well in its overall shape.
… What I could try is: if the number of black cells in one of the characters being compared exceeds the number in the other by more than (say) 20%, I could randomly augment the deficient one until the difference is less than (say) 20%.
… As projected, I have amended the prog Read CharfileChars to make up the blackness of charA if required to within (now) 15% of charB with a random accretion of black cells. This processing slows things down, so one does not want to make Apercentage too small. 15% seems OK, and actually improves the mismatch count in the present case to 14 from 15 (although the one additional success is not the thick “B” which motivated the amendment). The reason this thick “B” is still being mismatched is that the only exemplar “B” it should match (see Mismatch 15) is a poor exemplar, not only thin but thin in such a way as to be mis-shapen.
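A sketch of the blackening step, in Python rather than the VB of the prog. The entry says only that the accretion is random; growing the character by adding white cells adjacent to existing black cells is an assumption here (chosen because it distorts the shape less than scattering cells anywhere), and the names blacken and a_percentage are hypothetical stand-ins for the prog's own code and its Apercentage.

```python
import random

def blacken(char_a, count_b, width, height, a_percentage=15, rng=None):
    """Return a copy of char_a with black cells randomly accreted until its
    count is within a_percentage percent of count_b (or no room is left)."""
    rng = rng or random.Random()
    cells = set(char_a)
    # Smallest count that is within a_percentage percent of count_b
    # (integer ceiling, to avoid floating-point surprises).
    target = (count_b * (100 - a_percentage) + 99) // 100
    neighbours = ((1, 0), (-1, 0), (0, 1), (0, -1))
    while len(cells) < target:
        # Assumption: new black cells attach to the edge of the existing shape.
        frontier = sorted(
            {(r + dr, c + dc)
             for (r, c) in cells
             for dr, dc in neighbours
             if 0 <= r + dr < height and 0 <= c + dc < width} - cells)
        if not frontier:
            break
        cells.add(rng.choice(frontier))
    return cells
```

Each pass rebuilds the frontier, which is simple but slow; that is in keeping with the remark that this processing slows things down as the percentage is made smaller.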
[Note of tidying-up to be done. I note: similaritymeasure1 is used to estimate similarity between char1 (read in from file DocumentChars) and stored exemplars from memory; at present similaritymeasure1 does not blacken charA if required in the way similaritymeasure does. Also: if charA is blacker than charB no corrective pre-processing is done; and locally mismatched black cells in charA play no part (eg making a subtractive contribution to the measure of similarity in the way locally mismatched black cells in charB do). What I need to do is alter the contribution from locally mismatched black cells in charA and as part of this alter the learning process which leads to changes in the stored charA’s.]
19/07/06 13:22 [Wednesday] Explanation of the contribution from local mismatches
To explain my further investigations: I use two different distanceparams, corresponding on the one hand to a greater significance of long-range measures between black cells in two characters being compared and on the other hand to a greater significance of pairs of close-by black cells. The first gives a measure of similarity based on overall shape, while the second gives a series of measures of short-range discrepancies between subparts of the characters (which can be added up to give an overall measure of localised mismatching). The expression
calcsimilarity = simAB - contAtot - contBtot
gives the similarity measure between charA and charB as adjusted, where contAtot is the total localised mismatching of charA compared with charB and contBtot the total localised mismatching of charB compared to charA, and simAB is the similarity measure between charA and charB based on overall shape.
Because the value of distanceparam for localised mismatching gives widely varying values for contAtot and contBtot by virtue of giving large significance to close-by cells, the logarithm - suitably scaled - is taken in producing the values contAtot and contBtot.
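The adjusted measure can be illustrated as below. This is a Python sketch under stated assumptions, not the prog's VB: the entry does not give the exact form of the local contributions, so here each black cell's local support is the summed short-range distancemeasure from the other character, and its mismatch contribution is the scaled negative logarithm of that support (clipped so that a well-supported cell contributes nothing). The names sim_shape, local_mismatch, distanceparam_shape, scale and floor are hypothetical.

```python
import math

def _measure(a, b, distanceparam):
    d2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return math.exp(-distanceparam * d2)

def sim_shape(char_a, char_b, distanceparam_shape):
    """simAB: long-range measure of overall shape (low distanceparam)."""
    return sum(_measure(a, b, distanceparam_shape) for a in char_a for b in char_b)

def local_mismatch(char_x, char_y, distanceparam1, scale=1.0, floor=1e-12):
    """Total localised mismatching of char_x compared with char_y.
    Assumed form: scaled sum of -log(local support) over black cells of char_x."""
    total = 0.0
    for x in char_x:
        support = sum(_measure(x, y, distanceparam1) for y in char_y)
        support = min(1.0, max(support, floor))  # clip into (0, 1]
        total += -math.log(support)
    return scale * total

def calcsimilarity(char_a, char_b, distanceparam_shape=0.1,
                   distanceparam1=2.0, scale=0.1):
    sim_ab = sim_shape(char_a, char_b, distanceparam_shape)
    cont_a_tot = local_mismatch(char_a, char_b, distanceparam1, scale)
    cont_b_tot = local_mismatch(char_b, char_a, distanceparam1, scale)
    return sim_ab - cont_a_tot - cont_b_tot
```

With these choices an isolated extra stroke in either character costs a bounded amount (set by floor and scale), rather than the widely varying raw values the logarithm is there to tame.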
Referring to Mismatch 5 then [see above], subtracting the contribution contAtot (corresponding to localised mismatching between charA on the left and charB on the right) disposes of the mismatch by virtue of the right arm of the letter “N” (charA) completely mismatching charB on a localised basis.
Explanation of the basis of the learning algorithm
The measure of local mismatch for each black cell in charA [on the left in the displays] can be used (by devising a threshold) to divide the black cells of charA into those which are matches and those which are mismatches. By remembering which black cells of the various charA’s (in CharfileChars) have been mismatched (in fact by counting them up additively) experience can be stored up on the black cells whose mismatching is of greater significance and the program can learn to pay more attention to these (that is, to enhance their subtractive contribution to similaritymeasure). A possible future development is to treat the patterns of mismatched black cells from charA in each comparison as ‘symbols’ in their own right available for processing. In effect it might be possible to divide off sub-components of characters as sort of features.
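The counting described above might look like this in outline. It is a Python sketch, not the prog's VB, and the threshold value, the form of the local support and the names local_support and record_mismatches are assumptions for illustration.

```python
import math

def local_support(cell, other, distanceparam1):
    """Short-range response received by one black cell from the other character."""
    return sum(math.exp(-distanceparam1 *
                        ((cell[0] - o[0]) ** 2 + (cell[1] - o[1]) ** 2))
               for o in other)

def record_mismatches(char_a, char_b, counts, distanceparam1=2.0, threshold=0.1):
    """Split the black cells of char_a by a threshold on local support and
    add one to the stored count of every cell found to be mismatched."""
    mismatched = {cell for cell in char_a
                  if local_support(cell, char_b, distanceparam1) < threshold}
    for cell in mismatched:
        counts[cell] = counts.get(cell, 0) + 1
    return mismatched

# Hypothetical example: char_a has a stray black cell far from anything in char_b.
counts = {}
char_a = {(0, 0), (0, 1), (5, 5)}
char_b = {(0, 0), (0, 1)}
print(record_mismatches(char_a, char_b, counts), counts)
```

The counts dictionary plays the part of the stored experience: a later comparison can weight the subtractive contribution of a cell by its count.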
18/07/06 08:07 [Tuesday] Note of progress
I have done what I said, viz written in the blackening of charB if it is deficient as well as the already existing blackening of charA. (I have found it necessary to limit the amount of blackening which is done because too much random blackening distorts the shape of the character.) I have also altered the code to bring similaritymeasure1 into line with similaritymeasure. The number of mismatches using the same file CharfileChars as before has gone down to 5.

04/09/06 03:14 [Monday] Explanation of ‘cogitation’
In the cogitation procedure each character in the store CharfileChars is taken and an attempt is made to recognise it using the rest of the characters in the store. To the extent that the character is mis-recognised, learning will occur through the learning algorithm referred to earlier. By ‘cogitating’ on its store of experience - that is by making comparisons of each stored character with others - the prog can learn.
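In outline, cogitation is a leave-one-out pass over the store. The Python sketch below (not the prog's VB) takes the recognition and learning steps as functions supplied by the caller; the signatures of recognise and learn, the choice that the mistaken best match is the exemplar which learns, and the toy overlap_recognise are all assumptions for the example.

```python
def cogitate(store, recognise, learn, cogitationslimit=3):
    """store: list of (label, char) pairs.
    recognise(char, others) returns the best-matching (label, char) from others.
    learn(exemplar_char, probe_char) is called for each mis-recognition.
    Returns the number of mis-recognitions in the last pass made."""
    errors = 0
    for _ in range(cogitationslimit):
        errors = 0
        for i, (label, char) in enumerate(store):
            others = store[:i] + store[i + 1:]  # the rest of the store
            if not others:
                continue
            best_label, best_char = recognise(char, others)
            if best_label != label:
                learn(best_char, char)
                errors += 1
        if errors == 0:
            break  # nothing left to learn from this store
    return errors

def overlap_recognise(char, others):
    """Toy stand-in for the real similarity measure: most shared black cells."""
    return max(others, key=lambda item: len(char & item[1]))
```

A character with no other exemplar of its own label in the store is necessarily mis-recognised on every pass, so a limit on the number of passes is needed for the loop to end.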
19/07/06 11:10 [Wednesday] Current work in character recognition
I need to do something about the number of cogitations allowed on each character in CharfileChars (ie cogitationslimit) and the amount of alteration allowed to each black cell (the latter embodied in limit100, a value of 7 [later revised] seeming right: see below). As the integer for each black cell found to be mismatched [counting the number of times it has been found to mis-match] increases, the balance is changed (in all comparisons involving altered characters [characters altered by learning]) between the subtractive contribution of local mismatches due to cells having an increased integer value - those found in the past to be mismatches - and that due to other cells.
As regards cogitationslimit, if and when new input becomes available (a new DocumentChars [from a new document]) possibly fresh cogitation on existing characters should be newly allowed, whereas cogitation over and over on the same sample of characters becomes fruitless. However, I suspect in nature the amount of change allowed to existing structures in memory becomes less with the passage of time, corresponding to old (tried and tested) experience becoming unalterable.
16/08/06 13:08 [Wednesday] More
The latest additional processing the prog Read CharfileChars does allows it to learn from locally mismatched black cells in charB (in effect). The locally mismatched black cells in charB in each cogitation are remembered (by accumulating a count) in the corresponding white cells in charA. In future comparisons using this charA if the relevant white cells are mismatched to black cells a subtractive contribution depending on the count in the charA white cells is made. The effect of this is to make less probable a future mis-recognition of this charA as a previous mismatched charB or any very like it.
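The marking of white cells can be sketched as follows, in Python rather than the prog's VB. It assumes charA and charB lie on the same grid, so that the ‘corresponding’ white cell in charA is simply the cell with the same co-ordinates; the threshold, the weight and the function names are hypothetical.

```python
import math

def _support(cell, other, distanceparam1):
    return sum(math.exp(-distanceparam1 *
                        ((cell[0] - o[0]) ** 2 + (cell[1] - o[1]) ** 2))
               for o in other)

def mark_white_cells(char_a, char_b, white_counts,
                     distanceparam1=2.0, threshold=0.1):
    """Remember locally mismatched black cells of char_b by accumulating a
    count in the corresponding white cells of char_a."""
    for cell in char_b:
        if cell not in char_a and _support(cell, char_a, distanceparam1) < threshold:
            white_counts[cell] = white_counts.get(cell, 0) + 1

def cont_a_white_tot(char_b, white_counts, weight=1.0):
    """Subtractive contribution in a later comparison: black cells of char_b
    that land on marked white cells of char_a, weighted by the stored count."""
    return weight * sum(white_counts.get(cell, 0) for cell in char_b)
```

After one cogitation in which a charB with an extra stroke was mistaken for this charA, any later character carrying the same stroke pays a penalty against this charA, which is the intended effect.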
04/09/06 03:28 [Monday] Description of the measure of similarity as it currently stands
The measure of similarity has been refined to
calcsimilarity = simAB - (contAtot + contBtot) - (contAtot_adj + contAwhitetot) - adjcluster
The term (contAtot + contBtot) is a measure of total local mismatching of black cells in charA plus black cells in charB ignoring the effect of learning.
The term (contAtot_adj + contAwhitetot) is the contribution of learning, that is, it reflects the number of times each locally mismatched cell has been mismatched previously. It is made available through contAtot_adj (black cells in charA) and contAwhitetot (black cells in charB corresponding to white cells in charA which have been marked through learning).
The final term adjcluster is an adjustment according to how clustered the locally mismatched cells (both black and white) are, as scatterings of local mismatches are of less significance.
08/10/06 11:36 [Sunday] Method for a block of mistakenly conjoined characters
If the recognition sub-program comes upon a block of contiguous black cells seemingly text with an over-wide aspect ratio, on the presumption that it will be several adjacent characters mistakenly running together it proceeds as follows: it compares the first portion (the left-most) of the block with exemplars from CharfileChars, making the width of the portion taken to be more or less equal to the width of the exemplar currently under consideration, but varying the width slightly - up and down - to determine the maximum measure of similarity that can be achieved (because characters in the block may be narrower or wider than the exemplars). Given the width which gives the best match that can be achieved that exemplar is presumed to be the first character and a portion is removed corresponding to the best width. The next portion of the block is assessed similarly, and so on until the block has been dealt with (or the left-most portion does not match any exemplar at all, in which case the remainder of the block is abandoned).
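The procedure for a conjoined block can be sketched as a greedy left-to-right loop. This is Python, not the prog's VB; the similarity function is passed in, and the names segment_block, slack and min_similarity, the default values, and the decision not to rescale the portion to the exemplar's width are all assumptions for the example.

```python
def segment_block(block, block_width, exemplars, similarity,
                  min_similarity=0.5, slack=1):
    """block: set of (row, col) black cells; exemplars: label -> (width, cells).
    Returns the labels found, left to right, stopping at the first portion
    that matches no exemplar well enough."""
    result = []
    left = 0
    while left < block_width:
        best = None  # (score, label, width)
        for label, (ex_width, ex_cells) in exemplars.items():
            # Vary the width slightly, up and down, around the exemplar's width.
            for width in range(max(1, ex_width - slack), ex_width + slack + 1):
                if left + width > block_width:
                    continue
                portion = {(r, c - left) for (r, c) in block
                           if left <= c < left + width}
                score = similarity(portion, ex_cells)
                if best is None or score > best[0]:
                    best = (score, label, width)
        if best is None or best[0] < min_similarity:
            break  # abandon the remainder of the block
        result.append(best[1])
        left += best[2]  # remove the portion of the best width
    return result

def jaccard(x, y):
    """Toy stand-in for the real similarity measure."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0
```

Being greedy, the loop can go wrong where one exemplar looks like the left part of another; the first portion is then taken as the narrower character and the rest may fail to match.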
An example of a block of text and the analysis arrived at by the prog:

… The prog I am working on - Scan document for OCR - is to produce characters to be made into exemplars to extend the store of character exemplars in CharfileChars. The question arises whether to use characters analysed as above from out of a block of text as exemplars. If the idea is that the prog is being taught new exemplars, then it is up to the teacher to present the exemplars as usefully as possible, which means distinctly and separately. If however the prog is supposed to be learning as it goes, it has to cope as best it can with the information provided by ‘nature’ and extract from that information as usefully as possible new exemplars.
My compromise is to print the analysed characters from the block of text to a separate file called DocumentCharsSpecial so that they can if necessary be amended by the teacher or simply left as is so that the prog can learn from ‘nature’.
05/12/06 04:38 [Tuesday] Latest work
The basic idea behind my current character recognition work is to reduce each line of text to black blocks of reasonable aspect ratio, then identify all those blocks as characters which the program can with fair certainty identify (I find that making the prog too finicky in its requirement of closeness of similarity is counterproductive). Having found characters the prog can be pretty sure of, it beefs them up by darkening the scan again (having originally lightened it) and marking the black pixels surrounding the known characters as parts of those characters. Here is a visual exposition:

The “s” has been made bolder and also the unrecognised “r” (mistakenly guessed as a possible “t”). The prog then goes on to try to identify the “M” and the “r” but in fact succeeds with neither given the store of character exemplars it presently has. However other lines have shown more success. The latest version of the prog makes tentative guesses as it iteratively blackens the line of text (a tentative guess being one with a similaritymeasure > 0.3). The idea was having a range of guesses might be useful at a later stage when trying to put together letters into a sensible sounding word, but so far I have found that in cases where there is a final tentative guess - as the recognition gets to the ultimate it can, having accounted for all black pixels and no further darkening of the scan being allowed - that final guess is the most nearly correct. (Sometimes earlier tentative guesses become nullified as the aspect ratio of characters joining up with the blackening becomes unreasonable [and there is then no ‘final’ tentative guess as such].)

The final line shows the final version reached of characters identified with fair certainty. The last of the sequence of tentatives (two only shown) is the final - that is as I say the best - tentative guess for the remaining characters.
As regards timing, having experimented with three lines of an address I find the prog takes about 50 seconds per line, and as each line consists of more than ten characters the comparison against the prog I formerly wrote is very advantageous [for this version] (as well as this prog having better and clearer logic).
30/12/06 13:41 [Sunday] The re-modelled OCR program
I have re-modelled what is now the basic OCR prog - which takes a scan presumptively of a document of text and digitises it into black and white and then tries to recognise the characters - so that the processing as such - that is apart from setting initial parameters such as number of lines to try to process out of the document - is performed by code arranged in five VB Modules. Module1 converts the scan to greyscale and then examines each presumptive line of text one by one and varies the threshold for conversion to black-and-white so as to extract blocks of black with an aspect ratio reasonable for printed characters. Module3 contains a procedure ProcessLine which takes a line of text and tries to make out characters from it, again varying the brightness to try to get good recognition separately for each apparent character. The first pass in this procedure extracts characters which are confidently recognised, and subsequent passes darken the scan and again extract characters which can confidently be recognised. At the end of this phase there will usually be a number of part-characters unrecognised, created by characters mistakenly joining up (and individual ones mistakenly breaking up). The prog then darkens the scan of the line and expects the part-characters to join together - including characters already recognised although possibly with mistakes - to form strings of characters which are in effect over-inked. The final phase of the prog - again in several passes - tries to analyse each of these strings into individual characters, varying the brightness (again) until hopefully the best sequence is found - that is the sequence with the highest average similaritymeasure. Module2 contains procedures which calculate similarity between characters - or rather between characters and exemplars - and some which are capable of finding the best match from the store of exemplars, including for the case of a string of conjoined characters. 
The other Modules contain procedures called by the primary processing.
The phase of Module3 which tries to analyse strings of conjoined characters is still being developed. At present for the starting level brightness the prog finds the best match for the left portion of the string, adjusting the width of the portion taken to maximise similaritymeasure. If there is no match the block under consideration is darkened. The rest of the string is then analysed from the left up to the point at which no character matching with a sufficiently high similarity can be found (or the darkened string becomes over-long). Then the line is darkened and the process repeated. By calculating an average similaritymeasure for each stage in the darkening, the best solution is found. A solution is preferred which analyses the string into a partition with no outstanding black cells.
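The choice between darkening stages might be expressed as below. This Python sketch (not the prog's VB) assumes each stage has been summarised as a dictionary holding the similaritymeasures of the characters found and the number of black cells left unaccounted for; the field names, and treating a complete partition as strictly preferable to any incomplete one, are assumptions.

```python
def best_analysis(stages):
    """Pick the stage with the highest average similaritymeasure, preferring
    stages that leave no outstanding black cells."""
    def key(stage):
        scores = stage["scores"]
        average = sum(scores) / len(scores) if scores else 0.0
        return (stage["leftover"] == 0, average)
    return max(stages, key=key)

# Hypothetical figures for three levels of darkening of one string.
stages = [
    {"level": 0, "scores": [0.9, 0.8], "leftover": 5},
    {"level": 1, "scores": [0.7, 0.6, 0.65], "leftover": 0},
    {"level": 2, "scores": [0.5, 0.4], "leftover": 0},
]
print(best_analysis(stages)["level"])
```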
08/02/07 08:53 [Thursday] Visual Scene Analysis
I have recently been investigating, rather than the recognition of individual characters, the methods the visual system (say the human visual system) may be hypothesised to use in breaking down a scene for processing. This has come down to the specific question of breaking up a lineage of text so as to separate out as far as possible the individual characters. (The matter of analysing an entire 2-D scene - for example a whole document with graphics as well as different areas of text - is too demanding of processing, and the principles are the same considering only one lineage of text.) I have explained what I am doing as follows.
Visual Scene Analysis: description of program
The prog Visual field analysis takes a document scan and seeks lineages of text which it ‘pre-processes’ one by one from top to bottom of the page. A horizontal arrangement of black pixels may fail to be considered a lineage of text on various counts, eg if the vertical extent - taken to relate to the font-size - is too big or too small. In seeking lineages of text the prog looks for tramlines such as are used in teaching writing. A horizontal arrangement may fail if no suitable tramlines can be found. If they can be found, the lineage is scaled so that the tramlinesize is as close to 8 as possible.
The ‘pre-processing’ which the prog does on each lineage of text it finds consists of:
(1) Proc whiteout - an analysis is done of the distribution of greyscale values and those which are too light - the threshold being set so that it divides the total number of pixels in the proportion given by cutoffproportion - are made pure white.
(2) At this stage if there is a great deal of whitespace in the lineage - the ratio of black pixels being less than 0.15 - the lineage is rejected on the basis that it should be subdivided before further processing.
(3) The prog tries to identify horizontal separations between characters, at present by considering the way the number of black pixels in each column of pixels along the lineage varies, although other methods for example involving the vertical extent of the character stream column by column (from topmost black pixel to bottom-most) are under consideration.
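Steps (2) and (3) above can be sketched in Python (not the prog's VB) on a lineage held as rows of 0s and 1s, 1 meaning black. The function names and the max_black parameter are hypothetical; treating a column as a separation when it has at most max_black black pixels is one simple reading of ‘considering the way the number of black pixels in each column varies’.

```python
def column_counts(lineage):
    """Number of black pixels in each column of pixels along the lineage."""
    return [sum(column) for column in zip(*lineage)]

def black_ratio(lineage):
    """Ratio of black pixels to all pixels, as used for the 0.15 test in step (2)."""
    total = sum(len(row) for row in lineage)
    return sum(map(sum, lineage)) / total if total else 0.0

def separations(lineage, max_black=0):
    """(start, end) column ranges, end exclusive, holding at most max_black
    black pixels per column: candidate gaps between characters."""
    counts = column_counts(lineage)
    gaps, start = [], None
    for x, n in enumerate(counts):
        if n <= max_black:
            if start is None:
                start = x
        elif start is not None:
            gaps.append((start, x))
            start = None
    if start is not None:
        gaps.append((start, len(counts)))
    return gaps

lineage = [
    [1, 1, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
]
print(column_counts(lineage), separations(lineage))
```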
27/02/07 01:35 [Tuesday] The idea of using fragmentation
… The method I developed of varying the whiteness threshold along a lineage of text worked by considering the aspect ratio of the black blocks - the presumptive characters - emerging as the lineage is darkened. (Varying the threshold is certainly a thing one needs to do, the exposure inevitably varying along each lineage because the scanner in setting the exposure averages over the whole of the scan.) In more general terms this method is watching the change in the fragmentation of the black parts of the lineage as it is darkened: the fragments join up more and more so that fragmentation is less. I have a suspicion that at the point at which characters emerge at some location along the lineage, the rate of change of fragmentation will fall to a low level. In other words, when the black blocks locally reach their definitive form, further darkening the scan will produce little change until the further point is reached where what is genuinely whitespace starts to show up as containing scattered fragments of black because the area has been darkened too much.
It is possible then that the first piece of pre-processing to do on the visual field - exemplified for purposes of simplification by a lineage of text - is to set the ‘exposure’ - the whiteness threshold - for each location in the field that is each position along the lineage so as to divide ‘black’ from ‘white’ in a definitive form. This in effect is what I was doing in considering aspect ratios, but I felt that that method was too specific to the context of character recognition. After all in other contexts the definitive black blocks emerging could have aspect ratios different from that appropriate for characters. I feel considering fragmentation instead of aspect ratio as such is the way to generalise.
16/03/07 16:22 [Friday] More details
... Where I left off two weeks ago was thinking of varying the whiting-out threshold not in such a way as to extract characters of reasonable aspect ratio as they emerge, but so as to extract stable fragments - that is fragments which are unchanging certainly in their number as (in effect) brightness changes. The clear and specific way I thought up of implementing this is to consider each column of pixels individually as brightness decreases. From zero fragments in a given column with total whiting-out, the number of fragments will increase to a stable level (I think) where it will remain until a sudden fragmentation occurs when the brightness is so low even whites begin to appear black in patches. The only difficulty is where there are characters mistakenly running together there will be a single fragment at the abutment - or possibly two or even more - which may be stable over quite a range of brightness. In this case the brightness level for stability should (I hope) be different from that for adjacent columns which are genuinely parts of characters, so that perhaps a moving average along the row of columns, or looking for discontinuities, could be the solution.
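The column-by-column idea can be tried out with a sketch like the following, in Python rather than the prog's VB. A fragment is taken to be a run of consecutive black pixels in the column, and the stable level is taken to be the longest run of thresholds over which the fragment count is constant and non-zero; both of these, along with the function names and the step of 8 between thresholds, are assumptions made for the example.

```python
def fragments_in_column(column, threshold):
    """Count runs of consecutive 'black' pixels (greyscale below threshold)."""
    count, inside = 0, False
    for grey in column:
        black = grey < threshold
        if black and not inside:
            count += 1
        inside = black
    return count

def stable_threshold(column, thresholds=range(0, 256, 8)):
    """Return (threshold, fragment count) at the middle of the longest run of
    thresholds with a constant, non-zero fragment count, or None if none."""
    ts = list(thresholds)
    counts = [fragments_in_column(column, t) for t in ts]
    best = None  # (run length, start index)
    i = 0
    while i < len(ts):
        j = i
        while j + 1 < len(ts) and counts[j + 1] == counts[i]:
            j += 1
        if counts[i] > 0 and (best is None or j - i + 1 > best[0]):
            best = (j - i + 1, i)
        i = j + 1
    if best is None:
        return None
    length, start = best
    return ts[start + length // 2], counts[start]

# Hypothetical column: two dark strokes on a light, slightly uneven background.
column = [250, 240, 30, 40, 245, 235, 20, 250]
print(stable_threshold(column))
```

Comparing the threshold returned for adjacent columns would be the place to look for the discontinuities, or to apply the moving average, suggested above.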
08/05/07 04:45 [Tuesday] Specifics of the prog recently tested
The main procedure on starting up [the prog Visual field analysis] is Module1.picload. This takes a string argument which if it is not empty is the filename within My Pictures of the picture to load (presumed to be a document scan containing text). Options for the processing which the prog will do are chosen in the startup form. One option … is testrectangle, in which … the quarter of the scan with highest blackproportion and the quarter with lowest blackproportion are marked up. blackproportion is the ratio of the number of black pixels to the total number of pixels and is calculated by Form2.blackproportion which takes as its first parameter a threshold of greyscale differentiating ‘black’ from ‘white’. In the case of testrectangle … the average greyscale across the scan is used for this threshold.
Apart from the testrectangle option, the scan is examined from the top and lineages putatively of text are extracted (on the proviso of certain limits of blackproportion - or rather total number of black pixels - and certain limits of extent from topmost to bottom-most black pixel [these] presumed to relate to fontsize); the raw version of a lineage is the black-and-white copy taken directly from the scan into pic2 (using whitenessinitvalue as threshold to differentiate ‘black’ from ‘white’).
Module1.scalepic2_to_Picture3 produces Picture3 starting with pic2. Picture3 is a picture of the putative lineage of text with only very little surrounding white. scalepic2_to_Picture3 returns av, the average number of black pixels in a row of pixels in pic2. scalepic2_to_Picture3 fails if pic2 is entirely blank. It presumes that rowheight (the extent from topmost to bottom-most black pixel in pic2) is within bounds (minfontsize to maxfontsize) but it doesn’t … verify this. The resolution of Picture3 is governed by paintpixscale and PaintPicture is used to copy the relevant portion from Picture1 (the [original document] scan) but I do have doubts about the way PaintPicture does its scaling. (One improvement would be to get scalepic2_to_Picture3 to return an error code via av to indicate an improper fontsize, instead of testing fontsize again on the return from calling scalepic2_to_Picture3.)
The procedure Module1.getpic2A_from_Picture3 is next called with as its first argument (the threshold to differentiate ‘black’ from ‘white’) at this stage whitenessinitvalue. (It also requires av as its second argument, and returns error codes via av: error code -1 means either the fontsize is not within bounds - this should actually never be the case at this point - or the tramlinesize is unreasonable - probably implying the putative text is actually not text; error code -2 means there are no black pixels based on the threshold used.) getpic2A_from_Picture3 if it succeeds outputs as pic2A a scaled black-and-white version of the lineage with tramlinesize very close to 8. If option plotgreyscale has been chosen in the startup form, a graph is now produced plotting the frequency of occurrence in the lineage of greyscale values from 0 to 255.
Otherwise (as things stand at present) Form2.PreprocessLine is called which enshrines my experimentation with lightening and darkening pixels according to the amount of local black and whether there appear to be two line ends pointing towards each other. I will just mention that the basic calculation of the greyscale of a pixel in Picture3 is done by Form2.grey3. This takes as its third argument (the first two being the co-ordinates within Picture3) a switch (sw) which if set to 1 ensures simply the ordinary greyscale value is returned. If sw<>1 then grey3aux is called, and if the ordinary greyscale indicates a black pixel (using a threshold previously calculated based on the distribution of frequencies of greyscale values 0 .. 255 for the lineage: as in plotgreyscale) but grey3aux indicates a white pixel, then depending on the pixels surrounding the pixel in question it may be lightened to the grey3aux value instead of the simple greyscale value being used. grey3aux embodies the principle of lightening pixels which may have been rendered unnaturally dark (by the way the scanner or scanner software works) because they are in the midst of a lot of local dark pixels.
The procedure joinlineends (… not in use in the version of the prog before me dated 2007-03-01) makes use of the modified greyscale calculation grey3_alt to mark (within pic2A) pixels which have been presumed black where by the ordinary greyscale comparison against threshold they should be white, by virtue of being located between two seeming line-ends pointing towards each other. The latest character recognition work … was related to identifying such line-ends, and I’m not sure everything was working exactly as it should [….] The identification of such line-ends makes use of a parameter greyscope which is more or less the radius of the circle within which the shape of the putative line-ends is examined (in fact the diameter is 2*greyscope+1).
06/08/07 21:26 [Monday] The fragmentation methodology
... It turns out … that quite a simple … method involving thresholding, but thresholding against a local but not immediately local background, is the way to go [in visual scene analysis]. My understanding of pre-processing for visual scene analysis is now as follows. The animal visual system has local processing for neighbourhoods within the visual field, implemented by local networks of neurons processing in parallel across the retina, that is, across the visual field. There is some hardware fixing of brightness (the aperture mechanism of the pupil), but the equivalent in animal processing of the fixing of contrast (and of more recent methodologies such as unsharp masking) lies in thresholding, which as a general principle means showing up local variation by comparing against a slightly less local background. A simple example is given by the well-known method for detecting edges.

This edge between ‘grey’ (whatever it might be) and the ‘white’ background is detected at any orientation because the positive ‘yellow’ responders always outweigh the inhibitory ‘red’ responders (simply by virtue of the yellow area compared to the red area, if you like to say it in simple terms). This ties in with what I was saying above because in effect the immediately local ‘red’ detectors are set against a slightly less local neighbourhood (represented by yellow here).
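The general principle of setting an immediately local neighbourhood against a slightly less local one can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a model of the detector in the figure: the square neighbourhoods, the radii inner and outer, the margin, and the function names are all hypothetical.

```python
def local_mean(grey, row, col, radius):
    """Mean greyscale over the square of the given radius around (row, col),
    clipped at the image edges."""
    rows, cols = len(grey), len(grey[0])
    total = 0
    count = 0
    for r in range(max(0, row - radius), min(rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(cols, col + radius + 1)):
            total += grey[r][c]
            count += 1
    return total / count


def stands_out(grey, row, col, inner=1, outer=3, margin=10):
    """True if the immediate neighbourhood is darker than the slightly
    less local background by more than margin (all values assumed)."""
    background = local_mean(grey, row, col, outer)
    immediate = local_mean(grey, row, col, inner)
    return background - immediate > margin
```

A uniformly grey region gives no response however dark it is, because the immediate and wider means agree; only local variation shows up.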
In my character recognition programming, to evade the processing requirement of simulating parallel processing for neighbourhoods all over the visual field, I take advantage of the fact that text can be expected to be in horizontal lineages (for languages written across the page), and that within each lineage the individual characters should occur in a left-to-right sequence (for languages written thus) with no significant overlapping (although I expect to have to cater for occasional immediate abutment). Thus the first thing that is done is to calculate a threshold for the whole document scan, starting from the average brightness of the scan (but fixed [up] according to the proportion of black pixels turning out), and to seek out horizontal lineages standing out from the background using that threshold. Then within each lineage each column of pixels is considered as a model of the ‘neighbourhoods’ considered by processors associated with animal retina. (I have found it good to include in each lineage a little of the surrounding whitespace from the original scan, on the basis that some of what appeared white from the first ‘averaging’ thresholding across the whole scan may in fact be grey, turning out dark enough for the individual lineage to be considered black.)
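The first step, finding horizontal lineages with a single whole-scan threshold, might look roughly like this in Python. It is a sketch under assumptions rather than the program's method: the threshold here is just the average brightness (the adjustment according to the proportion of black pixels is not modelled), a row belongs to a lineage if it contains any black pixel, and the names and the margin parameter are hypothetical.

```python
def mean_brightness(grey):
    """Average greyscale of the whole scan, used as a starting threshold."""
    total = sum(sum(row) for row in grey)
    count = sum(len(row) for row in grey)
    return total / count


def find_lineages(grey, threshold, margin=1):
    """Return (top_row, bottom_row) pairs for bands of rows containing black.

    margin adds a little surrounding whitespace above and below each band,
    clipped to the scan; black means a value below the threshold.
    """
    rows = len(grey)
    lineages = []
    start = None
    for r, row in enumerate(grey):
        black = any(value < threshold for value in row)
        if black and start is None:
            start = r
        elif not black and start is not None:
            lineages.append((max(0, start - margin), min(rows - 1, r - 1 + margin)))
            start = None
    if start is not None:
        lineages.append((max(0, start - margin), rows - 1))
    return lineages
```

Each returned band can then be processed column by column, with its own thresholds, independently of the whole-scan value.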
The essence of the prog … lies in … considering what happens to fragmentation (within each neighbourhood that is each column of pixels), and in particular to the count of separate fragments, as the threshold is altered from very low to very high. I can explain using plots the prog is capable of producing.

As threshold increases (from 1 at left to 255 at right) more pixels show up as black, and in general fragments will arise and at higher values of the threshold merge together. The yellow regions show the fragmentation, the height of the yellow bars showing the number of fragments (so that with low values of the threshold in the plot shown there are zero fragments, then one appears and a second soon after, but the two merge into one near the middle value for the threshold, and this single fragment persists right up to the maximum value for the threshold, when the whole neighbourhood becomes one black fragment).
The horizontal black bars represent the total number of black pixels in the neighbourhood (the y-distance indicating the number of pixels). Considering the single fragment persisting from near the middle value of the threshold upward, as the threshold increases more black pixels gradually appear, but they simply grow the same fragment rather than forming a separate fragment a little way off.
The two alternative ways I have thought up of choosing the best number of fragments, and consequently the best threshold, are (1) the number of fragments occurring over the widest range of threshold values; and (2) the greatest number of fragments ignoring transiently occurring fragments. The way I have ended up with is a combination of these, on the basis that (1) gives the most likely ‘correct’ solution except in cases where it results in a single fragment only (when it is usually difficult to choose a best threshold from the wide range of values over which the single fragment persists), and [in such cases] (2) yields up additional information (through increased resolution, if you like). In the plot shown the two-fragment solution is rejected … on the basis that the range of threshold values over which it persists reveals it is only transient.
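The quantities shown in the plot can be computed for one column along the following lines. This is a minimal Python sketch, not the prog itself: the function names are hypothetical, the column is assumed to be a list of greyscale values 0 .. 255, a pixel is assumed black when its value is below the threshold, and a fragment is taken to be an unbroken vertical run of black pixels.

```python
def count_fragments(column, threshold):
    """Number of separate runs of black pixels in the column."""
    fragments = 0
    previous_black = False
    for value in column:
        black = value < threshold
        if black and not previous_black:
            fragments += 1  # a new run of black starts here
        previous_black = black
    return fragments


def fragmentation_profile(column):
    """For each threshold 1 .. 255 return (threshold, fragments, black_pixels)."""
    profile = []
    for threshold in range(1, 256):
        black_pixels = sum(1 for value in column if value < threshold)
        profile.append((threshold, count_fragments(column, threshold), black_pixels))
    return profile
```

For a column such as [254, 30, 100, 60, 254] the fragment count goes from zero to one to two and back to one as the threshold rises, which is the pattern described for the plot.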
The cyan bars delimit the range of threshold values (there may be more than one distinct range corresponding to the best number of fragments) within which the solution is chosen. The vertical blue line shows the chosen best value of threshold (based on the range of threshold values over which the number of black pixels is unchanging) [....]
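One possible reading of the selection rule, in Python, is sketched below. Several details are assumptions of the sketch and not taken from the prog: ‘widest range’ is taken as the total number of threshold values over all ranges with that fragment count, a fragment count is treated as transient if its range is shorter than a hypothetical min_run, and the chosen threshold is the middle of the longest stretch, within the chosen ranges, over which the number of black pixels does not change. The lists counts and blacks hold the fragment count and the black pixel count at thresholds 1, 2, 3 and so on.

```python
def runs_by_count(counts):
    """Group consecutive thresholds sharing a fragment count.

    counts[i] is the fragment count at threshold i + 1.
    Returns (count, first_threshold, last_threshold) triples.
    """
    runs = []
    start = 0
    for i in range(1, len(counts) + 1):
        if i == len(counts) or counts[i] != counts[start]:
            runs.append((counts[start], start + 1, i))
            start = i
    return runs


def choose_fragment_count(counts, min_run=8):
    """Rule (1), falling back to rule (2) when (1) gives a single fragment."""
    runs = [run for run in runs_by_count(counts) if run[0] > 0]
    if not runs:
        return 0
    width = {}
    for count, first, last in runs:
        width[count] = width.get(count, 0) + (last - first + 1)
    widest = max(width, key=lambda c: (width[c], c))
    if widest > 1:
        return widest
    # Rule (2): greatest number of fragments, ignoring transient ranges.
    persistent = [count for count, first, last in runs if last - first + 1 >= min_run]
    return max(persistent, default=1)


def choose_threshold(counts, blacks, best):
    """Middle of the longest stretch with `best` fragments and an unchanging
    number of black pixels; None if `best` never occurs."""
    best_run = None
    start = None
    for i in range(len(counts) + 1):
        inside = i < len(counts) and counts[i] == best
        if start is not None and (not inside or blacks[i] != blacks[start]):
            if best_run is None or i - start > best_run[1] - best_run[0]:
                best_run = (start, i)
            start = None
        if inside and start is None:
            start = i
    if best_run is None:
        return None
    return (best_run[0] + best_run[1] - 1) // 2 + 1
```

With a large enough min_run a briefly occurring two-fragment range is rejected as transient and the single-fragment solution stands, as in the plot described above.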
[Noise] along the length of the lineage is avoided by calculating the moving average of the best threshold for each column, along the length of the lineage. Again it seems likely the ideal solution will involve a combination of a smaller-range moving average with a larger-range.
20/08/07 22:18 [Monday] Current work

The blue line represents the local threshold between ‘black’ and ‘white’. If the blue value is low (that is the blue line is down rather than up within its range) more pixels in that column will be translated as white. (The blue value, as previously explained, is calculated based on the number of fragments emerging as the tentative threshold is varied, then averaged as a moving average along the lineage.) The thing is, there is no information in whitespace (between words mainly) about the required threshold, so whitespace is ignored in calculating the moving average. However, it can be seen that with parameters set to produce the output above, at the edges of words (where they abut on whitespace) the blue value on the whole is too low, with the result that the beginning and end of words appear ‘overexposed’. (The yellow plot is an average value calculated for the threshold along the length of the lineage.)
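The smoothing step, with whitespace left out of the average, can be sketched in Python as follows. The representation is an assumption of the sketch: each column's best threshold is a number, whitespace columns are marked None, and the function name and the half_window parameter are hypothetical.

```python
def masked_moving_average(thresholds, half_window):
    """Centred moving average of per-column thresholds along the lineage.

    Whitespace columns (None) contribute nothing to the average. A column
    whose whole window is whitespace gets None.
    """
    smoothed = []
    for i in range(len(thresholds)):
        lo = max(0, i - half_window)
        hi = i + half_window + 1
        window = [t for t in thresholds[lo:hi] if t is not None]
        smoothed.append(sum(window) / len(window) if window else None)
    return smoothed
```

Near a word edge the window holds fewer real values, so the average there rests on only a few columns, which may bear on the behaviour at the beginning and end of words noted above.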
Where there is local blackdensity (eg the “ass” [in the lineage shown]) considering matters more locally than column by column, the calculation above yields slight ‘underexposure’ so that the individual letters run together. The way earlier in the year I was trying to deal with this started … with plotting local blackdensity as a colour map [to show intensity or ‘depth’ of blackness], and this I have been continuing to try to do [....]