| Mathematical morphology, generally speaking, lets you search for patterns of bits in images. It's a little bit like regular expressions for pixels. OCRopus has several different implementations of mathematical morphology:
If your needs are not performance critical, the binary morphology on arrays is probably your best starting point; those operations work on simple arrays, just like any other image processing operation.

Available Operations

Here is a list of the basic operations for binary morphology available in imglib; these are callable from C++ or Lua:
Simple Example

Here is a simple script that "opens" an image; that is, it removes small features from an image, including isolated points and thin lines:

    image = bytearray()
    read_image_binary(image,arg[1])
    -- an opening is an erosion followed by a dilation with the same element
    binary_open_circle(image,3)
    write_png(arg[2],image)

Matra Clipping

Here's a simple binary morphology script that implements "matra clipping"; that is, it cuts apart the connecting lines that link Devanagari or Bengali characters after vertical lines. The idea is somewhat similar to this, except that this code doesn't use any projection operations.

    -- the parameters are resolution dependent
    min_width = 10    -- minimum width of matra lines
    min_height = 8    -- minimum height of vertical lines causing interruption
    clip_offset = 3   -- how far to offset the clipping from the vertical line

    -- read the input image and invert (black background)
    image = bytearray()
    read_image_binary(image,arg[1])
    binary_invert(image)

    -- find horizontal lines
    matra = bytearray()
    narray.copy(matra,image)
    binary_open_rect(matra,min_width,1)

    -- find vertical lines
    vert = bytearray()
    narray.copy(vert,image)
    binary_open_rect(vert,1,min_height)

    -- find places where horizontal and vertical lines intersect
    binary_and(matra,vert,0,0)

    -- shift the intersection points by clip_offset and remove
    binary_dilate_rect(matra,2,2)
    binary_invert(matra)
    binary_and(image,matra,clip_offset,0)

    -- write out the result
    binary_invert(image)
    write_png(arg[2],image)

[Figure: example of how matra clipping works in practice]
Because the script uses only local information, it will clip some horizontal lines in addition to the matra (exercise: fix this). On the other hand, the purely morphological code is more robust to other objects and noise being present in the document, since it relies only on local processing. This distinction is probably academic, however, since there are better approaches to segmentation of Indic scripts available. |
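
To make the opening used in the scripts above more concrete, here is a minimal, self-contained C++ sketch of a rectangular opening (an erosion followed by a dilation with the same element). It is an illustration only and not the imglib code: `BinaryImage`, `erode_rect`, `dilate_rect`, and `open_rect` are hypothetical names that return new images rather than modifying their argument, and treating pixels outside the image as background is an assumption that may differ from what the library does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical binary image for this sketch (not the OCRopus bytearray):
// row-major storage, nonzero means foreground.
struct BinaryImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;

    BinaryImage(int w, int h)
        : width(w), height(h),
          pixels(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), 0) {}

    // Assumption: pixels outside the image count as background.
    std::uint8_t get(int x, int y) const {
        if (x < 0 || y < 0 || x >= width || y >= height) return 0;
        return pixels[static_cast<std::size_t>(y) * width + x];
    }

    void set(int x, int y, std::uint8_t v) {
        assert(x >= 0 && y >= 0 && x < width && y < height);
        pixels[static_cast<std::size_t>(y) * width + x] = v;
    }
};

// Erosion: a pixel survives only if the whole rw x rh rectangle whose
// top-left corner sits on that pixel is foreground.
BinaryImage erode_rect(const BinaryImage& in, int rw, int rh) {
    assert(rw > 0 && rh > 0);
    BinaryImage out(in.width, in.height);
    for (int y = 0; y < in.height; ++y) {
        for (int x = 0; x < in.width; ++x) {
            bool all = true;
            for (int j = 0; j < rh && all; ++j)
                for (int i = 0; i < rw && all; ++i)
                    if (!in.get(x + i, y + j)) all = false;
            out.set(x, y, all ? 1 : 0);
        }
    }
    return out;
}

// Dilation: every foreground pixel is expanded back into the full
// rw x rh rectangle whose top-left corner sits on that pixel.
BinaryImage dilate_rect(const BinaryImage& in, int rw, int rh) {
    assert(rw > 0 && rh > 0);
    BinaryImage out(in.width, in.height);
    for (int y = 0; y < in.height; ++y) {
        for (int x = 0; x < in.width; ++x) {
            bool any = false;
            for (int j = 0; j < rh && !any; ++j)
                for (int i = 0; i < rw && !any; ++i)
                    if (in.get(x - i, y - j)) any = true;
            out.set(x, y, any ? 1 : 0);
        }
    }
    return out;
}

// Opening: keeps exactly the foreground that can be covered by
// rw x rh rectangles lying entirely inside the foreground.
BinaryImage open_rect(const BinaryImage& in, int rw, int rh) {
    return dilate_rect(erode_rect(in, rw, rh), rw, rh);
}
```

In this sketch, `open_rect(img, 10, 1)` keeps only horizontal foreground runs at least 10 pixels wide, and `open_rect(img, 1, 8)` keeps only vertical runs at least 8 pixels tall. These correspond to the two steps the matra clipping script uses to find horizontal and vertical lines before intersecting them.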

