Leptonica
1.54
|
static void debugAddImage1 | ( | PIXA * | pixa1, |
PIX * | pix1, | ||
PIX * | pix2, | ||
L_BMF * | bmf, | ||
l_float32 | score | ||
) | [static] |
static void debugAddImage2 | ( | PIXA * | pixadb, |
PIXA * | pixa1, | ||
L_BMF * | bmf, | ||
l_int32 | index | ||
) | [static] |
static char * l_charToString | ( | char | byte | ) | [static] |
l_int32 pixaAccumulateSamples | ( | PIXA * | pixa, |
PTA * | pta, | ||
PIX ** | ppixd, | ||
l_float32 * | px, | ||
l_float32 * | py | ||
) |
Input: pixa (of samples from the same class, 1 bpp) pta (<optional> of centroids of the samples) &ppixd (<return> accumulated samples, 8 bpp) &px (<optional return>=""> average x coordinate of centroids) &py (<optional return>=""> average y coordinate of centroids) Return: 0 on success, 1 on failure
Notes: (1) This generates an aligned (by centroid) sum of the input pix. (2) We use only the first 256 samples; that's plenty. (3) If pta is not input, we generate two tables, and discard after use. If this is called many times, it is better to precompute the pta.
Input: recog pixa (1 or more characters) classindex (use -1 if not forcing into a specified class) debug Return: 0 if OK, 1 on error
Notes: (1) The pix in the pixa are all 1 bpp, and the character string labels are embedded in the pix. (2) Note: this function decides what class each pix belongs in. When input is from a multifont pixaa, with a valid value for , the character string label in each pix is ignored, and is used as the class index for all the pix in the pixa. Thus, for that situation we use this class index to avoid making the decision through a lookup based on the character strings embedded in the pix. (3) When a recog is initially filled with samples, the pixaa_u array is initialized to accept up to 256 different classes. When training is finished, the arrays are truncated to the actual number of classes. To pad an existing recog from the boot recognizers, training is started again; if samples from a new class are added, the pixaa_u array must be extended by adding a pixa to hold them.
l_int32 recogaFinishAveraging | ( | L_RECOGA * | recoga | ) |
Input: recoga Return: 0 if OK, 1 on error
l_int32 recogaShowContent | ( | FILE * | fp, |
L_RECOGA * | recoga, | ||
l_int32 | display | ||
) |
Input: stream recoga display (1 for showing template images, 0 otherwise) Return: 0 if OK, 1 on error
l_int32 recogaTrainingDone | ( | L_RECOGA * | recoga, |
l_int32 * | pdone | ||
) |
Input: recoga &done (<return> 1 if training finished on all recog; else 0) Return: 0 if OK, 1 on error
static l_int32 recogAverageClassGeom | ( | L_RECOG * | recog, |
NUMA ** | pnaw, | ||
NUMA ** | pnah | ||
) | [static] |
Input: recog &naw (<optional return>=""> average widths for each class) &nah (<optional return>=""> average heights for each class) Return: 0 if OK, 1 on error
l_int32 recogAverageSamples | ( | L_RECOG * | recog, |
l_int32 | debug | ||
) |
Input: recog debug Return: 0 on success, 1 on failure
Notes: (1) This is called when training is finished, and after outliers have been removed. Both unscaled and scaled inputs are averaged. Averages must be computed before any identification is done. (2) Set debug = 1 to view the resulting templates and their centroids.
l_int32 recogBestCorrelForPadding | ( | L_RECOG * | recog, |
L_RECOGA * | recoga, | ||
NUMA ** | pnaset, | ||
NUMA ** | pnaindex, | ||
NUMA ** | pnascore, | ||
NUMA ** | pnasum, | ||
PIXA * | pixadb | ||
) |
Input: recog (typically the recog to be padded) recoga (array of recogs for potentially providing the padding) &naset (<return> of indices into the sets to be matched) &naindex (<return> of matching indices into the best set) &nascore (<return> of best correlation scores) &naave (<return> average of correlation scores from each recog) pixadb (<optional> debug images; use NULL for no debug) Return: 0 if OK, 1 on error
Notes: (1) This finds, for each class in recog, the best matching template in the recoga. For that best match, it returns: * the recog set index in the recoga, * the index in that recog for the class, * the score for the best match (2) It also returns in , for each recog in recoga, the average overall correlation for all averaged templates to those in the input recog. The recog with the largest average can supply templates in cases where the input recog has no examples. (3) For classes in recog1 for which no corresponding class is found in any recog in recoga, the index -1 is stored in both naset and naindex, and 0.0 is stored in nascore. (4) Both recog and all the recog in recoga should be generated with isotropic scaling to the same character height (e.g., 30).
static l_int32 recogCharsetAvailable | ( | l_int32 | type | ) | [static] |
Input: type (of charset for padding) Return: 1 if available; 0 if not.
l_int32 recogCorrelAverages | ( | L_RECOG * | recog1, |
L_RECOG * | recog2, | ||
NUMA ** | pnaindex, | ||
NUMA ** | pnascore, | ||
PIXA * | pixadb | ||
) |
Input: recog1 (typically the recog to be padded) recog2 (potentially providing the padding) &naindex (<return> of classes in 2 with respect to classes in 1) &nascore (<return> correlation scores of corresponding classes) pixadb (<optional> debug images) Return: 0 if OK, 1 on error
Notes: (1) Use this for potentially padding recog1 with instances in recog2. The recog have been generated with isotropic scaling to the same fixed height (e.g., 30). The training has been "finished" in the sense that all arrays have been computed and they could potentially be used as they are. This is necessary for doing the correlation between scaled images. However, this function is called when there is a request to augument some of the examples in classes in recog1. (2) Iterate over classes in recog1, finding the corresponding class in recog2 and computing the correlation score between the average templates of the two. naindex is a LUT between the index of a class in recog1 and the corresponding one in recog2. (3) For classes in recog1 that do not exist in recog2, the index -1 is stored in naindex, and 0.0 is stored in the score.
l_int32 recogDebugAverages | ( | L_RECOG * | recog, |
l_int32 | debug | ||
) |
Input: recog debug (0 no output; 1 for images; 2 for text; 3 for both) Return: 0 if OK, 1 on error
Notes: (1) Generates an image that pairs each of the input images used in training with the average template that it is best correlated to. This is written into the recog. (2) It also generates pixa_tr of all the input training images, which can be used, e.g., in recogShowMatchesInRange().
static l_int32 recogGetCharsetSize | ( | l_int32 | type | ) | [static] |
Input: type (of charset) Return: size of charset, or 0 if unknown or on error
static l_int32 * recogMapIndexToIndex | ( | L_RECOG * | recog1, |
L_RECOG * | recog2 | ||
) | [static] |
Input: recog1 recog2 Return: lut (from recog1 --> recog2), or null on error
Notes: (1) This returns a map from each index in recog1 to the corresponding index in recog2. Caller must free. (2) If the character string doesn't exist in any of the classes in recog2, the value -1 is inserted in the lut.
l_int32 recogPadTrainingSet | ( | L_RECOG ** | precog, |
l_int32 | debug | ||
) |
Input: &recog (to be replaced if padding or more drastic measures are necessary; otherwise, it is unchanged.) debug (1 for debug output saved to recog; 0 otherwise) Return: 0 if OK, 1 on error
Notes: (1) This function does either padding of the recognizer, or its complete replacement. In both cases, we use a "boot" recognizer to provide the sample images. (2) Before calling this, call recogSetPadParams() if you want non-default values for the character set type, min_nopad and max_afterpad values, paths for labelled bitmap character sets that can be used to augment an input recognizer, and optional augmentation of the input training set using eroded versions of the bitmaps. (3) If all classes in have at least min_nopad samples, nothing is done. If the total number of samples in is very small, is replaced in its entirety by a "boot" recog, either from the specified bootpath, or by a default boot recognizer for that character type. Otherwise (the intermediate case), is replaced by one with scaling to fixed height, where an array of recog are used to augment the input recog. (4) If padding or total replacement is done, this destroys the input recog and replaces it by a new one. If the recog belongs to a recoga, the replacement is also done in the recoga.
l_int32 recogProcessMultLabelled | ( | L_RECOG * | recog, |
PIX * | pixs, | ||
BOX * | box, | ||
char * | text, | ||
PIXA ** | ppixa, | ||
l_int32 | debug | ||
) |
Input: recog (in training mode) pixs (if depth > 1, will be thresholded to 1 bpp) box (<optional> cropping box) text (<optional> if null, use text field in pix) &pixa (<return> of split and thresholded characters) debug (1 to display images of samples not captured) Return: 0 if OK, 1 on error
Notes: (1) This crops and segments one or more labelled and contiguous ascii characters, for input in training. It is a special case. (2) The character images are bundled into a pixa with the character text data embedded in each pix. (3) Where there is more than one character, this does some noise reduction and extracts the resulting character images from left to right. No scaling is performed.
l_int32 recogProcessSingleLabelled | ( | L_RECOG * | recog, |
PIX * | pixs, | ||
BOX * | box, | ||
char * | text, | ||
PIXA ** | ppixa | ||
) |
Input: recog (in training mode) pixs (if depth > 1, will be thresholded to 1 bpp) box (<optional> cropping box) text (<optional> if null, use text field in pix) &pixa (<return> one pix, 1 bpp, labelled) Return: 0 if OK, 1 on error
Notes: (1) This crops and binarizes the input image, generating a pix of one character where the charval is inserted into the pix.
l_int32 recogRemoveOutliers | ( | L_RECOG * | recog, |
l_float32 | targetscore, | ||
l_float32 | minfract, | ||
l_int32 | debug | ||
) |
Input: recog (after training samples are entered) targetscore (keep everything with at least this score) minfract (minimum fraction to retain) debug (1 for debug output) Return: 0 if OK, 1 on error
Notes: (1) Removing outliers is particularly important when recognition goes against all the samples in the training set, as opposed to the averages for each class. The reason is that we get an identification error if a mislabeled sample is a best match for an input bitmap. (2) However, the score values depend strongly on the quality of the character images. To avoid losing too many samples, we supplement a target score for retention with a minimum fraction that we must keep. With poor quality images, we may keep samples with a score less than the targetscore, in order to satisfy the requirement. (3) We always require that at least one sample will be retained. (4) Where the training set is from the same source (e.g., the same book), use a relatively large minscore; say, ~0.8. (5) Method: for each class, generate the averages and match each scaled sample against the average. Decide which samples will be ejected, and throw out both the scaled and unscaled samples and associated data. Recompute the average without the poor matches.
l_int32 recogResetBmf | ( | L_RECOG * | recog, |
l_int32 | size | ||
) |
Input: recog size (of font; even integer between 4 and 20; default is 6) Return: 0 if OK, 1 on error
Notes: (1) Use this to reset the size of the font used for debug labelling.
PIX* recogScaleCharacter | ( | L_RECOG * | recog, |
PIX * | pixs | ||
) |
Input: recog pixs (1 bpp, to be scaled) Return: pixd (scaled) if OK, null on error
l_int32 recogSetPadParams | ( | L_RECOG * | recog, |
const char * | bootdir, | ||
const char * | bootpattern, | ||
const char * | bootpath, | ||
l_int32 | boot_iters, | ||
l_int32 | type, | ||
l_int32 | min_nopad, | ||
l_int32 | max_afterpad, | ||
l_int32 | min_samples | ||
) |
Input: recog (to be padded, if necessary) bootdir (<optional> directory to bootstrap labelled pixa) bootpattern (<optional> pattern for bootstrap labelled pixa) bootpath (<optional> path to single bootstrap labelled pixa) boot_iters (number of 2x2 erosions for extension of boot pixa) type (character set type; -1 for default; see enum in recog.h) size (character set size; -1 for default) min_nopad (min number in a class without padding; -1 default) max_afterpad (max number of samples in padded classes; -1 for default) min_samples (use boot if total num samples is less than this; -1 for default) Return: 0 if OK, 1 on error
Notes: (1) This is used to augment or replace a book-adapted recognizer (BAR). It is called when the recognizer is created, and must be called again before recogPadTrainingSet() if non-default values are to be used. (2) Default values allow for some padding. To disable padding, set = 0. (3) Constraint on and guarantees that padding will be allowed if requested. (4) The file directory () and tail pattern () are used to identify serialized pixa, from which we can generate an array of recog. These can be used to augment an input but incomplete BAR (book adapted recognizer). (5) The boot recog can be extended using erosions. Set boot_iters to the number of 2x2 erosions desired. For a typical font size, <= 2. (6) If the BAR is very sparse, with num_samples < min_samples, we will destroy it and use the generic bootstrap recognizer given at .
l_int32 recogShowAverageTemplates | ( | L_RECOG * | recog | ) |
Input: recog Return: 0 on success, 1 on failure
Notes: (1) This debug routine generates a display of the averaged templates, both scaled and unscaled, with the centroid visible in red.
l_int32 recogShowContent | ( | FILE * | fp, |
L_RECOG * | recog, | ||
l_int32 | display | ||
) |
Input: stream recog display (1 for showing template images, 0 otherwise) Return: 0 if OK, 1 on error
PIX* recogShowMatch | ( | L_RECOG * | recog, |
PIX * | pix1, | ||
PIX * | pix2, | ||
BOX * | box, | ||
l_int32 | index, | ||
l_float32 | score | ||
) |
Input: recog pix1 (input pix; several possibilities) pix2 (<optional> matching template) box (<optional> region in pix1 for which pix2 matches) index (index of matching template; use -1 to disable printing) score (score of match) Return: pixd (pair of images, showing input pix and best template, optionally with matching information), or null on error.
Notes: (1) pix1 can be one of these: (a) The input pix alone, which can be either a single character (box == NULL) or several characters that need to be segmented. If more than character is present, the box region is displayed with an outline. (b) Both the input pix and the matching template. In this case, pix2 and box will both be null. (2) If the bmf has been made (by a call to recogMakeBmf()) and the index >= 0, the text field, match score and index will be rendered; otherwise their values will be ignored.
l_int32 recogShowMatchesInRange | ( | L_RECOG * | recog, |
PIXA * | pixa, | ||
l_float32 | minscore, | ||
l_float32 | maxscore, | ||
l_int32 | display | ||
) |
Input: recog pixa (of 1 bpp images to match) minscore, maxscore (range to include output) display (to display the result) Return: 0 if OK, 1 on error
Notes: (1) This gives a visual output of the best matches for a given range of scores. Each pair of images can optionally be labelled with the index of the best match and the correlation. If the bmf has been previously made, it will be used here. (2) To use this, save a set of 1 bpp images (labelled or unlabelled) that can be given to a recognizer in a pixa. Then call this function with the pixa and parameters to filter a range of score.
l_int32 recogTrainingFinished | ( | L_RECOG * | recog, |
l_int32 | debug | ||
) |
Input: recog debug Return: 0 if OK, 1 on error
Notes: (1) This must be called after all training samples have been added. (2) Set debug = 1 to view the resulting templates and their centroids. (3) The following things are done here: (a) Allocate (or reallocate) storage for (possibly) scaled bitmaps, centroids, and fg areas. (b) Generate the (possibly) scaled bitmaps. (c) Compute centroid and fg area data for both unscaled and scaled bitmaps. (d) Compute the averages for both scaled and unscaled bitmaps (e) Truncate the pixaa, ptaa and numaa arrays down from 256 to the actual size. (4) Putting these operations here makes it simple to recompute the recog with different scaling on the bitmaps. (5) Removal of outliers must happen after this is called.
l_int32 recogTrainLabelled | ( | L_RECOG * | recog, |
PIX * | pixs, | ||
BOX * | box, | ||
char * | text, | ||
l_int32 | multflag, | ||
l_int32 | debug | ||
) |
Input: recog (in training mode) pixs (if depth > 1, will be thresholded to 1 bpp) box (<optional> cropping box) text (<optional> if null, use text field in pix) multflag (1 if one or more contiguous ascii characters; 0 for a single arbitrary character) debug (1 to display images of samples not captured) Return: 0 if OK, 1 on error
Notes: (1) Training is restricted to the addition of either: (a) multflag == 0: a single character in an arbitrary (e.g., UTF8) charset (b) multflag == 1: one or more ascii characters rendered contiguously in pixs (2) If box != null, it should represent the cropped location of the character image. (3) If multflag == 1, samples will be rejected if the number of connected components does not equal to the number of ascii characters in the textstring. In that case, if debug == 1, the rejected samples will be displayed.
l_int32 recogTrainUnlabelled | ( | L_RECOG * | recog, |
L_RECOG * | recogboot, | ||
PIX * | pixs, | ||
BOX * | box, | ||
l_int32 | singlechar, | ||
l_float32 | minscore, | ||
l_int32 | debug | ||
) |
Input: recog (in training mode: the input characters in pixs are inserted after labelling) recogboot (labels the input) pixs (if depth > 1, will be thresholded to 1 bpp) box (<optional> cropping box) singlechar (1 if pixs is a single character; 0 otherwise) minscore (min score for accepting the example; e.g., 0.75) debug (1 for debug output saved to recog; 0 otherwise) Return: 0 if OK, 1 on error
Notes: (1) This trains on one or several characters of unlabelled data, using a bootstrap recognizer to apply the labels. In this way, we can build a recognizer using a source of unlabelled data. (2) The input pix can have several (non-touching) characters. If box != NULL, we treat the region in the box as a single char If box == NULL, use all of pixs: if singlechar == 0, we identify each c.c. as a single character if singlechar == 1, we treat pixs as a single character Multiple chars are identified separately by recogboot and inserted into recog. (3) recogboot is a trained recognizer. It would typically be constructed from a variety of sources, and use the individual templates (not the averages) for scoring. It must be used in scaled mode; typically with width = 20 and height = 32. (4) For debugging, if bmf is defined in the recog, the correlation scores are generated and saved (by adding to the pixadb_boot field) with the matching images.
const char* DEFAULT_BOOT_DIR = "recog/digits" [static] |
const char* DEFAULT_BOOT_PATTERN = "digit_set" [static] |
const l_int32 DEFAULT_CHARSET_TYPE = L_ARABIC_NUMERALS [static] |
const l_int32 DEFAULT_MAX_AFTERPAD = 15 [static] |
const l_float32 DEFAULT_MIN_FRACTION = 0.5 [static] |
const l_int32 DEFAULT_MIN_NOPAD = 3 [static] |
const l_int32 DEFAULT_MIN_SAMPLES = 10 [static] |
const l_float32 DEFAULT_TARGET_SCORE = 0.75 [static] |