You can use ML Kit to recognize text in images or video, such as the text of a street sign.
This API requires Android API level 21 or above. Make sure that your app's build file uses a minSdkVersion value of 21 or higher.
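For example, in a module-level build.gradle using the Groovy DSL:

android {
    defaultConfig {
        // ML Kit text recognition requires API level 21 or higher
        minSdkVersion 21
    }
}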
In your project-level build.gradle file, make sure to include Google's Maven repository in both your buildscript and allprojects sections.
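For example:

buildscript {
    repositories {
        google()  // Google's Maven repository
    }
}

allprojects {
    repositories {
        google()
    }
}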
Add the dependencies for the ML Kit Android libraries to your module's app-level gradle file, which is usually app/build.gradle:
For bundling the model with your app:
dependencies {
    // To recognize Latin script
    implementation 'com.google.mlkit:text-recognition:16.0.1'

    // To recognize Chinese script
    implementation 'com.google.mlkit:text-recognition-chinese:16.0.1'

    // To recognize Devanagari script
    implementation 'com.google.mlkit:text-recognition-devanagari:16.0.1'

    // To recognize Japanese script
    implementation 'com.google.mlkit:text-recognition-japanese:16.0.1'

    // To recognize Korean script
    implementation 'com.google.mlkit:text-recognition-korean:16.0.1'
}
For using the model in Google Play Services:
dependencies {
    // To recognize Latin script
    implementation 'com.google.android.gms:play-services-mlkit-text-recognition:19.0.1'

    // To recognize Chinese script
    implementation 'com.google.android.gms:play-services-mlkit-text-recognition-chinese:16.0.1'

    // To recognize Devanagari script
    implementation 'com.google.android.gms:play-services-mlkit-text-recognition-devanagari:16.0.1'

    // To recognize Japanese script
    implementation 'com.google.android.gms:play-services-mlkit-text-recognition-japanese:16.0.1'

    // To recognize Korean script
    implementation 'com.google.android.gms:play-services-mlkit-text-recognition-korean:16.0.1'
}
If you choose to use the model in Google Play Services, you can configure your app to automatically download the model to the device after your app is installed from the Play Store. To do so, add the following declaration to your app's AndroidManifest.xml file:
<application ...>
    ...
    <meta-data
        android:name="com.google.mlkit.vision.DEPENDENCIES"
        android:value="ocr" />
    <!-- To use multiple models: android:value="ocr,ocr_chinese,ocr_devanagari,ocr_japanese,ocr_korean,..." -->
</application>
You can also explicitly check model availability and request a download through the Google Play services ModuleInstallClient API. If you don't enable install-time model downloads or request an explicit download, the model is downloaded the first time you run the recognizer. Requests you make before the download has completed produce no results.
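As a minimal sketch, a check through ModuleInstallClient might look like the following. It assumes the TextRecognizer client created in the next step can be passed as the optional module to query, and that a Context named context is available; the module-install classes live in com.google.android.gms.common.moduleinstall.

ModuleInstallClient moduleInstallClient = ModuleInstall.getClient(context);
moduleInstallClient
        .areModulesAvailable(recognizer)
        .addOnSuccessListener(response -> {
            if (response.areModulesAvailable()) {
                // The text recognition module is already on the device.
            } else {
                // Request an explicit download rather than waiting for first use.
                moduleInstallClient.installModules(
                        ModuleInstallRequest.newBuilder().addApi(recognizer).build());
            }
        })
        .addOnFailureListener(e -> {
            // The availability check failed; handle the error.
        });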
Create an instance of TextRecognizer, passing the options related to the library you declared a dependency on above:
// When using Latin script library
TextRecognizer recognizer =
        TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS);

// When using Chinese script library
TextRecognizer recognizer =
        TextRecognition.getClient(new ChineseTextRecognizerOptions.Builder().build());

// When using Devanagari script library
TextRecognizer recognizer =
        TextRecognition.getClient(new DevanagariTextRecognizerOptions.Builder().build());

// When using Japanese script library
TextRecognizer recognizer =
        TextRecognition.getClient(new JapaneseTextRecognizerOptions.Builder().build());

// When using Korean script library
TextRecognizer recognizer =
        TextRecognition.getClient(new KoreanTextRecognizerOptions.Builder().build());
To recognize text in an image, create an InputImage object from a Bitmap, media.Image, ByteBuffer, byte array, or a file on the device. Then, pass the InputImage object to the TextRecognizer's process method.
You can create an InputImage object from any of these sources; for example, to create one from a Bitmap:
InputImage image = InputImage.fromBitmap(bitmap, rotationDegrees);
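The other sources work similarly. For example, to create an InputImage from a media.Image (such as a camera frame) or from a file URI:

// From a media.Image, passing the image's rotation in degrees
InputImage image = InputImage.fromMediaImage(mediaImage, rotationDegrees);

// From a file URI; this call can throw an IOException
InputImage image = InputImage.fromFilePath(context, uri);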
Pass the image to the process method:
Task<Text> result =
        recognizer.process(image)
                .addOnSuccessListener(new OnSuccessListener<Text>() {
                    @Override
                    public void onSuccess(Text visionText) {
                        // Task completed successfully
                        // ...
                    }
                })
                .addOnFailureListener(
                        new OnFailureListener() {
                            @Override
                            public void onFailure(@NonNull Exception e) {
                                // Task failed with an exception
                                // ...
                            }
                        });
If the text recognition operation succeeds, a Text object is passed to the success listener. A Text object contains the full text recognized in the image and zero or more TextBlock objects.
Each TextBlock represents a rectangular block of text, which contains zero or more Line objects. Each Line object represents a line of text, which contains zero or more Element objects. Each Element object represents a word or a word-like entity, which contains zero or more Symbol objects. Each Symbol object represents a character, a digit, or a word-like entity.
For each TextBlock, Line, Element, and Symbol object, you can get the text recognized in the region, the bounding coordinates of the region, and other attributes such as rotation information and confidence score.
For example:
String resultText = result.getText();
for (Text.TextBlock block : result.getTextBlocks()) {
    String blockText = block.getText();
    Point[] blockCornerPoints = block.getCornerPoints();
    Rect blockFrame = block.getBoundingBox();
    for (Text.Line line : block.getLines()) {
        String lineText = line.getText();
        Point[] lineCornerPoints = line.getCornerPoints();
        Rect lineFrame = line.getBoundingBox();
        for (Text.Element element : line.getElements()) {
            String elementText = element.getText();
            Point[] elementCornerPoints = element.getCornerPoints();
            Rect elementFrame = element.getBoundingBox();
            for (Text.Symbol symbol : element.getSymbols()) {
                String symbolText = symbol.getText();
                Point[] symbolCornerPoints = symbol.getCornerPoints();
                Rect symbolFrame = symbol.getBoundingBox();
            }
        }
    }
}
For ML Kit to accurately recognize text, input images must contain text that is represented by sufficient pixel data. Ideally, each character should be at least 16x16 pixels. There is generally no accuracy benefit for characters to be larger than 24x24 pixels.
So, for example, a 640x480 image might work well to scan a business card that occupies the full width of the image. To scan a document printed on letter-sized paper, a 720x1280 pixel image might be required.
Poor image focus can affect text recognition accuracy. If you aren't getting acceptable results, try asking the user to recapture the image.
If you are recognizing text in a real-time application, you should consider the overall dimensions of the input images. Smaller images can be processed faster. To reduce latency, ensure that the text occupies as much of the image as possible, and capture images at lower resolutions (keeping in mind the accuracy requirements mentioned above). For more information, see Tips to improve performance.
If you use the Camera or Camera2 API, throttle calls to the detector. If a new video frame becomes available while the detector is running, drop the frame. See the VisionProcessorBase class in the quickstart sample app for an example.
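For illustration, a minimal throttling sketch might look like the following. This is a simplified stand-in for VisionProcessorBase, not its actual implementation; detectInImage is a placeholder for the recognizer.process(image) call.

private InputImage pendingImage;      // latest frame waiting to be processed
private boolean isProcessing = false;

synchronized void onNewFrame(InputImage image) {
    pendingImage = image;             // a newer frame replaces any older one
    if (!isProcessing) {
        processPending();
    }
}

private synchronized void processPending() {
    InputImage image = pendingImage;
    pendingImage = null;
    if (image == null) {
        isProcessing = false;         // nothing left to process
        return;
    }
    isProcessing = true;
    detectInImage(image)              // placeholder for recognizer.process(image)
            .addOnCompleteListener(task -> processPending());
}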
If you use the CameraX API, make sure that the backpressure strategy is set to its default value, ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST. This guarantees that only one image is delivered for analysis at a time. If more images are produced while the analyzer is busy, they are dropped automatically and not queued for delivery. Once the image being analyzed is closed by calling ImageProxy.close(), the next latest image is delivered.
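For example, an analyzer set up with this strategy might look like the following sketch, assuming a recognizer created as above and an Executor named executor (note that ImageProxy.getImage() is annotated @ExperimentalGetImage):

ImageAnalysis imageAnalysis = new ImageAnalysis.Builder()
        .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
        .build();
imageAnalysis.setAnalyzer(executor, imageProxy -> {
    Image mediaImage = imageProxy.getImage();
    if (mediaImage != null) {
        InputImage image = InputImage.fromMediaImage(
                mediaImage, imageProxy.getImageInfo().getRotationDegrees());
        recognizer.process(image)
                // Close the proxy when done so the next frame can be delivered.
                .addOnCompleteListener(task -> imageProxy.close());
    } else {
        imageProxy.close();
    }
});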
If you use the output of the detector to overlay graphics on the input image, first get the result from ML Kit, then render the image and overlay in a single step. This renders to the display surface only once for each input frame. See the CameraSourcePreview and GraphicOverlay classes in the quickstart sample app for an example.
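As a rough sketch of this pattern (a hypothetical view, not the quickstart's GraphicOverlay), an overlay can cache the latest result and draw all of its boxes in a single onDraw pass; a real app would also map the boxes from image coordinates to view coordinates:

public class TextOverlayView extends View {
    private final Paint boxPaint = new Paint();
    private Text latestResult;    // most recent recognition result

    public TextOverlayView(Context context, AttributeSet attrs) {
        super(context, attrs);
        boxPaint.setStyle(Paint.Style.STROKE);
        boxPaint.setStrokeWidth(4f);
    }

    // Call this from the success listener with each new result.
    public void setResult(Text text) {
        latestResult = text;
        postInvalidate();         // one redraw per input frame
    }

    @Override
    protected void onDraw(Canvas canvas) {
        super.onDraw(canvas);
        if (latestResult == null) return;
        for (Text.TextBlock block : latestResult.getTextBlocks()) {
            Rect box = block.getBoundingBox();
            if (box != null) {
                canvas.drawRect(box, boxPaint);
            }
        }
    }
}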
If you use the Camera2 API, capture images in ImageFormat.YUV_420_888 format. If you use the older Camera API, capture images in ImageFormat.NV21 format.
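For example, with the Camera2 API you might allocate the ImageReader that receives the frames in that format:

ImageReader imageReader = ImageReader.newInstance(
        1280, 720, ImageFormat.YUV_420_888, /* maxImages= */ 2);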
Consider capturing images at a lower resolution. However, also keep in mind this API's image dimension requirements.