Let's implement OCR with OpenCV. - slog (completed)

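dimohy · Feb 11, 2021, 5:06 AM

OpenCV (Open Source Computer Vision) is a computer vision library that Intel originally developed and that came into widespread use after it was open-sourced. It runs on Windows, Linux, and other operating systems, and its deep learning (DNN) module is also used for machine-learning-based image recognition.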

OpenCV - Wikipedia (wikipedia.org)

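The goal of this study is:

Take a model trained on images with ML.NET and use it through the OpenCV module to perform OCR on a Raspberry Pi.

I have no experience at all in this area. The aim is to get familiar with ML.NET along the way and to see for myself whether the trained result runs well on a Raspberry Pi.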

.NET 5, C# 9

Set up a Visual Studio 2019 Preview environment. Reportedly, ML.NET Preview requires Visual Studio 2019 Preview.

Visual Studio Preview (microsoft.com)

dimohy · Feb 11, 2021, 6:07 AM

ML.NET

Once Visual Studio 2019 Preview is installed, ML.NET Model Builder and GPU Support are available right away.

dimohy · Feb 11, 2021, 6:10 AM

OpenCV Sharp

NuGet lists quite a few OpenCV-related packages. Since I want the most recent release, I chose OpenCvSharp4.

dimohy · Feb 11, 2021, 8:09 AM

OCR - using Tesseract 4

Hmm. ML.NET Model Builder has no OCR scenario, so it looks like Tesseract 4 is needed for the recognition step.
Tesseract 4 includes a deep learning engine based on an LSTM network and is said to perform better than Tesseract 3 on unstructured data.

tesseract-ocr/tesseract: Tesseract Open Source OCR Engine (main repository) (github.com)

For .NET:

charlesw/tesseract: A .Net wrapper for tesseract-ocr (github.com)

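It supports the latest 4.1.1 and targets .NET Standard 2.0, so using it from .NET 5 should not be a problem.

Detect text objects and list their regions with ML.NET
Convert each region to text with Tesseract 4

Hmm, but for object detection, ML.NET Model Builder only offers the Azure ML template. So in the end… the builder won't be of any help T_T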

dimohy · Feb 11, 2021, 8:13 AM

ML.NET samples

dotnet/machinelearning-samples: Samples for ML.NET, an open source and cross-platform machine learning framework for .NET. (github.com)

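There are many samples here. Of these, I need Object Detection under computer vision. Reality is starting to sink in.

dimohy · Feb 11, 2021, 8:24 AM

OCR preprocessing

Images have to be preprocessed to raise the OCR recognition rate. This applies not only to OCR but also to images used in machine learning. Images of different sizes must be brought to a common scale, and their color depth must be reduced and unified.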
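The two normalizations described above can be sketched without any imaging library. This is only a minimal illustration, not the pipeline used later in this thread: the `Preprocess` class, its method names, and the idea of fitting to a fixed target height are assumptions made for the example. With OpenCvSharp, the library's own color conversion and resize functions would normally do this work.

```csharp
using System;

// Hypothetical helper for illustration; not part of OpenCV, ML.NET, or Tesseract.
public static class Preprocess
{
    // Reduce color depth: interleaved 8-bit RGB -> 8-bit grayscale,
    // using the common ITU-R BT.601 luma weights.
    public static byte[] ToGray(byte[] rgb)
    {
        if (rgb.Length % 3 != 0)
            throw new ArgumentException("Expected interleaved RGB bytes.", nameof(rgb));

        var gray = new byte[rgb.Length / 3];
        for (int i = 0; i < gray.Length; i++)
        {
            double y = 0.299 * rgb[3 * i] + 0.587 * rgb[3 * i + 1] + 0.114 * rgb[3 * i + 2];
            gray[i] = (byte)Math.Round(y);
        }
        return gray;
    }

    // Unify scale: compute the size that brings an image to a fixed height
    // while keeping its aspect ratio.
    public static (int Width, int Height) FitToHeight(int width, int height, int targetHeight)
    {
        if (width <= 0 || height <= 0 || targetHeight <= 0)
            throw new ArgumentOutOfRangeException(nameof(height), "Sizes must be positive.");

        double scale = (double)targetHeight / height;
        int newWidth = Math.Max(1, (int)Math.Round(width * scale));
        return (newWidth, targetHeight);
    }
}
```

For example, `Preprocess.FitToHeight(640, 480, 32)` gives the size to resize a 640x480 image to so that every input ends up 32 pixels tall.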

OCR with [Tesseract & OpenCV] 2-1: pre-processing (tistory.com)

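dimohy · Feb 13, 2021, 5:27 AM

ONNX (ONNX | Home) supports interoperability between frameworks. As long as the target framework supports ONNX, a trained model can be used there as-is, regardless of which framework it was trained in.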

Open Neural Network Exchange - Wikipedia

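dimohy · Feb 13, 2021, 5:29 AM

We cannot research and develop neural network algorithms ourselves. The field is highly specialized and is not really a programmer's domain. What we need is to understand which existing neural network algorithms are effective and accurate for which tasks, and which frameworks support them well. After that, we need to get used to selecting the right algorithm and framework for the goal and putting them to use.

Later on, it will probably be necessary to combine several neural networks.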

dimohy · Feb 13, 2021, 6:58 AM

To find text regions by contour detection rather than object detection, OpenCV's built-in functions are enough.
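To make the idea concrete, here is a dependency-free sketch that groups connected foreground pixels of a binary mask and returns one bounding box per group. It is a simplified stand-in for contour detection, not OpenCV's algorithm; the `RegionFinder` class and its method are hypothetical names. In OpenCvSharp, `Cv2.FindContours` followed by `Cv2.BoundingRect` would play this role.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; a simplified stand-in for contour detection.
public static class RegionFinder
{
    // Returns the bounding box (X, Y, Width, Height) of every 4-connected group
    // of foreground (true) pixels in a binary mask indexed as mask[row, col].
    public static List<(int X, int Y, int Width, int Height)> BoundingBoxes(bool[,] mask)
    {
        int rows = mask.GetLength(0), cols = mask.GetLength(1);
        var visited = new bool[rows, cols];
        var boxes = new List<(int X, int Y, int Width, int Height)>();
        var stack = new Stack<(int R, int C)>();

        for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
        {
            if (!mask[r, c] || visited[r, c]) continue;

            // Flood-fill one group while tracking its extent.
            int minR = r, maxR = r, minC = c, maxC = c;
            visited[r, c] = true;
            stack.Push((r, c));

            while (stack.Count > 0)
            {
                var (cr, cc) = stack.Pop();
                minR = Math.Min(minR, cr); maxR = Math.Max(maxR, cr);
                minC = Math.Min(minC, cc); maxC = Math.Max(maxC, cc);

                foreach (var (dr, dc) in new[] { (1, 0), (-1, 0), (0, 1), (0, -1) })
                {
                    int nr = cr + dr, nc = cc + dc;
                    if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                    if (!mask[nr, nc] || visited[nr, nc]) continue;
                    visited[nr, nc] = true;
                    stack.Push((nr, nc));
                }
            }
            boxes.Add((minC, minR, maxC - minC + 1, maxR - minR + 1));
        }
        return boxes;
    }
}
```

On a real image the mask would come from thresholding, and the foreground is usually dilated first so that the letters of a word or line merge into a single region.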

dimohy · Feb 13, 2021, 7:03 AM

To detect text with a neural network, the following article should be a good reference.

dimohy · Feb 13, 2021, 7:59 AM

EAST Text Detector in ML.NET

github.com/dotnet/machinelearning

My confusion trying to use EAST text detector model with the ML.net

opened 01:03AM - 20 Jul 20 UTC · closed 01:11AM - 24 Jul 20 UTC · sereal96 · labels: question, P3

### System information - **OS version/distro:Windows 10**: ### Issue - …**What did you do?** Hi, well I am trying to use a the EAST text detector model with the ML.net from here: https://www.kaggle.com/yelmurat/frozen-east-text-detection however I don't know if I am doing it the right way. (I only began using ML.net last month) First I tried using OpenCV with this example: https://github.com/opencv/opencv/blob/master/samples/dnn/text_detection.cpp It runs fine everything is Ok, but when I tried to do the same with ML.net... - **What happened?** The problem is that I dont understand how ML.net handle the input data, and the output data. I had an idea. I run other examples, but I couldn't find something similar. - **What did you expect?** I was expecting to have the same or at least similar results like those from OpenCV example. ### Source code / logs First I define this ` static readonly string _assetsPath = Path.Combine(Environment.CurrentDirectory, "assets"); static readonly string _imagesFolder = Path.Combine(_assetsPath, "imagesText"); static readonly string _predictSingleImage = Path.Combine(_imagesFolder, "page10.jpg"); static readonly string _inceptionTensorFlowModel = Path.Combine(_assetsPath, "models","frozen_east_text_detection.pb"); private const int imageHeight = 3104;// 576; It should be multiple by 32 private const int imageWidth = 2304; //576; It should be multiple by 32 private const int numChannels = 3; private const int inputSize = imageHeight * imageWidth * numChannels;` then I load the TensorFlow model and saved as ML.net model `using var modelX = mlContext.Model.LoadTensorFlowModel(_inceptionTensorFlowModel); var schema = modelX.GetModelSchema(); var inputchema = modelX.GetInputSchema(); var pipelineX = modelX.ScoreTensorFlowModel( outputColumnNames: new[] { "feature_fusion/Conv_7/Sigmoid", "feature_fusion/concat_3" }, nameof(OutputScores.output) }, inputColumnNames: new[] { "input_images" }, addBatchDimensionInput: false); }, 
addBatchDimensionInput: true); List<TensorData> list = new List<TensorData>(); list.Add(new TensorData() { input = null }); IEnumerable<TensorData> enumerableData = list; var dv = mlContext.Data.LoadFromEnumerable<TensorData>(list);//TensorData ITransformer model = pipelineX.Fit(dv); Directory.CreateDirectory("Model"); mlContext.Model.Save(model, inputchema, "trainedModelEAST3.zip");` At this point everything seems to work, but here is my problem with the outputs In OpenCV I load an Image and use this `cv::dnn::blobFromImage(frame, blob, 1.0, cv::Size(inpWidth, inpHeight), cv::Scalar(123.68, 116.78, 103.94), true, false); ` and only using this ` detector.setInput(blob); tickMeter.start(); detector.forward(outs, outNames); tickMeter.stop(); cv::Mat scores = outs[0]; cv::Mat geometry = outs[1];` It's almost done, my inputs are clear, and my outputs too. But ML.net you need to create a class to hold the sample tensor data. So I did that ` public class TensorData { [VectorType(imageHeight, imageWidth, numChannels)] [ColumnName("input_images")] public float[] input { get; set; } [ColumnName("ImagePath")] public string imageP { get; set; } [ColumnName("Name")] public string imageN { get; set; } }` This is where my confusion began because I know that my input for this model should be like this ![inputs](https://user-images.githubusercontent.com/60855616/87887590-965b7200-c9f4-11ea-9d17-4bc627ea8792.png) using this seems to work ` [VectorType(imageHeight, imageWidth, numChannels)] [ColumnName("input_images")] public float[] input { get; set; }` But for my outputs and how to pass and image to the model I only guessing. 
so using the information about the model's output that I find using Netron: This is the "scores" ![outputs1](https://user-images.githubusercontent.com/60855616/87887712-71b3ca00-c9f5-11ea-8a6d-237261c1feb3.png) and this is the "geometry" (the box that show you where is a word in the image) ![outputs2](https://user-images.githubusercontent.com/60855616/87887726-aa53a380-c9f5-11ea-8c80-8ee082c2159d.png) I create the class ` class OutputScores { [ColumnName("feature_fusion/concat_3")] public float[] output { get; set; } [ColumnName("feature_fusion/Conv_7/Sigmoid")] public float[] output2 { get; set; } }` white all that I tried to use the predict engine like this using an image ("jpg"): ` Bitmap bitmapImage = (Bitmap)Image.FromFile(_predictSingleImage); float[] a = new float[(bitmapImage.Height * bitmapImage.Width) * 3]; Color[] c = new Color[bitmapImage.Height * bitmapImage.Width]; for (int i = 0; i < bitmapImage.Height * bitmapImage.Width; i++) { int row = i / bitmapImage.Width; int col = i % bitmapImage.Width; var pixel = bitmapImage.GetPixel(col, row); c[i] = pixel; //a[i + 0] = pixel.ToArgb(); a[i * 3 + 0] = pixel.R; a[i * 3 + 1] = pixel.G; a[i * 3 + 2] = pixel.B; } var aux = c.ToArray(); TensorData imageTensorData = new TensorData() { input = a.ToArray() }; PredictionEngine<TensorData, OutputScores> _predictionEngineX; var loadedModelX = mlContex.Model.Load("trainedModelEAST3.zip", out _); _predictionEngineX = mlContex.Model.CreatePredictionEngine<TensorData, OutputScores>(loadedModelX); var predictionX = _predictionEngineX.Predict(imageTensorData); ` that gave this results: For the "geometry" - output {float[2234880]} float[] [0] 164.553131 float [1] 108.803284 float [2] 88.53912 float [3] 157.4754 float [4] -0.00642232737 float [5] 121.783844 float [6] 93.6575 float [7] 89.14729 float [8] 149.1378 float [9] 0.003307178 float [10] 143.044312 float [11] 92.95393 float [12] 93.75145 float [13] 136.486084 float [14] -0.00365050742 float [15] 150.783173 float [16] 
105.081482 float [17] 104.515717 float [18] 138.529785 float [19] 0.00163079088 float [20] 155.030853 float For the scores: - output2 {float[446976]} float[] [0] 5.96046448E-08 float [1] 2.38418579E-07 float [2] 2.38418579E-07 float [3] 4.76837158E-07 float [4] 2.682209E-07 float [5] 1.49011612E-07 float [6] 3.27825546E-07 float [7] 5.662441E-07 float [8] 3.27825546E-07 float [9] 5.066395E-07 float [10] 1.10268593E-06 float [11] 1.10268593E-06 float [12] 1.22189522E-06 float [13] 1.10268593E-06 float [14] 6.854534E-07 float [15] 4.76837158E-07 float [16] 2.682209E-07 float [17] 2.682209E-07 float [18] 1.49011612E-07 float [19] 2.38418579E-07 float [20] 1.49011612E-07 float Well that is how far I went. Could some one tell me If I implemented the loading of Image correctly or not. My end goal is to have the same or similar result as in OpenCV this are the packages I am using: ![Packages](https://user-images.githubusercontent.com/60855616/87889195-63b67700-c9fe-11ea-8b9b-198e6513bbc3.png) and yes I tried this to create a pipeline: `var imagesDataFile = @"..\..\DNN_ML_CUDA_01\assets\imagesText\"; var data = mlContext.Data.CreateTextLoader(new TextLoader.Options() { Columns = new[] { new TextLoader.Column("ImagePath", DataKind.String, 0), new TextLoader.Column("Name", DataKind.String, 1), new TextLoader.Column("input_images", DataKind.Single , 2), } }).Load(imagesDataFile); var imagesFolder = Path.GetDirectoryName(imagesDataFile); // Image loading pipeline. 
var pipelineI = mlContext.Transforms.LoadImages("ImageObject", imagesFolder, "ImagePath") .Append(mlContext.Transforms.ResizeImages("ImageObjectResized", inputColumnName: "ImageObject", imageWidth: imageWidth, imageHeight: imageHeight)) .Append(mlContext.Transforms.ExtractPixels("Pixels", "ImageObjectResized")) .Append(mlContext.Model.LoadTensorFlowModel(_inceptionTensorFlowModel) .ScoreTensorFlowModel( outputColumnNames: new[] { "feature_fusion/Conv_7/Sigmoid", "feature_fusion/concat_3" }, inputColumnNames: new[] { "input_images" }, addBatchDimensionInput: false)) ; List<TensorData> list = new List<TensorData>(); list.Add(new TensorData() { input = null }); IEnumerable<TensorData> enumerableData = list; var dvv = mlContext.Data.LoadFromEnumerable<TensorData>(list);//TensorData var model = pipelineI.Fit(dvv); using var modelX = mlContext.Model.LoadTensorFlowModel(_inceptionTensorFlowModel); var testeschema1 = modelX.GetInputSchema(); Directory.CreateDirectory("Model"); mlContext.Model.Save(model, testeschema1, "trainedModelEAST3.zip"); ` It gave me the same results for reference these are the websites that I use for this project: https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.imageestimatorscatalog.loadimages?view=ml-dotnet https://github.com/dotnet/machinelearning/blob/master/docs/code/MlNetCookBook.md#how-do-i-train-my-model-on-categorical-data https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.imageestimatorscatalog.extractpixels?view=ml-dotnet https://devblogs.microsoft.com/cesardelatorre/run-with-ml-net-c-code-a-tensorflow-model-exported-from-azure-cognitive-services-custom-vision/ https://www.pyimagesearch.com/2018/08/20/opencv-text-detection-east-text-detector/ https://devblogs.microsoft.com/cesardelatorre/training-image-classification-recognition-models-based-on-deep-learning-transfer-learning-with-ml-net/ https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.transforms.tensorflowmodel.scoretensorflowmodel?view=ml-dotnet 
https://github.com/dotnet/machinelearning/issues/5286 https://github.com/dotnet/machinelearning-samples/tree/master/samples/csharp/getting-started/DeepLearning_ImageClassification_TensorFlow If somebody could show me an example, of guide me or anything that would be great.

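dimohy · Feb 13, 2021, 8:01 AM

dimohy · Feb 13, 2021, 11:07 AM

First summary

OCR with deep learning requires two stages, plus preprocessing.
1. Text region detection: EAST Text Detector
2. Text recognition: Tesseract 4
3. Other preprocessing techniques
Find and apply a deep learning framework, engine, and neural network algorithm that fit the .NET environment.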

dimohy · Feb 14, 2021, 1:41 PM

OpenCV: DNN module and Text Detection Model - EAST

https://docs.opencv.org/master/d6/d0f/group__dnn.html

https://docs.opencv.org/master/d8/ddc/classcv_1_1dnn_1_1TextDetectionModel__EAST.html
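The EAST model returns a score map and a five-channel geometry map at one quarter of the input resolution. Each cell stores the distances to the four edges of a box plus a rotation angle. The sketch below decodes a single cell; it mirrors the arithmetic of OpenCV's text_detection sample (the same arithmetic as the Decode function in the test listing later in this thread). The `EastCell` class and its parameter names are hypothetical and exist only for this illustration.

```csharp
using System;

// Hypothetical helper for illustration; mirrors the arithmetic of OpenCV's EAST sample.
public static class EastCell
{
    // Decodes one geometry cell into a rotated box.
    // (col, row) is the cell position in the output map, which is 4x smaller than the input.
    // top/right/bottom/left are distances from the cell to the box edges; angle is in radians.
    public static (float CenterX, float CenterY, float Width, float Height, float AngleDeg) Decode(
        int col, int row, float top, float right, float bottom, float left, float angle)
    {
        float offsetX = col * 4.0f, offsetY = row * 4.0f;
        float cosA = MathF.Cos(angle), sinA = MathF.Sin(angle);
        float h = top + bottom, w = right + left;

        // One corner of the box, then the two corners adjacent to it.
        float ox = offsetX + cosA * right + sinA * bottom;
        float oy = offsetY - sinA * right + cosA * bottom;
        float p1x = -sinA * h + ox, p1y = -cosA * h + oy;
        float p3x = -cosA * w + ox, p3y = sinA * w + oy;

        // The center is the midpoint of the diagonal p1-p3.
        return (0.5f * (p1x + p3x), 0.5f * (p1y + p3y), w, h, -angle * 180.0f / MathF.PI);
    }
}
```

With no rotation, the cell at column 10, row 5 sits at pixel (40, 20). Distances of 5, 10, 15 and 30 (top, right, bottom, left) then describe a 40x20 box centered at (30, 25).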

dimohy · Feb 14, 2021, 2:22 PM

Tesseract (.NET) Samples

charlesw/tesseract-samples: Samples for the Tesseract.Net wrapper (github.com)

tessdata

dimohy · Feb 14, 2021, 3:05 PM

Tesseract test

It supports both the legacy and the LSTM engine, and the default is said to be LSTM, but recognition is not as good as I expected.

using System;
using System.IO;
using System.Net.Http;

using Tesseract;

// A simple example for testing Tesseract.
// 1. Install the Tesseract .NET package from NuGet: Install-Package Tesseract
// 2. Write the sample code

var a1 = "http://cdn.011st.com/11dims/resize/600x600/quality/75/11src/pd/20/3/7/2/6/0/6/iOLuU/2722372606_B.jpg"; // this kind does not work
var a2 = "https://image.chosun.com/sitedata/image/202008/13/2020081303153_0.jpg"; // works well, but the logo at the top also gets converted to Korean text, so the text region probably needs to be specified
var a3 = "https://www.ibookpark.com/wp-content/uploads/2020/03/Screen-Shot-2019-07-08-at-3.56.10-PM.jpg"; // this kind does not work
var a4 = "https://mblogthumb-phinf.pstatic.net/MjAxOTEyMjBfNDkg/MDAxNTc2ODI0NTMwNjA0.r2AOGK4g4ssSCUsiJjImlLRVTpkQg9bWOWXlaEwvqNQg.Mv9Fz2c6tc6FV1Z4kMiaigyU4RqUWxKb8LX0ch9SvBkg.JPEG.feublot/SE-dc86b9b0-807a-4a7c-a54d-17c5e5981123.jpg?type=w800"; // this does not work either
var a5 = "https://www.computertechreviews.com/wp-content/uploads/2019/11/image-result-for-link-building-min-1200x675.jpg";
var imgUri = a5;
using var c = new HttpClient();
var imgData = await c.GetByteArrayAsync(imgUri);
// To test a local file instead of the downloaded image, uncomment the next line.
// imgData = File.ReadAllBytes(@"w:\input.png");

using var engine = new TesseractEngine("./tessdata", "eng", EngineMode.Default);
using var img = Pix.LoadFromMemory(imgData);
using var page = engine.Process(img);
var text = page.GetText();
Console.WriteLine(text);

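Delicious_App · Feb 14, 2021, 10:46 PM

Thank you. I haven't implemented this yet, so I'll read it carefully first and then comment.

dimohy · Feb 15, 2021, 12:57 AM

Tesseract - choosing tessdata

Download the tessdata files for the languages you need from the tessdata repository of tesseract-ocr (github.com).

Depending on the purpose, you can choose tessdata_fast, tessdata, or tessdata_best. fast is quicker but recognizes less accurately, while best recognizes more accurately but runs slower.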
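The choice can be written down as a small lookup. This is only a sketch: the `TessdataFlavor` enum and the `Tessdata` class are hypothetical names, not part of the Tesseract wrapper. It also records one detail that is easy to miss: the fast and best sets contain only LSTM models, so the legacy engine works only with the standard tessdata files.

```csharp
using System;

// Hypothetical helper for illustration; the enum and method names are not part of any library.
public enum TessdataFlavor { Fast, Standard, Best }

public static class Tessdata
{
    // GitHub repository that hosts each flavor of trained data.
    public static string RepositoryUrl(TessdataFlavor flavor) => flavor switch
    {
        TessdataFlavor.Fast => "https://github.com/tesseract-ocr/tessdata_fast",
        TessdataFlavor.Standard => "https://github.com/tesseract-ocr/tessdata",
        TessdataFlavor.Best => "https://github.com/tesseract-ocr/tessdata_best",
        _ => throw new ArgumentOutOfRangeException(nameof(flavor)),
    };

    // Only the standard tessdata files also contain the legacy (Tesseract 3) models;
    // the fast and best sets are LSTM-only.
    public static bool SupportsLegacyEngine(TessdataFlavor flavor) =>
        flavor == TessdataFlavor.Standard;
}
```

In practice this means that an engine mode which relies on the legacy recognizer, such as `EngineMode.TesseractOnly` in the .NET wrapper, should be paired with the standard tessdata files.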

dimohy · Feb 15, 2021, 10:05 AM

OpenCV - EAST Text Detection test

I wrote this with reference to the related source code.
Install the OpenCvSharp package and place the model at Models/frozen_east_text_detection.pb.
Download the EAST Text Detection Model

using OpenCvSharp;
using OpenCvSharp.Dnn;

using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Net.Http;

//var imageUri = "https://ganpaneasy.co.kr/uploads/cmallitem/2020/01/1d9a6d44adfe9b20bcbfb964b822be98.jpg";
var imageUri = "https://t1.daumcdn.net/thumb/R720x0/?fname=http://t1.daumcdn.net/brunch/service/user/3FXy/image/sqSmOMklFK34ylouQhXl08CPenw.png";

using var net = CvDnn.ReadNet("./Models/frozen_east_text_detection.pb");

using var c = new HttpClient();
var imageData = await c.GetByteArrayAsync(imageUri);
using var frame = Mat.FromImageData(imageData);
var (newW, newH) = (320, 320);
var rW = (float)frame.Width / newW;
var rH = (float)frame.Height / newH;
using var newFrame = frame.Resize(new Size(newW, newH));
//Window.ShowImages(newFrame);
using var blob = CvDnn.BlobFromImage(newFrame, 1.0, new Size(newFrame.Width, newFrame.Height), new Scalar(123.68, 116.78, 103.94), swapRB: true, crop: false);

var outputLayers = new[] { "feature_fusion/Conv_7/Sigmoid", "feature_fusion/concat_3" };
net.SetInput(blob);
var output = new List<Mat>() { new(), new() };
net.Forward(output, outputLayers);

// Both outputs are 4-D blobs: scores is 1x1xHxW and geometry is 1x5xHxW.
var scores = output[0];
var geometry = output[1];

var confThreshold = 0.5f;
Decode(scores, geometry, confThreshold, out var boxes, out var confidences);

var nmsThreshold = 0.4f;
CvDnn.NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, out var indices);

// Render detections.
var ratio = new Point2f(rW, rH);
for (var i = 0; i < indices.Length; ++i)
{
    RotatedRect box = boxes[indices[i]];

    Point2f[] vertices = box.Points();

    for (int j = 0; j < 4; ++j)
    {
        vertices[j].X *= ratio.X;
        vertices[j].Y *= ratio.Y;
    }

    for (int j = 0; j < 4; ++j)
    {
        Cv2.Line(frame, (int)vertices[j].X, (int)vertices[j].Y, (int)vertices[(j + 1) % 4].X, (int)vertices[(j + 1) % 4].Y, new Scalar(0, 255, 0), 3);
    }
}

// Optional - Save detections
var fileName = "output.jpg";
frame.SaveImage(Path.Combine(Path.GetDirectoryName(fileName), $"{Path.GetFileNameWithoutExtension(fileName)}_east.jpg"));

// -------------
// Decode uses pointers, so the project must enable <AllowUnsafeBlocks>true</AllowUnsafeBlocks>.

static unsafe void Decode(Mat scores, Mat geometry, float confThreshold, out IList<RotatedRect> boxes, out IList<float> confidences)
{
    boxes = new List<RotatedRect>();
    confidences = new List<float>();

    if ((scores == null || scores.Dims != 4 || scores.Size(0) != 1 || scores.Size(1) != 1) ||
        (geometry == null || geometry.Dims != 4 || geometry.Size(0) != 1 || geometry.Size(1) != 5) ||
        (scores.Size(2) != geometry.Size(2) || scores.Size(3) != geometry.Size(3)))
    {
        return;
    }

    int height = scores.Size(2);
    int width = scores.Size(3);

    for (int y = 0; y < height; ++y)
    {
        // Each row of the output map holds `width` values, so the spans must be `width` long.
        var scoresData = new ReadOnlySpan<float>((void*)scores.Ptr(0, 0, y), width);
        var x0Data = new ReadOnlySpan<float>((void*)geometry.Ptr(0, 0, y), width);
        var x1Data = new ReadOnlySpan<float>((void*)geometry.Ptr(0, 1, y), width);
        var x2Data = new ReadOnlySpan<float>((void*)geometry.Ptr(0, 2, y), width);
        var x3Data = new ReadOnlySpan<float>((void*)geometry.Ptr(0, 3, y), width);
        var anglesData = new ReadOnlySpan<float>((void*)geometry.Ptr(0, 4, y), width);

        for (int x = 0; x < width; ++x)
        {
            var score = scoresData[x];
            if (score >= confThreshold)
            {
                float offsetX = x * 4.0f;
                float offsetY = y * 4.0f;
                float angle = anglesData[x];
                float cosA = (float)Math.Cos(angle);
                float sinA = (float)Math.Sin(angle);
                float x0 = x0Data[x];
                float x1 = x1Data[x];
                float x2 = x2Data[x];
                float x3 = x3Data[x];
                float h = x0 + x2;
                float w = x1 + x3;
                Point2f offset = new Point2f(offsetX + (cosA * x1) + (sinA * x2), offsetY - (sinA * x1) + (cosA * x2));
                Point2f p1 = new Point2f((-sinA * h) + offset.X, (-cosA * h) + offset.Y);
                Point2f p3 = new Point2f((-cosA * w) + offset.X, (sinA * w) + offset.Y);
                RotatedRect r = new RotatedRect(new Point2f(0.5f * (p1.X + p3.X), 0.5f * (p1.Y + p3.Y)), new Size2f(w, h), (float)(-angle * 180.0f / Math.PI));
                boxes.Add(r);
                confidences.Add(score);
            }
        }
    }
}
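
The `CvDnn.NMSBoxes` call in the listing above removes overlapping detections, because EAST usually fires on many neighboring cells for the same word. As a rough illustration of what non-maximum suppression does, here is a simplified version for axis-aligned boxes. It is not OpenCV's implementation, which works on the rotated rectangles used above, and the `SimpleNms` class is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; a simplified, axis-aligned non-maximum suppression.
public static class SimpleNms
{
    // Intersection over union of two axis-aligned boxes given as (X, Y, W, H).
    public static float IoU((float X, float Y, float W, float H) a, (float X, float Y, float W, float H) b)
    {
        float x1 = Math.Max(a.X, b.X), y1 = Math.Max(a.Y, b.Y);
        float x2 = Math.Min(a.X + a.W, b.X + b.W), y2 = Math.Min(a.Y + a.H, b.Y + b.H);
        float inter = Math.Max(0f, x2 - x1) * Math.Max(0f, y2 - y1);
        float union = a.W * a.H + b.W * b.H - inter;
        return union <= 0f ? 0f : inter / union;
    }

    // Greedy NMS: drop boxes scoring below scoreThreshold, then repeatedly keep the
    // best remaining box and discard those overlapping it by more than nmsThreshold.
    // Returns indices into the input lists, best score first.
    public static List<int> Suppress(
        IReadOnlyList<(float X, float Y, float W, float H)> boxes,
        IReadOnlyList<float> scores, float scoreThreshold, float nmsThreshold)
    {
        var order = Enumerable.Range(0, boxes.Count)
            .Where(i => scores[i] >= scoreThreshold)
            .OrderByDescending(i => scores[i])
            .ToList();

        var kept = new List<int>();
        foreach (int i in order)
        {
            if (kept.All(k => IoU(boxes[k], boxes[i]) <= nmsThreshold))
                kept.Add(i);
        }
        return kept;
    }
}
```

With the thresholds from the listing (0.5 for the score, 0.4 for the overlap), two nearly identical boxes collapse into the higher-scoring one, while a box elsewhere in the image is kept.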