This series introduces various practical uses of Python.

This time, we use EasyOCR to read text from images.

Let's try out OCR in practice.

Everything can be implemented easily on Google Colab, so please read through to the end.

Goals for this article

• What is EasyOCR

• Basic usage of EasyOCR

• Improving EasyOCR accuracy

• Visualizing OCR results

What is EasyOCR

What is OCR

OCR (Optical Character Recognition) is a technology that recognizes text in images and converts it into text data that can be edited on a computer. It makes it possible to automatically read the characters contained in paper documents and digital images.

An OCR pipeline consists of the following steps. First, the image is captured with a scanner or camera. Next, the image is preprocessed, which includes skew correction, noise removal, and contrast adjustment. Preprocessing improves text recognition accuracy.
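
To make the preprocessing step concrete, here is a minimal sketch of two common operations, contrast stretching and binarization, written in pure Python on a tiny grayscale grid. The function names and the fixed threshold of 128 are assumptions chosen for illustration; a real pipeline would normally use an image library such as OpenCV instead.

```python
def stretch_contrast(gray):
    """Linearly rescale pixels so the darkest becomes 0 and the brightest 255."""
    flat = [p for row in gray for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # uniform image: nothing to stretch
        return [row[:] for row in gray]
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in gray]


def binarize(gray, threshold=128):
    """Map each pixel to 0 (ink) or 255 (background) using a fixed threshold."""
    return [[255 if p >= threshold else 0 for p in row] for row in gray]


# A tiny low-contrast "image": pixel values only span 100..140
sample = [
    [140, 138, 140],
    [139, 100, 140],
    [140, 102, 139],
]

stretched = stretch_contrast(sample)  # values now span the full 0..255 range
binary = binarize(stretched)          # dark stroke in the middle column stands out
print(binary)
```

After stretching, the faint vertical stroke in the middle column separates cleanly from the background, which is the kind of effect preprocessing aims for before recognition.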

ē¶šć„恦态ē”»åƒå†…ć®ę–‡å­—é ˜åŸŸć‚’ē‰¹å®šć—ć€å€‹ć€…ć®ę–‡å­—ć‚’åˆ‡ć‚Šå‡ŗć—ć¾ć™ć€‚ć“ć®éŽēØ‹ć§ćÆć€ę–‡å­—ć®č¼Ŗ郭悒ꤜå‡ŗć—ć€ę–‡å­—ć®å½¢ēŠ¶ć‚’åˆ†ęžć—ć¾ć™ć€‚åˆ‡ć‚Šå‡ŗć•ć‚ŒćŸę–‡å­—ćÆ态ē‰¹å¾“ęŠ½å‡ŗć‚¢ćƒ«ć‚“ćƒŖć‚ŗ惠悒ē”Ø恄恦ē‰¹å¾“惙ć‚Æćƒˆćƒ«ć«å¤‰ę›ć•ć‚Œć¾ć™ć€‚

ęœ€å¾Œć«ć€ę©Ÿę¢°å­¦ēæ’ćƒ¢ćƒ‡ćƒ«ć‚’ä½æć£ć¦ć€ē‰¹å¾“惙ć‚Æćƒˆćƒ«ć‹ć‚‰å®Ÿéš›ć®ę–‡å­—ć‚’č­˜åˆ„ć—ć¾ć™ć€‚ć“ć®ę©Ÿę¢°å­¦ēæ’ćƒ¢ćƒ‡ćƒ«ćÆć€å¤§é‡ć®ę–‡å­—ē”»åƒćƒ‡ćƒ¼ć‚æ悒ē”Ø恄恦äŗ‹å‰ć«čؓē·“ć•ć‚Œć¦ć„ć¾ć™ć€‚č­˜åˆ„ć•ć‚ŒćŸę–‡å­—ćÆć€ćƒ†ć‚­ć‚¹ćƒˆćƒ‡ćƒ¼ć‚æćØ恗恦å‡ŗåŠ›ć•ć‚Œć¾ć™ć€‚
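
The feature-vector and classification steps can be illustrated with a deliberately simple toy. The sketch below flattens a 3x3 bitmap into a feature vector and picks the closest template by Hamming distance. This nearest-template matching is a stand-in for the trained machine learning model described above, and the templates are hypothetical; it is not how EasyOCR itself recognizes characters.

```python
# Hypothetical 3x3 glyph templates (1 = ink, 0 = background), flattened row by row
TEMPLATES = {
    "I": [0, 1, 0, 0, 1, 0, 0, 1, 0],
    "L": [1, 0, 0, 1, 0, 0, 1, 1, 1],
    "T": [1, 1, 1, 0, 1, 0, 0, 1, 0],
}


def to_feature_vector(glyph):
    """Flatten a 2D bitmap into a 1D feature vector."""
    return [pixel for row in glyph for pixel in row]


def classify(glyph):
    """Return the template label with the smallest Hamming distance to the glyph."""
    features = to_feature_vector(glyph)

    def distance(label):
        return sum(a != b for a, b in zip(features, TEMPLATES[label]))

    return min(TEMPLATES, key=distance)


# An "L" with one pixel missing from its bottom-right corner
noisy_l = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
]

predicted = classify(noisy_l)
print(predicted)
```

Even with one pixel missing, the glyph is closest to the "L" template, which shows why comparing feature vectors tolerates small amounts of noise.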

OCR is used in many areas, including digitizing paper documents, managing business cards, and automating data entry. In Python, OCR can be implemented with an engine such as Tesseract (for example through the pytesseract wrapper), typically combined with OpenCV for image preprocessing. With these libraries, an OCR system can be built relatively easily.

However, OCR accuracy depends heavily on image quality, the kind of characters, and the complexity of the layout. Accuracy tends to drop for handwritten text and for images with complex backgrounds. When applying OCR, preprocessing to raise image quality and postprocessing of the recognition results are therefore important.

OCR is a very useful technology for digitizing and automating document work. Building an OCR system in Python lets you extract data from paper documents efficiently and improve productivity.

What is EasyOCR

EasyOCR is an easy-to-use optical character recognition (OCR) library written in Python. With EasyOCR, you can easily read the text in images and scanned documents.

A major feature of EasyOCR is its support for more than 80 languages. In addition to English and Japanese, it can recognize Chinese, Arabic, Cyrillic scripts, and many other writing systems from around the world. Simply specifying a language downloads the model for that language automatically so that it is ready to use.

EasyOCR also supports accelerated processing on a GPU, which can greatly speed up workloads such as processing a large number of images. Without a GPU, processing on the CPU is also possible.

EasyOCR is very simple to use. First, create a Reader object and specify the languages to recognize. Then pass an image path or image data to the readtext method to obtain the recognition results. The results are returned as a list containing the coordinates of each text region, the recognized text, and the recognition confidence.

EasyOCR achieves high recognition accuracy because it uses state-of-the-art deep learning models. The code is also published as open source, so developers are free to customize the models or add support for new languages.

Basic usage of EasyOCR

From here on, we work in the Google Colab environment.

Installation

First, install EasyOCR.

!pip install easyocr

The installation is now complete.

Preparing the image

Let's start by implementing basic OCR.

This time we use the image file 29767855_m.jpg.

Implementing OCR

The example below targets English and Japanese.

OCR is run without a GPU.

Finally, the result is displayed.

import easyocr
reader = easyocr.Reader(['en','ja'], gpu=False) 
result = reader.readtext('29767855_m.jpg')
result

äøŠčØ˜ć®ć‚³ćƒ¼ćƒ‰ć§ćÆ态仄äø‹ć®ć“ćØć‚’č”Œć£ć¦ć„ć¾ć™ć€‚

  1. easyocr.Reader 恧态OCR恮čØ­å®šć‚’č”Œć„ć¾ć™ć€‚[‘ja’, ‘en’] ćÆć€ę—„ęœ¬čŖžćØ英čŖžć‚’čŖč­˜åÆ¾č±”ćØ恙悋恓ćØć‚’ę„å‘³ć—ć¾ć™ć€‚gpu=False ćÆ态GPU悒ä½æć‚ćšć«CPUć§å‡¦ē†ć™ć‚‹ć“ćØć‚’ę„å‘³ć—ć¾ć™ć€‚
  2. reader.readtext(‘29767855_m.jpg’) 恧态29767855_m.jpg ćØ恄恆ē”»åƒćƒ•ć‚”ć‚¤ćƒ«ć«åÆ¾ć—ć¦OCRć‚’å®Ÿč”Œć—ć¾ć™ć€‚

å®Ÿč”Œēµęžœļ¼š

[([[923, 589], [1015, 589], [1015, 691], [923, 691]], '成', 0.9969421195671124),
 ([[624.0259849793654, 639.3195309050808],
   [924.1861792718654, 593.1081907488701],
   [932.9740150206346, 712.6804690949192],
   [632.8138207281346, 757.8918092511299]],
  '資料作',
  0.9984824140989369),
 ([[649.4780193260012, 766.7390096630006],
   [1209.0622660481395, 665.9624206519773],
   [1226.5219806739988, 774.2609903369994],
   [666.9377339518607, 875.0375793480227]],
  'プレゼンの練習',
  0.9713997451831543),
 ([[672.8626652879326, 886.6077314031057],
   [1159.9401030422644, 806.6564306633506],
   [1173.1373347120673, 923.3922685968943],
   [686.0598969577356, 1002.3435693366494]],
  'lon1の準備',
  0.7934918448755321),
 ([[688.5825808840516, 1013.6408389724568],
   [1235.1181535068565, 930.1098007637163],
   [1248.4174191159484, 1045.3591610275432],
   [701.8818464931436, 1128.8901992362837]],
  'MTG資料印刷',
  0.9891207866280942),
 ([[734.5060845773462, 1165.5518253732039],
   [1268.3019336181942, 1074.9668503015719],
   [1283.4939154226538, 1169.4481746267961],
   [749.6980663818058, 1260.0331496984281]],
  'Aさんにmail',
  0.9949341920660215)]

The output contains each extracted string together with its corresponding coordinates and a confidence score.
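
As a small illustration of how this structure can be consumed, the sketch below unpacks two of the detections shown in the output above. The helper names texts_only and describe are hypothetical and are not part of EasyOCR.

```python
# Two detections copied from the output above, in the
# (bounding box, text, confidence) format shown there.
result = [
    ([[923, 589], [1015, 589], [1015, 691], [923, 691]], "成", 0.9969421195671124),
    ([[672.8626652879326, 886.6077314031057],
      [1159.9401030422644, 806.6564306633506],
      [1173.1373347120673, 923.3922685968943],
      [686.0598969577356, 1002.3435693366494]], "lon1の準備", 0.7934918448755321),
]


def texts_only(detections):
    """Keep just the recognized strings, in detection order."""
    return [text for _bbox, text, _conf in detections]


def describe(detections):
    """Build one readable line per detection using the first corner point."""
    lines = []
    for bbox, text, conf in detections:
        x, y = bbox[0]
        lines.append(f"{text} (confidence {conf:.2f}) at x={x:.0f}, y={y:.0f}")
    return lines


texts = texts_only(result)
summary = describe(result)
print(texts)
print("\n".join(summary))
```

If only the strings are needed, the detail argument of readtext described later in this article offers a simplified output when set to 0, so a helper like texts_only may not be necessary.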

Displaying the output as a table

To make the output easier to read, let's display it as a table.

import easyocr
import pandas as pd

# Recognize English and Japanese on the CPU
reader = easyocr.Reader(['en', 'ja'], gpu=False)
result = reader.readtext('29767855_m.jpg')

data = []

# Each detection is (bounding box, text, confidence)
for detection in result:
    text = detection[1]
    confidence = detection[2]
    coordinates = detection[0]
    # The bounding box consists of four [x, y] corner points
    x1, y1 = coordinates[0]
    x2, y2 = coordinates[1]
    x3, y3 = coordinates[2]
    x4, y4 = coordinates[3]
    data.append({"text": text, "confidence": confidence,
                 "x1": x1, "y1": y1, "x2": x2, "y2": y2,
                 "x3": x3, "y3": y3, "x4": x4, "y4": y4})

# One row per detection
df = pd.DataFrame(data)
df

Output:

| index | text | confidence | x1 | y1 | x2 | y2 | x3 | y3 | x4 | y4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 成 | 0.9969421195671124 | 923.0 | 589.0 | 1015.0 | 589.0 | 1015.0 | 691.0 | 923.0 | 691.0 |
| 1 | 資料作 | 0.9984824140989369 | 624.0259849793654 | 639.3195309050808 | 924.1861792718654 | 593.1081907488701 | 932.9740150206346 | 712.6804690949192 | 632.8138207281346 | 757.8918092511299 |
| 2 | プレゼンの練習 | 0.9713997451831543 | 649.4780193260012 | 766.7390096630006 | 1209.0622660481395 | 665.9624206519773 | 1226.5219806739988 | 774.2609903369994 | 666.9377339518607 | 875.0375793480227 |
| 3 | lon1の準備 | 0.7934918448755321 | 672.8626652879326 | 886.6077314031057 | 1159.9401030422644 | 806.6564306633506 | 1173.1373347120673 | 923.3922685968943 | 686.0598969577356 | 1002.3435693366494 |
| 4 | MTG資料印刷 | 0.9891207866280942 | 688.5825808840516 | 1013.6408389724568 | 1235.1181535068565 | 930.1098007637163 | 1248.4174191159484 | 1045.3591610275432 | 701.8818464931436 | 1128.8901992362837 |
| 5 | Aさんにmail | 0.9949341920660215 | 734.5060845773462 | 1165.5518253732039 | 1268.3019336181942 | 1074.9668503015719 | 1283.4939154226538 | 1169.4481746267961 | 749.6980663818058 | 1260.0331496984281 |

ē”»åƒćØå‡ŗ力ēµęžœć‚’ęÆ”č¼ƒć—ć¦ćæ悋ćØć€ć€Œč³‡ę–™ä½œć€ćØć€Œęˆć€ć«åˆ†å‰²ć•ć‚Œć¦ć—ć¾ć£ć¦ć„ć‚‹ć“ćØćŒć‚ć‹ć‚Šć¾ć™ć€‚

ć¾ćŸć€ć€Œ1on1怍ćÆ怌lon1ć€ć«ćŖć£ć¦ć—ć¾ć£ć¦ć„ć‚‹ć“ćØćŒć‚ć‹ć‚Šć¾ć—ćŸć€‚

ć‚‚ć†å°‘ć—ē²¾åŗ¦ć®ę”¹å–„ćŒåæ…要ćØćŖ悊恝恆恧恙怂

Improving EasyOCR accuracy

EasyOCR lets you set arguments when running OCR.

The arguments and their meanings are listed below.

Arguments for easyocr.Reader()

| Argument | Description |
| --- | --- |
| lang_list | List of language codes to recognize, for example ['ch_sim', 'en']. |
| gpu | Whether to enable the GPU. The default is True. Setting gpu=False runs on the CPU. |
| model_storage_directory | Path to the directory where model data is stored. If not specified, models are read from the directory defined by the environment variable EASYOCR_MODULE_PATH (preferred), MODULE_PATH (if defined), or ~/.EasyOCR/. |
| download_enabled | Whether to allow downloading when EasyOCR cannot find the model files. The default is True. |
| user_network_directory | Path to user-defined recognition networks. If not specified, models are read from MODULE_PATH + '/user_network' (~/.EasyOCR/user_network). |
| recog_network | Lets you choose your own recognition network instead of the standard mode. A tutorial on this is planned. The default is 'standard'. |
| detector | Whether to load the detection model into memory. The default is True. |
| recognizer | Whether to load the recognition model into memory. The default is True. |
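
As a hedged sketch of how these arguments might be combined, the helper below collects a few of them into keyword arguments before creating a Reader. The names reader_options and build_reader and the directory /content/models are hypothetical; only gpu, model_storage_directory, and download_enabled come from the table above. EasyOCR is imported inside build_reader so that the option-building part can be tried without the library installed.

```python
def reader_options(use_gpu=False, model_dir=None, allow_download=True):
    """Collect easyocr.Reader keyword arguments from the table above."""
    options = {"gpu": use_gpu, "download_enabled": allow_download}
    if model_dir is not None:
        # Only override the model location when one is given
        options["model_storage_directory"] = model_dir
    return options


def build_reader(languages, **options):
    """Create a Reader; easyocr is imported lazily, only when this is called."""
    import easyocr
    return easyocr.Reader(languages, **options)


# Example: CPU only, models kept in a hypothetical Colab directory
options = reader_options(model_dir="/content/models")
print(options)
# reader = build_reader(['en', 'ja'], **options)  # requires easyocr
```

Keeping the options in a dictionary makes it easy to log or reuse the exact configuration when comparing runs.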

Arguments for reader.readtext()

| Parameter | Description |
| --- | --- |
| image | Input image, given as a string (such as a file path), a NumPy array, or bytes. |
| decoder | Decoder to use: 'greedy', 'beamsearch', or 'wordbeamsearch' (word-level beam search). The default is 'greedy'. |
| beamWidth | Number of beams to keep when using 'beamsearch' or 'wordbeamsearch'. The default is 5. |
| batch_size | Batch size. A value greater than 1 makes EasyOCR faster but uses more memory. The default is 1. |
| workers | Number of threads used by the data loader. The default is 0. |
| allowlist | String that restricts recognition to the given characters. Useful for specific problems such as license plates. |
| blocklist | String of characters to exclude from recognition. Ignored if allowlist is specified. |
| detail | Level of detail of the output. Setting it to 0 gives a simplified output. The default is 1. |
| paragraph | Whether to combine the results into paragraphs. The default is False. |
| min_size | Filters out text boxes smaller than this size in pixels. The default is 10. |
| rotation_info | Allows EasyOCR to rotate each text box and return the result with the highest confidence. The values 90, 180, and 270 are available; for example, [90, 180, 270] tries every possible text orientation. The default is None. |
| contrast_ths | Text boxes with contrast lower than this value are passed to the model twice: once as the original image and once with the contrast adjusted to the 'adjust_contrast' value. The result with the higher confidence is returned. The default is 0.1. |
| adjust_contrast | Target contrast level for low-contrast text boxes. The default is 0.5. |
| text_threshold | Text confidence threshold. The default is 0.7. |
| low_text | Lower-bound score for text. The default is 0.4. |
| link_threshold | Link confidence threshold. The default is 0.4. |
| canvas_size | Maximum image size. Larger images are resized. The default is 2560. |
| mag_ratio | Image magnification ratio. The default is 1. |
| slope_ths | Maximum slope (delta y / delta x) considered for merging. A low value means that tilted boxes are not merged. The default is 0.1. |
| ycenter_ths | Maximum shift in the y direction. Boxes at different levels should not be merged. The default is 0.5. |
| height_ths | Maximum difference in box height. Boxes with very different text sizes should not be merged. The default is 0.5. |
| width_ths | Maximum horizontal distance for merging boxes. The default is 0.5. |
| add_margin | Extends the bounding boxes in all directions by the given value. This is important for languages with complex scripts, such as Thai. The default is 0.1. |
| x_ths | Maximum horizontal distance for merging text boxes when paragraph=True. The default is 1.0. |
| y_ths | Maximum vertical distance for merging text boxes when paragraph=True. The default is 0.5. |

Improving accuracy

Let's change the arguments for the same image to improve accuracy.

  • Set link_threshold to 0.3: link_threshold is the confidence threshold for linking neighboring text. Lowering it makes it more likely that pieces with a weaker connection are still merged into a single string. In other words, the criterion for treating text as continuous becomes looser.
  • Set mag_ratio to 1.1: mag_ratio is the image magnification ratio. The default is 1; setting it to 1.1 enlarges the image by 10% before processing. This makes small characters easier to recognize, although enlarging may slightly reduce image quality.

import easyocr

# Same image as before, with a lower link threshold and 10% magnification
reader = easyocr.Reader(['en', 'ja'], gpu=False)
result = reader.readtext('29767855_m.jpg', link_threshold=0.3, mag_ratio=1.1)

Output:

| index | text | confidence | x1 | y1 | x2 | y2 | x3 | y3 | x4 | y4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 資料作成 | 0.997967541217804 | 623.8005305310426 | 643.0013263276064 | 1014.6341038492525 | 577.9684645465434 | 1025.1994694689574 | 693.9986736723936 | 635.3658961507475 | 759.0315354534566 |
| 1 | プレゼンの練習 | 0.977445970814249 | 650.8525919657369 | 766.3115551794422 | 1207.1576651173118 | 668.3966122371344 | 1224.147408034263 | 773.6884448205578 | 667.8423348826882 | 872.6033877628656 |
| 2 | 1on1の準備 | 0.5931251849512019 | 672.1005050633884 | 890.1005050633884 | 1158.952099161938 | 807.6855736622541 | 1173.8994949366117 | 919.8994949366116 | 687.0479008380621 | 1001.3144263377459 |
| 3 | MTG資料印刷 | 0.9954317802241432 | 689.5825808840516 | 1013.6408389724568 | 1236.101159914352 | 929.0644545490311 | 1249.4174191159484 | 1045.3591610275432 | 702.8988400856481 | 1129.935545450969 |
| 4 | Aさんにmail | 0.9986604964795984 | 732.8582797093769 | 1163.5433118837507 | 1267.3045006624968 | 1073.9740510719075 | 1282.1417202906232 | 1168.4566881162493 | 747.6954993375032 | 1259.0259489280925 |

å…ˆć»ć©ćØęÆ”č¼ƒć—ć¦ć€ę­£ć—ćę–‡å­—čŖč­˜ćŒć§ćć¦ć„悋恓ćØćŒć‚ć‹ć‚Šć¾ć—ćŸć€‚

Techniques for improving accuracy

OCR accuracy varies greatly with image quality, the kind of characters, and other factors. The following tips can help improve it.

  • Improve image quality: using a high-resolution image with little noise improves OCR accuracy.
  • Scale the image to a suitable size: adjusting the image size with the mag_ratio parameter makes small characters easier to recognize. Note that enlarging too much can reduce accuracy instead.
  • Increase the contrast between text and background: the larger the color difference between the text and the background, the higher the OCR accuracy.
  • Restrict the characters to recognize: when you only need digits, for example, restricting the target characters with the allowlist parameter reduces unwanted results.
  • Merge words and lines: adjusting the link_threshold parameter helps words and lines join up properly.
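
The allowlist tip can be sketched as a small wrapper. The function name read_digits and the file name numbers.jpg are hypothetical; allowlist and detail are the readtext arguments listed in the table earlier. The function accepts any object with a readtext method, so it can be tried with a stand-in before EasyOCR is installed.

```python
DIGITS = "0123456789"


def read_digits(reader, image):
    """Run OCR restricted to digits and return the simplified (detail=0) output."""
    return reader.readtext(image, allowlist=DIGITS, detail=0)


# Usage with EasyOCR (requires the library and an image file):
# import easyocr
# reader = easyocr.Reader(['en'], gpu=False)
# print(read_digits(reader, 'numbers.jpg'))  # hypothetical file name
```

Restricting the character set this way is most useful when the expected content is known in advance, as in the license plate example mentioned for allowlist.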

Visualizing OCR results

The recognized regions can be visualized by drawing a frame around each one.

For simplicity, we use a rotated version of the earlier image.

import cv2
import easyocr

# Load the image
image = cv2.imread('29767855_m_rot.jpg')
if image is None:
    # cv2.imread returns None when the file cannot be read
    raise FileNotFoundError('29767855_m_rot.jpg could not be read')

# Run OCR
reader = easyocr.Reader(['en', 'ja'], gpu=False)
result = reader.readtext('29767855_m_rot.jpg', link_threshold=0.3, mag_ratio=1.1)

# Draw the results on the original image
for (bbox, text, prob) in result:
    # Draw only when the confidence is at least 50%
    if prob >= 0.5:
        # Get the frame coordinates
        (top_left, top_right, bottom_right, bottom_left) = bbox
        top_left = (int(top_left[0]), int(top_left[1]))
        bottom_right = (int(bottom_right[0]), int(bottom_right[1]))

        # Draw the frame
        cv2.rectangle(image, top_left, bottom_right, (0, 255, 0), 2)

# Save the result
cv2.imwrite("result_image.jpg", image)

Output (saved as result_image.jpg):

The regions recognized by OCR are now visualized.
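
The code above builds each rectangle from the first and third corner points, which is adequate when the boxes are upright. For tilted boxes such as those in the first run on the unrotated image, a rectangle that encloses all four points can be computed with min and max instead. The sketch below is one way to do that; quad_to_rect is a hypothetical helper name, and the sample box is the "資料作" detection from the first run.

```python
def quad_to_rect(bbox):
    """Return (top_left, bottom_right) of the axis-aligned box enclosing four points."""
    xs = [point[0] for point in bbox]
    ys = [point[1] for point in bbox]
    top_left = (int(min(xs)), int(min(ys)))
    bottom_right = (int(max(xs)), int(max(ys)))
    return top_left, bottom_right


# Tilted box of the "資料作" detection from the first run
tilted = [
    [624.0259849793654, 639.3195309050808],
    [924.1861792718654, 593.1081907488701],
    [932.9740150206346, 712.6804690949192],
    [632.8138207281346, 757.8918092511299],
]

rect = quad_to_rect(tilted)
print(rect)
```

The two returned tuples can be passed to cv2.rectangle in place of top_left and bottom_right. Using corners 0 and 2 directly for this box would give (624, 639) to (932, 712), which cuts off part of the tilted text. Drawing the quadrilateral itself, for example with cv2.polylines, is another option.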

ć¾ćØ悁

ęœ€å¾Œć¾ć§ć”č¦§ć„ćŸć ćć‚ć‚ŠćŒćØć†ć”ć–ć„ć¾ć—ćŸć€‚

easyOCR悒ä½æćˆć°ć€åˆåæƒč€…恧悂ē°”å˜ć«OCRć‚’å®Ÿč£…ć™ć‚‹ć“ćØćŒć§ćć¾ć™ć€‚

ē²¾åŗ¦ę”¹å–„ć®ć‚³ćƒ„ć‚’ęŠ¼ć•ćˆć¦ć€ē”»åƒć®å‰å‡¦ē†ć‚’å·„å¤«ć™ć‚‹ć“ćØćŒå¤§åˆ‡ć§ć™ć€‚

OCRćÆ꧘怅ćŖå “é¢ć§ę“»ē”Øć§ćć‚‹ęŠ€č”“ćŖć®ć§ć€ćœć²č‰²ć€…ćŖē”»åƒć§č©¦ć—恦ćæć¦ćć ć•ć„ć€‚
