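I needed text recognition (OCR) for a project, and these are my notes on the options I tried. I ended up choosing the Baidu OCR API: the alternatives needed either more memory or more CPU than my cloud server has, so they were unusable. The simplest and quickest route won.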
Baidu OCR API
Installation
pip install baidu-aip
Code
from aip import AipOcr
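For readers who want something to start from, here is a minimal sketch of calling the SDK on a local image. It is not the original code from this post. The helper names `make_client`, `read_image` and `basic_parse` are made up for illustration, and the credentials are placeholders you would replace with the values from your own Baidu console. The `client` is passed in as an argument so the parsing logic can be exercised without network access.

```python
def make_client(app_id, api_key, secret_key):
    """Build an AipOcr client (hypothetical helper; credentials are yours to supply)."""
    # Imported lazily so the other helpers can be used without the SDK installed.
    from aip import AipOcr  # provided by `pip install baidu-aip`
    return AipOcr(app_id, api_key, secret_key)


def read_image(path):
    """Return the raw bytes of an image file, which is what the SDK call expects."""
    with open(path, "rb") as fp:
        return fp.read()


def basic_parse(client, image_path):
    """Run general OCR on a local image and return the recognized lines of text."""
    result = client.basicGeneral(read_image(image_path))
    # A failed call carries 'error_code' / 'error_msg' instead of 'words_result'.
    if "error_code" in result:
        raise RuntimeError(f"OCR failed: {result.get('error_msg')}")
    return [item["words"] for item in result.get("words_result", [])]


# Usage (placeholders, not real credentials):
# client = make_client("your-app-id", "your-api-key", "your-secret-key")
# print(basic_parse(client, "test.png"))
```

Raising on `error_code` keeps a failed request from silently looking like an image with no text in it.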
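Output:
{'log_id': 2510103433725039650, 'words_result_num': 2, 'words_result': [{'words': '·微信搜一搜'}, {'words': 'Q三傻的编程生活'}]}
Note: my guess is that the QR code in the original image caused recognition to time out. The call failed with requests.exceptions.ConnectionError: ('Connection aborted.', timeout('The write operation timed out')). Only after I manually masked the QR code was the image recognized correctly.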
Test image
Second attempt
Recognizing an image from a remote URL:
def basic_parse_url(self, url):
Result:
{'log_id': 8559002589980060578, 'direction': 0, 'words_result_num': 4, 'words_result': [{'words': '迪奥999排滋润', 'probability': {'variance': 0.016349, 'average': 0.931602, 'min': 0.602039}}, {'words': '迪奥#99滋润(经典正红色)', 'probability': {'variance': 7.6e-05, 'average': 0.99548, 'min': 0.969314}}, {'words': '颜色最纯正的一款正红色,不挑肤色,喜庆特别显气质。嘴唇状态不好的小仙女', 'probability': {'variance': 0.00176, 'average': 0.988948, 'min': 0.788893}}, {'words': '要选这款,能让唇妆看起来更美', 'probability': {'variance': 5.8e-05, 'average': 0.996976, 'min': 0.97018}}], 'language': -1}
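The result above includes a `probability` entry for every line, which makes it easy to drop low-confidence text. The sketch below is only an illustration of that idea. This `basic_parse_url` takes the client as an argument, so its signature differs from the method in the post, `confident_lines` is a hypothetical helper, and the 0.95 threshold is an arbitrary assumption rather than a recommended value.

```python
def basic_parse_url(client, url):
    """Request OCR for a remote image and ask for per-line confidence values."""
    options = {"probability": "true"}  # the API expects the string "true"
    return client.basicGeneralUrl(url, options)


def confident_lines(result, min_average=0.95):
    """Keep only the lines whose average confidence reaches min_average.

    Lines without a 'probability' entry are treated as confidence 0.0,
    so they are dropped for any positive threshold.
    """
    kept = []
    for item in result.get("words_result", []):
        average = item.get("probability", {}).get("average", 0.0)
        if average >= min_average:
            kept.append(item["words"])
    return kept
```

Applied to the result shown above with the default threshold, the first line (average 0.931602) is filtered out and the other three are kept.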
Further reading
The approaches below did not run properly because the server is underpowered.
pytesseract
Install the package
pipenv install pytesseract
Download the language data
tessdoc | Tesseract documentation
Write the code
from PIL import Image
Check the result
Seriously? What a letdown!
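Since the pytesseract steps above depend on the language data being downloaded first, a small check before calling OCR can save some confusion. This is a hedged sketch rather than the code used in this post. `has_language` and `image_to_text` are hypothetical helper names, the tessdata directory is whatever your Tesseract install uses, and `chi_sim` (Tesseract's Simplified Chinese data) is my assumption about which language you want.

```python
import os


def has_language(tessdata_dir, lang):
    """Return True if <lang>.traineddata exists in the given tessdata directory."""
    return os.path.isfile(os.path.join(tessdata_dir, f"{lang}.traineddata"))


def image_to_text(image_path, lang="chi_sim"):
    """OCR an image through pytesseract; needs the tesseract binary on PATH."""
    # Imported lazily so has_language() works even without these packages.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)


# Usage (the directory is an example path, adjust it to your install):
# if has_language("/usr/share/tesseract/tessdata", "chi_sim"):
#     print(image_to_text("test.png"))
```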
chineseocr_lite
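Download
git clone -b master https://github.com/ouyanghuiyu/chineseocr_lite.git
Build and install
cd psenet/pse
Install g++
yum -y install gcc gcc-c++
yum install python3-devel -y
Return to the project root
cd ../..
Install dependencies
If dependencies download too slowly from within China, you can configure a mirror; see:
- Switching to the Aliyun mirror
- Switching to the Tsinghua mirror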
pip install -r requirements.txt
pip install torch
pip install torchvision
Run the service
(chineseocr_lite) bash-4.2# python app.py
make: Entering directory `/home/imoyao/chineseocr_lite/psenet/pse'
make: `pse.so' is up to date.
make: Leaving directory `/home/imoyao/chineseocr_lite/psenet/pse'
device: cpu
load model
device: cpu
load model
device: cpu
load model
device: cpu
load model
Traceback (most recent call last):
  File "/root/.local/share/virtualenvs/chineseocr_lite-GxZWTh-N/lib/python3.7/site-packages/web/utils.py", line 526, in take
    yield next(seq)
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "app.py", line 125, in <module>
    app = web.application(urls, globals())
  File "/root/.local/share/virtualenvs/chineseocr_lite-GxZWTh-N/lib/python3.7/site-packages/web/application.py", line 62, in __init__
    self.init_mapping(mapping)
  File "/root/.local/share/virtualenvs/chineseocr_lite-GxZWTh-N/lib/python3.7/site-packages/web/application.py", line 130, in init_mapping
    self.mapping = list(utils.group(mapping, 2))
  File "/root/.local/share/virtualenvs/chineseocr_lite-GxZWTh-N/lib/python3.7/site-packages/web/utils.py", line 531, in group
    x = list(take(seq, size))
RuntimeError: generator raised StopIteration
(chineseocr_lite) bash-4.2# cp /root/.local/share/virtualenvs/chineseocr_lite-GxZWTh-N/lib/python3.7/site-packages/web/utils.py /root/.local/share/virtualenvs/chineseocr_lite-GxZWTh-N/lib/python3.7/site-packages/web/utils.py.bak
Modify the take function in web/utils.py as follows:
def take(seq, n):
    for i in range(n):
        try:
            yield next(seq)
        except StopIteration:
            return
Start the service
python app.py
Access the app:
http://{{YOUR_DEV_IP}}:8080/ocr # note the ocr route at the end, not the home page!
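The `RuntimeError: generator raised StopIteration` fixed earlier in this section comes from PEP 479: since Python 3.7, a `StopIteration` that escapes a generator body is converted into a `RuntimeError`. The sketch below reproduces the behaviour in isolation. It is a simplified stand-in for web.py's `group` helper, not a copy of it, and the `patched` flag exists only so both versions can be compared side by side.

```python
def group(seq, size, patched=True):
    """Split seq into lists of `size` items; the last list may be shorter."""

    def take_unpatched(it, n):
        for _ in range(n):
            # When `it` is exhausted, StopIteration escapes the generator
            # body, which Python 3.7+ turns into a RuntimeError.
            yield next(it)

    def take_patched(it, n):
        for _ in range(n):
            try:
                yield next(it)
            except StopIteration:
                return  # end the generator cleanly instead

    take = take_patched if patched else take_unpatched
    it = iter(seq)
    while True:
        chunk = list(take(it, size))
        if not chunk:
            break
        yield chunk


print(list(group(["/ocr", "OCR"], 2)))  # [['/ocr', 'OCR']]

try:
    list(group(["/ocr", "OCR"], 2, patched=False))
except RuntimeError as exc:
    print("unpatched:", exc)  # unpatched: generator raised StopIteration
```

Upgrading web.py to a release that supports Python 3.7 may also resolve this without editing files in site-packages, though I have not verified which version is needed.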
Error
  File "/root/.local/share/virtualenvs/chineseocr_lite-GxZWTh-N/lib/python3.7/site-packages/web/httpserver.py", line 255, in __iter__
    path = self.translate_path(self.path)
  File "/usr/local/lib/python3.7/http/server.py", line 820, in translate_path
    path = self.directory
AttributeError: 'StaticApp' object has no attribute 'directory'
Add the marked line to StaticApp.__init__ in web/httpserver.py:
class StaticApp(SimpleHTTPRequestHandler):
    """WSGI application for serving static files."""
    def __init__(self, environ, start_response):
        self.headers = []
        self.environ = environ
        self.start_response = start_response
        self.directory = os.getcwd()  # add this line
See here and here.
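Why that single line helps: from Python 3.7 on, `SimpleHTTPRequestHandler.translate_path` reads `self.directory`, which is normally set in the parent `__init__`. A subclass that overrides `__init__` without calling the parent never gets that attribute. The classes below are simplified stand-ins written for this illustration (they are not web.py's actual `StaticApp`), and `resolve` is a hypothetical helper.

```python
import os
from http.server import SimpleHTTPRequestHandler


class BrokenStaticApp(SimpleHTTPRequestHandler):
    """Overrides __init__ and skips the parent, so `directory` is never set."""

    def __init__(self):
        self.headers = []


class FixedStaticApp(SimpleHTTPRequestHandler):
    """Same override, but sets the attribute translate_path relies on."""

    def __init__(self):
        self.headers = []
        self.directory = os.getcwd()


def resolve(app, url_path):
    """Map a URL path to a filesystem path, reporting a missing attribute."""
    try:
        return app.translate_path(url_path)
    except AttributeError as exc:
        return f"error: {exc}"


print(resolve(BrokenStaticApp(), "/static/app.js"))
print(resolve(FixedStaticApp(), "/static/app.js"))
```

On Python 3.7 or later the first call should report the missing `directory` attribute, while the second returns a path under the current working directory.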
Installing leptonica
tesseract_ocr.cpp:633:34: fatal error: leptonica/allheaders.h: No such file or directory
Install
wget http://www.leptonica.org/source/leptonica-1.79.0.tar.gz
Verify the installation
tesseract -v