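While working on a project I needed text recognition (OCR). These are my notes from trying a few options. In the end I went with the Baidu OCR API: the alternatives would not run on my cloud server, which had either too little memory or too weak a CPU for them, so I picked the simplest and quickest route.
Baidu OCR API
Installation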
| pip install baidu-aip | 
Code
| from aip import AipOcr | 
Output:
{'log_id': 2510103433725039650, 'words_result_num': 2, 'words_result': [{'words': '·微信搜一搜'}, {'words': 'Q三傻的编程生活'}]}
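The snippet above is abbreviated, so here is a minimal sketch of the whole flow. It assumes the `AipOcr(app_id, api_key, secret_key)` client and its `basicGeneral(image_bytes)` method from the baidu-aip SDK. The credentials are placeholders, and `make_client`, `recognize_file` and `extract_words` are helper names made up for this example, not part of the SDK.

```python
def make_client(app_id, api_key, secret_key):
    """Create the Baidu OCR client (requires `pip install baidu-aip`)."""
    from aip import AipOcr  # imported lazily so the helpers below work without the SDK
    return AipOcr(app_id, api_key, secret_key)


def recognize_file(client, image_path):
    """Send a local image to the general OCR endpoint and return the raw response."""
    with open(image_path, "rb") as f:
        return client.basicGeneral(f.read())


def extract_words(response):
    """Return the recognized lines, or raise if the API reported an error."""
    if "error_code" in response:
        # Failed calls come back as a dict with error_code / error_msg.
        raise RuntimeError(
            f"OCR failed: {response['error_code']} {response.get('error_msg')}"
        )
    return [item["words"] for item in response.get("words_result", [])]
```

Typical use would be `extract_words(recognize_file(make_client(APP_ID, API_KEY, SECRET_KEY), "example.png"))`, where `example.png` stands for any local image.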
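Note: the original image probably timed out because it contained a QR code; the call failed with requests.exceptions.ConnectionError: ('Connection aborted.', timeout('The write operation timed out')). It was only recognized normally after I manually masked out the QR code.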
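If the timeout is intermittent rather than caused by the image itself, a small retry wrapper can help. This is a generic sketch, not something the SDK provides: `call_with_retries` is a made-up name, and it relies on the fact that requests' `ConnectionError` derives from `IOError`/`OSError`. When the image content is the real cause, as the QR code seemed to be here, retrying will not fix it.

```python
import time


def call_with_retries(func, attempts=3, delay_seconds=1.0):
    """Call func(); on a connection-level error, wait and try again."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except OSError as exc:  # covers requests.exceptions.ConnectionError
            last_error = exc
            if attempt < attempts:
                # Back off a little longer after each failed attempt.
                time.sleep(delay_seconds * attempt)
    raise last_error
```

With a client in hand, the call would look like `call_with_retries(lambda: client.basicGeneral(image_bytes))`.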
Test image
One more run
Recognizing an image from a remote URL:
| def basic_parse_url(self, url): | 
Result:
{'log_id': 8559002589980060578, 'direction': 0, 'words_result_num': 4, 'words_result': [{'words': '迪奥999排滋润', 'probability': {'variance': 0.016349, 'average': 0.931602, 'min': 0.602039}}, {'words': '迪奥#99滋润(经典正红色)', 'probability': {'variance': 7.6e-05, 'average': 0.99548, 'min': 0.969314}}, {'words': '颜色最纯正的一款正红色,不挑肤色,喜庆特别显气质。嘴唇状态不好的小仙女', 'probability': {'variance': 0.00176, 'average': 0.988948, 'min': 0.788893}}, {'words': '要选这款,能让唇妆看起来更美', 'probability': {'variance': 5.8e-05, 'average': 0.996976, 'min': 0.97018}}], 'language': -1}
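The `probability` block in this result makes it easy to flag lines that deserve a manual check. The sketch below assumes the SDK's `basicGeneralUrl(url, options)` method; the option names are my guess at what produced the `direction` and `probability` fields, not the original code. Unlike the method in the original, `basic_parse_url` here is a plain function that takes the client as an argument, and `low_confidence_lines` is a made-up helper.

```python
def basic_parse_url(client, url):
    """Run general OCR on a remote image, asking for direction and per-line confidence."""
    options = {"detect_direction": "true", "probability": "true"}  # assumed options
    return client.basicGeneralUrl(url, options)


def low_confidence_lines(response, min_threshold=0.8):
    """Return (words, min_probability) for lines whose weakest character is below the threshold."""
    flagged = []
    for item in response.get("words_result", []):
        prob = item.get("probability")
        if prob is not None and prob["min"] < min_threshold:
            flagged.append((item["words"], prob["min"]))
    return flagged
```

On the result above, a threshold of 0.8 flags the first line (which looks like a misread of 迪奥999 with a stray character) and the long third line.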
Further reading
The following options could not run properly because the server was underpowered.
pytesseract
Install the package
| pipenv install pytesseract | 
Download the language data
tessdoc | Tesseract documentation
Write the code
| from PIL import Image | 
Check the output
Damn, what a letdown!
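For reference, a typical pytesseract call looks roughly like the sketch below. It assumes Pillow and pytesseract are installed and that the `chi_sim` (Simplified Chinese) language data has been downloaded into Tesseract's tessdata directory. `build_lang_arg` and `ocr_image` are names made up for this example.

```python
def build_lang_arg(languages):
    """Join Tesseract language codes the way the -l option expects, e.g. 'chi_sim+eng'."""
    return "+".join(languages)


def ocr_image(image_path, languages=("chi_sim", "eng")):
    """Run Tesseract on one image and return the recognized text."""
    # Imported lazily so build_lang_arg works without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=build_lang_arg(languages))
```

Combining `chi_sim` with `eng` is a common choice for images that mix Chinese and Latin text; whether it improves results on a given image is something to test rather than assume.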
chineseocr_lite
Download
| git clone -b master https://github.com/ouyanghuiyu/chineseocr_lite.git | 
Build and install
| cd psenet/pse | 
Install g++:
yum -y install gcc gcc-c++
yum install python3-devel -y
Go back to the project root
| cd ../.. | 
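Install dependencies
If downloading dependencies is slow from mainland China, you can configure a mirror; see:
- Switching to the Aliyun mirror
- Switching to the Tsinghua mirror
pip install -r requirements.txt
pip install torch
pip install torchvision
Run the service:
(chineseocr_lite) bash-4.2# python app.py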
 make: Entering directory `/home/imoyao/chineseocr_lite/psenet/pse'
 make: `pse.so' is up to date.
 make: Leaving directory `/home/imoyao/chineseocr_lite/psenet/pse'
 device: cpu
 load model
 device: cpu
 load model
 device: cpu
 load model
 device: cpu
 load model
 Traceback (most recent call last):
 File "/root/.local/share/virtualenvs/chineseocr_lite-GxZWTh-N/lib/python3.7/site-packages/web/utils.py", line 526, in take
 yield next(seq)
 StopIteration
 The above exception was the direct cause of the following exception:
 Traceback (most recent call last):
 File "app.py", line 125, in <module>
 app = web.application(urls, globals())
 File "/root/.local/share/virtualenvs/chineseocr_lite-GxZWTh-N/lib/python3.7/site-packages/web/application.py", line 62, in __init__
 self.init_mapping(mapping)
 File "/root/.local/share/virtualenvs/chineseocr_lite-GxZWTh-N/lib/python3.7/site-packages/web/application.py", line 130, in init_mapping
 self.mapping = list(utils.group(mapping, 2))
 File "/root/.local/share/virtualenvs/chineseocr_lite-GxZWTh-N/lib/python3.7/site-packages/web/utils.py", line 531, in group
 x = list(take(seq, size))
 RuntimeError: generator raised StopIteration
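The RuntimeError above is the behavior introduced by PEP 479, which is on by default since Python 3.7: a StopIteration that escapes a generator body is converted into a RuntimeError. Here is a minimal stdlib-only reproduction. The `group` and `take` names come from the traceback, but the bodies are my simplified assumption and not web.py's actual source.

```python
def take_broken(seq, n):
    """Yield up to n items; lets StopIteration escape the generator body."""
    for _ in range(n):
        yield next(seq)  # on Python 3.7+ an exhausted seq turns into RuntimeError


def take_fixed(seq, n):
    """Yield up to n items; ends the generator cleanly when seq is exhausted."""
    for _ in range(n):
        try:
            yield next(seq)
        except StopIteration:
            return


def group(seq, size, take):
    """Split seq into lists of `size` items using the given take() helper."""
    seq = iter(seq)
    while True:
        chunk = list(take(seq, size))
        if not chunk:
            break
        yield chunk
```

Calling `list(group(urls, 2, take_broken))` fails once the input is exhausted, while the `take_fixed` variant simply stops, which is how a flat URL tuple such as `("/ocr", "OCR")` is meant to be paired up.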
To fix this, back up web/utils.py and then change `take` as follows:
(chineseocr_lite) bash-4.2# cp /root/.local/share/virtualenvs/chineseocr_lite-GxZWTh-N/lib/python3.7/site-packages/web/utils.py /root/.local/share/virtualenvs/chineseocr_lite-GxZWTh-N/lib/python3.7/site-packages/web/utils.py.bak
def take(seq, n):
    for i in range(n):
        try:
            yield next(seq)
        except StopIteration:
            return
Start the service:
python app.py
Access the application:
http://{{YOUR_DEV_IP}}:8080/ocr # note the /ocr route at the end, not the home page!
Error:
 File "/root/.local/share/virtualenvs/chineseocr_lite-GxZWTh-N/lib/python3.7/site-packages/web/httpserver.py", line 255, in __iter__
 path = self.translate_path(self.path)
 File "/usr/local/lib/python3.7/http/server.py", line 820, in translate_path
 path = self.directory
 AttributeError: 'StaticApp' object has no attribute 'directory'
See here and here. To fix it, add the marked line:
class StaticApp(SimpleHTTPRequestHandler):
    """WSGI application for serving static files."""
    def __init__(self, environ, start_response):
        self.headers = []
        self.environ = environ
        self.start_response = start_response
        self.directory = os.getcwd()  # add this line
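Why this one line helps: since Python 3.7, `SimpleHTTPRequestHandler.translate_path` builds paths from `self.directory`, which the standard `__init__` normally sets. A subclass that overrides `__init__` without calling the parent therefore has to set it itself. Here is a stdlib-only sketch of that effect; `StaticAppSketch` is a stand-in written for this example, not web.py's class.

```python
import os
from http.server import SimpleHTTPRequestHandler


class StaticAppSketch(SimpleHTTPRequestHandler):
    """Stand-in for a handler that bypasses the parent constructor."""

    def __init__(self, environ, start_response):
        # The parent __init__ is deliberately not called (it would try to
        # serve a socket request), so `directory` must be set by hand.
        self.headers = []
        self.environ = environ
        self.start_response = start_response
        self.directory = os.getcwd()


app = StaticAppSketch({}, None)
# translate_path maps a URL path onto a file path under self.directory.
static_path = app.translate_path("/static/logo.png")
```

Without the `self.directory` assignment, the `translate_path` call raises the same AttributeError as in the traceback.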
Install leptonica
| tesseract_ocr.cpp:633:34: fatal error: leptonica/allheaders.h: No such file or directory | 
Install
| wget http://www.leptonica.org/source/leptonica-1.79.0.tar.gz | 
Verify the installation
| tesseract -v | 
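The same check can be scripted, for instance in a deployment step. This is a small sketch using only the standard library; `tesseract_version` is a made-up helper name. It reads stderr as a fallback because some Tesseract builds print their version there.

```python
import shutil
import subprocess


def tesseract_version(binary="tesseract"):
    """Return the first line of `<binary> --version`, or None if it is not on PATH."""
    if shutil.which(binary) is None:
        return None
    result = subprocess.run([binary, "--version"], capture_output=True, text=True)
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None
```

A `None` result means the binary is not on PATH, which usually points to a failed build or a missing `make install` step.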











