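Requirements analysis (from the system-analysis perspective):
Download the captcha image from a web page and convert it into recognizable text.
Software design (from the system-architecture perspective):
HTTP GET -> image -> OCR -> word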
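Each arrow in this design is one stage, so the whole thing can be modeled as functions chained together. The sketch below is a minimal illustration of that structure. It uses hypothetical names (`CaptchaPipeline`, `pipeline`, `toWord`) and stub stages in place of the real HTTP download and ocrad call:

```java
import java.util.function.Function;

/**
 * Hypothetical sketch of the "HTTP GET -> image -> OCR -> word" design:
 * each arrow is one stage, and the stages are chained with Function.andThen.
 */
public class CaptchaPipeline {

    /** Last stage: turn raw OCR output into the captcha "word". */
    public static String toWord(String ocrText) {
        // Drop all whitespace (spaces, newlines) the OCR engine may emit.
        return ocrText.replaceAll("\\s+", "");
    }

    /** Chains download (URL -> image bytes) and OCR (image bytes -> text), then normalizes. */
    public static Function<String, String> pipeline(Function<String, byte[]> httpGet,
                                                    Function<byte[], String> ocr) {
        return httpGet.andThen(ocr).andThen(CaptchaPipeline::toWord);
    }

    public static void main(String[] args) {
        // Stub stages stand in for the real download and the ocrad call.
        Function<String, byte[]> fakeGet = url -> new byte[] {1, 2, 3};
        Function<byte[], String> fakeOcr = img -> " a b\n12 ";
        System.out.println(pipeline(fakeGet, fakeOcr).apply("http://www.soodin.com/verifycode")); // prints ab12
    }
}
```

With the stages separated this way, the OCR engine (ocrad, gocr, ...) can be swapped without touching the download code.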
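Resources and implementation (from the project-management perspective):
Environment: Ubuntu 7.10
- sudo apt-get install ocrad
- sudo apt-get install gocr
- sudo apt-get install libjpeg-progs   (provides djpeg, which the code below calls)
Code: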
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
public class TestAtSoodinDotCom {
    /**
     * Runs an external command and returns everything it writes to stdout,
     * with all lines joined into a single string.
     */
    public static String callCmd(String[] cmd) {
        StringBuilder result = new StringBuilder();
        try {
            Process proc = Runtime.getRuntime().exec(cmd);
            BufferedReader br = new BufferedReader(new InputStreamReader(proc.getInputStream()));
            String line;
            while ((line = br.readLine()) != null) {
                result.append(line);
            }
            br.close();
            proc.waitFor(); // wait for the command to finish
        } catch (Exception e) {
            e.printStackTrace();
        }
        return result.toString();
    }
    /*--------------------------------------------------
     * Process a response from the server: on HTTP 200,
     * save the body (the captcha image) to
     * /tmp/verifycode.jpeg and return true.
     *-------------------------------------------------*/
    private static boolean processServerResponse(HttpURLConnection http, InputStream iStrm) throws IOException
    {
        // 1) Check the status line
        if (http.getResponseCode() != HttpURLConnection.HTTP_OK)
        {
            return false;
        }
        // 2) Header information: none needed
        // 3) Body (data): copy it to a file whether or not Content-Length is known;
        //    read() may return fewer bytes than requested, so loop until end of stream
        OutputStream oStrm = new BufferedOutputStream(new FileOutputStream("/tmp/verifycode.jpeg")); // the image is JPEG
        try
        {
            byte[] buf = new byte[4096];
            int n;
            while ((n = iStrm.read(buf)) != -1)
            {
                oStrm.write(buf, 0, n);
            }
        }
        finally
        {
            oStrm.close();
        }
        return true;
    }
    public static void main(String[] args) throws Exception {
        String address = "http://www.soodin.com/verifycode";
        URL url = new URL(address);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        // Host is derived from the URL; HttpURLConnection ignores attempts to set it by default
        connection.setRequestProperty("User-Agent", "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2");
        // Cookies copied from a browser session (the JSESSIONID is specific to that session)
        connection.setRequestProperty("Cookie", "__utma=166789948.4122467951340428000.1244696088.1262916100.1263192353.114; __utmz=166789948.1256018048.99.5.utmcsr=bbs.soodin.com|utmccn=(referral)|utmcmd=referral|utmcct=/search.php; rtime=21; ltime=1263194805481; cnzz_eid=68012444-1253685384-; JSESSIONID=750D1D2078D37976A15EF35B5FF5899C; cnzz_a1688487=10; sin1688487=; __utmc=166789948");
        connection.setRequestProperty("Referer", "http://www.soodin.com/user/login.do?method=login");
        InputStream iStrm = connection.getInputStream();
        boolean ret;
        try {
            ret = processServerResponse(connection, iStrm);
        } finally {
            iStrm.close();
        }
        if (!ret) {
            System.err.println("Failed to download the captcha image");
            return;
        }
        // Convert the JPEG to a greyscale PNM and pipe it into ocrad;
        // -x additionally exports the OCR results to /tmp/b.txt
        String[] cmd = {
            "/bin/sh",
            "-c",
            "djpeg -grey -pnm /tmp/verifycode.jpeg | ocrad -x /tmp/b.txt", // the image is JPEG
        };
        String verifycode = callCmd(cmd);
        // Remove any spaces ocrad puts between recognized characters
        verifycode = verifycode.replace(" ", "");
        System.out.println(verifycode);
    }
}
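The shell pipeline in this program writes the image to /tmp and goes through /bin/sh. One alternative is to start the command directly with ProcessBuilder and feed the image bytes on stdin. This works because ocrad, as in the original `djpeg ... | ocrad` pipeline, reads the image from standard input when it is given no file argument. The sketch below assumes Java 8+ and uses a hypothetical `PipeRunner` class. Note that ocrad only accepts PNM input, so the bytes must already be converted. stdin is written on its own thread so that a full stdout pipe cannot deadlock the two processes:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: run a command without /bin/sh, feed it bytes on stdin, return its stdout. */
public class PipeRunner {

    public static String run(byte[] stdin, String... cmd) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // the command's errors go to our stderr
        Process proc = pb.start();

        // Write stdin on a separate thread; closing the stream signals EOF to the command.
        Thread writer = new Thread(() -> {
            try (OutputStream os = proc.getOutputStream()) {
                os.write(stdin);
            } catch (IOException e) {
                // The command may exit before consuming all input; nothing more to do.
            }
        });
        writer.start();

        // Read all of stdout on this thread.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream is = proc.getInputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = is.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        writer.join();
        int code = proc.waitFor();
        if (code != 0) {
            throw new IOException("command exited with status " + code);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // With ocrad installed and pnmBytes holding a PNM image: run(pnmBytes, "ocrad")
        System.out.print(run("hello\n".getBytes(StandardCharsets.UTF_8), "cat"));
    }
}
```

This removes the temporary-file and shell-quoting concerns, at the cost of a little more code than `Runtime.exec` with `/bin/sh -c`.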
Test results:
On captcha images with noise added, the recognition rate was below 10%.
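Since the low rate is on noisy images, a common first step is to clean the image before OCR. The sketch below uses a hypothetical class `CaptchaPreprocess` and assumes Java 8+. It does in pure Java roughly what `djpeg -grey -pnm` does, converting the image to an 8-bit binary greyscale PGM (P5). It can also optionally apply one global threshold that pushes every pixel to pure black or white. Whether a fixed threshold actually helps on these captchas is untested, and the value has to be tuned by hand:

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper: convert an image to a binary greyscale PGM (P5),
 * optionally binarizing it with a single global threshold.
 */
public class CaptchaPreprocess {

    /** @param threshold cut-off in 0..255 (darker becomes black, the rest white), or -1 to keep grey levels */
    public static byte[] toPgm(BufferedImage img, int threshold) {
        int w = img.getWidth(), h = img.getHeight();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] header = ("P5\n" + w + " " + h + "\n255\n").getBytes(StandardCharsets.US_ASCII);
        out.write(header, 0, header.length);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xff, g = (rgb >> 8) & 0xff, b = rgb & 0xff;
                // Integer form of the ITU-R BT.601 luma weights (0.299, 0.587, 0.114).
                int grey = (299 * r + 587 * g + 114 * b) / 1000;
                if (threshold >= 0) {
                    grey = grey < threshold ? 0 : 255;
                }
                out.write(grey); // one byte per pixel
            }
        }
        return out.toByteArray();
    }
}
```

Usage: `byte[] pgm = CaptchaPreprocess.toPgm(ImageIO.read(new File("/tmp/verifycode.jpeg")), 128);`, then write `pgm` to a file or pipe it into ocrad in place of the djpeg output. A single global threshold only removes noise that is lighter (or darker) than the characters. Noise lines drawn in the same color as the text would need other filtering.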