Fetching Web Page Content with Common Lisp

1. Fetching a web page with HTTP GET
2. Fetching a web page with HTTP POST
3. Downloading a file
4. Calling API endpoints to fetch data

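Drakma is a full-featured HTTP client implemented in Common Lisp. Its drakma:http-request function sends an HTTP request from the client to a given web resource. This article shows how to fetch a page's content (its HTML source), download files from a site, and call API endpoints.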

1 Fetching a web page with HTTP GET

CL-USER>(drakma:http-request "http://www.lispss.cn/")

"<html>      
     <head lang='en'>
     <base href=\"http://www.lispss.cn/\" target=\"_blank\">
     ...

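The output shows that an HTTP GET request was sent to the server and an HTTP response was received. For this request, the response body is returned as a string containing the HTML source.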
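Besides the body, drakma:http-request also returns the status code and the response headers as additional values. The following is a minimal sketch of checking them before using the body. The helper name check-response is hypothetical, and the code itself is plain Common Lisp so that it can be tried without a network connection.

```lisp
(defun check-response (body status headers)
  "Hypothetical helper: return BODY and its content type when STATUS is 2xx,
otherwise signal an error."
  (unless (<= 200 status 299)
    (error "HTTP request failed with status ~D" status))
  ;; Drakma returns the headers as an alist keyed by keywords such as :CONTENT-TYPE.
  (values body (cdr (assoc :content-type headers))))

;; Usage with Drakma loaded (not evaluated here):
;; (multiple-value-bind (body status headers)
;;     (drakma:http-request "http://www.lispss.cn/")
;;   (check-response body status headers))
```

Signalling an error on a non-2xx status keeps an error page from being mistaken for the content that was asked for.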

2 Fetching a web page with HTTP POST

CL-USER>(drakma:http-request "http://www.lispss.cn/message" 
                               :method :post
                               :parameters '(("body" . "Someone")("message" . "Hi,this's a test.")))

"<!DOCTYPE html >
<html><head><title>LispSu</title><meta charset=\"utf-8\" name=\"viewport\" content=...

The output shows that an HTTP POST request was sent to the server, carrying the parameters body and message.
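To make clearer what sending parameters in a POST request involves, here is a rough sketch of how an alist of name/value pairs can be turned into a form-encoded request body. It illustrates the encoding idea only and is not Drakma's own implementation. The function names are hypothetical, and the sketch assumes ASCII-only input.

```lisp
(defun form-encode-string (string)
  "Encode an ASCII STRING for a form body: keep unreserved characters,
turn spaces into +, and percent-encode everything else."
  (with-output-to-string (out)
    (loop for c across string
          do (cond ((and (< (char-code c) 128)
                         (or (alphanumericp c) (find c "-_.~")))
                    (write-char c out))
                   ((char= c #\Space)
                    (write-char #\+ out))
                   (t
                    (format out "%~2,'0X" (char-code c)))))))

(defun form-encode-alist (alist)
  "Join the encoded name=value pairs of ALIST with &."
  (format nil "~{~A~^&~}"
          (loop for (name . value) in alist
                collect (format nil "~A=~A"
                                (form-encode-string name)
                                (form-encode-string value)))))

;; (form-encode-alist '(("body" . "Someone") ("message" . "Hi there")))
```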

3 Downloading a file


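(with-open-file (out "/root/download/file-name.jpg"
                     :element-type '(unsigned-byte 8)
                     :if-exists :supersede
                     :direction :output)
  ;; With :want-stream t, Drakma returns a flexi-stream instead of the body.
  (let ((in (drakma:http-request "http://www.lispss.cn/st/pic/lisp_logo.jpg"
                                 :want-stream t)))
    (unwind-protect
         ;; Copy from the underlying binary stream into the output file.
         (cl-fad:copy-stream (flexi-streams:flexi-stream-stream in) out)
      ;; Close the HTTP stream even if the copy fails.
      (close in))))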

The code above downloads the image at the given URL and saves it on the client under the given file name.
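The copying step can also be written by hand. The sketch below shows one way to copy a binary stream through a fixed-size buffer using only standard Common Lisp. The function name copy-binary-stream and the buffer size are assumptions made for illustration; this is not how cl-fad:copy-stream is necessarily implemented.

```lisp
(defun copy-binary-stream (in out &key (buffer-size 8192))
  "Copy all bytes from IN to OUT through a fixed-size buffer.
Return the number of bytes copied."
  (let ((buffer (make-array buffer-size :element-type '(unsigned-byte 8)))
        (total 0))
    ;; READ-SEQUENCE returns the index of the first element it did not fill,
    ;; so zero means the input is exhausted.
    (loop for end = (read-sequence buffer in)
          while (plusp end)
          do (write-sequence buffer out :end end)
             (incf total end))
    total))
```

Reading in chunks keeps memory use constant regardless of the size of the downloaded file.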

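4 Calling API endpoints to fetch data

Services such as Alibaba Cloud Marketplace (阿里云市场), 万维易源, and Rapid API Hub offer data APIs. The examples below show how to call such APIs with Drakma's http-request function.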

4.1 The DBpedia lookup API


(defstruct dbpedia-data uri label description)

(defun dbpedia-lookup (search-string)
  "Query the DBpedia lookup service and return the first result as a DBPEDIA-DATA struct."
  ;; Passing the query through :parameters lets Drakma URL-encode it.
  (multiple-value-bind (response-body response-status response-headers)
      (drakma:http-request "https://lookup.dbpedia.org/api/search"
                           :method :get
                           :parameters (list (cons "query" search-string))
                           :accept "application/xml")
    (declare (ignore response-status response-headers))
    ;; The XML body is decoded from octets to a string before parsing.
    (let* ((xml (s-xml:parse-xml-string
                 (babel:octets-to-string response-body)))
           (result1 (first (cdr xml))))
      (make-dbpedia-data
       :uri (cadr (nth 2 result1))
       :label (cadr (nth 1 result1))
       :description (string-trim
                     '(#\Space #\Newline #\Tab)
                     (cadr (nth 3 result1)))))))

The code above first calls the DBpedia API with drakma:http-request and then converts the response data into a dbpedia-data struct. To try this API, you can test it on this site.
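The chain of cadr and nth calls is easier to follow against a concrete parse tree. The sketch below uses a hand-written list in the shape the function expects from the XML parser. The element names, their order, and the sample values are assumptions made for illustration, not a verified copy of a real DBpedia response.

```lisp
;; Hypothetical parse result: the root element followed by one result element.
(defparameter *sample-lxml*
  '(:|ArrayOfResults|
    (:|Result|
     (:|Label| "Common Lisp")
     (:|URI| "http://dbpedia.org/resource/Common_Lisp")
     (:|Description| "  A dialect of the Lisp programming language.  "))))

(defun first-result-fields (lxml)
  "Return a plist with the label, URI and description of the first result."
  ;; (cdr lxml) skips the root tag; each child is (tag text).
  (let ((result (first (cdr lxml))))
    (list :label (cadr (nth 1 result))
          :uri (cadr (nth 2 result))
          :description (string-trim '(#\Space #\Newline #\Tab)
                                    (cadr (nth 3 result))))))

;; (getf (first-result-fields *sample-lxml*) :label)
```

Because the fields are picked by position, any change in the order of the child elements would silently return the wrong values, so matching on the tag name is a safer alternative when the response format is not fixed.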

4.2 The Hearthstone API on Rapid API Hub


;;; Fetch the card data for "ysera" and return the first match as an alist.
(let* ((s-uri (concatenate 'string
                           "https://omgvamp-hearthstone-v1.p.rapidapi.com/cards/"
                           "ysera"))
       response-body response-status response-headers hash)
  (multiple-value-setq
      (response-body response-status response-headers)
    (drakma:http-request s-uri
                         :method :get
                         :additional-headers
                         '(("X-RapidAPI-Key" . "your-apikey")
                           ("Accept-Language" . "zh-CN"))
                         :accept "application/json"))
  ;; Decode the body from octets, parse the JSON, and keep the first element.
  (setf hash (first (yason:parse (babel:octets-to-string response-body))))
  ;; Convert the hash table of card fields into an alist.
  (loop for k being the hash-keys in hash using (hash-value v)
        collect (cons k v)))

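The code above first calls the Hearthstone API with drakma:http-request and then converts the response data into an association list. To try this API, you can test it on this site.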
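The final conversion step can be tried on its own, without calling the API. The sketch below builds a small hash table by hand and turns it into an association list with the same loop idiom. The function name and the sample keys and values are made up for illustration and do not come from a real API response.

```lisp
(defun hash-table-to-alist (table)
  "Collect every key/value pair of TABLE into an alist."
  (loop for key being the hash-keys in table using (hash-value value)
        collect (cons key value)))

;; Hypothetical stand-in for one parsed JSON object with string keys.
(defparameter *sample-card*
  (let ((table (make-hash-table :test #'equal)))
    (setf (gethash "name" table) "Ysera"
          (gethash "type" table) "Minion")
    table))

(defparameter *sample-alist* (hash-table-to-alist *sample-card*))

;; String keys need a string comparison when looking them up:
;; (cdr (assoc "name" *sample-alist* :test #'string=))
```

The order of the pairs in the resulting alist is not guaranteed, so lookups should go through assoc rather than rely on position.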

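4.3 Calling the Doubao LLM API

ByteDance's Doubao large language model is priced low enough to be nearly free for individual users, so it is worth considering. The example below calls the Doubao API from Common Lisp by running curl as a shell command. For the original curl command without the Lisp wrapper, see the Doubao API documentation.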

(defun doubao-api (query)
  "Send QUERY to the Doubao chat completions API via curl and return the reply text."
  ;; QUERY is spliced into the command unescaped, so it must not contain quotes.
  (let* ((command (concatenate 'string
                   "curl https://ark.cn-beijing.volces.com/api/v3/chat/completions \\
  -H 'Content-Type: application/json' \\
  -H 'Authorization: Bearer your-api-key' \\
  -d '{
    \"model\": \"your-endpoint-id\",
    \"messages\": [
      { \"role\": \"system\", \"content\": \"You are a helpful assistant.\" },
      { \"role\": \"user\", \"content\": \"" query "\" }
    ],
    \"stream\": false
  }' "))
         (response (uiop:run-program command :output :string)))
    ;; The reply text sits at choices[0].message.content in the parsed JSON.
    (gethash "content"
             (gethash "message"
                      (first (gethash "choices" (yason:parse response)))))))

To try this API, you can test it on this site.
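Extracting the reply means walking a nested structure of hash tables and lists: choices, then the first element, then message, then content. The sketch below shows a small path-following helper on a hand-built structure. The names nested-get and make-table and the sample reply are hypothetical. The structure only mimics the shape the code above reads from and is not a real API response.

```lisp
(defun make-table (&rest keys-and-values)
  "Build an EQUAL hash table from alternating keys and values."
  (let ((table (make-hash-table :test #'equal)))
    (loop for (key value) on keys-and-values by #'cddr
          do (setf (gethash key table) value))
    table))

(defun nested-get (object &rest path)
  "Follow PATH through nested hash tables (string keys) and lists (integer indexes)."
  (reduce (lambda (current step)
            (etypecase step
              (string (gethash step current))
              (integer (nth step current))))
          path
          :initial-value object))

;; Hypothetical stand-in for a parsed chat completion response.
(defparameter *sample-response*
  (make-table "choices"
              (list (make-table "message"
                                (make-table "role" "assistant"
                                            "content" "Hello!")))))

;; (nested-get *sample-response* "choices" 0 "message" "content")
```

Writing the path as a flat argument list reads in the same order as the JSON nesting, which is easier to check against the API documentation than nested gethash calls.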

Author: LispSu
