CSDN Article Quality Score Query System [with a free Python crawler and score-improvement tips]

Open Source Code
2025-08-26 22:42:01

CSDN Article Quality Score Query System

.csdn.net/qc

Click here -----> CSDN Article Quality Score Query System <------ click here

Click here -----> .csdn.net/qc <------ click here


Note: the link must point to a blog post hosted on CSDN itself.
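Before pasting a link into the query page, the requirement above can be checked locally. The following is a minimal sketch, not CSDN's own validation: the function name is hypothetical, and the "/article/details/" path segment is an assumption about how CSDN article URLs usually look.

```python
from urllib.parse import urlsplit


def is_csdn_article_url(url: str) -> bool:
    """Return True if url looks like a blog post hosted on csdn.net."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    # Accept csdn.net itself and any of its subdomains, e.g. blog.csdn.net
    on_csdn = host == "csdn.net" or host.endswith(".csdn.net")
    # Assumption: article pages contain "/article/details/" in their path
    looks_like_article = "/article/details/" in parts.path
    return parts.scheme in ("http", "https") and on_csdn and looks_like_article


print(is_csdn_article_url("https://blog.csdn.net/jjk_02027/article/details/123456"))  # True
print(is_csdn_article_url("https://example.com/article/details/123456"))  # False
```

A link that fails this check is probably not an in-site article link, although passing it does not guarantee that the query system will accept the link.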

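Example of the result

The author uses one of his own articles to show the result:

Dynamic thresholds for metrics computed with machine learning in Java - CSDN Blog (java机器学习计算指标动态阈值-CSDN博客)

Looking up the average quality score of a personal CSDN blog

Content Management ---> Data ---> Works Data ---> Blog Data (default tab) ---> Blog Statistics (default tab)

Getting the link of a CSDN blog post, method 1

Article page ---> copy the address from the address bar

Method 2

Article page (bottom) ---> Share ---> Copy link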
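A link copied from the address bar or the share dialog can carry a query string or fragment in addition to the article path. The sketch below strips both, assuming that only the scheme, host and path are needed to identify the article. That is an assumption rather than documented CSDN behaviour, and the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query_and_fragment(url: str) -> str:
    """Keep only scheme, host and path of a copied article link."""
    parts = urlsplit(url.strip())
    # Empty strings drop the query string and the fragment
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


copied = " https://blog.csdn.net/jjk_02027/article/details/1?spm=abc#top "
print(strip_query_and_fragment(copied))
# https://blog.csdn.net/jjk_02027/article/details/1
```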

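Python crawler application: scraping quality scores

Use a Python crawler to collect the quality scores of all your CSDN articles

This walkthrough uses macOS; the steps on Windows and Linux are similar.

Install python3

Skip this step if Python 3 is already installed. Python 2 is not enough, because the script below uses f-strings.

brew install python3

Install pip3

Skip this step if pip3 is already installed. Homebrew has no separate pip3 formula; pip3 normally comes with Homebrew's python3, which you can confirm with:

pip3 --version

Install the required libraries

requests: sends the HTTP requests
MultipartEncoder (from requests_toolbelt): builds the body of a POST request
openpyxl and pandas: write and read the Excel files

# On Windows, or on a system without Homebrew, you can omit --break-system-packages
pip3 install requests --break-system-packages
pip3 install requests_toolbelt --break-system-packages
pip3 install openpyxl --break-system-packages
pip3 install pandas --break-system-packages

Get the request URL and request headers

Step 1: open the target page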

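Step 2: open the browser's developer tools

Step 3: get the request URL and request headers

Click the Payload tab to find the request parameters.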
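The parameters shown in the Payload tab map directly onto a Python dictionary. As a rough sketch, the helper below builds one dictionary per page, using the parameter names that the full script passes to the article-list endpoint (page, size, businessType, username). The helper name build_page_params is hypothetical, and the values are examples only.

```python
import math
from urllib.parse import urlencode


def build_page_params(username: str, total: int, size: int = 100) -> list:
    """Build one query-parameter dict per page needed to cover `total` articles."""
    pages = math.ceil(total / size)  # round up so the last partial page is included
    return [
        {"page": page, "size": size, "businessType": "blog", "username": username}
        for page in range(1, pages + 1)
    ]


params = build_page_params("jjk_02027", total=250)
print(len(params))           # 3
print(urlencode(params[0]))  # page=1&size=100&businessType=blog&username=jjk_02027
```

Keeping the values as plain integers and strings (rather than wrapping them in braces, which creates sets) lets requests encode them as ordinary query parameters.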

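Step 4: analyze the request URL and build the parameter dictionary

url = "https://bizapi.csdn.net/blog/phoenix/console/v1/article/list"

Parameters:

pageSize: 20

Step 5: put the code together

Adjust the code below as needed (CSDN may update its site from time to time, so the endpoint addresses may change).

Edit the file csdnArticleScore.py: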

# pip3 install pandas --break-system-packages
import json
import math

import pandas as pd
import requests
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows


# Fetch article information in batches and save it to Excel
class CSDNArticleExporter:
    def __init__(self, username, cookies, Referer, page, size, filename):
        self.username = username
        self.cookies = cookies
        self.Referer = Referer
        self.size = size
        self.filename = filename
        self.page = page

    def get_articles(self):
        url = "https://blog.csdn.net/community/home-api/v1/get-business-list"
        # Query parameters for one page of the user's blog list
        params = {
            "page": self.page,
            "size": self.size,
            "businessType": "blog",
            "username": self.username,
        }
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3',
            'Cookie': self.cookies,  # Setting the cookies string directly in headers
            'Referer': self.Referer,
        }
        try:
            response = requests.get(url, params=params, headers=headers)
            response.raise_for_status()  # Raises an HTTPError if the response status code is 4XX or 5XX
            data = response.json()
            return (data.get('data') or {}).get('list', [])
        except requests.exceptions.HTTPError as e:
            print(f"HTTP error: {e.response.status_code} {e.response.reason}")
        except requests.exceptions.RequestException as e:
            print(f"Request failed: {e}")
        except json.JSONDecodeError:
            print("Failed to parse JSON")
        return []

    def export_to_excel(self):
        articles = self.get_articles()
        if not articles:
            # Nothing to write: the request failed or the page is empty
            print(f"No articles returned for page {self.page}; {self.filename} was not written")
            return False
        df = pd.DataFrame(articles)
        df = df[['title', 'url', 'postTime', 'viewCount', 'collectCount', 'diggCount', 'commentCount']]
        df.columns = ['Title', 'URL', 'Published', 'Views', 'Favorites', 'Likes', 'Comments']
        # df.to_excel(self.filename)
        # The code below sizes every Excel column to fit its content for easier reading.
        # Saving with the single line above also works.
        # Create a new workbook and select the active sheet
        wb = Workbook()
        sheet = wb.active
        # Write DataFrame to sheet
        for r in dataframe_to_rows(df, index=False, header=True):
            sheet.append(r)
        # Iterate over the columns and set column width to the max length in each column
        for column in sheet.columns:
            cells = list(column)
            max_length = max(len(str(cell.value)) for cell in cells)
            sheet.column_dimensions[cells[0].column_letter].width = max_length + 5
        # Save the workbook
        wb.save(self.filename)
        return True


class ArticleScores:
    def __init__(self, filepath):
        self.filepath = filepath

    @staticmethod
    def get_article_score(article_url):
        url = "https://bizapi.csdn.net/trends/api/v1/get-article-score"
        # TODO: Replace with your actual headers
        headers = {
            "Accept": "application/json, text/plain, */*",
            "X-Ca-Key": "203930474",
            "X-Ca-Nonce": "7e4ece49-5b7d-41e0-b548-30972a3e3989",
            "X-Ca-Signature": "mXV5P9OGdBpKyv7v+OfuSmtbN66OwLg3ujL2kwGk5mw=",
            "X-Ca-Signature-Headers": "x-ca-key,x-ca-nonce",
            "X-Ca-Signed-Content-Type": "multipart/form-data",
        }
        data = {"url": article_url}
        try:
            response = requests.post(url, headers=headers, data=data)
            response.raise_for_status()  # This will raise an error for bad responses
            return (response.json().get('data') or {}).get('score', 'Score not found')
        except requests.RequestException as e:
            print(f"Request failed: {e}")
            return "Error fetching score"

    def get_scores_from_excel(self):
        df = pd.read_excel(self.filepath)
        urls = df['URL'].tolist()
        scores = [self.get_article_score(url) for url in urls]
        return scores

    def write_scores_to_excel(self):
        df = pd.read_excel(self.filepath)
        df['Quality score'] = self.get_scores_from_excel()
        df.to_excel(self.filepath, index=False)


if __name__ == '__main__':
    total = 10  # Total number of published articles
    # TODO: replace with your own cookies, Referer, CSDN id and headers
    cookies = 'UN=jjk_02027; fi_id=default; log_Id_pv=******。。。'  # Simplified for brevity
    Referer = 'https://blog.csdn.net/jjk_02027?type=blog'
    CSDNid = 'jjk_02027'
    t_index = math.ceil(total / 100) + 1  # Round up; range() is half-open, so add 1 to include the last page
    # Fetch article information
    # CSDNArticleExporter(username, cookies, Referer, page number, page size, output file)
    # - total counts only the articles that are visible to everyone
    # - the number of pages is total / 100, rounded up
    # - page size: at most 100 articles per request
    # - output file, e.g. 'score1.xlsx'; the same name is passed to ArticleScores below
    for index in range(1, t_index):  # One pass per page
        filename = "score" + str(index) + ".xlsx"
        exporter = CSDNArticleExporter(CSDNid, cookies, Referer, index, 100, filename)  # Replace with your username
        if exporter.export_to_excel():
            # Fetch the quality scores in batch
            score = ArticleScores(filename)
            score.write_scores_to_excel()

Step 6: run the Python crawler

python3 csdnArticleScore.py

Step 7: check the quality score file

After the crawler finishes, the Excel files (score1.xlsx, score2.xlsx, and so on, one per page) are in the current directory.
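When a request fails, the script stores the strings "Error fetching score" or "Score not found" in place of a number, so any average taken over the score column has to skip those entries. A minimal sketch of that step follows; the function name average_score is hypothetical, and the input is assumed to be the score column already loaded into a Python list.

```python
from statistics import mean


def average_score(values):
    """Average the numeric entries, skipping error placeholders."""
    numeric = [
        v for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    # Return None when no usable score is available
    return round(mean(numeric), 2) if numeric else None


print(average_score([90, 80, "Error fetching score", "Score not found"]))  # 85
```

The result can be compared with the average shown on the blog statistics page, keeping in mind that the script only sees articles that are publicly visible.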


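I have worked with Java for more than ten years and am new to Python, and I was genuinely surprised by how capable Python is. For this task it holds its own against Java in both performance and convenience. I also wrote a Java version, and its performance was about the same, but Python still felt easier to use here.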

Appendix 1: Python website and tutorial

Official Python website: .python.org/

Python 3 tutorial: Python 3 Tutorial | Runoob (菜鸟教程)

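Appendix 2: common problems when scraping scores with Python

1. On macOS, installing the requests library for python3 fails with error: externally-managed-environment

One option is to install requests with Homebrew instead of pip. (You can skip this: install Python libraries with pip3 and use brew only for non-Python packages.)

brew install python-requests

2. On macOS, installing pipx with pip3 fails with error: externally-managed-environment

pip3 install pipx --break-system-packages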
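The externally-managed-environment error comes from the marker file described in PEP 668: when a file named EXTERNALLY-MANAGED sits in the interpreter's standard-library directory, pip refuses to install into that environment unless --break-system-packages is given. The sketch below reports whether the current interpreter is marked this way and whether it is running inside a virtual environment, where pip does not apply the restriction. The function names are hypothetical.

```python
import os
import sys
import sysconfig


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


def is_externally_managed() -> bool:
    """True when the PEP 668 marker file exists next to the standard library."""
    marker = os.path.join(sysconfig.get_path("stdlib"), "EXTERNALLY-MANAGED")
    return os.path.exists(marker)


if __name__ == "__main__":
    print("virtual environment:", in_virtualenv())
    print("externally managed:", is_externally_managed())
```

If the second value is True and the first is False, pip3 install will need --break-system-packages as shown above; creating a virtual environment with python3 -m venv is another way to avoid the error.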