Hangzhou DeepSeek AI Fundamental Technology Research Co., Ltd., a DeepSeek affiliate, today filed a patent for a new web data collection system designed to improve crawling efficiency and data quality. The patent outlines a method for discovering more webpage links while minimizing the traffic load placed on the websites being crawled. It assesses content that has already been downloaded to predict the quality of links that have not yet been fetched, prioritizing high-value data and reducing redundant downloads. Efficient web data collection is crucial for training large language models (LLMs), which power AI systems like ChatGPT. Existing techniques struggle with incomplete link retrieval, excessive downloads that can crash websites, and poor filtering of low-quality data. DeepSeek’s proposed system aims to solve these issues by optimizing data allocation and maintaining metadata accuracy. [iThome, in Chinese]
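The idea of ranking unfetched links by the quality of the pages that point to them can be illustrated with a small priority-queue crawler. This is only a sketch under stated assumptions, not the method in the patent, whose scoring details are not described here: the names `page_quality`, `crawl`, and `fetch` are hypothetical, and the distinct-word ratio is a stand-in for whatever quality signal a real system would use.

```python
import heapq
from typing import Callable, List, Tuple


def page_quality(text: str) -> float:
    """Hypothetical quality heuristic: share of distinct words in the text."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0


def crawl(seed: str,
          fetch: Callable[[str], Tuple[str, List[str]]],
          budget: int) -> List[str]:
    """Download at most `budget` pages, best-predicted links first.

    `fetch(url)` is an assumed callback returning (text, outgoing_links).
    A link's predicted quality is the measured quality of the page on
    which it was first discovered.
    """
    frontier = [(-1.0, seed)]   # heapq is a min-heap, so scores are negated
    seen = {seed}               # each URL is queued once, avoiding repeat downloads
    order = []
    while frontier and len(order) < budget:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        order.append(url)
        score = page_quality(text)
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return order
```

The `budget` cap and the `seen` set are what limit load on the target site in this sketch: low-scoring links stay at the back of the queue and are never fetched if the budget runs out first.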