头部财经
23 hours ago
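Beihang University open-sources Code2Bench: dual-scaling dynamic evaluation, so code LLMs can no longer coast on benchmark scores
To break this "high-score illusion", a research team from Beihang University (北京航空航天大学) has proposed a new benchmark-construction philosophy, Dual Scaling, and used it to build Code2Bench, an end-to-end automated framework. The work aims to establish a more dynamic, more rigorous, and more diagnostic paradigm for evaluating code large language models.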