Engineering at Anthropic: Inside the team building reliable AI systems

Featured

Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet
SWE-bench is an AI evaluation benchmark that assesses a model's ability to complete real-world software engineering tasks.

Building effective agents (Dec 19, 2024)

Introducing Contextual Retrieval (Sep 19, 2024)

Want to help us build the future of safe AI? See open roles.