This is what I’ve been using for non-confidential projects for about a week now (soon after v4 came out). I honestly can’t tell the difference, but I’m not doing anything crazy with it either.
Worth noting that I don't think DeepSeek's API lets you opt out of training. Once this is up on other providers though… (OpenRouter is just proxying to DeepSeek atm)
lhl 13 hours ago [-]
For those that don't want their data trained on, OpenRouter allows you to have account-wide or per-request routing with either provider.data_collection: "deny" or zdr: true (zero data retention).
Also, you can use HuggingFace Inference for DeepSeek V4 or Kimi K2.6, both of which work quite well and route through providers that you can enable/disable (like Together AI, DeepInfra, etc) - you'll have to check their policies but I think most of those commercial inference providers claim to not train on your data either.
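For example, a per-request override looks roughly like this (a sketch; the model slug and the exact placement of the routing fields are assumptions on my part, so check OpenRouter's provider-routing docs):

```shell
# Request body with provider routing set to deny data collection / require ZDR.
# Model slug and field placement are illustrative, not verified.
BODY='{
  "model": "deepseek/deepseek-v4-pro",
  "provider": { "data_collection": "deny", "zdr": true },
  "messages": [ { "role": "user", "content": "hello" } ]
}'
# Then POST it:
# curl https://openrouter.ai/api/v1/chat/completions \
#   -H "Authorization: Bearer $OPENROUTER_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d "$BODY"
```

With that set, OpenRouter should only route to providers that satisfy the policy, which is also why a brand-new model can be temporarily unavailable under ZDR.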
jorvi 11 hours ago [-]
That doesn't work: if you do that, it marks DeepSeek's models with a warning symbol along with the error "paid model training violation".
striking 7 hours ago [-]
In those cases, OpenRouter just chooses providers that agree not to train / offer ZDR. Which sometimes means you start off without access to the model until some other providers start offering it.
BeetleB 10 hours ago [-]
In a sense, it's working as intended. If you set zdr to true, you currently can't use DeepSeek v4. However, once other providers offer it (it is an open model, after all), some may allow zdr.
specproc 9 hours ago [-]
Yeah, OR gives a bunch of providers, including Deepseek, which does train.
I set ZDR to true, and it only calls from the third party ZDR Deepseek APIs. Bit more expensive, but my client wants it.
miroljub 13 hours ago [-]
I wonder why the question about data security and training often comes up with DeepSeek, Kimi, and GLM, and never with Anthropic, OpenAI, and Google models.
Why is that?
IIRC, US data protection law protects the data of US citizens only; foreigners' data is not protected, and the companies are not even allowed to disclose when they collect that data.
zeendo 10 hours ago [-]
Because Anthropic, at least, gives you the option to opt out of training? I think Google and OpenAI do, too.
Matl 12 hours ago [-]
> US data protection law protects the data of US citizens only; foreigners' data is not protected
HN is an American site. If you look at the US government, it fearmongers about anything China-related, because it hasn't had a genuine competitor for decades and it's scared and lashing out. Most US news outlets just parrot the government line, sometimes more so than state TV would, and so it reflects here.
I also feel comfortable saying that many Americans don't care one bit what happens to foreigners, be it by action of their government or companies.
giwook 11 hours ago [-]
> I also feel comfortable saying that many Americans don't care one bit what happens to foreigners, be it by action of their government or companies.
This is true. There are also many of us who do care.
This brings to mind something I heard recently about the so-called "Rule of 10". There will always be 3 people who support you, 3 people who are against you, and 4 people who have no idea what's going on and don't care.
Don't just focus on the 3 people who are being negative.
Matl 11 hours ago [-]
Oh absolutely.
boondongle 10 hours ago [-]
Wolf Warrior diplomacy isn't even 10 years dead. The HK treaty was violated and continues to be. Taiwan gets threatened every other week.
People can have problems with America and I'm fine with that. But pretending China isn't subsidizing industry (land, education, transportation) in a predatory fashion is silly. Too many companies have gone out of business because of it. We can all have our friends in China without pretending the CCP is playing the ballgame fairly. The government doesn't need to point it out. That doesn't even get into influence operations (which are especially easy on platforms like this.)
Seriously - there may be a day in the future where Western nations and China get along but it really can't/won't happen while it's holding all the industry and trying to take the Services income as well.
Matl 8 hours ago [-]
The US assisted a genocide, literally kidnapped the president of a sovereign country so it could take its oil, threatened its own allies with invasion and started a war of aggression against another so that it can take their oil, all in a span of a few months.
But tell me more.
lostdog 8 hours ago [-]
Yes, if you just list 3 more problems about the US then it means that China has no problems at all.
Matl 7 hours ago [-]
No it means that perhaps the US should finally start looking at itself instead of just asserting that it doesn't need to because China.
That doesn't mean China should not be criticized. But to me it's clear that the China blame game is not about genuine concern for Chinese people or China's neighbors; it's about trying to keep China down, because it should never have dared to rise in the first place.
Anglo-Saxons and maybe the French should be in charge, and the rest should be resource colonies. It very much feels like that Western mentality is still there.
beedeebeedee 6 hours ago [-]
> No it means that perhaps the US should finally start looking at itself instead of just asserting that it doesn't need to because China.
Agreed, the US definitely needs to do some introspection to sort out its own shit (and stop spraying it on everyone else).
However, that does not mean that China gets a pass. Fundamentally, the Chinese model of governance does not protect the individual. For all its faults, the US model is based upon the idea of individual liberty, which acts as a touchstone and allows it to self-correct whenever it goes too far in the wrong direction. That's something the Chinese model does not do, which means that, short of a revolution, it will continue to be an authoritarian state with all of the malignant features that entails.
Matl 6 hours ago [-]
> Fundamentally, the Chinese model of governance does not protect the individual. For all its faults, the US model is based upon the idea of individual liberty
Look, I'm not here to defend the Chinese model, but I find it interesting how convinced you seem that individualism is the right model for everyone.
While I would generally agree with you, I have spoken to many from poorer countries who say that they prefer to trade some individualism for a steady hand of economic development and lifting the population from poverty. That is the Chinese model.
These people would argue that they can reclaim more and more individual freedom as the country gets richer and more self confident.
I am not saying they are right, but looking at a nominal democracy like India and a nominal autocracy like China, I know which government works better as far as raising the living standards of its population and it's not the Indian one.
My hope is that China will continue to liberalize on its own. Forcing it will likely only reverse the gains.
Individualism also leads to the sort of healthcare system the US had or Skid Row. So it's not all roses.
SJMG 10 hours ago [-]
> also feel comfortable saying that many Americans don't care one bit what happens to foreigners, be it by action of their government or companies
What's the point of this kind of statement for you? Does this help you understand others or just continue to drive the wedge in? Where are you from? Ask yourself whether the statement
"many {of my country} don't care one bit what happens to foreigners, be it by action of the government or companies" can't be read as true.
There are self-absorbed, disinterested, uncompassionate people in every country, which will satisfy your "many" qualifier.
Matl 7 hours ago [-]
I am from Europe. I feel comfortable saying that many in Europe do not care what their governments or companies do to foreigners (at least not enough to inform themselves about it).
However, looking at polls in the US gives you a fairly decent idea that there's a sizable chunk of people who seem to get off on violence toward non-Americans. Why do you think ICE went with the violent tactics it did?
As to
> What's the point of this kind of statement for you? Does this help you understand others or just continue to drive the wedge in?
The point is to maybe make some Americans ask what they can do to reform the government they have the most direct influence over (their own), instead of reassuring themselves that theirs is still better than country X's.
ricardobeat 5 hours ago [-]
ANTHROPIC_SUBAGENT_MODEL is not a valid setting, should be CLAUDE_CODE_SUBAGENT_MODEL.
rapind 4 hours ago [-]
This is correct. Sorry, I was posting from my phone. Here's my bash alias verbatim (.bashrc / .zshrc). The DEEPSEEK_API_KEY var is set up separately (so claude doesn't see it):
----
alias clauded='ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic ANTHROPIC_AUTH_TOKEN=$DEEPSEEK_API_KEY ANTHROPIC_MODEL=deepseek-v4-pro[1m] ANTHROPIC_DEFAULT_OPUS_MODEL=deepseek-v4-pro[1m] ANTHROPIC_DEFAULT_SONNET_MODEL=deepseek-v4-pro[1m] ANTHROPIC_DEFAULT_HAIKU_MODEL=deepseek-v4-flash CLAUDE_CODE_SUBAGENT_MODEL=deepseek-v4-flash CLAUDE_CODE_EFFORT_LEVEL=max claude'
----
I doubt the opus, sonnet, and haiku model args actually matter; you can probably omit them.
I run this on a VPS that has no other credentials or project access so I can give it the skip permissions arg.
maxgashkov 13 hours ago [-]
As of now, OpenRouter offers multiple providers for DeepSeek with ZDR (not sure if they respect it but still).
vidarh 13 hours ago [-]
At several times the price of DeepSeek, though, so it's a tradeoff... Even then Pro is still cheaper than Haiku.
tariky 17 hours ago [-]
I wanted to try this. To bring back Opus and Sonnet, do I just unset those env vars?
snqb 11 hours ago [-]
Yes, this is pretty much just rerouting Claude to call DeepSeek's Anthropic-compatible endpoints instead of its own defaults.
Once removed, it'll work just like before.
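Concretely, if you exported the variables in your shell (rather than prefixing them to a single command), clearing them looks like this (variable names taken from the alias upthread):

```shell
export ANTHROPIC_MODEL="deepseek-v4-pro"   # e.g. previously exported
# Drop all the overrides; Claude Code falls back to its defaults:
unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN ANTHROPIC_MODEL \
      ANTHROPIC_DEFAULT_OPUS_MODEL ANTHROPIC_DEFAULT_SONNET_MODEL \
      ANTHROPIC_DEFAULT_HAIKU_MODEL CLAUDE_CODE_SUBAGENT_MODEL
```

If you used the `clauded` alias from upthread, the variables were only set for that single invocation anyway, so a plain `claude` already uses the defaults.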
ianmurrays 16 hours ago [-]
Correct.
varenc 19 hours ago [-]
The more interesting part of deepclaude is the local proxy it runs to switch models mid-session and do combined cost tracking. Though these features seem quite buried in the LLM-generated readme. Looking at the history, it appears they were added later, and the readme wasn't restructured to highlight this.
How come such slop is allowed here, what value do these vibe coded zero shot "projects" add? Why not just post the prompt?
throwatdem12311 11 hours ago [-]
Seriously. When I first looked at this project, the first commit had been pushed two hours prior. Projects should be at least 3 months old or automatically removed.
ulimn 11 hours ago [-]
But then that would have the downside of falsely blocking projects that were developed in private and then just pushed to GitHub (or any public repo). I always use my own self-hosted Forgejo for everything by default, for example.
throwatdem12311 11 hours ago [-]
If you develop on your own private instance and then mirror to GitHub to release it then there will be 3 months of git history in the logs.
sumeno 11 hours ago [-]
If it's a project you actually care about and are actively working on it'll be just as good 3 months from now.
If it's something that'll be irrelevant in 3 months why should anyone care about it?
ulimn 9 hours ago [-]
That's true in most cases, I guess, but just look at the current product in OP. In 3 months, at the pace AI products evolve, we might "all" be using the next AI coding harness and Claude Code could be a thing of the past. So it's not a long-lasting tool like curl, for example.
All I'm trying to say is that generalizing like that might exclude some useful things.
KallDrexx 9 hours ago [-]
FWIW, git history can be forged pretty easily. You can re-timestamp commits.
woctordho 15 hours ago [-]
For the same reason that GitHub has a releases page for uploading binaries.
fragmede 18 hours ago [-]
Convenience? Am I supposed to take the prompt and use my own tokens on it? Why should I have to do that?
jpadkins 8 hours ago [-]
is the value the working outputs or the inputs? A prompt alone would not let you recreate this project.
otabdeveloper4 19 hours ago [-]
Recruiters used to use the candidate's Github "sources" page for evaluating candidates as a kind of proof-of-work.
groestl 18 hours ago [-]
And recruiter agents still do.
aaurelions 24 hours ago [-]
It seems like any project that makes fun of Claude is bound to reach the top spot on Hacker News. Even if it’s just a project consisting of four lines of code.
oblio 14 hours ago [-]
You're just mean. I count 6 lines of code!
spirit23 20 hours ago [-]
So I created https://getaivo.dev; you can use any model in the coding agent directly. Just `aivo claude -m deepseek-v4-pro`
Tanxsinxlnx 15 hours ago [-]
Does it support the AWS Bedrock provider? And can I use any model with this?
spirit23 11 hours ago [-]
Ah, for AWS Bedrock, just use `aivo keys add` to add the base URL and API key, and everything is ready. Run `aivo models` to see the models.
This in essence is what allows one to use any model with CC -- including local.
neutrinobro 11 hours ago [-]
I know. I'm struggling to understand how this is a github repo/HN article. I've been using claude-code with a llama.cpp server and a dummy API key, and all that is required is to define 2 environmental variables to point claude at the local endpoint. Am I missing something?
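Roughly like this (host, port, and the dummy key are placeholders; this assumes the local server speaks the API Claude Code expects):

```shell
export ANTHROPIC_BASE_URL="http://127.0.0.1:8080"  # local llama.cpp server
export ANTHROPIC_AUTH_TOKEN="not-a-real-key"       # dummy; the local server ignores it
# then just run: claude
```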
KronisLV 6 hours ago [-]
I wonder if there's a way to launch the desktop Claude app like that, especially on Windows, not just the Claude Code TUI/CLI. It might not be possible, and you'd just have to use --remote as a workaround.
port11 9 hours ago [-]
DeepClaude doesn't support MCP tool use; does your solution work with MCP tools such as Serena?
nadermx 24 hours ago [-]
The AI wars have begun
heisenbit 17 hours ago [-]
And they are enticing human agents to further their agendas using techniques learned from the white mice.
stingraycharles 21 hours ago [-]
This has been possible since the beginning.
faangguyindia 20 hours ago [-]
Those who use DeepSeek v4: what level of output do you get? Codex 5.3 or GPT 5.4?
Is the Flash version on the level of GPT 5.4 mini?
adonese 18 hours ago [-]
I tried it on a non-trivial, but also well-documented and self-contained task. It did amazingly well. I used DeepSeek v4 Pro via the DeepSeek platform. The model is very fast and also super cheap. I burned only 0.06 USD (I reckoned what the same task would have cost me had I used, e.g., amp).
PS. Mentioning amp because I used to use it and pay directly for tokens. I topped up 5 USD, so I'm going to keep using it and see how far it can take me. But my impression so far is even when model subsidization is done, those open source models are quite viable alternatives.
zozbot234 17 hours ago [-]
> But my impression so far is even when model subsidization is done, those open source models are quite viable alternatives.
My understanding is that DeepSeek V4 Pro is going to be uniquely good at working on consumer platforms with SSD offload, due to its extremely lean KV cache. Even if you only have a slow consumer platform, you should be able to just let it grind on a huge batch of tasks in parallel entirely unattended, and wake up later to a finished job.
AIUI, people are even experimenting with offloading the KV cache itself to storage, which may unlock this batching capability even beyond physical RAM limits as contexts grow. (This used to be considered a bad idea with bulky KV caches, due to concerns about wearout and performance, but the much leaner KV cache of DeepSeek V4 changes the picture quite radically.)
torginus 15 hours ago [-]
Good. It's hard to overstate how nervous most executives are about relying on cloud-based providers.
AI currently works basically by sending your entire codebase, workflow, and internal communication over the internet to some third-party provider, and your only protection is some legal document saying they pinky-promise not to train on your data.
And said promise is made by people whose entire business model relies on slurping up all the licensed content on the internet and ignoring said licensing, with the defense of being too big to fail.
zozbot234 15 hours ago [-]
Yes, this is the most straightforward argument for local AI inference. "Why buy cloud-based SOTA AI? We have SOTA AI at home." It's great that DeepSeek may now be about to make this possible, once the support in local inference frameworks is up to the task.
adonese 17 hours ago [-]
Is there any place I can read about KV caching? Excuse my ignorance, as I'm not familiar with this topic. I've read scattered notes that DeepSeek's costs are well optimized due to how their KV cache works, but I want to read more about how the KV cache relates to the inference stack and where it actually sits.
> AIUI, people are even experimenting with offloading the KV cache itself to storage, which may unlock this batching capability even beyond physical RAM limits as contexts grow.
Especially this point. Is there any reason this idea was considered bad? Is it due to the speed difference between GPU VRAM and RAM?
zozbot234 17 hours ago [-]
The KV cache generally grows linearly with your current context; it gets filled in with your prompts during prompt processing, and newly created context gets tacked on during token generation. LLM inference uses it to semantically relate the currently-processed token to its pre-existing context.
> Any reason that this idea was considered bad?
Because the KV cache was too big, even for a small context. This is still an issue with open models other than DeepSeek V4, though to a somewhat smaller extent than used to be the case. But the tiny KV of DeepSeek V4 is genuinely new.
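Back-of-envelope: for a plain multi-head/GQA attention model, per-request KV size is roughly 2 (K and V) x layers x kv_heads x head_dim x context_length x bytes per value. The numbers below are purely illustrative, not DeepSeek's actual architecture:

```shell
# Illustrative KV cache size for a hypothetical fp16 model:
layers=60; kv_heads=8; head_dim=128; ctx=32768; bytes=2
kv_mib=$(( 2 * layers * kv_heads * head_dim * ctx * bytes / 1024 / 1024 ))
echo "${kv_mib} MiB"   # 7680 MiB, i.e. ~7.5 GiB for a single 32k-token request
```

At sizes like that, batching many long contexts gets expensive fast, which is why a latent-compressed KV cache changes the economics so much.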
spaceman_2020 15 hours ago [-]
have you used it for non coding tasks via MCP, like Figma/Paper for design or Ableton MVP for sound design?
The token cost makes it tempting to use for token-heavy tasks like this
miroljub 13 hours ago [-]
> even when model subsidization is done, those open source models are quite viable alternatives.
Model inference was never subsidized. Inference is highly profitable at today's prices; that's why there are many inference providers. My guess is that prices for inference will go down as more competition starts cutting into the margin.
It's model training, development, and R&D that cost a lot, and companies creating closed models don't have any business model except astroturfing and trying to recover training costs through overpriced inference.
63stack 12 hours ago [-]
It's close to Opus 4.5 for me
niobe 16 hours ago [-]
Thanks, that was super easy.
I've been wanting to try CC with different models since Opus went downhill last month.
What limitations or issues have you noticed when using DeepSeek with Claude Code if any?
guluarte 5 hours ago [-]
I'm using this
deepseek() {
unset ANTHROPIC_AUTH_TOKEN
local -x ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic"
local -x ANTHROPIC_AUTH_TOKEN="${DEEPSEEK_API_KEY}"
local -x ANTHROPIC_MODEL="deepseek-v4-pro"
local -x ANTHROPIC_SMALL_FAST_MODEL="deepseek-v4-flash"
local -x API_TIMEOUT_MS=600000
local -x CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
TMUX= command claude "$@"
}
vitaflo 1 days ago [-]
I'm not exactly sure what the point of this is. DeepSeek already has instructions for using its API with many CLIs, including Claude Code directly:
The readme absolutely buries the features that are actually non-trivial: it runs a proxy to switch models mid-session, and does combined cost tracking between Anthropic and other models you might be using. The LLM that wrote the readme never updated the overall project description to highlight these features.
There probably isn't a point. Someone didn't understand something, didn't research it, so they 1 shotted their first thought and sent it to the front page of HN and all of their socials. It's the future bruh
georgeburdell 19 hours ago [-]
I embrace it at this point. It ends all the shilling of vibe coded tools at work that I have endured over the past year. Everyone can now make their own tools with zero obligation to coordinate beyond shared hardware resources
altmanaltman 19 hours ago [-]
To be fair, HN sent it to the front page, not the user. The rest I agree.
sumeno 11 hours ago [-]
A project that obviously bought stars on GitHub probably bought upvotes on HN too
vinckr 7 hours ago [-]
How is it obvious that this project bought stars on GitHub?
dev_hugepages 18 hours ago [-]
And now, because we all upvoted and commented on it, the vibe coded slop of the new user is on the front page.
2ndorderthought 14 hours ago [-]
Same place same time tomorrow?
eloisant 6 hours ago [-]
Also, Claude Code is one of the worst CLIs; the only good thing about it is that it's the default for Claude.
I don't see why anyone would want to use it with any model other than Claude, instead of OpenCode or Pi.
croes 23 hours ago [-]
From vibe coders for vibe coders
2ndorderthought 22 hours ago [-]
I don't always copy paste vibe coded project readme mds into Claude code and ask them to rewrite it but when I do... actually that's all I do now because my goal in life is to make wealthy overvalued companies wealthier.
incrudible 17 hours ago [-]
Anthropic is the opposite of wealthy, the more you use their service, the more money they lose. Unless you think your precious MDs being used for training data is gonna make them rich eventually.
adastra22 17 hours ago [-]
Their marginal inference cost is less than what they charge for it. Normally that is considered profitable...
yard2010 15 hours ago [-]
It's not the md files it's how you interact with their agents.
ex-aws-dude 30 minutes ago [-]
Its vibes all the way down
kordlessagain 21 hours ago [-]
Problem?
crooked-v 23 hours ago [-]
I'm curious how well it actually works. I tried Deepseek with Hermes and Opencode and it seemed extremely bad about using some of the basic tools given, like the Hermes holographic memory tools, even with system prompt instructions strongly pointing them out.
_345 9 hours ago [-]
I've been experimenting with Hermes, and I'm convinced Hermes is also just bad. As a harness, it has got to be doing something to lobotomize these models; even GPT-5.4 performs badly in Hermes vs. just using it in Codex.
ttoinou 1 days ago [-]
I thought the tool format wasn't exactly the same? So plugging any AI into Claude Code requires a format conversion.
selcuka 23 hours ago [-]
DeepSeek has a dedicated Anthropic-compatible endpoint [1].
This one still lacks some features. They still recommend using their OpenAI compatible endpoint.
But I guess Anthropic is just not capable of implementing the OpenAI API compatible client in Claude Code.
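For the curious, a raw request against that endpoint would follow the shape of Anthropic's Messages API (a sketch; the exact path under the base URL and the model name here are assumptions):

```shell
# Anthropic Messages API-shaped request body (model name illustrative):
BODY='{
  "model": "deepseek-v4-pro",
  "max_tokens": 256,
  "messages": [ { "role": "user", "content": "hello" } ]
}'
# curl https://api.deepseek.com/anthropic/v1/messages \
#   -H "x-api-key: $DEEPSEEK_API_KEY" \
#   -H "anthropic-version: 2023-06-01" \
#   -H "content-type: application/json" \
#   -d "$BODY"
```

That's the whole trick: Claude Code sends Messages-API requests to whatever ANTHROPIC_BASE_URL points at, and the provider answers in the same shape.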
ricardobeat 24 hours ago [-]
Many of them expose “anthropic-compatible” APIs for this very purpose.
faangguyindia 20 hours ago [-]
qwen also offers openai compatible endpoint.
TacticalCoder 13 hours ago [-]
It's getting a lot of upvotes, so it's almost as if people were feeling locked in and wanted a way out, but...
Why would you keep using CC CLI if you want to use the much cheaper DeepSeek v4 models (Flash and Pro): isn't it the opportunity to kiss CC CLI goodbye and use something not controlled by Anthropic?
Anyone here successfully moved from CC CLI to a fully open-source project? I'm asking this as a Claude Code CLI (Sonnet/Opus) user. My "stack" is all open-source: from Linux to Emacs to what-have-you. I'd rather also have open-weight models and a fully open-source (not controlled by a single company) AI CLI.
Any suggestion for something that works well? (by "well" I mean "as well as Claude Code CLI", which is not a panacea so my bar ain't the end of the world either).
justech 24 hours ago [-]
If you're looking for Claude Code alternatives, I would first suggest looking into pi.dev or opencode for your harness. Then for models, you can choose from OpenCode Go (IMO the most cost-effective at the moment), OpenRouter, or direct from DeepSeek. Even better, IMO, go the Kimi route and just buy a subscription from kimi.com
Looks interesting. Does it offer anything special that pi.dev or opencode does not?
wolttam 18 hours ago [-]
Probably not, `lmcli` is very lean. I would consider it a slightly lower-level tool than either pi.dev or opencode. E.g. there is no built-in coding agent, but it's easy to build one up in the config with your own prompt (or use the example).
It's proven useful for me, and I figure others might appreciate how light of a shim it is between you and the models.
Aeroi 23 hours ago [-]
Agreed. OpenCode is a strong base, and with a couple modifications it can become a very effective harness. For my side project mouse.dev, I've been combining parts from OpenCode, Claude Code, and Hermes to build a cloud agent architecture that works well from mobile.
ryanlitalien 5 hours ago [-]
Kudos! Cool idea, I'm on the same path you are, yet you're just one step ahead. For mouse.dev, what are you using for the cloud agent sandbox piece? I haven't moved my agents to the cloud yet (for on the go mobile enablement). Would Islo be a competitor to mouse?
Cool! I've been mostly building for what coding from an iPhone can look like. The cloud agent sandbox portion is definitely not polished yet, but it's working well so far. I looked at Daytona, E2B, Modal, etc., but decided to roll my own with fly.io: TTL on agent creation, per-thread sandboxes (not shared-container multi-workspace), and Postgres for agent history, etc.
I'll have to look more at Islo. I definitely think it's a growing space with a lot of opportunity for those who participate and solve problems.
ryanlitalien 3 hours ago [-]
Great ideas, I'll look at those too. We're only a few steps away from building our own cloud providers :D.
CharlesW 23 hours ago [-]
> OpenCode is a strong base, and with a couple modifications it can become a very effective harness.
I personally didn't find it to be competitive with Claude Code as a harness. Can I ask how you modified it to perform better?
Aeroi 22 hours ago [-]
I haven’t run formal evals but i improved the experience for my own needs and it feels noticeably better with these modifications.
- Claude-style subagents
- an MCP layer for higher-level tools
- Cursor-style control plane modes like Ask, Plan, Debug, and Build
The MCP layer lets the harness use things like GitHub file/code read, PR creation, web search/fetch, structured user questions, plan-mode switching, user skills, and subagents.
So the improvement is mostly from better ui/ux orchestration and tool access. There's some things from hermes that are interesting as well.
Most of my focus has been on applying this stack to sandboxed cloud agents so you can properly code and work from mobile devices.
I can't definitively say that the stack is better or worse than Claude code, more just tuned for my use case I guess.
eloisant 6 hours ago [-]
What issues do you have with OpenCode?
Personally I use it for the TUI, it's way better than Claude Code's one.
adobrawy 18 hours ago [-]
I'm a Claude Code Web fan and a rather heavy user. So I was interested in your product. However, I couldn't find an answer on the website. What parts did you find so good that you ported them?
Aeroi 12 hours ago [-]
Nothing groundbreaking, but I'll do a blog writeup on the architecture if it would be helpful for people. My focus has been on mobile.
The main pieces I've integrated for mouse.dev, inspired by Claude/Cursor, were plan mode, agent questions, subagents, pre/post hooks, context compaction, repo-local skills, and permission modes. So mostly tools like enter_plan_mode, ask_user_question, and spawn_subagent, plus .mouse/skills and .mouse/plans.
One nice feature is continuity. If you’re working on desktop and save a plan to .mouse/plans, you can pick it up later on mobile with cloud agents, or do the reverse. You can plan something from your phone, then when you’re back at your desk, review it/build it. That was my initial goal with this project because I've found the plan act loop so helpful.
Mouse Cloud Agents is mostly an OpenCode-based harness, but everything routes through our MCP/event system so it’s mobile-first and provider-agnostic.
I intentionally skipped a lot of IDE and Claude Code style desktop features. The bet is that this new style of coding is becoming less “edit files in an IDE” and more steer a capable coding chatbot.
Would love to hear from anyone reading that's iterating on harness architecture, it's been really fun to work on.
aaurelions 24 hours ago [-]
Another very cost-effective option is Ollama Cloud. In a month of use, I only hit the 5-hour limit once, when I ran 8 agents simultaneously for 2 hours.
tomw1808 11 hours ago [-]
For me it's unbearably slow, especially with DeepSeek v4 Pro. Is that just me? I literally signed up and canceled again, because for one prompt it took around 5 minutes to get 600 tokens back (via ollama launch claude --mode ...)
kopirgan 20 hours ago [-]
On which tier?
postatic 23 hours ago [-]
Definitely worth it. I have Ollama Cloud, OpenCode, and Hermes all running to test them out; working great so far.
mgoetzke 12 hours ago [-]
I liked pi.dev, but why isn't registering endpoints and models as simple as possible? Or am I missing something? I always have to fiddle with the config file.
miroljub 12 hours ago [-]
Editing config files is not necessary. Just do /login from your session, choose your provider, and there you go.
cpursley 14 hours ago [-]
How does the Kimi subscription compare to Codex and Claude Code in terms of how much mileage you get for the price? I mean, I see the prices, but I'm not sure how much usage that buys.
taytus 8 hours ago [-]
Kimi feels almost limitless. I have the $40/month plan and I've never come even remotely close to hitting the limit. Using Opus as the orchestrator.
cpursley 3 hours ago [-]
I've had some good results with Kimi in Opencode. Can you tell me more about using Opus as orchestrator - what type of harness setup?
DeathArrow 17 hours ago [-]
>If you're looking for Claude Code alternatives, I would first suggest looking into pi.dev or opencode for your harness.
While those are nice, Claude Code has the largest amount of plugins and skills I want to use.
wizhi 17 hours ago [-]
Aren't skills just literal plaintext files? Why not just copy them?
DeathArrow 13 hours ago [-]
Yes, they are .md files but they can rely on builtin behaviors in the harness or on plugins.
bakugo 23 hours ago [-]
> I would first suggest looking into pi.dev
Looked into this one. Thought it was suspicious that it only had 7 open issues on github. Turns out they have a bot that auto-closes every single issue just because.
> Maintainers review auto-closed issues daily and reopen worthwhile ones. Issues that do not meet the quality bar below will not be reopened or receive a reply.
Seems like not an unreasonable way to deal with the problem of large numbers of low quality issues being submitted.
oefrha 17 hours ago [-]
If that process actually happens, then there's absolutely no reason not to have the reviewing maintainer close issues after review instead. The only reasonable conclusion is that the documented process is aspirational at best, and vibed itself at worst.
cromka 19 hours ago [-]
Sounds like a perfect way to agitate the community, going against established culture like that.
mikeocool 2 hours ago [-]
::shrug::
I quite like pi and learned about the contribution guidelines a while after using it. Hard to complain about people making software for free using a process that works for them.
I will say, having a project with a slim issue tracker that only contains things the maintainers have blessed (and thus are presumably more likely to get worked on) is pretty nice.
If you're googling for a bug you're hitting and come across an auto-closed issue, you know you have to submit a higher-quality issue to get it looked at, rather than just +1'ing the existing lacking issue.
63stack 12 hours ago [-]
The established culture on a lot of projects is that you open an issue, and then you have to keep pinging it every week otherwise the stale bot closes it with "this issue is stale, closing, but your contribution is very important to us".
It's crap either way.
altmanaltman 19 hours ago [-]
But how is it any different from keeping them open?
Like if they are going to sort through all the issues eventually (like they claim), why not just close the ones that are not worthy when they get to them instead of closing all by default?
Is it just so that the project doesn't have open issues on its GitHub page? But they are open issues in reality because the maintainer will eventually go through them?
Nothing is "unreasonable" in the sense that an open source project should have the right to do what it wants with its rules, but it's definitely a weird stance.
mellosouls 17 hours ago [-]
They address the decision at the end of those contribution guidelines linked above, specifically:
It is a guardrail against burnout and tracker spam
It's based on their implied perspective that the majority of submissions don't follow those guidelines, which helps determine their quality threshold.
> But how is it any different from keeping them open?
If all open issues are actionable items, that makes expected workload a lot easier to handle.
If most open issues are actually in "needs triage / needs review" state, you lose the signal from the noise.
The issue tracker for a project exists primarily as a tool for maintainers, not for outsiders. Yes, the maintainers could change their workflow to create a new view that only shows triaged tickets.
Or, they could ensure the default 'open' view serves their needs.
vanchor3 15 hours ago [-]
Somehow going through closed issues just to reopen them sounds like more effort than just using the built in label system which is made for this purpose, but maybe that's just me.
oarsinsync 14 hours ago [-]
I can either change my daily workflow to accommodate the noisy herd, or I can change the noisy herd to accommodate my daily workflow.
__cayenne__ 20 hours ago [-]
The maintainer, Mario, sometimes declares the repo is on an “issue holiday” where issues are auto-closed. This particular holiday is because there is a big refactor coming up. In non-holiday periods, issues can be reported as normal.
These aren't normal times, though. Low-quality submissions have spiked dramatically with the use of LLMs. There has to be a response so project quality and maintainer sanity can be preserved.
LPisGood 22 hours ago [-]
The idea is for it to be extremely minimal, which strikes me as a very opinionated stance, and not one I agree with.
justinhj 20 hours ago [-]
It's a very interesting project. Many popular open source projects are inundated with poor quality issues and prs, hence the defences they are starting to erect.
rsanek 14 hours ago [-]
>DeepSeek V4 Pro scores 96.4% on LiveCodeBench and costs $0.87/M output tokens
This is a heavily subsidized price and will only last until the end of the month: "The deepseek-v4-pro model is currently offered at a 75% discount, extended until 2026/05/31 15:59 UTC." [0]
The "supported backends" table is also deceiving -- while OpenRouter's server's may be in the US, the only way to get the $0.44/$0.87 pricing is to pass through to the DeepSeek API, which of course is China-based. [1]
I do think the model is quite good, I myself use it through Ollama Cloud for simple tasks. But I think some folks have bought in a little too much to the marketing hype around it.
They expect inference prices to structurally drop once they receive their big batch of Huawei Ascend chips by the second half of the year.
syntex 15 hours ago [-]
Not sure you can replace Claude with DeepSeek V4 that easily and get the same results.
From what I see while building my own agentic system in Elixir, the problem is in training for your specific harness/contracts. Claude/GPT-style models seem to be trained around very specific contracts used by the harness like tool call formats, planning structure, patching, reading files, recovering from errors, and knowing when to stop.
In practice, you either need a very strong general model that can infer and follow those contracts (expensive), or a weaker model that has been fine-tuned / trained specifically on your own agent contracts. Otherwise, the whole thing becomes flaky very quickly. And I suspect with DeepSeek V4 you may end up with the latter option.
vidarh 13 hours ago [-]
There are certainly quirks, but identifying and conforming to those quirks is not that complex. E.g. I had Kimi "fix" my harness to work better with Kimi by pointing it at the (open source) kimi-cli + web search and telling it to figure out which differences might matter (it made compaction more aggressive, and worked around some known looping issues by triggering compaction if it spotted looping tool calls). Largely, addressing the quirks tends to harden the harness for other models too. But, yeah, it is more work to make the smaller models work with instead of against the harness.
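The looping-tool-call workaround described above can be approximated with a simple repeat check over the recent call log. A minimal sketch, assuming a hypothetical one-call-per-line `tool_calls.log` and a window of 5:

```shell
# Hedged sketch of the loop check above: if the last 5 entries in the harness's
# tool-call log are identical, signal that compaction should run.
# (tool_calls.log and the window of 5 are this sketch's assumptions.)
for i in 1 2 3 4 5; do echo "read_file a.txt"; done > tool_calls.log
distinct=$(tail -n 5 tool_calls.log | sort -u | wc -l)
if [ "$distinct" -eq 1 ]; then
  echo "looping detected: trigger compaction"
fi
```

In a real harness you'd run this check after each tool call rather than on a static log.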
dandaka 13 hours ago [-]
I hope they collaborate with open source harness providers (Pi, Opencode) and train models with those, so the next generations will have better integration and better overall quality.
cpursley 14 hours ago [-]
I'd love to learn more about the system you're building out in Elixir and your learnings, if any of it is public.
syntex 11 hours ago [-]
It's semi-public, but I'll probably publish it soon once it's less embarrassing.
It's an Elixir agent runtime with a thin Go TUI (Bubble Tea). I'm building it mostly to explore agent orchestration: planner/workers/finalizer flows, local file/code-edit tools, MCP tools, permission gates, run context, compaction, and eventually larger swarms. Erlang/Elixir is interesting for this because the actor/supervision model maps pretty naturally to lots of isolated agents and long-running supervised tasks.
As I said, the main lesson so far is that everything around contracts is much more fragile than I expected unless you use a very strong model. Planners return Markdown instead of JSON, tools get called with subtly wrong args, subagents repeat broken tool calls, finalizers lie about success after workers failed. And various permissions may be interpreted by agents in unexpected ways.
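The Markdown-instead-of-JSON failure is the easiest to harden against: strip fence lines before parsing. A minimal sketch (the octal escapes exist only to keep literal backticks out of this snippet):

```shell
# Hedged sketch: tolerate planners that return JSON wrapped in Markdown code
# fences by stripping the fence lines before parsing — one small piece of the
# "contract hardening" described above.
fence=$(printf '\140\140\140')            # three backticks, built via octal escapes
reply="${fence}json
{\"step\":1}
${fence}"
plan_json=$(printf '%s\n' "$reply" | grep -v "^${fence}")
echo "$plan_json"
```

A real harness would follow this with strict JSON validation and a retry prompt on failure.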
I also started with too many modes too early instead of making the agentic path extremely solid. That made me understand better why these codebases become huge: there are endless corner cases if you want a harness to work across models, providers, tools...
Stronger models hide a lot of harness weakness and weaker models expose it. Making weaker models good enough requires a surprising amount of contract hardening. But that hardening tends to make the system better for stronger models too.
Also elixir http stack was causing a lot of problems (needed to use gun eventually)
cpursley 10 hours ago [-]
Thank you for the writeup, integration with a TUI sounds great. Have you played with Jido (it's built on ReqLLM)? OpenAI also has an interesting Elixir orchestration project (surprisingly).
syntex 10 hours ago [-]
Thanks! I wasn't aware of Jido or ReqLLM before. ReqLLM looks especially promising, and I will likely use it. At the moment, I'm only integrated with OpenRouter.
cpursley 9 hours ago [-]
Yeah, I use ReqLLM in my product and some side projects. So far, so good.
mihailupu 10 hours ago [-]
[dead]
o10449366 14 hours ago [-]
Idk, my recent experience with Claude is that 4.7 barely knows how to use basic bash tools - how to properly check when programs have finished running, even basic stuff like how to run pytest suites and read the failed tests from the output without re-running the suite to specifically look for them. It's shockingly dumb for all of the tooling they've built into Claude Code (the useless Monitoring tool that blocks bash polling/sleeping that actually works, etc.).
I finally got fed up and started using GPT 5.5 the past 4 days and it's a breath of fresh air despite feeling much more minimal. With Claude I had to write so many hooks to enforce behaviors it wouldn't remember and lacked common sense on. GPT 5.5 does a much better job with things like knowing the AWS CDK CLI can hang on long CloudFormation deployments and that it should actively check the deployment status using the CloudFormation API rather than hanging for 30+ minutes - and it does this all without asking.
Maybe there's better tooling built into Codex too, but at least on the surface level it seems like how smart the model is makes a significant difference because Claude has more tools than I can count and still struggles to use "grep".
Edit: Like just now - I can't tell you how many times a day I see this sequence:
"Sorry, I'll run in parallel"
"Error editing file"
"File must be read first"
Repeat 10x for the 10 subagents Claude spawned and then it gets stuck until you press escape and it says "You rejected the parallel agents. Running directly now"
rirze 10 hours ago [-]
I’m finding great success having Claude design and review code but having codex actually implement it.
dalekkskaro 14 hours ago [-]
[flagged]
connorwhitlock 9 hours ago [-]
[dead]
iosjunkie 5 hours ago [-]
Get comfortable with Deepseek's privacy policy for using this for anything serious.
"To improve and develop the Services and to train and improve our technology, such as our machine learning models and algorithms. Including by monitoring interactions and usage across your devices, analyzing how people are using it, and training and improving our technology."
> Claude Code is the best autonomous coding agent.
If you look at the terminal-bench@2.0 leaderboard, you'll quickly see it's actually one of the weakest agentic harnesses. Anthropic's own models score lower with Claude Code than with virtually any other harness.
So it's quite the opposite. Claude Code is arguably the worst harness to run models with.
DaanDL 16 hours ago [-]
Okay, but not all results on there are valid, ForgeCode for instance has been cheating in the past:
It's surprisingly easy to hit $200 worth of tokens even at ~$1/M token though. No matter how many times I do the math the coding plans are the better value.
ojr 10 hours ago [-]
I don't think it's surprisingly easy at all unless you are running multiple background tasks overnight. I think Gemini will eventually be the default agentic coding agent when it comes to price and efficiency, because it's backed by sound money.
I don't use the Claude Code harness: grep instead of a combination with vector search is super expensive, and I'm not sure how their read-file implementation behaves. I built my own harness, for example, that restricts reads and writes in a token-efficient manner. Building your own harness will always be the cheapest option in the long run.
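A token-efficient read tool mostly means never dumping whole files into context. A minimal sketch (the `read_capped` helper and the 200-line default are this sketch's inventions, not from any particular harness):

```shell
# Hedged sketch of a token-efficient read tool for a homegrown harness: return
# at most N lines instead of dumping whole files into the context window.
# (read_capped and the 200-line default are this sketch's inventions.)
read_capped() { sed -n "1,${2:-200}p" "$1"; }

seq 1 500 > /tmp/bigfile.txt              # stand-in for a large source file
read_capped /tmp/bigfile.txt 10           # the model sees only the first 10 lines
```

The same idea applies to writes: diff-style patches instead of full-file rewrites.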
If you're okay with sonnet level performance, this sounds like a straight upgrade. But I find that sonnet messes up too much, that it ends up not being worth cost optimizing down to using it or another sonnet-level model. Glad to have this as an option though
2ndorderthought 1 days ago [-]
A lot of people are having good experiences doing things like using opus for designing and using locally hosted qwen3.6 for implementation.
I could see a serious cost reduction story by using opus for design and deepseek for implementation.
Personally I would avoid anthropic entirely. But I get why people don't.
girvo 1 days ago [-]
Like me: that’s what I do. Either Opus 4.7 or GLM 5.1 for planning, write it out to a markdown file, then farm it out to Qwen 3.6 27B on my DGX Spark-alike using Pi. Works amusingly well all things considered.
brianjking 21 hours ago [-]
How are you interacting with GLM 5.1? Via the Claude Code harness? I really wish they'd release a fully multimodal model already.
girvo 12 hours ago [-]
Through Pi, mostly! Also my own for-fun agent I wrote
Yeah so would I, I do miss having vision tools sadly.
2ndorderthought 1 days ago [-]
How is GLM 5.1? I haven't tried it yet but have been meaning to.
girvo 23 hours ago [-]
It's surprisingly good. Beats MiniMax 2.7 and Qwen 3.5 Plus in my testing (I haven't tested 3.6 Plus though), quite handily. It's far better than Sonnet, and often equivalent to Opus for the web development and OCaml tasks I'm using it for. It definitely isn't Opus 4.7, but it's good enough to earn its keep and is substantially cheaper.
sshine 22 hours ago [-]
I agree with this. And also: it uses more thinking time to reach this. So while you get a lot of tokens on their plan, the peak 3x token usage multiplier + the extra thinking means you run into the rate limit anyways.
girvo 22 hours ago [-]
True, though with the $20-equivalent plan used for planning only, I don't hit those limits often, vs Claude where Pro can literally hit limits with a single prompt haha
amunozo 11 hours ago [-]
Did you compare it with Kimi K2.6 and DeepSeek V4 Pro? I feel they're similar but as GLM is more expensive, I am not using it much.
girvo 2 hours ago [-]
[dead]
Alifatisk 17 hours ago [-]
I second this, glm-5.1 is incredible.
aftbit 1 days ago [-]
What hardware are you using to power this?
girvo 23 hours ago [-]
> DGX Spark-alike
Probably wasn't clear enough if you don't know what that is already, apologies
It's an Asus Ascent GX10, which is a little mini PC with 128GB of LPDDR5X as shared memory for an Nvidia GB10 "Blackwell" (kind of, it's a long story) GPU and a MediaTek ARM CPU
aftbit 23 hours ago [-]
Ah yeah I saw that, I was just curious which particular mini-PC you were using. I was considering picking up one of the various AI Max 395 boxes before the RAMpocalypse but didn't take the plunge. Thanks for the response!
girvo 22 hours ago [-]
I heavily considered one of the AMD Strix Halo boxes, but part of the reason I wanted this was to learn CUDA :)
sterlind 21 hours ago [-]
pulls up chair
could you tell me the long story?
edit: or wait, is it quasi-Blackwell the way all DGX Sparks are quasi-Blackwell? like the actual silicon is different but it's sorta Blackwell-shaped?
girvo 20 hours ago [-]
Yeah exactly. Shader model 121 is different to SM 120 (consumer Blackwell) and is different again to data centre Blackwell SM100.
The promise of this chip was “write your code locally, then deploy to the same architecture in the data centre!”
Which is nonsense, because the GB10 is better described as “Hopper with Blackwell characteristics” IMO.
Still great hardware, especially for the price and learning. But we are only just starting to get the kernels written to take advantage of it, and mma.sync is sad compared to tcgen05
chrsw 23 hours ago [-]
I keep re-learning this lesson: I chug along with a lesser model then throw a problem at it that's too complex. Then I try different models until I give up and bring in Opus 4.6 to clean up.
energy123 20 hours ago [-]
It's not even that much cheaper, GPT 5.5 is about 2x more expensive per task than Deepseek v4 Pro when you adjust for less token usage, according to Artificial Analysis. Doesn't seem worth it to me.
cpursley 14 hours ago [-]
Are we talking pay as you go API or vs plans?
energy123 13 hours ago [-]
Pay as you go API rates.
brianwawok 23 hours ago [-]
And I keep using Opus to like, make git commits. Really just need a smart router that is actually smart, vs having to micromanage model
sterlind 21 hours ago [-]
the problem is managing the contexts. your session might fit in Opus, but will that smaller model you dispatch the git commit to fit? even so, will it eat too much on prefill? do you keep compactions around for this, or RAG before dispatch or something? how do you button back up the response?
all doable but all vaguely squishy and nuanced problems operationally. kinda like harness design in general.
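The squishiest part of that dispatch decision can at least be approximated crudely. A sketch using the rough ~4-chars-per-token heuristic (the model names and the 8000-token budget are placeholders, not recommendations):

```shell
# Hedged sketch of the routing decision above: estimate prompt size with a
# ~4-chars-per-token heuristic and only dispatch the git-commit task to a
# small model if it fits. (Model names and the budget are placeholders.)
printf 'diff --git a/foo.c b/foo.c ...\n' > /tmp/prompt.txt
chars=$(wc -c < /tmp/prompt.txt)
tokens=$((chars / 4))
if [ "$tokens" -lt 8000 ]; then model="small-commit-model"; else model="frontier-model"; fi
echo "$model"
```

Real routers also account for prefill cost and compaction state, which is where the nuance creeps back in.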
maxdo 20 hours ago [-]
This is the problem: you need the best model, not just a good one, for:
- Good architecture, which requires reading specs, code, etc. reads like: lots of tokens in/out
- Bug fixing — same, plus logs, e.g. datadog
Once you've found the path, patches are trivial and the savings are tiny unless you're doing refactoring/cleanup.
testing gets more and more complicated. Take a look at opencode go, and you see this:
> Includes GLM-5.1, GLM-5, Kimi K2.5, Kimi K2.6, MiMo-V2-Pro, MiMo-V2-Omni, MiMo-V2.5-Pro, MiMo-V2.5, Qwen3.5 Plus, Qwen3.6 Plus, MiniMax M2.5, MiniMax M2.7, DeepSeek V4 Pro, and DeepSeek V4 Flash
And now you're on your own with the bugs all of these models can produce at scale. Am I missing anything in this picture? What is the real use of cheaper models?
JSR_FDED 17 hours ago [-]
I'd argue that you need the model that's good enough, not the best.
Culonavirus 18 hours ago [-]
We're not yet at a point of saturation where all the frontier models are of somewhat comparable "intelligence" and we could decide which to use based on other factors (speed, effective context window, etc.), so I honestly don't see why you (as a company or an employee) would not use the best available model with the highest (or at least second-highest) thinking effort. The fees are not exactly cheap, but not that expensive either.
nyssos 17 hours ago [-]
Agreed that we're not at saturation, but we don't have a canonical "best" either. For example ChatGPT 5.5 + Codex is, in my experience, vastly superior to Opus 4.7 + Claude Code at sufficiently well-specified Haskell, but equally vastly inferior at correctly inferring my intent. Deepseek may well have its own niche, though I haven't used it enough to guess what it might be.
willio58 23 hours ago [-]
I don’t find this with sonnet at all. As long as I have a solid Claude.md and periodically review the output and enforce good code practices via basic CI gates I’ve rarely ever found myself having to switch to opus
2ndorderthought 22 hours ago [-]
You might be surprised then at how well cheaper models solve your problems
mohsen1 17 hours ago [-]
This has been my experience working on tsz.dev. Only Opus 4.7 and GPT 5.5 can really be productive for the remaining test cases.
24 hours ago [-]
sbinnee 9 hours ago [-]
After some time replacing Gemini 3 Flash preview with DeepSeek V4 Flash as a chat model, the biggest difference is the auto reasoning effort. Gemini Flash is super fast and perfect for a chat model. But when I need some thought experiments with a handful of constraints, it struggles a bit and I switch to Sonnet. With DeepSeek V4 Flash, it can do long, complex reasoning and it often gets things right. Generating a lot of reasoning tokens means that it takes a lot of time, of course. But I am happy to find a cheaper model and excited to try something other than Gemini Flash, which has been so good that I was locked on it for a while.
izietto 15 hours ago [-]
Just want to say that I faced this very problem the last week, I discovered OpenCode agent and it works great, with DeepSeek and other models. Try it out guys.
No sub-agents. There's many ways to do this. Spawn Pi instances via tmux, or build your own with extensions, or install a package that does it your way.
No permission popups. Run in a container, or build your own confirmation flow with extensions inline with your environment and security requirements.
No plan mode. Write plans to files, or build it with extensions, or install a package.
No built-in to-dos. Use a TODO.md file, or build your own with extensions.
No background bash. Use tmux. Full observability, direct interaction.
eloisant 6 hours ago [-]
I've tried both, I prefer OpenCode. I get that Pi is more customizable but I prefer a nice, complete out-of-the-box experience and that's what OpenCode provides.
dopeepsreaddocs 22 hours ago [-]
Did... Did you just ask an AI to one-shot something that normally amounts to no more than setting two env variables?
alexdns 1 days ago [-]
Obviously vibe coded (co-authored) + the prices don't even match
2ndorderthought 1 days ago [-]
It's going to be real hard to find headlines that weren't vibe coded from here on out unfortunately.
SchemaLoad 24 hours ago [-]
Unless I actually know the author I assume everything here is vibeslop and full of mistakes.
Maybe I need to switch to some news publication that actually does real research and writing still. Because public forums like this have been completely destroyed by LLMs.
cyanydeez 1 days ago [-]
welp, pack it in boys, it was nice conceptualizing all of you as real humans on the internet. I guess I'll just have to go touch grass if I want to feel parasocial.
dragontamer 24 hours ago [-]
I mean, we have the tech and community to actually build in person meetups and sign CRT certificates, right?
If we touch grass in person and swap certificate requests, we can actually rebuild a trust network.
This is a pretty old problem with regards to clubs / secret societies and whatnot. And with certificates / PKI, our modern security tools have solved all the technical problems.
2ndorderthought 23 hours ago [-]
I wish I could be invited to a secret club of guaranteed humans. Someone hand me a certificate next time you see me! Also don't stab me kthxbye
cyanydeez 23 hours ago [-]
Unfortunately, a lot of what's happening in the tech world seems to be from some super serious AI cults, so I'm not sure going offline like this is any better.
2ndorderthought 23 hours ago [-]
Yea but we could have fun. Play some dnd. Drink tea or whiskey. Eat pizza pie. Light saber battle. Buy a megaphone and hang out at a street corner telling passersby they are perfectly acceptable and worthy of kindness and love
inciampati 23 hours ago [-]
poorly vibe coded. machines can check details easily, use them.
xbmcuser 6 hours ago [-]
With how cheap and fast DeepSeek is, and having used its free chat over the last few days, I just cancelled my Claude subscription. So far I was using the chat interface only, but I might just have to learn how to use the API.
dzink 10 hours ago [-]
Tried DeepSeek V4 Pro and Flash on OpenRouter and they worked fine - Flash might have actually produced a better result, but also the same prompt across different inference providers produced the same result. Then tried DS4 Pro again via tinfoil.sh and got the same design but littered with random Chinese characters in the code. Tinfoil pegs prompt data as private / not trained on. Does anyone know of DS4 providers that are verifiably private and not training on your prompts and outputs?
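One partial answer: OpenRouter accepts per-request provider preferences that exclude providers which collect prompts. A sketch of the payload (the `provider.data_collection` field is from OpenRouter's provider-routing docs; the model slug is an example, and the curl itself is commented out since it needs a real key):

```shell
# Hedged sketch: a per-request OpenRouter payload that declines providers which
# train on / retain prompts. The curl call is commented out — it needs a key.
payload='{
  "model": "deepseek/deepseek-v4-pro",
  "messages": [{"role": "user", "content": "hello"}],
  "provider": {"data_collection": "deny"}
}'
# curl -s https://openrouter.ai/api/v1/chat/completions \
#   -H "Authorization: Bearer $OPENROUTER_API_KEY" \
#   -H "Content-Type: application/json" -d "$payload"
printf '%s\n' "$payload"
```

This still relies on the routed provider's own policy being honest; it's routing, not verification.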
iloveplants 7 hours ago [-]
i'm confused. how does this prove, or even insinuate, that they're training on your data? and if they were, there's no way it would appear that soon
nclin_ 21 hours ago [-]
Is claude code the best coding harness? Anyone running evals on that?
ahmadyan 21 hours ago [-]
In my anecdotal experience, it is not. Same model, opus, works better in 3P harnesses such as Factory Droid or Amp.
Claude Code, on the other hand, is the most subsidized one, both for consumers (through the Max subscription) and for enterprises (token discounts). It is also heavily optimized for cost, especially token caching and reduced thinking, at the expense of quality.
viking123 10 hours ago [-]
codex is way more subsidized currently, much more generous limits even for 20 dollars a month
arendtio 5 hours ago [-]
I always wonder why everybody is so hyped about the cli harnesses. I am quite happy with Cursor (even though I used to do coding with tmux+vim in the old days). One of the features I like very much is that I can switch between different models without having to have an account with every model provider.
The main thing I am missing is having it on all my devices (like using it via the smartphone). They have a solution for that too (the cloud version), but that is too expensive IMHO. The last time I checked out Claude Code, it was too expensive for my taste as well (burning through tokens like there was no tomorrow).
jedisct1 10 hours ago [-]
Ironically, there are plenty of evals showing that it’s not actually that great. Even with Anthropic models, other harnesses are more efficient, both in terms of the number of problems solved and token usage.
Significant regressions also seem to be introduced from time to time after releases.
The UX is great, and if you need a kitchen sink packed with tons of features, even though you’ll probably only end up using a fraction of them, it’s fine.
But if you want something that performs well, you’re better off using something like Opencode or Swival.dev
DeathArrow 16 hours ago [-]
Terminal Bench tests agent harnesses.
The best two are Codex and Forge Code.
However I am using plugins and skills that are only compatible with Claude Code or work best with Claude Code.
So, for me, Claude Code with plugins like claude-meme, Context Mode, Superpowers and Get Shit Done is better than other tools.
I think everyone should test multiple models and multiple agent harnesses for their specific needs, codebase, and way of working.
orliesaurus 1 days ago [-]
Is there a way to do this directly by using claudecode CLI (which I already have installed) and openrouter??
ANTHROPIC_BASE_URL="https://openrouter.ai/api" ANTHROPIC_AUTH_TOKEN="$OPENROUTER_API_KEY" ANTHROPIC_DEFAULT_SONNET_MODEL="deepseek/deepseek-v4-flash" CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 claude
mobeigi 8 hours ago [-]
Been using Claude as a harness to OpenRouter for a while now. It's a nice setup if you don't mind API based billing.
23 hours ago [-]
23 hours ago [-]
gnat 1 days ago [-]
This repo's README explains how it works and you can do it yourself. claude looks for environment variables that say which API endpoint to talk to, which key to pass, which model name to use for haiku/sonnet/opus-level workloads, etc.
21 hours ago [-]
lukaslalinsky 17 hours ago [-]
I've been using DeepSeek v4 pro as an alternative to Claude models and for the first time I can see it as a real replacement. With the other Chinese models, I was missing something, but DeepSeek seems good enough for the kind of development I want to do.
ultrasandwich 10 hours ago [-]
Using this "out-of-the-box" with an OpenRouter subscription using DeepSeekv4, I just blew through 15 dollars in 45 minutes on a moderate sized code base, just making a plan and executing a refactoring of an upload pipeline to use a state machine. Not really seeing the cost savings for real-world work tbh.
jay1996523 17 hours ago [-]
Claude code can already use the DeepSeek API, so what are the advantages of this tool?
999900000999 20 hours ago [-]
I just spent half my day getting CUDA and LLAMA to work with my 5070TI.
I was able to use it in agent mode with Roo, I stopped after having it write out a plan, but I'll continue when I have more time.
Deepseek feels less likely to do a straight up rug pull since you can self host with enough money, but I'm still more excited about local solutions.
Usually I just need grunt work done. I'm not solving difficult problems.
sowild_fun 19 hours ago [-]
Using a bunch of CLIs to work with DeepSeek V4, I've found that Langcli is the best fit for DeepSeek V4. For programming tasks, the cache hit rate is above 95%.
Not only can it seamlessly and dynamically switch between DeepSeek V4 Flash, V4 Pro, and other mainstream models within the same context, but it is also 100% compatible with Claude Code.
sfewfweg 19 hours ago [-]
Langcli + deepseek v4 is very good
18 hours ago [-]
zkmon 13 hours ago [-]
Next claude news (trump style): Recent versions of Claude code no longer allow talking to other models, or helping with any code that has the goal of moving away from anthropic models.
connorwhitlock 11 hours ago [-]
96.4% on LiveCodeBench is impressive but LiveCodeBench is single-shot. The interesting test is multi-turn agentic — has anyone benchmarked DeepSeek V4 Pro vs Opus on SWE-bench Verified or similar where the cheaper model has to be more decisive about tool use over 30+ turns? Curious if there's a cliff at higher tool-call depths.
rib3ye 10 hours ago [-]
How is this different than using ollama to launch Claude with
ollama launch claude --model deepseek-v4-pro:cloud
jlokier 4 hours ago [-]
That should work, but you need an Ollama Cloud account and for much usage you need to pay the Ollama Cloud $20/mo or $100/mo subscription fee.
Using the API from DeepSeek or OpenRouter also requires a fee, but it's a different, pay-as-you-go payment model.
vagab0nd 23 hours ago [-]
This has become a problem for me. I like trying new things. But I also know that in about a week, there's going to be a better/cheaper setup. And a week after that. And ideally I'd like to get some coding done when I'm not tinkering with the tools.
So I think I'll stay with CC for now.
kordlessagain 21 hours ago [-]
CC has the ability to use Ollama as well, which includes the ability for Ollama to proxy to Ollama's cloud models. It's brilliant, and works with a single Ollama command that doesn't mess with CC at all (so you can run them at the same time).
I'm wondering why DeepSeek didn't create an AI coding agent like Kimi Code.
sourcecodeplz 14 hours ago [-]
Think it is because they focus on what they know best. Coding an LLM harness is nothing spectacular.
diamondosas 12 hours ago [-]
I have a question: does anyone have a problem with switching context between the AI and your terminal?
shay1607m 14 hours ago [-]
Interesting setup
do you have any benchmarks on:
- token usage over time
- failures/retry rates
would be great to see how it behaves in production
Copenjin 16 hours ago [-]
I wonder if openrouter will replicate that 120x caching, I suppose they will?
dbeley 20 hours ago [-]
Honestly with the likes of Opencode / pi / hermes I don't really find the "Claude Code agent loop" part particularly interesting.
The edge Anthropic has on others lies in its models' performance. CLI tooling (and obviously pricing) is definitely not better than others'.
danny_codes 20 hours ago [-]
Except the model isn't particularly better anymore, as compared to the newest wave of FOSS models
DeathArrow 17 hours ago [-]
You don't need Deep Claude. Claude Code is working with any model that exposes an endpoint for an Anthropic compatible API.
I am using Claude Code with GLM 5.1, MiniMax M2.7, Kimi K2.6 and Xiaomi MiMo V2.5 Pro.
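The mechanism is just environment variables Claude Code reads. A sketch (the URL and model slug here are placeholders — each vendor documents its own Anthropic-compatible endpoint; the `claude` invocation is commented out for illustration):

```shell
# Hedged sketch: the env vars Claude Code reads to target an Anthropic-compatible
# endpoint. URL and model slug are placeholders, not real vendor values.
export ANTHROPIC_BASE_URL="https://api.example-vendor.com/anthropic"
export ANTHROPIC_AUTH_TOKEN="$VENDOR_API_KEY"
export ANTHROPIC_DEFAULT_SONNET_MODEL="vendor/some-model"
# claude
```

Unset the variables (or use a wrapper script) to switch back to Anthropic's own endpoint.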
Lihh27 23 hours ago [-]
the wrapper is basically env var glue. You’re still betting the whole loop on Anthropic's closed client.
game_the0ry 23 hours ago [-]
Cost engineering [1] will be the next hot topic for AI.
[1] A fancier way of saying "reducing cost."
tgautot 15 hours ago [-]
Nice, it's quite useful to have a project like this which streamlines the setup necessary to use other "brains" in the Claude Code "body". I personally will give this a try, but I just find the message on pricing a bit disingenuous: the DeepSeek price of "$0.87/M output tokens" is a discount, and this setup anyway needs a claude.ai subscription offering Claude Code, which is now $100/month minimum.
itrunsdoomguy 14 hours ago [-]
Does it play Doom?
triyambakam 19 hours ago [-]
And if I don't care about cost, what about actual performance?
akartit 13 hours ago [-]
why not opencode with deepseek?
Tanxsinxlnx 15 hours ago [-]
does it support aws bedrock provider
karel-3d 14 hours ago [-]
Can I... somehow run this locally? DeepSeek is opensource? Do I even need their API key?
(I have no experience with running anything locally, maybe it's a stupid question)
zozbot234 14 hours ago [-]
Waiting for official support in llama.cpp. There is a fork that can run a lightly quantized (Q2 expert layers) DeepSeek V4 Flash in 128GB RAM without having to stream weights from disk.
karel-3d 13 hours ago [-]
Ouch. Can't run that on my M4 mac with 48GB RAM.
dukeofdoom 20 hours ago [-]
Is there some way to make Claude/Codex beep when it finishes a task?
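The simplest version is just chaining a terminal bell after the CLI exits. A sketch (`long_task` is a stand-in so the snippet runs anywhere; substitute your actual `claude -p "..."` or `codex exec` invocation):

```shell
# Hedged sketch: ring the terminal bell when the command finishes — works for
# claude, codex, or any command. long_task is a stand-in for the real CLI call.
long_task() { sleep 0.1; }                # pretend this is `claude -p "..."`
long_task; printf '\a'                    # \a rings the terminal bell on finish
```

Claude Code also has a hooks mechanism in its settings that can run an arbitrary command when a turn ends, if you want something fancier than the bell.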
0xjeffro 15 hours ago [-]
[dead]
esafak 1 days ago [-]
Why wouldn't you use something open source like OpenCode, which already supports DSv4 and has more features than CC?
CharlesW 23 hours ago [-]
Coding harnesses make a big difference, and OpenCode is notably less effective than Claude Code (1) in my experience, (2) with the models I've tried it on. (I've not yet tried it with DSv4.)
dlx 24 hours ago [-]
As someone who does use other models with CC, I am curious about opencode, what extra features does it have that you find essential?
esafak 24 hours ago [-]
I like being able to add a wide array of models, define perms for agents and subagents, turn MCPs on and off at will, and be able to fix bugs I find in it.
dlx 23 hours ago [-]
fair enough...any drawbacks that you've found?
esafak 23 hours ago [-]
Its UI isn't as slick, and it has bugs, but so does CC and you can submit a PR to have them fixed in OC.
DeathArrow 16 hours ago [-]
If using something open source, I'd say Forge Code has better results than Open Code, at least according to Terminal Bench.
ttoinou 1 days ago [-]
More features than CC ?
Also, OpenCode tracks you by default. It's not safe: every first prompt you send is routed through their servers and logged, and they can use your data however they want.
sedawkgrep 24 hours ago [-]
I thought this was debunked a while ago?
esafak 24 hours ago [-]
I could not find any evidence of prompt logging. The code is open; can you point me to it?
ttoinou 12 hours ago [-]
When you send your first message and opencode generates a title automatically, it uses the free opencode/gpt-5-nano model from the OpenCode Zen provider at https://opencode.ai/zen/v1
Ask AI to look at the code for you
We don't have any proof they're not logging everything, by default we should assume they are
OpenCode Zen is their own service provider; they see your prompt the same way OpenAI and Anthropic do. Somebody has to see your code, unless you host your models yourself.
ttoinou 4 hours ago [-]
So, by default, they read our prompts. What else is hiding ?
1 days ago [-]
portsentinel 20 hours ago [-]
I am now wondering how far agentic AI can go and how much we can achieve.
fHr 22 hours ago [-]
layer on layer on layer to refactor bunch of lines xD
23 hours ago [-]
2ndorderthought 1 days ago [-]
Oh shoot now the next CC upgrade will blow your subscription for doing this
morpheos137 24 hours ago [-]
anthropic messed up big time. The harness works with any muh commodity LLM; meanwhile VCs were duped on the myth of FOOM AGI. Probably not a coincidence Anthropic is enmeshed with the sci-fi fan fic forum known as LessWrong. The world wants useful tools. The Bay Area bubble, in contrast, thrives on Mythos.
hgyyy 23 hours ago [-]
I think OAI and Anthropic will be ok for a year or two. But after that, if they still continue to earn revenues from selling tokens to firms/software engineers, they will be in serious trouble.
The American firms are not demonstrating escape velocity, and as long as China offers something somewhat comparable at a very low price to compensate for any difference in quality, they will not be generating enough in cash flows to finance reinvestment. I highly doubt they'll be able to continue raising external financing for numerous periods from here on out - they gotta start showing strong financials and that they are running away from the open source models.
LeFantome 22 hours ago [-]
The performance gap will likely close as Chinese hardware improves. This is happening very rapidly.
Already DeepSeek v4 is being hosted on Huawei Ascend 950. What do you think those cost relative to NVIDIA gear?
morpheos137 22 hours ago [-]
I wouldn't put it past the US gov to ban foreign models. They tried to ban TikTok. What is being demonstrated here is that Silicon Valley cannot withstand a competitive market.
LeFantome 22 hours ago [-]
Good luck banning Open Source models.
Not only that but other countries are very unlikely to follow suit, so it is just a straight-up productivity tax on the US.
morpheos137 21 hours ago [-]
Yeah, see the Nvidia/China US gov self-own. The assumption seems to be that 1.4 billion people in a middle-income country are dependent on 300 million for tech.
bwfan123 21 hours ago [-]
> anthropic messed up big time harness works with any muh commodity LLM
that surprised me too. The intelligence is at the client, and by making that open, anthropic has commoditized the coding agent.
ManuelSuarez 8 hours ago [-]
I mean, it's not like claude code is the most impressive agent of the pack.
eleion_ai 5 hours ago [-]
[flagged]
JosefAlbers 4 hours ago [-]
[dead]
maxothex 7 hours ago [-]
[flagged]
nikhilpareek13 11 hours ago [-]
[flagged]
vorsken 10 hours ago [-]
[flagged]
claud_ia 13 hours ago [-]
[dead]
alattaran 1 days ago [-]
[flagged]
dividendflow 14 hours ago [-]
[flagged]
kk_mors 20 hours ago [-]
[dead]
aliljet 19 hours ago [-]
[dead]
11 hours ago [-]
volume_tech 23 hours ago [-]
[flagged]
deadbabe 23 hours ago [-]
I had a call with our CTO and we are pivoting away from Claude Code to DeepClaude because the cost savings are too substantial to ignore.
HN is an American site. If you look at the US government, it is going to fearmonger about anything China related, because they haven't had a genuine competitor for decades and they're scared and lashing out. Most US news just parrot the government line, sometimes more so than state TV would, and so it reflects here.
I also feel comfortable saying that many Americans don't care one bit what happens to foreigners, be it by action of their government or companies.
This is true. There are also many of us who do care.
This brings to mind something I heard recently about the so-called "Rule of 10". There will always be 3 people who support you, 3 people who are against you, and 4 people who have no idea what's going on and don't care.
Don't just focus on the 3 people who are being negative.
People can have problems with America and I'm fine with that. But pretending China isn't subsidizing industry (land, education, transportation) in a predatory fashion is silly. Too many companies have gone out of business because of it. We can all have our friends in China without pretending the CCP is playing the ballgame fairly. The government doesn't need to point it out. That doesn't even get into influence operations (which are especially easy on platforms like this.)
Seriously - there may be a day in the future where Western nations and China get along but it really can't/won't happen while it's holding all the industry and trying to take the Services income as well.
But tell me more.
That doesn't mean China should not be criticized. But to me it's clear that the China blame game is not about a genuine concern for Chinese people or its neighbors, it's about trying to keep it down because China should never dared to rise in the first place.
Anglo Saxons and maybe the French should be in charge and the rest should be resource colonies. It very much feels like that Western mentality is still there.
Agreed, the US definitely needs to do some introspection to sort out its own shit (and stop spraying it on everyone else).
However, that does not mean that China gets a pass. Fundamentally, the Chinese model of governance does not protect the individual. For all its faults, the US model is based upon the idea of individual liberty, which acts as a touchstone and allows it to self-correct whenever it goes too far in the wrong direction. That's something the Chinese model does not do, and it means that, short of a revolution, it will continue to be an authoritarian state with all of the malignant features that entails.
Look, I'm not here to defend the Chinese model, but I find it interesting how convinced you seem that individualism is the right model for everyone.
While I would generally agree with you, I have spoken to many from poorer countries who say that they prefer to trade some individualism for a steady hand of economic development and lifting the population from poverty. That is the Chinese model.
These people would argue that they can reclaim more and more individual freedom as the country gets richer and more self confident.
I am not saying they are right, but looking at a nominal democracy like India and a nominal autocracy like China, I know which government works better as far as raising the living standards of its population and it's not the Indian one.
My hope is that China will continue to liberalize on its own. Forcing it will likely only reverse the gains.
Individualism also leads to the sort of healthcare system the US had or Skid Row. So it's not all roses.
What's the point of this kind of statement for you? Does it help you understand others, or just drive the wedge in further? Where are you from? Ask yourself whether the statement
"many {of my country} don't care one bit what happens to foreigners, be it by action of the government or companies" can't be read as true.
There are self-absorbed, disinterested, uncompassionate people in every country which will satisfy your "many" qualifier.
However looking at the polls in the US gives you a fairly decent idea that there's a decent chunk of people that seem to get off on violence towards non-Americans. Why do you think ICE went with the violent tactics it did?
As to
> What's the point of this kind of statement for you? Does this help you understand others or just continue to drive the wedge in?
The point is to maybe make some Americans ask what it is that they can do to reform the government they have the most direct influence over (their own) instead of trying to reassure themselves that theirs is still better than country's X.
----
clauded() {
  ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic \
  ANTHROPIC_AUTH_TOKEN="$DEEPSEEK_API_KEY" \
  ANTHROPIC_MODEL='deepseek-v4-pro[1m]' \
  ANTHROPIC_DEFAULT_OPUS_MODEL='deepseek-v4-pro[1m]' \
  ANTHROPIC_DEFAULT_SONNET_MODEL='deepseek-v4-pro[1m]' \
  ANTHROPIC_DEFAULT_HAIKU_MODEL=deepseek-v4-flash \
  CLAUDE_CODE_SUBAGENT_MODEL=deepseek-v4-flash \
  CLAUDE_CODE_EFFORT_LEVEL=max \
  claude "$@"
}
----
I doubt that the opus, sonnet, and haiku model args actually matter; you can probably omit them.
I run this on a VPS that has no other credentials or project access so I can give it the skip permissions arg.
Also, the author checked in their apparently effective social media advertising plan: https://github.com/aattaran/deepclaude/commit/a90a399682defc... (which seems to be working)
If it's something that'll be irrelevant in 3 months why should anyone care about it?
All I'm trying to say that generalizing like suggested might exclude some useful things.
$ ollama launch claude --model qwen3.5 [1]
[1] https://docs.ollama.com/integrations/claude-code
Is the Flash version on the level of GPT 5.4 mini?
PS. Mentioning Amp because I used to use it and I pay directly for tokens. I topped up $5, so I'm going to use it and see how far it can take me. But my impression so far is that even once model subsidization is done, those open source models are quite viable alternatives.
My understanding is that DeepSeek V4 Pro is going to be uniquely good at working on consumer platforms with SSD offload, due to its extremely lean KV cache. Even if you only have a slow consumer platform, you should be able to just let it grind on a huge batch of tasks in parallel entirely unattended, and wake up later to a finished job.
AIUI, people are even experimenting with offloading the KV cache itself to storage, which may unlock this batching capability even beyond physical RAM limits as contexts grow. (This used to be considered a bad idea with bulky KV caches, due to concerns about wearout and performance, but the much leaner KV cache of DeepSeek V4 changes the picture quite radically.)
AI currently works basically by sending your entire codebase, workflow, and internal communication over the internet to some third-party provider, and your only protection is some legal document saying they pinky promise they won't train on your data.
And said promise is made by people whose entire business model relies on being able to slurp up all the licensed content on the internet and ignore said licensing, on the defense of being too big to fail.
> AIUI, people are even experimenting with offloading the KV cache itself to storage, which may unlock this batching capability even beyond physical RAM limits as contexts grow.
Especially this point. Any reason this idea was considered bad? Is it due to the speed difference between GPU VRAM and RAM?
> Any reason that this idea was considered bad?
Because the KV cache was too big, even for a small context. This is still an issue with open models other than DeepSeek V4, though to a somewhat smaller extent than used to be the case. But the tiny KV of DeepSeek V4 is genuinely new.
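Some back-of-envelope arithmetic shows why an MLA-style compressed KV cache changes the picture. The figures below are DeepSeek-V3-era numbers used as stand-ins (61 layers, 128 heads of dim 128, a ~576-dim latent per layer); the actual V4 dimensions aren't given here, so treat this as illustrative only.

```python
# Rough per-token KV-cache sizing: standard multi-head attention (K and V
# stored per head) vs an MLA-style compressed latent (one shared vector per
# layer). All dimensions are assumed DeepSeek-V3-era stand-ins.
BYTES = 2  # bf16

layers, heads, head_dim = 61, 128, 128
latent_dim = 512 + 64  # compressed KV latent + decoupled RoPE key dims

mha_per_token = layers * 2 * heads * head_dim * BYTES  # K and V, every head
mla_per_token = layers * latent_dim * BYTES            # one latent per layer

print(f"MHA: {mha_per_token / 1e6:.2f} MB/token")
print(f"MLA: {mla_per_token / 1e6:.3f} MB/token")
print(f"128k-token context, MLA: {mla_per_token * 131072 / 1e9:.1f} GB")
```

Roughly two orders of magnitude less cache per token is what makes offloading it to storage, or batching many long contexts, start to look sane.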
The token cost makes it tempting to use for token-heavy tasks like this
Model inference was never subsidized. Inference is highly profitable with today's prices. That's why you have many inference providers. My guess, the prices for inference will go down, as more competition starts cutting the margin.
It's model training, development and R&D that cost a lot, and companies creating closed models don't have any business model except astroturfing and trying to recover training costs through overpriced inference.
I have been wanting to try CC with different models since Opus went downhill last month..
What limitations or issues have you noticed when using DeepSeek with Claude Code if any?
https://api-docs.deepseek.com/quick_start/agent_integrations...
Also the author checked in their advertising plan: https://github.com/aattaran/deepclaude/commit/a90a399682defc...
I don't see why anyone would want to use it for any other model than Claude instead of OpenCode or Pi.
[1] https://api-docs.deepseek.com/guides/anthropic_api
But I guess Anthropic is just not capable of implementing the OpenAI API compatible client in Claude Code.
Why would you keep using CC CLI if you want to use the much cheaper DeepSeek v4 models (Flash and Pro): isn't it the opportunity to kiss CC CLI goodbye and use something not controlled by Anthropic?
Anyone here successfully moved from CC CLI to a fully open-source project? I'm asking this as a Claude Code CLI (Sonnet/Opus) user. My "stack" is all open-source: from Linux to Emacs to what-have-you. I'd rather also have open-weight models and a fully open-source (not controlled by a single company) AI CLI.
Any suggestion for something that works well? (by "well" I mean "as well as Claude Code CLI", which is not a panacea so my bar ain't the end of the world either).
It's proven useful for me, and I figure others might appreciate how light of a shim it is between you and the models.
https://islo.dev/
https://www.incredibuild.com/blog/why-we-built-islo-ai-codin...
i'll have to look more at islo, I definitely think its a growing space with alot of opportunity for those that participate and solve problems.
I personally didn't find it to be competitve with Claude Code as a harness. Can I ask how you modified it to perform better?
- Claude-style subagents
- an MCP layer for higher-level tools
- Cursor-style control plane modes like Ask, Plan, Debug, and Build
The MCP layer lets the harness use things like GitHub file/code read, PR creation, web search/fetch, structured user questions, plan-mode switching, user skills, and subagents.
So the improvement is mostly from better ui/ux orchestration and tool access. There's some things from hermes that are interesting as well.
Most of my focus has been on applying this stack to sandboxed cloud agents so you can properly code and work from mobile devices.
I can't definitively say that the stack is better or worse than Claude code, more just tuned for my use case I guess.
Personally I use it for the TUI, it's way better than Claude Code's one.
The main pieces I've integrated for mouse.dev inspired by claude/cursor was plan mode, agent questions, subagents, pre/post hooks, context compaction, repo-local skills, and permission modes. So mostly tools like enter_plan_mode, ask_user_question, and spawn_subagent, plus .mouse/skills and .mouse/plans.
One nice feature is continuity. If you’re working on desktop and save a plan to .mouse/plans, you can pick it up later on mobile with cloud agents, or do the reverse. You can plan something from your phone, then when you’re back at your desk, review it/build it. That was my initial goal with this project because I've found the plan act loop so helpful.
Mouse Cloud Agents is mostly an OpenCode-based harness, but everything routes through our MCP/event system so it’s mobile-first and provider-agnostic.
I intentionally skipped a lot of IDE and Claude Code style desktop features. The bet is that this new style of coding is becoming less “edit files in an IDE” and more steer a capable coding chatbot.
Would love to hear from anyone reading that's iterating on harness architecture, it's been really fun to work on.
While those are nice, Claude Code has the largest amount of plugins and skills I want to use.
Looked into this one. Thought it was suspicious that it only had 7 open issues on github. Turns out they have a bot that auto-closes every single issue just because.
I honestly have no words.
> Maintainers review auto-closed issues daily and reopen worthwhile ones. Issues that do not meet the quality bar below will not be reopened or receive a reply.
Seems like not an unreasonable way to deal with the problem of large numbers of low quality issues being submitted.
I quite like pi and learned about the contribution guidelines a while after using it. Hard to complain about people making software for free using a process that works for them.
I will say having a project with a slim issue tracker that only contains things the maintainers have blessed (and thus presumably are more likely to get worked on) is pretty nice.
If you're googling for a bug you're hitting and come across an auto-closed issue, you know you have to submit a higher-quality issue to get it looked at, rather than just +1'ing the existing lacking issue.
It's crap either way.
Like, if they are going to sort through all the issues eventually (as they claim), why not just close the unworthy ones when they get to them, instead of closing all by default?
Is it just so that the project doesn't show open issues on its GitHub page? Because they are open issues in reality, given the maintainer will eventually go through them.
Nothing is "unreasonable" in the sense that an open source project should have the right to do what it wants with its rules, but it's definitely a weird stance.
It is a guardrail against burnout and tracker spam.
It's based on their implied perspective that the majority of submissions don't follow those guidelines, which helps determine their quality threshold.
https://github.com/badlogic/pi-mono/blob/main/CONTRIBUTING.m...
If all open issues are actionable items, that makes expected workload a lot easier to handle.
If most open issues are actually in "needs triage / needs review" state, you lose the signal from the noise.
The issue tracker for a project exists primarily as a tool for maintainers, not for outsiders. Yes, the maintainers could change their workflow to create a new view that only shows triaged tickets.
Or, they could ensure the default 'open' view serves their needs.
https://github.com/badlogic/pi-mono/blob/main/CONTRIBUTING.m...
- https://news.ycombinator.com/item?id=46930961 - https://github.com/mitchellh/vouch
This is a heavily subsidized price and will only last until the end of the month: "The deepseek-v4-pro model is currently offered at a 75% discount, extended until 2026/05/31 15:59 UTC." [0]
The "supported backends" table is also deceiving: while OpenRouter's servers may be in the US, the only way to get the $0.44/$0.87 pricing is to pass through to the DeepSeek API, which of course is China-based. [1]
I do think the model is quite good, I myself use it through Ollama Cloud for simple tasks. But I think some folks have bought in a little too much to the marketing hype around it.
[0] https://api-docs.deepseek.com/quick_start/pricing [1] https://openrouter.ai/deepseek/deepseek-v4-pro/providers
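The discount math is easy to sanity-check against the $0.87/M output-token figure quoted elsewhere in the thread:

```python
# A 75% discount off the post-promo price should reproduce the promo price.
full_price = 3.48  # USD per M output tokens after 2026/05/31 (per the thread)
promo = full_price * (1 - 0.75)
print(f"${promo:.2f}/M output tokens")
```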
From what I see while building my own agentic system in Elixir, the problem is in training for your specific harness/contracts. Claude/GPT-style models seem to be trained around very specific contracts used by the harness like tool call formats, planning structure, patching, reading files, recovering from errors, and knowing when to stop.
In practice, you either need a very strong general model that can infer and follow those contracts (expensive), or a weaker model that has been fine-tuned / trained specifically on your own agent contracts. Otherwise, the whole thing becomes flaky very quickly. And I suspect that with DeepSeek V4 you may end up in the latter situation.
It's an Elixir agent runtime with a thin Go TUI (Bubble Tea). I'm building it mostly to explore agent orchestration: planner/workers/finalizer flows, local file/code-edit tools, MCP tools, permission gates, run context, compaction, and eventually larger swarms. Erlang/Elixir is interesting for this because the actor/supervision model maps pretty naturally to lots of isolated agents and long-running supervised tasks.
As I said, the main lesson so far is that everything around contracts is much more fragile than I expected unless you use a very strong model. Planners return Markdown instead of JSON, tools get called with subtly wrong args, subagents repeat broken tool calls, finalizers lie about success after workers failed. And various permissions may be interpreted by agents in unexpected ways.
I also started with too many modes too early instead of making the agentic path extremely solid. That made me understand better why these codebases become huge: there are endless corner cases if you want a harness to work across models, providers, tools...
Stronger models hide a lot of harness weakness, and weaker models expose it. Making weaker models good enough requires a surprising amount of contract hardening. But that hardening tends to make the system better for stronger models too.
Also, the Elixir HTTP stack was causing a lot of problems (I eventually needed to use gun).
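The contract-hardening idea above can be sketched in a few lines, shown here in Python rather than Elixir for brevity: validate a model's tool call against a declared schema before executing it, and return a precise error string the model can repair from. The tool names, schemas, and error messages are invented for illustration.

```python
import json

# Hypothetical tool contracts: each tool declares its required args and types.
TOOL_SCHEMAS = {
    "read_file": {"required": {"path": str}},
    "apply_patch": {"required": {"path": str, "diff": str}},
}

def validate_call(raw: str):
    """Return ((tool, args), None) on success, or (None, error_for_model)."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None, "not valid JSON -- reply with a single JSON object"
    if not isinstance(call, dict):
        return None, "expected a JSON object"
    schema = TOOL_SCHEMAS.get(call.get("tool"))
    if schema is None:
        return None, f"unknown tool {call.get('tool')!r}"
    args = call.get("args")
    if not isinstance(args, dict):
        return None, '"args" must be a JSON object'
    for name, typ in schema["required"].items():
        if not isinstance(args.get(name), typ):
            return None, f"argument {name!r} missing or wrong type"
    return (call["tool"], args), None

ok, err = validate_call('{"tool": "read_file", "args": {"path": "mix.exs"}}')
print(ok, err)
```

Feeding the error string back as the tool result is what gives weaker models a fighting chance to self-correct instead of repeating the same broken call.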
I finally got fed up and started using GPT 5.5 the past 4 days, and it's a breath of fresh air despite feeling much more minimal. With Claude I had to write so many hooks to enforce behaviors it wouldn't remember and lacked common sense about. GPT 5.5 does a much better job with things like knowing the AWS CDK CLI can hang on long CloudFormation deployments and that it should actively check the deployment status via the CloudFormation API rather than hanging for 30+ minutes - and it does all this without asking.
Maybe there's better tooling built into Codex too, but at least on the surface level it seems like how smart the model is makes a significant difference because Claude has more tools than I can count and still struggles to use "grep".
Edit: Like just now - I can't tell you how many times a day I see this sequence:
"Sorry, I'll run in parallel"
"Error editing file"
"File must be read first"
Repeat 10x for the 10 subagents Claude spawned and then it gets stuck until you press escape and it says "You rejected the parallel agents. Running directly now"
"To improve and develop the Services and to train and improve our technology, such as our machine learning models and algorithms. Including by monitoring interactions and usage across your devices, analyzing how people are using it, and training and improving our technology."
https://cdn.deepseek.com/policies/en-US/deepseek-privacy-pol...
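If that clause is a dealbreaker, the OpenRouter routing controls mentioned upthread can exclude providers that collect or train on prompts at the request level. A minimal sketch; the `provider` routing object follows OpenRouter's documented request format, but verify the field names against current docs before relying on them.

```python
import json
import os
import urllib.request

# Per-request provider preferences: ask OpenRouter to route only to
# providers that don't collect/train on prompt data.
payload = {
    "model": "deepseek/deepseek-v4-pro",
    "messages": [{"role": "user", "content": "hello"}],
    "provider": {"data_collection": "deny"},
}
req = urllib.request.Request(
    "https://openrouter.ai/api/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)
# resp = urllib.request.urlopen(req)  # uncomment to actually send
```

The trade-off noted elsewhere in the thread still applies: denying data collection may exclude the cheapest (first-party) route for a given model.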
If you look at the terminal-bench@2.0 leaderboard, you'll quickly see it's actually one of the weakest agentic harnesses. Anthropic's own models score lower with Claude Code than with virtually any other harness.
So it's quite the opposite. Claude Code is arguably the worst harness to run models with.
https://debugml.github.io/cheating-agents/#sneaking-the-answ...
Yes and this is a temporary discount which increases to 3.48 USD on 2026/05/31 15:59 UTC.
Source: https://api-docs.deepseek.com/quick_start/pricing
I don't use the Claude Code harness; grep without combining it with vector search is super expensive, and I'm not sure how their read-file implementation works. I built my own harness, for example, that restricts reads and writes in a token-efficient manner. Building your own harness will always be the cheapest option in the long run.
My own harness, minimalistic GUI gets the job done nothing too fancy https://slidebits.com/isogen
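A token-frugal read tool of the kind described can be sketched in a few lines: return only a numbered window of a file rather than the whole thing, so the model pays for exactly the lines it asked about. `read_window` is a hypothetical helper, not the author's actual harness code.

```python
def read_window(path: str, start: int, count: int = 40) -> str:
    """Return `count` lines of `path` starting at 1-based line `start`,
    each prefixed with its line number so the model can cite locations."""
    with open(path) as f:
        lines = f.readlines()
    window = lines[start - 1 : start - 1 + count]
    return "".join(f"{start + i}: {line}" for i, line in enumerate(window))
```

A matching write tool would accept (path, start, end, replacement), so edits round-trip through the same line numbers.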
I could see a serious cost reduction story by using opus for design and deepseek for implementation.
Personally I would avoid anthropic entirely. But I get why people don't.
Yeah so would I, I do miss having vision tools sadly.
Probably wasn't clear enough if you don't know what that is already, apologies
It's an Asus Ascent GX10, which is a little mini PC with 128GB of LPDDR5X as shared memory for an Nvidia GB10 "Blackwell" (kind of, it's a long story) GPU and a MediaTek ARM CPU
could you tell me the long story?
edit: or wait, is it quasi-Blackwell the way all DGX Sparks are quasi-Blackwell? like the actual silicon is different but it's sorta Blackwell-shaped?
The promise of this chip was “write your code locally, then deploy to the same architecture in the data centre!”
Which is nonsense, because the GB10 is better described as “Hopper with Blackwell characteristics” IMO.
Still great hardware, especially for the price and learning. But we are only just starting to get the kernels written to take advantage of it, and mma.sync is sad compared to tcgen05
All doable, but all vaguely squishy and nuanced problems operationally. Kinda like harness design in general.
Once you've found the path, patches are trivial and the savings are tiny unless you're doing refactoring/cleanup.
testing gets more and more complicated. Take a look at opencode go, and you see this:
>Includes GLM-5.1, GLM-5, Kimi K2.5, Kimi K2.6, MiMo-V2-Pro, MiMo-V2-Omni, MiMo->V2.5-Pro, MiMo-V2.5, Qwen3.5 Plus, Qwen3.6 Plus, MiniMax M2.5, MiniMax M2.7, >DeepSeek V4 Pro, and DeepSeek V4 Flash
and now you're on your own with the bugs all of these models can produce at scale. Am I missing anything in this picture? What is the real use of cheaper models?
No sub-agents. There's many ways to do this. Spawn Pi instances via tmux, or build your own with extensions, or install a package that does it your way.
No permission popups. Run in a container, or build your own confirmation flow with extensions inline with your environment and security requirements.
No plan mode. Write plans to files, or build it with extensions, or install a package.
No built-in to-dos. Use a TODO.md file, or build your own with extensions.
No background bash. Use tmux. Full observability, direct interaction.
Maybe I need to switch to some news publication that actually does real research and writing still. Because public forums like this have been completely destroyed by LLMs.
If we touch grass in person and swap certificate requests, we can actually rebuild a trust network.
This is a pretty old problem with regards to clubs / secret societies and whatnot. And with certificates / PKI, our modern security tools have solved all the technical problems.
Claude Code, on the other hand, is the most subsidized one, both for consumers (through the Max subscription) and for enterprises (token discounts). It is also heavily optimized for cost, especially token caching and reduced thinking, at the expense of quality.
The main thing I am missing is having it on all my devices (like using it via the smartphone). They have a solution for that too (the cloud version), but that is too expensive IMHO. The last time I checked out Claude Code, it was too expensive for my taste as well (burning through tokens like there was no tomorrow).
Significant regressions also seem to be introduced from time to time after releases.
The UX is great, and if you need a kitchen sink packed with tons of features, even though you’ll probably only end up using a fraction of them, it’s fine.
But if you want something that performs well, you’re better off using something like Opencode or Swival.dev
The best two are Codex and Forge Code.
However I am using plugins and skills that are only compatible with Claude Code or work best with Claude Code.
So, for me, Claude Code with plugins like claude-meme, Context Mode, Superpowers and Get Shit Done is better than other tools.
I think everyone should test multiple models and multiple agent harnesses for their specific needs, codebase, and way of working.
https://api-docs.deepseek.com/quick_start/agent_integrations...
I was able to use it in agent mode with Roo, I stopped after having it write out a plan, but I'll continue when I have more time.
Deepseek feels less likely to do a straight up rug pull since you can self host with enough money, but I'm still more excited about local solutions.
Usually I just need grunt work done. I'm not solving difficult problems.
Not only can it seamlessly and dynamically switch between DeepSeek V4 Flash, V4 Pro, and other mainstream models within the same context, but it is also 100% compatible with Claude Code.
ollama launch claude --model deepseek-v4-pro:cloud
Using the API from DeepSeek or OpenRouter also requires a fee, but it's a different, pay-as-you-go payment model.
So I think I'll stay with CC for now.
If you are interested, I've built an agentic terminal that helps manage these types of things better: https://deepbluedynamics.com/hyperia
Do you have any benchmarks on:
- token usage over time
- failure/retry rates

Would be great to see how it behaves in production.
The edge Anthropic has on others lies on its models performance. CLI tooling (and obviously pricing) is definitely not better than others.