While I'm still personally skeptical about the ability of these tools to produce a GOOD software engineer, it's something I should probably test in a limited capacity.

I’ve noticed DeepSeek has a few integrations, both official and hobbyist, with coding tools like Claude Code. Plus, I’d rather not pay £20/mo for any of this stuff, let alone to any AI company NOT linked to the CPC.

I might consider a locally hosted model, but the upfront cost of anything that can run one decently fast at high parameter counts is quite prohibitive. My home server isn’t really set up for good cooling!

  • piccolo [any]@hexbear.net · 2 months ago

    I’ve been using and enjoying Zed. It’s like Cursor, but open source and written from scratch in Rust instead of being a VS Code fork. I highly recommend checking it out to see what coding agents are capable of. Zed recommends Claude as the model, and in my experience Claude does work the best, but other models work quite well too.

    Zed, in my opinion, is the right way to do coding agents. The agent lives in a chat panel inside the editor, and it can whip out an entire project from the ground up, but you still feel like you’re in the driver’s seat: it shows you diffs of what it changed, and you can approve or reject them easily. It feels natural to use, and it’s the best way I’ve found to still actually be a software engineer while using the tool.

    I don’t really like DeepSeek as a coding model (though I do like it in a no-tools chat context), but GLM 4.6 (another open source Chinese model, made by a company called z.ai) has been very good imo and it works great in Zed. You can use it through OpenRouter or the official z.ai API. If you just buy tokens directly, it costs pennies and you only pay for what you use. Also, z.ai has a very cheap monthly plan for like $3 USD with a lot of usage.
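
    To make that concrete, here’s a minimal sketch of calling GLM 4.6 through OpenRouter’s OpenAI-compatible endpoint. The `z-ai/glm-4.6` model slug is from memory, so check openrouter.ai/models for the current ID:

    ```python
    # Minimal sketch: GLM 4.6 via OpenRouter's OpenAI-compatible API.
    # The "z-ai/glm-4.6" slug is an assumption; check openrouter.ai/models.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )

    resp = client.chat.completions.create(
        model="z-ai/glm-4.6",
        messages=[{"role": "user", "content": "Write a Rust function that reverses a string."}],
    )
    print(resp.choices[0].message.content)
    ```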

    Though, imo OpenRouter makes the most sense for trying out LLMs because you can just put a few dollars into their service and use it for GLM 4.6 or DeepSeek-V3.2 or Qwen-Coder or Claude or any other model and see which ones you like the most. Also, new models come out of China very regularly and are often quite good (Minimax-M2 just came out recently and seems super promising), and if your money is in OpenRouter you can just try it out easily. Plus, if you ever, at any point, put $10 into an OpenRouter account, you get 1000 free messages/day across their free models forever, which is the most generous “free” tier I’ve found. And then you can use those credits to play around with LLMs as coding agents. I put $10 in my account a few months ago and still haven’t run out with reasonable usage as a coding agent.
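
    As a sketch of what that shopping around looks like: you can point the same prompt at several models on one balance and compare the answers. The model slugs here are illustrative; check the site for the current IDs:

    ```python
    # Sketch: compare several models on one prompt via OpenRouter.
    # Model slugs are illustrative and may lag behind the site's current IDs.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )

    CANDIDATES = [
        "z-ai/glm-4.6",
        "deepseek/deepseek-chat",  # DeepSeek's main chat model
        "qwen/qwen3-coder",
    ]
    prompt = "Explain what Rust borrow checker error E0502 means, briefly."

    for model in CANDIDATES:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        print(f"--- {model} ---\n{resp.choices[0].message.content}\n")
    ```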

    One caveat about OpenRouter is that you should set it up to prioritize the first-party APIs under the hood (e.g. if you want to use GLM 4.6, you can configure OpenRouter to use the official z.ai servers). This is because many providers quantize the models, which degrades quality, and it also means a third party gets your data rather than the (possibly CPC-affiliated) company that actually makes the model.
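
    Per request, that pinning looks roughly like this with the openai SDK (the `z-ai` provider slug is my guess; OpenRouter’s provider list has the exact names, and you can also set a default in the account dashboard):

    ```python
    # Sketch: pin GLM 4.6 to the first-party z.ai servers via OpenRouter's
    # provider routing. The "z-ai" provider slug is an assumption; check
    # the provider list on openrouter.ai for the exact name.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )

    resp = client.chat.completions.create(
        model="z-ai/glm-4.6",
        messages=[{"role": "user", "content": "Hello!"}],
        extra_body={
            "provider": {
                "order": ["z-ai"],         # try the first-party host first
                "allow_fallbacks": False,  # fail instead of silently rerouting
            }
        },
    )
    print(resp.choices[0].message.content)
    ```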

    IMO, local models don’t make sense for the average person. A computer capable of running DeepSeek-V3.2 at full precision would cost well over $50k (iirc). Of course, you can’t be sure your data isn’t being mined without running it locally, but I’m just writing open source software for fun, so I don’t really care that much.

    Please feel free to ask any questions if you want more info! I have strong opinions about this stuff and I’m happy to share.

    • Inui [comrade/them]@hexbear.net · 2 months ago

      I also use Zed, and I hook it up to small Qwen models, like the new 4B 2507 Thinking model, through LM Studio. I just have a 3070 with 8GB of VRAM, and 32GB of regular RAM to help offload.

      Small models leapfrog each other every six months or so, kind of like computer hardware and phones. I don’t think you really need to be able to run full 30B or higher models to get use out of them. The bigger ones are of course smarter, but if you’re mainly using them as tools for syntax correction, error finding, and small problems like that, vs. asking one to spit out an entire program, the small ones are pretty good.
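
      To make that concrete, an error-finding query against LM Studio’s local server looks roughly like this. LM Studio serves an OpenAI-compatible API on localhost:1234 by default; the model name is whatever ID it shows for your loaded model:

      ```python
      # Sketch: ask a small local Qwen model (served by LM Studio) to spot a bug.
      # LM Studio exposes an OpenAI-compatible server on localhost:1234 by default.
      # The model name below is illustrative -- use the ID LM Studio shows you.
      from openai import OpenAI

      client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

      buggy = '''
      def mean(xs):
          total = 0
          for x in xs:
              total += x
          return total / len(xs) - 1  # suspicious
      '''

      resp = client.chat.completions.create(
          model="qwen3-4b-thinking-2507",
          messages=[{"role": "user", "content": f"Find the bug in this function:\n{buggy}"}],
      )
      print(resp.choices[0].message.content)
      ```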

      • piccolo [any]@hexbear.net · 2 months ago

        Fair enough, I must say I haven’t tried local models (tfw no GPU ;_;). I guess my take is that if it costs a tenth of a cent on OpenRouter to use a SOTA open source model, I might as well do that, but I can see the appeal of local models for easier queries.

      • aanes_appreciator [he/him, comrade/them]@hexbear.net (OP) · 1 month ago

        Maybe in a few years I’ll have the hardware to host AI locally. Right now my home server is just an i5-9500 (or 8500, i forgor 💀) for the iGPU transcoding on Jellyfin. A 3070 at full tilt would immediately double my power draw.

        Thankfully, I don’t think the capability needed to write code is going to balloon in the future, so eventually some affordable hardware will be adequate for my purposes, just locally hosted!

      • piccolo [any]@hexbear.net · 2 months ago

        I mean, it’s a good editor without those features too, but imo they have a really good implementation of the LLM stuff

      • Inui [comrade/them]@hexbear.net · 2 months ago

        There’s a single setting to turn off all the AI integrations if you don’t want them. I like Zed even without them because it’s very fast and lightweight, but I still tend to prefer Kate for the same reason.

        It’s been a big focus of Zed’s development, though. Hooking things up to VS Code and other IDEs can be a pain in the ass with tons of extensions, but Zed has built-in functionality for Ollama, LM Studio, etc. for local models. You can also connect it to the APIs for ChatGPT, Claude, etc. if you pay for pro accounts of those.
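
        The hookup is just a few lines in Zed’s settings.json, something along these lines (key names are from memory of Zed’s docs, so double-check the current schema at zed.dev/docs):

        ```json
        // Sketch of Zed's settings.json entries for local model providers.
        // Key names from memory of Zed's docs; verify against the current schema.
        {
          "language_models": {
            "ollama": {
              "api_url": "http://localhost:11434"
            },
            "lm_studio": {
              "api_url": "http://localhost:1234/api/v0"
            }
          }
        }
        ```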

    • sudoer777@lemmy.ml · 2 months ago

      > A computer capable of running DeepSeek-V3.2 at full precision would cost well over $50k (iirc).

      I saw a Hacker News post about someone running DeepSeek R1 for $6k, although that’s still too expensive IMO.

      > GLM 4.6

      I need to try this.

      > Minimax-M2

      Kimi K2 Thinking also just came out.

      • piccolo [any]@hexbear.net · 2 months ago

        Honestly I have not been super impressed with Kimi K2. Maybe the thinking model is better, but in my experience GLM has been much better. I’ll still give it a shot though.

        > I saw a Hacker News post about someone running DeepSeek R1 for $6k, although that’s still too expensive IMO.

        Do you remember what their setup was? My guess would be CPU inference with a metric fuckton of RAM if they were running it at the full quantization, which could work but would be pretty slow. But for $6k it’d be impossible to buy enough VRAM to run it at full quant on GPUs.
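
        Back-of-envelope, with my own rough assumptions on parameter count and used-GPU pricing:

        ```python
        # Rough sanity check: can $6k buy enough VRAM for full-quant DeepSeek R1?
        # Assumptions (mine): ~671B params at FP8 (~1 byte each), and used RTX
        # 3090s at ~$700 for 24 GB. Ignores KV cache, interconnect, and the host.
        params_billion = 671
        weights_gb = params_billion * 1      # FP8 -> ~1 byte per parameter

        budget_usd = 6000
        card_price_usd = 700                 # assumed used-3090 price
        vram_per_card_gb = 24

        cards = budget_usd // card_price_usd
        total_vram = cards * vram_per_card_gb
        print(f"{cards} cards -> {total_vram} GB VRAM; need ~{weights_gb} GB")
        # Prints: 8 cards -> 192 GB VRAM; need ~671 GB  (nowhere close on GPUs)
        ```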