[HN Gopher] MCP in LM Studio | |
___________________________________________________________________ | |
MCP in LM Studio | |
Author : yags | |
Score : 230 points | |
Date : 2025-06-25 17:27 UTC (1 day ago) | |
web link (lmstudio.ai) | |
w3m dump (lmstudio.ai) | |
| chisleu wrote: | |
| Just ordered a $12k mac studio w/ 512GB of integrated RAM. | |
| | |
| Can't wait for it to arrive and crank up LM Studio. It's | |
| literally the first install. I'm going to download it with | |
| safari. | |
| | |
| LM Studio is newish, and it's not a perfect interface yet, but | |
| it's fantastic at what it does, which is bringing local LLMs to | |
| the masses without them having to know much. | |
| | |
| There is another project that people should be aware of: | |
| https://github.com/exo-explore/exo | |
| | |
| Exo is this radically cool tool that automatically clusters all | |
| hosts on your network running Exo and uses their combined GPUs | |
| for increased throughput. | |
| | |
| Like HPC environments, you are going to need ultra fast | |
| interconnects, but it's just IP based. | |
| dchest wrote: | |
| I'm using it on MacBook Air M1 / 8 GB RAM with Qwen3-4B to | |
| generate summaries and tags for my vibe-coded Bloomberg | |
| Terminal-style RSS reader :-) It works fine (the laptop gets | |
| hot and slow, but fine). | |
| | |
| Probably should just use llama.cpp server/ollama and not waste | |
| a gig of memory on Electron, but I like GUIs. | |
| minimaxir wrote: | |
| 8 GB of RAM is iffy for local LLMs in general: an 8-bit | |
| quantized Qwen3-4B is 4.2GB on disk and likely more in | |
| memory. 16 GB is usually the minimum to be able to run decent | |
| models without resorting to heavy quantization. | |
| hnuser123456 wrote: | |
| But 8GB of Apple RAM is 16GB of normal RAM. | |
| | |
| https://www.pcgamer.com/apple-vp-says-8gb-ram-on-a- | |
| macbook-p... | |
| arrty88 wrote: | |
| I concur. I just upgraded from m1 air with 8gb to m4 with | |
| 24gb. Excited to run bigger models. | |
| diggan wrote: | |
| > m4 with 24gb | |
| | |
| Wow, that is probably analogous to 48GB on other systems | |
| then, if we were to ask an Apple VP? | |
| vntok wrote: | |
| Not sure what Apple VPs have to do with the tech but | |
| yeah, pretty much any core engineer you ask at Apple will | |
| tell you this. | |
| | |
| Here is a nice article with some info about what memory | |
| compression is and how it works: | |
| https://arstechnica.com/gadgets/2013/10/os-x-10-9/#page-17 | |
| | |
| It was a hard technical problem, but it has been pretty much | |
| solved since it debuted in 2012-2013. | |
| minimaxir wrote: | |
| Interestingly it was AI (Apple Intelligence) that was the | |
| primary reason Apple abandoned that hedge. | |
| dchest wrote: | |
| It's 4-bit quantized (Q4_K_M, 2.5 GB) and still works well | |
| for this task. It's amazing. I've been running various | |
| small models on this 8 GB Air since the first Llama and | |
| GPT-J, and they improved so much! | |
| | |
| macOS virtual memory does a good job of swapping stuff in and | |
| out of the SSD. | |
| karmakaze wrote: | |
| Nice. Ironically well suited for non-Apple Intelligence. | |
| incognito124 wrote: | |
| > I'm going to download it with Safari | |
| | |
| Oof you were NOT joking | |
| noman-land wrote: | |
| Safari to download LM Studio. LM Studio to download models. | |
| Models to download Firefox. | |
| teaearlgraycold wrote: | |
| The modern ninite | |
| sneak wrote: | |
| I already got one of these. I'm spoiled by Claude 4 Opus; local | |
| LLMs are slower and lower quality. | |
| | |
| I haven't been using it much. All it has on it is LM Studio, | |
| Ollama, and Stats.app. | |
| | |
| > _Can't wait for it to arrive and crank up LM Studio. It's | |
| literally the first install. I'm going to download it with | |
| safari._ | |
| | |
| lol, yup. same. | |
| chisleu wrote: | |
| Yup, I'm spoiled by Claude 3.7 Sonnet right now. I had to | |
| stop using opus for plan mode in my Agent because it is just | |
| so expensive. I'm using Gemini 2.5 pro for that now. | |
| | |
| I'm considering ordering one of these today: https://www.newe | |
| gg.com/p/N82E16816139451?Item=N82E1681613945... | |
| | |
| It looks like it will hold 5 GPUs with a single slot open for | |
| infiniband | |
| | |
| Then local models might be lower quality, but it won't be | |
| slow! :) | |
| kristopolous wrote: | |
| The GPUs are the hard things to find unless you want to pay | |
| like 50% markup | |
| sneak wrote: | |
| That's just what they cost; MSRP is irrelevant. They're | |
| not hard to find, they're just expensive. | |
| evo_9 wrote: | |
| I was using Claude 3.7 exclusively for coding, but it sure | |
| seems like it got worse suddenly about 2-3 weeks back. It | |
| went from writing pretty solid code I had to make only | |
| minor changes to, to being completely off the rails: | |
| altering files unrelated to my prompt, undoing fixes from | |
| the same conversation, reinventing db access, and ignoring | |
| the coding 'standards' established in the existing | |
| codebase. It became so untrustworthy that I finally gave | |
| OpenAI o3 a try and honestly, I was pretty surprised how | |
| solid it has been. I've been using o3 since, and I find it | |
| generally does exactly what I ask, especially if you have a | |
| well-established project with plenty of code for it to | |
| reference. | |
| | |
| Just wondering if Claude 3.7 has seemed different lately to | |
| anyone else? It was my go-to for several months, and I'm no | |
| fan of OpenAI, but o3 has been rock solid. | |
| jessmartin wrote: | |
| Could be the prompt and/or tool descriptions in whatever | |
| tool you are using Claude in that degraded. Have | |
| definitely noticed variance across Cursor, Claude Code, | |
| etc even with the exact same models. | |
| | |
| Prompts + tools matter. | |
| esskay wrote: | |
| Cursor became awful over the last few weeks, so it's likely | |
| them. No idea what they did to their prompt, but it's just | |
| been incredibly poor at most tasks regardless of which model | |
| you pick. | |
| sneak wrote: | |
| Me too. (re: Claude; I haven't switched models.) It sucks | |
| because I was happily paying >$1k/mo in usage charges and | |
| then it all went south. | |
| sneak wrote: | |
| I'm firehosing about $1k/mo at Cursor on pay-as-you-go and | |
| am happy to do it (it's delivering 2-10k of value each | |
| month). | |
| | |
| What cards are you gonna put in that chassis? | |
| teaearlgraycold wrote: | |
| What are you going to do with the LLMs you run? | |
| chisleu wrote: | |
| Currently I'm using gemini 2.5 and claude 3.7 sonnet for | |
| coding tasks. | |
| | |
| I'm interested in using models for code generation, but I'm | |
| not expecting much in that regard. | |
| | |
| I'm planning to attempt fine tuning open source models on | |
| certain tool sets, especially MCP tools. | |
| prettyblocks wrote: | |
| I've been using openwebui and am pretty happy with it. Why do | |
| you like lm studio more? | |
| truemotive wrote: | |
| Open WebUI can leverage the built in web server in LM Studio, | |
| just FYI in case you thought it was primarily a chat | |
| interface. | |
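| For reference, a rough sketch of wiring the two together, | |
| assuming LM Studio's server on its default port 1234 and Open | |
| WebUI's documented OpenAI env vars (container tag and ports | |
| are the standard quickstart ones): | |
|     docker run -d -p 3000:8080 \ | |
|       --add-host=host.docker.internal:host-gateway \ | |
|       -e OPENAI_API_BASE_URL=http://host.docker.internal:1234/v1 \ | |
|       -e OPENAI_API_KEY=lm-studio \ | |
|       ghcr.io/open-webui/open-webui:main | |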
| prophesi wrote: | |
| Not OP, but with LM Studio I get a chat interface out-of-the- | |
| box for local models, while with openwebui I'd need to | |
| configure it to point to an OpenAI API-compatible server | |
| (like LM Studio). It can also help determine which models | |
| will work well with your hardware. | |
| | |
| LM Studio isn't FOSS though. | |
| | |
| I did enjoy hooking up OpenWebUI to Firefox's experimental AI | |
| Chatbot. (browser.ml.chat.hideLocalhost to false, | |
| browser.ml.chat.provider to localhost:${openwebui-port}) | |
| s1mplicissimus wrote: | |
| i recently tried openwebui but it was so painful to get it to | |
| run with local model. that "first run experience" of lm | |
| studio is pretty fire in comparison. can't really talk about | |
| actually working with it though, still waiting for the 8GB | |
| download | |
| prettyblocks wrote: | |
| Interesting. I run my local llms through ollama and it's | |
| zero trouble to get that working in openwebui as long as | |
| the ollama server is running. | |
| diggan wrote: | |
| I think that's the thing: just running Ollama (fiddling | |
| around with terminals) is more complicated than the full | |
| E2E of chatting in LM Studio. | |
| | |
| Of course, for folks used to terminals, daemons and so on | |
| it makes sense from the get go, but for others it | |
| seemingly doesn't, and it doesn't help that Ollama | |
| refuses to communicate what people should understand | |
| before trying to use it. | |
| noman-land wrote: | |
| I love LM Studio. It's a great tool. I'm waiting for another | |
| generation of Macbook Pros to do as you did :). | |
| imranq wrote: | |
| I'd love to host my own LLMs but I keep getting held back from | |
| the quality and affordability of Cloud LLMs. Why go local | |
| unless there's private data involved? | |
| mycall wrote: | |
| Offline is another use case. | |
| seanmcdirmid wrote: | |
| Nothing like playing around with LLMs on an airplane | |
| without an internet connection. | |
| asteroidburger wrote: | |
| If I can afford a seat above economy with room to | |
| actually, comfortably work on a laptop, I can afford the | |
| couple bucks for wifi for the flight. | |
| seanmcdirmid wrote: | |
| If you are assuming that your Hainan airlines flight has | |
| wifi that isn't behind the GFW, even outside of cattle | |
| class, I have some news for you... | |
| sach1 wrote: | |
| Getting around the GFW is trivially easy. | |
| seanmcdirmid wrote: | |
| ya ya, just buy a VPN, pay the yearly subscription, and | |
| then have them disappear the week after you paid. Super | |
| trivially frustrating. | |
| vntok wrote: | |
| VPN providers are first and foremost trust businesses. | |
| Why would you choose and pay one that is not well | |
| established and trusted? Mine have been there for more | |
| than a decade by now. | |
| | |
| Alternatively, you could just set up your own (cheaper?) | |
| VPN relay on the tiniest VPS you can rent on AWS or IBM | |
| Cloud, right? | |
| MangoToupe wrote: | |
| Woah there Mr Money, slow down with these assumptions. A | |
| computer is worth the investment. But paying a cent extra | |
| to airlines? Unacceptable. | |
| diggan wrote: | |
| Some of us don't have the most reliable ISPs or even | |
| network infrastructure, and I say that as someone who | |
| lives in Spain :) I live outside a huge metropolitan area | |
| and Vodafone fiber went down twice this year, not even | |
| counting the time the country's electricity grid was down | |
| for like 24 hours. | |
| PeterStuer wrote: | |
| Same. For 'sovereignty' reasons I will eventually move to | |
| local processing, but for now, in development/prototyping, | |
| the gap with hosted LLMs seems too wide. | |
| diggan wrote: | |
| There are some use cases I use LLMs for where I don't care a | |
| lot about the data being private (although that's a plus) but | |
| I don't want to pay XXXEUR for classifying some data and I | |
| particularly don't want to worry about having to pay that | |
| _again_ if I want to redo it with some changes. | |
| | |
| Using local LLMs for this I don't worry about the price at | |
| all, I can leave it doing three tries per "task" without | |
| tripling the cost if I wanted to. | |
| | |
| It's true that there is an upfront cost but way easier to get | |
| over that hump than on-demand/per-token costs, at least for | |
| me. | |
| zackify wrote: | |
| I love LM Studio, but I'd never waste 12k like that. The memory | |
| bandwidth is too low, trust me. | |
| | |
| Get the RTX Pro 6000 for 8.5k with double the bandwidth. It | |
| will be way better | |
| marci wrote: | |
| You can't run deepseek-v3/r1 on the RTX Pro 6000, not to | |
| mention the upcoming 1 million context qwen models, or the | |
| current qwen3-235b. | |
| tymscar wrote: | |
| Why would they pay 2/3 of the price for something with 1/5 | |
| of the RAM? | |
| | |
| The whole point of spending that much money for them is to | |
| run massive models, like the full R1, which the Pro 6000 can't. | |
| zackify wrote: | |
| Because waiting forever for initial prompt processing, with a | |
| realistic number of MCP tools enabled on a prompt, is going | |
| to suck without the most bandwidth possible. | |
| | |
| And you are never going to sit around waiting for anything | |
| larger than the 96+ GB of RAM that the RTX Pro has. | |
| | |
| If you're using it for background tasks and not coding it's | |
| a different story | |
| johndough wrote: | |
| If the MCP tools come first in the conversation, it | |
| should be technically possible to cache the activations | |
| so you do not have to recompute them each time. | |
| pests wrote: | |
| Initial prompt processing with a large static context | |
| (system prompt + tools + whatever) could technically be | |
| improved by checkpointing the model state and reusing for | |
| future prompts. Not sure if any tools support this. | |
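| For what it's worth, llama.cpp-based stacks already expose | |
| this kind of prefix reuse. A minimal sketch with | |
| llama-cpp-python (model path, prefix contents and the exact | |
| cache API are assumptions, just to illustrate the idea): | |
|     from llama_cpp import Llama, LlamaRAMCache | |
| | |
|     # Keep the static part (system prompt + tool definitions) at | |
|     # the front of every request so the cached KV state for that | |
|     # prefix can be reused instead of recomputed. | |
|     llm = Llama(model_path="model.gguf", n_ctx=8192)  # placeholder | |
|     llm.set_cache(LlamaRAMCache())  # in-RAM prompt/state cache | |
| | |
|     STATIC_PREFIX = "...system prompt and tool defs..."  # placeholder | |
| | |
|     def answer(user_prompt: str) -> str: | |
|         out = llm.create_completion(STATIC_PREFIX + user_prompt, | |
|                                     max_tokens=256) | |
|         return out["choices"][0]["text"] | |
| | |
|     # The first call pays the full prompt-processing cost; later | |
|     # calls only process the tokens after the shared prefix. | |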
| tucnak wrote: | |
| https://docs.vllm.ai/projects/production- | |
| stack/en/latest/tut... | |
| storus wrote: | |
| M3 Ultra GPU is around 3070-3080 for the initial token | |
| processing. Not great, not terrible. | |
| MangoToupe wrote: | |
| > And you are never going to sit around waiting for | |
| anything larger than the 96+gb of ram that the RTX pro | |
| has. | |
| | |
| Am I the only person that gives aider instructions and | |
| leaves it alone for a few hours? This doesn't seem that | |
| difficult to integrate into my workflow. | |
| diggan wrote: | |
| > Am I the only person that gives aider instructions and | |
| leaves it alone for a few hours? | |
| | |
| Probably not, but in my experience, if it takes longer | |
| than 10-15 minutes it's either stuck in a loop or down | |
| the wrong rabbit hole. But I don't use it for vibe coding | |
| or anything "big scope" like that, but more focused | |
| changes/refactors so YMMV | |
| t1amat wrote: | |
| (Replying to both siblings questioning this) | |
| | |
| If the primary use case is input heavy, which is true of | |
| agentic tools, there's a world where partial GPU offload with | |
| many channels of DDR5 system RAM leads to an overall better | |
| experience. A good GPU will process input many times faster, | |
| and with good RAM you might end up with decent output speed | |
| still. Seems like that would come in close to $12k? | |
| | |
| And there would be no competition for models that do fit | |
| entirely inside that VRAM, for example Qwen3 32B. | |
| storus wrote: | |
| RTX Pro 6000 can't do DeepSeek R1 671B Q4, you'd need 5-6 of | |
| them, which makes it way more expensive. Moreover, MacStudio | |
| will do it at 150W whereas Pro 6000 would start at 1500W. | |
| diggan wrote: | |
| > Moreover, MacStudio will do it at 150W whereas Pro 6000 | |
| would start at 1500W. | |
| | |
| No, Pro 6000 pulls max 600W, not sure where you get 1500W | |
| from, that's more than double the specification. | |
| | |
| Besides, what is the token/second or second/token, and | |
| prompt processing speed for running DeepSeek R1 671B on a | |
| Mac Studio with Q4? Curious about those numbers, because I | |
| have a feeling they're very far off each other. | |
| smcleod wrote: | |
| RTX is nice, but it's memory-limited and requires a full | |
| desktop machine to run it in. I'd take slower inference | |
| (as long as it's not less than 15 tk/s) for more memory any | |
| day! | |
| diggan wrote: | |
| I'd love to see more Very-Large-Memory Mac Studio | |
| benchmarks for prompt processing and inference. The few | |
| benchmarks I've seen either failed to take prompt | |
| processing into account, didn't share the exact | |
| weights+setup used, or showed really abysmal performance. | |
| storus wrote: | |
| If the rumors about splitting CPU/GPU in new Macs are true, | |
| your Mac Studio will be the last one capable of running DeepSeek | |
| R1 671B Q4. It looks like Apple had an accidental winner that | |
| will go away with the end of unified RAM. | |
| phren0logy wrote: | |
| I have not heard this rumor. Source? | |
| prophesi wrote: | |
| I believe they're talking about the rumors by an Apple | |
| supply chain analyst, Ming-Chi Kuo. | |
| | |
| https://www.techspot.com/news/106159-apple-m5-silicon- | |
| rumore... | |
| diggan wrote: | |
| Seems Apple is waking up to the fact that if it's too | |
| easy to run weights locally, there really isn't much | |
| sense in having their own remote inference endpoints, so | |
| time to stop the party :) | |
| whatevsmate wrote: | |
| I did this a month ago and don't regret it one bit. I had a | |
| long laundry list of ML "stuff" I wanted to play with or | |
| questions to answer. There's no world in which I'm paying by | |
| the request, or token, or whatever, for hacking on fun | |
| projects. Keeping an eye on the meter is the opposite of having | |
| fun and I have absolutely nowhere I can put a loud, hot GPU | |
| (that probably has "gamer" lighting no less) in my fam's small | |
| apartment. | |
| datpuz wrote: | |
| I genuinely cannot wrap my head around spending this much money | |
| on hardware that is dramatically inferior to hardware that | |
| costs half the price. MacOS is not even great anymore, they | |
| stopped improving their UX like a decade ago. | |
| minimaxir wrote: | |
| LM Studio has quickly become the best way to run local LLMs on an | |
| Apple Silicon Mac: no offense to vllm/ollama and other terminal- | |
| based approaches, but LLMs have _many_ levers for tweaking output | |
| and sometimes you need a UI to manage it. Now that LM Studio | |
| supports MLX models, it's one of the most efficient too. | |
| | |
| I'm not bullish on MCP, but at the least this approach gives a | |
| good way to experiment with it for free. | |
| nix0n wrote: | |
| LM Studio is quite good on Windows with Nvidia RTX also. | |
| boredemployee wrote: | |
| care to elaborate? i have rtx 4070 12gb vram + 64gb ram, i | |
| wonder what models I can run with it. Anything useful? | |
| nix0n wrote: | |
| LM Studio's model search is pretty good at showing what | |
| models will fit in your VRAM. | |
| | |
| For my 16gb of VRAM, those models do not include anything | |
| that's good at coding, even when I provide the API | |
| documents via PDF upload (another thing that LM Studio | |
| makes easy). | |
| | |
| So, not really, but LM Studio at least makes it easier to | |
| find that out. | |
| boredemployee wrote: | |
| ok, ty for the reply! | |
| pzo wrote: | |
| I just wish they did some facelifting of the UI. Right now it | |
| is too colorful for me, with many different shades of similar | |
| colors. I wish they would copy a color palette from Google AI | |
| Studio, or from Trae or PyCharm. | |
| chisleu wrote: | |
| > I'm not bullish on MCP | |
| | |
| You gotta help me out. What do you see holding it back? | |
| minimaxir wrote: | |
| tl;dr the current hype around it is a solution looking for a | |
| problem and at a high level, it's just a rebrand of the Tools | |
| paradigm. | |
| mhast wrote: | |
| It's "Tools as a service", so it's really trying to make | |
| tool calling easier to use. | |
| ijk wrote: | |
| Near as I can tell it's supposed to make _calling other | |
| people's_ tools easier. But I don't want to spin up an | |
| entire server to invoke a calculator. So far it seems to | |
| make _building_ my own local tools harder, unless there's | |
| some guidebook I'm missing. | |
| xyc wrote: | |
| It's a protocol that doesn't dictate how you are calling | |
| the tool. You can use in-memory transport without needing | |
| to spin up a server. Your tool can just be a function, | |
| but with the flexibility of serving to other clients. | |
| ijk wrote: | |
| Are there any examples of that? All the documentation I | |
| saw seemed to be about building an MCP server, with very | |
| little about connecting an existing inference | |
| infrastructure to local functions. | |
| cchance wrote: | |
| You're not spinning up a whole server lol, most MCPs can | |
| be run locally and talked to over stdio. They're just | |
| apps that the LLM can call; what they talk to or do is up | |
| to the MCP writer. It's easier to have an MCP that | |
| communicates what it can do and handles the back and | |
| forth than to write non-standard middleware to handle, | |
| say, calls to an API, or using AppleScript, or VMware, or | |
| something else... | |
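| To make that concrete, a local tool with the official Python | |
| SDK (assuming the `mcp` package's FastMCP helper) is little | |
| more than a decorated function served over stdio: | |
|     from mcp.server.fastmcp import FastMCP | |
| | |
|     mcp = FastMCP("calculator")  # server name is arbitrary | |
| | |
|     @mcp.tool() | |
|     def add(a: float, b: float) -> float: | |
|         """Add two numbers.""" | |
|         return a + b | |
| | |
|     if __name__ == "__main__": | |
|         mcp.run()  # defaults to the stdio transport | |
| A host like LM Studio or Claude Desktop then launches that | |
| script as a subprocess and talks to it over stdin/stdout. | |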
| ijk wrote: | |
| I wish the documentation was clearer on that point; I | |
| went looking through their site and didn't see any | |
| examples that weren't oversimplified REST API calls. I | |
| imagine they might have updated it since then, or I | |
| missed something. | |
| zackify wrote: | |
| Ollama doesn't even have a way to customize the context size | |
| per model and persist it. LM studio does :) | |
| Anaphylaxis wrote: | |
| This isn't true. You can `ollama run {model}`, `/set | |
| parameter num_ctx {ctx}` and then `/save`. It's recommended to | |
| `/save {model}:{ctx}` to persist across model updates. | |
| truemotive wrote: | |
| This can be done with custom Modelfiles as well, I was | |
| pretty bent when I found out that 2048 was the default | |
| context length. | |
| | |
| https://ollama.readthedocs.io/en/modelfile/ | |
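| For example, a Modelfile along these lines (model tag is just | |
| an example) bakes a bigger context into a new model name: | |
|     FROM qwen3:8b | |
|     PARAMETER num_ctx 16384 | |
| and then `ollama create qwen3-16k -f Modelfile`. | |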
| zackify wrote: | |
| As of 2 weeks back, if I did this it would reset back the | |
| moment Cline made an API call, but LM Studio would work | |
| correctly. I'll have to try again. I even confirmed Cline was | |
| not overriding the num_ctx setting. | |
| visiondude wrote: | |
| LM Studio works surprisingly well on an M3 Ultra with 64GB | |
| running 27B models. | |
| | |
| Nice to have a local option, especially for some prompts. | |
| squanchingio wrote: | |
| It'll be nice to have the MCP servers exposed through LM | |
| Studio's OpenAI-like endpoints. | |
| patates wrote: | |
| What models are you using on LM Studio for what task and with how | |
| much memory? | |
| | |
| I have a 48GB MacBook Pro, and Gemma3 (one of the abliterated | |
| ones) fits my non-code use case perfectly (generating crime | |
| stories in which the reader tries to guess the killer). | |
| | |
| For code, I still call Google to use Gemini. | |
| robbru wrote: | |
| I've been using the Google Gemma QAT models in 4B, 12B, and 27B | |
| with LM Studio with my M1 Max. https://huggingface.co/lmstudio- | |
| community/gemma-3-12B-it-qat... | |
| t1amat wrote: | |
| I would recommend Qwen3 30B A3B for you. The MLX 4bit DWQ | |
| quants are fantastic. | |
| redman25 wrote: | |
| Qwen is great but for creative writing I think Gemma is a | |
| good choice. It has better EQ than Qwen IMO. | |
| api wrote: | |
| I wish LM Studio had a pure daemon mode. It's better than ollama | |
| in a lot of ways but I'd rather be able to use BoltAI as the UI, | |
| as well as use it from Zed and VSCode and aider. | |
| | |
| What I like about ollama is that it provides a self-hosted AI | |
| provider that can be used by a variety of things. LM Studio has | |
| that too, but you have to have the whole big chonky Electron UI | |
| running. Its UI is powerful but a lot less nice than e.g. BoltAI | |
| for casual use. | |
| SparkyMcUnicorn wrote: | |
| There's a "headless" checkbox in settings->developer | |
| diggan wrote: | |
| Still, you need to install and run the AppImage at least once | |
| to enable the "lms" CLI, which can then be used on its own. It | |
| would be nice to have a completely GUI-less installation/use | |
| method too. | |
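| Once that one-time bootstrap is done, the headless flow is | |
| roughly just the CLI (the model key is a placeholder): | |
|     lms server start   # start the local OpenAI-compatible server | |
|     lms ls             # list downloaded models | |
|     lms load <model-key> | |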
| t1amat wrote: | |
| The UI is the product. If you just want the engine, use | |
| mlx-omni-server (for MLX) or llama-swap (for GGUF) and | |
| huggingface-cli (for model downloads). | |
| diggan wrote: | |
| Those don't offer the same features as LM Studio itself | |
| does, even when you don't consider the UI. If there was a | |
| "LM Engine" CLI I could install, then yeah, but there | |
| isn't, hence the need to run the UI once to get "the | |
| engine". | |
| rhet0rica wrote: | |
| Oh, that horrible Electron UI. Under Windows it pegs a core on | |
| my CPU at all times! | |
| | |
| If you're just working as a single user via the OpenAI | |
| protocol, you might want to consider koboldcpp. It bundles a | |
| GUI launcher, then starts in text-only mode. You can also tell | |
| it to just run a saved configuration, bypassing the GUI; I've | |
| successfully run it as a system service on Windows using nssm. | |
| | |
| https://github.com/LostRuins/koboldcpp/releases | |
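| Roughly, that setup looks like the following (paths and the | |
| service name are placeholders, and the --config flag should be | |
| double-checked against koboldcpp's --help): | |
|     nssm install KoboldCpp "C:\llm\koboldcpp.exe" "--config C:\llm\server.kcpps" | |
|     nssm start KoboldCpp | |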
| | |
| Though there are a lot of roleplay-centric gimmicks in its | |
| feature set, its context-shifting feature is singular. It | |
| caches the intermediate state used by your last query, | |
| extending it to build the next one. As a result you save on | |
| generation time with large contexts, and also any conversation | |
| that has been pushed out of the context window still indirectly | |
| influences the current exchange. | |
| diggan wrote: | |
| > Oh, that horrible Electron UI. Under Windows it pegs a core | |
| on my CPU at all times! | |
| | |
| Worse, I'd say, considering what people use LM Studio for, is | |
| the VRAM it occupies even when the UI and everything is | |
| idle. Somehow, it's using 500MB of VRAM while doing nothing, | |
| while Firefox with ~60 active tabs is using 480MB. gnome- | |
| shell itself also sits around 450MB and is responsible for | |
| quite a bit more than LM Studio. | |
| | |
| Still, LM Studio is probably the best all-in-one GUI around | |
| for local LLM usage, unless you go terminal usage. | |
| b0a04gl wrote: | |
| claude going mcp over remote kinda normalised the protocol for | |
| inference routing. now with lmstudio running as a local mcp | |
| host, you can just tunnel it (cloudflared/ngrok), drop a tiny | |
| gateway script, and boom, your laptop basically acts like an | |
| mcp node in a hybrid mesh. short prompts hit local qwen, | |
| heavier ones go to claude. with the same payload and interface | |
| we can actually get multi-host local inference clusters wired | |
| together by mcp | |
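| a rough sketch of that kind of gateway, assuming LM Studio's | |
| OpenAI-compatible server on its default port 1234 and the | |
| anthropic SDK for the remote side (threshold and model names | |
| are arbitrary): | |
|     import anthropic | |
|     from openai import OpenAI | |
| | |
|     local = OpenAI(base_url="http://localhost:1234/v1", | |
|                    api_key="lm-studio")  # key is ignored locally | |
|     remote = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY | |
| | |
|     def route(prompt: str) -> str: | |
|         # arbitrary threshold: short prompts stay local, | |
|         # heavier ones go to claude | |
|         if len(prompt) < 2000: | |
|             r = local.chat.completions.create( | |
|                 model="qwen3-8b",  # whatever lm studio has loaded | |
|                 messages=[{"role": "user", "content": prompt}], | |
|             ) | |
|             return r.choices[0].message.content | |
|         r = remote.messages.create( | |
|             model="claude-sonnet-4-20250514",  # placeholder id | |
|             max_tokens=1024, | |
|             messages=[{"role": "user", "content": prompt}], | |
|         ) | |
|         return r.content[0].text | |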
| politelemon wrote: | |
| The initial experience with LMStudio and MCP doesn't seem to be | |
| great, I think their docs could do with a happy path demo for | |
| newcomers. | |
| | |
| Upon installing, the first model offered is google/gemma-3-12b, | |
| which in fairness is pretty decent compared to others. | |
| | |
| It's not obvious how to show the right sidebar they're talking | |
| about, it's the flask icon which turns into a collapse icon when | |
| you click it. | |
| | |
| I set the MCP up with Playwright and asked it to read the top | |
| headline from HN, and it got stuck in an infinite loop of | |
| navigating to Hacker News but doing nothing with the output. | |
| | |
| I wanted to try it out with a few other models, but figuring out | |
| how to download new models isn't obvious either, it turned out to | |
| be the search icon. Anyway other models didn't fare much better | |
| either, some outright ignored the tools despite having the | |
| capacity for 'tool use'. | |
| t1amat wrote: | |
| Gemma3 models can follow instructions but were not trained to | |
| call tools, which is the backbone of MCP support. You would | |
| likely have a better experience with models from the Qwen3 | |
| family. | |
| cchance wrote: | |
| That latter issue isn't an LM Studio issue... it's a model issue. | |
| Thews wrote: | |
| Others mentioned qwen3, which works fine with HN stories | |
| for me, but the comments still trip it up and it'll start | |
| thinking the comments are part of the original question after | |
| a while. | |
| | |
| I also tried the recent deepseek 8b distill, but it was much | |
| worse for tool calling than qwen3 8b. | |
| maxcomperatore wrote: | |
| good. | |
| v3ss0n wrote: | |
| Closed source - won't touch. | |
| xyc wrote: | |
| Great to see more local AI tools supporting MCP! Recently I've | |
| also added MCP support to recurse.chat. When running locally | |
| (LLaMA.cpp and Ollama) it still needs to catch up in terms of | |
| tool calling capabilities (for example tool call accuracy / | |
| parallel tool calls) compared to the well known providers but | |
| it's starting to get pretty usable. | |
| rshemet wrote: | |
| hey! we're building Cactus (https://github.com/cactus-compute), | |
| effectively Ollama for smartphones. | |
| | |
| I'd love to learn more about your MCP implementation. Wanna | |
| chat? | |
| zaps wrote: | |
| Not to be confused with FL Studio | |
| bbno4 wrote: | |
| Is there an app that uses OpenRouter / Claude or something | |
| locally but has MCP support? | |
| eajr wrote: | |
| I've been considering building this. Haven't found anything yet. | |
| cchance wrote: | |
| vscode with roocode... just use the chat window :S | |
| cedws wrote: | |
| I'm looking for something like this too. Msty is my favourite | |
| LLM UI (supports remote + local models) but unfortunately has | |
| no MCP support. It looks like they're trying to nudge people | |
| into their web SaaS offering which I have no interest in. | |
| jtreminio wrote: | |
| I've been wanting to try LM Studio but I can't figure out how to | |
| use it over local network. My desktop in the living room has the | |
| beefy GPU, but I want to use LM Studio from my laptop in bed. | |
| | |
| Any suggestions? | |
| skygazer wrote: | |
| Use an OpenAI-compatible API client on your laptop and LM | |
| Studio on your server, and point the client at the server. LM | |
| Studio can serve an LLM on a desired port using the OpenAI- | |
| style chat completion API. You can also install openwebui on | |
| your server and connect to it via a web browser, and configure | |
| it to use the LM Studio connection for its LLM. | |
| numpad0 wrote: | |
| [>_] -> [.* Settings] -> Serve on local network ( o) | |
| | |
| Any OpenAI-compatible client app should work - use IP address | |
| of host machine as API server address. API key can be bogus or | |
| blank. | |
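| e.g. from the laptop, something like this (the LAN address and | |
| model name are placeholders; 1234 is LM Studio's default | |
| server port): | |
|     from openai import OpenAI | |
| | |
|     client = OpenAI(base_url="http://192.168.1.50:1234/v1", | |
|                     api_key="anything")  # key is not checked | |
|     resp = client.chat.completions.create( | |
|         model="qwen3-8b",  # whichever model the desktop has loaded | |
|         messages=[{"role": "user", "content": "hello from the laptop"}], | |
|     ) | |
|     print(resp.choices[0].message.content) | |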
| sixhobbits wrote: | |
| MCP terminology is already super confusing, but this seems to | |
| just introduce "MCP Host" randomly in a way that makes no sense | |
| to me at all. | |
| | |
| > "MCP Host": applications (like LM Studio or Claude Desktop) | |
| that can connect to MCP servers, and make their resources | |
| available to models. | |
| | |
| I think everyone else is calling this an "MCP Client", so I'm not | |
| sure why they would want to call themselves a host - makes it | |
| sound like they are hosting MCP servers (definitely something | |
| that people are doing, even though often the server is run on the | |
| same machine as the client), when in fact they are just a client? | |
| Or am I confused? | |
| guywhocodes wrote: | |
| MCP Host is terminology from the spec. It's the software that | |
| makes llm calls, build prompts, interprets tool call requests | |
| and performs them etc. | |
| sixhobbits wrote: | |
| So it is, I stand corrected. I googled mcp host and the | |
| lmstudio link was the first result. | |
| | |
| Some more discussion on the confusion here https://github.com | |
| /modelcontextprotocol/modelcontextprotocol... where they | |
| acknowledge that most people call it a client and that that's | |
| ok unless the distinction is important. | |
| | |
| I think host is a bad term for it though as it makes more | |
| intuitive sense for the host to host the server and the | |
| client to connect to it, especially for remote MCP servers | |
| which are probably going to become the default way of using | |
| them. | |
| kreetx wrote: | |
| I'm with you on the confusion, it makes no sense at all to | |
| call it a host. MCP host should _host_ the MCP server (yes, | |
| I know - that is yet a separate term). | |
| | |
| The MCP standard seems a mess, e.g take this paragraph from | |
| here[1] | |
| | |
| > In the Streamable HTTP transport, the server operates as | |
| an independent process that can handle multiple client | |
| connections. | |
| | |
| Yes, obviously, that is what servers do. Also, what is | |
| "Streamable HTTP"? Comet, HTTP2, or even websockets? SSE | |
| _could be_ a candidate, but it isn't, as it says | |
| "Streamable HTTP" replaces SSE. | |
| | |
| > This transport uses HTTP POST and GET requests. | |
| | |
| Guys, POST and GET are verbs of the HTTP _protocol_; TCP is | |
| the transport. I guess they could say that they use the HTTP | |
| protocol, which _only_ uses POST and GET verbs (if that is | |
| the case). | |
| | |
| > Server can optionally make use of Server-Sent Events | |
| (SSE) to stream multiple server messages. | |
| | |
| This would make sense if there weren't the note "This | |
| replaces the HTTP+SSE transport" right below the title. | |
| | |
| > This permits basic MCP servers, as well as more feature- | |
| rich servers supporting streaming and server-to-client | |
| notifications and requests. | |
| | |
| Again, how is streaming implemented (what is "Streamable | |
| HTTP")? Also, "server-to-client ... requests"? SSE is | |
| unidirectional, so are those requests happening over | |
| secondary HTTP requests? | |
| | |
| -- | |
| | |
| And then the 2.0.1 Security Warning seems like a blob of | |
| words on security, no reference to maybe same-origin. Also, | |
| "for local servers bind to localhost and then implement | |
| proper authentication" - are both of those together ever | |
| required? Is it worth it to even say that servers should | |
| implement proper authentication? | |
| | |
| Anyway, reading the entire documentation one might be able | |
| to put a charitable version of the MCP puzzle together that | |
| might actually make sense. But it does seem that it isn't | |
| written by engineers, in which case I don't understand why, | |
| or whom it is written for. | |
| | |
| [1] https://modelcontextprotocol.io/specification/draft/bas | |
| ic/tr... | |
| diggan wrote: | |
| > But it does seem that it isn't written by engineers | |
| | |
| As far as I can tell, unsurprisingly, the MCP | |
| specification was written with the help of LLMs, and | |
| seemingly hasn't been carefully reviewed because as you | |
| say, a bunch of the terms have straight up wrong | |
| definitions. | |
| kreetx wrote: | |
| Using LLMs is entirely fine, but poor review for a | |
| protocol definition is ..degenerate. Aren't protocols | |
| supposed to be precise? | |
| remram wrote: | |
| It was written by one vendor for their own use. It is | |
| miles away from an RFC or "standard" | |
| qntty wrote: | |
| It's confusing but you just have to read the official docs | |
| | |
| https://modelcontextprotocol.io/specification/2025-03-26/arc... | |
| mkagenius wrote: | |
| On M1/M2/M3 Mac, you can use Apple Containers to automate[1] the | |
| execution of the generated code. | |
| | |
| I have one running locally with this config: | |
|     { | |
|       "mcpServers": { | |
|         "coderunner": { | |
|           "url": "http://coderunner.local:8222/sse" | |
|         } | |
|       } | |
|     } | |
| | |
| 1. CodeRunner: https://github.com/BandarLabs/coderunner (I am one | |
| of the authors) | |
| smcleod wrote: | |
| I really like LM Studio but their license / terms of use are very | |
| hostile. You're in breach if you use it for anything work related | |
| - so just be careful folks! | |
| jmetrikat wrote: | |
| great! it's very convenient to try mcp servers with local models | |
| that way. | |
| | |
| just added the `Add to LM Studio` button to the anytype mcp | |
| server, looks nice: https://github.com/anyproto/anytype-mcp | |
| b0dhimind wrote: | |
| I wonder how LM Studio and AnythingLLM will contrast, especially | |
| in the upcoming months... I like AnythingLLM's workflow editor. | |
| I'd like something to grow into for my doc-heavy job. I don't | |
| want to be installing and trying both. | |
___________________________________________________________________ | |
(page generated 2025-06-27 03:01 UTC) |