[HN Gopher] MCP in LM Studio | |
___________________________________________________________________ | |
MCP in LM Studio | |
Author : yags | |
Score : 230 points | |
Date : 2025-06-25 17:27 UTC (1 day ago) | |
web link (lmstudio.ai) | |
w3m dump (lmstudio.ai) | |
| chisleu wrote: | |
| Just ordered a $12k mac studio w/ 512GB of integrated RAM. | |
| | |
| Can't wait for it to arrive and crank up LM Studio. It's | |
| literally the first install. I'm going to download it with | |
| safari. | |
| | |
| LM Studio is newish, and it's not a perfect interface yet, but | |
| it's fantastic at what it does, which is bringing local LLMs to | |
| the masses without them having to know much. | |
| | |
| There is another project that people should be aware of: | |
| https://github.com/exo-explore/exo | |
| | |
| Exo is this radically cool tool that automatically clusters all | |
| hosts on your network running Exo and uses their combined GPUs | |
| for increased throughput. | |
| | |
| Like HPC environments, you are going to need ultra fast | |
| interconnects, but it's just IP based. | |
| dchest wrote: | |
| I'm using it on MacBook Air M1 / 8 GB RAM with Qwen3-4B to | |
| generate summaries and tags for my vibe-coded Bloomberg | |
| Terminal-style RSS reader :-) It works fine (the laptop gets | |
| hot and slow, but fine). | |
| | |
| Probably should just use llama.cpp server/ollama and not waste | |
| a gig of memory on Electron, but I like GUIs. | |
| minimaxir wrote: | |
| 8 GB of RAM is iffy for local LLMs in general: an 8-bit | |
| quantized Qwen3-4B is 4.2GB on disk and likely more in | |
| memory. 16 GB is usually the minimum to be able to run decent | |
| models without resorting to heavy quantization. | |
| hnuser123456 wrote: | |
| But 8GB of Apple RAM is 16GB of normal RAM. | |
| | |
| https://www.pcgamer.com/apple-vp-says-8gb-ram-on-a- | |
| macbook-p... | |
| arrty88 wrote: | |
| I concur. I just upgraded from m1 air with 8gb to m4 with | |
| 24gb. Excited to run bigger models. | |
| diggan wrote: | |
| > m4 with 24gb | |
| | |
| Wow, that is probably analogous to 48GB on other systems | |
| then, if we were to ask an Apple VP? | |
| vntok wrote: | |
| Not sure what Apple VPs have to do with the tech but | |
| yeah, pretty much any core engineer you ask at Apple will | |
| tell you this. | |
| | |
| Here is a nice article with some info about what memory | |
| compression is and how it works: | |
| https://arstechnica.com/gadgets/2013/10/os-x-10-9/#page-17 | |
| | |
| It was a hard technical problem, but it has been pretty much | |
| solved since it debuted in 2012-2013. | |
| minimaxir wrote: | |
| Interestingly it was AI (Apple Intelligence) that was the | |
| primary reason Apple abandoned that hedge. | |
| dchest wrote: | |
| It's 4-bit quantized (Q4_K_M, 2.5 GB) and still works well | |
| for this task. It's amazing. I've been running various | |
| small models on this 8 GB Air since the first Llama and | |
| GPT-J, and they improved so much! | |
| | |
| macOS virtual memory does a good job of swapping stuff in and | |
| out of the SSD. | |
| karmakaze wrote: | |
| Nice. Ironically well suited for non-Apple Intelligence. | |
| incognito124 wrote: | |
| > I'm going to download it with Safari | |
| | |
| Oof you were NOT joking | |
| noman-land wrote: | |
| Safari to download LM Studio. LM Studio to download models. | |
| Models to download Firefox. | |
| teaearlgraycold wrote: | |
| The modern ninite | |
| sneak wrote: | |
| I already got one of these. I'm spoiled by Claude 4 Opus; local | |
| LLMs are slower and lower quality. | |
| | |
| I haven't been using it much. All it has on it is LM Studio, | |
| Ollama, and Stats.app. | |
| | |
| > _Can't wait for it to arrive and crank up LM Studio. It's | |
| literally the first install. I'm going to download it with | |
| safari._ | |
| | |
| lol, yup. same. | |
| chisleu wrote: | |
| Yup, I'm spoiled by Claude 3.7 Sonnet right now. I had to | |
| stop using opus for plan mode in my Agent because it is just | |
| so expensive. I'm using Gemini 2.5 pro for that now. | |
| | |
| I'm considering ordering one of these today: https://www.newe | |
| gg.com/p/N82E16816139451?Item=N82E1681613945... | |
| | |
| It looks like it will hold 5 GPUs with a single slot open for | |
| infiniband | |
| | |
| Then local models might be lower quality, but it won't be | |
| slow! :) | |
| kristopolous wrote: | |
| The GPUs are the hard things to find unless you want to pay | |
| like 50% markup | |
| sneak wrote: | |
| That's just what they cost; MSRP is irrelevant. They're | |
| not hard to find, they're just expensive. | |
| evo_9 wrote: | |
| I was using Claude 3.7 exclusively for coding, but it sure | |
| seems like it got worse suddenly about 2-3 weeks back. It | |
| went from writing pretty solid code I had to make only | |
| minor changes to, to being completely off the rails: | |
| altering files unrelated to my prompt, undoing fixes from | |
| the same conversation, reinventing db access, and ignoring | |
| the coding 'standards' established in the existing | |
| codebase. It became so untrustworthy that I finally gave | |
| OpenAI o3 a try and honestly, I was pretty surprised how | |
| solid it has been. I've been using o3 since, and I find it | |
| generally does exactly what I ask, especially if you have a | |
| well-established project with plenty of code for it to | |
| reference. | |
| | |
| Just wondering if Claude 3.7 has seemed different lately to | |
| anyone else? It was my go-to for several months, and I'm no | |
| fan of OpenAI, but o3 has been rock solid. | |
| jessmartin wrote: | |
| Could be the prompt and/or tool descriptions in whatever | |
| tool you are using Claude in that degraded. Have | |
| definitely noticed variance across Cursor, Claude Code, | |
| etc even with the exact same models. | |
| | |
| Prompts + tools matter. | |
| esskay wrote: | |
| Cursor became awful over the last few weeks, so it's likely | |
| them. No idea what they did to their prompt, but it's just | |
| been incredibly poor at most tasks regardless of which model | |
| you pick. | |
| sneak wrote: | |
| Me too. (re: Claude; I haven't switched models.) It sucks | |
| because I was happily paying >$1k/mo in usage charges and | |
| then it all went south. | |
| sneak wrote: | |
| I'm firehosing about $1k/mo at Cursor on pay-as-you-go and | |
| am happy to do it (it's delivering 2-10k of value each | |
| month). | |
| | |
| What cards are you gonna put in that chassis? | |
| teaearlgraycold wrote: | |
| What are you going to do with the LLMs you run? | |
| chisleu wrote: | |
| Currently I'm using gemini 2.5 and claude 3.7 sonnet for | |
| coding tasks. | |
| | |
| I'm interested in using models for code generation, but I'm | |
| not expecting much in that regard. | |
| | |
| I'm planning to attempt fine tuning open source models on | |
| certain tool sets, especially MCP tools. | |
| prettyblocks wrote: | |
| I've been using openwebui and am pretty happy with it. Why do | |
| you like lm studio more? | |
| truemotive wrote: | |
| Open WebUI can leverage the built in web server in LM Studio, | |
| just FYI in case you thought it was primarily a chat | |
| interface. | |
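| For reference, a rough sketch of wiring the two together, | |
| assuming LM Studio's server on its default port 1234 and Open | |
| WebUI's documented OpenAI env vars (container tag and ports | |
| are the standard quickstart ones): | |
|     docker run -d -p 3000:8080 \ | |
|       --add-host=host.docker.internal:host-gateway \ | |
|       -e OPENAI_API_BASE_URL=http://host.docker.internal:1234/v1 \ | |
|       -e OPENAI_API_KEY=lm-studio \ | |
|       ghcr.io/open-webui/open-webui:main | |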
| prophesi wrote: | |
| Not OP, but with LM Studio I get a chat interface out-of-the- | |
| box for local models, while with openwebui I'd need to | |
| configure it to point to an OpenAI API-compatible server | |
| (like LM Studio). It can also help determine which models | |
| will work well with your hardware. | |
| | |
| LM Studio isn't FOSS though. | |
| | |
| I did enjoy hooking up OpenWebUI to Firefox's experimental AI | |
| Chatbot. (browser.ml.chat.hideLocalhost to false, | |
| browser.ml.chat.provider to localhost:${openwebui-port}) | |
| s1mplicissimus wrote: | |
| i recently tried openwebui but it was so painful to get it to | |
| run with local model. that "first run experience" of lm | |
| studio is pretty fire in comparison. can't really talk about | |
| actually working with it though, still waiting for the 8GB | |
| download | |
| prettyblocks wrote: | |
| Interesting. I run my local llms through ollama and it's | |
| zero trouble to get that working in openwebui as long as | |
| the ollama server is running. | |
| diggan wrote: | |
| I think that's the thing: just running Ollama (fiddling | |
| around with terminals) is more complicated than the full | |
| E2E of chatting in LM Studio. | |
| | |
| Of course, for folks used to terminals, daemons and so on | |
| it makes sense from the get go, but for others it | |
| seemingly doesn't, and it doesn't help that Ollama | |
| refuses to communicate what people should understand | |
| before trying to use it. | |
| noman-land wrote: | |
| I love LM Studio. It's a great tool. I'm waiting for another | |
| generation of Macbook Pros to do as you did :). | |
| imranq wrote: | |
| I'd love to host my own LLMs but I keep getting held back from | |
| the quality and affordability of Cloud LLMs. Why go local | |
| unless there's private data involved? | |
| mycall wrote: | |
| Offline is another use case. | |
| seanmcdirmid wrote: | |
| Nothing like playing around with LLMs on an airplane | |
| without an internet connection. | |
| asteroidburger wrote: | |
| If I can afford a seat above economy with room to | |
| actually, comfortably work on a laptop, I can afford the | |
| couple bucks for wifi for the flight. | |
| seanmcdirmid wrote: | |
| If you are assuming that your Hainan airlines flight has | |
| wifi that isn't behind the GFW, even outside of cattle | |
| class, I have some news for you... | |
| sach1 wrote: | |
| Getting around the GFW is trivially easy. | |
| seanmcdirmid wrote: | |
| ya ya, just buy a VPN, pay the yearly subscription, and | |
| then have them disappear the week after you paid. Super | |
| trivially frustrating. | |
| vntok wrote: | |
| VPN providers are first and foremost trust businesses. | |
| Why would you choose and pay one that is not well | |
| established and trusted? Mine have been there for more | |
| than a decade by now. | |
| | |
| Alternatively, you could just set up your own (cheaper?) | |
| VPN relay on the tiniest VPS you can rent on AWS or IBM | |
| Cloud, right? | |
| MangoToupe wrote: | |
| Woah there Mr Money, slow down with these assumptions. A | |
| computer is worth the investment. But paying a cent extra | |
| to airlines? Unacceptable. | |
| diggan wrote: | |
| Some of us don't have the most reliable ISPs or even | |
| network infrastructure, and I say that as someone who | |
| lives in Spain :) I live outside a huge metropolitan area | |
| and Vodafone fiber went down twice this year, not even | |
| counting the time the country's electricity grid was down | |
| for like 24 hours. | |
| PeterStuer wrote: | |
| Same. For 'sovereignty' reasons I will eventually move to | |
| local processing, but for now, in development/prototyping, | |
| the gap with hosted LLMs seems too wide. | |
| diggan wrote: | |
| There are some use cases I use LLMs for where I don't care a | |
| lot about the data being private (although that's a plus) but | |
| I don't want to pay XXXEUR for classifying some data and I | |
| particularly don't want to worry about having to pay that | |
| _again_ if I want to redo it with some changes. | |
| | |
| Using local LLMs for this I don't worry about the price at | |
| all, I can leave it doing three tries per "task" without | |
| tripling the cost if I wanted to. | |
| | |
| It's true that there is an upfront cost but way easier to get | |
| over that hump than on-demand/per-token costs, at least for | |
| me. | |
| zackify wrote: | |
| I love LM Studio, but I'd never waste 12k like that. The memory | |
| bandwidth is too low, trust me. | |
| | |
| Get the RTX Pro 6000 for 8.5k with double the bandwidth. It | |
| will be way better | |
| marci wrote: | |
| You can't run deepseek-v3/r1 on the RTX Pro 6000, not to | |
| mention the upcoming 1 million context qwen models, or the | |
| current qwen3-235b. | |
| tymscar wrote: | |
| Why would they pay 2/3 of the price for something with 1/5 | |
| of the RAM? | |
| | |
| The whole point of spending that much money for them is to | |
| run massive models, like the full R1, which the Pro 6000 can't. | |
| zackify wrote: | |
| Because waiting forever for initial prompt processing, with a | |
| realistic number of MCP tools enabled on a prompt, is going | |
| to suck without the most bandwidth possible. | |
| | |
| And you are never going to sit around waiting for anything | |
| larger than the 96+ GB of RAM that the RTX Pro has. | |
| | |
| If you're using it for background tasks and not coding it's | |
| a different story | |
| johndough wrote: | |
| If the MCP tools come first in the conversation, it | |
| should be technically possible to cache the activations | |
| so you do not have to recompute them each time. | |
| pests wrote: | |
| Initial prompt processing with a large static context | |
| (system prompt + tools + whatever) could technically be | |
| improved by checkpointing the model state and reusing for | |
| future prompts. Not sure if any tools support this. | |
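| For what it's worth, llama.cpp-based stacks already expose | |
| this kind of prefix reuse. A minimal sketch with | |
| llama-cpp-python (model path, prefix contents and the exact | |
| cache API are assumptions, just to illustrate the idea): | |
|     from llama_cpp import Llama, LlamaRAMCache | |
| | |
|     # Keep the static part (system prompt + tool definitions) at | |
|     # the front of every request so the cached KV state for that | |
|     # prefix can be reused instead of recomputed. | |
|     llm = Llama(model_path="model.gguf", n_ctx=8192)  # placeholder | |
|     llm.set_cache(LlamaRAMCache())  # in-RAM prompt/state cache | |
| | |
|     STATIC_PREFIX = "...system prompt and tool defs..."  # placeholder | |
| | |
|     def answer(user_prompt: str) -> str: | |
|         out = llm.create_completion(STATIC_PREFIX + user_prompt, | |
|                                     max_tokens=256) | |
|         return out["choices"][0]["text"] | |
| | |
|     # The first call pays the full prompt-processing cost; later | |
|     # calls only process the tokens after the shared prefix. | |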
| tucnak wrote: | |
| https://docs.vllm.ai/projects/production- | |
| stack/en/latest/tut... | |
| storus wrote: | |
| M3 Ultra GPU is around 3070-3080 for the initial token | |
| processing. Not great, not terrible. | |
| MangoToupe wrote: | |
| > And you are never going to sit around waiting for | |
| anything larger than the 96+gb of ram that the RTX pro | |
| has. | |
| | |
| Am I the only person that gives aider instructions and | |
| leaves it alone for a few hours? This doesn't seem that | |
| difficult to integrate into my workflow. | |
| diggan wrote: | |
| > Am I the only person that gives aider instructions and | |
| leaves it alone for a few hours? | |
| | |
| Probably not, but in my experience, if it takes longer | |
| than 10-15 minutes it's either stuck in a loop or down | |
| the wrong rabbit hole. But I don't use it for vibe coding | |
| or anything "big scope" like that, but more focused | |
| changes/refactors so YMMV | |
| t1amat wrote: | |
| (Replying to both siblings questioning this) | |
| | |
| If the primary use case is input heavy, which is true of | |
| agentic tools, there's a world where partial GPU offload with | |
| many channels of DDR5 system RAM leads to an overall better | |
| experience. A good GPU will process input many times faster, | |
| and with good RAM you might end up with decent output speed | |
| still. Seems like that would come in close to $12k? | |
| | |
| And there would be no competition for models that do fit | |
| entirely inside that VRAM, for example Qwen3 32B. | |
| storus wrote: | |
| RTX Pro 6000 can't do DeepSeek R1 671B Q4, you'd need 5-6 of | |
| them, which makes it way more expensive. Moreover, MacStudio | |
| will do it at 150W whereas Pro 6000 would start at 1500W. | |
| diggan wrote: | |
| > Moreover, MacStudio will do it at 150W whereas Pro 6000 | |
| would start at 1500W. | |
| | |
| No, Pro 6000 pulls max 600W, not sure where you get 1500W | |
| from, that's more than double the specification. | |
| | |
| Besides, what is the token/second or second/token, and | |
| prompt processing speed for running DeepSeek R1 671B on a | |
| Mac Studio with Q4? Curious about those numbers, because I | |
| have a feeling they're very far off each other. | |
| smcleod wrote: | |
| RTX is nice, but it's memory-limited and requires a full | |
| desktop machine to run it in. I'd take slower inference | |
| (as long as it's not less than 15 tk/s) for more memory any | |
| day! | |
| diggan wrote: | |
| I'd love to see more Very-Large-Memory Mac Studio | |
| benchmarks for prompt processing and inference. The few | |
| benchmarks I've seen either failed to take prompt | |
| processing into account, didn't share the exact | |
| weights+setup used, or showed really abysmal performance. | |
| storus wrote: | |
| If the rumors about splitting CPU/GPU in new Macs are true, | |
| your Mac Studio will be the last one capable of running DeepSeek | |
| R1 671B Q4. It looks like Apple had an accidental winner that | |
| will go away with the end of unified RAM. | |
| phren0logy wrote: | |
| I have not heard this rumor. Source? | |
| prophesi wrote: | |
| I believe they're talking about the rumors by an Apple | |
| supply chain analyst, Ming-Chi Kuo. | |
| | |
| https://www.techspot.com/news/106159-apple-m5-silicon- | |
| rumore... | |
| diggan wrote: | |
| Seems Apple is waking up to the fact that if it's too | |
| easy to run weights locally, there really isn't much | |
| sense in having their own remote inference endpoints, so | |
| time to stop the party :) | |
| whatevsmate wrote: | |
| I did this a month ago and don't regret it one bit. I had a | |
| long laundry list of ML "stuff" I wanted to play with or | |
| questions to answer. There's no world in which I'm paying by | |
| the request, or token, or whatever, for hacking on fun | |
| projects. Keeping an eye on the meter is the opposite of having | |
| fun and I have absolutely nowhere I can put a loud, hot GPU | |
| (that probably has "gamer" lighting no less) in my fam's small | |
| apartment. | |
| datpuz wrote: | |
| I genuinely cannot wrap my head around spending this much money | |
| on hardware that is dramatically inferior to hardware that | |
| costs half the price. MacOS is not even great anymore, they | |
| stopped improving their UX like a decade ago. | |
| minimaxir wrote: | |
| LM Studio has quickly become the best way to run local LLMs on an | |
| Apple Silicon Mac: no offense to vllm/ollama and other terminal- | |
| based approaches, but LLMs have _many_ levers for tweaking output | |
| and sometimes you need a UI to manage it. Now that LM Studio | |
| supports MLX models, it's one of the most efficient too. | |
| | |
| I'm not bullish on MCP, but at the least this approach gives a | |
| good way to experiment with it for free. | |
| nix0n wrote: | |
| LM Studio is quite good on Windows with Nvidia RTX also. | |
| boredemployee wrote: | |
| care to elaborate? i have rtx 4070 12gb vram + 64gb ram, i | |
| wonder what models I can run with it. Anything useful? | |
| nix0n wrote: | |
| LM Studio's model search is pretty good at showing what | |
| models will fit in your VRAM. | |
| | |
| For my 16gb of VRAM, those models do not include anything | |
| that's good at coding, even when I provide the API | |
| documents via PDF upload (another thing that LM Studio | |
| makes easy). | |
| | |
| So, not really, but LM Studio at least makes it easier to | |
| find that out. | |
| boredemployee wrote: | |
| ok, ty for the reply! | |
| pzo wrote: | |
| I just wish they did some facelifting of the UI. Right now it | |
| is too colorful for me, with many different shades of similar | |
| colors. I wish they would copy a color palette from Google AI | |
| Studio, or from Trae or PyCharm. | |
| chisleu wrote: | |
| > I'm not bullish on MCP | |
| | |
| You gotta help me out. What do you see holding it back? | |
| minimaxir wrote: | |
| tl;dr the current hype around it is a solution looking for a | |
| problem and at a high level, it's just a rebrand of the Tools | |
| paradigm. | |
| mhast wrote: | |
| It's "Tools as a service", so it's really trying to make | |
| tool calling easier to use. | |
| ijk wrote: | |
| Near as I can tell it's supposed to make _calling other | |
| people's_ tools easier. But I don't want to spin up an | |
| entire server to invoke a calculator. So far it seems to | |
| make _building_ my own local tools harder, unless there's | |
| some guidebook I'm missing. | |
| xyc wrote: | |
| It's a protocol that doesn't dictate how you are calling | |
| the tool. You can use in-memory transport without needing | |
| to spin up a server. Your tool can just be a function, | |
| but with the flexibility of serving to other clients. | |
| ijk wrote: | |
| Are there any examples of that? All the documentation I | |
| saw seemed to be about building an MCP server, with very | |
| little about connecting an existing inference | |
| infrastructure to local functions. | |
| cchance wrote: | |
| You're not spinning up a whole server lol, most MCPs can | |
| be run locally and talked to over stdio. They're just | |
| apps that the LLM can call; what they talk to or do is up | |
| to the MCP writer. It's easier to have an MCP that | |
| communicates what it can do and handles the back and | |
| forth than to write non-standard middleware to handle, | |
| say, calls to an API, or using AppleScript, or VMware, or | |
| something else... | |
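| To make that concrete, a local tool with the official Python | |
| SDK (assuming the `mcp` package's FastMCP helper) is little | |
| more than a decorated function served over stdio: | |
|     from mcp.server.fastmcp import FastMCP | |
| | |
|     mcp = FastMCP("calculator")  # server name is arbitrary | |
| | |
|     @mcp.tool() | |
|     def add(a: float, b: float) -> float: | |
|         """Add two numbers.""" | |
|         return a + b | |
| | |
|     if __name__ == "__main__": | |
|         mcp.run()  # defaults to the stdio transport | |
| A host like LM Studio or Claude Desktop then launches that | |
| script as a subprocess and talks to it over stdin/stdout. | |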
| ijk wrote: | |
| I wish the documentation was clearer on that point; I | |
| went looking through their site and didn't see any | |
| examples that weren't oversimplified REST API calls. I | |
| imagine they might have updated it since then, or I | |
| missed something. | |
| zackify wrote: | |
| Ollama doesn't even have a way to customize the context size | |
| per model and persist it. LM studio does :) | |
| Anaphylaxis wrote: | |
| This isn't true. You can `ollama run {model}`, `/set | |
| parameter num_ctx {ctx}` and then `/save`. It's recommended to | |
| `/save {model}:{ctx}` to persist across model updates. | |
| truemotive wrote: | |
| This can be done with custom Modelfiles as well, I was | |
| pretty bent when I found out that 2048 was the default | |
| context length. | |
| | |
| https://ollama.readthedocs.io/en/modelfile/ | |
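| For example, a Modelfile along these lines (model tag is just | |
| an example) bakes a bigger context into a new model name: | |
|     FROM qwen3:8b | |
|     PARAMETER num_ctx 16384 | |
| and then `ollama create qwen3-16k -f Modelfile`. | |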
| zackify wrote: | |
| As of 2 weeks back, if I did this it would reset back the | |
| moment Cline made an API call, but LM Studio would work | |
| correctly. I'll have to try again. I even confirmed Cline was | |
| not overriding the num_ctx setting. | |
| visiondude wrote: | |
| LM Studio works surprisingly well on an M3 Ultra with 64GB | |
| running 27B models. | |
| | |
| Nice to have a local option, especially for some prompts. | |
| squanchingio wrote: | |
| It'll be nice to have the MCP servers exposed through LM | |
| Studio's OpenAI-like endpoints. | |
| patates wrote: | |
| What models are you using on LM Studio for what task and with how | |
| much memory? | |
| | |
| I have a 48GB MacBook Pro, and Gemma3 (one of the abliterated | |
| ones) fits my non-code use case perfectly (generating crime | |
| stories in which the reader tries to guess the killer). | |
| | |
| For code, I still call Google to use Gemini. | |
| robbru wrote: | |
| I've been using the Google Gemma QAT models in 4B, 12B, and 27B | |
| with LM Studio with my M1 Max. https://huggingface.co/lmstudio- | |
| community/gemma-3-12B-it-qat... | |
| t1amat wrote: | |
| I would recommend Qwen3 30B A3B for you. The MLX 4bit DWQ | |
| quants are fantastic. | |
| redman25 wrote: | |
| Qwen is great but for creative writing I think Gemma is a | |
| good choice. It has better EQ than Qwen IMO. | |
| api wrote: | |
| I wish LM Studio had a pure daemon mode. It's better than ollama | |
| in a lot of ways but I'd rather be able to use BoltAI as the UI, | |
| as well as use it from Zed and VSCode and aider. | |
| | |
| What I like about ollama is that it provides a self-hosted AI | |
| provider that can be used by a variety of things. LM Studio has | |
| that too, but you have to have the whole big chonky Electron UI | |
| running. Its UI is powerful but a lot less nice than e.g. BoltAI | |
| for casual use. | |
| SparkyMcUnicorn wrote: | |
| There's a "headless" checkbox in settings->developer | |
| diggan wrote: | |
| Still, you need to install and run the AppImage at least once | |
| to enable the "lms" CLI, which can then be used on its own. It | |
| would be nice to have a completely GUI-less installation/use | |
| method too. | |
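| Once that one-time bootstrap is done, the headless flow is | |
| roughly just the CLI (the model key is a placeholder): | |
|     lms server start   # start the local OpenAI-compatible server | |
|     lms ls             # list downloaded models | |
|     lms load <model-key> | |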
| t1amat wrote: | |
| The UI is the product. If you just want the engine, use | |
| mlx-omni-server (for MLX) or llama-swap (for GGUF) and | |
| huggingface-cli (for model downloads). | |
| diggan wrote: | |
| Those don't offer the same features as LM Studio itself | |
| does, even when you don't consider the UI. If there was a | |
| "LM Engine" CLI I could install, then yeah, but there | |
| isn't, hence the need to run the UI once to get "the | |
| engine". | |
| rhet0rica wrote: | |
| Oh, that horrible Electron UI. Under Windows it pegs a core on | |
| my CPU at all times! | |
| | |
| If you're just working as a single user via the OpenAI | |
| protocol, you might want to consider koboldcpp. It bundles a | |
| GUI launcher, then starts in text-only mode. You can also tell | |
| it to just run a saved configuration, bypassing the GUI; I've | |
| successfully run it as a system service on Windows using nssm. | |
| | |
| https://github.com/LostRuins/koboldcpp/releases | |
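| Roughly, that setup looks like the following (paths and the | |
| service name are placeholders, and the --config flag should be | |
| double-checked against koboldcpp's --help): | |
|     nssm install KoboldCpp "C:\llm\koboldcpp.exe" "--config C:\llm\server.kcpps" | |
|     nssm start KoboldCpp | |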
| | |
| Though there are a lot of roleplay-centric gimmicks in its | |
| feature set, its context-shifting feature is singular. It | |
| caches the intermediate state used by your last query, | |
| extending it to build the next one. As a result you save on | |
| generation time with large contexts, and also any conversation | |
| that has been pushed out of the context window still indirectly | |
| influences the current exchange. | |
| diggan wrote: | |
| > Oh, that horrible Electron UI. Under Windows it pegs a core | |
| on my CPU at all times! | |
| | |
| Worse, I'd say, considering what people use LM Studio for, is | |
| the VRAM it occupies even when the UI and everything is | |
| idle. Somehow, it's using 500MB of VRAM while doing nothing, | |
| while Firefox with ~60 active tabs is using 480MB. gnome- | |
| shell itself also sits around 450MB and is responsible for | |
| quite a bit more than LM Studio. | |
| | |
| Still, LM Studio is probably the best all-in-one GUI around | |
| for local LLM usage, unless you go terminal usage. | |
| b0a04gl wrote: | |
| claude going mcp over remote kinda normalised the protocol for | |
| inference routing. now with lmstudio running as a local mcp | |
| host, you can just tunnel it (cloudflared/ngrok), drop a tiny | |
| gateway script, and boom, your laptop basically acts like an | |
| mcp node in a hybrid mesh. short prompts hit local qwen, | |
| heavier ones go to claude. with the same payload and interface | |
| we can actually get multi-host local inference clusters wired | |
| together by mcp | |
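| a rough sketch of that kind of gateway, assuming LM Studio's | |
| OpenAI-compatible server on its default port 1234 and the | |
| anthropic SDK for the remote side (threshold and model names | |
| are arbitrary): | |
|     import anthropic | |
|     from openai import OpenAI | |
| | |
|     local = OpenAI(base_url="http://localhost:1234/v1", | |
|                    api_key="lm-studio")  # key is ignored locally | |
|     remote = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY | |
| | |
|     def route(prompt: str) -> str: | |
|         # arbitrary threshold: short prompts stay local, | |
|         # heavier ones go to claude | |
|         if len(prompt) < 2000: | |
|             r = local.chat.completions.create( | |
|                 model="qwen3-8b",  # whatever lm studio has loaded | |
|                 messages=[{"role": "user", "content": prompt}], | |
|             ) | |
|             return r.choices[0].message.content | |
|         r = remote.messages.create( | |
|             model="claude-sonnet-4-20250514",  # placeholder id | |
|             max_tokens=1024, | |
|             messages=[{"role": "user", "content": prompt}], | |
|         ) | |
|         return r.content[0].text | |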
| politelemon wrote: | |
| The initial experience with LMStudio and MCP doesn't seem to be | |
| great, I think their docs could do with a happy path demo for | |
| newcomers. | |
| | |
| Upon installing, the first model offered is google/gemma-3-12b, | |
| which in fairness is pretty decent compared to others. | |
| | |
| It's not obvious how to show the right sidebar they're talking | |
| about, it's the flask icon which turns into a collapse icon when | |
| you click it. | |
| | |
| I set the MCP up with Playwright and asked it to read the top | |
| headline from HN, and it got stuck in an infinite loop of | |
| navigating to Hacker News but doing nothing with the output. | |
| | |
| I wanted to try it out with a few other models, but figuring out | |
| how to download new models isn't obvious either, it turned out to | |
| be the search icon. Anyway other models didn't fare much better | |
| either, some outright ignored the tools despite having the | |
| capacity for 'tool use'. | |
| t1amat wrote: | |
| Gemma3 models can follow instructions but were not trained to | |
| call tools, which is the backbone of MCP support. You would | |
| likely have a better experience with models from the Qwen3 | |
| family. | |
| cchance wrote: | |
| That latter issue isn't an LM Studio issue... it's a model issue. | |
| Thews wrote: | |
| Others mentioned qwen3, which works fine with HN stories | |
| for me, but the comments still trip it up and it'll start | |
| thinking the comments are part of the original question after | |
| a while. | |
| | |
| I also tried the recent deepseek 8b distill, but it was much | |
| worse for tool calling than qwen3 8b. | |
| maxcomperatore wrote: | |
| good. | |
| v3ss0n wrote: | |
| Closed source - won't touch. | |
| xyc wrote: | |
| Great to see more local AI tools supporting MCP! Recently I've | |
| also added MCP support to recurse.chat. When running locally | |
| (LLaMA.cpp and Ollama) it still needs to catch up in terms of | |
| tool calling capabilities (for example tool call accuracy / | |
| parallel tool calls) compared to the well known providers but | |
| it's starting to get pretty usable. | |
| rshemet wrote: | |
| hey! we're building Cactus (https://github.com/cactus-compute), | |
| effectively Ollama for smartphones. | |
| | |
| I'd love to learn more about your MCP implementation. Wanna | |
| chat? | |
| zaps wrote: | |
| Not to be confused with FL Studio | |
| bbno4 wrote: | |
| Is there an app that uses OpenRouter / Claude or something | |
| locally but has MCP support? | |
| eajr wrote: | |
| I've been considering building this. Haven't found anything yet. | |
| cchance wrote: | |
| vscode with roocode... just use the chat window :S | |
| cedws wrote: | |
| I'm looking for something like this too. Msty is my favourite | |
| LLM UI (supports remote + local models) but unfortunately has | |
| no MCP support. It looks like they're trying to nudge people | |
| into their web SaaS offering which I have no interest in. | |
| jtreminio wrote: | |
| I've been wanting to try LM Studio but I can't figure out how to | |
| use it over local network. My desktop in the living room has the | |
| beefy GPU, but I want to use LM Studio from my laptop in bed. | |
| | |
| Any suggestions? | |
| skygazer wrote: | |
| Use an OpenAI-compatible API client on your laptop and LM | |
| Studio on your server, and point the client at the server. LM | |
| Studio can serve an LLM on a desired port using the OpenAI- | |
| style chat completion API. You can also install openwebui on | |
| your server and connect to it via a web browser, and configure | |
| it to use the LM Studio connection for its LLM. | |
| numpad0 wrote: | |
| [>_] -> [.* Settings] -> Serve on local network ( o) | |
| | |
| Any OpenAI-compatible client app should work - use IP address | |
| of host machine as API server address. API key can be bogus or | |
| blank. | |
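| e.g. from the laptop, something like this (the LAN address and | |
| model name are placeholders; 1234 is LM Studio's default | |
| server port): | |
|     from openai import OpenAI | |
| | |
|     client = OpenAI(base_url="http://192.168.1.50:1234/v1", | |
|                     api_key="anything")  # key is not checked | |
|     resp = client.chat.completions.create( | |
|         model="qwen3-8b",  # whichever model the desktop has loaded | |
|         messages=[{"role": "user", "content": "hello from the laptop"}], | |
|     ) | |
|     print(resp.choices[0].message.content) | |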
| sixhobbits wrote: | |
| MCP terminology is already super confusing, but this seems to | |
| just introduce "MCP Host" randomly in a way that makes no sense | |
| to me at all. | |
| | |
| > "MCP Host": applications (like LM Studio or Claude Desktop) | |
| that can connect to MCP servers, and make their resources | |
| available to models. | |
| | |
| I think everyone else is calling this an "MCP Client", so I'm not | |
| sure why they would want to call themselves a host - makes it | |
| sound like they are hosting MCP servers (definitely something | |
| that people are doing, even though often the server is run on the | |
| same machine as the client), when in fact they are just a client? | |
| Or am I confused? | |
| guywhocodes wrote: | |
| MCP Host is terminology from the spec. It's the software that | |
| makes llm calls, build prompts, interprets tool call requests | |
| and performs them etc. | |
| sixhobbits wrote: | |
| So it is, I stand corrected. I googled mcp host and the | |
| lmstudio link was the first result. | |
| | |
| Some more discussion on the confusion here https://github.com | |
| /modelcontextprotocol/modelcontextprotocol... where they | |
| acknowledge that most people call it a client and that that's | |
| ok unless the distinction is important. | |
| | |
| I think host is a bad term for it though as it makes more | |
| intuitive sense for the host to host the server and the | |
| client to connect to it, especially for remote MCP servers | |
| which are probably going to become the default way of using | |
| them. | |
| kreetx wrote: | |
| I'm with you on the confusion, it makes no sense at all to | |
| call it a host. MCP host should _host_ the MCP server (yes, | |
| I know - that is yet a separate term). | |
| | |
| The MCP standard seems a mess, e.g take this paragraph from | |
| here[1] | |
| | |
| > In the Streamable HTTP transport, the server operates as | |
| an independent process that can handle multiple client | |
| connections. | |
| | |
| Yes, obviously, that is what servers do. Also, what is | |
| "Streamable HTTP"? Comet, HTTP2, or even websockets? SSE | |
| _could be_ a candidate, but it isn't, as it says | |
| "Streamable HTTP" replaces SSE. | |
| | |
| > This transport uses HTTP POST and GET requests. | |
| | |
| Guys, POST and GET are verbs of the HTTP _protocol_; TCP is | |
| the transport. I guess they could say that they use the HTTP | |
| protocol, which _only_ uses POST and GET verbs (if that is | |
| the case). | |
| | |
| > Server can optionally make use of Server-Sent Events | |
| (SSE) to stream multiple server messages. | |
| | |
| This would make sense if there weren't the note "This | |
| replaces the HTTP+SSE transport" right below the title. | |
| | |
| > This permits basic MCP servers, as well as more feature- | |
| rich servers supporting streaming and server-to-client | |
| notifications and requests. | |
| | |
| Again, how is streaming implemented (what is "Streamable | |
| HTTP")? Also, "server-to-client ... requests"? SSE is | |
| unidirectional, so are those requests happening over | |
| secondary HTTP requests? | |
| | |
| -- | |
| | |
| And then the 2.0.1 Security Warning seems like a blob of | |
| words on security, no reference to maybe same-origin. Also, | |
| "for local servers bind to localhost and then implement | |
| proper authentication" - are both of those together ever | |
| required? Is it worth it to even say that servers should | |
| implement proper authentication? | |
| | |
| Anyway, reading the entire documentation one might be able | |
| to put a charitable version of the MCP puzzle together that | |
| might actually make sense. But it does seem that it isn't | |
| written by engineers, in which case I don't understand why, | |
| or whom it is written for. | |
| | |
| [1] https://modelcontextprotocol.io/specification/draft/bas | |
| ic/tr... | |
| diggan wrote: | |
| > But it does seem that it isn't written by engineers | |
| | |
| As far as I can tell, unsurprisingly, the MCP | |
| specification was written with the help of LLMs, and | |
| seemingly hasn't been carefully reviewed because as you | |
| say, a bunch of the terms have straight up wrong | |
| definitions. | |
| kreetx wrote: | |
| Using LLMs is entirely fine, but poor review for a | |
| protocol definition is ..degenerate. Aren't protocols | |
| supposed to be precise? | |
| remram wrote: | |
| It was written by one vendor for their own use. It is | |
| miles away from an RFC or "standard" | |
| qntty wrote: | |
| It's confusing but you just have to read the official docs | |
| | |
| https://modelcontextprotocol.io/specification/2025-03-26/arc... | |
| mkagenius wrote: | |
| On M1/M2/M3 Mac, you can use Apple Containers to automate[1] the | |
| execution of the generated code. | |
| | |
| I have one running locally with this config: | |
|     { | |
|       "mcpServers": { | |
|         "coderunner": { | |
|           "url": "http://coderunner.local:8222/sse" | |
|         } | |
|       } | |
|     } | |
| | |
| 1. CodeRunner: https://github.com/BandarLabs/coderunner (I am one | |
| of the authors) | |
| smcleod wrote: | |
| I really like LM Studio but their license / terms of use are very | |
| hostile. You're in breach if you use it for anything work related | |
| - so just be careful folks! | |
| jmetrikat wrote: | |
| great! it's very convenient to try mcp servers with local models | |
| that way. | |
| | |
| just added the `Add to LM Studio` button to the anytype mcp | |
| server, looks nice: https://github.com/anyproto/anytype-mcp | |
| b0dhimind wrote: | |
| I wonder how LM Studio and AnythingLLM will contrast, especially | |
| in the upcoming months... I like AnythingLLM's workflow editor. | |
| I'd like something to grow into for my doc-heavy job. I don't | |
| want to be installing and trying both. | |
___________________________________________________________________ | |
(page generated 2025-06-27 03:01 UTC) |