| [HN Gopher] Let's be honest, Generative AI isn't going all that ... | |
| ___________________________________________________________________ | |
| Let's be honest, Generative AI isn't going all that well | |
| Author : 7777777phil | |
| Score : 182 points | |
| Date : 2026-01-13 18:37 UTC (17 hours ago) | |
| web link (garymarcus.substack.com) | |
| w3m dump (garymarcus.substack.com) | |
| | daedrdev wrote: | |
| | This post is literally just 4 screenshots of articles, not even | |
| | its own commentary or discussion. | |
| | laughingcurve wrote: | |
| | Don't be too harsh, it's the most effort Gary has put into his | |
| | criticism in a while </s> | |
| | | |
| | I appreciate good critique but this is not it | |
| | sghiassy wrote: | |
| | LLMs help me read code 10x faster - I'll take the win and say | |
| | thanks | |
| | thechao wrote: | |
| | You're absolutely right! | |
| | | |
| | The irony of a five sentence article making giant claims isn't | |
| | lost on me. Don't get me wrong: I'm amenable to the _idea_ ; but, | |
| | y'know, my kids wrote longer essays in 4th grade. | |
| | emp17344 wrote: | |
| | Guessing this isn't going to be popular here, but he's right. AI | |
| | has some use cases, but isn't the world-changing paradigm shift | |
| | it's marketed as. It's becoming clear the tech is ultimately just | |
| | a tool, not a precursor to AGI. | |
| | teej wrote: | |
| | Is that the claim the OP is making? | |
| | sajithdilshan wrote: | |
| | not YET. | |
| | avaer wrote: | |
| | If AGI is ever going to happen, then it's definitionally a | |
| | precursor to it. | |
| | | |
| | So I'm not really sure how to parse your statement. | |
| | alex_young wrote: | |
| | I'm not sure I follow. What if LLMs are helpful but not | |
| | useful to AGI, but some other technology is? Seems likely. | |
| | avaer wrote: | |
| | The comment wasn't referencing LLMs, but generative AI. | |
| | | |
| | Even then, given the deep impact of LLMs and how many | |
| | people are using them already, it's a stretch to say LLMs | |
| | will have no effect on the development of AGI. | |
| | | |
| | I think it's pretty obvious that AGI requires something | |
| | more than LLMs, but I think it's equally obvious LLMs will | |
| | have been involved in its development somewhere, even if | |
| | just a stepping stone. So, a "precursor". | |
| | mattmaroon wrote: | |
| | Meanwhile, my cofounder is rewriting code we spent millions of | |
| | salary on in the past by himself in a few weeks. | |
| | | |
| | I myself am saving a small fortune on design and photography and | |
| | getting better results while doing it. | |
| | | |
| | If this is not all that well I can't wait until we get to | |
| | mediocre! | |
| | merlincorey wrote: | |
| | > Meanwhile, my cofounder is rewriting code we spent millions | |
| | of salary on in the past by himself in a few weeks. | |
| | | |
| | Code is not an asset it's a liability, and code that no one has | |
| | reviewed is even more of a liability. | |
| | | |
| | However, in the end, execution is all that matters so if you | |
| | and your cofounder are able to execute successfully with | |
| | mountains of generated code then it doesn't matter what assets | |
| | and liabilities you hold in the short term. | |
| | | |
| | The long term is a lot harder to predict in any case. | |
| | _vertigo wrote: | |
| | > Code is not an asset it's a liability, and code that no one | |
| | has reviewed is even more of a liability. | |
| | | |
| | Code that solves problems and makes you money is by | |
| | definition an asset. Whether or not the code in question does | |
| | those things remains to be seen, but code is not strictly a | |
| | liability or else no one would write it. | |
| | merlincorey wrote: | |
| | "Code is a liability. What the code does for you is an | |
| | asset." as quoted from | |
| | https://wiki.c2.com/?SoftwareAsLiability with Last edit | |
| | December 17, 2013. | |
| | | |
| | This discussion and distinction used to be well known, but | |
| | I'm happy to help some people become "one of today's lucky | |
| | 10,000" as quoted from https://xkcd.com/1053/ because it is | |
| | indeed much more interesting than the alternative approach. | |
| | sswatson wrote: | |
| | It's well known and also wrong. | |
| | | |
| | Delta's airplanes also require a great deal of | |
| | maintenance, and I'm sure they strive to have no more | |
| | than are necessary for their objectives. But if you talk | |
| | to one of Delta's accountants, they will be happy to | |
| | disabuse you of the notion that the planes are entered in | |
| | the books as a liability. | |
| | kortilla wrote: | |
| | Delta leases a big portion of its fleet, which makes your | |
| | example pretty bad. | |
| | simonsmithies wrote: | |
| | Not a terrible example. The planes delta owns are delta's | |
| | assets; the planes the leasing company owns are the | |
| | leasing company's assets. The point is, the code and the | |
| | planes are assets despite the maintenance required to | |
| | keep them in revenue-generating state. | |
| | OneMorePerson wrote: | |
| | It's possible for something to be both an asset and a | |
| | potential liability, it isn't strictly one or the other. | |
| | _heimdall wrote: | |
| | You're hinting at the underlying problem with the quote. | |
| | "Asset" in the quote reads, at least to me, in the | |
| | financial or accounting meaning of the term. "Liability" | |
| | reads, again to me, in the sense of potential risk rather | |
| than the financial meaning. It's apples and oranges. | |
| | Ygg2 wrote: | |
| | Liability is also an economic term. As in, "The bank's | |
| | assets (debt) are my liability, and my assets (house) are | |
| | the bank's liability." | |
| | | |
| | I don't think it's a wrong quote. Code's behavior is the | |
| | asset, and code's source is the liability. You want to | |
| | achieve maximum functionality for minimal source code | |
| | investment. | |
| | hshdhdhj4444 wrote: | |
| | If Delta was going bankrupt it would likely be able to | |
| | sell individual planes for the depreciated book value or | |
| | close to it. | |
| | | |
| | If a software company is going bankrupt, it's very | |
| | unlikely they will be able to sell code for individual | |
| | apps and services they may have written for much at all, | |
| | even if they might be able to sell the whole company for | |
| | something. | |
| | wouldbecouldbe wrote: | |
| | Developers that can't see the change are blind. | |
| | | |
| Just this week, Sun-Tue: I added a fully functional | |
| subscription model to an existing platform, built out bulk | |
| async elasticjs indexing for a huge database, and migrated a | |
| very large WordPress website to NextJS. 2.5 days of work that | |
| would have cost me at least a month two years ago. | |
| | fxtentacle wrote: | |
| | To me, this sounds like: | |
| | | |
| | AI is helping me solve all the issues that using AI has | |
| | caused. | |
| | | |
| | Wordpress has a pretty good export and Markdown is widely | |
| | supported. If you estimate 1 month of work to get that into | |
| | NextJS, then maybe the latter is not a suitable choice. | |
| | tengbretson wrote: | |
| | To me, this sounds like: | |
| | | |
| | If AI was good at a certain task then it was a bad task | |
| | in the first place. | |
| | | |
| | Which is just run of the mill dogmatic thinking. | |
| | serf wrote: | |
| | it's wild that somehow with regards to AI conversations | |
| | lately someone can say "I saved 3 months doing X" and | |
| | someone can willfully and thoughtfully reply "No you | |
| | didn't , you're wrong." without hesitation. | |
| | | |
| | I feel bad for AI opponents mostly because it seems like | |
| | the drive to be against the thing is stronger than the | |
| | drive towards fact or even kindness. | |
| | | |
| My $0.02: I am saving months of effort using AI tools to | |
| | fix old (PRE-AI, PREHISTORIC!) codebases that have | |
| | literally zero AI technical debt associated to them. | |
| | | |
| | I'm not going to bother with the charts & stats, you'll | |
| | just have to trust me and my opinion like humans must do | |
| | in lots of cases. I have lots of sharp knives in my | |
| | kitchen, too -- but I don't want to have to go slice my | |
| | hands on every one to prove to strangers that they are | |
| | indeed sharp -- you'll just have to take my word. | |
| | jbgt wrote: | |
| | Slice THEIR hands. They might say yours are rigged. | |
| | | |
| | I'm a non dev and the things I'm building blow me away. I | |
| | think many of these people criticizing are perhaps more | |
| | on the execution side and have a legitimate craft they | |
| | are protecting. | |
| | | |
| | If you're more on the managerial side, and I'd say a | |
| | trusting manager not a show me your work kind, then | |
| | you're more likely to be open and results oriented. | |
| | array_key_first wrote: | |
| | From a developer POV, or at least _my_ developer POV, | |
| | less code is always better. The best code is no code at | |
| | all. | |
| | | |
| | I think getting results can be very easy, at first. But I | |
| | force myself to not just spit out code, because I've been | |
| | burned so, so, so many times by that. | |
| | | |
| | As software grows, the complexity explodes. It's not | |
| | linear like the growth of the software itself, it feels | |
| | exponential. Adding one feature takes 100x the time it | |
| | should because everything is just squished together and | |
| | barely working. Poorly designed systems eventually bring | |
| | velocity to a halt, and you can eventually reach a point | |
| | where even the most trivial of changes are close to | |
| | impossible. | |
| | | |
| | That being said, there is value in throwaway code. After | |
| | all, what is an Excel workbook if not throwaway code? But | |
| | never let the throwaway become a product, or grow too | |
| | big. Otherwise, you become a prisoner. That cheeky little | |
| | Excel workbook can turn into a full-blown backend | |
| | application sitting on a share drive, and it WILL take | |
| | you a decade to migrate off of it. | |
| | mycall wrote: | |
| | You can use AI to simplify software stacks too, only your | |
| | imagination limits you. How do you see things working | |
| | with many less abstraction layers? | |
| | | |
| | I remember coding BASIC with POKE/PEEK assembly inside | |
| | it, same with Turbo Pascal with assembly (C/C++ has | |
| | similar extern abilities). Perhaps you want no more web | |
| | or UI (TUI?). Once you imagine what you are looking for, | |
| | you can label it and go from there. | |
| | wouldbecouldbe wrote: | |
| Yeah, AI is great at refactoring and cleaning things up; | |
| you just have to instruct it. I've improved my code | |
| significantly by asking it to clean up and refactor | |
| functions into pure ones that I can use and test, instead | |
| of a messy application. Without creating new bugs. | |
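The kind of refactor described above, pulling logic out into a pure function so it can be tested in isolation from a messy application, can be sketched as follows (a minimal Python illustration; the function names and the discount logic are hypothetical, not from the thread):

```python
# Before such a refactor, this logic would be tangled with I/O
# and application state. After it: a pure core plus a thin shell.

def apply_discount(subtotal: float, loyalty_years: int) -> float:
    """Pure core: same inputs always give the same output,
    so it can be unit-tested without any application context."""
    rate = min(0.05 * loyalty_years, 0.25)  # cap discount at 25%
    return round(subtotal * (1 - rate), 2)

def checkout(subtotal: float, loyalty_years: int) -> str:
    """Thin shell: formatting and side effects stay here,
    delegating the business rule to the pure core."""
    total = apply_discount(subtotal, loyalty_years)
    return f"Total due: ${total:.2f}"
```

The pure core is what an LLM (or a reviewer) can safely verify against a test suite, which is what makes this style of cleanup low-risk.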
| | wouldbecouldbe wrote: | |
| | You are assuming a lot of things. | |
| | | |
| | The work was moving the many landing pages & content | |
| | elements to NextJS, so we can test, iterate and develop | |
| | faster. While having a more stable system. This was a 10 | |
| | year old website, with a very large custom WordPress | |
| | codebase and many plugins. | |
| | | |
| | The content is still in WordPress backend & will be | |
| | migrated in the second phase. | |
| | Zababa wrote: | |
| | >Code is not an asset it's a liability | |
| | | |
| | This would imply companies could delete all their code and do | |
| | better, which doesn't seem true? | |
| | aprdm wrote: | |
| | lol same. I just wrote a bunch of diagrams with mermaid that | |
| would legit take me a week, and also did a mock of a UI for a | |
| frontend engineer that would have taken me another week... or | |
| | some designers. All of that in between meetings... | |
| | | |
| | Waiting for it to actually go well to see what else I can do ! | |
| | nonethewiser wrote: | |
| | The more I have this experience and read people maligning AI | |
| | for coding, the more I think the junior developers are | |
| | actually not the ones in danger. | |
| | daxfohl wrote: | |
| | Oh I've thought this for years. As an L7, basically my | |
| | primary role is to serve as someone to bounce ideas off of, | |
| | and to make recommendations based on experience. A chatbot, | |
| | with its virtually infinite supply of experience, could | |
| | ostensibly replace my role way sooner than it could a solid | |
| | junior/mid-level coder. The main thing it needs is a | |
| | consistent vision and direction that aligns with the needs | |
| | of nearby teams, which frankly sounds not all that hard to | |
| | write in code (I've been considering doing this). | |
| | | |
| | Probably the biggest gap would be the ability to ignite, | |
| | drive, and launch new initiatives. How does an AI agent | |
| | "lead" an engineering team? That's not something you can | |
| | code up in an agent runtime. It'd require a whole culture | |
| | change that I have a hard time seeing in reality. But of | |
| | course if there comes a point where AI takes all the junior | |
| | and mid-level coding jobs, then at that point there's no | |
| | culture to change, so staff/principal jobs would be just as | |
| | at risk. | |
| | TACIXAT wrote: | |
| | I have the complete opposite impression w.r.t. | |
| | architecture decisions. The LLMs can cargo cult an | |
| | existing design, but they do not think through design | |
| | consequences well at all. I use them as a rubber duck | |
| | non-stop, but I think I respect less than one out of | |
| | every six of their suggestions. | |
| | daxfohl wrote: | |
| | They've gotten pretty good IME so long as you guide it to | |
| | think out of the box, give it the right level of | |
| | background info, have it provide alternatives instead of | |
| | recommendations, and do your best not to bias it in any | |
| | particular direction. | |
| | | |
| | That said, the thing it really struggles with is when the | |
| | best approach is "do nothing". Which, given that a huge | |
| | chunk of principal level work is in deciding what NOT to | |
| | do, it may be a while before LLMs can viably take that | |
| | role. A principal LLM based on current tech would approve | |
| | every idea that comes across it, and moreover sell each | |
| | of them as "the exact best thing needed by the | |
| | organization right now!" | |
| | XenophileJKO wrote: | |
| | Knowing when to nudge it out of a rut (or say skip it) is | |
| | probably the biggest current skill. This is why | |
| | experienced people get generally much better results. | |
| | code_martial wrote: | |
| | I'm not sure. I keep asking the LLMs whether I should | |
| | rewrite project X in language Y and it just asks back, | |
| | "what's your problem?" And most of the times it shoots my | |
| | problems down showing exactly why rewriting won't fix | |
| | that particular problem. Heck, it even quoted Joel | |
| | Spolsky once! | |
| | | |
| | Of course, I could just _tell_ it to rewrite, but that's | |
| | different. | |
| | wombat-man wrote: | |
| | I have been able to prototype way faster. I can explain how I | |
| | want a prototype reworked and it's often successful. Doesn't | |
| | always work, but super useful more often than not. | |
| | windowpains wrote: | |
| | That line on the chart labeled "profit" is really going to go | |
| | up now! | |
| | segfaultex wrote: | |
| | Sounds like an argument for better hiring practices and | |
| | planning. | |
| | | |
| | Producing a lot of code isn't proof of anything. | |
| | sheeh wrote: | |
| | Yep. Let's see the projects and more importantly the | |
| | incremental returns... | |
| | fzeroracer wrote: | |
| | > Meanwhile, my cofounder is rewriting code we spent millions | |
| | of salary on in the past by himself in a few weeks. | |
| | | |
| | This is one of those statements that would horrify any halfway | |
| | competent engineer. A cowboy coder going in, seeing a bunch of | |
| | code and going 'I should rewrite this' is one of the biggest | |
| | liabilities to any stable system. | |
| | hactually wrote: | |
| | I assume this is because they're already insanely profitable | |
| | after hitting PMF and are now trying to bring down infra | |
| | costs? | |
| | | |
| | Right? RIGHT?! | |
| | habinero wrote: | |
| | Every professional SWE is going to stare off into the middle | |
| | distance, as they flashback to some PM or VP deciding to show | |
| | everyone they still got it. | |
| | | |
| | The "how hard could it be" fallacy claims another! | |
| | iwontberude wrote: | |
| | Definitely been in that room multiple times. | |
| | sheeh wrote: | |
| | As someone who is more involved in shaping the product | |
| | direction rather than engineering what composes the product | |
| | - I will readily admit many product people are utterly, | |
| | utterly clueless. | |
| | | |
| | Most people have no clue the craftsmanship, work etc it | |
| | takes to create a great product. LLMs are not going to | |
| | change this, in fact they serve as a distraction. | |
| | | |
| | I'm not a SWE so I gain nothing by being bearish on the | |
| | contributions of LLMs to the real economy ;) | |
| | habinero wrote: | |
| | Oh, it wasn't a bash on product people, I'm sorry if it | |
| | came off that way. | |
| | | |
| | It's a reference to a trope where the VP of Eng or CTO | |
| | (who was an engineer decades ago) gets it in their head | |
| | that they want to code again and writes something | |
| | absolute dogshit terrible because their skills have | |
| | degraded. Unfortunately they are your boss's boss's boss | |
| | and can make you deal with it anyways. | |
| | | |
| | I've actually seen it IRL once, to his credit the dude | |
| | finally realized the engineer smiles were pained grimaces | |
| | and it got quietly dropped lol. | |
| | bonesss wrote: | |
| | LLMs do the jobs of developers, thereby eating up countless | |
| | jobs. | |
| | | |
| | LLMs do the jobs of developers without telling semi- | |
| | technical arrogant MBA holders " _no, you're dumb_ ", | |
| | thereby creating all the same jobs as before but also a | |
| | butt-ton more juggling expensive cleanup mixed with ego- | |
| | massaging. | |
| | | |
| | We're talking a 2-10x improvement in 'how hard could it | |
| | be?' iterations. Consultant candy. | |
| | mattmaroon wrote: | |
| | My cofounder is an all the way competent engineer. Making | |
| | this many assumptions would horrify someone halfway competent | |
| | with logic though. | |
| | phito wrote: | |
| | It's crazy how some people here will just make all the | |
| | assumptions possible in order to refuse to believe you. | |
| | Anyone who's used a good model with open code or equivalent | |
| | will know that it's plausible. Refactoring is really cheap | |
| | now when paired with someone competent. | |
| | | |
| | I'm doing the same as your co-founder currently. In a few | |
| | days, I've rewritten old code that took previous employees | |
| | months to do. Their implementation sucked and barely | |
| | worked, the new one is so much better and has tests to | |
| | prove it. | |
| | nsoonhui wrote: | |
| | It's not directly comparable. The first time writing the code | |
| | is always the hardest because you might have to figure out the | |
| | requirements along the way. When you have the initial system | |
| | running for a while, doing a second one is easier because all | |
| | the requirements kinks are figured out. | |
| | | |
| | By the way, why does your co-founder have to do the rewrite at | |
| | all? | |
| | el_benhameen wrote: | |
| | I find the opposite to be true. Once you know the problem | |
| | you're trying to solve (which admittedly can be the biggest | |
| lift), writing the first cut of the code is fun, and you can | |
| | design the system and set precedent however you want. Once | |
| | it's in the wild, you have to work within the consequences of | |
| | your initial decisions, including bad ones. | |
| | touristtam wrote: | |
| | ... And the undocumented code spaghetti that might come | |
| with a codebase that was touched by numerous hands. | |
| | nonethewiser wrote: | |
| | You can compare it - just factor that in. And compare writing | |
| | it with AI vs. writing it without AI. | |
| | | |
| | We have no clue the scope of the rewrite but for anything | |
| | non-trivial, 2 weeks just isn't going to be possible without | |
| | AI. To the point of you probably not doing it at all. | |
| | | |
| | I have no idea why they are rewriting the code. That's | |
| | another matter. | |
| | bwestergard wrote: | |
| | Out of curiosity, what is your product? | |
| | venndeezl wrote: | |
| | I suspect he means as a trillion dollar corporation led | |
| | endeavor. | |
| | | |
| | I trained a small neural net on pics of a cat I had in the 00s | |
| | (RIP George, you were a good cat). | |
| | | |
| | Mounted a webcam I had gotten for free from somewhere, above | |
| | the cat door, in the exterior of the house. | |
| | | |
| If the neural net recognized my cat, it switched off an | |
| electromagnet holding the pet door locked. Worked perfectly | |
| until I moved out of the rental. | |
| | | |
| | Neural nets are, end of the day, pretty cool. It's the data | |
| | center business that's the problem. Just more landlords, | |
| | wannabe oligarchs, claiming ownership over anything they can | |
| | get the politicians to give them. | |
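The cat-door setup described above boils down to a simple control loop: classifier confidence gates the lock. Here is a minimal Python sketch, with the classifier and the electromagnet control stubbed out, since the original model and hardware are not specified in the comment:

```python
UNLOCK_THRESHOLD = 0.9  # hypothetical confidence cutoff

def classify_frame(frame: dict) -> float:
    """Stub for the trained neural net: returns the probability
    that the frame shows the owner's cat. A real version would
    run inference on the webcam image instead of reading a dict."""
    return frame.get("cat_score", 0.0)

def door_should_unlock(frame: dict, threshold: float = UNLOCK_THRESHOLD) -> bool:
    """Gate the lock on the classifier's confidence. In the real
    loop, True would cut power to the electromagnet holding the
    pet door shut; False would keep it locked."""
    return classify_frame(frame) >= threshold
```

The point of the anecdote stands either way: the interesting part is the tiny decision rule, not a data center.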
| | mschuster91 wrote: | |
| | The problem is... you're going to deprive yourself of the | |
| | talent chain in the long run, and so is everyone else who is | |
| | switching over to AI, both generative like ChatGPT and | |
| | transformative like the various translation, speech | |
| | recognition/transcription or data wrangling models. | |
| | | |
| | For now, it works out for companies - but forward to, say, ten | |
| | years in the future. There won't be new intermediates or | |
| | seniors any more to replace the ones that age out or quit the | |
| | industry entirely in frustration of them not being there for | |
| | actual creativity but to clean up AI slop, simply because there | |
| | won't have been a pipeline of trainees and juniors for a | |
| | decade. | |
| | | |
| | But by the time that plus the demographic collapse shows its | |
| | effects, the people who currently call the shots will be in | |
| | pension, having long since made their money. And my generation | |
| | will be left with collapse everywhere and find ways to somehow | |
| | keep stuff running. | |
| | | |
| | Hell, it's _already_ bad to get qualified human support these | |
| | days. Large corporations effectively rule with impunity, with | |
| | the only recourse consumers have being to either shell out | |
| | immense sums of money for lawyers and court fees or turning to | |
| | consumer protection /regulatory authorities that are being | |
| | gutted as we speak both in money and legal protections, or | |
| | being swamped with AI slop like "legal assistance" AI | |
| | hallucinating case law. | |
| | saxenaabhi wrote: | |
| | > There won't be new intermediates or seniors any more to | |
| | replace the ones that age out or quit the industry entirely | |
| | in frustration of them not being there for actual creativity | |
| | but to clean up AI slop, simply because there won't have been | |
| | a pipeline of trainees and juniors for a decade. | |
| | | |
| There are plenty of self-taught developers who didn't need | |
| any "traineeship". That proportion will increase even more | |
| with AI/LLMs and the fact that there are no more jobs for | |
| youngsters. And actually, from looking at the purely toxic | |
| comments on this thread, I would say it's a good thing for | |
| youngsters not to be exposed to such "seniors". | |
| | |
| Credentialism is dead. "Either ship or shut up" should be | |
| the mantra of this age. | |
| | nonethewiser wrote: | |
| | > Meanwhile, my cofounder is rewriting code we spent millions | |
| | of salary on in the past by himself in a few weeks. | |
| | | |
| | Why? | |
| | | |
| | Im not even casting shade - I think AI is quite amazing for | |
| | coding and can increase productivity and quality a lot. | |
| | | |
| | But I'm curious why he's doing this. | |
| | mattmaroon wrote: | |
| | The codebase is old and really hard to work on. It's a game | |
| | that existed pre-iPhone and still has decent revenue but | |
| | could use some updating. We intentionally shrank our company | |
| | down to auto-pilot mode and frankly don't even have a working | |
| | development environment anymore. | |
| | | |
| | It was basically cost prohibitive to change anything | |
| | significant until Claude became able to do most of the work | |
| | for us. My cofounder (also CTO of another startup in the | |
| | interim) found himself with a lot of time on his hands | |
| | unexpectedly and thought it would be a neat experiment and | |
| | has been wowed by the results. | |
| | | |
| | Much in the same way people on HN debate when we will have | |
| | self driving cars while millions of people actually have | |
| | their Teslas self-driving every day (it reminds me of when I | |
| | got to bet that Joe Biden would win the election after he | |
| | already did) those who think AI coding is years away are | |
| | missing what's happening now. It's a powerful force magnifier | |
| | in the hands of a skilled programmer and it'll only get | |
| | better. | |
| | idiotsecant wrote: | |
| | I agree that code is being written in _exactly_ the same | |
| | sense that Teslas are driving themselves. | |
| | ggfdh wrote: | |
| | Do you have tests at least? Seems reckless to yolo the | |
| | codebase if you don't or can't test easily. | |
| | wolvoleo wrote: | |
| | When I say I want a self driving car I mean one that | |
| | actually drives itself so I don't have to be involved other | |
| | than setting the destination. | |
| | | |
| | What Tesla is selling now is the worst of both worlds. You | |
| | still have to pay attention but it's way more boring so | |
| | it's really hard to do so. Well until it suddenly decides | |
| | to ram a barrier at highway speeds. | |
| | | |
| | Wake me up when I can have a beer and watch a movie while | |
| | it's driving. | |
| | vlod wrote: | |
| | >my cofounder is rewriting code we spent millions of salary on | |
| | in the past by himself in a few weeks. | |
| | | |
| | I was expecting a language reference (we all know which one), | |
| | to get more speed, safety and dare I say it "web scale" (insert | |
| | meme). :) | |
| | oenton wrote: | |
| | > and dare I say it "web scale" | |
| | | |
| | Obligatory reference | |
| | https://www.youtube.com/watch?v=b2F-DItXtZs | |
| | rf15 wrote: | |
| | no need to wait, by using AI you already are mediocre at best | |
| | (because you forego skill and quality for speed) | |
| | thefz wrote: | |
| | > Meanwhile, my cofounder is rewriting code we spent millions | |
| | of salary on in the past by himself in a few weeks. | |
| | | |
| | If the LLM generating the code introduced a bug, who will be | |
| | fixing it? The founder that does not know how to code or the | |
| | LLM that made the mistake first? | |
| | gloosx wrote: | |
| | >rewriting code | |
| | | |
| | Key thing here. The code was already written, so rewriting it | |
| | isn't exactly adding a lot of quantifiable value. If millions | |
| | weren't spent in the first place, there would be no code to | |
| | rewrite. | |
| | mawadev wrote: | |
| Doesn't this imply that you were not getting that level of | |
| efficiency out of your investment before? It's a little odd to | |
| say this publicly, as it says more about you and your company. | |
| | The question would be what your code does and if it is | |
| | profitable. | |
| | RamblingCTO wrote: | |
| | In this thread: people throwing shade on tech that works, | |
| | comparing it to a perfect world and making weird assumptions | |
| | like no tests, no E2E or manual testing just to make a case. | |
| | Hot take: most SWEs produce shit code, be it by constraints of | |
| | any kind or their own abilities. LLMs do the same but cost less | |
| | and can move faster. If you know how to use it, code will be | |
| | fine. Code is a commodity and a lot of people will be | |
| | blindsided by that in the future. If your value proposition is | |
| | translating requirements into code, I feel sorry for you. The | |
| | output quality of the LLM depends on the abilities of the | |
| | operator. And most SWEs lack the system thinking to be good | |
| | here, in my experience. | |
| | | |
| | As a fractional CTO and in my decade of being co-founder/CTO I | |
| | saw a lot of people and codebases and most of it is just bad. | |
| | You need to compare real life codebases and outputs of | |
| | developers, not what people wished it would be like. And the | |
| | reality is that most of it sucks and most SWEs are bad at their | |
| | jobs. | |
| | sjw987 wrote: | |
| | Good luck with fixing that future mess. This is such an | |
| | incredibly short sighted approach to running a company and | |
| | software dev that I think your cofounder is likely going to | |
| | torpedo your company. | |
| | adrian_b wrote: | |
| | All the productivity enhancement provided by LLMs for | |
| | programming is caused by circumventing the copyright | |
| | restrictions of the programs on which they have been trained. | |
| | | |
| | You and anyone else could have avoided spending millions for | |
| | programmer salaries, had you been allowed to reuse freely any | |
| | of the many existing proprietary or open-source programs that | |
| | solved the same or very similar problems. | |
| | | |
| | I would have no problem with everyone being able to reuse any | |
| | program, without restrictions, but with these AI programming | |
| | tools the rich are now permitted to ignore copyrights, while | |
| | the poor remain constrained by them, as before. | |
| | | |
| | The copyright for programs has caused a huge multiplication of | |
| | the programming effort for many decades, with everyone | |
| | rewriting again and again similar programs, in order for their | |
| | employing company to own the "IP". Now LLMs are exposing what | |
| | would have happened in an alternative timeline. | |
| | | |
| | The LLMs have the additional advantage of fast and easy | |
| | searching through a huge database of programs, but this | |
| | advantage would not have been enough for a significant | |
| | productivity increase over a competent programmer that would | |
| | have searched the same database by traditional means, to find | |
| | reusable code. | |
| | ehnto wrote: | |
| How come you need to rewrite millions of dollars' worth of code? | |
| | cadamsdotcom wrote: | |
| | G'day Matt from myself another person with a cofounder both | |
| | getting insane value out of AI and astounded at the attitudes | |
| | around HN. | |
| | | |
| | You sound like complete clones of us :-) | |
| | | |
| | We've been at it since July and have built what used to take | |
| | 3-5 people that long. | |
| | | |
| | To the haters: I use TDD and review every line of code, I'm not | |
| | an animal. | |
| | | |
| | There's just 2 of us but some days it feels like we command an | |
| | army. | |
| | smashed wrote: | |
| Should have used an LLM to proofread... LLMs still cannot be | |
| trusted? | |
| | warkdarrior wrote: | |
| | How dare you accuse Gary-Marcus-5.2-2025-12-11 of being an | |
| | LLM?? | |
| | tombert wrote: | |
| | I find it a bit odd that people are acting like this stuff is an | |
| | abject failure because it's not perfect yet. | |
| | | |
| | Generative AI, as we know it, has only existed ~5-6 years, and it | |
| | has improved substantially, and is likely to keep improving. | |
| | | |
| | Yes, people have probably been deploying it in spots where it's | |
| | not quite ready but it's myopic to act like it's "not going all | |
| | that well" when it's pretty clear that it actually _is_ going | |
| | pretty well, just that we need to work out the kinks. New | |
| | technology is always buggy for a while, and eventually it | |
| | becomes boring. | |
| | maccard wrote: | |
| | > Generative AI, as we know it, has only existed ~5-6 years, | |
| | and it has improved substantially, and is likely to keep | |
| | improving. | |
| | | |
| | Every 2/3 months we're hearing there's a new model that just | |
| | blows the last one out of the water for coding. Meanwhile, here | |
| | I am with Opus and Sonnet for $20/mo and it's regularly failing | |
| | at basic tasks, antigravity getting stuck in loops and burning | |
| | credits. We're talking "copy basic examples and don't | |
| | hallucinate APIs" here, not deep complicated system design | |
| | topics. | |
| | | |
| | It can one shot a web frontend, just like v0 could in 2023. But | |
| | that's still about all I've seen it work on. | |
| | BeetleB wrote: | |
| | > We're talking "copy basic examples and don't hallucinate | |
| | APIs" here, not deep complicated system design topics. | |
| | | |
| | If your metric is an LLM that can copy/paste without | |
| | alterations, and never hallucinate APIs, then yeah, you'll | |
| | always be disappointed with them. | |
| | | |
| | The rest of us learn how to be productive with them despite | |
| | these problems. | |
| | drewbug01 wrote: | |
| | > If your metric is an LLM that can copy/paste without | |
| | alterations, and never hallucinate APIs, then yeah, you'll | |
| | always be disappointed with them. | |
| | | |
| | I struggle to take comments like this seriously - yes, it | |
| | is very reasonable to expect these magical tools to copy | |
| | and paste something without alterations. How on _earth_ is | |
| | that an unreasonable ask? | |
| | | |
| | The whole discourse around LLMs is so utterly exhausting. | |
| | If I say I don't like them for almost any reason, I'm a | |
| | luddite. If I complain about their shortcomings, I'm just | |
| | using it wrong. If I try and use it the "right" way and it | |
| | still gets extremely basic things wrong, then my | |
| | expectations are too high. | |
| | | |
| | What, precisely, are they good for? | |
| | tombert wrote: | |
| | I think what they're best at right now is the initial | |
| | scaffolding work of projects. A lot of the annoying | |
| | bootstrap shit that I hate doing is actually generally | |
| | handled really well by Codex. | |
| | | |
| | I agree that there's definitely some overhype to them | |
| | right now. At least for the stuff I've done they have | |
| | gotten considerably better though, to a point where the | |
| | code it generates is often usable, if sub-optimal. | |
| | | |
| | For example, about three years ago, I was trying to get | |
| | ChatGPT to write me a C program to do a fairly basic | |
| | ZeroMQ program. It generated something that looked | |
| | correct, but it would crash pretty much immediately, | |
| | because it kept trying to use a pointer after free. | |
| | | |
| | I tried the same thing again with Codex about a week ago, | |
| | and it worked out of the box, and I was even able to get | |
| | it to do more stuff. | |
| | smithkl42 wrote: | |
| | I think it USED to be true that you couldn't really use | |
| | an LLM on a large, existing codebase. Our codebase is | |
| | about 2 million LOC, and a year ago you couldn't use an | |
| | LLM on it for anything but occasional small tasks. Now, | |
| | probably 90% of the code I commit each week was written | |
| | by Claude (and reviewed by me and other humans - and also | |
| | by Copilot and ZeroPath). | |
| | blibble wrote: | |
| | > What, precisely, are they good for? | |
| | | |
| | scamming people | |
| | viking123 wrote: | |
| | Also good for manufacturing consent in Reddit and other | |
| | places. Intelligence services busy with certain country | |
| | now, bots using LLMs to pump out insane amounts of | |
| | content to mold the information atmosphere. | |
| | ubercow13 wrote: | |
| | It seems like just such a weird and rigid way to evaluate | |
| | it? I am a somewhat reasonable human coder, but I can't | |
| | copy and paste a bunch of code without alterations from | |
| | memory either. Can someone still find a use for me? | |
| | falloutx wrote: | |
| | It's strong enough to replace humans at their jobs and weak | |
| | enough that it can't do basic things. It's a paradox. | |
| | Just learn to be productive with them. Pay $200/month and | |
| | work around its little quirks. /s | |
| | BeetleB wrote: | |
| | For a long time, I've wanted to write a blog post on why | |
| | programmers don't understand the utility of LLMs[1], | |
| | whereas non-programmers _easily_ see it. But I struggle | |
| | to articulate it well. | |
| | | |
| | The gist is this: Programmers view computers as | |
| | _deterministic_. They can't tolerate a tool that behaves | |
| | differently from run to run. They have a very binary view | |
| | of the world: If it can't satisfy this "basic" | |
| | requirement, it's crap. | |
| | | |
| | Programmers have made their career (and possibly life) | |
| | being experts at solving problems that greatly benefit | |
| | from determinism. A problem that doesn't - well either | |
| | that needs to be solved by sophisticated machine | |
| | learning, or by a human. They're trained on essentially | |
| | ignoring those problems - it's not their expertise. | |
| | | |
| | And so they get really thrown off when people use | |
| | computers in a nondeterministic way to solve a | |
| | deterministic problem. | |
| | | |
| | For everyone else, the world, and its solutions, are | |
| | mostly non-deterministic. When they solve a problem, or | |
| | when they pay people to solve a problem, the guarantees | |
| | are much lower. They don't expect perfection every time. | |
| | | |
| | When a normal human asks a programmer to make a change, | |
| | they understand that communication is lossy, and even if | |
| | it isn't, programmers make mistakes. | |
| | | |
| | Using a tool like an LLM is like any other tool. Or like | |
| | asking any other human to do something. | |
| | | |
| | For programmers, it's a cardinal sin if the tool is | |
| | unpredictable. So they dismiss it. For everyone else, | |
| | it's just another tool. They embrace it. | |
| | | |
| | [1] This, of course, is changing as they become better at | |
| | coding. | |
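What "being productive with them despite these problems" often amounts to in practice is wrapping the nondeterministic step in a deterministic check and retrying. A minimal sketch, assuming nothing beyond the comment above: the generator here is a random stub standing in for an LLM call, and every name is hypothetical.

```python
import random

def flaky_generator(prompt: str) -> str:
    """Stand-in for a nondeterministic code generator (hypothetical stub)."""
    return random.choice([
        "def add(a, b): return a - b",  # a plausible-looking wrong answer
        "def add(a, b): return a + b",  # the correct one
    ])

def passes_checks(src: str) -> bool:
    """Deterministic gate: run the candidate against known test cases."""
    ns = {}
    exec(src, ns)
    return ns["add"](2, 3) == 5 and ns["add"](-1, 1) == 0

def generate_checked(prompt: str, attempts: int = 20) -> str:
    """Retry the nondeterministic step until the deterministic check passes."""
    for _ in range(attempts):
        src = flaky_generator(prompt)
        if passes_checks(src):
            return src
    raise RuntimeError("no candidate passed the checks")
```

The point is the asymmetry: generation can be unreliable so long as verification is cheap and deterministic.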
| | maccard wrote: | |
| | I'm perfectly happy for my tooling to not be | |
| | deterministic. I'm not happy for it to make up solutions | |
| | that don't exist, and get stuck in loops because of that. | |
| | | |
| | I use LLMs, I code with a mix of antigravity and Claude | |
| | code depending on the task, but I feel like I'm living in | |
| | a different reality when the code I get out of these | |
| | tools _regularly just doesn't work, at all_. And to the | |
| | parents point, I'm doing something wrong for noticing | |
| | that? | |
| | BeetleB wrote: | |
| | If it were terrible, you wouldn't use them, right? Isn't | |
| | the fact that you continue to use AI coding tools a sign | |
| | that you find them a net positive? Or is it being imposed | |
| | on you? | |
| | | |
| | > And to the parents point, I'm doing something wrong for | |
| | noticing that? | |
| | | |
| | There's nothing wrong pointing out your experience. What | |
| | the OP was implying was he expects them to be able to | |
| | copy/paste reliably almost 100% of the time, and not | |
| | hallucinate. I was merely pointing out that he'll never | |
| | get that with LLMs, and that their inability to do so | |
| | isn't a barrier to getting productive use out of them. | |
| | maccard wrote: | |
| | I was the person who said it can't copy from examples | |
| | without making up APIs. | |
| | | |
| | > he'll never get that with LLMs, and that their | |
| | inability to do so isn't a barrier to getting productive | |
| | use out of them. | |
| | | |
| | This is _exactly_ what the comment thread we're in said - | |
| | and I agree with him. | |
| | | |
| | > The whole discourse around LLMs is so utterly exhausting. | |
| | If I say I don't like them for almost any reason, I'm a | |
| | luddite. If I complain about their shortcomings, I'm just | |
| | using it wrong. If I try and use it the "right" way and _it | |
| | still gets extremely basic things wrong, then my | |
| | expectations are too high._ | |
| | | |
| | > If it were terrible, you wouldn't use them, right? | |
| | Isn't the fact that you continue to use AI coding tools a | |
| | sign that you find them a net positive? Or is it being | |
| | imposed on you? | |
| | | |
| | You're putting words in my mouth here - I'm not saying | |
| | that they're terrible, I'm saying they're way, way, way | |
| | overhyped, their abilities are overblown, (look at this | |
| | post and the replies of people saying they're writing 90% | |
| | of code with claude and using AI tools to review it), but | |
| | when we challenge that, we're wrong. | |
| | habinero wrote: | |
| | > And so they get really thrown off when people use | |
| | computers in a nondeterministic way to solve a | |
| | deterministic problem | |
| | | |
| | Ah, no. This is wildly off the mark, but I think a lot of | |
| | people don't understand what SWEs actually do. | |
| | | |
| | We don't get paid to write code. We get paid to solve | |
| | problems. We're knowledge workers like lawyers or doctors | |
| | or other engineers, meaning we're the ones making the | |
| | judgement calls and making the technical decisions. | |
| | | |
| | In my current job, I tell my boss what I'm going to be | |
| | working on, not the other way around. That's not always | |
| | true, but it's mostly true for most SWEs. | |
| | | |
| | The flip side of that is I'm also held responsible. If I | |
| | write ass code and deploy it to prod, it's my ass that's | |
| | gonna get paged for it. If I take prod down and cause a | |
| | major incident, the blame comes to me. It's not hard to | |
| | come up with scenarios where your bad choices end up | |
| | costing the company enormous sums of money. Millions of | |
| | dollars for large companies. Fines. | |
| | | |
| | So no, it has nothing to do with non-determinism lol. We | |
| | deal with that all the time. (Machine learning is decades | |
| | old, after all.) | |
| | | |
| | It's evaluating things, weighing the benefits against the | |
| | risks and failure modes, and making a judgement call that | |
| | it's ass. | |
| | tombert wrote: | |
| | Sure, but think about what it's replacing. | |
| | | |
| | If you hired a human, it will cost you thousands a week. | |
| | Humans will also fail at basic tasks, get stuck in useless | |
| | loops, and you still have to pay them for all that time. | |
| | | |
| | For that matter, even if I'm not hiring anyone, _I_ will | |
| | still get stuck on projects and burn through the finite | |
| | number of hours I have on this planet trying to figure stuff | |
| | out and being wrong for a lot of it. | |
| | | |
| | It's not perfect yet, but these coding models, in my mind, | |
| | have gotten pretty good if you're specific about the | |
| | requirements, and even if it misfires fairly often, they can | |
| | still be _useful_, even if they're not perfect. | |
| | | |
| | I've made this analogy before, but to me they're like really | |
| | eager-to-please interns; not necessarily perfect, and there's | |
| | even a fairly high risk you'll have to redo a lot of their | |
| | work, but they can still be _useful_. | |
| | falloutx wrote: | |
| | I am an AI-skeptic but I would agree this looks impressive | |
| | from certain angles, especially if you're an early startup | |
| | (maybe) or you are very high up the chain and just want to | |
| | focus on cutting costs. On the other hand, if you are about | |
| | to be unemployed, this is less impressive. Can it replace a | |
| | human? I would say no, it still has a long way to go, but a | |
| | good salesman can convince executives that it can, and | |
| | that's all that matters. | |
| | tombert wrote: | |
| | I just think Jevons paradox [1]/Gustafson's Law [2] kind | |
| | of applies here. | |
| | | |
| | Maybe I shouldn't have used the word "replaced", as I | |
| | don't really think it's actually going to "replace" | |
| | people long term. I think it's likely to just lead to | |
| | higher output as these get better and better . | |
| | | |
| | [1] https://en.wikipedia.org/wiki/Jevons_paradox | |
| | | |
| | [2] https://en.wikipedia.org/wiki/Gustafson%27s_law | |
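The Jevons mechanism alluded to above can be sketched numerically. This is a toy model with made-up numbers, not anything from the thread: under constant-elasticity demand with elasticity above 1, halving the cost per task *raises* total resource use.

```python
def total_resource_use(price_per_task: float, elasticity: float = 1.5) -> float:
    """Constant-elasticity demand: tasks demanded = price ** -elasticity.
    Total resource use here is price * tasks demanded (spend tracks resources)."""
    tasks_demanded = price_per_task ** -elasticity
    return price_per_task * tasks_demanded

before = total_resource_use(1.0)  # baseline cost per coded task
after = total_resource_use(0.5)   # tooling halves the cost per task
# With elasticity > 1, the market demands so many more tasks that total
# resource use rises (~41% here) even though each task got cheaper.
```

With elasticity below 1 the opposite happens, which is why the paradox depends on how elastic demand for software actually is.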
| | falloutx wrote: | |
| | Not you, but the word "replaced" is being used all the | |
| | time. Even senior engineers are saying they use it as a | |
| | junior engineer, while we can easily hire junior | |
| | engineers (but execs don't want to). Jevons' paradox won't | |
| | work in software because users' wallets and time are | |
| | limited, and if software becomes too easy to build, it | |
| | becomes harder to sell. Normal people can have 5 | |
| | subscriptions, maybe 10, but they won't be going to 50 or | |
| | 100. I would say we have already exhausted users, with | |
| | all the bad practices. | |
| | xp84 wrote: | |
| | > On the other hand, if you are about to be unemployed, | |
| | this is less impressive | |
| | | |
| | > salesman can convince executives that it does | |
| | | |
| | I tend to think that reality will temper this trend as | |
| | the results develop. Replacing 10 engineers with one | |
| | engineer using Cursor will result in a vast velocity hit. | |
| | Replacing 5 engineers with 5 "agents" assigned to | |
| | autonomously implement features will result in a mess | |
| | eventually. (With current technology -- I have no idea | |
| | what even 2027 AI will do). At that point those | |
| | unemployed engineers will find their phones ringing off | |
| | the hook to come and clean up the mess. | |
| | | |
| | Not that unlike what happens in many situations where | |
| | they fire teams and offshore the whole thing to a team of | |
| | average developers 180 degrees of longitude away who | |
| | don't have any domain knowledge of the business or | |
| | connections to the stakeholders. The pendulum swings back | |
| | in the other direction. | |
| | maccard wrote: | |
| | You've missed my point here - I agree that gen AI has | |
| | changed everything and is useful, _but_ I disagree that | |
| | it's improved substantially - which is what the comment I | |
| | replied to claimed. | |
| | | |
| | Anecdotally I've seen no difference in model changes in the | |
| | last year, but going from LLM to Claude code (where we told | |
| | the LLMs they can use tools on our machines) was a game | |
| | changer. The improvement there was the agent loop and the | |
| | support for tools. | |
| | | |
| | In 2023 I asked v0.dev to one shot me a website for a | |
| | business I was working on and it did it in about 3 minutes. | |
| | I feel like we're still stuck there with the models. | |
| | tombert wrote: | |
| | In my experience it has gotten considerably better. When | |
| | I get it to generate C, it often gets the pointer logic | |
| | correct, which wasn't the case three years ago. Three | |
| | years ago, ChatGPT would struggle with even fairly | |
| | straightforward LaTeX, but now I can pretty easily get it | |
| | to generate pretty elaborate LaTeX and I have even had | |
| | good success generating LuaTeX. I've been able to fairly | |
| | successfully have it generate TLA+ spec from existing | |
| | code now, which didn't work even a year ago when I tried | |
| | it. | |
| | | |
| | Of course, sample size of one, so if you haven't gotten | |
| | those results then fair enough, but I've at least | |
| | observed it getting a lot better. | |
| | BeetleB wrote: | |
| | I've been coding with LLMs for less than a year. As I | |
| | mentioned to someone in email a few days ago: In the | |
| | first half, when an LLM solved a problem differently from | |
| | me, I would probe why and more often than not overrule | |
| | and instruct it to do it my way. | |
| | | |
| | Now it's reversed. More often than not its method is | |
| | better than mine (e.g. leveraging a better | |
| | function/library than I would have). | |
| | | |
| | In general, it's writing idiomatic mode much more often. | |
| | It's been many months since I had to correct it and tell | |
| | it to be idiomatic. | |
| | johnnienaked wrote: | |
| | Ya but what do you do when there are no humans left? | |
| | cudgy wrote: | |
| | Prompt for a human? | |
| | elzbardico wrote: | |
| | There's a subtle moment when you HAVE to take the driver's | |
| | wheel from the AI. All the issues I see are from people | |
| | insisting on using it far beyond the point where it stops | |
| | being useful. | |
| | | |
| | It is a helper, a partner; it is still not ready to go the | |
| | last mile. | |
| | xp84 wrote: | |
| | It's funny how many people don't get that. It's like adding | |
| | a pretty great senior or staff level engineer to sit on- | |
| | call next to every developer and assist them, for basically | |
| | free (I've never used any of the expensive stuff yet. Just | |
| | things like Copilot, Grok Code in JetBrains, just asking | |
| | Gemini to write bits of code for me). | |
| | | |
| | If you hired a staff engineer to sit next to me, and I just | |
| | had him/her write 100% of the code and never tried to | |
| | understand it, that would be an unwise decision on my part | |
| | and I'd have little room to complain about the times he | |
| | made mistakes. | |
| | maccard wrote: | |
| | As someone else said in this thread: | |
| | | |
| | > The whole discourse around LLMs is so utterly exhausting. | |
| | If I say I don't like them for almost any reason, I'm a | |
| | luddite. If I complain about their shortcomings, I'm just | |
| | using it wrong. If I try and use it the "right" way and it | |
| | still gets extremely basic things wrong, then my | |
| | expectations are too high. | |
| | | |
| | I'm perfectly happy to write code, to use these tools. I do | |
| | use them, and sometimes they work (well). Other times they | |
| | have catastrophic failures. But apparently it's my failure | |
| | for not understanding the tool or expecting too much of the | |
| | tool, while others are screaming from the rooftops about | |
| | how this new model changes everything (which happens every | |
| | 3 months at this point) | |
| | elzbardico wrote: | |
| | There's no silver bullet. I'm not a researcher, but I've | |
| | done my best to understand how these systems work-- | |
| | through books, video courses, and even taking underpaid | |
| | hourly work at a company that creates datasets for RLHF. | |
| | I spent my days fixing bugs step-by-step, writing notes | |
| | like, "Hmm... this version of the library doesn't support | |
| | protocol Y version 4423123423. We need to update it, then | |
| | refactor the code so we instantiate 'blah' and pass it to | |
| | 'foo' before we can connect." | |
| | | |
| | That experience gave me a deep appreciation for how | |
| | incredible LLMs are and the amazing software they can | |
| | power--but it also completely demystified them. So by all | |
| | means, let's use them. But let's also understand there | |
| | are no miracles here. Go back to Shannon's papers from | |
| | the '60s, and you'll understand that what seems to you | |
| | like "emerging behaviors" are quite explainable from an | |
| | information theory background. Learn how these models are | |
| | built. Keep up with the latest research papers. If you | |
| | do, you'll recognize their limitations before those | |
| | limitations catch you by surprise. | |
| | | |
| | There is no silver bullet. And if you think you've found | |
| | one, you're in for a world of pain. Worse still, you'll | |
| | never realize the full potential of these tools, because | |
| | you won't understand their constraints, their limits, or | |
| | their pitfalls. | |
| | maccard wrote: | |
| | > There is no silver bullet. And if you think you've | |
| | found one, you're in for a world of pain. Worse still, | |
| | you'll never realize the full potential of these tools, | |
| | because you won't understand their constraints, their | |
| | limits, or their pitfalls. | |
| | | |
| | See my previous comment (quoted below). | |
| | | |
| | > If I complain about their shortcomings, I'm just using | |
| | it wrong. If I try and use it the "right" way and it | |
| | still gets extremely basic things wrong, then my | |
| | expectations are too high. | |
| | | |
| | Regarding "there are no miracles here" | |
| | | |
| | Here are a few comments from this thread alone, | |
| | | |
| | - https://news.ycombinator.com/item?id=46609559 | |
| | - https://news.ycombinator.com/item?id=46610260 | |
| | - https://news.ycombinator.com/item?id=46609800 | |
| | - https://news.ycombinator.com/item?id=46611708 | |
| | | |
| | Here's a few from some older threads: | |
| | - https://news.ycombinator.com/item?id=46519851 | |
| | - https://news.ycombinator.com/item?id=46485304 | |
| | | |
| | There is a very vocal group who are telling us that there | |
| | _is_ a silver bullet. | |
| | Aurornis wrote: | |
| | You're doing exactly the thing that the parent commenter | |
| | pointed out: Complaining that they're not perfect yet as if | |
| | that's damning evidence of failure. | |
| | | |
| | We all know LLMs get stuck. We know they hallucinate. We know | |
| | they get things wrong. We know they get stuck in loops. | |
| | | |
| | There are two types of people: The first group learns to work | |
| | within these limits and adapt to using them where they're | |
| | helpful while writing the code when they're not. | |
| | | |
| | The second group gets frustrated every time it doesn't one- | |
| | shot their prompt and declares it all a big farce. Meanwhile | |
| | the rest of us are out here having fun with these tools, | |
| | however limited they are. | |
| | maccard wrote: | |
| | Someone else said this perfectly farther down: | |
| | | |
| | > The whole discourse around LLMs is so utterly exhausting. | |
| | If I say I don't like them for almost any reason, I'm a | |
| | luddite. If I complain about their shortcomings, I'm just | |
| | using it wrong. If I try and use it the "right" way and it | |
| | still gets extremely basic things wrong, then my | |
| | expectations are too high. | |
| | | |
| | As I've said, I use LLMs, and I use tools that are assisted | |
| | by LLMs. They help. But they don't work anywhere near as | |
| | reliably as people talk about them working. And that hasn't | |
| | changed in the 18 months since I first prompted v0 to make | |
| | me a website. | |
| | vips7L wrote: | |
| | Rather be a Luddite than contribute to these soul suckers | |
| | like OpenAI and help them lay off workers. | |
| | Gud wrote: | |
| | How are they "soul suckers"? | |
| | | |
| | Using LLMs has made it fun for me to make software again. | |
| | nonethewiser wrote: | |
| | >Every 2/3 months we're hearing there's a new model that just | |
| | blows the last one out of the water for coding | |
| | | |
| | I haven't heard that at all. I hear about models that come | |
| | out and are a bit better. And other people saying they suck. | |
| | | |
| | >Meanwhile, here I am with Opus and Sonnet for $20/mo and | |
| | it's regularly failing at basic tasks, antigravity getting | |
| | stuck in loops and burning credits. | |
| | | |
| | Is it bringing you any value? I find it speeds things up a | |
| | LOT. | |
| | user34283 wrote: | |
| | I have a hard time believing that this v0, from 2023, | |
| | achieved comparable results to Gemini 3 in Web design. | |
| | | |
| | Gemini now often produces output that looks significantly | |
| | better than what I could produce manually, and I'm a web | |
| | expert, although my expertise is more in tooling and package | |
| | management. | |
| | jbs789 wrote: | |
| | Because the likes of Altman have set short term expectations | |
| | unrealistically high. | |
| | tombert wrote: | |
| | I mean that's every tech company. | |
| | | |
| | I made a joke once after the first time I watched one of | |
| | those Apple announcement shows in 2018, where I said "it's | |
| | kind of sad, because there won't be any problems for us to | |
| | solve because the iPhone XS Max is going to solve all of | |
| | them". | |
| | | |
| | The US economy is pretty much a big vibes-based Ponzi scheme | |
| | now, so I don't think we can single-out AI, I think we have | |
| | to blame the fact that the CEOs running these things face no | |
| | negative consequences for lying or embellishing _and_ they do | |
| | get rewarded for it because it will often bump the stock | |
| | price. | |
| | | |
| | Is Tesla _really_ worth more than every other car company | |
| | combined in any kind of objective sense? I don't think so, I | |
| | think people really like it when Elon lies to them about | |
| | stuff that will come out "next year", and they feel no need | |
| | to punish him economically. | |
| | Terr_ wrote: | |
| | "Ponzi" requires records fraud and is popularly misused, | |
| | sort of like if people started describing every software | |
| | bug as "a stack overflow." | |
| | | |
| | I'd rather characterize it as extremes of Greater Fool | |
| | Theory. | |
| | | |
| | https://en.wikipedia.org/wiki/Greater_fool_theory | |
| | tombert wrote: | |
| | I would argue it's fraud-adjacent. These tech CEOs know | |
| | that they're not going to be able to keep the promises | |
| | that they're making. It's dishonest at the very least, if | |
| | it doesn't legally constitute "fraud". | |
| | hamdingers wrote: | |
| | I maintain that most anti-AI sentiment is actually anti- | |
| | lying-tech-CEO sentiment misattributed. | |
| | | |
| | The technology is neat, the people selling it are ghouls. | |
| | acdha wrote: | |
| | Exactly: the technology is useful, but the executive class | |
| | is hyping it as close to AGI because their buddies are | |
| | slavering for layoffs. If that "when do you get fired?" | |
| | tone wasn't behind the conversation, I think a lot of | |
| | people would be interested in applying LLMs to the smaller | |
| | subset of things they actually perform well at. | |
| | tombert wrote: | |
| | Maybe CEOs should face consequences for going on the | |
| | stage and outwardly lying. Instead they're rewarded by a | |
| | bump in stock price because people appear to have | |
| | amnesia. | |
| | wolvoleo wrote: | |
| | For me it's mostly about the subset of things that LLMs | |
| | suck at but still rammed in everywhere because someone | |
| | wants to make a quick buck. | |
| | | |
| | I know it's good tech for some stuff, just not for | |
| | everything. It's the same with previous bubbles. VR is | |
| | really great for some things but we were never going to | |
| | work with a headset on 8 hours a day. Bitcoin is pretty | |
| | cool but we were never going to do our shopping list on | |
| | Blockchain. I'm just so sick of hype. | |
| | | |
| | But I do think it's good tech, just like I enjoy VR daily | |
| | I do have my local LLM servers (I'm pretty anti cloud so | |
| | I avoid it unless I really need the power) | |
| | | |
| | It's not really about the societal impacts for me, at | |
| | least not yet, it's just not good enough for that yet. I | |
| | do worry about that longer-term but not with the current | |
| | generation of AI. At my work we've done extensive | |
| | benchmarking (especially among enthusiastic early | |
| | adopters) and while it can save a couple hours a week | |
| | we're nowhere near the point where it can displace FTEs. | |
| | sroerick wrote: | |
| | This is how I felt about Bitcoin. | |
| | viking123 wrote: | |
| | I hate the Anthropic guy so much.. when I see the face it | |
| | just brings back all the nonsense lies and "predictions" he | |
| | says. Altman is kind of the same but for some reason Dario | |
| | kind of takes the cake. | |
| | barbazoo wrote: | |
| | We implement pretty cool workflows at work using "GenAI" and | |
| | the users of our software are really appreciative. It's like | |
| | saying a hammer sucks because it breaks most things you hit | |
| | with it. | |
| | onlyrealcuzzo wrote: | |
| | > Generative AI, as we know it, has only existed ~5-6 years, | |
| | and it has improved substantially, and is likely to keep | |
| | improving. | |
| | | |
| | I think the big problem is that the pace of improvement was | |
| | UNBELIEVABLE for about 4 years, and it appears to have | |
| | plateaued to almost nothing. | |
| | | |
| | ChatGPT has _barely_ improved in, what, 6 months or so. | |
| | | |
| | They are driving costs down incredibly, which is not nothing. | |
| | | |
| | But, here's the thing, they're not cutting costs because they | |
| | have to. Google has deep enough pockets. | |
| | | |
| | They're cutting costs because - at least with the current known | |
| | paradigm - the cost is not worth it to make material | |
| | improvements. | |
| | | |
| | So unless there's a paradigm shift, we're not seeing MASSIVE | |
| | improvements in output like we did in the previous years. | |
| | | |
| | You could see costs go down to 1/100th over 3 years, seriously. | |
| | | |
| | But they need to make money, so it's possible none of that | |
| | will be passed on. | |
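For scale on that hypothetical figure: a drop to 1/100th of the cost over 3 years works out to roughly a 4.6x price reduction per year, compounded.

```python
total_reduction = 100  # costs fall to 1/100th ...
years = 3              # ... over three years (the figure above)
annual_factor = total_reduction ** (1 / years)  # yearly compounded drop
print(f"~{annual_factor:.1f}x cheaper per year")  # ~4.6x cheaper per year
```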
| | sheeh wrote: | |
| | They are focused on reducing costs in order to survive. Pure | |
| | and simple. | |
| | | |
| | Alphabet / Google doesn't have that issue. OAI and other | |
| | money losing firms do. | |
| | tombert wrote: | |
| | I think that even if it never improves, its current state is | |
| | already pretty useful. I _do_ think it's going to improve, | |
| | though I don't think AGI is going to happen any time soon. | |
| | | |
| | I have no idea what this is called, but it feels like a lot | |
| | of people assume that progress will continue at a linear pace | |
| | forever, when I think that generally progress | |
| | is closer to a "staircase" shape. A new invention or | |
| | discovery will lead to a lot of really cool new inventions | |
| | and discoveries in a very short period of time, eventually | |
| | people will exhaust the low-to-middle-hanging fruit, and | |
| | progress kind of levels out. | |
| | | |
| | I suspect it will be the same way with AI; I don't now if | |
| | we've reached the top of our current plateau, but if not I | |
| | think we're getting fairly close. | |
| | jamesfinlayson wrote: | |
| | Yes, I've read about something like this before - like the jump | |
| | from living in 1800 to 1900 - you go from no electricity at | |
| | home to having electricity at home for example. The jump | |
| | from 1900 to 2000 is much less groundbreaking for the | |
| | electricity example - you have more appliances and more | |
| | reliable electricity but it's nothing like the jump from | |
| | candle to light bulb. | |
| | 1970-01-01 wrote: | |
| | >and is likely to keep improving. | |
| | | |
| | I'm not trying to be pedantic, but how did you arrive at 'keep | |
| | improving' as a conclusion? Nobody is really sure how this | |
| | stuff actually works. That's why AI safety was such a big deal | |
| | a few years ago. | |
| | tombert wrote: | |
| | Totally reasonable question, and I only am making an | |
| | assumption based on observed progress. AI generated code, at | |
| | least in my personal experience, has gotten a lot better, and | |
| | while I don't think that will go to infinity, I do think that | |
| | there's still more room for improvement that could happen. | |
| | | |
| | I will acknowledge that I don't have any evidence of this | |
| | claim, so maybe the word "likely" was unwise, as that | |
| | suggests probability. Feel free to replace "is likely to" | |
| | with "it feels like it will". | |
| | nonethewiser wrote: | |
| | >Generative AI, as we know it, has only existed ~5-6 years | |
| | | |
| | Probably less than that, practically speaking. ChatGPT's | |
| | initial release date was November 2022. It's closer to 3 years, | |
| | in terms of any significant amount of people using them. | |
| | johnnienaked wrote: | |
| | You're saying the same thing cryptobros say about bitcoin right | |
| | now, and that's 17 years later. | |
| | | |
| | It's a business, but it won't be the thing the first movers | |
| | thought it was. | |
| | tombert wrote: | |
| | It's different in that Bitcoin was never useful in any | |
| | capacity when it was new. AI is at least useful right now and | |
| | it's improved considerably in the last few years. | |
| | robertclaus wrote: | |
| | Odds this was AI generated? | |
| | kingstnap wrote: | |
| | It's literally just four screenshots paired with this sentence. | |
| | | |
| | > Trying to orient our economy and geopolitical policy around | |
| | such shoddy technology -- particularly on the unproven hopes | |
| | that it will dramatically improve -- is a mistake. | |
| | | |
| | The screenshots are screenshots of real articles. The sentence | |
| | is shorter than a typical prompt. | |
| | mythrwy wrote: | |
| | It's going well for coding. I just knocked out a mapping project | |
| | that would have been a week+ of work (with docs and stackoverflow | |
| | opened in the background) in a few hours. | |
| | | |
| | And yes, I do understand the code and what is happening and did | |
| | have to make a couple of adjustments manually. | |
| | | |
| | I don't know that reducing coding work justifies the current | |
| | valuations, but I wouldn't say it's "not going all that well". | |
| | dreadsword wrote: | |
| | This feels like a pretty low effort post that plays heavily to | |
| | superficial readers' cognitive biases. | |
| | | |
| | I work commercializing AI in some very specific use cases where | |
| | it is extremely valuable. Where people are being led astray | |
| | is in layering generalizations: general use cases (copilots) | |
| | deployed across general populations are generally not doing | |
| | very well. But | |
| | that's PMF stuff, not a failure of the underlying tech. | |
| | Aurornis wrote: | |
| | > This feels like a pretty low effort post that plays heavily | |
| | to superficial readers' cognitive biases. | |
| | | |
| | I haven't followed this author but the few times he's come up | |
| | his writings have been exactly this. | |
| | kokanee wrote: | |
| | I think both sides of this debate are conflating the tech and | |
| | the market. First of all, there were forms of "AI" before | |
| | modern Gen AI (machine learning, NLP, computer vision, | |
| | predictive algorithms, etc) that were and are very valuable for | |
| | specific use cases. Not much has changed there AFAICT, so it's | |
| | fair that the broader conversation about Gen AI is focused on | |
| | general use cases deployed across general populations. After | |
| | all, Microsoft thinks it's a copilot company, so it's fair to | |
| | talk about how copilots are doing. | |
| | | |
| | On the pro-AI side, people are conflating technology success | |
| | with product success. Look at crypto -- the technology supports | |
| | decentralization, anonymity, and use as a currency; but in the | |
| | marketplace it is centralized, subject to KYC, and used for | |
| | speculation instead of transactions. The potential of the tech | |
| | does not always align with the way the world decides to use it. | |
| | | |
| | On the other side of the aisle, people are conflating the | |
| | problematic socio-economics of AI with the state of the | |
| | technology. I think you're correct to call it a failure of PMF, | |
| | and that's a problem worth writing articles about. It just | |
| | shouldn't be so hard to talk about the success of the | |
| | technology and its failure in the marketplace in the same | |
| | breath. | |
| | 1a527dd5 wrote: | |
| | A year ago I would have agreed wholeheartedly and I was a | |
| | self-confessed skeptic. | |
| | | |
| | Then Gemini got good (around 2.5?), like I-turned-my-head good. I | |
| | started to use it every week-ish, not to write code, but | |
| | more like a tool (as you would use a calculator). | |
| | | |
| | More recently Opus 4.5 was released and now I'm using it every | |
| | day to assist in code. It is regularly helping me take tasks that | |
| | would have taken 6-12 hours down to 15-30 minutes with some minor | |
| | prompting and hand-holding. | |
| | | |
| | I've not yet reached the point where I feel comfortable | |
| | letting it loose to do the entire PR for me. But it's | |
| | getting there. | |
| | spaceywilly wrote: | |
| | I would strongly recommend this podcast episode with Andrej | |
| | Karpathy. I will poorly summarize it by saying his main point | |
| | is that AI will spread like any other technology. It's not | |
| | going to be a sudden flash and everything is done by AI. It | |
| | will be a slow rollout where each year it automates more and | |
| | more manual work, until one day we realize it's everywhere and | |
| | has become indispensable. | |
| | | |
| | It sounds like what you are seeing lines up with his | |
| | predictions. Each model generation is able to take on a little | |
| | more of the responsibilities of a software engineer, but it's | |
| | not as if we suddenly don't need the engineer anymore. | |
| | | |
| | https://www.dwarkesh.com/p/andrej-karpathy | |
| | sheeh wrote: | |
| | AI first of all is not a technology. | |
| | | |
| | Can people get their words straight before typing? | |
| | shawabawa3 wrote: | |
| | Is LLM a technology? Are you complaining about the use of | |
| | AI to mean LLM? Because I think that ship has sailed | |
| | daxfohl wrote: | |
| | Though I think it's a very steep sigmoid that we're still far | |
| | on the bottom half of. | |
| | | |
| | For math it just did its first "almost independent" Erdos | |
| | problem. In a couple months it'll probably do another, then | |
| | maybe one each month for a while, then one morning we'll wake | |
| | up and find _whoom_ it solved 20 overnight and is spitting | |
| | them out by the hour. | |
| | | |
| | For software it's been "curiosity ... curiosity ... curiosity | |
| | ... occasionally useful assistant ... slightly more capable | |
| | assistant" up to now, and it'll probably continue like that | |
| | for a while. The inflection point will be when | |
| | OpenAI/Anthropic/Google releases an e2e platform meant to be | |
| | driven primarily by the product team, with engineering just | |
| | being co-drivers. It probably starts out buggy and needing a | |
| | lot of hand-holding (and grumbling) from engineering, but | |
| | slowly but surely becomes more independently capable. Then at | |
| | some point, product will become more confident in that | |
| | platform than their own engineering team, and begin pushing | |
| | out features based on that alone. Once that process starts | |
| | (probably first at OpenAI/Anthropic/Google themselves, but | |
| | spreading like wildfire across the industry), then it's just | |
| | a matter of time until leadership declares that all feature | |
| | development goes through that platform, and retains only as | |
| | many engineers as is required to support the platform itself. | |
| | nullpoint420 wrote: | |
| | And then what? Am I supposed to be excited about this | |
| | future? | |
| | daxfohl wrote: | |
| | Hard to say. In business we'll still have to make hard | |
| | decisions about unique situations, coordinate and align | |
| | across teams and customers, deal with real world | |
| | constraints and complex problems that aren't suitable to | |
| | feed to an LLM and let it decide. In particular, deciding | |
| | whether or not to trust an LLM with a task will itself | |
| | always be a human decision. I think there will always be | |
| | a place for analytical thinking in business even if LLMs | |
| | do most of the actual engineering. If nothing else, the | |
| | speed at which they work will require an increase in | |
| | human analytical effort, to maximize their efficacy while | |
| | maintaining safety and control. | |
| | | |
| | In the academic world, and math in particular, I'm not | |
| | sure. In a way, you could say it doesn't change anything | |
| | because proofs already "exist" long before we discover | |
| | them, so AI just streamlines that discovery. Many | |
| | mathematicians say that asking the right questions is | |
| | more important than finding the answers. In which case, | |
| | maybe math turns into something more akin to philosophy | |
| | or even creative writing, and equivalently follows the | |
| | direction that we set for AI in those fields. Which is, | |
| | perhaps less than one would think: while AI can write a | |
| | novel and it could even be pretty good, part of the value | |
| | of a novel is the implicit bond between the author and | |
| | the audience. "Meaning" has less value coming from a | |
| | machine. And so maybe math continues that way, computers | |
| | solving the problems but humans determining the meaning. | |
| | | |
| | Or maybe it all turns to shit and the sheer ubiquity of | |
| | "masterpieces" of STEM/art everything renders all human | |
| | endeavor pointless. Then the only thing that's left worth | |
| | doing is for the greedy, the narcissists, and the power | |
| | hungry to take the world back to the middle ages where | |
| | knowledge and search for meaning take a back seat to | |
| | tribalism and warmongering until the datacenters' power | |
| | needs destroy the planet. | |
| | | |
| | I'm hoping for something more like the former, but, it's | |
| | anybody's guess. | |
| | suddenlybananas wrote: | |
| | You have to remember that half these people think they | |
| | are building god. | |
| | user34283 wrote: | |
| | If machines taking over labor and allowing humans to live | |
| | a life of plenty instead of slaving away in jobs isn't | |
| | exciting, then I don't know what is. | |
| | | |
| | I guess cynics will yap about capitalism and how this | |
| | supposedly benefits only the rich. That seems very | |
| | unimaginative to me. | |
| | sensanaty wrote: | |
| | > That seems very unimaginative to me. | |
| | | |
| | Does it? How exactly is the common Joe going to benefit | |
| | from this world where the robots are doing the job he was | |
| | doing before, as well as everyone else's job (aka, no | |
| | more jobs for _anyone_)? Where exactly is the money | |
| | going to come from to make sure Joe can still buy food? | |
| | Why on earth would the people in power (aka the psychotic | |
| | CxOs) care to expend any resources for Joe, once they | |
| | control the robots that can do everything Joe could? What | |
| | mechanisms exist for everyone here to prosper, rather | |
| | than a select few who _already_ own more wealth and power | |
| | than the majority of the planet combined? | |
| | | |
| | I think believing in this post-scarcity utopian fairy | |
| | tale is a lot less imaginative and grounded than the | |
| | opposite scenario, one where the common man gets crushed | |
| | ruthlessly. | |
| | | |
| | We don't even have to step into any kind of fantasy world | |
| | to see this is the path we're heading down, in our | |
| | current timeline as we speak, CEOs are foaming at the | |
| | mouth to replace as many people as they can with AI. This | |
| | entire massive AI/LLM bubble we find ourselves in is | |
| | predicated on the idea that companies can finally get rid | |
| | of their biggest cost centers, their human workers and | |
| | their pesky desires like breaks and vacations and | |
| | worker's rights. And yet, there's still somehow people | |
| | out there that will readily lap up the bullshit notion | |
| | that this tech is going to somehow be used as a force of | |
| | good? That I find completely baffling. | |
| | kstrauser wrote: | |
| | > I was a self-confessed skeptic. | |
| | | |
| | I think that's the key. Healthy skepticism is always | |
| | appropriate. It's the outright cynicism that gets me. "AI will | |
| | never be able to [...]", when I've been sitting here at work | |
| | doing 2/3rds of those supposedly impossible things. Flawlessly? | |
| | No, of course not! But _I_ don't do those things flawlessly | |
| | on the first pass, either. | |
| | | |
| | Skepticism is good. I have no time or patience for cynics who | |
| | dismiss the whole technology as impossible. | |
| | sublinear wrote: | |
| | I think the concern expressed as "impossible" is whether it | |
| | _can ever_ do those things "flawlessly" because that's what | |
| | we actually need from its output. Otherwise _a more | |
| | experienced human_ is forced to do double work figuring out | |
| | where it's wrong and then fixing it. | |
| | | |
| | This is not a lofty goal. It's what we _always expect_ from a | |
| | competent human regardless of the number of passes it takes | |
| | them. This is not what we get from LLMs in the same amount | |
| | of time it takes a human to do the work unassisted. If it's | |
| | impossible then there is no amount of time that would ever | |
| | get this result from this type of AI. This matters because it | |
| | means the human is forced to still be in the loop, not saving | |
| | time, and forced to work harder than just not using it. | |
| | | |
| | I don't mean "flawless" in the sense that there cannot be | |
| | improvements. I mean that the result should be what was | |
| | expected for all possible inputs, and when inspected for bugs | |
| | there are reasonable and subtle technical misunderstandings | |
| | at the root of them (true bugs that are possibly undocumented | |
| | or undefined behavior) and not a mess of additional | |
| | linguistic ones or misuse. This is the stronger definition of | |
| | what people mean by "hallucination", and it is absolutely | |
| | not fixed; no progress has been made on it either. No | |
| | amount of prompting or prayer can work around it. | |
| | | |
| | This game of AI whack-a-mole really is a waste of time in so | |
| | many cases. I would not bet on statistical models being | |
| | anything more than what they are. | |
| | cameronh90 wrote: | |
| | I'm now putting more queries into LLMs than I am into Google | |
| | Search. | |
| | | |
| | I'm not sure how much of that is because Google Search has | |
| | worsened versus LLMs having improved, but it's still a | |
| | substantial shift in my day-to-day life. | |
| | | |
| | Something like finding the most appropriate sensor ICs to use | |
| | for a particular use case requires so much less effort than it | |
| | used to. I might have spent an entire day digging through data | |
| | sheets before, and now I'll find what I need in a few minutes. | |
| | It feels at least as revolutionary as when search replaced | |
| | manually paging through web directories. | |
| | billsunshine wrote: | |
| | a historic moron. Marcus will make Krugman's internet==fax | |
| | machine look like a good prediction | |
| | segfaultex wrote: | |
| | I wholeheartedly agree. Shitty companies steal art and then put | |
| | out shitty products that shitty people use to spam us with slop. | |
| | | |
| | The same goes for code as well. | |
| | | |
| | I've explored Claude code/antigravity/etc, found them mostly | |
| | useless, tried a more interactive approach with copilot/local | |
| | models/ tried less interactive "agents"/etc. it's largely all | |
| | slop. | |
| | | |
| | My coworkers who claim they're shipping at warp speed using | |
| | generative AI are almost categorically our worst developers by a | |
| | mile. | |
| | 4782626292283 wrote: | |
| | Ah, Gary Marcus, the 10x ninja whose hand-crafted bespoke code | |
| | singlehandedly keeps his employer in business. | |
| | bawolff wrote: | |
| | Holy moving goal posts batman! | |
| | | |
| | I hate generative AI, but it's inarguable that what we have | |
| | now would have been considered pure magic 5 years ago. | |
| | meowface wrote: | |
| | How on Earth do people keep taking Gary Marcus seriously? | |
| | throw310822 wrote: | |
| | He's such a joke that even LLMs make fun of him. The Gemini- | |
| | generated Hacker News frontpage for December 9 2035 contains an | |
| | article by Gary Marcus: "AI progress is stalling": | |
| | https://dosaygo-studio.github.io/hn-front-page-2035/news | |
| | piskov wrote: | |
| | As if the articles he's linked were written by him | |
| | amw-zero wrote: | |
| | I'm starting to think this take is legitimately insane. | |
| | | |
| | As said in the article, a conservative estimate is that Gen AI | |
| | can currently do 2.5% of all jobs in the entire economy. A | |
| | technology that is really only a couple of years old. This is | |
| | supposed to be _disappointing_? That's millions of jobs _today_, | |
| | in a totally nascent form. | |
| | | |
| | I mean I understand skepticism, I'm not exactly in love with AI | |
| | myself, but the world has literally been transformed. | |
| | Jadiiee wrote: | |
| | It's more about how you use it. It should be a source of inspo. | |
| | Not the end all be all. | |
| | gejose wrote: | |
| | I believe Gary Marcus is quite well known for terrible AI | |
| | predictions. He's not in any way an expert in the field. Some of | |
| | his predictions from 2022 [1]: | |
| | | |
| | > In 2029, AI will not be able to watch a movie and tell you | |
| | accurately what is going on (what I called the comprehension | |
| | challenge in The New Yorker, in 2014). Who are the characters? | |
| | What are their conflicts and motivations? etc. | |
| | | |
| | > In 2029, AI will not be able to read a novel and reliably | |
| | answer questions about plot, character, conflicts, motivations, | |
| | etc. Key will be going beyond the literal text, as Davis and I | |
| | explain in Rebooting AI. | |
| | | |
| | > In 2029, AI will not be able to work as a competent cook in an | |
| | arbitrary kitchen (extending Steve Wozniak's cup of coffee | |
| | benchmark). | |
| | | |
| | > In 2029, AI will not be able to reliably construct bug-free | |
| | code of more than 10,000 lines from natural language | |
| | specification or by interactions with a non-expert user. [Gluing | |
| | together code from existing libraries doesn't count.] | |
| | | |
| | > In 2029, AI will not be able to take arbitrary proofs from the | |
| | mathematical literature written in natural language and convert | |
| | them into a symbolic form suitable for symbolic verification. | |
| | | |
| | Many of these have already been achieved, and it's only early | |
| | 2026. | |
| | | |
| | [1]https://garymarcus.substack.com/p/dear-elon-musk-here-are- | |
| | fi... | |
| | ls612 wrote: | |
| | I'm pretty sure it can do all of those except for the one which | |
| | requires a physical body (in the kitchen) and the one that | |
| | humans can't do reliably either (construct 10000 loc bug-free). | |
| | merlincorey wrote: | |
| | Which ones are you claiming have already been achieved? | |
| | | |
| | My understanding of the current scorecard is that he's still | |
| | technically correct, though I agree with you there is velocity | |
| | heading towards some of these things being proven wrong by | |
| | 2029. | |
| | | |
| | For example, in the recent thread about LLMs and solving an | |
| | Erdos problem I remember reading in the comments that it was | |
| | confirmed there were multiple LLMs involved as well as an | |
| | expert mathematician who was deciding what context to shuttle | |
| | between them and helping formulate things. | |
| | | |
| | Similarly, I've not yet heard of any non-expert Software | |
| | Engineers creating 10,000+ lines of non-glue code that is bug- | |
| | free. Even expert engineers at Cloudflare failed to create a | |
| | bug-free OAuth library with Claude at the helm because some | |
| | things are just extremely difficult to create without bugs even | |
| | with experts in the loop. | |
| | stingrae wrote: | |
| | 1 and 2 have been achieved. | |
| | | |
| | 4 is close; the interface needs some work to allow | |
| | nontechnical people to use it (Claude Code). | |
| | fxtentacle wrote: | |
| | I strongly disagree. I've yet to find an AI that can | |
| | reliably summarise emails, let alone understand nuance or | |
| | sarcasm. And I just asked ChatGPT 5.2 to describe an | |
| | Instagram image. It didn't even get the easily OCR-able | |
| | text correct. Plus it completely failed to mention anything | |
| | sports- or stadium-related. But it was looking at a cliche | |
| | baseball photo taken by a fan inside the stadium. | |
| | protocolture wrote: | |
| | I have had ChatGPT read text in an image, give me a 100% | |
| | accurate result, and then claim not to have the ability | |
| | and to have guessed the previous result when I ask it to | |
| | do it again. | |
| | pixl97 wrote: | |
| | >let alone understand nuance or sarcasm | |
| | | |
| | I'm still trying to find humans that do this reliably | |
| | too. | |
| | | |
| | To add on, 5.2 seems to be kind of lazy when reading text | |
| | in images by default. Fed an image, it may give only the | |
| | first word or so. But coming back with a prompt 'read all | |
| | the text in the image' makes it do a better job. | |
| | | |
| | With one in particular that I tested I thought it was | |
| | hallucinating some of the words, but there was a picture | |
| | in the picture with small words it saw that I missed the | |
| | first time. | |
| | | |
| | I think a lot of AI capabilities are kind of munged to | |
| | end users because they limit how much GPU is used. | |
| | falloutx wrote: | |
| | I dispute 1 & 2 more than 4. | |
| | | |
| | 1) Is it actually watching a movie frame by frame or just | |
| | searching about it and then giving you the answer? | |
| | | |
| | 2) Again, can it handle very long novels? Context windows | |
| | are limited and it can easily miss something. Where is the | |
| | proof for this? | |
| | | |
| | 4 is probably solved | |
| | | |
| | 4) This one is more on the predictor, because it's easy to | |
| | game: you can create some gibberish code with an LLM today | |
| | that is 10k lines long without issues. Even a non-technical | |
| | user can do it. | |
| | CjHuber wrote: | |
| | I think all of those are terrible indicators, 1 and 2 for | |
| | example only measure how well LLMs can handle long | |
| | context sizes. | |
| | | |
| | If a movie or novel is famous the training data is | |
| | already full of commentary and interpretations of them. | |
| | | |
| | If it's something not in the training data, well, I don't | |
| | know many movies or books that use only motifs that no | |
| | other piece of content before them used, so interpreting | |
| | based on what is similar in the training data still | |
| | produces good results. | |
| | | |
| | EDIT: With 1 I meant using a transcript of the Audio | |
| | Description of the movie. If he really meant watching a | |
| | movie, I'd say that's even sillier because of course | |
| | we could get another Agent to first generate the Audio | |
| | Description, which definitely is possible currently. | |
| | zdragnar wrote: | |
| | Just yesterday I saw an article about a police station's | |
| | AI body cam summarizer mistakenly claim that a police | |
| | officer turned into a frog during a call. What actually | |
| | happened was that the cartoon "The Princess and the Frog" was | |
| | playing in the background. | |
| | | |
| | Sure, another model might have gotten it right, but I | |
| | think the prediction was made less in the sense of "this | |
| | will happen at least once" and more of "this will not be | |
| | an uncommon capability". | |
| | | |
| | When the quality is this low (or variable depending on | |
| | model) I'm not too sure I'd qualify it as a larger issue | |
| | than mere context size. | |
| | CjHuber wrote: | |
| | My point was not that video-to-text models are good as they | |
| | are used in cases like that one; I was referring more | |
| | generally to that list of indicators. Surely when analysing | |
| | a movie it is alright if it misunderstands some things, | |
| | especially as the amount of misunderstanding can be | |
| | decreased a lot. That AI body camera is surely optimized | |
| | for speed and inference cost, but if you give an agent ten | |
| | 1-second frames along with the transcript of that period | |
| | and the full prior transcript, and give it reasoning | |
| | capabilities, it would take almost endlessly long to | |
| | process the movie, but the result would surely be much | |
| | better than the body camera's. After all, the indicator | |
| | talks about "AI" in general, so it's unfair to judge a | |
| | model optimized for something else on that indicator. | |
| | bspammer wrote: | |
| | The bug-free code one feels unfalsifiable to me. How do you | |
| | prove that 10,000 lines of code are bug-free? And then | |
| | there are a million caveats about what a bug actually is | |
| | and how we | |
| | define one. | |
| | | |
| | The second claim about novels seems obviously achieved to me. | |
| | I just pasted a random obscure novel from Project Gutenberg | |
| | into a file and asked claude questions about the characters, | |
| | and then asked about the motivations of a random side- | |
| | character. It gave a good answer, I'd recommend trying it | |
| | yourself. | |
| | verse wrote: | |
| | I agree with you but I'd point out that unless you've read | |
| | the book it's difficult to know if the answer you got was | |
| | accurate or it just kinda made it up. In my experience it | |
| | makes stuff up. | |
| | | |
| | Like, it behaves as if any answer is better than no answer. | |
| | evrydayhustling wrote: | |
| | So do humans asked to answer tests. The appropriate thing | |
| | is to compare to human performance at the same task. | |
| | | |
| | At most of these comprehension tasks, AI is already | |
| | superhuman (in part because Gary picked scaled tasks that | |
| | humans are surprisingly bad at). | |
| | rafaelmn wrote: | |
| | You can't really compare to human performance because the | |
| | failure modes and performance characteristics are so | |
| | different. | |
| | | |
| | In some instances you'll get results that are shockingly | |
| | good (and in no time), in others you'll have a grueling | |
| | experience going in circles over fundamental reasoning, | |
| | where you'd probably fire any person on the spot for | |
| | having that kind of a discussion chain. | |
| | | |
| | And there's no learning between sessions or subject area | |
| | mastery - results on the same topic can vary within same | |
| | session (with relevant context included). | |
| | | |
| | So if something is superhuman and subhuman a large | |
| | percentage of the time, but there's no good way of telling | |
| | which you'll get or how - the result isn't the average if | |
| | you're trying to use the tool. | |
| | retrac wrote: | |
| | I've done similar loose experiments using a 19th century | |
| | French novel I have read (L'Agonie by Jean Lombard), and | |
| | which is quite obscure. No English translation of the novel | |
| | has been published. | |
| | | |
| | Using Gemini Notebooks I've found it passable at | |
| | summarizing chapters, listing characters, major themes and | |
| | so on, and it can do this in English. | |
| | | |
| | "What is the symbolism of the Black Stone?" | |
| | | |
| | > In the sources, the Black Stone (referred to as the | |
| | Pierre-Noire) serves as the central religious icon of the | |
| | Syrian Sun cult brought to Rome by the Emperor Elagabalus. | |
| | Its symbolism is multifaceted, ranging from fertility and | |
| | virility to a mystical pursuit of universal unity. | |
| | | |
| | > It represents the perennity of the Sun, which fertilizes | |
| | the world, causing "germs to rise" and spreading them | |
| | through the atmosphere. It is viewed as the "definitive | |
| | form" of divinity, intended to absorb and replace the | |
| | "transitory forms" of all other Roman, Greek, Egyptian, and | |
| | Persian gods, including the Christian "Kreistos". | |
| | | |
| | > Virility and Phallic Symbolism - The sources explicitly | |
| | characterize the Black Stone as a gigantic sacred phallus. | |
| | It is described as: | |
| | * An icon of virility and the "organ of generation" | |
| | * A "unisexual icon" that materializes the generative force | |
| | of nature | |
| | * A representation of "virility in activity," which is why | |
| | it is often paraded and elevated in an "orgasmic" or | |
| | "colossal adoration" | |
| | | |
| | > The Androgyne and Universal Unity - Beyond simple | |
| | fertility, the philosopher Atillius explains a more | |
| | complex, "mad" metaphysical project associated with the | |
| | stone. It symbolizes "Life One" (Vie Une) and the return to | |
| | a unisexual state of perfection. | |
| | * The Androgyne: Atillius believes that by pursuing "the | |
| | male sex by the male sex," the cult "inutilizes" the female | |
| | sex to eventually create the Androgyne--a self-sufficient | |
| | being containing both sexes | |
| | * Unity: The stone signifies the fusion of all generative | |
| | forces into a single Unity, reversing the "separation of | |
| | the sexes" which is viewed as a state of unhappiness and | |
| | impotence. | |
| | * Marriage of Moon and Sun: The ritual marriage of the | |
| | goddess Astaroth (representing the Moon and the female | |
| | principle) to the Black Stone (representing the Sun and the | |
| | male principle) symbolizes the merging of the Orient and | |
| | Occident into this unified life principle | |
| | | |
| | > Destruction of the Symbol - The Black Stone ultimately | |
| | becomes a symbol of Oriental pollution and decadence to the | |
| | Roman populace. During the final rebellion against | |
| | Elagabalus, the stone is torn from its temple on the | |
| | Palatine, defiled with filth, and broken into pieces to | |
| | ensure that its "signification of Life" would never again | |
| | dominate Rome. | |
| | | |
| | This is all accurate to the book, even teasing out a couple | |
| | themes that were only subconsciously present to me. | |
| | | |
| | The NotebookLM version gives citations with links to the | |
| | original text to support all these assertions, which | |
| | largely are coherent with that purpose. | |
| | | |
| | The input is raw images of a book scan! Imperfect as it is | |
| | it still blows my mind. Not that long ago any kind of | |
| | semantic search or analysis was a very hard AI problem. | |
| | daveguy wrote: | |
| | "quite obscure" doesn't mean there is nothing in the | |
| | internet that directly addresses the question. | |
| | | |
| | Here is an english analysis of the text that easily | |
| | showed up in an internet search: | |
| | | |
| | https://www.cantab.net/users/leonardo/Downloads/Varian%20 | |
| | Sym... | |
| | | |
| | This source includes analysis of "the Black Stone." | |
| | retrac wrote: | |
| | Not quite the same analysis. The human is better, no | |
| | surprise. But the NotebookLM output links back to the | |
| | original book in a very useful way. If you think about it | |
| | as fuzzy semantic search it's amazing. If you want an | |
| | essay or even just creativity, yes it's lacking. | |
| | daveguy wrote: | |
| | It doesn't have to be the _same_ analysis to put it in a | |
| | partially overlapping vector space. Not saying it wasn't | |
| | a useful perspective shuffling in the vector space, but | |
| | it definitely wasn't original. | |
| | | |
| | LLMs haven't solved any of the 2029 predictions as they | |
| | were posited, though I expect some will be reached by | |
| | 2029. The AI hype acts like all of this is easy. Not | |
| | making it by 2029 doesn't mean it's impossible, or even | |
| | that we won't get most of the way there. | |
| | Workaccount2 wrote: | |
| | LLMs will never achieve anything as long as any victory | |
| | can be hand-waved away with "it was in the training | |
| | set". Somehow these models have condensed the entire | |
| | internet down to a few TB, yet people aren't backing up | |
| | their terabytes of personal data down to a couple MB | |
| | using this same tech... wonder why. | |
| | suddenlybananas wrote: | |
| | Surely there is analysis available online in French | |
| | though? | |
| | zozbot234 wrote: | |
| | > In 2029, AI will not be able to read a novel and reliably | |
| | answer questions about plot, character, conflicts, motivations, | |
| | etc. Key will be going beyond the literal text, as Davis and I | |
| | explain in Rebooting AI. | |
| | | |
| | Can AI actually do this? This looks like a nice benchmark for | |
| | complex language processing, since a complete novel takes up a | |
| | whole lot of context (consider _War and Peace_ or _The | |
| | Count of Monte Cristo_). Of course the movie variety is | |
| | even more | |
| | challenging since it involves especially complex multi-modal | |
| | input. You could easily extend it to making sense of a whole TV | |
| | series. | |
| | the-grump wrote: | |
| | Yes they can. Many codebases are much larger, and LLMs | |
| | can handle those. | |
| | | |
| | Consider also that they can generate summaries and tackle the | |
| | novel piecemeal, just like a human would. | |
| | | |
| | Re: movies. Get YouTube premium and ask YouTube to summarize | |
| | a 2hr video for you. | |
| | falloutx wrote: | |
| | A novel is different from a codebase. In code there are | |
| | relationships between files, and most files can be | |
| | ignored depending on what you're doing. But a novel is | |
| | sequential: in most cases A leads to B, B leads to C, | |
| | and so on. | |
| | | |
| | > Re: movies. Get YouTube premium and ask YouTube to | |
| | summarize a 2hr video for you. | |
| | | |
| | This is different from watching a movie. Can it tell | |
| | what suit the actor was wearing? Can it tell what the | |
| | actor's face looked like? Summarising and watching are | |
| | two different things. | |
| | cmcaleer wrote: | |
| | You're moving the goalposts. Gary Marcus' proposal was | |
| | being able to ask: Who are the characters? What are their | |
| | conflicts and motivations? etc. | |
| | | |
| | Which is a relatively trivial task for a current LLM. | |
| | daveguy wrote: | |
| | The Gary Marcus proposal you refer to was about a novel, | |
| | and not a codebase. I think GP's point is that | |
| | motivations require analysis outside of the given (or | |
| | derived) context window, which LLMs are essentially | |
| | incapable of doing. | |
| | pigpop wrote: | |
| | Yes, it is possible to do those things and there are | |
| | benchmarks for testing multimodal models on their ability | |
| | to do so. Context length is the major limitation but | |
| | longer videos can be processed in small chunks whose | |
| | descriptions can be composed into larger scenes. | |
| | | |
| | https://github.com/JUNJIE99/MLVU | |
| | | |
| | https://huggingface.co/datasets/OpenGVLab/MVBench | |
| | | |
| | Ovis and Qwen3-VL are examples of models that can work | |
| | with multiple frames from a video at once to produce | |
| | both visual and temporal understanding. | |
| | | |
| | https://huggingface.co/AIDC-AI/Ovis2.5-9B | |
| | | |
| | https://github.com/QwenLM/Qwen3-VL | |
| | idreyn wrote: | |
| | Yes. I am a novelist and I noticed a step change in what was | |
| | possible here around Claude Sonnet 3.7 in terms of being able | |
| | to analyze my own unpublished work for theme, implicit | |
| | motivations, subtext, etc -- without having any pre-digested | |
| | analysis of the work in its training data. | |
| | alextingle wrote: | |
| | How do you get a novel-sized file into Claude? I've | |
| | tried, and it always complains that it's too long. | |
| | colechristensen wrote: | |
| | >Can AI actually do this? This looks like a nice benchmark | |
| | for complex language processing, since a complete novel takes | |
| | up a whole lot of context (consider War and Peace or The | |
| | Count of Monte Cristo) | |
| | | |
| | Yes, you just break the book down by chapters or whatever | |
| | conveniently fits in the context window to produce summaries | |
| | such that all of the chapter summaries can fit in one context | |
| | window. | |
| | | |
| | You could also do something with a multi-pass strategy where | |
| | you come up with a collection of ideas on the first pass and | |
| | then look back with search to refine and prove/disprove them. | |
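The chunk-and-summarize workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular product's pipeline: `call_llm` is a hypothetical stand-in for whatever completion API you use, and character counts stand in for a real token count.

```python
def chunk_text(text, max_chars=8000):
    """Split a long text into pieces that fit a model's context
    window. Character count stands in for a real token count."""
    paragraphs = text.split("\n\n")
    chunks, current = [], ""
    for para in paragraphs:
        # Flush the current chunk before it would overflow.
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = ""
        current += para + "\n\n"
    if current:
        chunks.append(current)
    return chunks

def summarize_book(text, call_llm, max_chars=8000):
    """Map-reduce summarization: summarize each chunk, then ask
    the model to merge the per-chunk summaries."""
    summaries = [
        call_llm("Summarize this passage:\n\n" + chunk)
        for chunk in chunk_text(text, max_chars)
    ]
    return call_llm(
        "Combine these chapter summaries into one overview:\n\n"
        + "\n".join(summaries)
    )
```

With a real client, `call_llm` would wrap a chat-completions request; the control flow stays the same.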
| | | |
| | Of course, an LLM will already contain trained | |
| | information about any novel that existed before its | |
| | training cutoff, so having it "read" classic works like | |
| | _The Count of Monte Cristo_ and answer questions about | |
| | them would be a bit of an unfair version of the test: | |
| | models can be expected to have been trained on large | |
| | volumes of existing analysis of those books. | |
| | | |
| | >reliably answer questions about plot, character, conflicts, | |
| | motivations | |
| | | |
| | LLMs can already do this automatically with my code in | |
| | a sizable project (you know what I mean); it seems | |
| | pretty simple to get them to do it with a book. | |
| | littlestymaar wrote: | |
| | > Yes, you just break the book down by chapters or whatever | |
| | conveniently fits in the context window to produce | |
| | summaries such that all of the chapter summaries can fit in | |
| | one context window. | |
| | | |
| | I did exactly that a few months ago, and in fact doing | |
| | just this misses cross-chapter information (say | |
| | something is mentioned in chapter 1 that doesn't appear | |
| | important but turns out to be crucial later on, like a | |
| | Chekhov's gun). | |
| | | |
| | Maybe doing it iteratively several times would solve | |
| | the problem; I ran out of time and didn't try. But the | |
| | straightforward workflow you're describing doesn't | |
| | work, so I think it's fair to say this challenge isn't | |
| | solved. (It works better with non-fiction, though, | |
| | because the prose is usually drier and more to the | |
| | point.) | |
| | blharr wrote: | |
| | in that case, why not summarize the previous chapters and | |
| | then include that as context to the next chapter? | |
| | littlestymaar wrote: | |
| | That's what I did, but the thing is the LLM has no way to | |
| | know what details are important in the first chapter | |
| | before seeing their importance in the later chapters, and | |
| | so these details usually get discarded by the | |
| | summarization process. | |
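The running-summary workflow being discussed is essentially a fold over chapters; a hypothetical `call_llm` again stands in for a real model call. The failure mode described above is structural: each step sees only the compressed summary plus one new chapter, so an early detail the model judged unimportant is already gone by the time it becomes relevant.

```python
def rolling_summary(chapters, call_llm):
    """Carry a running summary forward, one chapter at a time.
    Each call sees only the compressed summary plus one new
    chapter, which is why details dropped early on are
    unrecoverable later."""
    summary = ""
    for chapter in chapters:
        summary = call_llm(
            "Update this running summary with the new chapter, "
            "keeping only what seems important.\n\n"
            "Summary so far:\n" + summary + "\n\n"
            "New chapter:\n" + chapter
        )
    return summary
```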
| | postalrat wrote: | |
| | No human reads a novel and evaluates it as a whole. | |
| | It's a story, and the reader's perception changes over | |
| | the course of reading the book. Current AI can | |
| | certainly do that. | |
| | jhanschoo wrote: | |
| | > It's a story and the readers perception changes over the | |
| | course of reading the book. | |
| | | |
| | You're referring to casual reading, but writers and | |
| | people with the interest and motivation to read deeply | |
| | do review, analyze, and summarize books under various | |
| | lenses and reflect on them, for technique as much as | |
| | for themes, messages, how well they capture a milieu, | |
| | etc. So that's quite a bit more than "no human"! | |
| | colechristensen wrote: | |
| | Aside from being a cook, which is more of a robotics | |
| | problem, all the rest are accomplished, to the point | |
| | that the argument between the enthusiast and naysayer | |
| | camps is only over how reliably LLMs can perform these | |
| | tasks. | |
| | | |
| | The keyword being "reliably" and what your threshold is for | |
| | that. And what "bug free" means. Groups of expert humans | |
| | struggle to write 10k lines of "bug free" code in the | |
| | absolutist sense of perfection, even code with formal proofs | |
| | can have "bugs" if you consider the specification not matching | |
| | the actual needs of reality. | |
| | | |
| | All but the robotics one are demonstrable in 2026 at least. | |
| | thethirdone wrote: | |
| | Which ones of those have been achieved in your opinion? | |
| | | |
| | I think the arbitrary proofs from mathematical literature is | |
| | probably the most solved one. Research into IMO problems, and | |
| | Lean formalization work have been pretty successful. | |
| | | |
| | Then, probably reading a novel and answering questions is the | |
| | next most successful. | |
| | | |
| | Reliably constructing 10k bug-free lines is probably | |
| | the least successful. AI tends to produce more bugs | |
| | than human programmers, and I have yet to meet a | |
| | programmer who can _reliably_ produce fewer than 1 bug | |
| | per 10k lines. | |
| | zozbot234 wrote: | |
| | Formalizing an _arbitrary_ proof is incredibly hard. | |
| | For one thing, you need to make sure that you've got at | |
| | least a correct formal statement for all the prereqs | |
| | you're relying on, or the whole thing becomes | |
| | pointless. Many areas of math outside of the very | |
| | "cleanest" fields (e.g. algebra, logic, combinatorics) | |
| | have not seen much success in formalizing existing | |
| | theory developments. | |
| | kleene_op wrote: | |
| | > Reliably constructing 10k bug free lines is probably the | |
| | least successful. | |
| | | |
| | You really need to try Claude Code, because it | |
| | absolutely does that. | |
| | thethirdone wrote: | |
| | I have seen many people try to use Claude Code and get LOTS | |
| | of bugs. Show me any > 10k project you have made with it | |
| | and I will put the effort in to find one bug free of | |
| | charge. | |
| | jgalt212 wrote: | |
| | This comment or something very close always appears alongside a | |
| | Gary Marcus post. | |
| | margalabargala wrote: | |
| | Which is fortunate, considering how asinine it is in 2026 to | |
| | expect that none of the items listed will be accomplished in | |
| | the next 3.9 years. | |
| | GorbachevyChase wrote: | |
| | I think it's for good reason. I'm a bit at a loss as to why | |
| | every time this guy rages into the ether of his blog it's | |
| | considered newsworthy. Celebrity driven tech news is just so | |
| | tiresome. Marcus was surpassed by others in the field and now | |
| | he's basically a professional heckler on a university | |
| | payroll. I wish people could just be happy for the success of | |
| | others instead of fuming about how so and so is a billionaire | |
| | and they are not. | |
| | raincole wrote: | |
| | And why not? Is there any reason for this comment to not | |
| | appear? | |
| | | |
| | If Bill Gates made a prediction about computing, no | |
| | matter what the prediction said, you can bet that 640K | |
| | memory quote would be mentioned in the comment section | |
| | (even though he never actually said it). | |
| | raincole wrote: | |
| | > Many of these have already been achieved, and it's only early | |
| | 2026. | |
| | | |
| | I'm quite sure people who made those (now laughable) | |
| | predictions will tell you none of these has been achieved, | |
| | because AI isn't doing this "reliably" or "bug-free." | |
| | | |
| | Defending your predictions is like running an insurance | |
| | company. You always win. | |
| | dyauspitr wrote: | |
| | Contrary to other comments here, I think AI can already | |
| | do all of the above except being a kitchen cook. | |
| | | |
| | Just earlier today I asked it to give me a summary of a show I | |
| | was watching until a particular episode in a particular season | |
| | without spoiling the rest of it and it did a great job. | |
| | suddenlybananas wrote: | |
| | You know that almost every show has summaries of its | |
| | episodes available online? | |
| | herunan wrote: | |
| | First of all, popping in a few screenshots of articles and papers | |
| | is not proper analysis. | |
| | | |
| | Second of all, GenAI is going well or not depending on how we | |
| | frame it. | |
| | | |
| | In terms of saving time, money, and effort when coding, | |
| | writing, analysing, researching, etc., it's extremely | |
| | successful. | |
| | | |
| | In terms of leading us to AGI... GenAI alone won't reach that. | |
| | Current ROI is plateauing, and we need to start investing more | |
| | somewhere else. | |
| | rpowers wrote: | |
| | I keep reading comments touting GenAI's strengths, but | |
| | the evidence usually amounts to some toy PoC that | |
| | eerily mirrors the work found in coding bootcamps. You | |
| | want an app with logins and comments and upvotes? GenAI | |
| | is going to look amazing wiring a non-relational DB to | |
| | your Node backend. | |
| | afspear wrote: | |
| | Meanwhile I'm over here reducing my ADO ticket time estimates by | |
| | 75%. | |
| | saberience wrote: | |
| | Gary Marcus (probably): "Hey this LLM isn't smarter than Einstein | |
| | yet, it's not going all that well" | |
| | | |
| | The goalposts keep getting pushed further and further | |
| | every month. How many math and coding Olympiads and | |
| | other benchmarks will LLMs need to dominate before | |
| | people actually admit that in some domains they're | |
| | really quite good? | |
| | | |
| | Sure, if you're a Nobel prize winner or a PhD, LLMs | |
| | aren't as good as you yet. But for 99% of the people in | |
| | the world, LLMs are better than you at Math, Science, | |
| | Coding, and probably every language except your native | |
| | one, and they're probably better than you at that | |
| | too... | |
| | mrbluecoat wrote: | |
| | > LLMs can still cannot be trusted | |
| | | |
| | But can they write grammatically correct statements? | |
| | efilife wrote: | |
| | This was the first thing I noticed too. This is the | |
| | lowest-effort post I have ever seen that high up on | |
| | Hacker News. | |
| | m463 wrote: | |
| | I see stuff like this and think of these two things: | |
| | | |
| | 1) https://en.wikipedia.org/wiki/Gartner_hype_cycle | |
| | | |
| | or | |
| | | |
| | 2) "First they ignore you, then they laugh at you, then they | |
| | fight you, then you win." | |
| | | |
| | or maybe originally: | |
| | | |
| | "First they ignore you. Then they ridicule you. And then they | |
| | attack you and want to burn you. And then they build monuments to | |
| | you" | |
| | anarticle wrote: | |
| | Download models you can find now and forever. The guardrails will | |
| | only get worse, or models banned entirely. Whether it's because | |
| | of "hurts people's health" or some other moral panic, it will | |
| | kill this tech off. | |
| | | |
| | gpt-oss isn't bad, but even models you cannot run are worth | |
| | getting since you may be able to run them in the future. | |
| | | |
| | I'm hedging against models being so nerfed they are useless. | |
| | (This is unlikely, but drives are cheap and data is expensive.) | |
| | didibus wrote: | |
| | Ignoring the actual poor quality of this write-up: I | |
| | think we don't know how well GenAI is going, to be | |
| | honest. I feel we've not been able to properly measure | |
| | or assess its actual impact yet. | |
| | | |
| | Even as I use it, and I use it everyday, I can't really assess | |
| | its true impact. Am I more productive or less overall? I'm not | |
| | too sure. Do I do higher quality work or lower quality work | |
| | overall? I'm not too sure. | |
| | | |
| | All I know is that it's pretty cool, and using it is | |
| | super easy. I probably use it too much, to the point | |
| | that it actually slows things down sometimes, when I | |
| | use it for trivial things for example. | |
| | | |
| | At least when it comes to productivity/quality I feel we don't | |
| | really know yet. | |
| | | |
| | But there are definite cool use-cases for it, I mean, I can edit | |
| | photos/videos in ways I simply could not before, or generate a | |
| | logo for a birthday party, I couldn't do that before. I can make | |
| | a tune that I like, even if it's not the best song in the world, | |
| | but it can have the lyrics I want. I can have it extract whatever | |
| | from a PDF. I can have it tell me what to watch out for in a | |
| | gigantic lease agreement I would not have bothered reading | |
| | otherwise. | |
| | | |
| | I can have it fix my tests, or write my tests. Not sure | |
| | if it saves me time, but I hate doing that, so it | |
| | definitely makes it more fun, and I can kind of just | |
| | watch videos at the same time, which I couldn't before. | |
| | Coding quality-of-life improvements are there too: I | |
| | want to generate a sample JSON out of a JSONSchema, and | |
| | so on. If I want, I can write a method using English | |
| | prompts instead of the code itself. It might not truly | |
| | be faster, not sure, but sometimes it's less mentally | |
| | taxing; depending on my mood, it can be more fun or | |
| | less fun, etc. | |
| | | |
| | All those are pretty awesome wins and a sign that for sure those | |
| | things will remain and I will happily pay for them. So maybe it | |
| | depends on what you expected. | |
| | sheeh wrote: | |
| | And what do you think investors in OAI et al are expecting? | |
| | wewewedxfgdf wrote: | |
| | Haters gonna hate. | |
| | jaffee wrote: | |
| | What a joke this guy is. I can sit down and crank out a real, | |
| | complex feature in a couple hours that would have previously | |
| | taken days and ship it to the users of our AI platform who can | |
| | then respond to RFQs in minutes where they would have previously | |
| | spent hours matching descriptions to part numbers manually. | |
| | | |
| | ...and yet we still see these articles claiming LLMs are | |
| | dying/overhyped/major issues/whatever. | |
| | | |
| | Cool man, I'll just be over here building my AI based business | |
| | with AI and solving real problems in the very real manufacturing | |
| | sector. | |
| | blindriver wrote: | |
| | This entire take is nonsense. | |
| | | |
| | I just used ChatGPT to diagnose a very serious but ultimately | |
| | not-dangerous health situation last week and it was perfect. It | |
| | literally guided me perfectly without making me panic and helped | |
| | me understand what was going on. | |
| | | |
| | We use ChatGPT at work to do things that we have literally laid | |
| | people off for, because we don't need them anymore. This included | |
| | fixing bugs at a level that is at least E5/senior software | |
| | engineer. Sometimes it does something really bad, but | |
| | it definitely saves time and helps avoid adding | |
| | headcount. | |
| | | |
| | Generative AI is years beyond what I would have expected even 1 | |
| | year ago. This guy doesn't know what he's talking about, he's | |
| | just picking and choosing one-off articles that make it seem like | |
| | it's supporting his points. | |
| | unwise-exe wrote: | |
| | Meanwhile $employer is continuing to migrate individual tasks to | |
| | in-house AI tooling, and has licensed an off-the-shelf coding | |
| | agent for all of us developers to put in our IDEs. | |
| | siscia wrote: | |
| | I think the wider industry is now living through what | |
| | coding and software engineering went through a year or | |
| | so ago. | |
| | | |
| | Yeah, you could ask ChatGPT or Claude to write code, | |
| | but it wasn't really there. | |
| | | |
| | It takes a while to adopt both the model AND the UI. | |
| | Those of us in software were first because we are both | |
| | makers and users. | |
| | joshcsimmons wrote: | |
| | Huh? | |
| | | |
| | Seems like black and white thinking to me. I had it make | |
| | suggestions for 10 triage issues for my team today and agreed | |
| | with all of its routings. That's certainly better than 6 months | |
| | ago. | |
| | sublinear wrote: | |
| | All this AI discussion has done is reveal how naive some people | |
| | are. | |
| | | |
| | You're not losing your job unless you work on trivial codebases. | |
| | There's a very clear pattern what those are: startups, | |
| | greenfield, games, junk apps, mindless busywork that probably has | |
| | an existing better tool on github, etc. Basically anything that | |
| | doesn't have any concrete business requirements or legal | |
| | liability. | |
| | | |
| | This isn't to say those codebases will always be trivial, but | |
| | good luck cleaning that up or facing the reality of having to | |
| | rewrite it properly. At least you have AI to help with | |
| | boilerplate. Maybe you'll learn to read docs along the way. | |
| | | |
| | The people claiming to be significantly more productive | |
| | are either novice programmers or optimistic for | |
| | unexplained reasons they're still trying to figure out. | |
| | And whenever they do explain, most people still won't | |
| | care, because it's not even the good kind of | |
| | unreasonable that brings innovation. | |
| | | |
| | The only real value in modern LLMs is that natural language | |
| | processing is a lot better than it used to be. | |
| | | |
| | Are we done now? | |
| | fortyseven wrote: | |
| | I've just started ignoring people like this. You think | |
| | everything's going bad? Okay fine. You go ahead and keep | |
| | believing that. Maybe you could get it printed on a sandwich | |
| | board and walk up and down the street with it. | |
| | moonshotideas wrote: | |
| | How long do you think it will be until the "AI isn't | |
| | doing anything" people go away? 1 month? 6 months? I'd | |
| | say 1 year at the most. Anyone who has used Claude Code | |
| | since Dec 1st knows this in their bones, so I'd just | |
| | let these people shout from the top of the hill until | |
| | they run out of steam... | |
| | | |
| | Right around then, we can send a bunch of reconnaissance teams | |
| | out to the abandoned Japanese islands to rescue them from the war | |
| | that's been over for 10 years - hopefully they can rejoin | |
| | society, merge back with reality and get on with their lives | |
| | dkobia wrote: | |
| | Preaching to the wrong choir. The HN community is reaping massive | |
| | benefits from generative AI. | |
| ___________________________________________________________________ | |
| (page generated 2026-01-14 12:01 UTC) |