dimtion 1 days ago [-]
I'm not sure why people struggle with the fact that an abstraction can be built on top of a non-deterministic and stochastic system. Many such abstractions already exist in the world we live in.
Take sending a packet over a noisy, low SNR cell network. A high number of packets may be lost. This doesn't prevent me, as a software developer, from building an abstraction on top of a "mostly-reliable" TCP connection to deliver my website.
There are times when the service doesn't work, particularly when the packet loss rate is too high. I can still incorporate these failures into my mental model of the abstraction (e.g. through TIMEOUTs, CONN_ERRs…).
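For a concrete (if simplified) sketch of what folding those failure modes into the abstraction looks like, here's a toy retry wrapper; the host, port, and retry policy are placeholders:

    import socket
    import time

    def fetch_with_retries(host, port, payload, retries=3, timeout=2.0):
        # Treat a lossy link as "mostly reliable": bounded retries, explicit failure.
        for attempt in range(retries):
            try:
                with socket.create_connection((host, port), timeout=timeout) as conn:
                    conn.sendall(payload)
                    return conn.recv(4096)        # the happy path of the abstraction
            except (socket.timeout, ConnectionError):
                time.sleep(2 ** attempt)          # back off, then retry
        raise TimeoutError("link too lossy: failure is part of the mental model too")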
Much of engineering and reliability history revolves around building mathematical models on top of an unpredictable world. We are far from solving this problem with LLMs, but this doesn't prevent me from thinking of LLMs as a new level of abstraction that can edit and transform code.
distalx 1 days ago [-]
A transmission error has a strictly contained, predictable blast radius. If a packet drops, the system knows exactly how to handle it: it throws a timeout, drops a connection, or asks for a retry. The worst-case scenario is known.
A reasoning error has an infinite, unpredictable blast radius. When an LLM hallucinates, it doesn't fail safely but it writes perfectly compiling code that does the wrong thing. That "wrong thing" might just render a button incorrectly, or it might silently delete your production database, or open a security backdoor.
You can build reliable abstractions over failures that are predictable and contained. You cannot abstract away unpredictable destruction.
yunwal 1 days ago [-]
> A reasoning error has an infinite, unpredictable blast radius.
Says who? It’s quite easy to limit the blast radius of a reasoning error.
distalx 1 days ago [-]
In 2024, a Chevy dealership deployed an AI chatbot that confidently agreed to sell a customer a 2024 Chevy Tahoe for $1. It executed a catastrophic business failure simply because it didn't know the logic was wrong.
Sure, you can patch that specific case with guardrails, but how many unpredictable edge cases are you going to cover? It only takes a user with a bit of ingenuity to circumvent them. There are already several examples of AI agents getting stuck in infinite loops, burning through massive API bills while achieving absolutely nothing.
You can contain a system failure, but you cannot contain a logic failure if the system doesn't know the logic is wrong.
pear01 23 hours ago [-]
This would be more convincing if a single car had been exchanged for $1.
It didn't happen. Seems the bug was "contained".
Sort of undermines your point re "catastrophic business failure" don't you think?
yunwal 7 hours ago [-]
> but how many unpredictable edge cases are you going to cover?
This is the wrong question. The correct question is what specific subsets of cases do you allow, similar to any security question
amazingamazing 1 days ago [-]
How so?
Suppose you had:
Math()
Add()
Subtract()
Program()
Math(“calculate rate”)
This is intentionally written vaguely. How do you constrain these implementations to ensure Program() runs and does the right thing when there is no guarantee that Math() or its components are correct?
Normally you could use a typed programming language, unit tests, etc., but if the LLM is the ultimate abstraction, programs will be written like the lines above. At some point traditional software engineering principles will need to apply.
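To make that concrete: whatever implementation the LLM emits for, say, Add(), a plain unit test can still pin down the contract. A rough sketch; the module and function names are made up:

    # Hypothetical: the LLM generated calculator.add(); the test encodes what "correct" means.
    from calculator import add

    def test_add_contract():
        assert add(2, 3) == 5
        assert add(-1, 1) == 0
        assert add(0.5, 0.25) == 0.75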
DiscourseFan 22 hours ago [-]
Very few people are even beginning to understand the constraints of these systems, and none of them have yet been elevated to high enough positions of prominence to rise above the noise of all the hype. Give us some time man, jeez
harrall 23 hours ago [-]
A transmission error does not have a strictly contained blast radius.
A bad packet could tell a flying probe to fire all its thrusters and deplete its fuel in 15 minutes.
What makes a transmission error controlled is all the protection mechanisms on top of it. An LLM cannot delete a production database unless you give it access to do it.
My hot take is that many people are naturally more comfortable with deterministic systems that have clearly analyzable outcomes. Software engineering has historically primarily been oriented around deterministic systems and it has attracted that type of thinker.
But many of us, myself included, prefer chaotic systems where you can’t fully nail down every cause and effect. The challenge of building a prediction model on top of chaos is exhilarating. I really don’t find as many people like me in SWE as in, say, the graphic design department.
To me, that’s the underlying threat here — LLMs are rewriting a field that has previously self selected a certain type of person and this, quite understandably, rubs them the wrong way.
zadikian 6 hours ago [-]
This is sorta how I've felt working the past ~7 years.
Simple example, we've been striving for 90% unit test coverage and thorough code review when there's 0% integration test coverage. I blame the metrics only looking at unit tests, but also many people think unit tests should come first. I would prioritize integration. There are some small pieces that need to work reliably, but if your system relies so hard on all of them working right, it's a bad system. That, and too many things will work in pieces but not overall.
Broadly I'm gonna assume that the team will later hire solid SWEs who don't necessarily know how our stuff works, and aren't going to read 100 docs about it. If this is a backend+DB combo, get your DB right and there won't be too many wrong ways to code against it in the future, get it wrong and it becomes a black hole for SWE-time. Or if someone on their first day can't run a system locally for debugging, no matter how elegant the code is, don't count on that system getting fixed quickly during an outage.
sublinear 15 hours ago [-]
Yes, but when all it takes to avoid this chaos is hiring someone with at least 5 or 10 years of experience for a reasonable wage, this entire perspective looks insane.
It's... just... not that hard to write code nor does it cost that much. There are millions of us working silently at places that aren't "big tech". We all shrugged our shoulders, took a sip of coffee, and went back to our Teams meetings where the only LLM usage is still just Copilot.
c-linkage 23 hours ago [-]
I don't need to be able to write proofs about my maths using logic and determinism. If the answer comes out in a way that I like then it has to be correct!
dpark 22 hours ago [-]
This is vapid condescension.
The comment you replied to made no statements about math or proofs. They made a statement about working effectively in non-deterministic systems. Your statement seems to imply that this is dumb, as if working in a world of full determinism is an option.
panarky 22 hours ago [-]
Thank you for "vapid condescension".
I've wanted a term for this for decades!
vrighter 16 hours ago [-]
when you do have the option of determinism, but intentionally eschew it in favour of a strictly inferior nondeterministic tool, then yes, it is kinda dumb.
dpark 13 hours ago [-]
What deterministic option are you referring to here? Humans certainly are not deterministic in how they interpret instructions and write code. If I asked you to implement a feature and a month later asked you to implement the exact same feature, you likely wouldn’t do it the same way again. Two different people certainly wouldn’t.
aspenmartin 14 hours ago [-]
When you cling to determinism and call a clearly useful and powerful tool “strictly inferior” I would say this misses the point.
aeon_ai 23 hours ago [-]
Insightful.
Feels like this maps to the J/P of Myers Briggs
td2 1 days ago [-]
I mean, if you're talking about packets, you're already one abstraction over the real data transmission, which is noisy. So bits can randomly flip, noise can be interpreted as bits, and bits can get lost.
A much larger blast radius
zadikian 1 days ago [-]
I'm fine with that. The part that makes it not really an abstraction is, you still deliver code in the end. It'd be different if your deliverable were prompt+conversation, and the code were merely an intermediate build artifact. Usually people throw away the convo. Some have tried making markdown files the deliverable instead, so far that doesn't really work.
It makes even less sense when people compare an LLM to a compiler. Imagine making a pull request that's just adding a binary because you threw the source code away.
mpyne 1 days ago [-]
The whole field of reproducible builds is only a field because compilers also have had trouble historically producing binary artifacts with guaranteed provenance and binary compatibility, even when built from the same source code.
If I assign a bug fix ticket to a human developer on my team, I won't be able to precisely replicate how they go about solving the bug but for many bugs I can at least be assured that the bug will get solved, and that I understand the basic approach the assigned dev would use to troubleshoot and resolve the ticket.
This is an organizational abstraction but it's an abstraction just the same, leaky as it is.
kibwen 1 days ago [-]
> The whole field of reproducible builds is only a field because compilers also have had trouble
No, this is not comparable. The reason reproducible builds are tricky is not because compilers are inherently prone to randomness, it's because binaries often bake-in things like timestamps and the exact pathnames of the system used to produce the build. People need to stop comparing LLMs to compilers, it's an embarrassingly poor analogy.
mpyne 23 hours ago [-]
> The reason reproducible builds are tricky is not because compilers are inherently prone to randomness
And neither are LLMs. Having their output employ randomness by default is a choice, not a requirement, just like embedding timestamps into builds is a choice that can be unwound if you want the build to be reproducible.
> People need to stop comparing LLMs to compilers, it's an embarrassingly poor analogy.
They are certainly different things, but if you are going to criticize LLMs it would be better if you understood them.
jmuguy 23 hours ago [-]
Are you arguing that the output of an LLM isn’t random?
mpyne 23 hours ago [-]
It is random if you select it to be (temperature != 0, etc.).
It is not random if you don't use random sampling in its output generation.
If the whole thing were actually stochastic, then prompt caching would be impossible, because a cache of what the previous tokens transformed into (used to speed up future generation) would be invalidated by the missing random state.
Look at llama.cpp, you can see what samplers are adjustable and if you use samplers that employ randomness you can see what settings disable the random sampling. Or you can employ randomness but fix the seed to get reproducible results.
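As a rough illustration (using the Hugging Face transformers API rather than llama.cpp; the model name and settings are just placeholders), greedy decoding or a pinned seed gives repeatable output on the same software/hardware stack:

    from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

    tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    ids = tok("The capital of France is", return_tensors="pt").input_ids

    # No random sampling at all: pick the argmax token every step.
    greedy = model.generate(ids, do_sample=False, max_new_tokens=10)

    # Or keep sampling, but fix the seed so the run is reproducible.
    set_seed(42)
    sampled = model.generate(ids, do_sample=True, temperature=0.8, max_new_tokens=10)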
sumeno 14 hours ago [-]
Yes, it can still be random with temperature set to 0. It'll only be the same if you run it on exactly the same hardware every single time.
philipswood 18 hours ago [-]
An LLM is a set of structured matrix multiplies and function applications. The only potentially non-deterministic step is selecting the next token from the final output and that can be done deterministically.
jmalicki 17 hours ago [-]
Matrix multiplication on GPUs is non-deterministic. As are things like cumsum()
This comes down to map reduce and floating point's lack of associativity. You see the same thing with OpenMP on CPUs.
People are constantly claiming determinism in LLMs that is just not there.
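A tiny self-contained demonstration of the associativity point: the same numbers summed in a different order can give a slightly different float result, which is exactly what changes when work gets split across threads or GPU blocks.

    import random

    random.seed(0)
    xs = [random.uniform(-1e6, 1e6) for _ in range(100_000)]

    forward = sum(xs)               # one reduction order
    backward = sum(reversed(xs))    # another reduction order
    print(forward == backward)      # usually False: float addition isn't associative
    print(abs(forward - backward))  # tiny, but nonzero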
zadikian 4 hours ago [-]
Even if it were reproducible, realistically most people are using some service like Claude that makes no guarantee that the model or hardware didn't change. Which is fine, it doesn't need reproducibility.
This is interesting though, I didn't know PyTorch had a debug mode for reproducibility.
jmalicki 3 hours ago [-]
Even with this debug mode, a different batch size can give different results for the same input - e.g. your tensor multiplies might use different blocking, hence different associativity.
I posted that to show that at a bare minimum, there is some pretty extreme nondeterminism (though probably mild in effect) in even the most pedestrian GPU inference, unless you go to the extreme of using the debug mode and taking the potential performance hit.
vrighter 16 hours ago [-]
well just run all inference on the cpu, single threaded /s
8note 9 hours ago [-]
random isn't the right term.
ill conditioned or unstable is better
a small change in the input can create a large difference in the output.
achierius 23 hours ago [-]
> Having their output employ randomness by default is a choice, not a requirement
This is not really meaningfully true. E.g. batching, heterogeneous inference HW, and even differences in model versions can make a difference in what result you get, and these are hard to solve.
mpyne 23 hours ago [-]
But again, these are all things that are also true of build systems.
GCC 16.1 vs. 15.2 will get you differences. GNU ld vs. gold vs. mold vs. lld will get you differences. Whether you do or do not employ LTO or other whole-program optimization vs. whether you do gets you differences.
Have you never debugged a race condition that worked on your machine but didn't work in prod, based only on how things ended up compiled in the final binary?
I'm not saying these are identical but there's a lot more similarity than you all seem to understand. And we've made compilers work well enough that a lot of you believe that they give very routine, very deterministic outputs as part of broader build systems even though nothing could be further from the truth, even today.
kibwen 15 hours ago [-]
> And neither are LLMs.
This is not my claim, you're veering wildly off course here. I'm merely responding to the common, tiresome and, to be frank, stupid analogy of LLMs to compilers.
z3c0 1 days ago [-]
It's an abstraction for you, not the rest of that developer's team, who have to reproduce the same solution even after said developer has "won the lottery", so-to-speak.
inb4: "Don't worry, just use GPT to make the docs"
zadikian 22 hours ago [-]
If you throw away the code then yeah, but I've never seen anyone do this.
vrighter 16 hours ago [-]
But even if it didn't, it still provided a binary that is mathematically proven (assuming no compiler bugs, which, if found, are fully fixable, unlike LLMs) to correspond to the code you wrote.
danenania 23 hours ago [-]
This is a great point. We’re very much in a transitional phase on this, but I personally do see signs in my own work with agents that we are heading toward the main deliverable being a readme/docs.
The code is still important, but I could see it becoming something that humans rarely engage with.
HarHarVeryFunny 12 hours ago [-]
> We are far from solving this problem with LLMs, but this doesn't prevent me from thinking of LLMs as a new level of abstraction that can edit and transform code.
That's more anthropomorphism than an abstraction. An LLM talks like a person because it was trained to predict continuations of human speech. That does not make it a person, or a system with intent, responsibility or any other human attribute. They are what they are: text prediction engines.
Perhaps your input to the LLM is "make all the test cases pass", and so it predicts it better do something to make the test cases pass, and does so by deleting the test cases. I guess in the "abstract" sense it did what you asked.
Or, how about the case in the news from a few days ago where an agentic system deleted all the vendor's customer's data, and last 3 months of backups, despite having been EXPLICITLY "told" not to do any such thing. Should we consider "completely fuck the customer" as an "abstraction" of "never delete any data"?
evrydayhustling 1 days ago [-]
Even setting aside deeply unpredictable factors (like signal transmission), most users of higher-level abstractions use them without certainty about how the translation will be executed. For example, one of the main selling points of C when I was growing up was that you could write code independent of architecture, and leave the architecture-specific translation to assembly to the compiler!
Abstractions often embrace nondeterministic translation because lower level details are unknown at time of expression -- which is the motivation for many LLM queries.
qazxcvbnmlp 24 hours ago [-]
Grocery stores are a level of abstraction. Exchange money, get food. If your whole life you had grown food, it might feel a bit strange.
Occasionally the low-level details leak through, i.e. this egg came from this farm, there's a shipping issue so onions are more expensive, or whatever.
I think llm assisted coding is going to work something like this.
vrighter 15 hours ago [-]
But you either exchange money and get the food you want, or it's out of stock, so you don't get the food but you keep your money, guaranteed.
It's not a good analogy. With an LLM you might ask for a pea, be parted with your money, and be given a watermelon.
cestith 13 hours ago [-]
So LLMs are more like InstaCart or DoorDash.
kazinator 13 hours ago [-]
This is about the reverse: a non-deterministic/stochastic system built using reliable abstractions.
Also, the problem of "did we receive the data, and is it correct" is trivial compared to "did we get correct LLM output".
The problem in networking is making the reliable transfer perform well under many conditions, and scale.
Getting consistently reliable output of LLMs isn't solved, so we can't talk about it scaling to even one instance.
rock_artist 21 hours ago [-]
> I'm not sure why people struggle with the fact that an abstraction can be built on top of a non-deterministic and stochastic system. Many such abstractions already exist in the world we live.
It depends on what the abstraction is.
Using an LLM for coding is 'abstracting' the developer, adding an extra layer that can produce code. But it's not an abstraction layer over the code itself.
ritcgab 13 hours ago [-]
Is "Mostly-reliable" TCP connection a real thing? A TCP connection is either reliable or not working at all. That is what a proper abstraction should be like.
faangguyindia 23 hours ago [-]
The machine itself is built on top of a non-deterministic world.
While your code is executing, an asteroid from space may hit it and halt execution.
yomismoaqui 1 days ago [-]
The kicker is when you delegate some work to another team member and discover that humans are also non-deterministic.
bluefirebrand 23 hours ago [-]
We can mitigate a lot of the problems with humans being non-deterministic by establishing trust and consequences
There are no consequences for a bad output from an LLM and idk about you but I don't trust them
dominotw 1 days ago [-]
That would make sense if the AI said "fail, I don't know". Its active deception is what makes it difficult.
cwyers 21 hours ago [-]
If people can figure out how to write RFCs about IP over carrier pigeons for April Fools, they can figure out how to conceive of LLMs as a layer of abstraction beneath a protocol as well.
vrighter 15 hours ago [-]
You need to be able to define exactly what it's abstracting.
ex: std::shared_ptr is abstracting over raw pointers, and does refcounting. It is abstracting something but you can actually know exactly what that thing is. An LLM is an abstraction over the space of all possible computer programs. If an abstraction doesn't constrain you in some way, it's not an abstraction.
jauntywundrkind 23 hours ago [-]
People are really really weird about the non-determinism thing. Got someone very adamant that Prompt API shouldn't be a web standard because the output isn't deterministic and according to them that means we can't allow prompts.
There's such a strong correlation between narrow, specific demands about how things have to be and posting in general. I'd really like to see open-mindedness and exploratory views be more frequent and have better standing.
I do tend to think this is different than a level of abstraction. But it feels like it's trying to hit hard, on a pretty weird narrow point.
avazhi 23 hours ago [-]
There’s a big difference between a packet being dropped and a packet’s meaning being changed along the way. The latter is better analogised to what LLMs do between receiving inputs and outputs.
dpark 23 hours ago [-]
“Each move from one layer of the tech stack to a higher one involved a function:
f(x) -> y
Given a specific x, you always get a specific y as the artifact being generated.”
Not at all. If this were true then the Python code in question would generate deterministic binary. Of course that’s not what happens. The Python runs through an interpreter that may change behavior on different runs. It may change behavior version to version. It may even change behavior during multiple invocations of a function in the same running instance. Because all of that is abstracted away.
Same for the C code. You give up control and some determinism for the higher abstraction. You might get the same output between compilations on the same version, but that's not actually guaranteed, and version-to-version consistency certainly isn't.
Moving to a higher layer of abstraction very often results in less constrained behavior.
HarHarVeryFunny 13 hours ago [-]
That's not a good analogy.
With a high level language implementation of any sort, the actual instructions executed by the CPU may vary according to how it was compiled or run, or what machine you run it on, but the behavior will not.
The high level language defines its own level of abstraction, defining exact behavior, allowing the developer to have full control over program behavior, algorithms, UI, etc.
An LLM + natural language instructions is not a program-like abstraction of what you want the computer to do, because it does not have that level of precision. Natural languages are fuzzy and imprecise, because they have been developed for communication, not precise machine-level specification.
Obviously you can "vibe code" at different levels of detail, ranging from "build me an app to do X" to "here are 20 1000 word essays specifying the dos and don'ts of what I want you to build", but in either case you are nowhere near the level of precision of using a programming language to specify exactly what you want.
So sure, "vibe coding" let's you accomplish A result with less attention to detail than using a programming language, but it's not a "higher level abstraction" in the sense that HLLs are, since it doesn't define what that abstraction is. It lets you get A result, but not define a SPECIFIC result, since natural languages just aren't that precise... natural language means whatever the person/thing interpreting it interprets it to mean.
Of course you can use an LLM as a way to "rough out" a function or app, and as a crude tool to manipulate that roughed out form (or an existing project), but natural language does not have precise semantics and therefore cannot provide a precise definition of what you want to do.
dpark 13 hours ago [-]
It wasn’t my analogy. It was the article’s so I responded to that. There are many more (and better) examples of how abstractions give up control, precision, and/or determinism.
> The high level language defines its own level of abstraction, defining exact behavior
This is not entirely true. A high level language defines some behaviors, leaving many behaviors to be undefined and implementation specific.
Many of those unspecified behaviors can matter in some cases.
> It lets you get A result, but not define a SPECIFIC result, since natural languages just aren't that precise...
You are repeating the same error as the article and missing the fact that while an abstraction lets you specify or control some things, it leaves many things out of your control. The higher the abstraction, the more stuff that is left out of your control. And maybe you don’t care about the things outside your control (great, the abstraction worked!) but regardless there are many things left unspecified in the typical abstraction and very often eventually you will care, which is why people say things like “all abstractions are leaky”.
For a simple example, think of writing something like this:
MessageBox(“hello world”, OkCancel)
MessageBox is an abstraction over a massive amount of logic. You specified a string and a set of buttons and not much else. You give up control over the styling, the placement of the buttons, the actual button text (which will very likely be localized), where the box will appear, and so much more.
You are not getting a specific result. You are getting a result that meets the contract.
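For a runnable version of the same idea (using Tkinter purely as an illustration), note how little the call actually pins down:

    from tkinter import messagebox

    # The string and the button set are specified. The font, theme, window placement,
    # and the localized text on the OK/Cancel buttons are all the toolkit's/OS's choice.
    clicked_ok = messagebox.askokcancel("hello", "hello world")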
“Write a program that shows a hello world message box” is a much higher level of abstraction than even that, and you are giving up significant specificity and determinism. Is it a good abstraction? That’s a great question. But it certainly is an abstraction.
HarHarVeryFunny 12 hours ago [-]
> This is not entirely true. A high level language defines some behaviors, leaving many behaviors to be undefined and implementation specific.
Sure, but you don't need to use those, and shouldn't.
A programming language lets you avoid undefined behavior and stick to the defined abstraction provided by the language.
Natural language does NOT let you do this, because words have no strict meaning, and the meaning of any sentence is undefined and up for interpretation and contextual clarification, etc, etc. Maybe more to the point LLMs are not concerned with meaning - they are concerned with continuation prediction. The LLM/agent that "ignored instructions" and deleted all the customer's data and backups wasn't "being bad" or "ignoring instructions", it was just statistically predicting, and someone was daft enough to feed those predictions into an execution environment where real world consequences could ensue.
dpark 11 hours ago [-]
> A programming language lets you avoid undefined behavior and stick to the defined abstraction provided by the language.
Yes and no. It lets you avoid undefined behavior traps. It lets you rely on endless implementation defined choices.
> Natural language does NOT let you do this, because words have no strict meaning, and the meaning of any sentence is undefined and up for interpretation and contextual clarification, etc, etc.
This is a fair criticism of natural language. It is less well defined. That doesn’t stop it from being an abstraction, though perhaps it fairly makes it a problematic abstraction.
> The LLM/agent that "ignored instructions" and deleted all the customer's data and backups wasn't "being bad" or "ignoring instructions"
That one was entirely human error. And not just “oops I trusted the AI too much”. That guy was sharing the DB volume across prod and staging so deleting the staging DB also took down prod.
If you run your business like that, eventually a human will do the same thing, because it’s catastrophically dumb.
HarHarVeryFunny 9 hours ago [-]
> It lets you avoid undefined behavior traps. It lets you rely on endless implementation defined choices.
That's a strange way to think of a programming language specification.
A programming language is an abstraction that mostly fully specifies behaviors that any compliant implementation must adhere to, and that you as a user of the language can therefore rely on.
There may be a few details in a language specification that are specified as implementation-defined, but that doesn't mean they are not specified; it just means they are specified by the implementation rather than the standard.
dpark 8 hours ago [-]
> A programming language is an abstraction that mostly fully specifies behaviors that any compliant implementation must adhere to, and that you as a user of the language can therefore rely on.
This is true but hand waves over a lot of behavior. An implementation of Python could be 10x faster or 10x slower and still be fully compliant with the specification.
There’s a reason that NumPy’s core is written in C and not Python. The implementation-specific details sometimes matter. The abstraction is leaky as soon as you care about anything not explicitly specified in the abstraction.
HarHarVeryFunny 12 hours ago [-]
> “Write a program that shows a hello world message box” is a much higher level of abstraction than even that, and you are giving up significant specificity and determinism. Is it a good abstraction? That’s a great question. But it certainly is an abstraction.
But to who/what is it an abstraction?
To a human, sure. If I told an intern to "write a hello world message box", I'd expect at least to get something approximating that request.
To an LLM? The LLM has no intent or understanding - it's just a statistical predictor. Maybe it'll "interpret" your request as only wanting a hello world message box, so it'll delete your company's entire git repository to ensure a clean slate to start from.
I think that when you say "it is certainly an abstraction" what you implicitly mean is "it is certainly an abstraction TO ME", but an LLM is not you, and does not have a brain, so what is an abstraction to you shouldn't be taken as being an abstraction to an LLM (whatever that would even mean).
dpark 11 hours ago [-]
No, we can’t retreat to the “LLMs so dumb” position every time we discuss anything around them. This is not a rebuttal. It is an interesting thing to discuss on its own merit, but it’s not relevant here.
If a natural language specification is an abstraction over the code implementation, then it is an abstraction whether given to a human or an LLM. The LLM being arguably a bad tool does not change that.
HarHarVeryFunny 10 hours ago [-]
So let's take LLMs and AI out of the discussion altogether.
The question is then: can specifying a computer's behavior in (inherently imprecise) natural language be considered some sort of "program", a higher-level abstraction in the same sense that an HLL provides a higher level of abstraction over programming in assembler?
I would say no, for various reasons:
1) There is a difference between an abstraction and under-specifying your requirements
2) Whatever you want to call it, any descriptive language that is insufficiently precise to unambiguously describe the details that are important to you is not very useful as a way to specify system behavior
3) If you are not only using natural language to describe the desired behavior of a system, but are also assuming that the person (or thing) interpreting the description is bringing their own knowledge and expertise to bear in "fleshing out the details", then you don't have a full specification at all, even an abstract one. What you have in that case is not something analogous to a computer program, but more like business requirements.
dpark 7 hours ago [-]
I think this is all valid. What I think bears highlighting is that often you don’t need a 100% unambiguous specification. Very often “meets the business requirements” is totally sufficient, and plenty of other stuff is either implicit or just doesn’t really matter.
If you need to develop an API that you’ll support publicly for 10+ years, yeah. You probably want to be really precise. If you need to code up yet another feature in some CRUD app, it matters a lot less.
Sometimes natural language is entirely sufficient to “unambiguously describe the details that are important”.
slopinthebag 22 hours ago [-]
Can you explain how Python or C programs change from invocation to invocation?
dpark 22 hours ago [-]
Mostly because the behavior is implementation defined. So long as the behavior meets the contract, the compiler/interpreter is free to do whatever it wants.
Python could certainly optimize repeated code paths to make them more efficient. I don’t know that the standard implementation actually does, but it could. Spending extra time optimizing repeated code paths is a reasonable choice for an interpreted or JIT compiled language.
I would not expect C to change from invocation to invocation mostly because C is supposed to be boring and predictable. That’s kind of its thing. But again, it could. There’s nothing in the C spec I’m aware of that says the C compiler has to ensure that each invocation of a piece of code will execute the same machine instructions.
yuye 18 hours ago [-]
>So long as the behavior meets the contract, the compiler/interpreter is free to do whatever it wants.
Yes, and that's how it's supposed to be. Any description that determines the totality of a problem space is an implementation itself.
Imagine the following requirements:
f(0) = 0, f(2) = 4
Both f(x) = x^2 and f(x) = 2x are correct ways to implement said requirements. But if you start relying on f(1) = 2, you might get in trouble with a coworker that relies on f(1) = 1. This is undefined behavior and should be avoided.
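In code form (names are arbitrary), both of these satisfy the stated requirements and still disagree at the unspecified point:

    def f_square(x):
        return x ** 2    # f(0) = 0, f(2) = 4

    def f_double(x):
        return 2 * x     # f(0) = 0, f(2) = 4

    assert f_square(0) == f_double(0) == 0
    assert f_square(2) == f_double(2) == 4
    print(f_square(1), f_double(1))  # 1 vs 2: behavior outside the contract is unspecified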
>There’s nothing in the C spec I’m aware of that says the C compiler has to ensure that each invocation of a piece of code will execute the same machine instructions.
It can't, because C can be written for any system you want. If I ask the compiler to compile x *= 2, it might use the mul primitive or it might use shl, both are ok.
slopinthebag 22 hours ago [-]
Ok but how does that change the behaviour of the program?
dpark 21 hours ago [-]
Depends on what you mean.
Assuming you write code that does not take advantage of undefined behavior, then in general you would expect the correctness of your program to be consistent. But you would not expect the performance, for example, to be consistent. An optimizing JIT compiler might certainly run the 3rd invocation of code path way faster than the first.
samstokes 24 hours ago [-]
I wouldn't agree that LLMs are a higher level of abstraction, but I've found they do help me think at a higher level of abstraction, by temporarily outsourcing cognitive load.
With changes like substantial refactors or ambitious feature additions, it's easy to exceed the infamous "seven things I can remember at once":
* the idea for the big change itself
* my reason for making the change
* the relevant components and how they currently work
* the new way they'll fit together after the change
* the messy intermediate state when I'm half finished but still need a working system to get feedback
* edge cases I'm ignoring for now but will have to tackle eventually
* actual code changes
* how I'm going to test this
Good lab notes, specs etc can help, but it's a lot to keep in mind. In practice these often turn into multi person projects, and communication is hard so that often means delay or drift. Having an agent temporarily worry about
* wiring a new parameter through several layers
* writing a test harness for an untested component
* experimentally adding multibyte character support on a branch
frees up my mental bandwidth for the harder parts of the problem.
The main benefit is to defer the concern until I have a mostly working system. Then I come back and review its output, since I'm still responsible for what it delivers, and I want better than "mostly working".
conradludgate 18 hours ago [-]
This is what I've found to be very successful for me. My flavour of ADHD has historically made it hard for me to start new projects as I get very stuck on all of the little details from the start, while also thinking about the high level aspects.
Being able to spend my energy on the architectural decisions and validate my understanding before spending time on optimising the internals has actually allowed me to follow through with some of my designs.
Experimentation is then faster. If the data model wasn't good enough, I can actually experiment with it immediately, before we accidentally ship something to production and then have to deal with a very annoying data migration problem. The exact code doesn't matter to begin with when we just want to make sure the data is efficient to decode and is cache friendly.
I recently built a project I had in my mind for 3 years but could never work on because all the individual components were overwhelming. It involved e2e encryption, consensus, p2p networking, CRDTs, and API design. It was very nice to see it come together. The project ended up failing due to some underlying invariant, so it was nice to validate that and finally get it out of my head.
yongjik 1 days ago [-]
It's orthogonal to whether LLMs can be a useful abstraction layer, but ...
I have a feeling that if LLMs were built on a deterministic technology, a lot of the current AI-is-not-intelligent crowd would be saying "These LLMs can only generate one answer given a question, which means they lack human creativity and they'll never be intelligent!"
xigoi 20 hours ago [-]
It’s not really about determinism, but about the fact that the input to an LLM is inherently ambiguous, unlike the input to a C compiler.
xpct 23 hours ago [-]
Interesting. I believe some circles reached the consensus that they aren't creative, but that it's independent of their intelligence/modelling capabilities.
byzantinegene 1 days ago [-]
it is a fruitless endeavour to try to appeal to a crowd that does not and will never understand the fundamentals of how llms work.
ofjcihen 24 hours ago [-]
I think that crowd would agree with you.
ronsor 9 hours ago [-]
Doesn't make them any more rational
jefurii 1 days ago [-]
I don't feel that this piece explains its title very well (to me) though the idea expressed by the title is spot-on
I've gone through hand-coding HTML, CGI, CMSes, web frameworks, and CMSes built with web frameworks. Each is (roughly) a layer of abstraction on top of lower layers.
People talk about LLMs as an extension of this layering but they're not. With the layers of abstraction I've listed you can go down to the layers underneath and understand them if you take the time.
LLMs are something different. They're a replacement for or a simulation of the thinking process involved in programming at various layers.
dakial1 15 hours ago [-]
Your point is similar to the post's, in the sense that all abstractions are deterministic, so you could connect the higher layer directly to the lower layer, while with LLMs, by their very probabilistic/black-box nature, you can't have this direct link.
But isn’t this just a semantics discussion? Is there a rule for abstraction in CS that says it needs to be deterministic (I really don’t know)?
I believe a deterministic abstraction over natural language is impossible to reach, given its very ambiguous nature; we get misunderstandings when we talk to each other, so naturally, when talking to a machine, it would need to be probabilistic to understand how to translate it to code.
DauntingPear7 22 hours ago [-]
They're like an advanced form of program synthesis. Something that operates outside of the abstraction layering.
srikanthsastry 7 hours ago [-]
Technically, the claim is true, but only technically. I think the reason it is not a reliable level of abstraction is due to what I like to call the "directive gap" (https://srikanth.sastry.name/garden/directive-gap/), which is the distance between the human's goal and the context available to the LLM. Theoretically, if the directive gap is zero, then with very high probability you will get the correct code.
If you think of 'programming in X' as a process, then you have multiple iterations as you go from incorrect code to correct code, and the same can be true of 'programming in LLM'. As you iterate on the prompts and have verification in place, whether it is spec verification, TLA+, unit testing, CI, etc., you can get the same effect.
Now, it is an open question whether this is simpler than programming in any modern programming language. By the time you figure out the exact prompt trajectory that will build what you want, you might as well have used some fancy autocomplete IDE to write the same code. It really depends on your fluency with that specific language. People are usually very fluent in natural language, and so it levels the playing field so to speak.
Legend2440 1 days ago [-]
I don't agree with this take. Determinism is a nice property for abstractions to have, but it isn't necessary to be an abstraction.
And LLMs can handle very abstract concepts that could not possibly be encoded in C++, like the user's goal in using software.
farmdawgnation 1 days ago [-]
I think you could also make the case that the existing abstractions aren't actually fully deterministic themselves. The compiler or interpreter may not behave as it should. Therefore, for any correct C code, there's a probability that the GCC compiler will turn it into correctly formed machine code. But it may not!
Is the probability much higher with GCC? Sure. But it's still a probability.
farmdawgnation 48 minutes ago [-]
One of these days I'm going to learn the lesson that my tongue in cheek doesn't always come across on HN.
anon-3988 23 hours ago [-]
I am sorry but this is an insane take. The probability of GCC going haywire with your special snowflake correct C code? Please. Has this EVER happened to you? I am not talking about the performance of the generated assembly, because that IS flaky, but functionality-wise I do not think so.
If people are so confident about the determinism of LLMs, or at least consider it on par with compilers, please ask it to compile your source code instead. Better yet, replace all your GNU utils with LLM instead. Replace your `ls` with `codex "prompt"`.
elwebmaster 22 hours ago [-]
I have done this: alias codex --yolo -p . It's very helpful not having to remember every odd command and its parameters. It's a bit more typing, but I type faster than I can invoke and scan through man pages.
hirako2000 23 hours ago [-]
They are deterministic. Including in the way they fail.
yuye 18 hours ago [-]
People forget what determinism is.
Non-deterministic systems produce different output states given identical input states.
Even if a compiler's memory gets a one-in-a-million bitflip that produces a different output, it doesn't mean it's non-deterministic. It just means that the output state is different due to an external force changing the internal state.
An infinite loop will halt when the processor is powered off.
vrighter 15 hours ago [-]
if a compiler bug is found, it can be fixed. You can't fix an llm.
Terr_ 5 hours ago [-]
I think a less-brittle claim would be that they are at best an extraordinarily leaky and idiosyncratic layer of "abstraction", enough that for certain tasks you wouldn't want to actually use the term.
It's like saying human personal assistant Bob is an abstraction over your calendar and shopping list.
In other words, it depends where the people talking have placed their cutoff point for a good abstraction versus a terrible and unwise one.
Havoc 17 hours ago [-]
When people say things like that they mean it as a rough mental model.
Bit like when people say "it's like riding a bike" they're not actually talking about bicycle riding being the exact same activity.
Coming back with this in response:
> f(x) -> P(y) ∪ P(z1) ∪ P(z2) ∪ ... P(zN)
is a failure in human communication not a disagreement about what LLMs are or aren't.
DauntingPear7 22 hours ago [-]
I don't think they fit in as a layer of abstraction, but instead are outside of it. An abstraction simplifies away the inner workings of what is being abstracted. The LLM exists outside of your code. It is not part of it, thus, it is not abstracting it away. If this were the case, a coworker would be an abstraction to code they own (you could argue this, but I think it erodes the meaning of abstraction). LLMs behave like program synthesizers rather than another layer of abstraction. They take natural language as input, and using fancy math produce a (hopefully) relevant and useful output based on that input. They can produce layers of abstraction, but are not part of a program's abstraction stack.
However, they can abstract away the need to understand implementation, similar to a coworker. They can summarize behavior, be queried for questions, etc, so you don't have to actually understand the inner workings of what is going on. This is a different form of abstraction than the typical abstraction stack of a program.
madisonmay 1 days ago [-]
LLMs are not inherently non-deterministic during inference. I don't believe non-determinism implies lack of abstraction. Abstraction is simply hiding detail to manage complexity.
danpalmer 1 days ago [-]
Non-determinism is configurable at the level of the mathematical model, but current production systems do not support deterministic evaluation of LLMs.
orbital-decay 24 hours ago [-]
They do, though. Providers don't because batching makes it cheaper. Among the providers, DeepSeek seems to support it for v4 (and have actually optimized their kernels for batching), and Gemini Flash is "almost deterministic".
danpalmer 23 hours ago [-]
I'm pretty sure that the determinism issue is at the floating point math level, or even the hardware level. Just disabling batching and reducing the temperature to 0 does not result in truly deterministic answers.
orbital-decay 22 hours ago [-]
FP math itself is deterministic on real hardware, if the order of operations stays the same. Output reproducibility is much less of a problem than it seems, see for example https://docs.vllm.ai/en/latest/usage/reproducibility/
nnevatie 22 hours ago [-]
The FP math is deterministic. However, the environments in which inference is run and specifically batching make current LLM services practically non-deterministic.
Anyone claiming LLMs are a higher level of abstraction is not using the term in the way programmers and computer scientists use it.
They're usually conflating "delegation" and "abstraction", as if a junior developer is an abstraction.
taraharris 24 hours ago [-]
The claim is that compilers were f(x) -> y, and LLMs are f(x) -> P(y | z1 | z2 | ... zN).
But how were various combinations of popular programming languages, operating systems and hardware platforms not effectively f(x) -> P(y | z1 | z2 | ... zN)? Suppose you were quick on the take and were writing in Unix and C in the early 80s and found yourself porting your program from a PDP-something to an 8088 PC, or to a 68k Mac, dealing with DOS extenders, printer drivers, different versions of C (remember K&R style?) or C++? Remember MFC? The evolution of the STL?
LLMs are similar to that maelstrom, just on a faster timescale.
hirako2000 23 hours ago [-]
The difference is that you can port f(x) -> y to be exactly that, to any target that exists or is yet to come.
An LLM can't, even within your primary target.
It's like explaining how a hammer isn't a screwdriver, and someone comes along to argue that a hammer, too, can break.
royal__ 1 days ago [-]
I agree, but I think it's for a different reason than what the author says: LLMs are a very leaky abstraction compared to other levels, meaning it's much harder to convey the true intent of logic you are trying to encode through natural language, and often by doing so you are just relying on the LLM to "get it right", which is inherently messy business. Oftentimes, that leakiness just doesn't matter that much. Other times, it does.
Mikhail_Edoshin 17 hours ago [-]
There was an article on database UX and it compared the expectations of a database user and a user working with a search engine. It's interesting, because both are searching, right? Yet the database user expects the found set to be complete or it expects an explanation on why this record is in it and this one is not. A search engine user does not expect things like that and will put up with false positives and negatives if their number is not too big.
resonancel 22 hours ago [-]
I always think the determinism discourse on LLMs is off the point. The elephant in the room is semantic preservation. Compilers can most often preserve semantics across abstractions, while LLMs most often cannot.
For sure the problem isn't that clear-cut, for the siren's call of AI coding is to induce a system out of prompts with ambiguous semantics. It's hardly surprising that you get unpredictable outcomes when giving ambiguous commands to human collaborators, and that in the case of LLMs they resolve ambiguity with probabilistic approximation.
yuye 18 hours ago [-]
I thought the whole idea is that we have programming languages because turning a rigidly defined language (like C) into another rigidly defined language (like machine code) is relatively simple.
Turning an ambiguous language with no formal definition (like English) into one that has one is a very hard problem.
kusokurae 22 hours ago [-]
Really anything can (and must) be written to justify delegated thought. See: replies to this thread.
bigstrat2003 1 days ago [-]
You're right, but the reality is that the people who are excited about LLMs don't care about determinism. They are happy to hand off the thinking to a third party, even if it will give wrong answers they don't notice.
archagon 20 hours ago [-]
Then they should not pretend that they're still "engineers" working on a higher level of "abstraction".
amdivia 7 hours ago [-]
I mean,
"Give me a Todo app"
Is also different from
"Write a function that takes a string parameter (Todo) and saves it into a text file with the name <current date time (as a Unix epoch)>.txt, and if already present, append to it to the file instead"
The probability distribution for the potential output is different, and it's more limited in the second case perhaps.
Besides, even the "deterministic" systems the author is referring to are not fully deterministic. They are "deterministic" if we ignore a certain threshold of randomness that could afflict the system. Yes, perhaps this threshold is higher when using LLMs, but even then, not all inputs produce the same level of nondeterminism in the output.
ofjcihen 24 hours ago [-]
Tangential to the subject matter but has anyone else noticed that night time tends to have more people arguing that LLMs are intelligent and the daytime tends to have more arguing that they aren’t?
TremendousJudge 13 hours ago [-]
I'll believe that LLMs are like compilers the day a repo contains only the prompts and no generated code.
conorbergin 1 days ago [-]
LLMs are deterministic: the same model under the same conditions will produce the same output, unless some randomness is purposefully injected. Neural networks in general can be thought of as universal function approximators.
mrob 1 days ago [-]
Whenever somebody calls LLMs "non-deterministic", assume they mean "chaotic", in the informal sense of being a system where small changes of input can cause large changes to output, and the only way to find out if it will happen is by running the full calculation.
For many applications, this is equally troublesome as true non-determinism.
conorbergin 1 days ago [-]
I don't think LLMs are that chaotic, you can replace words in an input and get a similar answer, and they are very good at dealing with typos.
They are definitely not interpretable, I was reading some stuff from mechanistic interpretability researchers saying they've given up trying to build a bottom up model of how they work.
mylifeandtimes 1 days ago [-]
> I don't think LLMs are that chaotic, you can replace words in an input and get a similar answer, and they are very good at dealing with typos.
Compare
"You are a helpful assistant. Your task is to <100 lines of task description> <example problem>"
with
"you are a helpless assistant. Your task is to <100 lines of task description> <example problem>"
I've changed 3 or 4 CHARACTERS ("ful" to "less") out of a (by construction) 1000+ character prompt.
and the outputs are not at all similar.
Just realized I've never tried the "you are a helpless ass" prompt. Again a very minor change in wording, just dropping a few letters. The helpless assistant at least output text apologizing for being so bad at the task.
orbital-decay 23 hours ago [-]
Sure. What did you expect? You changed the semantics of your prompt to the complete opposite. Of course it will attempt to make sense of it to the best of its ability, and deliver what you requested. The input isn't formally specified; that's inherent to the domain, not the model or a human. GP, on the other hand, is talking about semantically negligible differences like typos.
2ndorderthought 1 days ago [-]
That's not really true. If you turn a few knobs you can make them deterministic. Namely setting temperature to zero, and turning off all history. But none of the cloud providers do this. Because it's not a product as far as they are concerned. So in practice - not so much.
nnevatie 22 hours ago [-]
> Namely setting temperature to zero, and turning off all history
That's not nearly enough, though. The multi-node/GPU inference and specifically batching (and ordering in batching) have non-deterministic consequences for the current LLM services.
2ndorderthought 14 hours ago [-]
True but for small models it's pretty close. See my comment below about other cases leading to nondeterminism.
maplethorpe 1 days ago [-]
Can someone explain why this is? Do LLMs somehow contain a true random number generator? Why wouldn't they produce the same outputs given the same inputs (even temperature)?
edit: I'm not talking about an LLM as accessed through a provider. I'm just talking about using a model directly. Why wouldn't that be deterministic?
anon373839 1 days ago [-]
The model outputs a probability distribution for the next token, given the sequence of all previous tokens in the context window. It’s just a list of floats in the same order as the list of tokens that the tokenizer uses.
After that, a piece of software that is NOT the LLM chooses the next token. This is called the sampler. There are different sampling parameters and strategies available, but if you want repeatable* outputs, just take the token with the highest probability number.
* Perfect determinism in this sense is difficult to achieve because GPU calculations naturally have a minor bit of nondeterminism. But you can get very close.
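A toy sketch of that split between model and sampler (the logits here are made up): the model's job ends at the distribution, and a trivial sampler on top of it is fully deterministic.

    import numpy as np

    logits = np.array([2.0, 1.0, 0.5, -1.0])   # pretend final-layer output over a 4-token vocab
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                        # softmax: the distribution the model hands over

    next_token = int(np.argmax(probs))          # "sampler": just take the most likely token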
2ndorderthought 1 days ago [-]
I'm not so sold the LLM is an LLM without a sampler but it's not worth quibbling over. It's part of the statistical model anyways.
vrighter 15 hours ago [-]
the llm is the trained part, the rest is the handwritten part. The sampler is handwritten, not learned.
2ndorderthought 14 hours ago [-]
Believe it or not, in statistics and machine learning the hard-coded parts of a model that impact the results are considered part of the model. But I understand that nowadays we don't care about these things because AI goes brrr.
nowittyusername 23 hours ago [-]
There are A LOT of misconceptions about LLMs, the biggest being that they are not deterministic. They are 100% deterministic, and temperature has nothing to do with it. You WILL get exactly the same result every single time (at ANY temperature) as long as you use the same sampling parameters and server config parameters. What causes variance in LLMs is server parameters like batch processing and caching, among a few other things, with batching responsible for most of the issues. The reason that flag is used is because large providers serve multiple customers per GPU, and breaking up the VRAM is tricky and causes drift. If you start llama.cpp, for example, with only one person per slot and batching off, you will always get the same results every time, even at temperature 1.2 or whatever other parameters, because you are using one GPU per inference call, so no fucky business there. The reason most people are unaware of this is because most people have experience only with the API instead of working with the actual inference engine itself, so this goddamned myth keeps spreading. My video for reference here, where you can download and try it for yourself: https://www.youtube.com/watch?v=EyE5BrUut2o
maplethorpe 22 hours ago [-]
Thanks so much for this! I still haven't got around to building my own language model yet, so I'm a bit fuzzy on the details, but if I imagined a thought experiment where I did all the math by hand on paper, I just couldn't see how I would end up with a different output each time given the same inputs. Finding out that the variance other people are seeing comes from the server/hardware stuff clears that up.
This is a surprisingly annoying question to Google. A lot of articles give the reason that softmax returns a probability distribution, as if the presence of the word "probability" means the tokens will be different every time.
evrydayhustling 1 days ago [-]
An LLM model itself -- that is, the weights and the mathematical functions linking them -- does not tell you exactly how to train from data, nor how to generate an output. Instead, it describes a function providing relative likelihood(output | input).
Deciding how to pick a particular output given that likelihood function is left as an exercise for the user, which we call inference.
One obvious choice is to keep picking the highest likelihood token, feed it into the model, and get another -- on repeat. This is what most algorithms call "temperature=0". But doing this for token after token can lead to boring output, or steer you into pathological low-probability sequences like a set of endless repeats.
So, the current SOTA is to intentionally introduce a random factor (temperature>0) to the sampling process -- along with other hacks, like explicit suppression of repeats.
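Roughly, the temperature knob just rescales the logits before the softmax; a toy illustration (the numbers in the comments are approximate):

    import math

    def apply_temperature(logits, t):
        # divide logits by t before softmax: t < 1 sharpens, t > 1 flattens
        scaled = [x / t for x in logits]
        m = max(scaled)
        exps = [math.exp(x - m) for x in scaled]
        total = sum(exps)
        return [e / total for e in exps]

    logits = [2.0, 1.0, 0.0]
    print(apply_temperature(logits, 0.5))  # roughly [0.87, 0.12, 0.02] -- near-greedy
    print(apply_temperature(logits, 2.0))  # roughly [0.51, 0.31, 0.19] -- much flatter

The sampler then draws from whichever of these distributions you asked for.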
2ndorderthought 1 days ago [-]
Yea sure. So temperature is baked into these LLM models and when it isn't zero it increases the probability of taking a different path to decode the tokens. Whether it's at a provider or downloaded on your own machine.
Technically even when the temperature is 0 it's not deterministic but it's more likely to be... You can have ties in probabilities for generating the next words. And floating point noise is real.
All these models are doing is guesstimating the next token to say.
slashdave 1 days ago [-]
Eh, conceptually true, but in practice, it is rather hard to get any decent performance out of a GPU and still produce a deterministic answer.
And in any case, setting the temperature to zero will not produce a useful result, unless you don't mind your LLM constantly running into infinite loops.
alansaber 1 days ago [-]
Yes, there's a good Thinking Machines Lab blog post about this
0-_-0 1 days ago [-]
You're being downvoted, but you're right. Determinism is a different concept and doesn't characterise LLMs well. You can have deterministic random number generators for example.
archagon 20 hours ago [-]
"AI is an abstraction" only makes sense if working with a contractor to write your app is also an abstraction. An absurd dilution of the word "abstraction" to the point of meaninglessness.
legerdemain 1 days ago [-]
This is absurd. The author misrepresents the type of "abstraction" that people mean. This abstraction ladder goes as follows:
- contributing individually
- contributing as a tech lead
- contributing as a technical manager
- leaving the occupation to open a vanity business, such as a gastropub or horse shoeing service
maplethorpe 1 days ago [-]
Abstraction has a specific meaning in computer programming. I don't think he's misrepresenting it.
OP is being a bit tongue-in-cheek, I believe they mean that some vibe coders really want to be abstracted away from their own jobs, and are very much not interested in computer-scientific abstraction.
maplethorpe 1 days ago [-]
Oh.
UltraSane 23 hours ago [-]
The exact code might not be deterministic but the behavior can be if your spec uses something like Dafny or TLA+ and is detailed enough
calf 1 days ago [-]
There are a few things being confused here, because people are having to learn/re-learn/re-discover material from basic computer science classes. Both formal specifications and informal specifications - such as pseudocode (I balk imagining how many AI users might not know this term) or natural language documentation - are forms of abstraction. Programming languages and the underlying models of computation all enable varying degrees of hiding details or emphasizing important ideas/information. Human thought and language, and mathematics, are already examples of abstraction in general. LLMs thus also purport to provide a higher kind of abstraction (via a computational model alternative to Turing machines); the debate is whether it is a good one, whether its hallucinations make it unreliable, etc.
jqpabc123 1 days ago [-]
In other words, LLMs are probabilistic, not deterministic.
kibwen 1 days ago [-]
Determinism is a red herring here. The problem is that LLMs are inductive systems, not deductive systems. This makes them powerfully general, and yet inherently unreliable.
sscaryterry 1 days ago [-]
Dare I say, so are humans?
jqpabc123 1 days ago [-]
This used to be a big reason why we used computers --- to help eliminate the probability of error.
But apparently, not so much any more.
mpyne 1 days ago [-]
Digital computers were named after the humans whose jobs they automated out of existence.
They were invented to reduce cost of computation, not to eliminate the probability of error per se. Ask a Windows 11 user, they'll tell you computers still make errors.
card_zero 24 hours ago [-]
No, I'm pretty sure it does it on purpose.
somewhereoutth 1 days ago [-]
Right, it was the perfect match: Humans for fuzzy touchy feely stuff, computers for hard edged correct calculations. How have we managed to screw this up so badly?
irishcoffee 1 days ago [-]
I think the big unmentioned elephant in the room is the gambling/dopamine aspect of using an LLM. It’s to the point where people at $dayjob joke about it… but they’re not joking. That’s how it got screwed up so badly.
We have a bunch of engineers paying money to open loot boxes and they get visibly upset when they run out of tokens.
LLM companies have done an absolutely brilliant job of figuring out how to burn more tokens quickly, couch it as “more advanced” and people throw money at them.
I realize this wasn’t the thrust of your point, but tangentially, we fucked it up so badly because people desperately want to ignore this bit, and instead of looking at these tools analytically, there are the ardent defenders and the staunchly opposed… much like every other topic under the sun these days.
I use the free stuff work pays for, and I’ve never hit any token limit or anything like that. But I’m also trying extremely hard to ensure my skillsets don’t atrophy. I just use the web interface and ask questions. I have no interest in tying my development experience directly into an LLM, not after what I’ve seen at work over the last few weeks.
somewhereoutth 18 hours ago [-]
Gambling is right. And not just the dopamine, habitual gamblers remember the wins and forget the losses, so tend to believe they are 'ahead'. In time the bean counters will come and sober everyone up, the bottom line can't be argued with.
jqpabc123 14 hours ago [-]
> In time the bean counters will come and sober everyone up
Lawyers are going to be involved in this too.
suttontom 23 hours ago [-]
At this point this argument has crossed over into whataboutism.
cyanydeez 1 days ago [-]
This makes sense, but you need to understand that you're ignoring the compiler once you're past the machine code level, which isn't an abstraction, right, it's the root. So setting that part of the missive aside: going from C to Python, different compilers do add different machine code.
C and Python have a bunch of different compilers, so if you take the same code, the output can be different. There's determinism within the same compiler. Add in different architectures, and the machine code output definitely is more varied than presented.
But that's still manageable; then what if you add in all the dependencies? Well, you get a more florid complexity.
So really, it's a shitty abstraction rather than an inaccurate analogy. If you lined them up in levels, there could be some universe where they are a valid abstraction. But it's not the current universe, because we know the models function on non-determinism.
I'd posit if there was a 'turtles all the way down' abstraction for the LLM, it's simply coming from the other end, the one where human mind might start entering the picture.
A bad packet could tell a flying probe to fire all thrusters on and deplete its fuel in 15 minutes.
What makes a transmission error controlled is all the protection mechanisms on top of it. An LLM cannot delete a production database unless you give it access to do it.
My hot take is that many people are naturally more comfortable with deterministic systems that have clearly analyzable outcomes. Software engineering has historically primarily been oriented around deterministic systems and it has attracted that type of thinker.
But many of us, myself included, prefer chaotic systems where you can’t fully nail down every cause and effect. The challenge of building a prediction model on top of chaos is exhilarating. I really don’t find many people like me in SWE as in, say, the graphics design department.
To me, that’s the underlying threat here — LLMs are rewriting a field that has previously self selected a certain type of person and this, quite understandably, rubs them the wrong way.
Simple example, we've been striving for 90% unit test coverage and thorough code review when there's 0% integration test coverage. I blame the metrics only looking at unit tests, but also many people think unit tests should come first. I would prioritize integration. There are some small pieces that need to work reliably, but if your system relies so hard on all of them working right, it's a bad system. That, and too many things will work in pieces but not overall.
Broadly I'm gonna assume that the team will later hire solid SWEs who don't necessarily know how our stuff works, and aren't going to read 100 docs about it. If this is a backend+DB combo, get your DB right and there won't be too many wrong ways to code against it in the future, get it wrong and it becomes a black hole for SWE-time. Or if someone on their first day can't run a system locally for debugging, no matter how elegant the code is, don't count on that system getting fixed quickly during an outage.
It's... just... not that hard to write code nor does it cost that much. There are millions of us working silently at places that aren't "big tech". We all shrugged our shoulders, took a sip of coffee, and went back to our Teams meetings where the only LLM usage is still just Copilot.
The comment you replied to made no statements about math or proofs. They made a statement about working in systems of non determinism effectively. Your statement seems to imply that this is dumb, as if working in a world of full determinism is an option.
I've wanted a term for this for decades!
Feels like this maps to the J/P of Myers Briggs
It makes even less sense when people compare an LLM to a compiler. Imagine making a pull request that's just adding a binary because you threw the source code away.
If I assign a bug fix ticket to a human developer on my team, I won't be able to precisely replicate how they go about solving the bug but for many bugs I can at least be assured that the bug will get solved, and that I understand the basic approach the assigned dev would use to troubleshoot and resolve the ticket.
This is an organizational abstraction but it's an abstraction just the same, leaky as it is.
No, this is not comparable. The reason reproducible builds are tricky is not because compilers are inherently prone to randomness, it's because binaries often bake in things like timestamps and the exact pathnames of the system used to produce the build. People need to stop comparing LLMs to compilers, it's an embarrassingly poor analogy.
And neither are LLMs. Having their output employ randomness by default is a choice, not a requirement, just like things like embedding timestamps into builds is a choice that can be unwound if you want the build to be reproducible.
> People need to stop comparing LLMs to compilers, it's an embarrassingly poor analogy.
They are certainly different things, but if you are going to criticize LLMs it would be better if you understood them.
It is not random if you don't use random sampling in its output generation.
If the whole thing were actually stochastic then prompt caching would be impossible, because having a cache of what the previous tokens transformed into to speed up future generation would be invalidated by the missing random state.
Look at llama.cpp, you can see what samplers are adjustable and if you use samplers that employ randomness you can see what settings disable the random sampling. Or you can employ randomness but fix the seed to get reproducible results.
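On the fixed-seed point, a minimal PyTorch illustration (the probabilities here are a stand-in for a real next-token distribution, not llama.cpp's actual sampler):

    import torch

    probs = torch.tensor([0.6, 0.3, 0.1])

    torch.manual_seed(42)
    a = torch.multinomial(probs, num_samples=5, replacement=True)

    torch.manual_seed(42)
    b = torch.multinomial(probs, num_samples=5, replacement=True)

    # random sampling, but the fixed seed makes the draws reproducible
    assert torch.equal(a, b)
    print(a.tolist())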
https://docs.pytorch.org/docs/2.11/generated/torch.use_deter...
This comes down to map reduce and floating point's lack of associativity. You see the same thing with OpenMP on CPUs.
People are constantly claiming determinism in LLMs that is just not there.
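The associativity point is easy to reproduce without a GPU; regrouping the same additions changes the result, which is exactly what a parallel reduction does when partial sums land in a different order:

    a = (0.1 + 0.2) + 0.3
    b = 0.1 + (0.2 + 0.3)

    print(a == b)  # False
    print(a, b)    # 0.6000000000000001 vs 0.6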
This is interesting though, I didn't know PyTorch had a debug mode for reproducibility.
I posted that to show that at a bare minimum, there is some pretty extreme nondeterminism (though probably mild in effect) in even the most pedestrian GPU inference, unless you go to the extreme of using the debug mode and taking the potential performance hit.
Ill-conditioned or unstable is the better term: a small change in the input can create a large difference in the output.
This is not really meaningfully true. E.g. batching, heterogeneous inference HW, and even differences in model versions can make a difference in what result you get, and these are hard to solve.
GCC 16.1 vs. 15.2 will get you differences. GNU ld vs. gold vs. mold vs. lld will get you differences. Whether or not you employ LTO or other whole-program optimization gets you differences.
Have you never debugged a race condition that worked on your machine but didn't work in prod, based only on how things ended up compiled in the final binary?
I'm not saying these are identical but there's a lot more similarity than you all seem to understand. And we've made compilers work well enough that a lot of you believe that they give very routine, very deterministic outputs as part of broader build systems even though nothing could be further from the truth, even today.
This is not my claim, you're veering wildly off course here. I'm merely responding to the common, tiresome and, to be frank, stupid analogy of LLMs to compilers.
inb4: "Don't worry, just use GPT to make the docs"
The code is still important, but I could see it becoming something that humans rarely engage with.
That's more anthropomorphism than an abstraction. An LLM talks like a person because it was trained to predict continuations of human speech. That does not make it a person, or a system with intent, responsibility or any other human attribute. They are what they are: text prediction engines.
Perhaps your input to the LLM is "make all the test cases pass", and so it predicts it better do something to make the test cases pass, and does so by deleting the test cases. I guess in the "abstract" sense it did what you asked.
Or, how about the case in the news from a few days ago where an agentic system deleted all the vendor's customer's data, and last 3 months of backups, despite having been EXPLICITLY "told" not to do any such thing. Should we consider "completely fuck the customer" as an "abstraction" of "never delete any data"?
Abstractions often embrace nondeterministic translation because lower level details are unknown at time of expression -- which is the motivation for many LLM queries.
Occasionally the low level details leak through, e.g. this egg came from this farm, there's a shipping issue so onions are more expensive, or whatever.
I think llm assisted coding is going to work something like this.
It's not a good analogy. With an LLM you might ask for a pea, be parted with your money, and be given a watermelon.
Also, the problem of "did we receive the packet, and is it correct" is trivial compared to "did we get correct LLM output".
The problem in networking is making the reliable transfer perform well under many conditions, and scale.
Getting consistently reliable output of LLMs isn't solved, so we can't talk about it scaling to even one instance.
It depends on what's the abstraction.
Using an LLM for coding is 'abstracting' the developer, adding an extra layer that can produce code. But it's not an abstraction layer of the code itself.
While your code is executing, an asteroid from space may hit it and halt execution.
There are no consequences for a bad output from an LLM and idk about you but I don't trust them
ex: std::shared_ptr is abstracting over raw pointers, and does refcounting. It is abstracting something but you can actually know exactly what that thing is. An LLM is an abstraction over the space of all possible computer programs. If an abstraction doesn't constrain you in some way, it's not an abstraction.
Such strong correlation between narrow specific demands on how things have to be & posting, in general. I'd really like to see open mindedness & exploratory views be more frequent and have better standing, in general.
I do tend to think this is different than a level of abstraction. But it feels like it's trying to hit hard, on a pretty weird narrow point.
Not at all. If this were true then the Python code in question would generate deterministic binary. Of course that’s not what happens. The Python runs through an interpreter that may change behavior on different runs. It may change behavior version to version. It may even change behavior during multiple invocations of a function in the same running instance. Because all of that is abstracted away.
Same for the C code. You give up control and some determinism for the higher abstraction. You might get the same output between compilations on the same version, but that's not actually guaranteed, and version-to-version consistency certainly isn't.
Moving to a higher layer of abstraction very often results in less constrained behavior.
With a high level language implementation of any sort, the actual instructions executed by the CPU may vary according to how it was compiled or run, or what machine you run it on, but the behavior will not.
The high level language defines its own level of abstraction, defining exact behavior, allowing the developer to have full control over program behavior, algorithms, UI, etc.
An LLM + natural language instructions is not a program-like abstraction of what you want the computer to do, because it does not have that level of precision. Natural languages are fuzzy and imprecise, because they have been developed for communication, not precise machine-level specification.
Obviously you can "vibe code" at different levels of detail, ranging from "build me an app to do X" to "here are 20 1000 word essays specifying the dos and don'ts of what I want you to build", but in either case you are nowhere near the level of precision of using a programming language to specify exactly what you want.
So sure, "vibe coding" let's you accomplish A result with less attention to detail than using a programming language, but it's not a "higher level abstraction" in the sense that HLLs are, since it doesn't define what that abstraction is. It lets you get A result, but not define a SPECIFIC result, since natural languages just aren't that precise... natural language means whatever the person/thing interpreting it interprets it to mean.
Of course you can use an LLM as a way to "rough out" a function or app, and as a crude tool to manipulate that roughed out form (or an existing project), but natural language does not have precise semantics and therefore cannot provide a precise definition of what you want to do.
> The high level language defines it's own level of abstraction, defining exact behavior
This is not entirely true. A high level language defines some behaviors, leaving many behaviors to be undefined and implementation specific.
Many of those unspecified behaviors can matter in some cases.
> It lets you get A result, but not define a SPECIFIC result, since natural languages just aren't that precise...
You are repeating the same error as the article and missing the fact that while an abstraction lets you specify or control some things, it leaves many things out of your control. The higher the abstraction, the more stuff that is left out of your control. And maybe you don’t care about the things outside your control (great, the abstraction worked!) but regardless there are many things left unspecified in the typical abstraction and very often eventually you will care, which is why people say things like “all abstractions are leaky”.
For a simple example, think of writing something like this:
MessageBox(“hello world”, OkCancel)
MessageBox is an abstraction over a massive amount of logic. You specified a string and a set of buttons and not much else. You give up control over the styling, the placement of the buttons, the actual button text (very likely will be localized), where the box will appear, and so much more.
You are not getting a specific result. You are getting a result that meets the contract.
“Write a program that shows a hello world message box” is a much higher level of abstraction than even that, and you are giving up significant specificity and determinism. Is it a good abstraction? That’s a great question. But it certainly is an abstraction.
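For a runnable stand-in for that MessageBox call, Python's Tk wrapper makes the same trade; a minimal sketch (needs a display, and tkinter is only standing in for whatever GUI toolkit you have in mind):

    import tkinter as tk
    from tkinter import messagebox

    root = tk.Tk()
    root.withdraw()  # no main window, just the dialog

    # one call, two inputs; placement, styling, fonts and the button captions
    # are all decided by the toolkit/OS, not by this code
    ok = messagebox.askokcancel("hello", "hello world")
    print("OK pressed" if ok else "Cancel pressed")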
Sure, but you don't need to use those, and shouldn't.
A programming language lets you avoid undefined behavior and stick to the defined abstraction provided by the language.
Natural language does NOT let you do this, because words have no strict meaning, and the meaning of any sentence is undefined and up for interpretation and contextual clarification, etc, etc. Maybe more to the point LLMs are not concerned with meaning - they are concerned with continuation prediction. The LLM/agent that "ignored instructions" and deleted all the customer's data and backups wasn't "being bad" or "ignoring instructions", it was just statistically predicting, and someone was daft enough to feed those predictions into an execution environment where real world consequences could ensue.
Yes and no. It lets you avoid undefined behavior traps. It lets you rely on endless implementation defined choices.
> Natural language does NOT let you do this, because words have no strict meaning, and the meaning of any sentence is undefined and up for interpretation and contextual clarification, etc, etc.
This is a fair criticism of natural language. It is less well defined. That doesn’t stop it from being an abstraction, though perhaps it fairly makes it a problematic abstraction.
> The LLM/agent that "ignored instructions" and deleted all the customer's data and backups wasn't "being bad" or "ignoring instructions"
That one was entirely human error. And not just “oops I trusted the AI too much”. That guy was sharing the DB volume across prod and staging so deleting the staging DB also took down prod.
If you run your business like that, eventually a human will do the same thing, because it’s catastrophically dumb.
That's a strange way to think of a programming language specification.
A programming language is an abstraction that mostly fully specifies behaviors that any compliant implementation must adhere to, and that you as a user of the language can therefore rely on.
There may be a few details in a language specification that are specified as implementation defined, but that doesn't mean they are not specified, it just means they are specified by the implementation rather than the standard.
This is true but hand waves over a lot of behavior. An implementation of Python could be 10x faster or 10x slower and still be fully compliant with the specification.
There’s a reason that NumPy’s core is written in C and not Python. The implementation-specific details sometimes matter. The abstraction is leaky as soon as you care about anything not explicitly specified in the abstraction.
But to who/what is it an abstraction?
To a human, sure. If I told an intern to "write a hello world message box", I'd expect at least to get something approximating that request.
To an LLM? The LLM has no intent or understanding - it's just a statistical predictor. Maybe it'll "interpret" your request as only wanting a hello world message box, so it'll delete your company's entire git repository to ensure a clean slate to start from.
I think that when you say "it is certainly an abstraction" what you implicitly mean is "it is certainly an abstraction TO ME", but an LLM is not you, and does not have a brain, so what is an abstraction to you shouldn't be taken as being an abstraction to an LLM (whatever that would even mean).
If a natural language specification is an abstraction over the code implementation, then it is an abstraction whether given to a human or an LLM. The LLM being arguably a bad tool does not change that.
The question is then can specifying a computer behavior in (inherently imprecise) natural language be considered as some sort of "program", a higher level abstraction in same sense as a HLL provides a higher level of abstraction to programming in assembler?
I would say no, for various reasons:
1) There is a difference between an abstraction and under-specifying your requirements
2) Whatever you want to call it, any descriptive language that is insufficiently precise to unambiguously describe the details that are important to you is not very useful as a way to specify system behavior
3) If you are not only using natural language to describe the desired behavior of a system, but are also assuming that the person (or thing) interpreting the description is bringing their own knowledge and expertise to bear in "fleshing out the details", then you don't have a full specification at all, even an abstract one. What you have in that case is not something analogous to a computer program, but more like business requirements.
If you need to develop an API that you’ll support publicly for 10+ years, yeah. You probably want to be really precise. If you need to code up yet another feature in some CRUD app, it matters a lot less.
Sometimes natural language is entirely sufficient to “unambiguously describe the details that are important”.
Python could certainly optimize repeated code paths to make them more efficient. I don’t know that the standard implementation actually does, but it could. Spending extra time optimizing repeated code paths is a reasonable choice for an interpreted or JIT compiled language.
I would not expect C to change from invocation to invocation mostly because C is supposed to be boring and predictable. That’s kind of its thing. But again, it could. There’s nothing in the C spec I’m aware of that says the C compiler has to ensure that each invocation of a piece of code will execute the same machine instructions.
Yes, and that's how it's supposed to be. Any description that determines the totality of a problem space is an implementation itself.
Imagine the following requirements:
f(0) = 0, f(2) = 4
Both f(x) = x^2 and f(x) = 2x are correct ways to implement said requirements. But if you start relying on f(1) = 2, you might get in trouble with a coworker that relies on f(1) = 1. This is undefined behavior and should be avoided.
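Spelled out as a quick sketch:

    # two implementations that both satisfy the stated requirements
    # f(0) = 0 and f(2) = 4, yet disagree everywhere else
    def f_square(x):
        return x ** 2

    def f_double(x):
        return 2 * x

    for f in (f_square, f_double):
        assert f(0) == 0 and f(2) == 4  # the "spec" holds for both

    # relying on f(1) means relying on behaviour the requirements never pinned down
    print(f_square(1), f_double(1))  # 1 vs 2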
> There’s nothing in the C spec I’m aware of that says the C compiler has to ensure that each invocation of a piece of code will execute the same machine instructions.
It can't, because C can be written for any system you want. If I ask the compiler to compile x *= 2, it might use the mul primitive or it might use shl, both are ok.
Assuming you write code that does not take advantage of undefined behavior, then in general you would expect the correctness of your program to be consistent. But you would not expect the performance, for example, to be consistent. An optimizing JIT compiler might certainly run the 3rd invocation of code path way faster than the first.
With changes like substantial refactors or ambitious feature additions, it's easy to exceed the infamous "seven things I can remember at once".
Good lab notes, specs etc can help, but it's a lot to keep in mind. In practice these often turn into multi-person projects, and communication is hard, so that often means delay or drift. Having an agent temporarily worry about those details frees up my mental bandwidth for the harder parts of the problem. The main benefit is to defer the concern until I have a mostly working system. Then I come back and review its output, since I'm still responsible for what it delivers, and I want better than "mostly working".
Being able to spend my energy on the architectural decisions and validate my understanding before spending time on optimising the internals has actually allowed me to follow through with some of my designs.
Experimentation is then faster. If the data model wasn't good enough, I can actually experiment with it immediately, before we accidentally ship something to production and then have to deal with a very annoying data migration problem. The exact code doesn't matter to begin with when we just want to make sure the data is efficient to decode and is cache friendly.
I recently built a project I had in my mind for 3 years but could never work on because all the individual components were overwhelming. It involved e2e encryption, consensus, p2p networking, CRDTs, and API design. It was very nice to see it come together. The project ended up failing due to some underlying invariant, so it was nice to validate that and finally get it out of my head.
I have a feeling that if LLMs were built on a deterministic technology, a lot of the current AI-is-not-intelligent crowd would be saying "These LLMs can only generate one answer given a question, which means they lack human creativity and they'll never be intelligent!"
I've gone through hand-coding HTML, CGI, CMSes, web frameworks, and CMSes built with web frameworks. Each is (roughly) a layer of abstraction on top of lower layers.
People talk about LLMs as an extension of this layering but they're not. With the layers of abstraction I've listed you can go down to the layers underneath and understand them if you take the time.
LLMs are something different. They're a replacement for or a simulation of the thinking process involved in programming at various layers.
But isn’t this just a semantics discussion? Is there a rule for abstraction in CS that says it needs to be deterministic (I really don’t know)?
I believe a deterministic abstraction over natural language is impossible to reach, by the very ambiguous nature of it; we get misunderstandings when we talk to each other, so naturally, when talking to a machine, it would need to be probabilistic to understand how to translate it to code.
Now, it is an open question whether this is simpler than programming in any modern programming language. By the time you figure out the exact prompt trajectory that will build what you want, you might as well have used some fancy autocomplete IDE to write the same code. It really depends on your fluency with that specific language. People are usually very fluent in natural language, and so it levels the playing field so to speak.
And LLMs can handle very abstract concepts that could not possibly be encoded in C++, like the user's goal in using software.
Is the probability much higher with GCC? Sure. But it's still a probability.
If people are so confident about the determinism of LLMs, or at least consider it on par with compilers, please ask it to compile your source code instead. Better yet, replace all your GNU utils with LLM instead. Replace your `ls` with `codex "prompt"`.
Non-deterministic systems produce different output states given identical input states.
Even if a compiler's memory gets a one-in-a-million bitflip that produces a different output, it doesn't mean it's non-deterministic. It just means that the output state is different due to an external force changing the internal state.
An infinite loop will halt when the processor is powered off.
It's like saying human personal assistant Bob is an abstraction over your calendar and shopping list.
In other words, it depends where the people talking have placed their cutoff point for a good abstraction versus a terrible and unwise one.
Bit like when people say "it's like riding a bike" they're not actually talking about bicycle riding being the exact same activity.
Coming with this in response:
> f(x) -> P(y) ∪ P(z1) ∪ P(z2) ∪ ... P(zN)
is a failure in human communication not a disagreement about what LLMs are or aren't.
However, they can abstract away the need to understand implementation, similar to a coworker. They can summarize behavior, be queried for questions, etc, so you don't have to actually understand the inner workings of what is going on. This is a different form of abstraction than the typical abstraction stack of a program.
Anyone claiming LLMs are a higher level of abstraction is not using the term in the way programmers and computer scientists use it.
They're usually conflating "delegation" and "abstraction", as if a junior developer is an abstraction.
But how were various combinations of popular programming languages, operating systems and hardware platforms not effectively f(x) -> P(y | z1 | z2 | ... zN)? Suppose you were quick on the take and were writing in Unix and C in the early 80s and found yourself porting your program from a PDP-something to an 8088 PC, or to a 68k Mac, dealing with DOS extenders, printer drivers, different versions of C (remember K&R style?) or C++? Remember MFC? The evolution of the STL?
LLMs are similar to that maelstrom, just on a faster timescale.
An LLM can't. Even within your primary target.
It's like explaining how a hammer isn't a screw driver. And someone comes to argue the fact that a hammer too, can break.
For sure the problem isn't that clear-cut, for the siren's call of AI coding is to induce a system out of prompts with ambiguous semantics. It's hardly surprising that you get unpredictable outcomes when giving ambiguous commands to human collaborators, and that in the case of LLMs they resolve ambiguity with probabilistic approximation.
Turning an ambiguous language with no formal definitions(like English) into one that does is a very hard problem.
"Give me a Todo app"
Is also different from
"Write a function that takes a string parameter (Todo) and saves it into a text file with the name <current date time (as a Unix epoch)>.txt, and if already present, append to it to the file instead"
The probability distribution for the potential output is different, and it's more limited in the second case perhaps.
Besides, even the "deterministic" systems the author is referring to, are not fully deterministic. They are "deterministic" if we ignore a certain threshold of randomness that could afflict the system. Yes perhaps this threshold is higher when using LLMs, but even when using LLMs, not all inputs share the same level of indeterministic output
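For what it's worth, one plausible reading of that second prompt, just to show how much narrower the target is (the trailing newline and the seconds-granularity epoch are my choices, not the spec's, which is rather the point):

    import time
    from pathlib import Path

    def save_todo(todo: str) -> Path:
        # file named after the current Unix epoch (in seconds), per the prompt
        path = Path(f"{int(time.time())}.txt")
        # mode "a" creates the file if missing and appends if it already exists
        with path.open("a", encoding="utf-8") as f:
            f.write(todo + "\n")
        return path

    print(save_todo("buy milk"))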
For many applications, this is equally troublesome as true non-determinism.
They are definitely not interpretable; I was reading some stuff from mechanistic interpretability researchers saying they've given up trying to build a bottom-up model of how these models work.
Compare "You are a helpful assistant. Your task is to <100 lines of task description> <example problem>"
with
"you are a helpless assistant. Your task is to <100 lines of task description> <example problem>"
I've changed 3 or 4 CHARACTERS ("ful" to "less") out of a (by construction) 1000+ character prompt.
and the outputs are not at all similar.
Just realized I've never tried the "you are a helpless ass" prompt. Again a very minor change in wording, just dropping a few letters. The helpless assistant at least output text apologizing for being so bad at the task.
That's not nearly enough, though. The multi-node/GPU inference and specifically batching (and ordering in batching) have non-deterministic consequences for the current LLM services.