20 Comments
Sherman

Wow, these are hard questions. The more I tried to write, the more I realized I did not understand. Good luck to the participants.

Emil Sotirov

How about defining a critical question... based on a fresh framework?

(Too old to be a researcher... just volunteering.)

Kian Kyars

Great initiative. I imagine you're aware, but you've just signed yourself up for a lot of application reading.

Kian Kyars

On that note, I propose a filter, something like a GPTZero-style checker, to discard any responses whose AI-generated score exceeds a certain threshold.
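A minimal sketch of what such a filter might look like. The detector call, the 0-1 score range, and the 0.8 threshold are all placeholders for illustration, not a real GPTZero integration:

```python
from typing import Callable, List

def filter_submissions(
    submissions: List[str],
    ai_score: Callable[[str], float],
    threshold: float = 0.8,  # hypothetical cutoff; a real one needs calibration
) -> List[str]:
    """Keep only submissions scoring below the AI-generated threshold."""
    return [s for s in submissions if ai_score(s) < threshold]

# Toy scorer for illustration only: pretend longer texts look "more AI-written".
toy_score = lambda text: min(len(text) / 100.0, 1.0)

kept = filter_submissions(["short human note", "x" * 200], toy_score)
# The long string scores 1.0 and is discarded; the short note is kept.
```

In practice the scorer would be a call to an external detector API, and the threshold would have to be tuned against its false-positive rate.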

Andrew Jones

I entered a submission, but I couldn't help also responding to another question here. The most compelling case, in my opinion, for the foundation-model companies making money is captured in this quote from Aschenbrenner's paper:

> As an aside, this also means that we should expect more variance between the different labs in coming years compared to today. Up until recently, the state of the art techniques were published, so everyone was basically doing the same thing. (And new upstarts or open source projects could easily compete with the frontier, since the recipe was published.) Now, key algorithmic ideas are becoming increasingly proprietary. I’d expect labs’ approaches to diverge much more, and some to make faster progress than others—even a lab that seems on the frontier now could get stuck on the data wall while others make a breakthrough that lets them race ahead. And open source will have a much harder time competing. It will certainly make things interesting. (And if and when a lab figures it out, their breakthrough will be the key to AGI, key to superintelligence—one of the United States’ most prized secrets.)

The data flywheel is, in my view, the core dynamic. I think we'll start to see greater specialization of each model provider in a set of domains, say coding or legal, as they compete to capture the users who become the domain experts encoding judgment for that domain. Model quality is going to depend increasingly on this volume of training data (which helps to explain the acquisition of Cursor by xAI).

Whichever model provider achieves breakaway utility in a given domain, say finance, will become the de facto daily driver. That will lead to more and higher-quality annotations as people use and steer it, driving what could become an exponential and divergent advantage.

And this, of course, translates directly into the ability to capture profits.

Andrew VanLoo

“The data flywheel is, in my view, the core dynamic.”

I disagree, but I'll have to lay out my reasoning in a reply later.

Dave Banerjee

The real test is picking the most important problem on the list.

ATC (Absolute Total Compound)

AI has turned teamwork into a one-man show.

F.I. Munger

AI needs to be used to create replacement workers; the data is being given away for free. Now sell the perfect slave to companies tired of human error and human workers. Another avenue is to sell the brains of the highest-IQ users: an AI trained to be more intelligent rather than the average of the average. Train ChatGPT on people above 120 IQ, sell that to those with money, and note the difference in thinking between the two. AI replacement humans are the way forward, but the data needs to be organized, with some direction and separation. One AI for everything is not feasible in a fragmented human society that has noted the benefit of specialists.

Ben Szuhaj

Love that you’re doing this. Thank you for pushing the discourse forward by asking thoughtful, important questions, both on air and online.

Andrew VanLoo

I have what I consider a great answer for what to do with the 180B OpenAI endowment. I’ll have to craft the essay…

Alasdair

Do you want citations and background references included to validate claims? If so, do they count toward the 1,000-word limit?

Axel Schmiegelow

When is the deadline?

Georg Philip Krog

Great questions! Three of the four are not what they look like on the surface. They are phrased as "what" or "how" questions, but their structure is mixed-modal: they bundle a deontic claim, a causal mechanism, and a temporal slot into one sentence. That is exactly why, as you note, LLMs answer them poorly: the model collapses the multi-stream query into a flat list of plausible answers without a schema to keep the streams separated :-)

Jerry Zhang

I think 1, 2, and 4 all hinge on the premise that "this time is different."

1. I don't think the progress is faster or out of the ordinary. LLMs/VLMs still only generalize to domains where they have plenty of training data. The current paradigm doesn't work on lidar, hardware design, protein folding, or generating photonic masks for chipmaking; it still exhibits the same problems as models from 10 years ago. My hypothesis is that the feeling of fast progress stems from more resources being poured into specific vertical domains, e.g., coding, UI design, legal, and finance. We experienced something similar in computer vision in the 2012-2020 era: backbone improvements plateaued after ResNet, but downstream applications continued to improve (object detection, semantic segmentation, instance segmentation, and so on). The current fast improvements are in domain-specific applications, not the backbone model.

I also reject the notion of an exponential takeoff. The fundamental limit is interaction with the world. You can't come up with a new model that predicts protein structure without doing a bunch of experiments (AlphaFold is built on prior experimental data and a custom architecture, not a generic LLM). VLAs also don't quite work because we simply don't have enough data, at least in the current paradigm. A country of geniuses cannot generate new knowledge just by thinking and not doing experiments, and doing experiments has physical limits, which show up as economic cost.

2. AI labs will start making money after consolidation and market pressure, similar to ride-hailing. Operating margins will be in the low teens, not SaaS-like. Even without open source, OpenAI, Anthropic, and Google will engage in a price war and drive prices down to cost. This is more akin to a utility. The counterargument is that the models will continue to improve, but DRAM and CPUs are counterexamples: they improved exponentially yet still carry lower margins than SaaS. The current hardware supercycle will end (margin compression) when the public market demands that frontier labs start generating profits. Uber and Lyft, as well as their Chinese counterparts, engaged in a ferocious price war, but once they went public they had to rein in the spending.

From what I've heard, current inference at high utilization is roughly breakeven, but if you account for the weekly peaks and troughs in utilization, current pricing might be underwater.

I have also done some napkin math on inference cost. For a $300k 8-GPU server depreciated over 6 years, that is $50k per year, or about $5.7/hr. Renting a 15 kW rack in BC, Canada runs maybe $2,000 a month, or about $2.7/hr. So the operating cost would be $8-9/hr; spread the depreciation over 3 years instead and it is roughly $15/hr. We haven't accounted for maintenance and other costs. Assume $1/$4 per million tokens input/output pricing and a 60/40 mix: a single stream at under 1M tokens/hr would be underwater by a lot, so batching is necessary to get a positive gross margin. With moderate batching you might get to 5-10M tokens/hr, which yields $11-25/hr. If we instead use cloud rental prices of $50/hr for an H200 and $100/hr for a B200, even with a 50% discount and batching, you are barely breakeven. I doubt the public market would tolerate this kind of operating margin for long. Frontier labs would have to raise prices and cut spending at the same time.
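The napkin math above can be reproduced in a few lines. All figures ($300k server, 6- vs 3-year depreciation, $2,000/month rack, $1/$4 per-million-token pricing, 60/40 mix) are the commenter's assumptions, not measured data:

```python
# Napkin math for LLM inference unit economics, using the assumptions above.
HOURS_PER_YEAR = 8760

server_cost = 300_000                              # $ for an 8-GPU server
dep_6yr = server_cost / 6 / HOURS_PER_YEAR         # ~$5.7/hr depreciation
dep_3yr = server_cost / 3 / HOURS_PER_YEAR         # ~$11.4/hr depreciation
power_cost = 2000 * 12 / HOURS_PER_YEAR            # $2,000/month rack -> ~$2.7/hr

# Blended revenue per million tokens at $1 input / $4 output, 60/40 mix.
blended_price = 0.6 * 1.0 + 0.4 * 4.0              # ~$2.20 per 1M tokens

def hourly_revenue(m_tokens_per_hour: float) -> float:
    return m_tokens_per_hour * blended_price

print(f"6-yr depreciation + power: ${dep_6yr + power_cost:.1f}/hr")   # ~$8.4/hr
print(f"3-yr depreciation + power: ${dep_3yr + power_cost:.1f}/hr")   # ~$14.2/hr
print(f"revenue at 1M tok/hr:  ${hourly_revenue(1):.1f}/hr")          # ~$2.2/hr
print(f"revenue at 5M tok/hr:  ${hourly_revenue(5):.1f}/hr")          # ~$11.0/hr
print(f"revenue at 10M tok/hr: ${hourly_revenue(10):.1f}/hr")         # ~$22.0/hr
```

This roughly reproduces the $11-25/hr range cited above (the 60/40 mix gives about $22/hr at 10M tokens/hr) and makes the key sensitivity explicit: the conclusion flips on tokens/hr throughput and the depreciation schedule.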

4. Technology diffuses. Plus, you can always trade. Most countries did not participate in the Industrial Revolution. Same thing here.

ron vrooman

There is no AI. It is just a computer with lots of info and a program; it susses out an answer. If you push back, it corrects, but just a little bit, until you accept its offer. If you are of that mindset, it is usable, but you must do the work yourself. When you start with "AI," you leave out what you should have done before turning to "AI."