When using Brave, a window opened
and the following was displayed. Did anyone else receive this?
Such things don't happen on their own. Take a look at the domain you are on. What website were you visiting BEFORE this happened? And what extensions do you have installed, if any?
Brave does not have anything whatsoever in its code to go to that domain. The code is on GitHub and can easily be searched for that string; it simply doesn't exist in the code.
I think it came up when I was searching for Block Lists for Little Snitch. (Brave is such a resource hog, even my M4 MacBook struggles to keep Sonnet from crashing.) Regardless, just thought I'd post it. Thanks for your input. Happy NY!
That's actually not you. See, when Leo runs on your computer, it's not actually running on your computer. Think of Leo in Brave more like a "dumb terminal" to Brave's server, which is where Leo is truly running.
If you were running any of these LLMs locally, you'd know: your VRAM/RAM load would slowly rise while you wait for the model to load, and the computer would likely become unresponsive. See, in this case here, this is Brave on a Leo page vs. an LLM run locally on my machine. Notice the VRAM and RAM usage is wildly different here? This is why RAM is expensive right now.
So, if Leo's Sonnet is crashing, that's a worthy post to make, as that means it's most likely crashing on their end, not yours.
Brave with Leo open but no prompt yet made:
Notice that in that example I chose Qwen "14B". I chose this model because I have this very model on my PC, and can give you a true apples-to-apples comparison here.
The difference is striking, isn't it? That's what is really going on when you run an LLM locally.
Wanna see something kinda fun though? Watch as I prompt it, and watch the CPU's and GPU's reaction. That's how an LLM really runs on a system.
Notice it took 16 seconds to think out that reply before it really started talking? This isn't a weak consumer GPU either, and it's running on ROCm, its (the GPU's) native interface for compute such as AI workloads. So yeah, I promise you, Leo and Sonnet are not actually running on your M4 MacBook; you're using Brave's servers.
In fact, while it's hard to see, did you notice the VRAM slowly climbing while it was thinking and responding? That's the context buffer/window: basically, how much of the conversation, in tokens, it can "remember" having with you. Once you spill over the set limit (you never set it higher than the VRAM you have, and since each model takes up a different amount of VRAM, you have to adjust it per model), it'll adjust based on how you configured that buffer.
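For anyone curious what "spilling over the limit" can look like, here is a minimal sketch of one common policy: drop the oldest messages until the conversation fits the token budget. This is only an illustration, not how Leo or any particular runtime does it. The names `count_tokens` and `trim_to_context` are hypothetical, and counting whitespace-separated words as tokens is an assumption (real runtimes use the model's own tokenizer).

```python
def count_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word is roughly one token.
    # Real runtimes count tokens with the model's tokenizer.
    return len(text.split())


def trim_to_context(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages that fit inside max_tokens."""
    kept = []
    used = 0
    # Walk backwards so the newest messages are kept first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break  # everything older than this is "forgotten"
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["hello there", "tell me about debug kits", "how many were made"]
recent = trim_to_context(history, max_tokens=9)
print(recent)
```

With a budget of 9 "tokens", the oldest message no longer fits and is dropped, which is the "amnesia" effect of a context buffer that is too small.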
(PS, my machine's name is "Milton", because it was built when our power was out while we had Hurricane Milton slamming our house. So Brave has Leo, I have Milton.)
Thank you for what seems to me to be a glimpse of Hybrid AI. { oops, my mistake }
This isn't "hybrid" AI whatsoever. This is how LLMs have always worked.
I made a clear example showing that Leo isn't running on your computer or mine. It never was. Website-based AIs never run on your computer. ChatGPT? That's not on your computer. Copilot? Ha, that still doesn't even run on your computer; it still uses Microsoft's.
Most people truly don't know how LLMs work. I've been saying it for a while: a lot of people here really don't know what they are talking about in regards to LLMs. Brave's team does, but the users do not. That's what this example is showing: the difference between using a consumer-facing AI service like Leo, Copilot, or ChatGPT and running a model locally. With the services, your computer's resources were never used, not once. Run an LLM locally, however, and that's a very different story; now you can talk to it with no Internet of any kind.
But every time you talk to it, the GPU pulls so much power that the lights on the desk flicker from transients. That's the harsh truth about LLMs.
If people want to like AI, they should know what it really is, and what it is not. This, however, ain't "hybrid AI" whatsoever.
Huh. I managed to read a recent article that led me to "think" that Hybrid AI is a blend of Local and Cloud; the article was aimed at bursting "the AI bubble." And then I stumbled onto:
https://blogs.nvidia.com/blog/rtx-ai-brave-browser/
And then what you wrote above, more local and cloud blend (I had been thinking? / reading too fast while not getting sleep) . . . so apparently I went off the rails near the start, and blurred things.
During which derailing, I had noticed:
https://www.earley.com/insights/what-is-hybrid-ai-approach-to-data-discovery
and
https://www.geeksforgeeks.org/artificial-intelligence/what-is-hybrid-ai-and-its-architecture/
[I actually bookmarked both of those.]
. . . but was too tired to study, and I must have imagined some hybrid vehicle, internal combustion engine + electric motor, similarity.
You know, I have to admit something: I had NO idea Brave's Leo had this ability whatsoever.
But yeah, those are your models. Claude Sonnet by default is set up to use Brave's own AWS servers, but if the user is paying for Claude Sonnet through Brave, they'd have no reason to install it locally, as they're leasing it from Brave's servers. You could pay for Claude Sonnet or GPT and install those to run locally, but then you wouldn't be paying Brave; you'd be paying the developers of those models themselves.
These are the actual local models I have:
The size of the model includes that distilled training. While I have 20GB of VRAM, I'd likely never get a model larger than 12GB, because you still need VRAM room for open apps, the desktop UI, and the context buffer itself (or the AI really will have amnesia issues). This means my local AI can't possibly know everything (in the video, it wasn't entirely right about the debug kit I asked about; there were far more than 30-40 made, lol!!).
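The budgeting above can be written out as simple arithmetic. A minimal sketch follows, assuming made-up figures for the desktop/apps and context buffer shares (3 GB and 4 GB here are placeholders for illustration, not measurements from my machine); the function name `vram_headroom_gb` is hypothetical.

```python
def vram_headroom_gb(total_gb: float, model_gb: float,
                     desktop_gb: float, context_gb: float) -> float:
    """Return VRAM left over after the model, desktop/apps, and context buffer.

    A negative result means something has to spill into system RAM.
    """
    return total_gb - (model_gb + desktop_gb + context_gb)


# 20 GB card with a 12 GB model; desktop and context sizes are assumed values.
comfortable = vram_headroom_gb(total_gb=20, model_gb=12, desktop_gb=3, context_gb=4)

# Same card with a hypothetical 16 GB model and the same assumed overheads.
too_big = vram_headroom_gb(total_gb=20, model_gb=16, desktop_gb=3, context_gb=4)

print(f"12 GB model headroom: {comfortable} GB")
print(f"16 GB model headroom: {too_big} GB")
```

Under those assumed overheads, the 12 GB model leaves a little room to spare, while the larger one comes out negative, meaning it would not fit entirely in VRAM.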
I'm by no means an LLM expert or tech, but I have been learning over the past 3 years, since I first started with an 11GB GTX 1080 Ti, which was not fun at all. Painful.
And even then, I discover that Brave does let Leo hook into local LLMs.
But, not hybrid.
I'm honestly not sure how a hybrid model that's both cloud- and local-based at the same time would work. They can't both have the same training data, as the data Brave has would be astronomically larger than what most of us have in RAM (and running from RAM is slower than from VRAM). So I am a bit curious and might read into how that actually works. Even fetching the training from system RAM to compute on the GPU is around 8x slower than running from the GPU's own VRAM (which is why I have my GPU offload set to 100% at all times; when it runs out of VRAM and is forced to use system RAM, that's when responses start slowing down massively).
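To put a rough number on that slowdown, here is a back-of-the-envelope sketch. It assumes a very simple linear model: the share of the model held in VRAM runs at normal speed, and the share that spilled to system RAM runs about 8x slower (the figure mentioned above). Real behavior depends on the runtime, the bus, and the model, so treat this as an illustration only; `relative_time_per_token` is a hypothetical name.

```python
RAM_SLOWDOWN = 8.0  # assumed penalty for work served from system RAM


def relative_time_per_token(vram_fraction: float) -> float:
    """Estimate time per token relative to running 100% from VRAM.

    vram_fraction is the share of the model held in VRAM (0.0 to 1.0).
    """
    if not 0.0 <= vram_fraction <= 1.0:
        raise ValueError("vram_fraction must be between 0.0 and 1.0")
    ram_fraction = 1.0 - vram_fraction
    return vram_fraction * 1.0 + ram_fraction * RAM_SLOWDOWN


for fraction in (1.0, 0.75, 0.5):
    slowdown = relative_time_per_token(fraction)
    print(f"{fraction:.0%} in VRAM: about {slowdown:.2f}x the time per token")
```

Even under this crude model, spilling a quarter of the model into system RAM nearly triples the time per token, which matches the "slowing down massively" experience.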
I see this (on a Linux box, openSUSE/Plasma) when I already have Brave running on a different screen and click to "open" a link URL from inside a message in my email client.
The failure in my own case occurs because clicking a link in my email program (I use Betterbird, BTW) is not configured to use the existing, already-open instance of regular-release Brave.
The workaround for my misconfiguration is to "copy" the link in Betterbird, rather than "open it in browser". After switching to my Brave screen, I simply open a new tab, then paste into the location bar with paste-and-go.
I have switched between the "beta" and "release" channels a couple of times, so Plasma (and/or Betterbird) might be trying to run an older beta version. In the case of a URL going to a form page, my profile is locked to the active browser, and the "bad" window won't even submit the form.
I haven't bothered to track down my configuration problem, but it is external to Brave. Does your situation sound similar?