Brave Update Scam

When using Brave, a window opened

and the following was displayed. Did anyone else receive this?

Such things don’t happen on their own. Take a look at the domain you are on. What website were you visiting BEFORE this happened? And what extensions do you have installed, if any?

Brave has nothing whatsoever in its code that goes to that domain. The code is on GitHub and can easily be searched for that string; it simply doesn’t exist in the code.
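For anyone who wants to check this themselves, here is a minimal sketch of that kind of search. It assumes you have already cloned the source to a local folder. The function name `find_string` and the path and domain in the usage note are placeholders for illustration, not anything taken from Brave's repository.

```python
import os


def find_string(root, needle):
    """Return (path, line_number) pairs for every line under root that contains needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="ignore" lets us scan binary-ish files without crashing
                with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            hits.append((path, number))
            except OSError:
                # Unreadable file (permissions, broken symlink): skip it
                continue
    return hits
```

Calling something like `find_string("path/to/checkout", "suspicious-domain.example")` gives back an empty list when the string appears nowhere in the tree.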

I think it came up when I was searching for block lists for Little Snitch. (Brave is such a resource hog, even my M4 MacBook struggles to keep Sonnet from crashing.) Regardless, just thought I’d post it. Thanks for your input. Happy NY!

That’s actually not you. See, when Leo runs on your computer, it’s not actually running on your computer. Think of Leo in Brave more like a “dumb terminal” to Brave’s server, which is where Leo is truly running.

If you were running any of these LLMs locally, you’d know: your VRAM/RAM load would slowly rise while you wait for the model to load, and the computer would likely become unresponsive. See, in this case here, this is Brave on a Leo page vs. an LLM run locally on my machine. Notice how wildly different the VRAM and RAM usage is? This is why RAM is expensive right now. :wink:

So, if Leo’s Sonnet is crashing, that’s worth a post of its own, as it most likely means it’s crashing on their end, not yours.

Brave with Leo open but no prompt yet made:

Notice that in that example I chose Qwen “14B”. I chose this model because I have this very model on my PC, and can give you a true apples-to-apples comparison here.

The difference is striking, isn’t it? That’s what is really going on when you run an LLM locally.

Wanna see something kinda fun though? Watch as I prompt it, and watch how the CPU and GPU react. That’s how an LLM really runs on a system.

Notice it took “16” seconds to think out that reply before it really started talking? This isn’t a weak consumer GPU either, and it’s running on ROCm, its (the GPU’s) native interface for compute such as AI workloads. So yeah, I promise you, Leo and Sonnet are not actually run on your M4 MacBook; you’re using Brave’s servers.

In fact, while it’s hard to see, did you notice the VRAM slowly climbing while it was thinking and responding? That’s the context buffer/window: basically how much of the conversation it’s having with you, measured in tokens, it can “remember”. You never set it to more than the VRAM you have, and since each model takes up a different amount of VRAM, you have to adjust it per model. Once you spill over the set limit, it’ll adjust based on how you configured said buffer.
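As a rough illustration of why VRAM climbs as the conversation grows: in a typical transformer, every token held in context stores a key vector and a value vector per layer (often called the KV cache). The sketch below is back-of-the-envelope only. The layer count, head count and head size in the usage note are made-up illustrative numbers, not the real dimensions of Qwen 14B or any other model, and "drop the oldest tokens" is just one assumed overflow policy among several that runtimes offer.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Approximate memory needed to hold keys and values for `tokens` tokens of context."""
    # The leading 2 is one key vector plus one value vector, per layer, per token
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


def trim_to_window(token_ids, max_tokens):
    """Keep only the most recent max_tokens tokens (the simplest overflow policy)."""
    if max_tokens <= 0:
        return []
    return token_ids[-max_tokens:]
```

With hypothetical dimensions of 40 layers, 8 KV heads and a head size of 128 at 16-bit precision, `kv_cache_bytes(1000, 40, 8, 128)` comes to 163,840,000 bytes, so every extra thousand tokens of conversation costs roughly another 164 MB in this made-up configuration.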

(PS, my machine’s name is “Milton”, because it was built while our power was out with Hurricane Milton slamming our house. So Brave has Leo, I have Milton.)

@MasterLink

Thank you for what seems to me to be a glimpse of Hybrid AI. { ← oops, my mistake }

This isn’t “hybrid” AI whatsoever. This is how LLMs have always worked.

I made a clear example showing that Leo isn’t running on your computer or mine. It never was. Website-based AIs never run on your computer. ChatGPT? That’s not on your computer. Copilot? Ha, that doesn’t run on your computer either; it still uses Microsoft’s servers.

Most people truly don’t know how LLMs work. I’ve been saying it for a while: a lot of people here really don’t know what they are talking about when it comes to LLMs. Brave’s team does, but the users do not. That’s what this example is showing: the difference between using a consumer-facing AI service like Leo, Copilot or ChatGPT and running a model yourself. With those services, your computer’s resources are never used to run the model. Run an LLM locally, however, and that’s a very different story; now you can talk to it with no Internet connection of any kind.

But every time you talk to it, the GPU pulls so much power that the lights on the desk flicker from the transients. That’s the harsh truth about LLMs.

If people want to like AI, they should know what it really is, and what it is not. This, however, ain’t “hybrid AI” whatsoever.

@MasterLink

Huh. I managed to read a recent article that led me to “think” that Hybrid AI is a blend of Local and Cloud - the article was aimed at bursting “the AI bubble.” And then I stumbled onto:

‘https://blogs.nvidia.com/blog/rtx-ai-brave-browser/’

And then what you wrote above, more local and cloud blend (I had been thinking? / reading too fast while not getting sleep) . . . so apparently I went off the rails near the start, and blurred things.

During which derailing, I had noticed:

‘https://www.earley.com/insights/what-is-hybrid-ai-approach-to-data-discovery’

and

‘https://www.geeksforgeeks.org/artificial-intelligence/what-is-hybrid-ai-and-its-architecture/’

[I actually bookmarked both of those.]

. . . but was too tired to study, and I must have imagined some hybrid vehicle, internal combustion engine + electric motor, similarity.

You know, I have to admit something: I had NO idea Brave’s Leo had this ability whatsoever.

But yeah, those are your models. Claude Sonnet is set up by default to use Brave’s own AWS servers, and if the user is paying for Claude Sonnet through Brave, they’d have no reason to install it locally, as they’re leasing it from Brave’s servers. You could pay for Claude Sonnet or GPT and install those to run locally, but then you wouldn’t be paying Brave, you’d be paying the developers of those models themselves.

These are my actual local models I have:

The size of the model includes that distilled training. While I have 20GB of VRAM, I’d likely never get a model larger than 12GB, because you still need VRAM room for open apps, the desktop UI, and the context buffer itself (or the AI really will have amnesia issues). This means my local AI can’t possibly know everything (I mean, in the video it wasn’t entirely right about the debug kit I asked it about; there are far more than 30-40 made, lol!!).
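That budgeting can be written out as simple arithmetic. This is only a sketch: the 20GB total and 12GB model ceiling come from the post above, while the desktop and context figures in the usage note are assumed round numbers for illustration, not measurements.

```python
def vram_headroom_gb(total_gb, model_gb, desktop_gb, context_gb):
    """Return the VRAM left over after the model, desktop/apps and context buffer."""
    return total_gb - (model_gb + desktop_gb + context_gb)


def fits_in_vram(total_gb, model_gb, desktop_gb, context_gb):
    """True when everything fits on the GPU without spilling into system RAM."""
    return vram_headroom_gb(total_gb, model_gb, desktop_gb, context_gb) >= 0
```

With a 12GB model, an assumed 2GB for the desktop and apps, and an assumed 4GB context buffer, `vram_headroom_gb(20, 12, 2, 4)` leaves 2GB spare. Swap in an 18GB model and `fits_in_vram(20, 18, 2, 4)` comes back `False`, which is the point at which things start spilling into system RAM.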

I’m by no means an LLM expert or tech, but I have been learning over the past 3 years, since I first started with an 11GB GTX 1080 Ti, which was not fun at all. Painful.

And even then, I only just discovered that Brave does let Leo hook into local LLMs. :sweat_smile: But, not hybrid.

I’m honestly not sure how a hybrid model that’s both cloud- and local-based at the same time would work. They can’t both have the same training data, as the data Brave has would be astronomically larger than what most of us have room for in RAM (and running from RAM is slower than from VRAM). So I am a bit curious and might read into how that actually works. Even fetching the training from system RAM to compute on the GPU is around 8x slower than running from the GPU’s own VRAM (which is why I have my GPU offload set to 100% at all times, until it runs out of VRAM and is forced to use system RAM, and that’s when responses start slowing down massively).
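To put a rough number on that slowdown, here is a toy model. It assumes the time per token is dominated by reading the model weights, and it takes the "around 8x" figure from the post as given. Real behaviour depends on the hardware, the runtime and the model, so treat the output as an illustration rather than a benchmark.

```python
def relative_token_time(fraction_in_vram, ram_slowdown=8.0):
    """Time per token relative to running fully from VRAM (1.0 means no slowdown)."""
    if not 0.0 <= fraction_in_vram <= 1.0:
        raise ValueError("fraction_in_vram must be between 0 and 1")
    # The part in VRAM costs 1x; the part spilled to system RAM costs ram_slowdown x
    return fraction_in_vram + (1.0 - fraction_in_vram) * ram_slowdown
```

Under these assumptions, spilling just a quarter of the model into system RAM (`relative_token_time(0.75)`) already makes each token take 2.75 times as long as a full GPU offload.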

I see this (on a Linux box, openSUSE/Plasma) when I already have Brave running on a different screen and click to “open” a link URL from inside a message in my email client.

The failure in my case occurs because clicking a link in my email program (I use Betterbird, BTW) is not configured to use an existing, already-open instance of regular-release Brave.

The workaround for my misconfiguration is to “copy” the link in Betterbird, rather than “open it in browser”. After switching to my Brave screen, I simply open a new tab, then paste into the location bar with paste-and-go.

I have switched between the ‘beta’ and ‘release’ channels a couple of times, so Plasma (and/or Betterbird) might be trying to run an older beta version. In the case of a URL going to a form page, my profile is locked to the active browser, and the ‘bad’ window won’t even submit the form.

I haven’t bothered to track down my configuration problem, but it is external to Brave. Does your situation sound similar?