Meta's Fundamental AI Research (FAIR) team released TRIBE v2 on March 26, 2026 — a tri-modal foundation model that predicts how the human brain responds to video, audio, and language. Trained and evaluated on over 1,115 hours of fMRI data spanning more than 700 volunteers, it delivers a 70-fold jump in spatial resolution over its predecessor and can generalize to subjects it has never seen. This is not a consumer product or a chatbot upgrade. It is a research-grade tool that could meaningfully reduce the cost and time required to run neuroscience experiments — and it is fully open-sourced under a CC BY-NC license. For computational neuroscientists, BCI researchers, and AI teams studying brain-inspired architectures, this release matters right now.
What You Need to Know
TRIBE v2 is the most capable open-source brain encoding model available as of April 2026. It predicts fMRI brain responses at 70,000-voxel resolution — a 70x improvement over TRIBE v1 — without requiring new scans for each subject. According to Meta's research paper, its zero-shot predictions can correlate more strongly with population-average brain responses than any individual's actual fMRI scan.
- Use TRIBE v2 if you run neuroscience experiments and want to screen hypotheses digitally before committing to expensive fMRI sessions.
- Use TRIBE v2 if you build brain-computer interfaces and need a scalable way to model neural responses to multimodal stimuli.
- Skip it for now if your research requires temporal resolution at the millisecond scale — fMRI's inherent lag means TRIBE v2 operates on indirect, blood-flow-based signals with a delay of several seconds.
What's New in TRIBE v2
TRIBE v2 stands for TRImodal Brain Encoder, version 2. The name reflects its core architecture: a single model that simultaneously handles video, audio, and language inputs and outputs predicted fMRI brain activity maps.
Here is what changed from v1 to v2:
| Feature | TRIBE v1 | TRIBE v2 |
|---|---|---|
| Training subjects | 4 | 25 (deep) / 720 (eval) |
| fMRI hours | ~100 | 451.6 (training) / 1,115.7 (eval) |
| Spatial resolution | ~1,000 brain parcels | ~70,000 voxels |
| Modalities | Vision + Audio | Video + Audio + Language |
| Zero-shot generalization | No | Yes |
| Open-source | Yes | Yes (CC BY-NC) |
| Competition result | Algonauts 2025 winner | Foundation model successor |
Sources: Meta FAIR research blog, official paper (March 26, 2026), MarkTechPost, The Decoder.
Resolution: 1,000 parcels → 70,000 voxels
TRIBE v2 scales to approximately 70,000 brain voxels, compared to roughly 1,000 coarse parcels in the original model. That difference is not cosmetic. At 1,000 parcels, the model could identify broad activation regions — roughly like knowing a concert is happening somewhere in a stadium. At 70,000 voxels, it can pinpoint activity down to small cortical sub-regions, which is what separates coarse approximation from actionable neuroscience.
How the Architecture Works
TRIBE v2 takes three types of input — video, audio, and text. Each channel runs through a pre-trained Meta AI model first: Llama 3.2 for text, Wav2Vec-Bert-2.0 for audio, and Video-JEPA-2 for video. These models turn raw data into embeddings that capture what's visible in an image, audible in a sound, or readable in a sentence. A transformer then processes all three representations together, picking up patterns that hold across different stimuli, tasks, and people. A final person-specific layer translates the output into a brain map with 70,000 voxels.
This design means TRIBE v2 does not learn to perceive the world from scratch. It piggybacks on Meta's existing foundation models to extract meaning from each modality, then focuses its learning on mapping those representations to neural activity patterns.
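The three-stage pipeline can be sketched in a few lines of NumPy. This is a toy illustration of the design described above (frozen modality encoders, a shared fusion stage, a subject-specific readout), not Meta's actual implementation: the dimensions, random projections, and `predict_voxels` function are invented for the sketch, and the real model uses transformer fusion plus the pretrained encoders named above rather than linear stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

D_EMB = 64        # toy embedding size; the real encoders emit much larger vectors
N_VOXELS = 70_000

# Stand-ins for the frozen pretrained encoders (Llama 3.2, Wav2Vec-Bert-2.0,
# Video-JEPA-2). Here each is just a fixed random projection of raw features.
def encode(raw, proj):
    return raw @ proj

W_text = rng.standard_normal((128, D_EMB))
W_audio = rng.standard_normal((256, D_EMB))
W_video = rng.standard_normal((512, D_EMB))

# Shared fusion stage: in TRIBE v2 this is a transformer over all three
# modalities; here, a single linear layer over concatenated embeddings.
W_fuse = rng.standard_normal((3 * D_EMB, D_EMB)) / np.sqrt(3 * D_EMB)

# Subject-specific readout: one head per subject, mapping the shared
# representation to a 70,000-voxel activity map.
subject_heads = {s: rng.standard_normal((D_EMB, N_VOXELS)) / np.sqrt(D_EMB)
                 for s in ["sub-01", "sub-02"]}

def predict_voxels(text_raw, audio_raw, video_raw, subject):
    z = np.concatenate([
        encode(text_raw, W_text),
        encode(audio_raw, W_audio),
        encode(video_raw, W_video),
    ])
    shared = np.tanh(z @ W_fuse)            # cross-modal representation
    return shared @ subject_heads[subject]  # per-subject brain map

pred = predict_voxels(rng.standard_normal(128),
                      rng.standard_normal(256),
                      rng.standard_normal(512),
                      "sub-01")
print(pred.shape)  # (70000,)
```

The point of the structure, as in the real model, is that only the final per-subject head needs to know anything about an individual brain; everything upstream is shared.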
Zero-Shot Generalization
This is the capability that makes TRIBE v2 practically useful at scale. Using an "unseen subject" layer, TRIBE v2 can predict the group-averaged response of a new cohort, and that prediction often correlates with the group average more strongly than an individual subject's own recording does.
That means a researcher studying a new population does not need to collect fMRI data on every individual first. The model can generate a starting prediction that is, in many cases, cleaner than a single person's noisy scan.
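A toy simulation makes the intuition behind that claim concrete: averaging a cohort washes out per-subject scanner noise, so a low-noise estimate of the shared signal can track the cohort mean better than any single noisy scan. Everything below (the noise levels and the `model_pred` stand-in for the model's zero-shot output) is synthetic and illustrative, not Meta's data.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000                                  # time points in the scan
signal = rng.standard_normal(T)           # shared "true" group response
noise_sd = 2.0                            # per-subject measurement noise

# Held-out cohort: 20 new subjects, each scan = shared signal + heavy noise.
cohort = signal + noise_sd * rng.standard_normal((20, T))
cohort_mean = cohort.mean(axis=0)

# Stand-in for the model's zero-shot prediction: an estimate of the shared
# signal learned from other (training) subjects, carrying only mild noise.
model_pred = signal + 0.5 * rng.standard_normal(T)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

r_model = corr(model_pred, cohort_mean)
r_single = corr(cohort[0], cohort_mean)   # one individual's actual scan
print(r_model > r_single)                 # the denoised prediction wins
```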
Scaling Laws
TRIBE v2 follows a log-linear scaling law: its ability to accurately predict brain activity increases steadily as it is fed more fMRI data, with no performance plateau yet observed. This mirrors the scaling behavior seen in large language models — a significant finding, because it suggests TRIBE v2 will keep improving as public fMRI databases grow.
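In practice, a log-linear scaling law means accuracy grows by a roughly fixed increment every time the training data doubles. The sketch below fits that relationship to made-up numbers, purely to show the functional form; these are not Meta's reported figures.

```python
import numpy as np

# Synthetic illustration of a log-linear scaling law: accuracy gains a fixed
# increment per doubling of fMRI training hours. (Values are invented for
# illustration, not taken from the TRIBE v2 paper.)
hours = np.array([25, 50, 100, 200, 400])
accuracy = 0.10 + 0.05 * np.log2(hours)   # perfectly log-linear toy data

# Regressing accuracy on log2(hours) recovers the slope: the accuracy gain
# per doubling of data.
slope, intercept = np.polyfit(np.log2(hours), accuracy, 1)
print(round(slope, 3))  # 0.05 accuracy points per doubling
```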
What TRIBE v2 Can Actually Do
Meta identifies three primary use cases in its release. Each has different implications for different research communities.
| Use Case | What It Enables | Who Benefits |
|---|---|---|
| Virtual neuroscience experiments | Run thousands of hypothesis tests digitally before any human participants enter a scanner | Academic neuroscientists |
| Brain-inspired AI design | Use neural topography data to build AI systems that process information more like the human brain | AI architecture researchers |
| Neurological disorder research | Simulate how damaged or atypical brains might respond to stimuli without requiring clinical subjects | Clinical researchers, BCI developers |
In-Silico Neuroscience
The most immediate value is computational speed. Researchers can conduct thousands of virtual experiments — testing how the brain might react to specific stimuli, or identifying where neural signaling might be breaking down — without expensive and time-consuming fMRI sessions. A single fMRI session can cost thousands of dollars and take hours to run; TRIBE v2 reduces that to a computation that finishes in seconds.
Tested on seminal visual and neurolinguistic paradigms, TRIBE v2 recovers a variety of results established by decades of empirical research. Specifically, it correctly identified functional regions including the fusiform face area (FFA) and Broca's area — purely through digital simulation, with no labeled training data pointing it toward those regions.
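As a sketch of what a virtual-experiment workflow might look like, the code below screens candidate stimuli by predicted activation in a region of interest before any scanner time is booked. The `predict_voxels` stand-in, the ROI mask, and the stimulus embeddings are all hypothetical; TRIBE v2's actual inference interface will differ.

```python
import numpy as np

rng = np.random.default_rng(2)
N_VOXELS = 70_000

# Hypothetical stand-in for the model: maps a stimulus embedding to a
# predicted voxel activity map.
W = rng.standard_normal((64, N_VOXELS)) / 8.0
def predict_voxels(stimulus_embedding):
    return stimulus_embedding @ W

# Region of interest, e.g. a candidate face-selective patch: a fixed voxel mask.
roi = rng.random(N_VOXELS) < 0.001        # roughly 70 voxels

# Screen many candidate stimuli in silico and shortlist the strongest ROI
# drivers, so only those go forward to a real scanner session.
stimuli = {f"clip_{i}": rng.standard_normal(64) for i in range(100)}
scores = {name: predict_voxels(emb)[roi].mean() for name, emb in stimuli.items()}
shortlist = sorted(scores, key=scores.get, reverse=True)[:5]
print(len(shortlist))  # 5 top-ranked stimuli
```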
Limitations Worth Taking Seriously
TRIBE v2's technical achievements are real, but the limitations are equally important to understand — especially given the hype this release has attracted.
| Limitation | What It Means in Practice |
|---|---|
| fMRI temporal delay | fMRI tracks blood flow, not neural firing. There's a 4–6 second lag. Millisecond-scale brain dynamics are invisible to this model. |
| Indirect signal | fMRI does not measure electrical activity directly. It measures blood oxygen levels as a proxy — BOLD signal — which introduces noise. |
| Passive brain model only | TRIBE v2 treats the brain as a receiver of stimuli. It cannot model decision-making, motor output, or active cognitive processes. |
| Three sensory channels only | Vision, audio, and language are covered. Touch, smell, taste, and proprioception are absent. |
| No clinical population coverage | The model was trained on healthy volunteers. It cannot generalize to clinical conditions or developmental differences without further work. |
| Company-reported accuracy claims | Meta's finding that synthetic signals can outperform individual scans has not yet been independently validated by third parties. |
Sources: The Decoder, The Tech Portal, Meta FAIR paper.
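The temporal-delay row is worth making concrete. In standard fMRI analysis, the BOLD signal is modeled as neural activity convolved with a slow hemodynamic response function (HRF) that peaks several seconds after the underlying event; the toy gamma-shaped HRF below shows why millisecond-scale dynamics cannot survive that convolution. The exact HRF shape here is a simplification of the canonical models used in the field.

```python
import numpy as np

# Simplified gamma-shaped HRF, peaking around t = 5 s after a neural event.
t = np.arange(0, 20, 0.1)            # seconds, 100 ms steps
hrf = (t ** 5) * np.exp(-t)
hrf /= hrf.max()

# A single brief neural event at t = 2 s ...
neural = np.zeros_like(t)
neural[20] = 1.0

# ... produces a BOLD response whose peak arrives seconds later, smearing
# away any millisecond-scale structure in the underlying activity.
bold = np.convolve(neural, hrf)[: len(t)]
lag = t[np.argmax(bold)] - 2.0
print(round(lag, 1))  # peak arrives ~5 s after the neural event
```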
One claim in particular warrants scrutiny: Meta's finding that the model's synthetic brain signals are sometimes cleaner than real fMRI data is a company-reported result without independent validation yet. That does not mean the claim is false — it may well be accurate — but neuroscience is a field where independent replication matters. Researchers should treat that specific result as a strong hypothesis pending external confirmation.
The Surprising Finding
The most counter-intuitive result in the TRIBE v2 paper is not the resolution improvement — it is the training data split. Despite being evaluated on over 1,100 hours of fMRI from 720 subjects, the model was trained on just 451.6 hours of fMRI data from 25 subjects across four naturalistic studies. The model learned its predictive power from a relatively small group scanned in depth, then generalized across hundreds of people it had never encountered.
This is a "deep before wide" strategy — prioritize intensity of data per subject over breadth of subjects in training. It worked. And it suggests that for future brain models, quality and depth of training data may matter more than raw participant count — a counterpoint to the common assumption that more subjects always means better generalization.
How It Compares to Prior Approaches
Traditional brain encoding used Finite Impulse Response (FIR) models — linear tools that predicted voxel-wise responses for a single subject at a time, trained on narrow experimental paradigms. According to the Meta FAIR paper, TRIBE v2 significantly outperforms these FIR models, the long-standing gold standard for voxel-wise encoding.
| Model Type | Coverage | Resolution | Zero-Shot | Multimodal |
|---|---|---|---|---|
| Traditional FIR models | Single subject, single task | Variable | No | No |
| TRIBE v1 | 4 subjects, narrow tasks | ~1,000 parcels | No | Limited |
| TRIBE v2 | 720 subjects, naturalistic stimuli | ~70,000 voxels | Yes | Yes (video, audio, text) |
No commercially available or academic competitor currently matches TRIBE v2's combination of resolution, zero-shot generalization, and open access across all three modalities simultaneously, based on available published benchmarks as of April 2026.
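For context, the classic FIR baseline can be sketched as ridge regression on time-lagged copies of a stimulus feature: one linear kernel per voxel, fit separately for each subject. The toy kernel, dimensions, and noise level below are invented for illustration; real analyses fit many voxels and richer feature spaces.

```python
import numpy as np

rng = np.random.default_rng(3)

# Predict one subject's single-voxel time course from time-lagged copies
# of a stimulus feature, fit with ridge regression.
T, n_lags = 500, 6                 # TRs and number of FIR delays
feature = rng.standard_normal(T)

# Lagged design matrix: column k holds the feature delayed by k TRs.
X = np.column_stack([np.roll(feature, k) for k in range(n_lags)])
X[:n_lags] = 0.0                   # zero out wrapped-around samples

true_kernel = np.array([0.0, 0.3, 1.0, 0.7, 0.2, 0.05])  # toy response kernel
y = X @ true_kernel + 0.1 * rng.standard_normal(T)

# Ridge solution: w = (X^T X + lambda I)^-1 X^T y
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ y)
print(np.argmax(w))  # recovered kernel peaks at the same lag as true_kernel (2)
```

The per-subject, per-paradigm nature of this fit is exactly the bottleneck TRIBE v2's shared, zero-shot architecture is designed to remove.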
Who This Actually Affects
Switch to TRIBE v2 if you run academic neuroscience research and currently rely on FIR encoding models or single-modality approaches. The zero-shot capability alone can eliminate the need for per-subject fMRI collection in many experimental designs.
Use TRIBE v2 if you are building brain-computer interfaces and need a scalable way to predict how diverse subjects will respond to multimodal inputs without recruiting every individual into a scanner.
Apply TRIBE v2 cautiously if you need real-time or millisecond-resolution neural data. The fMRI-based approach is a hard constraint — not a limitation Meta can engineer away in future versions without changing the underlying imaging modality.
This does not affect you yet if you are a consumer, a product developer, or an AI practitioner building general-purpose language or vision models. TRIBE v2 is a research tool, not a product feature or API service.
Watch this space if you work in computational neuromarketing or UX research. By modeling how the brain responds to visual and auditory stimuli, TRIBE v2 could eventually let marketing teams test how content, ads, and UX designs affect neural engagement patterns without recruiting fMRI participants. That application is not currently deployed, but the model infrastructure now exists to support it.
What to Watch Next
The model's log-linear scaling behavior — no plateau observed — means accuracy will improve as public fMRI datasets grow. Watch for third-party replication of the zero-shot accuracy claims, particularly the finding that synthetic signals can exceed individual scan quality. Meta has flagged developmental and clinical populations as a priority for future versions; that extension would substantially broaden the model's medical utility. Independent evaluations from neuroscience labs that adopt the open-source release will be the real test of how well the model holds up outside Meta's own benchmarks.
Conclusion
TRIBE v2 is the most capable open-source brain encoding model released to date. Its 70-fold resolution increase, zero-shot generalization across 720 subjects, and tri-modal architecture represent a genuine step forward for computational neuroscience — not a marginal upgrade. The practical payoff is clear: researchers can now run virtual brain experiments in seconds rather than scheduling expensive fMRI sessions. The limitations are equally real — fMRI's temporal constraints and the absence of independent validation for some key claims mean this model is a powerful research accelerator, not a complete brain model. If you work in neuroscience, BCI development, or brain-inspired AI architecture, download the model weights from Hugging Face or GitHub now and start testing. Everyone else should monitor for the independent replications that will determine how far these findings hold.


