Highlights

Open-World Neural Navigation

Real-time neural maps for unfamiliar indoor environments.

Compositional Vision-Language Reasoning

A training objective that improves compositional generalization.

Long-Form Video Memory Transformers

Memory-augmented transformers that reason over hour-long videos.

Self-Supervised Medical Imaging

Cutting radiology annotation cost by 10x via contrastive pretraining.

Diffusion Policies for Manipulation

Diffusion action models for bimanual robot hands.

Causal Discovery from Video

Learning causal graphs directly from raw video.

On-Device Multimodal Inference

Running 7B-parameter VLMs on a single mobile GPU.