[NeurIPS 2024] GenRL: Multimodal-foundation world models ground language and video prompts in embodied domains by turning them into sequences of latent world-model states. These latent state sequences can be decoded with the model's decoder, allowing visualization of the expected behavior before training the agent to execute it.
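The grounding pipeline described above (prompt → latent state sequence → decoded visualization) can be sketched in miniature. This is a toy illustration only, not GenRL's actual API: all names (`embed_prompt`, `ground_prompt`, `decode`), shapes, and the linear/tanh dynamics are hypothetical placeholders for the repository's real multimodal encoder, world-model dynamics, and decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not GenRL's real configuration).
EMB_DIM, LATENT_DIM, OBS_DIM, HORIZON = 4, 8, 16, 5

W_connect = rng.normal(size=(EMB_DIM, LATENT_DIM))            # prompt embedding -> initial latent
W_dynamics = rng.normal(size=(LATENT_DIM, LATENT_DIM)) * 0.1  # latent transition (toy dynamics)
W_decoder = rng.normal(size=(LATENT_DIM, OBS_DIM))            # latent state -> observation

def embed_prompt(prompt: str) -> np.ndarray:
    """Deterministically map a text prompt to an embedding
    (stand-in for a multimodal foundation-model encoder)."""
    seed = sum(ord(c) for c in prompt)
    return np.random.default_rng(seed).normal(size=EMB_DIM)

def ground_prompt(prompt: str) -> np.ndarray:
    """Turn a prompt into a sequence of latent world-model states
    by rolling the latent dynamics forward from the grounded start state."""
    z = embed_prompt(prompt) @ W_connect
    states = [z]
    for _ in range(HORIZON - 1):
        z = np.tanh(z @ W_dynamics)
        states.append(z)
    return np.stack(states)  # shape: (HORIZON, LATENT_DIM)

def decode(states: np.ndarray) -> np.ndarray:
    """Decode latent states into observations, so the expected
    behavior can be inspected before training the agent on it."""
    return states @ W_decoder  # shape: (HORIZON, OBS_DIM)

latents = ground_prompt("walk forward")
frames = decode(latents)
print(latents.shape, frames.shape)  # (5, 8) (5, 16)
```

The key design idea the sketch mirrors is that language/video prompts and agent behaviors meet in the same latent space: grounding produces latent trajectories, and the same decoder used for reconstruction turns them back into viewable observations.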
Stars: 86 | Forks: 4 | Watchers: 86 | Open Issues: 2
Commits: 5
4cd8394: Delete third_party/relay-policy-learning/kitchen_demos_multitask.zip
5a677a1: Added video prompts reward methods. Minor fixes in quadruped and kitchen environments. Updated to NeurIPS publication status.