Can it do some complex tasks?

What would you define as "complex"?

This kind of architecture is very similar to what Physical Intelligence used for Pi-0.5 (a VLM triggering a VLA in different areas), albeit at a smaller scale for now.
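To make the "VLM triggering a VLA" pattern concrete, here is a rough sketch of the idea. It is purely illustrative and not the actual Innate or Physical Intelligence code: `Skill`, `plan`, and `execute` are hypothetical names, the planner is a keyword-matching stand-in for a real VLM, and each skill's `run` is a stand-in for a real VLA policy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Skill:
    """Hypothetical wrapper around one low-level VLA policy."""
    name: str
    run: Callable[[], str]  # stand-in: a real policy would drive the robot


def plan(instruction: str, skills: Dict[str, Skill]) -> List[str]:
    """Stand-in for the VLM. A real one would reason over images and text;
    this one picks skills whose name appears in the instruction, in order."""
    text = instruction.lower()
    found = [(text.find(name), name) for name in skills if name in text]
    return [name for _, name in sorted(found)]


def execute(instruction: str, skills: Dict[str, Skill]) -> List[str]:
    """High-level loop: the planner chooses skills, each skill runs its policy."""
    return [skills[name].run() for name in plan(instruction, skills)]


skills = {
    "navigate": Skill("navigate", lambda: "navigated"),
    "pick": Skill("pick", lambda: "picked"),
}
log = execute("Navigate to the table, then pick up the cup", skills)
print(log)
```

The point of the split is that the high-level model only decides which skill to run and when, while each skill handles its own low-level control.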

You can also see some example (autonomous!) use cases here: https://docs.innate.bot/welcome/mars-example-use-cases