The depth-capture technology has been around for a while, and motion capture built on it looks feasible (given the low quality and lag shown in the live demos). Simple voice recognition of color names and short phrases is certainly plausible too. I seriously doubt the Molyneux "Milo" demo is anything but half-baked fantasy wank, though (but then again, that's nothing new for Molyneux).
This is aimed at the same 'we're too cool for actual games so give us some bullshit that we can call more trendy because it involves flailing' crowd that the Wii generally caters to.