Inference of Meta's LLaMA model (and others) in pure C/C++ with minimal setup and state-of-the-art performance on a wide range of hardware