Any of DeepSeek's recent papers. They are mostly about efficiency, which is how their inference costs can be so low.