There has been some experimentation with the use of ReLU^2 in language models in recent years, e.g., here: https://proceedings.neurips.cc/paper_files/paper/2021/file/2...
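
For anyone unfamiliar, ReLU^2 is usually read as the ordinary ReLU followed by squaring, i.e. max(0, x)^2. A minimal sketch of that reading in plain Python (the name `relu_squared` is my own and is not taken from the linked paper):

```python
def relu_squared(x: float) -> float:
    """Squared ReLU: zero for non-positive inputs, x**2 for positive ones."""
    return max(0.0, x) ** 2


# Evaluate on a few sample points on both sides of zero.
values = [-2.0, -0.5, 0.0, 0.5, 2.0]
outputs = [relu_squared(v) for v in values]
print(outputs)
```

Unlike plain ReLU, this has a continuous first derivative at zero and grows quadratically for positive inputs.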