原推:Same prompt, different model size.
Not apparent from the 350m parameter model that it would ever get to written text representation.
Can we really pre-predict the capability of a 200b or 2t parameter model (fed with sufficient volumes of data)?
翻译英文优质信息和名人推特
原推:Same prompt, different model size.
Not apparent from the 350m parameter model that it would ever get to written text representation.
Can we really pre-predict the capability of a 200b or 2t parameter model (fed with sufficient volumes of data)?