LHASA, Nov. 20 (Xinhua) -- The Tibetan large language model, SunshineGLM V1.0, the first Tibetan foundation model in China with hundreds of billions of parameters, was launched on Wednesday in Lhasa, capital of southwest China's Xizang Autonomous Region.
During the launch event at Xizang University, Nyima Tashi, the chief scientist of the research team and a professor with the university, said that the model was trained using around 28.8 billion tokens of high-quality Tibetan-language data.
These data include a large-scale corpus of Tibetan sentences and texts, Chinese-Tibetan and Tibetan-English parallel corpora, as well as entries from Chinese-Tibetan bilingual dictionaries, covering various fields such as news reporting, law, medicine, philosophy, education, culture, science and technology.
SunshineGLM V1.0 can handle complex language structures and multi-domain knowledge, according to its developers. It demonstrates strong semantic understanding of Tibetan and can produce prompt, clear and accurate responses to queries. It excels in various areas, including Tibetan text generation and machine translation.
As a foundation model, SunshineGLM V1.0 can be widely applied in the development of sector-specific models, such as in agriculture, tourism, education, Tibetan medicine and high-altitude healthcare.
Once the model is registered with regulatory authorities, it will be officially launched for public use, Nyima Tashi said. Enditem