Experience link: https://welm.weixin.qq.com/docs/playground/
API interface: https://welm.weixin.qq.com/docs/api/
Paper address: https://arxiv.org/abs/2209.10372
Other related reference links:
WeChat's large language model is here: hold a cross-time-and-space dialogue with Li Bai, learn to speak with high emotional intelligence, playable online – QbitAI (量子位)
https://baijiahao.baidu.com/s?id=1746546375886224043&wfr=spider&for=pc
The field of large-scale language models welcomes a new "player". WeChat AI recently launched its self-developed NLP large language model WeLM, a reasonably sized Chinese model that can complete a variety of NLP tasks, including multilingual tasks, in zero-shot and few-shot settings.
At the same time, the WeChat AI team has released a web page for trying out WeLM and an API interface; interested users can visit https://welm.weixin.qq.com/docs/ to experience the model and apply for API access. The accompanying technical paper, "WeLM: A Well-Read Pre-trained Language Model for Chinese", has also been published on the preprint site arXiv.
The NLP Large-Model Race Welcomes a New Contender: WeLM Provides an Interactive Web Playground and API Interface
In the recent wave of development in natural language processing (NLP), the GPT-3 model developed by OpenAI has undoubtedly held the upper hand. At its release, the zero-shot and few-shot learning abilities demonstrated by this 175-billion-parameter pre-trained model reshaped expectations and ignited the trend of large-model research in AI.
For industry, pre-trained large models have lowered the threshold for AI applications and moved closer to the grand goal of "AI freeing humans from repetitive labor". Building on GPT-3, developers around the world have explored a wide range of application scenarios, including programming, email replies, UI design, answering mathematical questions, converting legal language, summarization, reasoning, and text processing. Researchers in many countries are also writing a new chapter of large-model competition from multilingual and multi-task perspectives.
In the domain of Chinese-centric large language models, WeChat AI's ten-billion-parameter model WeLM is a new player in an increasingly crowded contest of models.
According to the team, WeLM is a ten-billion-parameter Chinese model that can complete a variety of NLP tasks in zero-shot and few-shot settings, including dialogue and interviews, reading comprehension, translation, rewriting, text continuation, and multilingual reading comprehension, and it also exhibits memory, self-correction, and self-checking abilities. Moreover, WeLM has the advantage of a reasonable size: across 14 Chinese NLP tasks, its overall performance exceeds all models of the same size and even matches models 25 times larger.
Take text style transfer (rewriting), widely considered one of the harder NLP tasks, as an example. Even though the five examples provided by the user did not overlap with the style conversion ultimately required, WeLM generalized well, learning from a handful of rewriting examples to perform text conversion of essentially any type. WeLM performs equally well on other Chinese text generation tasks such as dialogue and interviews, reading comprehension, translation, and continuation writing.
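As a rough illustration of how such few-shot prompting typically works with an autoregressive model like WeLM, the sketch below concatenates a handful of rewriting examples into one prompt and asks the model to continue it. The prompt template and the example pairs are illustrative assumptions, not WeLM's actual prompt format.

```python
# Minimal sketch of building a few-shot text-rewriting prompt.
# The template and the example pairs are illustrative assumptions,
# not the prompt format used by the WeLM team.
examples = [
    ("今天天气不错", "今日天朗气清"),            # plain -> more literary rewrite
    ("他跑得很快", "他健步如飞"),
    ("这家店的菜很好吃", "这家店的菜肴令人回味无穷"),
]

def build_prompt(examples, query):
    """Concatenate (input, rewrite) pairs, then append the new input to be rewritten."""
    lines = []
    for src, tgt in examples:
        lines.append(f"原文：{src}\n改写：{tgt}")
    lines.append(f"原文：{query}\n改写：")
    return "\n\n".join(lines)

prompt = build_prompt(examples, "我们下周去爬山吧")
print(prompt)  # send this prompt to the model; its continuation is the rewritten text
```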
In addition to strong Chinese comprehension and generation capabilities, WeLM can also handle tasks across multiple languages (Chinese, English, and Japanese). Taking as an example a sentence describing WeLM that mixes Chinese, Japanese, and English, WeLM's translation is more accurate than that of Google Translate.
Moreover, after further fine-tuning, WeLM shows stronger zero-shot learning ability and performs better when adapted to a given scenario. WeLM has already been deployed in some scenarios of WeChat Channels (video accounts) and will be further optimized for more WeChat application scenarios in the future.
At the same time, to help WeLM become a genuinely practical tool, the WeChat AI team has also released an interactive web Playground for users to try the model and has opened API interfaces for accessing WeLM.
Currently, users can visit https://welm.weixin.qq.com/docs/ to experience WeLM's capabilities and adjust the configuration to obtain better text generation results. Developers who want to integrate WeLM can go to https://welm.weixin.qq.com/docs/api/, fill out the questionnaire to obtain a WeLM API token, and then call the corresponding interface to deploy WeLM in their own applications.
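As a rough sketch of what a call might look like once a token has been issued, the example below assumes a completions-style HTTP endpoint and a token header; the exact URL, header name, and parameter names are assumptions, so the official documentation at https://welm.weixin.qq.com/docs/api/ should be treated as authoritative.

```python
# Hypothetical sketch of calling the WeLM API with the `requests` library.
# The endpoint path, header name, and parameter names below are assumptions;
# check https://welm.weixin.qq.com/docs/api/ for the actual interface.
import requests

API_URL = "https://welm.weixin.qq.com/v1/completions"  # assumed endpoint
API_TOKEN = "YOUR_API_TOKEN"  # obtained after filling out the questionnaire

payload = {
    "prompt": "“微信AI推出的WeLM是一个语言模型”，翻译成英文：",  # task phrased as a text prompt
    "max_tokens": 64,     # assumed parameter: length of the generated continuation
    "temperature": 0.8,   # assumed parameter: sampling temperature
    "n": 1,               # assumed parameter: number of completions to return
}

response = requests.post(
    API_URL,
    headers={"Authorization": API_TOKEN, "Content-Type": "application/json"},
    json=payload,
    timeout=30,
)
response.raise_for_status()
print(response.json())  # inspect the returned completion(s)
```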
With a strong knowledge reserve, WeLM performs outstandingly on 14 Chinese NLP tasks
According to the paper, among the mainstream NLP model architectures, encoder-only (BERT), decoder-only (GPT), and encoder-decoder (T5), WeLM, like GPT-3 and Google's PaLM, follows the autoregressive (decoder-only) route. Considering that different users weigh model performance against inference latency differently, the WeChat AI team trained three versions of WeLM, with 1.3B, 2.7B, and 10B parameters, to meet the needs of different callers.
For training data, the WeChat AI team aimed to build a sufficiently rich, clean, and fair dataset. To this end, the team collected nearly two years of Chinese web page data from Common Crawl, together with a large number of books and news articles. To strengthen professional capabilities, they also supplemented the dataset with knowledge-intensive forum data and some academic papers. The raw collection totaled 10 TB, including 750 GB of English data; some Japanese and Korean data was also retained.
Subsequently, after rule-based filtering, additional filtering with a trained binary fastText classifier, and removal of evaluation-related data, the final processed dataset contained 262B tokens. To better balance the different data sources, the team also sampled them with different proportions, so that the topic distribution of the overall dataset is smoother than that of Common Crawl.
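The article does not give implementation details of the classifier-based filtering step, but quality filtering with a fastText binary classifier is typically done along the following lines; the file names, labels, and keep-threshold here are assumptions for illustration, not the WeChat AI team's actual pipeline.

```python
# Rough sketch of quality filtering with a fastText binary classifier.
# Training-file names, labels, and the keep-threshold are illustrative assumptions.
import fasttext

# train.txt contains one example per line in fastText format, e.g.
# __label__keep   <a paragraph judged to be high-quality text>
# __label__drop   <a paragraph judged to be boilerplate or spam>
model = fasttext.train_supervised(input="train.txt", lr=0.5, epoch=5, wordNgrams=2)

def keep_document(text: str, threshold: float = 0.9) -> bool:
    """Return True if the classifier is confident the document is high-quality."""
    labels, probs = model.predict(text.replace("\n", " "))
    return labels[0] == "__label__keep" and probs[0] >= threshold

docs = ["...raw web page text...", "...another crawled document..."]
filtered = [d for d in docs if keep_document(d)]
print(f"kept {len(filtered)} of {len(docs)} documents")
```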
In comparative evaluations against CPM, Huawei's PanGu, and Baidu's ERNIE 3.0, WeLM demonstrated a very strong knowledge reserve: across 14 Chinese NLP tasks, its overall performance exceeded all models of the same size and even matched models 25 times larger. In addition to strong Chinese comprehension and generation, WeLM also has excellent multilingual comprehension, allowing users to switch smoothly between Chinese, Japanese, and English inputs.
The related technical paper, "WeLM: A Well-Read Pre-trained Language Model for Chinese", has been published on the preprint site arXiv; interested readers can visit https://arxiv.org/abs/2209.10372 for more technical details.
In the field of NLP, turning large models into genuinely practical tools is the unwavering goal of every NLP researcher. In the future, WeChat AI will further fine-tune and optimize WeLM to improve its generalization to new tasks. More developers and users are welcome to try WeLM and offer feedback and suggestions, helping the model become a truly practical tool as soon as possible and exploring the path of artificial intelligence development together.