fastNLP is a lightweight natural language processing (NLP) toolkit designed to make it quick to implement NLP tasks and build complex models. It is composed of the core, io, embeddings, modules and models sub-modules, among others:
core is the core module of fastNLP, including DataSet, Trainer, Tester and other components
io implements input and output, including dataset reading, model saving and loading, and other functions
embeddings provides the embeddings required to build complex network models
modules contains many components for building neural network models, which help users quickly build their own networks
models contains complete network models implemented with fastNLP, including common models such as CNNText and SeqLabeling
fastNLP features:
project structure
Installation:
fastNLP relies on the following packages:
numpy>=1.14.2
torch>=1.0.0
tqdm>=4.28.1
nltk>=3.4.1
requests
spacy
prettytable>=0.7.2
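Before installing, it can help to see which of these dependencies are already present. The following is a minimal sketch using only the Python standard library; it is not part of fastNLP, and the helper names (`version_tuple`, `at_least`, `check`) are hypothetical. It assumes each package's distribution name matches the name in the list and compares only the leading numeric parts of a version.

```python
from importlib import metadata

# Minimum versions copied from the dependency list; None means no minimum is stated.
REQUIREMENTS = {
    "numpy": "1.14.2",
    "torch": "1.0.0",
    "tqdm": "4.28.1",
    "nltk": "3.4.1",
    "requests": None,
    "spacy": None,
    "prettytable": "0.7.2",
}


def version_tuple(version):
    """Turn '1.14.2' into (1, 14, 2), ignoring suffixes such as '+cu118' or 'rc1'."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def at_least(installed, minimum):
    """Compare two version strings numerically, padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check(requirements):
    """Return {package: status} where status is 'ok', 'missing', or 'too old'."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum is not None and not at_least(installed, minimum):
            report[name] = "too old"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for package, status in check(REQUIREMENTS).items():
        print(f"{package}: {status}")
```

Any package reported as missing or too old would need to be installed or upgraded before using fastNLP.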
A simple example:
Preprocess text using DataSet
DataSet in fastNLP
The DataSet is the class that fastNLP uses to hold data, and typically the training set, validation set, and test set are loaded as three separate DataSet objects.
The data in the DataSet is organized like a table. For example, the following DataSet has three columns, which are called fields in fastNLP.
Each row is an instance (called an Instance in fastNLP) and each column is a field (called a FieldArray in fastNLP).
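The table layout described above can be pictured with plain Python containers. This is only an illustrative sketch, not fastNLP's implementation: the `columns` dict stands in for a DataSet, each list stands in for a field, and the helpers `get_instance` and `iter_instances` are hypothetical names.

```python
# Column-oriented storage: one list per field, all lists the same length.
columns = {
    "raw_words": ["This is the first instance .", "Second instance .", "Third instance ."],
    "seq_len": [6, 3, 3],
}


def get_instance(columns, index):
    """Return row `index` as a dict mapping field name to value."""
    return {name: values[index] for name, values in columns.items()}


def iter_instances(columns):
    """Yield every row in order, one dict per instance."""
    n_rows = len(next(iter(columns.values()))) if columns else 0
    for i in range(n_rows):
        yield get_instance(columns, i)


if __name__ == "__main__":
    for instance in iter_instances(columns):
        print(instance)
```

Reading down a list gives one field for every instance; reading across the lists at a fixed index gives one instance.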
Construction of DataSet
Initialize a DataSet from a dict:
    from fastNLP import DataSet

    data = {
        'raw_words': ["This is the first instance .", "Second instance .", "Third instance ."],
        'words': [['this', 'is', 'the', 'first', 'instance', '.'],
                  ['Second', 'instance', '.'],
                  ['Third', 'instance', '.']],
        'seq_len': [6, 3, 3],
    }
    # The value for each key of the dict passed in should be a list,
    # and all of the lists should have the same length.
    dataset = DataSet(data)
    print(dataset)
Output as:
    +------------------------------+------------------------------------------------+---------+
    | raw_words                    | words                                          | seq_len |
    +------------------------------+------------------------------------------------+---------+
    | This is the first instance . | ['this', 'is', 'the', 'first', 'instance', ... | 6       |
    | Second instance .            | ['Second', 'instance', '.']                    | 3       |
    | Third instance .             | ['Third', 'instance', '.']                     | 3       |
    +------------------------------+------------------------------------------------+---------+
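Because every key in the dict must map to a list of the same length, it can be useful to build and check the dict before handing it to DataSet. The sketch below does this in plain Python without calling fastNLP; `build_columns` and `check_same_length` are hypothetical helpers, and whitespace splitting is an assumed tokenization (unlike the example above, it does not lowercase any token).

```python
def build_columns(sentences):
    """Tokenize each sentence on whitespace and record its token count."""
    words = [sentence.split() for sentence in sentences]
    return {
        "raw_words": list(sentences),
        "words": words,
        "seq_len": [len(tokens) for tokens in words],
    }


def check_same_length(data):
    """Return the shared length of all values, or raise ValueError if they differ."""
    lengths = {key: len(value) for key, value in data.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"fields have different lengths: {lengths}")
    return next(iter(lengths.values()), 0)


if __name__ == "__main__":
    data = build_columns(["This is the first instance .", "Second instance .", "Third instance ."])
    print(check_same_length(data), "instances")
```

A dict that passes this check has the shape the DataSet constructor expects in the example above.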
For more information, see the fastNLP Chinese documentation (fastNLP 0.6.0).