The tool recommended in this issue is DeepKE, an open-source knowledge graph extraction toolkit maintained by the Knowledge Graph team at Zhejiang University.
Project Introduction
DeepKE is a PyTorch-based knowledge extraction toolkit that supports low-resource and document-level scenarios. It implements three core functions: named entity recognition, relation extraction, and attribute extraction. Through a unified framework, developers and researchers can customize datasets and models to extract information from unstructured text according to their needs. DeepKE provides functional modules and model implementations for different functions and scenarios, and it organizes all components in a consistent framework so that the toolkit remains modular and extensible.
Weights & Biases support
To enable automatic hyperparameter tuning, DeepKE uses Weights & Biases (wandb), a machine learning toolkit that helps developers build better models faster. With it, DeepKE can automatically visualize results and tune parameters. Example runs are provided for every function in the repository, and researchers can modify the metric and hyperparameter configurations as needed.
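To give a rough idea of what a W&B hyperparameter sweep looks like, the sketch below defines a grid-search sweep configuration as a plain Python dictionary. The top-level layout ("method", "metric", "parameters") follows the W&B sweep schema, but the metric name, the hyperparameter names, and the project name are hypothetical and not taken from DeepKE's conf files. The grid_runs helper is also hypothetical; it only lists the combinations a grid sweep would try.

```python
from itertools import product

# Hypothetical W&B sweep configuration. The top-level layout
# ("method", "metric", "parameters") follows the W&B sweep schema; the metric
# name and hyperparameter names are illustrative, not DeepKE's real conf keys.
sweep_config = {
    "method": "grid",  # try every combination of the listed values
    "metric": {"name": "valid_f1", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"values": [1e-3, 3e-4]},
        "dropout": {"values": [0.1, 0.3]},
        "batch_size": {"values": [32, 64]},
    },
}


def grid_runs(config):
    """Hypothetical helper: list every hyperparameter combination a grid sweep covers."""
    names = list(config["parameters"])
    value_lists = [config["parameters"][name]["values"] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


if __name__ == "__main__":
    runs = grid_runs(sweep_config)
    print(len(runs))  # 8 (2 x 2 x 2 combinations)
    # With wandb installed and logged in, the same dict could be passed to
    # wandb.sweep(sweep_config, project="deepke-demo") to obtain a sweep id, and
    # wandb.agent(sweep_id, function=train) would then call train() once per
    # combination; "deepke-demo" and train() are placeholders.
```

A grid sweep grows multiplicatively with each added hyperparameter, so W&B's "random" or "bayes" methods are usually preferred once the search space gets large.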
Model architecture
Architecture diagram
- DeepKE provides a unified framework for its three knowledge extraction functions (named entity recognition, relation extraction, and attribute extraction).
- Each function can be applied in different scenarios. For example, relation extraction can be performed in the standard fully supervised setting, the low-resource (few-shot) setting, and the document-level setting.
- Each application scenario consists of three parts: the Data part contains the Tokenizer, Preprocessor, and Loader; the Model part contains the Module, Encoder, and Forwarder; and the Core part contains Training, Evaluation, and Prediction.
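The three-part layout can be illustrated with a toy pipeline. In the sketch below, the class and function names mirror the components in the list, but every implementation is a hypothetical stand-in (a whitespace tokenizer and a bag-of-tokens prototype classifier in plain Python); it is not DeepKE's code, whose real components are PyTorch modules. The point is only how the Data, Model, and Core parts hand work to each other.

```python
from collections import defaultdict
from typing import Dict, List, Set

# ---- Data part: Tokenizer -> Preprocessor -> Loader (toy stand-ins) ----
class Tokenizer:
    def __call__(self, sentence: str) -> List[str]:
        return sentence.lower().split()

class Preprocessor:
    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def __call__(self, tokens: List[str]) -> List[int]:
        # Assign a new integer id to every token not seen before.
        return [self.vocab.setdefault(t, len(self.vocab)) for t in tokens]

class Loader:
    def __init__(self, samples, batch_size: int) -> None:
        self.samples, self.batch_size = samples, batch_size

    def __iter__(self):
        for i in range(0, len(self.samples), self.batch_size):
            yield self.samples[i:i + self.batch_size]

# ---- Model part: Encoder + Forwarder wrapped in a Module ----
class Encoder:
    def __call__(self, ids: List[int]) -> Set[int]:
        return set(ids)  # bag-of-tokens "representation"

class Forwarder:
    def __init__(self) -> None:
        self.prototypes: Dict[str, Set[int]] = defaultdict(set)

    def __call__(self, features: Set[int]) -> str:
        # Choose the relation whose prototype shares the most token ids.
        return max(self.prototypes, key=lambda r: len(self.prototypes[r] & features))

class Module:
    def __init__(self) -> None:
        self.encoder, self.forwarder = Encoder(), Forwarder()

    def __call__(self, ids: List[int]) -> str:
        return self.forwarder(self.encoder(ids))

# ---- Core part: Training, Evaluation, Prediction ----
def train(model: Module, loader: Loader) -> None:
    for batch in loader:
        for ids, label in batch:
            model.forwarder.prototypes[label] |= model.encoder(ids)

def evaluate(model: Module, samples) -> float:
    correct = sum(model(ids) == label for ids, label in samples)
    return correct / len(samples)

def predict(model: Module, tokenizer: Tokenizer, preprocessor: Preprocessor, sentence: str) -> str:
    return model(preprocessor(tokenizer(sentence)))

if __name__ == "__main__":
    tok, prep = Tokenizer(), Preprocessor()
    data = [("Alice was born in Paris", "born_in"), ("Bob works for Acme", "works_for")]
    samples = [(prep(tok(s)), r) for s, r in data]
    model = Module()
    train(model, Loader(samples, batch_size=1))
    print(evaluate(model, samples))  # 1.0
    print(predict(model, tok, prep, "Carol was born in Rome"))  # born_in
```

Because each part only depends on the interface of the previous one, any stand-in here could be swapped (say, a real encoder for the bag-of-tokens one) without touching the others, which is the kind of modularity the framework aims for.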
Get started quickly
DeepKE can be installed with pip. Taking relation extraction in the standard fully supervised setting as an example, a basic relation extraction model can be built in the following six steps:
1. Download the code
git clone https://github.com/zjunlp/DeepKE.git
2. Use Anaconda to create and activate a virtual environment. (Alternatively, you can build your own image from the Dockerfile provided in the docker folder.)
conda create -n deepke python=3.8
conda activate deepke
# Alternatively, build a Docker image from the provided Dockerfile
cd docker
docker build -t deepke .
conda activate deepke
1) Install with pip and use it directly:
pip install deepke
2) Install from source (in the repository root, run one of the following; the develop variant installs in editable mode):
python setup.py install
python setup.py develop
3. Go to the task folder, taking standard relation extraction as an example:
cd DeepKE/example/re/standard
4. Download the dataset:
wget 120.27.214.45/Data/re/standard/data.tar.gz
tar -xzvf data.tar.gz
5. Train the model. The training parameters can be modified in the conf folder.
DeepKE uses wandb to visualize training and support parameter tuning.
python run.py
6. Run prediction. The prediction parameters can be modified in the conf folder.
In conf/predict.yaml, set the path of the saved trained model.
python predict.py
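The download-and-extract commands in step 4 can also be done from Python, for example on a system without wget. The sketch below uses only the standard library (urllib.request and tarfile). The http:// scheme in DATA_URL is an assumption based on wget's default, and the function names download and extract are illustrative, not part of DeepKE.

```python
import tarfile
import urllib.request
from pathlib import Path

# Same archive as the wget command; the http:// scheme is assumed (wget's default).
DATA_URL = "http://120.27.214.45/Data/re/standard/data.tar.gz"


def download(url: str, dest: Path) -> Path:
    """Fetch url to dest (rough equivalent of wget)."""
    urllib.request.urlretrieve(url, dest)
    return dest


def extract(archive: Path, target_dir: Path) -> list:
    """Unpack a .tar.gz into target_dir (like tar -xzvf) and return the member names."""
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters that reject unsafe paths.
            tar.extractall(target_dir, filter="data")
        else:
            tar.extractall(target_dir)
    return names


if __name__ == "__main__":
    archive = download(DATA_URL, Path("data.tar.gz"))
    for name in extract(archive, Path(".")):
        print(name)  # tar -xzvf also lists each extracted member
```

Run it from the task folder (example/re/standard) so that the extracted files land where run.py expects them.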
Example
Standard RE (relation extraction)
The standard module implements relation extraction with common deep learning models, including CNNs, RNNs, capsule networks, GCNs, Transformers, and pretrained models.
Step 1
Enter the DeepKE/example/re/standard folder.
Step 2
Get the data:
wget 120.27.214.45/Data/re/standard/data.tar.gz
tar -xzvf data.tar.gz
You can customize the dataset in the data folder and the parameters in the conf folder.
The dataset must be provided as CSV files.
Each data file must contain the following columns:
| sentence | relation | head | head_offset | tail | tail_offset |
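One way to produce rows in this format is with Python's csv module, computing head_offset and tail_offset as the position where each entity first appears in the sentence. Two assumptions here: offsets are taken to be 0-based character indices, and the column names (including relation for the relation label) are taken from the list above; check both against DeepKE's sample train.csv. The sentence and the born_in label are made up for illustration.

```python
import csv
import io

# Assumed column order, following the required fields listed above.
FIELDS = ["sentence", "relation", "head", "head_offset", "tail", "tail_offset"]


def make_row(sentence: str, relation: str, head: str, tail: str) -> dict:
    """Build one data row; offsets are assumed to be 0-based character indices."""
    head_offset = sentence.find(head)
    tail_offset = sentence.find(tail)
    if head_offset < 0 or tail_offset < 0:
        raise ValueError("head and tail must both occur in the sentence")
    return {
        "sentence": sentence,
        "relation": relation,
        "head": head,
        "head_offset": head_offset,
        "tail": tail,
        "tail_offset": tail_offset,
    }


def to_csv(rows) -> str:
    """Serialize rows with a header line, as a CSV data file would be written."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


row = make_row("Alice was born in Paris.", "born_in", "Alice", "Paris")
print(row["head_offset"], row["tail_offset"])  # 0 18
print(to_csv([row]))
```

Note that str.find returns the first occurrence only; if an entity string appears more than once in a sentence, the offset has to be chosen explicitly.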
The relation file must contain the following columns:
| head_type | tail_type | relation | index |
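The relation file maps each relation label, together with its head and tail entity types, to an integer index, presumably the class id the model predicts. The sketch below loads such a file with the csv module and builds a reverse lookup from index to label. The column names (head_type, tail_type, relation, index) are assumed from the list above, and the rows in RELATION_CSV are invented for illustration.

```python
import csv
import io

# Hypothetical relation file contents; column names are assumed and the
# types, labels, and indices are invented for illustration.
RELATION_CSV = """head_type,tail_type,relation,index
person,city,born_in,0
person,company,works_for,1
"""


def load_relations(text: str) -> dict:
    """Map each relation label to (index, head_type, tail_type)."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        table[row["relation"]] = (int(row["index"]), row["head_type"], row["tail_type"])
    return table


table = load_relations(RELATION_CSV)
# Reverse lookup: turn a predicted class index back into its relation label.
index_to_relation = {idx: rel for rel, (idx, _, _) in table.items()}
print(index_to_relation[1])  # works_for
```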
Step 3
Train:
python run.py
Step 4
Predict:
python predict.py
All of the steps as one sequence of commands (run from the DeepKE root):
cd example/re/standard
wget 120.27.214.45/Data/re/standard/data.tar.gz
tar -xzvf data.tar.gz
python run.py
python predict.py
Notes
- When using Anaconda, we recommend adding a domestic (China-based) mirror for faster downloads.
- When using pip, we recommend using a domestic mirror, such as the Alibaba Cloud mirror, for faster downloads.
- If the error ModuleNotFoundError: No module named 'past' appears after installation, run pip install future to fix it.
- Downloading pretrained language models online can be slow, so we recommend downloading them in advance and placing them in the pretrained folder. For the required file layout, see the README.md in that folder.
- The old version of DeepKE is in the deepke-v1.0 branch, and users can switch to that branch to use it. All capabilities of the old version have been migrated to relation extraction in the standard setting (example/re/standard).