PiFlow hybrid scientific big data pipeline system

Published 2022-09-14 · Last updated 2025-02-24

PiFlow, featured in this issue, provides a rich set of processor components together with a shell, a DSL, a web configuration interface, task scheduling, task monitoring, and other functions.


Project features
Simple and easy to use

Visually configure pipelines, monitor running pipelines, view pipeline logs, set checkpoints, and schedule pipelines

Scalability

Custom development of data processing components is supported

High performance

Built on the distributed computing engine Apache Spark

Powerful

Provides 100+ data processing components, covering Hadoop, Spark, MLlib, Hive, Solr, Redis, Memcached, Elasticsearch, JDBC, MongoDB, HTTP, FTP, XML, CSV, JSON, and more, and integrates algorithms from the microbiology domain

Architecture diagram


Environment
JDK 1.8
Scala-2.11.8
Apache Maven 3.1.0
Spark-2.1.0 or later
Hadoop-2.6.0
Get started

Build PiFlow:

install external packages
mvn install:install-file -Dfile=../piflow/piflow-bundle/lib/spark-xml_2.11-0.4.2.jar -DgroupId=com.databricks -DartifactId=spark-xml_2.11 -Dversion=0.4.2 -Dpackaging=jar
mvn install:install-file -Dfile=../piflow/piflow-bundle/lib/java_memcached-release_2.6.6.jar -DgroupId=com.memcached -DartifactId=java_memcached-release -Dversion=2.6.6 -Dpackaging=jar
mvn install:install-file -Dfile=../piflow/piflow-bundle/lib/ojdbc6-11.2.0.3.jar -DgroupId=oracle -DartifactId=ojdbc6 -Dversion=11.2.0.3 -Dpackaging=jar
mvn install:install-file -Dfile=../piflow/piflow-bundle/lib/edtftpj.jar -DgroupId=ftpClient -DartifactId=edtftp -Dversion=1.0.0 -Dpackaging=jar
mvn clean package -Dmaven.test.skip=true
[INFO] Replacing original artifact with shaded artifact.
[INFO] Reactor Summary:
[INFO]
[INFO] piflow-project .................................... SUCCESS [ 4.369 s]
[INFO] piflow-core ....................................... SUCCESS [01:23 min]
[INFO] piflow-configure .................................. SUCCESS [ 12.418 s]
[INFO] piflow-bundle ..................................... SUCCESS [02:15 min]
[INFO] piflow-server ..................................... SUCCESS [02:05 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 06:01 min
[INFO] Finished at: 2020-05-21T15:22:58+08:00
[INFO] Final Memory: 118M/691M
[INFO] ------------------------------------------------------------------------

To run PiFlow Server:

Running PiFlow Server in IntelliJ:

Download PiFlow: git clone https://github.com/cas-bigdatalab/piflow.git
Import PiFlow into IntelliJ
Edit the config.properties configuration file

Build the PiFlow jar package:

Run -> Edit Configurations -> Add New Configuration -> Maven
Name: package
Command line: clean package -Dmaven.test.skip=true -X
Run 'package' (the PiFlow jar will be built at ../piflow/piflow-server/target/piflow-server-0.9.jar)

Run HttpService:

Edit Configurations -> Add New Configuration -> Application
Name: HttpService
Main class: cn.piflow.api.Main
Environment variable: SPARK_HOME=/opt/spark-2.2.0-bin-hadoop2.6 (change the path to your Spark home)
Run 'HttpService'

Test HttpService:

Run a sample pipeline: /piflow/piflow-server/src/main/scala/cn/piflow/api/HTTPClientStartMockDataFlow.scala
Modify the server IP and port in the API client before running.
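
The bundled Scala client posts a flow definition to the server's HTTP API. As an illustration, here is a minimal Python sketch of the same idea. The endpoint path, stop bundle names, and property keys below are assumptions modeled on PiFlow's MockData example and should be checked against your PiFlow version:

```python
import json
from urllib import request

# Hypothetical server address -- replace with your PiFlow server IP and port
SERVER = "http://127.0.0.1:8002"

# A minimal flow definition: one MockData stop wired to a ShowData stop.
# The "bundle" class names and property keys are illustrative assumptions;
# verify them against the components shipped with your PiFlow build.
flow = {
    "flow": {
        "name": "MockDataFlow",
        "uuid": "1234567890",
        "stops": [
            {
                "uuid": "0000",
                "name": "MockData",
                "bundle": "cn.piflow.bundle.common.MockData",
                "properties": {
                    "schema": "title:String, author:String",
                    "count": "10",
                },
            },
            {
                "uuid": "1111",
                "name": "ShowData",
                "bundle": "cn.piflow.bundle.external.ShowData",
                "properties": {"showNumber": "5"},
            },
        ],
        "paths": [
            {"from": "MockData", "outport": "", "inport": "", "to": "ShowData"}
        ],
    }
}

def start_flow(server, flow_def):
    """POST the flow JSON to the server's flow-start endpoint."""
    data = json.dumps(flow_def).encode("utf-8")
    req = request.Request(server + "/flow/start", data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# start_flow(SERVER, flow)  # uncomment once the server is reachable
```

The important part is the shape of the payload: a `flow` object with `stops` (the components) and `paths` (the edges wiring outports to inports).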

How to configure config.properties

#spark and yarn config
spark.master=yarn
spark.deploy.mode=cluster

#hdfs default file system
fs.defaultFS=hdfs://10.0.86.191:9000

#yarn resourcemanager.hostname
yarn.resourcemanager.hostname=10.0.86.191

#if you want to use hive, set hive metastore uris
#hive.metastore.uris=thrift://10.0.88.71:9083

#show data in log, set 0 if you do not want to show data in logs
data.show=10

#server port
server.port=8002

#h2db port
h2.port=50002
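
Since a missing key in config.properties is a common source of startup failures, a quick sanity check can help. The following Python sketch (not part of PiFlow) parses the Java-style key=value format and reports missing keys; the REQUIRED list is an assumption based on the sample configuration above:

```python
# Keys the server is assumed to need, based on the sample config above.
REQUIRED = ["spark.master", "spark.deploy.mode", "fs.defaultFS",
            "yarn.resourcemanager.hostname", "server.port", "h2.port"]

def load_properties(text):
    """Parse Java-style key=value properties, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

sample = """\
spark.master=yarn
spark.deploy.mode=cluster
fs.defaultFS=hdfs://10.0.86.191:9000
yarn.resourcemanager.hostname=10.0.86.191
data.show=10
server.port=8002
h2.port=50002
"""

props = load_properties(sample)
missing = [k for k in REQUIRED if k not in props]
print("missing keys:", missing)  # -> missing keys: []
```
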

To run PiFlow Web, go to the following link; the PiFlow Server and PiFlow Web versions should match:

https://github.com/cas-bigdatalab/piflow-web/releases/tag/v1.0

Docker image
Pull the Docker image
docker pull registry.cn-hangzhou.aliyuncs.com/cnic_piflow/piflow:v1.1
View the information about a Docker image
docker images
Run a container from the image ID; all PiFlow services start automatically. Pay attention to the HOST_IP setting:
docker run -h master -itd --env HOST_IP=*.*.*.* --name piflow-v1.1 -p 6001:6001 -p 6002:6002 [imageID]
Access HOST_IP:6001; startup may be a bit slow, so wait a few minutes.
If something goes wrong, all the applications are in the /opt folder.
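
Because the container can take a few minutes to come up, it may be more convenient to poll the web UI port than to keep refreshing the page. Here is a small Python sketch (not part of PiFlow) that waits for a TCP port to accept connections; the host and port are the ones used in the docker run command above:

```python
import socket
import time

def wait_for_port(host, port, timeout=300.0, interval=5.0):
    """Poll host:port until a TCP connection succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # port is accepting connections
        except OSError:
            time.sleep(interval)  # not up yet, retry
    return False

# Example: wait for the PiFlow web UI after `docker run`
# (replace HOST_IP with the address you passed to the container):
# if wait_for_port("HOST_IP", 6001):
#     print("PiFlow web UI is up")
```
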
Screenshots


Login


List of pipelines


Create a pipeline


Configure a pipeline


Configure a pipeline group


A list of pipeline runs


Monitor the pipeline

Source: https://ictcoder.com/piflow-hybrid-scientific-big-data-pipeline-system/