DataHub: The metadata platform for modern data stacks

Published 2022-09-30. Last updated 2025-02-24.

This issue recommends DataHub, an open source metadata platform for the modern data stack.

DataHub is a modern data catalog designed to support end-to-end data discovery, data observability, and data governance. This extensible metadata platform is built for developers to tame the complexity of their rapidly evolving data ecosystems and for data practitioners to leverage the full value of data within their organizations.

1 Search across databases, data lakes, BI platforms, ML feature stores, and workflow orchestration

Here is an example of searching for assets related to the term health: the results span Looker dashboards, BigQuery datasets, and DataHub tags and users, and we finally navigate to the "DataHub Health" Looker dashboard overview.

2 Lineage across platforms, datasets, pipelines, and charts

Using the lineage view, we can navigate all upstream dependencies of the dashboard, including Looker charts, Snowflake and S3 datasets, and Airflow pipelines.

3 Dataset profiling

DataHub provides dataset profiling and usage statistics for popular data warehouse platforms, making it easy for data practitioners to understand the shape of their data and how it evolves over time.

4 Powerful documentation

DataHub makes it easy to update and maintain documentation as definitions and use cases evolve. In addition to managing documentation through GMS (DataHub's metadata service), DataHub provides rich documentation and support for external links through the UI.

5 Metadata quality and usage analytics

Gain insight into the health of metadata in DataHub and how end users interact with the platform. The analytics view provides a snapshot of the number of assets and the percentage with assigned ownership, along with weekly active users and the most common searches and actions.

Install and deploy

1 Install docker, jq, and docker-compose (if using Linux). Make sure to allocate enough hardware resources to the Docker engine. Tested and validated configuration: 2 CPUs, 8GB RAM, 2GB swap area, and 10GB disk space.

2 Start the Docker engine from the command line or desktop application.

3 Install the DataHub CLI by running the following commands in the terminal:

python3 -m pip install --upgrade pip wheel setuptools
python3 -m pip uninstall datahub acryl-datahub || true  # sanity check - ok if it fails
python3 -m pip install --upgrade acryl-datahub
datahub version

If you see 'command not found', try running the CLI command with the prefix 'python3 -m' instead: python3 -m datahub version

4 To deploy DataHub, run the following CLI command from the terminal:

datahub docker quickstart

5 To ingest sample metadata, run the following CLI command from the terminal:

datahub docker ingest-sample-data

6 To clear all of DataHub's state (for example, before ingesting your own metadata), use the CLI nuke command:

datahub docker nuke

If you want to delete the containers but keep the data, add the --keep-data flag to the command. You can then run the quickstart command again to bring DataHub back up with the data you previously ingested.
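
Before starting the quickstart, it can help to confirm that the tools from step 1 are actually on the PATH. The following is a minimal sketch of such a check, not part of the DataHub CLI; the function name find_missing_tools is hypothetical, and the tool list simply mirrors step 1 (docker-compose is only listed there for Linux).

```python
import shutil

# Tools named in step 1; docker-compose is only listed for Linux hosts.
REQUIRED_TOOLS = ("docker", "jq", "docker-compose")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    touching the real environment.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found on PATH.")
```

This only checks that the executables exist; it does not verify that the Docker engine is running or that enough memory has been allocated to it.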

Introduction to metadata ingestion

This module hosts an extensible Python-based metadata ingestion system for DataHub. It supports sending data to DataHub using Kafka or through the REST API. It can be used through the CLI tool, from an orchestrator such as Airflow, or as a library.

Before running any metadata ingestion job, you should ensure that the DataHub backend services are running.

A recipe is a configuration file that tells the ingestion script where to extract data from (the source) and where to put it (the sink). Here is a simple example that extracts metadata from MSSQL (the source) and writes it to DataHub over REST (the sink).

# A sample recipe that pulls metadata from MSSQL and puts it into DataHub
# using the Rest API.
source:
  type: mssql
  config:
    username: sa
    password: ${MSSQL_PASSWORD}
    database: DemoData

transformers:
  - type: "fully-qualified-class-name-of-transformer"
    config:
      some_property: "some.value"


sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
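
The ${MSSQL_PASSWORD} placeholder in the recipe keeps the password out of the file, with the value expected to come from the environment. The sketch below illustrates that idea with the Python standard library only; it is not DataHub's own recipe loader, and the helper name expand_env is hypothetical. The recipe is written as a Python dict, with the transformers section left out for brevity.

```python
import os
from string import Template


def expand_env(value, env=None):
    """Recursively replace ${VAR} placeholders in strings with values from env."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        # substitute() raises KeyError when a referenced variable is not set.
        return Template(value).substitute(env)
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    return value


# The sample recipe as a Python dict (transformers omitted for brevity).
recipe = {
    "source": {
        "type": "mssql",
        "config": {
            "username": "sa",
            "password": "${MSSQL_PASSWORD}",
            "database": "DemoData",
        },
    },
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://localhost:8080"},
    },
}
```

Calling expand_env(recipe) with MSSQL_PASSWORD exported in the shell returns a new dict with the password filled in, and fails with a KeyError if the variable is missing, which surfaces a forgotten export before any connection is attempted.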

CLI

pip install 'acryl-datahub[datahub-rest]'  # install the required  plugin
datahub ingest -c ./examples/recipes/mssql_to_datahub.yml

The --dry-run option of the ingest command performs all ingestion steps except writing to the sink. This helps verify that the ingestion recipe produces the expected work units before they are ingested into DataHub.

# Dry run
datahub ingest -c  ./examples/recipes/example_to_datahub_rest.yml --dry-run
# Short-form
datahub ingest -c  ./examples/recipes/example_to_datahub_rest.yml -n

The --preview option of the ingest command performs all ingestion steps, but limits processing to the first 10 work units produced by the source. This option is useful for quick end-to-end smoke testing of an ingestion recipe.

# Preview
datahub ingest -c  ./examples/recipes/example_to_datahub_rest.yml --preview
# Preview with dry-run
datahub ingest -c  ./examples/recipes/example_to_datahub_rest.yml -n --preview
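
When ingestion is launched from a script rather than typed by hand, the same flags can be assembled programmatically. The following is a small sketch that only builds the argument list from the options shown above (-c, --dry-run, --preview); the helper name build_ingest_command is hypothetical, and nothing is executed.

```python
import shlex


def build_ingest_command(recipe_path, dry_run=False, preview=False):
    """Build the argument list for `datahub ingest` from the flags shown above."""
    command = ["datahub", "ingest", "-c", recipe_path]
    if dry_run:
        command.append("--dry-run")
    if preview:
        command.append("--preview")
    return command


command = build_ingest_command(
    "./examples/recipes/example_to_datahub_rest.yml", dry_run=True, preview=True
)
# Show the command as it would be typed in a shell.
print(shlex.join(command))
```

To actually run it, the list can be passed to subprocess.run(command, check=True), assuming the DataHub CLI is installed and the backend services are up.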

If you want to modify the data before it reaches the ingestion sink, for example to add extra owners or tags, you can write your own transformer module and integrate it with DataHub.
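
Conceptually, a transformer sits between the source and the sink and rewrites each record as it passes through. The sketch below shows that idea with plain Python dicts. It does not use DataHub's transformer interface: the record shape (a dict with an "owners" list) and the function name add_owner are assumptions made purely for illustration.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical record shape used only for this illustration.
Record = Dict[str, Any]


def add_owner(records: Iterable[Record], owner: str) -> Iterator[Record]:
    """Yield a copy of each record with `owner` added to its owners if absent."""
    for record in records:
        owners = list(record.get("owners", []))
        if owner not in owners:
            owners.append(owner)
        # Build a new dict so the record emitted by the source is left untouched.
        yield {**record, "owners": owners}


source_records = [
    {"urn": "dataset-1", "owners": ["alice"]},
    {"urn": "dataset-2"},
]
transformed = list(add_owner(source_records, "bob"))
```

A real transformer would be referenced from the transformers section of the recipe by its fully qualified class name, as in the sample recipe.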

Open source: Apache-2.0 License
