This issue's recommendation is DataHub, an open source metadata platform for the modern data stack.
DataHub is a modern data catalog designed to support end-to-end data discovery, data observability, and data governance. This extensible metadata platform helps developers tame the complexity of their rapidly evolving data ecosystems and lets data practitioners leverage the full value of data within their organizations.
Key features
1 Search across databases, data lakes, BI platforms, ML feature stores, and workflow orchestration tools
Here is an example of searching for assets associated with the term "health": the results span a Looker dashboard, a BigQuery dataset, and DataHub tags and users, and we finally navigate to the "DataHub Health" Looker dashboard overview.
2 Lineage across platforms, datasets, pipelines, and charts
Using the lineage view, we can navigate all upstream dependencies of the dashboard, including Looker charts, Snowflake and S3 datasets, and Airflow pipelines.
3 Dataset profiling and usage statistics
DataHub provides dataset profiling and usage statistics for popular data warehouse platforms, making it easy for data practitioners to understand the shape of their data and how it evolves over time.
4 Rich documentation
DataHub makes it easy to update and maintain documentation as definitions and use cases evolve. In addition to managing documentation through GMS, DataHub also supports rich documentation and external links through the UI.
5 Metadata quality and usage
Gain insight into the health of the metadata in DataHub and how end users interact with the platform. The analytics view provides a snapshot of the number and percentage of assets with assigned ownership, weekly active users, and the most common searches and actions.
Install and deploy
1 Install Docker, jq, and docker-compose (docker-compose only if you are on Linux). Make sure sufficient hardware resources are allocated to the Docker engine; the tested and validated configuration is 2 CPUs, 8 GB RAM, 2 GB of swap, and 10 GB of disk space.
2 Start the Docker engine from the command line or desktop application.
3 Install the DataHub CLI and run the following command in the terminal:
python3 -m pip install --upgrade pip wheel setuptools
python3 -m pip uninstall datahub acryl-datahub || true # sanity check - ok if it fails
python3 -m pip install --upgrade acryl-datahub
datahub version
If you see 'command not found', try running the CLI command with the 'python3 -m' prefix instead: python3 -m datahub version
4 To deploy DataHub, run the following CLI command from the terminal:
datahub docker quickstart
5 To extract sample metadata, run the following CLI command from the terminal:
datahub docker ingest-sample-data
6 To clear all of DataHub's state (for example, before ingesting your own metadata), use the nuke CLI command:
datahub docker nuke
If you want to delete the containers but keep the data, add the --keep-data flag to the command. This lets you rerun the quickstart command and bring DataHub back up with the data you previously ingested.
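For example, combining the two (the flag is the one mentioned above; check datahub docker nuke --help for your CLI version):
# Remove the DataHub containers but keep previously ingested data
datahub docker nuke --keep-data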
Introduction to metadata ingestion
This module hosts an extensible Python-based metadata ingestion system for DataHub. It can send metadata to DataHub through Kafka or the REST API, and it is available through the CLI, orchestrators like Airflow, or as a library.
Before running any metadata ingestion job, you should ensure that the DataHub backend services are running.
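If you deployed DataHub with the quickstart above, recent versions of the CLI include a health-check subcommand along the following lines (treat the exact name as version-dependent):
# Check that the local DataHub containers are up and healthy
datahub docker check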
A recipe is a configuration file that tells the ingestion script where to extract data from (the source) and where to put it (the sink). Here is a simple example that extracts metadata from MSSQL (source) and puts it into DataHub over the REST API (sink).
# A sample recipe that pulls metadata from MSSQL and puts it into DataHub
# using the REST API.
source:
  type: mssql
  config:
    username: sa
    password: ${MSSQL_PASSWORD}
    database: DemoData

transformers:
  - type: "fully-qualified-class-name-of-transformer"
    config:
      some_property: "some.value"

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
CLI
pip install 'acryl-datahub[datahub-rest]' # install the required plugin
datahub ingest -c ./examples/recipes/mssql_to_datahub.yml
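Because the sample recipe reads the password from ${MSSQL_PASSWORD}, export that environment variable in the same shell before running the ingest command; the variable name here is simply the one used in the recipe above, and the placeholder value is yours to fill in:
# Make the password available for substitution into the recipe
export MSSQL_PASSWORD='<your-password>'
datahub ingest -c ./examples/recipes/mssql_to_datahub.yml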
The ingest command's --dry-run option performs all of the ingestion steps except writing to the sink. This helps verify that an ingestion recipe generates the expected units of work before they are actually ingested into DataHub.
# Dry run
datahub ingest -c ./examples/recipes/example_to_datahub_rest.yml --dry-run
# Short-form
datahub ingest -c ./examples/recipes/example_to_datahub_rest.yml -n
The ingest command's --preview option performs all ingestion steps but limits processing to the first 10 units of work generated by the source. This makes it easy to run a quick end-to-end smoke test of an ingestion recipe.
# Preview
datahub ingest -c ./examples/recipes/example_to_datahub_rest.yml --preview
# Preview with dry-run
datahub ingest -c ./examples/recipes/example_to_datahub_rest.yml -n --preview
If you want to modify the metadata before it reaches the sink (for example, to add extra owners or tags), you can write your own transformer module and plug it into DataHub, as sketched below.
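As a rough sketch, the transformers section of a recipe can also reference one of DataHub's built-in transformers instead of a fully qualified custom class. The transformer names and config keys below follow the ingestion documentation but may differ between releases, so verify them against the version you are running:
# Illustrative transformers section: attach an owner and a tag to every ingested dataset
# (transformer names and config keys may vary by DataHub version)
transformers:
  - type: "simple_add_dataset_ownership"
    config:
      owner_urns:
        - "urn:li:corpuser:data_engineering"
  - type: "simple_add_dataset_tags"
    config:
      tag_urns:
        - "urn:li:tag:NeedsDocumentation"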
—END—
Open source: Apache-2.0 License