Overview
Official site: https://superset.apache.org/
Docs: https://superset.apache.org/docs/intro
Repository: https://github.com/apache/superset
Superset is a modern data exploration and data visualization platform. Superset can replace or augment proprietary business intelligence tools for many teams. Superset integrates well with a variety of data sources.
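Superset provides powerful data exploration and visualization capabilities. It can connect to a wide range of data sources, including databases such as MySQL, PostgreSQL, and SQLite. Once connected, users can build interactive dashboards, explore datasets through SQL queries or a point-and-click interface, and create visualizations such as charts, tables, and maps.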
Composition
Components | Principles
According to the superset/docker-compose.yml file, the Docker Compose setup includes the nginx, redis, db, superset, superset-websocket, superset-init, superset-node, superset-worker, superset-worker-beat, and superset-tests-worker components.
According to the Architecture documentation, Superset consists of:
1) The Superset application itself
2) A metadata database
3) A caching layer (optional, but necessary for some features)
4) A worker & beat (optional, but necessary for some features)
The Alerts and Reports, Caching, Async Queries, and Dashboard Thumbnails features only work once the optional components are enabled.
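As a minimal sketch, assuming a `superset_config.py` override file: `ALERT_REPORTS`, `THUMBNAILS`, and `GLOBAL_ASYNC_QUERIES` are Superset feature flags (check them against the `config.py` of the release you deploy), while Caching has no flag and is driven by `CACHE_CONFIG`. The `REQUIRED_COMPONENTS` table and `missing_components` helper are hypothetical, not part of Superset; the mapping is one reading of the Architecture docs and may be incomplete (for example, `GLOBAL_ASYNC_QUERIES` also needs its own JWT secret settings).

```python
# superset_config.py (sketch): enable the features that depend on optional components.
# Flag names are Superset feature flags; confirm them for your Superset release.
FEATURE_FLAGS = {
    "ALERT_REPORTS": True,         # Alerts and Reports
    "THUMBNAILS": True,            # Dashboard Thumbnails
    "GLOBAL_ASYNC_QUERIES": True,  # Async Queries
}
# Caching has no flag: it is configured through CACHE_CONFIG / DATA_CACHE_CONFIG.

# Hypothetical helper, NOT part of Superset: optional components each flag relies on.
REQUIRED_COMPONENTS = {
    "ALERT_REPORTS": {"cache", "worker", "beat"},
    "THUMBNAILS": {"cache", "worker"},
    "GLOBAL_ASYNC_QUERIES": {"cache", "worker"},
}


def missing_components(flags, deployed):
    """Return the optional components that enabled flags need but that are not deployed."""
    needed = set()
    for name, enabled in flags.items():
        if enabled:
            needed |= REQUIRED_COMPONENTS.get(name, set())
    return needed - set(deployed)


if __name__ == "__main__":
    # A deployment with Redis but no Celery processes is missing worker and beat.
    print(sorted(missing_components(FEATURE_FLAGS, {"cache"})))  # ['beat', 'worker']
```

A check like this can run in CI before a rollout, so that a flag is never switched on for a deployment that lacks the worker or beat it needs.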
The Superset Application
This is the core application. Superset operates like this:
A user visits a chart or dashboard
That triggers a SQL query to the data warehouse holding the underlying dataset
The resulting data is served up in a data visualization
The Superset application consists of the Python (Flask) backend application (server), the API layer, the React frontend built with Webpack, and the static assets the application needs to run.
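The three-step flow above (visit a chart, run its SQL against the warehouse, serve the result as a visualization) can be illustrated with a toy model using only the Python standard library. This is not Superset's code: the `Chart` dataclass and `render_chart` function are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import json
import sqlite3
from dataclasses import dataclass


@dataclass
class Chart:
    """Hypothetical chart definition, standing in for what the metadata database stores."""
    name: str
    sql: str


def render_chart(warehouse: sqlite3.Connection, chart: Chart) -> str:
    """Steps 2 and 3: run the chart's SQL against the warehouse and serve the rows as JSON."""
    cursor = warehouse.execute(chart.sql)
    columns = [col[0] for col in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return json.dumps({"chart": chart.name, "data": rows})


if __name__ == "__main__":
    # Toy warehouse holding the underlying dataset.
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    wh.executemany("INSERT INTO sales VALUES (?, ?)", [("east", 10), ("west", 5), ("east", 7)])
    # Step 1: a user opens the chart, which carries its own query.
    chart = Chart(
        "Sales by region",
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region",
    )
    print(render_chart(wh, chart))
```

In real Superset the frontend receives a payload like this from the backend API and renders it with a visualization plugin.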
Metadata Database
This is where chart and dashboard definitions, user information, logs, etc. are stored. Superset is tested to work with PostgreSQL and MySQL databases as the metadata database (not to be confused with a data source like your data warehouse, which could be a much greater variety of options like Snowflake, Redshift, etc.).
Some installation methods like our Quickstart and PyPI come configured by default to use an on-disk SQLite database. And in a Docker Compose installation, the data would be stored in a PostgreSQL container volume. Neither setup is recommended for production instances of Superset.
For production, a properly-configured, managed, standalone database is recommended. No matter what database you use, you should plan to back it up regularly.
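A standalone metadata database is selected with `SQLALCHEMY_DATABASE_URI` in `superset_config.py`. A minimal sketch, assuming PostgreSQL with the psycopg2 driver; the host, user, database name, and environment-variable names are placeholders, and `metadata_db_uri` is a hypothetical helper. Its point is that passwords containing characters such as `@` or `/` must be URL-encoded before they are embedded in a SQLAlchemy URL.

```python
# superset_config.py (sketch): standalone PostgreSQL as the metadata database.
import os
from urllib.parse import quote_plus


def metadata_db_uri(user, password, host, port=5432, database="superset"):
    """Hypothetical helper: build a SQLAlchemy URL, escaping the credentials."""
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Placeholder values; in practice read them from a secret store or the environment.
SQLALCHEMY_DATABASE_URI = metadata_db_uri(
    user=os.environ.get("SUPERSET_DB_USER", "superset"),
    password=os.environ.get("SUPERSET_DB_PASS", "change-me"),
    host=os.environ.get("SUPERSET_DB_HOST", "metadata-db.example.internal"),
)
```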
Caching Layer
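The caching layer serves two main functions:
1) Storing the results of queries to your data warehouse, so that when a chart is loaded a second time it is served from the cache, which speeds up the application and reduces load on the warehouse
2) Acting as a message broker for the worker, which enables the Alerts & Reports, async queries, and thumbnail caching features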
Most people use Redis for their cache, but Superset supports other options too. See the cache docs for more.
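A minimal `superset_config.py` sketch, assuming a Redis instance at the placeholder address `redis:6379`. Superset hands these dictionaries to Flask-Caching, so the keys are Flask-Caching settings: `CACHE_CONFIG` is the general-purpose cache, `DATA_CACHE_CONFIG` holds chart query results, and the last two hold dashboard filter state and Explore form data. The `redis_cache` helper and the timeouts are illustrative choices, not defaults.

```python
# superset_config.py (sketch): Redis-backed caching via Flask-Caching.
REDIS_URL = "redis://redis:6379"  # placeholder host and port


def redis_cache(prefix, timeout_seconds, db=1):
    """Hypothetical helper: build one Flask-Caching config dict for a Redis database."""
    return {
        "CACHE_TYPE": "RedisCache",
        "CACHE_DEFAULT_TIMEOUT": timeout_seconds,
        "CACHE_KEY_PREFIX": prefix,
        "CACHE_REDIS_URL": f"{REDIS_URL}/{db}",
    }


CACHE_CONFIG = redis_cache("superset_metadata_", 300)         # general-purpose cache
DATA_CACHE_CONFIG = redis_cache("superset_data_", 24 * 3600)  # chart query results
FILTER_STATE_CACHE_CONFIG = redis_cache("superset_filter_", 24 * 3600)
EXPLORE_FORM_DATA_CACHE_CONFIG = redis_cache("superset_explore_", 24 * 3600)
```

Keeping the cache in Redis database 1 leaves database 0 free for the Celery broker; that split is a common convention, not a requirement.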
Worker and Beat
This is one or more workers that execute tasks such as running async queries or taking snapshots of reports and sending emails, plus a “beat” that acts as the scheduler and tells the workers when to perform their tasks. Most installations use Celery for these components.
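A `superset_config.py` sketch of the Celery side, loosely modeled on the example in Superset's configuration docs and assuming Redis as both broker and result backend at the placeholder address `redis:6379`. The docs schedule the `reports.scheduler` and `reports.prune_log` tasks with `celery.schedules.crontab`; to keep the sketch runnable without Celery installed, plain `timedelta` intervals are used instead, which Celery beat also accepts. Workers and beat then run as separate processes, e.g. `celery --app=superset.tasks.celery_app:app worker` and `celery --app=superset.tasks.celery_app:app beat`.

```python
# superset_config.py (sketch): Celery workers and beat for async queries and reports.
from datetime import timedelta

REDIS_URL = "redis://redis:6379"  # placeholder host and port


class CeleryConfig:
    broker_url = f"{REDIS_URL}/0"      # queue the workers consume from
    result_backend = f"{REDIS_URL}/0"  # where task results are stored
    imports = ("superset.sql_lab", "superset.tasks.scheduler")
    worker_prefetch_multiplier = 1
    task_acks_late = False
    # Beat: the scheduler that tells workers when to run periodic tasks.
    beat_schedule = {
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": timedelta(minutes=1),  # docs: crontab(minute="*", hour="*")
        },
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": timedelta(days=1),  # docs: crontab(minute=10, hour=0)
        },
    }


CELERY_CONFIG = CeleryConfig
```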
Other components
Other components can be incorporated into Superset. The best place to learn about additional configurations is the Configuration page. For instance, you could set up a load balancer or reverse proxy to implement HTTPS in front of your Superset application, or specify a Mapbox URL to enable geospatial charts, etc.
Superset won’t even start without certain configuration settings established, so it’s essential to review that page.
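The setting most often missing is `SECRET_KEY`: recent releases refuse to start in production while the default key is still in place. A minimal sketch, assuming the key is supplied through the `SUPERSET_SECRET_KEY` environment variable; `generate_secret_key` produces output of the same shape as the `openssl rand -base64 42` command suggested in the Superset docs, and `load_secret_key` is a hypothetical helper that fails fast when the variable is missing.

```python
# superset_config.py (sketch): take SECRET_KEY from the environment and fail fast if absent.
import base64
import os


def generate_secret_key(num_bytes=42):
    """Same shape of output as `openssl rand -base64 42`: random bytes, base64-encoded."""
    return base64.b64encode(os.urandom(num_bytes)).decode("ascii")


def load_secret_key(environ, var="SUPERSET_SECRET_KEY"):
    """Hypothetical helper: return the key from `environ`, raising if it is missing or empty."""
    key = environ.get(var, "")
    if not key:
        raise RuntimeError(f"{var} is not set; generate one with `openssl rand -base64 42`")
    return key


# In superset_config.py itself you would then write:
# SECRET_KEY = load_secret_key(os.environ)
```

Generate the key once and keep it stable: Superset uses it to encrypt stored database credentials, so changing it on an existing installation requires re-encrypting those secrets.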
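Features
It provides filtering, sorting, and aggregation, so users can quickly draw insights from large amounts of data. Dashboards can be customized and shared with others, which makes Superset a valuable tool for data analysts, business intelligence teams, and decision makers.
Superset is highly extensible: developers can add custom plugins and integrations to meet specific business needs. With an active community and ongoing development, it keeps evolving and improving.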
A no-code interface for building charts quickly
A powerful, web-based SQL Editor for advanced querying
A lightweight semantic layer for quickly defining custom dimensions and metrics
Out-of-the-box support for nearly any SQL database or data engine
A wide array of beautiful visualizations to showcase your data, ranging from simple bar charts to geospatial visualizations
Lightweight, configurable caching layer to help ease database load
Highly extensible security roles and authentication options
An API for programmatic customization
A cloud-native architecture designed from the ground up for scale
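For the API item: Superset exposes a REST API under `/api/v1/`. A minimal standard-library sketch, assuming a reachable instance at the placeholder URL below and a database-backed (`"provider": "db"`) user: log in through `/api/v1/security/login` to obtain an `access_token`, then send it as a Bearer token, here to list dashboards via `/api/v1/dashboard/`. The helper names are hypothetical; the instance's Swagger UI (`/swagger/v1`, when enabled) is the authority on endpoints and fields.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8088"  # placeholder Superset address


def build_login_request(base_url, username, password):
    """POST /api/v1/security/login with database-backed credentials."""
    payload = {"username": username, "password": password, "provider": "db", "refresh": True}
    return urllib.request.Request(
        f"{base_url}/api/v1/security/login",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_list_dashboards_request(base_url, access_token):
    """GET /api/v1/dashboard/ authenticated with the Bearer token from the login call."""
    return urllib.request.Request(
        f"{base_url}/api/v1/dashboard/",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())
```

Against a live instance (placeholder credentials): `token = send(build_login_request(BASE_URL, "admin", "admin"))["access_token"]`, then `send(build_list_dashboards_request(BASE_URL, token))["result"]` returns the dashboard list.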
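Recording and collecting data
Can Superset record data directly? (answer from Doubao)
Note: Superset itself cannot record data directly.
Nature of the data sources: Superset is primarily a data visualization and analysis tool; it focuses on querying, visualizing, and analyzing data that already exists. Its data usually comes from external sources such as relational databases (MySQL, PostgreSQL, etc.), non-relational databases (MongoDB, etc.), or cloud storage (Amazon S3, etc.), typically reached through a SQL-speaking engine or SQLAlchemy dialect. It is designed to connect to these sources and fetch data with SQL or other query languages, not to record new data.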
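Since Superset only reads, recording data means writing it into a source database through some other process and letting Superset query that database. A toy sketch of that split, using Python's built-in `sqlite3` only so it runs anywhere (SQLite is not a recommended production source, and Superset blocks SQLite connections by default); the `events` table and both functions are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def record_event(conn, username, action):
    """Writer side (your application or ETL job): append a row to the source database."""
    conn.execute(
        "INSERT INTO events (ts, username, action) VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), username, action),
    )
    conn.commit()


def actions_per_user(conn):
    """Reader side: the kind of aggregate query a Superset chart would run on this table."""
    return conn.execute(
        "SELECT username, COUNT(*) AS actions FROM events GROUP BY username ORDER BY username"
    ).fetchall()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (ts TEXT, username TEXT, action TEXT)")
    for username, action in [("alice", "login"), ("bob", "login"), ("alice", "export")]:
        record_event(db, username, action)
    print(actions_per_user(db))  # [('alice', 2), ('bob', 1)]
```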
Possible ways to record data indirectly:
Build
Deployment | Formation | Operations | Governance
Choosing a version
https://github.com/apache/superset/releases
Deployment method
We deploy with the Helm chart
with Helm on Kubernetes
https://superset.apache.org/docs/installation/kubernetes
We use the Redis and PostgreSQL instances bundled with the chart
helm repo add superset https://apache.github.io/superset
helm search repo superset
helm pull superset/superset --version x.x.x
helm show values superset/superset > superset.helm-values.yaml
vim superset.helm-values.yaml
# PostgreSQL: set the storage class (StorageClass) and the volume size (size);
# PostgreSQL: set the database password;
# Redis: set the storage class (StorageClass) and the volume size (size);
# SECRET_KEY: see https://superset.apache.org/docs/installation/kubernetes#security-settings
# Ingress: ...
# Superset settings: update the Superset connection settings (one place)
helm upgrade --install --namespace superset-ha --create-namespace \
superset-ha ./superset-0.12.11.tgz -f superset.helm-values.yaml
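The edits listed in the comments can also be generated instead of made by hand in vim. A minimal sketch, assuming the value keys of Superset chart 0.12.x (`postgresql.auth.password`, `postgresql.primary.persistence.*`, `redis.master.persistence.*`, `supersetNode.connections.db_pass`, `configOverrides`); compare them with `helm show values` for your chart version before relying on it. The storage class name and sizes are placeholders. The script prints JSON, which Helm accepts with `-f` because JSON is valid YAML, and it keeps `supersetNode.connections.db_pass` equal to the PostgreSQL password, which is likely what the "Superset connection" comment above refers to.

```python
import base64
import json
import os
import secrets


def build_overrides(db_password, secret_key, storage_class, pg_size="10Gi", redis_size="2Gi"):
    """Build a values override for the Superset chart; keys assume chart 0.12.x."""
    return {
        "postgresql": {
            "auth": {"password": db_password},
            "primary": {"persistence": {"storageClass": storage_class, "size": pg_size}},
        },
        "redis": {
            "master": {
                "persistence": {"enabled": True, "storageClass": storage_class, "size": redis_size}
            },
        },
        # Superset's own DB connection must use the same password as the bundled PostgreSQL.
        "supersetNode": {"connections": {"db_pass": db_password}},
        # configOverrides entries are Python snippets appended to superset_config.py.
        "configOverrides": {"secret": f"SECRET_KEY = {secret_key!r}\n"},
    }


if __name__ == "__main__":
    overrides = build_overrides(
        db_password=secrets.token_urlsafe(24),
        secret_key=base64.b64encode(os.urandom(42)).decode("ascii"),
        storage_class="my-storage-class",  # placeholder StorageClass name
    )
    print(json.dumps(overrides, indent=2))
```

Redirect the output to a file (e.g. `python make_overrides.py > superset.overrides.json`, both names hypothetical) and pass it after the main values file, as in `-f superset.helm-values.yaml -f superset.overrides.json`; Helm deep-merges the files, so later ones only override the keys they set.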
Usage
Creating Your First Dashboard
https://superset.apache.org/docs/using-superset/creating-your-first-dashboard
Exploring Data in Superset
https://superset.apache.org/docs/using-superset/exploring-data
Preset.io maintains an updated set of end-user documentation at docs.preset.io.
https://docs.preset.io/
Improvements
https://superset.apache.org/docs/intro#get-involved
4.1 Undertakings and Revisions
Issue Code Reference | Common Errors
https://superset.apache.org/docs/using-superset/issue-codes