Authentication (⇒ Operations/Authentication)
Authentication | Grafana Loki documentation
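The Grafana Loki service does not provide an authentication layer; the official documentation recommends handling authentication in a reverse proxy placed in front of it.
As an aside, the reverse proxy is also responsible for adding the X-Scope-OrgID HTTP header used for multi-tenancy.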
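The two duties of the reverse proxy (authenticate the caller, then set the tenant header) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the USERS table, the choice of HTTP Basic auth, and the function names authenticate and upstream_headers are hypothetical and do not come from the Loki documentation. A real deployment would normally do this in the proxy's own configuration.

```python
import base64
import hmac
from typing import Mapping, Optional

# Hypothetical user table: username -> (password, tenant ID). Illustration only.
USERS = {"alice": ("s3cret", "team-a")}


def authenticate(authorization: Optional[str], users=USERS) -> Optional[str]:
    """Return the tenant ID for a valid HTTP Basic header, or None if rejected."""
    if not authorization or not authorization.startswith("Basic "):
        return None
    try:
        raw = base64.b64decode(authorization[len("Basic "):], validate=True)
        decoded = raw.decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None
    username, sep, password = decoded.partition(":")
    if not sep:
        return None
    entry = users.get(username)
    if entry is None:
        return None
    expected_password, tenant = entry
    # Constant-time comparison to avoid leaking password prefixes via timing.
    if not hmac.compare_digest(expected_password.encode(), password.encode()):
        return None
    return tenant


def upstream_headers(incoming: Mapping[str, str], tenant: str) -> dict:
    """Build the headers forwarded to Loki.

    Any client-supplied X-Scope-OrgID is dropped so that a caller cannot
    choose its own tenant; the proxy's credentials are not forwarded either.
    """
    blocked = ("x-scope-orgid", "authorization")
    headers = {k: v for k, v in incoming.items() if k.lower() not in blocked}
    headers["X-Scope-OrgID"] = tenant
    return headers
```

The important design point is that the tenant ID is derived from the authenticated identity and always overwrites whatever the client sent, since Loki itself trusts the header as-is.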
Template variable service failed Query error: 504
query_range request returns a 504 · Issue #4721 · grafana/loki · GitHub
Problem description
In Loki, selecting an older time range produces the following error:
Templating Template variable service failed Query error: 504 // the underlying error here is the 504 (Gateway Timeout)
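Root cause analysis
Following the Loki architecture, we checked the Querier logs and found that requests were being canceled while the index was being downloaded or the log chunks were being fetched: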
level=error ts=2022-11-16T12:57:19.037772819Z caller=frontend_processor.go:69 msg="error processing requests" address=10.34.0.76:9095 err="rpc error: code = Unknown desc = context canceled"
level=error ts=2022-11-16T12:57:19.037798173Z caller=frontend_processor.go:69 msg="error processing requests" address=10.34.0.76:9095 err="rpc error: code = Unknown desc = context canceled"
level=error ts=2022-11-16T12:57:19.037829129Z caller=frontend_processor.go:69 msg="error processing requests" address=10.34.0.76:9095 err="rpc error: code = Unknown desc = context canceled"
level=error ts=2022-11-16T12:57:19.037863643Z caller=frontend_processor.go:69 msg="error processing requests" address=10.34.0.76:9095 err="rpc error: code = Unknown desc = context canceled"
level=error ts=2022-11-16T12:57:19.060024914Z caller=batch.go:716 org_id=fake msg="error fetching chunks" err="failed to get s3 object: RequestCanceled: request context canceled\ncaused by: context canceled"
level=warn ts=2022-11-16T12:57:19.060120216Z caller=logging.go:72 traceID=59d5437158c6d9d5 orgID=fake msg="GET /loki/api/v1/series?end=1667347199999000000&match%5B%5D=%7Bcluster_name%3D~%22rivtower-developing-130%22%7D&shards=8_of_16&start=1667260800000000000 (500) 3.105483321s Response: \"failed to get s3 object: RequestCanceled: request context canceled\\ncaused by: context canceled\\n\" ws: false; X-Scope-Orgid: fake; uber-trace-id: 59d5437158c6d9d5:7a0dd68144a4d862:2ff6a77108d5a4e1:0; "
level=error ts=2022-11-16T12:57:19.060189252Z caller=frontend_processor.go:145 msg="error processing requests" err=EOF
level=error ts=2022-11-16T12:57:19.060343147Z caller=batch.go:716 org_id=fake msg="error fetching chunks" err="failed to get s3 object: RequestCanceled: request context canceled\ncaused by: context canceled"
level=warn ts=2022-11-16T12:57:19.060388366Z caller=logging.go:72 traceID=59d5437158c6d9d5 orgID=fake msg="GET /loki/api/v1/series?end=1667347199999000000&match%5B%5D=%7Bcluster_name%3D~%22rivtower-developing-130%22%7D&shards=7_of_16&start=1667260800000000000 (500) 3.266632968s Response: \"failed to get s3 object: RequestCanceled: request context canceled\\ncaused by: context canceled\\n\" ws: false; X-Scope-Orgid: fake; uber-trace-id: 59d5437158c6d9d5:7a0dd68144a4d862:2ff6a77108d5a4e1:0; "
level=error ts=2022-11-16T12:57:19.060416449Z caller=frontend_processor.go:145 msg="error processing requests" err=EOF
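To get a feel for how many requests are being canceled, the Querier log can be tallied by caller and error. The following is a minimal sketch, assuming one logfmt entry per line (as Loki components normally emit them); the helper names parse_logfmt and tally are hypothetical, and the regular expression is a simplified logfmt reader rather than a complete parser.

```python
import re
from collections import Counter

# key=value pairs; a value is either a bare token or a double-quoted string
# that may contain backslash escapes such as \" and \\.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_logfmt(line: str) -> dict:
    """Parse one logfmt line into a dict; quoted values lose their outer quotes."""
    fields = {}
    for key, value in PAIR.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def tally(lines) -> Counter:
    """Count error and warn entries grouped by (caller, err)."""
    counts = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("level") in ("error", "warn"):
            counts[(fields.get("caller", "?"), fields.get("err", ""))] += 1
    return counts
```

Feeding the Querier log lines to tally and printing counts.most_common() shows at a glance whether "context canceled" dominates, which is what the analysis here relies on.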
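Judging by the volume of Querier log entries, we suspect the requests were canceled because there were too many of them. A likely cause is that the query split interval is too small: each query is broken into many sub-queries, which in turn generates too many requests.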
Solution
We increased the query split interval (split_queries_by_interval: 6h; the default is 15 minutes), and the 504 errors have been reduced so far.
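A rough estimate shows why the split interval matters so much. The sketch below assumes that a query is cut into one sub-query per split interval and that each sub-query may additionally be sharded (the logs above show shards=N_of_16); it ignores the alignment of splits to interval boundaries, so the real count can differ slightly. The function name subquery_count is hypothetical.

```python
import math
from datetime import timedelta


def subquery_count(query_range: timedelta, split_interval: timedelta, shards: int = 1) -> int:
    """Estimate how many sub-queries one query fans out into."""
    if split_interval <= timedelta(0) or shards < 1:
        raise ValueError("split_interval must be positive and shards >= 1")
    splits = math.ceil(query_range / split_interval)
    return splits * shards


week = timedelta(days=7)
print(subquery_count(week, timedelta(minutes=15)))            # 672
print(subquery_count(week, timedelta(hours=6)))               # 28
print(subquery_count(timedelta(days=30), timedelta(hours=6))) # 120
```

With 16 shards per split (an assumption taken from the shards=N_of_16 parameter in the logged requests), a 7d query would fan out into 10752 requests at 15 minutes versus 448 at 6 hours.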
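However, with a longer time range (30d) the 504 error still occurs. The reason is that the index is organized in 15-minute units, so an overly long time range requires downloading many index files.
Whether further tuning is worthwhile depends on the query ranges commonly used day to day. Since 7d queries currently work, we do not need any other adjustments.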