⚡️ pprof performance tuning / second-level granularity for scheduled tasks
naiba committed Sep 29, 2021
1 parent 1f1e0b6 commit 47dfa47
Showing 22 changed files with 149 additions and 105 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -10,8 +10,11 @@

# Output of the go coverage tool, specifically when used with LiteIDE
*.out
*.pprof
.idea
/data
/dist
.DS_Store
/main
/cmd/agent/main
/cmd/dashboard/main
30 changes: 20 additions & 10 deletions README.md
@@ -4,7 +4,7 @@
<br>
<small><i>LOGO designed by <a href="https://xio.ng" target="_blank">熊大</a> .</i></small>
<br><br>
<img src="https://img.shields.io/github/workflow/status/naiba/nezha/Dashboard%20image?label=Dash%20v0.10.1&logo=github&style=for-the-badge">&nbsp;<img src="https://img.shields.io/github/v/release/naiba/nezha?color=brightgreen&label=Agent&style=for-the-badge&logo=github">&nbsp;<img src="https://img.shields.io/github/workflow/status/naiba/nezha/Agent%20release?label=Agent%20CI&logo=github&style=for-the-badge">&nbsp;<img src="https://img.shields.io/badge/Installer-v0.7.0-brightgreen?style=for-the-badge&logo=linux">
<img src="https://img.shields.io/github/workflow/status/naiba/nezha/Dashboard%20image?label=Dash%20v0.10.2&logo=github&style=for-the-badge">&nbsp;<img src="https://img.shields.io/github/v/release/naiba/nezha?color=brightgreen&label=Agent&style=for-the-badge&logo=github">&nbsp;<img src="https://img.shields.io/github/workflow/status/naiba/nezha/Agent%20release?label=Agent%20CI&logo=github&style=for-the-badge">&nbsp;<img src="https://img.shields.io/badge/Installer-v0.7.0-brightgreen?style=for-the-badge&logo=linux">
<br>
<br>
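<p>:trollface: <b>Nezha Monitoring (哪吒监控)</b>: an all-in-one lightweight monitoring and operations system. It supports system status, HTTP (SSL certificate change, upcoming expiry, and expiry), TCP, and Ping monitoring with alerts, batch command execution, and scheduled tasks.</p>
@@ -14,9 +14,9 @@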

\>> [Our users](https://www.google.com/search?q="powered+by+哪吒监控"&filter=0) (Google)

| Default theme | DayNight [@JackieSung](https://github.com/JackieSung4ev) | hotaru |
| ------------------------------------------------------- | -------------------------------------------------------- | ---------------------------------------------------------------------- |
| ![Home page screenshot 1](https://s3.ax1x.com/2020/12/07/DvTCwD.jpg) | <img src="https://s3.ax1x.com/2021/01/20/sfJv2q.jpg"/> | <img src="https://s3.ax1x.com/2020/12/09/rPF4xJ.png" width="1600px" /> |
| Default theme | DayNight [@JackieSung](https://github.com/JackieSung4ev) | hotaru |
| ----------------------------------------------------------- | ------------------------------------------------------------ | -------------------------------------------------------------------------- |
| ![Default theme](resource/template/theme-default/screenshot.png) | ![daynight](resource/template/theme-daynight/screenshot.png) | <img src="resource/template/theme-hotaru/screenshot.png" width="1600px" /> |

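## Installation script

@@ -36,11 +36,12 @@ CN=true sudo ./nezha.sh

_\* WatchTower can update the dashboard automatically; on Windows machines, nssm can be used to configure auto-start (see the tutorial at the end)_

### Special skills
### Advanced configuration

Run `./nezha-agent --help` to list the supported flags. If you installed with the one-click script, edit `/etc/systemd/system/nezha-agent.service` and append any of the following to the end of the `ExecStart=` line:

- `--skip-conn`: do not monitor the connection count. Recommended for proxy nodes and other connection-heavy machines, where counting connections is CPU-intensive ([shirou/gopsutil/issues#220](https://github.com/shirou/gopsutil/issues/220))
- `--skip-procs`: do not monitor the process count, which also reduces the agent's resource usage
- `--disable-auto-update`: disable automatic Agent updates

## Features
@@ -230,7 +231,7 @@ Placeholders can also be used in the URL; they are filled in by simple string replacement at request time.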
</details>

<details>
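<summary>Agent keeps restarting / fails to start?</summary>
<summary>Troubleshooting an Agent that does not come online or does not start</summary>

1. Run `/opt/nezha/agent/nezha-agent -s <dashboard IP or non-CDN domain>:<dashboard RPC port> -p <Agent secret> -d` directly and check the log to see whether it is a DNS problem.
2. Use `nc -v <domain/IP> <dashboard RPC port>` or `telnet <domain/IP> <dashboard RPC port>` to check for network problems, and inspect the inbound and outbound firewall rules on both this machine and the dashboard server. If one machine is not enough to tell, use the port checker provided by <https://port.ping.pe/>.
@@ -243,7 +244,7 @@ Placeholders can also be used in the URL; they are filled in by simple string replacement at request time.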

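First download the matching binary from the releases page, extract the tar.gz archive, and place the binary in `/root`. Then run `chmod +x /root/nezha-agent` to make it executable, and create `/etc/init.d/nezha-service`: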

```
```shell
#!/bin/sh /etc/rc.common

START=99
@@ -272,8 +273,9 @@ restart() {
</details>

<details>
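<summary>Real-time channel disconnected / terminal connection failed</summary>
When using a reverse proxy, the WebSocket on the `/ws` path needs special configuration to support real-time server status updates.
<summary>Real-time channel disconnected / web terminal connection failed</summary>

When using a reverse proxy, the WebSockets on the `/ws` and `/terminal` paths need special configuration to support real-time server status updates and **WebSSH**.

- Nginx (BT Panel): add the following to your nginx configuration file

@@ -285,7 +287,6 @@ restart() {
location ~ ^/(ws|terminal/.+)$ {
proxy_pass http://ip:<site port>;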
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "Upgrade";
proxy_set_header Host $host;
@@ -295,6 +296,15 @@ restart() {
}
```

If you are not using BT Panel, also add this block inside `server{}`:

```nginx
location / {
proxy_pass http://ip:<site port>;
proxy_set_header Host $host;
}
```

- CaddyServer v1 (v2 needs no special configuration)

```Caddyfile
31 changes: 16 additions & 15 deletions cmd/agent/main.go
@@ -30,25 +30,21 @@ import (

type AgentConfig struct {
SkipConnectionCount bool
SkipProcsCount bool
DisableAutoUpdate bool
Debug bool
Server string
ClientSecret string
}

func init() {
http.DefaultClient.Timeout = time.Second * 5
flag.CommandLine.ParseErrorsWhitelist.UnknownFlags = true
}

var (
version string
agentConf AgentConfig
version string
client pb.NezhaServiceClient
inited bool
)

var (
client pb.NezhaServiceClient
inited bool
agentConf AgentConfig
updateCh = make(chan struct{}) // Agent auto-update interval
httpClient = &http.Client{
CheckRedirect: func(req *http.Request, via []*http.Request) error {
@@ -63,6 +59,11 @@ const (
networkTimeOut = time.Second * 5 // general network timeout
)

func init() {
http.DefaultClient.Timeout = time.Second * 5
flag.CommandLine.ParseErrorsWhitelist.UnknownFlags = true
}

func main() {
// version number injected by GoReleaser
monitor.Version = version
@@ -71,6 +72,7 @@ func main() {
flag.StringVarP(&agentConf.Server, "server", "s", "localhost:5555", "dashboard RPC port")
flag.StringVarP(&agentConf.ClientSecret, "password", "p", "", "Agent connection secret")
flag.BoolVar(&agentConf.SkipConnectionCount, "skip-conn", false, "do not monitor the connection count")
flag.BoolVar(&agentConf.SkipProcsCount, "skip-procs", false, "do not monitor the process count")
flag.BoolVar(&agentConf.DisableAutoUpdate, "disable-auto-update", false, "disable automatic updates")
flag.Parse()
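
The boolean switches registered above map one-to-one onto `AgentConfig` fields. A minimal sketch of the same wiring follows, assuming the standard library `flag` package rather than the pflag-style API the agent imports, so the short `-s`, `-p`, and `-d` forms are registered as plain flags. `parseAgentFlags` is a hypothetical helper, not part of the commit.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// AgentConfig mirrors the struct shown in the diff.
type AgentConfig struct {
	SkipConnectionCount bool
	SkipProcsCount      bool
	DisableAutoUpdate   bool
	Debug               bool
	Server              string
	ClientSecret        string
}

// parseAgentFlags is a hypothetical helper. It uses a private FlagSet so it
// can be called more than once (for example from tests) without touching
// the global flag.CommandLine.
func parseAgentFlags(args []string) (AgentConfig, error) {
	var conf AgentConfig
	fs := flag.NewFlagSet("nezha-agent", flag.ContinueOnError)
	fs.BoolVar(&conf.Debug, "d", false, "enable debug logging (assumed flag name)")
	fs.StringVar(&conf.Server, "s", "localhost:5555", "dashboard RPC port")
	fs.StringVar(&conf.ClientSecret, "p", "", "Agent connection secret")
	fs.BoolVar(&conf.SkipConnectionCount, "skip-conn", false, "do not monitor the connection count")
	fs.BoolVar(&conf.SkipProcsCount, "skip-procs", false, "do not monitor the process count")
	fs.BoolVar(&conf.DisableAutoUpdate, "disable-auto-update", false, "disable automatic updates")
	err := fs.Parse(args)
	return conf, err
}

func main() {
	conf, err := parseAgentFlags(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", conf)
}
```

The standard `flag` package accepts both `-skip-procs` and `--skip-procs`, so the command lines shown in the README would parse the same way in this sketch.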

@@ -88,7 +90,6 @@ func run() {
}

go pty.DownloadDependency()

// report server state
go reportState()
// update IP information
@@ -168,8 +169,8 @@ func receiveTasks(tasks pb.NezhaService_RequestTaskClient) error {
}
go func() {
defer func() {
if recover() != nil {
println("task panic", task)
if err := recover(); err != nil {
println("task panic", task, err)
}
}()
doTask(task)
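
The change in this hunk keeps the recovered value instead of discarding it, so the cause of a task panic is no longer lost. A minimal sketch of the same idea, using a hypothetical `runTask` wrapper that turns the panic into an ordinary error:

```go
package main

import "fmt"

// runTask is a hypothetical wrapper: it runs fn and, if fn panics, returns
// an error that carries the recovered value.
func runTask(name string, fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			// The named result lets the deferred function set the return value.
			err = fmt.Errorf("task %q panicked: %v", name, r)
		}
	}()
	fn()
	return nil
}

func main() {
	err := runTask("demo", func() { panic("boom") })
	fmt.Println(err)
}
```

Formatting through `fmt` may also be worth considering in the agent itself: the builtin `println` prints interface values such as `error` as raw pointer pairs, not as their message.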
@@ -209,7 +210,7 @@ func reportState() {
if client != nil && inited {
monitor.TrackNetworkSpeed()
timeOutCtx, cancel := context.WithTimeout(context.Background(), networkTimeOut)
_, err = client.ReportSystemState(timeOutCtx, monitor.GetState(agentConf.SkipConnectionCount).PB())
_, err = client.ReportSystemState(timeOutCtx, monitor.GetState(agentConf.SkipConnectionCount, agentConf.SkipProcsCount).PB())
cancel()
if err != nil {
println("reportState error", err)
@@ -229,7 +230,7 @@ func doSelfUpdate() {
println("checking for updates:", v)
latest, err := selfupdate.UpdateSelf(v, "naiba/nezha")
if err != nil {
println("auto-update failed:", err)
println("update failed:", err)
return
}
if !latest.Version.Equals(v) {
@@ -282,7 +283,7 @@ func handleHttpGetTask(task *pb.Task, result *pb.TaskResult) {
}
if err == nil {
// 检查 SSL 证书信息
if len(resp.TLS.PeerCertificates) > 0 {
if resp.TLS != nil && len(resp.TLS.PeerCertificates) > 0 {
c := resp.TLS.PeerCertificates[0]
result.Data = c.Issuer.CommonName + "|" + c.NotAfter.In(time.Local).String()
}
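
The added `resp.TLS != nil` guard matters because `http.Response.TLS` is nil for responses received over plain HTTP, so dereferencing it unconditionally would panic. A small sketch of the guarded lookup, with `certSummary` as a hypothetical helper name and the same `issuer|expiry` format as the diff:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"time"
)

// certSummary is a hypothetical helper that builds the "issuer|expiry"
// string. state is nil for unencrypted responses.
func certSummary(state *tls.ConnectionState) string {
	if state == nil || len(state.PeerCertificates) == 0 {
		return ""
	}
	c := state.PeerCertificates[0]
	return c.Issuer.CommonName + "|" + c.NotAfter.In(time.Local).String()
}

func main() {
	// Plain HTTP: no TLS state, so the summary is empty.
	fmt.Printf("%q\n", certSummary(nil))

	// HTTPS: a hand-built state stands in for a real handshake result.
	state := &tls.ConnectionState{PeerCertificates: []*x509.Certificate{{
		Issuer:   pkix.Name{CommonName: "Example CA"},
		NotAfter: time.Date(2030, 1, 1, 0, 0, 0, 0, time.UTC),
	}}}
	fmt.Println(certSummary(state))
}
```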
64 changes: 38 additions & 26 deletions cmd/agent/monitor/monitor.go
@@ -5,7 +5,6 @@ import (
"regexp"
"runtime"
"strings"
"sync/atomic"
"syscall"
"time"

@@ -15,20 +14,27 @@ import (
"github.com/shirou/gopsutil/v3/load"
"github.com/shirou/gopsutil/v3/mem"
"github.com/shirou/gopsutil/v3/net"
"github.com/shirou/gopsutil/v3/process"

"github.com/naiba/nezha/model"
)

var Version string = "debug"
var netInSpeed, netOutSpeed, netInTransfer, netOutTransfer, lastUpdate uint64
var expectDiskFsTypes = []string{
"apfs", "ext4", "ext3", "ext2", "f2fs", "reiserfs", "jfs", "btrfs",
"fuseblk", "zfs", "simfs", "ntfs", "fat32", "exfat", "xfs", "fuse.rclone",
}
var excludeNetInterfaces = []string{
"lo", "tun", "docker", "veth", "br-", "vmbr", "vnet", "kube",
}
var getMacDiskNo = regexp.MustCompile(`\/dev\/disk(\d)s.*`)
var (
Version string = "debug"
expectDiskFsTypes = []string{
"apfs", "ext4", "ext3", "ext2", "f2fs", "reiserfs", "jfs", "btrfs",
"fuseblk", "zfs", "simfs", "ntfs", "fat32", "exfat", "xfs", "fuse.rclone",
}
excludeNetInterfaces = []string{
"lo", "tun", "docker", "veth", "br-", "vmbr", "vnet", "kube",
}
getMacDiskNo = regexp.MustCompile(`\/dev\/disk(\d)s.*`)
)

var (
netInSpeed, netOutSpeed, netInTransfer, netOutTransfer, lastUpdateNetStats uint64
cachedBootTime time.Time
)

func GetHost() *model.Host {
hi, _ := host.Info()
@@ -58,6 +64,8 @@ func GetHost() *model.Host {
swapMemTotal = mv.SwapTotal
}

// hi.BootTime is a Unix timestamp in seconds, so convert it directly.
cachedBootTime = time.Unix(int64(hi.BootTime), 0)

return &model.Host{
Platform: hi.OS,
PlatformVersion: hi.PlatformVersion,
@@ -74,8 +82,12 @@ func GetHost() *model.Host {
}
}

func GetState(skipConnectionCount bool) *model.HostState {
hi, _ := host.Info()
func GetState(skipConnectionCount bool, skipProcsCount bool) *model.HostState {
var procs []int32
if !skipProcsCount {
procs, _ = process.Pids()
}

mv, _ := mem.VirtualMemory()

var swapMemUsed uint64
@@ -92,11 +104,11 @@ func GetState(skipConnectionCount bool) *model.HostState {
if err == nil {
cpuPercent = cp[0]
}

_, diskUsed := getDiskTotalAndUsed()
loadStat, _ := load.Avg()

var tcpConnCount, udpConnCount uint64

if !skipConnectionCount {
conns, _ := net.Connections("all")
for i := 0; i < len(conns); i++ {
@@ -114,17 +126,17 @@ func GetState(skipConnectionCount bool) *model.HostState {
MemUsed: mv.Total - mv.Available,
SwapUsed: swapMemUsed,
DiskUsed: diskUsed,
NetInTransfer: atomic.LoadUint64(&netInTransfer),
NetOutTransfer: atomic.LoadUint64(&netOutTransfer),
NetInSpeed: atomic.LoadUint64(&netInSpeed),
NetOutSpeed: atomic.LoadUint64(&netOutSpeed),
Uptime: hi.Uptime,
NetInTransfer: netInTransfer,
NetOutTransfer: netOutTransfer,
NetInSpeed: netInSpeed,
NetOutSpeed: netOutSpeed,
Uptime: uint64(time.Since(cachedBootTime).Seconds()),
Load1: loadStat.Load1,
Load5: loadStat.Load5,
Load15: loadStat.Load15,
TcpConnCount: tcpConnCount,
UdpConnCount: udpConnCount,
ProcessCount: hi.Procs,
ProcessCount: uint64(len(procs)),
}
}
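
`GetState` now derives `Uptime` from a boot time cached by `GetHost` instead of calling `host.Info()` on every report. A minimal sketch of that calculation, assuming the boot time is available as a Unix timestamp in seconds; `bootTimeFromUnix` and `uptimeAt` are hypothetical helper names:

```go
package main

import (
	"fmt"
	"time"
)

// bootTimeFromUnix converts a boot timestamp (seconds since the Unix epoch)
// into a time.Time that can be cached once.
func bootTimeFromUnix(sec uint64) time.Time {
	return time.Unix(int64(sec), 0)
}

// uptimeAt returns the whole seconds elapsed between boot and now,
// clamped at zero in case the clock moved backwards.
func uptimeAt(boot, now time.Time) uint64 {
	d := now.Sub(boot)
	if d < 0 {
		return 0
	}
	return uint64(d.Seconds())
}

func main() {
	boot := bootTimeFromUnix(1632873600)
	fmt.Println(uptimeAt(boot, boot.Add(90*time.Minute)))
}
```

Passing `now` explicitly keeps the helper deterministic; production code would call it with `time.Now()`.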

@@ -140,14 +152,14 @@ func TrackNetworkSpeed() {
innerNetOutTransfer += v.BytesSent
}
now := uint64(time.Now().Unix())
diff := now - atomic.LoadUint64(&lastUpdate)
diff := now - lastUpdateNetStats
if diff > 0 {
atomic.StoreUint64(&netInSpeed, (innerNetInTransfer-atomic.LoadUint64(&netInTransfer))/diff)
atomic.StoreUint64(&netOutSpeed, (innerNetOutTransfer-atomic.LoadUint64(&netOutTransfer))/diff)
netInSpeed = (innerNetInTransfer - netInTransfer) / diff
netOutSpeed = (innerNetOutTransfer - netOutTransfer) / diff
}
atomic.StoreUint64(&netInTransfer, innerNetInTransfer)
atomic.StoreUint64(&netOutTransfer, innerNetOutTransfer)
atomic.StoreUint64(&lastUpdate, now)
netInTransfer = innerNetInTransfer
netOutTransfer = innerNetOutTransfer
lastUpdateNetStats = now
}
}

12 changes: 10 additions & 2 deletions cmd/agent/monitor/myip.go
@@ -35,13 +35,21 @@ func UpdateIP() {
for {
ipv4 := fetchGeoIP(geoIPApiList, false)
ipv6 := fetchGeoIP(geoIPApiList, true)
cachedIP = fmt.Sprintf("ip(v4:%s,v6:[%s])", ipv4.IP, ipv6.IP)
if ipv4.IP == "" && ipv6.IP == "" {
time.Sleep(time.Minute)
continue
}
if ipv4.IP == "" || ipv6.IP == "" {
cachedIP = fmt.Sprintf("%s%s", ipv4.IP, ipv6.IP)
} else {
cachedIP = fmt.Sprintf("%s/%s", ipv4.IP, ipv6.IP)
}
if ipv4.CountryCode != "" {
cachedCountry = ipv4.CountryCode
} else if ipv6.CountryCode != "" {
cachedCountry = ipv6.CountryCode
}
time.Sleep(time.Minute * 10)
time.Sleep(time.Minute * 30)
}
}
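
The new branches replace the fixed `ip(v4:…,v6:[…])` string with a shorter form: a single address when only one family resolved, or `v4/v6` when both did. The formatting rule in isolation, with `formatIP` as a hypothetical helper name:

```go
package main

import "fmt"

// formatIP joins the two addresses with "/" only when both are present.
// When both are empty it returns "", the case the loop in the diff
// handles by retrying after a minute.
func formatIP(v4, v6 string) string {
	if v4 == "" || v6 == "" {
		return v4 + v6
	}
	return v4 + "/" + v6
}

func main() {
	fmt.Println(formatIP("203.0.113.7", ""))
	fmt.Println(formatIP("203.0.113.7", "2001:db8::1"))
}
```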

2 changes: 1 addition & 1 deletion cmd/dashboard/controller/controller.go
@@ -112,7 +112,7 @@ func ServeWeb(port uint) *http.Server {
},
})
r.Static("/static", "resource/static")
r.LoadHTMLGlob("resource/template/**/*")
r.LoadHTMLGlob("resource/template/**/*.html")
routers(r)

page404 := func(c *gin.Context) {
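
Narrowing the glob to `*.html` plausibly relates to the `screenshot.png` files that the README now references under `resource/template/theme-*/`; that motive is an inference from this commit, not something it states. The sketch below assumes the pattern is evaluated with `path/filepath` matching rules, where `**` behaves like a single `*` and never crosses a path separator. `isTemplate` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// templatePattern is the new glob from the diff.
const templatePattern = "resource/template/**/*.html"

// isTemplate reports whether path matches templatePattern under
// filepath.Match rules.
func isTemplate(path string) bool {
	ok, err := filepath.Match(templatePattern, path)
	return err == nil && ok
}

func main() {
	for _, p := range []string{
		"resource/template/theme-default/home.html",
		"resource/template/theme-default/screenshot.png",
	} {
		fmt.Println(p, isTemplate(p))
	}
}
```

Under these rules only files exactly one directory below `resource/template` match, so templates nested more deeply would need their own pattern.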
13 changes: 8 additions & 5 deletions cmd/dashboard/controller/member_api.go
@@ -267,19 +267,22 @@ func (ma *memberAPI) addOrEditCron(c *gin.Context) {
cr.Cover = cf.Cover
err = json.Unmarshal([]byte(cf.ServersRaw), &cr.Servers)
}
if err == nil {
_, err = cron.ParseStandard(cr.Scheduler)
}
tx := dao.DB.Begin()
if err == nil {
if cf.ID == 0 {
err = dao.DB.Create(&cr).Error
err = tx.Create(&cr).Error
} else {
err = dao.DB.Save(&cr).Error
err = tx.Save(&cr).Error
}
}
if err == nil {
cr.CronID, err = dao.Cron.AddFunc(cr.Scheduler, dao.CronTrigger(cr))
}
if err == nil {
err = tx.Commit().Error
} else {
tx.Rollback()
}
if err != nil {
c.JSON(http.StatusOK, model.Response{
Code: http.StatusBadRequest,
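
This hunk wraps the database write and the cron registration in one transaction, so a schedule that the scheduler rejects no longer leaves a saved row behind. The control flow can be sketched without a database, using a hypothetical `txn` interface and `fakeTx` stand-in in place of the real transaction handle:

```go
package main

import (
	"errors"
	"fmt"
)

// txn is a hypothetical stand-in for a database transaction handle.
type txn interface {
	Commit() error
	Rollback() error
}

// errBadSpec simulates a scheduler rejecting a cron expression.
var errBadSpec = errors.New("bad cron spec")

// saveAndSchedule persists a record and registers its schedule as one unit:
// the transaction commits only if both steps succeed.
func saveAndSchedule(tx txn, save, schedule func() error) error {
	err := save()
	if err == nil {
		err = schedule()
	}
	if err != nil {
		// Report the original failure; a rollback error is secondary here.
		_ = tx.Rollback()
		return err
	}
	return tx.Commit()
}

// fakeTx records which terminal call was made.
type fakeTx struct{ committed, rolledBack bool }

func (f *fakeTx) Commit() error   { f.committed = true; return nil }
func (f *fakeTx) Rollback() error { f.rolledBack = true; return nil }

func main() {
	tx := &fakeTx{}
	err := saveAndSchedule(tx,
		func() error { return nil },
		func() error { return errBadSpec })
	fmt.Println(err, tx.committed, tx.rolledBack)
}
```

One caveat this sketch shares with the diff: if the schedule is registered and the commit then fails, the scheduler entry remains and would need to be removed separately.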