
Last edited by ran.lu Nov 24, 2022

[draft] [Baixin on-site] git push fails in Code Development draft

[[171] [AIARTS] [Baixin on-site] git push fails in Code Development] https://www.tapd.cn/42483287/bugtrace/bugs/view?bug_id=1142483287001007608

Remote connection

Connect to the cluster:

ssh -p 8886 root@220.194.147.64         100Trust!@

ssh root@192.3.21.217    100trust!@

Connect to the Code Development environment as a regular user:

ssh -p 34221 org-dev@192.3.21.217  Nwo7jvCW

Investigation

org-dev@dev-d09e033f-989f-45f2-858c-ab591110b265-vxsld:~/code$ git status
On branch master
Your branch is ahead of 'origin/master' by 2 commits.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean
org-dev@dev-d09e033f-989f-45f2-858c-ab591110b265-vxsld:~/code$ git push
Counting objects: 98, done.
Delta compression using up to 192 threads.
Compressing objects: 100% (94/94), done.
remote: error: file write error: Bad file descriptor
remote: fatal: unable to write loose object file
error: remote unpack failed: unpack-objects abnormal exit
error: failed to push some refs to 'git@gitea-ssh.apulis:gitea/ouc68z1NTiC02V7Z0mTAbg.git'

Adding text files seems to work fine?

Suspected culprit: ls -lh code/pretrained_model/ssd_80C_500E.ckpt ?
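To test the suspicion that one particular file is responsible, it helps to list the biggest files in the working tree rather than checking them one by one. The following is a minimal sketch; the function name `largest_files` and the default of 10 entries are illustrative choices, not part of the original investigation.

```python
import os


def largest_files(root, limit=10):
    """Return up to `limit` (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git's own object store; only working-tree files matter here.
        if '.git' in dirnames:
            dirnames.remove('.git')
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                sizes.append((os.path.getsize(path), path))
    sizes.sort(reverse=True)
    return sizes[:limit]
```

Calling `largest_files('.')` from `~/code` would show whether the `.ckpt` file stands out compared with the files that pushed successfully.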

org-dev@dev-d09e033f-989f-45f2-858c-ab591110b265-vxsld:~/code$ git commit -m "try add code"
[master 2a54d46] try add code
 69 files changed, 7027 insertions(+)
 create mode 100644 code/Dockerfile
 create mode 100644 code/create_data.py
 create mode 100644 code/demo.jpg
 create mode 100644 code/eval.py
 create mode 100644 code/export.py
 create mode 100644 code/infer.py
 create mode 100644 code/infer/convert/convert_om.sh
 create mode 100644 code/infer/data/classes.json
 create mode 100644 code/infer/data/classes_id.json
 create mode 100644 code/infer/data/coco_ssd_mobile_net_v2.name
 create mode 100644 code/infer/data/config.cfg
 create mode 100644 code/infer/data/ssd-mobilenet-v2.aipp
 create mode 100644 code/infer/data/ssd_mobile_net_v2_aipp.pipeline
 create mode 100644 code/infer/data/ssd_mobile_net_v2_no_aipp.pipeline
 create mode 100644 code/infer/mxbase/CommandFlagParser.h
 create mode 100644 code/infer/mxbase/FunctionTimer.h
 create mode 100644 code/infer/mxbase/MxBaseInfer.cpp
 create mode 100644 code/infer/mxbase/MxBaseInfer.h
 create mode 100644 code/infer/mxbase/MxImage.cpp
 create mode 100644 code/infer/mxbase/MxImage.h
 create mode 100644 code/infer/mxbase/MxUtil.cpp
 create mode 100644 code/infer/mxbase/MxUtil.h
 create mode 100644 code/infer/mxbase/SSDInfer.cpp
 create mode 100644 code/infer/mxbase/SSDInfer.h
 create mode 100644 code/infer/mxbase/SSDPostProcessor.h
 create mode 100644 code/infer/mxbase/build.sh
 create mode 100644 code/infer/mxbase/main.cpp
 create mode 100644 code/infer/mxbase/run_test.py
 create mode 100644 code/infer/sdk/mxpi/MxpiSSDMobileNetV2PostProcessor.cpp
 create mode 100644 code/infer/sdk/mxpi/MxpiSSDMobileNetV2PostProcessor.h
 create mode 100644 code/infer/sdk/mxpi/build_mxpi.sh
 create mode 100644 code/infer/sdk/sample/ResultProcess.h
 create mode 100644 code/infer/sdk/sample/aipp.cpp
 create mode 100644 code/infer/sdk/sample/build_aipp.sh
 create mode 100644 code/infer/sdk/sample/build_no_aipp.sh
 create mode 100644 code/infer/sdk/sample/no_aipp.cpp
 create mode 100644 code/mindspore_hub_conf.py
 create mode 100644 code/modelzoo_level.txt
 create mode 100644 code/on_platform/modelarts/README.md
 create mode 100644 code/on_platform/modelarts/__init__.py
 create mode 100644 code/on_platform/modelarts/start.py
 create mode 100644 code/on_platform/plat_cfg.yaml
 create mode 100644 code/requirements.txt
 create mode 100644 code/scripts/docker_start.sh
 create mode 100644 code/scripts/run_distribute_self_defined_train.sh
 create mode 100644 code/scripts/run_distribute_train.sh
 create mode 100644 code/scripts/run_distribute_train_gpu.sh
 create mode 100644 code/scripts/run_eval.sh
 create mode 100644 code/scripts/run_eval_gpu.sh
 create mode 100644 code/serve_desc.template
 create mode 100644 code/src/__init__.py
 create mode 100644 code/src/anchor_generator.py
 create mode 100644 code/src/box_utils.py
 create mode 100644 code/src/config.py
 create mode 100644 code/src/config_ssd300.py
 create mode 100644 code/src/config_ssd_mobilenet_v1_fpn.py
 create mode 100644 code/src/dataset.py
 create mode 100644 code/src/eval_utils.py
 create mode 100644 code/src/init_params.py
 create mode 100644 code/src/lr_schedule.py
 create mode 100644 code/src/mobilenet_v1_fpn.py
 create mode 100644 code/src/ssd.py
 create mode 100644 code/train.py
 create mode 100644 code/transformer/ext.proto
 create mode 100644 code/transformer/ext_pb2.py
 create mode 100644 code/transformer/postprocess.py
 create mode 100644 code/transformer/preprocess.py
 create mode 100644 code/transformer/serve.yaml
 create mode 100644 code/version.ini

The "try add infer" commit also runs into problems:

org-dev@dev-d09e033f-989f-45f2-858c-ab591110b265-vxsld:~/code$ git push
Counting objects: 10, done.
Delta compression using up to 192 threads.
Compressing objects: 100% (9/9), done.
remote: fatal: error when closing loose object file: I/O error
error: remote unpack failed: unpack-objects abnormal exit
error: failed to push some refs to 'git@gitea-ssh.apulis:gitea/ouc68z1NTiC02V7Z0mTAbg.git'
org-dev@dev-d09e033f-989f-45f2-858c-ab591110b265-vxsld:~/code$ ls -lh infer
total 25M
-rw-r--r-- 1 org-dev domainusers  180 Nov 21 18:28 eval_result.json
drwxr-xr-x 2 org-dev domainusers 4.0K Nov 21 18:28 export
-rw-r--r-- 1 org-dev domainusers 295K Nov 21 18:28 predictions.json
-rw-r--r-- 1 org-dev domainusers 2.5K Nov 21 18:28 serve_desc.template
-rw-r--r-- 1 org-dev domainusers 2.5K Nov 21 18:37 serve_desc.yaml
-rw-r--r-- 1 org-dev domainusers  13M Nov 21 18:28 ssd.air
-rw------- 1 org-dev domainusers  12M Nov 21 18:37 ssd.om
drwxr-xr-x 2 org-dev domainusers 4.0K Nov 21 18:28 transformer
git config --global http.postBuffer 524288000

Did the same on https://gitlab.apulis.com.cn/ran.lu/psutil.git, also adding this zip file, and the problem did not occur. ( https://gitlab.apulis.com.cn/ran.lu/psutil/-/tree/feat/test-add-zip )

[pid 523107] 14:11:09.606926 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=17, si_uid=1000} ---

Edit the config file to add logging settings:

cd /data/aistudio/nfs/apulis/pvc/aiplatform-gitea-data
ls gitea/conf/
app.ini

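The configuration does not seem to take effect.

After changing the secret, the configuration took effect. However, no new useful logs appeared.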


ssh process: root 1039605 0.0 0.0 4292 3384 ? Ss 15:19 0:00 sshd: /usr/sbin/sshd -D -e [listener] 0 of 10-100 startups

strace -f  -T -tt -e trace=all -p  1039605
[pid 1207696] 15:46:28.089729 write(3, "$\207)(\356\327\306\267\271\256\v\233\255S\351\352\376[\324\330\370\32\254\332\360\216\312\252\33\220\260\336"..., 4096) = -1 EBADF (Bad file descriptor) <0.000053>
[pid 1207696] 15:46:28.089936 write(2, "error: file write error: Bad fil"..., 45) = 45 <0.000046>
[pid 1207438] 15:46:28.090051 <... read resumed> "error: file write error: Bad fil"..., 128) = 45 <9.879668>
[pid 1207696] 15:46:28.090106 write(2, "fatal: unable to write loose obj"..., 41 <unfinished ...>
[pid 1207438] 15:46:28.090167 write(1, "0032\2", 5 <unfinished ...>

Found 2 occurrences of EBADF:

ioctl(-1, TIOCGPGRP, 0xfffffebfa84c) = -1 EBADF (Bad file descriptor) <0.000029>

pid: 1237051

This file descriptor:

1237051 15:51:01.193577 openat(AT_FDCWD, "/data/git/gitea-repositories/gitea/ouc68z1ntic02v7z0mtabg.git/./objects/incoming-oEoffE/b3/tmp_obj_XwYObm", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0444) = 3 <0.033109>
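Finding the failing calls by eye in a large strace log is tedious. A small filter can pull out every syscall that returned -1, optionally restricted to one errno. This is a sketch under the assumption that the log uses the line shapes seen above (optional `[pid N]` or bare pid prefix, optional `-tt` timestamp); the names `LINE_RE` and `failed_syscalls` are made up for illustration, and unfinished/resumed lines are simply ignored.

```python
import re

# Matches e.g. "[pid 123] 15:46:28.089729 write(...) = -1 EBADF (...)".
LINE_RE = re.compile(
    r'^(?:\[pid\s+(?P<pid1>\d+)\]\s+|(?P<pid2>\d+)\s+)?'
    r'(?:(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+)?'
    r'(?P<syscall>\w+)\((?P<args>.*)\)\s+=\s+-1\s+(?P<errno>E[A-Z0-9]+)\b'
)


def failed_syscalls(lines, errno_name=None):
    """Yield (pid, syscall, errno) for each failed syscall line.

    pid is None when the line carries no pid prefix.
    """
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        if errno_name is not None and match.group('errno') != errno_name:
            continue
        pid = match.group('pid1') or match.group('pid2')
        yield (int(pid) if pid else None,
               match.group('syscall'),
               match.group('errno'))


# Lines taken from this page; the write() buffer is abbreviated to "...".
SAMPLE = [
    '[pid 1207696] 15:46:28.089729 write(3, "..."..., 4096)'
    ' = -1 EBADF (Bad file descriptor) <0.000053>',
    'ioctl(-1, TIOCGPGRP, 0xfffffebfa84c)'
    ' = -1 EBADF (Bad file descriptor) <0.000029>',
]

for hit in failed_syscalls(SAMPLE, 'EBADF'):
    print(hit)
```

Against a saved log such as tmp_strace.log, the same function could be fed `open('tmp_strace.log')` directly, since it only iterates over lines.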

EBADF fd is not a valid file descriptor or is not open for writing.

https://linux.die.net/man/2/write
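The man-page condition can be demonstrated locally: writing to a descriptor that was opened read-only fails with EBADF. The sketch below only illustrates that condition; it does not reproduce the failure on the server, where the openat line above shows the descriptor was opened with O_RDWR. The function name is hypothetical.

```python
import errno
import os
import tempfile


def write_errno_on_readonly_fd():
    """Write to a descriptor opened with O_RDONLY and return the errno."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    ro_fd = os.open(path, os.O_RDONLY)
    try:
        os.write(ro_fd, b'data')
        return 0  # Unexpected: the write succeeded.
    except OSError as exc:
        return exc.errno
    finally:
        os.close(ro_fd)
        os.remove(path)


print(errno.errorcode.get(write_errno_on_readonly_fd()))
```

Since git opened the object file for writing, an EBADF on that descriptor points away from a simple flag mistake and toward the layer underneath it.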

mmap
void *mmap(void *addr, size_t length, int prot, int flags,
           int fd, off_t offset);
int munmap(void *addr, size_t length);
1237051 15:51:01.226765 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff82a9c000 <0.000043>
1237051 15:51:01.226908 mmap(NULL, 69632, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff7f790000 <0.000027>
1237051 15:51:01.227002 mmap(NULL, 69632, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff7f77f000 <0.000025>
1237051 15:51:01.227093 mmap(NULL, 69632, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff7f76e000 <0.000025>
1237051 15:51:01.227182 mmap(NULL, 69632, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff7f75d000 <0.000025>

These add up to 286720
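The sum can be checked from the length arguments of the five mmap calls above:

```python
# Length arguments of the five anonymous mmap calls in the strace output.
mmap_lengths = [8192, 69632, 69632, 69632, 69632]
total = sum(mmap_lengths)
print(total)  # 286720
```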

67159 lines, 76406 lines

Probably not a matter of exceeding a line-count limit…

NFS is also a suspect, because dmesg contains many error logs.

https://www.redhat.com/sysadmin/using-nfsstat-nfsiostat
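When NFS is a suspect, a first step is confirming which filesystem actually backs the repository directory. On Linux this can be read from /proc/mounts, whose lines have the form `device mount_point fs_type options ...`. The helper below is a sketch: `fs_type_for` is a hypothetical name, the mount table in the comment is invented sample data, and escaped characters in mount points (such as `\040` for a space) are not handled.

```python
import os


def fs_type_for(path, mounts_text):
    """Return (mount_point, fs_type) for the deepest mount containing path.

    Returns None if no mount point matches.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip('/') + '/'
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point win (it is on top).
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


# Hypothetical mount table, for illustration only:
#   /dev/sda1 / ext4 rw 0 0
#   192.0.2.10:/export /data/aistudio/nfs nfs4 rw 0 0
```

On the node itself, something like `fs_type_for('/data/git/gitea-repositories', open('/proc/mounts').read())` run inside the gitea container would report whether the object directory sits on an nfs or nfs4 mount.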

Reading the gitea code


var (
	allowedCommands = map[string]models.AccessMode{
		"git-upload-pack":    models.AccessModeRead,
		"git-upload-archive": models.AccessModeRead,
		"git-receive-pack":   models.AccessModeWrite,
		lfsAuthenticateVerb:  models.AccessModeNone,
	}
	// excerpt; any further declarations in this block are not shown
)

m.PostOptions("/git-receive-pack", repo.ServiceReceivePack)

git-receive-pack - Receive what is pushed into the repository. Invoked by git send-pack and updates the repository with the information fed from the remote end. This command is usually not invoked directly by the end user.

Indeed, git-receive-pack is invoked:

root@ubuntu:~# cat tmp_strace.log | grep -i receive
1236483 15:50:56.633331 newfstatat(AT_FDCWD, "/bin/git-receive-pack", 0x400162c338, 0) = -1 ENOENT (No such file or directory) <0.000029>
1236483 15:50:56.633420 newfstatat(AT_FDCWD, "/usr/bin/git-receive-pack",  <unfinished ...>
1236569 15:50:56.640285 execve("/usr/bin/git-receive-pack", ["git-receive-pack", "gitea/ouc68z1ntic02v7z0mtabg.git"], 0x400167ec30 /* 24 vars */ <unfinished ...>
1236459 15:51:03.606437 write(2, "Received disconnect from 172.20."..., 76) = 76 <0.000016>

Fei-ge switched the data volume to hostPath, after which git push with large files works:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/edge
            operator: DoesNotExist
  containers:
  - env:
    - name: SSH_LISTEN_PORT
      value: "22"
    - name: SSH_PORT
      value: "22"
    - name: GITEA_APP_INI
      value: /data/gitea/conf/app.ini
    - name: GITEA_CUSTOM
      value: /data/gitea
    - name: GITEA_WORK_DIR
      value: /data
    - name: GITEA_TEMP
      value: /tmp/gitea
    - name: TMPDIR
      value: /tmp/gitea
    - name: POSTGRES_PASSWORD
      value: vault:secret/data/postgres#POSTGRES_PASSWORD
    image: harbor.internal.cn:8443/internal/aistudio/infra/gitea:aistudio-v1.7.1-rc0
    imagePullPolicy: Always
    livenessProbe:
      failureThreshold: 10
      initialDelaySeconds: 200
      periodSeconds: 10
      successThreshold: 1
      tcpSocket:
        port: http
      timeoutSeconds: 1
    name: gitea
    ports:
    - containerPort: 22
      name: ssh
      protocol: TCP
    - containerPort: 3000
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      initialDelaySeconds: 5
      periodSeconds: 10
      successThreshold: 1
      tcpSocket:
        port: http
      timeoutSeconds: 1
    resources: {}
    securityContext: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /tmp
      name: temp
    - mountPath: /data
      name: data
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-xjsk2
      readOnly: true
  - args:
    - proxy
    - sidecar
    - --domain
    - $(POD_NAMESPACE).svc.cluster.local
    - --serviceCluster
    - gitea.$(POD_NAMESPACE)
    - --proxyLogLevel=warning
    - --proxyComponentLogLevel=misc:error
    - --log_output_level=default:info
    - --concurrency
    - "2"
    env:
    - name: JWT_POLICY
      value: first-party-jwt
    - name: PILOT_CERT_PROVIDER
      value: istiod
    - name: CA_ADDR
      value: istiod.istio-system.svc:15012
    - name: POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: INSTANCE_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: SERVICE_ACCOUNT
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.serviceAccountName
    - name: HOST_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.hostIP
    - name: CANONICAL_SERVICE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.labels['service.istio.io/canonical-name']
    - name: CANONICAL_REVISION
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.labels['service.istio.io/canonical-revision']
    - name: PROXY_CONFIG
      value: |
        {}
    - name: ISTIO_META_POD_PORTS
      value: |-
        [
            {"name":"ssh","containerPort":22,"protocol":"TCP"}
            ,{"name":"http","containerPort":3000,"protocol":"TCP"}
        ]
    - name: ISTIO_META_APP_CONTAINERS
      value: gitea
    - name: ISTIO_META_CLUSTER_ID
      value: Kubernetes
    - name: ISTIO_META_INTERCEPTION_MODE
      value: REDIRECT
    - name: ISTIO_METAJSON_ANNOTATIONS
      value: |
        {"checksum/config":"f274be7bf3c100a034ac4b3fc396fe970118128b1b3d912a9cbc07f65bcc52d5","checksum/ldap":"00b7af41c86021efd76987f55a6e6aa17a497e98f2a48b9c2f71d5c0295ed342","checksum/oauth":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855","traffic.sidecar.istio.io/excludeInboundPorts":"22","traffic.sidecar.istio.io/excludeOutboundPorts":"22","vault.security.banzaicloud.io/vault-addr":"https://vault.kube-system:8200","vault.security.banzaicloud.io/vault-tls-secret":"vault-tls"}
    - name: ISTIO_META_WORKLOAD_NAME
      value: gitea
    - name: ISTIO_META_OWNER
      value: kubernetes://apis/apps/v1/namespaces/apulis/statefulsets/gitea
    - name: ISTIO_META_MESH_ID
      value: cluster.local
    - name: TRUST_DOMAIN
      value: cluster.local
    image: harbor.internal.cn:8443/internal/istio/proxyv2:1.9.4
    imagePullPolicy: Always
    name: istio-proxy
    ports:
    - containerPort: 15090
      name: http-envoy-prom
      protocol: TCP
    readinessProbe:
      failureThreshold: 30
      httpGet:
        path: /healthz/ready
        port: 15021
        scheme: HTTP
      initialDelaySeconds: 1
      periodSeconds: 2
      successThreshold: 1
      timeoutSeconds: 3
    resources:
      limits:
        cpu: "2"
        memory: 1Gi
      requests:
        cpu: 100m
        memory: 128Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      privileged: false
      readOnlyRootFilesystem: true
      runAsGroup: 1337
      runAsNonRoot: true
      runAsUser: 1337
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/istio
      name: istiod-ca-cert
    - mountPath: /var/lib/istio/data
      name: istio-data
    - mountPath: /etc/istio/proxy
      name: istio-envoy
    - mountPath: /etc/istio/pod
      name: istio-podinfo
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-xjsk2
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: gitea-0
  initContainers:
  - command:
    - /usr/sbin/init_directory_structure.sh
    env:
    - name: GITEA_APP_INI
      value: /data/gitea/conf/app.ini
    - name: GITEA_CUSTOM
      value: /data/gitea
    - name: GITEA_WORK_DIR
      value: /data
    - name: GITEA_TEMP
      value: /tmp/gitea
    - name: POSTGRES_PASSWORD
      value: vault:secret/data/postgres#POSTGRES_PASSWORD
    image: harbor.internal.cn:8443/internal/aistudio/infra/gitea:aistudio-v1.7.1-rc0
    imagePullPolicy: IfNotPresent
    name: init-directories
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /usr/sbin
      name: init
    - mountPath: /tmp
      name: temp
    - mountPath: /etc/gitea/conf
      name: config
    - mountPath: /data
      name: data
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-xjsk2
      readOnly: true
  - command:
    - /usr/sbin/configure_gitea.sh
    env:
    - name: GITEA_APP_INI
      value: /data/gitea/conf/app.ini
    - name: GITEA_CUSTOM
      value: /data/gitea
    - name: GITEA_WORK_DIR
      value: /data
    - name: GITEA_TEMP
      value: /tmp/gitea
    - name: GITEA_ADMIN_USERNAME
      value: gitea
    - name: GITEA_ADMIN_PASSWORD
      value: BD2IHvf1jEVuK5J6
    - name: POSTGRES_PASSWORD
      value: vault:secret/data/postgres#POSTGRES_PASSWORD
    image: harbor.internal.cn:8443/internal/aistudio/infra/gitea:aistudio-v1.7.1-rc0
    imagePullPolicy: IfNotPresent
    name: configure-gitea
    resources: {}
    securityContext:
      runAsUser: 1000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /usr/sbin
      name: init
    - mountPath: /tmp
      name: temp
    - mountPath: /data
      name: data
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-xjsk2
      readOnly: true
  - args:
    - istio-iptables
    - -p
    - "15001"
    - -z
    - "15006"
    - -u
    - "1337"
    - -m
    - REDIRECT
    - -i
    - '*'
    - -x
    - ""
    - -b
    - '*'
    - -d
    - 15090,15021,22,15020
    - -o
    - "22"
    image: harbor.internal.cn:8443/internal/istio/proxyv2:1.9.4
    imagePullPolicy: Always
    name: istio-init
    resources:
      limits:
        cpu: "2"
        memory: 1Gi
      requests:
        cpu: 100m
        memory: 128Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
        drop:
        - ALL
      privileged: false
      readOnlyRootFilesystem: false
      runAsGroup: 0
      runAsNonRoot: false
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-xjsk2
      readOnly: true
  nodeName: 192.3.21.217
  nodeSelector:
    series: a310
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1337
  serviceAccount: default
  serviceAccountName: default
  subdomain: gitea
  terminationGracePeriodSeconds: 60
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 100
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 100
  volumes:
  - emptyDir:
      medium: Memory
    name: istio-envoy
  - emptyDir: {}
    name: istio-data
  - downwardAPI:
      defaultMode: 420
      items:
      - fieldRef:
          apiVersion: v1
          fieldPath: metadata.labels
        path: labels
      - fieldRef:
          apiVersion: v1
          fieldPath: metadata.annotations
        path: annotations
      - path: cpu-limit
        resourceFieldRef:
          containerName: istio-proxy
          divisor: 1m
          resource: limits.cpu
      - path: cpu-request
        resourceFieldRef:
          containerName: istio-proxy
          divisor: 1m
          resource: requests.cpu
    name: istio-podinfo
  - configMap:
      defaultMode: 420
      name: istio-ca-root-cert
    name: istiod-ca-cert
  - name: init
    secret:
      defaultMode: 511
      secretName: gitea-init
  - name: config
    secret:
      defaultMode: 420
      secretName: gitea
  - emptyDir: {}
    name: temp
  - hostPath:
      path: /opt/gitea
      type: ""
    name: data
  - name: default-token-xjsk2
    secret:
      defaultMode: 420
      secretName: default-token-xjsk2

Script used to try to reproduce the problem

import os
import time


def write_pid_to_pidfile(pidfile_path):
    """Write the PID to the named file, then keep appending data to it.

    Get the numeric process ID ("PID") of the current process and write
    it to the named file as a line of text. The open flags and mode
    resemble the openat call seen in the strace output above.
    """
    # Remove a leftover file from a previous run; O_EXCL would fail otherwise.
    try:
        os.remove(pidfile_path)
    except FileNotFoundError:
        pass

    # O_LARGEFILE is Linux-specific; fall back to 0 where it is not defined.
    open_flags = (os.O_CREAT | os.O_EXCL | os.O_WRONLY
                  | getattr(os, 'O_LARGEFILE', 0))
    open_mode = 0o444
    pidfile_fd = os.open(pidfile_path, open_flags, open_mode)

    # According to the FHS 2.3 section on PID files in /var/run:
    #
    #   The file must consist of the process identifier in
    #   ASCII-encoded decimal, followed by a newline character. For
    #   example, if crond was process number 25, /var/run/crond.pid
    #   would contain three characters: two, five, and newline.

    # The with-block guarantees the file is closed even if a write fails.
    with os.fdopen(pidfile_fd, 'w') as pidfile:
        pid = os.getpid()
        pidfile.write("%s\n" % pid)
        # 33 batches of 1000 x 4096 bytes (about 135 MB), one per second.
        for i in range(33):
            for rep in range(1000):
                pidfile.write('a' * 4096)
            print('i:', i)
            time.sleep(1)


write_pid_to_pidfile('test')
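The script above writes through Python's buffered file object, so a failing syscall may surface far from the write that caused it. A variant that uses raw `os.write`, followed by an explicit `os.fsync` and `os.close`, reports errors at the exact call. This matters on NFS, where errors from earlier writes may only be reported at fsync or close, which would be consistent with the "error when closing loose object file: I/O error" message seen earlier. This is a sketch, not the script that was actually run; `write_with_checks` and its parameters are hypothetical.

```python
import os


def write_with_checks(path, chunks=4, chunk_size=4096):
    """Create `path` exclusively, write the chunks, fsync, and close.

    Returns the number of bytes written. Any OSError from write, fsync,
    or close propagates to the caller with its errno intact.
    """
    flags = (os.O_CREAT | os.O_EXCL | os.O_RDWR
             | getattr(os, 'O_LARGEFILE', 0))
    fd = os.open(path, flags, 0o444)
    written = 0
    try:
        payload = b'a' * chunk_size
        for _ in range(chunks):
            view = memoryview(payload)
            # os.write may write fewer bytes than requested; loop until done.
            while len(view) > 0:
                n = os.write(fd, view)
                view = view[n:]
                written += n
        # Force data to the server so deferred errors show up here.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written
```

Running it once against a path on the NFS-backed volume and once against the hostPath directory, with `chunks` large enough to match the failing push, would show whether the error depends on the storage backend alone.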

A successful git push:

root@nfs-nginx-5d98bcd8cb-hzgks:/tmp/ouc68z1ntic02v7z0mtabg# git push
root@127.0.0.1's password: 
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 192 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 70.02 MiB | 17.23 MiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
To 127.0.0.1:/data/git/gitea-repositories/ouc68z1ntic02v7z0mtabg.git
   bdb2b02..ac8d7bd  master -> master

Caused by the git version?

git in the Code Development pod is version 2.17; the server side is version 2.30.

But would a version mismatch explain why hostPath works while NFS fails?
