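# 5. GR00T N1 Inference Test

## 5.1 GR00T N1 Overview

[NVIDIA GR00T N1](https://developer.nvidia.com/gr00t) is a **Vision-Language-Action (VLA) model** for general-purpose humanoid robots. Whereas the reinforcement-learning policy (PPO) trained in the previous module is a controller specialized for a single task, GR00T N1 is a **foundation model** that takes natural-language instructions and camera images as input and directly generates robot joint commands.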

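**RL Policy vs. VLA Model:**

| Aspect         | RL Policy (PPO)                                                    | VLA Model (GR00T N1)                                              |
| -------------- | ------------------------------------------------------------------ | ----------------------------------------------------------------- |
| Input          | Low-level state such as joint states, IMU, and contact sensors     | Camera images + natural-language instruction + joint states       |
| Output         | Joint torque/position commands                                     | Joint position commands (action horizon)                          |
| Task scope     | Single task (e.g., walking)                                        | General-purpose (varied tasks specified in language)              |
| Training       | Reinforcement learning in simulation                               | Pretraining on large-scale robot demonstration data + fine-tuning |
| Model size     | \~1 MB (MLP scale)                                                 | 3B parameters (\~6 GB)                                            |
| Inference rate | Hundreds of Hz                                                     | 20\~27 Hz                                                         |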

**GR00T N1 architecture:**

```mermaid
flowchart TB
    subgraph Input["Input (Multimodal)"]
        V["Vision Encoder\n(camera image 256×256)"]
        L["Language Encoder\n(natural-language instruction)"]
        P["Proprioception Encoder\n(joint states)"]
    end

    T["Transformer Backbone\n(3B params)"]
    D["Action Decoder\n(action horizon = 16 steps)"]
    Out["Robot joint command output\n(left_arm, right_arm, left_hand, right_hand, waist)"]

    V --> T
    L --> T
    P --> T
    T --> D
    D --> Out
```

**Inference server structure (ZMQ REQ/REP):**

```mermaid
sequenceDiagram
    participant C as Client (robot controller)
    participant S as GR00T Server (GPU, TCP:5555)

    C->>S: REQ: observation
    Note right of C: video: camera image (256×256 RGB)<br/>state: joint states (arm, hand, waist)<br/>language: natural-language task instruction
    Note over S: Model inference (3B params)
    S-->>C: REP: action (16-step horizon)
    Note left of S: left_arm: (1,16,7)<br/>right_arm: (1,16,7)<br/>left_hand: (1,16,6)<br/>right_hand: (1,16,6)<br/>waist: (1,16,3)
    Note over C: Execute the first few steps<br/>then re-request with a new observation
```

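In a single inference call, the model outputs **16 steps of future actions** (the action horizon). The robot controller executes only the first few of these steps and then requests a new inference with fresh observation data, a scheme known as receding horizon control.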
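
The receding-horizon loop described above can be sketched as follows. This is a minimal illustration, not the workshop's controller: `run_receding_horizon`, its three callbacks, and the choice of executing 4 of the 16 steps per cycle are all assumptions made for the example, and the stub policy returns zeros instead of calling the server.

```python
from typing import Callable, Dict, List

# One action chunk: joint group name -> list of timesteps -> joint targets
ActionChunk = Dict[str, List[List[float]]]


def run_receding_horizon(
    get_observation: Callable[[], dict],
    query_policy: Callable[[dict], ActionChunk],
    send_command: Callable[[Dict[str, List[float]]], None],
    n_execute: int = 4,  # assumed value for "the first few steps"
    n_cycles: int = 3,
) -> int:
    """Execute only the first n_execute steps of each chunk, then re-query."""
    executed = 0
    for _ in range(n_cycles):
        chunk = query_policy(get_observation())    # one inference call
        horizon = len(next(iter(chunk.values())))  # number of steps in the chunk
        for t in range(min(n_execute, horizon)):
            send_command({group: steps[t] for group, steps in chunk.items()})
            executed += 1
        # The rest of the chunk is dropped; the next cycle uses a fresh observation.
    return executed


def stub_policy(observation: dict) -> ActionChunk:
    """Stand-in for the server: a 16-step chunk of zeros for two joint groups."""
    return {
        "left_arm": [[0.0] * 7 for _ in range(16)],
        "waist": [[0.0] * 3 for _ in range(16)],
    }


sent: List[Dict[str, List[float]]] = []
total = run_receding_horizon(lambda: {}, stub_policy, sent.append)
print(total)  # 12 (3 cycles x 4 steps)
```

In a real client, `query_policy` would wrap the ZMQ request to the server and strip the batch dimension from each returned array before the steps are indexed.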

***

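## 5.2 What CDK Configures Automatically

This workshop's CDK stack (`infra-multiuser-groot`) sets up the GR00T inference environment **automatically** through a UserData script when the EC2 instance is provisioned. If the `grootRepoUrl` parameter is specified at deploy time, the `groot.sh` script runs.

**Automatic setup flow:**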

```mermaid
flowchart TB
    A["CDK Deploy\n(grootRepoUrl specified)"] --> B["EC2 UserData runs\n(at instance boot)"]
    B --> C["1. Download GR00T-N1.6-3B\nfrom HuggingFace → EFS"]
    B --> D["2. Clone Isaac-GR00T repo\n+ auto-generate Dockerfile"]
    D --> E["3. groot-docker-build.service\n(docker build -t groot-n1:latest)"]
    E --> F["4. groot-inference.service\n(docker run --gpus all -p 5555)"]
    F --> G["localhost:5555\nInference server starts automatically"]

    style G fill:#d4edda,stroke:#28a745
```

**Auto-generated Dockerfile:**

```dockerfile
# NVIDIA PyTorch base image (~15GB)
FROM nvcr.io/nvidia/pytorch:25.04-py3
ENV DEBIAN_FRONTEND=noninteractive
ENV PIP_CONSTRAINT=""
# Copy the Isaac-GR00T SDK
COPY gr00t/ /workspace/gr00t/
WORKDIR /workspace/gr00t
# Install the gr00t package (with dependencies)
RUN pip install --no-cache-dir -e .
# ZMQ server port
EXPOSE 5555
ENTRYPOINT ["python"]
# Start the inference server
CMD ["gr00t/eval/run_gr00t_server.py"]
```

**Inference server launch command (run automatically by systemd):**

```bash
docker run --rm --gpus all --name groot-inference \
  --network=host \
  -v /home/ubuntu/environment/efs:/workspace/weights \
  groot-n1:latest \
  gr00t/eval/run_gr00t_server.py \
    --model_path /workspace/weights/GR00T-N1.6-3B \
    --embodiment_tag GR1 \
    --host 0.0.0.0 \
    --port 5555
```

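| Option                          | Description                                                                         |
| ------------------------------- | ----------------------------------------------------------------------------------- |
| `--gpus all`                    | Assigns all GPUs to the container (for model inference)                             |
| `--network=host`                | Shares the host network stack, so the ZMQ server listens directly on host port 5555 |
| `-v .../efs:/workspace/weights` | Mounts the model weights on EFS into the container                                  |
| `--model_path`                  | Model weights directory (path inside the container)                                 |
| `--embodiment_tag GR1`          | Runs inference for the GR1 robot's joint configuration                              |

As a result, once the CDK deployment completes and the instance boots, the GR00T inference server runs automatically on port 5555 **with no additional installation**.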

{% hint style="info" %}
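The Docker build takes 5\~10 minutes and the model download takes 2\~3 minutes. The build may still be in progress right after you connect to the instance, so check whether it has finished with `systemctl status groot-docker-build.service`.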
{% endhint %}
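
One way to wait for the server without re-running the status command by hand is to poll the port until it accepts TCP connections. The sketch below uses only the Python standard library; `wait_for_port` and its default timeout are assumptions for illustration, and an open port only shows that a process is listening, not that an inference request will succeed.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 900.0, interval_s: float = 5.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError while nothing is listening
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            time.sleep(min(interval_s, remaining))
```

Calling `wait_for_port("localhost", 5555)` on the instance blocks until the port opens or the timeout expires.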

***

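## 5.3 Environment Setup (Manual Installation Without CDK Auto-Deployment)

If you deployed with CDK, the installation below is already done, so skip ahead to section 5.4. Follow this section only if you want to install GR00T manually on a separate instance.

### 5.3.1 Install uv (Python package manager)

[uv](https://docs.astral.sh/uv/) is a fast Python package manager written in Rust. The Isaac-GR00T repository uses `uv` to manage its dependencies. It installs packages 10\~100x faster than `pip`.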

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Apply the change to the current shell
source $HOME/.local/bin/env

# Verify the installation
uv --version
```

### 5.3.2 Clone the Isaac-GR00T repository

```bash
cd /home/ubuntu/environment
git clone https://github.com/NVIDIA/Isaac-GR00T.git
cd Isaac-GR00T
```

### 5.3.3 Install Python dependencies

`uv` reads the project's `pyproject.toml`, creates a virtual environment automatically, and installs the dependencies. You do not need to create a `venv` yourself.

```bash
# Sync project dependencies (creates the virtual environment on first run)
uv sync

# Verify the installation
uv run python -c "import gr00t; print('GR00T SDK loaded')"
```

### 5.3.4 Download the model weights

If the model weights are not already on EFS, download them from HuggingFace (\~6 GB).

```bash
# Install the HuggingFace CLI and log in
# (run from the Isaac-GR00T directory so the project virtual environment is used)
uv pip install huggingface_hub
uv run huggingface-cli login

# Download the model
uv run python -c "
from huggingface_hub import snapshot_download
snapshot_download(
    'nvidia/GR00T-N1.6-3B',
    local_dir='/home/ubuntu/environment/efs/GR00T-N1.6-3B'
)
"

# Verify the download (~6GB)
du -sh /home/ubuntu/environment/efs/GR00T-N1.6-3B/
```

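### 5.3.5 ZMQ client dependencies

Communicating with the inference server requires `pyzmq`, `msgpack`, and `numpy`, but these are already among the Isaac-GR00T project's dependencies, so no separate installation is needed. Scripts run with `uv run` use them automatically from the project's virtual environment.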

```bash
# Running with uv run from the Isaac-GR00T directory resolves dependencies automatically
cd /home/ubuntu/environment/Isaac-GR00T
uv run python -c "import zmq, msgpack, numpy; print('OK')"
```

{% hint style="info" %}
On Ubuntu 24.04, PEP 668 blocks `pip install` into the system Python. `uv run` executes in a per-project virtual environment, which avoids this restriction. Use `pip3 install --break-system-packages` only when a system-wide installation is truly necessary.
{% endhint %}

***

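## 5.4 Running the Inference Server

You can run the GR00T inference server either as a Docker container or directly from the repository.

### Method A: systemd service (configured automatically by the CDK deployment)

If you deployed with CDK, `groot-inference.service` starts automatically.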

```bash
# Check the service status
sudo systemctl status groot-inference.service

# Start the service if it is not running
sudo systemctl start groot-inference.service

# Follow the logs
sudo journalctl -u groot-inference.service -f
```

### Method B: Run directly from the Isaac-GR00T repository

```bash
cd /home/ubuntu/environment/Isaac-GR00T

uv run python gr00t/eval/run_gr00t_server.py \
  --embodiment-tag GR1 \
  --model-path /home/ubuntu/environment/efs/GR00T-N1.6-3B \
  --device cuda:0 \
  --host 0.0.0.0 \
  --port 5555
```

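**Key parameters:**

| Parameter              | Description                                                       |
| ---------------------- | ----------------------------------------------------------------- |
| `--embodiment-tag GR1` | Robot hardware type; determines the joint layout and action space |
| `--model-path`         | Path to the model weights directory                               |
| `--device cuda:0`      | GPU used for inference; specify the index on multi-GPU machines   |
| `--host 0.0.0.0`       | Accept connections on all network interfaces                      |
| `--port 5555`          | ZMQ REP socket port                                               |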

### Verifying that the server is running

```bash
# Check that the port is listening
ss -tlnp | grep 5555

# If the server is running under Docker
docker ps | grep groot
```

***

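## 5.5 Inference Test

In the CDK deployment, GR00T runs inside a Docker container and the Python packages are not installed on the host. You therefore either **test from inside a Docker container** or install only the minimum packages on the host.

### 5.5.1 Ping test (connectivity check)

Quickly check that the server responds.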

<details>

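<summary><strong>Method A: Test from inside a GR00T container (recommended for the CDK deployment)</strong></summary>

In the CDK deployment, the `groot-n1` image already contains all dependencies. Start a separate container from it to run the test.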

```bash
# Run a test container from the groot-n1 image (sharing the host network)
docker run --rm --network=host groot-n1:latest -c "
import zmq, msgpack
ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.connect('tcp://localhost:5555')
sock.send(msgpack.packb({'endpoint': 'ping'}))
print('Server response:', msgpack.unpackb(sock.recv(), raw=False))
"
```

Because `--network=host` shares the host network, the container can reach the inference server at `localhost:5555`.

</details>

<details>

<summary><strong>Method B: Test directly from the host (manual installation / using uv)</strong></summary>

If you have cloned the Isaac-GR00T repository and installed uv:

```bash
cd /home/ubuntu/environment/Isaac-GR00T
uv run python -c "
import zmq, msgpack
ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.connect('tcp://localhost:5555')
sock.send(msgpack.packb({'endpoint': 'ping'}))
print('Server response:', msgpack.unpackb(sock.recv(), raw=False))
"
```

</details>

**Expected output:**

```
Server response: {'status': 'ok', 'message': 'Server is running'}
```

### 5.5.2 Dummy-data inference test

Send random observation data to verify that the model returns actions with the correct shapes. A test script is provided at `gr00t-inference/test_inference.py` in the workshop repository.

```bash
cd gr00t-inference
uv run python test_inference.py
```

**Expected output** (the script prints its status message in Korean; `추론 성공!` means "Inference succeeded!"):

```
추론 성공! Action keys: ['left_arm', 'right_arm', 'left_hand', 'right_hand', 'waist']
  left_arm: shape=(1, 16, 7)
  right_arm: shape=(1, 16, 7)
  left_hand: shape=(1, 16, 6)
  right_hand: shape=(1, 16, 6)
  waist: shape=(1, 16, 3)
```

Each action's shape `(1, 16, N)` means `(batch, action_horizon, joints)`. The server returns target joint positions for 16 timesteps.
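
The `(batch, action_horizon, joints)` layout can be indexed directly to obtain one command per timestep. The sketch below concatenates the five joint groups into a single vector for a given step; the helper name `flatten_step` and the concatenation order are assumptions for illustration, and nested lists stand in for the arrays the server returns.

```python
from typing import Dict, List

# Joint counts per group, taken from the shapes in the expected output
DIMS = {"left_arm": 7, "right_arm": 7, "left_hand": 6, "right_hand": 6, "waist": 3}


def flatten_step(action: Dict[str, list], t: int, batch: int = 0) -> List[float]:
    """Concatenate all joint groups at timestep t into one flat command vector."""
    vec: List[float] = []
    for group in DIMS:  # assumed ordering; dicts keep insertion order
        vec.extend(float(x) for x in action[group][batch][t])
    return vec


# Dummy action shaped (1, 16, N) per group; every value at step t equals t
dummy = {g: [[[float(t)] * n for t in range(16)]] for g, n in DIMS.items()}

step3 = flatten_step(dummy, 3)
print(len(step3))  # 29 (7 + 7 + 6 + 6 + 3)
```

The same indexing, `action[group][batch][t]`, also works on NumPy arrays, so the helper can be applied to a decoded server response without conversion.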

<details>

<summary><strong>test_inference.py source (for reference)</strong></summary>

```python
import zmq, msgpack, numpy as np, io

def encode_ndarray(obj):
    if isinstance(obj, np.ndarray):
        buf = io.BytesIO()
        np.save(buf, obj, allow_pickle=False)
        return {"__ndarray_class__": True, "as_npy": buf.getvalue()}
    return obj

def decode_ndarray(obj):
    if isinstance(obj, dict) and "__ndarray_class__" in obj:
        return np.load(io.BytesIO(obj["as_npy"]), allow_pickle=False)
    return obj

# ZMQ connection
ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.connect("tcp://localhost:5555")

# Build the observation (dummy data)
observation = {
    "video": {
        # Camera image: (batch=1, temporal=1, H=256, W=256, RGB=3)
        # randint's upper bound is exclusive, so 256 covers the full uint8 range
        "ego_view_bg_crop_pad_res256_freq20": np.random.randint(0, 256, (1, 1, 256, 256, 3), dtype=np.uint8)
    },
    "state": {
        # Current position of each joint group: (batch=1, temporal=1, joints)
        "left_arm": np.random.rand(1, 1, 7).astype(np.float32),
        "right_arm": np.random.rand(1, 1, 7).astype(np.float32),
        "left_hand": np.random.rand(1, 1, 6).astype(np.float32),
        "right_hand": np.random.rand(1, 1, 6).astype(np.float32),
        "waist": np.random.rand(1, 1, 3).astype(np.float32),
    },
    "language": {
        # Natural-language task instruction
        "task": [["pick up the cup"]]
    }
}

# Inference request
request = {"endpoint": "get_action", "data": {"observation": observation}}
sock.send(msgpack.packb(request, default=encode_ndarray))
response = msgpack.unpackb(sock.recv(), raw=False, object_hook=decode_ndarray)

# Check the result (messages stay in Korean to match the expected output shown above)
if isinstance(response, list):
    action = response[0]
    print("추론 성공! Action keys:", list(action.keys()))  # "Inference succeeded!"
    for key in action:
        print(f"  {key}: shape={np.array(action[key]).shape}")
elif isinstance(response, dict) and "error" in response:
    print("에러:", response["error"])  # "Error:"
else:
    print("예상치 못한 응답:", type(response))  # "Unexpected response:"
```

</details>

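### 5.5.3 Inference test from an external machine

The GR00T inference server runs on a ZMQ TCP socket, so it can be called from a client on **any machine** with network connectivity. You need only [uv](https://docs.astral.sh/uv/) and the test scripts, with no Docker or Isaac-GR00T SDK installation.

**Prerequisites:**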

```bash
# Install uv (skip if already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
```

**Running the test:**

The workshop repository's `gr00t-inference/` directory contains the test scripts and a `pyproject.toml`. With `uv run`, virtual-environment creation and dependency installation are handled automatically.

```bash
cd gr00t-inference

# Test a local server (from the same instance)
uv run python test_inference.py

# Test a remote server (from an external machine, passing the EC2 public IP)
uv run python test_inference_remote.py <EC2_PUBLIC_IP>
```

**Expected output** (the script prints its status message in Korean; `추론 성공!` means "Inference succeeded!"):

```
Ping: {'status': 'ok', 'message': 'Server is running'}
추론 성공! Action keys: ['left_arm', 'right_arm', 'left_hand', 'right_hand', 'waist']
  left_arm: shape=(1, 16, 7)
  right_arm: shape=(1, 16, 7)
  left_hand: shape=(1, 16, 6)
  right_hand: shape=(1, 16, 6)
  waist: shape=(1, 16, 3)
```

{% hint style="info" %}
You can find the EC2 instance's public IP in the AWS Console → EC2 → Instances. The Security Group's inbound rules must allow TCP port 5555.
{% endhint %}

### 5.5.4 Test with the official PolicyClient (manual installation)

This test uses the official client from the Isaac-GR00T repository. It is available only in an environment installed with `uv`.

```bash
cd /home/ubuntu/environment/Isaac-GR00T

uv run python -c "
from gr00t.policy.server_client import PolicyClient

policy = PolicyClient(host='localhost', port=5555)
if policy.ping():
    print('Connected to the GR00T server')
else:
    print('No response from the server')
"
```

## 5.6 Full Status Check (Quick Check)

```bash
echo "=== GPU ==="
nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv

echo "=== Docker Images ==="
docker images | grep -E 'groot|isaac'

echo "=== GR00T Services ==="
systemctl is-active groot-docker-build.service
systemctl is-active groot-inference.service

echo "=== Port 5555 ==="
ss -tlnp | grep 5555

echo "=== Model Weights ==="
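# Test for the directory first: in `ls ... | head || echo`, the fallback keys off head's exit status and never fires
[ -d /home/ubuntu/environment/efs/GR00T-N1.6-3B ] && ls /home/ubuntu/environment/efs/GR00T-N1.6-3B/ | head -5 || echo "NOT FOUND"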

echo "=== uv ==="
uv --version 2>/dev/null || echo "NOT INSTALLED"
```

### Full manual reinstall

```bash
# 1. Clean up the existing container, image, and completion marker
docker rm -f groot-inference 2>/dev/null
docker rmi groot-n1:latest 2>/dev/null
sudo rm -f /var/groot-done

# 2. Restart the build
sudo systemctl restart groot-docker-build.service

# 3. Watch progress (5~10 minutes to complete)
sudo journalctl -u groot-docker-build.service -f
```

***

## References

* [NVIDIA Isaac-GR00T GitHub](https://github.com/NVIDIA/Isaac-GR00T)
* [GR00T N1 model (HuggingFace)](https://huggingface.co/nvidia/GR00T-N1.6-3B)
* [uv official documentation](https://docs.astral.sh/uv/)

