• Topic ID: id_15460238
  • Version: 6.0
  • Date: Jun 15, 2020 11:00:32 PM

Host Computer (Z840) Recon GPU Card Replacement

Prerequisites

Overview

This procedure shall be followed when replacing the Recon GPU card in the Host Computer (Z840).

Figure 1. Recon GPU card

1 Host Computer Removal

Procedure

  1. Shut down the system. Use one of the following methods to power off the Console:
    • If Applications are up, click the Shut Down button on the desktop display and select Shutdown.

    • If Applications are down, open a Terminal window. Type: halt , then press ENTER.

    • When the halt command has finished, power off the Console at the front panel switch.

  2. Apply LOTO. See Equipment Service - Lockout-Tagout-PPE procedure.
  3. Remove Front and Top covers. Refer to the following procedure.

    NIO64: Refer to Replacement → Console (NIO64) → RIO / NIO64 Console Cover Removal and Installation

    GOC6.6: Refer to Replacement → Console (GOC6.6) → Console Cover Removal and Installation

  4. Remove the host computer from console chassis. Refer to the following procedure.

    NIO64: Refer to Replacement → Console (NIO64 Z840) → NIO64 Host Computer (Z840) Replacement

    GOC6.6: Refer to Replacement → Console (GOC6.6) → GOC6.6 VCT Host Computer (Z820/Z840) Replacement

2 Recon GPU Card Replacement

Procedure

  1. Open the host computer side access panel.
  2. Remove the Expansion Card Support.

    Figure 2. Z840 Airflow Guide and Expansion Card Support

  3. Disconnect the power cable from the Recon GPU card.
  4. Replace the existing Recon GPU card with the new one.
    Note:

    Lift the card latch when removing the card (see Figure 4).

    Figure 3. Z840 Component Location

    Figure 4. PCI card Latch

  5. Connect the power cable to the Recon GPU card.
  6. Install the Expansion Card Support and close the Side Access Panel.

3 Restore the Console

Procedure

  1. Install the host computer into the console chassis. Refer to the following procedure.

    NIO64: Refer to Replacement → Console (NIO64 Z840) → NIO64 Host Computer (Z840) Replacement

    GOC6.6: Refer to Replacement → Console (GOC6.6) → GOC6.6 VCT Host Computer (Z820/Z840) Replacement

  2. Reconnect all cables removed earlier to the Z820/Z840 computer.
  3. Install Console covers. Refer to the following procedure.

    NIO64: Refer to Replacement → Console (NIO64) → RIO/NIO64 Console Cover Removal and Installation

    GOC6.6: Refer to Replacement → Console (GOC6.6) → Console Cover Removal and Installation

  4. Remove LOTO from the console.

4 Finalization

Procedure

  1. Perform the Functional Checks → System Scanning Test instructions from the procedure list.
  2. Check that the GPU card is installed. Open a shell, then type:

    {ctuser@hostname} ls /proc/driver/nvidia/gpus | wc -l

    • If “2” displays, the Recon GPU card is installed.

    • If “1” displays, the Recon GPU card is not installed.

  3. Ensure GPU ECC state is ON:
    1. Open a Unix shell.
    2. To log on as root, type: su - [ENTER].
    3. Type the root password [ENTER].
    4. Type: nvidia-smi [ENTER]
    5. Check the GPU ECC status as follows:
      • If the GPU ECC is ON, the output looks like the following (boxed in green):

        Figure 5. GPU ECC - ON

      • If the GPU ECC is OFF, the output looks like the following (boxed in red):

        Figure 6. GPU ECC - OFF

    6. If the GPU ECC is OFF, turn it back on:
      • Type: nvidia-smi -g 0 --ecc-config=1 [ENTER]
      • The output will show that ECC is enabled and a reboot is required:

        Figure 7. Reboot Message

      • After the reboot, check that the ECC is ON by repeating the previous steps.
  4. Perform the Functional Checks → System Scanning Test instructions from the procedure list.
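
The GPU count check and the ECC check in the Finalization steps can also be combined into one script. The following is a minimal sketch, not part of the official procedure. The function names (count_gpus, report_gpu_count, list_ecc_off) are hypothetical, and it assumes the installed nvidia-smi supports the --query-gpu option with the ecc.mode.current field, which may not hold for every driver version. The figures shown for nvidia-smi in step 3 remain the reference for the ECC state.

```shell
#!/bin/sh
# Sketch only. Function names are hypothetical and not part of the service software.

GPU_DIR="${GPU_DIR:-/proc/driver/nvidia/gpus}"   # directory listed in step 2

# Print the number of entries under the given directory (0 if it is missing).
count_gpus() {
    if [ -d "$1" ]; then
        ls "$1" | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Translate the count into the outcome described in step 2.
report_gpu_count() {
    case "$1" in
        2) echo "Recon GPU card detected" ;;
        1) echo "Recon GPU card NOT detected" ;;
        *) echo "Unexpected GPU count: $1" ;;
    esac
}

# Read "index, mode" lines on stdin and print the index of every GPU
# whose ECC mode is anything other than "Enabled".
list_ecc_off() {
    awk -F', *' '$2 != "Enabled" { print $1 }'
}

report_gpu_count "$(count_gpus "$GPU_DIR")"

if command -v nvidia-smi >/dev/null 2>&1; then
    # Assumption: this nvidia-smi version supports --query-gpu and ecc.mode.current.
    modes=$(nvidia-smi --query-gpu=index,ecc.mode.current --format=csv,noheader)
    off=$(printf '%s\n' "$modes" | list_ecc_off)
    if [ -z "$modes" ]; then
        echo "nvidia-smi reported no GPUs"
    elif [ -z "$off" ]; then
        echo "ECC is ON for all GPUs"
    else
        echo "ECC is not ON for GPU index: $off"
    fi
else
    echo "nvidia-smi not found; run this on the host computer"
fi
```

If the script reports a GPU whose ECC is not ON, follow step 3 to enable ECC and reboot; the script only reads state and changes nothing.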