• Topic ID: id_17423147
  • Version: 2.0
  • Date: Nov 27, 2020 2:15:24 AM

NIO16 Host Computer (Z840) Recon GPU Card Replacement

Prerequisites

Overview

This procedure shall be followed when replacing the Recon GPU card in the NIO16 Host Computer (Z840).

Figure 1. Recon GPU card

1 Preparation

Procedure

  1. Shut down the system. Use one of the following methods to power off the console:
    • If applications are up, click the Shut Down button on the desktop display and select Shutdown.

    • If applications are down, open a terminal window. Type halt, then press ENTER.

    • When the halt command has finished, power off the console at the front panel switch.

  2. Apply LOTO. See the Equipment Service - Lockout-Tagout-PPE procedure.
  3. Remove the left side cover of the console. Refer to the Replacement → Console → Console Cover Removal and Installation procedure.

2 Recon GPU Card Replacement

Procedure

  1. Remove the side access panel and pull out the latch to release the computer left side cover.

    Figure 2. Side Access Panel Removal

  2. Remove the following components from the host computer (Z840):
    • Expansion Card Support

    • Airflow Guide

    Figure 3. Z840 Airflow Guide and Expansion Card Support

  3. Locate the Recon GPU card FRU by referring to the illustration below.

    Figure 4. Z840 Component Location

  4. Replace the existing Recon GPU card (Slot 6) with the new one.

    Note: Lift up the card latch when removing the card (see Figure 5).

    Figure 5. PCI card Latch

  5. Reinstall all removed components, and reconnect any cables that have been disconnected.

3 Restore the Console

Procedure

  1. Remove LOTO from the console.
  2. Reinstall the console covers.

4 Finalization

Procedure

  1. Confirm that the host computer powers up when console power is turned on.
  2. Verify that the GPU card is installed. Open a shell, then type:

    {ctuser@hostname} ls /proc/driver/nvidia/gpus | wc -l

    • If “2” displays, the Recon GPU card is installed.

    • If “1” displays, the Recon GPU card is not installed or not detected.

  3. Ensure GPU ECC state is ON:
    1. Open a Unix shell and log on as root as follows.
    2. Type: su - [ENTER].
    3. Type the root password, then press [ENTER].
    4. Type: nvidia-smi [ENTER].
    5. Check the GPU ECC status in the nvidia-smi output:
      • If the GPU ECC is ON, the output looks like the following (boxed in green):

      • If the GPU ECC is OFF, the output looks like the following (boxed in red):

    6. If the GPU ECC is OFF, turn it back on:
      • Type: nvidia-smi -g 0 --ecc-config=1 [ENTER]
      • A message shows that ECC is enabled and that a reboot is required:

      • After the reboot, repeat the previous steps to confirm that ECC is ON.
  4. Perform the Functional Checks → System Scanning Test instructions from the procedure list.
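
The GPU count check (step 2) and the ECC check (step 3) above can also be sketched as a short shell script. This is a minimal sketch, not part of the official procedure. The helper names count_gpus and ecc_not_enabled are hypothetical, and the script assumes the installed nvidia-smi supports the --query-gpu option with the ecc.mode.current field. A GPU without ECC support may report a value such as [N/A] instead of Enabled or Disabled, so review each listed GPU before changing its ECC setting.

```shell
#!/bin/sh
# Sketch only: count_gpus and ecc_not_enabled are hypothetical helper names.

# Count the entries in the NVIDIA driver's per-GPU directory.
# Equivalent to: ls /proc/driver/nvidia/gpus | wc -l
count_gpus() {
    dir="${1:-/proc/driver/nvidia/gpus}"
    ls "$dir" 2>/dev/null | wc -l | tr -d ' '
}

# Read "index, ecc_mode" CSV lines on stdin and print every GPU
# whose current ECC mode is anything other than "Enabled".
ecc_not_enabled() {
    awk -F', *' '$2 != "Enabled" { print $1, $2 }'
}

# Run the checks only where the NVIDIA tools are present.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "GPUs detected: $(count_gpus)"
    report=$(nvidia-smi --query-gpu=index,ecc.mode.current \
                        --format=csv,noheader | ecc_not_enabled)
    if [ -n "$report" ]; then
        echo "GPUs not reporting ECC Enabled (index, mode):"
        echo "$report"
    else
        echo "All GPUs report ECC Enabled."
    fi
fi
```

If the script lists the Recon GPU as Disabled, follow step 3, sub-step 6 to turn ECC back on, then reboot and run the check again.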