<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Rhoai on Home</title><link>/tags/rhoai/</link><description>Recent content in Rhoai on Home</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Sat, 09 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="/tags/rhoai/" rel="self" type="application/rss+xml"/><item><title>Deploying OpenShift on AWS with Automated Cluster Provisioning</title><link>/2026/deploying-openshift-on-aws-with-automated-cluster-provisioning/</link><pubDate>Sat, 09 May 2026 00:00:00 +0000</pubDate><guid>/2026/deploying-openshift-on-aws-with-automated-cluster-provisioning/</guid><description>&lt;figure&gt;&lt;img src="/images/posts/post_20/overview.png"data-src="/images/posts/post_20/overview.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;The full provisioning pipeline: CLI setup, ocp-on-aws config, and a single script that spins up VPCs, EC2 instances, DNS records, and an Argo CD baseline - AI generated&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In this post, I want to describe how to deploy &lt;strong&gt;Red Hat OpenShift&lt;/strong&gt; in a blank Amazon Web Services (AWS) environment using a fully automated, repeatable approach. This is the first of two posts: it covers the cluster provisioning step, while the installation of OpenShift AI on top of the running OpenShift cluster is covered in a separate post, &lt;a href="/2026/installing-openshift-ai-on-openshift/"&gt;Install OpenShift AI on OpenShift&lt;/a&gt;. If you already have an OpenShift cluster available, feel free to jump straight to that post.
Both workflows build on two GitHub repositories, one for infrastructure provisioning and one for the AI platform components, and together they reduce what could easily be a multi-hour manual effort to a handful of shell commands.&lt;/p&gt;
&lt;p&gt;I should be upfront: one purpose of this post is also to serve as a personal reference for future me, who will inevitably return here after six months asking &amp;ldquo;wait, what was the exact command again?&amp;rdquo; Consider this the written documentation I should have filed away the first time.&lt;/p&gt;
&lt;p&gt;A special thanks goes to my teammate &lt;a href="https://github.com/alvarolop"&gt;&lt;strong&gt;Álvaro López Medina&lt;/strong&gt;&lt;/a&gt;, who created and maintains the &lt;a href="https://github.com/alvarolop/ocp-on-aws"&gt;ocp-on-aws&lt;/a&gt; and &lt;a href="https://github.com/alvarolop/rhoai-gitops"&gt;rhoai-gitops&lt;/a&gt; repositories. Without his work and support, setting up this environment would have been significantly more involved.&lt;/p&gt;
&lt;h2 id="prerequisites"&gt;Prerequisites&lt;/h2&gt;
&lt;p&gt;Before starting, a Linux workstation or jump host is recommended for running the commands. The following command line tools must be installed and configured:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/cli_tools/openshift-cli-oc#cli-getting-started"&gt;&lt;strong&gt;OpenShift CLI (oc)&lt;/strong&gt;&lt;/a&gt; – required to interact with the OpenShift cluster&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html"&gt;&lt;strong&gt;AWS CLI&lt;/strong&gt;&lt;/a&gt; – required to provision and manage AWS infrastructure&lt;/li&gt;
&lt;li&gt;&lt;a href="https://httpd.apache.org/docs/current/programs/htpasswd.html"&gt;&lt;strong&gt;htpasswd&lt;/strong&gt;&lt;/a&gt; – required to generate user credentials for the cluster&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are fundamental prerequisites. The installation scripts will fail or behave unexpectedly without them.&lt;/p&gt;
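&lt;p&gt;As a quick sanity check, the following shell loop (a minimal sketch) reports whether all three CLIs are on the &lt;code&gt;PATH&lt;/code&gt; before any of the scripts are run:&lt;/p&gt;

```shell
# Preflight check: report which of the required CLIs are installed.
for cmd in oc aws htpasswd; do
  if command -v "$cmd" >/dev/null; then
    echo "ok: $cmd"
  else
    echo "missing: $cmd"
  fi
done
```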
&lt;h2 id="ordering-an-aws-blank-environment"&gt;Ordering an AWS Blank Environment&lt;/h2&gt;
&lt;p&gt;For Red Hat employees and Red Hat partners, the easiest starting point is an &lt;a href="https://catalog.demo.redhat.com/catalog?item=babylon-catalog-prod/sandboxes-gpte.sandbox-open.prod&amp;amp;utm_source=webapp&amp;amp;utm_medium=share-link"&gt;AWS Blank Open Environment&lt;/a&gt; from the &lt;a href="https://catalog.demo.redhat.com/catalog"&gt;Red Hat Demo Platform (RHDP)&lt;/a&gt;. Otherwise, an existing AWS account accessed through the &lt;a href="https://aws.amazon.com/"&gt;AWS Web Console&lt;/a&gt; works just as well.&lt;/p&gt;
&lt;p&gt;This tutorial was validated in the &lt;code&gt;eu-west-1&lt;/code&gt; region. The blank environment provides a clean, ephemeral AWS account with the necessary IAM permissions and service quotas to support an &lt;em&gt;Installer-Provisioned Infrastructure (IPI)&lt;/em&gt; deployment of OpenShift.&lt;/p&gt;
&lt;p&gt;Once the environment is provisioned, the service overview page contains the AWS access credentials and the base DNS zone that will be needed in the configuration step below.&lt;/p&gt;
&lt;h2 id="deploying-openshift-on-aws"&gt;Deploying OpenShift on AWS&lt;/h2&gt;
&lt;p&gt;With the AWS environment in place, the &lt;a href="https://github.com/alvarolop/ocp-on-aws"&gt;ocp-on-aws&lt;/a&gt; repository handles the rest of the cluster provisioning. The repository wraps the OpenShift IPI installer in a shell script and manages user creation, cluster-admin group configuration, and the pull secret in a structured, repeatable way.&lt;/p&gt;
&lt;h3 id="preparing-the-repository"&gt;Preparing the repository&lt;/h3&gt;
&lt;p&gt;Throughout the following steps, any value written in &lt;code&gt;&amp;lt;angle brackets&amp;gt;&lt;/code&gt; is a placeholder and must be replaced with your actual value before running the command.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Clone the repository:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;git clone https://github.com/alvarolop/ocp-on-aws
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ocp-on-aws
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="2"&gt;
&lt;li&gt;Copy the authentication file templates:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cp auth/users.htpasswd.example auth/users.htpasswd
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cp auth/group-cluster-admins.yaml.example auth/group-cluster-admins.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="3"&gt;
&lt;li&gt;Generate a password hash for your user:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;htpasswd -b -B auth/users.htpasswd &amp;lt;user_name&amp;gt; &amp;lt;password&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="4"&gt;
&lt;li&gt;Adjust &lt;code&gt;auth/group-cluster-admins.yaml&lt;/code&gt; to list the users that should receive &lt;code&gt;cluster-admin&lt;/code&gt; privileges:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;apiVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;user.openshift.io/v1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Group&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;cluster-admins&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;users&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;redhat&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;&amp;lt;user_name&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="configuring-the-installation"&gt;Configuring the installation&lt;/h3&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Copy the configuration template:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cp aws-ocp4-config aws-ocp4-config-labs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="6"&gt;
&lt;li&gt;Open the configuration file and adjust the following parameters:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;vi aws-ocp4-config-labs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The key values to review:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;OPENSHIFT_VERSION&lt;/code&gt; (Line 6):&lt;/strong&gt; Set this to match your local &lt;code&gt;oc&lt;/code&gt; client version for maximum compatibility.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;RHPDS_TOP_LEVEL_ROUTE53_DOMAIN&lt;/code&gt; (Line 9):&lt;/strong&gt; The base DNS zone for your cluster; find this in the RHDP service overview.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt; and &lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt; (Lines 16–18):&lt;/strong&gt; The programmatic access credentials from the RHDP environment, required to create the VPC and EC2 instances.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;RHOCM_PULL_SECRET&lt;/code&gt; (Line 31):&lt;/strong&gt; Retrieve this from the &lt;a href="https://console.redhat.com/openshift/install/pull-secret"&gt;Hybrid Cloud Console&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;WORKER_REPLICAS&lt;/code&gt; (Line 47):&lt;/strong&gt; Set to the number of worker nodes required for your workload.&lt;/li&gt;
&lt;/ul&gt;
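&lt;p&gt;Put together, the edited section of &lt;code&gt;aws-ocp4-config-labs&lt;/code&gt; might look like the following sketch. All values are placeholders, and the exact line numbers and defaults may shift between repository versions:&lt;/p&gt;

```shell
# Hypothetical excerpt of aws-ocp4-config-labs; replace every placeholder.
OPENSHIFT_VERSION="4.18.0"                    # match your local 'oc' client version
RHPDS_TOP_LEVEL_ROUTE53_DOMAIN="REPLACE_ME"   # base DNS zone from the RHDP overview
AWS_ACCESS_KEY_ID="REPLACE_ME"                # RHDP programmatic credentials
AWS_SECRET_ACCESS_KEY="REPLACE_ME"
RHOCM_PULL_SECRET='REPLACE_ME'                # JSON from the Hybrid Cloud Console
WORKER_REPLICAS=3                             # size to your workload
echo "config sketch loaded"
```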
&lt;h3 id="running-the-installation"&gt;Running the installation&lt;/h3&gt;
&lt;ol start="7"&gt;
&lt;li&gt;Start the cluster installation:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;./aws-ocp4-install.sh aws-ocp4-config-labs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The script invokes the OpenShift IPI installer and creates all required AWS infrastructure: VPC, subnets, EC2 instances, Elastic Load Balancers, and Route53 DNS records. The process typically takes 30 to 45 minutes. It is worth monitoring the AWS console in the corresponding region during this time to observe the resources coming up.&lt;/p&gt;
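&lt;p&gt;The same resources can also be watched from the AWS CLI instead of the web console. This sketch assumes the credentials and default region are already configured in your environment:&lt;/p&gt;

```shell
# List EC2 instances with their Name tag and state; run repeatedly while
# the installer provisions the cluster. Degrades gracefully without the CLI.
if command -v aws >/dev/null; then
  aws ec2 describe-instances \
    --query 'Reservations[].Instances[].[Tags[?Key==`Name`]|[0].Value,State.Name]' \
    --output table
else
  echo "aws CLI not found"
fi
```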
&lt;figure&gt;&lt;img src="/images/posts/post_20/aws_console.png"data-src="/images/posts/post_20/aws_console.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;EC2 instances and load balancers provisioned in AWS after the installation completes&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Once the installer finishes, the cluster API and console URLs, along with the &lt;code&gt;kubeconfig&lt;/code&gt; file, will be available in the output and in the &lt;code&gt;auth/&lt;/code&gt; directory of the repository.&lt;/p&gt;
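&lt;p&gt;A quick way to verify access once the installer has finished is to point &lt;code&gt;oc&lt;/code&gt; at the generated kubeconfig. The exact path is an assumption here and depends on where the script places it:&lt;/p&gt;

```shell
# Use the freshly generated kubeconfig and confirm identity and cluster version.
export KUBECONFIG=auth/kubeconfig   # hypothetical path inside the repository
if command -v oc >/dev/null; then
  oc whoami
  oc get clusterversion
else
  echo "oc CLI not found"
fi
```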
&lt;figure&gt;&lt;img src="/images/posts/post_20/argo_cd.png"data-src="/images/posts/post_20/argo_cd.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;Argo CD applications deployed as part of the cluster bootstrap&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The installation script also bootstraps a set of &lt;em&gt;Argo CD&lt;/em&gt; applications that manage cluster-level configurations through GitOps from the start. This gives the cluster a solid, declarative baseline before any additional workloads are installed.&lt;/p&gt;
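&lt;p&gt;The state of these applications can also be checked from the CLI. The &lt;code&gt;openshift-gitops&lt;/code&gt; namespace is the usual default for the OpenShift GitOps operator, so treat it as an assumption here:&lt;/p&gt;

```shell
# List the Argo CD Application resources with their sync and health status.
if command -v oc >/dev/null; then
  oc get applications.argoproj.io -n openshift-gitops
else
  echo "oc CLI not found"
fi
```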
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The combination of the AWS blank environment and the &lt;code&gt;ocp-on-aws&lt;/code&gt; repository makes it straightforward to spin up a fully functional OpenShift cluster in under an hour with minimal manual intervention. The IPI installer handles the infrastructure details, and the GitOps bootstrap ensures a consistent cluster configuration from the first login.&lt;/p&gt;
&lt;p&gt;With the cluster in place, the next step is installing OpenShift AI and enabling GPU support, which is covered in the follow-up post: &lt;a href="/2026/installing-openshift-ai-on-openshift/"&gt;Install OpenShift AI on OpenShift&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;ocp-on-aws - GitHub repository by Álvaro López Medina - &lt;a href="https://github.com/alvarolop/ocp-on-aws"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;rhoai-gitops - GitHub repository by Álvaro López Medina - &lt;a href="https://github.com/alvarolop/rhoai-gitops"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Red Hat Demo Platform - &lt;a href="https://catalog.demo.redhat.com/catalog"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenShift CLI - Getting started - &lt;a href="https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/cli_tools/openshift-cli-oc#cli-getting-started"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;AWS CLI - Installation guide - &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;htpasswd - &lt;a href="https://httpd.apache.org/docs/current/programs/htpasswd.html"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Red Hat Hybrid Cloud Console - Pull Secret - &lt;a href="https://console.redhat.com/openshift/install/pull-secret"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Installing OpenShift AI on OpenShift</title><link>/2026/installing-openshift-ai-on-openshift/</link><pubDate>Sat, 09 May 2026 00:00:00 +0000</pubDate><guid>/2026/installing-openshift-ai-on-openshift/</guid><description>&lt;figure&gt;&lt;img src="/images/posts/post_21/overview.png"data-src="/images/posts/post_21/overview.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;From GitOps repo to OpenShift AI deployment with verified GPU access in minutes - AI generated&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In this post, I want to describe how to install &lt;strong&gt;Red Hat OpenShift AI&lt;/strong&gt; on an existing OpenShift cluster and configure it to run GPU-accelerated workloads. The approach uses the &lt;a href="https://github.com/alvarolop/rhoai-gitops"&gt;rhoai-gitops&lt;/a&gt; repository, created and maintained by my teammate &lt;strong&gt;Álvaro López Medina&lt;/strong&gt;, which automates the installation of OpenShift AI, the required operators, and the NVIDIA GPU stack through a single script backed by a &lt;em&gt;GitOps&lt;/em&gt; approach.&lt;/p&gt;
&lt;p&gt;If you do not have an OpenShift cluster available yet and want to provision one on AWS, a previous post &lt;a href="/2026/deploying-openshift-on-aws-with-automated-cluster-provisioning/"&gt;Deploying OpenShift on AWS with Automated Cluster Provisioning&lt;/a&gt; covers exactly that. The steps below pick up where that post leaves off, though they apply equally to any running OpenShift cluster.&lt;/p&gt;
&lt;h2 id="prerequisites"&gt;Prerequisites&lt;/h2&gt;
&lt;p&gt;Before proceeding, ensure the following are in place:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A running OpenShift cluster with sufficient compute capacity&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/cli_tools/openshift-cli-oc#cli-getting-started"&gt;OpenShift CLI (oc)&lt;/a&gt; installed and available on your workstation&lt;/li&gt;
&lt;li&gt;Cluster-admin access&lt;/li&gt;
&lt;li&gt;If GPU support is needed: sufficient AWS quota for GPU instance types&lt;/li&gt;
&lt;/ul&gt;
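&lt;p&gt;Whether the logged-in user actually holds cluster-admin rights can be checked with a single command before starting:&lt;/p&gt;

```shell
# Prints "yes" when the current user may perform any action cluster-wide.
if command -v oc >/dev/null; then
  oc auth can-i '*' '*' --all-namespaces
else
  echo "oc CLI not found"
fi
```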
&lt;h2 id="selecting-the-correct-gpu-instance-node-type"&gt;Selecting the correct GPU instance node type&lt;/h2&gt;
&lt;p&gt;The GPU instance type is worth settling before you provision anything: the instance family determines not just raw performance but also GPU memory capacity, which directly constrains which models you can load and at what precision. Undersizing leads to out-of-memory failures; oversizing means paying for capacity you do not use.&lt;/p&gt;
&lt;p&gt;Consult the &lt;a href="https://docs.aws.amazon.com/dlami/latest/devguide/gpu.html"&gt;AWS recommended GPU instances for deep learning&lt;/a&gt; to identify instance families suited to your workload, then cross-reference with the &lt;a href="https://docs.aws.amazon.com/ec2/latest/instancetypes/ec2-instance-regions.html"&gt;EC2 instance type availability by region&lt;/a&gt; to confirm that your target region actually offers the instance type you need. GPU instance availability varies significantly across regions and is a common source of unexpected quota errors at deployment time.&lt;/p&gt;
&lt;p&gt;The following AWS instance types are commonly used in OpenShift AI GPU deployments:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instance Name&lt;/th&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;th&gt;GPU RAM&lt;/th&gt;
&lt;th&gt;vCPUs&lt;/th&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;g5.4xlarge&lt;/td&gt;
&lt;td&gt;1x NVIDIA A10G&lt;/td&gt;
&lt;td&gt;24 GiB&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;64 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;g5.12xlarge&lt;/td&gt;
&lt;td&gt;4x NVIDIA A10G&lt;/td&gt;
&lt;td&gt;96 GiB&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;192 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;g5.24xlarge&lt;/td&gt;
&lt;td&gt;4x NVIDIA A10G&lt;/td&gt;
&lt;td&gt;96 GiB&lt;/td&gt;
&lt;td&gt;96&lt;/td&gt;
&lt;td&gt;384 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;g5.48xlarge&lt;/td&gt;
&lt;td&gt;8x NVIDIA A10G&lt;/td&gt;
&lt;td&gt;192 GiB&lt;/td&gt;
&lt;td&gt;192&lt;/td&gt;
&lt;td&gt;768 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p4d.24xlarge&lt;/td&gt;
&lt;td&gt;8x NVIDIA A100&lt;/td&gt;
&lt;td&gt;320 GiB&lt;/td&gt;
&lt;td&gt;96&lt;/td&gt;
&lt;td&gt;1,152 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
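&lt;p&gt;Regional availability of a candidate instance type can be confirmed with the AWS CLI before touching any configuration; the region below is the one this series was validated in:&lt;/p&gt;

```shell
# List the Availability Zones in eu-west-1 that offer g5.4xlarge.
if command -v aws >/dev/null; then
  aws ec2 describe-instance-type-offerings \
    --location-type availability-zone \
    --filters Name=instance-type,Values=g5.4xlarge \
    --region eu-west-1 \
    --output table
else
  echo "aws CLI not found"
fi
```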
&lt;h2 id="installing-openshift-ai"&gt;Installing OpenShift AI&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Clone the &lt;a href="https://github.com/alvarolop/rhoai-gitops"&gt;rhoai-gitops&lt;/a&gt; repository:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;git clone https://github.com/alvarolop/rhoai-gitops
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; rhoai-gitops
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="2"&gt;
&lt;li&gt;Open the installation script and review the GPU-related configuration:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;vi auto-install.sh
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The three parameters that matter most for GPU-enabled deployments:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;CREATE_GPU_MACHINESETS&lt;/code&gt; (Line 9):&lt;/strong&gt; When set to &lt;code&gt;true&lt;/code&gt;, the script automatically creates &lt;em&gt;MachineSets&lt;/em&gt; for GPU nodes. Set to &lt;code&gt;false&lt;/code&gt; if you do not need GPU support initially.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;GPU_NODE_COUNT&lt;/code&gt; (Line 10):&lt;/strong&gt; Total number of GPU nodes to provision. The nodes are distributed across Availability Zones a, b, and c for resilience.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;AWS_GPU_INSTANCE&lt;/code&gt; (Line 18):&lt;/strong&gt; Defaults to &lt;code&gt;g5.4xlarge&lt;/code&gt;, which provides an NVIDIA A10G GPU per node. Adjust based on the workload requirements and available quota.&lt;/li&gt;
&lt;/ul&gt;
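&lt;p&gt;As a sketch, the relevant section of &lt;code&gt;auto-install.sh&lt;/code&gt; would then read something like the following; the values are examples, and line numbers may differ between repository versions:&lt;/p&gt;

```shell
# GPU-related settings in auto-install.sh (example values).
CREATE_GPU_MACHINESETS=true    # set to false to skip GPU nodes initially
GPU_NODE_COUNT=1               # spread across Availability Zones a, b, and c
AWS_GPU_INSTANCE="g5.4xlarge"  # one NVIDIA A10G per node
echo "gpu settings sketch loaded"
```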
&lt;p&gt;Throughout the following steps, any value written in &lt;code&gt;&amp;lt;angle brackets&amp;gt;&lt;/code&gt; is a placeholder and must be replaced with your actual value before running the command.&lt;/p&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Log in to the OpenShift cluster:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc login -u &amp;lt;user_name&amp;gt; &amp;lt;cluster_api_url&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="4"&gt;
&lt;li&gt;Run the installation script:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;./auto-install.sh
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The script installs the required operators — including the &lt;em&gt;OpenShift AI Operator&lt;/em&gt;, the &lt;em&gt;Node Feature Discovery Operator&lt;/em&gt;, and the &lt;em&gt;NVIDIA GPU Operator&lt;/em&gt; — and provisions GPU MachineSets if configured to do so. Depending on node provisioning times, the complete process takes 15 to 30 minutes.&lt;/p&gt;
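&lt;p&gt;Progress of the operator installations can be followed by listing the ClusterServiceVersions; the grep pattern below is a loose assumption about the operator names:&lt;/p&gt;

```shell
# Show the install phase of the AI-related operators across all namespaces.
if command -v oc >/dev/null; then
  oc get csv --all-namespaces | grep -iE 'rhods|openshift-ai|nfd|gpu' || true
else
  echo "oc CLI not found"
fi
```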
&lt;ol start="5"&gt;
&lt;li&gt;Confirm that the GPU worker nodes have joined the cluster:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc get machineset -n openshift-machine-api
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc get machine -n openshift-machine-api
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc get nodes
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start="6"&gt;
&lt;li&gt;Verify that the NVIDIA driver is loaded and that the GPU is accessible:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc &lt;span class="nb"&gt;exec&lt;/span&gt; -it -n nvidia-gpu-operator &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;$(&lt;/span&gt;oc get pod -o wide -l openshift.driver-toolkit&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -o &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;{.items[0].metadata.name}&amp;#34;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -n nvidia-gpu-operator&lt;span class="k"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -- nvidia-smi
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;figure&gt;&lt;img src="/images/posts/post_21/nvidia_smi.png" data-src="/images/posts/post_21/nvidia_smi.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;nvidia-smi output confirming GPU access from within the NVIDIA GPU Operator pod&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;ol start="7"&gt;
&lt;li&gt;Check the &lt;em&gt;Argo CD&lt;/em&gt; applications deployed as part of the GitOps installation:&lt;/li&gt;
&lt;/ol&gt;
&lt;figure&gt;&lt;img src="/images/posts/post_21/argo_cd.png"data-src="/images/posts/post_21/argo_cd.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;Argo CD application overview after the rhoai-gitops installation completes&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;All applications should be in a healthy and synced state before proceeding to configuration.&lt;/p&gt;
&lt;h2 id="configuring-openshift-ai-for-gpu-workloads"&gt;Configuring OpenShift AI for GPU Workloads&lt;/h2&gt;
&lt;p&gt;With OpenShift AI installed, a small amount of configuration is needed to allow workbenches to schedule onto the GPU nodes. GPU nodes in OpenShift are typically tainted with &lt;code&gt;nvidia.com/gpu:NoSchedule&lt;/code&gt; to prevent standard workloads from landing on them accidentally. Workbenches that need GPU access must be configured with a matching toleration.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Check the taints applied to the GPU nodes:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc get nodes
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;oc describe node &amp;lt;gpu_node_name&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The relevant taint will appear as &lt;code&gt;nvidia.com/gpu=:NoSchedule&lt;/code&gt; in the node description.&lt;/p&gt;
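&lt;p&gt;To see the taints of all nodes at once rather than describing them one by one, a jsonpath query works well:&lt;/p&gt;

```shell
# Print each node name followed by the keys of its taints (if any).
if command -v oc >/dev/null; then
  oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}'
else
  echo "oc CLI not found"
fi
```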
&lt;ol start="2"&gt;
&lt;li&gt;
&lt;p&gt;In the OpenShift AI console, navigate to &lt;strong&gt;Settings &amp;gt; Hardware Profiles&lt;/strong&gt; and create a new profile (for example, &lt;code&gt;nvidia-gpu&lt;/code&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add a &lt;strong&gt;Toleration&lt;/strong&gt; with the following values:&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Key&lt;/td&gt;
&lt;td&gt;&lt;code&gt;nvidia.com/gpu&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Effect&lt;/td&gt;
&lt;td&gt;&lt;code&gt;NoSchedule&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operator&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Exists&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
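&lt;p&gt;In the pod spec of a workbench that uses this profile, the resulting toleration corresponds to the following fragment:&lt;/p&gt;

```yaml
# Toleration matching the nvidia.com/gpu:NoSchedule taint on the GPU nodes.
tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
```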
&lt;figure&gt;&lt;img src="/images/posts/post_21/toleration.png"data-src="/images/posts/post_21/toleration.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;Configuring a toleration for the NVIDIA GPU taint in the Hardware Profile&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This toleration allows workbenches assigned to this profile to be scheduled onto GPU nodes while keeping those nodes unavailable to other workloads.&lt;/p&gt;
&lt;ol start="4"&gt;
&lt;li&gt;
&lt;p&gt;Create a new workbench and select the &lt;code&gt;nvidia-gpu&lt;/code&gt; hardware profile. The workbench pod will be scheduled on a GPU node.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Once the workbench is running, open a terminal and confirm GPU access:&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;nvidia-smi
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;figure&gt;&lt;img src="/images/posts/post_21/nvidia_smi_2.png" data-src="/images/posts/post_21/nvidia_smi_2.png"
/&gt;&lt;figcaption&gt;
&lt;h4&gt;nvidia-smi output from inside an OpenShift AI workbench, confirming direct access to the NVIDIA A10G GPU&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;For a complete reference on hardware profiles and toleration configuration, the &lt;a href="https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.16/html/managing_openshift_ai/managing-hardware-profiles"&gt;Red Hat OpenShift AI documentation&lt;/a&gt; covers the options in detail.&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;rhoai-gitops&lt;/code&gt; repository makes the Red Hat OpenShift AI installation genuinely straightforward: one script handles the operator stack, the GPU node provisioning, and the GitOps wiring. The manual steps that remain — creating the hardware profile and configuring the workbench — are minimal and need to be done only once per cluster.&lt;/p&gt;
&lt;p&gt;The end result is an OpenShift AI environment with full GPU access, ready for running Jupyter notebooks, training jobs, or serving models. If you provisioned the underlying cluster using the approach described in &lt;a href="/2026/deploying-openshift-on-aws-with-automated-cluster-provisioning/"&gt;Deploying OpenShift on AWS with Automated Cluster Provisioning&lt;/a&gt;, the two repositories together cover the entire path from a blank AWS account to a working AI platform in roughly two hours.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;rhoai-gitops - GitHub repository by Álvaro López Medina - &lt;a href="https://github.com/alvarolop/rhoai-gitops"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;ocp-on-aws - GitHub repository by Álvaro López Medina - &lt;a href="https://github.com/alvarolop/ocp-on-aws"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Red Hat OpenShift AI - Managing Hardware Profiles - &lt;a href="https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.16/html/managing_openshift_ai/managing-hardware-profiles"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenShift AI - Product documentation - &lt;a href="https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenShift CLI (oc) - Getting started - &lt;a href="https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/cli_tools/openshift-cli-oc#cli-getting-started"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;NVIDIA GPU Operator documentation - &lt;a href="https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;AWS EC2 instance type availability by region - &lt;a href="https://docs.aws.amazon.com/ec2/latest/instancetypes/ec2-instance-regions.html"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;AWS recommended GPU instances for deep learning - &lt;a href="https://docs.aws.amazon.com/dlami/latest/devguide/gpu.html"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Amazon EC2 G5 Instances - &lt;a href="https://aws.amazon.com/de/ec2/instance-types/g5/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Amazon EC2 P4 Instances - &lt;a href="https://aws.amazon.com/de/ec2/instance-types/p4/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>