Deploying AI Apps to Cloud Run: Node.js + Vertex AI
You have built a Node.js app that calls Vertex AI, tested it locally, and now you need it running in production. Cloud Run is the right first choice — serverless, fully managed, and scales to zero when idle.
This guide covers containerizing the app, handling authentication properly, deploying to Cloud Run, and wiring up continuous deployment.
Prerequisites
Make sure you have completed the Google Cloud project setup — billing enabled, required APIs enabled — and the Vertex AI Node.js integration working locally.
Project Structure
```
my-ai-app/
├── src/
│   ├── index.js      # Express app
│   └── vertexai.js   # Vertex AI client
├── package.json
├── Dockerfile
└── .dockerignore
```
A minimal Express app wrapping your Vertex AI calls:
```js
// src/index.js
import express from 'express';
import { generateText } from './vertexai.js';

const app = express();
app.use(express.json());

app.get('/health', (req, res) => res.json({ status: 'ok' }));

app.post('/generate', async (req, res) => {
  const { prompt } = req.body;
  if (!prompt) return res.status(400).json({ error: 'prompt required' });
  try {
    const text = await generateText(prompt);
    res.json({ text });
  } catch (err) {
    console.error(err);
    res.status(500).json({ error: 'generation failed' });
  }
});

// Cloud Run injects PORT automatically — never hardcode 3000
const PORT = process.env.PORT || 8080;
app.listen(PORT, () => console.log(`Listening on port ${PORT}`));
```
Dockerfile
```dockerfile
FROM node:20-slim

WORKDIR /app

COPY package*.json ./
RUN npm ci --only=production

COPY src/ ./src/

ENV NODE_ENV=production
ENV PORT=8080
EXPOSE 8080

CMD ["node", "src/index.js"]
```
```
# .dockerignore
node_modules
.env
*.md
.git
```
Two things matter here:

- `npm ci --only=production` skips dev dependencies, keeping the image lean
- Cloud Run injects `PORT` automatically; your app must read it from the environment, not hardcode a port number
Authentication: The Critical Part
On Cloud Run, you do not use key files. Instead, assign a dedicated service account to the Cloud Run service and grant it Vertex AI access. The SDK picks up credentials automatically via the GCP metadata server.
Step 1: Create a service account
```bash
gcloud iam service-accounts create vertex-ai-runner \
  --display-name="Vertex AI Cloud Run Runner"
```
Step 2: Grant it the Vertex AI User role
```bash
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:vertex-ai-runner@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
```
Step 3: Attach it during deployment (shown in the deploy step below).
No `GOOGLE_APPLICATION_CREDENTIALS` environment variable, no JSON key files in your container. The SDK authenticates automatically.
Build and Push the Container
Cloud Run pulls images from Artifact Registry. Create a repository:
```bash
gcloud artifacts repositories create ai-apps \
  --repository-format=docker \
  --location=us-central1 \
  --description="AI application containers"
```
Configure Docker auth and build:
```bash
gcloud auth configure-docker us-central1-docker.pkg.dev

# Build
docker build -t us-central1-docker.pkg.dev/YOUR_PROJECT_ID/ai-apps/my-ai-app:latest .

# Push
docker push us-central1-docker.pkg.dev/YOUR_PROJECT_ID/ai-apps/my-ai-app:latest
```
Or use Cloud Build to build remotely (avoids needing Docker locally):
```bash
gcloud builds submit \
  --tag us-central1-docker.pkg.dev/YOUR_PROJECT_ID/ai-apps/my-ai-app:latest .
```
Deploy to Cloud Run
```bash
gcloud run deploy my-ai-app \
  --image=us-central1-docker.pkg.dev/YOUR_PROJECT_ID/ai-apps/my-ai-app:latest \
  --region=us-central1 \
  --service-account=vertex-ai-runner@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --set-env-vars=GCP_PROJECT=YOUR_PROJECT_ID,GCP_LOCATION=us-central1 \
  --memory=512Mi \
  --cpu=1 \
  --min-instances=0 \
  --max-instances=10 \
  --allow-unauthenticated
```
Flag notes:
- `--service-account` attaches the service account; the Vertex AI SDK authenticates automatically via the metadata server
- `--min-instances=0` scales to zero when idle (saves cost for low-traffic apps)
- `--max-instances=10` caps scaling; adjust based on expected load and your Vertex AI quota
- `--allow-unauthenticated` makes the service publicly accessible; remove it for internal APIs
Secrets with Secret Manager
For API keys or sensitive config, use Secret Manager instead of plain environment variables:
```bash
# Create a secret
echo -n "your-api-key" | gcloud secrets create my-api-key --data-file=-

# Grant the service account access
gcloud secrets add-iam-policy-binding my-api-key \
  --member="serviceAccount:vertex-ai-runner@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"

# Reference in the deploy command
gcloud run deploy my-ai-app \
  --set-secrets=MY_API_KEY=my-api-key:latest \
  ...
```
In your Node.js code, the secret appears as a normal environment variable: `process.env.MY_API_KEY`.
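A secret that fails to mount should surface at startup, not as a confusing 500 mid-request. A minimal fail-fast sketch (the `requireEnv` helper is illustrative, not part of any Google SDK):

```javascript
// Fail fast at startup if a required secret or config value is absent.
// requireEnv is a hypothetical helper, not part of the Cloud Run API.
function requireEnv(name) {
  const value = process.env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// At module load time, before the server starts accepting traffic:
// const MY_API_KEY = requireEnv('MY_API_KEY');
```

Because Cloud Run restarts a container that exits during startup, throwing here makes a misconfigured deploy visible immediately in the revision's logs.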
Continuous Deployment with Cloud Build
Add a `cloudbuild.yaml` to auto-deploy on every push to your main branch:

```yaml
steps:
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'build'
      - '-t'
      - 'us-central1-docker.pkg.dev/$PROJECT_ID/ai-apps/my-ai-app:$COMMIT_SHA'
      - '.'
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'push'
      - 'us-central1-docker.pkg.dev/$PROJECT_ID/ai-apps/my-ai-app:$COMMIT_SHA'
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    args:
      - 'gcloud'
      - 'run'
      - 'deploy'
      - 'my-ai-app'
      - '--image=us-central1-docker.pkg.dev/$PROJECT_ID/ai-apps/my-ai-app:$COMMIT_SHA'
      - '--region=us-central1'
      - '--quiet'
images:
  - 'us-central1-docker.pkg.dev/$PROJECT_ID/ai-apps/my-ai-app:$COMMIT_SHA'
```
Connect your GitHub repository to Cloud Build in the GCP console, and every push triggers a build and deploy.
Cold Starts and Latency
Cloud Run scales to zero by default, meaning the first request after a period of inactivity pays a cold start penalty — usually 1–3 seconds for a Node.js container.
For AI APIs where latency matters:
- Set `--min-instances=1` to keep one instance always warm (~$15/month for a 256Mi instance)
- Preload expensive dependencies at module load time, not inside request handlers — initializing the Vertex AI client once at startup instead of per request avoids paying that setup cost on every call
- Use startup probes against your `/health` endpoint so Cloud Run only marks an instance ready after initialization is complete
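The preloading advice above boils down to a module-level singleton. In this sketch, `createClient` is a stand-in for the real Vertex AI client construction (an assumption, since SDK setup details vary by version):

```javascript
// Module-level singleton: the expensive client is built at most once per
// container instance, not once per request. createClient is a placeholder
// for real SDK setup, e.g. constructing a Vertex AI client.
let client = null;
let initCount = 0;

function createClient() {
  initCount += 1; // track how many times we actually initialize
  return { ready: true }; // stand-in for the real client object
}

// Called from request handlers; reuses the cached client after first use.
function getClient() {
  if (client === null) {
    client = createClient();
  }
  return client;
}
```

Calling `getClient()` eagerly at startup (before `app.listen`) moves the cost into the cold start, where the startup probe can absorb it, instead of into the first user request.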
Verifying the Deployment
```bash
# Get the service URL
gcloud run services describe my-ai-app \
  --region=us-central1 --format='value(status.url)'

# Test the endpoint
curl -X POST https://YOUR_SERVICE_URL/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain what Vertex AI is in one sentence."}'
```
Once deployed, instrument your Vertex AI calls to track latency, token usage, and errors from day one — Cloud Run metrics alone won't show you what's happening inside your AI calls.
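One lightweight way to start is a thin wrapper that records latency and errors around each model call. The names below (`instrumented`, `metrics`) are illustrative, and token usage would come from whatever usage metadata your model responses actually expose:

```javascript
// Wrap an async model call with basic latency and error accounting.
// `metrics` is an in-memory stand-in; in production you would export
// these values to Cloud Monitoring or structured logs instead.
const metrics = { calls: 0, errors: 0, totalMs: 0 };

async function instrumented(fn, ...args) {
  const start = Date.now();
  metrics.calls += 1;
  try {
    return await fn(...args);
  } catch (err) {
    metrics.errors += 1;
    throw err; // preserve the caller's error handling
  } finally {
    metrics.totalMs += Date.now() - start;
  }
}
```

Wrapping a call as `instrumented(generateText, prompt)` leaves the call site almost unchanged while giving you per-call latency and error counts from day one.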