@@ -5,30 +5,44 @@ info: Any user with at least the Maintainer role can merge updates to this conte
title: Model Migration Process
title: Model Migration Process
---
---
## Introduction
## Current Migration Issues
LLM models are constantly evolving, and GitLab needs to regularly update our AI features to support newer models. This guide provides a structured approach for migrating AI features to new models while maintaining stability and reliability.
The table below shows current open issues labeled with `AI Model Migration`. This provides a live view of ongoing model migration work across GitLab.
```glql
query: label = "AI Model Migration" AND opened = true
```
## Purpose
*Note: This table is dynamically generated using GitLab Query Language (GLQL) when viewing the rendered documentation. It shows up to 10 open issues with the AI Model Migration label, sorted by most recently updated.*
Provide a comprehensive guide for migrating AI models within GitLab.
## Quick Links
### Expected Duration
- **[GitLab AI Features - Default GitLab AI Vendor Models](https://duo-feature-list-754252.gitlab.io/)**: View all features and their current model mappings
- **[AI Model Version Migration Initiative Epic](https://gitlab.com/groups/gitlab-org/-/epics/15650)**: Central tracking epic for all model migration work
- **[AI Gateway Repository](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist)**: Where model configurations are managed
- **[Prompt Library](https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library)**: For evaluating models and prompts
## Introduction
LLM models are constantly evolving, and GitLab needs to regularly update our AI features to support newer models. This guide provides a structured approach for migrating AI features to new models while maintaining stability and reliability.
## Model Migration Timelines
Model migrations typically follow these general timelines:
Model migrations typically follow these general timelines:
- **Simple Model Updates (Same Provider):** 2-3 weeks
- **Simple Model Updates (Same Provider):** 1-2 weeks
- Example: Upgrading from Claude Sonnet 3.5 to 3.6
- Example: Upgrading from Claude Sonnet 3.5 to 3.7
- Involves model validation, testing, and staged rollout
- Involves model validation, testing, and staged rollout
- Primary focus on maintaining stability and performance
- Primary focus on maintaining stability and performance
- Can sometimes be expedited when urgent, but 2 weeks is standard
- **Complex Migrations:** 1-2 months (full milestone or longer)
- **Complex Migrations:** 1-2 months (full milestone or longer)
- Example: Adding support for a new provider like AWS Bedrock
- Example: Adding support for a new provider like AWS Bedrock
- Example: Major version upgrades with breaking changes (e.g., Claude 2 to 3)
- Example: Major version upgrades with breaking changes (e.g., Claude 2 to 3)
- Requires significant API integration work
- Requires significant API integration work
- May need infrastructure changes
- May need infrastructure changes
- Extensive testing and validation required
### Timeline Factors
### Timeline Factors
...
@@ -45,123 +59,388 @@ Several factors can impact migration timelines:
- Always err on the side of caution with initial timeline estimates
- Always err on the side of caution with initial timeline estimates
- Use feature flags for gradual rollouts to minimize risk
- Use feature flags for gradual rollouts to minimize risk
- Plan for buffer time to handle unexpected issues
- Plan for buffer time to handle unexpected issues
- Communicate conservative timelines externally while working to deliver faster
- Prioritize system stability over speed of deployment
- Prioritize system stability over speed of deployment
{{< alert type="note" >}}
{{< alert type="note" >}}
While some migrations can technically be completed quickly, we typically plan for longer timelines to ensure proper testing and staged rollouts. This approach helps maintain system stability and reliability.
While some migrations can technically be completed quickly, we typically plan for longer timelines to ensure proper testing and staged rollouts. This approach helps maintain system stability and reliability.
{{< /alert >}}
## Team Responsibilities
Model migrations involve several teams working together. This section clarifies which teams are responsible for different aspects of the migration process.
### RACI Matrix for Model Migrations
| Task | AI Framework | Feature Teams | Product | Infrastructure |
|------|--------------|---------------|---------|----------------|
| Model configuration file creation | R/A | C | I | I |
| Infrastructure compatibility | R/A | I | I | C |
| Feature-specific prompt adjustments | C | R/A | I | I |
| Evaluations & testing | C | R/A | I | I |
| Feature flag implementation | C | R/A | I | I |
| Rollout planning | C | R/A | C | I |
| Documentation updates | C | R/A | C | I |
| Monitoring & incident response | C | R/A | I | C |
R = Responsible, A = Accountable, C = Consulted, I = Informed
## Migration Process
{{< alert type="note" >}}
**Model Mapping Resource**: You can see which features use which models and versions via the [GitLab AI Features - Default GitLab AI Vendor Models](https://duo-feature-list-754252.gitlab.io/) page.
{{< /alert >}}
{{< /alert >}}
## Scope
### Standard Migration Process
1. **Initialization**
   - AI Framework team creates an Issue in the [AI Model Version Migration Initiative Epic](https://gitlab.com/groups/gitlab-org/-/epics/15650)
   - Issue should use the naming convention: `AI Model Migration - Provider/Model/Version`
   - Apply the [`AI Model Migration`](https://gitlab.com/gitlab-org/gitlab/-/labels?subscribed=&sort=relevance&search=AI+Model+Migration#) label
   - AI Framework team adds model configuration to AI Gateway
   - AI Framework team verifies infrastructure compatibility
1. **Feature Team Implementation**
   - Feature teams create implementation plans
   - Feature teams adjust prompts if needed
   - Feature teams implement feature flags for controlled rollout
1. **Testing & Validation**
   - Feature teams run evaluations against the new model
   - AI Framework team provides evaluation support
1. **Deployment**
   - Feature teams manage feature flag rollout
   - Feature teams monitor performance and make adjustments
1. **Completion**
   - Feature teams remove feature flags when migration is complete
   - Feature teams update documentation
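The naming convention from the Initialization step can be sketched as a small helper. This is only an illustration: the function name `migration_issue_title` and the example provider, model, and version values are assumptions, not part of any GitLab tooling.

```python
# Hypothetical helper: builds an issue title that follows the
# `AI Model Migration - Provider/Model/Version` convention described above.
MIGRATION_LABEL = "AI Model Migration"  # label name taken from the process above


def migration_issue_title(provider: str, model: str, version: str) -> str:
    """Return a title of the form 'AI Model Migration - Provider/Model/Version'."""
    parts = [provider.strip(), model.strip(), version.strip()]
    if not all(parts):
        # Every segment of the convention must be present.
        raise ValueError("provider, model, and version are all required")
    return f"{MIGRATION_LABEL} - " + "/".join(parts)


if __name__ == "__main__":
    # Example values are illustrative only.
    print(migration_issue_title("Anthropic", "Claude Sonnet", "3.7"))
```

A helper like this is mainly useful for keeping titles searchable, since consistent titles and the shared label are what make migration issues easy to find under the epic.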
Applicable to all AI model-related teams at GitLab. We currently support using Anthropic and Google Vertex models. Support for AWS Bedrock models is proposed in [issue 498119](https://gitlab.com/gitlab-org/gitlab/-/issues/498119).
### Model Deprecation Process
## Prerequisites
1. **Identification & Planning**
   - AI Framework team monitors provider announcements
   - AI Framework team creates an epic: `Replace discontinued [model] with [replacement]`
   - Epic should have the `AI Model Migration` label
   - Set due date at least 2-4 weeks before provider's cutoff date
   - AI Framework team identifies replacement models
1. **Evaluation**
   - AI Framework team evaluates replacement models
   - Feature teams test affected features with candidates
   - Teams determine the best replacement model
1. **Implementation**
   - AI Framework team creates model configuration files
   - Feature teams update features to use the replacement model
   - Teams implement feature flags for controlled rollout
1. **Testing**
   - Feature teams run comprehensive evaluations
   - Teams document performance metrics
1. **Deployment**
   - Feature teams manage phased rollout via feature flags
   - Teams monitor performance closely
   - Rollout expands gradually based on performance
1. **Completion**
   - Remove feature flags when migration is complete
   - Update documentation
   - Clean up deprecated model references
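The "2-4 weeks before the provider's cutoff date" rule can be turned into a concrete date window. The sketch below is a minimal illustration; the function name `epic_due_date_window` and the cutoff date in the example are hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple


def epic_due_date_window(provider_cutoff: date) -> Tuple[date, date]:
    """Return (earliest, latest) acceptable due dates for the deprecation epic.

    earliest is 4 weeks before the provider cutoff, latest is 2 weeks before,
    matching the 2-4 week buffer described in the process above.
    """
    earliest = provider_cutoff - timedelta(weeks=4)
    latest = provider_cutoff - timedelta(weeks=2)
    return earliest, latest


if __name__ == "__main__":
    # Hypothetical cutoff date, for illustration only.
    earliest, latest = epic_due_date_window(date(2025, 6, 30))
    print(f"Set the epic due date between {earliest} and {latest}")
```

Choosing a date closer to the earlier end of the window leaves more room for the gradual rollout and any infrastructure changes.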
## Prerequisites for Model Migration
Before starting a model migration:
Before starting a model migration:
- Create an issue under the [AI Model Version Migration Initiative epic](https://gitlab.com/groups/gitlab-org/-/epics/15650) with the following:
1. **Create an issue** under the [AI Model Version Migration Initiative epic](https://gitlab.com/groups/gitlab-org/-/epics/15650):
- Label with `group::ai framework`
- Label with `group::ai framework` and `AI Model Migration`
- Document any known behavioral changes or improvements in the new model
- Document behavioral changes or improvements
- Include any breaking changes or compatibility issues
- Include any breaking changes or compatibility issues
- Reference any model provider documentation about the changes
- Reference provider documentation
- Verify the new model is supported in our current AI-Gateway API specification by:
1. **Verify model support** in AI Gateway:
- Check model definitions:
- Check model definitions in AI gateway:
- For LiteLLM models: `ai_gateway/models/v2/container.py`
- For LiteLLM models: `ai_gateway/models/v2/container.py`
- For Anthropic models: `ai_gateway/models/anthropic.py`
- For Anthropic models: `ai_gateway/models/anthropic.py`
- For new providers: Create new model definition file
- For new providers: Create a new model definition file in `ai_gateway/models/`
- Set up the [AI gateway development environment](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist#how-to-run-the-server-locally)
- Stop tokens
- Configure API keys in `.env` file
- Timeout settings
- Test using Swagger UI at `http://localhost:5052/docs`
- Completion type (text or chat)
- Create an issue for new model support if needed
- Max token limits
- Review provider API documentation for breaking changes
- Testing the model locally in AI gateway:
- Set up the [AI gateway development environment](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist#how-to-run-the-server-locally)
1. **Ensure access** to testing environments and monitoring tools
- Configure the necessary API keys in your `.env` file
- Test the model using the Swagger UI at `http://localhost:5052/docs`
1. **Complete model evaluation** using the [Prompt Library](https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library/-/blob/main/doc/how-to/run_duo_chat_eval.md)
- If the model isn't supported, create an issue in the [AI gateway repository](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist) to add support
- Review the provider's API documentation for any breaking changes:
### Additional Prerequisites for Model Deprecations
- [Anthropic API Documentation](https://docs.anthropic.com/claude/reference/versions)
- [Google Vertex AI Documentation](https://cloud.google.com/vertex-ai/docs/reference)
For model deprecations:
- Ensure you have access to testing environments and monitoring tools
1. **Create an epic** when a deprecation is announced:
- Complete model evaluation using the [Prompt Library](https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library/-/blob/main/doc/how-to/run_duo_chat_eval.md)
   - Label with `group::ai framework` and `AI Model Migration`
   - Document the deprecation timeline
   - Include provider migration recommendations
   - Reference the deprecation announcement
   - List all affected features
1. **Evaluate replacement models**:
   - Document evaluation criteria
   - Run comparative evaluations
   - Consider regional availability
   - Assess infrastructure changes required
1. **Create migration timeline**:
   - Set completion target at least 2-4 weeks before cutoff
   - Include time for each feature update
   - Plan for gradual rollout
   - Allow time for infrastructure changes
{{< alert type="note" >}}
{{< alert type="note" >}}
Documentation of model changes and deprecations is crucial for tracking impact and future troubleshooting. Always create an issue before beginning any migration process.
{{< /alert >}}
Documentation of model changes is crucial for tracking the impact of migrations and helping with future troubleshooting. Always create an issue to track these changes before beginning the migration process.
## Implementation Guidelines
{{< /alert >}}
### Feature Team Migration Template
Feature teams should use the following [template](https://gitlab.com/gitlab-org/gitlab/-/blob/master/.gitlab/issue_templates/AI%20Model%20Rollout%20Plan.md?ref_type=heads) to implement model migrations. See an example from our [Claude 3.7 Sonnet Code Generation Rollout Plan](https://gitlab.com/gitlab-org/gitlab/-/issues/521044).
### Anthropic Model Migration Tasks
**AI Framework Team:**
## Migration Tasks
- Add new model to AI gateway configurations
- Verify compatibility with current API specification
- Verify the model works with existing API patterns
- Create model configuration file
- Document model-specific parameters or behaviors
- Verify infrastructure compatibility
- Update model definitions following [prompt definition guidelines](actions.md#2-create-a-prompt-definition-in-the-ai-gateway)
### Migration Tasks for Anthropic Model
**Feature Team:**
- **Optional** - Investigate whether the new model is supported within our current AI-Gateway API specification. This step can usually be skipped, but supporting a newer model sometimes requires accommodating a new API format.
- Add new model to [available models list](https://gitlab.com/gitlab-org/gitlab/-/blob/32fa9eaa3c8589ee7f448ae683710ec7bd82f36c/ee/lib/gitlab/llm/concerns/available_models.rb#L5-10)
- Add the new model to our [available models list](https://gitlab.com/gitlab-org/gitlab/-/blob/32fa9eaa3c8589ee7f448ae683710ec7bd82f36c/ee/lib/gitlab/llm/concerns/available_models.rb#L5-10).
- Change default model in [AI-Gateway client](https://gitlab.com/gitlab-org/gitlab/-/blob/41361629b302f2c55e35701d2c0a73cff32f9013/ee/lib/gitlab/llm/chain/requests/ai_gateway.rb#L63-67) behind feature flag
- Change the default model in our [AI-Gateway client](https://gitlab.com/gitlab-org/gitlab/-/blob/41361629b302f2c55e35701d2c0a73cff32f9013/ee/lib/gitlab/llm/chain/requests/ai_gateway.rb#L63-67). Place the change behind a feature flag so that it can be rolled back quickly if needed.
- Update model references in feature-specific code
- Update the model definitions in AI gateway following the [prompt definition guidelines](actions.md#2-create-a-prompt-definition-in-the-ai-gateway)
- Implement feature flags for controlled rollout
- Test prompts with new model
- Monitor performance during rollout
- Update documentation
{{< alert type="note" >}}
While we're moving toward AI gateway holding the prompts, feature flag implementation still requires a GitLab release.
While we're moving toward AI gateway holding the prompts, feature flag implementation still requires a GitLab release.
{{< /alert >}}
### Vertex Models Migration Tasks
### Migration Tasks for Vertex Models
**AI Framework Team:**
**Work in Progress**
- Activate model in Google Cloud Platform
- Update AI gateway to support new Vertex model
- Document model-specific parameters
## Feature Flag Process
**Feature Team:**
- Update model references in feature-specific code
- Implement feature flags for controlled rollout
- Test prompts with new model
- Monitor performance during rollout
- Update documentation
## Feature Flag Implementation
### Implementation Steps
### Implementation Steps
For implementing feature flags, refer to our [Feature Flags Development Guidelines](../feature_flags/_index.md).
For implementing feature flags, refer to our [Feature Flags Development Guidelines](../feature_flags/_index.md).
{{< alert type="note" >}}
{{< alert type="note" >}}
Feature flag implementations will affect self-hosted cloud-connected customers. These customers won't receive the model upgrade until the feature flag is removed from the AI gateway codebase, as they won't have access to the new GitLab release.
Feature flag implementations will affect self-hosted cloud-connected customers. These customers won't receive the model upgrade until the feature flag is removed from the AI gateway codebase, as they won't have access to the new GitLab release.
{{< /alert >}}
{{< /alert >}}
### Model Selection Implementation
### Model Selection Implementation
The model selection logic should be implemented in:
Implement model selection logic in:
- AI gateway client (`ee/lib/gitlab/llm/chain/requests/ai_gateway.rb`)
- AI gateway client (`ee/lib/gitlab/llm/chain/requests/ai_gateway.rb`)
- Model definitions in AI gateway
- Model definitions in AI gateway
- Any custom implementations in specific features that override the default model
- Any custom implementations in specific features
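The selection logic described above can be illustrated with a language-neutral sketch. The real logic lives in the Ruby client and the AI gateway model definitions; the Python below only shows the decision order (feature-specific override, then feature flag, then current default). The names `CURRENT_MODEL`, `NEW_MODEL`, `MIGRATION_FLAG`, and `select_model` are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical identifiers for illustration only; real model and flag names
# come from the migration issue and the feature flag definition.
CURRENT_MODEL = "current-model"
NEW_MODEL = "new-model"
MIGRATION_FLAG = "use_new_model"


def select_model(
    flag_enabled: Callable[[str, str], bool],
    user_id: str,
    feature_override: Optional[str] = None,
) -> str:
    """Pick the model for a request.

    Order of precedence:
    1. A feature-specific override, if the feature pins its own model.
    2. The new model, if the migration flag is enabled for this user.
    3. The current default model, which is also the rollback path.
    """
    if feature_override is not None:
        return feature_override
    if flag_enabled(MIGRATION_FLAG, user_id):
        return NEW_MODEL
    return CURRENT_MODEL


if __name__ == "__main__":
    # A stand-in flag check: the flag is on for a fixed set of users.
    enabled_users = {"user-1"}
    check = lambda flag, uid: flag == MIGRATION_FLAG and uid in enabled_users
    print(select_model(check, "user-1"))
    print(select_model(check, "user-2"))
```

Keeping the current model as the fall-through branch means disabling the flag restores the previous behavior without a code change.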
### Rollout Strategy
### Rollout Strategy
- Enable the feature flag for a small percentage of users/groups initially
1. **Enable feature flag** for a small percentage of users/groups
- Monitor performance metrics and error rates using:
1. **Monitor performance** using:
   - [Sidekiq Service dashboard](https://dashboards.gitlab.net/d/sidekiq-main/sidekiq-overview) for error ratios and response latency
   - [Sidekiq Service dashboard](https://dashboards.gitlab.net/d/sidekiq-main/sidekiq-overview)
   - [AI gateway metrics dashboard](https://dashboards.gitlab.net/d/ai-gateway-main/ai-gateway3a-overview?orgId=1) for gateway-specific metrics