Unverified · Commit 22068642 · Author: Madelein van Niekerk · Committer: GitLab

Add instructions and a helper to generate issue embeddings locally

Parent 01b7977f
@@ -76,3 +76,48 @@ The following process outlines the steps to get embeddings generated and stored
1. Add a new unit primitive: [here](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/merge_requests/918) and [here](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/155835).
1. Use `Elastic::ApplicationVersionedSearch` to access callbacks and add the necessary checks for when to generate embeddings. See [`Search::Elastic::IssuesSearch`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/models/concerns/search/elastic/issues_search.rb) for an example.
1. Backfill embeddings: [example](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/154940).
## Adding issue embeddings locally
### Prerequisites
1. [Make sure Elasticsearch is running](../advanced_search.md#setting-up-development-environment).
1. If you have an existing Elasticsearch setup, make sure the `AddEmbeddingToIssues` migration has been completed by executing the following until it returns:
```ruby
Elastic::MigrationWorker.new.perform
```
1. Make sure you can run [GitLab Duo features on your local environment](../ai_features/index.md#instructions-for-setting-up-gitlab-duo-features-in-the-local-development-environment).
1. Run the following in a Rails console and confirm that it outputs an embedding (a vector of 768 dimensions). If it does not, there is a problem with the AI setup.
```ruby
Gitlab::Llm::VertexAi::Embeddings::Text.new('text', user: nil, tracking_context: {}, unit_primitive: 'semantic_search_issue').execute
```
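The check in the last prerequisite can be made explicit with a small shape test. This is a minimal sketch, not part of GitLab: `valid_embedding?` is a hypothetical helper, and it assumes `execute` returns a plain Ruby array of numbers, which the source does not state.

```ruby
# Hypothetical helper (not a GitLab API): verifies the shape of an embedding.
# Assumes the embedding is a plain Array of numbers.
EXPECTED_DIMENSIONS = 768 # dimension count stated in the prerequisites

def valid_embedding?(embedding, dimensions: EXPECTED_DIMENSIONS)
  embedding.is_a?(Array) &&
    embedding.length == dimensions &&
    embedding.all? { |value| value.is_a?(Numeric) }
end
```

In a Rails console, the result of the `Gitlab::Llm::VertexAi::Embeddings::Text` call above could be passed to `valid_embedding?` to confirm the AI setup before starting a backfill.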
### Running the backfill
To backfill embeddings for all issues in a project, run the following in a Rails console:
```ruby
Gitlab::Duo::Developments::BackfillIssueEmbeddings.execute(project_id: project_id)
```
The task adds the issues to a queue and processes them in batches, indexing embeddings into Elasticsearch.
It respects a rate limit of 450 embeddings per minute. Reach out to `@maddievn` or `#g_global_search` in Slack if there are any issues.
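The rate limit gives a rough idea of how long a backfill takes. The sketch below assumes that each one-minute pass embeds at most 450 issues; `estimated_backfill_passes` is a hypothetical helper for illustration, not part of GitLab, and the real duration also depends on request latency.

```ruby
EMBEDDINGS_PER_MINUTE = 450 # rate limit stated above

# Rough lower bound on the number of one-minute passes a backfill needs,
# assuming each pass handles at most `per_minute` issues.
def estimated_backfill_passes(issue_count, per_minute: EMBEDDINGS_PER_MINUTE)
  raise ArgumentError, 'issue_count must be non-negative' if issue_count.negative?

  (issue_count.to_f / per_minute).ceil
end

puts estimated_backfill_passes(1_000) # 1,000 issues need at least 3 passes
```

For a project with a few thousand issues, expect the backfill to run for several minutes.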
### Verify
If the following returns 0, all issues for the project have embeddings:
<details><summary>Expand</summary>
```shell
curl "http://localhost:9200/gitlab-development-issues/_count" \
--header "Content-Type: application/json" \
--data '{"query": {"bool": {"filter": [{"term": {"project_id": PROJECT_ID}}], "must_not": [{"exists": {"field": "embedding"}}]}}}' | jq '.count'
```
</details>
Replace `PROJECT_ID` with your project ID.
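The same check can be run from Ruby instead of `curl`. This is a sketch under the same assumptions as the command above (Elasticsearch at `http://localhost:9200`, index `gitlab-development-issues`); `missing_embeddings_query` and `missing_embeddings_count` are hypothetical names, not GitLab helpers.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Builds the same query body as the curl command: documents in the project
# that have no `embedding` field.
def missing_embeddings_query(project_id)
  {
    query: {
      bool: {
        filter: [{ term: { project_id: project_id } }],
        must_not: [{ exists: { field: 'embedding' } }]
      }
    }
  }
end

# Sends the count request and returns the number of issues without embeddings.
# The default URL mirrors the curl example and is an assumption about your setup.
def missing_embeddings_count(project_id, base_url: 'http://localhost:9200')
  uri = URI("#{base_url}/gitlab-development-issues/_count")
  body = JSON.generate(missing_embeddings_query(project_id))
  response = Net::HTTP.post(uri, body, 'Content-Type' => 'application/json')
  JSON.parse(response.body).fetch('count')
end
```

A return value of `0` from `missing_embeddings_count(project_id)` means every indexed issue in the project has an embedding.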
# frozen_string_literal: true

module Gitlab
  module Duo
    module Developments
      # Development helper: queues every issue of a project for embedding
      # generation and drains the queue until it is empty.
      class BackfillIssueEmbeddings
        def self.execute(project_id:)
          issues_to_backfill = Project.find(project_id).issues

          puts "Adding #{issues_to_backfill.count} issue embeddings to the queue"

          # Track each issue so the bookkeeping service picks it up.
          issues_to_backfill.each_batch do |batch|
            batch.each do |issue|
              ::Search::Elastic::ProcessEmbeddingBookkeepingService.track_embedding!(issue)
            end
          end

          # Process the queue in passes, pausing between passes to respect the rate limit.
          while ::Search::Elastic::ProcessEmbeddingBookkeepingService.queue_size > 0
            puts "Queue size: #{::Search::Elastic::ProcessEmbeddingBookkeepingService.queue_size}"
            ::Search::Elastic::ProcessEmbeddingBookkeepingService.new.execute

            if ::Search::Elastic::ProcessEmbeddingBookkeepingService.queue_size > 0
              puts 'Sleeping for 1 minute...'
              sleep(60)
            end
          end

          puts "Finished processing the queue.\nAll issues for project (#{project_id}) now have embeddings."
        end
      end
    end
  end
end