Local AI Assistance with Continue and Ollama for VS Code


Introduction

AI tools are changing how we write code. They can suggest solutions, automate boring tasks, and answer coding questions instantly. Many developers use online AI helpers like GitHub Copilot, but some want a tool that works on their own computer for better privacy and control.

This guide will show you how to set up your own AI coding assistant using two free tools: Continue (a VS Code extension) and Ollama (a program that runs AI models on your computer). With this setup, you’ll have a coding expert always ready to help, right on your own machine, without sending your code to the internet.

Why Go Local?

Before we dive into the setup, let’s briefly discuss why you might want to consider a local AI coding assistant:

  1. Privacy: Your code and queries never leave your machine, ensuring complete confidentiality.
  2. Customization: You have full control over which models to use and can easily switch between them.
  3. Offline Access: Work with AI assistance even without an internet connection.
  4. Cost-Effective: No subscription fees or usage limits to worry about.
  5. Learning Opportunity: Gain insights into how AI models work and experiment with different configurations.

Tools We’ll Be Using

Our local AI coding assistant setup uses two main components:

  1. Continue: An open-source VS Code extension that provides AI-powered coding assistance.
  2. Ollama: A tool for easily running large language models on your local machine.

In this guide, we’ll be focusing on the following models:

  • Llama 3.1 8b: A powerful general-purpose model that performs well for coding tasks.
  • Gemma 2 9b: Another general-purpose model, offering a different perspective from Llama.
  • StarCoder2 3b and 7b: Code-specific models optimized for programming tasks.
  • Nomic Embed Text: A specialized model for generating text embeddings, crucial for code search and understanding.

ℹ️ Note: You don’t need to install all of these models. For chat functionality, choose either Llama 3.1 8b or Gemma 2 9b. For code completion, select either StarCoder2 3b or 7b depending on your system’s capabilities and available VRAM. In this guide, we’ll download all models to explore their performance, but you can select the ones that best fit your needs and system resources.

Setting Up Your Environment

Installing Continue

  1. Open Visual Studio Code.
  2. Go to the Extensions view (Cmd+Shift+X on Mac, Ctrl+Shift+X on Windows/Linux).
  3. Search for “Continue” and install the extension by Continue.dev.
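
If you prefer the terminal, you can also install the extension with VS Code’s CLI. This assumes the code command is on your PATH; the extension ID below is the one published by Continue.dev:

# Install the Continue extension from the command line
code --install-extension Continue.continue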

Installing Ollama

  1. Visit ollama.com and follow the installation instructions for your operating system.
  2. Once installed, open a terminal; you can run Ollama from the command line.
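
A quick sanity check from the terminal confirms the CLI and server are available:

# Check that the Ollama CLI is installed
ollama --version

# The server normally starts with the app; if it isn't running, start it manually
ollama serve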

Configuring Ollama for Continue Integration

By default, Ollama listens only on localhost, which is usually sufficient when Continue and Ollama run on the same machine. If Continue needs to reach Ollama over the network (for example, when VS Code runs inside a container, over SSH, or on another host), configure Ollama to listen on all network interfaces.

For macOS:

  1. If Ollama is run as a macOS application, set the environment variable using launchctl:
    launchctl setenv OLLAMA_HOST "0.0.0.0"
  2. Restart the Ollama application.

For Linux: If Ollama is run as a systemd service:

  1. Edit the systemd service:
    sudo systemctl edit ollama.service
  2. Add the following lines to the editor that opens:
    [Service]
    Environment="OLLAMA_HOST=0.0.0.0"
  3. Save and exit the editor.
  4. Reload systemd and restart Ollama:
    sudo systemctl daemon-reload
    sudo systemctl restart ollama

Note: Setting OLLAMA_HOST to “0.0.0.0” makes Ollama accept connections from any IP address on your network, not just your own machine. Only enable this if Continue actually connects over the network, and be mindful of the exposure on shared or untrusted networks.
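
Whichever setup you use, verify the server is reachable before moving on. Ollama listens on port 11434 by default and answers plain HTTP requests on its root endpoint:

# Should print "Ollama is running"
curl http://localhost:11434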

Downloading Models

With Ollama installed, it’s time to download the models. Remember, you don’t need all of these models – choose based on your needs and system capabilities.

Note: For chat functionality, choose either Llama 3.1 8b or Gemma 2 9b. For code completion, select either StarCoder2 3b or 7b depending on your system’s VRAM. The Nomic Embed Text model is used for embeddings and is recommended for all setups.

Here are the commands to download each model:

# Choose one of these for chat functionality:
ollama pull llama3.1:8b  # ~4.7 GB
# OR
ollama pull gemma2:9b    # ~5 GB

# Choose one of these for code completion:
ollama pull starcoder2:3b # ~1.7 GB
# OR
ollama pull starcoder2:7b # ~4 GB

# Required for embeddings:
ollama pull nomic-embed-text # ~274 MB

Run the commands for the models you’ve chosen. The download process might take some time depending on your internet connection.

Important: Ensure you have sufficient disk space before proceeding. If you download all models, you’ll need around 16 GB of free space.

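After the downloads finish, it’s worth checking what actually landed on disk:

# List downloaded models with their tags and sizes
ollama list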

Configuring Continue with Ollama

Now that we have Continue and Ollama set up with our models downloaded, let’s configure Continue to use these local models:

  1. In VS Code, open the Command Palette (Cmd+Shift+P on Mac, Ctrl+Shift+P on Windows/Linux).
  2. Type “Continue: Open Config” and select it.
  3. In the config.json file, add or modify the following sections:

Note: include only the models you’ve downloaded; remove any entries you skipped, and point tabAutocompleteModel at whichever StarCoder2 variant you pulled.

{
  "models": [
    {
      "title": "Llama 3.1 8b",
      "provider": "ollama",
      "model": "llama3.1:8b"
    },
    {
      "title": "Gemma 2 9b",
      "provider": "ollama",
      "model": "gemma2:9b"
    },
    {
      "title": "StarCoder2 3b",
      "provider": "ollama",
      "model": "starcoder2:3b"
    },
    {
      "title": "StarCoder2 7b",
      "provider": "ollama",
      "model": "starcoder2:7b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "StarCoder2 3b",
    "provider": "ollama",
    "model": "starcoder2:3b"
  },
  "embeddingsProvider": {
    "title": "Nomic Embed Text",
    "provider": "ollama",
    "model": "nomic-embed-text"
  }
}

This configuration registers several models for chat-based interactions, uses StarCoder2 3b for tab autocompletion, and uses Nomic Embed Text to generate embeddings for code search and understanding.
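
Before leaning on the editor integration, you can confirm Ollama serves the configured models directly. Here is a minimal check against Ollama’s REST API (the prompt is just an illustration):

# Ask the autocomplete model for a quick completion
curl http://localhost:11434/api/generate -d '{
  "model": "starcoder2:3b",
  "prompt": "def fibonacci(n):",
  "stream": false
}'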

Understanding the strengths and use cases of each model is crucial for optimizing your local AI coding assistant. Let’s explore the characteristics of the models we’ve set up:

General-Purpose Models: Llama 3.1 8b and Gemma 2 9b

Both Llama 3.1 8b and Gemma 2 9b are versatile models suitable for a wide range of tasks, including coding assistance. Here’s how they compare:

  • Llama 3.1 8b:

    • Strengths: Broad knowledge base, good performance across various programming languages.
    • Use cases: General coding queries, explaining complex concepts, brainstorming ideas.
  • Gemma 2 9b:

    • Strengths: Strong performance for its size, with a different training lineage than Llama, so its answers often complement Llama’s.
    • Use cases: Exploring alternative approaches, getting a second opinion on tricky problems.

In practice, you might find that one model performs better for certain types of queries or programming languages. It’s worth experimenting with both to see which one aligns better with your specific needs.

Code-Specific Models: StarCoder2 3b and 7b

StarCoder2 models are specifically trained on code, making them excellent choices for programming tasks:

  • StarCoder2 3b:

    • Strengths: Fast response times, efficient for quick suggestions.
    • Use cases: Real-time code completions, quick syntax checks, rapid prototyping.
  • StarCoder2 7b:

    • Strengths: More comprehensive code understanding, potentially more accurate for complex tasks.
    • Use cases: Detailed code explanations, complex refactoring suggestions, solving algorithmic problems.

The choice between 3b and 7b depends on your hardware capabilities and the complexity of your coding tasks. If you have a powerful system, StarCoder2 7b might provide more sophisticated assistance, while StarCoder2 3b is excellent for snappy, day-to-day coding help.

Embeddings Model: Nomic Embed Text

Nomic Embed Text plays a crucial role in enhancing Continue’s code search and understanding capabilities:

  • Purpose: Generates vector representations (embeddings) of code snippets and queries.
  • Use in RAG (Retrieval-Augmented Generation): These embeddings enable semantic search within your codebase. When you ask a question or request assistance, Continue can use these embeddings to find the most relevant parts of your code, providing context-aware responses.
  • Benefits: Improves the relevance of AI suggestions by grounding them in your specific codebase, enhancing features like code search and contextual completions.

By leveraging Nomic Embed Text, Continue can offer more tailored and project-specific assistance, going beyond generic coding knowledge to understand and work with your unique codebase.
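
If you’re curious what these embeddings look like, you can request one directly from Ollama. A minimal sketch using its embeddings endpoint (the code snippet in the payload is just an example input):

# Returns a JSON object with an "embedding" array of floats
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "function add(a, b) { return a + b; }"
}'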

Leveraging Continue’s Features

Now that we have our local AI coding assistant set up, let’s explore its key features and how to use them effectively:

Chat (⌘L / Ctrl+L)

The Chat feature allows you to interact with the AI model directly within your IDE.

  • Basic Use: Press ⌘L (Mac) or Ctrl+L (Windows/Linux) to open the chat sidebar. Type your question or request and press Enter.
  • Code Context: Highlight code before opening chat to include it as context.
  • Advanced Context: Use ’@’ to access additional context like @Codebase or @Files.

In this video you can see me asking Llama 3.1 8b what the Fibonacci series is and for a JavaScript function that prints the series up to a given input number:
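
If you’d like to try the same prompt outside the editor, Ollama’s CLI can run it directly (the wording of the prompt is just an example):

# Same question, asked from the terminal
ollama run llama3.1:8b "What is the Fibonacci series? Write a JavaScript function that prints the series up to a given number."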

Autocomplete

Autocomplete provides AI-powered code suggestions as you type.

  • Enable: Click the “Continue” button in the status bar or check “Enable Tab Autocomplete” in settings.
  • Accept Suggestion: Press Tab to accept a full suggestion.
  • Partial Accept: Use ⌘→ (Mac) or Ctrl+→ (Windows/Linux) to accept parts of the suggestion word-by-word.
  • Reject: Press Esc to reject a suggestion.

Note: Autocomplete uses the model specified in your config.json file, not the one selected in the chat dropdown.

Edit (⌘I / Ctrl+I)

Edit allows you to modify code directly using AI suggestions.

  • Basic Use: Highlight code, press ⌘I (Mac) or Ctrl+I (Windows/Linux), then describe the desired changes.
  • Accept Changes: Use ⌘⌥Y (Mac) or Ctrl+Alt+Y (Windows/Linux) to accept, or ⇧⌘↵ (Mac) / Shift+Ctrl+Enter (Windows/Linux) to accept all.
  • Reject Changes: Use ⌘⌥N (Mac) or Ctrl+Alt+N (Windows/Linux) to reject, or ⇧⌘⌫ (Mac) / Shift+Ctrl+Backspace (Windows/Linux) to reject all.

In this video you can see how I asked Continue, using Llama 3.1 8b, to write a Fibonacci function:

Actions

Actions are shortcuts for common coding tasks.

  • Slash Commands: Type ’/’ in the chat to access built-in commands like /edit or /comment.
  • Quick Actions: (VS Code only) Enabled via settings, these appear as buttons above classes and functions.
  • Right-Click Actions: (VS Code only) Highlight code, right-click, and select an action from the menu.
  • Debug Action: (VS Code only) Use ⇧⌘R (Mac) or Ctrl+Shift+R (Windows/Linux) to get debugging advice based on terminal output.

Model Selection

You can switch between different AI models for chat:

  • Use the dropdown in the Continue sidebar to select different models.
  • Press ⌘’ (Mac) or Ctrl+’ (Windows/Linux) to quickly switch between models.

Remember, the model used for autocompletion is set in the config.json file and is separate from the chat model selection.

These features provide a powerful set of tools to enhance your coding workflow. Experiment with different features to find what works best for your coding style and projects.

Best Practices and Tips

  1. Experiment with different models: Each model has its strengths, so try them out for different tasks to see which performs best for your needs.

  2. Use tab completion judiciously: While powerful, AI-suggested completions aren’t always perfect. Use them as a guide but always review the suggestions.

  3. Leverage context providers: Continue offers various context providers (like @codebase or @file) to give the AI more information about your project. Use these to get more accurate and relevant assistance.

  4. Keep models updated: Periodically check for updates to the models using ollama pull <model_name> to ensure you’re using the latest versions (see the snippet after this list).

  5. Consider compute resources: Larger models like StarCoder2 7b may require more computational power. If you notice slowdowns, consider using smaller models for day-to-day tasks and larger ones for more complex queries.

  6. Compare model outputs: When faced with a challenging problem, try asking the same question to different models. Comparing the outputs can provide diverse perspectives and help you arrive at the best solution.

  7. Use StarCoder2 for code-specific tasks: While Llama and Gemma are powerful general-purpose models, StarCoder2 is specifically trained on code. Prefer StarCoder2 models for tasks like code completion, refactoring, and language-specific queries.
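
One convenient way to refresh everything at once is a small shell one-liner, a sketch that assumes ollama list prints the model name in its first column:

# Re-pull every installed model to pick up updates
ollama list | awk 'NR>1 {print $1}' | xargs -n1 ollama pull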

Comparing Local Setup to Cloud-Based Alternatives

While cloud-based solutions like GitHub Copilot offer seamless integration and access to very large models, our local setup with Continue and Ollama provides several advantages:

  1. Privacy: Your code never leaves your machine, which is crucial for sensitive projects.
  2. Customization: You have full control over which models to use and can easily switch between them or even fine-tune them for your specific needs.
  3. No Subscription Costs: Once set up, you can use these tools without ongoing fees.
  4. Offline Work: Your AI assistant works even without an internet connection.
  5. Learning Opportunity: This setup allows you to dive deeper into how AI coding assistants work, providing valuable insights for developers interested in AI and machine learning.

However, it’s worth noting that cloud solutions may offer more up-to-date models and potentially more powerful hardware for running larger models. The choice between local and cloud-based solutions often comes down to your specific needs, privacy concerns, and the nature of your projects.

Conclusion

Setting up Continue and Ollama on your computer gives you a powerful AI coding helper that’s all yours. You can choose different AI models for different jobs - like using StarCoder2 for quick code suggestions or Llama for solving tricky problems. This setup works offline and keeps your code private, which is great for sensitive projects.

With this tool, you’re not just following the AI trend - you’re taking control of it. You can customize it to fit how you work best. As you use it more, you’ll find new ways to make your coding faster and smarter.

Remember, AI is always improving. With your local setup, it’s easy to add new models as they come out. This means your coding assistant can keep getting better over time. You’ve now got a cutting-edge tool that not only helps you code better today but also prepares you for the future of programming. Enjoy your new AI coding sidekick and happy coding!
