
How to Integrate Databricks in Visual Studio Code Using GitHub Copilot?

Setting Up Databricks with Visual Studio Code and GitHub Copilot

Integrating GitHub Copilot with your Databricks environment through Visual Studio Code can speed up data wrangling and analysis. The combination improves code completion and automates repetitive tasks, which shortens your development cycle.

Prerequisites:

  • A Databricks workspace with an active cluster.

Click here to set up your Databricks account.

  • Visual Studio Code installed on your system.

Click here to download and install Visual Studio Code. 

  • A GitHub account and a paid or trial subscription to GitHub Copilot.

Click here to set up your GitHub Copilot account.

Steps:

1. Install the Databricks Extension for VS Code:

  • Open VS Code and navigate to the Extensions tab (Ctrl+Shift+X or Cmd+Shift+X).
  • Search for “Databricks” and install the official extension published by Databricks.

  • Follow the on-screen instructions to configure your Databricks workspace and cluster connectivity.

Configure the Databricks plugin in Visual Studio Code.

If you’ve used the Databricks CLI before, your connection may already be configured in the ~/.databrickscfg file. If not, use the link below to download and install the Databricks CLI.

Click here to Download CLI

or

Create a ~/.databrickscfg file with the following contents:

[DEFAULT]
host = https://your-databricks-workspace-url
token = your-access-token
jobs-api-version = 2.0
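
Before pointing the extension at this file, it can help to confirm that the profile parses and contains both required values. The following is a minimal sketch using only the Python standard library; the function name `load_profile` is a hypothetical helper for illustration, not part of the Databricks CLI or the extension.

```python
import configparser
from pathlib import Path


def load_profile(path=Path.home() / ".databrickscfg", profile="DEFAULT"):
    """Return the host and token stored under `profile` in a .databrickscfg file."""
    # Disable interpolation so a '%' inside a token is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        raise FileNotFoundError(f"No config file at {path}")
    # configparser does not report DEFAULT through has_section(), so skip that check for it.
    if profile != parser.default_section and not parser.has_section(profile):
        raise KeyError(f"Profile {profile!r} not found in {path}")
    section = parser[profile]
    missing = [key for key in ("host", "token") if not section.get(key)]
    if missing:
        raise ValueError(f"Profile {profile!r} is missing: {', '.join(missing)}")
    return {"host": section["host"].rstrip("/"), "token": section["token"]}


if __name__ == "__main__":
    try:
        settings = load_profile()
        print(f"Profile OK, host = {settings['host']}")
    except (FileNotFoundError, KeyError, ValueError) as err:
        print(f"Config problem: {err}")
```

The script only checks that the file is well formed; it does not contact the workspace or verify that the token is valid.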

Select the “Configure Databricks” option, choose the first entry in the dropdown menu (it shows the hostname you set up in the previous step), and proceed with the “DEFAULT” profile.

You can click on “Show Quickstart” for more details.

Once you finish setting up, Visual Studio Code connects to Databricks. You’ll find workspace and cluster info by clicking the Databricks plugin.

After setting up your GitHub Copilot account, confirm that your subscription is active. Then install the GitHub Copilot and GitHub Copilot Chat extensions for VS Code from the Marketplace.

2. Install the GitHub Copilot Extension:

  • Go back to the Extensions tab and search for “GitHub Copilot”.
  • Select the official extension by GitHub and click “Install”.

  • Complete the setup process, including linking your GitHub account and activating your Copilot subscription.

After installing the GitHub Copilot and Copilot Chat extensions, sign in to GitHub Copilot from within Visual Studio Code. If the authorization prompt doesn’t appear, click the bell icon in the bottom panel of Visual Studio Code.


3. Connect to your Databricks Workspace:

  • Open the Command Palette (Ctrl+Shift+P or Cmd+Shift+P) and type “Databricks: Connect Cluster”.
  • Select your Databricks workspace and cluster from the list.
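
If the cluster list comes up empty, one way to check the connection outside the extension is to call the Databricks REST API directly. The sketch below assumes the same host and personal access token stored in ~/.databrickscfg and uses the Clusters API list endpoint (`/api/2.0/clusters/list`); the helper names `build_clusters_request` and `list_clusters` are hypothetical.

```python
import json
import urllib.request


def build_clusters_request(host: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for the Clusters API list endpoint."""
    url = f"{host.rstrip('/')}/api/2.0/clusters/list"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def list_clusters(host: str, token: str, timeout: float = 30.0) -> list:
    """Return (cluster_id, cluster_name, state) tuples for the workspace."""
    request = build_clusters_request(host, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    # A workspace with no clusters may omit the "clusters" key entirely.
    return [
        (c.get("cluster_id"), c.get("cluster_name"), c.get("state"))
        for c in payload.get("clusters", [])
    ]
```

Calling `list_clusters("https://your-databricks-workspace-url", "your-access-token")` should return one tuple per cluster. An HTTP 401 or 403 error usually points to a problem with the token rather than with the extension.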

4. Start Developing with Copilot’s Assistance:

  • Open a new notebook, Python script, or SQL file in VS Code.
  • Start typing your code, and Copilot will suggest relevant completions based on your context and the Databricks libraries you’re using.
  • Accept a suggestion by pressing Tab, or cycle through alternative suggestions with Alt+] and Alt+[ (Option+] and Option+[ on macOS).
  • Copilot’s suggestions can be incorrect or outdated, so review and adapt them before running your code.


Benefits:

  • Faster Coding: Automate repetitive tasks and generate common code patterns with Copilot’s suggestions.
  • Reduced Errors: Less manual typing means fewer typos and syntax errors.
  • Improved Productivity: Focus on the complex logic while Copilot handles the boilerplate code.
  • Exploration and Learning: Discover new approaches and libraries suggested by Copilot.

Example Code:

Scenario: You’re writing a simple Python script in VS Code to read data from a Databricks Delta table and calculate basic statistics.

Without Copilot:

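# Import libraries (requires the databricks-connect package)
from databricks.connect import DatabricksSession
from pyspark.sql import functions as F

# Connect to Databricks using your default authentication settings
spark = DatabricksSession.builder.getOrCreate()

# Define table path (placeholder; replace with your Delta table's location)
table_path = "/dbfs/myworkspace/data/my_table"

# Read data as DataFrame
df = spark.read.format("delta").load(table_path)

# Calculate mean of a column
avg_value = df.agg(F.avg("column_name")).first()[0]

# Print result
print(f"Average value: {avg_value}")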

With Copilot:

(After typing “from databricks.connect import DatabricksSession” and accepting suggestions with Tab; the exact suggestions vary from session to session):

from databricks.connect import DatabricksSession
from pyspark.sql import functions as F

# Connect to Databricks (Copilot may suggest the following line)
spark = DatabricksSession.builder.clusterId("my_cluster_id").getOrCreate()

# Define table path (placeholder; replace with your Delta table's location)
table_path = "/dbfs/myworkspace/data/my_table"

# Read data as DataFrame (Copilot may suggest `spark.read.format("delta").load(...)`)
df = spark.read.format("delta").load(table_path)

# Calculate mean of a column (Copilot may suggest `df.agg(F.avg(...))`)
avg_value = df.agg(F.avg("column_name")).first()[0]

# Print result
print(f"Average value: {avg_value}")

Crafting a Data Engineering Pipeline with GitHub Copilot

Data engineers can use GitHub Copilot to speed up the creation of data engineering pipelines and to generate documentation alongside the code. A basic pipeline can be built step by step by prompting Copilot for each stage. For example, start with a prompt such as:

“Write code to read files from an AWS S3 bucket using the PySpark framework.”
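
For orientation, code produced from a prompt like this might resemble the sketch below. It is an illustration and not actual Copilot output: the helper names `s3_uri` and `read_csv_from_s3`, the CSV file format, and the `s3a://` scheme are all assumptions, and `spark` is expected to be an existing SparkSession whose cluster already has access to the bucket.

```python
def s3_uri(bucket: str, prefix: str = "") -> str:
    """Build an s3a:// URI from a bucket name and an optional key prefix."""
    return f"s3a://{bucket}/{prefix.lstrip('/')}"


def read_csv_from_s3(spark, bucket: str, prefix: str = ""):
    """Read CSV files (with a header row) under the prefix into a Spark DataFrame."""
    return (
        spark.read
        .option("header", "true")       # first row holds column names
        .option("inferSchema", "true")  # let Spark guess column types
        .csv(s3_uri(bucket, prefix))
    )
```

A call such as `df = read_csv_from_s3(spark, "my-bucket", "raw/events/")` would then load every CSV file under that prefix. The bucket name and prefix here are placeholders.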

Remember:

  • Use Copilot responsibly and ethically, citing sources for any generated code snippets.
  • Always review and understand the code before running it, especially in production environments.

Start using this powerful combination today and witness how GitHub Copilot in VS Code can elevate your Databricks development experience!

