Category Archives: Azure Databricks
Azure Databricks – How to read a CSV file from Blob Storage and push the data into a Synapse SQL pool table
In this blog, we will learn how to read a CSV file from Blob Storage and push the data into a Synapse SQL pool table using an Azure Databricks Python script. In Part 1 we created an Azure Synapse Analytics workspace and saw how to create a dedicated SQL pool. In this blog, we will use a JDBC connection string to connect to that SQL pool.

Step 1: Sign in to the Azure portal. Open Azure Databricks and click Launch Workspace to create a new notebook.

Step 2: Once the Azure Databricks workspace opens, click New Notebook and select your language; here I have selected Python.

Step 3: Add the following code to connect to your dedicated SQL pool using the JDBC connection string and push the data into a table.

Python script :

# Storage account details
storage_account_name = 'Your storage account name'
storage_account_access_key = 'Your storage account access key'

# Let Spark authenticate against the Blob Storage account
spark.conf.set('fs.azure.account.key.' + storage_account_name + '.blob.core.windows.net', storage_account_access_key)

# Build the wasbs:// path to the CSV file inside the container
blob_container = 'Your container name'
filePath = "wasbs://" + blob_container + "@" + storage_account_name + ".blob.core.windows.net/Your CSV file name"

# Read the CSV file into a Spark DataFrame
empDf = spark.read.format("csv").load(filePath, inferSchema=True, header=True)

# JDBC connection string of the dedicated SQL pool (see Step 4)
connectionString = "Your JDBC connection string;encrypt=true;trustServerCertificate=false;rewriteBatchedStatements=true;loginTimeout=30;"

# Append the DataFrame rows to the target table
empDf.write.jdbc(connectionString, "[dbo].[Employee]", mode="append")

Step 4: To get the JDBC connection string, open the Synapse workspace and, in the left pane under Analytics pools, open SQL pools. Select your SQL pool, click the "Show database connection strings" link on the Overview page, open the JDBC tab, and copy the connection string (an example of what the copied string looks like is shown at the end of this post).

Step 5: Now click the Run All button to execute the notebook. The script will execute successfully; you can also verify the data in the SQL table.

Hope this will help.
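For reference, the JDBC string copied from the portal generally follows the pattern below. This is only a sketch; the server name, database name, user, and password are placeholder assumptions that you must replace with your own values.

# Placeholder JDBC connection string for a dedicated SQL pool (values are not real)
connectionString = (
    "jdbc:sqlserver://yourserver.sql.azuresynapse.net:1433;"
    "database=yourpool;"
    "user=yourusername@yourserver;"
    "password=yourpassword;"
    "encrypt=true;trustServerCertificate=false;"
    "hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=30;"
)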
Azure Databricks – Part 1 – How to create an Azure Databricks workspace and a Spark cluster?
In this blog, we will learn how to create an Azure Databricks workspace and a Spark cluster step by step using the Azure portal.

Create an Azure Databricks workspace:

Step 1: To create an Azure Databricks workspace, sign in to the Azure portal. In the upper-left corner of the home page, select Create a resource. In the Search the Marketplace box, enter Azure Databricks and press Enter.

Step 2: Select Azure Databricks from the search results and click the Create button.

Step 3: On the create page, enter the following information: Subscription, Resource group, Workspace name, Region, and Pricing tier.

Step 4: Click the Review + create tab, then click the Create button. Once you click Create, it takes 3 to 4 minutes to create the resource.

Create a Spark cluster in Azure Databricks:

Step 1: In the Azure portal, go to the Databricks workspace that you created, and then click Launch Workspace.

Step 2: You are redirected to the Azure Databricks portal. From the portal, click New Cluster.

Step 3: On the New Cluster page, provide the values (cluster name, Databricks runtime version, worker type, and number of workers) to create the cluster. If you prefer to script this step instead of using the UI, a sketch is included at the end of this post.

Hope this will help.
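As an alternative to the portal UI, a cluster can also be created programmatically. The following is a minimal sketch that calls the Databricks Clusters REST API (api/2.0/clusters/create); the workspace URL, personal access token, runtime version, and node type below are placeholder assumptions, so adjust them to your workspace and region.

import requests

# Placeholders: replace with your workspace URL and a personal access token
workspace_url = "https://adb-1234567890123456.7.azuredatabricks.net"
token = "your personal access token"

# Minimal cluster specification; adjust spark_version and node_type_id as needed
cluster_spec = {
    "cluster_name": "demo-cluster",
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "autotermination_minutes": 60,
}

# Submit the cluster creation request to the workspace
response = requests.post(
    workspace_url + "/api/2.0/clusters/create",
    headers={"Authorization": "Bearer " + token},
    json=cluster_spec,
)
print(response.json())  # prints the new cluster_id on success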
Azure Databricks – Part 2 – How to read Amazon DynamoDB table data using Notebooks
In this blog, we will learn how to connect to Amazon DynamoDB and read the table data using a Python script, step by step.

Step 1: In the left pane, select Azure Databricks. From the Common Tasks, select New Notebook.

Step 2: In the Create Notebook dialog box, enter a name, select Python as the language, and select the Spark cluster that you created earlier.

Step 3: Once the notebook is created, you can write a Python script that connects to Amazon DynamoDB using the boto3 client library. To connect to DynamoDB you must have an AWS access key ID and an AWS secret access key.

Python script :

# Databricks notebook source
import boto3
import pandas as pd

# Create a boto3 session with your AWS credentials and region
session = boto3.session.Session(
    aws_access_key_id='your AWS access key ID',
    aws_secret_access_key='your AWS secret access key',
    region_name='your region'
)

# Get a handle to the DynamoDB table
dynamodb = session.resource("dynamodb")
table = dynamodb.Table("Table Name")

# Scan the table and load the returned items into a pandas DataFrame
response = table.scan()
items = response["Items"]
data = pd.DataFrame(items)

# Convert the DataFrame to CSV text and print it
output = data.to_csv(index_label="idx", encoding="utf-8")
print(output)

Step 4: Now you can check the output by pressing Shift + Enter or by clicking Run Cell. Note that a single scan call returns at most 1 MB of data; a sketch of how to page through larger tables is included at the end of this post.

Hope this will help.
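For larger tables, the scan has to be repeated until DynamoDB stops returning a LastEvaluatedKey. The following is a minimal sketch of that loop, reusing the same placeholder credentials and table name as the script above.

import boto3
import pandas as pd

session = boto3.session.Session(
    aws_access_key_id='your AWS access key ID',
    aws_secret_access_key='your AWS secret access key',
    region_name='your region'
)
table = session.resource("dynamodb").Table("Table Name")

# Keep scanning until no LastEvaluatedKey is returned, collecting every page of items
items = []
response = table.scan()
items.extend(response["Items"])
while "LastEvaluatedKey" in response:
    response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
    items.extend(response["Items"])

data = pd.DataFrame(items)
print(len(data), "rows read from DynamoDB")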