Tag Archives: Azure Data Factory
How to create a Xero Data Source in Azure Data Factory
Hello, in this blog we will understand the steps required to create Xero as a data source in Azure Data Factory which can be then used to copy data from Xero into various target systems such as Azure SQL. Create a new Data Source in Azure Data Factory and search for Xero. Add the desired name and description for the data source. Select OAuth 2.0 as authentication and select the host as api.xero.com Go to App management | Xero (https://developer.xero.com/app/manage) and create a new Web app. Enter developer.xero.com as company/application URL. Go to OAuth 2.0 configuration copy the client id and client secret after generating it. Download and install Postman windows app and import this collection in Postman Collection Web View | Postman (getpostman.com) Create a new environment and initialize the following variables. Client ID, Client Secret are the same that are obtained from Xero. For “scopes” variable add offline_access accounting.transactions as initial value and openid profile email accounting.contacts accounting.settings as current value respectively. Save the environment. No go to collections and select ‘Get Started’, under the Auth tab select type as OAuth 2.0 Make sure the environment we just created is selected Call all the variables set in the environment by typing “{{” in their respective fields. Add https://login.xero.com/identity/connect/authorize to the Auth URL field Add https://identity.xero.com/connect/token to the Access Token Field Save the collection. Now After saving, click on Get new access token. Sign in into Xero and provide access to the organization from the pop-up window that appears Now select the access token which will be shown on screen, right click and select Set “your environment name” put its value in the access token variable whose value we have set before. Set value of refresh token variable using the same method and paste this refresh token in Azure data factory field as well. After setting Access token and refresh token, scroll to the top and click on Use this token button. Send the GET request to Xero API After successful API call you will get the tenant id as well, paste this tenant id in Azure data factory. Make sure encrypted endpoints are enabled. Test the connection and after successful testing click on apply The Xero Connector is ready for use. Hope this blog helped!!
Trigger Azure Pipeline with logic app
Hello, friends in this blog we will see how to trigger an azure data factory pipeline using a logic app. Step 1: Create an Azure Data Factory pipeline for your integration. Step 2: Create a logic app of your preference, for this blog, I am creating an HTTP trigger logic app. Step 3: Now click on add step and search for Azure data factory. Step 4: Select create pipeline run and fill in the required information. Step 5: Trigger your logic app and let it finish the run. Once that is done go to the monitor section of your Data factory and check whether the integration pipeline is triggered or not. Since we are triggering the pipeline from the logic app the triggered will be a manual trigger instead of your ADF trigger name. Hope this helps.
ADF’s Wrangling Data Flow (Power Query)– How do you get matched rows from the two data sources using Inner Joins?
In this blog, we will learn how to get matched rows from the two data sources using inner join in ADF’s Wrangling Data Flow step by step. Step 1: Add a Power query flow as per the below screenshot. Step 2: In the New power query give the proper power query name and add the data source that you want to merge. Here I am adding two datasets named “DS_EMP1” and “DS_EMP2”, both data sources have employee information. Step 3: By default, the UserQuery will point to the first dataset query. All the transformation should be done on the UserQuery. Step 4: Now click on Merge queries to merge your dataset. Step 5: select a table and matching columns to create a merge table, here I have select EmpID as a common key to merge the data, and the join kind will be “Inner”. Step 6: Once you click the OK button, you got a warning “Nested join must be expanded”. Step 7: Click on expand dataset button to expand your result and select columns whatever you want from the other data source, here in my case both the datasets have the same column name so I deselect all the columns from the result dataset. Step 8: Now the UserQuery will show the matched rows, that’s all you need to do to get matched rows in two data sources. Hope this will help.
ADF’s Mapping Data flows – How do you get matched rows from the two data sources using Inner Joins?
In this blog, we will learn how to get matched rows from the two data sources using inner join in ADF’s Mapping Data flows step by step. Step 1: Add a data flow activity and name as “InnerJoin_Test”, in the settings tab add a new data flow. Select the Source Settings tab, add a source transformation, and connect it to one of your datasets. Step 2: In the Data preview tab you can see your data. Step 3: Add another source and name “Employee2”, in the source settings tab connect it to one of your datasets. Step 4: In the Data preview tab you can see your data. Step 5: Add a Join transformation, named “InneJoin”. The Join transform will allow you to join 2 Data flow. In the Join settings tab set left the stream and right stream and select join type as inner. Apply to join conditions on the unique field, in this demo I pick up “Emp Id” as a join condition. Step 6: In the Data preview tab you can see the matched rows result, that’s all you need to do to get matched rows in two data sources. Hope this will help.
ADF’s Mapping Data flows – How do you get distinct rows and rows count from the data source?
In this blog, we will learn how to get distinct rows and rows count from the data source via ADF’s Mapping Data flows step by step. Step 1: Create an Azure Data Pipeline. Step 2: Add a data flow activity and name as “DistinctRows”. Step 3: Go to settings and add a new data flow. Select the Source Settings tab, add a source transformation, and connect it to one of your datasets. Step 3: In the Projection tab, it allows you the change the column data type. Here I have changed my Emp ID column to Integer. Step 4: In the Data preview tab you can see your data. Step 5: Add an Aggregate transformation, named “DistinctRows”. In the group by settings, you need to choose which column or combination of columns will make up the key(s) for ADF to determine distinct rows, here in this demo I pick up “Emp ID” as my key columns. Step 6: The inherent nature of the aggregate transformation is to block all metadata columns not used in the aggregate. But here, we are using the aggregate to filter out non-distinct rows, so we need every column from the original dataset. To do this, go to the aggregate settings and choose the column pattern. Here, you will need to make a choice between including the first set of values from the duplicate rows, or the last. Essentially, choose which row you want to be the source of truth. Step 7: That’s all you need to do to find distinct rows in your data, click on the Data preview tab to see the result. You can see the duplicate data have been removed. Step 8: The row counts are just aggregate transformation, to create a row counts go to Aggregate settings and use the function count(1). This will create a running count of every row. Hope this will help.
Creating On Premise SQL Server Linked Service in ADF
In this blog we will see how we can create on-premise SQL server linked serviceSteps Create Linked Service in ADF for SQL server It will ask for Server name, database and credentials along with integration runtime as we are using the On-premise SQL Server, we must configure Integration runtime. We have already created integration runtime (Self hosted) created. Below are steps to create Integration runtime (Self hosted) Click on new and select self-hosted integration runtime, Enter the name and Integration run time gets created, once it created it will displays following details, and click on express setup. Once downloaded, execute the setup, On azure portal we can see that, it is running Now we will create connection for the Salesforce and make sure that both source and destination running under same Integration runtime. Now we will create Datasets for the on-premise SQL and salesforce. Hope this helps!
Integration of CSV file from ADLS to Azure SQL DB using ADF
Go through the blog to understand how to integrate a CSV file stored in your ADLS to Azure SQL DB.
Creating Azure Data Factory
Login to azure portal, click on create a resource>Analytics>Data Factory Enter the required details and data factory gets created, enable git option will prompt to enter the git repository details which can later be used for CI/CD. In this way we can create Azure Data factory.
Incremental refresh in Azure Data Factory
In this article we are going to do Incremental refresh for Account entity from Dynamics 365 CRM to Azure SQL. Steps: Create Linked Service for Azure SQL and Dynamics 365 CRM and create a table in Azure SQL DB Now we will create pipeline, in the pipeline we have two blocks, one is for getting data from last modified date and other block is used to Copy data from D365 CRM to Azure SQL recordmark block:In the Query Section we are retrieving latest modified data, if there is no data in Azure SQL database for modified on column it will take ‘01-01-1999’ and start integrating data after this date. In preview section we can see the last modified data. Copy data from D365 CRM to ASQL block Source In this block we will get last modified date from recordmark block and based on this it will fetch data for account, In the query section we are retrieving the account data from D365 CRM. SinkNow we will create UDT (User Defined Datatype) in Azure SQL, it will take care of update operation of existing records and insertion of new records based on “code” column using user defined stored procedure. MappingBelow is mapping details, once completed with above steps we are good to run Pipeline. first we will check last modified date in Destination Now we will create Account in CRM Now we will run the pipeline. Now will check in destination, as we can see modified date/time is “2020-04-13 07:08:35.000” Now let us modify this record(Change Phone number to “987654321” Now run the pipeline again, Now we will check the data in Destination(Azure SQL Database) In this way we can perform incremental refresh in Azure data factory.
How to Connect your Azure Data Factory with Github
Introduction: In order to move your Pipeline from one environment to other you require your Pipeline to be saved on Github. This Blog will guide you through how to connect your Azure Data Factory Pipeline with your Github account. Step 1: Create a new Azure Data Factory and Tick the Enable GIT checkbox. Step 2: Create a new Repository in your Github Account. Step 3: Copy the URL from of the newly created repository from the address bar. Step 4: Enter the GIT URL, the repository name which you recently created, branch name (usually master) and your root folder (usually same as your repository name) and click on Create. Step 5: Now you must authorize your GIT Account on ADF. You will be promoted to Login with your Github Account. Do the necessary and click on Authorize Azure Data Factory. Step 6: Now you must select the Working Branch. We are using the default master branch. Click Save and then you can proceed to create your Pipeline. Conclusion: Each Time you publish the changes in ADF they are simultaneously stored in the GIT repository’s branch.
 
								 
															