Export data to Hadoop using Polybase – Insert into external table

Introduction

This post is a continuation of Polybase Query Service and Hadoop – Welcome SQL Server 2016.

One of the most interesting use cases of Polybase is the ability to store historical data from relational databases in a Hadoop file system. Storage costs can be reduced while the data remains accessible and can still be joined with the regular relational tables. So let's take the first steps towards our new archiving solution.

Requirements

Create a folder for the pdw_user in Hadoop

Polybase uses the default user name pdw_user when connecting to the Hadoop cluster. For this example, I will use an unsecured Hadoop cluster, that is, one without Kerberos authentication. For production environments a better security approach should be used.

Open a command line session with administrator rights and issue the following commands:

Create a directory for the pdw_user:

C:\>hadoop fs -mkdir /user/pdw_user

Change the ownership of the directory:

C:\>hadoop fs -chown -R pdw_user /user/pdw_user

Verify the results using the command line:

(Screenshot: command-line listing showing the new /user/pdw_user directory)
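For reference, the same check can be done with a simple listing (the exact output depends on your Hadoop version):

C:\>hadoop fs -ls /user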

Verify the results using the web browser:

In my case: http://localhost:50070/explorer.html#/user

(Screenshot: HDFS web browser showing the /user directory)

You can name the directory whatever you want; the important thing is to change its ownership to the pdw_user user.

Create an external data source and file format

Open a query window in Management Studio pointing to the AdventureworksDW2016CTP3 database and run the following queries.

-- External data source: tells Polybase where the Hadoop NameNode listens
CREATE EXTERNAL DATA SOURCE HDP2 WITH
(
    TYPE = HADOOP,
    LOCATION = 'hdfs://localhost:9000'
)

-- External file format: semicolon-delimited text with ISO date format
CREATE EXTERNAL FILE FORMAT SalesExport WITH (
        FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS (
                    FIELD_TERMINATOR =';',
                    DATE_FORMAT = 'yyyy-MM-dd' ,
                    USE_TYPE_DEFAULT = TRUE
                           )
)
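Optionally, you can confirm that both objects were created by querying the catalog views:

-- Sanity check: list the external data sources and file formats
SELECT name, location FROM sys.external_data_sources;
SELECT name, format_type, field_terminator FROM sys.external_file_formats;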

Create an external table

-- External table over a Hadoop directory; rows inserted here are stored in HDFS
CREATE EXTERNAL TABLE HistoricalSales
(
    SalesOrderNumber nvarchar(20)
       ,SalesOrderLineNumber tinyint
       ,ProductName nvarchar(50)
       ,SalesTerritoryCountry nvarchar(50)
       ,OrderQuantity smallint
       ,UnitPrice money
       ,ExtendedAmount money
       ,SalesAmount money
       ,OrderDate date
)
WITH
(
    LOCATION = '/user/pdw_user',
    DATA_SOURCE = HDP2,
    FILE_FORMAT = SalesExport,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0
)

The key point here is the location: it must point to a directory, not to a specific file as in my previous post. If the location does not exist, it will be created.

Insert into external table

This example uses the Adventure Works DW database:

-- Enable INSERT into external table
sp_configure 'allow polybase export', 1;
reconfigure
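-- Optional: verify the setting took effect (value_in_use should be 1)
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'allow polybase export';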

-- Export data: Move old data to Hadoop while keeping it query-able via an external table.
INSERT INTO [dbo].[HistoricalSales]
 SELECT 
       [SalesOrderNumber]
      ,[SalesOrderLineNumber]
      ,p.EnglishProductName as ProductName
      ,st.SalesTerritoryCountry
      ,[OrderQuantity]
      ,[UnitPrice]
      ,[ExtendedAmount]
      ,[SalesAmount]
      ,convert(date,[OrderDate]) AS [OrderDate]
  FROM [AdventureworksDW2016CTP3].[dbo].[FactInternetSales] a
  inner join dbo.DimProduct p on a.ProductKey = p.ProductKey
  inner join dbo.DimSalesTerritory st 
  on st.SalesTerritoryKey = a.SalesTerritoryKey
  where year(OrderDate) < 2011

(Screenshot: the INSERT statement and its execution in Management Studio)

Examining the results

Using the web browser:

(Screenshot: HDFS web browser listing the files created under /user/pdw_user)

The Polybase export operation creates multiple files under the specified location. The external files are named QueryID_date_time_ID.format, where ID is an incremental identifier and format is the exported data format.

Select the exported data from the external table:

(Screenshot: SELECT statement against the external table and its results)
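For reference, a query along these lines returns the archived rows, and the external table can also be joined back to the regular dimension tables, as mentioned in the introduction (a sketch using the objects defined above):

-- Read the archived rows straight from HDFS through the external table
SELECT TOP (10) SalesOrderNumber, ProductName, SalesAmount, OrderDate
FROM dbo.HistoricalSales
ORDER BY OrderDate;

-- Join the archived data with a relational dimension table,
-- e.g. to aggregate the archived sales per territory
SELECT st.SalesTerritoryCountry, SUM(h.SalesAmount) AS ArchivedSales
FROM dbo.HistoricalSales AS h
INNER JOIN dbo.DimSalesTerritory AS st
    ON st.SalesTerritoryCountry = h.SalesTerritoryCountry
GROUP BY st.SalesTerritoryCountry;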

Conclusion

With this small tutorial I have demonstrated how to use SQL Server 2016 and Hadoop to create a cost-effective and functional archiving solution. There are still several other aspects to explain and consider, but we can already start building our proofs of concept. Let's get started.

References

PolyBase Queries

Apache Hadoop File System Shell Commands

Acknowledgments

Special thanks to Sumin Mohanan and Sonya Marshall from Microsoft for helping me troubleshoot my tests.

About Paul Hernandez

I'm an Electronic Engineer and Computer Science professional, specialized in Data Analysis and Business Intelligence Solutions. Also a father, swimmer and music lover.

13 Responses to Export data to Hadoop using Polybase – Insert into external table

  1. Very interesting, Paul!

  2. Pingback: Inserting Into External Tables – Curated SQL

  3. Shady says:

    When I tried to export data to Hadoop, I got the following error:
    Msg 596, Level 21, State 1, Line 52
    Cannot continue the execution because the session is in the kill state.
    Msg 0, Level 20, State 0, Line 52
    A severe error occurred on the current command. The results, if any, should be discarded.

    • Hi Shady, could you please share your settings (Hadoop version, SQL version, OS, etc.)?

      • Shady says:

        Hadoop Hortonworks HDP-2.4.2.0 on Linux CentOS 6.5, SQL Server 2016 (13.0.1601.5) on Windows Server 2012 R2.

      • Hi Shady, I happened to run into this error message myself today. I was not using Polybase; my database was simply offline due to a restore. In your case I would have a look at the Polybase logs (they are located in the SQL Server installation folder) and also at the Hadoop logs. Once you find the error message you can post it back. Good luck.

  4. Sendhil says:

    Hi Paul,

    This and the previous post really helped me to set up SQL 2016 with HDP 2.4 on Azure.

    There were some Java settings in the PATH variable which should refer to jvm.dll, and the first line of the CSV file exported from SQL Server needed to be deleted.

    Apart from the above points everything else worked nicely following the post.

    Thanks again.

    Regards,
    Sendhil

  5. Karthik says:

    Can I use my own Hadoop (2.7.2) running on my local machine to connect with Polybase, instead of using HDP or CDH?

    • Yes. Even though it's not officially supported, I did it with my own built distribution. For the configuration, I would find out which HDP version ships with Hadoop 2.7.2.

      • Karthik says:

        For that, how can I map my Hadoop distribution in Polybase?

      • Hi Karthik, when you configure the Polybase connectivity (as I explained here), you need to specify the Hadoop version:
        sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
        GO
        The parameter @configvalue determines the connectivity setting; please refer to PolyBase Connectivity Configuration for more info.
        For my post I selected option 7, which corresponds to Hortonworks 2.1, 2.2, and 2.3 on Windows Server. In this link you can find which Hadoop version is included in HDP versions 2.1 to 2.3. You only need to know two things: your Hadoop version (2.7.2) and your OS (Windows or Linux). I have not found an HDP with Hadoop 2.7.2; you would need 2.7.1 or 2.7.3. But anyway, I would just use @configvalue = 7 and run a test; I think it should work.

      • Karthik says:

        I have created the table; while fetching data from it I am facing the below error:
        Msg 46916, Level 16, State 1, Line 2
        Queries that reference external tables are not supported by the legacy cardinality estimation framework. Ensure that trace flag 9481 is not enabled, the database compatibility level is at least 120 and the legacy cardinality estimator is not explicitly enabled through a database scoped configuration setting.

      • Karthik says:

        How do I configure Polybase to use my own Hadoop distribution?
