SAP Data Services

As big data technologies stabilize, the focus is shifting from making them work to integrating them with enterprise systems. One such use case is integration with SAP applications such as SAP CRM, SAP ECC, and SAP GTS. SAP Data Services, the ETL tool in the SAP world, comes in handy here because it provides ready-made integration with Hive.

Why integrate with Hive?

Hadoop has long been touted as a system for unstructured data, and that was very much the case at the web-scale companies where Hadoop was created and evolved. That is no longer the whole picture: enterprise data is mostly structured. The closest it comes to unstructured data is log data, which itself has a very precise structure. Since Hive was the first tool to provide a structured abstraction over Hadoop, the infrastructure pieces built around it are still very useful. One such piece is the Hive warehouse, which is essentially a segregated part of HDFS that the hive user owns and manages. The default location of the Hive warehouse is /user/hive/warehouse. It can be changed to any convenient location through the hive.metastore.warehouse.dir property, but in practice it rarely is.
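To make the warehouse layout concrete, here is a minimal Python sketch of where a managed table's directory lands by default. It relies on Hive's usual convention that tables in the default database sit directly under the warehouse root, while tables in other databases sit under a <database>.db directory. The function name managed_table_path and the table and database names are hypothetical, chosen only for illustration.

```python
import posixpath

# Hive's default value for hive.metastore.warehouse.dir
DEFAULT_WAREHOUSE = "/user/hive/warehouse"


def managed_table_path(table, database="default", warehouse=DEFAULT_WAREHOUSE):
    """Return the HDFS directory Hive uses by default for a managed table."""
    # Hive stores identifiers in lower case.
    table = table.lower()
    database = database.lower()
    if database == "default":
        # Tables in the default database sit directly under the warehouse root.
        return posixpath.join(warehouse, table)
    # Other databases get their own "<database>.db" directory.
    return posixpath.join(warehouse, database + ".db", table)


# Hypothetical table and database names, for illustration only.
print(managed_table_path("sales_orders"))
print(managed_table_path("sales_orders", database="crm"))
```

A table created with an explicit LOCATION clause, or an external table, can live elsewhere, so this only describes the default case.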

Hive also has a metastore associated with it, which in most deployments is a MySQL database (the default is an embedded Derby database). To connect to Hive from external systems, an interface is needed, and that interface is provided by HiveServer2.
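External clients typically reach HiveServer2 over JDBC using a jdbc:hive2:// URL, and HiveServer2 listens on port 10000 by default. As a rough sketch, the URL can be assembled like this in Python. The helper name hiveserver2_jdbc_url and the host name are hypothetical, and a secured cluster would need additional URL parameters that are not shown here.

```python
# Default port HiveServer2 listens on (hive.server2.thrift.port)
HIVESERVER2_DEFAULT_PORT = 10000


def hiveserver2_jdbc_url(host, port=HIVESERVER2_DEFAULT_PORT, database="default"):
    """Build a basic JDBC URL for an unsecured HiveServer2 endpoint."""
    return f"jdbc:hive2://{host}:{port}/{database}"


# The host name below is a placeholder, not a real server.
print(hiveserver2_jdbc_url("hive-gateway.example.com"))
```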

Hive Connector

The SAP Data Services Hive connector connects to HiveServer2. Since Hive and SAP have different datatypes, careful datatype mapping is also required to export data to Hive tables successfully.
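To give a feel for what such a mapping involves, the Python sketch below turns a list of source columns into a Hive CREATE TABLE statement. The mapping table is an illustrative assumption, not the mapping SAP Data Services applies or documents. The source type names, the helper names hive_type and create_table_ddl, and the example table are all hypothetical. The point is only that every source type needs an explicit, reviewed target type, and that unmapped types should fail loudly rather than be guessed.

```python
# Illustrative mapping only (an assumption, not SAP's documented mapping).
ILLUSTRATIVE_TYPE_MAP = {
    "varchar": "STRING",
    "int": "INT",
    "double": "DOUBLE",
    "real": "FLOAT",
    "decimal": "DECIMAL",
    "date": "DATE",
    "datetime": "TIMESTAMP",
}


def hive_type(source_type, precision=None, scale=None):
    """Translate one source type name into a Hive column type."""
    key = source_type.lower()
    if key not in ILLUSTRATIVE_TYPE_MAP:
        # Refuse to guess: an unmapped type must be reviewed by a person.
        raise ValueError(f"no mapping defined for source type {source_type!r}")
    target = ILLUSTRATIVE_TYPE_MAP[key]
    if target == "DECIMAL" and precision is not None:
        # Carry precision and scale over so numeric values are not truncated.
        return f"DECIMAL({precision},{scale or 0})"
    return target


def create_table_ddl(table, columns):
    """Build a CREATE TABLE statement from (name, type[, precision, scale]) tuples."""
    cols = ",\n  ".join(f"`{name}` {hive_type(*spec)}" for name, *spec in columns)
    return f"CREATE TABLE `{table}` (\n  {cols}\n)"


# Hypothetical table definition, for illustration only.
print(create_table_ddl("orders", [
    ("order_id", "int"),
    ("customer", "varchar"),
    ("amount", "decimal", 15, 2),
    ("created_on", "date"),
]))
```

Decimal precision and date or timestamp handling are the places where a careless mapping most often loses data, so those deserve the closest review.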

InfoObjects has deep expertise in integrating SAP data sources with Hadoop using SAP Data Services. If you need help integrating your SAP ecosystem with big data, please contact us at bigdata@infoobjects.com and we would be glad to help you.