New data sources such as social media sites, website logs, mobile devices, and sensors generate unprecedented amounts of unstructured and semi-structured data. The explosion of new data sources has not only given organizations new opportunities to grow revenue and reduce costs but has also opened the door to new analytical possibilities. Yet manual processes for reconciling fragmented, duplicate, inconsistent, inaccurate, and incomplete data, together with fragmented point solutions, produce dubious data and delayed business insights that can’t be trusted.

A systematic approach that quickly and repeatedly transforms ever-increasing volumes of big data into business value, without risk, is clearly the key ingredient for success. SYNTAX leverages the following Informatica offerings to make your big data projects successful:

1.0 Big Data Management

The gold standard in data management solutions for integrating, governing, and securing big data that your business needs to extract business value quickly.

The Hadoop eco-system is rapidly changing with new innovations continuously emerging in the open source community. Big Data Management builds on top of the open-source Hadoop framework and preserves all the transformation logic in your data pipelines. As a result, Hadoop innovations are implemented faster with less impact and risk to production systems.


Key Features:

  • Data Integration on Hadoop – This solution provides an extensive library of prebuilt data integration transformation capabilities that run natively on Hadoop so you can process all types of data at any scale—from terabytes to petabytes. Your IT team can rapidly develop data flows on Hadoop using a visual development environment that increases productivity over hand coding as much as five times.
  • Performance Optimization – Informatica’s Smart Optimizer lets you execute workloads on the best-suited engine for the highest performance, scalability, and resource utilization, without having to rebuild data pipelines as new technologies emerge.
  • Dynamic Schemas & Mapping Templates – Informatica Big Data Management lets you generate hundreds of run-time data flows based on just a handful of design patterns using mapping templates. These mappings can be easily parametrized to handle dynamic schemas such as web and machine log files, which are common to big data projects. This means you can quickly build data flows that are easy to maintain and resilient to changing schemas.
  • Data Profiling on Hadoop – Data on Hadoop can be profiled through the Informatica developer tool and a browser-based analyst tool. This makes it easy for developers, analysts, and data scientists to understand the data, identify data quality issues earlier, collaborate on data flow specifications, and validate mapping transformation and rules logic.
  • Data Quality on Hadoop – Cleanse, match, and standardize data of any type and volume natively on Hadoop to deliver authoritative and trustworthy data. Use an extensive set of prebuilt data quality rules or create your own using the visual development environment. Execute address validation to parse, cleanse, standardize, and enrich global address data.
  • Complex Data Parsing on Hadoop – Informatica Big Data Management makes it easy to access and parse complex, multi-structured, unstructured, and industry-standard data such as Web logs, JSON, XML, and machine device data. Prebuilt parsers for market data and industry standards such as SWIFT, ACORD, HL7, HIPAA, and EDI are also available.
  • Universal Metadata Services – Data scientists and analysts gain a 360-degree view of their data through universal metadata services and a knowledge graph, letting them quickly search, discover, and understand enterprise data and meaningful data relationships.
  • End-to-End Data Lineage – To ensure trust and regulatory compliance, data analysts and business users can view complete end-to-end data lineage. This visual data lineage includes a detailed history of all data movement and transformations (in Hadoop and traditional systems), from target applications all the way back to original source systems. Business/IT collaboration and search is enhanced with a business glossary of common business terms that relate to data objects and their corresponding data lineage.
  • Universal Data Access – Your IT team can access all types of big transaction data, including RDBMS, OLTP, OLAP, ERP, CRM, mainframe, cloud, and others. You can also access social media data, log files, machine sensor data, Hadoop, NoSQL formats, documents, emails, and other unstructured or multi-structured data types and data stores.
  • High-Speed Data Ingestion and Extraction – You can access, load, transform, and extract big data between source and target systems or directly into Hadoop, NoSQL data stores, or your data warehouse. High-performance connectivity through native APIs to source and target systems with parallel processing ensures high-speed data ingestion and extraction.
  • Data Discovery on Hadoop – Automate the discovery of data domains and relationships on Hadoop. For example, discover customer- and product-related data sets or sensitive data such as Social Security numbers and credit card numbers so that you can mask the data for compliance.
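The data-discovery-and-masking idea in the last bullet can be sketched in a few lines of plain Python. This is a minimal illustration of pattern-based domain discovery, not Informatica's implementation; the patterns and records below are assumptions for the example:

```python
import re

# Illustrative patterns for two common sensitive-data domains.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
}

def discover_domains(records):
    """Return the set of sensitive-data domains found in the records."""
    found = set()
    for record in records:
        for domain, pattern in PATTERNS.items():
            if pattern.search(record):
                found.add(domain)
    return found

def mask(record):
    """Replace each sensitive match with a same-length run of '*'."""
    for pattern in PATTERNS.values():
        record = pattern.sub(lambda m: "*" * len(m.group()), record)
    return record

rows = ["id=1 ssn=123-45-6789", "id=2 cc=4111-1111-1111-1111", "id=3 name=Ada"]
print(discover_domains(rows))   # domains present in the sample
print([mask(r) for r in rows])  # rows with sensitive values masked
```

A production system would discover domains statistically across columns rather than per record, but the scan-classify-mask loop is the same shape.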


Key Benefits:

  • Lower Big Data Project Costs
  • Expand Hadoop Adoption Across the Enterprise
  • Minimize Risk of Adopting New Technologies

2.0 Big Data Parser

Informatica HParser provides organizations with the solution they require to extract value from complex, unstructured data. This powerful data parsing capability in Hadoop empowers organizations to achieve new levels of productivity, efficiency, and scalability. Organizations can readily augment their existing IT investments by adopting Informatica HParser as the standard for data parsing in Hadoop. As an engine-based solution covering a broad range of data formats, HParser simplifies and speeds the analytical process while eliminating the risks and costs of one-off, custom-coded parsing scripts.

Key Features:

  • Rapid, visual development – HParser’s visual Integrated Development Environment (IDE) for creating and maintaining transformations accelerates development and boosts developer productivity. HParser also turns deep hierarchies and relationships into a flattened, easier-to-use format while allowing for business rule validation.
  • Single engine covering a broad range of data formats – HParser’s ready-to-use transformation building blocks, or libraries, cover a wide range of general and industry-specific data formats including support for XML and JSON; SWIFT, X12, NACHA for the financial industry; HL7 and HIPAA for healthcare; ASN.1 for telecommunications; and market data.
  • Support for device-generated logs – HParser simplifies the parsing of complex device- or machine-generated content including proprietary log files such as Apache weblogs and Omniture logs.
  • Exploiting parallelism in MapReduce – HParser delivers optimized parsing performance for large files of complex data by running natively inside MapReduce and fully leveraging its parallelism.
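The kind of log parsing the bullets above describe can be illustrated with a small, self-contained sketch that flattens one semi-structured weblog line into named fields. The regular expression below handles only the Apache Common Log Format and is an assumption for the example, not HParser's engine:

```python
import re

# Apache Common Log Format (illustrative).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Flatten one log line into a dict of named fields; None if malformed."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    row = m.groupdict()
    row["status"] = int(row["status"])
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    return row

line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
print(parse_log_line(line))
```

Running such a parser as a MapReduce map function, one line per call, is what lets the work parallelize across splits of a large file.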


Key Benefits:

  • Meet big data objectives by establishing relationships among data of any format anywhere in the organization.

3.0 Big Data Relationship Manager

Ensure the success of big data analytics projects by uncovering accurate relationships among connected data.


Key Features:

  • Single view of party – Matches duplicate party information within and across multiple sources and links it to create a single view of the party
  • 360-degree view – Discovers relationships among parties based on common attributes, then groups them to create a 360-degree view
  • Appends social data – Actively maintains the relationships by appending any new social, demographic, and interaction data
  • Real-time search – Rapidly retrieves information about any party in real time
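The duplicate-matching idea behind "single view of party" can be sketched with Python's standard-library `difflib` for fuzzy similarity. The threshold, record shapes, and scoring are illustrative assumptions, not the product's matching algorithm:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Fuzzy similarity of two strings in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_parties(records, threshold=0.85):
    """Pair up records whose combined name+address similarity clears the threshold."""
    matches = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            key_i = records[i]["name"] + " " + records[i]["address"]
            key_j = records[j]["name"] + " " + records[j]["address"]
            if similarity(key_i, key_j) >= threshold:
                matches.append((records[i]["id"], records[j]["id"]))
    return matches

parties = [
    {"id": 1, "name": "Jon Smith",  "address": "12 Oak St"},
    {"id": 2, "name": "John Smith", "address": "12 Oak Street"},
    {"id": 3, "name": "Mary Jones", "address": "9 Elm Ave"},
]
print(match_parties(parties))
```

A real matcher would block candidate pairs first (this pairwise loop is quadratic) and standardize names and addresses before comparing, but the compare-and-link step is the same idea.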


Key Benefits:

  • Improve big data analytics – Because big data systems integrate data across multiple internal and external systems, the data can be inconsistent and duplicated. Big Data Relationship Management matches duplicate party information within and across multiple sources and links it to create the most accurate information.
  • Infer non-obvious relationships – Big Data Relationship Management infers non-obvious relationships among parties to automatically discover people within a household, organization, or locale. It sorts parties based on common attributes and groups them to create a 360-degree view of the party. The result: you can search and view the relationships in real time.
  • View social relationships – With Big Data Relationship Management, you can discover and visualize relationships across vast amounts of disparate data brought in from social media. It creates and actively maintains the relationships by appending any new information, internal or external, about the party, such as social, demographic, and interaction data from sources like Facebook and LinkedIn.
  • Receive results rapidly – You get rapid results because Big Data Relationship Management processes billions of records within hours. Since social media produces vast amounts of data, business users now see accurate, related information about parties in real time and can perform their daily tasks more efficiently.
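Inferring household-style groups from shared attributes, as described above, is commonly done with a union-find structure over attribute values. The sketch below is an illustrative technique under assumed record shapes, not Big Data Relationship Management's actual logic:

```python
def group_households(parties):
    """Union-find sketch: group party ids that share an address or phone."""
    parent = {p["id"]: p["id"] for p in parties}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    seen = {}  # (attribute, value) -> first party id seen with it
    for p in parties:
        for key in ("address", "phone"):
            value = p.get(key)
            if not value:
                continue
            if (key, value) in seen:
                union(p["id"], seen[(key, value)])
            else:
                seen[(key, value)] = p["id"]

    groups = {}
    for p in parties:
        groups.setdefault(find(p["id"]), []).append(p["id"])
    return sorted(sorted(g) for g in groups.values())

parties = [
    {"id": 1, "address": "12 Oak St", "phone": "555-0101"},
    {"id": 2, "address": "12 Oak St", "phone": "555-0199"},
    {"id": 3, "address": "9 Elm Ave", "phone": "555-0101"},
    {"id": 4, "address": "7 Pine Rd", "phone": "555-0777"},
]
print(group_households(parties))
```

Note how parties 1, 2, and 3 end up in one group even though 2 and 3 share nothing directly: the shared address links 1 and 2, and the shared phone links 1 and 3. That transitive linking is what makes the inferred relationships "non-obvious."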

4.0 Intelligent Data Lake

The sheer volume of data being ingested into Hadoop systems is overwhelming IT. Business analysts eagerly await quality data from Hadoop, while IT is burdened with manual, time-intensive processes to curate raw data into fit-for-purpose data assets. Big data cannot deliver on its promise if the complex technologies and additional resources required to extract value bring progress to a grinding halt.

Intelligent Data Lake enables raw big data to be systematically transformed into fit-for-purpose data sets for a variety of data consumers. With such an implementation, organizations can quickly and repeatedly turn big data into trusted information assets that deliver sustainable business value.



Key Features:

  • Find any Data – Informatica Intelligent Data Lake uncovers existing customer data through an automated machine-learning-based discovery process. This discovery process transforms correlated data assets into smart recommendations of new data assets that may be of interest to the analyst. Data assets can also be searched thanks to the metadata cataloguing process, which lets business analysts easily find and access nearly any data in their organization.
  • Discover data relationships that matter – Informatica Intelligent Data Lake breaks down data silos, unlocking isolated data while maintaining the data’s lineage and tracking its usage.
  • Quickly prepare and share the data – Informatica’s self-service data preparation provides a familiar, easy-to-use, Excel-like interface for business analysts, allowing them to quickly blend data into the insights they need and to collaborate with other analysts on prepared data sets.
  • Operationalize data preparation into re-usable workflows – Informatica Intelligent Data Lake lets you record data preparation steps and then quickly play back steps inside automated processes. This transforms data preparation from a manual process into a re-usable, sustainable, and operationalized machine.
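The record-and-replay idea in the last bullet can be sketched as a list of named transformation steps that is reapplied to each new batch of rows. This is a conceptual illustration, not Intelligent Data Lake's API; the step names and row shapes are assumptions:

```python
def add_step(recipe, name, fn):
    """Record one named preparation step in the recipe."""
    recipe.append((name, fn))

def replay(recipe, rows):
    """Re-apply every recorded step, in order, to a new batch of rows."""
    for _name, fn in recipe:
        rows = [fn(r) for r in rows]
    return rows

# An analyst's interactive actions get captured as a recipe...
recipe = []
add_step(recipe, "trim_name", lambda r: {**r, "name": r["name"].strip()})
add_step(recipe, "upper_country", lambda r: {**r, "country": r["country"].upper()})

# ...which an operational pipeline then replays on fresh data.
batch = [{"name": "  ada ", "country": "uk"}]
print(replay(recipe, batch))
```

Because the recipe is data rather than ad hoc code, it can be versioned, scheduled, and rerun unchanged on every new load, which is what turns a one-off manual preparation into an operationalized workflow.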


5.0 Enterprise Information Catalog

Enterprise Information Catalog enables business and IT users to realize the full potential of their enterprise data assets by providing a unified metadata view that includes technical metadata, business context, user annotations, relationships, data quality, and usage. Discover, classify, and govern your data with visibility into the end-to-end lineage of all data assets across the enterprise.



Key Features:

  • Enterprise-wide data discovery – Data is growing too fast for manual stewardship. To scale in step with enterprise data growth, Informatica provides a machine-learning-based discovery engine that automatically scans the enterprise for data sources and enables business analysts and data stewards to find more data assets across the enterprise.
  • Business context – Effective data governance requires multi-persona collaboration. Informatica provides the ability to create business classifications and relate them to technical data assets as annotations. This dramatically improves discoverability and visibility of data assets.
  • Maximum discovery – Business analysts and data stewards don’t always know exactly what they are looking for. Informatica’s enhanced keyword searching, auto-complete, and search facets—based on data overlap, column similarity, and inferred domains—enable users to find the right data without having to know exactly what to look for.
  • Lower compliance risk – Effective data governance requires knowing what is happening with data in addition to what it is. Detailed data profiling statistics, complete traceability of data movement with column/metric level lineage, as well as detailed impact analysis provide a 360-degree view of data assets for maximum compliance with controls and regulations.
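The column-similarity facet mentioned above can be illustrated with a simple Jaccard overlap of distinct column values; a catalog can use such scores to suggest that two columns carry the same domain. The data and scoring below are illustrative assumptions, not Enterprise Information Catalog's algorithm:

```python
def column_overlap(col_a, col_b):
    """Jaccard overlap of two columns' distinct values, in [0, 1]."""
    a, b = set(col_a), set(col_b)
    if not (a | b):
        return 0.0
    return len(a & b) / len(a | b)

# Sample distinct values from three hypothetical columns.
customers_country = ["US", "UK", "DE", "FR"]
orders_country    = ["US", "UK", "DE", "ES"]
orders_status     = ["OPEN", "CLOSED"]

# High overlap suggests the two columns share a domain; zero overlap does not.
print(column_overlap(customers_country, orders_country))
print(column_overlap(customers_country, orders_status))
```

At catalog scale this comparison would run on value sketches (e.g., minhash signatures) rather than full value sets, but the overlap score being ranked is the same quantity.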


6.0 Intelligent Streaming

Informatica Intelligent Streaming allows organizations to prepare and process streams of data and uncover insights in time to suit business needs. Intelligent Streaming provides prebuilt, high-performance connectors for sources and targets such as Kafka, HDFS, NoSQL databases, and enterprise messaging systems, along with data transformations that enable a code-free method of defining your data integration logic. Data flows can be scheduled to run at any latency (real time or batch) based on the resources available and business SLAs.

Derive maximum value from IoT streams by gathering and analyzing the information immediately and at ever-increasing scale.


Key Features:

  • High-performance streaming analytics with reliable quality of service – Informatica Intelligent Streaming collects, transforms, and joins data from a variety of sources, scaling to billions of events with processing latency of less than a second. Data can be stored in Hadoop for ongoing use and to correlate streaming data with historical information. Choose from a number of quality-of-service levels according to your business requirements.
  • Real-time decisions with business rules – Business users can write and execute a set of event-driven business rules against transformed and enriched streams of data through an easy-to-use thin-client rule builder. Users can define patterns, abnormalities, and events that, should they pose imminent risk or opportunity, trigger alerts so the right people can respond in real time.
  • Streaming data management on a foundation of open source technologies – Informatica Intelligent Streaming includes an extensive library of prebuilt transforms running natively on Spark Streaming to process all types of data at scale.
  • Simple, centralized configuration, administration, and monitoring – Informatica Intelligent Streaming is built on the Informatica Intelligent Data Platform, leveraging a unified set of tools and services to help you effectively administer, monitor, and manage your deployment.
  • High Availability, Scalability and Architectural Flexibility – Informatica Intelligent Streaming supports high availability, automated failover configuration on commodity hardware (with no need for a shared file system), and guaranteed delivery of data.
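The tumbling-window aggregation at the heart of this kind of streaming analytics can be sketched with the standard library alone; a real deployment would run the equivalent logic on Spark Streaming rather than in-process, and the event shapes below are assumptions for the example:

```python
from collections import Counter

def window_counts(events, window_seconds=1):
    """Bucket (timestamp, key) events into fixed tumbling windows and count keys."""
    windows = {}
    for ts, key in events:
        bucket = int(ts // window_seconds)  # which window this event falls in
        windows.setdefault(bucket, Counter())[key] += 1
    return windows

# Simulated event stream: (epoch seconds, event type).
stream = [(0.1, "click"), (0.4, "click"), (0.9, "buy"),
          (1.2, "click"), (1.8, "click")]

for bucket, counts in sorted(window_counts(stream).items()):
    print(bucket, dict(counts))
```

A rule such as "alert when 'buy' drops to zero for two consecutive windows" would then be evaluated against these per-window counts, which is the shape of the event-driven business rules described above.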


Key Benefits:

  • Enable real-time operational intelligence with big data streaming analytics
  • Reduce time-to-value with increased productivity and rapid deployment
  • Deliver information at any latency with one flexible platform
  • Simplify configuration, deployment, administration, and monitoring of real-time streaming
  • Minimize risks associated with complex and evolving open source technologies