Product Overview
Dataiku is the all-in-one data science and machine learning platform that brings everyone together to drive transformative business impact. It is the platform for Everyday AI, systemizing the use of data for exceptional business results. Organizations that use Dataiku equip their people, whether technical and working in code or on the business side and working in low- or no-code tools, to make better day-to-day decisions with data. More than 450 companies worldwide use Dataiku to systemize their use of data and AI, driving diverse use cases from fraud detection to customer churn prevention, predictive maintenance to supply chain optimization, and everything in between.
Specifications
Data Preparation
The Dataiku visual flow allows coders and non-coders alike to easily build data pipelines with datasets, recipes to join and transform datasets, and the ability to build predictive models.
The visual flow also has code and reusable plugin elements for customization and advanced functions.
Visualization
Dataiku saves time with quick visual analysis of columns, including the distribution of values, top values, outliers, invalids, and overall statistics.
For categorical data, the visual analysis shows the distribution by value, including the count and percentage of rows for each value.
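As a rough illustration of what a per-value distribution contains, the sketch below computes counts and percentages for a categorical column in plain Python. It is not Dataiku code; the function name `value_distribution` and the sample column are hypothetical.

```python
from collections import Counter

def value_distribution(values):
    """Return (value, count, percent) rows, most frequent value first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (value, count, round(100.0 * count / total, 1))
        for value, count in counts.most_common()
    ]

# Hypothetical categorical column
plans = ["basic", "pro", "basic", "enterprise", "basic", "pro"]
rows = value_distribution(plans)
for value, count, pct in rows:
    print(f"{value:<12}{count:>3}{pct:>7}%")
```

An empty column yields an empty list, since no percentage is computed when there are no values.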
Machine Learning
To aid in the feature engineering process, Dataiku AutoML automatically fills missing values and converts non-numeric data into numerical values using well-established encoding techniques.
Users can also create new features using formulas, code, or built-in visual recipes to provide additional signals to improve model accuracy. Once created, Dataiku stores feature engineering steps in recipes for reuse in scoring and model retraining.
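The two automatic steps mentioned above, filling missing values and encoding non-numeric data, can be sketched in plain Python. This is a minimal illustration, assuming mean imputation and one-hot encoding as the techniques; the source does not say which strategies Dataiku AutoML applies, and the helper names are hypothetical.

```python
from statistics import mean

def impute_mean(column):
    """Replace None entries with the mean of the observed values.

    Assumes the column has at least one non-missing value.
    """
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def one_hot(column):
    """Encode a categorical column as one 0/1 indicator per category."""
    categories = sorted(set(column))
    encoded = [[1 if v == c else 0 for c in categories] for v in column]
    return categories, encoded

# Hypothetical numeric column with a gap, and a categorical column
ages_filled = impute_mean([34, None, 52, 40])
cats, encoded = one_hot(["red", "blue", "red", "green"])
```

Keeping each step as a small function mirrors the idea of storing feature engineering steps so the same transformation can be replayed at scoring or retraining time.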
DataOps
Dataiku projects are the central place for all work and collaboration for users. Each Dataiku project has a visual flow, including the pipeline of datasets and recipes associated with the project.
Users can view the project and associated assets (like dashboards), check the project’s overall status, and view recent activity.
MLOps
The Dataiku unified deployer manages the movement of project files between Dataiku design nodes and production nodes for batch and real-time scoring. Project bundles package everything a project needs from the design environment to run on the production environment.
With Dataiku, data scientists can see all the deployed bundles, and data engineers or IT operations can quickly know when a new bundle requires testing and roll-out.
Analytic Apps
Dataiku makes it easy to create project dashboards and share them with business users. Scheduling updates for dashboards or triggering updates is easy and ensures the latest information is available.
With dashboards as part of a Dataiku project, business users and project stakeholders can easily see the outputs of AI projects and track KPIs and value.
Collaboration
Real advanced analytics projects require a series of steps that transform data from one state to the next, resulting in new datasets, features, metrics, charts, dashboards, predictive models, and applications.
The Dataiku visual flow is the canvas where teams collaborate on data projects. With the visual flow, everyone on the team can use common objects and visual language to describe the step-by-step approach and document the entire data process for future users.
Governance
Dataiku permissions control who on the team can access, read, and change a project. Permissions also include creating projects, executing code, executing applications, reading only content, and more. With Dataiku, users can belong to more than one group and have different permissions across projects, or organizations can have global permissions.
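To make the group-based model concrete, here is a minimal sketch of how per-project permissions might be resolved for a user who belongs to several groups. It assumes that a user's effective permissions are the union of what each group grants; the source does not state how Dataiku resolves them, and every name below (groups, users, projects, functions) is hypothetical.

```python
# Hypothetical grants: group -> project -> set of allowed actions
GROUP_GRANTS = {
    "analysts": {"CHURN": {"read"}},
    "engineers": {"CHURN": {"read", "write"}, "FRAUD": {"read"}},
}

# Hypothetical membership: a user can belong to more than one group
USER_GROUPS = {"dana": ["analysts", "engineers"], "lee": ["analysts"]}

def effective_permissions(user, project):
    """Union of the actions granted on `project` by each of the user's groups."""
    perms = set()
    for group in USER_GROUPS.get(user, []):
        perms |= GROUP_GRANTS.get(group, {}).get(project, set())
    return perms

def can(user, project, action):
    """True if any of the user's groups grants `action` on `project`."""
    return action in effective_permissions(user, project)
```

With these sample grants, `can("dana", "CHURN", "write")` is true because one of dana's groups allows it, while `can("lee", "CHURN", "write")` is false.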
Explainability
Dataiku provides critical capabilities for explainable AI, including reports on feature importance, partial dependence plots, subpopulation analysis, and individual prediction explanations.
Together, these techniques can help explain how a model makes decisions and enable data scientists and key stakeholders to understand the factors influencing model predictions.
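One of the techniques listed above, the partial dependence plot, can be sketched in a few lines: force one feature to each value on a grid, keep the other features as observed, and average the model's predictions. The code below is a generic illustration, not Dataiku's implementation; `toy_model` and the sample rows are hypothetical.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Override only the feature under study; other features stay as observed.
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append((value, sum(preds) / len(preds)))
    return curve

def toy_model(row):
    """Hypothetical linear model standing in for a trained predictor."""
    return 2.0 * row["tenure"] + 0.5 * row["spend"]

rows = [{"tenure": 1, "spend": 10}, {"tenure": 3, "spend": 30}]
curve = partial_dependence(toy_model, rows, "tenure", [0, 1, 2])
```

For this linear toy model the curve rises by 2.0 per unit of `tenure`, which is the kind of relationship a partial dependence plot makes visible.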
Architecture
Dataiku can run on-premises or in the cloud, with supported instances on Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure, integrating with storage and various computational layers for each cloud.
Plugins and Connectors
DSS plugins let you extend the power of DSS with your own datasets, recipes, and processors!
AutoML
Build optimized models with minimal intervention (create a predictive model in just 3 clicks) with Dataiku’s powerful automated machine learning engine.
Visual flow
Simplify collaboration and explainability of data workflows (no matter how big or complex) with Dataiku’s unique visual flow.
Deployment
Put models in production with Dataiku’s built-in API Deployer, making high availability and scalable deployments easy.
Data connectors
Get instant access to any data source with 25+ native data connectors across cloud, on-premises databases, and enterprise applications.
Kubernetes
Spin up Kubernetes clusters (AKS, GKE, or EKS) from the Dataiku interface and scale up/down compute resources on-demand.
Deep Learning
Access deep learning capabilities (including training advanced neural networks in a few clicks!) in Dataiku’s visual machine learning environment.
Automation node
Separate development and production environments, plus easily deploy, update, and manage live projects.
Monitoring
Monitor the behavior and overall functional health of Dataiku to ensure production readiness and optimize resource allocation.
90+ data transformers
Scale transformation pipelines by running fully in-database (SQL) or in-cluster (Spark, Hadoop).
Cleanse, normalize, enrich
Cleanse, normalize, and enrich data with the visual Prepare Recipe.
Scenarios
Automate actions and workflows in Dataiku with powerful scheduling capabilities.
Notebooks
Coders can feel at home with Dataiku’s native notebook environment for exploratory or experimental work.
Dashboards
Publish and share insights from data projects with other users and business stakeholders with custom dashboards.
Dataiku Applications
Empower more people within the organization to leverage AI and self-service analytics by visually designing and packaging data projects as reusable applications.
Spark
Dataiku lets you use a Spark engine to run visual recipes, execute code, train machine learning models, and more.
Pushdown execution
Optimize dataflow execution by pushing down Dataiku’s ETL and ML power to the database where the data lies.
Triggers
Automatically trigger scenarios in Dataiku, which can be configured based on time, dataset alterations, or any custom trigger.
Interactive statistics
Perform exploratory data analysis (EDA) in a dedicated visual interface built for advanced statisticians or anyone looking to uncover data patterns and relationships.
Charts
Pick from over 25 built-in chart formats or custom charts to share insights with others.
Version control
Version projects with Dataiku’s built-in, Git-based version control and get complete traceability of every action.
Wikis
Track progress and collaborate on project goals and specifications by documenting relevant information.
REST API
Interact with Dataiku from any external system: unlock AI insights and access admin controls from preferred applications.
Plugins
Choose from 100+ plugins in the Dataiku marketplace to go beyond built-in capabilities, supporting out-of-the-box solutions for a variety of use cases.
Time Series
Prepare and analyze time series data with Dataiku’s dedicated time series capabilities.
Processing engine
Leverage Dataiku’s flexible and highly scalable engine for optimal execution of Spark or in-database (SQL) jobs.
Python
Feel at home when working in Python with native integration and a notebook-style coding environment.
R
Work natively in R with Dataiku’s deep integration, including a comprehensive R API.
Connection security
Secure connections to external systems with granular admin controls.
Environments
Create and work in standalone and self-contained environments to run Python or R code.
Metrics
Automatically measure indicators on elements of the workflow like datasets (e.g., size/shape), managed folders, and saved models (e.g., performance).
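As an illustration of the size/shape indicators mentioned above, the sketch below computes a record count, a column count, and per-column missing-value counts for a small in-memory dataset. It is plain Python rather than Dataiku's metrics engine; the function name and the metric names are hypothetical.

```python
def dataset_metrics(rows):
    """Compute simple size/shape indicators for a list-of-dicts dataset."""
    columns = sorted({name for row in rows for name in row})
    return {
        "record_count": len(rows),
        "column_count": len(columns),
        # A value counts as missing if the key is absent or set to None.
        "missing_by_column": {
            name: sum(1 for row in rows if row.get(name) is None)
            for name in columns
        },
    }

# Hypothetical dataset
orders = [
    {"id": 1, "amount": 20.0, "region": "EU"},
    {"id": 2, "amount": None, "region": "US"},
    {"id": 3, "amount": 12.5},
]
metrics = dataset_metrics(orders)
```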