Dataiku competes in the data science and machine learning platforms market. It has been placed in Gartner’s Leaders quadrant for Data Science and Machine Learning Platforms for the last two years. Dataiku has been rated 4.9 out of 5.0 on parameters such as customer experience, integration, support, and product capabilities. Dataiku has been well endorsed by both the end-user community and Gartner.
“Each and every customer out there, they are all looking at extracting insights from data they have collected,” says Siddhartha Bhatia, Regional Vice President, Middle East and Turkey, Dataiku.
Customers build data lakes and data warehouses, and what they want is an end-to-end data science platform where everything is standardised. “This is where we define what end-to-end really is. We really are a very complete software platform,” he adds.
Data science as a subject has five building blocks. These include:
- Clean and wrangle: This covers connecting to any data source, whether structured, semi-structured, or unstructured, at rest or in motion, and then cleansing and wrangling the data.
- Build and apply machine learning models: This can be done visually, which means that people who do not code, such as business users or business analysts, can drag and drop to create automated machine learning models.
- Mining and visualisation: Dataiku is unique in that one platform addresses both personas, the business side and the technical side. It gives data scientists access to open-source libraries and also provides dashboards and tables.
- Deploy to production: Once the prototype has been created, Dataiku makes it easy to move it into production, a step also called operationalisation of models.
- Monitor and adjust: Dataiku provides a single interface for monitoring the deployed models, seeing how far real data has drifted from what the model expects, and adjusting for that drift.
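To make the idea of drift in the last building block concrete, here is a minimal sketch of one common way to quantify it, the population stability index (PSI). It is an illustration only and does not describe how Dataiku itself measures drift; the function name, the bin count and the sample data are all assumptions made for this example.

```python
import math

def population_stability_index(expected, actual, bins=10):
    """Compare two samples of one numeric feature; larger values mean more drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp values outside the training range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical data: what the model was trained on versus what it sees live.
training_values = [i / 100 for i in range(1000)]
live_values = [v + 3 for v in training_values]  # the live data has shifted upwards

drift_score = population_stability_index(training_values, live_values)
```

A score of zero means the two samples fall into the bins in identical proportions. A commonly quoted rule of thumb treats values above roughly 0.25 as significant drift, but the threshold that triggers retraining should be set per use case.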
Dataiku connects to any data, takes a sample of the data, applies cleansing and wrangling to the data, builds and applies the machine learning models, and pushes the output through an integrated business application for consumption by the end user.
The typical data science process is to build a model and train it on a given data set using a designer. Once the model is ready to deploy, you move it into production, where it can respond through an API or run through automation, depending on the use case.
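The build, train and deploy steps described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not Dataiku's API: the functions `train`, `handle_request` and `score_batch`, the one-feature linear model and the data set are all hypothetical.

```python
import json

def train(rows):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def handle_request(model, body):
    """Stand-in for an API endpoint: JSON request in, JSON response out."""
    x = json.loads(body)["x"]
    return json.dumps({"prediction": model["slope"] * x + model["intercept"]})

def score_batch(model, xs):
    """Stand-in for an automated, scheduled scoring job."""
    return [model["slope"] * x + model["intercept"] for x in xs]

# Design stage: train on a small, made-up data set.
model = train([(1, 3), (2, 5), (3, 7), (4, 9)])

# Production stage, option 1: answer a single request as an API would.
response = handle_request(model, '{"x": 10}')

# Production stage, option 2: score many records in one automated run.
batch_predictions = score_batch(model, [0, 1])
```

The same trained model serves both paths. Which one fits depends on the use case: an API suits interactive, one-record-at-a-time decisions, while automation suits periodic scoring of whole tables.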
Traditionally, data science models were hand-coded in languages such as Python and R, and the process could take weeks or months. The next challenge was moving that mass of code into production, also called operationalisation.
With Dataiku, moving models into production is rapid and straightforward. “Customers are able to get models from the design stage and development stage into production within a couple of hours,” says Bhatia.
Dataiku is in a unique position since it is one single platform across the five building blocks of data science. “What we offer is one license that addresses all the five circles end to end,” he adds.
Typically, Dataiku works well where an end customer has a business challenge whose solution requires integrating multiple disparate data sources.
Most large organisations and enterprises face the same challenges. The first is the long baseline of historical data saved across multiple applications and multiple databases. The second is the question of who the custodian of the data is: sometimes it is IT, sometimes it is business, sometimes another department. The third is the set of processes around data consolidation and data sharing.
“These are large organisations with multiple stakeholders, multiple people, multiple personas. What usually happens in a large organisation is that these silos get created,” points out Bhatia. “Moreover, people are not ready to share the data. That is where the problem is. Siloed data, siloed people, siloed processes.”
“We have seen a pattern where people have collected their data, but they are not getting insights. And for insights, they need a data science platform,” he adds.
How do you explain technical concepts to business audiences? There is a silo there, and a lack of collaboration among different teams. The business is putting a lot of pressure on teams to deliver projects faster. It is also difficult to move data science projects from a lab environment to a production environment.
To scale machine learning inside an organisation, you need to thread together data, technology and people. Dataiku provides that thread, enabling successful end-to-end integration.
“We have the ability to connect to any kind of infrastructure,” continues Bhatia, citing various examples. This could be a Snowflake database; a traditional data warehouse like Vertica or SAP HANA; the cloud; a Docker-based environment; the Azure or Google platform; a data lake powered by Hortonworks or Cloudera; or industrial control systems. “We can connect to any kind of data,” he reinforces.
Dataiku is positioned as one platform for everybody in the organisation. That is a unique position in the market: one single, collaborative, governable, auditable environment.
With increasing availability of cloud-native offerings, another trend is the move to cloud. “We have seen a market shift and everybody’s preferring a cloud-based platform. That is a reason why on-premises are sort of fading away,” remarks Bhatia.
Another strength of Dataiku is that the application embraces open source and makes its libraries available. While this is a strength, end users also expect governance, lineage, security, and assurance that everything happens in a fully auditable environment.
“If I had to summarise it in one line, I would say, we make open-source enterprise-ready for our customers,” reflects Bhatia.
Key takeaways
- Dataiku has been well endorsed by the end user community and Gartner.
- Each and every customer is looking at extracting insights from data they have collected.
- The typical data science process is to build a model and train it on a given data set using a designer.
- Once the model is ready to deploy, you move it into production, where it can be enabled through automation.
- Traditionally, data science models were hand-coded in languages such as Python and R.
- It is difficult to move data science projects from a lab environment to a production environment.
- A typical challenge is moving the mass of code into production, also called operationalisation.
- Using Dataiku, customers are able to get models from the development stage into production within a couple of hours.
- Dataiku works well where an end customer has a business challenge whose solution requires multiple disparate data sources.
- The business is putting a lot of pressure on teams to deliver projects faster.
- To scale machine learning inside an organisation, you need to thread together data, technology and people.
- Dataiku is positioned as one platform for everybody in the organisation.
- Large organisations have multiple stakeholders, multiple people and multiple personas; silos get created, and people are not ready to share the data.