The CRISP-DM data mining methodology

An overview of the CRISP-DM methodology

Our approach for developing a solution is driven by an integrated methodology offering well-established project management and technical delivery methods. These methods provide the framework for managing the project team and helping to keep the whole project on track. Smart Vision’s project management methods address key management concerns and potential impacts to scope, schedule, and cost, and are designed to provide the identified deliverable materials. The key processes in Smart Vision’s project management methods are as follows:

  • Scope and change management
  • Project planning and tracking
  • Risk management
  • Deliverable Materials management
  • Configuration management
  • Issues and actions management
  • Quality management

Smart Vision strictly follows the Cross-Industry Standard Process for Data Mining (CRISP-DM). The CRISP-DM methodology provides a structured approach to planning a data mining and predictive analytics project. It is a robust and well-proven methodology.

We do not claim any ownership over it. We did not invent it. We are, however, evangelists of its powerful practicality, its flexibility and its usefulness when using analytics to solve thorny business issues. It is the golden thread that runs through almost every client engagement. The CRISP-DM model is shown below.

This model is an idealized sequence of events. In practice many of the tasks can be performed in a different order and it will often be necessary to backtrack to previous tasks and repeat certain actions. The model does not try to capture all possible routes through the data mining process.

CRISP-DM focuses data mining on model deployment that delivers quantifiable business returns.

  • Business understanding—achieve a clear understanding of your business challenges
  • Data understanding—determine what data is available to solve your business needs
  • Data preparation—prepare the data in a format to answer your business questions
  • Solution—design data models/solutions to meet your requirements
  • Evaluation—test the results against the goals of the project
  • Deployment—deploy the solution and/or results throughout your organization 
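
The phases above, and the backtracking between them, can be sketched in a few lines of Python. This is an illustrative sketch only: the phase names follow the list above (standard CRISP-DM calls the fourth phase "Modeling"), and the `ALLOWED` transition table is an assumption based on the arrows of the commonly published CRISP-DM diagram. As noted above, real projects may take routes that the model does not capture.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = "Business understanding"
    DATA_UNDERSTANDING = "Data understanding"
    DATA_PREPARATION = "Data preparation"
    SOLUTION = "Solution"  # called "Modeling" in standard CRISP-DM
    EVALUATION = "Evaluation"
    DEPLOYMENT = "Deployment"


# Assumed transitions (illustrative, not exhaustive): forward steps plus
# the usual backtracking loops between neighbouring phases.
ALLOWED = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.SOLUTION},
    Phase.SOLUTION: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),
}


def is_valid_route(route):
    """Return True if every consecutive pair of phases is an allowed transition."""
    return all(nxt in ALLOWED[cur] for cur, nxt in zip(route, route[1:]))


if __name__ == "__main__":
    # The idealized sequence: every phase once, in order.
    print(is_valid_route(list(Phase)))
    # A route that backtracks from the solution phase to data preparation.
    print(is_valid_route([
        Phase.DATA_PREPARATION, Phase.SOLUTION,
        Phase.DATA_PREPARATION, Phase.SOLUTION, Phase.EVALUATION,
    ]))
```

A project plan could use a check like this to flag steps that skip a phase, for example moving from business understanding straight to deployment.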

Data-driven model vs. process-driven model

In predictive analytics projects, it is becoming increasingly common to employ data-driven models where process-based models may not fully describe the processes in operational situations.

In some cases, a hybrid of physical and statistical models may be required to solve certain problems. The data-driven models build relationships between input and output data, without worrying too much about the underlying processes, using statistical/machine learning techniques.

On the other hand, process-driven models are based on well-established business processes. Data-driven models have the advantage of built-in error terms. A large amount of data is used to estimate the parameters that fit the model between the input and output data. Errors can be quantified, and confidence levels can be estimated.
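
To make the idea of built-in error terms concrete, here is a minimal sketch of a data-driven model: a straight line fitted by ordinary least squares, with the residual standard error and an approximate confidence interval for the slope. The function name `fit_line` and the sample data are hypothetical, and the interval uses a normal approximation, which is only reasonable for large samples; a real project would normally use a statistics library and a t-distribution.

```python
import math
from statistics import NormalDist, mean


def fit_line(xs, ys, confidence=0.95):
    """Fit y = intercept + slope * x by least squares and quantify the error."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")

    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("inputs must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    # Parameters estimated from the data.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Error term: the spread of the observations around the fitted line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    residual_std_error = math.sqrt(sum(r * r for r in residuals) / (n - 2))

    # Approximate confidence interval for the slope (normal approximation).
    slope_se = residual_std_error / math.sqrt(sxx)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return {
        "slope": slope,
        "intercept": intercept,
        "residual_std_error": residual_std_error,
        "slope_ci": (slope - z * slope_se, slope + z * slope_se),
    }


if __name__ == "__main__":
    # Hypothetical input/output pairs, for illustration only.
    result = fit_line([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
    print(result)
```

The model knows nothing about the process that produced the data; it only relates inputs to outputs, and the residuals provide the error estimate.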

Meet specific needs with customized solutions

Our goal is the same as yours: to provide an efficient and effective solution to resolve your business issues. Smart Vision's expertise, coupled with our proven approach to predictive analytics, enables you to complete the project more quickly and meet even the most aggressive deadlines. Based on your preferences and needs, we follow the milestones below to implement data-driven analytics:

  1. Collect: gather and input data

For too long, data has been held captive within our systems of record, isolated by rigid platform choices, segregated business functions, and data types. The result is splintered views and difficulty accessing data, which makes it impossible to gain true analytical insight. And this is just today. The challenges get worse as businesses look to evolve. Data science, machine learning, and deep learning are made moot by the fact that insights are only as good as the access to supporting data.

At Smart Vision, we believe that in order to change this, you need a proper hybrid data management strategy. One that ensures:

  • Access to all data regardless of source or type
  • Flexibility to support changing workloads and consumption cases
  • Intelligent analytics, such as machine learning, that can run at the source of the data
  • Access to insights across the business, its functions, and to all users for better decision making

  2. Organize: store and organize your data to be ready for analytics

Many businesses underestimate the pitfalls poor data can create. In search of instant insights, businesses prioritize data access and analysis over data governance and quality. Without ensuring data is trustworthy, complete and consistent, leaders can’t be confident they are extracting full value from their data. More alarmingly, an organization that doesn’t know what data it has and how that data is being used can face regulatory compliance problems.

At Smart Vision, we’re helping clients achieve digital transformation by building a unified data governance and integration foundation to deliver trusted, business-ready data. One that helps:

  • Profile, cleanse, and catalog all types of data
  • Manage and prepare data
  • Manage data at all stages of the information lifecycle, from planning to deployment
  • Build the master record for customers for a customer 360 (C360) view

  3. Analyze: build data science and machine learning models

More than ever, businesses see the potential of data and analytics, and many are actively investing in new capabilities. However, hasty investments can create serious issues with tools, people, and processes. Point solutions create complexity in integration, maintenance and support, leading to increased technical challenges and costs. The surging demand for data scientists makes hiring and retention a huge challenge. And the lack of a unified analytics platform often results in poor ROI and frustration on the part of leaders looking for more positive impact.

At Smart Vision, we use a set of flexible environments and tools based on open source and IBM technologies that make analyzing data and building data mining, statistical, machine learning and data science models easier and more accessible. These tools can be used to:

  • Create more accurate plans, budgets and forecasts
  • Report, model, analyze and manage data on a single platform with interactive dashboards, driven by AI
  • Explore and analyze structured and unstructured data with a leading-edge cognitive exploration and content analysis platform
  • Build accurate predictive visual models without programming
  • Deploy those models anywhere, securely, so you can operationalize data science faster

  4. Deploy: deploy predictive analytics and data science models

Predictive analytics and data science practices have evolved to a point where organizations of all sizes are actively experimenting to inject predictive insight into business. Yet, moving from experimentation to production has remained a challenge.

Our solution helps data scientists and developers work together to accelerate the process of moving to deployment and integrate machine learning and predictive analytics into their applications. By simplifying, accelerating and governing ML deployments, it enables organizations to harness machine learning and deep learning to deliver business value.

  5. Trust: trust the predictive analytics and machine learning

As organizations look to transition their analytics models to be widely used across the business, they run into a barrier: how to trust the performance of their analytics-enabled applications. That’s especially true as data-driven analytics become infused into more parts of their business. For all the potential of analytics, if the data used to train the models has unfair bias, and the resulting recommendations aren’t transparent and trusted, then data-driven analytics won’t be embraced and used at scale.

Organizations embracing data-driven analytics can have hundreds of experiments and pilots happening across their enterprise. Each one may be built using different tools and run in a different environment, whether because of a team’s preference or to avoid vendor lock-in. Whatever the reason for this current reality, as leaders look to deploy and manage all those models across their workflows, the time and talent required can become prohibitive.

At Smart Vision, we’ve anticipated this barrier to scaling enterprise data-driven analytics and developed products to help clients operationalize analytics faster. These products help to:

  • Detect and proactively mitigate bias, to ensure the performance of models is fair
  • Allow users to see how and why models make the recommendations they do (explainability)
  • Trace the lineage of the data and training used to build models
  • Work with the most commonly used build and runtime environments, across public and private clouds, to ensure an open, integrated workspace
  • Automate design, monitoring, and continuous improvement of models to reduce the burden on teams
  • Integrate models into workflows and process automation platforms
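
As an illustration of what detecting bias can mean in practice, the sketch below computes the disparate impact ratio, one widely used fairness measure: the rate of favourable outcomes for one group divided by the rate for a reference group. It is a generic example, not a description of any particular product; the function names, group labels, and sample decisions are hypothetical, and the 0.8 threshold is the conventional "four-fifths rule" rather than a universal standard.

```python
from collections import defaultdict


def selection_rates(records):
    """Return the share of favourable outcomes per group.

    `records` is an iterable of (group, favourable) pairs, where
    `favourable` is True when the model produced the positive outcome.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, favourable in records:
        totals[group] += 1
        positives[group] += int(favourable)
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact(records, protected, reference):
    """Ratio of the protected group's selection rate to the reference group's."""
    rates = selection_rates(records)
    if rates[reference] == 0:
        raise ValueError("reference group has no favourable outcomes")
    return rates[protected] / rates[reference]


if __name__ == "__main__":
    # Hypothetical model decisions, for illustration only.
    decisions = (
        [("group_a", True)] * 8 + [("group_a", False)] * 2
        + [("group_b", True)] * 4 + [("group_b", False)] * 6
    )
    ratio = disparate_impact(decisions, protected="group_b", reference="group_a")
    # A ratio below 0.8 is commonly treated as a signal to investigate.
    print(f"disparate impact ratio: {ratio:.2f}")
```

A single metric like this is only a starting point; a low ratio is a prompt to examine the training data and the model, not proof of unfairness on its own.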

Our response addresses not only the requirements expressed in the RFP, but also the wider context and approach needed to build, sustain and evolve a trusted, enterprise-level data hub and analytics platform, now and into the future, that uses machine learning and data science to improve banking services.