Data Mining
Data Mining is the process of discovering new patterns in large data sets, using methods from artificial intelligence as well as classical statistics. The focus of data mining is discovery: detecting something you did not know before.
We offer professional data mining of large-scale data with genuine data mining tools, in full depth and breadth: from Enterprise Miner 12.1, through advanced statistical approaches such as decision trees, time series analysis, (multiple) regression and segmentation (see e.g. Six Sigma, Advanced Analytics), to the planning and implementation of flexible SQL queries, visual analysis and communication.
Is there something you have always -really- wanted to know?
Examples of modelling business processes and goals (drawn from finance, manufacturing, medicine, CRM/marketing and telco, among others):
- Agriculture: Would you like to know the only significant predictors of the best wine in a given year? A data miner derived a formula that still out-predicts wine experts in determining the best vintages.
- Retail: If a data miner could peek into a shopping basket (SMCG, FMCG), do you think he or she could tell what this customer will buy next? Yes, and much better than by guessing. It also helps greatly to identify and profile your A customers.
- Selling (cross/up sale): If you have already had a peek into a shopping basket, you are more likely to succeed in selling related products to the same customer, or even the same product set to other, similar customers.
- Manufacturing: Are production and product quality still "green"? Have you already identified the factors responsible for defects?
- Catalogue Marketing: Can I identify patterns among customers and identify the most likely customers to respond to upcoming mailing campaigns?
- Health Care: Mined data could support doctors in their decision-making or even expand their expertise to deliver the best quality treatment.
- Customer Profiling: Do you really know your customers? Find out your customers' desired products or information, rank all your customers by value, address them effectively in mail campaigns, and predict future trends in sales and sales growth (time to churn, time to event).
- Finance: Create customer profiles or classifications (types) by analysing spending behaviour (e.g. of credit card holders), and predict the customer loss rate (classification and segmentation).
- Controlling: Why always wait for disparate enterprise-wide data files, which in turn delay your reports? Why not implement an automated KPI system that delivers automated reports, including the results of automated mining on data that is now integrated?
- Telco: Can I model sales forecasts or predictions of contract extensions (e.g. handsets)?
- Risk: Can I determine risk? How can I detect and predict fraud in the business? What could I do if this customer, product or service has this or that feature?
- R&D: How can patterns be recognized in large-scale data, e.g. pharma data, media data, engineering data, social data? The same question, the same answer: Data Mining.
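The shopping-basket examples above (Retail, Selling) rest on association rules. The following is a minimal sketch in plain Python, not the procedure of any particular tool: it counts item pairs and reports support, confidence and lift for each rule. The function name `pair_rules`, the threshold `min_support` and the tiny basket list are assumptions made up for illustration only.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.2):
    """Return (lhs, rhs, support, confidence, lift) for all frequent item pairs.

    Illustrative helper (hypothetical name); real market basket tools also
    handle item sets larger than two.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]      # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # > 1: bought together more often than by chance
            rules.append((lhs, rhs, support, confidence, lift))
    rules.sort(key=lambda r: r[4], reverse=True)       # strongest lift first
    return rules


# Made-up example baskets, for illustration only.
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["milk", "cereal"],
    ["bread", "butter", "milk"],
]
rules = pair_rules(baskets, min_support=0.4)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests a candidate for cross-selling; a lift below 1 means the two items appear together less often than chance would predict.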
Typical Data Mining Approaches:
- Clustering and Segmentation: Two-Stage, k-nearest neighbours (KNN), SOM/Kohonen, Binning, Market Basket, Variable Clustering, Discriminant Analysis etc.
- Causal Modelling: Path Analysis, Neural Network, AutoNeural, Gradient Boosting, Decision Tree, Random Forest, Multilayer Perceptron etc.
- Regression Modelling: e.g. Dmine Regression, DMNeural, LARS, PLS, Regression (multiple, binary, logistic, ordinal, multinomial), Survival etc.
- Text Mining: MapReduce (PROC HADOOP), Text Mining by SPSS Modeler; Visual Analytics by IBM Cognos (a/k/a 'Many Eyes'); Analysis of unstructured texts using Word Trees, Tag Clouds, Phrase Nets, and HISTORIO.
Pre-Processing (beyond EM):
- Values: Imputation (median/mean, EM/FIML, regression, cold/hot deck), deleting, collapsing, flagging etc.
- Rows: Filtering, Indexing, Cleansing, Sampling, Combining, collapsing of categorical input vars etc.
- Columns (condensing): Combining (PCA, regression, factor analysis), univariate correlation with target (Spearman, Hoeffding), further reduction by clustering of input variables.
- Best Model: Stepwise var selection, e.g. by backward regression (depending on data load and processing environment).
- Other data-driven techniques: Random and stratified subsampling, trees, honest assessment of classifier performance, validation, AUC/ROC, cut-off calibration, lift etc. Data-driven techniques do not replace validation and testing driven by subject matter knowledge.
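Three of the steps listed above can be sketched in a few lines of plain Python: median imputation with a missing-value flag, AUC, and lift in the top-scored fraction. This is a simplified illustration under stated assumptions, not the implementation used in Enterprise Miner or any other tool. The function names, the example ages, and the labels and scores (imagined as coming from a hold-out sample) are all hypothetical.

```python
from statistics import median


def impute_median(values):
    """Replace None by the median of the observed values and flag the imputed rows."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    flags = [int(v is None) for v in values]   # 1 = value was missing
    return imputed, flags


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the sample size, so for small data only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def lift_at(labels, scores, fraction=0.1):
    """Response rate among the top-scored fraction divided by the overall response rate."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    k = max(1, int(round(len(ranked) * fraction)))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Made-up data, for illustration only.
ages = [34, None, 51, 29, None, 45]
imputed_ages, missing_flags = impute_median(ages)

holdout_labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
holdout_scores = [0.9, 0.2, 0.8, 0.4, 0.1, 0.35, 0.3, 0.6, 0.05, 0.7]
model_auc = auc(holdout_labels, holdout_scores)
top_lift = lift_at(holdout_labels, holdout_scores, fraction=0.2)
print(imputed_ages, missing_flags, round(model_auc, 3), round(top_lift, 2))
```

Computing AUC and lift on data the model has not seen during training is what makes the assessment "honest"; the same numbers on training data would be too optimistic.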
We gladly advise you on the benefits and best approaches of data mining and multivariate analysis, and plan and conduct professional analyses according to your requirements and specifications. If you let us take a look at your data, we can answer these and other questions. We help you get to know your customers, products and services. If you want us to, we also take care of any necessary steps of integration, preparation, checking (cleansing), formatting and, finally, analysis.