We are a Data Science consulting firm from Madrid with the following...
Values:
🔍 Transparency: continuous communication with the client showing progress and building together.
✅ Simplicity: “Simple is better”. Our solutions are simple, elegant, and easy to maintain.
💎 Quality: our projects stand out for delivering results on time. We don't have a business development team, as our clients do that job for us.
🧙‍♂️ Excellence: we are recognized as one of the best consulting companies specialized in Data Science in Spain.
🕶️ End-to-End capabilities: we are capable of undertaking projects throughout the data cycle, from its extraction to the deployment of Artificial Intelligence models to production.
✨ We don’t sell snake oil: we are engineers who design solutions that work, not presentations. We don’t have a sales department; our happy clients do that job for us.
Focus Areas
Service Focus
- Big Data & BI
- Artificial Intelligence
- IT Services
- Cloud Computing Services
Client Focus
- Large Business
- Medium Business
- Small Business
WhiteBox Clients & Portfolios
Client
Startup from the United States specializing in the application of technology to improve language. Project funded by the NSF (National Science Foundation).
Description
Creation of a digital speech therapist (mobile app) that, using AI, is able to assess the level of language acquisition of children with autism spectrum disorders and guide them in the incorporation of new words and grammatical structures.
Results
A system was developed capable of predicting the level of language acquisition of any person (special focus on children with special needs) based on a few minutes of audio.
A diarization algorithm (detecting the different participants in a conversation) specialized in children's speech was developed, with performance very close to the best existing solutions for adults (Amazon Transcribe).
Technology
Natural Language Processing with libraries like spaCy and NLTK.
Development of Deep Learning models with TensorFlow.
Voice characterization and diarization with Resemblyzer and in-house algorithms.
Recommendation using probabilistic methods (Markov chains) and generative text models.
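As a rough illustration of recommendation with Markov chains, the sketch below fits a first-order chain on word sequences and suggests the most likely next word. It is a minimal example using only the Python standard library, not the project's actual model; the function names `fit_transitions` and `recommend_next` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Count how often each item is followed by each other item."""
    transitions = defaultdict(Counter)
    for sequence in sequences:
        for current, following in zip(sequence, sequence[1:]):
            transitions[current][following] += 1
    return transitions


def recommend_next(transitions, current, k=1):
    """Return the k most likely next items with their estimated probabilities."""
    counts = transitions.get(current)
    if not counts:
        return []  # unseen item: nothing to recommend
    total = sum(counts.values())
    return [(item, count / total) for item, count in counts.most_common(k)]


# Hypothetical sample data: short utterances split into words.
utterances = [
    ["want", "more", "juice"],
    ["want", "more", "milk"],
    ["want", "more", "juice"],
]
model = fit_transitions(utterances)
print(recommend_next(model, "more"))
```

A first-order chain only looks at the previous word; a real system would likely combine this with richer context, as the mention of generative text models suggests.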
Client
Research project developed in collaboration with public entities. Financing from the European Union within the framework of the H2020 program.
Description
Creation of a repository of data related by time and geographic location, with a special focus on interesting data for researchers in the health, pharmaceutical and insurance sectors.
Results
A microservices-based system was developed capable of extracting information from various sources and integrating it using a common schema.
Techniques were developed to normalize and cross-reference the different data sources by date and location.
An interactive tool was developed that the end user can use to select and cross-reference data sources.
An algorithm based on Natural Language Processing was developed capable of identifying the different sources of information in a free text and the relationship between them, used in the search system offered to users.
Technology
Data extraction and cleaning using pandas.
Storage in relational databases such as PostgreSQL and Oracle.
Natural Language Processing using spaCy and NLTK.
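The idea of normalizing sources and cross-referencing them by date and location can be sketched as follows. This is a simplified illustration with the Python standard library, assuming each source uses its own date format and place spelling; the helper names `normalize_key` and `join_sources` and the sample values are hypothetical, and the real system worked against a common schema not shown here.

```python
from datetime import datetime


def normalize_key(date_text, date_format, location):
    """Map a source-specific date and place name to a shared (ISO date, place) key."""
    day = datetime.strptime(date_text, date_format).date().isoformat()
    place = " ".join(location.strip().lower().split())
    return day, place


def join_sources(left, right):
    """Inner join two lists of (key, value) records on the normalized key."""
    index = {key: value for key, value in right}
    return {key: (value, index[key]) for key, value in left if key in index}


# Hypothetical sample records from two sources with different conventions.
health = [(normalize_key("03/01/2020", "%d/%m/%Y", " Madrid "), 12)]
weather = [(normalize_key("2020-01-03", "%Y-%m-%d", "madrid"), 8.5)]
print(join_sources(health, weather))
```

Normalizing once at ingestion time keeps the join itself trivial, which is the main reason to agree on a shared key format first.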
Client
German pharmaceutical company with great international relevance.
Description
Severe performance problem in data ingestion with Spark, at a volume of several TB of data per day.
Results
Complete redesign of the ingestion pipelines, reducing computing time from several days to only a few hours.
Technology
Spark with Scala for data processing. Flume and Sqoop for ingestion. Storage on HDFS, queryable through Hive's SQL engine. Big Data cluster with MapR technology.
Client
Company specialized in conducting network quality comparisons between different mobile operators.
Description
Development of a system capable of creating a global network quality metric for the different mobile operators.
Results
Massive data ingestion from instrumented vehicles. Antenna performance, packet loss, upload and download speeds, etc.
Calculation of quality and aggregation KPIs at the mobile operator and country level.
Technology
Ingestion with Spark (Java).
Data analysis with Impala and Hive.
Data dashboards with Tableau.
Client
One of the main telecommunications companies that provides service worldwide.
Description
The objective of the project is to predict the number of calls to the customer service call center, in order to decongest the service.
Results
A predictive model was developed capable of assigning a score to each customer with an AUC of 0.9, which allowed the company to proactively anticipate calls and make them at off-peak times.
Technology
Data processing with Spark.
Modeling performed with Spark MLlib.
Client
Telecommunications company with great growth in Spain.
Description
Development of a user experience analysis system (CEM) for the mobile network.
Results
Massive data ingestion of 3G / 4G antennas.
Calculation of network quality KPIs and creation of dashboards.
Development of predictive models for customer churn and customer complaints.
Creation of a surveillance system to detect problematic antennas.
Technology
Ingestion with Python and Impala.
Big Data cluster with Cloudera distribution.
Models made with scikit-learn, visualizations with Matplotlib and seaborn.
Dashboards in Power BI.
Client
Main manufacturer of wind turbines in Spain.
Description
Creation of in-house predictive maintenance software to improve on the results of the external commercial software currently deployed at the company.
Results
Development of a hybrid predictive maintenance system, based on Machine Learning (anomaly detection) and fatigue time series calculations, obtaining a better result than the commercial software used. Savings for the company of more than €30k per year in commercial license costs.
Technology
Treatment and study of fatigue with pandas.
Predictive model made with scikit-learn.
Performance optimization using Numba, Fortran 90 and NumPy.
Client
Spanish energy company with a worldwide presence.
Description
Prediction of wind generation based on meteorological information (three-dimensional grids of wind speed, pressure, etc.).
Results
Improvement of 10 percentage points compared to traditional prediction models. Greater profitability for our client when bidding in the US electricity auction.
Technology
Data ingestion with Spark (Scala) in Big Data cluster based on Hortonworks distribution. Modeled with scikit-learn and LightGBM. Orchestration and monitoring with Apache Airflow.
Client
One of the main Spanish startups for urban mobility services.
Description
Analysis and modeling of the behavior of the users of the mobile app, used by more than 0.5M users in the metropolitan area of Madrid.
Results
Analysis of the mobile app registration funnel, which led to the redesign of part of the app and an increase from 50% to 80% in the share of users who complete the registration.
Development of an unsupervised model capable of inferring the work and home locations of the mobile app users based on geolocated events.
Technology
Data processing with Spark. Visualization with Tableau, Plotly and Matplotlib. Modeling with scikit-learn. Orchestration and monitoring with Apache Airflow. Cloud infrastructure on AWS.
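One simple way to think about inferring home and work locations from geolocated events is a time-window heuristic: the most visited grid cell at night is a home candidate, and the most visited cell during weekday working hours is a work candidate. The sketch below shows that heuristic only; it is not the unsupervised model built in the project, and the hour windows, grid precision, and function names are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def grid_cell(lat, lon, precision=3):
    """Snap coordinates to a coarse grid (3 decimals is roughly 100 m)."""
    return (round(lat, precision), round(lon, precision))


def infer_home_and_work(events):
    """events: iterable of (datetime, lat, lon). Returns (home_cell, work_cell)."""
    night, office = Counter(), Counter()
    for timestamp, lat, lon in events:
        cell = grid_cell(lat, lon)
        if timestamp.hour >= 22 or timestamp.hour < 6:
            night[cell] += 1  # assumed night window: 22:00 to 06:00
        elif timestamp.weekday() < 5 and 9 <= timestamp.hour < 18:
            office[cell] += 1  # assumed working hours: Mon to Fri, 09:00 to 18:00
    home = night.most_common(1)[0][0] if night else None
    work = office.most_common(1)[0][0] if office else None
    return home, work


# Hypothetical events for one user (2021-03-01 is a Monday).
events = [
    (datetime(2021, 3, 1, 23, 0), 40.417, -3.704),
    (datetime(2021, 3, 2, 1, 30), 40.417, -3.704),
    (datetime(2021, 3, 1, 10, 0), 40.453, -3.688),
    (datetime(2021, 3, 2, 15, 0), 40.453, -3.688),
]
print(infer_home_and_work(events))
```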
Client
Spanish startup that offers Natural Language Processing products applied to the corporate sector.
Description
Development of a system capable of monitoring news from the main national economic media and identifying various types of risks.
Results
The following were developed:
A system for extracting economic news from the main platforms. Currently available on DataMarket.
A Machine Learning model capable of identifying 7 different types of risk (economic, operational, regulatory, cybersecurity, etc.) in economic news. AUC greater than 0.8 in the evaluation set.
A Named Entity Recognition (NER) model capable of identifying up to 20 types of entities (companies, public entities, places of interest, dates, etc.).
Technology
Model built with TensorFlow.
Dockerized deployment using MLflow and TensorFlow Model Server.
Client
Leading market research and demographic company in Spain. EU investment in the project through the H2020 program.
Description
Creation of an opinion monitoring system in social networks.
Results
Development of a massive data extraction system from Twitter and Facebook and creation of Natural Language Processing algorithms capable of calculating the expertise of a user in certain topics of interest (politics, sports, economics, etc.).
Technology
NLP libraries like spaCy and fastText. Orchestration and monitoring of processes with Apache Airflow. Versioning and deployment of the models with MLflow. Cloud infrastructure on AWS.
Client
One of the main web traffic generation and monetization companies in Europe.
Description
Prediction of the CTR of Google and Bing ads based on the text used in the ads.
Results
Predictive model based on NLP algorithms to predict the CTR of an ad based on its characteristics.
Text recommendation system to maximize the CTR of the ads taking into account metadata such as category, country of destination, etc.
Technology
Data processing with Spark (aggregation of several billion ads).
Modeling performed with scikit-learn, spaCy, fastText, LightGBM.
Versioning and deployment of models with MLflow.
Orchestration and monitoring with Apache Airflow.
Optimal text recommendation app built with Flask.
Client
One of the main web traffic generation and monetization companies in Europe.
Description
Prediction of the CPC (Cost per Click) of a huge set of keywords from Google and Bing ads.
Results
Predictive model based on NLP algorithms with two stages:
Prediction of the number of impressions.
CPC prediction.
Technology
Data processing with Spark (aggregation of several billion ads with a variety of keywords).
Modeling performed with spaCy, fastText, LightGBM.
Versioning and deployment of models with MLflow.
Orchestration and monitoring with Apache Airflow.
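A two-stage model of this kind can be pictured as a chain in which the output of the first stage feeds the second. The sketch below assumes the predicted number of impressions is passed to the CPC model as an extra feature; the source does not say how the two stages were connected, so this wiring, the feature names, and the toy models are all hypothetical.

```python
from typing import Callable, Dict

Features = Dict[str, float]


def two_stage_predict(
    features: Features,
    impressions_model: Callable[[Features], float],
    cpc_model: Callable[[Features], float],
) -> float:
    """Stage 1 predicts impressions; stage 2 predicts CPC using that prediction."""
    predicted_impressions = impressions_model(features)
    # Assumption: the stage-1 output is added as an input feature for stage 2.
    enriched = {**features, "predicted_impressions": predicted_impressions}
    return cpc_model(enriched)


# Toy stand-ins for trained models (for example, LightGBM regressors in practice).
def toy_impressions(features: Features) -> float:
    return 100.0 * features["keyword_length"]


def toy_cpc(features: Features) -> float:
    return features["predicted_impressions"] / 400.0


print(two_stage_predict({"keyword_length": 2.0}, toy_impressions, toy_cpc))
```

Passing the models in as callables keeps the chaining logic independent of the library used to train each stage.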
Client
Main chain of DIY and construction products in Spain.
Description
Prediction of demand for more than 900 references, including products with very seasonal and sporadic sales.
Results
Demand forecasting model with a 2-month horizon for more than 900 references across 30 different stores. Good model performance across several metrics (R², RMSE, MAPE).
Technology
Data processing with Spark (aggregation of more than 1 billion records to calculate weekly sales by reference). Modeling carried out with scikit-learn and LightGBM.
Client
Main chain of bookstores in Spain.
Description
Creation of a system capable of classifying customer service tickets into 20 different categories and resolving them automatically.
Results
A system based on Natural Language Processing was developed capable of classifying tickets created by users (free text) into 20 different categories (defective order, late delivery, etc.).
Routing logic was implemented so that tickets for which the model had high confidence in its prediction were resolved automatically, reserving the most ambiguous and complex tickets for handling by staff.
Technology
Natural Language Processing with libraries like spaCy and NLTK.
Information processing with pandas.
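The confidence-based routing described in the Results can be sketched in a few lines. This is a minimal illustration, assuming the classifier returns one probability per category; the function name `route_ticket` and the 0.9 threshold are hypothetical, and the actual threshold used in the project is not stated.

```python
from typing import Dict, Tuple


def route_ticket(probabilities: Dict[str, float], threshold: float = 0.9) -> Tuple[str, str]:
    """Pick the most likely category and decide who handles the ticket.

    Returns (category, route), where route is "auto" when the model's
    confidence reaches the threshold and "human" otherwise.
    """
    category = max(probabilities, key=probabilities.get)
    confidence = probabilities[category]
    route = "auto" if confidence >= threshold else "human"
    return category, route


# Hypothetical classifier outputs for two tickets.
print(route_ticket({"late delivery": 0.95, "defective order": 0.05}))
print(route_ticket({"late delivery": 0.55, "defective order": 0.45}))
```

Raising the threshold sends more tickets to staff and lowers the risk of a wrong automatic resolution; lowering it does the opposite.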
Client
Main retail company in Spain and Europe.
Description
Creation of a video analysis system capable of automatically locating and counting the number of customers in the vicinity and inside the stores, as well as in potential locations of new establishments.
Results
Using Computer Vision techniques in Google Edge TPU, a prototype of a device deployed in stores and outdoors was created, capable of counting the number of people while respecting privacy.
The counts were analyzed to develop heat maps of the interior of the stores, allowing an improvement in sales by redistributing the products.
Technology
Computer Vision model for detecting and counting people, built with TensorFlow Lite.
Deployment on Google Edge TPU Dev Board device.