WhiteBox



We design Artificial Intelligence systems.

We are a Data Science consulting firm based in Madrid, guided by the following values:

🔍 Transparency: continuous communication with the client, showing progress and building solutions together.

✅ Simplicity: “Simple is better.” Our solutions are simple, elegant, and easy to maintain.

💎 Quality: our projects stand out for delivering results on time. We don't have a business development team because our clients do that job for us.

🧙‍♂️ Excellence: we are recognized as one of the best consulting companies specialized in Data Science in Spain.

Spain

C/Carretas 14, Madrid, Madrid 28012

677022241


Service Focus

Big Data & BI Focus

Data Visualization - 5%
Data Mining - 5%
Data Analytics - 15%
Data Science - 15%
Predictive Analytics - 10%
Data Warehousing - 5%
Text Analytics - 10%
Data Migration - 10%
Data Discovery - 5%
Data Quality Management - 5%
Big Data - 15%

Industry Focus

Information Technology - 30%
Advertising & Marketing - 15%
Financial & Payments - 15%
Retail - 15%
Oil & Energy - 15%
Education - 10%

Client Focus

50% Large Business

30% Medium Business

20% Small Business

Detailed Reviews of WhiteBox

No reviews submitted yet.


Client Portfolio of WhiteBox

Project Industry

Healthcare & Medical - 18.8%
Telecommunication - 18.8%
Oil & Energy - 12.5%
Automotive - 6.3%
E-commerce - 12.5%
Advertising & Marketing - 12.5%
Retail - 18.8%

Major Industry Focus

Healthcare & Medical

Project Cost

Not Disclosed - 100.0%

Common Project Cost

Not Disclosed

Project Timeline

1 to 25 Weeks - 50.0%
26 to 50 Weeks - 43.8%
51 to 100 Weeks - 6.3%

Common Project Timeline

1 to 25 Weeks

Portfolios: 16

Digital speech therapist for children with autism

Client

US startup specializing in applying technology to improve language skills. The project was funded by the NSF (National Science Foundation).

Description

Creation of a digital speech therapist (mobile app) that uses AI to assess the level of language acquisition of children with autism spectrum disorders and to guide them in learning new words and grammatical structures.

Results

A system was developed that predicts the level of language acquisition of any person (with a special focus on children with special needs) from just a few minutes of audio.
A diarization algorithm (identifying the different speakers in a conversation) specialized for children was developed, with performance very close to the best existing solutions for adults (AWS Transcribe).

Technology

Natural Language Processing with libraries like spaCy and NLTK.
Development of Deep Learning models with TensorFlow.
Voice characterization and diarization with Resemblyzer and in-house algorithms.
Recommendation using probabilistic methods (Markov chains) and generative text models.
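The profile does not describe how the Markov-chain recommendation works. As a rough illustration only, a first-order chain that suggests likely next words from observed word-to-word transition counts could be sketched as below; the function names and the toy corpus are hypothetical, and the generative text models also mentioned are not covered.

```python
from collections import Counter, defaultdict

def build_transitions(sentences):
    """Count word -> next-word transitions (a first-order Markov chain)."""
    transitions = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            transitions[current][nxt] += 1
    return transitions

def recommend_next(transitions, word, k=2):
    """Return up to k likely next words with their estimated transition probabilities."""
    counts = transitions.get(word.lower())  # .get avoids creating empty entries
    if not counts:
        return []
    total = sum(counts.values())
    return [(w, c / total) for w, c in counts.most_common(k)]

# Hypothetical toy corpus
corpus = ["the dog runs", "the dog sleeps", "the cat runs"]
chain = build_transitions(corpus)
print(recommend_next(chain, "the"))  # 'dog' (p = 2/3) ranks ahead of 'cat' (p = 1/3)
```

Normalizing counts per word turns them into transition probabilities, so the top suggestions are simply the most frequent continuations seen in the training text.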

Web App

Analytical Engine for Researchers

Client

Research project developed in collaboration with public entities and funded by the European Union under the H2020 program.

Description

Creation of a repository of data linked by time and geographic location, with a special focus on data of interest to researchers in the health, pharmaceutical, and insurance sectors.

Results

A microservices-based system was developed that extracts information from various sources and integrates it into a common schema.
Techniques were developed to normalize the different data sources and cross them by date and location.
An interactive tool was developed that end users can use to select and cross data sources.
An NLP-based algorithm was developed that identifies the different sources of information mentioned in free text and the relationships between them; it powers the search system offered to users.
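Crossing sources "by date and location" essentially means mapping each record to a shared key and joining on it. A minimal stdlib sketch, assuming each source yields (date string, location, value) records with at most one value per key; the function names, source names, and values are hypothetical:

```python
from datetime import datetime

def normalize_key(date_str, date_fmt, location):
    """Map a raw record to a shared (ISO date, normalized location) join key."""
    iso_date = datetime.strptime(date_str, date_fmt).date().isoformat()
    return iso_date, " ".join(location.split()).lower()  # collapse spaces, lowercase

def index_source(records, date_fmt):
    """Index (date_str, location, value) records by their normalized key."""
    return {normalize_key(d, date_fmt, loc): value for d, loc, value in records}

def cross(source_a, source_b):
    """Inner join: keep only keys present in both sources, pairing their values."""
    return {key: (source_a[key], source_b[key]) for key in source_a.keys() & source_b.keys()}

# Hypothetical sources with different date formats and location spellings
admissions = index_source([("2021-03-01", "Madrid", 120), ("2021-03-02", "Madrid", 95)], "%Y-%m-%d")
air_quality = index_source([("01/03/2021", "  MADRID ", 42.0)], "%d/%m/%Y")
print(cross(admissions, air_quality))  # {('2021-03-01', 'madrid'): (120, 42.0)}
```

Real sources would need richer location matching (codes, geocoding) than lowercasing, but the normalize-then-join structure stays the same.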

Technology

Data extraction and cleaning using pandas.
Storage in relational databases such as PostgreSQL and Oracle.
Natural Language Processing using spaCy and NLTK.

Web App

Mass processing of pharmaceutical data

Client

Internationally prominent German pharmaceutical company.

Description

Severe performance problems in Spark data ingestion, with a volume of several TB of data per day.

Results

Complete redesign of the ingestion pipelines, reducing computing time from several days to only a few hours.

Technology

Spark with Scala for data processing. Flume and Sqoop for ingestion. HDFS storage, queryable through Hive's SQL engine. Big Data cluster based on MapR.

Web App

Mobile operator benchmarking

Client

Company specialized in conducting network quality comparisons between different mobile operators.

Description

Development of a system capable of creating a global network quality metric for the different mobile operators.

Results

Large-scale ingestion of data from instrumented vehicles (antenna performance, packet loss, upload and download speeds, etc.).
Calculation of quality KPIs aggregated at the mobile operator and country level.

Technology

Ingestion with Spark (Java).
Data analysis with Impala and Hive.
Data dashboards with Tableau.

Web App

Call center call prediction

Client

One of the leading telecommunications companies, providing service worldwide.

Description

The objective of the project was to predict calls to the customer service call center in order to relieve congestion.

Results

A predictive model was developed that assigns a score to each customer (AUC of 0.9), allowing the company to anticipate calls proactively and make them itself at off-peak times.
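For readers unfamiliar with the metric: AUC is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (the profile does not say how positives were defined here; presumably customers who went on to call). A minimal pairwise sketch of that definition, using hypothetical toy scores and labels:

```python
def auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical scores (higher = more likely to call) and outcomes (1 = called)
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(auc(scores, labels))  # 5 of 6 pairs ranked correctly
```

This pairwise loop is O(P·N) and only meant to make the definition concrete; for large customer bases a rank-based computation such as scikit-learn's `roc_auc_score` is the practical choice.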

Technology

Data processing with Spark.
Modeling performed with Spark MLlib.

Web App

User experience on mobile networks

Client

Fast-growing telecommunications company in Spain.

Description

Development of a user experience analysis system (CEM) for the mobile network.

Results

Large-scale ingestion of data from 3G/4G antennas.
Calculation of network quality KPIs and creation of dashboards.
Development of predictive models for customer churn and complaints.
Creation of a monitoring system to detect problematic antennas.

Technology

Ingestion with Python and Impala.
Big Data cluster with Cloudera distribution.
Models built with scikit-learn; visualizations with Matplotlib and seaborn.
Dashboards in Power BI.

Web App

Predictive maintenance of wind turbines

Client

Leading wind turbine manufacturer in Spain.

Description

Creation of in-house predictive maintenance software to improve on the results of the external commercial software the company was using.

Results

Development of a hybrid predictive maintenance system combining Machine Learning (anomaly detection) with fatigue calculations on time series, which outperformed the commercial software previously in use. It also saves the company more than €30k per year in license costs.

Technology

Fatigue data processing and analysis with pandas.
Predictive model built with scikit-learn.
Performance optimization using Numba, Fortran 90 and NumPy.

Web App

Wind generation prediction

Client

Spanish energy company with a worldwide presence.

Description

Prediction of wind generation based on meteorological information (three-dimensional grids of wind speed, pressure, etc.).

Results

A 10-percentage-point improvement over traditional prediction models, increasing our client's profitability when participating in the US electricity auction.

Technology

Data ingestion with Spark (Scala) on a Big Data cluster based on the Hortonworks distribution. Modeling with scikit-learn and LightGBM. Orchestration and monitoring with Apache Airflow.

Web App

Analysis of the behavior of mobile app users

Client

One of the leading Spanish startups for urban mobility services.

Description

Analysis and modeling of user behavior in the mobile app, which has more than 0.5M users in the Madrid metropolitan area.

Results

Analysis of the mobile app's registration funnel, which led to the redesign of part of the app and a sharp increase (from 50% to 80%) in the share of users who complete registration.
Development of an unsupervised model capable of inferring the work and home locations of the mobile app users based on geolocated events.

Technology

Data processing with Spark. Visualization with Tableau, Plotly and Matplotlib. Modeling with scikit-learn. Orchestration and monitoring with Apache Airflow. Cloud infrastructure on AWS.

Web App

Risk identification

Client

Spanish startup that offers Natural Language Processing products applied to the corporate sector.

Description

Development of a system capable of monitoring news from the main national economic media and identifying various types of risks.

Results

The following were developed:

A system for extracting economic news from the main platforms, currently available on DataMarket.
A Machine Learning model that identifies 7 different types of risk (economic, operational, regulatory, cybersecurity, etc.) in economic news, with an AUC greater than 0.8 on the evaluation set.
A Named Entity Recognition (NER) model that identifies up to 20 types of entities (companies, public entities, places of interest, dates, etc.).

Technology

Model built with TensorFlow.
Dockerized deployment using MLflow and TensorFlow Model Server.

Web App

Social Listening

Client

Leading market research and demographics company in Spain. The project received EU funding through the H2020 program.

Description

Creation of an opinion monitoring system for social networks.

Results

Development of a large-scale data extraction system for Twitter and Facebook, and creation of Natural Language Processing algorithms that estimate a user's expertise in certain topics of interest (politics, sports, economics, etc.).

Technology

NLP libraries like spaCy and fastText. Orchestration and monitoring of processes with Apache Airflow. Versioning and deployment of the models with MLflow. Cloud infrastructure on AWS.

Web App

CTR (Click Through Rate) prediction

Client

One of the main web traffic generation and monetization companies in Europe.

Description

Prediction of the CTR of Google and Bing ads based on the text used in the ads.

Results

NLP-based predictive model that estimates the CTR of an ad from its characteristics.
Text recommendation system that maximizes ad CTR, taking into account metadata such as category, destination country, etc.

Technology

Data processing with Spark (aggregation of several billion ads).
Modeling performed with scikit-learn, spaCy, fastText, LightGBM.
Versioning and deployment of models with MLflow.
Orchestration and monitoring with Apache Airflow.
Optimal text recommendation app built with Flask.

Web App

CPC Prediction (Cost per Click)

Client

One of the main web traffic generation and monetization companies in Europe.

Description

Prediction of the CPC (Cost per Click) of a huge set of keywords from Google and Bing ads.

Results

Predictive model based on NLP algorithms with two stages:

Prediction of the number of impressions.
CPC prediction.

Technology

Data processing with Spark (aggregation of several billion ads with a variety of keywords).
Modeling performed with spaCy, fastText, LightGBM.
Versioning and deployment of models with MLflow.
Orchestration and monitoring with Apache Airflow.

Web App

Demand prediction

Client

Leading DIY and construction products chain in Spain.

Description

Prediction of demand for more than 900 product references, including products with highly seasonal and sporadic sales.

Results

Two-month demand prediction model for more than 900 product references across 30 stores, with good performance on several metrics (R², RMSE, MAPE).
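The three metrics cited capture different things: R² measures explained variance, RMSE gives error in sales units, and MAPE gives relative error in percent. A minimal sketch of their standard definitions, on hypothetical weekly sales figures; note that MAPE is undefined when actual sales are zero, which is common for sporadic-sales products, and this sketch simply skips such weeks (an assumption, since the profile does not say how this was handled):

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the sales."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mape(y_true, y_pred):
    """Mean absolute percentage error (%); weeks with zero actual sales are skipped."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when all actual values are zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in pairs) / len(pairs)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical weekly sales for one reference: actual vs. predicted
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
print(rmse(actual, predicted), mape(actual, predicted), r2(actual, predicted))
# about 2.55 units, 11.875 %, 0.948
```

Reporting all three is useful because RMSE is dominated by high-volume products while MAPE weights every nonzero week equally.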

Technology

Data processing with Spark (aggregation of more than 1 billion records to calculate weekly sales by reference). Modeling carried out with scikit-learn and LightGBM.

Web App

Classification of customer service tickets

Client

Leading bookstore chain in Spain.

Description

Creation of a system that classifies customer service tickets into 20 different categories and resolves them automatically.

Results

A Natural Language Processing system was developed that classifies user-submitted tickets (free text) into 20 different categories (defective order, late delivery, etc.).
Logic was implemented so that tickets for which the model was most confident in its prediction were resolved automatically, reserving the most ambiguous and complex tickets for staff.
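The confidence-based routing described above amounts to a threshold on the classifier's probability for its predicted category. A minimal sketch, assuming the model exposes such a probability; the `Prediction` class, the ticket IDs, and the 0.9 threshold are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    ticket_id: str
    category: str
    confidence: float  # model probability for the predicted category, in [0, 1]

def route(predictions, threshold=0.9):
    """Split tickets into auto-resolved and human-review queues by model confidence."""
    auto, manual = [], []
    for pred in predictions:
        if pred.confidence >= threshold:
            auto.append(pred)    # confident enough to resolve automatically
        else:
            manual.append(pred)  # ambiguous: leave for a human agent
    return auto, manual

preds = [
    Prediction("T1", "late delivery", 0.97),
    Prediction("T2", "defective order", 0.55),
    Prediction("T3", "late delivery", 0.91),
]
auto, manual = route(preds)
print([p.ticket_id for p in auto], [p.ticket_id for p in manual])
# ['T1', 'T3'] ['T2']
```

The threshold trades automation rate against error rate; a natural way to pick it is to check accuracy on a validation set for tickets above each candidate cutoff.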

Technology

Natural Language Processing with libraries like spaCy and NLTK.
Information processing with pandas.

Web App

Analysis of consumer behavior

Client

Leading retail company in Spain and Europe.

Description

Creation of a video analysis system that automatically locates and counts customers near and inside the stores, as well as at potential locations for new establishments.

Results

Using Computer Vision techniques on Google Edge TPU, a prototype device was built and deployed in stores and outdoors, capable of counting people while respecting privacy.
The counts were analyzed to build heat maps of store interiors, which helped improve sales by redistributing products.

Technology

Computer Vision model for detecting and counting people, built with TensorFlow Lite.
Deployment on Google Edge TPU Dev Board device.

Web App
