WhiteBox

We design Artificial Intelligence systems.


We are a Data Science consulting firm from Madrid with the following...

Values:

🔍 Transparency: continuous communication with the client showing progress and building together.

✅ Simplicity: “Simple is better”. Our solutions are simple, elegant, and easy to maintain.

💎 Quality: our projects stand out for delivering results on time. We don't have a business development team; our clients do that job for us.

🧙‍♂️ Excellence: we are recognized as one of the best consulting companies specialized in Data Science in Spain.

🕶️ End-to-End capabilities: we can take on projects across the whole data cycle, from data extraction to the deployment of Artificial Intelligence models in production.

✨ We don’t sell snake oil: we are engineers who design solutions that work, not presentations. We don’t have a sales department; our happy clients do that job for us.

Hourly rate: $25 - $49/hr
Employees: 2 - 9
Founded: 2020
Locations
Spain
C/Carretas 14, Madrid, Madrid 28012
677022241

Focus Areas

Service Focus

  • Big Data & BI: 30%
  • Artificial Intelligence: 30%
  • IT Services: 25%
  • Cloud Computing Services: 15%

Client Focus

  • Large Business: 50%
  • Medium Business: 30%
  • Small Business: 20%

Industry Focus

30%
15%
15%
15%
15%
10%
  • Information Technology
  • Advertising & Marketing
  • Financial & Payments

WhiteBox Clients & Portfolios

Digital speech therapist for children with autism
Not Disclosed
13 weeks
Healthcare & Medical

Client

Startup from the United States specialized in applying technology to improve language skills. Project funded by the NSF (National Science Foundation).

Description

Creation of a digital speech therapist (mobile app) that, using AI, is able to assess the level of language acquisition of children with autism spectrum disorders and guide them in the incorporation of new words and grammatical structures.

Results

  • A system was developed that can predict the level of language acquisition of any person (with a special focus on children with special needs) from a few minutes of audio.

  • A diarization algorithm (one that detects the different participants in a conversation) specialized in children's speech was developed, with performance very similar to the best existing solutions for adults (Amazon Transcribe).

Technology

  • Natural Language Processing with libraries like spaCy and NLTK.

  • Development of Deep Learning models with TensorFlow.

  • Voice characterization and diarization with Resemblyzer and in-house algorithms.

  • Recommendation using probabilistic methods (Markov chains) and generative text models.
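To illustrate the Markov chain idea mentioned above, here is a minimal sketch of a first-order chain that suggests likely next words from observed word pairs. It is not the firm's implementation: the function names (`fit_transitions`, `suggest_next`) and the toy corpus are hypothetical, chosen only for this example.

```python
from collections import Counter, defaultdict

def fit_transitions(sentences):
    """Count word-to-next-word transitions (first-order Markov chain)."""
    transitions = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            transitions[current][following] += 1
    return transitions

def suggest_next(transitions, word, k=2):
    """Return up to k (next_word, probability) pairs, most likely first."""
    counts = transitions.get(word.lower())
    if not counts:
        return []
    total = sum(counts.values())
    return [(w, c / total) for w, c in counts.most_common(k)]

# Toy corpus, invented for illustration only.
corpus = ["the dog runs", "the dog eats", "the cat eats", "the dog runs fast"]
transitions = fit_transitions(corpus)
suggestions = suggest_next(transitions, "dog")
print(suggestions)
```

A recommender built this way would propose the highest-probability continuations first; a production system would need smoothing and a far larger corpus.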

Analytical Engine for Researchers
Not Disclosed
52 weeks
Healthcare & Medical

Client

Research project developed in collaboration with public entities. Funded by the European Union under the H2020 program.

Description

Creation of a repository of data linked by time and geographic location, with a special focus on data of interest to researchers in the health, pharmaceutical and insurance sectors.

Results

  • A microservices-based system was developed capable of extracting information from various sources and integrating it using a common schema.

  • Techniques were developed to normalize the different data sources and cross-reference them by date and location.

  • An interactive tool was developed that the end user can use to select and cross-reference data sources.

  • An algorithm based on Natural Language Processing was developed capable of identifying the different sources of information in a free text and the relationship between them, used in the search system offered to users.
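As a rough illustration of normalizing sources and joining them by date and location, the sketch below brings two record sets onto a common (date, place) key and matches them. Everything here is an assumption made for the example: the field names, the two accepted date formats, the helper names, and the sample rows are invented and do not come from the project.

```python
from datetime import date, datetime

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed input formats

def normalize_date(text: str) -> date:
    """Parse a date written in any of the assumed formats."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")

def normalize_place(name: str) -> str:
    """Lowercase the name and collapse surrounding/repeated whitespace."""
    return " ".join(name.lower().split())

def cross_reference(source_a, source_b):
    """Join two lists of records on the normalized (date, place) key."""
    index = {
        (normalize_date(row["date"]), normalize_place(row["place"])): row
        for row in source_b
    }
    matches = []
    for row in source_a:
        key = (normalize_date(row["date"]), normalize_place(row["place"]))
        if key in index:
            matches.append({
                "date": key[0].isoformat(),
                "place": key[1],
                "a_value": row["value"],
                "b_value": index[key]["value"],
            })
    return matches

# Invented sample rows with inconsistent formatting.
admissions = [{"date": "2020-03-01", "place": "Madrid", "value": 120},
              {"date": "2020-03-02", "place": "Madrid", "value": 135}]
air_quality = [{"date": "01/03/2020", "place": "  MADRID ", "value": 41.5},
               {"date": "01/03/2020", "place": "Sevilla", "value": 30.2}]
matches = cross_reference(admissions, air_quality)
print(matches)
```

Real place names usually need more than case and whitespace normalization (for example, geocoding to a common administrative level), which this sketch leaves out.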

Technology

  • Data extraction and cleaning using pandas.

  • Storage in relational databases such as PostgreSQL and Oracle.

  • Natural Language Processing using spaCy and NLTK.

Mass processing of pharmaceutical data
Not Disclosed
13 weeks
Healthcare & Medical

Client

Major German pharmaceutical company with an international presence.

Description

Severe performance problems in data ingestion with Spark, with a volume of several TB of data per day.

Results

Complete redesign of the ingestion pipelines, reducing computing time from several days to only a few hours.

Technology

Spark with Scala for data processing. Flume and Sqoop for ingestion. HDFS storage, queried through Hive's SQL engine. Big Data cluster with MapR technology.

Mobile operator benchmarking
Not Disclosed
13 weeks
Telecommunication

Client

Company specialized in conducting network quality comparisons between different mobile operators.

Description

Development of a system capable of creating a global network quality metric for the different mobile operators.

Results

  • Massive data ingestion from instrumented vehicles: antenna performance, packet loss, upload and download speeds, etc.

  • Calculation of quality KPIs and their aggregation at the mobile operator and country level.

Technology

  • Ingestion with Spark (Java).

  • Data analysis with Impala and Hive.

  • Data dashboards with Tableau.

Call center call prediction
Not Disclosed
26 weeks
Telecommunication

Client

One of the main telecommunications companies that provides service worldwide.

Description

The objective of the project was to predict the number of calls to the customer service call center, in order to relieve congestion in the service.

Results

A predictive model was developed that assigns a score to each customer, with an AUC of 0.9. This allowed the company to anticipate calls and contact customers proactively at off-peak times.

Technology

  • Data processing with Spark.

  • Modeling performed with Spark MLlib.

User experience on mobile networks
Not Disclosed
26 weeks
Telecommunication

Client

Fast-growing telecommunications company in Spain.

Description

Development of a user experience analysis system (CEM, Customer Experience Management) for the mobile network.

Results

  • Massive data ingestion from 3G/4G antennas.

  • Calculation of network quality KPIs and creation of dashboards.

  • Development of predictive models for customer churn and customer complaints.

  • Creation of a surveillance system to detect problematic antennas.

Technology

  • Ingestion with Python and Impala.

  • Big Data cluster with Cloudera distribution.

  • Models made with scikit-learn, visualizations with Matplotlib and seaborn.

  • Dashboards in Power BI.

Predictive maintenance of wind turbines
Not Disclosed
26 weeks
Oil & Energy

Client

Leading manufacturer of wind turbines in Spain.

Description

Creation of in-house predictive maintenance software to improve on the results of the external commercial software currently used by the company.

Results

Development of a hybrid predictive maintenance system, based on Machine Learning (anomaly detection) and fatigue calculations on time series, which obtained better results than the commercial software in use. Savings for the company of more than €30k per year in commercial license costs.

Technology

  • Processing and analysis of fatigue data with pandas.

  • Predictive model made with scikit-learn.

  • Performance optimization using Numba, Fortran 90 and NumPy.

Wind generation prediction
Not Disclosed
26 weeks
Oil & Energy

Client

Spanish energy company with a worldwide presence.

Description

Prediction of wind generation based on meteorological information (three-dimensional grids of wind speed, pressure, etc.).

Results

Improvement of 10 percentage points over traditional prediction models. Greater profitability for our client when participating in the US electricity auction.

Technology

Data ingestion with Spark (Scala) in a Big Data cluster based on the Hortonworks distribution. Modeling with scikit-learn and LightGBM. Orchestration and monitoring with Apache Airflow.

Analysis of the behavior of mobile app users
Not Disclosed
26 weeks
Automotive

Client

One of the main Spanish startups for urban mobility services.

Description

Analysis and modeling of the behavior of users of the mobile app, which has more than 500,000 users in the metropolitan area of Madrid.

Results 

  • Analysis of the mobile app's registration funnel, which led to the redesign of part of the app and an increase from 50% to 80% in the share of users who complete registration.

  • Development of an unsupervised model capable of inferring the work and home locations of the mobile app users based on geolocated events.
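One simple way to infer home and work locations without labels is a time-of-day heuristic over geolocated events. The sketch below is only an illustration of that idea, not the model described above: the hour windows, the grid precision, the function names, and the sample coordinates are all assumptions made for this example.

```python
from collections import Counter
from datetime import datetime

def grid_cell(lat, lon, precision=3):
    """Snap a coordinate to a coarse grid cell (about 100 m at 3 decimals)."""
    return (round(lat, precision), round(lon, precision))

def infer_home_work(events):
    """events: iterable of (datetime, lat, lon) tuples for a single user.

    Home is taken as the most visited cell at night (22:00-06:00) and work
    as the most visited cell on weekdays between 09:00 and 18:00.
    """
    night, office = Counter(), Counter()
    for ts, lat, lon in events:
        cell = grid_cell(lat, lon)
        if ts.hour >= 22 or ts.hour < 6:
            night[cell] += 1
        elif ts.weekday() < 5 and 9 <= ts.hour < 18:
            office[cell] += 1
    home = night.most_common(1)[0][0] if night else None
    work = office.most_common(1)[0][0] if office else None
    return home, work

# Invented events for one user (1-2 March 2021 fall on Monday and Tuesday).
events = [
    (datetime(2021, 3, 1, 23, 0), 40.4168, -3.7038),
    (datetime(2021, 3, 2, 2, 0), 40.4169, -3.7037),
    (datetime(2021, 3, 2, 10, 0), 40.4530, -3.6883),
    (datetime(2021, 3, 2, 15, 0), 40.4531, -3.6884),
    (datetime(2021, 3, 2, 12, 0), 40.4168, -3.7038),
]
home, work = infer_home_work(events)
print(home, work)
```

A density-based clustering step in place of the fixed grid would cope better with noisy GPS readings, at the cost of extra tuning.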

Technology

Data processing with Spark. Visualization with Tableau, Plotly and Matplotlib. Modeling with scikit-learn. Orchestration and monitoring with Apache Airflow. Cloud infrastructure on AWS.

Risk identification
Not Disclosed
13 weeks
E-commerce

Client

Spanish startup that offers Natural Language Processing products applied to the corporate sector.

Description

Development of a system capable of monitoring news from the main national economic media and identifying various types of risks.

Results 

The following were developed:

  • A system for extracting economic news from the main platforms. Currently available on DataMarket.

  • A Machine Learning model capable of identifying 7 different types of risk (economic, operational, regulatory, cybersecurity, etc.) in economic news. AUC greater than 0.8 on the evaluation set.

  • A Named Entity Recognition (NER) model capable of identifying up to 20 types of entities (companies, public entities, places of interest, dates, etc.).

Technology

  • Model built with TensorFlow.

  • Dockerized deployment using MLflow and TensorFlow Model Server.

Social Listening
Not Disclosed
26 weeks
E-commerce

Client

Leading market research and demographic company in Spain. EU investment in the project through the H2020 program.

Description

Creation of a system for monitoring opinion on social networks.

Results 

Development of a massive data extraction system from Twitter and Facebook and creation of Natural Language Processing algorithms capable of calculating the expertise of a user in certain topics of interest (politics, sports, economics, etc.).

Technology

NLP libraries like spaCy and fastText. Orchestration and monitoring of processes with Apache Airflow. Versioning and deployment of the models with MLflow. Cloud infrastructure on AWS.

CTR (Click Through Rate) prediction
Not Disclosed
13 weeks
Advertising & Marketing

Client

One of the main web traffic generation and monetization companies in Europe.

Description

Prediction of the CTR of Google and Bing ads based on the text used in the ads.

Results 

  • Predictive model based on NLP algorithms to predict the CTR of an ad based on its characteristics.

  • Text recommendation system to maximize the CTR of the ads taking into account metadata such as category, country of destination, etc.

Technology

  • Data processing with Spark (aggregation of several billion ads).

  • Modeling performed with scikit-learn, spaCy, fastText, LightGBM.

  • Versioning and deployment of models with MLflow.

  • Orchestration and monitoring with Apache Airflow.

  • Optimal text recommendation app built with Flask.

CPC Prediction (Cost per Click)
Not Disclosed
13 weeks
Advertising & Marketing

Client

One of the main web traffic generation and monetization companies in Europe.

Description

Prediction of the CPC (Cost per Click) of a huge set of keywords from Google and Bing ads.

Results 

Predictive model based on NLP algorithms with two stages:

  • Prediction of the number of impressions.

  • CPC prediction.

Technology

  • Data processing with Spark (aggregation of several billion ads with a variety of keywords).

  • Modeling performed with spaCy, fastText, LightGBM.

  • Versioning and deployment of models with MLflow.

  • Orchestration and monitoring with Apache Airflow.

Demand prediction
Not Disclosed
13 weeks
Retail

Client

Leading chain of DIY and construction products in Spain.

Description

Prediction of demand for more than 900 product references, including products with highly seasonal and sporadic sales.

Results

Demand forecasting model with a 2-month horizon for more than 900 product references across 30 stores. The model performed well on several metrics (R², RMSE, MAPE).

Technology

Data processing with Spark (aggregation of more than 1 billion records to calculate weekly sales by reference). Modeling carried out with scikit-learn and LightGBM.

Classification of customer service tickets
Not Disclosed
26 weeks
Retail

Client

Leading bookstore chain in Spain.

Description

Creation of a system capable of classifying customer service tickets into 20 different categories and resolving them automatically.

Results

  • A system based on Natural Language Processing was developed that classifies tickets created by users (free text) into 20 different categories (defective order, late delivery, etc.).

  • Logic was implemented so that tickets for which the model had high confidence in its prediction were resolved automatically, while the most ambiguous and complex tickets were left for staff to handle.
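The confidence-based routing described above can be sketched as a simple threshold on the classifier's confidence. This is a minimal illustration under stated assumptions: the `Prediction` class, the `route` function, the 0.9 threshold, and the sample tickets are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    ticket_id: str
    category: str
    confidence: float  # model probability for the predicted category, in [0, 1]

def route(predictions, threshold=0.9):
    """Split predictions into auto-resolved and manually handled tickets.

    The threshold is an assumed value; in practice it would be tuned on
    held-out data to balance automation rate against error rate.
    """
    auto, manual = [], []
    for prediction in predictions:
        if prediction.confidence >= threshold:
            auto.append(prediction)
        else:
            manual.append(prediction)
    return auto, manual

# Invented predictions for illustration.
predictions = [
    Prediction("T1", "late delivery", 0.97),
    Prediction("T2", "defective order", 0.55),
    Prediction("T3", "late delivery", 0.90),
]
auto, manual = route(predictions)
print([p.ticket_id for p in auto], [p.ticket_id for p in manual])
```

Raising the threshold sends more tickets to staff but lowers the risk of resolving a ticket under the wrong category.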

Technology

  • Natural Language Processing with libraries like spaCy and NLTK.

  • Information processing with pandas.

Analysis of consumer behavior
Not Disclosed
13 weeks
Retail

Client

Leading retail company in Spain and Europe.

Description

Creation of a video analysis system capable of automatically locating and counting the number of customers in the vicinity and inside the stores, as well as in potential locations of new establishments.

Results

  • Using Computer Vision techniques on Google Edge TPU, a prototype device was created and deployed in stores and outdoors, capable of counting people while respecting their privacy.

  • The counts were analyzed to build heat maps of the store interiors, which made it possible to improve sales by redistributing products.

Technology

  • Computer Vision model for detecting and counting people, built with TensorFlow Lite.

  • Deployment on Google Edge TPU Dev Board device.

WhiteBox Reviews

No reviews submitted yet.