Data warehouses are amazing things: you can toss all kinds of information into them, then pull mind-blowing insights out the other end. This works because the warehouse is connected to outside systems that hold their own database tables. A copy of whatever has recently gone into those tables is taken out, sent through a data pipeline, and pushed into your data warehouse. But today's data stacks span multiple clouds, hybrid environments, and so many data pipelines that the programs in charge of monitoring and logging the flows can barely keep up. Manually checking the quality and integrity of the data becomes overwhelming. The more sophisticated the systems, the more errors creep into the data. If we rely on flawed data, the outcomes and insights we generate will be equally flawed. This is where data observability comes in.
In this episode, you will hear about something called an observability platform, which identifies data anomalies and pipeline errors in data warehouses in real time. There's a twist, though: we're in a cloud computing environment that charges by the number of computing cycles, so you don't want an observability tool that becomes yet another pipe accessing client data and running up the meter. The good news is that there's an easier way to detect when data has gone awry: comparing log files, which are basically metadata. They are just as effective at alerting you to problems.
For a completely non-technical picture of what this does, think of Hans Christian Andersen's The Princess and the Pea. A girl comes to a castle seeking shelter from the rain, claiming to be a princess. The queen doubts that she is truly of noble blood and offers her a bed, but this bed has twenty mattresses and twenty down-filled comforters on it. A pea is placed underneath the bottom mattress to test whether the girl notices anything. The next morning, the princess says she endured a sleepless night; there must have been something hard in the bed. They realize then and there that she must be a real princess, since no one else could be so delicate. Like the princess sensing the pea through every layer, the platform picks up a small problem without digging through the data itself.
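For readers who want a technical version of the idea discussed in the episode, watching metadata instead of scanning the data, here is a minimal Python sketch. It is an illustration under stated assumptions, not Masthead's implementation: it assumes each pipeline run leaves a log entry recording how many rows it wrote (the `rows_written` field and the sample counts are hypothetical), and it uses a plain z-score test as a stand-in for whatever anomaly detection a real platform applies.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it sits more than `threshold` standard deviations
    from the mean of `history` (a simple z-score test)."""
    if len(history) < 2:
        return False  # too little history to estimate normal variation
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # any change from a perfectly flat history is suspicious
    return abs(latest - mu) / sigma > threshold


# Hypothetical metadata parsed from load logs: rows written per daily load
# of one table. Only these counts are inspected, never the rows themselves.
load_log = [
    {"table": "orders", "rows_written": 10_120},
    {"table": "orders", "rows_written": 9_870},
    {"table": "orders", "rows_written": 10_340},
    {"table": "orders", "rows_written": 10_015},
    {"table": "orders", "rows_written": 412},  # a sudden drop: did the pipeline break?
]

counts = [entry["rows_written"] for entry in load_log]
history, latest = counts[:-1], counts[-1]
print(is_anomalous(history, latest))  # prints True
```

Because the check reads only a handful of numbers from logs rather than the table contents, it adds very little compute, which is the cost concern raised for pay-per-use cloud warehouses.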
I spoke with Yuliia Tkachova, co-founder and CEO of Masthead Data, a company that recently raised $1.3M in a pre-seed round. Originally from Ukraine, Yuliia founded Masthead after her work experience convinced her of the need for an observability solution. She was a Product Manager at OWOX BI and Boosta, where their data solutions ran into problems. Before that, she worked in marketing for RAGT. She holds bachelor's and master's degrees from Sumy State University, specializing in MIS and statistics. She also serves as an organizer at MeasureCamp, a volunteer community where analytics professionals come together to learn.
People/Products/Concepts Mentioned in Show
Connect with Yuliia Tkachova on LinkedIn
Image credit: Edmund Dulac in Hans Christian Andersen tales