Just Do More Parsing: Structurizing Semi-Structured Log Data For High Compression and Fast Search

Date

2024-11

Abstract

Application logs are important for debugging, alerting, and analytics workloads, so it is critical to retain them and keep them searchable. These logs increasingly appear in semi-structured formats, which let developers add arbitrary fields and express queries on them. However, legacy applications still emit unstructured log text that may contain fields developers need to search on. Such text can itself be viewed as semi-structured data in a non-standard format. Unfortunately, existing systems cannot fully exploit the structure of these legacy application logs for either compression or search; they mostly treat the logs as unstructured, and the opportunity is wasted. This thesis significantly improves µSlope, a semi-structured data management system for application logs, by proposing a custom parser interface that can structurize even non-standard semi-structured data and capture the resulting structures in concise schema metadata. The design and implementation choices that enable custom parser support are contrasted with those of an earlier version of µSlope (µSlopeV0), which could handle only a limited class of semi-structured data. In a one-to-one comparison with µSlopeV0, the ability to structurize previously unstructured data offers up to a 32% improvement in compression ratio and a 4.18x improvement in search speed.
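
To make the idea of structurizing legacy log text concrete, the following is a minimal sketch and not µSlope's actual parser interface. The log line format, the regular expression, and the names `parse_line`, `schema_of`, and `structurize` are all assumptions made for illustration. The sketch extracts typed fields from an unstructured line and groups records by a compact schema, falling back to plain text for lines that do not match.

```python
import re
from typing import Optional

# Hypothetical pattern for a legacy log line such as:
#   "2024-11-02 10:15:07 INFO Task 42 finished in 1.5 s"
LINE_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"Task (?P<task_id>\d+) finished in (?P<seconds>\d+\.\d+) s"
)


def parse_line(line: str) -> Optional[dict]:
    """Return typed fields for a matching line, or None if it does not match."""
    match = LINE_PATTERN.fullmatch(line)
    if match is None:
        return None
    return {
        "timestamp": match["timestamp"],
        "level": match["level"],
        "task_id": int(match["task_id"]),
        "seconds": float(match["seconds"]),
    }


def schema_of(record: dict) -> tuple:
    """Summarize a record as a sorted tuple of (field name, type name) pairs."""
    return tuple(sorted((name, type(value).__name__) for name, value in record.items()))


def structurize(lines):
    """Group parsed records by schema; keep unmatched lines as plain text."""
    by_schema = {}
    unstructured = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            unstructured.append(line)
        else:
            by_schema.setdefault(schema_of(record), []).append(record)
    return by_schema, unstructured
```

In a sketch like this, records that share a schema can be stored together, so each field's values sit side by side and a query on a field such as `task_id` need not scan unrelated free text. Whether µSlope organizes data this way is not stated in the abstract.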

Keywords

Compression, Semi-structured data management systems

Creative Commons

Attribution 4.0 International