Just Do More Parsing: Structurizing Semi-Structured Log Data For High Compression and Fast Search
Abstract
Application logs are important for debugging, alerting, and analytics workloads, making it critical to retain them and make them searchable. These logs increasingly appear in semi-structured formats, which allow developers to add arbitrary fields and express queries on them. However, legacy applications still emit unstructured log text that may contain fields developers need to search on. From a certain perspective, such logs are also a kind of semi-structured data, just in a non-standard format. Unfortunately, existing systems are not able to fully leverage the structure of these legacy application logs for both compression and search. Instead, the opportunity is wasted and the logs are mostly treated as unstructured. This thesis significantly improves µSlope, a semi-structured data management system for application logs, by proposing a custom parser interface that can structurize even non-standard semi-structured data and capture these structures in concise schema metadata. The design and implementation choices that enable custom parser support are contrasted against an earlier version of µSlope (µSlopeV0), which could only handle a limited class of semi-structured data. In a one-to-one comparison with µSlopeV0, the ability to structurize previously unstructured data offers up to a 32% improvement in compression ratio and a 4.18x improvement in search speed.
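To make the idea of structurizing legacy log text concrete, the following minimal sketch parses one unstructured line into typed fields and derives a small schema from the result. It is an illustration only and does not reflect µSlope's actual parser interface or schema format: the log layout, the field names, and the helpers `parse_line` and `infer_schema` are all hypothetical.

```python
import re

# Hypothetical legacy log layout, invented for illustration (not from the thesis):
#   <date> <time> <LEVEL> user=<name> latency=<n>ms <free-text message>
LINE_PATTERN = re.compile(
    r"(?P<timestamp>\S+ \S+) (?P<level>[A-Z]+) "
    r"user=(?P<user>\S+) latency=(?P<latency_ms>\d+)ms (?P<message>.*)"
)


def parse_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = LINE_PATTERN.fullmatch(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the numeric field so it can be stored and queried as an integer.
    record["latency_ms"] = int(record["latency_ms"])
    return record


def infer_schema(record):
    """Map each field name to the name of its Python type."""
    return {name: type(value).__name__ for name, value in record.items()}


line = "2024-01-15 10:32:07 INFO user=alice latency=42ms request completed"
record = parse_line(line)
schema = infer_schema(record)
```

Once a line is split this way, a field such as `latency_ms` can be stored and searched as a typed column rather than scanned as raw text, which is the general kind of opportunity the abstract describes for both compression and search.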
Items in TSpace are protected by copyright, with all rights reserved, unless otherwise indicated.