Just Do More Parsing: Structurizing Semi-Structured Log Data For High Compression and Fast Search

dc.contributor.advisor: Yuan, Ding
dc.contributor.author: Gibson, Devin Kenneth
dc.contributor.department: Electrical and Computer Engineering
dc.date: 2024-11
dc.date.accepted: 2024-11
dc.date.accessioned: 2024-11-13T19:24:53Z
dc.date.available: 2024-11-13T19:24:53Z
dc.date.convocation: 2024-11
dc.date.issued: 2024-11
dc.description.abstract: Application logs are important for debugging, alerting, and analytics workloads, making it critical to retain them and make them searchable. These logs increasingly appear in semi-structured formats, which allow developers to add arbitrary fields and to express queries on them. However, legacy applications still emit unstructured log text that may contain fields developers need to search on. Viewed this way, such logs are also semi-structured data, just in a non-standard format. Unfortunately, existing systems cannot fully leverage the structure of these legacy application logs for both compression and search. Instead, this opportunity is wasted, and the logs are mostly treated as unstructured. This thesis significantly improves µSlope, a semi-structured data management system for application logs, by proposing a custom parser interface that can structurize even non-standard semi-structured data and capture the resulting structures in concise schema metadata. The design and implementation choices that enable custom parser support are contrasted with an earlier version of µSlope (µSlopeV0), which could only handle a limited class of semi-structured data. In a one-to-one comparison with µSlopeV0, the ability to structurize previously unstructured data offers up to a 32% improvement in compression ratio and a 4.18x improvement in search speed.
dc.description.degree: M.A.S.
dc.identifier.uri: http://hdl.handle.net/1807/141302
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Compression
dc.subject: Semi-structured data management systems
dc.subject.classification: 0464
dc.title: Just Do More Parsing: Structurizing Semi-Structured Log Data For High Compression and Fast Search
dc.type: Thesis
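
The abstract describes structurizing legacy log text into fields and recording the resulting structure as concise schema metadata. The Python sketch below is a toy illustration of that general idea under stated assumptions, not µSlope's actual parser interface: the log line format, the `parse_line` function, and the `SchemaStore` class are all hypothetical. Each line is split into a schema (its ordered field names) and its values. Every distinct schema is stored once, and a field search only scans rows whose schema contains that field.

```python
import re
from collections import defaultdict

# Hypothetical log format (an assumption): "<timestamp> <level> key=value ... free text".
KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def parse_line(line: str) -> tuple[tuple[str, ...], tuple[str, ...]]:
    """Hypothetical custom parser: split one log line into (schema, values).

    Free text that is not a key=value pair is dropped in this sketch.
    """
    timestamp, level, rest = line.split(" ", 2)
    pairs = KV_PATTERN.findall(rest)
    schema = ("timestamp", "level") + tuple(key for key, _ in pairs)
    values = (timestamp, level) + tuple(value for _, value in pairs)
    return schema, values


class SchemaStore:
    """Toy store: each distinct schema is kept once; rows are grouped by schema id."""

    def __init__(self) -> None:
        self.schema_ids: dict[tuple[str, ...], int] = {}
        self.tables: dict[int, list[tuple[str, ...]]] = defaultdict(list)

    def ingest(self, line: str) -> None:
        schema, values = parse_line(line)
        # Reuse the id of a schema seen before; otherwise assign the next id.
        sid = self.schema_ids.setdefault(schema, len(self.schema_ids))
        self.tables[sid].append(values)

    def search(self, field: str, value: str) -> list[tuple[str, ...]]:
        """Scan only tables whose schema contains `field`; skip the rest entirely."""
        hits = []
        for schema, sid in self.schema_ids.items():
            if field not in schema:
                continue
            col = schema.index(field)
            hits.extend(row for row in self.tables[sid] if row[col] == value)
        return hits


store = SchemaStore()
for log_line in [
    "2024-11-13T19:24:53Z INFO user=alice latency_ms=42 request done",
    "2024-11-13T19:24:54Z INFO user=bob latency_ms=7 request done",
    "2024-11-13T19:24:55Z WARN disk=sda1 usage=91 nearly full",
]:
    store.ingest(log_line)

print(len(store.schema_ids))  # 2: the two INFO lines share one schema
print(store.search("user", "bob"))
```

In this sketch, storing each schema once is what keeps the metadata small, and skipping rows whose schema lacks the queried field hints at why recovered structure can speed up search. Leftover free text is simply discarded here; a real system would need to retain it.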

Files

Original bundle

Name: Gibson_Devin_Kenneth_202411_MAS_thesis.pdf
Size: 2.99 MB
Format: Adobe Portable Document Format

License bundle

Name: CC_BY.rdf
Size: 908 B
Format: RDF serialized in XML

Name: TSpace_LAC_SGS_license_MOA2015.txt
Size: 2.45 KB
Format: Plain Text

Name: TSpace_LAC_SGS_license_MOA2015.pdf
Size: 69.65 KB
Format: Adobe Portable Document Format