Just Do More Parsing: Structurizing Semi-Structured Log Data For High Compression and Fast Search
dc.contributor.advisor | Yuan, Ding | |
dc.contributor.author | Gibson, Devin Kenneth | |
dc.contributor.department | Electrical and Computer Engineering | |
dc.date | 2024-11 | |
dc.date.accepted | 2024-11 | |
dc.date.accessioned | 2024-11-13T19:24:53Z | |
dc.date.available | 2024-11-13T19:24:53Z | |
dc.date.convocation | 2024-11 | |
dc.date.issued | 2024-11 | |
dc.description.abstract | Application logs are important for debugging, alerting, and analytics workloads, making it critical to retain them and make them searchable. These logs increasingly appear in semi-structured formats, which allow developers to add arbitrary fields and express queries on them. However, legacy applications still emit unstructured log text, which may contain fields that developers need to search on. From a certain perspective, these logs are also a kind of semi-structured data, just in a non-standard format. Unfortunately, existing systems cannot fully leverage the structure of these legacy application logs for either compression or search; instead, the opportunity is wasted and the logs are mostly treated as unstructured. This thesis significantly improves µSlope, a semi-structured data management system for application logs, by proposing a custom parser interface that can structurize even non-standard semi-structured data and capture these structures in concise schema metadata. The design and implementation choices that enable custom parser support are contrasted against an earlier version of µSlope (µSlopeV0), which could only handle a limited class of semi-structured data. In a one-to-one comparison with µSlopeV0, the ability to structurize previously unstructured data offers up to a 32% improvement in compression ratio and a 4.18x improvement in search speed. | |
dc.description.degree | M.A.S. | |
dc.identifier.uri | http://hdl.handle.net/1807/141302 | |
dc.rights | Attribution 4.0 International | |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | |
dc.subject | Compression | |
dc.subject | Semi-structured data management systems | |
dc.subject.classification | 0464 | |
dc.title | Just Do More Parsing: Structurizing Semi-Structured Log Data For High Compression and Fast Search | |
dc.type | Thesis |
Files
Original bundle
- Name: Gibson_Devin_Kenneth_202411_MAS_thesis.pdf
- Size: 2.99 MB
- Format: Adobe Portable Document Format
License bundle
- Name: TSpace_LAC_SGS_license_MOA2015.txt
- Size: 2.45 KB
- Format: Plain Text
- Name: TSpace_LAC_SGS_license_MOA2015.pdf
- Size: 69.65 KB
- Format: Adobe Portable Document Format