Overview
Components
Corrode consists of 5 components:
- The VariableStack. It manages all variables while reading data, pushing structures and looping over bytes.
- The CorrodeBase. This is the part of corrode that reads all the bytes. It's the low-level part that handles all the small things and prevents you from shooting yourself in the foot. Of course, it also lets you do just that if you explicitly ask for it.
- The Assertions. Just like your assertion library of choice, these ensure that you read correct data. If not, they abort the mission.
- The Mappers. Doing the bulk of the work for you, the mappers ensure your data is in the form you need it to be. Think of these like JavaScript's Array.prototype.map, except for bits and bytes.
- Corrode itself. This is the only thing you, as a developer, are directly in contact with. It connects the CorrodeBase with the Assertions and Mappers to provide a unified interface.
For more info on each of those see the Reference.
For info on ways to configure corrode to your liking, see Configuration.
The Assertions
Assertions help you make sure that the buffer you're parsing is in the correct format. They work like chai, throwing an error when an assertion doesn't hold.
These functions won't modify your variables.
All assertions can be called from a Corrode instance like this:
parser.assert.equal('var', 5);
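As a rough illustration of the idea (not Corrode's actual implementation), an equality assertion can be pictured as a function that inspects a variable and throws when the check fails, without changing anything. The names `assertEqual` and `vars` below are hypothetical:

```javascript
// Hypothetical stand-in for an equality assertion: it only inspects
// the variable and throws if the check fails; it never modifies `vars`.
function assertEqual(vars, name, expected) {
    if (vars[name] !== expected) {
        throw new Error(`Expected ${name} to equal ${expected}, got ${vars[name]}`);
    }
    return vars;
}

const vars = { var: 5 };
assertEqual(vars, 'var', 5); // passes silently

let message = null;
try {
    assertEqual(vars, 'var', 6); // fails and throws
} catch (err) {
    message = err.message;
}
console.log(message);
```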
The Mappers
These functions provide basic mapping-abilities for Corrode's VariableStack.
Imagine them like this:
const parser = new Corrode();
parser.uint8('value').map.double('value');
Of course there's no existing mapping function which doubles a value (yet?). But the concept is that mappers are functions which receive a value, process it and save a new value in the VariableStack in place of the old one.
The imaginary code above would yield { value: 4 } when parsing a buffer like this: [2].
Note that mappers don't check for existence, validity or other assumptions. You have to do that yourself with assertions.
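The concept can be sketched in plain JavaScript, without Corrode. The helper `mapCallback` and the `vars` object are hypothetical stand-ins for a mapper and the current variable layer:

```javascript
// Hypothetical stand-in for a mapper: read a value, transform it and
// store the result under the same name, replacing the old value.
function mapCallback(vars, name, fn) {
    vars[name] = fn(vars[name]);
    return vars;
}

// Pretend the parser has just read a uint8 from the buffer [2].
const buffer = Buffer.from([2]);
const vars = { value: buffer.readUInt8(0) };

mapCallback(vars, 'value', v => v * 2);
console.log(vars); // { value: 4 }
```

Note that the sketch, like the real mappers, does not check whether `value` exists or is a number.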
When will I be able to access the variables? (Extensions, Asserts & Mappers)
Corrode#position / Corrode#skip issues (isSeeking)
Getting started
Introduction
This document will guide you through most of the basic functionality of corrode by example. Our goal is to parse a structured file for storing flat, arbitrary data.
Data Structure
This will be the structure we are going to parse:
- Container - Little endian
    - Header
        - Magic Number [uint32 0xDEADBEEF]
        - File version
            - major [uint32]
            - minor [uint32]
        - File creation date [uint32 timestamp]
        - Data count [uint32]
    - Data Index (array)
        - Name [string<utf8>, terminated by 0x00]
        - Offset [uint32]
        - Length [uint32]
        - Flags [uint8]
            - Compression [0x80]
            - Encryption [0x40]
            - Read only [0x20]
        - Checksum [Buffer<32>]
    - Data (array)
        - Data [buffer<Length>]
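To have something to parse, a small container following this layout can be built with plain Node.js Buffers. This is only a sketch: the function `buildContainer` is hypothetical, offsets are assumed to be absolute byte positions within the file, and the 32-byte checksum is simply left zero-filled.

```javascript
// Builds a little-endian container holding the given entries.
// Assumptions (not stated in the layout above): offsets are absolute
// byte positions in the file, and the checksum is left zero-filled.
function buildContainer(entries, timestamp = 0) {
    const header = Buffer.alloc(20);
    header.writeUInt32LE(0xDEADBEEF, 0);      // magic number
    header.writeUInt32LE(1, 4);               // version.major
    header.writeUInt32LE(0, 8);               // version.minor
    header.writeUInt32LE(timestamp, 12);      // file creation date
    header.writeUInt32LE(entries.length, 16); // data count

    // Each index entry: name + 0x00, offset, length, flags, checksum.
    const indexSize = entries.reduce(
        (sum, e) => sum + Buffer.byteLength(e.name, 'utf8') + 1 + 4 + 4 + 1 + 32,
        0
    );

    let offset = header.length + indexSize;
    const index = [];
    for (const entry of entries) {
        const name = Buffer.from(entry.name + '\0', 'utf8');
        const meta = Buffer.alloc(4 + 4 + 1 + 32); // checksum bytes stay zero
        meta.writeUInt32LE(offset, 0);
        meta.writeUInt32LE(entry.data.length, 4);
        meta.writeUInt8(entry.flags || 0, 8);
        index.push(name, meta);
        offset += entry.data.length;
    }

    return Buffer.concat([header, ...index, ...entries.map(e => e.data)]);
}

const container = buildContainer([
    { name: 'a.txt', data: Buffer.from('hello'), flags: 0x20 }
]);
console.log(container.length);
```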
How we'll do this
To better structure our code, we will split it into three main parsers: Header, Index, Data.
For cases like this, there's Corrode.addExtension, which lets us break down big pieces of code.
Apart from this function and the basic read-structures there are also assertions and mappers.
Assertions allow us to test whether the parsed data matches certain requirements. This helps in making sure that the data we parse aligns with our specification. When an assertion is not met, corrode will throw an Error and halt execution.
Mappers provide us with a way to transform read data into something different, or to map certain values to other ones.
The Header-parser will show you the basics of their usage.
Let's get started
The header parser
/**
 * reads header data from our own specified format
 * @throws Error magic number doesn't match
 * @return {Object} header-data
 * @example
 * {
 *     magic: 0xDEADBEEF,
 *     version: { major: uint32, minor: uint32 },
 *     timestamp: Date,
 *     dataCount: uint32
 * }
 */
Corrode.addExtension('containerHeader', function containerHeaderParser(){
    this
        // read the magic number and compare it against a fixed value
        .uint32('magic')
        .assert.equal('magic', 0xDEADBEEF)
        // tap creates a new named object (version), into which we can write data
        // it also allows us to access the data read up to this point of parsing
        .tap('version', function(){
            this
                .uint32('major')
                .uint32('minor');
        })
        // read the timestamp and convert it into a native Date object via a custom mapper
        .uint32('timestamp')
        .map.callback('timestamp', t => new Date(t))
        .uint32('dataCount');
});
This is all we need for parsing any header of this structure. See how the assertion makes sure we have the correct file? And how we're able to convert a uint32 into a native JS Date object? This is how we roll.
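To make the byte layout concrete, here is roughly the same logic written by hand against a plain Node.js Buffer instead of Corrode. The function `readHeader` is hypothetical and only meant for comparison; it mirrors the extension above, including the `new Date(t)` conversion.

```javascript
// Hand-written equivalent of the header extension, for illustration only.
function readHeader(buffer) {
    const magic = buffer.readUInt32LE(0);
    if (magic !== 0xDEADBEEF) {
        throw new Error('magic number does not match');
    }
    return {
        magic,
        version: {
            major: buffer.readUInt32LE(4),
            minor: buffer.readUInt32LE(8)
        },
        // same conversion as the map.callback in the extension
        timestamp: new Date(buffer.readUInt32LE(12)),
        dataCount: buffer.readUInt32LE(16)
    };
}

// A 20-byte sample header: magic, version 1.2, timestamp 1000, 3 entries.
const sample = Buffer.alloc(20);
sample.writeUInt32LE(0xDEADBEEF, 0);
sample.writeUInt32LE(1, 4);
sample.writeUInt32LE(2, 8);
sample.writeUInt32LE(1000, 12);
sample.writeUInt32LE(3, 16);

const header = readHeader(sample);
console.log(header.version, header.dataCount);
```

Compared to this, the extension never has to track byte positions itself, which is most of what Corrode takes off your hands.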
The index parser
Let's spice things up a bit by creating our index parser. It needs access to the parsed header data, because that's where the data count of the whole file is stored.
Also, instead of just one we will create two extensions for parsing the index:
- One that parses the index itself
- Another one that parses each index-entry
The index entry parser
/**
 * reads the data of an index-entry
 * @return {Object} index-entry
 * @example
 * {
 *     name: string,
 *     offset: uint32,
 *     length: uint32,
 *     flags: { isCompressed: bool, isEncrypted: bool, isReadOnly: bool },
 *     checksum: Buffer<32>
 * }
 */
Corrode.addExtension('containerIndexEntry', function containerIndexEntryParser(){
    this
        // parse a string terminated by 0x00 (default)
        .terminatedString('name')
        .uint32('offset')
        .uint32('length')
        // read the bits for the flags and parse them with bitwise operators
        .uint8('flags')
        .map.callback('flags', bits => ({
            isCompressed: (bits & 0x80) === 0x80,
            isEncrypted: (bits & 0x40) === 0x40,
            isReadOnly: (bits & 0x20) === 0x20
        }))
        // we'll check this later, when parsing the data
        .buffer('checksum', 32);
});
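The two less obvious steps here, reading a 0x00-terminated string and decoding the flag bits, can be sketched with plain Buffers. The helpers `readTerminatedString` and `decodeFlags` are hypothetical and only illustrate what happens to the bytes:

```javascript
// Reads a 0x00-terminated UTF-8 string starting at `start`.
// Returns the string and the position right after the terminator.
function readTerminatedString(buffer, start) {
    const end = buffer.indexOf(0x00, start);
    if (end === -1) {
        throw new Error('string terminator not found');
    }
    return { value: buffer.toString('utf8', start, end), next: end + 1 };
}

// Same bit tests as in the map.callback of the extension.
function decodeFlags(bits) {
    return {
        isCompressed: (bits & 0x80) === 0x80,
        isEncrypted: (bits & 0x40) === 0x40,
        isReadOnly: (bits & 0x20) === 0x20
    };
}

// 'a.txt' + terminator, followed by a flags byte of 0xA0 (0x80 | 0x20).
const raw = Buffer.concat([Buffer.from('a.txt\0', 'utf8'), Buffer.from([0xA0])]);
const name = readTerminatedString(raw, 0);
const flags = decodeFlags(raw.readUInt8(name.next));
console.log(name.value, flags);
```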
The index parser itself
/**
 * reads the data of the container-index
 * @param {Object<containerHeader>} header parsed header-data of the container-file
 * @return {Array<containerIndexEntry>} index
 */
Corrode.addExtension('containerIndex', function containerIndexParser(header){
    this
        .repeat('entries', header.dataCount, function(){
            this
                // call our indexEntry-extension and push the result into the
                // entries-array created by Corrode#repeat
                .ext.containerIndexEntry('entry')
                .map.push('entry');
        })
        // the push function replaces the current value in `this.vars` with the
        // one found under the given name. This way we're able to return an array
        // of entries here, instead of an object like this { entries: [...] }
        .map.push('entries');
});
As you can see, this is a rather simple extension. It executes the containerIndexEntry parser as many times as the header says there are files in the container. The repeat function creates a new array in the current variable layer (see Overview).
By using the map.push function we replace the current variable layer with our own value. So instead of returning an object containing an array of objects containing the data, we just get an array of index entries.
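In terms of plain data, the effect of pushing a named value over the current layer can be pictured like this. The function `push` is a hypothetical stand-in, not Corrode's implementation:

```javascript
// Hypothetical stand-in for map.push: the whole layer is replaced by
// the value stored under `name`.
function push(layer, name) {
    return layer[name];
}

// What the layer looks like after the repeat has finished ...
const layer = { entries: [{ name: 'a.txt' }, { name: 'b.txt' }] };

// ... and what the extension hands back after pushing 'entries'.
const result = push(layer, 'entries');
console.log(Array.isArray(result), result.length);
```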
The data parser
We will also split the data-parser into two. Again there will be a parser for the entries and one for the data itself.
We don't do this because of technical limitations, but because I like to have clean and separate functions for everything. If you want to, you can still squash the entry-parser into the repeat.
The data-entry parser
/**
 * reads the data of a data-entry
 * @throws Error checksum doesn't match
 * @param {Object<containerIndexEntry>} indexEntry
 * @return {Buffer} data
 */
Corrode.addExtension('containerDataEntry', function containerDataEntryParser(indexEntry){
    this
        // jump to the offset of the entry
        .position(indexEntry.offset)
        // get buffer (it's a slice, not a copy)
        .buffer('data', indexEntry.length)
        // check that the checksum of the data matches the one given
        // (someChecksumFunction is a placeholder, assumed to return a Buffer;
        // Buffers have to be compared by content, not with ===)
        .assert.callback('data', data => someChecksumFunction(data).equals(indexEntry.checksum), 'checksum')
        // only return the buffer
        .map.push('data');
});
Note that the position function works in a slightly restricted way: without additional configuration it only allows skipping forward. See Corrode#isSeeking for more info.
Just as there's a map.callback, there's also an assert.callback, which takes a function that checks the incoming value.
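The checksum function itself is left open in this guide. As one possible sketch, assuming the 32-byte checksum is a SHA-256 digest of the entry's data (an assumption, the format does not say so), the check could look like this. The names `sha256` and `checksumMatches` are hypothetical:

```javascript
const crypto = require('crypto');

// Assumption: the 32-byte checksum is a SHA-256 digest of the data.
function sha256(data) {
    return crypto.createHash('sha256').update(data).digest();
}

// Buffers are objects, so `===` would compare identity;
// Buffer#equals compares the contents instead.
function checksumMatches(data, expected) {
    return sha256(data).equals(expected);
}

const data = Buffer.from('hello');
const good = sha256(data);
const bad = Buffer.alloc(32); // all zeros, will not match

console.log(checksumMatches(data, good), checksumMatches(data, bad));
```

A function like `checksumMatches` could then be handed to assert.callback in place of the placeholder.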
The data parser itself
/**
 * read container data
 * @param {Object<containerIndex>} index
 * @return {Array<containerDataEntry>} container-data
 */
Corrode.addExtension('containerData', function containerDataParser(index){
    this
        .repeat('entries', index.length, function(end, discard, i){
            const indexEntry = index[i];
            this
                .ext.containerDataEntry('entryData', indexEntry)
                .tap(function(){
                    const data = this.vars.entryData;
                    this.vars = indexEntry;
                    // @TODO unfinished flag check: if(this.vars.flags.)
                    this.vars.data = data;
                });
        })
        .map.push('entries');
});
Wiring everything together
Corrode.addExtension('container', function containerParser(){
    this
        .ext.containerHeader('header')
        .tap(function(){
            this
                .ext.containerIndex('index', this.vars.header)
                .tap(function(){
                    this.ext.containerData('data', this.vars.index);
                });
        })
        .map.push('data');
});
Parsing a file
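const fs = require('fs');
// assuming corrode is installed from npm under this name
const Corrode = require('corrode');

const parser = new Corrode();
parser
    .ext.container('container')
    .map.push('container');

const fstream = fs.createReadStream('./container');
fstream.pipe(parser);

parser.on('finish', () => {
    console.log(parser.vars);
});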
Escaping callback-hell
Corrode.addExtension('containerIndex', function containerIndexParser(header){
    if(typeof header === 'string'){
        header = this.vars[header];
    }
    ...
});
Corrode.addExtension('container', function containerParser(){
    this
        .ext.containerHeader('header')
        .ext.containerIndex('index', 'header')
        .ext.containerData('data', 'index')
        .map.push('data');
});
Further reading
- Configuration
- Examples
Configuration
Corrode provides some configuration-options. These are either defaults, safeguards or advanced-user-stuff.
Corrode accepts an object as its first parameter, containing options.
These options also get passed on to the TransformStream-constructor, so those options are also valid.
endianness
default: 'LE'
accepts: 'LE', 'BE' (for little endian & big endian)
If you use methods like .uint16() or .int32(), this option determines what endianness corrode uses when reading your bytes.
Of course using .uint16be() or .doublele() will override this default, as you'd expect.
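What the two settings mean for the resulting value can be seen with plain Node.js Buffer reads (this uses Node's Buffer API directly, not Corrode):

```javascript
// The same two bytes, interpreted with both byte orders.
const bytes = Buffer.from([0x01, 0x02]);

const le = bytes.readUInt16LE(0); // least significant byte first: 0x0201
const be = bytes.readUInt16BE(0); // most significant byte first: 0x0102

console.log(le, be); // 513 258
```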
loopIdentifier
default: Symbol(loop-variable)
accepts: Anything which can be used as an identifier in an object.
Determines the identifier of the temporary variable which gets created when using .loop().
encoding
default: 'utf8'
accepts: Any encoding Buffer.prototype.toString accepts. See the Node.js Buffer documentation for the full list.
Determines which encoding to use for string-functions like .string() or .terminatedString(). Can be overridden on a per-use basis by the functions themselves.
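The effect of the encoding is easiest to see on bytes that are not plain ASCII. The following uses Node's Buffer API directly, not Corrode:

```javascript
// The two bytes 0xC3 0xA9 are the UTF-8 encoding of "é".
const bytes = Buffer.from([0xC3, 0xA9]);

const asUtf8 = bytes.toString('utf8');     // one character: 'é'
const asLatin1 = bytes.toString('latin1'); // two characters, one per byte
const asHex = bytes.toString('hex');       // 'c3a9'

console.log(asUtf8, asLatin1, asHex);
```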
finishJobsOnEOF
default: true
accepts: true, false
Determines whether or not to finish everything up once corrode encounters the end of the stream being piped into it, or the end of the buffer.
Set to false if you want to parse a file which was split into two or more parts, or something along those lines.
In detail, once EOF is reached this flag cleans up all remaining read-related jobs from the job list. Then the job list is worked through until there are no more jobs, meaning all VariableStack layers have been popped, giving you access to all that sweet data of yours.
anonymousLoopDiscardDeep
default: false
accepts: true, false
Corrode provides the discard() function inside .loop() and .repeat() callbacks. If you use those anonymously (meaning you don't push an array onto the variable stack by giving a string as the first parameter) and then call the discard() function, corrode will discard whatever data you read and restore what was there before.
By default this is done in a shallow way: corrode shallowly clones the user's data with Object.assign() before the callback is called, and that shallow copy gets assigned again when calling discard().
This may lead to problems when modifying objects within the current VariableStack layer. Those won't get replaced with their original version.
To circumvent this problem you can set this option to true. This is not the default, because probably no one needs it, and cloning an entire object every time the loop callback is called may be a huge performance hit.
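The difference between the two modes comes down to shallow versus deep copies. A minimal sketch in plain JavaScript follows; the JSON round trip used for the deep copy is only one simple way to clone and is an assumption here, not necessarily what corrode does internally:

```javascript
const vars = { count: 1, nested: { value: 1 } };

// Shallow snapshot, taken with Object.assign() as described above.
const shallow = Object.assign({}, vars);
// Deep snapshot (assumption: a JSON round trip, chosen for brevity).
const deep = JSON.parse(JSON.stringify(vars));

// The loop callback modifies the current layer ...
vars.count = 2;
vars.nested.value = 2;

// ... and only the deep snapshot still holds the original nested object.
console.log(shallow.count, shallow.nested.value); // 1 2
console.log(deep.count, deep.nested.value);       // 1 1
```

Restoring from the shallow snapshot brings back `count`, but `nested.value` keeps its modified value, which is exactly the problem described above.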
strictObjectMode
default: true
accepts: true, false
When this option is set to true, corrode will prevent you from pushing into anything that's not an object. Meaning moves like this:
parser.uint8('val').tap('val', function(){});
will throw an error. This way corrode provides a naive form of type safety.