Builtin saving scripts

Ayakashi comes bundled with a set of scripts to help persisting extracted data without writing any extra code.

Table of contents

Limitations

The builtin saving scripts are designed to work out of the box without the need of any extra code.
In order for this to work they need to be wrapped in an object with a key of your choice so they can be mapped into proper sql rows, csv lines and json objects.

Returning a single prop result can be done like this:

return {
    someKey: await ayakashi.extractFirst("myProp", "text")
};

And for multiple prop results:

const data1 = await ayakashi.extractFirst("myProp1", "text");
const data2 = await ayakashi.extractFirst("myProp2", "text");
const data3 = await ayakashi.extractFirst("myProp3", "text");

return {
    key1: data1,
    key2: data2,
    key3: data3
};

If you need to return groups of multiple data sets please take a look in the Grouping extracted data section.

printToConsole

Not exactly a saving script, the printToConsole script will just print the data passed to it in a table format.
Can really help while developing.
Use it like any other script by including it in your pipeline after the step that outputs the data you want to print:

{
    waterfall: [{
        type: "scraper",
        module: "myScraper"
    }, {
        type: "script",
        module: "printToConsole"
    }]
}

saveToJSON

The saveToJSON will persist the data in a json file in a performant way

{
    waterfall: [{
        type: "scraper",
        module: "myScraper"
    }, {
        type: "script",
        module: "saveToJSON",
        params: {
            file: "myData.json"
        }
    }]
}

You may configure the filename of the generated json file by passing a file param.
By default it will output a data.json file.

saveToCSV

The saveToCSV will persist the data in a csv file in a performant way

{
    waterfall: [{
        type: "scraper",
        module: "myScraper"
    }, {
        type: "script",
        module: "saveToCSV",
        params: {
            file: "myData.csv"
        }
    }]
}

You may configure the filename of the generated csv file by passing a file param.
By default it will output a data.csv file.

saveToSQL

The saveToSQL script allows saving data to all of the major SQL engines in a table format

{
    waterfall: [{
        type: "scraper",
        module: "myScraper"
    }, {
        type: "script",
        module: "saveToSQL",
        params: {}
    }]
}

There are a lot of parameters you can configure:

params: {
    dialect?: "mysql" | "mariadb" | "postgres" | "mssql" | "sqlite",
    host?: string,
    port?: number,
    database?: string,
    username?: string,
    password?: string,
    connectionURI?: string,
    table?: string
}

Dialect

You may use any of the following dialects:

If no dialect is defined, sqlite is used by default.

Connection options

In all cases except of sqlite you probably need to supply some kind of connection options.
You may use the host, port, database, username and password options or construct a connectionURI and use that instead.
If using sqlite only the database option has effect and it will change the database filename. By default a database.sqlite filename is used.

Table name

You can change the table name in any dialect by specifying the table option.
If no table option is provided an internal uuid will be used.

dataTypes

Supported dataTypes are the following:

  • booleans
  • integers
  • floats
  • objects
  • strings

Any other dataType will be saved as a string.

Objects will be saved in a JSON column if the dialect is set to postgres or sqlite.
For all the others a TEXT field will be used.