Data Loaders
Data loaders provide a convenient way to process data from various file formats in your streams. Each record can be accessed like an object whose attributes are named after the headers.
To use the loaders, you can import them with this line:
from pystreamapi.loaders import csv, json, xml, yaml
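To illustrate what "header names as attributes" means, here is a minimal sketch that uses only the Python standard library. It is not how pystreamapi implements its loaders; the name Row is made up for this illustration, and csv here refers to the standard library module, not the pystreamapi loader of the same name.

```python
import csv  # standard library module, not the pystreamapi loader
import io
from collections import namedtuple

# Sample data matching the CSV file used later on this page.
raw = "name;age\nJoe;20\nJane;30\nJohn;78\n"

reader = csv.reader(io.StringIO(raw), delimiter=";")
header = next(reader)              # first row holds the header names
Row = namedtuple("Row", header)    # hypothetical record type, one attribute per header
rows = [Row(*values) for values in reader]

print(rows[0].name)  # Joe
print(rows[2].age)   # 78 (still a string here, since nothing is cast)
```

The loaders described below give you this kind of attribute access directly, so you do not need to build the record type yourself.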
CSV loader
To load data from a CSV file, use the csv loader.
You just need the file's path, and you can optionally specify the delimiter and the encoding. By default, the encoding is set to UTF-8.
By default, all values are converted to int, float, bool, or otherwise str. Type casting can be disabled to speed up reading by setting the cast_types parameter to False.
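The casting order described above can be pictured with a small sketch. This is only an approximation for illustration, not the library's actual implementation: cast_value is a hypothetical helper, and treating only "true" and "false" as booleans is an assumption.

```python
def cast_value(text: str):
    """Hypothetical helper: try int, then float, then bool, else keep the str."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    lowered = text.strip().lower()
    if lowered in ("true", "false"):  # assumption: only these spellings count as bool
        return lowered == "true"
    return text


print([cast_value(v) for v in ["20", "3.5", "true", "Joe"]])
# [20, 3.5, True, 'Joe']
```

Every value goes through several conversion attempts, which is why skipping the casting step can save time on large files.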
The examples below use this CSV file:
name;age
Joe;20
Jane;30
John;78
from pystreamapi import Stream
from pystreamapi.loaders import csv
Stream.of(csv("path/to/data.csv", delimiter=";", encoding="us-ascii")) \
.map(lambda x: x.name) \
.for_each(print) # "Joe", "Jane", "John"
If you want to disable type conversion, you can use the loader like this:
from pystreamapi import Stream
from pystreamapi.loaders import csv
Stream.of(csv("path/to/data.csv", cast_types=False, delimiter=";")) \
.map(lambda x: x.age) \
.for_each(print) # "20", "30", "78"
JSON loader
To load data from a JSON file, use the json loader.
You can read data either from a JSON file or from a string containing JSON. If you read from a string, you have to set the read_from_src parameter to True.
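The difference between the two modes can be sketched with the standard library alone. The load_json function below is a hypothetical helper that only mirrors the idea of the read_from_src flag; it is not the pystreamapi loader, and the standard json module is imported as std_json to avoid confusion with it.

```python
import json as std_json  # standard library module, not the pystreamapi loader
import os
import tempfile


def load_json(src: str, read_from_src: bool = False):
    """Hypothetical helper mirroring the idea of the read_from_src flag."""
    if read_from_src:
        return std_json.loads(src)          # src is the JSON text itself
    with open(src, encoding="utf-8") as f:  # src is a file path
        return std_json.load(f)


text = '[{"name": "Joe", "age": 20}, {"name": "Jane", "age": 30}]'

# Write the same text to a temporary file to compare both modes.
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(text)

from_string = load_json(text, read_from_src=True)
from_file = load_json(tmp.name)
os.remove(tmp.name)

print(from_string == from_file)  # True
```

In both cases the first argument is a plain string; only the flag decides whether it is treated as a path or as content.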
By default, all values are converted to int, float, bool, or otherwise str.
The example below uses this JSON file:
[
{
"name": "Joe",
"age": 20
},
{
"name": "Jane",
"age": 30
},
{
"name": "John",
"age": 78
}
]
from pystreamapi import Stream
from pystreamapi.loaders import json
Stream.of(json("path/to/data.json")) \
.map(lambda x: x.name) \
.for_each(print) # "Joe", "Jane", "John"
If you want to pass the JSON directly as a string, you can do it like this:
from pystreamapi import Stream
from pystreamapi.loaders import json
Stream.of(json("[{\"name\":\"Joe\",\"age\":20},{\"name\":\"Jane\",\"age\":30}]",
read_from_src=True)) \
.map(lambda x: x.age) \
.for_each(print) # 20, 30
XML loader
To load data from an XML file, use the xml loader.
def xml(src: str, read_from_src=False, retrieve_children=True, cast_types=True,
encoding="utf-8")
The loader isn't included in the core version of pystreamapi. You can install it using the following command:
pip install 'streams.py[xml_loader]'
🎉 Now you can use the loader as described below!
You just need the file's path, and you can optionally specify the encoding. By default, the encoding is set to UTF-8.
You can read data either from an XML file or from a string containing XML. If you read from a string, you have to set the read_from_src parameter to True.
By default, all values are converted to int, float, bool, or otherwise str. Type casting can be disabled to speed up reading by setting the cast_types parameter to False.
By default, the XML loader retrieves the child nodes of the XML root directly, so each child becomes one element of the stream. Setting the retrieve_children parameter to False disables this, and your stream will consist of a single object containing the whole XML tree.
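As a rough analogy using the standard library's xml.etree.ElementTree (not the pystreamapi loader, whose internals are not shown here), the two settings correspond to iterating over the root's children versus handling the root as one item:

```python
import xml.etree.ElementTree as ET

doc = """
<employees>
  <employee><name>John Doe</name></employee>
  <employee><name>Alice Smith</name></employee>
  <founder><name>Martini Boss</name></founder>
</employees>
""".strip()

root = ET.fromstring(doc)

# Comparable to retrieve_children=True: one item per child of the root.
children = list(root)
print([child.tag for child in children])  # ['employee', 'employee', 'founder']

# Comparable to retrieve_children=False: a single item, the root itself.
whole_tree = [root]
print([node.tag for node in whole_tree])  # ['employees']
```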
The examples below use this XML file:
<employees>
<employee>
<name>John Doe</name>
<cars>
<car>Audi</car>
</cars>
</employee>
<employee>
<name>Alice Smith</name>
<cars>
<car>Volvo</car>
<car>Volkswagen</car>
</cars>
</employee>
<founder>
<name>Martini Boss</name>
<cars>
<car>Bugatti</car>
<car>Mercedes</car>
</cars>
</founder>
</employees>
Here you can see a few examples illustrating how to access different nodes.
from pystreamapi import Stream
from pystreamapi.loaders import xml
Stream.of(xml("path/to/data.xml")) \
.map(lambda x: x.name) \
.for_each(print) # John Doe, Alice Smith, Martini Boss
Stream.of(xml("path/to/data.xml")) \
.map(lambda x: x.cars.car) \
.for_each(print) # 'Audi', ['Volvo', 'Volkswagen'], ['Bugatti', 'Mercedes']
Stream.of(xml("path/to/data.xml")) \
.map(lambda x: type(x).__name__) \
.for_each(print) # employee, employee, founder
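Note that, according to the output of the second example, x.cars.car is a single value when there is one car node and a list when there are several. If you want to treat both cases uniformly, a small helper can wrap single values in a list. The sketch below is plain Python; as_list is a hypothetical helper, not part of pystreamapi, and it assumes that repeated nodes arrive as a regular Python list.

```python
def as_list(value):
    """Hypothetical helper: wrap a single value in a list, leave lists alone."""
    return value if isinstance(value, list) else [value]


# Values shaped like the output of x.cars.car in the example above.
cars_per_person = ["Audi", ["Volvo", "Volkswagen"], ["Bugatti", "Mercedes"]]

all_cars = [car for cars in cars_per_person for car in as_list(cars)]
print(all_cars)  # ['Audi', 'Volvo', 'Volkswagen', 'Bugatti', 'Mercedes']
```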
If you disable child retrieval, you have to map the object's children manually. The example below uses this XML file:
<employees>
<employee>
<name>John Doe</name>
</employee>
<employee>
<name>Alice Smith</name>
</employee>
</employees>
from pystreamapi import Stream
from pystreamapi.loaders import xml
Stream.of(xml("data.xml", retrieve_children=False)) \
.map(lambda x: x.employee) \
.flat_map(lambda x: Stream.of(x)) \
.map(lambda x: x.name) \
.for_each(print) # John Doe, Alice Smith
YAML loader
To load data from a YAML file, use the yaml loader.
You can read data either from a YAML file or from a string containing YAML. If you read from a string, you have to set the read_from_src parameter to True.
By default, all values are converted to int, float, bool, or otherwise str.
The example below uses this YAML file:
- name: Joe
  age: 20
- name: Jane
  age: 30
- name: John
  age: 78
from pystreamapi import Stream
from pystreamapi.loaders import yaml
Stream.of(yaml("path/to/data.yaml")) \
.map(lambda x: x.name) \
.for_each(print) # "Joe", "Jane", "John"
If you want to pass the YAML directly as a string, you can do it like this:
from pystreamapi import Stream
from pystreamapi.loaders import yaml
Stream.of(yaml("- name: Joe\n  age: 20\n- name: Jane\n  age: 30",
read_from_src=True)) \
.map(lambda x: x.age) \
.for_each(print) # 20, 30