Customize Dataset¶
If a new model needs data in a form that no existing dataset class provides, we need to design a new dataset. This section shows how to develop a new dataset and use it in LibCity.
Create a New Dataset Class¶
To begin with, create a new dataset class that inherits from AbstractDataset or from one of its subclasses.
For example, suppose we want to develop a dataset named NewDataset for the traffic state prediction task and put the code in newdataset.py in the directory libcity/data/dataset/.
Here we inherit from TrafficStatePointDataset, a subclass of AbstractDataset.
from libcity.data.dataset import TrafficStatePointDataset

class NewDataset(TrafficStatePointDataset):
    def __init__(self, config):
        super().__init__(config)
Alternatively, you can inherit from AbstractDataset directly.
from libcity.data.dataset import AbstractDataset

class NewDataset(AbstractDataset):
    def __init__(self, config):
        pass
Rewrite Corresponding Methods¶
The function get_data() in AbstractDataset returns the split train_dataloader, eval_dataloader and test_dataloader. To build them, call the function libcity.data.utils.generate_dataloader on the lists of input data; the generated data loaders contain Batch objects.
The function get_data_feature() in AbstractDataset returns features of the dataset for use by the model and the executor.
Other interfaces defined in subclasses of AbstractDataset are not described here.
If no existing dataset class is suitable, you can override the interfaces mentioned above.
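The contract of these two methods can be illustrated without LibCity installed. The sketch below is not LibCity's code: AbstractDatasetSketch and ToyDataset are hypothetical stand-ins, and plain Python lists take the place of real data loaders. It only shows the shape of the interface, namely that get_data() returns three splits and get_data_feature() returns a dictionary.

```python
from abc import ABC, abstractmethod


class AbstractDatasetSketch(ABC):
    """Hypothetical stand-in for the interface described above (not LibCity's class)."""

    def __init__(self, config):
        self.config = config

    @abstractmethod
    def get_data(self):
        """Return (train_dataloader, eval_dataloader, test_dataloader)."""

    @abstractmethod
    def get_data_feature(self):
        """Return a dict of dataset features for the model and executor."""


class ToyDataset(AbstractDatasetSketch):
    def __init__(self, config):
        super().__init__(config)
        self.samples = list(range(10))  # toy data, purely illustrative

    def get_data(self):
        # Split 6/2/2; plain lists stand in for real data loaders here.
        return self.samples[:6], self.samples[6:8], self.samples[8:]

    def get_data_feature(self):
        return {"num_samples": len(self.samples)}


dataset = ToyDataset(config={})
train, val, test = dataset.get_data()
features = dataset.get_data_feature()
```

In a real dataset class, the three return values would come from generate_dataloader, and the feature dictionary would hold whatever the model and executor read from it.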
Example 1¶
Here we show how to inherit from AbstractDataset directly and override the function get_data_feature() to return the values we want.
from libcity.data.dataset import AbstractDataset

class NewDataset(AbstractDataset):
    def __init__(self, config):
        # The attributes returned by get_data_feature() must be set before it is called.
        pass

    def get_data_feature(self):
        return {"scaler": self.scaler, "adj_mx": self.adj_mx,
                "num_nodes": self.num_nodes, "feature_dim": self.feature_dim,
                "output_dim": self.output_dim}
Example 2¶
Here we show how to inherit from a subclass of AbstractDataset and override one of its methods.
import numpy as np
import pandas as pd

from libcity.data.dataset import TrafficStatePointDataset

class NewDataset(TrafficStatePointDataset):
    def __init__(self, config):
        super().__init__(config)

    # Override this method, which computes `self.adj_mx` from the atomic file `rel_file.rel`.
    def _load_rel(self):
        relfile = pd.read_csv(self.data_path + self.rel_file + '.rel')
        self.adj_mx = np.zeros((len(self.geo_ids), len(self.geo_ids)))
        self.adj_mx[:] = 1  # set every entry to one
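The override above ignores the contents of the relation file and fills the matrix with ones. As a contrasting sketch, the snippet below builds a 0/1 adjacency matrix from relation rows using only the standard library. It rests on assumptions: the column names origin_id and destination_id, the sample rows, and the geo_ids list are illustrative and should be checked against your own .rel file.

```python
import csv
import io

# Toy stand-in for the contents of a .rel file; the column names are assumed.
REL_TEXT = """rel_id,type,origin_id,destination_id
0,geo,10,11
1,geo,11,12
"""

geo_ids = [10, 11, 12]  # hypothetical node ids
index_of = {geo_id: i for i, geo_id in enumerate(geo_ids)}

# Start from an all-zero N x N matrix and mark each listed relation with 1.
adj_mx = [[0] * len(geo_ids) for _ in geo_ids]
for row in csv.DictReader(io.StringIO(REL_TEXT)):
    i = index_of[int(row["origin_id"])]
    j = index_of[int(row["destination_id"])]
    adj_mx[i][j] = 1
```

The matrix here is directed: an entry is set only for the (origin, destination) pair listed in each row. Inside a real _load_rel() override, the same loop could fill a NumPy array assigned to self.adj_mx.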
Example 3¶
Here we show how to inherit from a subclass of AbstractDataset and return batch keys that differ from the original ones. Specifically, we return three keys: X, Y and Z. This is just an example; for more details, refer to TrafficStateCPTDataset, which has four keys in a batch.
from libcity.data.dataset import TrafficStateDataset
from libcity.data.utils import generate_dataloader

class NewDataset(TrafficStateDataset):
    def __init__(self, config):
        super().__init__(config)
        # The original code:
        # self.feature_name = {'X': 'float', 'y': 'float'}
        # The modified code:
        self.feature_name = {'X': 'float', 'Y': 'float', 'Z': 'int'}

    def get_data(self):
        # Load the dataset for the keys x, y, z and generate [x|y|z]_[train|val|test].
        # ... (implement it yourself)
        # Normalize the data using self.scaler.
        # ... (implement it yourself)
        # Aggregate X, Y, Z into a list.
        # The i-th element of train_data (a list) is a tuple of x_train[i], y_train[i] and z_train[i].
        train_data = list(zip(x_train, y_train, z_train))
        eval_data = list(zip(x_val, y_val, z_val))
        test_data = list(zip(x_test, y_test, z_test))
        # Build the data loaders with libcity.data.utils.generate_dataloader.
        self.train_dataloader, self.eval_dataloader, self.test_dataloader = \
            generate_dataloader(train_data, eval_data, test_data, self.feature_name,
                                self.batch_size, self.num_workers)
        # Return the data loaders.
        return self.train_dataloader, self.eval_dataloader, self.test_dataloader
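The aggregation step in get_data() can be tried on its own. The sketch below uses small made-up lists in place of the real x_train, y_train and z_train (the values are arbitrary) and shows that zip produces one (x, y, z) tuple per sample.

```python
# Toy stand-ins for the arrays produced by the loading step (values are arbitrary).
x_train = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
y_train = [[0.3], [0.5], [0.7]]
z_train = [0, 1, 0]

# Each element of train_data is a tuple (x_train[i], y_train[i], z_train[i]).
train_data = list(zip(x_train, y_train, z_train))
first_x, first_y, first_z = train_data[0]
```

Note that zip stops at the shortest input, so the three sequences should have the same length; otherwise trailing samples are silently dropped.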