Dataset For Task¶
In this section, we introduce to you the correspondence between our provided standard datasets tasks and models. It is worth noting that some datasets can only support some models in a certain task.
If you think this table does not look convenient enough, please click here to view this table on Github
Task | Dataset | Supported Model | Remark |
---|---|---|---|
Traffic Flow Prediction | TAXIBJ, PORTO, NYCTAXI_GRID, NYCBIKE, AUSTINRIDE, BIKEDC, BIKECHI, NYCBike20140409, NYCBike20160708, NYCBike20160809, NYCTaxi20140112, NYCTaxi20150103, NYCTaxi20160102, T_DRIVE20150206, T_DRIVE_SMALL | ACFM, STResNet, DSAN, ACFMCommon, STResNetCommon | Grid based dataset |
PEMSD3, PeMSD4, PeMSD7, PeMSD8, BEIJING_SUBWAY, M_DENSE, SHMETRO, HZMETRO, NYCTAXI_DYNA | RNN, Seq2Seq, FNN, AutoEncoder, AGCRN, ASTGCNCommon, MSTGCNCommon, STSGCN, CONVGCNCommon, ToGCN, MultiSTGCnetCommon, STNN, ASTGCN, MSTGCN, CONVGCN, DGCN, ResLSTM, MultiSTGCnet | Point based dataset | |
TAXIBJ, PORTO, NYCTAXI_GRID, NYCBIKE, AUSTINRIDE, BIKEDC, BIKECHI, NYCBike20140409, NYCBike20160708, NYCBike20160809, NYCTaxi20140112, NYCTaxi20150103, NYCTaxi20160102, T_DRIVE20150206, T_DRIVE_SMALL | RNN, Seq2Seq, FNN, AutoEncoder, AGCRN, ASTGCNCommon, MSTGCNCommon, STSGCN, CONVGCNCommon, ToGCN, MultiSTGCnetCommon, STNN, ASTGCN, MSTGCN, CONVGCN, DGCN, ResLSTM, MultiSTGCnet | Need simple modification for grid based dataset. See Note 3. | |
M_DENSE | CRANN | Need .ext file |
|
NYCBike20160708, NYCTaxi20150103 | STDN | Need .gridod file |
|
Traffic Speed Prediction | METR_LA, LOS_LOOP, PeMSD4, PeMSD8, PEMSD7(M), PEMS_BAY, LOS_LOOP_SMALL, SZ_TAXI, LOOP_SEATTLE, Q_TRAFFIC, ROTTERDAM | RNN, Seq2Seq, FNN, AutoEncoder, DCRNN, STGCN, GWNET, MTGNN, STMGAT, TGCN, ATDM, HGCN, DKFN, STTN, GTS, GMAN, STAGGCN, TGCLSTM | Point based dataset. |
LOOP_SEATTLE | TGCLSTM | See Note 2. | |
On-Demand Service Prediction | TAXIBJ, PORTO, NYCTAXI_GRID, NYCBIKE, AUSTINRIDE, BIKEDC, BIKECHI, NYCBike20140409, NYCBike20160708, NYCBike20160809, NYCTaxi20140112, NYCTaxi20150103, NYCTaxi20160102, T_DRIVE20150206, T_DRIVE_SMALL | DMVSTNet | Grid based dataset |
PEMSD3, PeMSD4, PeMSD7, PeMSD8, BEIJING_SUBWAY, M_DENSE, SHMETRO, HZMETRO, NYCTAXI_DYNA | CCRNN, STG2Seq | Point based dataset | |
TAXIBJ, PORTO, NYCTAXI_GRID, NYCBIKE, AUSTINRIDE, BIKEDC, BIKECHI, NYCBike20140409, NYCBike20160708, NYCBike20160809, NYCTaxi20140112, NYCTaxi20150103, NYCTaxi20160102, T_DRIVE20150206, T_DRIVE_SMALL | RNN, Seq2Seq, FNN, AutoEncoder, CCRNN, STG2Seq | Need simple modification for grid based dataset. See Note 3. | |
Od Matrix Prediction | NYCTAXI_OD | GEML | Point-OD based dataset |
NYC_TOD | GEML | Need simple modification for grid-od based dataset. See Note 4. | |
NYC_TOD | CSTN | Grid-OD based dataset and .ext file |
|
Traffic Accidents Prediction | NYC_RISK, CHICAGO_RISK | GSNet | Grid based traffic accidents dataset |
Trajectory Next-Location Prediction | Gowalla, BrightKite | FPMC, RNN, ST-RNN, ATST-LSTM, DeepMove, HST-LSTM, LSTPM, STAN | Trajectory based dataset |
Fousquare, Instagram | FPMC, RNN, ST-RNN, ATST-LSTM, DeepMove, HST-LSTM, LSTPM, GeoSAN, STAN, SERM, CARA | Trajectory based dataset | |
Estimated Time of Arrival | Chengdu_Taxi_Sample1 | DeepTTE | Trajectory based dataset |
Beijing_Taxi_Sample | TTPNet, DeepTTE | Trajectory based dataset | |
Map Matching | Seattle, global | STMatching, IVMM, HMMM | Trajectory based dataset |
Road Network Representation Learning | bj_roadmap_edge | ChebConv, LINE | Road network dataset |
Note 1
The bolded dataset is the one we recommend.
Note 2
For TGCLSTM
, need to set dataset_class
to TrafficStatePointDataset
. Otherwise, the default dataset_class=TGCLSTMDataset
is only suitable for dataset LOOP_SEATTLE
.
Note 3
Here is how to generalize models used for point-based data for grid-based data.
(1) If the dataset class used by the model is TrafficStatePointDataset
, such as AGCRN
, ASTGCNCommon
, CCRNN
, etc., you can directly set dataset_class
to TrafficStateGridDataset
in task_file.json
or through a custom configuration file(--config_file
). Then set the parameter use_row_column
of TrafficStateGridDataset
to False
.
(2) If the dataset class used by the model is the subclass of TrafficStatePointDataset
, such as ASTGCNDataset
, CONVGCNDataset
, STG2SeqDataset
, etc., you can modify the file of the dataset class to make it inherit TrafficStateGridDataset
instead of the current TrafficStatePointDataset
. Then set the parameter use_row_column
in the function __init__()
to False
.
Example (1):
Before modification:
# task_config.json
"RNN": {
"dataset_class": "TrafficStatePointDataset",
},
# TrafficStateGridDataset.json
{
"use_row_column": true
}
After modification:
# task_config.json
"RNN": {
"dataset_class": "TrafficStateGridDataset",
},
# TrafficStateGridDataset.json
{
"use_row_column": false
}
Example (2)::
Before modification:
# task_config.json
"STG2Seq": {
"dataset_class": "STG2SeqDataset",
},
# STG2SeqDataset.json
{
"use_row_column": false
}
# stg2seq_dataset.py
from libcity.data.dataset import TrafficStatePointDataset
class STG2SeqDataset(TrafficStatePointDataset):
def __init__(self, config):
super().__init__(config)
pass
After modification:
# task_config.json
"STG2Seq": {
"dataset_class": "STG2SeqDataset",
},
# STG2SeqDataset.json
{
"use_row_column": false
}
# stg2seq_dataset.py
from libcity.data.dataset import TrafficStateGridDataset
class STG2SeqDataset(TrafficStateGridDataset):
def __init__(self, config):
super().__init__(config)
self.use_row_column = False
pass
Note 4
Here is how to generalize models used for point-based data for grid-based data. (Similar to Note 3)
(1) If the dataset class used by the model is TrafficStateOdDataset
, such as GEML
etc., you can directly set dataset_class
to TrafficStateGridOdDataset
in task_file.json
or through a custom configuration file(--config_file
). Then set the parameter use_row_column
of TrafficStateGridOdDataset
to False
.
(2) If the dataset class used by the model is the subclass of TrafficStateOdDataset
, you can modify the file of the dataset class to make it inherit TrafficStateGridOdDataset
instead of the current TrafficStateOdDataset
. Then set the parameter use_row_column
in the function __init__()
to False
.