# Raw Data

### METR_LA

**Place:** Los Angeles County, USA

**Duration:** Mar. 1, 2012 ~ Jun. 27, 2012

**Link:** https://github.com/liyaguang/DCRNN

**Description:** The METR-LA dataset, collected by loop detectors on highways, contains traffic speed data from 207 sensors.

### LOS_LOOP

**Place:** Los Angeles County, USA

**Duration:** Mar. 1, 2012 ~ Jun. 27, 2012

**Link:** https://github.com/lehaifeng/T-GCN/tree/master/data

**Description:** This dataset differs slightly from METR_LA: its missing values are filled in by linear interpolation.

### SZ_TAXI

**Place:** Shenzhen, China

**Duration:** Jan. 1, 2015 ~ Jan. 31, 2015

**Link:** https://github.com/lehaifeng/T-GCN/tree/master/data

**Description:** The SZ-Taxi dataset contains taxi trajectories in Shenzhen, including a road adjacency matrix and road traffic speed information.

### LOOP_SEATTLE

**Place:** Greater Seattle Area, USA

**Duration:** the entirety of 2015

**Link:** https://github.com/zhiyongc/Seattle-Loop-Data

**Description:** The Loop Seattle dataset is collected by inductive loop detectors deployed on freeways (I-5, I-405, I-90, and SR-520) in the Seattle area and contains traffic state data from 323 sensor stations.

### Q_TRAFFIC

**Place:** Beijing, China

**Duration:** Apr. 1, 2017 ~ May 31, 2017

**Link:** https://github.com/JingqingZ/BaiduTraffic

**Description:** The Q-Traffic dataset contains three sub-datasets: a query sub-dataset, a traffic speed sub-dataset, and a road network sub-dataset.

### PEMS

**Place:** California, USA

**Duration:** 2001 ~ present

**Link:** http://pems.dot.ca.gov

**Description:** PEMS records California highway speed data, including time_hour, average_time, lane_points.

### PEMSD3

**Place:** District 3 of California, USA

**Duration:** Sept. 1, 2018 ~ Nov. 30, 2018

**Link:** https://github.com/Davidham3/STSGCN

**Description:** The PEMSD3 dataset includes flow information from 358 sensors.

### PEMSD4

**Place:** San Francisco Bay Area, USA

**Duration:** Jan. 1, 2018 ~ Feb. 28, 2018

**Link:** https://github.com/Davidham3/ASTGCN/tree/master/data/PEMS04

**Description:** The PEMSD4 dataset describes the speed, flow, and occupancy of California freeways and contains 3848 sensors on 29 roads.

### PEMSD7

**Place:** District 7 of California, USA

**Duration:** Jul. 1, 2016 ~ Aug. 31, 2016

**Link:** https://github.com/Davidham3/STSGCN

**Description:** The PEMSD7 dataset contains traffic flow information from 883 sensor stations.

### PEMSD8

**Place:** San Bernardino Area, USA

**Duration:** Jul. 1, 2016 ~ Aug. 31, 2016

**Link:** https://github.com/Davidham3/ASTGCN/tree/master/data/PEMS08

**Description:** The PEMSD8 dataset describes the speed and occupancy of California freeways with data from 1979 sensors on 8 roads.

### PEMSD7(M)

**Place:** District 7 of California, USA

**Duration:** the weekdays of May and June of 2012

**Link:** https://github.com/Davidham3/STGCN/tree/master/datasets

**Description:** The PEMSD7(M) dataset describes highway speed information at 228 stations in the 7th District of California.

### PEMS_BAY

**Place:** San Francisco Bay Area, USA

**Duration:** Jan. 1, 2017 ~ Jun. 30, 2017

**Link:** https://github.com/liyaguang/DCRNN

**Description:** The PEMS-BAY dataset contains 6 months of traffic speed statistics from 325 sensors.

### BEIJING_SUBWAY

**Place:** Beijing, China

**Duration:** Feb. 29, 2016 - Apr. 3, 2016

**Link:** https://github.com/JinleiZhangBJTU/ResNet-LSTM-GCN

**Description:** This dataset is collected from the Beijing subway between 05:00 and 23:00 for five consecutive weeks from February 29 to April 3, 2016. There were 17 lines and 276 subway stations (excluding the airport express line and the stations on it) in March 2016 in Beijing.

### M_DENSE

**Place:** Madrid, Spain

**Duration:** Jan. 1, 2018 - Dec. 21, 2019

**Link:** https://github.com/rdemedrano/crann_traffic

**Description:** This dataset contains historical data of traffic measurements in the city of Madrid. The measurements are taken every 15 minutes at each point and include traffic intensity in number of cars per hour.

### ROTTERDAM

**Place:** Rotterdam, Holland

**Duration:** 135 days of 2018

**Link:** https://github.com/RomainLITUD/DGCN_traffic_forecasting

**Description:** The ROTTERDAM dataset contains traffic state information for 208 links.

### SHMETRO

**Place:** Shanghai, China

**Duration:** Jul. 1, 2016 - Sept. 30, 2016

**Link:** https://github.com/ivechan/PVCGN

**Description:** This dataset was built from the metro system of Shanghai, China. A total of 811.8 million transaction records were collected from Jul. 1, 2016 to Sept. 30, 2016, with a daily ridership of 8.82 million.

### HZMETRO

**Place:** Hangzhou, China

**Duration:** Jan. 1, 2019 - Jan. 25, 2019

**Link:** https://github.com/ivechan/PVCGN

**Description:** This dataset was created from the transaction records of the Hangzhou metro system collected in January 2019. With 80 operational stations and 248 physical edges, the system has a daily ridership of 2.35 million.

### TaxiBJ

**Place:** Beijing, China

**Duration:** Jul. 1, 2013 ~ Oct. 30, 2013, Mar. 1, 2014 ~ Jun. 30, 2014, Mar. 1, 2015 ~ Jun. 30, 2015 and Nov. 1, 2015 ~ Apr. 10, 2016

**Link:** https://github.com/TolicWang/DeepST/issues/3

**Description:** The TaxiBJ dataset contains taxicab GPS data, including crowd flow, meteorology, and holiday information.

### T_DRIVE

**Place:** Beijing, China

**Duration:** Feb. 2, 2008 ~ Feb. 8, 2008

**Link:** https://www.microsoft.com/en-us/research/publication/t-drive-trajectory-data-sample/

**Description:** The T-Drive trajectory dataset sample contains one week of trajectories from 10,357 Beijing taxis. It has about 15 million points, and the total distance of the trajectories reaches 9 million kilometers.

### PORTO

**Place:** Porto, Portugal

**Duration:** Jul. 1, 2013 ~ Jun. 30, 2014

**Link:** https://archive.ics.uci.edu/ml/datasets/Taxi+Service+Trajectory+-+Prediction+Challenge%2C+ECML+PKDD+2015

**Description:** The Porto dataset describes the trajectories of all 442 taxis running in the city of Porto, Portugal.

### NYCTAXI

**Place:** New York, USA

**Duration:** 2009 ~ present

**Link:** https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page

**Description:** The NYC-Taxi dataset contains trajectories of different types of taxis collected by GPS in New York City from 2009 to 2020.

### NYCTAXI_DYNA

- NYCTAXI_DYNA is a dataset that counts the inflow and outflow of each region using an irregular area division method.

### NYCTAXI_OD

- NYCTAXI_OD is a dataset that counts the origin-destination flow between regions using an irregular area division method.

### NYCTAXI_GRID

- NYCTAXI_GRID is a dataset that counts the inflow and outflow of each region using a grid-based division method.

### NYC_TOD

**Place:** New York, USA

**Duration:** 2014

**Link:** https://github.com/liulingbo918/CSTN#:~:text=download%20NYC-TOD.tar.gz%20with%20following%20links%20and%20put%20it%20into%20folder%20NYC-TOD/.

**Description:** NYC_TOD counts the inflow and outflow of each area using a grid-based division method. It was generated by the authors of [CSTN](https://arxiv.org/pdf/1905.06335).

### NYCBIKE

**Place:** New York, USA

**Duration:** Jun. 2013 ~ present

**Link:** https://www.citibikenyc.com/system-data

**Description:** The NYC-Bike dataset contains bike trajectories collected from the NYC CitiBike system.

### AUSTINRIDE

**Place:** Austin, USA

**Duration:** Jun. 4, 2016 ~ Apr. 13, 2017

**Link:** https://data.world/ride-austin/ride-austin-june-6-april-13

**Description:** The AustinRide dataset contains Austin ride trajectories spanning August 1, 2016 to April 13, 2017, including over 1.4 million trips.

### BIKEDC

**Place:** Washington, USA

**Duration:** Sept. 20, 2010 ~ Oct. 2020

**Link:** https://www.capitalbikeshare.com/system-data

**Description:** The BikeDC dataset describes the bike trails of the Washington Bicycle System, which includes 472 stops.

### BIKECHI

**Place:** Chicago, USA

**Duration:** Jun. 27, 2013 ~ 2018

**Link:** https://www.divvybikes.com/system-data

**Description:** The BikeCHI dataset shows the development of bike-sharing in Chicago from 2013 to 2018.

### Foursquare

**Duration:** Apr. 12, 2012 ~ Feb. 16, 2013

**Link:** https://sites.google.com/site/yangdingqi/home/foursquare-dataset#h.p_ID_46

**Description:** Foursquare is a location-based social networking website where users share their locations by checking in. We use the second dataset in the link, which is the *NYC and Tokyo Check-in Dataset*. We preprocessed the raw data provided by the link and split it into Foursquare-TKY and Foursquare-NYC.

### Gowalla

**Place:**

**Duration:** Feb. 2009 ~ Oct. 2010

**Link:** https://snap.stanford.edu/data/loc-gowalla.html

**Description:** Gowalla is a location-based social networking website where users share their locations by checking in. The dataset contains user information, check-in times, latitudes, longitudes, and location ids.

### Brightkite

**Place:** Global

**Duration:** Apr. 2008 ~ Oct. 2010

**Link:** http://snap.stanford.edu/data/loc-brightkite.html

**Description:** Brightkite is a location-based social networking website where users share their locations by checking in. The dataset contains user information, check-in times, latitudes, longitudes, and location ids.

### Instagram

**Place:** New York, USA

**Duration:** Jun. 15, 2011 - Nov. 8, 2016

**Link:** https://dmis.korea.ac.kr/cape

**Description:** The dataset's most notable feature is that each check-in record contains not only the POI information but also the text written when the user created the check-in record. This makes the dataset particularly important for research that incorporates trajectory semantic features into trajectory prediction.

### Seattle

**Place:** Seattle, WA, USA

**Duration:** Jan. 17, 2009, 20:27:37 ~ 22:34:28

**Link:** https://www.microsoft.com/en-us/research/publication/hidden-markov-map-matching-noise-sparseness/

**Description:** This dataset (for the Map Matching task) contains test GPS data recorded on a drive in Seattle, WA, USA and its eastern suburbs. The trip starts in the upper right corner near Marymoor Park. The data was sampled at 1 Hz using a RoyalTek RBT-2300 GPS logger. The drive took place on Saturday, January 17, 2009 starting at 20:27:37 UTC (12:27:37 local time) and ending at 22:34:28 UTC (14:34:28 local time), for an elapsed time of 02:06:51.

### Global

**Place:** 100 places all over the world

**Duration:** -

**Link:** https://zenodo.org/record/57731#.YVwZ7WJBxnK

**Description:** This dataset (for the Map Matching task) is large enough to prove or disprove map-matching hypotheses on a world-wide scale. Because of its global coverage, learning does not have to be biased toward the part of the world where the algorithm was tested.

### BJ_ROADMAP

**Place:** Beijing, China

**Duration:** -

**Link:** -

**Description:** The original dataset contains properties of nodes and edges in OpenStreetMap format. Two graph constructions are implemented: one whose nodes are intersections and whose relationships are road sections, and one whose nodes are road sections and whose relationships are intersections. They are named "bj_roadmap_node" and "bj_roadmap_edge", respectively.

### BJ_TRAJ

**Place:** Beijing, China

**Duration:** Apr. 2015 - Jul. 2015

**Link:** -

**Description:** The BJ_TRAJ dataset contains a large amount of trajectory data, approximately 7.60 million per month.

### CD_TRAJ

**Place:** Chengdu, China

**Duration:** Oct. 01, 2018 - Nov. 30, 2018

**Link:** -

**Description:** The CD_TRAJ dataset contains a large amount of trajectory data, approximately 2.58 million per 10 days. Two months of data are available.

### XA_TRAJ

**Place:** Xi'an, China

**Duration:** Oct. 01, 2018 - Nov. 30, 2018

**Link:** -

**Description:** The XA_TRAJ dataset contains a large amount of trajectory data, approximately 2.14 million per 15 days. Two months of data are available.

### Chengdu_Taxi_Sample1

**Place:** Chengdu, China

**Duration:** Aug. 03, 2014 - Aug. 30, 2014

**Link:** https://github.com/UrbComp/DeepTTE/tree/master/data

**Description:** The Chengdu_Taxi_Sample1 dataset is part of the Chengdu_Taxi dataset. It contains 1800 taxi trajectories in Chengdu.

### Beijing_Taxi_Sample

**Place:** Beijing, China

**Duration:** Oct. 01, 2013 - Oct. 31, 2013

**Link:** https://github.com/YibinShen/TTPNet/tree/master/data

**Description:** The Beijing_Taxi_Sample dataset is part of the Beijing_Taxi dataset. It contains 1000 taxi trajectories per day in October 2013.

### NYC_RISK

**Place:** New York, USA

**Duration:** Jan. 01, 2013 ~ Dec. 31, 2013

**Link:** https://github.com/Echohhhhhh/GSNet

**Description:** The NYC accident dataset contains road, risk, and POI data of New York City in 2013.

### CHICAGO_RISK

**Place:** Chicago, USA

**Duration:** Feb. 01, 2016 ~ Sep. 30, 2016

**Link:** https://github.com/Echohhhhhh/GSNet

**Description:** The CHICAGO accident dataset contains road and risk data of Chicago in 2016.