Post-COVID Cities — Understanding New Dynamics of Travel Behaviour and Activities

Role: Project Leader / Working Paper

Mentors: Asst. Prof. Shuli Luo

Jun 2024 – Oct 2024

Overview

This research investigates post-COVID shifts in travel behaviour by analyzing Twitter data from Hong Kong across a pre-pandemic period (July-December 2019) and a post-pandemic period (July-September 2022). We observed substantial decreases in entertainment and work-related travel, while healthcare-related travel increased significantly. Using a Support Vector Machine (SVM) for data cleansing and Latent Dirichlet Allocation (LDA) for topic modelling, we developed a framework that extracts travel patterns from geotagged tweets and can inform transportation policy and resource allocation.

Introduction

COVID-19 has fundamentally reshaped urban mobility patterns, with remote work surging and public transport usage declining. Understanding these evolving travel dynamics is crucial for efficient resource allocation and policy formulation. Traditional transportation surveys are costly and time-consuming, while social media platforms like Twitter offer access to vast geotagged datasets at significantly lower cost. However, most studies have focused on geographical changes in behavior, leaving shifts in travel structure, particularly changes in activity patterns and their semantics, largely unexplored.

This research addresses this gap by developing a comprehensive framework to understand the temporal evolution of mobility patterns.

Methodology

Data Resources and Processing Framework

This research focused on the Hong Kong area, examining Twitter data from two key periods:

Pre-pandemic: July 1 - December 31, 2019
Post-pandemic: July 1 - September 30, 2022

The initial datasets comprised 284,656 (2019) and 37,484 (2022) Twitter entries. After data refinement using machine learning techniques, 17,568 and 12,575 relevant entries, respectively, remained for mobility analysis.

Data resources and framework of data processing

Timeline of COVID-19 progression showing key phases of the pandemic

Support Vector Machine (SVM) for Bot Detection

A significant challenge in social media data analysis is the prevalence of bot accounts that generate advertising content. We used an SVM with a Radial Basis Function (RBF) kernel to identify and remove bot accounts, reaching 94% accuracy.

Key parameters:

C₁ = 100
C₂ = 0.1
Loss function incorporating both labeled and unlabeled data

The SVM classifier separates human users from bot accounts with a maximum-margin hyperplane in the kernel-induced feature space.
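
The source does not spell out the loss function. One common semi-supervised SVM objective that fits the description (a hinge loss on labeled accounts plus a penalty on unlabeled accounts that fall inside the margin) is sketched below. Reading C₁ as the weight on the labeled term and C₂ as the weight on the unlabeled term is an assumption, as are the symbols l (number of labeled accounts), u (number of unlabeled accounts), γ (RBF width), and φ (feature map).

```latex
\min_{w,\,b}\;\; \frac{1}{2}\lVert w \rVert^{2}
  + C_1 \sum_{i=1}^{l} \max\bigl(0,\; 1 - y_i f(x_i)\bigr)
  + C_2 \sum_{j=l+1}^{l+u} \max\bigl(0,\; 1 - \lvert f(x_j) \rvert\bigr),
\qquad f(x) = w^{\top}\phi(x) + b,
\qquad k(x, x') = \phi(x)^{\top}\phi(x') = \exp\bigl(-\gamma \lVert x - x' \rVert^{2}\bigr)
```

Under this reading, C₁ = 100 penalizes errors on labeled accounts heavily, while C₂ = 0.1 lets the unlabeled accounts nudge the boundary only slightly.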

Results of Support Vector Machine: brown dots = human users, yellow dots = suspected bots, dashed line = separation hyperplane

POI Matching and Coordinate Conversion

We used offline maps containing approximately 250,000 Points of Interest (POIs) from 2019 and 2022. Because Google Maps (WGS84) and the Twitter data (Mars coordinate system, GCJ-02) use different coordinate systems, coordinates were first converted to a common system. Using a k-d tree, we then matched each tweet's GPS point to its nearest POI and discarded matches farther than 50 meters to ensure accuracy.
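
The nearest-POI matching with a 50-meter cutoff can be sketched with a small 2-D k-d tree in plain Python. This is a minimal illustration, not the project's code: it assumes coordinates have already been converted to one system, projects them to local meters with an equirectangular approximation, and uses hypothetical names (`to_local_xy`, `build_kdtree`, `nearest`, `match_to_poi`) and made-up POIs.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

def to_local_xy(lat, lon, lat0, lon0):
    """Project (lat, lon) to metres around a reference point.
    Equirectangular approximation, adequate at city scale."""
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return (x, y)

def build_kdtree(points, depth=0):
    """points: list of ((x, y), label). Returns (point, left, right) or None."""
    if not points:
        return None
    axis = depth % 2  # alternate between x and y at each level
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, target, depth=0, best=None):
    """Return (distance, label) of the stored point closest to target."""
    if node is None:
        return best
    (xy, label), left, right = node
    d = math.dist(xy, target)
    if best is None or d < best[0]:
        best = (d, label)
    axis = depth % 2
    diff = target[axis] - xy[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Search the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, target, depth + 1, best)
    return best

def match_to_poi(tree, tweet_xy, max_dist_m=50.0):
    """Return the nearest POI label, or None if it is farther than max_dist_m."""
    found = nearest(tree, tweet_xy)
    if found is None or found[0] > max_dist_m:
        return None
    return found[1]

# Hypothetical reference point and POIs, for illustration only.
REF_LAT, REF_LON = 22.30, 114.17
pois = [
    (to_local_xy(22.3000, 114.1700, REF_LAT, REF_LON), "restaurant"),
    (to_local_xy(22.3100, 114.1800, REF_LAT, REF_LON), "clinic"),
]
tree = build_kdtree(pois)
```

With roughly 250,000 POIs, a library implementation of the same structure would normally be preferred over this hand-written tree; the logic of the distance cutoff stays the same.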

Topic Modeling with LDA

We employed Latent Dirichlet Allocation (LDA), a generative probabilistic model widely used for unsupervised text classification, to extract travel semantics from Twitter content. The model works with a vocabulary of words, a collection of texts (documents), and a set of hidden topics. Each document is modeled as a mixture of topics, and each topic as a multinomial distribution over words, with Dirichlet priors on both.
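
In standard notation, the generative process that LDA assumes can be written as follows. The symbols (α, β, θ_d, φ_k, z_dn, w_dn, N_d) follow the usual LDA conventions and are not taken from the source; w_dn denotes the n-th token of document d and z_dn its hidden topic.

```latex
\begin{aligned}
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
z_{dn} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d), & n &= 1,\dots,N_d \\
w_{dn} \mid z_{dn}, \phi &\sim \mathrm{Multinomial}(\phi_{z_{dn}})
\end{aligned}
```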

Vocabulary Generation

We generated activity vocabularies by encoding tweets with spatial-temporal information. Each vocabulary item follows the format: A[POI_type]B[time_of_day]Y[day_of_week] (e.g., A01B09Y1 = restaurant visit at 9 AM on Monday).
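
A minimal sketch of this encoding is shown below. The field widths (two digits for POI type and hour, one digit for day of week) and the Monday = 1 convention are inferred from the single example A01B09Y1; the function name `encode_activity` and the integer POI code are hypothetical.

```python
from datetime import datetime

def encode_activity(poi_type: int, when: datetime) -> str:
    """Build a token such as 'A01B09Y1' from a POI type code and a timestamp.
    isoweekday() returns 1 for Monday through 7 for Sunday."""
    return f"A{poi_type:02d}B{when.hour:02d}Y{when.isoweekday()}"

# POI type 1 (restaurant, per the example above) on Monday 1 July 2019 at 09:30.
print(encode_activity(1, datetime(2019, 7, 1, 9, 30)))  # A01B09Y1
```

Each tweet then contributes one such token, and the tokens play the role of words in the LDA model.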

Determining Optimal Topic Number

We applied perplexity analysis to identify the optimal number of topics. Perplexity measures how well a probabilistic model predicts a set of documents: it is the exponential of the negative average per-word log-likelihood, so lower perplexity indicates a better model fit. The analysis identified K=6 as the optimal number of topics.
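
For reference, the standard definition of perplexity over a document set is given below. The notation (M documents, N_d tokens in document d, word sequence w_d) is the conventional one rather than the source's, and the source does not state whether the evaluation set was held out from training.

```latex
\mathrm{perplexity}(D) = \exp\!\left( - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right)
```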

Perplexity under different topic numbers showing optimal K=6

Results

Six Key Travel Patterns Identified

Using the variational EM algorithm to fit the LDA model with K=6, we identified six distinct travel topics:

Pattern	Keywords	Activity Type
Topic 1	Office locations, weekday mornings/afternoons	Working
Topic 2	Restaurants, shopping centers, weekend evenings	Shopping and Catering
Topic 3	Residential areas, late evenings, weekends	Home and Accommodation
Topic 4	Tourist attractions, leisure venues, weekends	Tourism and Leisure
Topic 5	Hospitals, clinics, weekday mornings	Medical Treatment
Topic 6	Service facilities, various times	Life Services

Figure 11: Results of LDA model showing topic distribution before and during COVID-19

Major Findings

Entertainment and Work Travel Decline: Both entertainment consumption travel (tourism, shopping) and work-related travel (commuting) decreased significantly post-COVID. Entertainment travel experienced a more pronounced decline, likely due to its non-mandatory nature.
Healthcare Travel Surge: Travel related to medical treatment increased significantly after COVID-19, reflecting heightened health consciousness and potentially deferred medical care during lockdowns.
Spatial Variations: Analysis across five Hong Kong districts (Outlying Islands, Northern District, Central and Western District, Kowloon Urban Areas, and Yau Tsim Mong) revealed:
- Central and Western District showed the highest growth rate in healthcare travel, attributable to advanced medical facilities
- Kowloon City exhibited the minimum decrease in work commuting, possibly due to significant adoption of work-from-home strategies
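
Because the two cleaned samples differ in size (17,568 versus 12,575 entries), one way to compare periods is by each topic's share of entries rather than by raw counts. The sketch below illustrates that arithmetic; it is an assumption about how such changes could be measured, not the source's stated method, and the function names and counts are invented for illustration.

```python
def topic_shares(counts):
    """Convert raw per-topic counts into fractions of the total."""
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}

def relative_share_change(pre_counts, post_counts):
    """Relative change in each topic's share between two periods.
    Topics missing from either period, or with zero share before, are skipped."""
    pre, post = topic_shares(pre_counts), topic_shares(post_counts)
    return {t: (post[t] - pre[t]) / pre[t]
            for t in pre if t in post and pre[t] > 0}

# Invented toy counts, not results from the study.
toy_pre = {"Working": 50, "Medical Treatment": 10}
toy_post = {"Working": 30, "Medical Treatment": 30}
changes = relative_share_change(toy_pre, toy_post)
```

The same function can be applied per district by passing counts restricted to that district.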

Figure 12: Results of traveling pattern changes in different zones

Conclusions

This research designed a comprehensive framework for mining travel patterns from Twitter data using SVM (94% accuracy in bot elimination) and LDA topic modeling, revealing nuanced changes in travel structure beyond geographical shifts. By integrating spatial-temporal analysis, we uncovered district-level variations in travel pattern changes across Hong Kong. The findings provide critical support for transportation resource allocation, policy formulation, and urban planning in the post-pandemic era. This framework demonstrates how social media data can efficiently supplement traditional transportation surveys, offering real-time insights at significantly lower costs while maintaining analytical rigor.