Post-COVID Cities — Understanding New Dynamics of Travel Behaviour and Activities
Overview
This research investigates post-COVID travel behaviour shifts by analyzing Twitter data from Hong Kong across pre-pandemic (July-December 2019) and post-pandemic (July-September 2022) periods. We observed substantial decreases in entertainment and work-related travel, while healthcare-related travel increased significantly. Using Support Vector Machine (SVM) for data cleansing and Latent Dirichlet Allocation (LDA) for topic modelling, we developed a framework to extract nuanced travel patterns, providing critical insights for transportation policy and resource allocation.
Introduction
COVID-19 has fundamentally reshaped urban mobility patterns: remote work has surged and public transport usage has declined. Understanding these evolving travel dynamics is crucial for efficient resource allocation and policy formulation. Traditional transportation surveys are costly and time-consuming, whereas social media platforms like Twitter offer access to vast geotagged datasets at much lower cost. However, most studies have focused on geographical changes in behavior. Subtler shifts in travel structure, particularly travel behaviors reflected in activity patterns and semantics, remain largely unexplored.
This research addresses this gap by developing a comprehensive framework to understand the temporal evolution of mobility patterns.
Methodology
Data Resources and Processing Framework
This research focused on the Hong Kong area, examining Twitter data from two key periods:
- Pre-pandemic: July 1 - December 31, 2019
- Post-pandemic: July 1 - September 30, 2022
The initial datasets comprised 284,656 and 37,484 Twitter entries, respectively. After data refinement, including the machine-learning-based bot removal described below, we retained 17,568 and 12,575 entries suitable for mobility analysis.
Data resources and framework of data processing
Timeline of COVID-19 progression showing key phases of the pandemic
Support Vector Machine (SVM) for Bot Detection
A significant challenge in social media data analysis is the prevalence of bot accounts posting advertising content. We trained an SVM with a Radial Basis Function (RBF) kernel to identify and remove these accounts, reaching 94% accuracy.
Key parameters:
- C₁ = 100
- C₂ = 0.1
- Loss function incorporating both labeled and unlabeled data
The classifier separates human users from bot accounts with a maximum-margin hyperplane in the RBF-induced feature space, which corresponds to a nonlinear decision boundary in the original feature space.
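The study reports C₁ and C₂ but not the loss itself. A common semi-supervised formulation with two penalty weights is the transductive (S3VM) objective below. Reading C₁ as the weight on labeled accounts and C₂ as the weight on unlabeled accounts is an assumption, as is the RBF width γ, which the text does not report.

$$
\min_{\mathbf{w},\, b}\;
\frac{1}{2}\lVert\mathbf{w}\rVert^{2}
+ C_1 \sum_{i=1}^{l} \max\bigl(0,\; 1 - y_i f(\mathbf{x}_i)\bigr)
+ C_2 \sum_{j=l+1}^{l+u} \max\bigl(0,\; 1 - \lvert f(\mathbf{x}_j)\rvert\bigr)
$$

$$
f(\mathbf{x}) = \mathbf{w}^{\top}\phi(\mathbf{x}) + b,
\qquad
k(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^{\top}\phi(\mathbf{x}') = \exp\bigl(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\bigr)
$$

Here $y_i \in \{-1, +1\}$ labels the $l$ annotated accounts as human or bot, and the remaining $u$ accounts are unlabeled. Under this reading, C₁ = 100 ≫ C₂ = 0.1 makes misclassifying a labeled account far more costly than leaving an unlabeled account inside the margin, so the unlabeled data only nudges the boundary toward low-density regions.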
Results of Support Vector Machine: brown dots = human users, yellow dots = suspected bots, dashed line = separation hyperplane
POI Matching and Coordinate Conversion
We used offline maps containing approximately 250,000 Points of Interest (POIs) from 2019 and 2022. Because the map data (Google Maps, WGS84) and the Twitter data (the "Mars" coordinate system, GCJ-02) use different coordinate systems, coordinates had to be converted to a common system before matching. We then used a k-d tree to match each Twitter GPS point to its nearest POI, discarding matches more than 50 meters away to ensure accuracy.
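The nearest-POI step with a 50 m cutoff can be sketched as follows. This is a minimal, self-contained illustration, not the study's code: it assumes coordinates are already converted to a common datum, projects them to local meters with an equirectangular approximation around a hypothetical Hong Kong reference latitude (`REF_LAT`), and uses a small hand-written 2-D k-d tree. All function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0
REF_LAT = 22.3  # hypothetical reference latitude for Hong Kong (degrees)


def to_xy(lon, lat, ref_lat=REF_LAT):
    """Project lon/lat (degrees) to local meters; accurate enough at city scale."""
    x = math.radians(lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat) * EARTH_RADIUS_M
    return (x, y)


def build_kdtree(items, depth=0):
    """items: list of ((x, y), label). Returns a nested (item, axis, left, right) node or None."""
    if not items:
        return None
    axis = depth % 2  # alternate between x and y splits
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return (items[mid], axis,
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))


def nearest(node, q, best=None):
    """Return (distance_m, label) of the stored point closest to q."""
    if node is None:
        return best
    (point, label), axis, left, right = node
    d = math.dist(point, q)
    if best is None or d < best[0]:
        best = (d, label)
    diff = q[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, q, best)
    # Only search the other side if the splitting line is closer than the best hit
    if abs(diff) < best[0]:
        best = nearest(far, q, best)
    return best


def match_to_poi(tweets, pois, max_dist_m=50.0):
    """Map tweet_id -> nearest POI label, dropping matches farther than max_dist_m."""
    tree = build_kdtree([(to_xy(lon, lat), label) for label, lon, lat in pois])
    matches = {}
    for tweet_id, lon, lat in tweets:
        hit = nearest(tree, to_xy(lon, lat))
        if hit is not None and hit[0] <= max_dist_m:
            matches[tweet_id] = hit[1]
    return matches
```

For roughly 250,000 POIs, `scipy.spatial.cKDTree` with `query(..., distance_upper_bound=50)` is a faster equivalent; the pure-Python tree is used here only to keep the sketch dependency-free.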
Topic Modeling with LDA
We employed Latent Dirichlet Allocation (LDA), a generative probabilistic model widely used for unsupervised topic modeling, to extract travel semantics from Twitter content. The model assumes a vocabulary of words, a corpus of documents, and a set of hidden topics. Each document is a mixture of topics and each topic is a probability distribution over words; both are multinomial distributions drawn from Dirichlet priors.
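For reference, the standard smoothed LDA generative process is written out below. The symbols (α, β for the Dirichlet hyperparameters, θ for document-topic proportions, φ for topic-word distributions, z for topic assignments) are the conventional ones, not values or notation reported by the study.

$$
\begin{aligned}
\boldsymbol{\varphi}_k &\sim \operatorname{Dirichlet}(\boldsymbol{\beta}), && k = 1,\dots,K \\
\boldsymbol{\theta}_d &\sim \operatorname{Dirichlet}(\boldsymbol{\alpha}), && d = 1,\dots,D \\
z_{d,n} &\sim \operatorname{Multinomial}(\boldsymbol{\theta}_d), && n = 1,\dots,N_d \\
w_{d,n} &\sim \operatorname{Multinomial}(\boldsymbol{\varphi}_{z_{d,n}})
\end{aligned}
$$

$$
p(\mathbf{w}_d \mid \boldsymbol{\alpha}, \boldsymbol{\varphi})
= \int p(\boldsymbol{\theta}_d \mid \boldsymbol{\alpha})
\prod_{n=1}^{N_d} \sum_{k=1}^{K} \theta_{d,k}\, \varphi_{k,\, w_{d,n}} \, d\boldsymbol{\theta}_d
$$

This integral has no closed form, which is why approximate inference such as variational EM is used to fit the model.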
Vocabulary Generation
We generated an activity vocabulary by encoding each tweet with its spatial-temporal information. Each vocabulary item follows the format A[POI_type]B[time_of_day]Y[day_of_week]; for example, A01B09Y1 denotes a restaurant visit at 9 AM on a Monday.
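A minimal sketch of this encoding, under stated assumptions: only code 01 = restaurant is implied by the example, so the other entries of `POI_CODES` are hypothetical; time of day is taken as the two-digit hour; weekday uses ISO numbering (Monday = 1); and grouping each user's tokens into one LDA document is an assumption, since the text does not say how documents were formed.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical POI-type codes; only "01" = restaurant follows from the example
POI_CODES = {"restaurant": "01", "office": "02", "hospital": "03"}


def encode_activity(poi_type, ts):
    """Build an A[POI_type]B[hour]Y[weekday] token, e.g. A01B09Y1.

    ts is assumed to be already in Hong Kong local time (tweet timestamps are UTC).
    """
    code = POI_CODES[poi_type]
    return f"A{code}B{ts.hour:02d}Y{ts.isoweekday()}"


def build_documents(records):
    """Group tokens by user so that each user forms one LDA document (assumption)."""
    docs = defaultdict(list)
    for user_id, poi_type, ts in records:
        docs[user_id].append(encode_activity(poi_type, ts))
    return dict(docs)
```

For instance, `encode_activity("restaurant", datetime(2019, 7, 1, 9, 30))` yields `"A01B09Y1"`, since July 1, 2019 was a Monday.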
Determining Optimal Topic Number
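We applied perplexity analysis to identify the optimal number of topics. Perplexity measures how well a probabilistic model predicts held-out text: it is computed from the probability the model assigns to that text, as the exponentiated average negative log-likelihood per word, so lower perplexity indicates a better fit. The analysis identified K = 6 as the optimal number of topics.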
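Held-out perplexity is conventionally exp(−Σ_d log p(w_d) / Σ_d N_d), where N_d is the number of words in document d. A perplexity of V means the model is as uncertain as a uniform choice among V words. The sketch below assumes per-document log-likelihoods are already available from a fitted model (for example, its variational lower bound). It also assumes K is chosen by minimum perplexity; the study does not state its selection rule, and an elbow criterion is a common alternative.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """Held-out perplexity: exp(-sum_d log p(w_d) / sum_d N_d). Lower is better."""
    total_tokens = sum(doc_lengths)
    if total_tokens == 0:
        raise ValueError("no tokens to evaluate")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


def pick_k(perplexity_by_k):
    """Return the topic count K with the lowest perplexity (assumed selection rule)."""
    return min(perplexity_by_k, key=perplexity_by_k.get)
```

As a sanity check, a model that assigns every word probability 1/10 has perplexity 10 regardless of document lengths.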
Perplexity under different topic numbers showing optimal K=6
Results
Six Key Travel Patterns Identified
Fitting the LDA model with K = 6 using the variational EM algorithm, we identified six distinct travel topics:
| Pattern | Keywords | Activity Type |
|---|---|---|
| Topic 1 | Office locations, weekday mornings/afternoons | Working |
| Topic 2 | Restaurants, shopping centers, weekend evenings | Shopping and Catering |
| Topic 3 | Residential areas, late evenings, weekends | Home and Accommodation |
| Topic 4 | Tourist attractions, leisure venues, weekends | Tourism and Leisure |
| Topic 5 | Hospitals, clinics, weekday mornings | Medical Treatment |
| Topic 6 | Service facilities, various times | Life Services |
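The keywords in the table summarize each topic's most probable vocabulary items. The study does not describe its labeling procedure. As a hedged sketch, the top-n tokens of each topic can be read off the fitted topic-word distribution and decoded back into (POI code, hour, weekday) triples. The regex, function names, and input layout (`topic_word[k][v]` = p(word v | topic k)) are illustrative assumptions.

```python
import re

# Matches tokens such as A01B09Y1: POI code, two-digit hour, ISO weekday 1-7
TOKEN_RE = re.compile(r"A(\d+)B(\d{2})Y([1-7])")


def decode(token):
    """Split an activity token into (poi_code, hour, weekday)."""
    m = TOKEN_RE.fullmatch(token)
    if m is None:
        raise ValueError(f"not an activity token: {token}")
    return m.group(1), int(m.group(2)), int(m.group(3))


def top_words(topic_word, vocab, n=3):
    """Return the n most probable decoded tokens for each topic."""
    result = []
    for row in topic_word:
        order = sorted(range(len(row)), key=lambda v: row[v], reverse=True)[:n]
        result.append([decode(vocab[v]) for v in order])
    return result
```

Mapping the decoded POI codes and time slots to categories such as "Working" or "Medical Treatment" remains an interpretive step.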
Figure 11: Results of LDA model showing topic distribution before and during COVID-19
Major Findings
Entertainment and Work Travel Decline: Both entertainment consumption travel (tourism, shopping) and work-related travel (commuting) decreased significantly post-COVID. Entertainment travel experienced a more pronounced decline, likely due to its non-mandatory nature.
Healthcare Travel Surge: Travel related to medical treatment increased significantly after COVID-19, reflecting heightened health consciousness and potentially deferred medical care during lockdowns.
Spatial Variations: Analysis across five Hong Kong zones (Outlying Islands, Northern District, Central and Western District, Kowloon Urban Areas, and Yau Tsim Mong) revealed:
- Central and Western District showed the highest growth rate in healthcare travel, attributable to advanced medical facilities
- Kowloon City exhibited the smallest decrease in work commuting, possibly because work-from-home arrangements were adopted less widely there
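The text reports growth rates and decreases without defining them. One plausible reading, sketched here as an assumption, is to average the document-topic proportions θ_d from the fitted LDA model within each zone and period, then take the relative change of each topic's share. The function names and input layout (per zone, a list of per-document topic-proportion lists) are hypothetical.

```python
def topic_shares(doc_topics):
    """Mean topic proportions over documents: share_k = (1/D) * sum_d theta[d][k]."""
    n_docs = len(doc_topics)
    n_topics = len(doc_topics[0])
    return [sum(theta[k] for theta in doc_topics) / n_docs for k in range(n_topics)]


def relative_change(pre_shares, post_shares):
    """(post - pre) / pre for each topic; NaN where the pre-period share is zero."""
    return [float("nan") if a == 0 else (b - a) / a
            for a, b in zip(pre_shares, post_shares)]


def change_by_zone(pre_by_zone, post_by_zone):
    """Per-zone relative change for zones present in both periods."""
    return {
        zone: relative_change(topic_shares(pre_by_zone[zone]),
                              topic_shares(post_by_zone[zone]))
        for zone in pre_by_zone if zone in post_by_zone
    }
```

A positive value means a topic (for example, medical treatment) gained share after the pandemic; a negative value means it lost share.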
Figure 12: Results of traveling pattern changes in different zones
Conclusions
This research designed a comprehensive framework for mining travel patterns from Twitter data using SVM (94% accuracy in bot elimination) and LDA topic modeling, revealing nuanced changes in travel structure beyond geographical shifts. By integrating spatial-temporal analysis, we uncovered district-level variations in travel pattern changes across Hong Kong. The findings provide critical support for transportation resource allocation, policy formulation, and urban planning in the post-pandemic era. This framework demonstrates how social media data can efficiently supplement traditional transportation surveys, offering real-time insights at significantly lower costs while maintaining analytical rigor.