Search | arXiv e-print repository

arXiv:2505.02366 [pdf, other]

JTCSE: Joint Tensor-Modulus Constraints and Cross-Attention for Unsupervised Contrastive Learning of Sentence Embeddings

Authors: Tianyu Zong, Hongzhu Yi, Bingkang Shi, Yuanxiang Wang, Jungang Xu

Abstract: Unsupervised contrastive learning has become a hot research topic in natural language processing. Existing works usually aim at constraining the orientation distribution of the representations of positive and negative samples in the high-dimensional semantic space in contrastive learning, but the semantic representation tensor possesses both modulus and orientation features, and the existing works… ▽ More Unsupervised contrastive learning has become a hot research topic in natural language processing. Existing works usually aim at constraining the orientation distribution of the representations of positive and negative samples in the high-dimensional semantic space in contrastive learning, but the semantic representation tensor possesses both modulus and orientation features, and the existing works ignore the modulus feature of the representations and cause insufficient contrastive learning. % Therefore, we firstly propose a training objective that aims at modulus constraints on the semantic representation tensor, to strengthen the alignment between the positive samples in contrastive learning. Therefore, we first propose a training objective that is designed to impose modulus constraints on the semantic representation tensor, to strengthen the alignment between positive samples in contrastive learning. Then, the BERT-like model suffers from the phenomenon of sinking attention, leading to a lack of attention to CLS tokens that aggregate semantic information. In response, we propose a cross-attention structure among the twin-tower ensemble models to enhance the model's attention to CLS token and optimize the quality of CLS Pooling. Combining the above two motivations, we propose a new \textbf{J}oint \textbf{T}ensor representation modulus constraint and \textbf{C}ross-attention unsupervised contrastive learning \textbf{S}entence \textbf{E}mbedding representation framework JTCSE, which we evaluate in seven semantic text similarity computation tasks, and the experimental results show that JTCSE's twin-tower ensemble model and single-tower distillation model outperform the other baselines and become the current SOTA. In addition, we have conducted an extensive zero-shot downstream task evaluation, which shows that JTCSE outperforms other baselines overall on more than 130 tasks. △ Less

Submitted 6 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

arXiv:2503.12739 [pdf, other]

TNCSE: Tensor's Norm Constraints for Unsupervised Contrastive Learning of Sentence Embeddings

Authors: Tianyu Zong, Bingkang Shi, Hongzhu Yi, Jungang Xu

Abstract: Unsupervised sentence embedding representation has become a hot research topic in natural language processing. As a tensor, sentence embedding has two critical properties: direction and norm. Existing works have been limited to constraining only the orientation of the samples' representations while ignoring the features of their module lengths. To address this issue, we propose a new training obje… ▽ More Unsupervised sentence embedding representation has become a hot research topic in natural language processing. As a tensor, sentence embedding has two critical properties: direction and norm. Existing works have been limited to constraining only the orientation of the samples' representations while ignoring the features of their module lengths. To address this issue, we propose a new training objective that optimizes the training of unsupervised contrastive learning by constraining the module length features between positive samples. We combine the training objective of Tensor's Norm Constraints with ensemble learning to propose a new Sentence Embedding representation framework, TNCSE. We evaluate seven semantic text similarity tasks, and the results show that TNCSE and derived models are the current state-of-the-art approach; in addition, we conduct extensive zero-shot evaluations, and the results show that TNCSE outperforms other baselines. △ Less

Submitted 16 March, 2025; originally announced March 2025.

arXiv:2409.18236 [pdf, other]

Spatial Visibility and Temporal Dynamics: Revolutionizing Field of View Prediction in Adaptive Point Cloud Video Streaming

Authors: Chen Li, Tongyu Zong, Yueyu Hu, Yao Wang, Yong Liu

Abstract: Field-of-View (FoV) adaptive streaming significantly reduces bandwidth requirement of immersive point cloud video (PCV) by only transmitting visible points in a viewer's FoV. The traditional approaches often focus on trajectory-based 6 degree-of-freedom (6DoF) FoV predictions. The predicted FoV is then used to calculate point visibility. Such approaches do not explicitly consider video content's i… ▽ More Field-of-View (FoV) adaptive streaming significantly reduces bandwidth requirement of immersive point cloud video (PCV) by only transmitting visible points in a viewer's FoV. The traditional approaches often focus on trajectory-based 6 degree-of-freedom (6DoF) FoV predictions. The predicted FoV is then used to calculate point visibility. Such approaches do not explicitly consider video content's impact on viewer attention, and the conversion from FoV to point visibility is often error-prone and time-consuming. We reformulate the PCV FoV prediction problem from the cell visibility perspective, allowing for precise decision-making regarding the transmission of 3D data at the cell level based on the predicted visibility distribution. We develop a novel spatial visibility and object-aware graph model that leverages the historical 3D visibility data and incorporates spatial perception, neighboring cell correlation, and occlusion information to predict the cell visibility in the future. Our model significantly improves the long-term cell visibility prediction, reducing the prediction MSE loss by up to 50% compared to the state-of-the-art models while maintaining real-time performance (more than 30fps) for point cloud videos with over 1 million points. △ Less

Submitted 1 October, 2024; v1 submitted 26 September, 2024; originally announced September 2024.

arXiv:2312.13470 [pdf, ps, other]

Coffee: Cost-Effective Edge Caching for 360 Degree Live Video Streaming

Authors: Chen Li, Tingwei Ye, Tongyu Zong, Liyang Sun, Houwei Cao, Yong Liu

Abstract: While live 360 degree video streaming delivers immersive viewing experience, it poses significant bandwidth and latency challenges for content delivery networks. Edge servers are expected to play an important role in facilitating live streaming of 360 degree videos. In this paper, we propose a novel predictive edge caching algorithm (Coffee) for live 360 degree video that employ collaborative FoV… ▽ More While live 360 degree video streaming delivers immersive viewing experience, it poses significant bandwidth and latency challenges for content delivery networks. Edge servers are expected to play an important role in facilitating live streaming of 360 degree videos. In this paper, we propose a novel predictive edge caching algorithm (Coffee) for live 360 degree video that employ collaborative FoV prediction and predictive tile prefetching to reduce bandwidth consumption, streaming cost and improve the streaming quality and robustness. Our light-weight caching algorithms exploit the unique tile consumption patterns of live 360 degree video streaming to achieve high tile caching gains. Through extensive experiments driven by real 360 degree video streaming traces, we demonstrate that edge caching algorithms specifically designed for live 360 degree video streaming can achieve high streaming cost savings with small edge cache space consumption. Coffee, guided by viewer FoV predictions, significantly reduces back-haul traffic up to 76% compared to state-of-the-art edge caching algorithms. Furthermore, we develop a transcoding-aware variant (TransCoffee) and evaluate it using comprehensive experiments, which demonstrate that TransCoffee can achieve 63\% lower cost compared to state-of-the-art transcoding-aware approaches. △ Less

Submitted 27 January, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

arXiv:2303.08336 [pdf, ps, other]

Progressive Frame Patching for FoV-based Point Cloud Video Streaming

Authors: Tongyu Zong, Yixiang Mao, Chen Li, Yong Liu, Yao Wang

Abstract: Many XR applications require the delivery of volumetric video to users with six degrees of freedom (6-DoF) movements. Point Cloud has become a popular volumetric video format. A dense point cloud consumes much higher bandwidth than a 2D/360 degree video frame. User Field of View (FoV) is more dynamic with 6-DoF movement than 3-DoF movement. To save bandwidth, FoV-adaptive streaming predicts a user… ▽ More Many XR applications require the delivery of volumetric video to users with six degrees of freedom (6-DoF) movements. Point Cloud has become a popular volumetric video format. A dense point cloud consumes much higher bandwidth than a 2D/360 degree video frame. User Field of View (FoV) is more dynamic with 6-DoF movement than 3-DoF movement. To save bandwidth, FoV-adaptive streaming predicts a user's FoV and only downloads point cloud data falling in the predicted FoV. However, it is vulnerable to FoV prediction errors, which can be significant when a long buffer is utilized for smoothed streaming. In this work, we propose a multi-round progressive refinement framework for point cloud video streaming. Instead of sequentially downloading point cloud frames, our solution simultaneously downloads/patches multiple frames falling into a sliding time-window, leveraging the inherent scalability of octree-based point-cloud coding. The optimal rate allocation among all tiles of active frames are solved analytically using the heterogeneous tile rate-quality functions calibrated by the predicted user FoV. Multi-frame downloading/patching simultaneously takes advantage of the streaming smoothness resulting from long buffer and the FoV prediction accuracy at short buffer length. We evaluate our streaming solution using simulations driven by real point cloud videos, real bandwidth traces, and 6-DoF FoV traces of real users. Our solution is robust against the bandwidth/FoV prediction errors, and can deliver high and smooth view quality in the face of bandwidth variations and dynamic user and point cloud movements. △ Less

Submitted 2 October, 2024; v1 submitted 14 March, 2023; originally announced March 2023.

Comments: Transactions on Multimedia (under review)

arXiv:2210.02657 [pdf, ps, other]

Predictive Edge Caching through Deep Mining of Sequential Patterns in User Content Retrievals

Authors: Chen Li, Xiaoyu Wang, Tongyu Zong, Houwei Cao, Yong Liu

Abstract: Edge caching plays an increasingly important role in boosting user content retrieval performance while reducing redundant network traffic. The effectiveness of caching ultimately hinges on the accuracy of predicting content popularity in the near future. However, at the network edge, content popularity can be extremely dynamic due to diverse user content retrieval behaviors and the low-degree of u… ▽ More Edge caching plays an increasingly important role in boosting user content retrieval performance while reducing redundant network traffic. The effectiveness of caching ultimately hinges on the accuracy of predicting content popularity in the near future. However, at the network edge, content popularity can be extremely dynamic due to diverse user content retrieval behaviors and the low-degree of user multiplexing. It's challenging for the traditional reactive caching systems to keep up with the dynamic content popularity patterns. In this paper, we propose a novel Predictive Edge Caching (PEC) system that predicts the future content popularity using fine-grained learning models that mine sequential patterns in user content retrieval behaviors, and opportunistically prefetches contents predicted to be popular in the near future using idle network bandwidth. Through extensive experiments driven by real content retrieval traces, we demonstrate that PEC can adapt to highly dynamic content popularity, and significantly improve cache hit ratio and reduce user content retrieval latency over the state-of-art caching policies. More broadly, our study demonstrates that edge caching performance can be boosted by deep mining of user content retrieval behaviors. △ Less

Submitted 5 October, 2022; originally announced October 2022.

arXiv:2101.05885 [pdf, ps, other]

Cocktail Edge Caching: Ride Dynamic Trends of Content Popularity with Ensemble Learning

Authors: Tongyu Zong, Chen Li, Yuanyuan Lei, Guangyu Li, Houwei Cao, Yong Liu

Abstract: Edge caching will play a critical role in facilitating the emerging content-rich applications. However, it faces many new challenges, in particular, the highly dynamic content popularity and the heterogeneous caching configurations. In this paper, we propose Cocktail Edge Caching, that tackles the dynamic popularity and heterogeneity through ensemble learning. Instead of trying to find a single do… ▽ More Edge caching will play a critical role in facilitating the emerging content-rich applications. However, it faces many new challenges, in particular, the highly dynamic content popularity and the heterogeneous caching configurations. In this paper, we propose Cocktail Edge Caching, that tackles the dynamic popularity and heterogeneity through ensemble learning. Instead of trying to find a single dominating caching policy for all the caching scenarios, we employ an ensemble of constituent caching policies and adaptively select the best-performing policy to control the cache. Towards this goal, we first show through formal analysis and experiments that different variations of the LFU and LRU policies have complementary performance in different caching scenarios. We further develop a novel caching algorithm that enhances LFU/LRU with deep recurrent neural network (LSTM) based time-series analysis. Finally, we develop a deep reinforcement learning agent that adaptively combines base caching policies according to their virtual hit ratios on parallel virtual caches. Through extensive experiments driven by real content requests from two large video streaming platforms, we demonstrate that CEC not only consistently outperforms all single policies, but also improves the robustness of them. CEC can be well generalized to different caching scenarios with low computation overheads for deployment. △ Less

Submitted 14 January, 2021; originally announced January 2021.

Comments: INFOCOM 2021

Showing 1–7 of 7 results for author: Zong, T