Showing 1–2 of 2 results for author: K, P A

Search v0.5.6 released 2020-02-24

arXiv:2007.08928 [pdf, ps, other]

eess.SP

Low Complexity Reconfigurable Modified FRM Architecture with Full Spectral Utilization for Efficient Channelizers

Authors: Parvathi A. K., V. Sakthivel

Abstract: This paper proposes a design of a low-complexity, reconfigurable, narrow transition band (NTB) filter bank (FB). In our proposed Modified Frequency Response Masking (ModFRM) architecture, the modal filter and complementary filter of the conventional FRM approach are replaced by a power-complementary, linear-phase FB. Additionally, a new masking strategy is proposed by which an M-channel FB can be designed by alternately masking even channels and odd channels. With this masking strategy, it is only necessary to alternately mask even and odd channels with the two masking filters while modulating them over the multiple spectral replicas appearing in [0,2$π$] to generate a uniform ModFRM FB. Reconfigurability of the filter bank can also be achieved by adjusting the interpolation values appropriately to obtain more masking responses. To reduce the overall implementation complexity, the masking filters are optimized using the interpolated FIR (IFIR) technique. The results indicate that the proposed method requires substantially fewer multipliers than the reconfigurable FBs existing in the literature. Finally, non-uniform FBs are generated from the uniform FB by combining nearby channels. The proposed non-uniform ModFRM FB is used to extract different communication standards in a software defined radio (SDR) channelizer. When more channels are to be extracted, the proposed scheme achieves lower hardware complexity than other filter-bank-based SDR channelizers.

Submitted 17 July, 2020; originally announced July 2020.

Comments: 19 pages, 19 figures

arXiv:1906.12170 [pdf, other]

cs.CV cs.LG cs.SD eess.AS eess.IV

LipReading with 3D-2D-CNN BLSTM-HMM and word-CTC models

Authors: Dilip Kumar Margam, Rohith Aralikatti, Tanay Sharma, Abhinav Thanda, Pujitha A K, Sharad Roy, Shankar M Venkatesan

Abstract: In recent years, deep learning based machine lipreading has gained prominence. To this end, several architectures such as LipNet, LCANet and others have been proposed which perform extremely well compared to traditional lipreading DNN-HMM hybrid systems trained on DCT features. In this work, we propose a simpler architecture: a 3D-2D-CNN-BLSTM network with a bottleneck layer. We also present an analysis of two different approaches for lipreading on this architecture. In the first approach, the 3D-2D-CNN-BLSTM network is trained with CTC loss on characters (ch-CTC). A BLSTM-HMM model is then trained on bottleneck lip features (extracted from the 3D-2D-CNN-BLSTM ch-CTC network) in a traditional ASR training pipeline. In the second approach, the same 3D-2D-CNN-BLSTM network is trained with CTC loss on word labels (w-CTC). The first approach shows that bottleneck features perform better than DCT features. Using the second approach on the Grid corpus seen-speaker test set, we report $1.3\%$ WER, a $55\%$ improvement relative to LCANet. On the unseen-speaker test set we report $8.6\%$ WER, which is a $24.5\%$ improvement relative to LipNet. We also verify the method on a second dataset of $81$ speakers which we collected. Finally, we discuss the effect of feature duplication on BLSTM-HMM model performance.

Submitted 25 June, 2019; originally announced June 2019.

Comments: Submitted to Interspeech 2019
