2023, 01, v.29 116-126
Analysis method for delivery-related tobacco crime based on spatio-temporal data features
Email: liuwei@xupt.edu.cn
DOI: 10.16472/j.chinatobacco.2021.166
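Abstract (translated from the Chinese):

[Objective] To use big data and artificial intelligence techniques to study methods for analyzing the new “Internet + delivery” type of tobacco-related crime on the basis of express delivery big data. [Methods] Chinese word segmentation was used to preprocess the delivery data. A new concept, the “delivery spatio-temporal pattern”, was proposed, and its time-domain and frequency-domain statistics were computed as spatio-temporal features. Feature selection and dimensionality reduction methods were used to derive an optimal feature subset from the spatio-temporal feature set, and the performance of tobacco-related crime analysis models built from the optimal features with different classifier algorithms was compared. [Results] (1) The proposed spatio-temporal features can distinguish tobacco-related from non-tobacco-related delivery data. The random forest and GBDT classifiers performed best overall, with accuracy, positive predictive value, and negative predictive value all above 0.94. (2) The analysis model built on the optimal features achieved prediction results close to those of the model built on the initial features, while the optimal feature data required only 40% of the storage of the original feature data. (3) The optimal features chosen by the CFS feature selection method provide a basis for interpreting the results of the tobacco-related prediction model. (4) Preliminary experiments indicate that the proposed method can meet the real-time requirements of delivery-related tobacco crime analysis. [Conclusion] Spatio-temporal features computed from the “delivery spatio-temporal pattern”, combined with a classifier, can distinguish tobacco-related from non-tobacco-related delivery data.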
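The abstract does not say which time-domain and frequency-domain statistics were used, so the following is only a minimal sketch of the general idea. It assumes the pattern is a daily parcel-count series for one address, and it picks mean, standard deviation, maximum, spectral energy, and dominant period as illustrative features. The function names, the daily binning, and the sample data are all hypothetical and are not taken from the paper.

```python
import cmath
import math
from collections import Counter
from datetime import date, timedelta


def daily_counts(ship_dates, start, days):
    """Count parcels per day over a window of `days` days beginning at `start`."""
    counts = Counter(ship_dates)
    return [counts[start + timedelta(days=i)] for i in range(days)]


def time_domain_features(series):
    """Simple time-domain statistics of a non-empty count series (illustrative choice)."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series) / n
    return {"mean": mean, "std": math.sqrt(variance), "max": max(series)}


def frequency_domain_features(series):
    """Spectral energy and dominant period from a naive DFT of the mean-removed series."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    power = []
    # Only frequencies k = 1 .. n//2 are needed for a real-valued series.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0:
        # Constant (or too short) series: no periodic component to report.
        return {"spectral_energy": 0.0, "dominant_period": None}
    peak_k = power.index(max(power)) + 1
    return {"spectral_energy": total, "dominant_period": n / peak_k}


# Hypothetical 28-day series with a weekly shipping rhythm.
weekly = [3, 2, 1, 0, 0, 1, 2] * 4
features = {**time_domain_features(weekly), **frequency_domain_features(weekly)}
print(features)
```

A feature dictionary like this, computed per address, would form one row of the initial feature pool that the feature selection step then reduces.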

Abstract:

[Objective] This study investigates express-delivery-related counterfeit cigarette crime using big data and artificial intelligence technology. [Methods] In the pre-processing stage, a Chinese word segmentation method was used to process the original data. A novel concept, the “spatio-temporal pattern of delivery and receiving address”, was then introduced; it is a time series built from the delivery and receiving frequency of express packages within a time span. Spatio-temporal data features were computed from this pattern using time-domain and frequency-domain statistics. Next, a CFS (correlation-based feature selection) or PCA (principal component analysis) algorithm was applied to the initial spatio-temporal feature pool to determine an optimal feature cluster. Finally, the express-related counterfeit cigarette crime analysis model was trained and optimized, and the performance of models using different classifiers was compared. [Results] (1) All four classifier models used in the experiments (random forest, logistic regression, gradient boosting decision tree, and a long short-term memory deep neural network) achieved satisfactory accuracy, PPV, and NPV, which implies that the proposed spatio-temporal data features can discriminate cigarette-related from normal express data. The decision-tree-based classifiers, random forest and gradient boosting decision tree, yielded the highest accuracy, PPV, and NPV, all greater than 0.94. (2) Prediction models using the optimal feature cluster determined by CFS or PCA all performed slightly worse than the model using the initial feature pool, while the optimal feature cluster took only 40 percent of the storage space of the initial feature pool. (3) The CFS method used in the experiments can pick out an optimal feature cluster from the initial feature pool, which supports the interpretability of the prediction results generated by the model. (4) Preliminary experimental results showed that the proposed prediction model can meet the real-time requirements of express-related counterfeit cigarette crime analysis. [Conclusion] Classifiers combined with the spatio-temporal data features computed from “the spatio-temporal pattern of delivery and receiving address” can discriminate counterfeit-cigarette-related express packages from normal express packages.
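The three reported metrics follow directly from a binary confusion matrix. As a minimal sketch, assuming labels are encoded as 1 for cigarette-related and 0 for normal parcels (an encoding chosen here for illustration, not stated in the abstract), they can be computed as below. The function names and sample labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def evaluate(y_true, y_pred):
    """Accuracy, positive predictive value (PPV) and negative predictive value (NPV)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # PPV: share of parcels flagged as cigarette-related that truly are.
        "ppv": tp / (tp + fp) if (tp + fp) else None,
        # NPV: share of parcels predicted normal that truly are normal.
        "npv": tn / (tn + fn) if (tn + fn) else None,
    }


# Hypothetical ground truth and predictions for eight parcels.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
print(evaluate(y_true, y_pred))
```

PPV and NPV depend on the share of cigarette-related parcels in the evaluation set, so values such as the reported 0.94 are only comparable between data sets with a similar class balance.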


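Basic information:

DOI: 10.16472/j.chinatobacco.2021.166

CLC number: TP311.13; TP18; D917

Citation:

[1] QIAO Langchao, WANG Jinlu, GAO Baohong, et al. Analysis method for delivery-related tobacco crime based on spatio-temporal data features[J]. Acta Tabacaria Sinica, 2023, 29(01): 116-126. DOI: 10.16472/j.chinatobacco.2021.166.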
