Chenwei Zhang

I have been an Applied Scientist at Amazon since August 2019, working with Xin Luna Dong on the Product Graph team. I received my Ph.D. from the University of Illinois at Chicago, where I was advised by Philip S. Yu. My research interests lie in data mining and natural language processing, in particular text mining and extracting structured information from heterogeneous information sources.

Contact: cwzhang910 AT gmail D0T com
Google Scholar | ResearchGate | LinkedIn

 

Research Topics:

• Natural Language Processing

• Text/Graph Mining

• Knowledge Graph

What’s New

[2021/01] One paper on Minimally-Supervised Text Classification is accepted by TheWebConf 2021.

[2020/11] One paper on Low-Shot NLP is accepted by CogMI 2020.

[2020/10] Accepted the invitation to serve on the Program Committee of NAACL-HLT 2021.

[2020/09] Two papers on Few-Shot NLU and Self-Supervised OpenIE are accepted by EMNLP 2020.

[2020/08] I am co-organizing the KR2ML workshop at NeurIPS 2020. Call for papers is out now!

See all

Education

[2014/08 - 2019/05] Ph.D. in Computer Science, University of Illinois at Chicago. Advisor: Prof. Philip S. Yu

[2010/09 - 2014/05] B.Eng. in Computer Science and Technology, Southwest University, China.

Work Experience

[2019/08 - Now] Applied Scientist at Amazon, Seattle, WA

[2017/08 - 2019/05] Research Assistant at UIC Big Data and Social Computing Lab, Chicago, IL

[2018/05 - 2018/08] Research Intern at Tencent Medical AI Lab, Palo Alto, CA

[2017/05 - 2017/08] Research Intern at Baidu Research Big Data Lab, Sunnyvale, CA

[2016/05 - 2016/07] Research Intern at Baidu Research Big Data Lab, Sunnyvale, CA

[2015/05 - 2015/08] Research Intern at Baidu Research Big Data Lab, Sunnyvale, CA


Publications (By Year | By Topic)

2021

  1. TheWebConf
    Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks Xinyang Zhang, Chenwei Zhang, Xin Luna Dong, Jingbo Shang, and Jiawei Han In Proceedings of the Web Conference 2021 [Abstract] [BibTex]
    Text categorization is an essential task in Web content analysis. Considering the ever-evolving Web data and new emerging categories, instead of the laborious supervised setting, in this paper, we focus on the minimally-supervised setting that aims to categorize documents effectively, with a couple of seed documents annotated per category. We recognize that texts collected from the Web are often structure-rich, i.e., accompanied by various metadata. One can easily organize the corpus into a text-rich network, joining raw text documents with document attributes, high-quality phrases, label surface names as nodes, and their associations as edges. Such a network provides a holistic view of the corpus’ heterogeneous data sources and enables a joint optimization for network-based analysis and deep textual model training. We therefore propose a novel framework for minimally supervised categorization by learning from the text-rich network. Specifically, we jointly train two modules with different inductive biases – a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning. Each module generates pseudo training labels from the unlabeled document set, and both modules mutually enhance each other by co-training using pooled pseudo labels. We test our model on two real-world datasets. On the challenging e-commerce product categorization dataset with 683 categories, our experiments show that given only three seed documents per category, our framework can achieve an accuracy of about 92%, significantly outperforming all compared methods; our accuracy is only less than 2% away from the supervised BERT model trained on about 50K labeled documents.
    @inproceedings{zhang2021minimally,
      abbr = {TheWebConf},
      topic = {Graph Mining},
      title = {Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks},
      author = {Zhang, Xinyang and Zhang, Chenwei and Dong, Xin Luna and Shang, Jingbo and Han, Jiawei},
      booktitle = {Proceedings of the Web Conference 2021},
      year = {2021},
      pdf = {https://arxiv.org/pdf/2102.11479.pdf}
    }
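
    A minimal sketch of the co-training idea summarized in the abstract above, not the paper's implementation: two classifiers with different inductive biases label the unlabeled documents, their pseudo labels are pooled, and only confident pooled labels are added back for the next round. The two bag-of-words views, the function name co_train, and the confidence threshold are illustrative assumptions standing in for the paper's text-analysis and network-learning modules.

    # Illustrative co-training sketch (assumption: two simple bag-of-words views
    # stand in for the paper's text module and network module).
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    def co_train(seed_texts, seed_labels, unlabeled_texts, rounds=3, conf=0.9):
        all_texts = list(seed_texts) + list(unlabeled_texts)
        views = [CountVectorizer(), TfidfVectorizer()]       # two inductive biases
        X_views = [v.fit_transform(all_texts) for v in views]
        n_seed = len(seed_texts)
        labels = np.full(len(all_texts), -1)
        labels[:n_seed] = seed_labels
        for _ in range(rounds):
            pooled = []
            for X in X_views:
                clf = LogisticRegression(max_iter=1000)
                mask = labels >= 0                            # currently labeled docs
                clf.fit(X[mask], labels[mask])
                pooled.append(clf.predict_proba(X[n_seed:]))
            classes = clf.classes_                            # identical for both views
            avg = np.mean(pooled, axis=0)                     # pool the pseudo labels
            confident = avg.max(axis=1) >= conf
            labels[n_seed:][confident] = classes[avg.argmax(axis=1)[confident]]
        return labels[n_seed:]                                # -1 means still unassigned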
    

2020

  1. EMNLP
    SelfORE: Self-supervised Relational Feature Learning for Open Relation Extraction Xuming Hu, Chenwei Zhang, Yusong Xu, Lijie Wen, and Philip S. Yu In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing 2020 [Abstract] [BibTex] [Code]
    Open relation extraction is the task of extracting open-domain relation facts from natural language sentences. Existing works either utilize heuristics or distantly-supervised annotations to train a supervised classifier over pre-defined relations, or adopt unsupervised methods with additional assumptions that have less discriminative power. In this work, we propose a self-supervised framework named SelfORE, which exploits weak, self-supervised signals by leveraging a large pretrained language model for adaptive clustering on contextualized relational features, and bootstraps the self-supervised signals by improving contextualized features in relation classification. Experimental results on three datasets show the effectiveness and robustness of SelfORE on open-domain relation extraction when compared with competitive baselines.
    @inproceedings{hu2020selfore,
      abbr = {EMNLP},
      topic = {NLP},
      title = {SelfORE: Self-supervised Relational Feature Learning for Open Relation Extraction},
      author = {Hu, Xuming and Zhang, Chenwei and Xu, Yusong and Wen, Lijie and Yu, Philip S.},
      booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing},
      year = {2020},
      pdf = {https://arxiv.org/pdf/2004.02438.pdf},
      code = {https://github.com/THU-BPM/SelfORE}
    }
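
    The SelfORE abstract above centers on clustering contextualized relational features to obtain weak relation labels. Below is a much-simplified, hedged sketch of that single step (it is not the released SelfORE code linked above): sentence-level mean pooling over BERT outputs replaces the paper's entity-pair representations, and the model name, pooling choice, and cluster count are assumptions.

    # Sketch: pseudo relation labels from clustering pretrained-LM features.
    import torch
    from sklearn.cluster import KMeans
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # assumed encoder
    encoder = AutoModel.from_pretrained("bert-base-uncased")

    def relational_features(sentences):
        """Mean-pooled contextual embeddings as a crude relational feature."""
        batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            hidden = encoder(**batch).last_hidden_state              # (batch, seq, dim)
        mask = batch["attention_mask"].unsqueeze(-1)
        return ((hidden * mask).sum(1) / mask.sum(1)).numpy()        # (batch, dim)

    def pseudo_relation_labels(sentences, n_relations=10):
        """Cluster assignments act as weak, self-supervised relation labels."""
        feats = relational_features(sentences)
        return KMeans(n_clusters=n_relations, n_init=10).fit_predict(feats)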
    
  2. EMNLP
    Dynamic Semantic Matching and Aggregation Network for Few-shot Intent Detection Hoang Nguyen, Chenwei Zhang, Congying Xia, and Philip S. Yu In Findings of the 2020 Conference on Empirical Methods in Natural Language Processing 2020 [Abstract] [BibTex] [Code]
    Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances. Although recent works demonstrate that multi-level matching plays an important role in transferring learned knowledge from seen training classes to novel testing classes, they rely on a static similarity measure and overly fine-grained matching components. These limitations inhibit generalizing capability towards Generalized Few-shot Learning settings where both seen and novel classes are co-existent. In this paper, we propose a novel Semantic Matching and Aggregation Network where semantic components are distilled from utterances via multi-head self-attention with additional dynamic regularization constraints. These semantic components capture high-level information, resulting in more effective matching between instances. Our multi-perspective matching method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances. We also propose a more challenging evaluation setting that considers classification on the joint all-class label space. Extensive experimental results demonstrate the effectiveness of our method.
    @inproceedings{nguyen2020semantic,
      abbr = {EMNLP},
      topic = {NLP},
      title = {Dynamic Semantic Matching and Aggregation Network for Few-shot Intent Detection},
      author = {Nguyen, Hoang and Zhang, Chenwei and Xia, Congying and Yu, Philip S.},
      booktitle = {Findings of the 2020 Conference on Empirical Methods in Natural Language Processing},
      year = {2020},
      pdf = {https://arxiv.org/pdf/2010.02481.pdf},
      code = {https://github.com/nhhoang96/Semantic_Matching}
    }
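
    As a companion to the abstract above, here is a bare-bones metric-based few-shot classifier: average the few labeled (support) utterance embeddings per intent and assign a query to the nearest prototype. This is only the generic matching baseline the paper improves upon; the dynamic, multi-perspective matching network itself is not reproduced here, and the fixed cosine similarity is an assumption.

    # Prototype matching sketch for few-shot intent detection (illustrative only).
    import numpy as np

    def class_prototypes(support_embeddings, support_labels):
        """One prototype per intent: the mean of its few support embeddings."""
        protos = {}
        for label in set(support_labels):
            members = [e for e, y in zip(support_embeddings, support_labels) if y == label]
            protos[label] = np.mean(members, axis=0)
        return protos

    def predict_intent(query_embedding, protos):
        """Pick the intent whose prototype is most similar to the query."""
        def cos(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
        return max(protos, key=lambda label: cos(query_embedding, protos[label]))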
    
  3. CogMI
    Low-shot Learning in Natural Language Processing Congying Xia, Chenwei Zhang, Jiawei Zhang, Tingting Liang, Hao Peng, and Philip S. Yu In Proceedings of the Second IEEE International Conference on Cognitive Machine Intelligence: Vision Track 2020 [BibTex]
    @inproceedings{xia2020lowshot,
      abbr = {CogMI},
      topic = {NLP},
      title = {Low-shot Learning in Natural Language Processing},
      author = {Xia, Congying and Zhang, Chenwei and Zhang, Jiawei and Liang, Tingting and Peng, Hao and Yu, Philip S.},
      booktitle = {Proceedings of the Second IEEE International Conference on Cognitive Machine Intelligence: Vision Track},
      year = {2020}
    }
    
  4. TKDE
    KGGen: A Generative Approach for Incipient Knowledge Graph Population Hao Chen, Chenwei Zhang, Jun Li, Philip S. Yu, and Ning Jing IEEE Transactions on Knowledge and Data Engineering 2020 [HTML] [Abstract] [BibTex] [Code]
    Knowledge graphs are becoming an indispensable resource for numerous AI applications. However, a knowledge graph often suffers from incompleteness. Building a complete, high-quality knowledge graph is time-consuming and requires significant human annotation effort. In this paper, we study the Knowledge Graph Population task, which aims at extending the scale of structured knowledge, with a special focus on reducing data preparation and annotation efforts. Previous works, mainly based on discriminative methods, build classifiers and verify candidate triplets extracted from texts, which heavily rely on the quality of data collection and the co-occurrence of entities in the text. We introduce a generative perspective to approach this task. A generative model, KGGEN, is proposed, which samples from the learned data distribution for each relation and can generate triplets regardless of entity pair co-occurrence in the corpus. To further improve the generation quality while alleviating human annotation efforts, adversarial learning is adopted to not only encourage generating high-quality triplets, but also give the model the ability to automatically assess the generation quality. Quantitative and qualitative experimental results on two real-world generic knowledge graphs show that KGGEN generates novel and meaningful triplets and requires less human annotation compared with state-of-the-art approaches.
    @article{chen2020kggen,
      abbr = {TKDE},
      topic = {Knowledge Graph},
      title = {KGGen: A Generative Approach for Incipient Knowledge Graph Population},
      author = {Chen, Hao and Zhang, Chenwei and Li, Jun and Yu, Philip S. and Jing, Ning},
      journal = {IEEE Transactions on Knowledge and Data Engineering},
      year = {2020},
      publisher = {IEEE},
      html = {https://ieeexplore.ieee.org/abstract/document/9158381},
      code = {https://github.com/hchen118/KGGen-master}
    }
    
  5. KDD
    Octet: Online Catalog Taxonomy Enrichment with Self-Supervision Yuning Mao, Tong Zhao, Andrey Kan, Chenwei Zhang, Xin Luna Dong, Christos Faloutsos, and Jiawei Han In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2020 [Abstract] [BibTex]
    Taxonomies have found wide applications in various domains, especially online for item categorization, browsing, and search. Despite the prevalent use of online catalog taxonomies, most of them in practice are maintained by humans, which is labor-intensive and difficult to scale. While taxonomy construction from scratch is considerably studied in the literature, how to effectively enrich existing incomplete taxonomies remains an open yet important research question. Taxonomy enrichment not only requires the robustness to deal with emerging terms but also the consistency between existing taxonomy structure and new term attachment. In this paper, we present a self-supervised end-to-end framework, Octet, for Online Catalog Taxonomy EnrichmenT. Octet leverages heterogeneous information unique to online catalog taxonomies such as user queries, items, and their relations to the taxonomy nodes while requiring no other supervision than the existing taxonomies. We propose to distantly train a sequence labeling model for term extraction and employ graph neural networks (GNNs) to capture the taxonomy structure as well as the query-item-taxonomy interactions for term attachment. Extensive experiments in different online domains demonstrate the superiority of Octet over state-of-the-art methods via both automatic and human evaluations. Notably, Octet enriches an online catalog taxonomy in production to 2 times larger in the open-world evaluation.
    @inproceedings{mao2020octet,
      abbr = {KDD},
      topic = {Graph Mining},
      title = {Octet: Online Catalog Taxonomy Enrichment with Self-Supervision},
      author = {Mao, Yuning and Zhao, Tong and Kan, Andrey and Zhang, Chenwei and Dong, Xin Luna and Faloutsos, Christos and Han, Jiawei},
      booktitle = {Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
      pages = {2247--2257},
      year = {2020},
      pdf = {https://arxiv.org/pdf/2006.10276.pdf}
    }
    
  6. KDD
    AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types Xin Luna Dong, Xiang He, Andrey Kan, Xian Li, Yan Liang, Jun Ma, Yifan Ethan Xu, Chenwei Zhang, Tong Zhao, Gabriel Blanco Saldana, and others In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2020 [Abstract] [BibTex] [Media]
    Can one build a knowledge graph (KG) for all products in the world? Knowledge graphs have firmly established themselves as valuable sources of information for search and question answering, and it is natural to wonder if a KG can contain information about products offered at online retail sites. There have been several successful examples of generic KGs, but organizing information about products poses many additional challenges, including sparsity and noise of structured data for products, complexity of the domain with millions of product types and thousands of attributes, heterogeneity across large number of categories, as well as large and constantly growing number of products. We describe AutoKnow, our automatic (self-driving) system that addresses these challenges. The system includes a suite of novel techniques for taxonomy construction, product property identification, knowledge extraction, anomaly detection, and synonym discovery. AutoKnow is (a) automatic, requiring little human intervention, (b) multi-scalable, scalable in multiple dimensions (many domains, many products, and many attributes), and (c) integrative, exploiting rich customer behavior logs. AutoKnow has been operational in collecting product knowledge for over 11K product types.
    @inproceedings{dong2020autoknow,
      abbr = {KDD},
      topic = {Knowledge Graph},
      title = {AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types},
      author = {Dong, Xin Luna and He, Xiang and Kan, Andrey and Li, Xian and Liang, Yan and Ma, Jun and Xu, Yifan Ethan and Zhang, Chenwei and Zhao, Tong and Blanco Saldana, Gabriel and others},
      booktitle = {Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
      pages = {2724--2734},
      year = {2020},
      pdf = {https://arxiv.org/pdf/2006.13473.pdf},
      media = {https://www.amazon.science/blog/building-product-graphs-automatically}
    }
    
  7. IJCAI
    Entity Synonym Discovery via Multipiece Bilateral Context Matching Chenwei Zhang, Yaliang Li, Nan Du, Wei Fan, and Philip S. Yu In IJCAI 2020 [Abstract] [BibTex] [Code]
    Being able to automatically discover synonymous entities in an open-world setting benefits various tasks such as entity disambiguation or knowledge graph canonicalization. Existing works either only utilize entity features, or rely on structured annotations from a single piece of context where the entity is mentioned. To leverage diverse contexts where entities are mentioned, in this paper, we generalize the distributional hypothesis to a multi-context setting and propose a synonym discovery framework that detects entity synonyms from free-text corpora with considerations on effectiveness and robustness. As one of the key components in synonym discovery, we introduce a neural network model SYNONYMNET to determine whether or not two given entities are synonymous with each other. Instead of using entity features, SYNONYMNET makes use of multiple pieces of contexts in which the entity is mentioned, and compares the context-level similarity via a bilateral matching schema. Experimental results demonstrate that the proposed model is able to detect synonym sets that are not observed during training on both generic and domain-specific datasets: Wiki+Freebase, PubMed+UMLS, and MedBook+MKG, with up to 4.16% improvement in terms of Area Under the Curve and 3.19% in terms of Mean Average Precision compared to the best baseline method. Code and data are available.
    @inproceedings{zhang2020entity,
      abbr = {IJCAI},
      topic = {Knowledge Graph},
      title = {Entity Synonym Discovery via Multipiece Bilateral Context Matching},
      author = {Zhang, Chenwei and Li, Yaliang and Du, Nan and Fan, Wei and Yu, Philip S.},
      booktitle = {IJCAI},
      year = {2020},
      pdf = {https://arxiv.org/pdf/1901.00056.pdf},
      code = {https://github.com/czhang99/SynonymNet}
    }
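
    The abstract above hinges on comparing two entities through multiple contexts rather than through entity features. The toy sketch below captures that bilateral intuition with fixed cosine similarity over pre-computed context embeddings; the released SynonymNet model (linked above) instead learns the context encoder and the matching end to end.

    # Bilateral multi-context matching sketch (illustrative, not SynonymNet itself).
    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def bilateral_match_score(contexts_a, contexts_b):
        """contexts_a, contexts_b: lists of 1-D embeddings of sentences mentioning each entity."""
        sim = np.array([[cosine(a, b) for b in contexts_b] for a in contexts_a])
        a_to_b = sim.max(axis=1).mean()      # best match in B for each context of A
        b_to_a = sim.max(axis=0).mean()      # best match in A for each context of B
        return (a_to_b + b_to_a) / 2.0       # high score suggests the entities are synonymous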
    
  8. WWWJ
    Generative temporal link prediction via self-tokenized sequence modeling Yue Wang, Chenwei Zhang, Shen Wang, Philip S. Yu, Lu Bai, Lixin Cui, and Guandong Xu World Wide Web 2020 [Abstract] [BibTex]
    We formalize networks with evolving structures as temporal networks and propose a generative link prediction model, Generative Link Sequence Modeling (GLSM), to predict future links for temporal networks. GLSM captures the temporal link formation patterns from the observed links with a sequence modeling framework and has the ability to generate the emerging links by inferring from the probability distribution on the potential future links. To avoid overfitting caused by treating each link as a unique token, we propose a self-tokenization mechanism to transform each raw link in the network to an abstract aggregation token automatically. The self-tokenization is seamlessly integrated into the sequence modeling framework, which allows the proposed GLSM model to have the generalization capability to discover link formation patterns beyond raw link sequences. We compare GLSM with the existing state-of-the-art methods on five real-world datasets. The experimental results demonstrate that GLSM obtains future positive links effectively in a generative fashion while achieving the best performance (2-10% improvements on AUC) among the alternatives.
    @article{wang2020generative,
      abbr = {WWWJ},
      topic = {Graph Mining},
      author = {Wang, Yue and Zhang, Chenwei and Wang, Shen and Yu, Philip S. and Bai, Lu and Cui, Lixin and Xu, Guandong},
      journal = {World Wide Web},
      number = {4},
      pages = {2471--2488},
      title = {Generative temporal link prediction via self-tokenized sequence modeling},
      url = {https://doi.org/10.1007/s11280-020-00821-y},
      volume = {23},
      year = {2020},
      pdf = {https://arxiv.org/pdf/1911.11486.pdf}
    }
    
  9. HEALTHINF
    Med2Meta: Learning representations of medical concepts with meta-embeddings Shaika Chowdhury, Chenwei Zhang, Philip S. Yu, and Yuan Luo In 13th International Conference on Health Informatics, HEALTHINF 2020 - Part of 13th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2020 [Abstract] [BibTex]
    Distributed representations of medical concepts have been used to support downstream clinical tasks recently. Electronic Health Records (EHR) capture different aspects of patients’ hospital encounters and serve as a rich source for augmenting clinical decision making by learning robust medical concept embeddings. However, the same medical concept can be recorded in different modalities (e.g., clinical notes, lab results), with each capturing salient information unique to that modality, and a holistic representation calls for relevant feature ensemble from all information sources. We hypothesize that representations learned from heterogeneous data types would lead to performance enhancement on various clinical informatics and predictive modeling tasks. To this end, our proposed approach makes use of meta-embeddings, embeddings aggregated from learned embeddings. Firstly, modality-specific embeddings for each medical concept are learned with graph autoencoders. The ensemble of all the embeddings is then modeled as a meta-embedding learning problem to incorporate their correlating and complementary information through a joint reconstruction. Empirical results of our model on both quantitative and qualitative clinical evaluations have shown improvements over state-of-the-art embedding models, thus validating our hypothesis.
    @inproceedings{chowdhury2020med2meta,
      abbr = {HEALTHINF},
      topic = {Knowledge Graph},
      title = {Med2Meta: Learning representations of medical concepts with meta-embeddings},
      author = {Chowdhury, Shaika and Zhang, Chenwei and Yu, Philip S. and Luo, Yuan},
      booktitle = {13th International Conference on Health Informatics, HEALTHINF 2020-Part of 13th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2020},
      pages = {369--376},
      year = {2020},
      organization = {SciTePress},
      pdf = {https://arxiv.org/pdf/1912.03366.pdf}
    }
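
    The abstract above combines modality-specific embeddings through a joint reconstruction. The sketch below shows that aggregation step in its simplest form, assuming the per-modality concept embeddings have already been learned (the paper obtains them with graph autoencoders, which is omitted here); the dimensions, optimizer settings, and plain MSE objective are assumptions.

    # Joint-reconstruction meta-embedding sketch (illustrative simplification).
    import torch
    import torch.nn as nn

    class MetaEmbedding(nn.Module):
        """One trainable meta-embedding per concept, one decoder per modality."""
        def __init__(self, n_concepts, meta_dim, modality_dims):
            super().__init__()
            self.meta = nn.Embedding(n_concepts, meta_dim)
            self.decoders = nn.ModuleList(nn.Linear(meta_dim, d) for d in modality_dims)

        def forward(self, concept_ids):
            z = self.meta(concept_ids)
            return [decoder(z) for decoder in self.decoders]

    def fit_meta(modality_embeddings, meta_dim=64, epochs=200, lr=1e-2):
        """modality_embeddings: list of (n_concepts, d_m) float tensors, one per modality."""
        n_concepts = modality_embeddings[0].shape[0]
        model = MetaEmbedding(n_concepts, meta_dim, [e.shape[1] for e in modality_embeddings])
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        concept_ids = torch.arange(n_concepts)
        for _ in range(epochs):
            optimizer.zero_grad()
            reconstructions = model(concept_ids)
            # reconstruct every modality view from the shared meta-embedding
            loss = sum(nn.functional.mse_loss(r, e) for r, e in zip(reconstructions, modality_embeddings))
            loss.backward()
            optimizer.step()
        return model.meta.weight.detach()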
    
  10. arXiv
    Semi-supervised Relation Extraction via Incremental Meta Self-Training Xuming Hu, Fukun Ma, Chenyao Liu, Chenwei Zhang, Lijie Wen, and Philip S. Yu arXiv preprint 2020 [Abstract] [BibTex] [Code]
    To reduce the human effort of obtaining large-scale annotations, Semi-Supervised Relation Extraction methods aim to leverage unlabeled data in addition to learning from limited samples. Existing self-training methods suffer from the gradual drift problem, where noisy pseudo labels on unlabeled data are incorporated during training. To alleviate the noise in pseudo labels, we propose a method called MetaSRE, where a Relation Label Generation Network generates accurate quality assessment on pseudo labels by (meta) learning from the successful and failed attempts on Relation Classification as an additional meta-objective. To reduce the influence of noisy pseudo labels, MetaSRE adopts a pseudo label selection and exploitation scheme which assesses pseudo label quality on unlabeled samples and only exploits high-quality pseudo labels in a self-training fashion to incrementally augment labeled samples for both robustness and accuracy. Experimental results on two public datasets demonstrate the effectiveness of the proposed approach. Source code is available.
    @article{hu2020semi,
      abbr = {arXiv},
      topic = {NLP},
      title = {Semi-supervised Relation Extraction via Incremental Meta Self-Training},
      author = {Hu, Xuming and Ma, Fukun and Liu, Chenyao and Zhang, Chenwei and Wen, Lijie and Yu, Philip S.},
      journal = {arXiv preprint},
      year = {2020},
      pdf = {https://arxiv.org/pdf/2010.16410.pdf},
      code = {https://github.com/THU-BPM/MetaSRE}
    }
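
    The abstract above describes selecting only high-quality pseudo labels and incrementally augmenting the labeled set. The sketch below keeps that loop but replaces MetaSRE's learned, meta-trained quality assessment with a plain confidence threshold, so it should be read as a simplified baseline rather than the method itself; the classifier and threshold value are assumptions.

    # Self-training with pseudo-label filtering (illustrative simplification).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_train(X_labeled, y_labeled, X_unlabeled, rounds=5, threshold=0.95):
        X_labeled = np.asarray(X_labeled, dtype=float)
        y_labeled = np.asarray(y_labeled)
        X_unlabeled = np.asarray(X_unlabeled, dtype=float)
        clf = LogisticRegression(max_iter=1000)
        for _ in range(rounds):
            clf.fit(X_labeled, y_labeled)
            if len(X_unlabeled) == 0:
                break
            proba = clf.predict_proba(X_unlabeled)
            keep = proba.max(axis=1) >= threshold             # crude quality filter
            if not keep.any():
                break
            pseudo = clf.classes_[proba.argmax(axis=1)]
            # incrementally augment the labeled set with confident pseudo labels
            X_labeled = np.vstack([X_labeled, X_unlabeled[keep]])
            y_labeled = np.concatenate([y_labeled, pseudo[keep]])
            X_unlabeled = X_unlabeled[~keep]
        return clf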
    
  11. arXiv
    CG-BERT: Conditional Text Generation with BERT for Generalized Few-shot Intent Detection Congying Xia, Chenwei Zhang, Hoang Nguyen, Jiawei Zhang, and Philip S. Yu arXiv preprint arXiv:2004.01881 2020 [Abstract] [BibTex]
    In this paper, we formulate a more realistic and difficult problem setup for the intent detection task in natural language understanding, namely Generalized Few-Shot Intent Detection (GFSID). GFSID aims to discriminate a joint label space consisting of both existing intents which have enough labeled data and novel intents which only have a few examples for each class. To approach this problem, we propose a novel model, Conditional Text Generation with BERT (CG-BERT). CG-BERT effectively leverages a large pre-trained language model to generate text conditioned on the intent label. By modeling the utterance distribution with variational inference, CG-BERT can generate diverse utterances for the novel intents even with only a few utterances available. Experimental results show that CG-BERT achieves state-of-the-art performance on the GFSID task with 1-shot and 5-shot settings on two real-world datasets.
    @article{xia2020cg,
      abbr = {arXiv},
      topic = {NLP},
      title = {CG-BERT: Conditional Text Generation with BERT for Generalized Few-shot Intent Detection},
      author = {Xia, Congying and Zhang, Chenwei and Nguyen, Hoang and Zhang, Jiawei and Yu, Philip S.},
      journal = {arXiv preprint arXiv:2004.01881},
      year = {2020},
      pdf = {https://arxiv.org/pdf/2004.01881.pdf}
    }
    
  12. arXiv
    Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering Ye Liu, Shaika Chowdhury, Chenwei Zhang, Cornelia Caragea, and Philip S. Yu arXiv preprint arXiv:2008.02434 2020 [Abstract] [BibTex]
    Healthcare question answering assistance aims to provide customer healthcare information, which widely appears in both Web and mobile Internet. The questions usually require the assistance to have proficient healthcare background knowledge as well as the reasoning ability on the knowledge. Recently a challenge involving complex healthcare reasoning, HeadQA dataset, has been proposed, which contains multiple choice questions authorized for the public healthcare specialization exam. Unlike most other QA tasks that focus on linguistic understanding, HeadQA requires deeper reasoning involving not only knowledge extraction, but also complex reasoning with healthcare knowledge. These questions are the most challenging for current QA systems, and the current performance of the state-of-the-art method is slightly better than a random guess. In order to solve this challenging task, we present a Multi-step reasoning with Knowledge extraction framework (MurKe). The proposed framework first extracts the healthcare knowledge as supporting documents from the large corpus. In order to find the reasoning chain and choose the correct answer, MurKe iterates between selecting the supporting documents, reformulating the query representation using the supporting documents and getting entailment score for each choice using the entailment model. The reformulation module leverages selected documents for missing evidence, which maintains interpretability. Moreover, we are striving to make full use of off-the-shelf pretrained models. With less trainable weight, the pretrained model can easily adapt to healthcare tasks with limited training samples. From the experimental results and ablation study, our system is able to outperform several strong baselines on the HeadQA dataset.
    @article{liu2020interpretable,
      abbr = {arXiv},
      topic = {NLP},
      title = {Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering},
      author = {Liu, Ye and Chowdhury, Shaika and Zhang, Chenwei and Caragea, Cornelia and Yu, Philip S.},
      journal = {arXiv preprint arXiv:2008.02434},
      year = {2020},
      pdf = {https://arxiv.org/pdf/2008.02434.pdf}
    }
    

2019

  1. Thesis
    Structured Knowledge Discovery from Massive Text Corpus Chenwei Zhang 2019 [BibTex]
    @phdthesis{zhang2019structured,
      abbr = {Thesis},
      title = {Structured Knowledge Discovery from Massive Text Corpus},
      author = {Zhang, Chenwei},
      year = {2019},
      school = {University of Illinois at Chicago},
      pdf = {https://arxiv.org/pdf/1908.01837.pdf}
    }
    
  2. ACL
    Joint Slot Filling and Intent Detection via Capsule Neural Networks Chenwei Zhang, Yaliang Li, Nan Du, Wei Fan, and Philip S. Yu In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019 [BibTex] [Poster] [Code]
    @inproceedings{zhang2019joint,
      abbr = {ACL},
      topic = {NLP},
      title = {Joint Slot Filling and Intent Detection via Capsule Neural Networks},
      author = {Zhang, Chenwei and Li, Yaliang and Du, Nan and Fan, Wei and Yu, Philip S.},
      booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
      pages = {5259--5267},
      year = {2019},
      pdf = {https://arxiv.org/pdf/1812.09471.pdf},
      poster = {https://drive.google.com/file/d/1rZpP-4WY7T8AtARXde7qZd5enV53yNOL/view},
      code = {https://github.com/czhang99/Capsule-NLU}
    }
    
  3. ACL
    Multi-grained Named Entity Recognition Congying Xia, Chenwei Zhang, Tao Yang, Yaliang Li, Nan Du, Xian Wu, Wei Fan, Fenglong Ma, and Philip S. Yu In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019 [BibTex] [Code]
    @inproceedings{xia2019multi,
      abbr = {ACL},
      topic = {NLP},
      title = {Multi-grained Named Entity Recognition},
      author = {Xia, Congying and Zhang, Chenwei and Yang, Tao and Li, Yaliang and Du, Nan and Wu, Xian and Fan, Wei and Ma, Fenglong and Yu, Philip S.},
      booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
      pages = {1430--1440},
      year = {2019},
      pdf = {https://arxiv.org/pdf/1906.08449.pdf},
      code = {https://github.com/congyingxia/Multi-Grained-NER}
    }
    
  4. CIKM
    Generative question refinement with deep reinforcement learning in retrieval-based QA system Ye Liu, Chenwei Zhang, Xiaohui Yan, Yi Chang, and Philip S. Yu In Proceedings of the 28th ACM International Conference on Information and Knowledge Management 2019 [BibTex]
    @inproceedings{liu2019generative,
      abbr = {CIKM},
      topic = {NLP},
      title = {Generative question refinement with deep reinforcement learning in retrieval-based QA system},
      author = {Liu, Ye and Zhang, Chenwei and Yan, Xiaohui and Chang, Yi and Yu, Philip S.},
      booktitle = {Proceedings of the 28th ACM International Conference on Information and Knowledge Management},
      pages = {1643--1652},
      year = {2019},
      pdf = {https://arxiv.org/pdf/1908.05604.pdf}
    }
    
  5. ICDM
    Competitive Multi-Agent Deep Reinforcement Learning with Counterfactual Thinking Yue Wang, Yao Wan, Chenwei Zhang, Lu Bai, Lixin Cui, and Philip S. Yu In 2019 IEEE International Conference on Data Mining (ICDM) 2019 [BibTex]
    @inproceedings{wang2019competitive,
      abbr = {ICDM},
      topic = {ML & Misc.},
      title = {Competitive Multi-Agent Deep Reinforcement Learning with Counterfactual Thinking},
      author = {Wang, Yue and Wan, Yao and Zhang, Chenwei and Bai, Lu and Cui, Lixin and Yu, Philip S.},
      booktitle = {2019 IEEE International Conference on Data Mining (ICDM)},
      pages = {1366--1371},
      year = {2019},
      organization = {IEEE},
      pdf = {https://arxiv.org/pdf/1908.04573.pdf}
    }
    
  6. IJCNN
    Missing entity synergistic completion across multiple isomeric online knowledge libraries Bowen Dong, Jiawei Zhang, Chenwei Zhang, Yang Yang, and Philip S. Yu In 2019 International Joint Conference on Neural Networks (IJCNN) 2019 [BibTex]
    @inproceedings{dong2019missing,
      abbr = {IJCNN},
      topic = {Knowledge Graph},
      title = {Missing entity synergistic completion across multiple isomeric online knowledge libraries},
      author = {Dong, Bowen and Zhang, Jiawei and Zhang, Chenwei and Yang, Yang and Yu, Philip S.},
      booktitle = {2019 International Joint Conference on Neural Networks (IJCNN)},
      pages = {1--8},
      year = {2019},
      organization = {IEEE},
      pdf = {https://arxiv.org/pdf/1905.06365.pdf}
    }
    
  7. WWW
    MCVAE: Margin-based Conditional Variational Autoencoder for Relation Classification and Pattern Generation Fenglong Ma, Yaliang Li, Chenwei Zhang, Jing Gao, Nan Du, and Wei Fan In The World Wide Web Conference 2019 [BibTex]
    @inproceedings{ma2019mcvae,
      abbr = {WWW},
      topic = {NLP},
      title = {MCVAE: Margin-based Conditional Variational Autoencoder for Relation Classification and Pattern Generation},
      author = {Ma, Fenglong and Li, Yaliang and Zhang, Chenwei and Gao, Jing and Du, Nan and Fan, Wei},
      booktitle = {The World Wide Web Conference},
      pages = {3041--3048},
      year = {2019},
      pdf = {http://www.personal.psu.edu/ffm5105/files/2019/www19.pdf}
    }
    
  8. arXiv
    Hierarchical Semantic Correspondence Learning for Post-Discharge Patient Mortality Prediction Shaika Chowdhury, Chenwei Zhang, Philip S. Yu, and Yuan Luo arXiv preprint arXiv:1910.06492 2019 [BibTex]
    @article{chowdhury2019hierarchical,
      abbr = {arXiv},
      topic = {Graph Mining},
      title = {Hierarchical Semantic Correspondence Learning for Post-Discharge Patient Mortality Prediction},
      author = {Chowdhury, Shaika and Zhang, Chenwei and Yu, Philip S. and Luo, Yuan},
      journal = {arXiv preprint arXiv:1910.06492},
      year = {2019},
      pdf = {https://arxiv.org/pdf/1910.06492.pdf}
    }
    
  9. arXiv
    Mixed Pooling Multi-View Attention Autoencoder for Representation Learning in Healthcare Shaika Chowdhury, Chenwei Zhang, Philip S. Yu, and Yuan Luo arXiv preprint arXiv:1910.06456 2019 [BibTex]
    @article{chowdhury2019mixed,
      abbr = {arXiv},
      topic = {Graph Mining},
      title = {Mixed Pooling Multi-View Attention Autoencoder for Representation Learning in Healthcare},
      author = {Chowdhury, Shaika and Zhang, Chenwei and Yu, Philip S. and Luo, Yuan},
      journal = {arXiv preprint arXiv:1910.06456},
      year = {2019},
      pdf = {https://arxiv.org/pdf/1910.06456.pdf}
    }
    

2018

  1. KDD
    On the generative discovery of structured medical knowledge Chenwei Zhang, Yaliang Li, Nan Du, Wei Fan, and Philip S. Yu In Proceedings of the 24th ACM SIGKDD international conference on Knowledge Discovery & Data Mining 2018 [BibTex] [Video]
    @inproceedings{zhang2018generative,
      abbr = {KDD},
      topic = {Knowledge Graph},
      title = {On the generative discovery of structured medical knowledge},
      author = {Zhang, Chenwei and Li, Yaliang and Du, Nan and Fan, Wei and Yu, Philip S.},
      booktitle = {Proceedings of the 24th ACM SIGKDD international conference on Knowledge Discovery \& Data Mining},
      pages = {2720--2728},
      year = {2018},
      pdf = {https://dl.acm.org/doi/pdf/10.1145/3219819.3220010},
      video = {https://www.youtube.com/watch?v=ZxmcsSKp0ko}
    }
    
  2. EMNLP
    Zero-shot User Intent Detection via Capsule Neural Networks Congying Xia*, Chenwei Zhang*, Xiaohui Yan, Yi Chang, and Philip S. Yu In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018 [BibTex] [Code] [Video]
    @inproceedings{xia2018zero,
      abbr = {EMNLP},
      topic = {NLP},
      title = {Zero-shot User Intent Detection via Capsule Neural Networks},
      author = {Xia*, Congying and Zhang*, Chenwei and Yan, Xiaohui and Chang, Yi and Yu, Philip S.},
      booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
      pages = {3090--3099},
      year = {2018},
      pdf = {https://arxiv.org/pdf/1809.00385.pdf},
      video = {https://vimeo.com/305945714},
      code = {https://github.com/congyingxia/ZeroShotCapsule}
    }
    
  3. WWW
    Multi-task pharmacovigilance mining from social media posts Shaika Chowdhury, Chenwei Zhang, and Philip S. Yu In Proceedings of the 2018 World Wide Web Conference 2018 [BibTex]
    @inproceedings{chowdhury2018multi,
      abbr = {WWW},
      topic = {NLP},
      title = {Multi-task pharmacovigilance mining from social media posts},
      author = {Chowdhury, Shaika and Zhang, Chenwei and Yu, Philip S.},
      booktitle = {Proceedings of the 2018 World Wide Web Conference},
      pages = {117--126},
      year = {2018},
      pdf = {https://arxiv.org/pdf/1801.06294.pdf}
    }
    
  4. Antennas Propag.
    Direction-of-arrival estimation based on deep neural networks with robustness to array imperfections Zhang-Meng Liu, Chenwei Zhang, and Philip S. Yu IEEE Transactions on Antennas and Propagation 2018 [HTML] [BibTex] [Code]
    @article{liu2018direction,
      abbr = {Antennas Propag.},
      topic = {ML & Misc.},
      title = {Direction-of-arrival estimation based on deep neural networks with robustness to array imperfections},
      author = {Liu, Zhang-Meng and Zhang, Chenwei and Yu, Philip S.},
      journal = {IEEE Transactions on Antennas and Propagation},
      volume = {66},
      number = {12},
      pages = {7315--7327},
      year = {2018},
      publisher = {IEEE},
      html = {https://ieeexplore.ieee.org/document/8485631},
      code = {https://github.com/LiuzmNUDT/DNN-DOA}
    }
    
  5. BigData
    Market Abnormality Period Detection via Co-movement Attention Model Yue Wang, Chenwei Zhang, Shen Wang, Philip S. Yu, Lu Bai, and Lixin Cui In 2018 IEEE International Conference on Big Data (Big Data) 2018 [HTML] [BibTex]
    @inproceedings{wang2018market,
      abbr = {BigData},
      topic = {Graph Mining},
      title = {Market Abnormality Period Detection via Co-movement Attention Model},
      author = {Wang, Yue and Zhang, Chenwei and Wang, Shen and Yu, Philip S. and Bai, Lu and Cui, Lixin},
      booktitle = {2018 IEEE International Conference on Big Data (Big Data)},
      pages = {1514--1523},
      year = {2018},
      organization = {IEEE},
      html = {https://ieeexplore.ieee.org/document/8621877}
    }
    
  6. BigData
    Data-driven blockbuster planning on online movie knowledge library Ye Liu, Jiawei Zhang, Chenwei Zhang, and Philip S. Yu In 2018 IEEE International Conference on Big Data (Big Data) 2018 [BibTex]
    @inproceedings{liu2018data,
      abbr = {BigData},
      topic = {Knowledge Graph},
      title = {Data-driven blockbuster planning on online movie knowledge library},
      author = {Liu, Ye and Zhang, Jiawei and Zhang, Chenwei and Yu, Philip S.},
      booktitle = {2018 IEEE International Conference on Big Data (Big Data)},
      pages = {1612--1617},
      year = {2018},
      organization = {IEEE},
      pdf = {https://arxiv.org/pdf/1810.10175.pdf}
    }
    
  7. ICBK
    Deep Co-Investment Network Learning for Financial Assets Yue Wang, Chenwei Zhang, Shen Wang, Philip S. Yu, Lu Bai, and Lixin Cui In 2018 IEEE International Conference on Big Knowledge (ICBK) 2018 [BibTex]
    @inproceedings{wang2018deep,
      abbr = {ICBK},
      topic = {Graph Mining},
      title = {Deep Co-Investment Network Learning for Financial Assets},
      author = {Wang, Yue and Zhang, Chenwei and Wang, Shen and Yu, Philip S. and Bai, Lu and Cui, Lixin},
      booktitle = {2018 IEEE International Conference on Big Knowledge (ICBK)},
      pages = {41--48},
      year = {2018},
      organization = {IEEE},
      pdf = {https://arxiv.org/pdf/1809.04227.pdf}
    }
    
  8. arXiv
    Finding similar medical questions from question answering websites Yaliang Li, Liuyi Yao, Nan Du, Jing Gao, Qi Li, Chuishi Meng, Chenwei Zhang, and Wei Fan arXiv preprint arXiv:1810.05983 2018 [BibTex]
    @article{li2018finding,
      abbr = {arXiv},
      topic = {NLP},
      title = {Finding similar medical questions from question answering websites},
      author = {Li, Yaliang and Yao, Liuyi and Du, Nan and Gao, Jing and Li, Qi and Meng, Chuishi and Zhang, Chenwei and Fan, Wei},
      journal = {arXiv preprint arXiv:1810.05983},
      year = {2018},
      pdf = {https://arxiv.org/pdf/1810.05983.pdf}
    }
    

2017

  1. Big Data
    Bringing semantic structures to user intent detection in online medical queries Chenwei Zhang, Nan Du, Wei Fan, Yaliang Li, Chun-Ta Lu, and Philip S. Yu In 2017 IEEE International Conference on Big Data (Big Data) 2017 [BibTex]
    @inproceedings{zhang2017bringing,
      abbr = {Big Data},
      topic = {NLP},
      title = {Bringing semantic structures to user intent detection in online medical queries},
      author = {Zhang, Chenwei and Du, Nan and Fan, Wei and Li, Yaliang and Lu, Chun-Ta and Yu, Philip S.},
      booktitle = {2017 IEEE International Conference on Big Data (Big Data)},
      pages = {1019--1026},
      year = {2017},
      organization = {IEEE},
      pdf = {https://arxiv.org/pdf/1710.08015.pdf}
    }
    
  2. ICDM
    BL-MNE: emerging heterogeneous social network embedding through broad learning with aligned autoencoder Jiawei Zhang, Congying Xia, Chenwei Zhang, Limeng Cui, Yanjie Fu, and Philip S. Yu In 2017 IEEE International Conference on Data Mining (ICDM) 2017 [BibTex]
    @inproceedings{zhang2017bl,
      abbr = {ICDM},
      topic = {Graph Mining},
      title = {BL-MNE: emerging heterogeneous social network embedding through broad learning with aligned autoencoder},
      author = {Zhang, Jiawei and Xia, Congying and Zhang, Chenwei and Cui, Limeng and Fu, Yanjie and Yu, Philip S.},
      booktitle = {2017 IEEE International Conference on Data Mining (ICDM)},
      pages = {605--614},
      year = {2017},
      organization = {IEEE},
      pdf = {https://arxiv.org/pdf/1711.09409.pdf}
    }
    
  3. KDD
    Deepmood: modeling mobile phone typing dynamics for mood detection Bokai Cao, Lei Zheng, Chenwei Zhang, Philip S. Yu, Andrea Piscitello, John Zulueta, Olu Ajilore, Kelly Ryan, and Alex D Leow In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2017 [BibTex] [Code] [Video]
    @inproceedings{cao2017deepmood,
      abbr = {KDD},
      topic = {ML & Misc.},
      title = {Deepmood: modeling mobile phone typing dynamics for mood detection},
      author = {Cao, Bokai and Zheng, Lei and Zhang, Chenwei and Yu, Philip S. and Piscitello, Andrea and Zulueta, John and Ajilore, Olu and Ryan, Kelly and Leow, Alex D},
      booktitle = {Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
      pages = {747--755},
      year = {2017},
      pdf = {https://arxiv.org/pdf/1803.08986.pdf},
      video = {https://www.youtube.com/watch?v=w1TfSp8NpfM},
      code = {https://www.cs.uic.edu/~bcao1/code/DeepMood.py}
    }
    
  4. CIKM
    Broad learning based multi-source collaborative recommendation Junxing Zhu, Jiawei Zhang, Lifang He, Quanyuan Wu, Bin Zhou, Chenwei Zhang, and Philip S. Yu In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management 2017 [BibTex]
    @inproceedings{zhu2017broad,
      abbr = {CIKM},
      topic = {Graph Mining},
      title = {Broad learning based multi-source collaborative recommendation},
      author = {Zhu, Junxing and Zhang, Jiawei and He, Lifang and Wu, Quanyuan and Zhou, Bin and Zhang, Chenwei and Yu, Philip S.},
      booktitle = {Proceedings of the 2017 ACM on Conference on Information and Knowledge Management},
      pages = {1409--1418},
      year = {2017},
      pdf = {http://www.ifmlab.org/files/paper/2017_cikm_paper2.pdf}
    }
    
  5. IEEE Access
    CHRS: cold start recommendation across multiple heterogeneous information networks Junxing Zhu, Jiawei Zhang, Chenwei Zhang, Quanyuan Wu, Yan Jia, Bin Zhou, and Philip S. Yu IEEE Access 2017 [BibTex]
    @article{zhu2017chrs,
      abbr = {IEEE Access},
      topic = {Graph Mining},
      title = {CHRS: cold start recommendation across multiple heterogeneous information networks},
      author = {Zhu, Junxing and Zhang, Jiawei and Zhang, Chenwei and Wu, Quanyuan and Jia, Yan and Zhou, Bin and Yu, Philip S.},
      journal = {IEEE Access},
      volume = {5},
      pages = {15283--15299},
      year = {2017},
      publisher = {IEEE},
      pdf = {https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7976276}
    }
    

2016 & Before

  1. WWW
    Mining user intentions from medical queries: A neural network based heterogeneous jointly modeling approach Chenwei Zhang, Wei Fan, Nan Du, and Philip S. Yu In Proceedings of the 25th International Conference on World Wide Web 2016 [BibTex] [Slides]
    @inproceedings{zhang2016mining,
      abbr = {WWW},
      topic = {NLP},
      title = {Mining user intentions from medical queries: A neural network based heterogeneous jointly modeling approach},
      author = {Zhang, Chenwei and Fan, Wei and Du, Nan and Yu, Philip S.},
      booktitle = {Proceedings of the 25th International Conference on World Wide Web},
      pages = {1373--1384},
      year = {2016},
      pdf = {http://gdac.uqam.ca/WWW2016-Proceedings/proceedings/p1373.pdf},
      slides = {https://drive.google.com/file/d/0B0NF2TxreW8hTUhRb1ljVldadlk/view}
    }
    
  2. CIKM
    Multi-source hierarchical prediction consolidation Chenwei Zhang, Sihong Xie, Yaliang Li, Jing Gao, Wei Fan, and Philip S. Yu In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management 2016 [BibTex] [Poster]
    @inproceedings{zhang2016multi,
      abbr = {CIKM},
      topic = {Graph Mining},
      title = {Multi-source hierarchical prediction consolidation},
      author = {Zhang, Chenwei and Xie, Sihong and Li, Yaliang and Gao, Jing and Fan, Wei and Yu, Philip S.},
      booktitle = {Proceedings of the 25th ACM International on Conference on Information and Knowledge Management},
      pages = {2251--2256},
      year = {2016},
      pdf = {https://arxiv.org/pdf/1608.03344.pdf},
      poster = {https://drive.google.com/file/d/0B0NF2TxreW8hZFJYaXlxSmN2NFE/view}
    }
    
  3. ICDM
    Augmented LSTM framework to construct medical self-diagnosis android Chaochun Liu, Huan Sun, Nan Du, Shulong Tan, Hongliang Fei, Wei Fan, Tao Yang, Hao Wu, Yaliang Li, and Chenwei Zhang In 2016 IEEE 16th International Conference on Data Mining (ICDM) 2016 [BibTex]
    @inproceedings{liu2016augmented,
      abbr = {ICDM},
      topic = {NLP},
      title = {Augmented LSTM framework to construct medical self-diagnosis android},
      author = {Liu, Chaochun and Sun, Huan and Du, Nan and Tan, Shulong and Fei, Hongliang and Fan, Wei and Yang, Tao and Wu, Hao and Li, Yaliang and Zhang, Chenwei},
      booktitle = {2016 IEEE 16th International Conference on Data Mining (ICDM)},
      pages = {251--260},
      year = {2016},
      organization = {IEEE},
      pdf = {http://web.cse.ohio-state.edu/~sun.397/docs/selfdiagnosis-icdm.pdf}
    }
    
  4. TBD
    Extracting medical knowledge from crowdsourced question answering website Yaliang Li, Chaochun Liu, Nan Du, Wei Fan, Qi Li, Jing Gao, Chenwei Zhang, and Hao Wu IEEE Transactions on Big Data 2016 [HTML] [BibTex]
    @article{li2016extracting,
      abbr = {TBD},
      topic = {Knowledge Graph},
      title = {Extracting medical knowledge from crowdsourced question answering website},
      author = {Li, Yaliang and Liu, Chaochun and Du, Nan and Fan, Wei and Li, Qi and Gao, Jing and Zhang, Chenwei and Wu, Hao},
      journal = {IEEE Transactions on Big Data},
      year = {2016},
      publisher = {IEEE},
      html = {https://ieeexplore.ieee.org/abstract/document/7572985}
    }
    
  5. Cybernetics & Systems
    An evidential spam-filtering framework Chenwei Zhang, Xiaoyan Su, Yong Hu, Zili Zhang, and Yong Deng Cybernetics and Systems 2016 [BibTex]
    @article{zhang2016evidential,
      abbr = {Cybernetics & Systems},
      topic = {ML & Misc.},
      title = {An evidential spam-filtering framework},
      author = {Zhang, Chenwei and Su, Xiaoyan and Hu, Yong and Zhang, Zili and Deng, Yong},
      journal = {Cybernetics and Systems},
      volume = {47},
      number = {6},
      pages = {427--444},
      year = {2016},
      publisher = {Taylor \& Francis}
    }
    
  6. KBS
    A new method to determine basic probability assignment using core samples Chenwei Zhang, Yong Hu, Felix TS Chan, Rehan Sadiq, and Yong Deng Knowledge-Based Systems 2014 [BibTex]
    @article{zhang2014new,
      abbr = {KBS},
      topic = {ML & Misc.},
      title = {A new method to determine basic probability assignment using core samples},
      author = {Zhang, Chenwei and Hu, Yong and Chan, Felix TS and Sadiq, Rehan and Deng, Yong},
      journal = {Knowledge-Based Systems},
      volume = {69},
      pages = {140--149},
      year = {2014},
      publisher = {Elsevier}
    }