Publications (By Year | By Topic)

2024

  1. LREC-COLING
    CORI: CJKV Benchmark with Romanization Integration - A step towards Cross-lingual Transfer Beyond Textual Scripts Hoang Nguyen, Chenwei Zhang, Ye Liu, Natalie Parde, Eugene Rohrbaugh, and Philip Yu In The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation 2024 [Abstract] [BibTex]
    @inproceedings{nguyen2024cori,
      abbr = {LREC-COLING},
      topic = {NLP},
      title = {CORI: CJKV Benchmark with Romanization Integration - A step towards Cross-lingual Transfer Beyond Textual Scripts},
      author = {Nguyen, Hoang and Zhang, Chenwei and Liu, Ye and Parde, Natalie and Rohrbaugh, Eugene and Yu, Philip},
      booktitle = {The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation},
      year = {2024},
      pdf = {}
    }
    

2023

  1. EMNLP
    Enhancing Large Language Models with Coarse-to-Fine Chain-of-Thought Prompting for Multi-domain NLU Tasks Hoang H Nguyen, Ye Liu, Chenwei Zhang, Tao Zhang, and Philip S. Yu In The 2023 Conference on Empirical Methods in Natural Language Processing 2023 [Abstract] [BibTex]
    While Chain-of-Thought prompting is popular in reasoning tasks, its application to Large Language Models (LLMs) in Natural Language Understanding (NLU) is under-explored. Motivated by the multi-step reasoning of LLMs, we propose a Coarse-to-Fine Chain-of-Thought (CoF-CoT) approach that breaks down NLU tasks into multiple reasoning steps where LLMs can learn to acquire and leverage essential concepts to solve tasks from different granularities. Moreover, we propose leveraging semantic-based Abstract Meaning Representation (AMR) structured knowledge as an intermediate step to capture the nuances and diverse structures of utterances, and to understand connections between their varying levels of granularity. Our proposed approach is shown to be effective in helping LLMs adapt to multi-grained NLU tasks under both zero-shot and few-shot multi-domain settings.
    @inproceedings{nguyen2023enhancing,
      selected = {1},
      abbr = {EMNLP},
      topic = {NLP},
      title = {Enhancing Large Language Models with Coarse-to-Fine Chain-of-Thought Prompting for Multi-domain NLU Tasks},
      author = {Nguyen, Hoang H and Liu, Ye and Zhang, Chenwei and Zhang, Tao and Yu, Philip S.},
      booktitle = {The 2023 Conference on Empirical Methods in Natural Language Processing},
      year = {2023},
      pdf = {}
    }
    
  2. EMNLP
    Knowledge-Selective Pretraining for Attribute Value Extraction Hui Liu, Qingyu Yin, Zhengyang Wang, Chenwei Zhang, Haoming Jiang, Yifan Gao, Zheng Li, Xian Li, Chao Zhang, Bing Yin, William Yang Wang, and Xiaodan Zhu In Findings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023 [Abstract] [BibTex]
    Attribute Value Extraction (AVE) aims to retrieve the values of attributes from product profiles. The state-of-the-art methods tackle the AVE task through a question-answering (QA) paradigm, where the value is predicted from the context (i.e., the product profile) given a query (i.e., the attributes). Despite the substantial advancements that have been made, the performance of existing methods on rare attributes is still far from satisfactory, and they cannot be easily extended to unseen attributes due to their poor generalization ability. In this work, we propose to leverage pretraining and transfer learning to address the aforementioned weaknesses. We first collect product information from various e-commerce stores and retrieve a large number of (profile, attribute, value) triples, which will be used as the pretraining corpus. To more effectively utilize the retrieved corpus, we further design a Knowledge-Selective Framework (KSelF) based on query expansion that can be closely combined with the pretraining corpus to boost the performance. Meanwhile, considering that the public AE-pub dataset contains considerable noise, we construct and contribute a larger benchmark, EC-AVE, collected from e-commerce websites. We conduct evaluation on both of these datasets. The experimental results demonstrate that our proposed KSelF achieves new state-of-the-art performance without pretraining. When incorporated with the pretraining corpus, the performance of KSelF can be further improved, particularly on the attributes with limited training resources.
    @inproceedings{liu2023knowledge,
      selected = {1},
      abbr = {EMNLP},
      topic = {NLP},
      title = {Knowledge-Selective Pretraining for Attribute Value Extraction},
      author = {Liu, Hui and Yin, Qingyu and Wang, Zhengyang and Zhang, Chenwei and Jiang, Haoming and Gao, Yifan and Li, Zheng and Li, Xian and Zhang, Chao and Yin, Bing and Wang, William Yang and Zhu, Xiaodan},
      booktitle = {Findings of the 2023 Conference on Empirical Methods in Natural Language Processing},
      year = {2023},
      pdf = {}
    }
    
  3. TKDE
    Reading Broadly to Open Your Mind: Improving Open Relation Extraction with Self-supervised Information in Documents Xuming Hu, Zhaochen Hong, Chenwei Zhang, Aiwei Liu, Shiao Meng, Lijie Wen, Irwin King, and Philip S. Yu In IEEE Transactions on Knowledge and Data Engineering 2023 [BibTex]
    @article{hu2023reading,
      selected = {1},
      abbr = {TKDE},
      topic = {NLP},
      title = {Reading Broadly to Open Your Mind: Improving Open Relation Extraction with Self-supervised Information in Documents},
      author = {Hu, Xuming and Hong, Zhaochen and Zhang, Chenwei and Liu, Aiwei and Meng, Shiao and Wen, Lijie and King, Irwin and Yu, Philip S.},
      journal = {IEEE Transactions on Knowledge and Data Engineering},
      year = {2023},
      pdf = {}
    }
    
  4. SIGDIAL
    Slot Induction via Pre-trained Language Model Probing and Multi-level Contrastive Learning Hoang Nguyen, Chenwei Zhang, Ye Liu, and Philip Yu In The 2023 SIGDIAL Meeting on Discourse and Dialogue 2023 [Abstract] [BibTex]
    Recent advanced methods in Natural Language Understanding for Task-oriented Dialogue (TOD) Systems (e.g., intent detection and slot filling) require a large amount of annotated data to achieve competitive performance. In reality, token-level annotations (slot labels) are time-consuming and difficult to acquire. In this work, we study the Slot Induction (SI) task, whose objective is to induce slot boundaries without explicit knowledge of token-level slot annotations. We propose leveraging an Unsupervised Pre-trained Language Model (PLM) Probing and Contrastive Learning mechanism to exploit (1) unsupervised semantic knowledge extracted from the PLM, and (2) additional sentence-level intent label signals available from TOD. Our approach is shown to be effective on the SI task and capable of bridging the gap with token-level supervised models on two NLU benchmark datasets. When generalized to emerging intents, our SI objectives also provide enhanced slot label representations, leading to improved performance on the Slot Filling tasks.
    @inproceedings{nguyen2023slot,
      abbr = {SIGDIAL},
      topic = {NLP},
      title = {Slot Induction via Pre-trained Language Model Probing and Multi-level Contrastive Learning},
      author = {Nguyen, Hoang and Zhang, Chenwei and Liu, Ye and Yu, Philip},
      booktitle = {The 2023 SIGDIAL Meeting on Discourse and Dialogue},
      year = {2023},
      pdf = {}
    }
    
  5. ACL
    Towards Open-World Product Attribute Mining: A Lightly-Supervised Approach Liyan Xu, Chenwei Zhang, Xian Li, Jingbo Shang, and Jinho D. Choi In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics 2023 [Abstract] [BibTex]
    We present a new task setting for attribute mining on e-commerce products, serving as a practical solution to extract open-world attributes without extensive human intervention. Our supervision comes from a high-quality seed attribute set bootstrapped from existing resources, and we aim to expand the attribute vocabulary of existing seed types, and also to discover any new attribute types automatically. A new dataset is created to support our setting, and our approach Amacer is proposed specifically to tackle the limited supervision. Especially, given that no direct supervision is available for those unseen new attributes, our novel formulation exploits self-supervised heuristic and unsupervised latent attributes, which attains implicit semantic signals as additional supervision by leveraging product context. Experiments suggest that our approach surpasses various baselines by 12 F1, expanding attributes of existing types significantly by up to 12 times, and discovering values from 39% new types. Our dataset and code will be publicly available.
    @inproceedings{xu2023topic,
      selected = {1},
      abbr = {ACL},
      topic = {Knowledge Graph},
      title = {Towards Open-World Product Attribute Mining: A Lightly-Supervised Approach},
      author = {Xu, Liyan and Zhang, Chenwei and Li, Xian and Shang, Jingbo and Choi, Jinho D.},
      booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics},
      year = {2023},
      pdf = {https://arxiv.org/pdf/2305.18350.pdf}
    }
    
  6. ACL
    GDA: Generative Data Augmentation Techniques for Relation Extraction Tasks Xuming Hu, Aiwei Liu, Zeqi Tan, Xin Zhang, Chenwei Zhang, Irwin King, and Philip S. Yu In Findings of the 61st Annual Meeting of the Association for Computational Linguistics 2023 [BibTex]
    @inproceedings{hu2023generative,
      abbr = {ACL},
      topic = {NLP},
      title = {GDA: Generative Data Augmentation Techniques for Relation Extraction Tasks},
      author = {Hu, Xuming and Liu, Aiwei and Tan, Zeqi and Zhang, Xin and Zhang, Chenwei and King, Irwin and Yu, Philip S.},
      booktitle = {Findings of the 61st Annual Meeting of the Association for Computational Linguistics},
      year = {2023},
      pdf = {https://arxiv.org/pdf/2305.16663.pdf}
    }
    
  7. ACL
    Enhancing Cross-Lingual Transfer via Phonemic Transcription Integration Hoang Nguyen, Chenwei Zhang, Tao Zhang, Eugene Rohrbaugh, and Philip S. Yu In Findings of the 61st Annual Meeting of the Association for Computational Linguistics 2023 [BibTex]
    @inproceedings{nguyen2023enhancinh,
      abbr = {ACL},
      topic = {NLP},
      title = {Enhancing Cross-Lingual Transfer via Phonemic Transcription Integration},
      author = {Nguyen, Hoang and Zhang, Chenwei and Zhang, Tao and Rohrbaugh, Eugene and Yu, Philip S.},
      booktitle = {Findings of the 61st Annual Meeting of the Association for Computational Linguistics},
      year = {2023},
      pdf = {https://arxiv.org/pdf/2307.04361.pdf}
    }
    
  8. ACL
    PV2TEA: Patching Visual Modality to Textual-Established Product Attribute Extraction Hejie Cui, Rongmei Lin, Nasser Zalmout, Chenwei Zhang, Jingbo Shang, Carl Yang, and Xian Li In Findings of the 61st Annual Meeting of the Association for Computational Linguistics 2023 [BibTex]
    @inproceedings{cui2023patch,
      abbr = {ACL},
      topic = {NLP},
      title = {PV2TEA: Patching Visual Modality to Textual-Established Product Attribute Extraction},
      author = {Cui, Hejie and Lin, Rongmei and Zalmout, Nasser and Zhang, Chenwei and Shang, Jingbo and Yang, Carl and Li, Xian},
      booktitle = {Findings of the 61st Annual Meeting of the Association for Computational Linguistics},
      year = {2023},
      pdf = {https://arxiv.org/pdf/2306.01016.pdf}
    }
    
  9. ACL
    Concept2Box: Joint Geometric Embeddings for Learning Two-View Knowledge Graphs Zijie Huang, Daheng Wang, Binxuan Huang, Chenwei Zhang, Jingbo Shang, Yan Liang, Zhengyang Wang, Xian Li, Christos Faloutsos, Yizhou Sun, and Wei Wang In Findings of the 61st Annual Meeting of the Association for Computational Linguistics 2023 [BibTex]
    @inproceedings{huang2023concept,
      abbr = {ACL},
      topic = {Knowledge Graph},
      title = {Concept2Box: Joint Geometric Embeddings for Learning Two-View Knowledge Graphs},
      author = {Huang, Zijie and Wang, Daheng and Huang, Binxuan and Zhang, Chenwei and Shang, Jingbo and Liang, Yan and Wang, Zhengyang and Li, Xian and Faloutsos, Christos and Sun, Yizhou and Wang, Wei},
      booktitle = {Findings of the 61st Annual Meeting of the Association for Computational Linguistics},
      year = {2023},
      pdf = {https://arxiv.org/pdf/2307.01933.pdf}
    }
    
  10. ACL
    Tab-Cleaner: Weakly Supervised Tabular Data Cleaning via Pre-training for E-commerce Catalog Kewei Cheng, Xian Li, Zhengyang Wang, Chenwei Zhang, Binxuan Huang, Yifan Ethan Xu, Xin Luna Dong, and Yizhou Sun In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics - Industry Track 2023 [BibTex]
    @inproceedings{cheng2023table,
      abbr = {ACL},
      topic = {NLP},
      title = {Tab-Cleaner: Weakly Supervised Tabular Data Cleaning via Pre-training for E-commerce Catalog},
      author = {Cheng, Kewei and Li, Xian and Wang, Zhengyang and Zhang, Chenwei and Huang, Binxuan and Xu, Yifan Ethan and Dong, Xin Luna and Sun, Yizhou},
      booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics - Industry Track},
      year = {2023},
      pdf = {https://aclanthology.org/2023.acl-industry.18.pdf}
    }
    
  11. SIGIR
    Think Rationally about What You See: Continuous Rationale Extraction for Relation Extraction Xuming Hu, Zhaochen Hong, Chenwei Zhang, Irwin King, and Philip S. Yu In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval 2023 [BibTex]
    @inproceedings{xu2023think,
      abbr = {SIGIR},
      topic = {NLP},
      title = {Think Rationally about What You See: Continuous Rationale Extraction for Relation Extraction},
      author = {Hu, Xuming and Hong, Zhaochen and Zhang, Chenwei and King, Irwin and Yu, Philip S.},
      booktitle = {Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval},
      year = {2023},
      pdf = {https://arxiv.org/pdf/2305.03503.pdf}
    }
    

2022

  1. arXiv
    Gradient Imitation Reinforcement Learning for General Low-Resource Information Extraction Xuming Hu, Shiao Meng, Chenwei Zhang, Xiangli Yang, Lijie Wen, Irwin King, and Philip S. Yu In arXiv 2022 [Abstract] [BibTex]
    Information Extraction (IE) aims to extract structured information from heterogeneous sources. IE from natural language texts includes sub-tasks such as Named Entity Recognition (NER), Relation Extraction (RE), and Event Extraction (EE). Most IE systems require a comprehensive understanding of sentence structure, implied semantics, and domain knowledge to perform well; thus, IE tasks always need adequate external resources and annotations. However, it takes time and effort to obtain more human annotations. Low-Resource Information Extraction (LRIE) strives to use unsupervised data, reducing the required resources and human annotation. In practice, existing systems either utilize self-training schemes to generate pseudo labels that will cause the gradual drift problem, or leverage consistency regularization methods which inevitably possess confirmation bias. To alleviate confirmation bias due to the lack of feedback loops in existing LRIE learning paradigms, we develop a Gradient Imitation Reinforcement Learning (GIRL) method to encourage pseudo-labeled data to imitate the gradient descent direction on labeled data, which can force pseudo-labeled data to achieve better optimization capabilities similar to labeled data. Based on how well the pseudo-labeled data imitates the instructive gradient descent direction obtained from labeled data, we design a reward to quantify the imitation process and bootstrap the optimization capability of pseudo-labeled data through trial and error. In addition to learning paradigms, GIRL is not limited to specific sub-tasks, and we leverage GIRL to solve all IE sub-tasks (named entity recognition, relation extraction, and event extraction) in low-resource settings (semi-supervised IE and few-shot IE).
    @article{hu2022gradient,
      abbr = {arXiv},
      topic = {NLP},
      title = {Gradient Imitation Reinforcement Learning for General Low-Resource Information Extraction},
      author = {Hu, Xuming and Meng, Shiao and Zhang, Chenwei and Yang, Xiangli and Wen, Lijie and King, Irwin and Yu, Philip S.},
      journal = {arXiv preprint arXiv:2211.06014},
      year = {2022},
      pdf = {https://arxiv.org/pdf/2211.06014.pdf}
    }
    
  2. NAACL
    HiURE: Hierarchical Exemplar Contrastive Learning for Unsupervised Relation Extraction Shuliang Liu, Xuming Hu, Chenwei Zhang, Shu’ang Li, Lijie Wen, and Philip S. Yu In The 2022 Conference of the North American Chapter of the Association for Computational Linguistics 2022 [Abstract] [BibTex] [Code]
    Unsupervised relation extraction aims to extract relationships between entities from natural language sentences without prior information on relational scope or distribution. Existing works either utilize self-supervised schemes to refine relational feature signals by iteratively leveraging adaptive clustering and classification that provoke gradual drift problems, or adopt instance-wise contrastive learning which unreasonably pushes apart those sentence pairs that are semantically similar. To overcome these defects, we propose a novel contrastive learning framework named HiURE, which has the capability to derive hierarchical signals from relational feature space using cross hierarchy attention and effectively optimize relation representation of sentences under exemplar-wise contrastive learning. Experimental results on two public datasets demonstrate the advanced effectiveness and robustness of HiURE on unsupervised relation extraction when compared with state-of-the-art models.
    @inproceedings{liu2022hierarchical,
      abbr = {NAACL},
      topic = {NLP},
      title = {HiURE: Hierarchical Exemplar Contrastive Learning for Unsupervised Relation Extraction},
      author = {Liu, Shuliang and Hu, Xuming and Zhang, Chenwei and Li, Shu'ang and Wen, Lijie and Yu, Philip S.},
      booktitle = {The 2022 Conference of the North American Chapter of the Association for Computational Linguistics},
      year = {2022},
      pdf = {https://arxiv.org/pdf/2205.02225.pdf},
      code = {https://github.com/THU-BPM/HiURE}
    }
    
  3. TheWebConf
    OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision Xinyang Zhang, Chenwei Zhang, Xian Li, Xin Luna Dong, Jingbo Shang, Christos Faloutsos, and Jiawei Han In Proceedings of the Web Conference 2022 [Abstract] [BibTex] [Slides] [Code] [Video]
    Automatic extraction of product attributes from their textual descriptions is essential for online shopper experience. One inherent challenge of this task is the emerging nature of e-commerce products — we see new types of products with their unique set of new attributes constantly. Most prior works on this matter mine new values for a set of known attributes but cannot handle new attributes that arose from constantly changing data. In this work, we study the attribute mining problem in an open-world setting to extract novel attributes and their values. Instead of providing comprehensive training data, the user only needs to provide a few examples for a few known attribute types as weak supervision. We propose a principled framework that first generates attribute value candidates and then groups them into clusters of attributes. The candidate generation step probes a pre-trained language model to extract phrases from product titles. Then, an attribute-aware fine-tuning method optimizes a multitask objective and shapes the language model representation to be attribute-discriminative. Finally, we discover new attributes and values through the self-ensemble of our framework, which handles the open-world challenge. We run extensive experiments on a large distantly annotated development set and a gold standard human-annotated test set that we collected. Our model significantly outperforms strong baselines and can generalize to unseen attributes and product types.
    @inproceedings{zhang2022open,
      abbr = {TheWebConf},
      topic = {Knowledge Graph},
      title = {OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision},
      author = {Zhang, Xinyang and Zhang, Chenwei and Li, Xian and Dong, Xin Luna and Shang, Jingbo and Faloutsos, Christos and Han, Jiawei},
      booktitle = {Proceedings of the Web Conference},
      year = {2022},
      pdf = {https://assets.amazon.science/d5/d3/ce07fed14287b4a8c23a7d34bf59/oa-mine-open-world-attribute-mining-for-ecommerce-products-with-weak-supervision.pdf},
      code = {https://github.com/xinyangz/OAMine},
      video = {https://www.youtube.com/watch?v=vrDPV8EMLnA},
      slides = {OA-Mine_2022TheWebConf_slides.pdf}
    }
    
  4. PAKDD
    Sparse Imbalanced Drug-Target Interaction Prediction via Heterogeneous Data Augmentation and Node Similarity Runze Wang, Zehua Zhang, Yueqin Zhang, Zhongyuan Jiang, Shilin Sun, and Chenwei Zhang In The 26th Pacific-Asia Conference on Knowledge Discovery and Data Mining 2022 [Abstract] [BibTex] [Code]
    Drug-Target Interaction (DTI) prediction aims to accurately identify the potential binding targets on proteins so as to guide drug development. However, the sparse imbalance of known drug-target pairs remains a challenge for high-quality representation learning of drugs and targets, interfering with accurate prediction. The labeled drug-target pairs are far fewer than the missing ones, since the obtained DTIs are recorded with pathogenic proteins and sophisticated bio-experiments. Therefore, we propose a deep learning paradigm via Heterogeneous graph data Augmentation and node Similarity (HAS) to solve the sparse imbalanced problem on drug-target interaction prediction. Heterogeneous graph data augmentation is devised to generate multi-view augmented graphs through a heterogeneous neighbors sampling strategy. Then the consistency across different graph structures is captured using graph contrastive optimization. Node similarity is calculated on the heterogeneous entity association matrices, aiming to integrate similarity information and heterogeneous attribute gain for drug-target interaction prediction. Extensive experiments show that HAS offers superior performance in sparse imbalanced scenarios compared with state-of-the-art methods. Ablation studies demonstrate the effectiveness of heterogeneous graph data augmentation and node similarity.
    @inproceedings{wang2022drug,
      abbr = {PAKDD},
      topic = {Graph Mining},
      title = {Sparse Imbalanced Drug-Target Interaction Prediction via Heterogeneous Data Augmentation and Node Similarity},
      author = {Wang, Runze and Zhang, Zehua and Zhang, Yueqin and Jiang, Zhongyuan and Sun, Shilin and Zhang, Chenwei},
      booktitle = {The 26th Pacific-Asia Conference on Knowledge Discovery and Data Mining},
      year = {2022},
      pdf = {},
      code = {}
    }
    

2021

  1. EMNLP
    Gradient Imitation Reinforcement Learning for Low Resource Relation Extraction Xuming Hu, Chenwei Zhang, Yawen Yang, Xiaohe Li, Li Lin, Lijie Wen, and Philip S. Yu In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021 [Abstract] [BibTex] [Code]
    Low-resource Relation Extraction (LRE) aims to extract relation facts from limited labeled corpora when human annotation is scarce. Existing works either utilize a self-training scheme to generate pseudo labels that will cause the gradual drift problem, or leverage a meta-learning scheme which does not solicit feedback explicitly. To alleviate selection bias due to the lack of feedback loops in existing LRE learning paradigms, we developed a Gradient Imitation Reinforcement Learning method to encourage pseudo label data to imitate the gradient descent direction on labeled data and bootstrap its optimization capability through trial and error. We also propose a framework called GradLRE, which handles two major scenarios in low-resource relation extraction. Besides the scenario where unlabeled data is sufficient, GradLRE handles the situation where no unlabeled data is available, by exploiting a contextualized augmentation method to generate data. Experimental results on two public datasets demonstrate the effectiveness of GradLRE on low-resource relation extraction when compared with baselines.
    @inproceedings{hu2021gradient,
      abbr = {EMNLP},
      topic = {NLP},
      selected = {1},
      title = {Gradient Imitation Reinforcement Learning for Low Resource Relation Extraction},
      author = {Hu, Xuming and Zhang, Chenwei and Yang, Yawen and Li, Xiaohe and Lin, Li and Wen, Lijie and Yu, Philip S.},
      booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
      year = {2021},
      pdf = {https://arxiv.org/pdf/2109.06415.pdf},
      code = {https://github.com/THU-BPM/GradLRE}
    }
    
  2. EMNLP
    Semi-supervised Relation Extraction via Incremental Meta Self-Training Xuming Hu, Chenwei Zhang, Fukun Ma, Chenyao Liu, Lijie Wen, and Philip S. Yu In Findings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021 [Abstract] [BibTex] [Code]
    To alleviate human efforts from obtaining large-scale annotations, Semi-Supervised Relation Extraction methods aim to leverage unlabeled data in addition to learning from limited samples. Existing self-training methods suffer from the gradual drift problem, where noisy pseudo labels on unlabeled data are incorporated during training. To alleviate the noise in pseudo labels, we propose a method called MetaSRE, where a Relation Label Generation Network generates accurate quality assessment on pseudo labels by (meta) learning from the successful and failed attempts on Relation Classification as an additional meta-objective. To reduce the influence of noisy pseudo labels, MetaSRE adopts a pseudo label selection and exploitation scheme which assesses pseudo label quality on unlabeled samples and only exploits high-quality pseudo labels in a self-training fashion to incrementally augment labeled samples for both robustness and accuracy. Experimental results on two public datasets demonstrate the effectiveness of the proposed approach.
    @inproceedings{hu2021semi,
      abbr = {EMNLP},
      topic = {NLP},
      title = {Semi-supervised Relation Extraction via Incremental Meta Self-Training},
      author = {Hu, Xuming and Zhang, Chenwei and Ma, Fukun and Liu, Chenyao and Wen, Lijie and Yu, Philip S.},
      booktitle = {Findings of the 2021 Conference on Empirical Methods in Natural Language Processing},
      year = {2021},
      pdf = {https://arxiv.org/pdf/2010.16410.pdf},
      code = {https://github.com/THU-BPM/MetaSRE}
    }
    
  3. EMNLP
    End-to-End Conversational Search for Online Shopping with Utterance Transfer Liqiang Xiao, Jun Ma, Xin Luna Dong, Pascual Martínez-Gómez, Nasser Zalmout, Chenwei Zhang, Tong Zhao, Hao He, and Yaohui Jin In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021 [Abstract] [BibTex]
    Successful conversational search systems can present a natural, adaptive and interactive shopping experience for online shopping customers. However, building such systems from scratch faces real-world challenges from both imperfect product schema/knowledge and lack of training dialog data. In this work we first propose ConvSearch, an end-to-end conversational search system that deeply combines the dialog system with search. It leverages the text profile to retrieve products, which is more robust against imperfect product schema/knowledge compared with using product attributes alone. We then address the lack of data challenges by proposing an utterance transfer approach that generates dialogue utterances by using existing dialog from other domains, and leveraging the search behavior data from an e-commerce retailer. With utterance transfer, we introduce a new conversational search dataset for online shopping. Experiments show that our utterance transfer method can significantly improve the availability of training dialogue data without crowd-sourcing, and the conversational search system significantly outperformed the best tested baseline.
    @inproceedings{xiao2021end,
      abbr = {EMNLP},
      topic = {NLP},
      title = {End-to-End Conversational Search for Online Shopping with Utterance Transfer},
      author = {Xiao, Liqiang and Ma, Jun and Dong, Xin Luna and Martínez-Gómez, Pascual and Zalmout, Nasser and Zhang, Chenwei and Zhao, Tong and He, Hao and Jin, Yaohui},
      booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
      year = {2021},
      pdf = {https://arxiv.org/pdf/2109.05460.pdf}
    }
    
  4. KDD
    All You Need to Know to Build a Product Knowledge Graph Nasser Zalmout, Chenwei Zhang, Xian Li, Yan Liang, and Xin Luna Dong In Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2021 [Abstract] [BibTex] [Media]
    We answer the following key questions in this tutorial: What are unique challenges to build a product knowledge graph and what are solutions? Are these techniques applicable to building other domain knowledge graphs? What are practical tips to make this to production?
    @inproceedings{zalmout2021all,
      abbr = {KDD},
      topic = {Knowledge Graph},
      title = {All You Need to Know to Build a Product Knowledge Graph},
      author = {Zalmout, Nasser and Zhang, Chenwei and Li, Xian and Liang, Yan and Dong, Xin Luna},
      booktitle = {Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
      year = {2021},
      pdf = {https://naixlee.github.io/Product_Knowledge_Graph_Tutorial_KDD2021/},
      media = {https://naixlee.github.io/Product_Knowledge_Graph_Tutorial_KDD2021/}
    }
    
  5. TheWebConf
    Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks Xinyang Zhang, Chenwei Zhang, Xin Luna Dong, Jingbo Shang, and Jiawei Han In Proceedings of the Web Conference 2021 [Abstract] [BibTex] [Code] [Video]
    Text categorization is an essential task in Web content analysis. Considering the ever-evolving Web data and new emerging categories, instead of the laborious supervised setting, in this paper, we focus on the minimally-supervised setting that aims to categorize documents effectively, with a couple of seed documents annotated per category. We recognize that texts collected from the Web are often structure-rich, i.e., accompanied by various metadata. One can easily organize the corpus into a text-rich network, joining raw text documents with document attributes, high-quality phrases, label surface names as nodes, and their associations as edges. Such a network provides a holistic view of the corpus’ heterogeneous data sources and enables a joint optimization for network-based analysis and deep textual model training. We therefore propose a novel framework for minimally supervised categorization by learning from the text-rich network. Specifically, we jointly train two modules with different inductive biases – a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning. Each module generates pseudo training labels from the unlabeled document set, and both modules mutually enhance each other by co-training using pooled pseudo labels. We test our model on two real-world datasets. On the challenging e-commerce product categorization dataset with 683 categories, our experiments show that given only three seed documents per category, our framework can achieve an accuracy of about 92%, significantly outperforming all compared methods; our accuracy is only less than 2% away from the supervised BERT model trained on about 50K labeled documents.
    @inproceedings{zhang2021minimally,
      abbr = {TheWebConf},
      topic = {Graph Mining},
      selected = {1},
      title = {Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks},
      author = {Zhang, Xinyang and Zhang, Chenwei and Dong, Xin Luna and Shang, Jingbo and Han, Jiawei},
      booktitle = {Proceedings of the Web Conference},
      year = {2021},
      pdf = {https://arxiv.org/pdf/2102.11479.pdf},
      code = {https://github.com/xinyangz/ltrn},
      video = {https://videolectures.net/www2021_zhang_minimally_supervised/}
    }
    
  6. KBS
    Hierarchical GAN-Tree and Bi-Directional Capsules for Multi-Label Image Classification Boyan Wang, Xuegang Hu, Chenwei Zhang, Peipei Li, and Philip S. Yu In Knowledge-Based Systems 2021 [Abstract] [BibTex]
    Compared with the flat multi-label image classification, the hierarchical structure reserves a richer source of structural information to represent complicated relationships between labels in the real world. However, existing multi-label image classification methods focus on the accuracy of label prediction, ignoring the structural information embedded in the hierarchical label space. Furthermore, they hardly form the relevant visual feature space corresponding to the hierarchical label structure. In this paper, we propose a novel hierarchical framework based on the feature and label structural information named Hierarchical GAN-Tree and Bi-Directional Capsules (HGT&BC) to address these problems. We conduct Hierarchical GAN-Tree for feature space representation and Hierarchical Bi-Directional Capsules for label space classification, respectively. Hierarchical GAN-Tree generates hierarchical feature space using the unsupervised divisive clustering pattern according to the hierarchical structure, alleviating the mode collapse of generators and the overfitting manifestation of conventional GANs. Hierarchical Bi-Directional Capsules utilize the hierarchical label structure in iterations of top-down and bottom-up processes: the top-down process integrates hierarchical relationships into the probability computation to enhance partial hierarchical relationships; the bottom-up process modifies the dynamic routing mechanism between capsules to represent semantic objects for the comprehensive global hierarchical classifiers. Owing to the two components, HGT&BC successfully expresses the hierarchical relationships in both feature and label space and improves the performance of multi-label image classification. Extensive experimental results on four benchmark datasets demonstrate the effectiveness and efficiency of our hierarchical framework in practice.
    @inproceedings{wang2021hierarchical,
      abbr = {KBS},
      topic = {ML & Misc.},
      title = {Hierarchical GAN-Tree and Bi-Directional Capsules for Multi-Label Image Classification},
      author = {Wang, Boyan and Hu, Xuegang and Zhang, Chenwei and Li, Peipei and Yu, Philip S.},
      booktitle = {Knowledge-Based Systems},
      year = {2021},
      pdf = {}
    }
    

2020

  1. EMNLP
    SelfORE: Self-supervised Relational Feature Learning for Open Relation Extraction Xuming Hu, Chenwei Zhang, Yusong Xu, Lijie Wen, and Philip S. Yu In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing 2020 [Abstract] [BibTex] [Code]
    Open relation extraction is the task of extracting open-domain relation facts from natural language sentences. Existing works either utilize heuristics or distant-supervised annotations to train a supervised classifier over pre-defined relations, or adopt unsupervised methods with additional assumptions that have less discriminative power. In this work, we propose a self-supervised framework named SelfORE, which exploits weak, self-supervised signals by leveraging a large pretrained language model for adaptive clustering on contextualized relational features, and bootstraps the self-supervised signals by improving contextualized features in relation classification. Experimental results on three datasets show the effectiveness and robustness of SelfORE on open-domain Relation Extraction compared with competitive baselines.
    @inproceedings{hu2020selfore,
      abbr = {EMNLP},
      topic = {NLP},
      selected = {1},
      title = {SelfORE: Self-supervised Relational Feature Learning for Open Relation Extraction},
      author = {Hu, Xuming and Zhang, Chenwei and Xu, Yusong and Wen, Lijie and Yu, Philip S.},
      booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing},
      year = {2020},
      pdf = {https://arxiv.org/pdf/2004.02438.pdf},
      code = {https://github.com/THU-BPM/SelfORE}
    }
    
  2. EMNLP
    Dynamic Semantic Matching and Aggregation Network for Few-shot Intent Detection Hoang Nguyen, Chenwei Zhang, Congying Xia, and Philip S. Yu In Findings of the 2020 Conference on Empirical Methods in Natural Language Processing 2020 [Abstract] [BibTex] [Code]
    Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances. Although recent works demonstrate that multi-level matching plays an important role in transferring learned knowledge from seen training classes to novel testing classes, they rely on a static similarity measure and overly fine-grained matching components. These limitations inhibit generalizing capability towards Generalized Few-shot Learning settings where both seen and novel classes are co-existent. In this paper, we propose a novel Semantic Matching and Aggregation Network where semantic components are distilled from utterances via multi-head self-attention with additional dynamic regularization constraints. These semantic components capture high-level information, resulting in more effective matching between instances. Our multi-perspective matching method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances. We also propose a more challenging evaluation setting that considers classification on the joint all-class label space. Extensive experimental results demonstrate the effectiveness of our method.
    @inproceedings{nguyen2020semantic,
      abbr = {EMNLP},
      topic = {NLP},
      title = {Dynamic Semantic Matching and Aggregation Network for Few-shot Intent Detection},
      author = {Nguyen, Hoang and Zhang, Chenwei and Xia, Congying and Yu, Philip S.},
      booktitle = {Findings of the 2020 Conference on Empirical Methods in Natural Language Processing},
      year = {2020},
      pdf = {https://arxiv.org/pdf/2010.02481.pdf},
      code = {https://github.com/nhhoang96/Semantic_Matching}
    }
    
  3. CogMI
    Low-shot Learning in Natural Language Processing Congying Xia, Chenwei Zhang, Jiawei Zhang, Tingting Liang, Hao Peng, and Philip S. Yu In Proceedings of the Second IEEE International Conference on Cognitive Machine Intelligence: Vision Track 2020 [BibTex]
    @inproceedings{xia2020lowshot,
      abbr = {CogMI},
      topic = {NLP},
      title = {Low-shot Learning in Natural Language Processing},
      author = {Xia, Congying and Zhang, Chenwei and Zhang, Jiawei and Liang, Tingting and Peng, Hao and Yu, Philip S.},
      booktitle = {Proceedings of the Second IEEE International Conference on Cognitive Machine Intelligence: Vision Track},
      year = {2020},
      pdf = {}
    }
    
  4. TKDE
    KGGen: A Generative Approach for Incipient Knowledge Graph Population Hao Chen, Chenwei Zhang, Jun Li, Philip S. Yu, and Ning Jing IEEE Transactions on Knowledge and Data Engineering 2020 [HTML] [Abstract] [BibTex] [Code]
    Knowledge graph is becoming an indispensable resource for numerous AI applications. However, the knowledge graph often suffers from its incompleteness. Building a complete, high-quality knowledge graph is time-consuming and requires significant human annotation efforts. In this paper, we study the Knowledge Graph Population task, which aims at extending the scale of structured knowledge, with a special focus on reducing data preparation and annotation efforts. Previous works, mainly based on discriminative methods, build classifiers and verify candidate triplets that are extracted from texts, which heavily rely on the quality of data collection and co-occurrence of entities in the text. We introduce a generative perspective to approach this task. A generative model KGGEN is proposed, which samples from the learned data distribution for each relation and can generate triplets regardless of entity pair co-occurrence in the corpus. To further improve the generation quality while alleviating human annotation efforts, adversarial learning is adopted to not only encourage generating high quality triplets, but also give the model the ability to automatically assess the generation quality. Quantitative and qualitative experimental results conducted on two real-world generic knowledge graphs show that KGGEN generates novel and meaningful triplets with less human annotation compared with the state-of-the-art approaches.
    @article{chen2020kggen,
      abbr = {TKDE},
      topic = {Knowledge Graph},
      title = {KGGen: A Generative Approach for Incipient Knowledge Graph Population},
      author = {Chen, Hao and Zhang, Chenwei and Li, Jun and Yu, Philip S. and Jing, Ning},
      journal = {IEEE Transactions on Knowledge and Data Engineering},
      year = {2020},
      publisher = {IEEE},
      html = {https://ieeexplore.ieee.org/abstract/document/9158381},
      code = {https://github.com/hchen118/KGGen-master}
    }
    
  5. KDD
    Octet: Online Catalog Taxonomy Enrichment with Self-Supervision Yuning Mao, Tong Zhao, Andrey Kan, Chenwei Zhang, Xin Luna Dong, Christos Faloutsos, and Jiawei Han In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2020 [Abstract] [BibTex]
    Taxonomies have found wide applications in various domains, especially online for item categorization, browsing, and search. Despite the prevalent use of online catalog taxonomies, most of them in practice are maintained by humans, which is labor-intensive and difficult to scale. While taxonomy construction from scratch is considerably studied in the literature, how to effectively enrich existing incomplete taxonomies remains an open yet important research question. Taxonomy enrichment not only requires the robustness to deal with emerging terms but also the consistency between existing taxonomy structure and new term attachment. In this paper, we present a self-supervised end-to-end framework, Octet, for Online Catalog Taxonomy EnrichmenT. Octet leverages heterogeneous information unique to online catalog taxonomies such as user queries, items, and their relations to the taxonomy nodes while requiring no other supervision than the existing taxonomies. We propose to distantly train a sequence labeling model for term extraction and employ graph neural networks (GNNs) to capture the taxonomy structure as well as the query-item-taxonomy interactions for term attachment. Extensive experiments in different online domains demonstrate the superiority of Octet over state-of-the-art methods via both automatic and human evaluations. Notably, Octet enriches an online catalog taxonomy in production to 2 times larger in the open-world evaluation.
    @inproceedings{mao2020octet,
      abbr = {KDD},
      topic = {Graph Mining},
      title = {Octet: Online Catalog Taxonomy Enrichment with Self-Supervision},
      author = {Mao, Yuning and Zhao, Tong and Kan, Andrey and Zhang, Chenwei and Dong, Xin Luna and Faloutsos, Christos and Han, Jiawei},
      booktitle = {Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
      pages = {2247--2257},
      year = {2020},
      pdf = {https://arxiv.org/pdf/2006.10276.pdf}
    }
    
  6. KDD
    AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types Xin Luna Dong, Xiang He, Andrey Kan, Xian Li, Yan Liang, Jun Ma, Yifan Ethan Xu, Chenwei Zhang, Tong Zhao, Gabriel Blanco Saldana, and others In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2020 [Abstract] [BibTex] [Media]
    Can one build a knowledge graph (KG) for all products in the world? Knowledge graphs have firmly established themselves as valuable sources of information for search and question answering, and it is natural to wonder if a KG can contain information about products offered at online retail sites. There have been several successful examples of generic KGs, but organizing information about products poses many additional challenges, including sparsity and noise of structured data for products, complexity of the domain with millions of product types and thousands of attributes, heterogeneity across large number of categories, as well as large and constantly growing number of products. We describe AutoKnow, our automatic (self-driving) system that addresses these challenges. The system includes a suite of novel techniques for taxonomy construction, product property identification, knowledge extraction, anomaly detection, and synonym discovery. AutoKnow is (a) automatic, requiring little human intervention, (b) multi-scalable, scalable in multiple dimensions (many domains, many products, and many attributes), and (c) integrative, exploiting rich customer behavior logs. AutoKnow has been operational in collecting product knowledge for over 11K product types.
    @inproceedings{dong2020autoknow,
      abbr = {KDD},
      topic = {Knowledge Graph},
      selected = {1},
      title = {AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types},
      author = {Dong, Xin Luna and He, Xiang and Kan, Andrey and Li, Xian and Liang, Yan and Ma, Jun and Xu, Yifan Ethan and Zhang, Chenwei and Zhao, Tong and Blanco Saldana, Gabriel and others},
      booktitle = {Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
      pages = {2724--2734},
      year = {2020},
      pdf = {https://arxiv.org/pdf/2006.13473.pdf},
      media = {https://www.amazon.science/blog/building-product-graphs-automatically}
    }
    
  7. IJCAI
    Entity Synonym Discovery via Multipiece Bilateral Context Matching Chenwei Zhang, Yaliang Li, Nan Du, Wei Fan, and Philip S. Yu In IJCAI 2020 [Abstract] [BibTex] [Code] [Video]
    Being able to automatically discover synonymous entities in an open-world setting benefits various tasks such as entity disambiguation or knowledge graph canonicalization. Existing works either only utilize entity features, or rely on structured annotations from a single piece of context where the entity is mentioned. To leverage diverse contexts where entities are mentioned, in this paper, we generalize the distributional hypothesis to a multi-context setting and propose a synonym discovery framework that detects entity synonyms from free-text corpora with considerations on effectiveness and robustness. As one of the key components in synonym discovery, we introduce a neural network model SYNONYMNET to determine whether or not two given entities are synonyms of each other. Instead of using entity features, SYNONYMNET makes use of multiple pieces of context in which the entity is mentioned, and compares the context-level similarity via a bilateral matching schema. Experimental results demonstrate that the proposed model is able to detect synonym sets that are not observed during training on both generic and domain-specific datasets: Wiki+Freebase, PubMed+UMLS, and MedBook+MKG, with up to 4.16% improvement in terms of Area Under the Curve and 3.19% in terms of Mean Average Precision compared to the best baseline method. Code and data are available.
    @inproceedings{zhang2020entity,
      abbr = {IJCAI},
      topic = {Knowledge Graph},
      title = {Entity Synonym Discovery via Multipiece Bilateral Context Matching},
      author = {Zhang, Chenwei and Li, Yaliang and Du, Nan and Fan, Wei and Yu, Philip S.},
      booktitle = {IJCAI},
      year = {2020},
      pdf = {https://arxiv.org/pdf/1901.00056.pdf},
      code = {https://github.com/czhang99/SynonymNet},
      video = {https://www.ijcai.org/proceedings/2020/video/24954}
    }
    
  8. WWWJ
    Generative temporal link prediction via self-tokenized sequence modeling Yue Wang, Chenwei Zhang, Shen Wang, Philip S. Yu, Lu Bai, Lixin Cui, and Guandong Xu World Wide Web 2020 [Abstract] [BibTex]
    We formalize networks with evolving structures as temporal networks and propose a generative link prediction model, Generative Link Sequence Modeling (GLSM), to predict future links for temporal networks. GLSM captures the temporal link formation patterns from the observed links with a sequence modeling framework and has the ability to generate the emerging links by inferring from the probability distribution on the potential future links. To avoid overfitting caused by treating each link as a unique token, we propose a self-tokenization mechanism to transform each raw link in the network to an abstract aggregation token automatically. The self-tokenization is seamlessly integrated into the sequence modeling framework, which allows the proposed GLSM model to have the generalization capability to discover link formation patterns beyond raw link sequences. We compare GLSM with the existing state-of-the-art methods on five real-world datasets. The experimental results demonstrate that GLSM obtains future positive links effectively in a generative fashion while achieving the best performance (2-10% improvements on AUC) among other alternatives.
    @article{wang2020generative,
      abbr = {WWWJ},
      topic = {Graph Mining},
      author = {Wang, Yue and Zhang, Chenwei and Wang, Shen and Yu, Philip S. and Bai, Lu and Cui, Lixin and Xu, Guandong},
      journal = {World Wide Web},
      number = {4},
      pages = {2471--2488},
      title = {Generative temporal link prediction via self-tokenized sequence modeling},
      url = {https://doi.org/10.1007/s11280-020-00821-y},
      volume = {23},
      year = {2020},
      pdf = {https://arxiv.org/pdf/1911.11486.pdf}
    }
    
  9. HEALTHINF
    Med2Meta: Learning representations of medical concepts with meta-embeddings Shaika Chowdhury, Chenwei Zhang, Philip S. Yu, and Yuan Luo In 13th International Conference on Health Informatics, HEALTHINF 2020-Part of 13th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2020 2020 [Abstract] [BibTex]
    Distributed representations of medical concepts have been used to support downstream clinical tasks recently. Electronic Health Records (EHR) capture different aspects of patients’ hospital encounters and serve as a rich source for augmenting clinical decision making by learning robust medical concept embeddings. However, the same medical concept can be recorded in different modalities (e.g., clinical notes, lab results), with each capturing salient information unique to that modality, and a holistic representation calls for relevant feature ensemble from all information sources. We hypothesize that representations learned from heterogeneous data types would lead to performance enhancement on various clinical informatics and predictive modeling tasks. To this end, our proposed approach makes use of meta-embeddings, embeddings aggregated from learned embeddings. Firstly, modality-specific embeddings for each medical concept are learned with graph autoencoders. The ensemble of all the embeddings is then modeled as a meta-embedding learning problem to incorporate their correlating and complementary information through a joint reconstruction. Empirical results of our model on both quantitative and qualitative clinical evaluations have shown improvements over state-of-the-art embedding models, thus validating our hypothesis.
    @inproceedings{chowdhury2020med2meta,
      abbr = {HEALTHINF},
      topic = {Knowledge Graph},
      title = {Med2Meta: Learning representations of medical concepts with meta-embeddings},
      author = {Chowdhury, Shaika and Zhang, Chenwei and Yu, Philip S. and Luo, Yuan},
      booktitle = {13th International Conference on Health Informatics, HEALTHINF 2020-Part of 13th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2020},
      pages = {369--376},
      year = {2020},
      organization = {SciTePress},
      pdf = {https://arxiv.org/pdf/1912.03366.pdf}
    }
    
  10. arXiv
    CG-BERT: Conditional Text Generation with BERT for Generalized Few-shot Intent Detection Congying Xia, Chenwei Zhang, Hoang Nguyen, Jiawei Zhang, and Philip S. Yu arXiv preprint arXiv:2004.01881 2020 [Abstract] [BibTex]
    In this paper, we formulate a more realistic and difficult problem setup for the intent detection task in natural language understanding, namely Generalized Few-Shot Intent Detection (GFSID). GFSID aims to discriminate a joint label space consisting of both existing intents which have enough labeled data and novel intents which only have a few examples for each class. To approach this problem, we propose a novel model, Conditional Text Generation with BERT (CG-BERT). CG-BERT effectively leverages a large pre-trained language model to generate text conditioned on the intent label. By modeling the utterance distribution with variational inference, CG-BERT can generate diverse utterances for the novel intents even with only a few utterances available. Experimental results show that CG-BERT achieves state-of-the-art performance on the GFSID task with 1-shot and 5-shot settings on two real-world datasets.
    @article{xia2020cg,
      abbr = {arXiv},
      topic = {NLP},
      title = {CG-BERT: Conditional Text Generation with BERT for Generalized Few-shot Intent Detection},
      author = {Xia, Congying and Zhang, Chenwei and Nguyen, Hoang and Zhang, Jiawei and Yu, Philip S.},
      journal = {arXiv preprint arXiv:2004.01881},
      year = {2020},
      pdf = {https://arxiv.org/pdf/2004.01881.pdf}
    }
    
  11. arXiv
    Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering Ye Liu, Shaika Chowdhury, Chenwei Zhang, Cornelia Caragea, and Philip S. Yu arXiv preprint arXiv:2008.02434 2020 [Abstract] [BibTex]
    Healthcare question answering assistance aims to provide customer healthcare information, which widely appears in both Web and mobile Internet. The questions usually require the assistance to have proficient healthcare background knowledge as well as the reasoning ability on the knowledge. Recently, a challenge involving complex healthcare reasoning, the HeadQA dataset, has been proposed, which contains multiple choice questions authorized for the public healthcare specialization exam. Unlike most other QA tasks that focus on linguistic understanding, HeadQA requires deeper reasoning involving not only knowledge extraction, but also complex reasoning with healthcare knowledge. These questions are the most challenging for current QA systems, and the current performance of the state-of-the-art method is slightly better than a random guess. In order to solve this challenging task, we present a Multi-step reasoning with Knowledge extraction framework (MurKe). The proposed framework first extracts the healthcare knowledge as supporting documents from the large corpus. In order to find the reasoning chain and choose the correct answer, MurKe iterates between selecting the supporting documents, reformulating the query representation using the supporting documents, and computing an entailment score for each choice using the entailment model. The reformulation module leverages selected documents for missing evidence, which maintains interpretability. Moreover, we are striving to make full use of off-the-shelf pretrained models. With fewer trainable weights, the pretrained model can easily adapt to healthcare tasks with limited training samples. From the experimental results and ablation study, our system is able to outperform several strong baselines on the HeadQA dataset.
    @article{liu2020interpretable,
      abbr = {arXiv},
      topic = {NLP},
      title = {Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering},
      author = {Liu, Ye and Chowdhury, Shaika and Zhang, Chenwei and Caragea, Cornelia and Yu, Philip S.},
      journal = {arXiv preprint arXiv:2008.02434},
      year = {2020},
      pdf = {https://arxiv.org/pdf/2008.02434.pdf}
    }
    

2019

  1. Thesis
    Structured Knowledge Discovery from Massive Text Corpus Chenwei Zhang 2019 [BibTex]
    @phdthesis{zhang2019structured,
      abbr = {Thesis},
      title = {Structured Knowledge Discovery from Massive Text Corpus},
      author = {Zhang, Chenwei},
      year = {2019},
      school = {University of Illinois at Chicago},
      pdf = {https://arxiv.org/pdf/1908.01837.pdf}
    }
    
  2. ACL
    Joint Slot Filling and Intent Detection via Capsule Neural Networks Chenwei Zhang, Yaliang Li, Nan Du, Wei Fan, and Philip S. Yu In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019 [BibTex] [Poster] [Code]
    @inproceedings{zhang2019joint,
      abbr = {ACL},
      topic = {NLP},
      selected = {1},
      title = {Joint Slot Filling and Intent Detection via Capsule Neural Networks},
      author = {Zhang, Chenwei and Li, Yaliang and Du, Nan and Fan, Wei and Yu, Philip S.},
      booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
      pages = {5259--5267},
      year = {2019},
      pdf = {https://arxiv.org/pdf/1812.09471.pdf},
      poster = {https://drive.google.com/file/d/1rZpP-4WY7T8AtARXde7qZd5enV53yNOL/view},
      code = {https://github.com/czhang99/Capsule-NLU}
    }
    
  3. ACL
    Multi-grained Named Entity Recognition Congying Xia, Chenwei Zhang, Tao Yang, Yaliang Li, Nan Du, Xian Wu, Wei Fan, Fenglong Ma, and Philip S. Yu In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019 [BibTex] [Code]
    @inproceedings{xia2019multi,
      abbr = {ACL},
      topic = {NLP},
      title = {Multi-grained Named Entity Recognition},
      author = {Xia, Congying and Zhang, Chenwei and Yang, Tao and Li, Yaliang and Du, Nan and Wu, Xian and Fan, Wei and Ma, Fenglong and Yu, Philip S.},
      booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
      pages = {1430--1440},
      year = {2019},
      pdf = {https://arxiv.org/pdf/1906.08449.pdf},
      code = {https://github.com/congyingxia/Multi-Grained-NER}
    }
    
  4. CIKM
    Generative question refinement with deep reinforcement learning in retrieval-based QA system Ye Liu, Chenwei Zhang, Xiaohui Yan, Yi Chang, and Philip S. Yu In Proceedings of the 28th ACM International Conference on Information and Knowledge Management 2019 [BibTex]
    @inproceedings{liu2019generative,
      abbr = {CIKM},
      topic = {NLP},
      title = {Generative question refinement with deep reinforcement learning in retrieval-based QA system},
      author = {Liu, Ye and Zhang, Chenwei and Yan, Xiaohui and Chang, Yi and Yu, Philip S.},
      booktitle = {Proceedings of the 28th ACM International Conference on Information and Knowledge Management},
      pages = {1643--1652},
      year = {2019},
      pdf = {https://arxiv.org/pdf/1908.05604.pdf}
    }
    
  5. ICDM
    Competitive Multi-Agent Deep Reinforcement Learning with Counterfactual Thinking Yue Wang, Yao Wan, Chenwei Zhang, Lu Bai, Lixin Cui, and Philip S. Yu In 2019 IEEE International Conference on Data Mining (ICDM) 2019 [BibTex]
    @inproceedings{wang2019competitive,
      abbr = {ICDM},
      topic = {ML & Misc.},
      title = {Competitive Multi-Agent Deep Reinforcement Learning with Counterfactual Thinking},
      author = {Wang, Yue and Wan, Yao and Zhang, Chenwei and Bai, Lu and Cui, Lixin and Yu, Philip S.},
      booktitle = {2019 IEEE International Conference on Data Mining (ICDM)},
      pages = {1366--1371},
      year = {2019},
      organization = {IEEE},
      pdf = {https://arxiv.org/pdf/1908.04573.pdf}
    }
    
  6. IJCNN
    Missing entity synergistic completion across multiple isomeric online knowledge libraries Bowen Dong, Jiawei Zhang, Chenwei Zhang, Yang Yang, and Philip S. Yu In 2019 International Joint Conference on Neural Networks (IJCNN) 2019 [BibTex]
    @inproceedings{dong2019missing,
      abbr = {IJCNN},
      topic = {Knowledge Graph},
      title = {Missing entity synergistic completion across multiple isomeric online knowledge libraries},
      author = {Dong, Bowen and Zhang, Jiawei and Zhang, Chenwei and Yang, Yang and Yu, Philip S.},
      booktitle = {2019 International Joint Conference on Neural Networks (IJCNN)},
      pages = {1--8},
      year = {2019},
      organization = {IEEE},
      pdf = {https://arxiv.org/pdf/1905.06365.pdf}
    }
    
  7. WWW
    MCVAE: Margin-based Conditional Variational Autoencoder for Relation Classification and Pattern Generation Fenglong Ma, Yaliang Li, Chenwei Zhang, Jing Gao, Nan Du, and Wei Fan In The World Wide Web Conference 2019 [BibTex]
    @inproceedings{ma2019mcvae,
      abbr = {WWW},
      topic = {NLP},
      title = {MCVAE: Margin-based Conditional Variational Autoencoder for Relation Classification and Pattern Generation},
      author = {Ma, Fenglong and Li, Yaliang and Zhang, Chenwei and Gao, Jing and Du, Nan and Fan, Wei},
      booktitle = {The World Wide Web Conference},
      pages = {3041--3048},
      year = {2019},
      pdf = {http://www.personal.psu.edu/ffm5105/files/2019/www19.pdf}
    }
    
  8. arXiv
    Hierarchical Semantic Correspondence Learning for Post-Discharge Patient Mortality Prediction Shaika Chowdhury, Chenwei Zhang, Philip S. Yu, and Yuan Luo arXiv preprint arXiv:1910.06492 2019 [BibTex]
    @article{chowdhury2019hierarchical,
      abbr = {arXiv},
      topic = {Graph Mining},
      title = {Hierarchical Semantic Correspondence Learning for Post-Discharge Patient Mortality Prediction},
      author = {Chowdhury, Shaika and Zhang, Chenwei and Yu, Philip S. and Luo, Yuan},
      journal = {arXiv preprint arXiv:1910.06492},
      year = {2019},
      pdf = {https://arxiv.org/pdf/1910.06492.pdf}
    }
    
  9. arXiv
    Mixed Pooling Multi-View Attention Autoencoder for Representation Learning in Healthcare Shaika Chowdhury, Chenwei Zhang, Philip S. Yu, and Yuan Luo arXiv preprint arXiv:1910.06456 2019 [BibTex]
    @article{chowdhury2019mixed,
      abbr = {arXiv},
      topic = {Graph Mining},
      title = {Mixed Pooling Multi-View Attention Autoencoder for Representation Learning in Healthcare},
      author = {Chowdhury, Shaika and Zhang, Chenwei and Yu, Philip S. and Luo, Yuan},
      journal = {arXiv preprint arXiv:1910.06456},
      year = {2019},
      pdf = {https://arxiv.org/pdf/1910.06456.pdf}
    }
    

2018

  1. KDD
    On the generative discovery of structured medical knowledge Chenwei Zhang, Yaliang Li, Nan Du, Wei Fan, and Philip S. Yu In Proceedings of the 24th ACM SIGKDD international conference on Knowledge Discovery & Data Mining 2018 [BibTex] [Video]
    @inproceedings{zhang2018generative,
      abbr = {KDD},
      topic = {Knowledge Graph},
      title = {On the generative discovery of structured medical knowledge},
      author = {Zhang, Chenwei and Li, Yaliang and Du, Nan and Fan, Wei and Yu, Philip S.},
      booktitle = {Proceedings of the 24th ACM SIGKDD international conference on Knowledge Discovery \& Data Mining},
      pages = {2720--2728},
      year = {2018},
      pdf = {https://dl.acm.org/doi/pdf/10.1145/3219819.3220010},
      video = {https://www.youtube.com/watch?v=ZxmcsSKp0ko}
    }
    
  2. EMNLP
    Zero-shot User Intent Detection via Capsule Neural Networks Congying Xia*, Chenwei Zhang*, Xiaohui Yan, Yi Chang, and Philip S. Yu In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018 [BibTex] [Code] [Video]
    @inproceedings{xia2018zero,
      abbr = {EMNLP},
      topic = {NLP},
      selected = {1},
      title = {Zero-shot User Intent Detection via Capsule Neural Networks},
      author = {Xia*, Congying and Zhang*, Chenwei and Yan, Xiaohui and Chang, Yi and Yu, Philip S.},
      booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
      pages = {3090--3099},
      year = {2018},
      pdf = {https://arxiv.org/pdf/1809.00385.pdf},
      video = {https://vimeo.com/305945714},
      code = {https://github.com/congyingxia/ZeroShotCapsule}
    }
    
  3. WWW
    Multi-task pharmacovigilance mining from social media posts Shaika Chowdhury, Chenwei Zhang, and Philip S. Yu In Proceedings of the 2018 World Wide Web Conference 2018 [BibTex]
    @inproceedings{chowdhury2018multi,
      abbr = {WWW},
      topic = {NLP},
      title = {Multi-task pharmacovigilance mining from social media posts},
      author = {Chowdhury, Shaika and Zhang, Chenwei and Yu, Philip S.},
      booktitle = {Proceedings of the 2018 World Wide Web Conference},
      pages = {117--126},
      year = {2018},
      pdf = {https://arxiv.org/pdf/1801.06294.pdf}
    }
    
  4. Ant. Propag.
    DOA estimation based on deep neural networks with robustness to array imperfections Zhang-Meng Liu, Chenwei Zhang, and Philip S. Yu IEEE Transactions on Antennas and Propagation 2018 [HTML] [BibTex] [Code]
    @article{liu2018direction,
      abbr = {Ant. Propag.},
      topic = {ML & Misc.},
      title = {DOA estimation based on deep neural networks with robustness to array imperfections},
      author = {Liu, Zhang-Meng and Zhang, Chenwei and Yu, Philip S.},
      journal = {IEEE Transactions on Antennas and Propagation},
      volume = {66},
      number = {12},
      pages = {7315--7327},
      year = {2018},
      publisher = {IEEE},
      html = {https://ieeexplore.ieee.org/document/8485631},
      code = {https://github.com/LiuzmNUDT/DNN-DOA}
    }
    
  5. BigData
    Market Abnormality Period Detection via Co-movement Attention Model Yue Wang, Chenwei Zhang, Shen Wang, Philip S. Yu, Lu Bai, and Lixin Cui In 2018 IEEE International Conference on Big Data (Big Data) 2018 [HTML] [BibTex]
    @inproceedings{wang2018market,
      abbr = {BigData},
      topic = {Graph Mining},
      title = {Market Abnormality Period Detection via Co-movement Attention Model},
      author = {Wang, Yue and Zhang, Chenwei and Wang, Shen and Yu, Philip S. and Bai, Lu and Cui, Lixin},
      booktitle = {2018 IEEE International Conference on Big Data (Big Data)},
      pages = {1514--1523},
      year = {2018},
      organization = {IEEE},
      html = {https://ieeexplore.ieee.org/document/8621877}
    }
    
  6. BigData
    Data-driven blockbuster planning on online movie knowledge library Ye Liu, Jiawei Zhang, Chenwei Zhang, and Philip S. Yu In 2018 IEEE International Conference on Big Data (Big Data) 2018 [BibTex]
    @inproceedings{liu2018data,
      abbr = {BigData},
      topic = {Knowledge Graph},
      title = {Data-driven blockbuster planning on online movie knowledge library},
      author = {Liu, Ye and Zhang, Jiawei and Zhang, Chenwei and Yu, Philip S.},
      booktitle = {2018 IEEE International Conference on Big Data (Big Data)},
      pages = {1612--1617},
      year = {2018},
      organization = {IEEE},
      pdf = {https://arxiv.org/pdf/1810.10175.pdf}
    }
    
  7. ICBK
    Deep Co-Investment Network Learning for Financial Assets Yue Wang, Chenwei Zhang, Shen Wang, Philip S. Yu, Lu Bai, and Lixin Cui In 2018 IEEE International Conference on Big Knowledge (ICBK) 2018 [BibTex]
    @inproceedings{wang2018deep,
      abbr = {ICBK},
      topic = {Graph Mining},
      title = {Deep Co-Investment Network Learning for Financial Assets},
      author = {Wang, Yue and Zhang, Chenwei and Wang, Shen and Yu, Philip S. and Bai, Lu and Cui, Lixin},
      booktitle = {2018 IEEE International Conference on Big Knowledge (ICBK)},
      pages = {41--48},
      year = {2018},
      organization = {IEEE},
      pdf = {https://arxiv.org/pdf/1809.04227.pdf}
    }
    
  8. arXiv
    Finding similar medical questions from question answering websites Yaliang Li, Liuyi Yao, Nan Du, Jing Gao, Qi Li, Chuishi Meng, Chenwei Zhang, and Wei Fan arXiv preprint arXiv:1810.05983 2018 [BibTex]
    @article{li2018finding,
      abbr = {arXiv},
      topic = {NLP},
      title = {Finding similar medical questions from question answering websites},
      author = {Li, Yaliang and Yao, Liuyi and Du, Nan and Gao, Jing and Li, Qi and Meng, Chuishi and Zhang, Chenwei and Fan, Wei},
      journal = {arXiv preprint arXiv:1810.05983},
      year = {2018},
      pdf = {https://arxiv.org/pdf/1810.05983.pdf}
    }
    

2017

  1. BigData
    Bringing semantic structures to user intent detection in online medical queries Chenwei Zhang, Nan Du, Wei Fan, Yaliang Li, Chun-Ta Lu, and Philip S. Yu In 2017 IEEE International Conference on Big Data (Big Data) 2017 [BibTex]
    @inproceedings{zhang2017bringing,
      abbr = {BigData},
      topic = {NLP},
      title = {Bringing semantic structures to user intent detection in online medical queries},
      author = {Zhang, Chenwei and Du, Nan and Fan, Wei and Li, Yaliang and Lu, Chun-Ta and Yu, Philip S.},
      booktitle = {2017 IEEE International Conference on Big Data (Big Data)},
      pages = {1019--1026},
      year = {2017},
      organization = {IEEE},
      pdf = {https://arxiv.org/pdf/1710.08015.pdf}
    }
    
  2. ICDM
    BL-MNE: emerging heterogeneous social network embedding through broad learning with aligned autoencoder Jiawei Zhang, Congying Xia, Chenwei Zhang, Limeng Cui, Yanjie Fu, and Philip S. Yu In 2017 IEEE International Conference on Data Mining (ICDM) 2017 [BibTex]
    @inproceedings{zhang2017bl,
      abbr = {ICDM},
      topic = {Graph Mining},
      title = {BL-MNE: emerging heterogeneous social network embedding through broad learning with aligned autoencoder},
      author = {Zhang, Jiawei and Xia, Congying and Zhang, Chenwei and Cui, Limeng and Fu, Yanjie and Yu, Philip S.},
      booktitle = {2017 IEEE International Conference on Data Mining (ICDM)},
      pages = {605--614},
      year = {2017},
      organization = {IEEE},
      pdf = {https://arxiv.org/pdf/1711.09409.pdf}
    }
    
  3. KDD
    DeepMood: modeling mobile phone typing dynamics for mood detection Bokai Cao, Lei Zheng, Chenwei Zhang, Philip S. Yu, Andrea Piscitello, John Zulueta, Olu Ajilore, Kelly Ryan, and Alex D Leow In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2017 [BibTex] [Code] [Video]
    @inproceedings{cao2017deepmood,
      abbr = {KDD},
      topic = {ML & Misc.},
      title = {DeepMood: modeling mobile phone typing dynamics for mood detection},
      author = {Cao, Bokai and Zheng, Lei and Zhang, Chenwei and Yu, Philip S. and Piscitello, Andrea and Zulueta, John and Ajilore, Olu and Ryan, Kelly and Leow, Alex D},
      booktitle = {Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
      pages = {747--755},
      year = {2017},
      pdf = {https://arxiv.org/pdf/1803.08986.pdf},
      video = {https://www.youtube.com/watch?v=w1TfSp8NpfM},
      code = {https://www.cs.uic.edu/~bcao1/code/DeepMood.py}
    }
    
  4. CIKM
    Broad learning based multi-source collaborative recommendation Junxing Zhu, Jiawei Zhang, Lifang He, Quanyuan Wu, Bin Zhou, Chenwei Zhang, and Philip S. Yu In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management 2017 [BibTex]
    @inproceedings{zhu2017broad,
      abbr = {CIKM},
      topic = {Graph Mining},
      title = {Broad learning based multi-source collaborative recommendation},
      author = {Zhu, Junxing and Zhang, Jiawei and He, Lifang and Wu, Quanyuan and Zhou, Bin and Zhang, Chenwei and Yu, Philip S.},
      booktitle = {Proceedings of the 2017 ACM on Conference on Information and Knowledge Management},
      pages = {1409--1418},
      year = {2017},
      pdf = {http://www.ifmlab.org/files/paper/2017_cikm_paper2.pdf}
    }
    
  5. IEEE Access
    CHRS: cold start recommendation across multiple heterogeneous information networks Junxing Zhu, Jiawei Zhang, Chenwei Zhang, Quanyuan Wu, Yan Jia, Bin Zhou, and Philip S. Yu IEEE Access 2017 [BibTex]
    @article{zhu2017chrs,
      abbr = {IEEE Access},
      topic = {Graph Mining},
      title = {CHRS: cold start recommendation across multiple heterogeneous information networks},
      author = {Zhu, Junxing and Zhang, Jiawei and Zhang, Chenwei and Wu, Quanyuan and Jia, Yan and Zhou, Bin and Yu, Philip S.},
      journal = {IEEE Access},
      volume = {5},
      pages = {15283--15299},
      year = {2017},
      publisher = {IEEE},
      pdf = {https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7976276}
    }
    

2016 & Before

  1. WWW
    Mining user intentions from medical queries: A neural network based heterogeneous jointly modeling approach Chenwei Zhang, Wei Fan, Nan Du, and Philip S. Yu In Proceedings of the 25th International Conference on World Wide Web 2016 [BibTex] [Slides]
    @inproceedings{zhang2016mining,
      abbr = {WWW},
      topic = {NLP},
      title = {Mining user intentions from medical queries: A neural network based heterogeneous jointly modeling approach},
      author = {Zhang, Chenwei and Fan, Wei and Du, Nan and Yu, Philip S.},
      booktitle = {Proceedings of the 25th International Conference on World Wide Web},
      pages = {1373--1384},
      year = {2016},
      pdf = {http://gdac.uqam.ca/WWW2016-Proceedings/proceedings/p1373.pdf},
      slides = {https://drive.google.com/file/d/0B0NF2TxreW8hTUhRb1ljVldadlk/view}
    }
    
  2. CIKM
    Multi-source hierarchical prediction consolidation Chenwei Zhang, Sihong Xie, Yaliang Li, Jing Gao, Wei Fan, and Philip S. Yu In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management 2016 [BibTex] [Poster]
    @inproceedings{zhang2016multi,
      abbr = {CIKM},
      topic = {Graph Mining},
      title = {Multi-source hierarchical prediction consolidation},
      author = {Zhang, Chenwei and Xie, Sihong and Li, Yaliang and Gao, Jing and Fan, Wei and Yu, Philip S.},
      booktitle = {Proceedings of the 25th ACM International on Conference on Information and Knowledge Management},
      pages = {2251--2256},
      year = {2016},
      pdf = {https://arxiv.org/pdf/1608.03344.pdf},
      poster = {https://drive.google.com/file/d/0B0NF2TxreW8hZFJYaXlxSmN2NFE/view}
    }
    
  3. ICDM
    Augmented LSTM framework to construct medical self-diagnosis android Chaochun Liu, Huan Sun, Nan Du, Shulong Tan, Hongliang Fei, Wei Fan, Tao Yang, Hao Wu, Yaliang Li, and Chenwei Zhang In 2016 IEEE 16th International Conference on Data Mining (ICDM) 2016 [BibTex]
    @inproceedings{liu2016augmented,
      abbr = {ICDM},
      topic = {NLP},
      title = {Augmented LSTM framework to construct medical self-diagnosis android},
      author = {Liu, Chaochun and Sun, Huan and Du, Nan and Tan, Shulong and Fei, Hongliang and Fan, Wei and Yang, Tao and Wu, Hao and Li, Yaliang and Zhang, Chenwei},
      booktitle = {2016 IEEE 16th International Conference on Data Mining (ICDM)},
      pages = {251--260},
      year = {2016},
      organization = {IEEE},
      pdf = {http://web.cse.ohio-state.edu/~sun.397/docs/selfdiagnosis-icdm.pdf}
    }
    
  4. TBD
    Extracting medical knowledge from crowdsourced question answering website Yaliang Li, Chaochun Liu, Nan Du, Wei Fan, Qi Li, Jing Gao, Chenwei Zhang, and Hao Wu IEEE Transactions on Big Data 2016 [HTML] [BibTex]
    @article{li2016extracting,
      abbr = {TBD},
      topic = {Knowledge Graph},
      title = {Extracting medical knowledge from crowdsourced question answering website},
      author = {Li, Yaliang and Liu, Chaochun and Du, Nan and Fan, Wei and Li, Qi and Gao, Jing and Zhang, Chenwei and Wu, Hao},
      journal = {IEEE Transactions on Big Data},
      year = {2016},
      publisher = {IEEE},
      html = {https://ieeexplore.ieee.org/abstract/document/7572985}
    }
    
  5. KBS
    A new method to determine basic probability assignment using core samples Chenwei Zhang, Yong Hu, Felix TS Chan, Rehan Sadiq, and Yong Deng Knowledge-Based Systems 2014 [BibTex]
    @article{zhang2014new,
      abbr = {KBS},
      topic = {ML & Misc.},
      title = {A new method to determine basic probability assignment using core samples},
      author = {Zhang, Chenwei and Hu, Yong and Chan, Felix TS and Sadiq, Rehan and Deng, Yong},
      journal = {Knowledge-Based Systems},
      volume = {69},
      pages = {140--149},
      year = {2014},
      publisher = {Elsevier}
    }