Criminal network activities have grown in both complexity and frequency, particularly with the advent of social media and modern telecommunication systems. Law enforcement agencies therefore need advanced criminal network analysis (CNA) tools capable of rapidly uncovering probable key hidden relationships (links/edges) and players (nodes) in order to anticipate, undermine and cripple organised crime syndicates and their activities. Link prediction models for network-oriented domains are built on Social Network Analysis (SNA) methods and models. The key objective of this research is to develop a link prediction model that fuses metadata (i.e. environment data sources such as arrest warrants, judicial judgements, wiretap records and police station proximity) with a time-evolving criminal dataset, so that the model reflects real-world situations and the quality of link prediction improves. A review of related work shows that most existing models are built with classical machine learning (ML) techniques such as the support vector machine (SVM), without metadata fusion. The problem with classical ML techniques is the lack of a domain dataset large enough for training: compared to social network datasets, criminal network datasets tend by nature to be much smaller. In view of this, a deep reinforcement learning (DRL) technique, which can improve model training using self-generated data, is used to construct the model. A purely time-evolving DRL model without metadata fusion (TDRL-CNA) is designed as a baseline for comparison with the metadata fusion model (FDRL-CNA). The experimental results show that the FDRL-CNA model predicts new and recurrent links more accurately than the baseline TDRL-CNA model, which does not fuse data from different sources.