Digital Libraries (cs.DL)

  • PDF
    Finding hot topics in scholarly fields can help researchers keep up with the latest concepts, trends, and inventions in their field of interest. Because complete large-scale scholarly data is rare, earlier studies approached this problem through manual topic extraction from a limited number of domains, focusing on a single feature such as coauthorship or citation relations. Given the compromised effectiveness of such predictions, in this paper we use a real scholarly dataset from Microsoft Academic Graph, which provides more than 12000 topics in the field of Computer Science (CS), including 1200 venues, 14.4 million authors, 30 million papers and their citation relations from 1950 to the present. Aiming to find the topics that will trend in the CS area, we formalize a hot topic prediction problem in which, with joint consideration of both inter- and intra-topical influence, 17 different scientific features are extracted to describe topic status comprehensively. Leveraging all 17 features, we observe good accuracy in forecasting topic scale after 5 and 10 years, with R² values of 0.9893 and 0.9646, respectively. Interestingly, our prediction suggests that the maximum value matters in finding hot topics in scholarly fields, primarily in three respects: (1) the maximum value of each factor, such as authors' maximum h-index and largest citation number, provides three times as much information for prediction as the average value; (2) the mutual influence between the most correlated topics serves as the most telling factor in long-term topic trend prediction, suggesting that topics currently exhibiting the maximum growth rates will drive their correlated topics to become hot in the future; (3) we predict the top 100 fastest-growing (maximum growth rate) topics that will potentially attract the most attention in the CS area over the next 5 years.
  • PDF
    Since the MEDLINE database was released, the number of documents indexed in it has risen every year. Several tools have been developed by the National Institutes of Health (NIH) to query this corpus of scientific publications. However, given advances in big data, text mining and data science, an option to build a local relational database containing all metadata available on MEDLINE would be truly useful for exploiting these resources optimally. MEDOC (MEdline DOwnloading Contrivance) is a Python program designed to download data from an FTP server and to load all extracted information into a local MySQL database. It took MEDOC 4 days and 17 hours to load the 26 million documents available on this server onto a standard computer. This indexed relational database allows the user to build complex and rapid queries. All fields can thus be searched for desired information, a task that is difficult to accomplish through the PubMed graphical interface. MEDOC is free and publicly available at
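
The kind of query that a local, indexed relational copy of bibliographic metadata makes possible can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names, column names, and rows are hypothetical placeholders, not MEDOC's actual schema or real MEDLINE records.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE article (
            pmid     INTEGER PRIMARY KEY,
            title    TEXT,
            journal  TEXT,
            pub_year INTEGER
        );
        CREATE TABLE author (
            pmid      INTEGER REFERENCES article(pmid),
            last_name TEXT
        );
        -- An index on the searched column keeps lookups fast on large tables.
        CREATE INDEX idx_author_last_name ON author(last_name);
        """
    )
    # Made-up placeholder rows, for illustration only.
    conn.executemany(
        "INSERT INTO article VALUES (?, ?, ?, ?)",
        [
            (1, "Placeholder title A", "Journal X", 2005),
            (2, "Placeholder title B", "Journal Y", 2012),
            (3, "Placeholder title C", "Journal X", 2016),
        ],
    )
    conn.executemany(
        "INSERT INTO author VALUES (?, ?)",
        [(1, "Doe"), (2, "Doe"), (2, "Roe"), (3, "Roe")],
    )
    return conn


def titles_by_author_since(conn, last_name, year):
    """Return (pmid, title) pairs for one author from a given year onward."""
    query = """
        SELECT a.pmid, a.title
        FROM article AS a
        JOIN author AS au ON au.pmid = a.pmid
        WHERE au.last_name = ? AND a.pub_year >= ?
        ORDER BY a.pmid
    """
    return conn.execute(query, (last_name, year)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(titles_by_author_since(db, "Roe", 2010))
```

A query that joins author and article fields and filters on both at once is the sort of combined search the abstract describes as difficult through a graphical interface; with a real MySQL server the same SQL would be sent through a MySQL client library instead of sqlite3.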

Recent comments

lucy.vanderwende May 07 2015 16:13 UTC

The authors will want to look at work that Simone Teufel has done, in particular her Argumentative Zoning, which discusses the stance that the paper author takes with respect to the citations in that paper.

Steve Flammia Feb 05 2013 22:31 UTC

Here's our man:
The links on the left are in Arabic, but some of them still have English text when you click through.

He also has an El Naschie-like photo montage of himself posing with famous scientists:

He is a prolific autho

Māris Ozols Feb 05 2013 15:57 UTC

This one is an interesting read...

It provides a novel way of measuring the impact factor and h-index as a percentage. In this way we can compare different journals and authors on an absolute scale! The only drawback is that the new method works only for Arabic journals.

Needless to say, the p

Earl Campbell Feb 06 2013 07:51 UTC

There is certainly some originality in the research of Abdel-Aty. For instance, he seems to be the first person to have realised that the concept of entanglement sudden death can be inverted to study entanglement sudden *birth*.

Earl Campbell Feb 06 2013 07:57 UTC

Also, it seems that Barry Sanders has had some interaction with this Abdel-Aty before. According to:
Barry used to be managing editor of the AMIS journal, but is no longer involved in the journal!