-
arXiv:1303.5367
Taming the zoo - about algorithms implementation in the ecosystem of Apache Hadoop
Abstract: Content Analysis System (CoAnSys) is a research framework for mining scientific publications using Apache Hadoop. This article describes the algorithms currently implemented in CoAnSys including classification, categorization and citation matching of scientific publications. The size of the input data classifies these algorithms in the range of big data problems, which can be efficiently solved on…
Submitted 16 March, 2014; v1 submitted 21 March, 2013; originally announced March 2013.
Comments: This paper (with changed content) appeared under the title "Content Analysis of Scientific Articles in Apache Hadoop Ecosystem" in "Intelligent Tools for Building a Scientific Information Platform: From Research to Implementation", "Studies in Computational Intelligence", Volume 541, 2014, http://link.springer.com/book/10.1007/978-3-319-04714-0
ACM Class: H.3.7
-
arXiv:1303.5234
How to perform research in Hadoop environment not losing mental equilibrium - case study
Abstract: Conducting a research in an efficient, repetitive, evaluable, but also convenient (in terms of development) way has always been a challenge. To satisfy those requirements in a long term and simultaneously minimize costs of the software engineering process, one has to follow a certain set of guidelines. This article describes such guidelines based on the research environment called Content Analysis…
Submitted 16 March, 2014; v1 submitted 21 March, 2013; originally announced March 2013.
Comments: This paper (with changed content) appeared under the title "Chrum: The Tool for Convenient Generation of Apache Oozie Workflows" in "Intelligent Tools for Building a Scientific Information Platform: From Research to Implementation", "Studies in Computational Intelligence", Volume 541, 2014, http://link.springer.com/book/10.1007/978-3-319-04714-0
ACM Class: H.3.7