Title: Predicting the popularity of online articles with random forests
Authors: Vanwinckelen, Gitte; Meert, Wannes
Issue Date: Sep-2014
Host Document: pages:1-6
Conference: ECML/PKDD Workshop on Predictive Web Analytics, Nancy, France, 19 September 2014
Article number: 4
Abstract: In this paper, we describe our submission to the predictive
web analytics Discovery Challenge at ECML/PKDD 2014.
The main goal of the challenge was to predict the number of
visitors of a web page 48 hours in the future after observing
this web page for an hour. An additional goal was to predict
the number of times the URL appeared in a tweet on
Twitter and the number of times a Facebook message containing
the URL was liked. We present an analysis of the
time series data generated by the Chartbeat web analytics
engine, which was made available for this competition, and
the approach we used to predict page visits. Our model is
based on random forest regression and learned on a set of
features derived from the given time series data to capture
the expected amount of visits, rate of change and temporal
effect. Our approach won second place for predicting the
number of visitors and the number of Facebook likes, and
first place for predicting the number of tweets.
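
Illustrative sketch (not taken from the paper): the abstract names three kinds of features derived from the first hour of observation, namely the expected amount of visits, the rate of change, and the temporal effect. The Python below shows one way such features might be computed. The function names, the feature names, the binned input format, and the choice of a least-squares slope are all assumptions made for illustration; the paper's actual feature set may differ.

```python
from datetime import datetime


def slope(values):
    """Least-squares slope of values against their index (a rate-of-change proxy)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def extract_features(visits, first_seen):
    """Build a hypothetical feature row from one hour of binned visit counts.

    visits: non-empty list of visit counts per time bin (bin width is an assumption).
    first_seen: datetime at which observation of the page started.
    """
    return {
        "total_visits": sum(visits),           # level: visits seen so far
        "last_bin_visits": visits[-1],         # level at the end of the hour
        "visit_slope": slope(visits),          # rate of change
        "hour_of_day": first_seen.hour,        # temporal effect
        "day_of_week": first_seen.weekday(),   # temporal effect (0 = Monday)
    }


features = extract_features([3, 5, 8, 13, 21, 34], datetime(2014, 9, 19, 14, 30))
```

Rows like this could then be passed to any random forest regressor, for example scikit-learn's RandomForestRegressor, with the 48-hour visit count as the target.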
Publication status: published
KU Leuven publication type: IC
Appears in Collections: Informatics Section

Files in This Item:
File: discovery_challenge.pdf
Description: Predicting the popularity of online articles with random forests
Status: Published
Size: 1244Kb
Format: Adobe PDF

