Applying multiple kernel learning and support vector machine for solving the multicriteria and nonlinearity problems of traffic flow prediction
Authors
Chenyun Yu,
Corresponding author
Department of Civil and Architectural Engineering, City University of Hong Kong, Hong Kong
Correspondence to: Chenyun Yu, Department of Civil and Architectural Engineering, City University of Hong Kong, Hong Kong. E-mail: sharonyun1202@yahoo.com.cn
Accurate and reliable traffic flow prediction is important to the success of a transportation project because it directly determines the overall approach to funding and the level of tolls that need to be charged during operation, and therefore affects both the interests and the financial risk of the project owner or investor. Uncertainty in traffic flow can cause revenue risk, which may bring financial problems to the private sector. It may also lead to an unreasonable increase in toll fees or a prolonged concession period, and as a result affect the users' willingness to pay and the expected income of the project owner (usually the government). Additionally, as Delmon [1] mentioned, the private sector bears the risk arising from traffic flow prediction in the transport management of road projects, which is also one of the major weaknesses of transport projects. Furthermore, the aim of traffic flow analysis is to develop a model or system that enables vehicles to reach their destination in the shortest possible time using the maximum service capacity, which is known as traffic assignment. Before moving on to traffic assignment, an estimation of the traffic trips is critical. Therefore, reliable traffic flow prediction can assist the decision-making process of traffic assignment and help decision makers to manage the revenue risk, so as to reduce the financial risk.
Traffic flow prediction is a complex problem because of its nonlinear and multicriteria nature. However, decision makers usually perform this task on the basis of their own experience, knowledge, expertise, and historical data, so the outcomes can differ significantly from one person to another. This limitation weakens the accuracy and efficiency of solving the traffic flow problem. Meanwhile, traffic flow prediction is regarded as a dynamic problem incorporating various influential factors such as toll fee level and competition from other services. Hence, the selection, classification, and ranking of the variables need to be addressed. Additionally, the linear models in existing research studies cannot handle the nonlinear nature of the problem, and thus nonlinear models are needed. Furthermore, the qualitative factors should be converted into quantitative ones, so as to standardize all the factors. Because not all the variables are likely to be equally significant, the data need to be scaled and normalized before being used for model training, after which advanced methods are required to assign weights to these variables.
Recently, various research works have addressed traffic flow-related problems, such as traffic assignment models [2, 3], development of traffic flow models and control [4-7], traffic signal coordination or optimization [8, 9], and traffic flow prediction [10, 11, 13]. For the traffic assignment model, as the origin–destination demand is time varying at the peak periods of congestion, dynamic assignment models are needed [2]. Szeto et al. [3] propose a multiclass dynamic traffic assignment problem that considers the random evolution of traffic states. Previous research on prediction mainly concentrated on linear and nonlinear models. Wild [35] studied a forecasting method based on classified historical patterns for traffic volume, in which the developed forecasting procedure has some advantages compared with conventional approaches. A combination approach based on principal component analysis and a combined neural network (CNN) was designed for volume forecasting, in which the relationship between the main affecting factors and the forecasted volume was considered [11]. On the basis of the route-changing behavior of end-users, Lee [12] proposed a scheduling model to calculate the traffic delay of vehicles by microscopic simulation. Also, Chen and Muller [13] stated that accurate traffic flow forecasting is beneficial to road network management and compared neural networks and statistical models for traffic volume forecasting, finding that the “simple dynamic model” had the best forecasting performance. Meanwhile, Zhang and Liu [10] employed a one-step-ahead out-of-sample method to predict peak and nonpeak traffic, combining several linear techniques. Each method has its own pros and cons with regard to different application purposes.
The prediction process deals with the relationship between the predicted traffic flow and its influential factors, which is regarded as a nonlinear problem, and thus linear models cannot handle it as well as nonlinear models. Among the nonlinear models, the artificial neural network (ANN) is viewed as the most commonly used method and can minimize the errors of the coefficient adjustment of comparable transactions. However, apart from a long training time, the major drawback of ANN is that it may converge to a local minimum solution [14], which could affect the quality of outcomes. The support vector machine (SVM) is concerned primarily with learning and curve fitting [15] and is widely applied to prediction and forecasting. According to Yonas et al. [16], the merit of SVM is that, owing to Mercer's conditions on the kernels, the corresponding optimization problems are convex and have no local minima. However, recent developments reported in the literature [17] on SVM and other kernel methods have demonstrated the need to consider multiple kernel methods. Therefore, it is worth investigating whether multiple kernel learning (MKL) can improve traffic flow prediction so as to enhance the decision-making process. In addition, Yu et al. [18] developed a hybrid model, based on the SVM technique, to predict bus arrival times. Table 1 compares these methods with regard to their advantages, limitations, and computing software.
Table 1. Comparison of the prediction methods.
Methods | Advantages | Limitations | Computing software
Expert system | Based on knowledge elicitation; simulation of expertise | Lack of robustness and rigidity | No need
Linear methods | Fast training time; more robust than expert system | Not stable; cannot deal with nonlinear problems | “SPSS”
ANN | Robust towards noisy data and hence suitable for construction industry applications; not rule-based and hence can handle wide issues, e.g. unknown, complex, and dynamic; relevant hybrid constructs can improve the neural network results, e.g. combinations such as rules and fuzzy approaches; identifies underlying trends | Long training time; not stable; black box approach; local minima problem | “MATLAB” computing program
SVM | Solves the local minima problem of ANN; more stable compared with ANN | Parameter debugging is based on experience | “LIBSVM” system
MKL | Describes the problem from multiple perspectives | (Not mentioned in the literature sources) | “MATLAB” computing program
ANN, artificial neural networks; SVM, support vector machine; MKL, multiple kernel learning.
To develop the traffic flow prediction model, two tiers are integrated in this research (Figure 1). The first tier investigates the key factors affecting the traffic volume, whereas the second tier sets up a prediction model and verifies it through a case study. The factors were selected and summarized on the basis of previous research works, and then classified and further defined. Because some factors are quantitative and others are qualitative, a five-point scale method was employed to bring them to the same standard. Thus, the output of the first tier is the input of the second tier. In the second tier, SVM and MKL were applied for traffic flow prediction, with the influential factors and the traffic flow as the input and output variables, respectively. To verify the proposed model, the traffic volume data of a tunnel project in Hong Kong from 2000 to 2010 were collected.
2 VARIABLE OF TRAFFIC FLOW PREDICTION
2.1 A summary of the influential factors
Various factors are involved in the risk management of infrastructure projects. Revenue risk is related to traffic shortfall, the volatility of prices, or the demand for the target product and service [19], whereas income risk is associated with alternative routes and the connecting road network [20]. According to Lam [21], toll roads face a daunting risk of inadequate traffic volume, and Lang [22] addressed the uncertainty over the quantity of demand and the price risk in toll road projects. Meanwhile, the operator of the Channel Tunnel faces foreign exchange exposure risk and demand change risk [23]. Moreover, network links [24], capacity of the product/the road [24, 25], market competition/change [25, 26], policy change [26], and traffic flow [25, 27] also affect the toll price to some degree. For the factors directly affecting the traffic flow, Ng et al. [28] noted that traffic volume changes depend on the availability of alternatives and on economic growth, and that the traffic volume may vary in different stages because of competition from old or new alternatives or an increase in maintenance cost. Demographic change, demand shift, competition, cost increase (maintenance), and willingness to pay should also be taken into consideration in traffic forecasts, so as to reduce the traffic flow risk [1]. The research of Tang et al. [4] illustrates that good road conditions can increase the speed and volume of traffic flow, whereas bad road conditions reduce them. Also, Zhou et al. [25] pointed out that traffic volume is related to alternative routes, convenience, and economic recession, whereas traffic flow forecasts are affected by population growth, economic growth, private car growth, and competition from alternatives [22]. The factors mentioned above can be summarized further. In Hong Kong, when the toll fees increase, the traffic volume declines.
For example, in 1998, the toll fees of the Eastern Harbor Crossing (EHC) were increased by 3 to 15 dollars for different vehicles, and the traffic volume declined from 31 million vehicles in 1997 to 26 million, a sharp downward trend. So, the factors influencing the revenue level can also affect traffic volume. Table 2 shows the factors affecting the revenue level and the volume of a transportation project under a Build–Operate–Transfer contract.
Table 2. Factors affecting the revenue level and the volume of transportation project under Build – Operate – Transfer mode (I).
Convenience (better access at both ends to road and highway networks)
Economic growth
Maintenance cost
Cost increase (maintenance)
Population growth
Economic recession
Demographic changes
Economic growth
Demand shift
Private car growth
Willingness to pay
Competition
2.2 Variables selection
On the basis of the factors summarized in Table 2, two dimensions are integrated: factors affecting the volume and factors affecting the revenue (toll) level. Among the factors affecting the revenue level, “traffic shortfall” and “inadequate traffic volume” both express a change in traffic flow, so both can be classified as “traffic flow.” In the same way, “demand risk,” “demand for products and service sold,” “uncertainty over the quantity of the demand,” and “demand change” are all related to demand, so these factors can be grouped as “demand for services.” Moreover, “network links” and “insufficient connecting road network” are both associated with road network links, and thus they can be combined. Besides, “the volatility of prices” and “price risk” indicate a change in the toll fee, so these two factors can be categorized as toll fee level. In conclusion, nine factors affect the revenue level: traffic flow, demand for services, market competition, network links, toll fee level, capacity of the road, market changes, policy changes, and foreign exchange exposure.
There are also nine factors affecting the volume: competition of alternative services, maintenance cost, economic growth/recession, demographic/population changes, demand shift, willingness to pay, toll adjustment mechanism, private car growth, and convenience. Because the factor “cost increase” arises mainly in the operation period, it can be categorized as maintenance cost, whereas “demographic changes” and “population growth” are classified as demographic/population changes. Because “economic growth” and “economic recession” are both associated with economic change, these two factors can be merged into one factor, “economic growth/recession.” Meanwhile, toll fees and traffic volume interact: the traffic flow changes with the toll price, so the traffic volume change can also be represented by the toll fee level.
2.3 Variables classification
Table 3 shows the classification of factors affecting the traffic volume. On the basis of the aforementioned summary, 13 factors affect traffic volume. These factors can be classified into two categories: internal factors and external factors. Internal factors include capacity of the services, maintenance cost, toll fee level, and toll adjustment mechanism, whereas external factors can be further divided into microeconomic and macroeconomic factors. Microeconomic factors comprise network links/convenience and competition from alternative services. The former refers to the ability to link with existing services, which brings more convenience; as a result, connecting road network and convenience can be categorized as network links/convenience. Both market competition and alternatives refer to the competitiveness of the target service, which may be affected by other similar services, so these factors can be represented by competition from alternative services. According to Shen and Wu [26], market changes are related to alternatives; therefore, this factor can also be attributed to competition from alternative services. On the other hand, macroeconomic factors consist of economic recession/growth, policy change, foreign exchange exposure, private car growth, demographic/population change, and willingness to pay. In addition, demand can be reflected through private car growth, demographic/population change, and willingness to pay, so these three factors will be used for the subsequent analysis.
Table 3. Classification of factors affecting the traffic volume.
Category | Factors | Total (number of references a to k marking the factor)
Internal factors | Capacity of the product | 2
Internal factors | Maintenance cost | 2
Internal factors | Toll fee level | 5
Internal factors | Toll adjustment mechanism | 1
External factors (micro) | Network links/convenience (better access at both ends to road and highway networks) | 3
External factors (micro) | Competition from alternative service | 6
External factors (macro) | Economic recession/growth | 3
External factors (macro) | Foreign exchange exposure | 1
External factors (macro) | Demand of the service | 4
External factors (macro) | Demographic change (population growth) | 2
External factors (macro) | Private car growth | 1
External factors (macro) | Policy change | 1
External factors (macro) | Willingness to pay | 1
Keys: a, Saleh and Sammer [24]; b, Zhou et al. [25]; c, Ng [28]; d, Shen and Wu [26]; e, Delmon [1]; f, Grimsey and Lewis [19]; g, Zhang and Kumaraswamy [27]; h, Lam [21]; i, Lang [22]; j, Kreydieh [23]; k, Walker and Smith [20].
2.4 Normalization of the variables
As summarized in Section 2.3, 13 factors affect traffic flow, classified into internal factors and external (microeconomic and macroeconomic) factors. Some of these factors are quantitative, whereas others are qualitative and should be quantified before model training. Because the variables differ in significance, the data need to be scaled after quantification. The use of scales in attitude testing has been well established for several decades, and Stevens [29] refers to the use of numerically based scales. The use of the five-point scale method in many studies [30] indicates that this method can efficiently reflect the meaning of subjective judgments.
The quantitative factors, such as the capacity of the service, maintenance cost, toll fee level, network links/convenience (better access at both ends to road and highway networks), competition from alternative services, private car growth, demographic change, and economic recession/growth, can be employed directly. The qualitative factors, such as demand of the service, willingness to pay, toll adjustment mechanism, foreign exchange exposure, and policy change, should be quantified before model training. Table 4 shows the quantification of each variable, and Table 5 describes the scaling assumption for each variable.
Table 4. Quantification of each factor.
Variable | Quantification
Capacity of the road | Traffic flow/design capacity (V/C) is used as the indicator.
Maintenance cost | The operation period of the service is used for this variable.
Toll fee level | The toll fee is used for this variable.
Toll adjustment mechanism | Give “1” for a service with a toll adjustment mechanism and “0” if not.
Network links | The location and the connecting routes of the service.
Competition from alternative services | The number of alternatives.
Foreign exchange exposure | Give “1” for a toll with foreign exchange exposure and “0” if not.
Economic growth/recession | The percentage change of GDP.
Policy change | Give “1” with policy change and “0” if not.
Demographic/population change | The percentage change of population.
Private car growth | The percentage change of private cars licensed.
Willingness to pay | The level of satisfaction with the toll fee.
GDP, Gross Domestic Product.
Table 5. Assumption for each variable.
Variables | Assumptions | Scale 1 | Scale 2 | Scale 3 | Scale 4 | Scale 5
Capacity of the services | Higher capacity → more traffic flow | <25% or >125% | 25–80% or 120–125% | 80–95% or 110–120% | 95–105% | 105–110%
Maintenance cost | Less maintenance cost → lower toll fee → more traffic flow | | | | |
Network links | Better location and connecting routes → more traffic flow | Worst location and connecting routes with lower usage | Bad location and connecting routes with low usage | Neutral | Good location and connecting routes with high usage | Better location and connecting routes with higher usage
Competition from alternative services | Fewer alternative services → more traffic flow | >3 | 3 | 2 | 1 | 0
Economic growth/recession | Higher GDP → more traffic flow | ≥−10% and <−5% | ≥−5% and <5% | ≥5% and <10% | ≥10% and <20% | ≥20%
Foreign exchange exposure | No foreign exchange exposure → stable toll fee and traffic flow | With (bad) | With | Without | With | With (good)
Policy change | Stable policy → stable traffic flow | With (bad) | With | Without | With | With (good)
Private car growth | More private cars → more traffic flow | ≥0% and <200% | ≥200% and <400% | ≥400% and <600% | ≥600% and <800% | ≥800%
Demographic/population change | More population → more traffic flow | ≥0% and <2% | ≥2% and <4% | ≥4% and <6% | ≥6% and <8% | ≥8%
Willingness to pay | Lower toll fee → more traffic flow | ≥10% and <15% | ≥15% and <20% | ≥20% and <25% | ≥25% and <30% | ≥30%
GDP, Gross Domestic Product. ^{a} PC, private cars, taxis, and motorcycles; ^{b} BUS, double-decked buses, single-decked buses, and light buses; ^{c} GV, goods vehicles.
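The five-point scaling of Table 5 can be illustrated with a short Python sketch. The function and constant names are hypothetical, and the treatment of each bin as lower bound inclusive and upper bound exclusive is an assumption, as is the mapping of any value below the first bin to scale 1. Only the bin edges for GDP change and population change are taken from Table 5.

```python
from bisect import bisect_right

# Lower bounds (in percent) of scale points 2 to 5, read from Table 5.
# Assumption: each bin includes its lower bound and excludes its upper bound.
GDP_CHANGE_BOUNDS = [-5.0, 5.0, 10.0, 20.0]
POPULATION_CHANGE_BOUNDS = [2.0, 4.0, 6.0, 8.0]


def five_point_scale(value, lower_bounds):
    """Map a percentage change to a score from 1 to 5.

    Values below the first lower bound receive 1 (an assumption for
    values that fall outside the lowest bin of Table 5).
    """
    return 1 + bisect_right(lower_bounds, value)


def binary_indicator(present):
    """Encode a yes/no factor as 1 or 0, as described in Table 4."""
    return 1 if present else 0


if __name__ == "__main__":
    # Hypothetical yearly values, for illustration only.
    print(five_point_scale(3.0, GDP_CHANGE_BOUNDS))         # GDP grew by 3%
    print(five_point_scale(0.5, POPULATION_CHANGE_BOUNDS))  # population grew by 0.5%
    print(binary_indicator(True))                           # toll adjustment mechanism exists
```

Keeping the bin edges in plain lists makes it easy to add the remaining variables of Table 5 without changing the scoring function.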
3 MODEL DESIGN
3.1 Support vector machine algorithm
In brief, the learning process by which SVM constructs decision functions is represented by a two-layer structure, which is similar to that of a back-propagation neural network. However, the learning algorithm of SVM differs from that of ANN, because SVM is trained with optimization theory that minimizes misclassification on the basis of statistical learning theory. The first layer selects the basis K(x, x_{i}), i = 1,…, N and the number of support vectors from the given set of bases defined by the kernel. The second layer constructs the optimal hyperplane in the corresponding feature space [31]. The input pattern (for which a prediction should be made) is mapped into feature space by a map ϕ. Then, dot products are computed with the images of the training patterns under the map ϕ; this corresponds to evaluating kernel functions at locations K(x_{i}, x_{j}). Finally, the dot products are added up using the weights α_{i} − α_{i}^{*}. This sum plus the constant term b yields the final prediction output.
For a brief review of SVM, we consider a regression function f(x) that is estimated by using a hyperplane w on the basis of training patterns {x_{i}, y_{i}}, i = 1,…, l, x_{i} ∈ R^{d}, y_{i} ∈ R. The function is:
$$f(x) = w \cdot \phi(x) + b \tag{1}$$
where w is the weight, b is the threshold, and ϕ(x) is the nonlinear mapping function. By using ϕ(x), SVM maps the input space into a high-dimensional feature space and then constructs an optimal separating hyperplane in the new space, making the data linearly separable. By the support vector regression (SVR) principle, the generalization accuracy is optimized over the empirical error and the flatness of the regression function, which is guaranteed by a small w. Therefore, the objective of SVR is to include the training patterns inside an ε-insensitive tube (ε-tube) while keeping the norm ‖w‖^{2} as small as possible. The coefficients are estimated by minimizing the regularized risk function:
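In the standard ε-insensitive SVR formulation, which is assumed here as the usual textbook form of this risk function and its constraints, the problem can be written as:

$$\min_{w,\, b,\, \xi,\, \xi^{*}} \; \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{l} \left(\xi_{i} + \xi_{i}^{*}\right)$$

$$\text{subject to} \quad y_{i} - w \cdot \phi(x_{i}) - b \le \varepsilon + \xi_{i}, \quad w \cdot \phi(x_{i}) + b - y_{i} \le \varepsilon + \xi_{i}^{*}, \quad \xi_{i},\, \xi_{i}^{*} \ge 0, \quad i = 1, \ldots, l$$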
where C, ε, and ξ (ξ*) are the trade-off cost between the empirical error and the flatness, the size of the ε-tube, and the slack variables, respectively. By adding Lagrangian multipliers α and α*, the quadratic programming (QP) problem can be solved as a dual problem.
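Assuming the standard ε-SVR derivation, with the sign convention that matches the kernel expansion of Equation (6), the dual problem can be sketched as:

$$\min_{\alpha,\, \alpha^{*}} \; \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} (\alpha_{i} - \alpha_{i}^{*})(\alpha_{j} - \alpha_{j}^{*})\, Q_{ij} + \varepsilon \sum_{i=1}^{l} (\alpha_{i} + \alpha_{i}^{*}) - \sum_{i=1}^{l} y_{i} (\alpha_{i} - \alpha_{i}^{*})$$

$$\text{subject to} \quad \sum_{i=1}^{l} (\alpha_{i} - \alpha_{i}^{*}) = 0, \quad 0 \le \alpha_{i},\, \alpha_{i}^{*} \le C, \quad i = 1, \ldots, l$$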
where Q_{ij} is an element of the kernel function matrix Q; it can be defined as:
$$Q_{ij} = \phi(x_{i})^{T} \phi(x_{j}) = K(x_{i}, x_{j}) \tag{5}$$
Then, the regression function estimated by SVM can be written as the following kernel expansion:
$$f(x) = \sum_{i=1}^{l} (\alpha_{i} - \alpha_{i}^{*})\, K(x_{i}, x) + b \tag{6}$$
Then, our objective is to obtain the parameters α_{i}, α_{i}^{*}, and b. The function K(x_{i}, x_{j}) is defined as the kernel function that generates the inner product of the two vectors x_{i} and x_{j} in the feature space, that is, K(x_{i}, x_{j}) = ϕ(x_{i}) · ϕ(x_{j}). The output is a linear combination of hidden neurons, and every hidden neuron corresponds to a support vector. Any function satisfying Mercer's condition can be used as a kernel function, so many kernel functions exist, but the following three are the most commonly used:
The first is the polynomial kernel:
$$K(x_{i}, x_{j}) = (x_{i} \cdot x_{j} + 1)^{q} \tag{7}$$
where q is the degree of the polynomial kernel.
The second is the radial basis function (RBF) kernel:
$$K(x_{i}, x_{j}) = \exp\left(-\gamma \|x_{i} - x_{j}\|^{2}\right) \tag{8}$$
For this function, the parameter to be determined for SVM training is γ, the bandwidth of the RBF kernel, which is a positive parameter controlling the radius.
The third is the hyperbolic tangent kernel:
$$K(x_{i}, x_{j}) = \tanh\left(\nu\, x_{i} \cdot x_{j} + c\right) \tag{9}$$
According to the research of Dibike et al. [32], better performance was achieved with the radial basis function kernel, so it is used in this research.
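The three kernels of Equations (7) to (9) and the kernel expansion of Equation (6) can be sketched in plain Python as follows. This is a minimal illustration, not the LIBSVM implementation used in the study: the function names, default parameter values, and sample vectors are hypothetical, and the coefficients passed to `svr_predict` stand for the differences α_{i} − α_{i}^{*} that a trained model would supply.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(xi, xj, q=2):
    """Equation (7): (xi . xj + 1)^q."""
    return (dot(xi, xj) + 1.0) ** q


def rbf_kernel(xi, xj, gamma=0.5):
    """Equation (8): exp(-gamma * ||xi - xj||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)


def tanh_kernel(xi, xj, nu=0.1, c=0.0):
    """Equation (9): tanh(nu * xi . xj + c)."""
    return math.tanh(nu * dot(xi, xj) + c)


def svr_predict(x, support_vectors, coefficients, b, kernel=rbf_kernel):
    """Equation (6): sum_i (alpha_i - alpha_i*) K(x_i, x) + b.

    coefficients[i] is assumed to hold alpha_i - alpha_i* for support_vectors[i].
    """
    return sum(coef * kernel(sv, x) for sv, coef in zip(support_vectors, coefficients)) + b


if __name__ == "__main__":
    # Hypothetical five-point-scale vectors, for illustration only.
    year_a = [4.0, 3.0, 2.0]
    year_b = [5.0, 3.0, 1.0]
    print(polynomial_kernel(year_a, year_b))
    print(rbf_kernel(year_a, year_b))
    print(tanh_kernel(year_a, year_b))
```

Passing the kernel as a function argument keeps the prediction routine independent of the kernel choice, which mirrors how the same expansion applies to any Mercer kernel.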
3.2 Multiple kernel learning algorithm
Multiple kernel learning aims at simultaneously learning a kernel and the associated predictor in a supervised learning setting. This so-called MKL problem can in principle be solved via cross-validation. MKL provides flexibility and reflects the fact that typical learning problems often involve multiple, heterogeneous data sources. Furthermore, it leads to an elegant method to interpret the results, which can lead to a deeper understanding of the application. SVM describes a problem with a single kernel, whereas MKL uses multiple kernels. MKL may therefore be more desirable than SVM for decision-making processes with a complicated nature.
For kernel algorithms, the solution of the learning problem takes the form:
$$f(x) = \sum_{i=1}^{l} \alpha_{i}^{*} K(x, x_{i}) + b^{*} \tag{10}$$
where α_{i}^{*} and b^{*} are coefficients to be learned from examples, whereas K(⋅,⋅) is a given positive definite kernel associated with a reproducing kernel Hilbert space (RKHS) H. In the SVM methodology, these coefficients are obtained by solving the following optimization problem:
In some situations, more flexible kernel models might be appropriate. Recent applications have shown that using multiple kernels instead of a single one can enhance performance [33]. A convenient approach is to consider that the kernel K(x, x') is a convex combination of basis kernels:
$$K(x, x') = \sum_{m=1}^{M} d_{m} K_{m}(x, x'), \quad d_{m} \ge 0, \quad \sum_{m=1}^{M} d_{m} = 1 \tag{12}$$
where M is the total number of kernels. Each basis kernel K_{m} may use either the full set of variables describing x or subsets of variables stemming from different data sources. Alternatively, the basis kernels can simply be classical kernels with different parameters. Learning both the coefficients α_{i} and the weights d_{m} is known as the MKL problem. It is shown in [34] that the space H constructed by K(x, x') is an RKHS, and these coefficients can be obtained by solving the following optimization problem:
where each function f_{m} belongs to a different RKHS H_{m} associated with a kernel K_{m}. To solve Equation (13), Rakotomamonjy et al. [34] suggest considering the following constrained optimization problem:
Equation (14) is actually an optimal SVM objective value, and Equation (15) can be solved by a simple gradient method. Hence, the coefficients α_{i} and the weights d_{m} can be obtained after several iterations of the following two steps:
Equation (14) is optimized with d fixed, and the objective value is obtained;
Equation (15) is solved by a simple gradient method.
The algorithm terminates when a stopping criterion is met. With regard to its capability of solving nonlinear problems, SVM has proved to be an adaptive system, as it can change its structure on the basis of external or internal information that flows through the network during the learning phase. Hence, it is a desirable model for classification, including pattern and sequence recognition, novelty detection, and sequential decision-making. However, SVM describes a problem with a single kernel, whereas MKL uses multiple kernels, so MKL can describe or solve a problem from multiple perspectives. As discussed above, MKL provides more flexibility and reflects the fact that typical learning problems often involve multiple, heterogeneous data sources. Furthermore, MKL leads to an elegant method to interpret the results, which can lead to a deeper understanding of the application. MKL may therefore be more desirable than SVM for decision-making processes with a complicated nature, such as contractor prequalification, which involves the problems of subjectivity, nonlinearity, and multicriteria. It is expected to contribute both theoretical and practical knowledge for academics and practitioners in the field.
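The convex combination of Equation (12) can be illustrated with several Gaussian basis kernels that differ only in their bandwidth. The sketch below is a simplified illustration under stated assumptions: the weights d_{m} are fixed by hand rather than learned by the gradient step described above, and all names, bandwidths, and sample vectors are hypothetical.

```python
import math


def rbf_kernel(xi, xj, gamma):
    """Gaussian basis kernel exp(-gamma * ||xi - xj||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)


def combined_kernel(xi, xj, gammas, weights):
    """Equation (12): K(x, x') = sum_m d_m K_m(x, x') with d_m >= 0 and sum_m d_m = 1."""
    if len(gammas) != len(weights):
        raise ValueError("one weight is required per basis kernel")
    if any(d < 0 for d in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    return sum(d * rbf_kernel(xi, xj, g) for d, g in zip(weights, gammas))


def gram_matrix(samples, gammas, weights):
    """Kernel matrix of the combined kernel over all pairs of samples."""
    return [[combined_kernel(a, b, gammas, weights) for b in samples] for a in samples]


if __name__ == "__main__":
    # Hypothetical five-point-scale vectors and hand-picked weights.
    samples = [[4.0, 3.0, 2.0], [5.0, 3.0, 1.0], [2.0, 4.0, 4.0]]
    for row in gram_matrix(samples, gammas=[0.1, 1.0, 10.0], weights=[0.5, 0.3, 0.2]):
        print(row)
```

Because each Gaussian kernel equals 1 when its two arguments coincide, the diagonal of the combined matrix equals the sum of the weights, which is a quick sanity check on the constraint in Equation (12).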
3.3 Objective function of the model
The relationship between the traffic flow and the influential factors is regarded as nonlinear. The objective function of the prediction model is described as follows:
where f_{1}(X_{1}) is the objective function of traffic flow, V_{1j} are the influential factors of traffic flow, and δ_{i} is the weight of each variable of traffic flow. Because the relationship between the predicted items and the corresponding influential factors is viewed as a complicated nonlinear one, a proper mathematical method should be employed to deal with it. Prediction models based on kernel learning methods (SVM and MKL) were applied in this section. The proposed method is an improved neural network (ANN) method; the distribution and value of the network weight can be obtained by modeling.
3.4 Model design
The flowcharts of the SVM and MKL models are shown in Figures 2 and 3. The first step is to normalize the raw data. The information should be collected from different prequalification projects and companies; some of it is quantitative and some qualitative, and hence the five-point scale method should be employed to standardize all the raw data. Then, the index values X_{i} {X_{1}, X_{2}, X_{3},…, X_{n}} (normalized data) and Y_{i} (real results of “failed” or “passed”) of the samples are divided into two sets, that is, a training set and a testing set. The next step is to determine the parameters of the objective function. This process is carried out with the “LIBSVM” and “MATLAB” (7.0) toolboxes for SVM and MKL, respectively. When all the training samples have been input, the total error is calculated. Then, the parameters are adjusted to minimize the total error. This process is repeated until the system converges. The algorithm is iterative, which means that the parameters are adjusted once in each round. When the optimal values of the parameters have been obtained, the testing set is input into the optimal objective model to verify its accuracy and reliability.
For SVM, the Gaussian kernel function is used in the experiments. Ten-fold cross-validation is first used to find the optimal parameters. For MKL, only Gaussian kernels are used, although other kernel functions could be applied. Hence, the multiple kernel comprises several Gaussian kernels with different kernel parameters (including the optimal parameter obtained by cross-validation). Ten tests are run. In each run, 90% of the data are randomly chosen for training and 10% for testing. Because the number of data is small (only 13), the square root of the error is used to evaluate the performance of the resulting regression. The result is averaged over the 10 runs.
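The evaluation protocol described above (10 random 90%/10% splits with the error averaged over the runs) can be sketched as follows. Several points are assumptions: the error measure is taken to be the root mean square error, the fixed random seed is added only for reproducibility, and the `mean_baseline` model is a placeholder that stands in for the trained SVM or MKL model. All function names are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error (assumed reading of 'square root of error')."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def repeated_holdout_rmse(X, y, fit_predict, runs=10, test_fraction=0.1, seed=0):
    """Average the test RMSE over several random train/test splits."""
    rng = random.Random(seed)
    n = len(y)
    n_test = max(1, round(n * test_fraction))
    scores = []
    for _ in range(runs):
        indices = list(range(n))
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        y_pred = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        scores.append(rmse([y[i] for i in test_idx], y_pred))
    return sum(scores) / len(scores)


def mean_baseline(X_train, y_train, X_test):
    """Placeholder model: predicts the training mean, not the SVM or MKL model."""
    mean = sum(y_train) / len(y_train)
    return [mean] * len(X_test)


if __name__ == "__main__":
    # Synthetic data with 13 samples, for illustration only.
    X = [[float(i)] for i in range(13)]
    y = [10.0 + 0.5 * i for i in range(13)]
    print(repeated_holdout_rmse(X, y, mean_baseline))
```

Any model can be plugged in through `fit_predict`, so the same loop could wrap an SVM or MKL regressor once one is available.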
4 CASE STUDY
A set of practical data collected from a tunnel project in Hong Kong was arranged for this research. The dataset consists of 10 years' volume record, including one dependent variable and 12 independent variables. The dependent variable is the traffic volume, whereas the independent variables are the capacity of the services, maintenance cost, toll fee level, toll adjustment mechanism, network links/convenience, competition of alternative service, economic recession/growth, foreign exchange exposure, policy change, demographic/population change, private car growth, and willingness to pay.
4.1 Variable ranking
Table 6 shows the ranking results based on entropy and the calculation process of the entropy for each variable. The larger the normalized value, the higher the ranking. Among the ranked variables, the capacity of the services (0.178591/1), toll fee level (0.104317/1), competition of alternative services (0.084558/1), and foreign exchange exposure (0.084558/1) are ranked highest. It is expected that this ranking list can assist the decision-making process for traffic flow prediction.
Table 6. The calculation process of entropy.
| Variable | m | E_{m} | H_{m} | e^{H_{m}} | 1/PR_{m} | (1/PR_{m}) / ∑_{i=1}^{k} (1/PR_{m}) |
| --- | --- | --- | --- | --- | --- | --- |
| X1 | 1 | 1.375000 | 1.561200 | 4.764530 | 3.465110 | 0.178591 |
| X2 | 2 | 2.928600 | 1.491930 | 4.445670 | 1.518030 | 0.078239 |
| X3 | 3 | 2.357100 | 1.562530 | 4.770880 | 2.024010 | 0.104317 |
| X4 | 4 | 3.684200 | 1.508300 | 4.519040 | 1.226600 | 0.063219 |
| X5 | 5 | 3.761900 | 1.536840 | 4.649870 | 1.236040 | 0.063705 |
| X6 | 6 | 3.000000 | 1.593690 | 4.921880 | 1.640630 | 0.084558 |
| X7 | 7 | 3.200000 | 1.481990 | 4.401700 | 1.375530 | 0.070894 |
| X8 | 8 | 3.000000 | 1.593690 | 4.921880 | 1.640630 | 0.084558 |
| X9 | 9 | 5.000000 | 1.593690 | 4.921880 | 0.984380 | 0.050735 |
| X10 | 10 | 3.294100 | 1.508630 | 4.520530 | 1.372300 | 0.070728 |
| X11 | 11 | 3.171400 | 1.536850 | 4.649910 | 1.466190 | 0.075567 |
| X12 | 12 | 3.300000 | 1.567590 | 4.795080 | 1.453050 | 0.074890 |
| TOTAL | | | | | | 1.000000 |
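The last three columns of Table 6 can be recomputed from the E_{m} and H_{m} columns. The following Python sketch assumes the relationship 1/PR_{m} = e^{H_{m}} / E_{m}; this is inferred from the tabulated numbers rather than quoted from a formula in this section, and the function and variable names are illustrative only.

```python
import math

# (E_m, H_m) pairs copied from Table 6 for X1..X12.
table6 = [
    (1.375000, 1.561200), (2.928600, 1.491930), (2.357100, 1.562530),
    (3.684200, 1.508300), (3.761900, 1.536840), (3.000000, 1.593690),
    (3.200000, 1.481990), (3.000000, 1.593690), (5.000000, 1.593690),
    (3.294100, 1.508630), (3.171400, 1.536850), (3.300000, 1.567590),
]

def entropy_weights(rows):
    """Return normalized weights from (E_m, H_m) rows, assuming 1/PR_m = exp(H_m) / E_m."""
    inverse_pr = [math.exp(h) / e for e, h in rows]
    total = sum(inverse_pr)
    return [value / total for value in inverse_pr]

weights = entropy_weights(table6)
# Rank variables from the largest weight to the smallest (1-based variable index).
ranking = sorted(range(1, len(weights) + 1), key=lambda m: weights[m - 1], reverse=True)
```

Under this assumption the weights sum to 1 by construction, and the first entries of `ranking` correspond to the variables named as most significant in the text (X1, capacity of the services, followed by X3, toll fee level).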
On the basis of this ranking list, weighting was given to the normalized value of each variable. Thus, the objective function of traffic flow (refer to Equation (16)) can be denoted as:
On the basis of this newly developed objective function, with weighting given to each variable, a set of weighted normalized values was obtained, which is summarized in Table A-II in APPENDIX A. The normalized values (Table A-I in APPENDIX A) and weighted normalized values (Table A-II in APPENDIX A) of each variable by year were used to train and test the developed traffic flow prediction model. The results of the model are analyzed and discussed in the next section.
The authors found that, for the tunnel project, not all the factors affect the traffic volume equally. From 2000 to 2007, there was no policy change in transport planning in Hong Kong, and the project had no foreign exchange exposure. Therefore, these two factors need not be considered in this case. Meanwhile, there was little improvement of the network links related to the project, and two alternatives continued to compete with it from 2000 to 2009. Hence, these factors may not be key factors for the traffic flow.
Figure 4 describes the relationships between the traffic flow and the influential factors, such as Gross Domestic Product (GDP), operation cost, private car growth, and population growth. As Figure 4 shows, the population and the number of private cars licensed fluctuated only slightly from 2000 to 2009 while the overall trend was stable, so the traffic volume was not obviously affected by these two factors in this case. However, Figure 4 also shows that the operation cost and the traffic volume followed a similar pattern of change, and thus the factor of maintenance cost may have influenced the traffic volume. The toll fee changed four times during these 10 years, which may have led to the decrease of traffic flow in 2001, 2003, and 2008. Also, the curves of traffic flow and GDP follow a similar trend (Figure 4). In 2003, because of SARS, the volume declined sharply, whereas both GDP and traffic flow grew significantly in 2007. As a result, it seems that toll fee level, economic recession/growth, and willingness to pay (affected by the toll fees) had a significant impact on the traffic volume in this case study.
On the basis of the aforementioned analysis, some variables, such as willingness to pay, capacity of the services, toll fee level, network links, and economic recession/growth, can be selected for volume forecasting in the case of Tunnel C.
4.2 Traffic flow prediction
In this section, ANN, SVM, and MKL were applied to traffic flow prediction. ANN is a commonly used forecasting method, whereas SVM is a more recently developed method that has been reported to outperform ANN in prediction. MKL, another kernel learning method, was also proposed for prediction; it is more advanced than SVM in that it can describe a problem from multiple perspectives. The prediction capacity of these three methods was compared to find the most desirable one. The models were implemented in MATLAB 7.0 and LIBSVM, and the models' outputs for the predetermined inputs were compared with the actual values. The three methods are compared and discussed as follows:
4.2.1 Comparison of the result generated by artificial neural networks and support vector machine
Two designs of the network, namely, radial basis function network (for SVM) and back propagation network (for ANN), were used in this case study. Figure 5 describes the two results generated by these two networks. The value of the X-axis refers to the observed value (actual value) of the prediction item, whereas the Y-axis is the predicted value. The values in the X and Y axes refer to the number of vehicles per year.
In Figure 5(a) (SVM), the data points are scattered within the limits, whereas Figure 5(b) (ANN) shows a trend line. The trend line in Figure 5(b) is approximately linear, which indicates a linear relationship between the variables and the predicted item (traffic flow). On the other hand, the scattered points in Figure 5(a) indicate a nonlinear relationship between the variables and traffic flow. As discussed before, the major drawback of ANN is that it may suffer from the local minimum problem [14], so that many different “optimal” results can be obtained. This problem brings “over-fitting” to the results, which means that the ANN model ignores much useful information, treats it as noise, and finally removes it from the modeling process. Thus, in this case study, the ANN model produces a linear relationship between the variables and the traffic flow, whereas the relationship should be nonlinear.
Support vector machine has proved powerful for a wide range of data analysis problems. The merit of SVM is that, because of Mercer's conditions on the kernels, the corresponding optimization problems are convex and have no local minima. This merit is illustrated in this section: under the same circumstances, SVM can identify the correct type of relationship between the input variables and the predicted item. In this research, traffic flow prediction is regarded as a problem with a complex nature of subjectivity, nonlinearity, and multicriteria. Thus, the comparison in this section suggests that SVM serves this function better than ANN. An in-depth investigation of SVM is presented in Section 4.2.2.
4.2.2 An in-depth investigation of support vector machine
In Section 4.2.1, SVM was shown to serve the function of prediction better than ANN. In this section, to examine the consistency of the SVM prediction model over a range of conditions, comparisons were made under different input values of the key parameter (parameter tuning), different divisions of the dataset, different numbers of input variables, and different sets of variables (weighted or nonweighted). Hence, four parts are involved in this section, as shown in Table 7:
The upper bound C and the kernel parameter γ play an important role in the performance of SVM. Improper selection of these two parameters can cause over-fitting or under-fitting. An appropriate range for γ has been suggested as 1 to 100. In addition, the accuracy on the training set increases monotonically as C increases, and C is usually bounded at 100 or 1,000 for training purposes. In this section, to achieve a better outcome, the kernel parameter γ was first adjusted with C fixed. Then, with the selected value of γ, the input value of C was varied and the results were compared. From the results shown in Table 7, a better result is obtained when C is set to 1,000 (a predicted traffic flow of 15 649.2, with an error rate of 11.09%) than when it is set to 100 (a predicted traffic flow of 15 126, with an error rate of 14.06%).
The comparison under different divisions of the dataset is described here. The dataset was divided into a training set, used to build the model, and a testing set, used to test it. It is important that data employed for training a model are not also used to test it. Otherwise, there would be a “self-fulfilling prophecy”, with consequent distortion of the fit, variability, and accuracy measurement, which would inevitably bias the final outcome of the modeling; for instance, an overly optimistic result could be generated. A division of 12:1 gives a better result (an error rate of 11.09%) than a division of 11:2 (an error rate of 14.52%). This result indicates that a larger training sample can give higher accuracy than a smaller one; for this prediction problem, the result is more desirable when more training data are available.
Comparing the results under different variable sets, a better result was obtained when the variables were preselected (the first nine variables in the ranking list of Table 6). However, the improvement is small (10.87% vs. 11.09%): although the deleted variables have low priority in the list, they still affect the output. For example, as discussed in the previous section, there was no policy change towards the project from 2000 to 2007 in this case study; however, once a policy change occurs, it has a large impact on the predicted item. Thus, to obtain highly accurate results, weighting should be given to each variable, rather than deleting those ranked low in the priority list.
Table 7. Predictions by support vector machine.
| No. | Value of C | Training set : testing set | Number of input variables | With or without weighting | Predicted value | Actual value | Error rate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| I. Comparison under different input value of parameter C | | | | | | | |
| I-a | 100 | 12:1 | 12 | Without | 15126 | 17601 | 14.06% |
| I-b | 1000 | 12:1 | 12 | Without | 15649 | 17601 | 11.09% |
| II. Comparison under different datasets | | | | | | | |
| II-a | 1000 | 12:1 | 12 | Without | 15649 | 17601 | 11.09% |
| II-b | 1000 | 11:2 | 12 | Without | 14982; 14999 | 17474; 17601 | 14.52% |
| III. Comparison under different variable sets | | | | | | | |
| III-a | 1000 | 12:1 | 9 | Without | 15688 | 17601 | 10.87% |
| III-b | 1000 | 12:1 | 12 | Without | 15649 | 17601 | 11.09% |
| IV. Comparison under variables with/without weighting | | | | | | | |
| IV-a | 1000 | 12:1 | 12 | With | 15918 | 17601 | 9.56% |
| IV-b | 1000 | 12:1 | 12 | Without | 15649 | 17601 | 11.09% |
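The error rates in Table 7 can be checked with a few lines of Python. The sketch below assumes that the error rate is the absolute difference between the predicted and actual values divided by the actual value, and that the rate for case II-b is the mean over its two test samples; both assumptions are inferred from the tabulated numbers, and the function names are illustrative.

```python
def error_rate(predicted, actual):
    """Relative absolute error of one prediction, as a percentage of the actual value."""
    return abs(actual - predicted) / actual * 100.0

def mean_error_rate(pairs):
    """Average the percentage error over several (predicted, actual) pairs."""
    return sum(error_rate(p, a) for p, a in pairs) / len(pairs)

# (predicted, actual) pairs copied from Table 7.
table7 = {
    "I-a": [(15126, 17601)],
    "I-b": [(15649, 17601)],
    "II-b": [(14982, 17474), (14999, 17601)],
    "III-a": [(15688, 17601)],
    "IV-a": [(15918, 17601)],
}

rates = {case: round(mean_error_rate(pairs), 2) for case, pairs in table7.items()}
```

With these assumptions, the recomputed values agree with the error-rate column of Table 7 to two decimal places.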
Table 8. Comparison of support vector machine (SVM) and multiple kernel learning (MKL) for prediction.
| | SVM | MKL |
| --- | --- | --- |
| Mean error rate | 24.09% | 18.48% |
Because not all the variables are likely to be equally significant, weighting should be given to each variable, and the weighted normalized values were employed in this part. The weighting given to each variable was based on the assumptions made earlier (Table 5), and the entropy method was employed for the ranking. The result shows that the error is reduced to 9.56% when the input variables are weighted, compared with 11.09% when they are not. Using the weighted variables as inputs therefore gave the most desirable result.
Therefore, in this case study, the optimal result was obtained when parameter C was set to 1,000, the input variables were weighted, and the larger training set was used.
4.2.3 Comparison of the results generated by support vector machine and multiple kernel learning
Recent developments reported in the literature on SVM and other kernel methods have shown the need to consider multiple kernels. As discussed before, a multiple kernel method can describe a problem from several perspectives rather than only one, and thus it is expected to improve the prediction accuracy. Hence, the MKL method was proposed in this research to see whether it can perform better than SVM in prediction.
Table 8 shows the comparison of SVM and MKL with regard to their capability of predicting the traffic flow. The optimal parameters were obtained by cross-validation. In each run, 90% of the data were randomly chosen for training and 10% for testing, and thus the mean error rate obtained for SVM here (24.09%) is higher than the results in Table 7 (with an average error of 13%). However, the results show that MKL (mean error rate of 18.48%) performs better than SVM (mean error rate of 24.09%) under the same parameter settings and training pattern. Hence, the results suggest that both SVM and MKL perform well in prediction and that MKL is preferable to SVM. Therefore, MKL can enhance decision-making in traffic flow prediction.
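The evaluation pattern described here (10 runs, each with a random 90%/10% split, and the error averaged over the runs) can be sketched in Python as follows. This is only an outline under stated assumptions: the data are made up, the root-mean-squared error is used as one reading of the "square root of error" mentioned earlier, and `mean_baseline` is a hypothetical placeholder standing in for a trained SVM or MKL model.

```python
import math
import random

def repeated_holdout(xs, ys, fit_predict, runs=10, test_fraction=0.1, seed=0):
    """Average root-mean-squared test error over repeated random train/test splits."""
    rng = random.Random(seed)
    n = len(xs)
    n_test = max(1, round(n * test_fraction))
    errors = []
    for _ in range(runs):
        indices = list(range(n))
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        test_x = [xs[i] for i in test_idx]
        predictions = fit_predict(train_x, train_y, test_x)
        squared = [(predictions[k] - ys[i]) ** 2 for k, i in enumerate(test_idx)]
        errors.append(math.sqrt(sum(squared) / len(squared)))
    return sum(errors) / len(errors)

def mean_baseline(train_x, train_y, test_x):
    """Placeholder model: predict the training mean for every test sample."""
    mean = sum(train_y) / len(train_y)
    return [mean for _ in test_x]

# Hypothetical data: 13 samples with one input and a made-up target.
xs = [[float(i)] for i in range(13)]
ys = [100.0 + 2.0 * i for i in range(13)]
average_error = repeated_holdout(xs, ys, mean_baseline)
```

With 13 samples, a 10% test fraction leaves a single test sample per run, so each run's error depends heavily on which sample is held out; averaging over the runs reduces, but does not remove, this variability.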
5 CONCLUSION
Traffic flow prediction is significant to the feasibility study and cash flow forecasting of a transportation project. Both the government and private investors can benefit from a reasonable and accurate prediction of the traffic flow, as the revenue risk as well as the financial risk can be reduced. According to previous research studies, various factors affect the traffic flow; they can be classified into internal and external factors, and the external factors can be further divided into micro and macro ones. Because the significance of the variables differs from case to case, the entropy method was applied to rank the variables after all of them were scaled by the five-point scale method. According to the result of the case study, capacity of the service, toll fee level, competition from alternative services, and foreign exchange exposure are the most significant variables. ANN, SVM, and MKL algorithms were employed to build the prediction models. In the proposed model, the nonlinear relationship between the traffic volume and its influential factors was established. The results of this research suggest that both the MKL and SVM models can perform well in prediction, but MKL is more desirable than SVM. The merits of the established prediction model are as follows. It helps to reduce the uncertainty of traffic flow, as the influential factors are well considered in the model. Additionally, for the model itself, SVM describes the problem with a single kernel, whereas MKL uses multiple kernels, which can describe or solve a problem from multiple perspectives. As discussed before, MKL provides more flexibility and reflects the fact that typical learning problems often involve multiple, heterogeneous data sources. Furthermore, MKL leads to an elegant method to interpret the results, which can lead to a deeper understanding of the application. Thus, MKL is more desirable than SVM for decision-making processes of a complicated nature.
This opens up a range of opportunities for future research. Despite the encouraging performance of these two methods in this study, some aspects still need to be addressed. For example, the choice of the kernel function and the determination of optimal values of parameters such as C, γ, and ε could improve the mean error rates of SVM and MKL. The genetic algorithm is an optimization method that has gained popularity in solving complex global optimization problems. Thus, genetic algorithms and a comparison of kernel functions should be investigated to develop a structured method for selecting optimal parameter values for the best prediction performance, and for assessing the effect of other factors that were fixed in the aforementioned experiment, such as the kernel function. Following this research, further study is planned on benchmarking in relation to the potential standardization of the influential factors, on exploring hybrid models, and subsequently on setting up a proper quantitative model targeting the best value for prediction.
6 LIST OF SYMBOLS AND ABBREVIATIONS
6.1 Symbols
V_{n}: The influential factors
δ_{i}: The weight of each variable of traffic flow
β_{i}: The weight of each variable of toll fee
b: Real parameter value for separating hyperplanes
C: Capacity factor (of learning machine)
f: Function of
K: Kernel function
L: Lagrange function
l: Number of examples in training set
R^{n}: n-dimensional real vector space
w: Parameter vector for separating hyperplanes
x: Input vector (of independent variables)
y: Output variable
α: Lagrange multiplier (SVM parameter)
ε: Error insensitive zone
ξ: Constraint violation (estimation error)
E: Expected value of five scales of level of significance
H: Entropy of coefficients
E_{m}: Expected value of V_{m}
S_{i}: Scale of a degree of significance
P_{mi}: Property of scale
i: A constant from 1 to k
k: Number of scales
H_{m}: Entropy of V_{m}
PR_{m}: Priority rating of V_{m}
n: Sample size
6.2 Abbreviations
ANN: Artificial Neural Network
BOT: Build-Operate-Transfer
GDP: Gross Domestic Product
GV: Goods Vehicles
MKL: Multiple Kernel Learning
PC: Private Cars
PCA: Principal Component Analysis
QP: Quadratic Programming
RBF: Radial Basis Function
RKHS: Reproducing Kernel Hilbert Space
SVM: Support Vector Machine
SVR: Support Vector Regression
V/C: Actual traffic flow/the capacity design
ACKNOWLEDGEMENT
The work described in this paper was fully supported by a grant from City University of Hong Kong (project No. 7002683).
APPENDIX A
Table A.I. The final normalized value of each variable in terms of different year.
^{a} Normalized value of each variable in terms of different year.
^{b} The value of “actual traffic flow” is adapted from the Project Profile of the tunnel (2010).