TITLE: Some New Ideas on Fractional Factorial Design and Computer Experiment

STUDENT: Heng Su

ADVISOR: Dr. Jeff Wu

SUMMARY:

This thesis consists of two parts: the first on fractional factorial design and the second on computer experiments. The first part has two chapters. In the first chapter, we use the concept of conditional main effects (CMEs) and propose the CME analysis to address the problem of effect aliasing in two-level fractional factorial designs. In the second chapter, we study the conversion rates of a system of webpages with the proposed funnel testing method. The second part also has two chapters. In the third chapter, we use statistical models to calibrate the Perez model. In the last chapter, we propose a new Gaussian process model that can jointly model both point and integral responses.

Since the foundational work of Finney, it has been widely accepted that aliased effects in two-level regular designs cannot be “de-aliased” without adding more runs. A surprising result by Wu in his 2011 Fisher Lecture showed that aliased effects can sometimes be “de-aliased” using a new framework based on the concept of conditional main effects (CMEs). In the first chapter, this idea is developed further into a methodology that can be readily used. We derive key properties that govern the relationships among CMEs and between CMEs and related effects. From these properties we develop rules for data analysis, and on these rules we build a new CME-based methodology. Three real examples illustrate the methodology. The CME analysis can offer a substantial increase in the R-squared value with fewer effects in the chosen models. Moreover, the selected CMEs are often more interpretable.
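To make the notion of a conditional main effect concrete, the following minimal sketch computes CMEs on a small balanced design. It assumes the usual definition, in which the CME of A given B at level + is the difference between the mean responses at A+ and A-, restricted to the runs with B+. The response values, the function names, and the choice of a full 2^3 design are illustrative assumptions, not material from the thesis.

```python
import itertools
import numpy as np

# Full 2^3 factorial in -1/+1 coding; columns are factors A, B, C.
design = np.array(list(itertools.product([-1, 1], repeat=3)))
A, B = design[:, 0], design[:, 1]

# Made-up responses for illustration only (not data from the thesis).
y = np.array([12.0, 15.0, 9.0, 11.0, 20.0, 26.0, 10.0, 13.0])


def main_effect(x, y):
    """Mean response at x = +1 minus mean response at x = -1."""
    return y[x == 1].mean() - y[x == -1].mean()


def interaction(x1, x2, y):
    """Two-factor interaction, computed as the main effect of the product column."""
    return main_effect(x1 * x2, y)


def cme(x, cond, level, y):
    """Conditional main effect of x, using only runs where cond == level."""
    keep = cond == level
    return y[keep & (x == 1)].mean() - y[keep & (x == -1)].mean()


me_a = main_effect(A, y)
int_ab = interaction(A, B, y)
cme_plus = cme(A, B, 1, y)    # CME(A | B+)
cme_minus = cme(A, B, -1, y)  # CME(A | B-)

print("ME(A) =", me_a, " INT(A,B) =", int_ab)
print("CME(A|B+) =", cme_plus, " CME(A|B-) =", cme_minus)
```

In a balanced two-level design, CME(A|B+) equals ME(A) + INT(A,B) and CME(A|B-) equals ME(A) - INT(A,B). This is why a CME can stand in for a main effect and an interaction of similar magnitude, which is one way a CME-based model can end up with fewer terms.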

The internet has become an important source of revenue for many companies, and how to design webpages to maximize conversions is now an active topic in e-commerce. In the second chapter, we propose a new method called funnel testing to study a system of webpages simultaneously and optimize its overall conversions. A directed graph is used to represent the system of webpages and identify its structure. A fractional factorial design is used to conduct the experiment systematically. A new method of analysis is proposed to maximize the total conversion rate of the system. A toy example is used to demonstrate the idea alongside the description of the method. A second, more complicated simulated example further illustrates the methodology.

Traditional uncertainty quantification (UQ) in the prediction of building energy consumption has been limited to the propagation of uncertainties in model input parameters. Models by definition ignore, at least to some degree, and in almost all cases simplify the physical processes that govern the reality of interest. This introduces additional uncertainty in model predictions that cannot be captured as input parameter uncertainty. Quantification of this type of uncertainty (which we will refer to as model form uncertainty) is a necessary step toward the complete UQ of model predictions. In the third chapter, we introduce a general framework for model form UQ and show its application to the widely used sky irradiation model developed by Perez (1990), which computes solar diffuse irradiation on inclined surfaces. We collect a dataset of one year of solar irradiation measurements at one location in the United States. The measurements were taken on surfaces with different tilt angles and orientations, for a wide spectrum of sky conditions. A statistical analysis using both this dataset and published studies worldwide suggests that the Perez model performs non-uniformly across different locations and produces a certain bias in its predictions. Based on the same data, we then use a two-phase regression model to express model form uncertainty in the use of the Perez model at this particular location. Using a holdout validation test, we demonstrate that the two-phase regression model considerably reduces the model bias errors and root mean square errors for every tilted surface. Lastly, we discuss the significance of including model form uncertainty in the energy consumption predictions obtained with whole building simulation.

In some computer experiments, the quantity of interest may be the average value of the responses over a specific region. One example from building energy simulation is the diffuse solar irradiance on a building façade, which is the integral of the irradiance over the part of the sky dome that the façade is exposed to. Treating this information as point responses leads to a loss of estimation efficiency. In the last chapter, we extend the standard point Gaussian process (GP) framework so that it can handle both point and integral responses. This new methodology is called the point-integral Gaussian process model, abbreviated as the PIG process model. A generic expression of the PIG process model is given, together with its more complicated covariance functions. Parameter estimation and prediction following the frequentist approach are presented. Closed-form expressions of the covariance functions are derived for axis-parallel rectangular regions, and their computation times are compared with those of numerical integration using quadrature. Two examples are given to demonstrate the use and the performance of the new methodology. Two point GP models, one that ignores the integral responses and one that transforms the integral responses into point responses, are compared with the PIG process model. In all cases, the proposed PIG process model obtains higher prediction accuracy.
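As a rough illustration of why axis-parallel rectangles admit closed forms, the sketch below computes the covariance between a point response and an integral response, which is the integral of the kernel over the region. It assumes a Gaussian product kernel with unit variance, exp(-sum_j (x_j - s_j)^2 / (2 l_j^2)). The abstract does not state which kernel the thesis uses, so the kernel, the function names, and the numerical inputs are all hypothetical.

```python
import math
import numpy as np


def point_integral_cov_1d(x, a, b, ell):
    """Closed form of the integral over [a, b] of exp(-(x - s)^2 / (2 ell^2)) ds."""
    scale = math.sqrt(2.0) * ell
    return ell * math.sqrt(math.pi / 2.0) * (
        math.erf((b - x) / scale) - math.erf((a - x) / scale)
    )


def point_integral_cov_rect(x, lower, upper, ells):
    """Product kernel: the integral over a rectangle factors coordinate by coordinate."""
    return math.prod(
        point_integral_cov_1d(xi, a, b, ell)
        for xi, a, b, ell in zip(x, lower, upper, ells)
    )


def point_integral_cov_quadrature(x, lower, upper, ells, order=40):
    """Same quantity by Gauss-Legendre quadrature, used here only as a check.

    For a product kernel the tensor-product rule reduces to a product of 1-D rules.
    """
    nodes, weights = np.polynomial.legendre.leggauss(order)
    total = 1.0
    for xi, a, b, ell in zip(x, lower, upper, ells):
        s = 0.5 * (b - a) * nodes + 0.5 * (b + a)  # map [-1, 1] onto [a, b]
        k = np.exp(-((xi - s) ** 2) / (2.0 * ell**2))
        total *= 0.5 * (b - a) * float(np.dot(weights, k))
    return total


# Hypothetical inputs: a point in 2-D and the rectangle [0, 1] x [0, 0.5].
x = (0.3, 0.7)
lower, upper = (0.0, 0.0), (1.0, 0.5)
ells = (0.4, 0.6)

closed = point_integral_cov_rect(x, lower, upper, ells)
numeric = point_integral_cov_quadrature(x, lower, upper, ells)
print("closed form:", closed, " quadrature:", numeric)
```

If the response is the average rather than the integral over the region, the same quantity would be divided by the area of the rectangle. The integral-integral covariance needs a double integral of the kernel and is not covered by this sketch.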