Applied Regression Modeling - Iain Pardoe


How is the book organized?

Chapter 1 reviews the essential details of an introductory statistics course necessary for use in later chapters. Chapter 2 covers the simple linear regression model for analyzing the linear association between two variables (a response and a predictor). Chapter 3 extends the methods of Chapter 2 to multiple linear regression, where there can be more than one predictor variable. Chapters 4 and 5 provide guidance on building regression models, including transforming variables, using interactions, incorporating qualitative information, and diagnosing problems. Chapter 6 (www.wiley.com/go/pardoe/AppliedRegressionModeling3e) contains three case studies that apply the linear regression modeling techniques considered in this book to examples on real estate prices, vehicle fuel efficiency, and pharmaceutical patches. Chapter 7 (www.wiley.com/go/pardoe/AppliedRegressionModeling3e) introduces some extensions to the multiple linear regression model and outlines some related topics. The appendices contain a list of statistical software that can be used to carry out all the analyses covered in the book, a t-table for use in calculating confidence intervals and conducting hypothesis tests, notation and formulas used throughout the book, a glossary of important terms, a short mathematics refresher, a tutorial on multiple linear regression using matrices, and brief answers to selected problems.

What else do you need?

The preferred calculation method for understanding the material and completing the problems is to use statistical software rather than a statistical calculator. It may be possible to apply many of the methods discussed using spreadsheet software (such as Microsoft Excel), although some of the graphical methods may be difficult to implement and statistical software will generally be easier to use. Although a statistical calculator is not recommended for use with this book, a traditional calculator capable of basic arithmetic (including taking logarithmic and exponential transformations) will be invaluable.

What other resources are recommended?

Good supplementary textbooks (some at a more advanced level) include Chatterjee and Hadi (2013), Dielman (2004), Draper and Smith (1998), Fox (2015), Gelman et al. (2020), Kutner et al. (2004), Mendenhall and Sincich (2020), Montgomery et al. (2021), Ryan (2008), and Weisberg (2013).

About the Companion Website

This book is accompanied by a companion website for Instructors and Students:

www.wiley.com/go/pardoe/AppliedRegressionModeling3e

 Datasets used for examples

 R code

 Presentation slides

 Statistical software packages

 Chapter 6 Case studies

 Chapter 7 Extensions

 Appendix A Computer Software help

 Appendix B Critical values for t-distributions

 Appendix C Notation and formulas

 Appendix D Mathematics refresher

 Appendix E Multiple Linear Regression Using Matrices

 Appendix F Answers for selected problems

 Instructor's manual

Chapter 1 Foundations

After reading this chapter you should be able to:

 Summarize univariate data graphically and numerically.

 Calculate and interpret a confidence interval for a univariate population mean.

 Conduct and draw conclusions from a hypothesis test for a univariate population mean using both the rejection region and p-value methods.

 Calculate and interpret a prediction interval for an individual univariate value.

1.1 Identifying and Summarizing Data

The process of framing a problem in such a way that it is amenable to quantitative analysis is clearly an important step in the decision-making process, but this lies outside the scope of this book. Similarly, while data collection is also a necessary task (often the most time-consuming part of any analysis), we assume from this point on that we have already obtained data relevant to the problem at hand. We will return to the issue of the manner in which these data have been collected, namely, whether we can consider the sample data to be representative of some larger population that we wish to make statistical inferences for, in Section 1.3.

For now, we consider identifying and summarizing the data at hand. For example, suppose that we have moved to a new city and wish to buy a home. In deciding on a suitable home, we would probably consider a variety of factors, such as size, location, amenities, and price. For the sake of illustration, we focus on price and, in particular, see if we can understand the way in which sale prices vary in a specific housing market. This example will run through the rest of the chapter, and, while no one would probably ever obsess over this problem to this degree in real life, it provides a useful, intuitive application for the statistical ideas that we use in the rest of the book in more complex problems.

For this example, identifying the data is straightforward: the units of observation are a random sample of 30 single-family homes in our particular housing market, and we have a single measurement for each observation, the sale price in thousands of dollars ($), represented using the notation Price. Here, Y is the generic letter used for any univariate data variable, while Price is the specific variable name for this dataset. These data, obtained from Victoria Whitman, a realtor in Eugene, Oregon, are available in the HOMES1 data file on the book website; they represent sale prices of 30 homes in south Eugene during 2005. This represents a subset of a larger file containing more extensive information on 76 homes, which is analyzed as a case study in Chapter 6 (refer to www.wiley.com/go/pardoe/AppliedRegressionModeling3e).
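For readers who want to follow along in R, the following minimal sketch reads the data and inspects the sale prices. The CSV file name and the column name Price are assumptions about how the downloaded HOMES1 file is laid out and may need adjusting.

  # Minimal sketch: read the HOMES1 data and inspect the sale prices.
  # The file name "HOMES1.csv" and the column name Price are assumed here;
  # check the downloaded data file for the actual names.
  homes1 <- read.csv("HOMES1.csv")
  head(homes1$Price)    # first few sale prices, in thousands of dollars
  length(homes1$Price)  # should be 30 for this sample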

The particular sample in the HOMES1 data file is random because the 30 homes have been selected randomly somehow from the population of all single-family homes in this housing market. For example, consider a list of homes currently for sale, which are considered to be representative of this population. A random number generator, commonly available in spreadsheet or statistical software, can be used to pick out 30 of these. Alternative selection methods may or may not lead to a random sample. For example, picking the first 30 homes on the list would not lead to a random sample if the list were ordered by the size of the sale price.
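To make the random-selection idea concrete, the base R function sample() draws a simple random sample from a list. The sketch below is purely illustrative: the listing vector stands in for a real list of homes currently for sale.

  # Illustrative sketch: pick 30 homes at random from a hypothetical list
  # of 76 homes currently for sale.
  set.seed(1)                             # fix the random number generator so results can be reproduced
  listing <- paste0("home_", 1:76)        # stand-in identifiers for the homes on the list
  selected <- sample(listing, size = 30)  # simple random sample, drawn without replacement
  selected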

We could simply list the values for a small dataset such as this, but a graph often conveys the overall pattern more effectively. One such graph is a stem-and-leaf plot, which for the values of Price in this case looks as follows:

1 | 6
2 | 0011344
2 | 5666777899
3 | 002223444
3 | 666

In this plot, the decimal point is two digits to the right of the stem. So, the 1 in the stem and the 6 in the leaf represents 160 or, because of rounding, any number between 155 and 164.9. In particular, it represents the lowest price in the dataset of 155.5 (thousand dollars). The next part of the graph shows two prices between 195 and 204.9, two prices between 205 and 214.9, one price between 225 and 234.9, two prices between 235 and 244.9, and so on. A stem-and-leaf plot can easily be constructed by hand for small datasets such as this, or it can be constructed automatically using statistical software. The appearance of the plot can depend on the type of statistical software used; this particular plot was constructed using R statistical software (as are all the plots in this book). Instructions for constructing stem-and-leaf plots are available as computer help #13 in the software information files available from the book website at www.wiley.com/go/pardoe/AppliedRegressionModeling3e.
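For those using R, the base function stem() produces this kind of display; the sketch below again assumes the file and column names used earlier.

  # Sketch: construct a stem-and-leaf plot of the sale prices in R.
  homes1 <- read.csv("HOMES1.csv")  # assumed file/column names, as before
  stem(homes1$Price)                # default settings; the optional scale argument
                                    # stretches or compresses the stems if needed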

The overall impression from this graph is that the sample prices range from the mid-150s to the mid-350s, with some suggestion of clustering around the high 200s. Perhaps the sample represents quite a range of moderately priced homes, but with no very cheap or very expensive homes. This type of observation often arises throughout a data analysis: the data begin to tell a story and suggest possible explanations. A good analysis is usually not the end of the story since it will frequently lead to other analyses and investigations. For example, in this case, we might surmise that we would probably be unlikely to find a home priced at much less than about $150,000 in this market, but perhaps a realtor might know of a nearby market with more affordable housing.
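These graphical impressions can be double-checked numerically with base R summaries, again assuming the Price column from the earlier sketches.

  # Sketch: numerical summaries to accompany the stem-and-leaf plot.
  homes1 <- read.csv("HOMES1.csv")  # assumed file/column names, as before
  range(homes1$Price)    # smallest and largest sale prices (thousands of dollars)
  summary(homes1$Price)  # minimum, quartiles, median, mean, and maximum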
