Inference for Distributions of Categorical Data
Inference for distributions of categorical data involves using statistical methods to draw conclusions about the distribution of categorical variables in a population based on sample data. This typically includes techniques such as chi-square tests and confidence intervals for proportions. The goal is to make inferences about the underlying population distribution from the observed sample data.
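As a rough illustration of the chi-square approach mentioned above, the following Python sketch computes a goodness-of-fit statistic for one categorical variable. The function name `chi_square_gof`, the sample counts, and the hypothesized proportions are all assumptions made up for this example; they do not come from any of the excerpted books.

```python
from typing import Sequence, Tuple


def chi_square_gof(observed: Sequence[int],
                   expected_props: Sequence[float]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test.

    Hypothetical helper for illustration only.
    """
    if len(observed) != len(expected_props):
        raise ValueError("observed and expected_props must have the same length")
    if abs(sum(expected_props) - 1.0) > 1e-9:
        raise ValueError("expected proportions must sum to 1")

    n = sum(observed)
    # Expected count in each category if the hypothesized distribution holds.
    expected = [n * p for p in expected_props]
    # Pearson's statistic: sum over categories of (observed - expected)^2 / expected.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


# Hypothetical sample: 200 respondents spread over three categories.
observed_counts = [70, 80, 50]
# Hypothesized population distribution: all three categories equally likely.
hypothesized = [1 / 3, 1 / 3, 1 / 3]

stat, df = chi_square_gof(observed_counts, hypothesized)
print(f"chi-square = {stat:.3f} on {df} degrees of freedom")
```

With these made-up counts the statistic is 7.0 on 2 degrees of freedom, which exceeds the usual 5% critical value of about 5.99, so the sketch would reject the hypothesis of equal proportions. In practice a library routine such as `scipy.stats.chisquare` could replace the hand-written function and also return a p-value.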
Written by Perlego with AI-assistance
4 Key excerpts on "Inference for Distributions of Categorical Data"
- eBook - PDF
- Alan Agresti(Author)
- 2014(Publication Date)
- Wiley(Publisher)
CHAPTER 1 Introduction: Distributions and Inference for Categorical Data
From helping to assess the value of new medical treatments to evaluating the factors that affect our opinions and behaviors, analysts today are finding myriad uses for categorical data methods. In this book we introduce these methods and the theory behind them.
Statistical methods for categorical responses were late in gaining the level of sophistication achieved early in the twentieth century by methods for continuous responses. Despite influential work around 1900 by the British statistician Karl Pearson, relatively little development of models for categorical responses occurred until the 1960s. In this book we describe the early fundamental work that still has importance today but place primary emphasis on more recent modeling approaches.
1.1 CATEGORICAL RESPONSE DATA
A categorical variable has a measurement scale consisting of a set of categories. For instance, political philosophy is often measured as liberal, moderate, or conservative. Diagnoses regarding breast cancer based on a mammogram use the categories normal, benign, probably benign, suspicious, and malignant.
The development of methods for categorical variables was stimulated by the need to analyze data generated in research studies in both the social and biomedical sciences. Categorical scales are pervasive in the social sciences for measuring attitudes and opinions. Categorical scales in biomedical sciences measure outcomes such as whether a medical treatment is successful. Categorical data are by no means restricted to the social and biomedical sciences.
- No longer available
- (Author)
- 2014(Publication Date)
- Orange Apple(Publisher)
Chapter 4 Statistical Inference
Statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation. More substantially, the terms statistical inference, statistical induction and inferential statistics are used to describe systems of procedures that can be used to draw conclusions from datasets arising from systems affected by random variation. Initial requirements of such a system of procedures for inference and induction are that the system should produce reasonable answers when applied to well-defined situations and that it should be general enough to be applied across a range of situations. The outcome of statistical inference may be an answer to the question "what should be done next?", where this might be a decision about making further experiments or surveys, or about drawing a conclusion before implementing some organizational or governmental policy.
Introduction
Scope
For the most part, statistical inference makes propositions about populations, using data drawn from the population of interest via some form of random sampling. More generally, data about a random process is obtained from its observed behavior during a finite period of time. Given a parameter or hypothesis about which one wishes to make inference, statistical inference most often uses:
• a statistical model of the random process that is supposed to generate the data, and
• a particular realization of the random process; i.e., a set of data.
The conclusion of a statistical inference is a statistical proposition. Some common forms of statistical proposition are:
• an estimate; i.e., a particular value that best approximates some parameter of interest,
• a confidence interval (or set estimate); i.e., an interval constructed from the data in such a way that, under repeated sampling of datasets, such intervals would
- Clifford Spiegelman, Eun Sug Park, Laurence R. Rilett(Authors)
- 2016(Publication Date)
- CRC Press(Publisher)
However, in large samples this should not be very important.
9.6 CONCLUSIONS
Categorical data are used regularly by transportation agencies and, as such, are often analyzed by engineers. This is because some transportation-related factors are, by definition, discrete. Examples of these could include (1) number of lanes on a roadway, (2) personal characteristics such as gender, (3) experiential characteristics such as whether someone has taken a traffic course, (4) specific treatment such as type of de-icer, and (5) number of cars in a household. In other situations the variables are discrete because that is how the information was collected. For example, it is rare to ask specific questions related to salary; rather, the survey typically asks responders to fill in their answer as part of a range.
FIGURE 9.4 Histogram for headway data with an overlaid exponential distribution.
When the data are categorical, the transportation professionals often ask questions related to proportions of the categorical data. The techniques used to analyze one or two proportions are very similar to those related to the inference of the estimated mean value of continuous variables. Both hypothesis tests and confidence intervals can be used for inference questions. As previously explained, the authors recommend using confidence intervals for inference. The results will be the same as for standard hypothesis testing; however, the authors feel the CIs provide more useful information. Note that when there were more than three proportions being compared, only hypotheses tests were presented.
This chapter also introduced a number of issues that are particular to categorical inference. For example, if no events are observed for a particular situation, the traditional CI calculations will give answers that are clearly inappropriate.
- eBook - PDF
Statistics
Learning from Data
- Roxy Peck, Tom Short(Authors)
- 2018(Publication Date)
- Cengage Learning EMEA(Publisher)
CHAPTER 7 An Overview of Statistical Inference—Learning from Data
How were these researchers able to reach these conclusions? The estimates (the 6% and the 86% in the previous statements) are based on sample data. Do these estimates provide an accurate picture of the entire population of online daters? The researchers concluded that the data supported the claim that those who place greater importance on developing a long-term, face-to-face relationship are more honest in the way they represented themselves online, but how did they reach this conclusion, and should you be convinced? These are important questions. In this chapter and those that follow, you will see how questions like these can be answered.
SECTION 7.1 Statistical Inference—What You Can Learn from Data
Statistical inference is all about learning from data. Inferential methods involve using sample data to learn about a population or using experiment data to learn about treatment effects.
Learning from Sample Data
When you obtain information from a sample selected from some population, it is usually because
1. You want to learn something about characteristics of the population. This results in an estimation problem. It involves using sample data to estimate population characteristics.
OR
2. You want to use the sample data to decide whether there is support for some claim or statement about the population. This results in a hypothesis testing problem. It involves testing a claim (a hypothesis) about the population.
Example 7.1 Deception in Online Dating Profiles Revisited
Let’s revisit the online dating example of the chapter preview. In that example, the population of interest was all online daters. Three questions about this population were identified:
1. What proportion of online daters believe they have misrepresented themselves in an online profile?
2.
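The two kinds of problem described in this excerpt, estimation and hypothesis testing, can be sketched for a single population proportion. The Python below is a minimal illustration under stated assumptions: the counts (30 of 500), the claimed value of 0.05, and the function names are hypothetical and are not taken from the study the excerpt discusses. The Wilson score interval and the large-sample z-test are one common choice each, not necessarily the methods used in the textbook.

```python
import math
from typing import Tuple

# Standard normal quantile for a two-sided 95% interval.
Z_95 = 1.959963984540054


def wilson_interval(successes: int, n: int, z: float = Z_95) -> Tuple[float, float]:
    """Estimation: Wilson score confidence interval for a proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


def one_proportion_z_test(successes: int, n: int, p0: float) -> Tuple[float, float]:
    """Hypothesis testing: two-sided large-sample z-test of H0: p = p0."""
    p_hat = successes / n
    # Standard error computed under the null hypothesis.
    se0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical sample: 30 of 500 respondents answer "yes".
lo, hi = wilson_interval(30, 500)
z_stat, p_value = one_proportion_z_test(30, 500, p0=0.05)
print(f"95% Wilson interval: ({lo:.4f}, {hi:.4f})")
print(f"z = {z_stat:.3f}, two-sided p-value = {p_value:.3f}")

# Zero observed events: the interval still has a positive upper bound.
lo0, hi0 = wilson_interval(0, 50)
```

The interval addresses the estimation question (what range of population proportions is plausible), while the z-test addresses a specific claim about the population. The last line shows one reason the Wilson form was chosen for this sketch: with zero observed events it does not collapse to a single point at zero, unlike the simple interval built from the sample proportion plus or minus a multiple of its standard error.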
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.



