Globalization means different things to different people. Apple manufactures most of its products in China and ships them to the United States, where they are sold in Apple stores as well as through retailers such as Wal-Mart. The arrangement has its benefits, but also harmful aspects, including poor labor conditions.

And finally, globalization and all of the complicated problems related to it must not be used as an excuse to avoid searching for new ways to cooperate in the overall interest of countries and people. In fall he was a Visiting Fellow at the Kellogg Institute. Globalization is the process by which businesses and organizations, in particular, spread the impact of technological progress on a worldwide scale. It has been around for a long time in one form or another.

Globalization is a multifaceted phenomenon with economic, cultural, and political pros and cons. Today, the term globalization has socio-political interpretations as well. Often, the process begins with a single motive, such as market expansion on the part of a corporation or increased access to healthcare on the part of a nonprofit organization.

Apple, co-founded by Steve Jobs, has become one of the most successful corporations in the world. At the most generic level, globalization is simply the shrinking of geographic space and of politically defined borders, which accelerates and magnifies flows of money, goods, people and culture around the world.

However, today we are going to learn that there is more than one way to organise an essay, especially when that essay asks you to compare two different ideas and give your opinion. Countries throughout North America and Europe have experienced waves of anti-globalization sentiment, and most business leaders are uncertain whether to retreat or to change strategy. Globalization refers to the interaction of one economy with all the other economies of the world.

It also involves the effective management of technology and other supporting infrastructure, including the proper use of networks and the maintenance of strong network connectivity. Although the debate about the benefits and challenges of globalization is not new, it has recently come into sharper focus.

But although the global goods trade has flattened and cross-border capital flows have declined sharply, globalization is not heading into reverse. The world is taking notice of the tech giant and buying into the Apple revolution. The essay proceeds by defining the concept of neoliberalism and differentiating it from the various other strands of liberal economic theory. Apple has been successfully riding the two great forces of our era: technology and globalization.

International trade allows countries to exploit competitive advantages in production: if country B can produce something more cheaply than country A, country B will produce that good and then ship it to country A. Free trade is a way for countries to exchange goods and resources. We can exchange goods, money and ideas faster and more cheaply than ever before. Information technology (IT) is the application of computers and telecommunications equipment to store, retrieve, transmit and manipulate data, often in the context of a business or other enterprise.

Globalization is an economic tidal wave that is sweeping over the world. Globalisation describes a process by which national and more localised economies, societies, and cultures become integrated. Globalization is the concept of international integration resulting from the interchange of trade, products, ideas, and culture.

Globalization has been the new buzzword of the world economy, dominating it since the 1990s. In this paper, I am going to bring out the similarities and differences between Microsoft and Apple. Faced with competition, Apple has revisited and improved its strategy to meet market demands. Globalization also refers to the export, import, sharing, repurposing and adapting of values, ideas, norms, common sense, lifestyles, language, behaviors, and practices on a global scale.

A process such as globalization obviously has both advantages and disadvantages. Globalization affects all aspects not just of the corporate world but of transactional and cultural relationships generally. If you follow this structure, it will help you achieve a high score in Coherence and Cohesion, since it presents your ideas in a well-organised way.

Samsung has transformed itself from a company whose sole business was producing black-and-white TVs into a globally reputable high-technology corporation, largely by getting its product innovation right. This interaction can be in terms of financial transactions, trade, politics, education, production, etc. These blankets would typically have been hand-sewn here in America.

McDonald's is a clear sign of globalization: wherever you go, be it Japan or Germany, there is a McDonald's around the corner. In the grand scheme of things, I think it is mostly irrelevant whether globalization is making the world smaller or larger. He calls this Globalization 3. It affects people, companies, their workforce and consumers. So why, then, are we still allowing corporations to hurt them?

While globalization has served to improve living standards, it has also created new problems. This debate is important to all of us, and I think it is particularly relevant to India given its growing role in the global economy. Globalization, very much like industrialization and the earlier colonization, can make one country or region very powerful.

A convincing case could be made for either position. Several implications for civil society, for governments and for multinational institutions stem from the challenges of globalization.

Globalization picked up steam with the invention … The Apple iPhone line is the most developed range of multimedia, Internet-enabled smartphones. Apple is a huge success in America, and word is getting around. Apple has an image to uphold, and its logo is known globally without words. The word globalization is used to convey the hope and determination of order-making on a worldwide scale. This saying has never been truer, and if trends continue to develop the way they are, the world may continue to shrink. People, companies and organizations in different countries can live and work together.

Apple was founded by Steve Jobs and Steve Wozniak, who were determined to change the way people used computers. Because the company develops hardware, software, and associated digital services rather than focusing on just one dimension, it can provide an unmatched user experience. Apple is a strong market contender with a diversified product line in the computer and electronic gadget market.

Globalization gives us all an opportunity to live, work, and communicate in ways that bring us closer together. Today the world has become a global village in which everything is interlinked.

Nevertheless, economic globalization also has some negative aspects. Apple designs, manufactures and markets personal computers, portable media players, mobile phones, computer software, computer hardware and peripherals.

Globalization is an economic concept that works by easing the movement of goods and people across borders. Q: How do rural areas end up being reached by globalization? A: It really depends on how much globalization involves agriculture, and that varies from country to country. Globalization refers to the process of integrating governments, cultures, and financial markets through international trade into a single world market.

The product to specify and use in the globalization essay is the Apple iPhone. Generally, globalization is the process of globalizing products, businesses, technologies, philosophies, etc. throughout the world. In light of advanced technology, higher demands from markets and faster turnaround times, globalization has become a staple of world commerce. With technological advancement as a macro-environmental factor that influences people and organizations on a global scale, the mobile industry is highly sensitive to updates and developments that impinge on organizational strategies and performance. Apple was founded by the late Steve Jobs, Ronald Wayne and Steve Wozniak. Globalization is an emerging trend in business.

The Silk Road spanned one-sixth of the circumference of the planet, literally connecting the West and the East. McDonald's in Japan, French films being played in Minneapolis, and the United Nations are all representations of globalization. Most of the factories are located in Asia. This important new book offers an engaging and challenging introduction to the thorny paths of the globalization debate.

This is a broad trend that has been underway for centuries. This lesson plan focuses on various aspects of globalization to help students understand it via an activity, discussion topics and questions, extension, reading, and homework. Though development is progressing rapidly, many basic problems such as rural poverty, corruption and political instability remain. Apple is expanding through globalization into every major market available. Globalization is a term used to describe the trend of countries joining together economically, politically, and through education.

Globalization accelerates technological change. Globalization has also been driven by technology, including the Internet, mobile phones and satellite-tracking technology. This means countries can specialise in producing goods in which they have a comparative advantage, that is, goods they can produce at a lower opportunity cost. The iPhone was presented to the market by Apple CEO Steve Jobs on 9 January and released on 29 June.
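A small two-country, two-good calculation makes the comparative-advantage idea concrete. The sketch below is a minimal Python illustration and assumes the textbook Ricardian setup, in which labour is the only input and costs are constant. The country labels, goods and output figures are invented, and the helper names `opportunity_cost` and `comparative_advantage` are hypothetical.

```python
# Hypothetical output per worker-day; the numbers are invented for illustration.
# Country A is better at producing both goods (an absolute advantage in each).
output = {
    "A": {"cloth": 12, "wine": 6},
    "B": {"cloth": 4, "wine": 4},
}


def opportunity_cost(country: str, good: str, other: str) -> float:
    """Units of `other` given up to make one more unit of `good`."""
    return output[country][other] / output[country][good]


def comparative_advantage(good: str, other: str) -> str:
    """Return the country with the lower opportunity cost of producing `good`."""
    return min(output, key=lambda country: opportunity_cost(country, good, other))


for good, other in [("cloth", "wine"), ("wine", "cloth")]:
    costs = {c: opportunity_cost(c, good, other) for c in output}
    print(good, costs, "->", comparative_advantage(good, other))
```

Country A is more productive at both goods, but B gives up less cloth for each unit of wine it makes. Under these assumptions, B specializes in wine and A in cloth.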

Globalization has occurred for economic, social, cultural and political purposes. Globalization, the increasing integration and interdependence of domestic and overseas markets, has three sides: the good side, the bad side, and the ugly side. Nevertheless, states are not losing power.

Globalization has a profound impact on all of us.

Apple's business strategies are well aligned with the company's mission and vision statements. Those statements are best summarized as the desire to "create products that consumers will find easy to use and marry innovative technology to work productivity and personal entertainment" (Mallin and Finkle).

Globalization is a term used to describe how trade and technology have made the world into a more connected and interdependent place. Apple Inc. is a multinational American company that designs and sells computer software, consumer gadgets and personal computers. I then eat breakfast, which will often consist of milk or orange juice, and an apple from a European country. I am profoundly sad that Rudi Dornbusch, who should have delivered the Ely Lecture, died in July last year and that I am here in his place.

Others, the colossi of the internet age, have become global only in recent decades, notably Google, Microsoft, and Apple. What is crucial, however, is that we become aware of the effects of globalization and realize how rapidly and regularly these effects are occurring. The company was incorporated the following year (Young and Simon). Every day, it seems, a new technological innovation is created. All children born in a state should receive citizenship of that state.

The globalization strategy does not accommodate diversity, in that it makes no modifications to accommodate site-specific aspects. We undertook extensive econometric analyses of several datasets, using a series of new proxies for … This paper looks into Apple's global benchmarking plan as it sells new cell phones. The UK government depends on the inexorable integration of markets, countries, technologies, and capital at the much cheaper and faster rate that globalization brings.

It is evident that competition is the mother of innovation in the contemporary world. Advances in information technology, in particular, have dramatically transformed economic life. Transnational integration and increased mobility can simultaneously strengthen and diminish the protection of individual rights and the dignity of individuals.

Globalization is one of the most common subjects that students are assigned for an essay, whether at school or in an essay-writing competition. What is meant by the globalization of human capital? Is this inevitable as firms increase their global operations?

The companies that will be compared in this paper are Apple and Samsung. The benefits of globalization have reached a large proportion of humanity. These trends have been driven by anti-immigration sentiments in Europe, although election results veer more pro- than anti-globalization. Globalization also captures in its scope the economic and social changes that have come about as a result. Apple is considered a pioneer in innovation, and its various brands, including iTunes, iPod and iPhone, enjoy a first-mover advantage in the market.

Globalization is a defining word of our age and of the way in which we live. The logo is distinct and recognizable to people of all origins around the world (Burrows). In this essay about globalization, I will give examples of its positive and negative effects on humanity and the planet.

When a company decides to go "global", it will often experience an increase in profitability. A defining feature of globalization, therefore, is an international industrial and financial business structure. Similarly, Apple agreed to help Foxconn develop a counseling center so that workers could vent and express themselves (Weir). Apple Inc. is well known for being innovative, having kept producing new innovations ever since its first products. The emergence of international competition is the first plus of globalization.

This structure gives everyone an opportunity to create a world for themselves where any dream becomes possible.

The concept of the global village, or global merger, is based on globalization. Globalization refers to the situation in which individuals, groups, associations, businesses, and social organizations work on an international scale.

As a complex and multifaceted phenomenon, globalization is considered by some to be a form of capitalist expansion which entails the integration of local and national economies into a global, unregulated market economy. Market globalization refers to the process of carrying out business in the international market.

It is hard to find a more fashionable subject for debate than international integration.

But in fact this computational advantage is essentially irrelevant in the computer age: in real statistical analyses the computer takes over the tedium of arithmetic juggling.

Sometimes it requires careful thought to decide which measure is appropriate. This example also illustrates the relative impact of extreme values on the mean and the median. In the pay example above, the mean is nearly three times the median. The size of just a single value can have a dramatic effect on the mean but leave the median untouched. This sensitivity of the mean to extreme values is one reason why the median may sometimes be chosen in preference to the mean. The mean and the median are not the only two representative value summaries.
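The effect of a single extreme value can be seen in a few lines of Python. This sketch uses the standard-library `statistics` module, and the salary figures are invented for illustration. Only the largest value is changed, so the middle value, and hence the median, stays put while the mean jumps.

```python
import statistics

# Hypothetical annual salaries in dollars (invented for illustration).
salaries = [21_000, 24_000, 26_000, 30_000, 35_000]

# Replace the largest salary with a single extreme value.
with_extreme = salaries[:-1] + [1_000_000]

for label, data in [("original", salaries), ("with extreme", with_extreme)]:
    print(label, "mean:", statistics.mean(data), "median:", statistics.median(data))
```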

Another important one is the mode. This is the value taken most frequently in a sample. For example, suppose that I count the number of children per family for families in a certain population and find that more families have two children than any other number. In this case, the mode of the number of children per family would be two.
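Python's `statistics` module computes the mode directly. The family counts below are invented for illustration. `multimode` covers the case where several values tie for most frequent. Unlike the mean and median, the mode also works for non-numerical categories.

```python
import statistics

# Hypothetical numbers of children per family (invented for illustration).
children = [0, 1, 2, 2, 3, 2, 1, 4, 2, 0, 1, 2]

print(statistics.mode(children))                  # 2
print(statistics.multimode([1, 1, 3, 3, 5]))      # [1, 3]: two values tie
print(statistics.mode(["red", "blue", "red"]))    # 'red': categories work too
```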

[Figure: distribution of salaries. Horizontal axis: salary in millions of dollars; vertical axis: number in each salary range.]

### Dispersion

Averages, such as the mean and the median, provide single numerical summaries of collections of numerical values. They are useful because they can give an indication of the general size of the values in the data. But, as we have seen in the example above, single summary values can be misleading. In particular, a single summary value might deviate substantially from the individual values in a set of numbers. To illustrate, suppose that we have a set of a million and one numbers, taking the values 0, 1, 2, 3, 4, …, 1,000,000.

Both the mean and the median of this set of values are 500,000. At the extremes, one value in the set is half a million larger and one value is half a million smaller than the mean and median.

What is missing when we rely solely on an average to summarize a set of data is some indication of how widely dispersed the data are around that average. Are some data points much larger than the average? Are some much smaller? Or are they all tightly bunched about the average? In general, how different are the values in the data set from each other?
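The million-and-one example can be checked directly. This Python sketch computes the mean, median and range of the numbers 0 to 1,000,000. It then does the same for a small, tightly bunched set that is invented for contrast. Both sets have the same average, but their spreads are completely different.

```python
import statistics

values = range(1_000_001)  # 0, 1, 2, ..., 1,000,000: a million and one numbers
mean = sum(values) / len(values)
median = statistics.median(values)
value_range = max(values) - min(values)
print(mean, median, value_range)

tight = [499_999, 500_000, 500_001]  # hypothetical tightly bunched data
print(sum(tight) / len(tight), statistics.median(tight), max(tight) - min(tight))
```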

Statistical measures of dispersion provide precisely this information, and, as with averages, there is more than one such measure. The simplest measure of dispersion is the range: the difference between the largest and the smallest values. Both of these examples, with large ranges, show that there are substantial departures from the mean.

This paints a very different picture, telling us that the employees with these new salaries earn much the same as each other. The range is all very well, and has many attractive properties as a measure of dispersion, not least its simplicity and ready interpretability. However, we might feel that it is not ideal. After all, it ignores most of the data, being based on only the largest and smallest values. To illustrate, consider two data sets, each consisting of a thousand values.

One data set has one value of 0, 998 identical middle values, and one large value; the other has only two distinct values, 0 and that same large value. Both of these data sets have the same range and, incidentally, the same mean, but they are clearly very different in character. This shortcoming can be overcome by using a measure of dispersion which takes all of the values into account.

One such measure starts from the differences between each value and the mean. Squaring the differences makes the values all positive; otherwise positive and negative differences would cancel out when we calculated their mean. If the resulting mean of the squared differences is small, it tells us that, on average, the numbers are not too different from their mean. That is, they are not widely dispersed. This mean squared difference measure is called the variance of the data, or, in some disciplines, simply the mean squared deviation.

One slight complication arises from the fact that the variance involves squared values: if the data are measured in dollars, for example, the variance is measured in squared dollars. It is not obvious what to make of this, so we take the square root of the variance. This changes the units back to the original units, and produces the measure of dispersion called the standard deviation. If most of the data points are clustered very closely together, with just a few outlying points, this will be recognized by the standard deviation being small. In contrast, if the data points take very different values, even if they have the same largest and smallest value, the standard deviation will be much larger.
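The contrast between the range and the standard deviation shows up clearly with two data sets of a thousand values each. The specific numbers here (0, 50 and 100) are chosen purely for illustration. Both sets have range 100 and mean 50, but almost all of set `a` sits at the mean, while set `b` sits entirely at the extremes. The sketch computes the mean squared deviation from its definition and cross-checks it against `statistics.pvariance` and `statistics.pstdev`, which divide by n. The sample versions `statistics.variance` and `statistics.stdev` divide by n − 1 instead.

```python
import math
import statistics

# Two hypothetical data sets of 1,000 values each (numbers chosen for illustration).
a = [0] + [50] * 998 + [100]   # almost everything bunched at 50
b = [0] * 500 + [100] * 500    # split evenly between the two extremes


def mean_squared_deviation(xs):
    """Population variance: the mean of squared differences from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


for name, xs in [("a", a), ("b", b)]:
    var = mean_squared_deviation(xs)
    sd = math.sqrt(var)  # back in the original units
    print(name, "range:", max(xs) - min(xs), "mean:", sum(xs) / len(xs),
          "variance:", var, "sd:", round(sd, 3))
    # Cross-check against the standard library's population versions.
    assert math.isclose(var, statistics.pvariance(xs))
    assert math.isclose(sd, statistics.pstdev(xs))
```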

### Skewness

Measures of dispersion tell us how much the individual values deviate from each other. But they do not tell us in what way they deviate. In particular, they do not tell us whether the larger deviations tend to be for the larger values or for the smaller values in the data set. A measure of dispersion (the standard deviation, for example) would tell us that the values were quite widely spread out, but would not tell us that one of the values was much larger than the others. To detect this difference, we need another statistic to summarize the data, one which picks up on and measures the asymmetry in the distribution of values.

One kind of asymmetry in distributions of values is called skewness. A distribution that is skewed to the right has many smaller values and very few larger values. A classic example is the distribution of wealth, in which there are many individuals with small sums and just a few individuals with many billions of dollars.

### Quantiles

Dividing the data into quarters (quartiles) can be taken further to produce deciles, dividing the data set into tenths (from the lowest tenth through to the highest tenth), and percentiles, dividing the data into hundredths.
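Both ideas are easy to compute. The sketch below uses one common skewness statistic, the moment coefficient of skewness (the third central moment divided by the variance raised to the power 3/2). That choice is an assumption: the text does not name a particular formula. The deciles come from `statistics.quantiles` (Python 3.8+), using its default interpolation method. Other software may use a different interpolation rule, so deciles can differ slightly between packages. The "wealth" figures are invented.

```python
import statistics


def skewness(xs):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    m = sum(xs) / len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
wealth = [1, 1, 2, 2, 3, 3, 4, 5, 8, 100]  # hypothetical holdings, invented

print(skewness(symmetric))   # 0.0: a symmetric distribution
print(skewness(wealth) > 0)  # True: a long tail of large values

deciles = statistics.quantiles(wealth, n=10)  # 9 cut points
print(deciles)
print(deciles[4] == statistics.median(wealth))  # True: the 5th decile is the median
```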

The general term for such summaries, including quartiles, deciles, percentiles, etc., is quantiles. We might, however, be interested in just parts of a distribution. This idea can be generalized.

## Chapter 3: Collecting good data

"Raw data, like raw potatoes, usually require cleaning before use." (Ronald A. Thisted)

Data provide a window to the world, but it is important that they give us a clear view. A window with scratches, distortions, or marks on the glass is likely to mislead us about what lies beyond, and it is the same with data.

If data are distorted or corrupted in some way, then mistaken conclusions can easily arise. In general, not all data are of high quality. Many statistics textbooks nevertheless illustrate their methods only with clean, problem-free data sets. This is understandable, since the aim in such books is to describe the methods, and it detracts from the clarity of the description to say what to do if the data are not what they should be. If a real data set does look perfect, perhaps you should ask what preprocessing it has been subjected to. We will return to the question of preprocessing later.

However, this book is rather different, and the real discipline of statistics has to cope with dirty data.

A data set is incomplete if some of the observations are missing. Data may be randomly missing, for reasons entirely unrelated to the study. For example, perhaps a chemist dropped a test tube, or a patient in a clinical trial of a skin cream missed an appointment because of a delayed plane, or someone moved house and so could not be contacted for a follow-up questionnaire.

But the fact that a data item is missing can also in itself be informative. For example, people completing an application form or questionnaire may wish to conceal something, and, rather than lie outright, may simply not answer that question. Or perhaps only people with a particular view bother to complete a questionnaire. For example, if customers are asked to complete forms evaluating the service they have received, those with axes to grind may be more inclined to complete them.

Internet surveys are especially vulnerable to this kind of thing, with people often simply being invited to respond. There is no control over how representative the respondents are of the overall population, or even whether the same people respond multiple times.

Similarly, it is not uncommon for patients to drop out of clinical trials of medicines. Suppose that patients who recovered while using the medicine failed to return for their next appointment, because they felt it was unnecessary since they had recovered. Then we could easily draw the conclusion that the medicine did not work, since we would see only patients who were still sick.

A classic case of this sort of bias arose when the Literary Digest incorrectly predicted that Landon would overwhelmingly defeat Roosevelt in the US presidential election.
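The clinical-trial dropout problem can be made concrete with a toy cohort. In this Python sketch the cohort size, the true recovery rate, and the rule that most recovered patients skip their follow-up visit are all invented assumptions. The `Patient` record type is hypothetical. The point is only that analysing the patients who turn up understates how well the medicine works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    recovered: bool
    attended_follow_up: bool


def recovery_rate(patients):
    return sum(p.recovered for p in patients) / len(patients)


# Hypothetical cohort of 100: 60 recovered, 40 still sick.
# Assumption: 48 of the recovered patients skip the follow-up visit,
# while every patient who is still sick attends.
cohort = (
    [Patient(recovered=True, attended_follow_up=False)] * 48
    + [Patient(recovered=True, attended_follow_up=True)] * 12
    + [Patient(recovered=False, attended_follow_up=True)] * 40
)
seen = [p for p in cohort if p.attended_follow_up]

print(f"true recovery rate:     {recovery_rate(cohort):.2f}")  # 0.60
print(f"observed recovery rate: {recovery_rate(seen):.2f}")    # 0.23
```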

Unfortunately, the questionnaires were mailed only to people who had both telephones and cars, and at the time such people were wealthier on average than the overall population. The people sent questionnaires were not properly representative of the overall population. As it turned out, the bulk of the others supported Roosevelt.

Another, rather different kind of case of incorrect conclusions arising from failure to take account of missing data has become a minor statistical classic. This is the case of the Challenger space shuttle, which broke apart shortly after launch, killing everyone on board.

The night before the launch, a meeting was held to discuss whether to go ahead, since the forecast temperature for the launch date was exceptionally low. Data were produced showing that there was apparently no relationship between air temperature and damage to certain seals on the booster rockets. However, the data were incomplete, and did not include all those launches involving no damage. This was unfortunate because the launches when no damage occurred were predominantly made at higher temperatures.

A plot of all of the data shows a clear relationship, with damage being more likely at lower temperatures.

A similar distortion arises when banks estimate the probability that a loan applicant will fail to repay. These estimates are derived from statistical models built as described in Chapter 6 using data from previous customers who have already repaid or failed to repay. But there is a problem. Previous customers are not representative of all people who applied for a loan.

After all, previous customers were chosen because they were thought to be good risks. Any statistical model which fails to take account of this distortion of the data set is likely to lead to mistaken conclusions. In this case, it could well mean the bank collapsing.

If only some values are missing for each record, there are two popular approaches to handling them. One is simply to discard any incomplete records. This has two potentially serious weaknesses. First, if records of a particular kind are more likely to have some values missing, then deleting these records will leave a distorted data set. The second serious weakness is that it can lead to a dramatic reduction in the size of the data set available for analysis. For example, suppose a questionnaire contains many questions. It is entirely possible that no respondent answered every question, so that all records may have something missing. This means that dropping incomplete responses would lead to dropping all of the data.

The second popular approach to handling missing values is to insert substitute values. For example, suppose age is missing from some records. Then we could replace the missing values by the average of the ages which had been recorded. Although this results in a complete data set, it also has disadvantages. Essentially, we would be making up data. If there is reason to suspect that the fact that a number is missing is related to the value it would have had (for example, if older people are less likely to give their age), then more elaborate methods are needed. We need to construct a statistical model, perhaps of the kind discussed in Chapter 6, of the probability of being missing, as well as for the other relationships in the data.
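Both popular approaches fit in a few lines of Python. The survey records are invented, and `None` is assumed as the missing-value marker. The helper names `complete_cases` and `mean_impute` are hypothetical. The sketch also shows one concrete cost of mean imputation: it keeps the mean but understates the spread. The last lines estimate how quickly complete records vanish, assuming each answer is missing independently with a fixed probability.

```python
import statistics

# Hypothetical survey records; None marks a missing answer.
records = [
    {"age": 34, "income": 41_000, "score": 7},
    {"age": None, "income": 52_000, "score": 6},
    {"age": 29, "income": None, "score": 8},
    {"age": 51, "income": 60_000, "score": None},
    {"age": 45, "income": 58_000, "score": 5},
]


def complete_cases(rows):
    """Approach 1: discard every record that has any missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def mean_impute(rows, field):
    """Approach 2: fill missing values of `field` with the mean of the observed ones."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = statistics.mean(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


kept = complete_cases(records)
print(len(records), "records ->", len(kept), "complete records")

imputed = mean_impute(records, "age")
observed_ages = [r["age"] for r in records if r["age"] is not None]
imputed_ages = [r["age"] for r in imputed]
print(imputed_ages)
# Mean imputation keeps the mean but understates the spread.
print(statistics.pvariance(observed_ages), statistics.pvariance(imputed_ages))

# How fast complete records vanish, assuming each of k answers is missing
# independently with probability p.
p, k = 0.02, 100
print(f"share of complete records: {(1 - p) ** k:.3f}")
```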

It is also worth mentioning that, when analysing data, it is necessary to allow for the fact that not all values have been recorded. It is common practice to use a special symbol to indicate that a value is missing. But sometimes numerical codes are used, such as a particular number entered for age. In this case, failure to let the computer know that this code represents missing values can lead to a wildly inaccurate result. Imagine the estimated average age when many such values are included in the calculation.
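The effect of an unflagged missing-value code is easy to demonstrate. This sketch assumes a hypothetical code of 999 for a missing age. The real code, if any, is whatever a given data set's documentation specifies. In pandas, the `na_values` argument of `read_csv` serves the same purpose of telling the software which codes mean "missing".

```python
import statistics

MISSING_AGE = 999  # hypothetical sentinel code; the real one comes from the data's documentation

ages = [23, 41, 999, 35, 999, 52, 29]

naive_mean = statistics.mean(ages)                   # sentinel treated as a real age
clean_ages = [a for a in ages if a != MISSING_AGE]   # sentinel treated as missing
print(f"naive mean: {naive_mean:.1f}")
print(f"mean of recorded ages: {statistics.mean(clean_ages):.1f}")
```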

In general, and perhaps this should be expected, there is no perfect solution to missing data. All methods for handling it require some kind of additional assumptions to be made. The best solution is to minimize the problem during the data collection phase.

### Incorrect data

Incomplete data is one kind of data problem, but data may be incorrect in any number of ways and for any number of reasons. There are both high-level and low-level reasons for such problems.

Crime rate, referred to in Chapter 1, provides an example of this. Suicide rate provides another. Typically, suicide is a solitary activity, so that no one else can know for certain that it was suicide. Often a note is left, but not in all cases, and then evidence must be adduced that the death was in fact suicide. This moves us to murky ground, since it raises the question of what evidence is relevant and how much is needed.

Moreover, many people who take their own lives disguise the fact, for example so that the family can collect on the life insurance. The Agency then tries to classify them to identify commonalities, so that steps can be taken to prevent accidents happening in the future. Even the same incident can be described very differently.

For example, a common tendency in reading instruments is to subconsciously round to the nearest whole number. Distributions of blood pressure measurements recorded using old-fashioned non-electronic sphygmomanometers show a clear tendency for more values to be recorded at 60, 70, and 80 mm of mercury than at neighbouring values, such as 69 or … However, later investigators have explained the inaccuracies in terms of psychological reaction-time delays and the subconscious rounding phenomenon mentioned above.
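Rounding to round numbers leaves a fingerprint in the final digits, so counting them is a quick check. The readings below are invented. The rough benchmark of about 10% for each final digit assumes readings spread over a wide range with no rounding preference. That benchmark is an assumption, not a formal test.

```python
from collections import Counter

# Hypothetical diastolic readings in mm Hg, invented for illustration.
readings = [70, 80, 72, 70, 90, 78, 80, 70, 85, 60, 80, 74, 70, 80, 66, 90]

last_digits = Counter(r % 10 for r in readings)
share_zero = last_digits[0] / len(readings)
print(sorted(last_digits.items()))
print(f"share ending in 0: {share_zero:.0%} (roughly 10% expected without rounding)")
```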

We should also note the general point that the larger the data set, the more hands involved in its compilation, and the more stages involved in its processing, the more likely it is to contain errors. Other low-level examples of data errors often arise with units of measurement, such as recording height in metres rather than feet, or weight in pounds rather than kilograms.
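One defensive habit against unit slips is to store every value together with its unit and convert explicitly, refusing anything unrecognised rather than guessing. The following is a minimal sketch for weights. The function name and record layout are illustrative. The only fact built in is the exact definition 1 lb = 0.45359237 kg.

```python
# Exact conversion factors to kilograms (1 lb is defined as 0.45359237 kg).
TO_KG = {"kg": 1.0, "lb": 0.45359237}

def weight_in_kg(value, unit):
    """Convert a weight to kilograms, rejecting units we do not recognise."""
    try:
        return value * TO_KG[unit]
    except KeyError:
        raise ValueError(f"unknown unit {unit!r}; refusing to guess") from None

# Made-up records from two sources that used different units.
records = [(70.0, "kg"), (154.0, "lb"), (82.5, "kg")]
weights_kg = [weight_in_kg(v, u) for v, u in records]
print([round(w, 1) for w in weights_kg])  # [70.0, 69.9, 82.5]
```

Raising an error on an unknown unit is deliberate: a loud failure is easier to catch than a silently wrong number.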

The Mars Climate Orbiter probe was lost when it failed to enter the Martian atmosphere at the correct angle because of confusion between measurements expressed in units based on pounds and on newtons. In another example of confusion of units, this time in a medical context, an elderly lady usually had normal blood calcium levels, in the range 8…

The nurse in charge was about to begin infusing calcium when Dr Salvatore Benvenga discovered that the apparent drop was simply because the laboratory had changed the units in which it reported its results from milligrams per decilitre to milliequivalents per litre.

### Error propagation

Once made, errors can propagate with serious consequences. For example, budget shortfalls and possible job layoffs in Northwest Indiana were attributed to the effect of a mistake in just one number working its way up through the system. Unfortunately, this mistaken value was used in calculating tax rates.

This led to a reported fall of 2… In some contexts, this initial stage can take longer than the later analysis stages.

A key concept in data cleaning is that of an outlier. An outlier is a value that is very different from the others, or from what is expected. It lies far out in the tail of a distribution. Sometimes such extreme values occur by chance. For example, although most weather is fairly mild, we do get occasional severe storms. But in other instances anomalies arise because of the sorts of errors illustrated above, such as the anemometer which apparently reported a sudden huge gust of wind every midnight, coincidentally at the same time that it automatically reset its calibration.

So one good general strategy for detecting errors in data is to look for outliers, which can then be checked by a human. These might be outliers on single variables (e.g. …).

Of course, outlier detection is not a universal solution to detecting data errors. After all, errors can be made that lead to values which appear perfectly normal. The best answer is to adopt data-entry practices that minimize the number of errors. I say a little more about this below.

If an apparent error is detected, there is then the problem of what to do about it. We could drop the value, regarding it as missing, and then try to use one of the missing-value procedures mentioned above. Sometimes we can make an intelligent guess as to what the value should have been. For example, suppose that, in recording the ages of a group of students, one had obtained the string of values 18, 19, 17, 21, 23, 19, …, 18, 18, …. Studying these, we might think it likely that the outlying value had been entered into the wrong column, and we could make an intelligent guess about what it should have been. As with all statistical data analysis, careful thought is crucial.

It is not simply a question of choosing a particular statistical method and letting the computer do the work. The computer only does the arithmetic. The example of student ages in the previous paragraph was very small, just involving ten numbers, so it was easy to look through them, identify the outlier, and make an intelligent guess about what it should have been. But we are increasingly faced with larger and larger data sets.

It will often be quite infeasible to explore all the values manually. We have to rely on the computer. Statisticians have developed automatic procedures for detecting outliers, but these do not completely solve the problem. And then there is the question of what to do about an apparent anomaly detected by the computer.
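As an example of the kind of automatic procedure mentioned above, the sketch below flags values whose distance from the median is large relative to the median absolute deviation (MAD). This is one common rule among many, not necessarily the one the author has in mind. Unlike the mean and standard deviation, the median and MAD are not themselves dragged around by the outliers. The 0.6745 scaling and the 3.5 cut-off are a widely used convention, and the ages are invented.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation from the median.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values coincide with the median, so there is no
        # spread to scale by; this sketch flags nothing rather than everything.
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Invented ages with one suspicious entry, to be checked by a human.
ages = [18, 19, 17, 21, 23, 19, 190, 18, 18, 20]
print(flag_outliers(ages))  # [6]
```

The procedure only points a human at index 6; deciding whether 190 is a typing slip, and what it should have been, remains a judgement call.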

Here, too, human examination and correction of every case are impracticable. To cope with such situations, statisticians have again developed automated procedures. Some of the earliest such automated editing and correcting methods were developed in the context of censuses and large surveys. But they are not foolproof. The bottom line is, I am afraid, once again, that statisticians cannot work miracles. Poor data risk yielding poor (meaning inaccurate, mistaken, or error-prone) results.

The best strategy for avoiding this is to ensure good-quality data from the start. How best to do so varies according to the application domain and the mode of data capture. For example, when clinical trial data are copied from hand-completed case record forms, there is a danger of introducing errors in the transcription phase. This danger is reduced by arranging for the transcription to be carried out twice, by different people working independently, and then checking any differences. When applying for a loan, the application data … In general, forms should be designed so as to minimize errors.
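The double-entry check described above amounts to comparing two independent transcriptions field by field and sending every disagreement back to the paper form. This is a minimal sketch, assuming each transcription is held as a dictionary from a record ID to that record's fields. The layout and the function name are hypothetical.

```python
def transcription_discrepancies(first_pass, second_pass):
    """List every disagreement between two independent transcriptions.

    Each argument maps a record ID to a dict of field -> value. A field or
    record present in only one transcription counts as a disagreement
    (the missing side is reported as None).
    """
    problems = []
    for record_id in sorted(first_pass.keys() | second_pass.keys()):
        a = first_pass.get(record_id, {})
        b = second_pass.get(record_id, {})
        for field in sorted(a.keys() | b.keys()):
            if a.get(field) != b.get(field):
                problems.append((record_id, field, a.get(field), b.get(field)))
    return problems

# Invented case record forms, typed in separately by two people.
first = {1: {"age": 34, "sex": "F"}, 2: {"age": 51, "sex": "M"}}
second = {1: {"age": 43, "sex": "F"}, 2: {"age": 51, "sex": "M"}}
for problem in transcription_discrepancies(first, second):
    print(problem)  # (1, 'age', 34, 43): go back to the paper form
```

The check cannot tell which transcriber was right; it only guarantees that neither person's slip goes unexamined, unless both made the same slip.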

In particular, forms should not be excessively complicated, and all questions should be unambiguous. It is obviously a good idea to conduct a small pilot survey to pick up any problems with the data capture exercise before going live.

### Observational versus experimental data

It is often useful to distinguish between observational and experimental studies, and similarly between observational and experimental data. Or, in a study of the properties of distant galaxies, those properties would be observed and recorded. In both of these examples, the researchers simply chose who or what to study and then recorded the properties of those people or objects.

There is no notion of doing something to the people or galaxies before measuring them. In contrast, in an experimental study the researchers would actually manipulate the objects in some way, for example by exposing volunteers to a particular medication, before taking the measurements. But the computer is just doing what it is told, using the data provided.

One fundamental difference between observational and experimental studies is that experimental studies are much more effective at sorting out what causes what.

For example, we might conjecture that a particular way of teaching children to read (method A, say) is much more effective than another (method B). In an observational study, we would look at children who have been taught by each method, and compare their reading ability. This raises a potential problem: there may be other differences between the two groups of children besides the teaching method. For example, to take an extreme illustration, a teacher may have assigned all the faster learners to method A.

Or perhaps the children themselves were allowed to choose, and those already more advanced in reading tended to choose method A. Experimental studies overcome this possibility by deliberately choosing which child is taught by each method. However, as it happens, experimental studies have an even more powerful way of choosing which child receives which method, called randomization. I discuss this below.

In general, when collecting data with the aim of answering or exploring certain questions, the more data that are collected, the more accurate the answer that can be obtained.

This is a consequence of the law of large numbers, discussed in Chapter 4. But collecting more data incurs greater cost. It is therefore necessary to strike a suitable compromise between the amount of data collected and the cost of collecting it. Various subdisciplines of statistics are central to this exercise. In particular, experimental design and survey sampling are two key disciplines.

We do not have much opportunity to expose different galaxies to different treatments! However, if we do want to know what would be the effect of a potential intervention, then experimental studies are the better strategy. They are universal in the pharmaceutical sector, very widespread in medicine and psychology, ubiquitous in industry and manufacturing, and increasingly used to evaluate social policy and in areas such as customer value management.

### Experimental design

We have already seen examples of very simple experiments. One of the simplest is a two-group randomized clinical trial. Here the aim is to compare two alternative treatments (A and B, say) so that we can say which of the two should be given to a new patient. If, on average, A beats B, then we will recommend that the new patient receives treatment A. Now, as we have already noted above, if the two groups of patients differ in some way, then the conclusions we can draw are limited. If those who received treatment A were all male, and those who received treatment B were all female, then we would not know whether any difference we observed between the groups was due to the treatment or to the sex difference: maybe females get better faster, regardless of treatment.
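The remedy, randomization, is to let chance decide who receives which treatment, so that the allocation cannot be linked to sex or to anything else about the patients. The following is a minimal sketch of a simple randomized allocation into two equal-sized groups. The patient list, the seed, and the function name are hypothetical, and real trials often use more elaborate schemes.

```python
import random

def randomize(participants, treatments=("A", "B"), seed=None):
    """Randomly allocate participants to treatments in (near) equal numbers.

    Shuffling and then dealing out treatments in turn means every
    participant is equally likely to end up in any group, whatever their sex.
    """
    rng = random.Random(seed)
    order = list(participants)
    rng.shuffle(order)
    return {person: treatments[i % len(treatments)]
            for i, person in enumerate(order)}

# Invented trial: 50 male and 50 female patients.
patients = [f"M{i}" for i in range(50)] + [f"F{i}" for i in range(50)]
allocation = randomize(patients, seed=2024)
males_on_a = sum(1 for p, t in allocation.items() if p.startswith("M") and t == "A")
print(males_on_a)  # varies with the seed, but is rarely far from 25
```

Randomization does not force exactly 25 men into each group; it makes a badly lopsided split improbable, and that probability can be calculated.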

The point about sex applies equally to any other factor: age, height, weight, duration of illness, previous treatment history, and so on. The strength of random allocation is that, while it does not guarantee balance … In fact, it is possible to go further than this and work out just how likely different degrees of imbalance are. A study is double blind if neither the patient nor the doctor conducting the trial knows which treatment the patient is receiving.

This can be achieved by making the tablets or medicines look identical, and simply coding them as X or Y without indicating which of the treatments is which. Only later, after the analysis has revealed that X is better than Y, is the coding broken, to show whether X is really treatment A or B, as the case may be.

However, for the sake of variety, I shall switch examples. A market gardener might want to know which of low and high levels of water is better, in terms of producing greater crop yield. He could conduct a simple two-group experiment, of the kind described above, to determine this. Since we know that outcomes are not totally predictable, he will want to expose more than one greenhouse to the low level of water, and more than one to the high level, and then calculate the average yields at each level.

He might, for example, decide to use four greenhouses for each level. This is precisely the same sort of design as in the teaching methods study above. But now suppose that the gardener also wants to know which of low and high levels of fertilizer is more effective. The obvious thing to do is to conduct another two-group experiment, this time with four greenhouses receiving the low level of fertilizer and four receiving the high level.

This is all very well, but answering both of the questions, the water one and the fertilizer one, requires a total of sixteen greenhouses. If the gardener is also interested in the effectiveness of low and high levels of humidity, temperature, hours of sunlight, and so on, we see that we will soon run out of greenhouses.

Now, there is a very clever way round this, using the notion of a factorial experimental design. This requires just eight greenhouses, and yet we are still treating four of them with the low water level and four with the high water level, as well as four with the low fertilizer level and four with the high fertilizer level, so that the results of the analysis will be just as accurate as when we did two separate experiments.
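The eight-greenhouse factorial layout, and the comparisons it supports, can be sketched as follows. Each combination of water level and fertilizer level is given to two greenhouses, so each level of each factor is still used four times. The yields are invented. The interaction estimate is the change in the fertilizer effect between the two water levels, something two separate experiments cannot provide.

```python
from itertools import product
from statistics import fmean

# Every (water, fertilizer) combination, each assigned to two greenhouses.
design = [combo for combo in product(["low", "high"], repeat=2) for _ in range(2)]

# Invented yields, one per greenhouse, in the same order as `design`.
yields = [10, 12, 14, 16, 13, 15, 21, 23]

def average_yield(water=None, fertilizer=None):
    """Mean yield over greenhouses matching the given level(s); None means any level."""
    return fmean(y for (w, f), y in zip(design, yields)
                 if water in (None, w) and fertilizer in (None, f))

water_effect = average_yield(water="high") - average_yield(water="low")
fertilizer_effect = average_yield(fertilizer="high") - average_yield(fertilizer="low")
# Fertilizer effect with high water, minus fertilizer effect with low water.
interaction = ((average_yield("high", "high") - average_yield("high", "low"))
               - (average_yield("low", "high") - average_yield("low", "low")))
print(len(design), water_effect, fertilizer_effect, interaction)  # 8 5.0 6.0 4.0
```

Every greenhouse contributes to both main-effect comparisons, which is why eight greenhouses do the work of sixteen.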

A factorial design also allows us to see whether the impact of the level of fertilizer is different at the two levels of water: perhaps the difference between yields with the low and high levels of fertilizer varies between the two levels of water. This so-called interaction effect cannot be examined with the approach of two separate experiments. This basic idea has been extended in many ways to yield very powerful tools for obtaining accurate information at minimum cost. Sometimes, in experiments, non-statistical issues are important.

For example, in clinical trials and other medical and social policy investigations, ethical issues may be relevant. In a clinical trial comparing a proposed new treatment against an inactive placebo, we know that half of the volunteer patients will receive something which has no biological impact. Is that appropriate? Is there a danger that those exposed to the proposed new treatment might suffer from side effects?

Imagine that, in order to run the country effectively, we wish to know the average income of the one million employed men and women in a certain town. In principle, we could determine this by asking each of them what their income was, and averaging the results. Apart from anything else, over the course of the time taken to collect the data it is likely that incomes would change: some people would have left or changed their jobs, others would have received raises, and so on.

Furthermore, it would be extremely costly to track down each person. We might try to reduce costs by relying on the telephone, rather than face-to-face interviews. However, as we have already seen, in the extreme case of the US presidential election, there is a great risk that we would miss important parts of the population. … Put this way, it probably sounds like a tall order, but statistical ideas and tools that have these properties do exist.

The key idea is one we have met several times before: the notion of a sample. … Now clearly we have to be careful about exactly which thousand we ask. The reasons are essentially the same as when we were designing a simple two-group experiment and had to take steps to ensure that the only difference between the groups was that one received treatment A and one received treatment B.

Now we have to ensure that the particular thousand people we approach are representative of the full population of a million. Ideally, our sample of a thousand should have the same proportion of men in it as the entire population, the same proportion of young people, the same proportion of part-time workers, and so on. To some extent we can ensure this, choosing the thousand so that the proportion of men is correct, for example.

But there is obviously a practical limit to what we can deliberately balance in this way. We saw how to handle this when we looked at experimental design. Here we tackle it by randomly sampling the thousand people from the total population. Once again, while this does not guarantee that the sample will be similar in composition to the entire population, basic probability tells us that the chance of obtaining a seriously dissimilar sample is very small.

In particular, it follows that the probability that our estimate of the average income, derived from the sample, will be very different from the average income in the entire population is very small. Indeed, two properties of probability which we will explore later, the law of large numbers and the Central Limit Theorem, also tell us that we can make this probability as small as we like by increasing the sample size.
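The claim that the sample size, rather than the fraction of the population sampled, governs accuracy can be checked with the standard formula for the standard error of a mean under simple random sampling without replacement. The formula is σ/√n multiplied by the finite population correction √((N − n)/(N − 1)). In the minimal sketch below, the population standard deviation of 10,000 (in whatever currency incomes are measured) is an arbitrary assumption.

```python
import math

def standard_error_of_mean(sigma, n, N):
    """Standard error of a sample mean: n drawn at random, without replacement, from N."""
    finite_population_correction = math.sqrt((N - n) / (N - 1))
    return sigma / math.sqrt(n) * finite_population_correction

SIGMA = 10_000  # assumed spread of incomes in the population

for N in (1_000_000, 10_000_000, 10_000_000_000):
    print(N, round(standard_error_of_mean(SIGMA, 1_000, N), 1))
```

With n fixed at 1,000, multiplying the population size by ten thousand barely changes the result, whereas quadrupling n to 4,000 roughly halves it.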

It turns out that what matters is not how large a fraction of the population is included in the sample, but simply how large the sample is. Our estimate, based on a sample size of one thousand, would essentially be just as accurate if the entire population consisted of ten million or ten billion people. We cannot, for example, simply choose the thousand people from the largest employer in the town, since these may not be representative of the overall million. In general, to ensure that our sample of a thousand is properly representative we need a sampling frame, a list of all the one million employed people in our population, from which we can randomly choose a thousand.

Having such a list ensures that everyone is equally likely to be included. We draw up a sampling frame and from it randomly choose the people to be included in our sample. We then track them down (by interview, phone, letter, email, or whatever) and record the data we want. This basic idea has been elaborated in many very sophisticated and advanced ways, yielding more accurate and cheaper approaches. For example, if we intended to interview each of the thousand respondents it could be quite costly in terms of time and travel expenses.

It would be better, from this perspective, to choose respondents from small geographically local clusters. Cluster sampling extends simple random sampling by allowing this. Instead of randomly choosing a thousand people from the entire population, it selects, say, ten groups of a hundred people each, with the people in each group located near to each other. Likewise, we can be certain that balance is achieved on some factors, rather than simply relying on the random sampling procedure, if we enforce the balance in the way we choose the sample. For example, we could randomly choose a number of women from the population, and separately randomly choose a number of men from the population, where the numbers are chosen so that the proportions of males and females are the same as in the population.
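The balanced scheme just described (choosing women and men separately, in the same proportions as the population) is usually called proportional stratified sampling. It can be sketched as follows, assuming a sampling frame held as a list and a function that reports each person's stratum. The frame is invented. Because each stratum's share is rounded, the total can occasionally differ from the requested size by one or two.

```python
import random

def stratified_sample(frame, stratum_of, total_n, seed=None):
    """Proportional stratified random sample from a sampling frame.

    Each stratum contributes round(total_n * stratum_size / frame_size)
    people, chosen by simple random sampling within that stratum.
    """
    rng = random.Random(seed)
    strata = {}
    for person in frame:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        k = round(total_n * len(members) / len(frame))
        sample.extend(rng.sample(members, k))
    return sample

# Invented frame: 600 women and 400 men, each recorded as (sex, id).
frame = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]
sample = stratified_sample(frame, stratum_of=lambda person: person[0], total_n=100, seed=1)
print(sum(1 for sex, _ in sample if sex == "F"), len(sample))  # 60 100
```

Within each stratum the choice is still random, so the guarantees of random sampling apply there, while the split between strata is fixed by design.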

In general, in survey sampling, we are very lucky if we obtain responses from everyone approached. Almost always there is some non-response. We are back to the missing data problem discussed earlier, and, as we have seen, missing data can lead to a biased sample and incorrect conclusions.

If those earning large salaries refused to reply, then we would underestimate the average income in the population. Because of this, survey experts have developed a wide range of methods of minimizing and adjusting for non-response, including repeated call-backs to non-responders and statistical reweighting procedures.

### Conclusion

This chapter has described the raw material of statistics, the data.

Sophisticated data collection technologies have been developed by statisticians to maximize the information obtained for the minimum cost. But it would be naive to believe that perfect data can usually be obtained. Recognizing this, statisticians have also developed tools to cope with poor-quality data.


But it is important to recognize that statisticians are not magicians.

It is abundantly clear that the world is full of uncertainty, and this is one reason for the ubiquity of statistical ideas and methods. The future is an unknown land and we cannot be certain about what will happen. The unexpected does occur: cars break down, we have accidents, lightning does strike, and, lest I give the impression that such things are always bad, people do even win lotteries. More prosaically, it is uncertain which horse will win the race or which number will come up on the throw of a die.

And, at the end of it all, we cannot predict exactly how long our lives will be. However, notwithstanding all that, one of the greatest discoveries mankind has made is that there are certain principles governing chance and uncertainty. Perhaps this seems like a contradiction in terms. Uncertain events are, by their very nature, uncertain. How, then, can there be natural laws governing such things? A classic example is the tossing of a coin. Another example in the same vein is whether a baby will be male or female.

At conception, which sex the child will be is a purely chance and unpredictable event. But we know that over many births just over a half will be male. This observable property of nature is an example of one of the laws governing uncertainty. This law has all sorts of implications, and is one of the most powerful of statistical tools in taming, controlling, and allowing us to take advantage of uncertainty. We return to it later in this chapter, and repeatedly throughout the book.

By the early 20th century, all the ideas for a solid science of probability were in place, and the Russian mathematician Andrei Kolmogorov went on to present a set of axioms which provided a complete formal mathematical calculus of probability.
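The law described above for births and coin tosses is the law of large numbers, as it is named later in this chapter. It can be watched at work in a short simulation of a fair coin: the proportion of heads wanders at first but settles ever closer to a half as the number of tosses grows. This is a minimal sketch; the seed and checkpoints are arbitrary choices.

```python
import random

def proportion_of_heads(n_tosses, checkpoints, seed=0):
    """Simulate fair coin tosses, recording the proportion of heads at each checkpoint."""
    rng = random.Random(seed)
    heads = 0
    history = {}
    for toss in range(1, n_tosses + 1):
        heads += rng.random() < 0.5  # True counts as one head
        if toss in checkpoints:
            history[toss] = heads / toss
    return history

results = proportion_of_heads(100_000, checkpoints={10, 100, 1_000, 10_000, 100_000})
for n, share in sorted(results.items()):
    print(f"{n:>7} tosses: {share:.4f} heads")
```

No single toss becomes any more predictable; only the long-run proportion does.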

Since Kolmogorov's work, this axiom system has been almost universally adopted. The probability calculus assigns numbers between 0 and 1 to uncertain events to represent the probability that they will happen. A probability of 1 means that an event is certain, and a probability of 0 means that an event is impossible. One way of looking at this number is that it represents the degree of belief an individual has that the event will happen. Now, different people will have more or less information relating to whether the event will happen, so different people might be expected to have different degrees of belief, that is, different probabilities for the event.

To use this construction to make statements about the real world, it is necessary to say what the symbols in the mathematical machinery represent in that world. The fair coin tossing example above is an illustration. Two tosses of a coin cannot really have completely identical circumstances. These two different interpretations of what is meant by probability have different properties.

On the other hand, the subjective approach shifts probability from being an objective property of the external world like mass or length to being a property of the interaction between the observer and the world. Subjective probability is, like beauty, in the eye of the beholder. Some might feel that this is a weakness: it means that different people could draw different conclusions from the same analysis of the same data. For example, if I want to know the probability that my morning journey to work will take less than one hour, it is not at all clear what the equally likely elementary events should be.

There is no obvious symmetry in the situation, analogous to that of the die. It is worth emphasizing here that all of these different interpretations of probability conform to the same axioms and are manipulated by the same mathematical machinery. I sometimes say that the calculus is the same, but the theory is different.

In statistical applications, as we will see in Chapter 5, the different interpretations can sometimes lead to different conclusions being drawn.

### The laws of chance

We have already noted one law of probability, the law of large numbers. This is a law linking the mathematics of probability to empirical observations in the real world. Other laws of probability are implicit in the axioms of probability. Some very important laws involve the concept of independence. Two events are said to be independent if the occurrence of one does not affect the probability that the other will occur.

These two coin tosses are independent. Statisticians call the probability that two events will both occur the joint probability of those two events. For example, we can speak of the joint probability that I will slip over and that it snowed. The joint probability of two events is closely related to the probability that an event will occur if another one has occurred. This is called the conditional probability — the probability that one event will occur given that we know that the other one has occurred.

Thus we can talk of the conditional probability that I will slip over, given that it snowed. We saw another example of dependent events in Chapter 1: the tragic Sally Clark case of two cot deaths in the same family. When events are not independent, we cannot calculate the probability that both will happen simply by multiplying together their separate probabilities. Indeed, this was the mistake which lay at the root of the Sally Clark case.

To see this, let us take the most extreme situation of events which are completely dependent: that is, when the outcome of one completely determines the outcome of the other. But they are clearly not independent events. In fact, they are completely dependent. This is not what we get if we multiply the two separate probabilities of a half together.

The joint probability that both events A and B occur is simply the probability that A occurs times the conditional probability that B occurs given that A occurs.
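In symbols (the notation is ours, with P(B | A) standing for the conditional probability of B given A), the rule just stated and two special cases of it read:

$$
\begin{aligned}
P(A \text{ and } B) &= P(A)\,P(B \mid A) && \text{in general,}\\
P(A \text{ and } B) &= P(A)\,P(B) && \text{if } A \text{ and } B \text{ are independent, since then } P(B \mid A) = P(B),\\
P(A \text{ and } B) &= P(A) && \text{if } B \text{ occurs exactly when } A \text{ does, since then } P(B \mid A) = 1.
\end{aligned}
$$

In the last, completely dependent case with P(A) = P(B) = 1/2, the joint probability is 1/2, not the 1/4 obtained by multiplying the two separate probabilities.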

The joint probability that it snows and I slip over is the probability that it snows times the conditional probability that I slip over if it has snowed. To illustrate, consider a single throw of a die, and two events. Event A is that the number showing is divisible by 2, and Event B is that the number showing is divisible by 3. The joint probability of these two events A and B is the probability that I get a number which is both divisible by 2 and is divisible by 3.

Now, the conditional probability of B given A is the probability that I get a number which is divisible by 3 amongst those that are divisible by 2. Of the three numbers divisible by 2 (2, 4, and 6), only 6 is divisible by 3, so this conditional probability is 1/3. Since the probability of A is 1/2, the probability of A times the conditional probability of B given A is 1/2 × 1/3 = 1/6. This is the same as the joint probability of obtaining a number divisible by both 2 and 3 (only 6 qualifies, so again 1/6); that is, the joint probability of events A and B both occurring. Note, however, that the probability of event A occurring given that event B has occurred is not, in general, the same as the probability of event B occurring given that event A has occurred. For example, the probability that someone who runs a major corporation can drive a car is not the same as the probability that someone who can drive a car runs a major corporation.
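The die calculation can be checked exactly with Python's `fractions` module. The sketch below computes each conditional probability directly, by counting only the faces on which the conditioning event occurs, and then confirms the multiplication rule. It also shows that the probability of B given A differs from the probability of A given B.

```python
from fractions import Fraction

FACES = range(1, 7)  # a fair die: each face is equally likely

def prob(event):
    """Probability that the face shown satisfies `event`."""
    return Fraction(sum(1 for f in FACES if event(f)), len(FACES))

def cond_prob(event, given):
    """Probability of `event` among only those faces on which `given` occurs."""
    relevant = [f for f in FACES if given(f)]
    return Fraction(sum(1 for f in relevant if event(f)), len(relevant))

def A(f):
    return f % 2 == 0  # divisible by 2

def B(f):
    return f % 3 == 0  # divisible by 3

def both(f):
    return A(f) and B(f)

print(prob(A), cond_prob(B, A), prob(both))     # 1/2 1/3 1/6
print(prob(A) * cond_prob(B, A) == prob(both))  # True
print(cond_prob(A, B))  # 1/2, not the same as P(B | A) = 1/3
```

As it happens, P(B | A) here equals P(B) = 1/3, so on a fair die these two particular events are in fact independent.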

The multiplication rule can also be written the other way round: the probability that both events A and B will occur is also equal to the probability that B will occur times the probability that A will occur given that B has occurred. That is, the probability of A times the probability of B given A is equal to the probability of B times the probability of A given B. Both are equal to the joint probability of A and B. In the corporate example above, both equal the joint probability of being a corporate head and being able to drive a car.

If I toss a coin, which obviously cannot show heads and tails simultaneously, then the probability that a head or tail will show is the sum of the probability that a head will show and the probability that a tail will show. If the coin is fair, each of these separate probabilities is a half, so that the overall probability of a head or a tail is 1. This makes sense: 1 corresponds to certainty, and it is certain that a head or a tail must show (I am assuming the coin cannot end up on its edge!).

Returning to our die-throwing example: the probability of getting an even number is the sum of the probabilities of getting 2, 4, or 6, because none of these can occur together and there are no other ways of getting an even number on a single throw of the die.

### Random variables and their distributions

We saw, in Chapter 2, how simple summary statistics may be used to extract information from a large collection of values of some variable, condensing the collection down so that a distribution of values could be easily understood.

We saw examples of this when we looked at survey sampling. For example, in experiments to measure the speed of light, each time I take a measurement I expect to get a slightly different value, simply due to the inaccuracies of the measurement process. Each of these measurements will be drawn from the population of values I could possibly have obtained. Once again, each value in my sample is drawn from the population of possible values. In both of these examples, all I know before I take each measurement is that it will have some value from the population of possible values.

Each value will occur with some probability, but I cannot pin it down more than that, and I may not know what that probability is. I certainly cannot say exactly what value I will get in the next speed of light measurement or what will be the weight of the next man I measure. Similarly, in a throw of a die, I know that the outcome can be 1, 2, 3, 4, 5, or 6, and here I know that these are equally likely (my die is a perfect cube), but beyond that I cannot say which will come up.

Like the speed and weight measurements, the outcome is random. For this reason such variables are called random variables. There is a name for the complete set of quantiles of a distribution: it is called the cumulative probability distribution. If we knew the 20th percentile for the complete population of values, then we would know that a value randomly taken from that population had a probability of 0.2 of being less than or equal to it. If we knew the complete set of quantiles, then in a sense we would know everything about the distribution of possible values which we could draw. At the limit, the probability of drawing a value less than or equal to the largest value in the population is 1; it is a certain event.
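Given only a sample rather than the complete population, the cumulative probability distribution can be estimated by the proportion of sample values at or below each point. Probabilities for intervals then follow by subtraction. This is a minimal sketch with invented weights; `bisect_right` counts how many sorted values are less than or equal to x.

```python
from bisect import bisect_right

def empirical_cdf(sample):
    """Return F, where F(x) is the proportion of sample values less than or equal to x."""
    data = sorted(sample)
    n = len(data)

    def F(x):
        return bisect_right(data, x) / n

    return F

weights_kg = [61, 72, 68, 75, 80, 66, 70, 77, 83, 59]  # invented sample
F = empirical_cdf(weights_kg)
print(F(70))          # 0.5: half the sample weighs 70 kg or less
print(F(80) - F(70))  # estimated probability of a weight above 70 kg and at most 80 kg
```

With a larger sample, this step function approaches the population's cumulative distribution.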

The cumulative probability distribution of a random variable tells us the probability that a randomly chosen value will be less than any given value. This idea is illustrated in Figure 2. The curve shows, for any given value of the random variable, the probability that a randomly chosen value will be smaller than this given value.

[Figure 2: A cumulative probability distribution]

An alternative way to look at things is to look at the probability that a randomly chosen value will lie between any two given values. Such probabilities are conveniently represented in terms of areas between two values under a curve of the density of the probability. For example, Figure 3 shows such a probability density curve, with the shaded area under the curve between points a and b giving the probability that a randomly chosen value will fall between a and b. In general, randomly chosen values are more likely to occur in regions where the probability is most dense; that is, where the probability density curve is highest.

[Figure 3: A probability density function (probability density against value of the random variable, with the area between points a and b shaded)]

Note that the total area under the curve in Figure 3 must be 1, corresponding to certainty: a randomly chosen value must have some value.

Distribution curves for random variables have various shapes. The probability that a randomly chosen woman will have a weight between 70 kg and 80 kg will typically not be the same as the probability that a randomly chosen man will have a weight in that range. Certain shapes have particular importance.

There are various reasons for this. In some cases, the particular shapes, or very close approximations to them, arise in natural phenomena. In other cases, the distributions arise as consequences of the laws of probability. Perhaps the simplest of all distributions is the Bernoulli distribution. A Bernoulli random variable can take only two values, so it is certain that one or the other will come up, and the probabilities of these two outcomes have to sum to 1.

We have already seen examples illustrating why this distribution is useful: situations with only two outcomes are very common — the coin toss, with outcomes head or tail, and births, with outcomes male or female. The binomial distribution extends the Bernoulli distribution.


If we toss a coin three times, then we may obtain no, one, two, or three heads. If we have three operators in a call centre, responding independently to calls as they come in, then none, one, two, or all three may be busy at any particular moment. The binomial distribution tells us the probability that we will obtain each of those numbers, 0, 1, 2, or 3. Of course, it applies more generally, not just to the total from three events. If we toss a coin many times, then the binomial distribution also tells us the probability of obtaining each possible number of heads, from 0 up to the number of tosses.
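For n independent events, each with the same probability p, the binomial distribution gives the probability of exactly k occurrences as C(n, k) p^k (1 − p)^(n − k). Here C(n, k) counts the ways of choosing which k of the n events occur. This is a minimal sketch; the call-centre busy probability of 0.7 is a made-up value.

```python
from math import comb

def binomial_pmf(n, p):
    """Probabilities of exactly 0, 1, ..., n successes in n independent trials."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

print(binomial_pmf(3, 0.5))  # three fair coin tosses: [0.125, 0.375, 0.375, 0.125]

# Three operators, each (hypothetically) busy 70% of the time, independently.
busy = binomial_pmf(3, 0.7)
print(round(busy[3], 3))  # probability that all three are busy at once
```

The same function handles any number of tosses, and the returned probabilities always sum to 1.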

Emails arrive at my computer at random. The Poisson distribution can be used to describe the probability distribution of the number of emails arriving in each hour. It can tell us the probability if emails arrive independently and the overall rate at which they arrive is constant that none will arrive, that one will, that two will, and so on. This differs from the binomial distribution because, at least in principle, there is no upper limit on the number which could arrive in any hour. With the coin tosses, we could not observe more than heads, but I could on a very bad day! Some random variables can take any positive value; perhaps, for example, the time duration of some phenomenon.

As an illustration, consider how long glass vases survive before getting broken. Glass vases do not age, so it is no more likely that a particular favourite vase will be broken in the next year, if it is 69 Probability So far, all the probability distributions I have described are for discrete random variables. Other random variables are continuous, and can take any value from some range.

Contrast this with humans: the probability that an elderly person will die in the next year is much higher than the probability that a young adult will. For a glass vase, if it has not been smashed by time t, then the probability that it will be smashed in the next instant is the same, whatever the value of t (again, all other things being equal).
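This "no ageing" property can be checked numerically. For an exponential lifetime T with rate λ, P(T > t) = e^(−λt), so the chance of breaking in the next year, given survival to age a, is 1 − e^(−λ(a + 1)) / e^(−λa) = 1 − e^(−λ), whatever a is. The sketch below assumes a hypothetical rate of 0.1 breakages per year; the helper names are ours.

```python
from math import exp

def survival(t: float, rate: float) -> float:
    """P(T > t) for an exponentially distributed lifetime T with the given rate."""
    return exp(-rate * t)

def prob_break_within(dt: float, age: float, rate: float) -> float:
    """P(T <= age + dt | T > age): chance of breaking in the next dt, given survival to `age`."""
    return 1 - survival(age + dt, rate) / survival(age, rate)

rate = 0.1  # hypothetical: on average one breakage per 10 years
for age in (0, 1, 10, 50):
    # The printed probability is the same for every age: the vase does not "wear out".
    print(age, round(prob_break_within(1.0, age, rate), 6))
```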

Lifetimes of glass vases are said to follow an exponential distribution. In fact, there are huge numbers of applications of exponential distributions, not merely to the lifetimes of glass vases!

Perhaps the most famous of continuous distributions is the normal or Gaussian distribution. It has a symmetric bell shape, peaking in the middle and tailing off on either side. That means that values in the middle are much more likely to occur than are values in the tails, far from the middle.

*Figure: The normal distribution*
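To make "values in the middle are much more likely" concrete, the sketch below evaluates the normal density and uses the standard library's error function, `math.erf`, to find how much probability lies within 1, 2, and 3 standard deviations of the mean. The function names are illustrative helpers, not part of any library.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the normal (Gaussian) distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) for a normal variable, computed via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# The density at the mean is about 90 times the density 3 standard deviations out.
print(normal_pdf(0) / normal_pdf(3))

# Share of probability within 1, 2 and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(k, round(normal_cdf(k) - normal_cdf(-k), 4))
```

The loop should print roughly 0.6827, 0.9545 and 0.9973, the familiar proportions for a normal distribution.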

The normal distribution provides a good approximation to many naturally occurring distributions. For example, the heights of a random sample of adult men follow a roughly normal distribution. The normal distribution also often crops up as a good model for the shape of the distribution of sample statistics, such as the summary statistics described in Chapter 2, when large samples are involved.
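A small simulation illustrates the point about sample statistics. This minimal sketch uses assumed settings (samples of size 100 from an exponential distribution with mean 1, repeated 2,000 times), computes the mean of each sample, and checks that those means behave roughly like a normal distribution even though the individual draws are strongly skewed.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

def sample_mean(n: int) -> float:
    """Mean of n draws from a skewed exponential distribution with mean 1."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n = 100  # size of each sample (illustrative choice)
means = [sample_mean(n) for _ in range(2_000)]

# The means cluster around the true mean (1.0), with spread close to 1 / sqrt(n) = 0.1.
mu = statistics.fmean(means)
sd = statistics.stdev(means)
print(round(mu, 3), round(sd, 3))

# For a normal distribution, about 95% of values lie within 1.96 standard deviations of the mean.
inside = sum(abs(m - mu) <= 1.96 * sd for m in means) / len(means)
print(round(inside, 3))
```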

For example, suppose we repeatedly took random samples from some distribution and calculated the mean of each sample. Since each sample is different, we would expect each mean to be different; that is, we would have a distribution of means. If each sample is large enough, it turns out that this distribution of means is roughly normal.

I have described the distributions above by saying that they have different shapes. In fact, these shapes can be conveniently described by just a few numbers. We saw that the Bernoulli distribution was characterized by a value p.

This told us the probability that we would get a certain outcome. Different values of p correspond to different Bernoulli distributions.

In Chapter 2, I made the point that statistics was not simply a collection of isolated tools, but was a connected language.