ReviewEssays.com - Term Papers, Book Reports, Research Papers and College Essays

Canada Diamonds

Essay by   •  June 27, 2011  •  Case Study  •  4,112 Words (17 Pages)  •  1,469 Views


Introduction

The objective of this assignment is to find the best model for predicting the price of a diamond based on the four C's: Cut, Carat, Clarity, and Color. Our goal is to see which of these variables has the greatest influence on the pricing of diamonds. To accomplish this, we will analyze a random selection of 44 round-cut and 6 princess-cut diamonds from http://canadadiamonds.com. After analyzing our random sample, we will apply our knowledge of model building to recommend the equation that we believe best predicts the prices of diamonds.

Statistical Methods used for Analysis

After collecting the data (a random sample of 44 out of 183 round diamonds between 0.4 and 1.6 carats, and 6 out of 25 princess-cut diamonds in the same carat range), our first step was to convert the data from text to numerical format so that it could be used in a regression model. The predictors of price that were given in text format on www.canadadiamonds.com were Color, Cut, and Clarity. For Color and Clarity we were able to use a scale provided on the case sheet, since these are ordered variables: there are better and worse clarities of diamonds, and common and rare colors. The scale we followed was:

Code     1   2   3   4    5    6    7    8    9     10    11
Clarity  I3  I2  I1  SI3  SI2  SI1  VS2  VS1  VVS2  VVS1  F
Color    D   E   F   G    H    I    J    K    L     M     N+

As for cut, since we were only asked to sample princess and round diamonds, we used a 0/1 scale. We chose round to be 0, as round is the more common, standard form of diamond; princess, on the other hand, is a special cut and would have a more significant impact on price.
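The coding described above can be sketched in a few lines of Python. This is only an illustration of the scale, not the procedure actually used for the assignment; the function name `encode_diamond` and the dictionary names are assumptions introduced here.

```python
# Grade-to-code lookups built from the scale above (code 1 = first grade listed).
CLARITY_GRADES = ["I3", "I2", "I1", "SI3", "SI2", "SI1",
                  "VS2", "VS1", "VVS2", "VVS1", "F"]
COLOR_GRADES = ["D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N+"]

CLARITY_CODE = {grade: code for code, grade in enumerate(CLARITY_GRADES, start=1)}
COLOR_CODE = {grade: code for code, grade in enumerate(COLOR_GRADES, start=1)}
CUT_CODE = {"round": 0, "princess": 1}  # round is the baseline (0)


def encode_diamond(cut, color, clarity):
    """Hypothetical helper: turn the text grades into the numeric columns."""
    return {
        "Cut-Dum": CUT_CODE[cut.lower()],
        "Color-Dum": COLOR_CODE[color.upper()],
        "Clarity-Dum": CLARITY_CODE[clarity.upper()],
    }
```

For example, `encode_diamond("Round", "G", "VS2")` gives a Cut-Dum of 0, a Color-Dum of 4, and a Clarity-Dum of 7.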

Now that we had a numerical data set, we looked at the basic statistics to get better acquainted with it.

Descriptive Statistics: Carat, Price, Cut-Dum, Color-Dum, Clarity-Dum

Variable      N  N*    Mean  SE Mean   StDev  Minimum      Q1  Median      Q3
Carat        50   0  0.6846   0.0352  0.2486   0.4000  0.4850  0.6000  0.9025
Price        50   0    2557      245    1731      600    1087    2315    3236
Cut-Dum      50   0  0.1200   0.0464  0.3283   0.0000  0.0000  0.0000  0.0000
Color-Dum    50   0   4.700    0.265   1.876    1.000   3.000   4.500   6.000
Clarity-Dum  50   0   5.760    0.248   1.756    3.000   5.000   6.000   7.000
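As a quick sanity check on the output above, the SE Mean column can be reproduced from StDev and N, since the standard error of the mean is the standard deviation divided by the square root of the sample size. A minimal sketch (the function name `se_mean` is an assumption for illustration):

```python
import math


def se_mean(stdev, n):
    """Standard error of the mean: StDev / sqrt(N)."""
    return stdev / math.sqrt(n)


# (StDev, N) pairs copied from the descriptive statistics table.
table = {
    "Carat": (0.2486, 50),
    "Price": (1731, 50),
    "Cut-Dum": (0.3283, 50),
    "Color-Dum": (1.876, 50),
    "Clarity-Dum": (1.756, 50),
}

for name, (stdev, n) in table.items():
    print(f"{name:12s} SE Mean = {se_mean(stdev, n):.4f}")
```

The computed values agree with the reported SE Mean column to the precision shown in the table.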

From the descriptive statistics we took note of the mean of the response, Price, which is 2557; this will prove useful later in judging how well each candidate model fits, based on its S value. We also compared the mean of each data column to its median and saw no significant skew. Next, we computed a correlation matrix to determine which variables would have the largest impact on price, and came up with the following:

Correlations: Price, Carat, Cut-Dum, Color-Dum, Clarity-Dum

         Carat  Cut-Dum  Color-Dum  Clarity-Dum
Price    0.894    0.037     -0.198        0.164
P-value  0.000    0.798      0.168        0.256
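The second row of the correlation output can be cross-checked by hand. For a Pearson correlation r from n observations, the usual test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below computes it for each predictor; the cutoff of about 2.01 is the approximate two-sided 5% critical value for 48 degrees of freedom, and the function name is an assumption for illustration.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation differs from 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


N = 50            # sample size from the data set
T_CRITICAL = 2.01  # approximate two-sided 5% cutoff for 48 degrees of freedom

# Correlations with Price, copied from the table.
correlations = {
    "Carat": 0.894,
    "Cut-Dum": 0.037,
    "Color-Dum": -0.198,
    "Clarity-Dum": 0.164,
}

for name, r in correlations.items():
    t = correlation_t_statistic(r, N)
    verdict = "significant" if abs(t) > T_CRITICAL else "not significant"
    print(f"{name:12s} r = {r:6.3f}  t = {t:6.2f}  ({verdict} at 5%)")
```

Only Carat clears the cutoff on its own, which is consistent with the reported values of 0.000 for Carat and well above 0.05 for the other three predictors.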

The standout variable in the data set was Carat, which had by a good margin the largest absolute correlation with Price, at 0.894. Keeping this in mind, we then made a scatter plot of each predictor (carat, cut, color, and clarity) against price, to better determine what sort of relationship (linear, quadratic, cubic) each variable might have with price.

Looking closely at this plot, we determined it was possible that Carat could have a non-linear relationship with Price. We then created two more columns of data: Carat^2 and Carat^.5.
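Creating the two extra columns is a simple element-wise transformation of Carat. A minimal sketch, assuming each diamond is stored as a dictionary with a "Carat" key (the storage format and the function name are assumptions, and the carat values in the example are illustrative only):

```python
import math


def add_carat_transforms(rows):
    """Return copies of the rows with Carat^2 and Carat^.5 columns added."""
    return [
        {
            **row,
            "Carat^2": row["Carat"] ** 2,
            "Carat^.5": math.sqrt(row["Carat"]),
        }
        for row in rows
    ]


# Illustrative rows only, not the actual sample.
sample = [{"Carat": 0.4}, {"Carat": 0.6}, {"Carat": 1.6}]
for row in add_carat_transforms(sample):
    print(row)
```

The square term lets the fitted curve bend upward for larger stones, while the square-root term allows a different shape at the low end of the carat range.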

With all of our data in hand, including the transformations of the predictors, we ran a best subsets regression, with the following results:

Predictor columns, left to right: Carat, Cut-Dum, Color-Dum, Clarity-Dum, Carat^.5, Carat^2

Vars  R-Sq  R-Sq(adj)  Mallows Cp       S  Predictors included
   1  80.0       79.6        75.9  781.96  X
   1  80.0       79.6        76.1  782.56  X
   2  84.7       84.1        49.3  691.37  X X
   2  84.0       83.3        53.4  706.47  X X
   3  88.2       87.5        29.9  613.26  X X X
   3  87.7       86.9        33.0  626.68  X X X
   4  90.7       89.9        16.6  550.03  X X X X
   4  89.9       89.0        21.9  575.45  X X X X
   5  90.8       89.8        17.8  552.63  X X X X X
   5  90.7       89.7        18.5  556.05  X X X X X
   6  93.0       92.0         7.0  490.65  X X X X X X

After examining the subset table, we decided to take a closer look at the model including all of our variables. We chose this model because it had the smallest S value and the highest R-Sq and R-Sq(adj) values, and the difference between the two was less than 2%.
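The gap between R-Sq and R-Sq(adj) discussed above comes from the standard adjustment for the number of predictors: R-Sq(adj) = 1 - (1 - R-Sq)(n - 1)/(n - p - 1), where n is the sample size and p the number of predictors. The sketch below recomputes it for a few rows of the subset table; the function name is an assumption, and small differences can arise because the table's R-Sq values are already rounded.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p predictors fit to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


N = 50  # diamonds in the sample

# (number of predictors, R-Sq as a fraction) taken from the subset table.
for p, r2 in [(1, 0.800), (4, 0.907), (6, 0.930)]:
    print(f"Vars = {p}: R-Sq = {100 * r2:.1f}, "
          f"R-Sq(adj) = {100 * adjusted_r2(r2, N, p):.1f}")
```

For the full six-variable model this gives an adjusted value of about 92.0, matching the table, so the penalty for the extra terms is small relative to the improvement in fit.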

Regression Analysis: Price versus Carat, Cut-Dum, ...

The regression equation is

Price = 41985 + 129804 Carat - 916 Cut-Dum - 203 Color-Dum + 279 Clarity-Dum - 140882 Carat^.5 - 26552 Carat^2
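To show how the equation would be applied, here is a minimal sketch that evaluates it for a single diamond. The coefficients are copied as printed above, so they are rounded; because the three Carat terms are large and partly cancel each other, predictions are sensitive to that rounding and should be read as approximate. The function name `predict_price` is an assumption for illustration, and the equation should only be used within the sampled range of 0.4 to 1.6 carats.

```python
import math


def predict_price(carat, cut_dum, color_dum, clarity_dum):
    """Evaluate the fitted equation using the rounded coefficients as printed."""
    return (
        41985
        + 129804 * carat
        - 916 * cut_dum
        - 203 * color_dum
        + 279 * clarity_dum
        - 140882 * math.sqrt(carat)  # Carat^.5 term
        - 26552 * carat ** 2         # Carat^2 term
    )


# Hypothetical example: a 1.0-carat round diamond, color G (4), clarity SI1 (6).
print(predict_price(1.0, cut_dum=0, color_dum=4, clarity_dum=6))
```

Holding everything else fixed, each one-step increase in the clarity code adds 279 to the predicted price, each one-step increase in the color code subtracts 203, and a princess cut subtracts 916 relative to a round cut.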

...

...
