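Advanced Statistics Laboratory Manual (Draft)
Compiled by Ge Hong (葛虹)
School of Management, Harbin Institute of Technology
December 2007

Contents
Section 1. Regression Analysis in SPSS
Section 2. Logistic Regression: Steps and SPSS Implementation
Section 3. Cluster Analysis in SPSS
Section 4. Discriminant Analysis in SPSS
Section 5. Principal Component Analysis in SPSS
Section 6. Factor Analysis in SPSS
Appendix 1. Height Data for 32 Married Couples
Appendix 2. Financial Institution Supervision Data


Section 1. Regression Analysis in SPSS

Case: Examine the heights of 32 married couples and establish a quantitative relationship between the husband's height and the wife's height (see Appendix 1).

Step 1: Create the data file.

Step 2: Inspect the scatterplot (Graphs → Scatter).
Select "Simple" and click "Define". Put "Height of hus" into "Y Axis" and "Height of wife" into "X Axis", then click "OK".

Step 3: Fit the regression equation (Analyze → Regression → Linear).
Put "Height of hus" into "Dependent" and "Height of wife" into "Independent". Click "Statistics", select "Estimates", "Model fit" and "Confidence intervals", then click "Continue". Open "Plots", put "ZPRED" into "X" and "ZRESID" into "Y", also select "Normal probability plot", then click "Continue".
[Figure 4: Linear Regression dialog]
Click "Options".
[Figure 5]
Choose whether to check "Include constant in equation", click "Continue", then click "OK" to run the procedure.

Step 4: Interpret the results.

Model Summary(b)
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .638   .407       .387                7.62989
a. Predictors: (Constant), Height of wife
b. Dependent Variable: Height of hus

Because R Square is only 0.407 (the model explains about 41% of the variance in husbands' heights), this linear model is not sufficient to explain the relationship between the heights of husbands and wives.

ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   1197.421         1    1197.421      20.569   .000
Residual     1746.454         30   58.215
Total        2943.875         31
a. Predictors: (Constant), Height of wife
b. Dependent Variable: Height of hus

The analysis of variance shows that the model is significant.

Coefficients(a)
Model 1          B        Std. Error   Beta   t       Sig.   95% CI for B: Lower   Upper
(Constant)       42.760   29.396              1.455   .156   -17.274               102.794
Height of wife   .803     .177         .638   4.535   .000   .442                  1.165
a. Dependent Variable: Height of hus

The regression model can be written as:

$$\text{Height of hus} = 42.760 + 0.803 \times \text{Height of wife}$$

The t tests show that the slope is significant while the constant is not (p = .156), so the regression could be refit without the constant term.

[Figure: Scatterplot of Regression Standardized Residual against Regression Standardized Predicted Value; Dependent Variable: Height of hus]
[Figure: Normal P-P Plot of Regression Standardized Residual, Expected Cum Prob against Observed Cum Prob; Dependent Variable: Height of hus]

The P-P plot and the residual plot indicate that the assumptions of independent, equal-variance and normally distributed errors essentially hold.


Section 2. Logistic Regression: Steps and SPSS Implementation

Case (data in Appendix 2): Detecting ailing financial and business establishments is an important function of audit and control. Table 1 gives some of the operating financial ratios of 33 firms that went bankrupt after 2 years and 33 that remained solvent during the same period. Three financial ratios were available for each firm:

$$X_1 = \frac{\text{retained earnings}}{\text{total assets}}, \qquad X_2 = \frac{\text{earnings before interest and taxes}}{\text{total assets}}, \qquad X_3 = \frac{\text{sales}}{\text{total assets}}$$

In the data file these variables are labelled 保留收入比 (retained earnings ratio), 利税前收入比 (EBIT ratio) and 销售比 (sales ratio); the group variable is 倒闭情况 (bankruptcy status: 0 = bankrupt, 1 = solvent).

Step 1: Create the data file.

Step 2: Run the logistic regression (Analyze → Regression → Binary Logistic...).
Put "bankruptcy status" into "Dependent" and put "retained earnings ratio", "EBIT ratio" and "sales ratio" into "Covariates", then click "Options". Select "At last step" and "Include constant in model", click "Continue", then click "OK" to run the procedure.

Step 3: Interpret the results.

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      5.813(a)            .727                   .969
a. Estimation terminated at iteration number 12 because parameter estimates changed by less than .001.

The R Square values in the table are large, which indicates that the model has strong explanatory power.

Classification Table(a)
Observed (Step 1)          Predicted .00   Predicted 1.00   Percentage Correct
Bankruptcy status .00      32              1                97.0
Bankruptcy status 1.00     1               32               97.0
Overall Percentage                                          97.0
a. The cut value is .500

The fitted model correctly classifies 32 of the 33 bankrupt firms and 32 of the 33 solvent firms; the overall classification rate is 97%.

Variables in the Equation
Step 1(a)                 B         S.E.     Wald    df   Sig.   Exp(B)
Retained earnings ratio   .331      .301     1.213   1    .271   1.393
EBIT ratio                .181      .107     2.862   1    .091   1.198
Sales ratio               5.087     5.082    1.002   1    .317   161.979
Constant                  -10.153   10.840   .877    1    .349   .000
a. Variable(s) entered on step 1: retained earnings ratio, EBIT ratio, sales ratio.

The logistic regression model gives the probability that the i-th financial institution is solvent as

$$P_i = \frac{1}{1 + \exp\left[-\left(-10.153 + 0.331x_{1i} + 0.181x_{2i} + 5.087x_{3i}\right)\right]}$$

However, the Wald tests show that the constant and the coefficients of "retained earnings ratio" and "sales ratio" are not significant (each p > 0.10), so the independent variables in the model need further selection.


Section 3. Cluster Analysis in SPSS

Case: Classify 15 regions of China by their social development in 2005 (the first 15 records of Appendix 3).

Step 1: Create the data file.

Step 2: Run the hierarchical cluster analysis (Analyze → Classify → Hierarchical Cluster).
Put all variables into "Variables" and put "region" into "Label cases by", then click "Plots". Select "Dendrogram" and click "Continue". Click "Method".
[Figure: Hierarchical Cluster Analysis: Method dialog. Cluster Method options: Between-groups linkage, Within-groups linkage, Nearest neighbor, Furthest neighbor, Centroid clustering, Median clustering. Measure, Interval: Squared Euclidean distance. Transform Values, Standardize: None.]
Choose one of the hierarchical clustering methods here. Click "Continue" and then "OK" to run the procedure.

Step 3: Interpret the results.
[Figure: Vertical Icicle plot]


Section 4. Discriminant Analysis in SPSS

Case: Apply discriminant analysis to the financial supervision data of Section 2 (see Appendix 2).

Step 1: Create the data file.

Step 2: Run the discriminant analysis (Analyze → Classify → Discriminant...).
Put the group variable "bankruptcy status" into "Grouping Variable". Click "Define Range", enter 0 for "Minimum" and 1 for "Maximum", then click "Continue". Put all other variables into "Independents", then click "Statistics". Select "Means", "Univariate ANOVAs", "Box's M", "Fisher's" and "Within-groups correlation", click "Continue", then click "Classify".
[Figure: Discriminant Analysis: Classification dialog. Prior Probabilities: All groups equal / Compute from group sizes. Use Covariance Matrix: Within-groups / Separate-groups. Display: Casewise results, Summary table, Leave-one-out classification. Plots: Combined-groups, Separate-groups, Territorial map.]
Select "Compute from group sizes", "Summary table" and "Leave-one-out classification", then click "Continue" and "OK" to run the procedure.

Step 3: Interpret the results.

Group Statistics
Bankruptcy status   Variable                  Mean       Std. Deviation   Valid N (Unweighted)   Valid N (Weighted)
.00                 Retained earnings ratio   -62.5121   71.31253         33                     33.000
                    EBIT ratio                -31.7697   51.34751         33                     33.000
                    Sales ratio               1.5030     1.16229          33                     33.000
1.00                Retained earnings ratio   35.2515    16.50775         33                     33.000
                    EBIT ratio                15.3182    10.86777         33                     33.000
                    Sales ratio               1.9394     .93003           33                     33.000
Total               Retained earnings ratio   -13.6303   71.16157         66                     66.000
                    EBIT ratio                -8.2258    43.80631         66                     66.000
                    Sales ratio               1.7212     1.06735          66                     66.000

Tests of Equality of Group Means
Variable                  Wilks' Lambda   F        df1   df2   Sig.
Retained earnings ratio   .521            58.866   1     64    .000
EBIT ratio                .707            26.562   1     64    .000
Sales ratio               .958            2.836    1     64    .097

Note: This tests, variable by variable, whether the group means differ. The retained earnings ratio and the EBIT ratio differ significantly between the groups; the sales ratio is significant only at the 10% level (p = .097).

Pooled Within-Groups Matrices (Correlation)
                          Retained earnings ratio   EBIT ratio   Sales ratio
Retained earnings ratio   1.000                     .438         -.136
EBIT ratio                .438                      1.000        -.561
Sales ratio               -.136                     -.561        1.000

Note: This table shows the collinearity among the variables; collinearity increases the error of joint discrimination.

Box's Test of Equality of Covariance Matrices

Log Determinants
Bankruptcy status      Rank   Log Determinant
.00                    3      15.465
1.00                   3      10.068
Pooled within-groups   3      14.610
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.

Note: The log determinants reflect how similar the group covariance matrices are. For Bayes discrimination, the closer the values for the two groups, the better.

Test Results
Box's M         117.971
F   Approx.     18.659
    df1         6
    df2         29676.679
    Sig.        .000
Tests null hypothesis of equal population covariance matrices.

Note: This is Box's M test of equal covariance matrices. The result shows that the covariance matrices of the two groups differ significantly, so Fisher discrimination or logistic discrimination is preferable.

Summary of Canonical Discriminant Functions

Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          1.325(a)     100.0           100.0          .755
a. First 1 canonical discriminant functions were used in the analysis.

This part reports the results of Fisher discriminant analysis. The first discriminant function (the only one when there are two groups) has a discriminating efficiency of 100%.

Wilks' Lambda
Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .430            52.733       3    .000

Note: Wilks' Lambda tests whether the group means are equal. The result shows that the group means differ significantly, so it is valid to build a discriminant function from this sample.

Standardized Canonical Discriminant Function Coefficients
                          Function 1
Retained earnings ratio   .643
EBIT ratio                .627
Sales ratio               .622

Fisher discriminant function (standardized variables):

$$y = 0.643x_1^{*} + 0.627x_2^{*} + 0.622x_3^{*}$$

Structure Matrix
                          Function 1
Retained earnings ratio   .833
EBIT ratio                .560
Sales ratio               .183
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.

Note: The structure matrix gives the correlation between each variable and the first Fisher discriminant function. The table shows that "retained earnings ratio" has the strongest discriminating power, followed by "EBIT ratio" and then "sales ratio".

Canonical Discriminant Function Coefficients
                          Function 1
Retained earnings ratio   .012
EBIT ratio                .017
Sales ratio               .591
(Constant)                -.709
Unstandardized coefficients

Fisher discriminant function:

$$y = 0.012x_1 + 0.017x_2 + 0.591x_3 - 0.709$$

Functions at Group Centroids
Bankruptcy status   Function 1
.00                 -1.134
1.00                1.134
Unstandardized canonical discriminant functions evaluated at group means

Note: These are the centroids of the two groups after the samples are projected onto the discriminant function.

Classification Statistics (classification results of the Bayes method)

Prior Probabilities for Groups
Bankruptcy status   Prior   Cases Used in Analysis (Unweighted)   (Weighted)
.00                 .500    33                                    33.000
1.00                .500    33                                    33.000
Total               1.000   66                                    66.000

Note: The prior probability of each group.

Classification Function Coefficients
                          Bankruptcy status .00   Bankruptcy status 1.00
Retained earnings ratio   -.024                   .004
EBIT ratio                .016                    .054
Sales ratio               1.505                   2.844
(Constant)                -2.329                  -3.935
Fisher's linear discriminant functions

Bayes discriminant functions (a case is assigned to the group with the larger score; the posterior probability p(g|X) is proportional to the exponential of the score):

$$d_0(X) = -0.024x_1 + 0.016x_2 + 1.505x_3 - 2.329$$
$$d_1(X) = 0.004x_1 + 0.054x_2 + 2.844x_3 - 3.935$$

Classification Results(b,c)
                                             Predicted .00   Predicted 1.00   Total
Original             Count   .00             28              5                33
                             1.00            1               32               33
                     %       .00             84.8            15.2             100.0
                             1.00            3.0             97.0             100.0
Cross-validated(a)   Count   .00             28              5                33
                             1.00            1               32               33
                     %       .00             84.8            15.2             100.0
                             1.00            3.0             97.0             100.0
a. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
b. 90.9% of original grouped cases correctly classified.
c. 90.9% of cross-validated grouped cases correctly classified.

Note: These are the results of reclassifying the original sample and of cross-validation using the Bayes discriminant functions.


Section 5. Principal Component Analysis in SPSS

Case: Carry out a principal component analysis of the 2005 social development data for the regions of China (see Appendix 3). The variables are per capita GDP (人均GDP, X1), new fixed assets (新增固定资产, X2), urban disposable income (城镇可支配收入, X3), rural net income (农村纯收入, X4), number of colleges (高校数, X5) and number of health institutions (卫生机构数, X6).

Step 1: Create the data file.

Step 2: Run the principal component analysis (Analyze → Data Reduction → Factor).
[Figure: Factor Analysis dialog with buttons Descriptives..., Extraction..., Rotation..., Scores..., Options...]
Put all variables into "Variables" and click "Descriptives...".
[Figure: Factor Analysis: Descriptives dialog. Statistics: Univariate descriptives, Initial solution. Correlation Matrix: Coefficients, Significance levels, Determinant, Inverse, Reproduced, Anti-image, KMO and Bartlett's test of sphericity.]
Select "Univariate descriptives" and click "Continue". Click "Scores...", select "Display factor score coefficient matrix", then click "Continue" and "OK" to run the procedure.

Step 3: Interpret the results.

Descriptive Statistics
Variable                      Mean        Std. Deviation   Analysis N
Per capita GDP                1218942     820323.02939     31
New fixed assets              1425.1548   1140.64713       31
Urban disposable income       10195.82    2931.22195       31
Rural net income              3511.5487   1601.34831       31
Number of colleges            57.8065     28.68382         31
Number of health institutions 9645.0645   5489.87334       31

Correlation Matrix
                                X1      X2      X3      X4      X5      X6
X1 Per capita GDP               1.000   .447    .913    .959    .272    -.171
X2 New fixed assets             .447    1.000   .488    .525    .795    .570
X3 Urban disposable income      .913    .488    1.000   .942    .314    -.088
X4 Rural net income             .959    .525    .942    1.000   .376    -.056
X5 Number of colleges           .272    .795    .314    .376    1.000   .730
X6 Number of health institutions -.171  .570    -.088   -.056   .730    1.000

Note: This table is the correlation matrix.

Communalities
Variable                        Initial   Extraction
Per capita GDP                  1.000     .959
New fixed assets                1.000     .850
Urban disposable income         1.000     .935
Rural net income                1.000     .971
Number of colleges              1.000     .897
Number of health institutions   1.000     .891
Extraction Method: Principal Component Analysis.

Note: The first two principal components extract 95.9% of the information in "per capita GDP" and 85.0% of the information in "new fixed assets", which is the variable with the least information extracted.

Total Variance Explained
            Initial Eigenvalues                        Extraction Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %       Total   % of Variance   Cumulative %
1           3.515   58.591          58.591             3.515   58.591          58.591
2           1.988   33.133          91.724             1.988   33.133          91.724
3           .221    3.689           95.413
4           .162    2.699           98.112
5           .085    1.410           99.522
6           .029    .478            100.000
Extraction Method: Principal Component Analysis.

Note: By the eigenvalue-greater-than-1 rule, two principal components are retained, with a cumulative contribution of 91.72%.

Component Matrix(a)
Variable                        Component 1   Component 2
Per capita GDP                  .866          -.456
New fixed assets                .788          .479
Urban disposable income         .886          -.388
Rural net income                .920          -.353
Number of colleges              .667          .673
Number of health institutions   .261          .907
Extraction Method: Principal Component Analysis.
a. 2 components extracted.

Note: This table gives the coefficients of the approximate linear relationship between each standardized original variable and the two standardized principal components Z1 and Z2. For example:

$$\text{standardized GDP} \approx 0.866Z_1 - 0.456Z_2$$

Component Score Coefficient Matrix
Variable                        Component 1   Component 2
Per capita GDP                  .246          -.229
New fixed assets                .224          .241
Urban disposable income         .252          -.195
Rural net income                .262          -.177
Number of colleges              .190          .338
Number of health institutions   .074          .456
Extraction Method: Principal Component Analysis.

Note: This table gives the coefficients of the standardized principal components. For example, with $X_j^{*}$ denoting the standardized $X_j$:

$$Z_1 = 0.246X_1^{*} + 0.224X_2^{*} + 0.252X_3^{*} + 0.262X_4^{*} + 0.190X_5^{*} + 0.074X_6^{*}$$

1. Expression for the first principal component:

$$
\begin{aligned}
Y_1 ={}& 0.246\sqrt{3.515}\,\frac{X_1 - 1218942}{820323.03} + 0.224\sqrt{3.515}\,\frac{X_2 - 1425.15}{1140.65} + 0.252\sqrt{3.515}\,\frac{X_3 - 10195.82}{2931.22} \\
&+ 0.262\sqrt{3.515}\,\frac{X_4 - 3511.55}{1601.35} + 0.190\sqrt{3.515}\,\frac{X_5 - 57.81}{28.68} + 0.074\sqrt{3.515}\,\frac{X_6 - 9645.06}{5489.87}
\end{aligned}
$$

2. Expression for the second principal component:

$$
\begin{aligned}
Y_2 ={}& -0.229\sqrt{1.988}\,\frac{X_1 - 1218942}{820323.03} + 0.241\sqrt{1.988}\,\frac{X_2 - 1425.15}{1140.65} - 0.195\sqrt{1.988}\,\frac{X_3 - 10195.82}{2931.22} \\
&- 0.177\sqrt{1.988}\,\frac{X_4 - 3511.55}{1601.35} + 0.338\sqrt{1.988}\,\frac{X_5 - 57.81}{28.68} + 0.456\sqrt{1.988}\,\frac{X_6 - 9645.06}{5489.87}
\end{aligned}
$$

3. The correlation between $Y_1$ and a standardized variable is its coefficient in $Y_1$ multiplied by $\sqrt{3.515}$. For standardized GDP this is $0.246\sqrt{3.515}\times\sqrt{3.515} = 0.246 \times 3.515 \approx 0.865$, and for the standardized number of colleges it is $0.190 \times 3.515 \approx 0.668$; these agree, up to rounding, with the loadings .866 and .667 in the Component Matrix.


Section 6. Factor Analysis in SPSS

Case: Carry out a factor analysis of the 2005 social development data for the regions of China (see Appendix 3).

Step 1: Create the data file.

Step 2: Run the factor analysis (Analyze → Data Reduction → Factor).
Put all variables into "Variables" and click "Descriptives...".
[Figure: Factor Analysis: Descriptives dialog]
Select "Univariate descriptives" and click "Continue". Click "Rotation".
[Figure: Factor Analysis: Rotation dialog]
Select "Varimax" and click "Continue". Click "Scores...", select "Display factor score coefficient matrix", then click "Continue" and "OK" to run the procedure.

Step 3: Interpret the results.

The Descriptive Statistics, Correlation Matrix and Communalities tables are identical to those in Section 5.

Note on the Communalities table: each communality is the sum of squared rotated loadings of the variable, for example

$$0.978^2 + 0.043^2 = 0.959, \qquad 0.438^2 + 0.811^2 = 0.850$$

Total Variance Explained
            Initial Eigenvalues             Extraction Sums of Squared Loadings   Rotation Sums of Squared Loadings
Component   Total   % of Var.   Cum. %      Total   % of Var.   Cum. %            Total   % of Var.   Cum. %
1           3.515   58.591      58.591      3.515   58.591      58.591            3.127   52.115      52.115
2           1.988   33.133      91.724      1.988   33.133      91.724            2.377   39.609      91.724
3           .221    3.689       95.413
4           .162    2.699       98.112
5           .085    1.410       99.522
6           .029    .478        100.000
Extraction Method: Principal Component Analysis.

Component Matrix(a)
Variable                        Component 1   Component 2
Per capita GDP                  .866          -.456
New fixed assets                .788          .479
Urban disposable income         .886          -.388
Rural net income                .920          -.353
Number of colleges              .667          .673
Number of health institutions   .261          .907
Extraction Method: Principal Component Analysis.
a. 2 components extracted.

Note: This table is the initial factor loading matrix obtained by the principal component method, that is, with $X_j^{*}$ the standardized $X_j$:

$$
\begin{aligned}
X_1^{*} &= 0.866F_1 - 0.456F_2 + \varepsilon_1 \\
X_2^{*} &= 0.788F_1 + 0.479F_2 + \varepsilon_2 \\
X_3^{*} &= 0.886F_1 - 0.388F_2 + \varepsilon_3 \\
X_4^{*} &= 0.920F_1 - 0.353F_2 + \varepsilon_4 \\
X_5^{*} &= 0.667F_1 + 0.673F_2 + \varepsilon_5 \\
X_6^{*} &= 0.261F_1 + 0.907F_2 + \varepsilon_6
\end{aligned}
$$

Rotated Component Matrix(a)
Variable                        Component 1   Component 2
Per capita GDP                  .978          .043
New fixed assets                .438          .811
Urban disposable income         .961          .112
Rural net income                .973          .159
Number of colleges              .236          .917
Number of health institutions   -.232         .915
Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

Notes:
(1) This is the factor loading matrix after orthogonal rotation, that is:

$$
\begin{aligned}
X_1^{*} &= 0.978F_1 + 0.043F_2 + \varepsilon_1 \\
X_2^{*} &= 0.438F_1 + 0.811F_2 + \varepsilon_2 \\
X_3^{*} &= 0.961F_1 + 0.112F_2 + \varepsilon_3 \\
X_4^{*} &= 0.973F_1 + 0.159F_2 + \varepsilon_4 \\
X_5^{*} &= 0.236F_1 + 0.917F_2 + \varepsilon_5 \\
X_6^{*} &= -0.232F_1 + 0.915F_2 + \varepsilon_6
\end{aligned}
$$

(2) The factors can be named from this loading matrix: F1 is the income factor; F2 is the development support factor.

Component Transformation Matrix
Component   1       2
1           .863    .504
2           -.504   .863
Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization.

Note: This table is the orthogonal rotation matrix.

Component Score Coefficient Matrix
Variable                        Component 1   Component 2
Per capita GDP                  .328          -.074
New fixed assets                .072          .321
Urban disposable income         .316          -.041
Rural net income                .316          -.021
Number of colleges              -.007         .388
Number of health institutions   -.166         .431
Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. Component Scores.

Note: This table gives the coefficients of each factor on the standardized original variables:

$$
\begin{aligned}
F_1 ={}& 0.328\,\frac{X_1 - 1218942}{820323.03} + 0.072\,\frac{X_2 - 1425.15}{1140.65} + 0.316\,\frac{X_3 - 10195.82}{2931.22} \\
&+ 0.316\,\frac{X_4 - 3511.55}{1601.35} - 0.007\,\frac{X_5 - 57.81}{28.68} - 0.166\,\frac{X_6 - 9645.06}{5489.87}
\end{aligned}
$$

$$
\begin{aligned}
F_2 ={}& -0.074\,\frac{X_1 - 1218942}{820323.03} + 0.321\,\frac{X_2 - 1425.15}{1140.65} - 0.041\,\frac{X_3 - 10195.82}{2931.22} \\
&- 0.021\,\frac{X_4 - 3511.55}{1601.35} + 0.388\,\frac{X_5 - 57.81}{28.68} + 0.431\,\frac{X_6 - 9645.06}{5489.87}
\end{aligned}
$$


Appendix 1. Height Data for 32 Married Couples

num   heightofhus   heightofwife      num   heightofhus   heightofwife
1     186           175               17    168           167
2     180           168               18    183           174
3     160           154               19    188           173
4     186           166               20    166           164
5     163           162               21    180           163
6     172           152               22    176           163
7     192           179               23    185           171
8     170           163               24    169           161
9     174           172               25    182           167
10    191           170               26    162           160
11    182           170               27    169           165
12    178           147               28    176           167
13    181           165               29    180           175
14    168           162               30    157           157
15    162           154               31    170           172
16    188           166               32    186           181


Appendix 2. Financial Institution Supervision Data

Bankrupt firms (status 0)                         Solvent firms (status 1)
Retained earnings   EBIT      Sales               Retained earnings   EBIT     Sales
ratio               ratio     ratio               ratio               ratio    ratio
-62.8               -89.5     1.7                 43                  16.4     1.3
3.3                 -3.5      1.1                 47                  16       1.9
-120.8              -103.2    2.5                 -3.3                4        2.7
-18.1               -28.8     1.1                 35                  20.8     1.9
-3.8                -50.6     0.9                 46.7                12.6     0.9
-61.2               -56.2     1.7                 20.8                12.5     2.4
-20.3               -17.4     1                   33                  23.6     1.5
-194.5              -25.8     0.5                 26.1                10.4     2.1
20.8                -4.3      1                   68.6                13.8     1.6
-106.1              -22.9     1.5                 37.3                33.4     3.5
-39.4               -35.7     1.2                 59                  23.1     5.5
-164.1              -17.7     1.3                 49.6                23.8     1.9
-308.9              -65.8     0.8                 12.5                7        1.8
7.2                 -22.6     2                   37.3                34.1     1.5
-118.3              -34.2     1.5                 35.3                4.2      0.9
-185.9              -280      6.7                 49.5                25.1     2.6
-34.6               -19.4     3.4                 18.1                13.5     4
-27.9               6.3       1.3                 31.4                15.7     1.9
-48.2               6.8       1.6                 21.5                -14.4    1
-49.2               -17.2     0.3                 8.5                 5.8      1.5
-19.2               -36.7     0.8                 40.6                5.8      1.8
-18.1               -6.5      0.9                 34.6                26.4     1.8
-98                 -20.8     1.7                 19.9                26.7     2.3
-129                -14.2     1.3                 17.4                12.6     1.3
-4                  -15.8     2.1                 54.7                14.6     1.7
-8.7                -36.3     2.8                 53.5                20.6     1.1
-59.2               -12.8     2.1                 35.9                26.4     2
-13.1               -17.6     0.9                 39.4                30.5     1.9
-38                 1.6       1.2                 53.1                7.1      1.9
-57.9               0.7       0.8                 39.8                13.8     1.2
-8.8                -9.1      0.9                 59.5                7        2
-64.7               -4        0.1                 16.3                20.4     1
-11.4               4.8       0.9                 21.7                -7.8     1.6


Appendix 3. 2005 Social Development Data for the Regions of China

region         Per capita   New fixed   Urban disposable   Rural net   Number of   Number of health
               GDP          assets      income             income      colleges    institutions
beijing        3382573      1703.6      17652.95           7346.26     77          4818
tianjing       2678270      719.2       12638.55           5579.87     42          2472
hebei          1113309      1982.4      9107.09            3481.64     86          18046
shanxi         941129.8     888.7       8913.91            2890.66     59          9430
mongolia       1233222      1622.6      9136.79            2988.87     33          7629
liaoniang      1433439      2367        9107.55            3690.21     76          14925
jilin          1006996      985.7       8690.62            3263.99     44          8755
heilongjiang   1089990      1098.1      8272.51            3221.27     62          8326
shanghai       3889586      2195.1      18645.03           8247.77     58          2526
jiangshu       1850077      4956.7      12318.57           5276.29     114         15324
zhejiang       2072655      2715.6      16293.77           6659.95     68          12555
anhui          663517.2     1081.3      8470.68            2640.96     81          9197
fujian         1403851      867.5       12321.31           4450.36     53          7934
jiangxi        710874.2     977.8       8619.66            3128.89     67          10669
shangdong      1512641      3791.1      10744.79           3930.55     99          16323
henan          852712.9     2362.6      8667.97            2870.58     83          14554
hubei          862653.2     1561.7      8785.94            3099.2      85          9459
hunan          777600.6     1197.3      9523.97            3117.74     93          15008
guangdong      1837835      3668.8      14769.94           4690.49     102         16318
guangxi        660749.9     792         9286.7             2494.67     51          9416
hainan         816206.1     135.9       8123.94            3004.03     15          2464
Chongqing      829039.8     1146.6      10243.46           2809.32     35          6380
sichuan        679397.3     1325.9      8385.96            2802.78     68          23832
guizhou        400850.7     445.6       8151.13            1876.96     34          6571
yunnan         589585.7     767.5       9265.9             2041.79     44          10110
tibet          685130.6     167.6       9431.18            2077.94     4           1378
shaaxi         746463.3     1026.9      8272.02            2052.63     72          11701
gansu          563244.8     412.3       8086.82            1979.88     33          11849
qinghai        755629.1     163.5       8057.85            2151.46     11          1478
ningxia        768265.5     229.4       8093.64            2508.89     13          1463
xinjiiang      979700.2     823.8       7990.15            2482.15     30          8087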