How to get the prediction from a statsmodels linear regression (WLS) fit with 2D parameters

Posted: 2017-07-24 04:31

statsmodels installation

00.psych {psych} R Documentation
A package for personality, psychometric, and psychological research
Description
Overview of the psych package.
The psych package has been developed at Northwestern University to include functions most useful for personality and psychological research.
Some of the functions are useful for basic data entry and descriptive analyses. Use help(package="psych") for a list of all functions.
Two vignettes are included as part of the package.
The overview provides examples of using psych in many applications.
Psychometric applications include routines for principal axes, minimum residual (minres), and weighted least squares factor analysis, as well as functions to do Schmid Leiman transformations to transform a hierarchical factor structure into a bifactor solution. Factor or components transformations to a target matrix include the standard Promax transformation, a transformation to a cluster target, or to any simple target matrix, as well as the ability to call many of the GPArotation functions. Functions for determining the number of factors in a data matrix include Very Simple Structure and Minimum Average Partial correlation. An alternative approach to factor analysis is Item Cluster Analysis.
Reliability coefficients alpha, beta, and McDonald's omega, as well as Guttman's six estimates of internal consistency reliability and the six measures of intraclass correlation coefficients discussed by Shrout and Fleiss, are also available.
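As an illustration of the first of these coefficients, the following is a minimal sketch of the textbook formula for Cronbach's alpha in plain Python (standard library only). It is not the psych package's implementation; the function name `cronbach_alpha` and the example scores are made up for this sketch, and it assumes complete data with no reverse-keyed items.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])                      # number of items
    columns = list(zip(*rows))            # one tuple of scores per item
    item_variances = [variance(col) for col in columns]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: 4 respondents answering 3 items.
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 4))
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why it is read as a measure of internal consistency.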
Scoring functions may be used to form single or multiple scales from sets of dichotomous, multilevel, or multiple choice items by specifying scoring keys.
Additional functions make for more convenient descriptions of item characteristics.
Functions under development include 1 and 2 parameter Item Response measures.
Further functions are used to find 2 parameter descriptions of item functioning.
A number of procedures have been developed as part of the Synthetic Aperture Personality Assessment (SAPA) project.
These routines facilitate forming and analyzing composite scales equivalent to using the raw data but doing so by adding within and between cluster/scale item correlations. These functions include extracting clusters from factor loading matrices, synthetically forming clusters from correlation matrices, and finding multiple and partial correlations from correlation matrices.
Functions to generate simulated data with particular structures cover circumplex structures, general structures, and a specific demonstration of congeneric measurement.
Further simulation functions can be used to create data sets with particular structural properties. A more general form for all of these generates general structural models.
These are discussed in more detail in the vignette (psych_for_sem).
Functions to apply various standard statistical tests include functions for testing the probability of replication, for finding the confidence intervals of a correlation, and for testing single, paired, or sets of correlations.
In order to study diurnal or circadian variations in mood, it is helpful to use circular statistics.
Functions to find the circular mean, circular (phasic) correlations, and the correlation between linear variables and circular variables supplement a function to find the best fitting phase angle for measures taken with a fixed period (e.g., 24 hours).
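To show why circular statistics are needed for time-of-day data, here is a minimal sketch of a circular mean in plain Python. It is an illustration of the general technique, not the psych package's code; the function name `circular_mean_hours` and the `period` parameter are assumptions made for this example.

```python
import math


def circular_mean_hours(hours, period=24.0):
    """Circular mean of clock times measured on a cycle of length `period`."""
    # Map each time onto the unit circle.
    angles = [2 * math.pi * h / period for h in hours]
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    # The direction of the average vector is the circular mean angle.
    mean_angle = math.atan2(mean_sin, mean_cos)
    # Convert back to hours and wrap into [0, period).
    return (mean_angle * period / (2 * math.pi)) % period


# Two observations, one hour before and one hour after midnight.
late_and_early = [23.0, 1.0]
arithmetic = sum(late_and_early) / len(late_and_early)   # 12.0, i.e. noon
circular = circular_mean_hours(late_and_early)           # at midnight (0 or, by rounding, just under 24)
print(arithmetic, circular)
```

The arithmetic mean of 23:00 and 01:00 is noon, which is the opposite side of the clock; the circular mean lands at midnight as expected.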
The most recent development version of the package is always available for download as a source file from the package repository.
Two vignettes (overview.pdf and psych_for_sem.pdf) are useful introductions to the package. They may be found as vignettes in R or may be downloaded.
The psych package was originally a combination of multiple source files maintained at the repository: "useful.r", VSS.r, ICLUST.r, omega.r, etc. "useful.r" is a set of routines for easy data entry, simple descriptive statistics, and splom plots combined with correlations (adapted from the help files of pairs).
Those files have now been replaced with a single package.
Further routines allow for testing the number of factors, showing plots of goodness of fit, and estimating the number of factors/components to extract by examining the scree plot or by comparing it with the scree of an equivalent matrix of random numbers.
In addition, there are routines for hierarchical factor analysis using Schmid Leiman transformations as well as Item Cluster analysis.
The more important functions in the package are for the analysis of multivariate data, with an emphasis upon those functions useful in scale construction of item composites.
When given a set of items from a personality inventory, one goal is to combine these into higher level item composites. This leads to several questions:
1) What are the basic properties of the data?
Several functions address this question. One reports basic summary statistics (mean, sd, median, mad, range, minimum, maximum, skew, kurtosis, standard error) for vectors, columns of matrices, or data.frames.
Another provides descriptive statistics, organized by one or more grouping variables.
Others show scatter plot matrices (SPLOMs) as well as histograms and the Pearson correlation for scales or items, plot variable means with associated confidence intervals, plot confidence intervals for both the x and y coordinates, and find the significance values for a matrix of correlations.
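The summary statistics listed above can be sketched in plain Python as follows. This is only an illustration of the quantities involved, not the psych package's code; the function name `describe` is reused here for readability, and the moment-based definitions of skew and kurtosis are one common choice among several, so values from other software may differ slightly.

```python
import math
import statistics


def describe(values):
    """Return basic summary statistics for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)        # sample standard deviation (n - 1)
    # Population central moments for moment-based skew and kurtosis.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "skew": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,    # excess kurtosis (0 for a normal)
        "se": sd / math.sqrt(n),         # standard error of the mean
    }


summary = describe([1, 2, 3, 4, 5])
for name, value in summary.items():
    print(name, value)
```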
2) What is the most appropriate number of item composites to form? After finding either standard Pearson correlations, or finding tetrachoric or polychoric correlations using a wrapper for John Fox's hetcor function, the dimensionality of the correlation matrix may be examined. The number of factors/components problem is a standard question of factor analysis, cluster analysis, or principal components analysis. Unfortunately, there is no agreed upon answer. The Very Simple Structure set of procedures has been proposed as one answer to the question of the optimal number of factors.
Other procedures also address this question.
3) What are the best composites to form?
Although this may be answered using principal components, principal axis, or minimum residual factor analysis (all part of one combined factor analysis function), with the results shown graphically, it is sometimes more useful to address this question using cluster analytic techniques. (Some would argue that better yet is to use maximum likelihood factor analysis from the stats package.) Previous versions of ICLUST (e.g., Revelle, 1979) have been shown to be particularly successful at forming maximally consistent and independent item composites.
Graphical output from ICLUST uses the Graphviz dot language and allows one to write files suitable for Graphviz.
If Rgraphviz is available, these graphs can be done in R.
Graphical organizations of cluster and factor analysis output can be done using a function which plots items by cluster/factor loadings and assigns items to that dimension with the highest loading.
4) How well does a particular item composite reflect a single construct?
This is a question of reliability and general factor saturation.
Multiple solutions for this problem result in (Cronbach's) alpha, (Revelle's) beta, and (McDonald's) omega (both omega hierarchical and omega total). Additional reliability estimates are also provided.
This can also be examined by applying Item Response Theory techniques using factor analysis of the correlation matrices and converting the results into the standard two parameter parameterization of item difficulty and item discrimination.
Information functions for the items suggest where they are most effective.
5) For some applications, data matrices are synthetically combined from sampling different items for different people.
So called Synthetic Aperture Personality Assessment (SAPA) techniques allow the formation of large correlation or covariance matrices even though no one person has taken all of the items. To analyze such data sets, it is easy to form item composites based upon the covariance matrix of the items, rather than the original data set.
These matrices may then be analyzed using a number of functions.
6) More typically, one has a raw data set to analyze.
One function will report several reliability estimates as well as item-whole correlations for items forming a single scale.
Another will score data sets on multiple scales, reporting the scale scores, item-scale and scale-scale correlations, as well as coefficient alpha, alpha-1 and G6+. Using a "keys" matrix (created by a helper function or by hand), scales can have overlapping or independent items.
A further function scores multiple choice items or converts multiple choice items to dichotomous (0/1) format for other functions.
An additional set of functions generate simulated data to meet certain structural properties.
One function produces data simulating a 3 way analysis of variance (ANOVA) or linear model with or without repeated measures.
Others create simple structure data, produce circumplex structured data, or produce circumplex or simple structured data for dichotomous items.
These item structures are useful for understanding the effects of skew and differential item endorsement on factor and cluster analytic solutions.
A further function will produce correlation matrices and data matrices to match general structural models. (See the vignette.)
When examining personality items, some people like to discuss them as representing items in a two dimensional space with a circumplex structure.
Tests of circumplex fit have been developed.
When representing items in a circumplex, it is convenient to view them in polar coordinates.
Additional functions are available for testing the difference between two independent or dependent correlations, for finding the phi or Yule coefficients from a two by two table, or for finding the confidence interval of a correlation coefficient.
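A confidence interval for a correlation is commonly built with the Fisher r-to-z transform. The sketch below shows that standard construction in plain Python; it is not taken from the psych package, the function names are chosen for this example, and it assumes a Pearson correlation from a bivariate normal sample of size n > 3.

```python
import math
from statistics import NormalDist


def fisher_r_to_z(r):
    """Fisher transform: z = atanh(r)."""
    return math.atanh(r)


def fisher_z_to_r(z):
    """Inverse Fisher transform: r = tanh(z)."""
    return math.tanh(z)


def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation r from n cases."""
    z = fisher_r_to_z(r)
    se = 1.0 / math.sqrt(n - 3)                      # standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)   # two-sided normal quantile
    return fisher_z_to_r(z - crit * se), fisher_z_to_r(z + crit * se)


lower, upper = correlation_ci(0.5, 50)
print(round(lower, 3), round(upper, 3))
```

Because the interval is symmetric on the z scale and then transformed back, it is asymmetric around r whenever r is not zero, and it can never extend beyond -1 or 1.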
Ten data sets are included. Among them are:
25 personality items thought to represent five factors of personality,
14 multiple choice iq items,
data on self reported test scores by age and gender,
Galton's data set of the heights of parents and their children,
a recreation of the original Galton data set of the genetics of sweet peas,
further Galton data,
the Guilford preference matrix of vegetables,
airline miles between 11 US cities (demo data for multidimensional scaling).
License: GPL version 2 or newer
A package for personality, psychometric, and psychological research.
Useful data entry and descriptive statistics
shortcut for reading from the clipboard
shortcut for reading comma delimited files from clipboard
shortcut for reading lower triangular matrices from the clipboard
shortcut for reading upper triangular matrices from the clipboard
Basic descriptive statistics useful for psychometrics
Find summary statistics by groups
combines the head and tail functions for showing data sets
SPLOM and correlations for a data matrix
Correlations, sample sizes, and p values for a data matrix
graphically show the size of correlations in a correlation matrix
Histograms and densities of multiple variables arranged in matrix form
Calculate skew for a vector, each column of a matrix, or data.frame
Calculate kurtosis for a vector, each column of a matrix or dataframe
Find the geometric mean of a vector or columns of a data.frame
Find the harmonic mean of a vector or columns of a data.frame
Plot means and error bars
Plot means and error bars for separate groups
Two way error bars
Find the interpolated median, quartiles, or general quantiles.
Rescale data to specified mean and standard deviation
Convert a two dimensional table of counts to a matrix or data frame
Data reduction through cluster and factor analysis
Combined function for principal axis, minimum residual, weighted least squares, and maximum likelihood factor analysis
Do a principal axis factor analysis (deprecated)
Do a minimum residual factor analysis (deprecated)
Do a weighted least squares factor analysis (deprecated)
Show the results of a factor analysis or principal components analysis graphically
Show the results of a factor analysis without using Rgraphviz
Sort a factor or principal components output
Apply the Dwyer extension for factor loadings
Do an eigen value decomposition to find the principal components of a matrix
Scree test and Parallel analysis
Scree test and Parallel analysis for polychoric matrices
Estimate factor scores given a data matrix and factor loadings
8 different measures of reliability (6 from Guttman (1945))
Apply factor analysis to dichotomous items to get IRT parameters
Apply the ICLUST algorithm
Graph the output from ICLUST using the dot language
Graph the output from ICLUST using rgraphviz
Apply kaiser normalization before rotating
Find the polychoric correlations for items and find item thresholds (uses J. Fox's polycor)
Find the polychoric correlations for items (uses J. Fox's hetcor)
Calculate the omega estimate of factor saturation (requires the GPArotation package)
Draw a hierarchical or Schmid Leiman orthogonalized solution (uses Rgraphviz)
Partial variables from a correlation matrix
Predict factor/component scores for new data
Apply the Schmid Leiman transformation to a correlation matrix
Combine items into multiple scales and find alpha
Combine items into multiple scales and find alpha and basic scale statistics
Find Cohen's set correlation between two sets of variables
Find the Squared Multiple Correlation (used for initial communality estimates)
Find tetrachoric correlations and item thresholds
Find polyserial and biserial correlations for item validity studies
Form a correlation matrix from continuous, polytomous, and dichotomous items
Apply the Very Simple Structure criterion to determine the appropriate number of factors.
Do a parallel analysis to determine the number of factors for a random matrix
Plot VSS output
Show the scree plot of the factor/principal components
Apply the Velicer Minimum Average Partial criterion for number of factors
Functions for reliability analysis (some are listed above as well).
Find coefficient alpha and Guttman Lambda 6 for a scale
8 different measures of reliability (6 from Guttman (1945))
Calculate the omega estimates of reliability (requires the GPArotation package)
Calculate the omega estimates of reliability using a confirmatory model (requires the sem package)
Intraclass correlation coefficients
Combine items into multiple scales and find alpha
The greatest lower bound found by an algebraic solution (requires Rcsdp). Written by Andreas Moeltner.
Procedures particularly useful for Synthetic Aperture Personality Assessment
Find coefficient alpha and Guttman Lambda 6 for a scale
Create the keys file for score.items or cluster.cor
Correct a correlation matrix for unreliability
Count the number of complete cases when doing pair wise correlations
Find correlations of composite variables from a larger matrix
Find correlations of items with composite variables from a larger matrix
Find the loadings when doing an eigen value decomposition
Do a minimal residual or principal axis factor analysis and estimate factor scores
Extend a factor analysis to a set of new variables
Do a Principal Axis factor analysis and estimate factor scores
extract cluster definitions from factor loadings
Factor congruence coefficient
How well does a factor model fit a correlation matrix
Reproduce a correlation matrix based upon the factor model
Fit = data - model
"Hand rotate" factors
8 different measures of reliability
standardized multiple regression from raw or correlation matrix input
polyserial and biserial correlations with massive missing data
Find tetrachoric correlations and item thresholds
Functions for generating simulated data sets
The basic simulation functions
Generate 3 independent variables and 1 or more dependent variables for demonstrating ANOVA and lm designs
Generate a two dimensional circumplex item structure
Generate a two dimensional simple structure with particular item characteristics
Generate a one factor congeneric reliability structure
Simulate nfact major and nvar/2 minor factors
Generate a multifactorial structural model
Generate data for a 1, 2, 3 or 4 parameter logistic model
Generate simulated data for the factor model
Create artificial data matrices for teaching purposes
Generate simulated correlation matrices with hierarchical or any structure
Graphical functions (require Rgraphviz) & deprecated
Draw a sem or regression graph
Draw the factor structure from a factor or principal components analysis
Draw the factor structure from an omega analysis (either with or without the Schmid Leiman transformation)
Draw the tree diagram from ICLUST
Graphical functions that do not require Rgraphviz
A general set of diagram functions.
Draw a sem or regression graph
Draw the factor structure from a factor or principal components analysis
Draw the factor structure from an omega analysis (either with or without the Schmid Leiman transformation)
Draw the tree diagram from ICLUST
A call to plot various types of output (e.g. from irt.fa, fa, omega, iclust)
A heat map display of correlations
Spider and radar plots (circular displays of correlations)
Circular statistics (for circadian data analysis)
Find the correlation with e.g., mood and time of day
Correlate a circular value with a linear value
Find the circular mean of each column of a data set
Find the best fitting phase angle for a circular data set
Miscellaneous functions
Convert base rate and comorbidity to phi, Yule and tetrachoric
Convert a data.frame or matrix to a LaTeX table
Convert categorical data to dummy codes
Apply the Fisher r to z transform
Apply the Fisher z to r transform
Intraclass correlation coefficients
Test for equality of two matrices (see also cortest.normal, cortest.jennrich)
Test whether a matrix is an identity matrix
Test for the difference of two paired or two independent correlations
Confidence intervals for correlation coefficients
Test of significance of r, differences between rs.
The probability of replication given a p, r, t, or F
Find the phi coefficient of correlation from a 2 x 2 table
Demonstrate the problem of phi coefficients with varying cut points
Given a phi coefficient, what is the polychoric correlation
Given a phi coefficient, what is the polychoric correlation (works on matrices)
Convert 2 dimensional factor loadings to polar coordinates.
Use John Fox's hetcor to create a matrix of correlations from a data.frame or matrix of integer values
Use John Fox's polycor to create a matrix of polychoric correlations from a matrix of Yule correlations
Compares alternative scaling solutions and gives goodness of fits
Basic data cleaning
Finds tetrachoric correlations
Thurstone Case V scaling
Find the trace of a square matrix
weighted and unweighted versions of Cohen's kappa
Find the Yule Q coefficient of correlation
What is the two by two table that produces a Yule Q with set marginals?
What is the phi coefficient corresponding to a Yule Q with set marginals?
Convert a matrix of Yule coefficients to a matrix of phi coefficients.
Convert a matrix of Yule coefficients to a matrix of polychoric coefficients.
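Several of the entries above concern the phi and Yule Q coefficients of a 2 x 2 table. The sketch below computes both from cell counts with the standard textbook formulas in plain Python; it is not the psych package's implementation, and the function names and the example counts are hypothetical.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for the 2 x 2 table [[a, b], [c, d]] of cell counts."""
    numerator = a * d - b * c
    # Product of the four marginal totals (two rows, two columns).
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


def yule_q(a, b, c, d):
    """Yule's Q for the same table: (ad - bc) / (ad + bc)."""
    return (a * d - b * c) / (a * d + b * c)


# Hypothetical counts: rows are one dichotomy, columns the other.
table = (40, 10, 20, 30)
phi = phi_coefficient(*table)
q = yule_q(*table)
print(round(phi, 4), round(q, 4))
```

Both statistics share the numerator ad - bc, but phi depends on the marginal totals while Q does not, which is why the two can differ markedly when the cut points are uneven.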
Functions that are under development and not recommended for casual use
IRT estimate of item difficulty with assumption that theta = 0
Item Response Theory estimates of theta (ability) using a Rasch like model
Data sets included in the psych package
represents 25 personality items thought to represent five factors of personality
8 different data sets with a bifactor structure
The airline distances between 11 cities (used to demonstrate MDS)
13 personality scales
14 multiple choice iq items
75 mood items
Self reported ACT and SAT Verbal and Quantitative scores by age and gender
Correlation matrix from Tucker
Galton's data set of the heights of parents and their children
Galton's data set of the relationship between height and forearm (cubit) length
Galton's data table of height and forearm length
Galton's data set of the diameters of 700 parent and offspring sweet peas
Guilford's preference matrix of vegetables (used for thurstone)
A debugging function that may also be used as a demonstration of psych.
Run a test of the major functions on 5 different data sets.
Primarily for development purposes, although the output can be used as a demo of the various functions.
Development versions (source code) of this package are maintained at the repository along with further documentation.
Specify that you are downloading a source package.
Some functions require other packages. Specifically, omega and schmid require the GPArotation package, and poly.mat, phi2poly and polychor.matrix require John Fox's polycor package. ICLUST.rgraph and fa.graph require Rgraphviz but have alternatives using the diagram functions.
GPArotation
William Revelle
Department of Psychology
Northwestern University
Evanston, Illinois
Maintainer: William Revelle <revelle@northwestern.edu>
References
A general guide to personality theory and research may be found at the personality-project. See also the short guide to R.
In addition, see
Revelle, W. (in preparation) An Introduction to Psychometric Theory with applications in R. Springer.
#See the separate man pages
test.psych()
[Package psych version 1.2.4]
