The purpose of this post is to show you how to express this model in OpenMx.
The data being modeled are in an object called ratings (see below for the code to create this object, as described here).
Here’s Joe’s example structural equation model, as implemented in lavaan:
 require(lavaan)
 bifactor <- "
 general.factor =~ Easy_Reservation + Preferred_Seats +
                   Flight_Options + Ticket_Prices +
                   Seat_Comfort + Seat_Roominess +
                   Overhead_Storage + Clean_Aircraft +
                   Courtesy + Friendliness + Helpfulness + Service

 ticketing =~ Easy_Reservation + Preferred_Seats + Flight_Options + Ticket_Prices
 aircraft  =~ Seat_Comfort + Seat_Roominess + Overhead_Storage + Clean_Aircraft
 service   =~ Courtesy + Friendliness + Helpfulness + Service"

 modelfit <- cfa(bifactor, data = ratings[, 1:12], orthogonal = TRUE)
 summary(modelfit, fit.measures = TRUE, standardize = TRUE)

 bifactor <- "
 general =~ Easy_Reservation + Preferred_Seats 
  + Flight_Options + Ticket_Prices
 + Seat_Comfort + Seat_Roominess + Overhead_Storage 
 + Clean_Aircraft + Courtesy + Friendliness
 + Helpfulness + Service
 ticketing =~ Easy_Reservation + Preferred_Seats + Flight_Options + Ticket_Prices
 aircraft  =~ Seat_Comfort + Seat_Roominess + Overhead_Storage + Clean_Aircraft
 service   =~ Courtesy + Friendliness + Helpfulness + Service
 Satisfaction ~ general + ticketing + aircraft + service
 "
 satfit <- sem(bifactor, data=ratings[,c(1:12,13)], orthogonal=T)
 summary(satfit, fit.measures = T, standardize = T)
 inspect(satfit, "rsquare")
And now, the same model in OpenMx:
 library(OpenMx)
 # make up some helpful lists of the items we will need
 serviceVars     = c("Courtesy", "Friendliness", "Helpfulness", "Service")
 aircraftVars    = c("Seat_Comfort", "Seat_Roominess", "Overhead_Storage", "Clean_Aircraft")
 ticketingVars   = c("Easy_Reservation", "Preferred_Seats", "Flight_Options", "Ticket_Prices")
 allMeasuredVars = c(serviceVars, aircraftVars, ticketingVars)
 allLatentVars   = c("general_factor", "ticketing", "aircraft", "service")

 bifactor <- mxModel("bifactor", type = "RAM",
     manifestVars = allMeasuredVars,
     latentVars   = allLatentVars,
     mxPath(from = allLatentVars  , arrows = 2, free = FALSE, values = 1), # latent variances fixed @ 1
     mxPath(from = allMeasuredVars, arrows = 2), # manifest residuals
     # Factor loadings
     mxPath(from = "general_factor", to = allMeasuredVars),
     mxPath(from = "ticketing"     , to = ticketingVars),
     mxPath(from = "aircraft"      , to = aircraftVars),
     mxPath(from = "service"       , to = serviceVars),
     mxData(cov(ratings[, allMeasuredVars]), type = "cov", numObs = nrow(ratings))
 )
 bifactor = mxRun(bifactor)
 tmp = summary(bifactor)
If you print the summary (just type tmp, since we stored it there), you will see a full summary, with standardized and unstandardized paths, standard errors, and a list of fit indices.
Here are the fit indices, formatted nicely: χ²(42) = 48.57, p = 0.2253; CFI = 0.999; TLI = 0.999; RMSEA = 0.013.
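Rather than retyping those numbers, you can pull them straight from the stored summary. A minimal sketch (the slot names Chi, ChiDoF, p, CFI, TLI, and RMSEA are taken from OpenMx's summary list; check names(tmp) on your version):
 # Pull the fit indices from the stored summary object
 with(tmp, sprintf("chi-sq(%.0f) = %.2f, p = %.4f; CFI = %.3f; TLI = %.3f; RMSEA = %.3f",
                   ChiDoF, Chi, p, CFI, TLI, RMSEA))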
Because this is R, we can also format these nicely. Let’s pull out the standardized paths and round them off:
pathLoadings = tmp$parameters[, c("row", "col", "Std.Estimate")]
pathLoadings$Std.Estimate = round(pathLoadings$Std.Estimate, 2)
pathLoadings


                row              col Std.Estimate
1          Courtesy   general_factor         0.78
2      Friendliness   general_factor         0.76
3       Helpfulness   general_factor         0.84
4           Service   general_factor         0.81
5      Seat_Comfort   general_factor         0.79
6    Seat_Roominess   general_factor         0.72
7  Overhead_Storage   general_factor         0.70
8    Clean_Aircraft   general_factor         0.74
9  Easy_Reservation   general_factor         0.75
10  Preferred_Seats   general_factor         0.70
11   Flight_Options   general_factor         0.62
12    Ticket_Prices   general_factor         0.64
13 Easy_Reservation        ticketing         0.31
14  Preferred_Seats        ticketing         0.40
15   Flight_Options        ticketing         0.39
16    Ticket_Prices        ticketing         0.44
17     Seat_Comfort         aircraft         0.30
18   Seat_Roominess         aircraft         0.37
19 Overhead_Storage         aircraft         0.44
20   Clean_Aircraft         aircraft         0.34
21         Courtesy          service         0.27
22     Friendliness          service         0.44
23      Helpfulness          service         0.27
24          Service          service         0.34
25         Courtesy         Courtesy         0.33
26     Friendliness     Friendliness         0.23
27      Helpfulness      Helpfulness         0.23
28          Service          Service         0.23
29     Seat_Comfort     Seat_Comfort         0.28
30   Seat_Roominess   Seat_Roominess         0.34
31 Overhead_Storage Overhead_Storage         0.32
32   Clean_Aircraft   Clean_Aircraft         0.34
33 Easy_Reservation Easy_Reservation         0.34
34  Preferred_Seats  Preferred_Seats         0.36
35   Flight_Options   Flight_Options         0.47
36    Ticket_Prices    Ticket_Prices         0.39
In my next post, I’ll show you how to standardize a model and then format the output more like you might expect for an EFA.
’til then, happy OpenMx-ing! tim
PS: If you want to learn more about fit indices, I recommend this Kenny link.

You probably want a figure as well, no? I describe doing that here.

R Code to Generate the Simulated Data and Run All Analyses

To generate the ratings data, execute the code below:
# First, we create a matrix of factor loadings.
# This pattern is called bifactor because it has a
# general factor with loadings from all the items
# and specific factors for separate components.
# The outcome variables are also formed as
# combinations of these general and specific factors.

loadings <- matrix(c(
 .33, .58, .00, .00,  # Ease of Making Reservation
 .35, .55, .00, .00,  # Availability of Preferred Seats
 .30, .52, .00, .00,  # Variety of Flight Options
 .40, .50, .00, .00,  # Ticket Prices
 .50, .00, .55, .00,  # Seat Comfort
 .41, .00, .51, .00,  # Roominess of Seat Area
 .45, .00, .57, .00,  # Availability of Overhead Storage
 .32, .00, .54, .00,  # Cleanliness of Aircraft
 .35, .00, .00, .50,  # Courtesy
 .38, .00, .00, .57,  # Friendliness
 .60, .00, .00, .50,  # Helpfulness
 .52, .00, .00, .58,  # Service
 .43, .10, .20, .30,  # Overall Satisfaction
 .35, .50, .40, .20,  # Purchase Intention
 .25, .50, .50, .00), # Willingness to Recommend
 nrow = 15, ncol = 4, byrow = TRUE
)

# Matrix multiplication produces the correlation matrix,
# except for the diagonal.
cor_matrix <- loadings %*% t(loadings)
# Diagonal set to ones.
diag(cor_matrix) <- 1
# cor_matrix = cov2cor(cor_matrix)
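# Optional check (an addition, not in the original post): the loading-based
# matrix with a forced unit diagonal should be positive definite, or rmvnorm
# below will be unable to sample from it.
stopifnot(all(eigen(cor_matrix, only.values = TRUE)$values > 0))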
require(mvtnorm)
N = 1000
set.seed(7654321) # needed to reproduce the same data each time
std_ratings <- as.data.frame(rmvnorm(N, sigma = cor_matrix))

# Creates a mixture of two data sets:
# the first 50 observations get uniformly lower scores.
ratings           <- data.frame(matrix(0, nrow = 1000, ncol = 15))
ratings[1:50,]    <- std_ratings[1:50,] * 2
ratings[51:1000,] <- std_ratings[51:1000,] * 2 + 7.0

# Ratings given different means
ratings[1] <- ratings[1]+2.2
ratings[2] <- ratings[2]+0.6
ratings[3] <- ratings[3]+0.3
ratings[4] <- ratings[4]+0.0
ratings[5] <- ratings[5]+1.5
ratings[6] <- ratings[6]+1.0
ratings[7] <- ratings[7]+0.5
ratings[8] <- ratings[8]+1.5
ratings[9] <- ratings[9]+2.4
ratings[10]<- ratings[10]+2.2
ratings[11]<- ratings[11]+2.1
ratings[12]<- ratings[12]+2.0
ratings[13]<- ratings[13]+1.5
ratings[14]<- ratings[14]+1.0
ratings[15]<- ratings[15]+0.5

# Truncates Scale to be between 1 and 9
ratings[ratings>9]<-9
ratings[ratings<1]<-1
# Rounds to single digit.
ratings<-round(ratings,0)

# Assigns names to the variables in the data frame called ratings
names(ratings) = c(
  "Easy_Reservation", "Preferred_Seats", "Flight_Options", "Ticket_Prices",
  "Seat_Comfort", "Seat_Roominess", "Overhead_Storage", "Clean_Aircraft",
  "Courtesy", "Friendliness", "Helpfulness", "Service",
  "Satisfaction", "Fly_Again", "Recommend")
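A couple of optional sanity checks (my addition, not part of the original script) to confirm the simulation behaved:
# Optional sanity checks
str(ratings)    # 1000 obs. of 15 numeric variables
range(ratings)  # all values should lie between 1 and 9 after truncation and rounding
round(cor(ratings[, 1:4]), 2) # the four ticketing items should intercorrelate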
Now you have an object called ratings to work with.

  1. The power of science is in part the discipline of referencing claims: we make a promise to users as to the veracity of our statements, and give them a pointer to the evidence.

    Much of our communication is via Word (the app, not the spoken word) and PowerPoint. Solutions have emerged for Word which allow embedding references, for instance from Endnote. Sadly, this doesn't work cross-platform with Mac Pages, and Word has invented its own referencing system, which is not open. For Word, my big wish would be that MS build in an open, cross-platform plug-in architecture for any app that can read and write .docx files. But at least we have support for telling people what facts support the argument we're making.

    Not so PowerPoint, nor its competitor Keynote. These two applications, despite encouraging people to make bullet-point fact-claims, lack referencing capability entirely! And yet these are used to persuade: exactly where audiences need references, and exactly where we should be training students (and the world) to demand references. It's shocking to see a lecturer's slides blithely claim some fact about the mind or society and give no reference. Or the Cheshire cat of references: a citation, but no bibliography to turn that code into a look-up-able reference.

    Just look at how Wikipedia has improved since it required verifiability. All our office tools should support citations, bibliographies, and DOI-style links.

    Given Apple's university strength, it's particularly depressing and inexplicable that Keynote is hopeless at referencing. What's missing is something like what Endnote does in Word, but for PowerPoint-format files.

    So: imagine a new program that reads and writes Open XML .pptx files, just as current tools do.

    However, as an editor, it would be designed from the ground up to generate referenced statements. It should make it trivial to add references: automatically using context to help you search for online references, Wikipedia disambiguation, PubMed, and DOIs.

    Working with simple identifiers like DOIs, it would support formatting these inline, trivially formatted into recognised styles. It should automatically maintain a bibliography in footnotes and/or on a bibliography slide and handout pages, with clickable links for readers verifying claims online.

    It should allow you to edit slide text as markdown, so the information is exportable as readable text.

    Pow! would support animations but, unlike Keynote, would export these outside the Mac environment in formats like dot, SVG, and perhaps D3. It should be tightly linked to R.

    It's cool, so it should use the cool file extension ".pow!". Perhaps the new app is called "Pow!", reading and writing .pptx. I think it would sell quite a few copies.






  2. Possibly great news from http://highlight.r-enthusiasts.com regarding a practical highlighting solution for R in Blogger… Seems you can't switch to compose mode though...?
    # example code from ?iris
    dni3 <- dimnames(iris3)
    ii <- data.frame(
        matrix(aperm(iris3, c(1, 3, 2)), ncol = 4,
               dimnames = list(NULL, sub(" L.", ".Length", sub(" W.", ".Width", dni3[[2]])))),
        Species = gl(3, 50, labels = sub("S", "s", sub("V", "v", dni3[[3]])))
    )
    all.equal(ii, iris) # TRUE
    


  3. In structural equation modelling, we are typically proposing theoretical causes of observed phenomena. These are termed "latent" (the unobserved causes) and "manifest" (the observed variables we measure, otherwise known as data).

    Importantly, the theoretical causes of behavior need not have a structure remotely resembling the correlations observed in the data. You might have hundreds of columns of correlated measures, and they might be modelled well by a single latent trait. You might have 30 measured traits, but make the testable prediction that they are best explained by just five uncorrelated latent causes.

    The case for this posting is a bit unusual: can two observed variables be predicted to share the same latent causes (i.e., the causes of one observed variable also cause the other) and yet show zero correlation between them at the observed level?

    This recently came up in a review of some work we've been doing, and the answer is "yes, you can". I thought I'd write down how, using R and some simulations to make this more concrete.

    In our example, we theorised that religiosity has its roots in two more general, biological systems: one subserving community integration, and one subserving existential uncertainty. We found support for this model, but it led to an apparently paradoxical conclusion: the observed manifestation of one of our purported (partial) causes of religiosity, namely existential uncertainty, showed almost no relationship to religiosity. How could these (or any two) measures be completely independent if they share a common cause? The answer lies in countervailing effects: each manifest is the sum of its influences, and, under not-so-uncommon-as-you-might-think circumstances, these can cancel out.


    Let's consider the example of three measures of vehicles (a very R-friendly example, given the ubiquity of mtcars :-): we measure horsepower, fuel efficiency, and cost.

    Let's also posit a theoretical model: first, that the more cylinders a car has, the more horsepower it can generate, the worse its mpg will be, and the more it will cost to build; second, that streamlining increases fuel efficiency, but also increases the cost to build, which is reflected in the cost to buy.

    Assume you've gone and collected a large representative set of data, called myCovData.

    In OpenMx, we can build this model as:

    library("OpenMx")


    # I use some helper functions: thanks to Hans for the code to readily import
    # them from GitHub, by writing a source function that handles https...
    # why doesn't R do this out of the box?

    url <- "https://raw.github.com/tbates/umx/master/umx.lib.r"

    source_https <- function(u, unlink.tmp.certs = FALSE) {
        # read script lines from a website using a security certificate
        require(RCurl)
        if (!file.exists("cacert.pem")) {
            download.file(url = "http://curl.haxx.se/ca/cacert.pem", destfile = "cacert.pem")
        }
        script <- RCurl::getURL(u, followlocation = TRUE, cainfo = "cacert.pem")
        if (unlink.tmp.certs) unlink("cacert.pem")
        # parse lines and evaluate in the global environment
        eval(parse(text = script), envir = .GlobalEnv)
    }
    source_https(url) # unlink.tmp.certs = TRUE deletes the certificate file source_https downloads

    manifests = c("HP", "MPG", "COST")


    latents   = c("Cylinders", "Streamlining")

    m1 <- color="#c2c77b" font="" mxmodel="">"m1", type="RAM",
        manifestVars = manifests,
        latentVars   = latents,
        # Factor loadings
        mxPath(from = "Cylinders"   , to = manifests),
        mxPath(from = "Streamlining", to = c("MPG","COST")),
        mxPath(from = manifests, arrows = 2), # manifest residuals 
        mxPath(from = latents, arrows = 2, free = F, values = 1), #fix latents@1
        mxData(myCovData, type = "cov", numObs = 1000)
    )

    m1 = mxRun(m1)

    summary(m1)

    You can visualise this model (here shown with some estimated parameters):


    umxGraph_RAM(m1, std = TRUE, precision = 2, dotFilename = "name") # helper function


    Three manifest (measured) traits of vehicles, modelled as resulting from two (unmeasured) latent traits: The number of cylinders in the engine, and the aerodynamic "slipperiness" of the body. Despite Mpg and Cost sharing the same causes, the manifest correlation between them is zero.

    And there's the conundrum: we can set the correlation between Mpg and Cost to zero without loss of fit. But one can readily see how this emerges in the vehicle example: big engines are expensive and lower fuel efficiency (opposite effects on the two variables we are most interested in), while streamlining induces a positive correlation between our two measures. So, one factor is inducing a positive relationship between Mpg and Cost, and a second is inducing an (in this case equal) negative correlation. The net effects can vary widely, and a zero observed correlation tells us nothing about the presence or absence of shared mechanisms.
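    We can check this arithmetic directly. Using the path weights from the simulation below, the model-implied covariance of MPG and COST is the sum, over the shared causes, of the products of their loadings, and here the two terms cancel exactly:

    # Model-implied covariance of MPG and COST, using the path weights from the
    # simulation below (cyl and drag are independent, each with variance 1):
    #   cov(mpg, cost) = (-.2 * .2) * var(cyl) + (.2 * .2) * var(drag)
    (-.2 * .2) * 1 + (.2 * .2) * 1 # = -0.04 + 0.04 = 0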

    We can approach the problem from the other end: generating the observed data from the theory, instead of testing the theory with an SEM.

    Here we make 1,000 runs of a simulation in which we generate the latent causes, construct a manifest world of 5,000 observations of our three manifests from those latent causes, and examine, empirically, the correlation we get in each of those 1,000 simulations:

    # Uncorrelated manifest variables which share the same latent causes
    sim = 1000; r = rep(NA, sim)
    for (i in 1:sim) {
        n = 5000
        cyl  = rnorm(n = n)
        drag = rnorm(n = n)
        hp   = ( .3 * cyl) + .7 * rnorm(n = n)
        mpg  = (-.2 * cyl) + (.2 * drag) + .6 * rnorm(n = n)
        cost = ( .2 * cyl) + (.2 * drag) + .6 * rnorm(n = n)
        r[i] = cor(mpg, cost)
    }
    myCovData = cov(data.frame(HP = hp, MPG = mpg, COST = cost))
    hist(r, breaks = 40)
    text(.02, 50, paste("mean r =", prettyNum(mean(r), digits = 2)), cex = .8)


    Graphically you can see the observed correlations cluster close to zero:


    Correlation (r) of Miles per Gallon (mpg) and Cost of a Car.



    Of course "cylinders" is a terrible "theory" of horsepower. What we really need is a mechanism: Cylinders increase horsepower only because horsepower is proportional to quantity of fuel burned  at a given efficiency, and one way to increase the quantity of fuel burned is to increase effective cubic capacity, either with a larger bore in each cylinder, more cylinders, or increased density of the fuel mixture (turbo and super charging). You can't just put "cylinders" in the boot of a car and go faster. Same for the other paths...



  4. A Cholesky or lower-triangular decomposition is often a great way to encourage (though not guarantee) a positive semi-definite solution in many linear algebra problems. In OpenMx, we have several types of mxMatrix: Full, Standardized, etc., and the Lower matrix, which is just suited to this purpose. In stock R, matrices are all Full.

    This lower-triangle Cholesky led a colleague to pull the results from a trivariate Cholesky, which were these lower-triangle estimates:

         0.71
        -0.28    0.61
         0.15   -0.38    1e-7

    They then filled in the above-diagonal cells from the lower triangle (makes sense, right?), and applied the pre- and post-multiply trick to get a correlation matrix corresponding to the Cholesky, yielding the rather improbable matrix:

           1      -0.43    178.02
          -0.43    1      -486.54
         178.02  -486.54    1




    Why doesn't this work? If a Cholesky decomposition is just another way of solving the problem at hand, we must be able to get the correlations back. So how?

    The answer is that in the Cholesky decomposition, we have decomposed the relationships of the variables into a lower triangle of factor loadings. To see the variances and covariances again, we need to re-compose that matrix into a variance-covariance matrix.

    So filling in the upper triangle is not our solution; instead, we need to multiply the lower matrix by its transpose.

    The worked R lives in a gist (formatted nicely there); the recomposition is sketched below.
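    Here is a minimal sketch of the recomposition, using the lower-triangle estimates quoted above (my reconstruction, not necessarily the gist's exact code):

    # Re-compose the lower-triangular Cholesky into a covariance matrix,
    # then standardize it
    lower <- matrix(c(
         0.71,  0.00, 0.00,
        -0.28,  0.61, 0.00,
         0.15, -0.38, 1e-7), nrow = 3, byrow = TRUE)
    covMatrix <- lower %*% t(lower) # the lower matrix times its transpose
    cov2cor(covMatrix)              # sweep the covariances into correlations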

    Bazinga: our variables are genetically correlated... Now, the Gödel-hard part: a theory for why...


  5. Often in SEM scripts you will see matrices being pre- and post-multiplied by some other matrix. For instance, this figures in scripts computing the genetic correlation between variables. How does pre- and post-multiplying a variance/covariance matrix give us a correlation matrix? And what is it that we are multiplying this matrix by?
    In general, a covariance matrix can be converted to a correlation matrix by pre- and post-multiplying it by a diagonal matrix with 1/SD for each variable on the diagonal.
    In R, matrix inversion (usually signified by A⁻¹) is done using the solve() function.
    For a diagonal matrix, the inverse is simply 1/x in each diagonal cell.

    Example with variance matrix A

     A = matrix(nrow = 3, byrow = TRUE, c(
        1, 0, 0,
        0, 2, 0,
        0, 0, 3)
     )

     solve(A)

          [,1] [,2]  [,3]
     [1,]    1  0.0  0.00
     [2,]    0  0.5  0.00
     [3,]    0  0.0  0.33

    (Outputs in this post are shown rounded to two decimals.)
    
    A number times its inverse is 1. For matrices, solve(A) %*% A gives the identity matrix:
    solve(A) %*% A # = I, the identity matrix
         [,1] [,2] [,3]
    [1,]    1    0    0
    [2,]    0    1    0
    [3,]    0    0    1
    

    An example with values (covariances) in the off-diagonals

    A = matrix(nrow = 3, byrow = TRUE, c(
         1, .5, .9,
        .5,  2, .4,
        .9, .4,  4)
    )

    I = matrix(nrow = 3, byrow = TRUE, c(
        1, 0, 0,
        0, 1, 0,
        0, 0, 1)
    )
    
    varianceA = I * A # zero the off-diagonal (regular, NOT matrix multiplication)
    sdMatrix  = sqrt(varianceA) # element sqrt to get SDs on diagonal: SD=sqrt(var)
    invSD     = solve(sdMatrix) # 1/SD = inverse of sdMatrix
    
    invSD
         [,1] [,2] [,3]
    [1,]    1 0.00  0.0
    [2,]    0 0.71  0.0
    [3,]    0 0.00  0.5
    
    Any number times its inverse is 1, so this sweeps the covariances into correlations:
    corr = invSD %*% A %*% invSD # pre- and post-multiply by 1/SD
    corr
    
         [,1] [,2] [,3]
    [1,] 1.00 0.35 0.45
    [2,] 0.35 1.00 0.14
    [3,] 0.45 0.14 1.00
    

    Easy way of doing this in R

    Using diag() to grab the diagonal and build a new diagonal matrix, and capitalising on the fact that inv(X) = 1/x for a diagonal matrix:
    diag(1/sqrt(diag(A))) %&% A # %&% is OpenMx's shortcut for pre- and post-multiplication
         [,1] [,2] [,3]
    [1,] 1.00 0.35 0.45
    [2,] 0.35 1.00 0.14
    [3,] 0.45 0.14 1.00
    

    Even-easier built-in way

    cov2cor(A)
    
         [,1] [,2] [,3]
    [1,] 1.00 0.35 0.45
    [2,] 0.35 1.00 0.14
    [3,] 0.45 0.14 1.00
    Note: see also this follow-up post on getting a correlation matrix when you are starting with a lower-triangular Cholesky decomposition.


  6. In this post, I made an SEM model and showed the results in a table.
    It’s a great feature of SEM that you can sketch your ideas about how the world works, and being able to get such a sketch back out of OpenMx is very helpful.
    Importantly, a figure can help readers understand what you’ve done, and it is a great help to verify for yourself that you’ve drawn the model you think you have drawn.
    So… how do you get presentation-quality figures out of OpenMx?
    I’ll assume you’ve got a model already run, and that it’s called model.
    In out-of-the-box OpenMx, you can say
        omxGraphviz(model, dotFilename="cool.dot")

    As they say on 2-minute R: “Boom! You just made a figure!”
    Sure: but where is it? Well, it’s a text file called cool.dot sitting in the directory getwd() points to.
    To turn it into a graphic, we have to open it in an application that understands how to read .dot or “graphviz” syntax.
    First, download an application that can read .dot. On Windows, that includes Visio. On Mac, it includes OmniGraffle.
    An application I like, especially early on in developing a model, is Graphviz. Believe it or not, this is yet another amazing product of Bell Laboratories (as if S (and therefore R), C, Unix, cosmic background radiation, and the transistor, to name just a few, were not enough)!
    That done, this code should pop up the file in Graphviz for you (you might need to make Graphviz the default application for .dot, especially if you have Word on your system):
        system(paste("open ", getwd(), "/cool.dot", sep=""));
    
    The built-in function doesn’t open the file automatically and, importantly, doesn’t include path values or allow you to standardize them.
    I wrote a function which does these things.
    It also defaults to creating a file named after the model@name.
     umxGraph_RAM(model) # default call: opens the standardized model in your editor, with 2 decimal places of accuracy
     # some extra parameters
     umxGraph_RAM(model, std = TRUE, dotFilename = "name", pathLabels = "both", precision = 3)
     # set dotFilename to NA to get the graphviz syntax back instead
     umxGraph_RAM(model, std = TRUE, dotFilename = NA, pathLabels = "both", precision = 3)
    
    You can grab it here

  7. R-bloggers provides a great service, aggregating a universe of blogs which contribute aRticles on R and using R (marked with an "R" tag).

    This is a nice community service, creating a one-stop shop for readers to learn about R, but it is also a great idea for adding value to blogs (like this one) which likely publish only a few articles a month, at most, on R.



  8. I have a row of 1,000 coins, all heads down.

    I first turn over every second coin. Then every third coin.

    And so on and so forth: every fourth, every fifth, every sixth... all the way up until I simply turn over the 1,000th coin.

    Question: Which coins will now be heads up?



    My R code to solve this is here.
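    If you want to check an answer before clicking through, here is a minimal simulation of the process (my sketch; the linked code may differ):
        # Simulate the coin turning (FALSE = heads down, TRUE = heads up)
        coins <- rep(FALSE, 1000)
        for (step in 2:1000) {
            idx <- seq(step, 1000, by = step) # every step-th coin
            coins[idx] <- !coins[idx]         # turn those coins over
        }
        which(coins) # the coins now heads up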

  9. The process of getting more minds closer to the truth is fascinating to me, and it is one where technology can make a big difference.

    What we need to change is twofold: letting people find out what they are currently wrong about, and getting references into the information that people give out, so we can easily find out what is right and what is wrong (or simply unknown).

    Some of these things can be aided by extensions of existing ideas. For instance, extending freedom of association and freedom of speech to include uncensored access to the internet.

    Infrastructure, too, is effective at making information available: clearly the internet and its protocols are the equivalent of the printing press in terms of their ramifications.

    But innovations in content and creation are important also.

    Like inventing a structure within which people can collaboratively build an encyclopedia: Wikipedia will likely do more to make conversations sane and productive, and to drive self-education, than any other single innovation in human history. (Fixing education could have effects as big, but that is a distinct, if desperately pressing, matter, and one where human rules, rather than computers, are the answer.)

    Here, I’d just like to comment on a tiny innovation that I think would make the world a better place: blogging systems and referencing.

    The leading systems (Blogger, Tumblr, WordPress) all support better and better drag-and-drop interfaces for photos, video, tagging, etc. What they currently fail to aid is citation, hyperlinking, repeatable computing, and fact checking.

    Without citations, without cross-linking, without reputation-based criticism, and without open data, these blogs cannot bring us closer to the truth: instead, they will drive people further from it and create a cloud of distortion that makes any effort to approach the truth harder.

    Some concrete suggestions, from using Blogger (which is moving forward and can perhaps react to these suggestions best):
    1. One-click linking to Wikipedia and Wiktionary: auto-select the current word, and make it trivial to link to Wikipedia. If there are multiple pages, open a dialog to let me choose the relevant Wikipedia page.
    2. Auto-completion of URLs: if I select some text and open the link button, have Google search for viable links. If I start to type a link, autocomplete it from my history, from open tabs, and from the search engine's look-ahead.
    3. Smart identifiers: if I insert a PubMed URL, turn it into a citation, with a reference that links to Google Scholar, PubMed, and the PDF if possible.
    4. Reference sections: support a referencing system. Make it open source, and help the world coalesce around a universal reference-ID system for documents and authors.
    5. An open reputation (perhaps based around systems like Stack Overflow) and the ability to give credit (and thumbs-down) to posts, linking to this reputation.
    6. A comment system that acts to supplement and critique the references given in an article, and links articles to opposing points of view in ways that search engines can represent in visual and semantic terms.
