Boston University Libraries OpenBU

    Language modeling for personality prediction

    Date Issued: 2021
    Author(s): Cutler, Andrew
    Permanent Link: https://hdl.handle.net/2144/41942
    Abstract
    This dissertation addresses two broad questions. The first is a supervised learning problem: given text from an individual, how much can be said about their personality? The second is more fundamental: what personality structure is embedded in modern language models? To address the first question, three language models are used to predict many traits from Facebook statuses. Traits include gender, religion, politics, Big Five personality, sensational interests, impulsiveness, IQ, fair-mindedness, and self-disclosure. Linguistic Inquiry and Word Count (Pennebaker et al., 2015), the dominant model used in psychology, explains close to zero variance on many labels. Bag of Words performs well, and its model weights provide valuable insight into why predictions are made. Neural nets perform best by a wide margin on personality traits, especially when few training samples are available. A pretrained personality model is made available online that can explain 10% of the variance in a trait with as few as 400 samples, within the range of a typical psychology study; this makes it a good replacement for Linguistic Inquiry and Word Count in predictive settings. In psychology, personality structure is defined by dimensionality reduction of word vectors (Goldberg, 1993). To address the second question, factor analysis is performed on embeddings of personality words produced by the language model RoBERTa (Liu et al., 2019). This recovers two factors that resemble Digman's α and β (Digman, 1997) rather than the more popular Big Five. The structure is shown to be robust to the choice of context around an embedded word, language model, factorization method, word set, and language (English vs. Spanish). This is a flexible tool for exploring personality structure that can easily be applied to other languages.
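
    The interpretability of the bag-of-words result lends itself to a concrete illustration. The sketch below is a minimal stand-in, assuming a scikit-learn pipeline with toy statuses and toy trait labels rather than the dissertation's actual data, features, or hyperparameters: a tf-idf bag of words feeds a linear model whose per-word weights can be read off directly.

    # Hypothetical bag-of-words baseline for predicting a continuous trait
    # (e.g., a Big Five score) from short texts. All data and settings are
    # toy stand-ins, not the dissertation's.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline

    texts = ["loved the concert last night!!", "quiet day, stayed in and read"]
    scores = [0.8, -0.3]  # toy z-scored trait labels

    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),  # unigrams and bigrams, tf-idf weighted
        Ridge(alpha=1.0),                     # L2-regularized linear regression
    )
    model.fit(texts, scores)

    # The linear weights are directly interpretable, which is the kind of
    # insight into why a prediction is made that the abstract credits to
    # the bag-of-words model.
    vec = model.named_steps["tfidfvectorizer"]
    reg = model.named_steps["ridge"]
    top = sorted(zip(vec.get_feature_names_out(), reg.coef_), key=lambda t: -abs(t[1]))[:5]
    for word, weight in top:
        print(f"{word}: {weight:+.3f}")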
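
    The second analysis, factor analysis on RoBERTa embeddings of personality words, can also be sketched in a few lines. The word list, carrier sentence, mean pooling, and two-factor setting below are illustrative assumptions, not the dissertation's exact protocol.

    # Hypothetical sketch: embed personality adjectives with RoBERTa, then
    # factor-analyze the embeddings. Word list and context are assumptions.
    import torch
    from transformers import RobertaTokenizer, RobertaModel
    from sklearn.decomposition import FactorAnalysis

    words = ["kind", "anxious", "curious", "organized", "outgoing", "honest"]

    tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
    model = RobertaModel.from_pretrained("roberta-base")
    model.eval()

    embeddings = []
    with torch.no_grad():
        for w in words:
            # Embed each adjective in a simple carrier sentence, one of many
            # context choices the abstract reports the result is robust to.
            inputs = tokenizer(f"They tend to be {w}.", return_tensors="pt")
            hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
            embeddings.append(hidden.mean(dim=0).numpy())  # mean-pool over tokens

    # Two factors, echoing the Digman alpha/beta structure the abstract reports.
    fa = FactorAnalysis(n_components=2, random_state=0)
    factor_scores = fa.fit_transform(embeddings)  # per-word scores on each factor
    for w, (f1, f2) in zip(words, factor_scores):
        print(f"{w:>10s}  factor1={f1:+.2f}  factor2={f2:+.2f}")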
    Collections
    • Boston University Theses & Dissertations

