A man of leisure living in the present, waiting for the future.

  • 0 Posts
  • 52 Comments
Joined 1 year ago
Cake day: June 14th, 2023

  • Thanks for citing specifics, but I’m still not seeing what you are claiming there. This paper seems to be about the limits of accurately classifying true and false statements in LLMs, and it shows that there is a linear pattern in the underlying classification via multidimensional analysis. That seems unsurprising, since the way LLMs work is essentially to take a probabilistic walk through an array of every possible next word or token, based on a multidimensional analysis of the patterns around each one.
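
    To make that concrete, here is a toy sketch of the “probabilistic walk” I mean. The vocabulary, scores, and temperature below are made up purely for illustration, not taken from any real model:

```python
import numpy as np

# Toy illustration: the model assigns a score (logit) to every token in its
# vocabulary, turns the scores into probabilities, and samples one of them.
vocab = ["Paris", "London", "Beijing", "banana"]
logits = np.array([3.1, 1.2, 0.4, -2.0])  # made-up scores for the next token

def sample_next(logits, temperature=1.0, rng=np.random.default_rng(0)):
    # Softmax over the scaled logits gives each candidate token's probability.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

print(vocab[sample_next(logits)])  # usually "Paris", but not always
```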

    Their conclusions, from the paper (btw, arXiv is not peer-reviewed):

    In this work we conduct a detailed investigation of the structure of LLM representations of truth. Drawing on simple visualizations, correlational evidence, and causal evidence, we find strong reason to believe that there is a “truth direction” in LLM representations. We also introduce mass-mean probing, a simple alternative to other linear probing techniques which better identifies truth directions from true/false datasets.
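
    As far as I can tell, “mass-mean probing” boils down to something like the sketch below: take the difference between the mean activation of true statements and the mean activation of false statements, and call that the “truth direction.” The array shapes and random data here are invented for illustration; the real paper works on actual layer activations:

```python
import numpy as np

# Rough sketch of mass-mean probing as I read it: the "truth direction" is the
# difference between the mean activation of true statements and the mean
# activation of false statements. All data here is synthetic, for illustration.
rng = np.random.default_rng(0)
X_true = rng.normal(loc=0.5, size=(100, 5120))    # activations for true statements
X_false = rng.normal(loc=-0.5, size=(100, 5120))  # activations for false statements

theta = X_true.mean(axis=0) - X_false.mean(axis=0)  # candidate "truth direction"

def predict_true(x, threshold=0.0):
    # Project a new activation onto the direction; above the threshold => "true".
    return x @ theta > threshold

print(predict_true(X_true[0]), predict_true(X_false[0]))
```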

    Nothing about symbolic understanding; it just shows that there is a linear pattern to statements labeled as true vs. false when graphed in a specific way.

    From the associated data explorer:

    These representations live in a 5120-dimensional space, far too high-dimensional for us to picture, so we use PCA to select the two directions of greatest variation for the data. This allows us to produce 2-dimensional pictures of 5120-dimensional data.
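
    Roughly, that projection looks like the following; random data stands in for the real activations here, just to show the shapes involved:

```python
import numpy as np
from sklearn.decomposition import PCA

# What the 2-D pictures amount to, roughly: stack the 5120-dimensional
# activations (one row per statement), find the two directions of greatest
# variance, and project onto them. The data here is random noise.
rng = np.random.default_rng(0)
activations = rng.normal(size=(200, 5120))  # one row per statement

coords_2d = PCA(n_components=2).fit_transform(activations)
print(coords_2d.shape)  # (200, 2) -> ready to scatter-plot as X/Y
```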

    So they take the two dimensions that differ the most and chart those on X/Y, showing there are linear patterns to the differences between statements classified as “true” and “false.” Because this is multidimensional, and because it’s the AI finding the patterns, the matching goes beyond the simplistic examples I’ve been offering as analogues: patterns that humans cannot see, patterns that extend beyond the simple, obvious correlations we might spot in the training data. The model doesn’t literally need to be trained on statements like “Beijing is in China,” and even if it is, there’s no guarantee it will match that as a true statement. It might find patterns in unrelated words around these, or associate these words, or parts of these words, with each other for other reasons.

    I’m rather simplifying how LLMs work for the purposes of this discussion, but the point stands: pattern matching over words still seems to account for all of this. LLMs are probabilistic in nature and often get things wrong. LLaMA-13B is the best of the models they look at, and it still gets things wrong a significant amount of the time.