Both the new Sonnet and gpt-4o still fail at a simple question:
"How many w's are in strawberry?"
gpt-4o: There are 2 "w's" in "strawberry."
Claude 3.5 Sonnet (new): Let me count the w's in "strawberry": 0 w's.
(same question with 'r' succeeds)
What is artificial about the current generation of "artificial intelligence" is the way training (predict the next token) and benchmarking (overfitting to the benchmarks) are done. Perhaps a fresh approach is needed to achieve a true next step.
It's bad at working directly on classical computer problems like math and data processing. But you can do it indirectly by having it write a program that produces the correct result. Interestingly, I didn't even need to have it run the program, although usually you would.
Which produced:
I always thought the halting problem was an academic exercise, but here we see a potential practical use case. Actually, this seems pretty dangerous: letting the LLM write and automatically execute code. How good is the sandbox? Can I trick the LLM into writing a reverse shell and opening it up for me?
I'm not sure it's still the case, but I've had ChatGPT run shell commands. I don't know what you could do with that, though, since the environment is ephemeral and has neither internet access nor root. Plus I'm sure they have security scanning.
There's always that one tokenization error comment
There are models around that will answer this question perfectly.
But why would anyone care? If you want to count w's in a word, that's a Python one-liner, you don't need an LLM for that (although the LLM can write it for you).
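For reference, the one-liner in question is just Python's built-in `str.count`:

```python
# Count the w's in "strawberry" -- the task the models fumble.
print("strawberry".count("w"))  # → 1
```

No model, no tokens, no ambiguity.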
They are trained on tokens not characters.
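A sketch of why that matters (the segmentation and IDs below are made up for illustration; real tokenizers split differently):

```python
# Hypothetical BPE-style segmentation: the model consumes opaque
# token IDs, not individual letters.
tokens = ["str", "aw", "berry"]     # one plausible split of "strawberry"
token_ids = [3642, 675, 19772]      # made-up IDs -- what the model actually sees

# The 'w' is buried inside the "aw" token; no single input unit is 'w',
# which is one reason letter-counting is hard for token-based models.
# Counting over the underlying text, of course, still works:
print(sum(t.count("w") for t in tokens))  # → 1
```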
Can we stop with these useless strawberry examples?