A recent good reason for using Java is that frontier LLMs are trained with very large amounts of high quality enterprise Java source code. Claude Code for example loves Java and its static type system.
I constrain my LLM-generated Java code to static methods of 20 LOC or fewer, and I limit data types to those that are JSON-compatible. Both constraints lead to more reliable code and data that Claude Code fully understands and generates.
I am preparing to auto-generate an agent-based application that might reach 1.5 million Java LOC. It is hard to imagine accomplishing that with JavaScript, Python, or C++.
Could you please expand on how you limit the generated code? I haven't dived deep into Claude Code; I'm mostly just familiar with OpenAI's offering.
I first generate a specification JSON object from a text design narrative. The specification lists fine-grained steps for each Java class, decomposed so that each step can be implemented as a static method in 20 lines of Java code or less. Helper methods are likewise scoped to 20 LOC or less.
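To make the constraint concrete, here is a minimal sketch of what one such step might look like once generated. The class name `InvoiceSteps`, the method names, and the `quantity` / `unitPriceCents` keys are hypothetical and not taken from the actual specification; the point is only the shape: static methods, each well under 20 lines, operating on JSON-compatible types (`Map`, `List`, `String`, `Number`).

```java
import java.util.List;
import java.util.Map;

/** Hypothetical example: one specification step implemented as small static methods. */
final class InvoiceSteps {

    /** Static-only utility class, never instantiated. */
    private InvoiceSteps() { }

    /** Step: sum the line totals of an invoice, in cents. */
    static long totalCents(final List<Map<String, Object>> lineItems) {
        long total = 0L;
        for (final Map<String, Object> item : lineItems) {
            total += lineTotalCents(item);
        }
        return total;
    }

    /** Helper: quantity times unit price for a single JSON-style line item. */
    static long lineTotalCents(final Map<String, Object> item) {
        final long quantity = ((Number) item.get("quantity")).longValue();
        final long unitPriceCents = ((Number) item.get("unitPriceCents")).longValue();
        return quantity * unitPriceCents;
    }

    /** Small demonstration with two assumed line items. */
    public static void main(final String[] args) {
        final List<Map<String, Object>> items = List.of(
            Map.of("sku", "A-1", "quantity", 2, "unitPriceCents", 500),
            Map.of("sku", "B-7", "quantity", 1, "unitPriceCents", 250));
        System.out.println(totalCents(items)); // prints 1250
    }
}
```

Because every value is a plain map, list, string, or number, the same data can be serialized to JSON and back without custom mapping code, which is presumably part of why the restriction helps.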
I also have a markdown-formatted document `core-programming-guidelines.md` that I include in the Claude Code code-generation prompt.
For example:
## Core Programming Principles
### Defensive Programming & Safety
1. *Use 'final' keyword aggressively* for method parameters, local variables, and class fields
2. *Null Safety*: Include null checks with Validate.notNull() and assertions for external calls
3. *Input Validation*: Validate all method parameters with clear preconditions using org.apache.commons.lang3.Validate

### Performance Optimization
1. *Collection Sizing*: Always provide calculated initial capacity for collections
2. *String Processing*: Use StringBuilder with pre-calculated capacity, avoid regex where possible and avoid `java.util.Scanner` where possible.
3. *Memory Management*: Clear large collections when done, reuse objects where appropriate

### Code Clarity & Documentation
1. *Naming Conventions*: Use descriptive names for variables, methods, and constants
   - All StringBuilder variables should be suffixed `Builder`.
2. *Documentation*: Comprehensive JavaDoc for all public, protected, and private methods
3. *Inline Comments*: Explain complex logic, algorithms, and business rules

### Modern Java 23 Features
1. *Text Blocks*: Use for multi-line string literals
2. *Pattern Matching*: Use where appropriate for cleaner code
3. *Records*: Use for immutable data carriers
4. *Enhanced Switch*: Use new switch expressions
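
As a rough illustration of how several of these guidelines might combine in generated code, here is a small sketch. Everything in it (`GuidelineSketch`, `LineItem`, the method names, the per-entry capacity estimate of 16 characters) is made up for the example. It also substitutes `java.util.Objects.requireNonNull` for `Validate.notNull()` so that it runs without the Apache Commons Lang dependency; code following the guidelines literally would use `Validate` instead.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical illustration of several guidelines applied together. */
final class GuidelineSketch {

    /** Static-only utility class, never instantiated. */
    private GuidelineSketch() { }

    /** Record used as an immutable data carrier. */
    record LineItem(String sku, int quantity) { }

    /** Enhanced switch expression mapping a quantity to a label. */
    static String sizeLabel(final int quantity) {
        return switch (quantity) {
            case 0 -> "empty";
            case 1 -> "single";
            default -> "bulk";
        };
    }

    /** Pattern matching for instanceof on a loosely typed value. */
    static String describe(final Object value) {
        // Stand-in for Validate.notNull(value, ...) from commons-lang3.
        Objects.requireNonNull(value, "value must not be null");
        if (value instanceof LineItem item) {
            return item.sku() + ":" + sizeLabel(item.quantity());
        }
        return String.valueOf(value);
    }

    /** Joins descriptions with commas using a pre-sized StringBuilder. */
    static String joinDescriptions(final List<?> values) {
        Objects.requireNonNull(values, "values must not be null");
        // Assumed estimate of 16 characters per entry to size the builder up front.
        final StringBuilder resultBuilder = new StringBuilder(values.size() * 16);
        for (final Object value : values) {
            if (resultBuilder.length() > 0) {
                resultBuilder.append(',');
            }
            resultBuilder.append(describe(value));
        }
        return resultBuilder.toString();
    }

    /** Small demonstration with assumed sample values. */
    public static void main(final String[] args) {
        final List<Object> values =
            List.of(new LineItem("A-1", 1), new LineItem("B-7", 5), "note");
        System.out.println(joinDescriptions(values)); // prints A-1:single,B-7:bulk,note
    }
}
```

Each method stays under 20 lines, every parameter and local is `final`, and the `StringBuilder` variable carries the `Builder` suffix, so the sketch is consistent with the constraints described earlier in the thread.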
> A recent good reason for using Java is that frontier LLMs are trained with very large amounts of high quality enterprise Java source code.
Where did it get it from?
GitHub public repositories mostly.
I don't think GitHub is necessarily full of high-quality enterprise Java software, is it?
Grok says you are right (https://grok.com/share/c2hhcmQtMg%3D%3D_ddbace62-c299-4b7b-9...) however...
https://github.com/iluwatar/java-design-patterns
https://github.com/spring-projects/spring-framework
https://github.com/apache/kafka
https://github.com/neo4j
https://github.com/inforkgodara/store-pos
https://github.com/...