IBM's new product offering, Code Assistant for IBM Z, leverages a generative AI model to translate COBOL code to Java.
It's not the first time a language or tool has been lost to the annals of the job market; VB6 and FoxPro come to mind. Previously, though, such transitions happened gradually, giving most people enough time to adapt to the changes.
I wonder what it's going to be like this time, now that the machine (with the help of humans, of course) can accomplish what would otherwise be a risky, multi-month corporate project much faster. What happens to all those COBOL developer jobs?
Pray share your thoughts, especially if you're a COBOL professional and have more context around the implications of this announcement.
This sounds no different than the static analysis tools we've had for COBOL for some time now.
The problem isn't converting what may or may not be complex code; it's taking the time to prove out a new solution.
I can take any old service program on one of our IBM i machines and convert it out to Java no problem. The issue arises if some other subsystem that relies on it gets stalled out because the activation group is transient and spin-up of the JVM is the stalling part.
Now suddenly I need a named activation, and that means I need to take lifetimes into account. Static values are now suddenly living between requests when procedures don't initialize them. And all of that is a great way to start leaking data all over the place. And when you suddenly start putting other people's phone numbers on 15-year contracts that have serious legal ramifications, legal doesn't tend to like that.
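To make that concrete, here's a minimal, hypothetical Java sketch of the lifetime problem (all names invented): state that used to be wiped when the activation group ended now quietly survives inside a long-running JVM.

public class ContractFormatter {
    // In the original flow this buffer was effectively reinitialized on every
    // invocation; as a static field it now persists between requests.
    private static final StringBuilder phoneField = new StringBuilder();

    public static String format(String contractId, String phone) {
        // Without an explicit reset, the previous caller's phone number leaks
        // into the next contract:
        // phoneField.setLength(0);  // <- the reset a literal translation forgot
        phoneField.append(phone);
        return contractId + ": " + phoneField;
    }
}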
It isn't enough just to convert COBOL 1:1 to Java. You have to have an understanding of what the program is trying to get done, and just looking at the code isn't going to make that obvious. Another example: this module locks a data area down because we need this other module to hit an error condition. The restart condition for that module reloads it into a different mode that's appropriate for the process, which sends a message to the guest module to unlock the data area.
Yes, I shit you not. There is a program out there doing critical work where the expected execution path is to cause an error on purpose so that some code in the recovery path gets run. How many of you think an AI is going to pick up that context?
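For what it's worth, here's a loose, hypothetical Java sketch of that kind of flow, with every name invented; the point is that the "error" is the intended path and the real work lives in recovery.

public class GuestModule {

    interface DataArea {
        void lock() throws DataAreaLockedException; // fails while another module holds it
        void unlock();
    }

    static class DataAreaLockedException extends Exception {}

    void process(DataArea area) {
        try {
            area.lock(); // expected to fail: the other module locked it deliberately
            // The "normal" path that, in practice, is never meant to run.
        } catch (DataAreaLockedException expected) {
            // The intended execution path: restart in a mode appropriate for the
            // process, then signal that the data area can be released.
            restartInRecoveryMode(area);
        }
    }

    void restartInRecoveryMode(DataArea area) {
        // ... reload in recovery mode ...
        area.unlock();
    }
}

A converter looking only at process() would see dead code and an error handler, not a protocol between two modules.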
The tools back then were limited, so programmers did all kinds of hacky things to get particular things done. We've got tools now to fix that; it's just that so much has already been layered on top of the way things work right now. Pair that with the fact that we cannot buy a second machine to build a new system on, and that any new program must work 99.999% right out of the gate.
COBOL is just a language; it's not the biggest problem. The biggest problem is the expectation. These systems run absolutely critical functions that simply cannot fail. Trying to foray into Java or whatever language means we have to build a system that doesn't have 45 years' worth of testing behind it and that still runs perfectly. It's just not a realistic expectation.
What pisses me off about many such endeavors is that these companies always want big-bang solutions, which are excessively hard to plan out due to the complexity of these systems. That makes it hard to put a financial number on the project, and they typically end up with hundreds of people involved during "planning" just to be sacked before any meaningful progress can be made.
Instead they could simply take the engineers they need for maintenance anyway and give them the freedom to rework the system in the time they are assigned to the project. Those systems are, in my opinion, basically microservice systems: thousands of more or less small modules interconnected by JCL scripts and batch processes. So instead of doing it big bang, you could tackle it module by module. A module doesn't care what language the other side is written in, as long as it can still work with the same data structure(s).
Pick a module, understand it, write tests if they are missing, and then rewrite it.
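As a minimal sketch of what "work with the same data structure" could mean in practice (field names and widths are invented here), the rewritten module keeps reading and writing the exact fixed-width record layout the remaining COBOL side expects:

public final class CustomerRecord {
    // Assumed COBOL layout this mirrors:
    //   05 CUST-ID      PIC X(8).
    //   05 CUST-NAME    PIC X(30).
    //   05 CUST-BALANCE PIC 9(7)V99.
    public final String id;
    public final String name;
    public final long balanceCents; // 9(7)V99 kept as cents, decimal point implied

    private CustomerRecord(String id, String name, long balanceCents) {
        this.id = id;
        this.name = name;
        this.balanceCents = balanceCents;
    }

    // Parse one fixed-width record exactly as the COBOL modules lay it out.
    public static CustomerRecord parse(String line) {
        String id = line.substring(0, 8).trim();
        String name = line.substring(8, 38).trim();
        long cents = Long.parseLong(line.substring(38, 47));
        return new CustomerRecord(id, name, cents);
    }

    // Write it back in the same layout so downstream JCL steps keep working.
    public String toFixedWidth() {
        return String.format("%-8s%-30s%09d", id, name, balanceCents);
    }
}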
After some years of doing that, all modules will be in a modern language (Java, Go, Rust, whatever) and you will have test coverage and hopefully even documentation. Then you can start refactoring the architecture.
But I guess that would be too easy and not enterprisy enough.
I think you vastly overestimate the separability of these systems.
Picture 10,000 lines of code in one method, with a history of multiple decades.
Now picture that that method has, buried in it, complex interactions with another method of similar size, triggered via an obscure side effect.
Picture whole teams of developers adding to this on a daily basis in realtime.
There is no "meaningful progress" to be made here. It may offend your aesthetic sense, but it's just the reality of doing business.
This sounds no different than the static analysis tools we've had for COBOL for some time now.
One difference is people might kind of understand how the static analysis tools we've had for some time now actually work. LLMs are basically a black box. You also can't easily debug or fix a specific problem. If the LLM produces wrong code in one particular case, what do you do? You can try fine-tuning it with examples of the problem and what the output should be, but there's no guarantee that won't subtly change other things and add a new issue for you to discover at a future time.
Not a COBOL professional, but I know companies that have tried (and failed) to migrate from COBOL to Java because of the enormously high stakes involved (usually financial).
LLMs can speed up the process, but ultimately nobody is going to just say "yes, let's accept all suggested changes the LLM makes". The risk appetite of companies won't change because of LLMs.
I wonder what makes it so difficult. "COBOL to Java" doesn't sound like an impossible task since transpilers exist. Maybe they can't get similar performance characteristics in the auto-transpiled code?
COBOL programs are structured very differently from Java. For example, you can't just declare a variable; you have to add it to the working-storage section at the top of the program.
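A small, hypothetical illustration of that difference (layout and names made up): a literal 1:1 translation tends to hoist the working-storage items into class-level fields, where idiomatic Java would just use locals scoped to the method.

public class ReportTotals {
    // Assumed COBOL working storage this mirrors:
    //   WORKING-STORAGE SECTION.
    //   01 WS-COUNTER PIC 9(4)    VALUE ZERO.
    //   01 WS-TOTAL   PIC 9(7)V99 VALUE ZERO.
    private int wsCounter = 0;       // literal translation of WS-COUNTER
    private long wsTotalCents = 0;   // literal translation of WS-TOTAL

    public long sumLiteral(long[] amountsCents) {
        wsCounter = 0;
        wsTotalCents = 0;
        for (long amount : amountsCents) {
            wsCounter = wsCounter + 1;
            wsTotalCents = wsTotalCents + amount;
        }
        return wsTotalCents;
    }

    public long sumIdiomatic(long[] amountsCents) {
        long total = 0;              // declared where it is used, no shared mutable state
        for (long amount : amountsCents) {
            total += amount;
        }
        return total;
    }
}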
So the fintech companies who rely on that tested (though unliked) lump of iron from IBM running an OS, language, and architecture built to do fast, high-throughput transactional work should trust AI to turn it into Java code to run on hardware and infrastructure of their own choosing without having architected the whole migration from the ground up?
Don't get me wrong, I want to see the world move away from cobol and ancient big blue hardware, but there are safer ways to do this and the investment cost would likely be worth it.
Converting ancient code to a more modern language seems like a great use for AI, in all honesty. Not a lot of COBOL devs out there but once it's Java the amount of coders available to fix/improve whatever ChatGPT spits out jumps exponentially!
The fact that you say that tells me that you don't know very much about software engineering. This whole thing is a terrible idea, and has the potential to introduce tons of incredibly subtle bugs and security flaws. ML + LLM is not ready to be used for stuff like this at the moment in anything outside of an experimental context. Engineers are generally - and with very good reason - deeply wary of "too much magic" and this stuff falls squarely into that category.
All of that is mentioned in the article. Given how much it cost last time a company tried to convert from COBOL, don't be surprised when you see more businesses opt for this cheaper path. Even if it only converts half of the codebase, that's still a huge improvement.
I'm more alarmed at the conversation in this thread about migrating these cobol apps to java. Maybe I am the one who is out of touch, but what the actual fuck? Is it just because of the large java hiring pool? If you are effectively starting from scratch why in the ever loving fuck would you pick java?
It could mean anything: the same code used in production in new ways, slightly modified code, newly discovered COBOL where the original language was a mystery, new requirements for old systems. Seriously, it could be too many things for that to be a useful metric with no context.
That's the keyword right there. Everyone wants to phase mainframe shenanigans out until they get told about the investments necessary to do it, then they are happy to just survive with it.
I'm currently at a company that's actually trying it, and it's been a pain.
ChatGPT did an amazing job converting my Neovim config from VimScript to Lua including explaining each part and how it was different. That was a very well scoped piece of code though. I'd be interested to see how an LLM goes on large projects as I imagine that would be a whole different level of complexity. You need to understand a lot more about the components and interactions and be very careful not to change behaviour. Security is another important thing that was already mentioned in this thread and the article itself.
I'd put myself down as doubtful, but I'm really interested to see the results nonetheless. I've already been surprised a few times over by these things, so who knows.
Because COBOL is mainly used in an enterprise environment, where they most likely already run Java software which interfaces with the old COBOL software. Plus modern Java is a pretty good language; it's not 2005 anymore.
Sadly, I haven't been programming for a while, but I did program in Java. Why do you consider it legacy, and do you see a specific language replacing it?
I'm sorta excited for stuff like this to get going in terms of video games. There are some great old games, and it would be great if it were easier to pull them into a more modern engine or such.
Without a requirements doc stamped in metal you won't get 1:1 feature replication.
This was kind of a joke, but it's actually very real tbh. The problems that companies have with human devs trying to bring ancient systems into the modern world will all be replicated here. The PM won't stop trying to add features just because the team doing it is using an LLM, and the team doing it won't be the team that built it, so they won't get all the nuances and intricacies right. So you get a strictly worse product, but it's cheaper (maybe), so it has to balance out against the cost of the loss in quality.
For large organizations, it tends to be a complex and costly proposition, given the small number of COBOL experts in the world.
When the Commonwealth Bank of Australia replaced its core COBOL platform in 2012, it took five years and cost over $700 million.
Running locally in an on-premises configuration or in the cloud as a managed service, Code Assistant is powered by a code-generating model, CodeNet, that can understand not only COBOL and Java but around 80 different programming languages.
A recent Stanford study finds that software engineers who use code-generating AI systems similar to it are more likely to cause vulnerabilities in the apps they develop.
"Like any AI system, there might be unique usage patterns of an enterprise's COBOL application that Code Assistant for IBM Z may not have mastered yet," Puri said.
IBM sees a future in broader code-generating AI tools, as well β intent on competing with apps like GitHub Copilot and Amazon CodeWhisperer.
The original article contains 734 words, the summary contains 159 words. Saved 78%. I'm a bot and I'm open source!
So a 'compiler', then? From a fairly straightforward, easy-to-use COBOL to whatever. Makes sense. Can the new code work in the mainframe environment? Or is that what this piracy is about?
"Cross compiler" usually means a compiler that generates machine code for a machine other than what it runs on. For example, a compiler that runs on X86_64 but creates binaries for Atmel microcontrollers.
You might be thinking of transpilers, which produce source code in a different language. The f2c Fortran-to-C compiler is an example of that.
In my experience, transpiler output is practically unusable to a human reader. I'm guessing (I haven't read the article) that IBM is using AI to convert COBOL to readable, maintainable Java. If it can do so without errors, that's a big deal for mainframe users.
Something I've found is that LLMs struggle with weirder cases when it comes to code.
I once tried getting ChatGPT (though admittedly only 3.5) to generate code in SaHuTOrEPoL, which is one of the more esoteric languages I created, and it really struggled with it.
Why would you expect ChatGPT to know how to write code in a language that you created yourself? According to that repository you created it last year, and ChatGPT was only trained on data up to 2021 so there's no way it could have been in its training set. Even though AIs have surprised us with their insights in a lot of cases they aren't magical.
Admittedly, I worded my comment poorly. What I meant is that ChatGPT struggled with understanding the semantics and structure of the language.
As an example, take this code block:
$S__ do
S__-m__w("Hello world!") do
You can hopefully guess that S__ is a variable which has a method m__w, accessed using a hyphen rather than a dot, and that statements end with a do keyword. ChatGPT missed on all marks.