opened 01:19AM - 14 Nov 20 UTC
closed 11:53PM - 13 Dec 21 UTC
area-System.Text.RegularExpressions
User Story
Priority:2
Cost:M
Team:Libraries
[AB#1255830](https://devdiv.visualstudio.com/10e66e43-9645-4201-b128-0fdc3769cc1…7/_workitems/edit/1255830)
Regex has an interpreted mode and a compiled mode. The compiled mode takes longer to start up but generally matches faster. Some users want both fast startup and high throughput; others run on platforms where JIT-compiled code is not allowed and still want good performance. For those users, .NET Framework allowed the generated code to be saved out to an assembly, but .NET Core/.NET 5 does [not plan to support saving emitted IL](https://github.com/dotnet/runtime/issues/15704).
Instead, we could develop a [Roslyn Source Generator](https://devblogs.microsoft.com/dotnet/introducing-c-source-generators/) that would generate the necessary code (as C#) at compile time.
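For context, the two existing modes are selected when the Regex is constructed. A minimal sketch, reusing the IP-address pattern from the example further down; the class and method names (`RegexModes`, `CreateInterpreted`, `CreateCompiled`) are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexModes
{
    // Pattern borrowed from the IP-address example in this issue.
    private const string Pattern = @"(?:[0-9]{1,3}\.){3}[0-9]{1,3}";

    // Interpreted mode (the default): cheap to construct, slower per match.
    public static Regex CreateInterpreted() => new Regex(Pattern);

    // Compiled mode: IL is emitted when the Regex is constructed, so startup
    // costs more, but matching is generally faster afterwards.
    public static Regex CreateCompiled() => new Regex(Pattern, RegexOptions.Compiled);
}
```

Both objects expose the same matching API, e.g. `RegexModes.CreateCompiled().IsMatch(input)`; only the construction cost and matching speed differ. A source generator would aim for compiled-mode throughput without paying the IL emission cost at run time.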
Essentially no new API should be needed on Regex, as the generated code would hook into it in the regular way, by deriving from RegexRunner and implementing FindFirstChar() and Go(). This API is all public/protected today (indeed, it happens that .NET Core can load regexes written out by .NET Framework).
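To make the hook concrete, here is a hand-written sketch of roughly the shape generated code might take, for the trivial literal pattern `abc`. It is an assumption about what a generator could emit, not a design: the type names (`AbcRunner`, `AbcRunnerFactory`, `AbcRegex`) are hypothetical, only `IsMatch`/`Match` on string input are considered, and newer runtimes may flag some of these extensibility members as obsolete.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical runner for the literal pattern "abc".
internal sealed class AbcRunner : RegexRunner
{
    // Advance runtextpos to the next position where a match could start.
    protected override bool FindFirstChar()
    {
        int i = runtext.IndexOf("abc", runtextpos, runtextend - runtextpos, StringComparison.Ordinal);
        if (i < 0)
        {
            runtextpos = runtextend; // nothing left to scan
            return false;
        }
        runtextpos = i;
        return true;
    }

    // Try to match at runtextpos; on success record capture group 0.
    protected override void Go()
    {
        int start = runtextpos;
        if (runtextend - start >= 3 && string.CompareOrdinal(runtext, start, "abc", 0, 3) == 0)
        {
            runtextpos = start + 3;
            Capture(0, start, runtextpos);
        }
    }

    // This sketch does no backtracking, so a minimal tracking size is enough.
    protected override void InitTrackCount() => runtrackcount = 1;
}

// Hypothetical factory that hands the runner to Regex.
internal sealed class AbcRunnerFactory : RegexRunnerFactory
{
    protected override RegexRunner CreateInstance() => new AbcRunner();
}

// Hypothetical Regex-derived type wiring everything up through protected fields.
public sealed class AbcRegex : Regex
{
    public AbcRegex()
    {
        pattern = "abc";
        roptions = RegexOptions.None;
        internalMatchTimeout = Regex.InfiniteMatchTimeout;
        factory = new AbcRunnerFactory();
        capsize = 1; // only the implicit group 0
    }
}
```

A caller would then use `new AbcRegex().IsMatch("xxabcxx")` exactly like any other Regex instance, which is the sense in which no new API is needed on Regex itself.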
The user experience is TBD, e.g., how one must annotate regexes in order to trigger the generator (assuming it doesn't attempt to read all the code to try to infer them) and how the generated code is wired up.
One could imagine requiring an annotation like this (we could also expose API on the source generator so that it could consume a list of regexes, for example):
```
[RegexSourceGenerator(@"(?:[0-9]{1,3}\.){3}[0-9]{1,3}", RegexOptions.IgnoreCase)]
public partial Regex CreateIpAddressRegex();
```
This new attribute would need to be exposed somewhere, probably on the Regex assembly since the code being compiled would be referencing that anyway.
The implementation details necessary to wire into the existing Regex implementation are also TBD; this is probably most of the work.
cc @pgovind @eerhardt @stephentoub this is just a summary of our email thread. Anything to add?
In terms of customers, we have at least one major 1st party service that heavily uses regex and highly values both throughput and startup time. This is probably P1 for them.
@danroth27 in a Blazor app, can I assume that most users in the .NET 6 timeframe will be content with either using JavaScript regexes or interpreted mode .NET regexes? In my mind this is P2/P3.
@marek-safar similar question for Xamarin: on platforms that don't JIT, Xamarin apps presumably either interpret the ref-emitted code or use the regex interpreted mode today, so this would not be a big win for them. We had talked about P2.