javascript - Parsing plain text to js array - add separator to the second element


Keywords:javascript 


Question: 

I'm looking for a way to parse plain text into a JS array. I have already worked out a scheme for how I want to do this, but I'm stuck.

Part of plain text:

2017-11-08 09:43:49,153 [INFO ] root: {\"methodId\":6,\"requestBody\":{},\"token\":\"XXXX\"}2017-11-08 09:53:02,293 [INFO ] root: {\"methodId\":6,\"requestBody\":{},\"token\":\"XXXX\"}2017-11-08 09:53:02,355 [INFO ] root: {\"methodId\":6,\"requestBody\":{},\"token\":\"XXXX\"}

Expected result

const arr = [
    '2017-11-08 09:43:49,153 [INFO ] root: {\"methodId\":6,\"requestBody\":{},\"token\":\"XXXX\"}',
    '2017-11-08 09:53:02,293 [INFO ] root: {\"methodId\":6,\"requestBody\":{},\"token\":\"XXXX\"}',
    '2017-11-08 09:53:02,355 [INFO ] root: {\"methodId\":6,\"requestBody\":{},\"token\":\"XXXX\"}'
]

RegEx Pattern:

/}\d{4}-\d{2}/

Each chunk ends with a closing brace "}" and the next chunk starts with a date "YYYY-MM".

Problem

plainText.split(/}\d{4}-\d{2}/)

If I split it this way, it always "eats" my separator. Is there a way to split the text and keep the matched separator as part of the second element of each split pair? Then I could just append "}" to the first element and remove "}" from the second one. That's the solution I'm thinking about, but maybe you can suggest something better.


1 Answer: 

If the JSON data does not contain datetime-like substrings, you may use

s.split(/\b(?=\d{4}-\d{2}-\d{2}\s)/).filter(Boolean)

Or a more verbose pattern, to play it safer:

s.split(/\b(?=\d{4}-\d{2}-\d{2}\s+[\d:,]+\s+\[INFO ]\s+root:)/).filter(Boolean)


The point is to match the datetime-like string but not consume it, thus, the whole pattern is wrapped within a positive lookahead (?=...) construct.
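
As a minimal sketch of the idea, the snippet below runs the shorter pattern on the sample from the question. The variable names s and parts are only illustrative, and the quotes are written unescaped for readability; it assumes, as stated above, that the JSON payload contains no datetime-like substrings.

```javascript
// Sample input from the question, concatenated without any separator.
const s =
  '2017-11-08 09:43:49,153 [INFO ] root: {"methodId":6,"requestBody":{},"token":"XXXX"}' +
  '2017-11-08 09:53:02,293 [INFO ] root: {"methodId":6,"requestBody":{},"token":"XXXX"}' +
  '2017-11-08 09:53:02,355 [INFO ] root: {"methodId":6,"requestBody":{},"token":"XXXX"}';

// The lookahead matches the empty position right before each date,
// so split() cuts there without removing any characters.
const parts = s.split(/\b(?=\d{4}-\d{2}-\d{2}\s)/).filter(Boolean);

console.log(parts.length); // 3
console.log(parts[1]);     // second entry, starting with its date and ending with "}"
```

Because the match has zero width, every chunk keeps both its trailing "}" and its leading date, so no manual re-adding of the separator is needed.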

Longer pattern details

  • \b - a word boundary
  • (?= - start of the positive lookahead pattern
    • \d{4}-\d{2}-\d{2} - date-like string (4 digits-2 digits-2 digits)
    • \s+ - 1 or more whitespaces
    • [\d:,]+ - 1 or more digits, colons, or commas
    • \s+ - 1 or more whitespaces
    • \[INFO ] - a literal [INFO ] substring
    • \s+ - 1 or more whitespaces
    • root: - root: substring
  • ) - end of the lookahead
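
To illustrate why the longer pattern is the safer choice, here is a small sketch comparing both patterns. The input is hypothetical (not from the question): the first entry's JSON deliberately contains a date-like value, and the names log, shortRe and longRe are illustrative only.

```javascript
// Hypothetical input: the first entry's JSON holds a date-like string.
const log =
  '2017-11-08 09:43:49,153 [INFO ] root: {"note":"2017-11-08 backup"}' +
  '2017-11-08 09:53:02,293 [INFO ] root: {"methodId":6}';

const shortRe = /\b(?=\d{4}-\d{2}-\d{2}\s)/;
const longRe = /\b(?=\d{4}-\d{2}-\d{2}\s+[\d:,]+\s+\[INFO ]\s+root:)/;

const shortParts = log.split(shortRe).filter(Boolean);
const longParts = log.split(longRe).filter(Boolean);

console.log(shortParts.length); // 3: the date inside the JSON also triggers a split
console.log(longParts.length);  // 2: only full entry headers trigger a split
```

Note that the longer pattern is tied to the [INFO ] level and the root: logger name, so entries with a different level or logger would not be split; the pattern would need adjusting for such logs.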