datafy and recur-datafy throw StackOverflowError #13

Open
simongray opened this issue Jan 9, 2022 · 3 comments

@simongray
Owner

This looks like infinite recursion in the datafy-tsm implementation. Removing the datafy call from (assoc m k (datafy v)) and leaving just v seems to solve it for the regular datafy. That is also how it should behave: datafy itself shouldn't be recursive.

In the case of recur-datafy, I will need to look further into what's causing it. I guess some sort of memory is needed to avoid this issue.
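
A minimal sketch of what such a memory could look like, not the library's implementation. It assumes the overflow comes from an object that yields itself again while being walked. The names recur-datafy-guarded and self-yielding are hypothetical, and the reify object is only a stand-in so the example runs without CoreNLP:

```clojure
(require '[clojure.datafy :refer [datafy]])

(defn recur-datafy-guarded
  "Hypothetical variant of recur-datafy: `path` remembers every collection
  currently being walked; if one of them shows up again (by identity) below
  itself, its string form is returned instead of descending further."
  ([x] (recur-datafy-guarded x ()))
  ([x path]
   (let [x* (datafy x)]
     (if (some #(identical? x* %) path)
       (str x*)
       (let [descend #(recur-datafy-guarded % (cons x* path))]
         (cond
           (seq? x*) (mapv descend x*)
           (set? x*) (set (map descend x*))
           (map? x*) (into {} (for [[k v] x*] [(descend k) (descend v)]))
           (instance? Iterable x*) (mapv descend x*)
           :else x*))))))

(defn self-yielding
  "Hypothetical stand-in for a problematic object: an Iterable whose
  iterator yields the object itself, followed by :leaf."
  [label]
  (reify
    Iterable
    (iterator [this] (.iterator ^Iterable (list this :leaf)))
    Object
    (toString [_] label)))

;; Terminates instead of overflowing the stack.
(recur-datafy-guarded {:tree/tree (self-yielding "(ROOT)")})
```

Tracking only the current path (rather than every object ever seen) means a value that legitimately appears in two sibling branches is still datafied in both places.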

@simongray simongray added the bug label Jan 9, 2022
@ag91

ag91 commented Sep 22, 2023

In case anybody else is trying the "sentiment" annotator, for instance:

(->> ((->pipeline {:annotators ["sentiment"]}) "Paula gave me 10 dollars. Of those $10 I used only one dollar. That felt bad. But also great.")
     sentences
     (map (comp :sentiment recur-datafy)))

You can redefine recur-datafy like this (I left the debugging in case @simongray wants to try it out):

(in-ns 'dk.simongray.datalinguist)

(defmacro ignore-errors [& body]
  `(try ~@body (catch Exception e#)))

(def my (atom nil))

(defn recur-datafy
  "Return a recursively datafied representation of `x`.
  Call at the end of an annotation chain to get plain Clojure data structures."
  [x]
  (let [x* (datafy x)]
    ;; (prn "WOW---" x*)
    ;; (reset! my x*)
    (cond
      (seq? x*)
      (mapv recur-datafy x*)

      (set? x*)
      (set (map recur-datafy x*))

      (map? x*)
      (ignore-errors (into {} (for [[k v] (dissoc x* :tree/binarized-tree :tree/tree) ;; (select-keys x*
                                    ;;                '(:tree/tree !
                                    ;;                  :token-end
                                    ;;                  :semantic-graph/collapsed-cc-processed-dependencies
                                    ;;                  :token-begin
                                    ;;                  :semantic-graph/basic-dependencies
                                    ;;                  :sentence-index
                                    ;;                  :sentiment
                                    ;;                  :semantic-graph/collapsed-dependencies
                                    ;;                  :character-offset-begin
                                    ;;                  :semantic-graph/enhanced-plus-plus-dependencies
                                    ;; ; :tree/binarized-tree !
                                    ;;                  :semantic-graph/enhanced-dependencies :tokens :character-offset-end :text
                                    ;;                  ))
                                    ]
                                [(recur-datafy k) (recur-datafy v)])))

      ;; Catches nearly all Java collections, including custom CoreNLP ones.
      (instance? Iterable x*)
      (mapv recur-datafy x*)

      :else x*)))

I discarded the :tree/binarized-tree and :tree/tree keys, which seem to cause infinite recursion.
With the prn enabled, I see:

"WOW---" :tree/tree
"WOW---" #object[edu.stanford.nlp.trees.LabeledScoredTreeNode 0x7b20fb2c "(ROOT (S (NP (NNP Paula)) (VP (VBD gave) (NP (PRP me)) (NP (CD 10) (NNS dollars))) (. .)))"]
"WOW---" #object[edu.stanford.nlp.trees.LabeledScoredTreeNode 0x7b20fb2c "(ROOT (S (NP (NNP Paula)) (VP (VBD gave) (NP (PRP me)) (NP (CD 10) (NNS dollars))) (. .)))"]
"WOW---" #object[edu.stanford.nlp.trees.LabeledScoredTreeNode 0x7b20fb2c "(ROOT (S (NP (NNP Paula)) (VP (VBD gave) (NP (PRP me)) (NP (CD 10) (NNS dollars))) (. .)))"]
"WOW---" #object[edu.stanford.nlp.trees.LabeledScoredTreeNode 0x7b20fb2c "(ROOT (S (NP (NNP Paula)) (VP (VBD gave) (NP (PRP me)) (NP (CD 10) (NNS dollars))) (. .)))"]
"WOW---" #object[edu.stanford.nlp.trees.LabeledScoredTreeNode 0x7b20fb2c "(ROOT (S (NP (NNP Paula)) (VP (VBD gave) (NP (PRP me)) (NP (CD 10) (NNS dollars))) (. .)))"]
"WOW---" #object[edu.stanford.nlp.trees.LabeledScoredTreeNode 0x7b20fb2c "(ROOT (S (NP (NNP Paula)) (VP (VBD gave) (NP (PRP me)) (NP (CD 10) (NNS dollars))) (. .)))"]
"WOW---" #object[edu.stanford.nlp.trees.LabeledScoredTreeNode 0x7b20fb2c "(ROOT (S (NP (NNP Paula)) (VP (VBD gave) (NP (PRP me)) (NP (CD 10) (NNS dollars))) (. .)))"]
"WOW---" #object[edu.stanford.nlp.trees.LabeledScoredTreeNode 0x7b20fb2c "(ROOT (S (NP (NNP Paula)) (VP (VBD gave) (NP (PRP me)) (NP (CD 10) (NNS dollars))) (. .)))"]
"WOW---" #object[edu.stanford.nlp.trees.LabeledScoredTreeNode 0x7b20fb2c "(ROOT (S (NP (NNP Paula)) (VP (VBD gave) (NP (PRP me)) 

This means that recurring on the value under the :tree/tree key keeps producing the same result.
@simongray you can reproduce the logging by removing the dissoc:

(->> ((->pipeline {:annotators ["sentiment"]}) "Paula gave me 10 dollars. Of those $10 I used only one dollar. That felt bad. But also great.")
     sentences
     first
     recur-datafy)

Maybe it could be enough to return the string of the contents of :tree/tree and :tree/binarized-tree? If so, adding another instance? case in recur-datafy could do the job.
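
A rough sketch of that extra case, under stated assumptions: recur-datafy-with and its opaque? parameter are hypothetical names introduced here, and a java.util.ArrayList stands in for the tree objects so the example runs without CoreNLP on the classpath.

```clojure
(require '[clojure.datafy :refer [datafy]])

(defn recur-datafy-with
  "Hypothetical variant of recur-datafy: any datafied value matching
  `opaque?` is returned as its string form instead of being walked."
  [opaque? x]
  (let [x*      (datafy x)
        descend (partial recur-datafy-with opaque?)]
    (cond
      ;; The extra case: checked first, so it wins over the Iterable branch.
      (opaque? x*) (str x*)
      (seq? x*) (mapv descend x*)
      (set? x*) (set (map descend x*))
      (map? x*) (into {} (for [[k v] x*] [(descend k) (descend v)]))
      (instance? Iterable x*) (mapv descend x*)
      :else x*)))

;; With CoreNLP loaded, the predicate could be something like
;;   #(instance? edu.stanford.nlp.trees.Tree %)
;; (an untested assumption). An ArrayList plays that role here.
(def stand-in (java.util.ArrayList. [1 2]))

(recur-datafy-with #(instance? java.util.ArrayList %)
                   {:tree/tree stand-in, :tokens [1 2]})
```

Passing the predicate as an argument keeps the sketch independent of any particular class. In the library itself, the check would more likely be a fixed instance? branch inside recur-datafy.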

@simongray
Owner Author

Thank you, @ag91. I must admit that I haven't been actively developing this wrapper for a while now, so these longstanding issues continue to persist.

Are you using it for a project? Or just dabbling?

@ag91

ag91 commented Sep 24, 2023

Oh, I was really just dabbling with NLP and thought I'd try CoreNLP with Clojure.
I like your library, it is making my exploration super easy: thank you for sharing it!

It is fine to leave it if I am the only user: I just wanted to help other users and you, if you ever wanted to investigate this further ;)
(I can also open a PR if you have time and wish to save yourself some work. I am also fine with my personal fix)
