Submission of Entries to the Deep Funding Mini Contest

Overview

1. Data Acquisition via GitHub API

  • Purpose:
    Retrieve repository metrics (stars, forks, watchers, open issues, last update) using the GitHub API.
  • Implementation:
    • fetch_github_metrics makes an API call to retrieve JSON data for a repository.
    • get_github_metrics caches results in a dictionary (GITHUB_CACHE) to minimize duplicate API calls.
  • Real‑World Use:
    Such API integrations are common when building dashboards or monitoring systems that track project popularity and activity.
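For illustration, a minimal sketch of this caching pattern, assuming the requests library and an optional personal-access token; the function bodies are illustrative rather than the submission's actual code, though the field names are standard GitHub REST API fields:

```python
import requests

# Simple in-memory cache keyed by "org/repo" so each repository is fetched once.
GITHUB_CACHE = {}

def fetch_github_metrics(org, repo, token=None):
    """Call the GitHub REST API and return the raw repository JSON."""
    headers = {"Authorization": f"token {token}"} if token else {}
    resp = requests.get(f"https://api.github.com/repos/{org}/{repo}",
                        headers=headers, timeout=10)
    return resp.json() if resp.status_code == 200 else {}

def get_github_metrics(org, repo, token=None):
    """Return cached metrics if present; otherwise hit the API once and cache."""
    key = f"{org}/{repo}"
    if key not in GITHUB_CACHE:
        data = fetch_github_metrics(org, repo, token)
        GITHUB_CACHE[key] = {
            "stars": data.get("stargazers_count", 0),
            "forks": data.get("forks_count", 0),
            "watchers": data.get("subscribers_count", 0),
            "open_issues": data.get("open_issues_count", 0),
            "last_update": data.get("updated_at"),
        }
    return GITHUB_CACHE[key]
```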

2. Feature Extraction

  • Purpose:
    Convert each repository’s URL into a numeric feature vector that summarizes its popularity and activity.
  • Implementation:
    • extract_repo_features parses the GitHub URL to extract the organization and repository names, then obtains numerical features from the API response.
    • It also creates binary “technology flags” for keywords (e.g., “typescript”, “blockchain”, “testing”) present in the URL.
  • Real‑World Use:
    Feature extraction from online sources is a standard step in many recommendation systems and investment analysis tools.
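A sketch of the URL-to-features step, reusing get_github_metrics from the previous sketch; the keyword list and feature names shown here are illustrative assumptions:

```python
from urllib.parse import urlparse

# Keyword list for the binary "technology flags"; the actual list may differ.
TECH_KEYWORDS = ["typescript", "blockchain", "testing"]

def extract_repo_features(repo_url, token=None):
    """Turn a GitHub URL into a flat numeric feature dict."""
    # "https://github.com/ethereum/go-ethereum" -> ("ethereum", "go-ethereum")
    org, repo = urlparse(repo_url).path.strip("/").split("/")[:2]
    metrics = get_github_metrics(org, repo, token)  # cached call from the sketch above
    features = {
        "stars": metrics["stars"],
        "forks": metrics["forks"],
        "watchers": metrics["watchers"],
        "open_issues": metrics["open_issues"],
    }
    url_lower = repo_url.lower()
    for kw in TECH_KEYWORDS:
        features[f"has_{kw}"] = int(kw in url_lower)  # binary technology flag
    return features
```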

3. Engineered Features and Data Preprocessing

  • Purpose:
    Create relative features between pairs of repositories and incorporate additional context.
  • Implementation:
    • For each pair, the features for project A and project B are computed separately.
    • Three engineered features are then calculated: the difference, the sum, and the ratio (with an epsilon to prevent division by zero).
    • Extra features (year, quarter, and one‑hot encoded funder) are appended.
    • Missing values are filled with the median, and all features are scaled using RobustScaler.
  • Real‑World Use:
    Such preprocessing ensures that models are not overly influenced by outliers and that comparisons between projects are meaningful—a practice used in finance and risk assessment.
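A hedged sketch of the pairwise feature engineering with pandas and scikit-learn; the function signature and column naming are assumptions, and in the full pipeline the scaler would be fit on the training split only:

```python
import pandas as pd
from sklearn.preprocessing import RobustScaler

EPS = 1e-6  # prevents division by zero in the ratio feature

def build_pair_features(feat_a, feat_b, extra):
    """Relative features for a pair of projects.

    feat_a / feat_b: DataFrames of numeric features for project A and B (same columns);
    extra: DataFrame with year, quarter, and one-hot encoded funder columns.
    """
    diff = (feat_a - feat_b).add_suffix("_diff")
    total = (feat_a + feat_b).add_suffix("_sum")
    ratio = (feat_a / (feat_b + EPS)).add_suffix("_ratio")
    X = pd.concat([diff, total, ratio, extra], axis=1)
    X = X.fillna(X.median())  # median imputation for missing values
    # Robust scaling limits the influence of outliers.
    return pd.DataFrame(RobustScaler().fit_transform(X), columns=X.columns, index=X.index)
```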

4. Base Model: LightGBM

  • Purpose:
    Train a robust gradient boosting model to predict the relative funding share.
  • Implementation:
    • A LightGBM regressor is trained using 5‑fold cross‑validation.
    • The model’s hyperparameters (learning rate, number of leaves, max depth, etc.) are tuned to minimize MSE.
    • Out‑of‑fold predictions on training data and averaged predictions on test data are obtained.
  • Real‑World Use:
    Ensemble tree‑based methods like LightGBM are widely used for structured data in credit scoring, customer churn prediction, and other financial applications.
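A compact sketch of the 5-fold out-of-fold training loop with LightGBM; the hyperparameter values shown are placeholders rather than the tuned ones:

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import KFold

def train_lgbm_oof(X_train, y_train, X_test, n_splits=5, seed=42):
    """5-fold CV: out-of-fold predictions on train, fold-averaged predictions on test."""
    params = {
        "objective": "regression",
        "metric": "mse",
        "learning_rate": 0.05,   # placeholder values, not the tuned hyperparameters
        "num_leaves": 31,
        "max_depth": -1,
        "verbosity": -1,
    }
    oof = np.zeros(len(X_train))
    test_pred = np.zeros(len(X_test))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for tr_idx, val_idx in kf.split(X_train):
        dtrain = lgb.Dataset(X_train.iloc[tr_idx], label=y_train.iloc[tr_idx])
        dvalid = lgb.Dataset(X_train.iloc[val_idx], label=y_train.iloc[val_idx])
        model = lgb.train(params, dtrain, num_boost_round=2000, valid_sets=[dvalid],
                          callbacks=[lgb.early_stopping(100, verbose=False)])
        oof[val_idx] = model.predict(X_train.iloc[val_idx])
        test_pred += model.predict(X_test) / n_splits
    return oof, test_pred
```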

5. Meta‑Learning with Polynomial Expansion and Ridge Regression

  • Purpose:
    Capture nonlinear relationships in the base model’s output and improve the final prediction.
  • Implementation:
    • The base model’s predictions (which lie between 0 and 1) are transformed using a degree‑2 polynomial expansion (i.e. creating the features x and x²).
    • A Ridge regression meta‑learner is then trained on these features to learn the optimal combination.
    • Final predictions are clipped to [0, 1].
  • Real‑World Use:
    Stacking (or blending) models is a common approach to improve prediction accuracy in many real‑world ML competitions and risk modeling applications.
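A sketch of the meta-learning stage with scikit-learn; the Ridge alpha is an assumption:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

def meta_learn(oof_pred, y_train, test_pred):
    """Degree-2 polynomial expansion of the base predictions + Ridge meta-learner."""
    poly = PolynomialFeatures(degree=2, include_bias=False)  # produces x and x^2
    Z_train = poly.fit_transform(np.asarray(oof_pred).reshape(-1, 1))
    Z_test = poly.transform(np.asarray(test_pred).reshape(-1, 1))
    meta = Ridge(alpha=1.0).fit(Z_train, y_train)
    # Clip to the valid funding-share range.
    return np.clip(meta.predict(Z_test), 0.0, 1.0)
```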

GitHub Repo: GitHub - AswinWebDev/Open-Source-Funding-Allocation-Predictor: A machine learning solution to predict the relative funding share between pairs of open-source projects using GitHub metrics and historical funding data. The solution utilizes advanced feature engineering and ensemble learning (LightGBM, XGBoost, CatBoost, and Ridge regression with meta-learning) to drive data‑driven funding allocation decisions.

My Solution for the Deep Funding Mini-Contest

Dataset:

1) From the raw data, obtained lists of all unique pairs (project_a + project_b) and of all unique individual repos.

2) All available metrics from the GitHub API for each unique repo (size, forks, stars, etc.) were requested.

  • Calculated a languages_score metric: retrieved data on the languages with a share of more than 5%; a GPT model assigned each language a weight reflecting its value to Ethereum (backend / smart-contract languages are weighted 2x frontend ones); each weight was multiplied by the language's share, and for repos with several languages the per-language scores were summed into the final score.
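As a rough illustration of the languages_score described above (the function name and the weight values are hypothetical; only the 2x backend/smart-contract weighting comes from the write-up):

```python
# `language_shares` would come from the GitHub /languages endpoint (shares kept if > 5%);
# `gpt_weights` are the per-language weights assigned by the GPT model --
# the values shown here are purely illustrative.

def languages_score(language_shares, gpt_weights):
    """Sum of (share * weight) over the repo's significant languages."""
    return sum(share * gpt_weights.get(lang, 1.0)
               for lang, share in language_shares.items()
               if share > 0.05)

gpt_weights = {"Solidity": 2.0, "Go": 2.0, "TypeScript": 1.0, "JavaScript": 1.0}
print(languages_score({"Go": 0.80, "TypeScript": 0.15, "CSS": 0.03}, gpt_weights))  # 1.75
```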

3) Metrics were obtained from the OpenAI API (GPT generative models), mostly for the jury's questions and my own ideas (nowadays everyone uses ChatGPT or an analog for work):

a) on individual repos (a few examples):

- grade_Eth - prompt: assess the value of this repository to the Ethereum ecosystem according to the following criteria: client distribution (consensus and execution), use data from clientdiversity.org if possible; impact on the Ethereum ecosystem given the presence or absence of this repository; problems the repository solves and opportunities it provides; criticality to users and overall value proposition; dependency on this component and consequences of its absence; time and effort savings for developers; cost of replacement or reproduction. Use the ethereum/go-ethereum repo with a grade of 100 as a reference point.

- grade_code - prompt: grade the code of this repository, considering: code quality; code structure; code performance; code optimization; how reliable and stable it is; how important ongoing maintenance and updates are; the cost of possible failures or problems. Use the ethereum/go-ethereum repository with a relative grade of 100 as a reference point.

- grade_compl - prompt: rate the complexity of this repository, considering: how long it took to build; labor costs; financial costs; code size; complexity of development. Use the ethereum/go-ethereum repository with a relative grade of 100 as a reference point.

b) by repo pairs:

- weight_gpt - prompt: what fraction of 1 (in total) can we assign to each of them, given their importance to the Ethereum ecosystem?
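For context, a hedged sketch of how such per-repo grades might be requested via the OpenAI Python SDK; the wrapper function, model name, and exact prompt wrapping are assumptions, not the submission's code:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def gpt_grade(repo: str, criteria: str, model: str = "gpt-4o") -> str:
    """Ask the model to grade `repo` against `criteria`, with go-ethereum = 100 as the anchor."""
    prompt = (
        f"For the GitHub repository {repo}: {criteria} "
        "Use the ethereum/go-ethereum repository with a grade of 100 as a reference point. "
        "Reply with a single number."
    )
    resp = client.chat.completions.create(model=model,
                                          messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

# e.g. grade_compl for one repo (criteria abbreviated):
# gpt_grade("ethereum/solidity", "rate the complexity of this repository, considering how long it took to build, labor costs, ...")
```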

4) Tried to find useful relationships between metrics, and experimented with transformations and different approaches to make the data best fit the model.

Model:

On a simple dataset, had time to test the effectiveness of some of the available regression algorithms and frameworks: Keras, PyTorch Tabular, TabPFN, LightGBM, PyTorch.

a) Mini-Contest on Hugging Face (#6, MSE 0.0123): the pre-trained TabPFN model performed well on the simple dataset. The more complex set required more computational resources and waiting time, so I did not get a result before the deadline.

b) Mini-Contest on Pond (#9, score 0.0492 as of February 13): the LightGBM model showed the best result.

Some ideas: convert some of the metric pairs (e.g. size_a, size_b) to relative weights to get closer to the output value format. Additionally, try scaling them and converting them to int type, since that data type suits the model best.
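A tiny sketch of that pair-to-weight idea (the helper name is hypothetical):

```python
# Turn an absolute metric pair (size_a, size_b) into a relative weight in [0, 1],
# matching the format of the target value.
def pair_to_weight(metric_a, metric_b, eps=1e-9):
    return metric_a / (metric_a + metric_b + eps)

print(pair_to_weight(1200, 300))  # ~0.8: project A gets ~80% of the pair weight
```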

In the end, the idea is to collect the largest and most versatile dataset possible and select the features most suitable for the model being used. I have not had time to test all my ideas.


Thanks to everyone who submitted an entry to the deep funding mini contest :pray:

We are impressed to have received over 20 high-quality submissions to the contest. We will now deliberate until the end of the month and announce the results thereafter.

The committee members are @cerv1; @octopus; Joel Miller; Vitalik Buterin; and @kronosapiens.

Stay tuned!


Thanks to all participants for their patience while we digested all submissions and came up with the final analysis.

I’m happy to note that we are ready to announce the winners and more importantly, give feedback on each and every submission in the contest!

Some general comments first:

we saw submissions that cover the different natural sources of knowledge for repos:

  • the readme
  • the code
  • how other people interact with it on github
  • how other people interact with it on social media

long-term algorithms will try to use all four to be un-Goodhartable.

in the contest, @theofilus looked at external metrics while @niemerg scanned the readme. In the limit you can’t just rely on readme clues, because that’s like judging everyone by what suit they wear; you have to actually dig into the code itself.

@AndreGasano tried to do LLM analysis of a bunch of important things that other projects are missing. @thefazi’s submission did that too, and looking at the one-liners their model produced, it feels like that summarizer focuses on “what the project does”, whereas Gasano focused a bit on that but more on “the quality of the project code”; both dimensions are important, and ultimately it feels like we should combine these approaches if we want to use these models for actual capital allocation.

@davidgasquez deserves special mention for influencing so many submissions that built on his work.

Enough with the preface, and on to the results!

drum roll

:1st_place_medal: @davidgasquez - $8000

Jury 1: Another strong submission, and the most performant in terms of MSE. Notably, David did not use LLMs for feature development, but rather took a more traditional approach to feature engineering using a variety of GitHub data. He demonstrated a strong grasp of data-leakage risks and approached his modeling in a way that minimized his exposure.

Jury 2: In My Top 4

Very strong work. There is a clear explanation of the process used for model validation. The graphics provided offer insight into the underlying data. In addition, this user was able to point out relevant issues with data leakage that may affect the leaderboard results. (Appreciate gaming the system, then helping the system understand how it was gamed.)

Impressed with the second approach, which eliminates the data leakage issue. Also nice to see how the model is able to achieve significant performance without incorporating GitHub feature information.

Jury 3: Great work. The idea to mirror the dataset was clever. The model is well-justified and clear, and achieves a great MSE.

Jury 4: 3.5/4

Well-reasoned and a strong “inside the box” submission. I’d like to see winning participants go a bit farther outside the box, but this submission does a great job using the available data, performing feature testing, and avoiding overfitting.

Update: after reviewing all submissions, I’ve updated this from 3 to 3.5 given that it inspired many other models and is the top of the HuggingFace leaderboard.

:2nd_place_medal: @thefazi - $4000

jury 1: An interesting model which seemingly predicts funding based only on text summaries of the different projects. Reading through the code, I’m a bit confused about where the actual modeling is that maps summaries to funding levels, unless BERT is converting project summaries to numbers directly, but he is 12th place on HF so he’s doing something right. Very cool to get such good performance from descriptions alone.

jury 2: Nice submission, applying Text analysis to information obtained from GitHub. Approach is relatively straightforward, and results are clearly explained in terms of model performance. Would be nice to have more understanding of why the different approaches perform as they did, or validation of their performance with intuitive expectations of how GitHub project logs might predict performance. Suspect there is a lurking variable somewhere.

Jury 3: A cool submission, although not deserving of a top spot. An MSE of 0.002 with just a few words of text is super interesting.

(I mean, it also makes me wonder if there are just some weird patterns in our training / testing data that we’re not understanding, but if so that’s not this user’s fault).

Jury 4: 3/4

Definitely one of the stronger and more novel submissions. I appreciate that they used an LLM to extend the feature set and then test three different models to see which one performed best. While not a production-ready solution, this is the right kind of experimentation mindset that we should be encouraging!

:3rd_place_medal: @niemerg , @omniacs.dao , @maxwshen , @AndreGasano - $2000 each

Allan Niemerg

Jury 1: One of my favorite submissions. I liked Allan’s approach of using ChatGPT for feature engineering and then using XGBoost for prediction – a really nice mix of LLM-augmented ML. I especially liked the way he worked through the problem in his notebook, exploring errors and iterating as he went. His overall project presentation was very clean and nicely organized. One limitation was his consideration of Github READMEs only.

Jury 2: In My Top 8

Cool project idea! The use of LLMs in this way feels innovative to me, essentially using the LLM to do work that a human auditor could do re: assessing qualitative aspects of a project. Nice explanation of the challenges encountered, and the transformations used to overcome them. The writeup also does an effective job of pointing out how this model could be combined with other models for an ensemble approach.

The inclusion of clear feature importances (e.g. community size and corporate backing as a predictor of funding) is very useful. Both the techniques and the data in this submission are impressive

Jury 3: Seems like a good project. I’m not an ML expert, so I can’t speak to the specific techniques used, but I do like the idea of just using READMEs because it seems like it would generalize well.

Jury 4: 2/4

Nice potential here, but I wish it went beyond just scanning the READMEs. This feels like it could be a useful module in a larger framework for evaluating code contributions.

@omniacs.dao

jury 1: Similar to @ladawinter’s submission, this submission took GitHub metrics and fed them as features into XGBoost. Where they got creative was through vector-embedding the project READMEs and using those as features in the model. On some level, this is similar to asking an LLM to score the README, as was done by Allan Niemerg and backdoor. This gives us some hints about how LLM-powered summarization compares to more conventional approaches like vector embeddings. Omniac’s inclusion of the Funding Map as a way to visualize funding correlations was a welcome addition as well. A good submission!

jury 2: In My Top 4

Strong work that uses a variety of techniques to provide insight into the data, including the surprising observation that similar projects do not always receive similar outcomes.

I really like the cartography approach, which lends itself well to both visualization and explainability. In addition to specific insights, the interactive app allowing interested parties to further explore the data on their own feels like (a major win).

Jury 3: A pretty great submission, combining a well-performing model with additional insights and tools (via the map) that will help us with deepfunding in the long run.

The idea of using embeddings of readmes is cool, and something that can also be applied in the future

jury 4: 3/4

This is one of the strongest submissions in terms of exploratory data science. I don’t think it offers any new insight into how we might do better than human judgement, but feels like a good approximation of what humans likely do when scanning a large number of repos. The write-up is stronger than the underlying model

@maxwshen

jury 1: A cool submission. Max took the prompt to heart and extended David Gasquez’ model to produce intentionally de-correlated results, even at the expense of performance. His use of an error term which incorporated this measure of correlation was neat, and writing a whole blog post about his technique was a nice touch. Definitely deserving of a reward despite the weaker model performance.

jury 2: In My Top 8

Strong work. Interesting approach aligned with DeepFunding’s stated goal of developing a diverse set of complementary models. This leads to an innovative loss function that combines the original loss with a term weighting the correlation between the new model and a given model. The issues raised in the attached blog article give fundamental insights into design issues with the DeepFunding approach.

jury 3: Even though this model doesn’t perform well, I really appreciate Max taking the time to do something that will help us understand and improve on the deepfunding project in general.

The blog post is very detailed, I think it’s great that he took the time to write all that up.

I would put this in the upper half of projects.

jury 4: 3/4

Excellent work. I appreciate that he extended another submission, and brought more rigor to testing the model for feature strength and overfitting. This doesn’t quite hit the full mark for me because it doesn’t offer an improved score or incorporate any new agentic reasoning methods.

@AndreGasano

jury 1: This submission involved LLM-aided feature engineering, with some interesting perspectives, including measures of code quality and repo complexity. The inclusion of feature interactions and experimentation with multiple models led to competitive results on both leaderboards. Gets points for interesting LLM-aided feature engineering.

jury 2: Lots of interesting ideas. Would like to see a clearer focus on the most relevant outcomes.

Used programming language proportion (from GitHub repo) and AI-assisted analysis of codebase for feature engineering. Many ideas were explored, for both LLM-based scoring and different prediction models

jury 3: This user writes that the goal is to “collect the largest and most versatile dataset possible”, which I see as being pretty useful to the overall project of deepfunding.

This submission doesn’t seem fully fleshed out, though

jury 4: 3/4

One of the more novel submissions. Agree with other reviewers that this participant took things in an interesting direction feature-wise. I would have liked to see configurable weights and some sensitivity analysis to justify assumptions like “smart contracts are worth 2 times more than frontend features”. I appreciate the attempt to assess things like code complexity and quality.


Hi! First, congratulations to all the winners! This was an exciting and valuable challenge, and I truly appreciate the effort that went into organizing it.

I wanted to better understand the selection criteria for the final prizes. The guidelines mentioned that rankings were based not only on leaderboard position but also on factors like transitive consistency (A < B < C), mathematical approach, and contributions to the community. However, looking at the final results, it seems that participants were chosen from a broad range of leaderboard positions in Pond, rather than strictly from the top ranks.

For instance, in the top 10, we had:

  1. davidgasquez – 0.0280

  2. Theofilus – 0.0375

  3. Diego Rivera Buendia – 0.0415

  4. Oleh RCL – 0.0425

  5. xyz453 – 0.0427

  6. Abdul040 – 0.0434

  7. Jake – 0.0441

  8. AndreGasano – 0.0468

  9. Leonardo Chan – 0.0483

  10. hojathpl – 0.0487

Some of these participants, despite being in the top 5 or top 10, were not recognized, while others further down the leaderboard were selected. Could you clarify how the different factors were weighted in the final selection?

Additionally, given the increasing use of LLM-based approaches, do you see these methods as scalable and aligned with the competition’s goals, or were alternative methodologies prioritized? Understanding this process better would be helpful—not just for myself but for other participants aiming to improve in future challenges.

Looking forward to your insights.


In the second part of the judging process, I am sharing feedback on all the other participants who did not secure prizes.

Very roughly, we sought to reward original approaches more than particular submissions scoring well. So we tried to choose winners from among those who worked on scoring based on the readme, code, etc. of the repos.

The reason is that in the deep funding main contest we are implementing a unique scoring script by Vitalik, with a leaderboard based on how much each model contributes to the composite model that then makes allocations. So original approaches contribute more to the composite model than simply having a low individual error.

A final word before getting into the individual feedback: if you are unhappy with the results, you have a chance to prove the jury wrong! In ~3-5 days, a contest is opening on Pond asking you to predict the funding that projects will receive in GG23 BEFORE the round even begins. Submissions are open from the close of applications until the round begins (roughly 2-3 days).

You can deploy your model there to predict funding repos will get in the round. After the round closes, we will know which model best predicted the allocations. So prizes will be purely on the basis of leaderboard rankings!

without further ado, here we go

@cathie

jury 1: I liked how Cathie prepared two models to check the quality of the dataset and identify data leaks. With the first model, Cathie used a graph-based algorithm to recursively develop scores based on the training weights, arguing that the decent error score on the test data showed that the datasets had high mutual information. The second model showed that funding amounts alone were sufficient to reconstruct relative weights. Neither model incorporated any additional information about the projects, nor did either perform any parameter estimation, so they are likely not competitive. That said, I appreciate the contribution made to dataset forensics.

jury 2: Nice work. Nice to see two models, especially ones with high explainability. Appreciate the clear step-by-step descriptions and the publicly available link to work. Would like to see this combined with other approaches, using either traditional ML or AI.

jury 3: A cool project, but probably deserves to be lower in the pack.

I think that understanding the dataset better is an admirable goal, but it’s my understanding that future versions of this contest will use other types of data, so I’m worried that this contribution won’t generalize.

jury 4: Appreciate their “kicking the tires” on the data, but not a proper model. I would not expect this to perform well beyond the training data.

@backdoor

jury 1: Much closer to what I had expected submissions to look like for this contest. Extensive use of AI agents via LangChain to scrape repository data and perform analyses, directly producing relative weights. I liked the incorporation of a “validator” agent to check the results of the sub-agents, as a type of chain-of-thought reasoning. A good example of the kind of model this contest was designed to cultivate.

jury 2: Intriguing design for AI agent architecture. Already produces differences in funding levels, comparing human allocation to agent-based allocation. Work feels incomplete, but would like to see further exploration of this interesting idea.

jury 3: The architecture seems reasonable, but the lack of a MSE/ adequate testing is worrying.

I’m not sure that I see the high divergence from human scores as a good thing, especially considering the limited testing. It’s my understanding that these models are supposed to fill in the gaps where humans can’t judge, not totally contradict human judgement.

jury 4: 2/4

Potentially strong submission, and still being worked on. Appreciate how they used the initial data as a starting point and used an agent to enrich it with new data from the repos directly.

Unfortunately, I don’t know how to analyze or replicate their work since they didn’t share MSE results and the repo is quite sprawling.

@ladawinter

jury 1: A decent submission. Not as much feature engineering as other submissions, and the data write-up was not as detailed. No use of LLMs for feature development either. Not bad, but doesn’t particularly stand out either. That said, performed well on the leaderboard.

jury 2: Good work. Thorough and informative writeup focused on the impact of feature selection (removing features with significant correlation). Appreciate that the GitHub code is included. The work does a good job with the focused question, but doesn’t feel innovative in terms of connecting to the overall DeepFunding goals. It feels as if there was only one experiment (the “too cold” counterpart to the “too hot” try-everything approach of some other users).

jury 3: I appreciated the attention paid to spearman correlation coefficients. Not a super crazy submission, but IMO it’s better than the median.

jury 4: 3/4

One of the strongest submissions doing straightforward feature modeling. This feels like a bread-and-butter hugging face entry. Well documented and easy to replicate.

@FelixPhilip

jury 1: This model was interesting – a Bradley-Terry model which focuses on learning an underlying project strength to predict the results of pairwise match-ups. This model fits very closely with the format of the training data, but is less reliant on additional data (i.e. Github data). The source code includes Github data, but the writeup does not mention using it at all. I also couldn’t quite figure out the purpose of the additional regularization being done with the graph structure, and found the writeup as a whole difficult to follow. Also, doing the whole thing in Java was definitely a move. All that said, this model scored very well – whether it will generalize to new data remains to be seen.

jury 2: Good work. The Bradley-Terry framework is appropriate and appreciated. I like the framing as an optimization problem (it offers a degree of explainability). The write-up does a great job explaining the process, but doesn’t offer much insight into the model’s results or the underlying data (i.e. what do we now know that we didn’t before?).

jury 3: This writeup was pretty difficult to follow. I’m not sure if anything stands out to me about this project.

jury 4: 1/4

This feels like a Cursor-generated model. WTF is it in Java? Write-up feels like LLM-generated slop.

@Allen_Chu

jury 1: Another very reasonable model, incorporating LLM-based feature development and vanilla Github data, using XGBoost. Clear and concise writeup as well. Not amazing performance, though.

jury 2: In My Top 8

Very strong effort in my view. Nice mix of standard Machine Learning techniques, with both GitHub-based and LLM-based feature engineering.

Uses relevant GitHub information to engineer features. Nice use of LLM for assessing qualitative textual features such as text from README file. Multiple models attempted. Thorough and readable GitHub repository included.

jury 3: To me this feels like a middle-of-the-road submission. I’m not sure if anything in particular stands out about it.

jury 4: 2/4

Agree with other reviewers that this feels very middle-of-the-pack. It does the standard feature engineering on repo stats and README analysis. I am giving it the same score that I’ve given to other undifferentiated models like this.

@the_technocrat

jury 1: This submission made no attempt to score well, but rather sought to implement a deterministic algorithm to generate scores, based simply on the number of edges leading to a dependency (reflecting level of use) and frequency of releases. Interesting, and points for doing something different, but overall belongs towards the rear of the pack.

jury 2: In My Top 8

Elegant work with promising ideas. Would like to see it explored further.

Appreciate the principle-based approach. LOVE the inclusion of “ways to game the algorithm”; I think we should be encouraging more of this cognitive security / self-red-teaming / scientific rigor. The overall work is essentially prediction on one new engineered feature. It would be nice to see this feature incorporated into other machine learning models.

jury 3: I do appreciate the graph theoretic approach. Like the v-index discussed later, it does feel intuitively true that something along these lines belongs in the world of deepfunding.

But the ideas are also relatively underdeveloped and it feels like this user put in markedly less effort than others. So it’s hard for me to really support it.

I also don’t love that the user listed “resisting gameability” as a core principle and then presented an algorithm that can be gamed easily.

jury 4: 1/4

Too simplistic: no attempt was made to weigh the value of any given edge. This doesn’t really do anything beyond present information we already have in a different format.

@Jake

jury 1: A decent submission, building off of David Gasquez’ approach, and adding the “v-index” as a new feature. Jake also used LLMs to generate “embeddings” of various project components. His models did well in test, so it seems like a fine addition, not entirely original but making useful contributions.

jury 2: Good work overall. Nice application of the V-Index, which is appropriate. Would like to see this approach developed further, specifically as a feature in conjunction with other traditional machine learning or LLM approaches.

The explanation of the process is clear. One limitation is that the full data set is not used. The writeup could be a bit clearer, as the exact methodology for using time information doesn’t feel completely fleshed out.

jury 3: Even though the writeup wasn’t clear, I have to hand it to Jake for using the idea of a v-index. It feels like such an obvious idea, but no one else used it AFAIK, so maybe it’s only obvious in hindsight.

I would like to see the idea more fleshed out, though.

jury 4: 2/4

Explores a few interesting elements, but the sum feels lesser than the individual parts. This feels like an attempt to test out of a few methods and see what’s best, rather than a cohesive effort to build a strong model.

@rohitmalekar

jury 1: Another Bradley-Terry model, incorporating funding levels and doing some clustering to help reduce intransitivity. I personally don’t fully understand why transitivity was a problem – if you’re learning a per-project strength, then your predictions should always be transitive. I appreciated the additional analysis of the power-law distributions of public-goods-funding outcomes. A decent submission.

jury 2: In My Top 4

Very nice work. The overall approaches from traditional Data Science (Bradley-Terry, searching for power law distributions, clustering) are combined in an innovative way.

Appreciate the variety of approaches and representations (tables, interactive exploration utility, etc.) What is especially nice: the different approaches are coherently integrated to augment each other’s limitations, rather than being a scattershot approach.

jury 3: A strong submission. Even though Rohit’s MSE isn’t that great, he clearly has a very strong grasp of the appropriate places to look in the PGF world to figure the problem out (e.g. the references to funding distributions as tracked by OSO).

@theofilus

jury 1: This submission built on David Gasquez’ work, incorporating some additional Github metrics. He also used BART to summarize READMEs, using tf-idf to convert the summary into a feature. Some additional data processing, but nothing which jumped out at me as particularly interesting. Some suggestions for further use of Github metrics.

jury 2: In My Top 4

Strong work. There are a variety of techniques, both standard and novel, combined in innovative ways. I really like that this does research on recent architectures in the AI literature that are problem-relevant.

This submission uses some unique approaches to data preprocessing and feature acquisition. A straightforward graph like “size and stars have positive skew”, and pointing out how the issue was addressed with a logarithmic transform, goes a long way in offering insight. Appreciate the use of different language transformers (like the BART model), open-source tools like MetricHunter, and research on recommendation systems, implemented in useful ways.

jury 3: As far as I can tell, this is a fairly strong submission with a well-thought-through methodology and a good MSE. More importantly, I give them huge props for referencing and consulting published academic research.

jury 4: 2/4

Given that this builds on David Gasquez’s original work, I am looking for incremental value add here. It scrapes data that had already been provided in the deepfunding data repo (issues, PRs, etc.). I’m not quite sure why these additional features would improve the model; it feels like over-fitting.

New post, since only 10 users can be tagged per post; the feedback continues below.

@wunder

jury 1: While Wunder appears on the HF leaderboard, the writeup only describes an exploration of a novel GitHub feature, quantifying the relationship between the commits of two repos. The idea seems to be that if one repo has many commits soon after a dependency releases an update, that indicates a major dependency. There is some correlation, so it seems like a useful and relevant signal. Definitely worth something!

jury 2: Good work. The focus here is on developing a few features that take information about commit timings in different repos and turn that information into a clear signal usable for DeepFunding’s purpose.

Appreciate the clear definitions, explanation of process, and graphs. There is definitely more worth exploring here. I do have a few questions as to suitability: whether this metric actually measures dependency, and also how gameable it is. Definitely more work to do, which should become more feasible as more data becomes available.

jury 3: Wunder didn’t exactly follow the instructions of the competition, but I don’t mind since they explored a novel idea that might end up being helpful to the overall project of deepfunding. Octopus is right to point out that the patterns Wunder looks at are very gameable. I guess this is a middle-of-the-road project for me.

jury 4: 2/4

Interesting concept, and I appreciate the clarity and reproducibility of the model. That said, this feels like a spurious correlation. For this to score higher, I’d like to see more rigour

@Diego

jury 1: Did not include a link to source code. The description seemed to focus mostly on feature engineering from Github data, which we like. Best model was a boosted tree, as per usual. Did well on the Pond contest, so that’s something. Solid, if somewhat uninventive, entry.

jury 2: Nice experimentation with different methods. Would like to see more insight into either the underlying data set, or the processes used to train (e.g. any idea why GradientBoosting is outperforming RidgeRegression?)

The writeup suggests that some interesting data acquisition and feature engineering was done (e.g. using information from GitIngest). Unfortunately, there is no way to verify this (no public code link was provided), and the work did not result in strong measurable gains in MSE. The methods used are generally appropriate, but fairly basic and not specifically targeted to the Deep Funding problem space. The writeup itself is pretty bare-bones.

jury 3: I appreciate that this user tried a lot of different techniques, but besides that, I’m having trouble finding anything that really stands out about this project.

jury 4: 1/4

Not sure how to validate or learn more about their work. Feels like the standard “throw a lot of features in a model and see what performs best” approach with not much thinking behind which features and which model to use.

@Oleg8978

jury 1: Seemingly strong and thorough write-up, incorporating feature engineering, model optimization, and additional processing. However, no link to source code nor does the user seem to appear on either leaderboard. A bit strange, hard to know what to recommend.

jury 2: Good work overall. The model combines graph-based techniques with outlier detection for prediction.

The writeup is a bit light on details; unfortunately it uses “tell” rather than “show”. There isn’t much emphasis on how available features were used. Difficult to assess how interesting the work ultimately is, without any kind of verifiable public link.

jury 3: without source code, leaderboard info, or significant details, I’m not sure what we can learn from this project. Or, put another way, I’m not sure that this user’s post will do anything to strengthen the overall project of deepfunding

jury 4: 1/4

Can’t view source code. Submission write-up feels like LLM-generated slop. “This approach not only enhances predictive accuracy but also ensures the model remains mathematically coherent and adaptable to real-world funding dynamics.”

@abdul040

jury 1: Another classic approach, with engineered features being fed into a boosted model. Some interesting additions, such as the use of a stacked-ensemble method. Performed well on the Pond contest, although no links to code were included. Decent submission.

jury 2: Decent work with clear knowledge of several different data science techniques, combined for score optimization through feature engineering and optimized search. Would like to see a clearer problem statement beyond score optimization, as well as some insight into either the underlying data or why specific techniques performed better.

This approach almost treats score optimization as a completely automated search process of just seeing which feature transformations and which predictors will give the best score. I feel it lacks a connection to the underlying problem space and data.

jury 3: A good submission, this user definitely knows their stuff. I appreciate that the writeup was clear. I think I tend to appreciate the projects that stick to classic data science / ML techniques, rather than ones that strongly rely on querying LLMs. This was definitely an example of the former.

jury 4: 2/4

Agree with other reviewers. Feels like a straightforward ML submission without much thinking behind the “why these features” and “why this model”. Without seeing source code, it’s hard to know how much trial and error went into their submission

@D4ps

jury 1: This submission used LightGBM seemingly only on Github star counts, scoring around 20th place on the Pond contest. Not the most interesting submission or writeup.

jury 2: Solid idea based on data science fundamentals: LightGBM relying strongly on GitHub stars.

There are many issues for further exploration or potential improvement: Would like to better understand the approach used to optimize the model (to ensure it isn’t overfitting/score-chasing). Would be nice to have some insight about data (either underlying data or data on how the model was trained), beyond just which techniques led to better competition score. Additionally, the feature of GitHub stars is highly gameable, so it isn’t robust for this purpose in isolation.

jury 3: Only using github stars is a pretty serious limitation of this project.

AFAICT this architecture could be expanded to use other features, which is good, but as it stands I can’t say this project stands out all that much.

jury 4: 2/4

Recognizing that I have now viewed many prior entries that do little beyond feeding GitHub repo stats into a model, there’s nothing special about this particular one.

@DrunkunMonster

jury 1: Another example of LightGBM being applied to a set of features engineered from GitHub data. Incorporated many interactions, including polynomial expansion, to produce additional features. Did not see this user appear on either leaderboard.

jury 2: Decent effort with good data science fundamentals in terms of building an initial model. Doesn’t feel like there’s any deep insight or novel idea, though.

I also have concerns about the overall applicability of some of the techniques, such as using standard regression followed by clipping to get to [0,1]. There are other basic techniques for producing [0,1] output that would likely be better to try initially.

jury 3: Seems like a pretty good project, although I didn’t understand the “meta-learning” part.

jury 4: 2/4

Arguably one of the better ones in the group of “GitHub repo metrics fed into a model”. I was excited because I thought they were going to explore developer or other types of connections between projects, but in the end this feels like a slightly more robust version of other models using the same basic set of features.


Thanks, jurors, for the submission feedback. I will take the feedback seriously and rebuild with focus.


Hello everyone,

An update on the competition prize distribution:

We’ve successfully sent prizes to the following participants:
@davidgasquez
@thefazi
@niemerg
@omniacs.dao

However, we’re still pending distribution for:
@maxwshen – awaiting your response to our DM
@AndreGasano – unable to send you a direct message

If you’re tagged above and haven’t received your prize yet, kindly reply to this thread with your email address so we can proceed with the distribution.

Thank you!

Hi, Pond! Hi, Theo! Thanks to the jury for highly appreciating my solution in the Deep Funding Mini Contest. Regards, Andrei (AndreGasano)


Hey @Pond, I responded with my email address (maxwshen at gmail) in DMs on April 24th (I was out of the country up until then) but I haven’t received a response yet. Is there anything else I can do at this time? Thanks!