Commit faee5d2 by philschmid (unverified) · 1 Parent(s): 1adcf06
LICENSE ADDED
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and
117
+ do not modify the License. You may add Your own attribution
118
+ notices within Derivative Works that You distribute, alongside
119
+ or as an addendum to the NOTICE text from the Work, provided
120
+ that such additional attribution notices cannot be construed
121
+ as modifying the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ APPENDIX: How to apply the Apache License to your work.
179
+
180
+ To apply the Apache License to your work, attach the following
181
+ boilerplate notice, with the fields enclosed by brackets "[]"
182
+ replaced with your own identifying information. (Don't include
183
+ the brackets!) The text should be enclosed in the appropriate
184
+ comment syntax for the file format. We also recommend that a
185
+ file or class name and description of purpose be included on the
186
+ same "printed page" as the copyright notice for easier
187
+ identification within third-party archives.
188
+
189
+ Copyright [yyyy] [name of copyright owner]
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
README.md CHANGED
@@ -1,14 +1,77 @@
- ---
- title: Pdf To Structured Data
- emoji: 🌍
- colorFrom: purple
- colorTo: gray
- sdk: docker
- header: mini
- app_port: 3000
- pinned: false
- license: apache-2.0
- short_description: PDF to Structured Data powered by Google DeepMind Gemini 2.0
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Image Creation & Editing with Next.js and Gemini 2.0 Flash
+
+ This project demonstrates how to create and edit images using Google's Gemini 2.0 Flash AI model in a Next.js web application. It allows users to generate images from text prompts or edit existing images through natural language instructions, maintaining conversation context for iterative refinements.
+
+ **How It Works:**
+
+ 1. **Create Images**: Generate images from text prompts using Gemini 2.0 Flash
+ 2. **Edit Images**: Upload an image and provide instructions to modify it
+ 3. **Conversation History**: Maintain context through a conversation with the AI for iterative refinements
+ 4. **Download Results**: Save your generated or edited images
+
+ ## Features
+
+ - 🎨 Text-to-image generation with Gemini 2.0 Flash
+ - 🖌️ Image editing through natural language instructions
+ - 💬 Conversation history for context-aware image refinements
+ - 📱 Responsive UI built with Next.js and shadcn/ui
+ - 🔄 Seamless workflow between creation and editing modes
+ - ⚡ Uses Gemini 2.0 Flash JavaScript SDK
+
+ ## Getting Started
+
+ ### Local Development
+
+ First, set up your environment variables:
+
+ ```bash
+ cp .env.example .env
+ ```
+
+ Add your Google AI Studio API key to the `.env` file:
+
+ ```
+ GEMINI_API_KEY=your_google_api_key
+ ```
+
+ Then, install dependencies and run the development server:
+
+ ```bash
+ npm install
+ npm run dev
+ ```
+
+ Open [http://localhost:3000](http://localhost:3000) with your browser to see the application.
+
+ ### Docker Deployment
+
+ 1. Build the Docker image:
+
+ ```bash
+ docker build -t nextjs-gemini-image-editing .
+ ```
+
+ 2. Run the container with your Google API key:
+
+ ```bash
+ docker run -p 3000:3000 -e GEMINI_API_KEY=your_google_api_key nextjs-gemini-image-editing
+ ```
+
+ Or using an environment file:
+
+ ```bash
+ # Run container with env file
+ docker run -p 3000:3000 --env-file .env nextjs-gemini-image-editing
+ ```
+
+ Open [http://localhost:3000](http://localhost:3000) with your browser to see the application.
+
+ ## Technologies Used
+
+ - [Next.js](https://nextjs.org/) - React framework for the web application
+ - [Google Gemini 2.0 Flash](https://deepmind.google/technologies/gemini/) - AI model for image generation and editing
+ - [shadcn/ui](https://ui.shadcn.com/) - Re-usable components built using Radix UI and Tailwind CSS
+
+ ## License
+
+ This project is licensed under the Apache License 2.0 - see the [LICENSE](./LICENSE) file for details.
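The conversation-history feature described in the README can be sketched as a small pure function that appends one user turn and one model turn per request. This is only an illustration under stated assumptions: the `HistoryItem` and `HistoryPart` shapes below are guesses based on how the fields `role`, `parts`, `text`, and `image` are used in this commit (the real definitions live in `@/lib/types`, which this diff does not show), and `appendTurn` is a hypothetical helper name, not something the project defines.

```typescript
// Assumed shapes: inferred from field usage in this commit, not copied from "@/lib/types".
type Role = "user" | "model";

interface HistoryPart {
  text?: string;
  image?: string; // data URL, e.g. "data:image/png;base64,...."
}

interface HistoryItem {
  role: Role;
  parts: HistoryPart[];
}

// Mirrors the JSON body returned by the /api/image route in this commit.
interface ImageApiResponse {
  image: string | null;
  description: string | null;
}

// Hypothetical helper: returns a NEW history array with the user turn and,
// if the reply carried any content, the model turn appended.
function appendTurn(
  history: HistoryItem[],
  prompt: string,
  inputImage: string | null,
  reply: ImageApiResponse
): HistoryItem[] {
  const userParts: HistoryPart[] = [{ text: prompt }];
  if (inputImage) {
    userParts.push({ image: inputImage });
  }

  const modelParts: HistoryPart[] = [];
  if (reply.description) {
    modelParts.push({ text: reply.description });
  }
  if (reply.image) {
    modelParts.push({ image: reply.image });
  }

  const next: HistoryItem[] = [...history, { role: "user", parts: userParts }];
  if (modelParts.length > 0) {
    next.push({ role: "model", parts: modelParts });
  }
  return next;
}
```

Returning a new array rather than mutating the input fits React state updates, where `setHistory` would receive the result of a call such as `appendTurn(history, prompt, imageToEdit, data)`.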
app/api/extract/route.ts DELETED
@@ -1,51 +0,0 @@
- import { NextResponse } from "next/server";
- import { GoogleGenerativeAI } from "@google/generative-ai";
-
- const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
- const MODEL_ID = "gemini-2.0-flash";
-
- export async function POST(request: Request) {
-   try {
-     const formData = await request.formData();
-     const file = formData.get("file") as File;
-     const schema = JSON.parse(formData.get("schema") as string);
-
-     // Convert PDF to base64
-     const buffer = await file.arrayBuffer();
-     const base64 = Buffer.from(buffer).toString("base64");
-
-     const model = genAI.getGenerativeModel({
-       model: MODEL_ID,
-       generationConfig: {
-         responseMimeType: "application/json",
-         responseSchema: schema,
-       },
-     });
-
-     const prompt = "Extract the structured data from the following PDF file";
-
-     const result = await model.generateContent([
-       prompt,
-       {
-         inlineData: {
-           mimeType: "application/pdf",
-           data: base64,
-         },
-       },
-     ]);
-
-     const response = await result.response;
-     const extractedData = JSON.parse(response.text());
-
-     return NextResponse.json(extractedData);
-   } catch (error) {
-     console.error("Error extracting data:", error);
-     return NextResponse.json(
-       {
-         error:
-           "Failed to extract data, open a thread in discussions, could be be a rate limit issue.s",
-       },
-       { status: 500 }
-     );
-   }
- }
app/api/image/route.ts ADDED
@@ -0,0 +1,184 @@
+ import { NextRequest, NextResponse } from "next/server";
+ import { GoogleGenerativeAI } from "@google/generative-ai";
+ import { HistoryItem, HistoryPart } from "@/lib/types";
+
+ // Initialize the Google Gen AI client with your API key
+ const GEMINI_API_KEY = process.env.GEMINI_API_KEY || "";
+ const genAI = new GoogleGenerativeAI(GEMINI_API_KEY);
+
+ // Define the model ID for Gemini 2.0 Flash experimental
+ const MODEL_ID = "gemini-2.0-flash-exp";
+
+ // Define interface for the formatted history item
+ interface FormattedHistoryItem {
+   role: "user" | "model";
+   parts: Array<{
+     text?: string;
+     inlineData?: { data: string; mimeType: string };
+   }>;
+ }
+
+ export async function POST(req: NextRequest) {
+   try {
+     // Parse JSON request instead of FormData
+     const requestData = await req.json();
+     const { prompt, image: inputImage, history } = requestData;
+
+     if (!prompt) {
+       return NextResponse.json(
+         { error: "Prompt is required" },
+         { status: 400 }
+       );
+     }
+
+     // Get the model with the correct configuration
+     const model = genAI.getGenerativeModel({
+       model: MODEL_ID,
+       generationConfig: {
+         temperature: 1,
+         topP: 0.95,
+         topK: 40,
+         // @ts-expect-error - Gemini API JS is missing this type
+         responseModalities: ["Text", "Image"],
+       },
+     });
+
+     let result;
+
+     try {
+       // Convert history to the format expected by Gemini API
+       const formattedHistory =
+         history && history.length > 0
+           ? history
+               .map((item: HistoryItem) => {
+                 return {
+                   role: item.role,
+                   parts: item.parts
+                     .map((part: HistoryPart) => {
+                       if (part.text) {
+                         return { text: part.text };
+                       }
+                       if (part.image && item.role === "user") {
+                         const imgParts = part.image.split(",");
+                         if (imgParts.length > 1) {
+                           return {
+                             inlineData: {
+                               data: imgParts[1],
+                               mimeType: part.image.includes("image/png")
+                                 ? "image/png"
+                                 : "image/jpeg",
+                             },
+                           };
+                         }
+                       }
+                       return { text: "" };
+                     })
+                     .filter((part) => Object.keys(part).length > 0), // Remove empty parts
+                 };
+               })
+               .filter((item: FormattedHistoryItem) => item.parts.length > 0) // Remove items with no parts
+           : [];
+
+       // Create a chat session with the formatted history
+       const chat = model.startChat({
+         history: formattedHistory,
+       });
+
+       // Prepare the current message parts
+       const messageParts = [];
+
+       // Add the text prompt
+       messageParts.push({ text: prompt });
+
+       // Add the image if provided
+       if (inputImage) {
+         // For image editing
+         console.log("Processing image edit request");
+
+         // Check if the image is a valid data URL
+         if (!inputImage.startsWith("data:")) {
+           throw new Error("Invalid image data URL format");
+         }
+
+         const imageParts = inputImage.split(",");
+         if (imageParts.length < 2) {
+           throw new Error("Invalid image data URL format");
+         }
+
+         const base64Image = imageParts[1];
+         const mimeType = inputImage.includes("image/png")
+           ? "image/png"
+           : "image/jpeg";
+         console.log(
+           "Base64 image length:",
+           base64Image.length,
+           "MIME type:",
+           mimeType
+         );
+
+         // Add the image to message parts
+         messageParts.push({
+           inlineData: {
+             data: base64Image,
+             mimeType: mimeType,
+           },
+         });
+       }
+
+       // Send the message to the chat
+       console.log("Sending message with", messageParts.length, "parts");
+       result = await chat.sendMessage(messageParts);
+     } catch (error) {
+       console.error("Error in chat.sendMessage:", error);
+       throw error;
+     }
+
+     const response = result.response;
+
+     let textResponse = null;
+     let imageData = null;
+     let mimeType = "image/png";
+
+     // Process the response
+     if (response.candidates && response.candidates.length > 0) {
+       const parts = response.candidates[0].content.parts;
+       console.log("Number of parts in response:", parts.length);
+
+       for (const part of parts) {
+         if ("inlineData" in part && part.inlineData) {
+           // Get the image data
+           imageData = part.inlineData.data;
+           mimeType = part.inlineData.mimeType || "image/png";
+           console.log(
+             "Image data received, length:",
+             imageData.length,
+             "MIME type:",
+             mimeType
+           );
+         } else if ("text" in part && part.text) {
+           // Store the text
+           textResponse = part.text;
+           console.log(
+             "Text response received:",
+             textResponse.substring(0, 50) + "..."
+           );
+         }
+       }
+     }
+
+     // Return just the base64 image and description as JSON
+     return NextResponse.json({
+       image: imageData ? `data:${mimeType};base64,${imageData}` : null,
+       description: textResponse,
+     });
+   } catch (error) {
+     console.error("Error generating image:", error);
+     return NextResponse.json(
+       {
+         error: "Failed to generate image",
+         details: error instanceof Error ? error.message : String(error),
+       },
+       { status: 500 }
+     );
+   }
+ }
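Two details of the route above may be worth a second look. First, it infers the MIME type by checking whether the data URL contains `image/png` and otherwise assumes JPEG, so a WebP or GIF upload would be mislabeled. Second, the fallback `return { text: "" }` still has one key, so the `Object.keys(part).length > 0` filter never drops it. The sketch below shows one possible alternative, not the author's implementation: `parseDataUrl` and `toApiParts` are hypothetical helper names, and the `HistoryPart` shape is assumed from how the route uses it.

```typescript
interface InlineData {
  data: string;
  mimeType: string;
}

// Assumed shape, inferred from the route's use of `part.text` and `part.image`.
interface HistoryPart {
  text?: string;
  image?: string;
}

type ApiPart = { text: string } | { inlineData: InlineData };

// Hypothetical helper: split "data:<mime>;base64,<payload>" into its pieces.
// Returns null for anything that is not a base64 data URL.
function parseDataUrl(dataUrl: string): InlineData | null {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (!match) {
    return null;
  }
  return { mimeType: match[1], data: match[2] };
}

// Hypothetical helper: convert stored history parts to API parts, skipping
// anything empty instead of emitting a placeholder { text: "" } part.
function toApiParts(role: "user" | "model", parts: HistoryPart[]): ApiPart[] {
  const out: ApiPart[] = [];
  for (const part of parts) {
    if (part.text) {
      out.push({ text: part.text });
      continue;
    }
    // Like the route above, only forward images that came from the user.
    if (part.image && role === "user") {
      const inline = parseDataUrl(part.image);
      if (inline) {
        out.push({ inlineData: inline });
      }
    }
  }
  return out;
}
```

With helpers like these, a history item whose parts all turn out empty yields an empty array, so the outer `item.parts.length > 0` filter in the route would actually remove it.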
app/api/schema/route.ts DELETED
@@ -1,147 +0,0 @@
1
- import { NextResponse } from "next/server";
2
- import { GoogleGenerativeAI } from "@google/generative-ai";
3
-
4
- const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
5
- const MODEL_ID = "gemini-2.0-flash";
6
-
7
- const META_PROMPT = `
8
- You are a JSON Schema expert. Your task is to create JSON schema baed on the user input. The schema will be used for extra data.
9
-
10
- You must also make sure:
11
- - All fields in an object are set as required
12
- - All objects must have properties defined
13
- - Order matters! If the values are dependent or would require additional information, make sure to include the additional information in the description. Same counts for "reasoning" or "thinking" should come before the conclusion.
14
- - $defs must be defined under the schema param
15
- - Return only the schema JSON not more, use \`\`\`json to start and \`\`\` to end the JSON schema
16
-
17
- Restrictions:
18
- - You cannot use examples, if you think examples are helpful include them in the description.
19
- - You cannot use default values, If you think default are helpful include them in the description.
20
- - Top level cannot have a "title" property only "description"
21
- - You cannot use $defs, directly in the schema, don't use any $defs and $ref in the schema. Directly define the schema in the properties.
22
- - Never include a $schema
23
- - The "type" needs to be a single value, no arrays
24
-
25
- Guidelines:
26
- - If the user prompt is short define a single object schema and fields based on your knowledge.
27
- - If the user prompt is in detail about the data only use the data in the schema. Don't add more fields than the user asked for.
28
-
29
- Examples:
30
-
31
- Input: Cookie Recipes
32
- Output: \`\`\`json
33
- {
34
- "description": "Schema for a cookie recipe, including ingredients and quantities. The 'ingredients' array lists each ingredient along with its corresponding quantity and unit of measurement. The 'instructions' array provides a step-by-step guide to preparing the cookies. The order of instructions is important.",
35
- "type": "object",
36
- "properties": {
37
- "name": {
38
- "type": "string",
39
- "description": "The name of the cookie recipe."
40
- },
41
- "description": {
42
- "type": "string",
43
- "description": "A short description of the cookie, including taste and textures."
44
- },
45
- "ingredients": {
46
- "type": "array",
47
- "description": "A list of ingredients required for the recipe.",
48
- "items": {
49
- "type": "object",
50
- "description": "An ingredient with its quantity and unit.",
51
- "properties": {
52
- "name": {
53
- "type": "string",
54
- "description": "The name of the ingredient (e.g., flour, sugar, butter)."
55
- },
56
- "quantity": {
57
- "type": "number",
58
- "description": "The amount of the ingredient needed."
59
- },
60
- "unit": {
61
- "type": "string",
62
- "description": "The unit of measurement for the ingredient (e.g., cups, grams, teaspoons). Use abbreviations like 'tsp' for teaspoon and 'tbsp' for tablespoon."
63
- }
64
- },
65
- "required": [
66
- "name",
67
- "quantity",
68
- "unit"
69
- ]
70
- }
71
- },
72
- "instructions": {
73
- "type": "array",
74
- "description": "A sequence of steps to prepare the cookie recipe. The order of instructions matters.",
75
- "items": {
76
- "type": "string",
77
- "description": "A single instruction step."
78
- }
79
- }
80
- },
81
- "required": [
82
- "name",
83
- "description",
84
- "ingredients",
85
- "instructions"
86
- ]
87
- }
88
- \`\`\`
89
-
90
- Input: Book with title, author, and publication year.
91
- Output: \`\`\`json
92
- {
93
- "type": "object",
94
- "properties": {
95
- "title": {
96
- "type": "string",
97
- "description": "The title of the book."
98
- },
99
- "author": {
100
- "type": "string",
101
- "description": "The author of the book."
102
- },
103
- "publicationYear": {
104
- "type": "integer",
105
- "description": "The year the book was published."
106
- }
107
- },
108
- "required": [
109
- "title",
110
- "author",
111
- "publicationYear"
112
- ],
113
- }
114
- \`\`\`
115
-
116
- Input: {USER_PROMPT}`.trim();
117
-
118
- export async function POST(request: Request) {
119
- try {
120
- // Get the prompt from the request body
121
- const { prompt } = await request.json();
122
- // Get the model
123
- const model = genAI.getGenerativeModel({ model: MODEL_ID });
124
- // Generate the content
125
- const result = await model.generateContent(
126
- META_PROMPT.replace("{USER_PROMPT}", prompt)
127
- );
128
- // Get the response
129
- const response = await result.response;
130
- // Remove markdown code block markers if present
131
- const jsonString = response
132
- .text()
133
- .replace(/^```json\n?/, "")
134
- .replace(/\n?```$/, "");
135
- // Return the schema
136
- return NextResponse.json({ schema: JSON.parse(jsonString) });
137
- } catch (error) {
138
- console.error("Error generating schema:", error);
139
- return NextResponse.json(
140
- {
141
- error:
142
- "Failed to generate schema, open a thread in discussions, could be be a rate limit issue.",
143
- },
144
- { status: 500 }
145
- );
146
- }
147
- }
app/globals.css CHANGED
@@ -80,4 +80,5 @@ body {
80
 
81
  h1,h2,h3,h4,h5,h6 {
82
  @apply text-foreground dark:text-foreground;
83
- }
 
 
80
 
81
  h1,h2,h3,h4,h5,h6 {
82
  @apply text-foreground dark:text-foreground;
83
+ }
84
+
app/layout.tsx CHANGED
@@ -11,8 +11,8 @@ const openSans = Open_Sans({
11
  });
12
 
13
  export const metadata: Metadata = {
14
- title: "PDF Extractor",
15
- description: "Extract data from PDFs using Google DeepMind Gemini 2.0",
16
  };
17
 
18
  export const viewport: Viewport = {
 
11
  });
12
 
13
  export const metadata: Metadata = {
14
+ title: "Image Editor",
15
+ description: "Edit images using Google DeepMind Gemini 2.0",
16
  };
17
 
18
  export const viewport: Viewport = {
app/page.tsx CHANGED
@@ -1,49 +1,84 @@
1
  "use client";
2
  import { useState } from "react";
3
- import { FileUpload } from "@/components/FileUpload";
4
- import { PromptInput } from "@/components/PromptInput";
5
- import { ResultDisplay } from "@/components/ResultDisplay";
6
- import { FileIcon, FileText } from "lucide-react";
7
  import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
 
8
 
9
  export default function Home() {
10
- const [schema, setSchema] = useState<string | null>(null);
11
- const [file, setFile] = useState<File | null>(null);
12
- const [result, setResult] = useState<string | null>(null);
13
  const [loading, setLoading] = useState(false);
 
 
14
 
15
- const handleFileSelect = (selectedFile: File) => {
16
- setFile(selectedFile);
17
  };
18
 
19
  const handlePromptSubmit = async (prompt: string) => {
20
  try {
21
  setLoading(true);
22
- // First, get the JSON schema
23
- const schemaResponse = await fetch("/api/schema", {
 
 
 
 
 
 
 
 
 
 
 
24
  method: "POST",
25
  headers: {
26
  "Content-Type": "application/json",
27
  },
28
- body: JSON.stringify({ prompt }),
29
  });
30
 
31
- const { schema } = await schemaResponse.json();
 
 
 
32
 
33
- setSchema(schema);
34
- // Then, process the PDF with the schema
35
- const formData = new FormData();
36
- formData.append("file", file!);
37
- formData.append("schema", JSON.stringify(schema));
38
 
39
- const extractResponse = await fetch("/api/extract", {
40
- method: "POST",
41
- body: formData,
42
- });
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
43
 
44
- const data = await extractResponse.json();
45
- setResult(data);
 
 
 
46
  } catch (error) {
 
47
  console.error("Error processing request:", error);
48
  } finally {
49
  setLoading(false);
@@ -51,46 +86,76 @@ export default function Home() {
51
  };
52
 
53
  const handleReset = () => {
54
- setFile(null);
55
- setResult(null);
56
- setSchema(null);
57
  setLoading(false);
 
 
58
  };
59
 
 
 
 
 
 
 
 
60
  return (
61
  <main className="min-h-screen flex items-center justify-center bg-background p-8">
62
- <Card className="w-full max-w-2xl border-0 bg-card shadow-none">
63
  <CardHeader className="flex flex-col items-center justify-center space-y-2">
64
  <CardTitle className="flex items-center gap-2 text-foreground">
65
- <FileText className="w-8 h-8 text-primary" />
66
- PDF to Structured Data
67
  </CardTitle>
68
  <span className="text-sm font-mono text-muted-foreground">
69
  powered by Google DeepMind Gemini 2.0 Flash
70
  </span>
71
  </CardHeader>
72
  <CardContent className="space-y-6 pt-6 w-full">
73
- {!result && !loading ? (
 
 
 
 
 
 
74
  <>
75
- <FileUpload onFileSelect={handleFileSelect} />
76
- <PromptInput onSubmit={handlePromptSubmit} file={file} />
 
 
 
 
 
 
 
77
  </>
78
  ) : loading ? (
79
  <div
80
  role="status"
81
  className="flex items-center mx-auto justify-center h-56 max-w-sm bg-gray-300 rounded-lg animate-pulse dark:bg-secondary"
82
  >
83
- <FileIcon className="w-10 h-10 text-gray-200 dark:text-muted-foreground" />
84
  <span className="pl-4 font-mono font-xs text-muted-foreground">
85
  Processing...
86
  </span>
87
  </div>
88
  ) : (
89
- <ResultDisplay
90
- result={result || ""}
91
- schema={schema || ""}
92
- onReset={handleReset}
93
- />
 
 
 
 
 
 
 
 
94
  )}
95
  </CardContent>
96
  </Card>
 
1
  "use client";
2
  import { useState } from "react";
3
+ import { ImageUpload } from "@/components/ImageUpload";
4
+ import { ImagePromptInput } from "@/components/ImagePromptInput";
5
+ import { ImageResultDisplay } from "@/components/ImageResultDisplay";
6
+ import { ImageIcon, Wand2 } from "lucide-react";
7
  import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
8
+ import { HistoryItem } from "@/lib/types";
9
 
10
  export default function Home() {
11
+ const [image, setImage] = useState<string | null>(null);
12
+ const [generatedImage, setGeneratedImage] = useState<string | null>(null);
13
+ const [description, setDescription] = useState<string | null>(null);
14
  const [loading, setLoading] = useState(false);
15
+ const [error, setError] = useState<string | null>(null);
16
+ const [history, setHistory] = useState<HistoryItem[]>([]);
17
 
18
+ const handleImageSelect = (imageData: string) => {
19
+ setImage(imageData || null);
20
  };
21
 
22
  const handlePromptSubmit = async (prompt: string) => {
23
  try {
24
  setLoading(true);
25
+ setError(null);
26
+
27
+ // If we have a generated image, use that for editing, otherwise use the uploaded image
28
+ const imageToEdit = generatedImage || image;
29
+
30
+ // Prepare the request data as JSON
31
+ const requestData = {
32
+ prompt,
33
+ image: imageToEdit,
34
+ history: history.length > 0 ? history : undefined,
35
+ };
36
+
37
+ const response = await fetch("/api/image", {
38
  method: "POST",
39
  headers: {
40
  "Content-Type": "application/json",
41
  },
42
+ body: JSON.stringify(requestData),
43
  });
44
 
45
+ if (!response.ok) {
46
+ const errorData = await response.json();
47
+ throw new Error(errorData.error || "Failed to generate image");
48
+ }
49
 
50
+ const data = await response.json();
 
 
 
 
51
 
52
+ if (data.image) {
53
+ // Update the generated image and description
54
+ setGeneratedImage(data.image);
55
+ setDescription(data.description || null);
56
+
57
+ // Update history locally - add user message
58
+ const userMessage: HistoryItem = {
59
+ role: "user",
60
+ parts: [
61
+ { text: prompt },
62
+ ...(imageToEdit ? [{ image: imageToEdit }] : []),
63
+ ],
64
+ };
65
+
66
+ // Add AI response
67
+ const aiResponse: HistoryItem = {
68
+ role: "model",
69
+ parts: [
70
+ ...(data.description ? [{ text: data.description }] : []),
71
+ ...(data.image ? [{ image: data.image }] : []),
72
+ ],
73
+ };
74
 
75
+ // Update history with both messages
76
+ setHistory((prevHistory) => [...prevHistory, userMessage, aiResponse]);
77
+ } else {
78
+ setError("No image returned from API");
79
+ }
80
  } catch (error) {
81
+ setError(error instanceof Error ? error.message : "An error occurred");
82
  console.error("Error processing request:", error);
83
  } finally {
84
  setLoading(false);
 
86
  };
87
 
88
  const handleReset = () => {
89
+ setImage(null);
90
+ setGeneratedImage(null);
91
+ setDescription(null);
92
  setLoading(false);
93
+ setError(null);
94
+ setHistory([]);
95
  };
96
 
97
+ // If we have a generated image, we want to edit it next time
98
+ const currentImage = generatedImage || image;
99
+ const isEditing = !!currentImage;
100
+
101
+ // Get the latest image to display (always the generated image)
102
+ const displayImage = generatedImage;
103
+
104
  return (
105
  <main className="min-h-screen flex items-center justify-center bg-background p-8">
106
+ <Card className="w-full max-w-4xl border-0 bg-card shadow-none">
107
  <CardHeader className="flex flex-col items-center justify-center space-y-2">
108
  <CardTitle className="flex items-center gap-2 text-foreground">
109
+ <Wand2 className="w-8 h-8 text-primary" />
110
+ Image Creation & Editing
111
  </CardTitle>
112
  <span className="text-sm font-mono text-muted-foreground">
113
  powered by Google DeepMind Gemini 2.0 Flash
114
  </span>
115
  </CardHeader>
116
  <CardContent className="space-y-6 pt-6 w-full">
117
+ {error && (
118
+ <div className="p-4 mb-4 text-sm text-red-700 bg-red-100 rounded-lg">
119
+ {error}
120
+ </div>
121
+ )}
122
+
123
+ {!displayImage && !loading ? (
124
  <>
125
+ <ImageUpload
126
+ onImageSelect={handleImageSelect}
127
+ currentImage={currentImage}
128
+ />
129
+ <ImagePromptInput
130
+ onSubmit={handlePromptSubmit}
131
+ isEditing={isEditing}
132
+ isLoading={loading}
133
+ />
134
  </>
135
  ) : loading ? (
136
  <div
137
  role="status"
138
  className="flex items-center mx-auto justify-center h-56 max-w-sm bg-gray-300 rounded-lg animate-pulse dark:bg-secondary"
139
  >
140
+ <ImageIcon className="w-10 h-10 text-gray-200 dark:text-muted-foreground" />
141
  <span className="pl-4 font-mono font-xs text-muted-foreground">
142
  Processing...
143
  </span>
144
  </div>
145
  ) : (
146
+ <>
147
+ <ImageResultDisplay
148
+ imageUrl={displayImage || ""}
149
+ description={description}
150
+ onReset={handleReset}
151
+ conversationHistory={history}
152
+ />
153
+ <ImagePromptInput
154
+ onSubmit={handlePromptSubmit}
155
+ isEditing={true}
156
+ isLoading={loading}
157
+ />
158
+ </>
159
  )}
160
  </CardContent>
161
  </Card>
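The history bookkeeping inside `handlePromptSubmit` can be read as a pure function, which may be easier to test than the inline version. The following is a minimal sketch, not the code from this commit: `appendTurn` and `ImageApiResult` are hypothetical names, and the response shape is only inferred from how the page reads `data.image` and `data.description`.

```typescript
// Types mirror lib/types.ts from this commit.
interface HistoryPart {
  text?: string;
  image?: string;
}
interface HistoryItem {
  role: "user" | "model";
  parts: HistoryPart[];
}

// Assumed shape of the JSON returned by /api/image (inferred, not confirmed).
interface ImageApiResult {
  image?: string;
  description?: string;
}

// Returns a new history array with the user turn and the model turn appended.
// If the response carries no image, the history is returned unchanged,
// matching the "No image returned from API" branch in the page component.
function appendTurn(
  history: HistoryItem[],
  prompt: string,
  imageToEdit: string | null,
  result: ImageApiResult
): HistoryItem[] {
  if (!result.image) return history;

  const userMessage: HistoryItem = {
    role: "user",
    parts: [{ text: prompt }, ...(imageToEdit ? [{ image: imageToEdit }] : [])],
  };

  const aiResponse: HistoryItem = {
    role: "model",
    parts: [
      ...(result.description ? [{ text: result.description }] : []),
      { image: result.image },
    ],
  };

  return [...history, userMessage, aiResponse];
}
```

With a helper like this, the state update would reduce to `setHistory((prev) => appendTurn(prev, prompt, imageToEdit, data))`.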
components/{PromptInput.tsx → ImagePromptInput.tsx} RENAMED
@@ -3,19 +3,25 @@
3
  import { useState } from "react";
4
  import { Button } from "@/components/ui/button";
5
  import { Wand2 } from "lucide-react";
6
- import { Textarea } from "@/components/ui/textarea";
7
- interface PromptInputProps {
 
8
  onSubmit: (prompt: string) => void;
9
- file: File | null;
 
10
  }
11
 
12
- export function PromptInput({ onSubmit, file }: PromptInputProps) {
 
 
 
 
13
  const [prompt, setPrompt] = useState("");
14
 
15
- const handleSubmit = (e: React.FormEvent) => {
16
- e.preventDefault();
17
  if (prompt.trim()) {
18
  onSubmit(prompt.trim());
 
19
  }
20
  };
21
 
@@ -23,26 +29,31 @@ export function PromptInput({ onSubmit, file }: PromptInputProps) {
23
  <form onSubmit={handleSubmit} className="space-y-4 rounded-lg">
24
  <div className="space-y-2">
25
  <p className="text-sm font-medium text-foreground">
26
- Describe the structure and type of data you want to extract from the
27
- PDF.
 
28
  </p>
29
  </div>
30
 
31
- <Textarea
32
  id="prompt"
33
- className="min-h-[100px] border-secondary resize-none "
34
- placeholder="Example: Extract all invoice details including invoice number, date, items, prices, and total amount..."
 
 
 
 
35
  value={prompt}
36
  onChange={(e) => setPrompt(e.target.value)}
37
  />
38
 
39
  <Button
40
  type="submit"
41
- disabled={!prompt.trim() || file === null}
42
  className="w-full bg-primary hover:bg-primary/90"
43
  >
44
  <Wand2 className="w-4 h-4 mr-2" />
45
- Extract Data
46
  </Button>
47
  </form>
48
  );
 
3
  import { useState } from "react";
4
  import { Button } from "@/components/ui/button";
5
  import { Wand2 } from "lucide-react";
6
+ import { Input } from "./ui/input";
7
+
8
+ interface ImagePromptInputProps {
9
  onSubmit: (prompt: string) => void;
10
+ isEditing: boolean;
11
+ isLoading: boolean;
12
  }
13
 
14
+ export function ImagePromptInput({
15
+ onSubmit,
16
+ isEditing,
17
+ isLoading,
18
+ }: ImagePromptInputProps) {
19
  const [prompt, setPrompt] = useState("");
20
 
21
+ const handleSubmit = (e: React.FormEvent) => {
+ e.preventDefault();
22
  if (prompt.trim()) {
23
  onSubmit(prompt.trim());
24
+ setPrompt("");
25
  }
26
  };
27
 
 
29
  <form onSubmit={handleSubmit} className="space-y-4 rounded-lg">
30
  <div className="space-y-2">
31
  <p className="text-sm font-medium text-foreground">
32
+ {isEditing
33
+ ? "Describe how you want to edit the image"
34
+ : "Describe the image you want to generate"}
35
  </p>
36
  </div>
37
 
38
+ <Input
39
  id="prompt"
40
+ className="border-secondary resize-none"
41
+ placeholder={
42
+ isEditing
43
+ ? "Example: Make the background blue and add a rainbow..."
44
+ : "Example: A 3D rendered image of a pig with wings and a top hat flying over a futuristic city..."
45
+ }
46
  value={prompt}
47
  onChange={(e) => setPrompt(e.target.value)}
48
  />
49
 
50
  <Button
51
  type="submit"
52
+ disabled={!prompt.trim() || isLoading}
53
  className="w-full bg-primary hover:bg-primary/90"
54
  >
55
  <Wand2 className="w-4 h-4 mr-2" />
56
+ {isEditing ? "Edit Image" : "Generate Image"}
57
  </Button>
58
  </form>
59
  );
components/ImageResultDisplay.tsx ADDED
@@ -0,0 +1,111 @@
1
+ "use client";
2
+
3
+ import { Button } from "@/components/ui/button";
4
+ import { Download, RotateCcw, MessageCircle } from "lucide-react";
5
+ import { useState } from "react";
6
+ import { HistoryItem, HistoryPart } from "@/lib/types";
7
+ import Image from "next/image";
8
+
9
+ interface ImageResultDisplayProps {
10
+ imageUrl: string;
11
+ description: string | null;
12
+ onReset: () => void;
13
+ conversationHistory?: HistoryItem[];
14
+ }
15
+
16
+ export function ImageResultDisplay({
17
+ imageUrl,
18
+ description,
19
+ onReset,
20
+ conversationHistory = [],
21
+ }: ImageResultDisplayProps) {
22
+ const [showHistory, setShowHistory] = useState(false);
23
+
24
+ const handleDownload = () => {
25
+ // Create a temporary link element
26
+ const link = document.createElement("a");
27
+ link.href = imageUrl;
28
+ link.download = `gemini-image-${Date.now()}.png`;
29
+ document.body.appendChild(link);
30
+ link.click();
31
+ document.body.removeChild(link);
32
+ };
33
+
34
+ const toggleHistory = () => {
35
+ setShowHistory(!showHistory);
36
+ };
37
+
38
+ return (
39
+ <div className="space-y-4">
40
+ <div className="flex items-center justify-between">
41
+ <h2 className="text-xl font-semibold">Generated Image</h2>
42
+ <div className="space-x-2">
43
+ <Button variant="outline" size="sm" onClick={handleDownload}>
44
+ <Download className="w-4 h-4 mr-2" />
45
+ Download
46
+ </Button>
47
+ {conversationHistory.length > 0 && (
48
+ <Button variant="outline" size="sm" onClick={toggleHistory}>
49
+ <MessageCircle className="w-4 h-4 mr-2" />
50
+ {showHistory ? "Hide History" : "Show History"}
51
+ </Button>
52
+ )}
53
+ <Button variant="outline" size="sm" onClick={onReset}>
54
+ <RotateCcw className="w-4 h-4 mr-2" />
55
+ Create New Image
56
+ </Button>
57
+ </div>
58
+ </div>
59
+
60
+ <div className="rounded-lg overflow-hidden bg-muted p-2">
61
+ <Image
62
+ src={imageUrl}
63
+ alt="Generated"
64
+ className="max-w-[640px] h-auto mx-auto"
65
+ />
66
+ </div>
67
+
68
+ {description && (
69
+ <div className="p-4 rounded-lg bg-muted">
70
+ <h3 className="text-sm font-medium mb-2">Description</h3>
71
+ <p className="text-sm text-muted-foreground">{description}</p>
72
+ </div>
73
+ )}
74
+
75
+ {showHistory && conversationHistory.length > 0 && (
76
+ <div className="p-4 rounded-lg">
77
+ <h3 className="text-sm font-medium mb-4">Conversation History</h3>
78
+ <div className="space-y-4">
79
+ {conversationHistory.map((item, index) => (
80
+ <div key={index} className={`p-3 rounded-lg bg-secondary`}>
81
+ <p
82
+ className={`text-sm font-medium mb-2 ${
83
+ item.role === "user" ? "text-foreground" : "text-primary"
84
+ }`}
85
+ >
86
+ {item.role === "user" ? "You" : "Gemini"}
87
+ </p>
88
+ <div className="space-y-2">
89
+ {item.parts.map((part: HistoryPart, partIndex) => (
90
+ <div key={partIndex}>
91
+ {part.text && <p className="text-sm">{part.text}</p>}
92
+ {part.image && (
93
+ <div className="mt-2 overflow-hidden rounded-md">
94
+ <Image
95
+ src={part.image}
96
+ alt={`${item.role} image`}
97
+ className="max-w-64 h-auto object-contain"
98
+ />
99
+ </div>
100
+ )}
101
+ </div>
102
+ ))}
103
+ </div>
104
+ </div>
105
+ ))}
106
+ </div>
107
+ </div>
108
+ )}
109
+ </div>
110
+ );
111
+ }
components/{FileUpload.tsx → ImageUpload.tsx} RENAMED
@@ -1,13 +1,13 @@
1
  "use client";
2
 
3
- import { useCallback, useState } from "react";
4
  import { useDropzone } from "react-dropzone";
5
  import { Button } from "./ui/button";
6
- import { Upload as UploadIcon, File as FileIcon, X } from "lucide-react";
7
- import PdfViewer from "./PdfViewer";
8
-
9
- interface FileUploadProps {
10
- onFileSelect: (file: File) => void;
11
  }
12
 
13
  export function formatFileSize(bytes: number): string {
@@ -20,32 +20,58 @@ export function formatFileSize(bytes: number): string {
20
  );
21
  }
22
 
23
- export function FileUpload({ onFileSelect }: FileUploadProps) {
24
  const [selectedFile, setSelectedFile] = useState<File | null>(null);
25
- const [file, setFile] = useState<File | null>(null);
 
 
 
 
 
 
26
 
27
  const onDrop = useCallback(
28
  (acceptedFiles: File[]) => {
29
  const file = acceptedFiles[0];
 
 
30
  setSelectedFile(file);
31
- onFileSelect(file);
32
- setFile(file);
33
  },
34
- [onFileSelect]
35
  );
36
 
37
  const { getRootProps, getInputProps, isDragActive } = useDropzone({
38
  onDrop,
39
  accept: {
40
- "application/pdf": [".pdf"],
 
41
  },
42
- maxSize: 100 * 1024 * 1024, // 100MB
43
  multiple: false,
44
  });
45
 
 
 
 
 
 
46
  return (
47
- <div className={`"w-full min-h-[150px] `}>
48
- {!selectedFile ? (
49
  <div
50
  {...getRootProps()}
51
  className={`min-h-[150px] p-4 rounded-lg
@@ -60,36 +86,45 @@ export function FileUpload({ onFileSelect }: FileUploadProps) {
60
  <UploadIcon className="w-8 h-8 text-primary mr-3 flex-shrink-0" />
61
  <div className="">
62
  <p className="text-sm font-medium text-foreground">
63
- Drop your PDF here or click to browse
64
  </p>
65
  <p className="text-xs text-muted-foreground">
66
- Maximum file size: 100MB
67
  </p>
68
  </div>
69
  </div>
70
  </div>
71
  ) : (
72
- <div className="flex my-auto flex-row items-center p-4 rounded-lg bg-secondary">
73
- <FileIcon className="w-8 h-8 text-primary mr-3 flex-shrink-0" />
74
- <div className="flex-grow min-w-0">
75
- <p className="text-sm font-medium truncate text-foreground">
76
- {selectedFile?.name}
77
- </p>
78
- <p className="text-xs text-muted-foreground">
79
- {formatFileSize(selectedFile?.size ?? 0)}
80
- </p>
81
  </div>
82
- {file && <PdfViewer file={file} />}
83
-
84
- <Button
85
- variant="ghost"
86
- size="icon"
87
- onClick={() => setSelectedFile(null)}
88
- className="flex-shrink-0 ml-2"
89
- >
90
- <X className="w-4 h-4" />
91
- <span className="sr-only">Remove file</span>
92
- </Button>
93
  </div>
94
  )}
95
  </div>
 
1
  "use client";
2
 
3
+ import { useCallback, useState, useEffect } from "react";
4
  import { useDropzone } from "react-dropzone";
5
  import { Button } from "./ui/button";
6
+ import { Upload as UploadIcon, Image as ImageIcon, X } from "lucide-react";
7
+ import Image from "next/image";
8
+ interface ImageUploadProps {
9
+ onImageSelect: (imageData: string) => void;
10
+ currentImage: string | null;
11
  }
12
 
13
  export function formatFileSize(bytes: number): string {
 
20
  );
21
  }
22
 
23
+ export function ImageUpload({ onImageSelect, currentImage }: ImageUploadProps) {
24
  const [selectedFile, setSelectedFile] = useState<File | null>(null);
25
+
26
+ // Update the selected file when the current image changes
27
+ useEffect(() => {
28
+ if (!currentImage) {
29
+ setSelectedFile(null);
30
+ }
31
+ }, [currentImage]);
32
 
33
  const onDrop = useCallback(
34
  (acceptedFiles: File[]) => {
35
  const file = acceptedFiles[0];
36
+ if (!file) return;
37
+
38
  setSelectedFile(file);
39
+
40
+ // Convert the file to base64
41
+ const reader = new FileReader();
42
+ reader.onload = (event) => {
43
+ if (event.target && event.target.result) {
44
+ const result = event.target.result as string;
45
+ console.log("Image loaded, length:", result.length);
46
+ onImageSelect(result);
47
+ }
48
+ };
49
+ reader.onerror = (error) => {
50
+ console.error("Error reading file:", error);
51
+ };
52
+ reader.readAsDataURL(file);
53
  },
54
+ [onImageSelect]
55
  );
56
 
57
  const { getRootProps, getInputProps, isDragActive } = useDropzone({
58
  onDrop,
59
  accept: {
60
+ "image/png": [".png"],
61
+ "image/jpeg": [".jpg", ".jpeg"],
62
  },
63
+ maxSize: 10 * 1024 * 1024, // 10MB
64
  multiple: false,
65
  });
66
 
67
+ const handleRemove = () => {
68
+ setSelectedFile(null);
69
+ onImageSelect("");
70
+ };
71
+
72
  return (
73
+ <div className="w-full">
74
+ {!currentImage ? (
75
  <div
76
  {...getRootProps()}
77
  className={`min-h-[150px] p-4 rounded-lg
 
86
  <UploadIcon className="w-8 h-8 text-primary mr-3 flex-shrink-0" />
87
  <div className="">
88
  <p className="text-sm font-medium text-foreground">
89
+ Drop your image here or click to browse
90
  </p>
91
  <p className="text-xs text-muted-foreground">
92
+ Maximum file size: 10MB
93
  </p>
94
  </div>
95
  </div>
96
  </div>
97
  ) : (
98
+ <div className="flex flex-col items-center p-4 rounded-lg bg-secondary">
99
+ <div className="flex w-full items-center mb-4">
100
+ <ImageIcon className="w-8 h-8 text-primary mr-3 flex-shrink-0" />
101
+ <div className="flex-grow min-w-0">
102
+ <p className="text-sm font-medium truncate text-foreground">
103
+ {selectedFile?.name || "Current Image"}
104
+ </p>
105
+ {selectedFile && (
106
+ <p className="text-xs text-muted-foreground">
107
+ {formatFileSize(selectedFile?.size ?? 0)}
108
+ </p>
109
+ )}
110
+ </div>
111
+ <Button
112
+ variant="ghost"
113
+ size="icon"
114
+ onClick={handleRemove}
115
+ className="flex-shrink-0 ml-2"
116
+ >
117
+ <X className="w-4 h-4" />
118
+ <span className="sr-only">Remove image</span>
119
+ </Button>
120
+ </div>
121
+ <div className="w-full overflow-hidden rounded-md">
122
+ <Image
123
+ src={currentImage}
124
+ alt="Selected"
125
+ className="w-full h-auto object-contain"
126
+ />
127
  </div>
 
 
 
 
 
 
 
 
 
 
 
128
  </div>
129
  )}
130
  </div>
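In `ImageUpload`, files that fail the dropzone `accept` or `maxSize` checks never reach `onDrop` as accepted files, so the `if (!file) return;` guard exits without telling the user why. A minimal sketch of an explicit check that produces a message is shown below. `validateImageFile`, `FileLike`, and the message strings are hypothetical and not part of this commit; the limits simply restate the dropzone configuration above.

```typescript
// Accepted MIME types and size limit, restating the dropzone config in ImageUpload.
const ACCEPTED_TYPES = ["image/png", "image/jpeg"] as const;
const MAX_SIZE_BYTES = 10 * 1024 * 1024; // 10MB

// Only the fields needed for validation, so the function works with a browser
// File object or with a plain object in tests.
type FileLike = { type: string; size: number };

// Returns null when the file is acceptable, otherwise a user-facing message.
function validateImageFile(file: FileLike): string | null {
  if (!ACCEPTED_TYPES.some((accepted) => accepted === file.type)) {
    return "Only PNG and JPEG images are supported";
  }
  if (file.size > MAX_SIZE_BYTES) {
    return "File is larger than 10MB";
  }
  return null;
}
```

One way to use it would be to call the function on rejected files and surface the message next to the drop area; whether that belongs here or in the parent page is a design choice this sketch does not settle.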
components/PdfViewer.tsx DELETED
@@ -1,77 +0,0 @@
1
- "use client";
2
-
3
- import { useCallback, useState } from "react";
4
- import { pdfjs, Document, Page } from "react-pdf";
5
- import "react-pdf/dist/esm/Page/AnnotationLayer.css";
6
- import "react-pdf/dist/esm/Page/TextLayer.css";
7
- import { useResizeObserver } from "@wojtekmaj/react-hooks";
8
-
9
- import type { PDFDocumentProxy } from "pdfjs-dist";
10
- import {
11
- Sheet,
12
- SheetContent,
13
- SheetHeader,
14
- SheetTitle,
15
- SheetTrigger,
16
- } from "./ui/sheet";
17
-
18
- pdfjs.GlobalWorkerOptions.workerSrc = new URL(
19
- "pdfjs-dist/build/pdf.worker.min.mjs",
20
- import.meta.url
21
- ).toString();
22
-
23
- const options = {
24
- cMapUrl: "/cmaps/",
25
- standardFontDataUrl: "/standard_fonts/",
26
- };
27
-
28
- export default function PdfViewer({ file }: { file: File }) {
29
- const [numPages, setNumPages] = useState<number>();
30
- const [containerRef, setContainerRef] = useState<HTMLElement | null>(null);
31
- const [containerWidth, setContainerWidth] = useState<number>();
32
-
33
- // Add resize observer
34
- const onResize = useCallback<ResizeObserverCallback>((entries) => {
35
- const [entry] = entries;
36
- if (entry) {
37
- setContainerWidth(entry.contentRect.width);
38
- }
39
- }, []);
40
-
41
- useResizeObserver(containerRef, {}, onResize);
42
-
43
- async function onDocumentLoadSuccess(page: PDFDocumentProxy): Promise<void> {
44
- setNumPages(page._pdfInfo.numPages);
45
- }
46
-
47
- return (
48
- <Sheet>
49
- <SheetTrigger className="h-10 rounded-lg px-4 py-2 border-input bg-background border-2 hover:bg-accent hover:text-accent-foreground">
50
- Preview
51
- </SheetTrigger>
52
- <SheetContent side="bottom">
53
- <SheetHeader>
54
- <SheetTitle>{file.name}</SheetTitle>
55
- </SheetHeader>
56
- <div
57
- ref={setContainerRef}
58
- className="max-w-2xl mx-auto mt-2 max-h-[calc(100vh-10rem)] overflow-y-auto"
59
- >
60
- <Document
61
- file={file}
62
- onLoadSuccess={onDocumentLoadSuccess}
63
- options={options}
64
- >
65
- {Array.from(new Array(numPages), (_el, index) => (
66
- <Page
67
- key={`page_${index + 1}`}
68
- pageNumber={index + 1}
69
- width={containerWidth}
70
- />
71
- ))}
72
- </Document>
73
- </div>
74
- </SheetContent>
75
- </Sheet>
76
- );
77
- }
components/ResultDisplay.tsx DELETED
@@ -1,78 +0,0 @@
1
- "use client";
2
-
3
- import { Button } from "@/components/ui/button";
4
- import { Braces, Copy, RotateCcw } from "lucide-react";
5
- import { useState } from "react";
6
- import {
7
- Popover,
8
- PopoverContent,
9
- PopoverTrigger,
10
- } from "@/components/ui/popover";
11
- interface ResultDisplayProps {
12
- result: string;
13
- schema: string;
14
- onReset: () => void;
15
- }
16
-
17
- export function ResultDisplay({ result, schema, onReset }: ResultDisplayProps) {
18
- const [copied, setCopied] = useState(false);
19
- const [schemaCopied, setSchemaCopied] = useState(false);
20
-
21
- const handleCopy = () => {
22
- navigator.clipboard.writeText(JSON.stringify(result, null, 2));
23
- setCopied(true);
24
- setTimeout(() => setCopied(false), 2000);
25
- };
26
- const handleSchemaCopy = () => {
27
- navigator.clipboard.writeText(JSON.stringify(schema, null, 2));
28
- setSchemaCopied(true);
29
- setTimeout(() => setSchemaCopied(false), 2000);
30
- };
31
-
32
- return (
33
- <div className="space-y-4">
34
- <div className="flex items-center justify-between">
35
- <h2 className="text-xl font-semibold">Extracted Data</h2>
36
- <div className="space-x-2">
37
- <Popover>
38
- <PopoverTrigger>
39
- <Button variant="outline" size="sm">
40
- <Braces className="w-4 h-4 mr-2" />
41
- Schema
42
- </Button>
43
- </PopoverTrigger>
44
- <PopoverContent className="max-h-[500px] max-w-[700px] w-full overflow-y-auto">
45
- <div className="relative p-4 rounded-lg bg-muted">
46
- <Button
47
- variant="secondary"
48
- size="sm"
49
- onClick={handleSchemaCopy}
50
- className="absolute top-2 right-2"
51
- >
52
- <Copy className="w-4 h-4 mr-2" />
53
- {schemaCopied ? "Copied!" : "Copy"}
54
- </Button>
55
- <pre className="overflow-auto">
56
- <code className="text-xs">
57
- {JSON.stringify(schema, null, 2)}
58
- </code>
59
- </pre>
60
- </div>
61
- </PopoverContent>
62
- </Popover>
63
- <Button variant="outline" size="sm" onClick={handleCopy}>
64
- <Copy className="w-4 h-4 mr-2" />
65
- {copied ? "Copied!" : "Copy"}
66
- </Button>
67
- <Button variant="outline" size="sm" onClick={onReset}>
68
- <RotateCcw className="w-4 h-4 mr-2" />
69
- Process Another PDF
70
- </Button>
71
- </div>
72
- </div>
73
- <pre className="p-4 rounded-lg bg-muted overflow-auto">
74
- <code className="text-sm">{JSON.stringify(result, null, 2)}</code>
75
- </pre>
76
- </div>
77
- );
78
- }
lib/types.ts ADDED
@@ -0,0 +1,21 @@
1
+ // Define the interface for conversation history items
2
+ export interface HistoryItem {
3
+ // Role can be either "user" or "model"
4
+ role: "user" | "model";
5
+ // Parts can contain text and/or images
6
+ parts: HistoryPart[];
7
+ }
8
+
9
+ // Define the interface for history parts
10
+ export interface HistoryPart {
11
+ // Text content (optional)
12
+ text?: string;
13
+ // Image content as data URL (optional)
14
+ // Format: data:image/png;base64,... or data:image/jpeg;base64,...
15
+ image?: string;
16
+ }
17
+
18
+ // Note: When sending to the Gemini API:
19
+ // 1. User messages can contain both text and images (as inlineData)
20
+ // 2. Model messages should only contain text parts
21
+ // 3. Images in history are stored as data URLs in our app, but converted to base64 for the API
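The note at the end of `lib/types.ts` describes a conversion step that this part of the diff does not show. Below is a minimal sketch of what such a conversion could look like, assuming an API-side part is either `{ text }` or `{ inlineData: { mimeType, data } }`. `ApiPart`, `ApiContent`, `parseDataUrl`, and `toApiContents` are hypothetical names, not taken from the commit or from any SDK.

```typescript
// Types mirror lib/types.ts from this commit.
interface HistoryPart {
  text?: string;
  image?: string;
}
interface HistoryItem {
  role: "user" | "model";
  parts: HistoryPart[];
}

// Assumed API-side shapes (illustrative only).
type ApiPart =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };
interface ApiContent {
  role: "user" | "model";
  parts: ApiPart[];
}

// Splits "data:image/png;base64,AAAA" into its MIME type and raw base64 payload.
// Returns null for anything that is not a base64 data URL.
function parseDataUrl(dataUrl: string): { mimeType: string; data: string } | null {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
  if (!match) return null;
  return { mimeType: match[1], data: match[2] };
}

// Converts app-side history into API-shaped contents:
// user images become inlineData, model images are dropped (note item 2),
// and messages left with no parts are skipped.
function toApiContents(history: HistoryItem[]): ApiContent[] {
  return history
    .map((item): ApiContent => {
      const parts: ApiPart[] = [];
      for (const part of item.parts) {
        if (part.text) parts.push({ text: part.text });
        if (part.image && item.role === "user") {
          const parsed = parseDataUrl(part.image);
          if (parsed) parts.push({ inlineData: parsed });
        }
      }
      return { role: item.role, parts };
    })
    .filter((content) => content.parts.length > 0);
}
```

Dropping empty messages is a choice made for this sketch; a model turn that returned only an image would otherwise be sent with an empty `parts` array.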
next.config.ts CHANGED
@@ -11,4 +11,4 @@ const nextConfig: NextConfig = {
11
  },
12
  };
13
 
14
- export default nextConfig;
 
11
  },
12
  };
13
 
14
+ export default nextConfig;
package-lock.json CHANGED
@@ -20,7 +20,7 @@
20
  "next-themes": "^0.4.4",
21
  "react": "^19.0.0",
22
  "react-dom": "^19.0.0",
23
- "react-dropzone": "^14.3.5",
24
  "react-pdf": "^9.2.1",
25
  "tailwind-merge": "^3.0.1",
26
  "tailwindcss-animate": "^1.0.7"
@@ -6255,9 +6255,9 @@
6255
  }
6256
  },
6257
  "node_modules/react-dropzone": {
6258
- "version": "14.3.5",
6259
- "resolved": "https://registry.npmjs.org/react-dropzone/-/react-dropzone-14.3.5.tgz",
6260
- "integrity": "sha512-9nDUaEEpqZLOz5v5SUcFA0CjM4vq8YbqO0WRls+EYT7+DvxUdzDPKNCPLqGfj3YL9MsniCLCD4RFA6M95V6KMQ==",
6261
  "license": "MIT",
6262
  "dependencies": {
6263
  "attr-accept": "^2.2.4",
 
20
  "next-themes": "^0.4.4",
21
  "react": "^19.0.0",
22
  "react-dom": "^19.0.0",
23
+ "react-dropzone": "^14.3.8",
24
  "react-pdf": "^9.2.1",
25
  "tailwind-merge": "^3.0.1",
26
  "tailwindcss-animate": "^1.0.7"
 
6255
  }
6256
  },
6257
  "node_modules/react-dropzone": {
6258
+ "version": "14.3.8",
6259
+ "resolved": "https://registry.npmjs.org/react-dropzone/-/react-dropzone-14.3.8.tgz",
6260
+ "integrity": "sha512-sBgODnq+lcA4P296DY4wacOZz3JFpD99fp+hb//iBO2HHnyeZU3FwWyXJ6salNpqQdsZrgMrotuko/BdJMV8Ug==",
6261
  "license": "MIT",
6262
  "dependencies": {
6263
  "attr-accept": "^2.2.4",
package.json CHANGED
@@ -21,8 +21,7 @@
21
  "next-themes": "^0.4.4",
22
  "react": "^19.0.0",
23
  "react-dom": "^19.0.0",
24
- "react-dropzone": "^14.3.5",
25
- "react-pdf": "^9.2.1",
26
  "tailwind-merge": "^3.0.1",
27
  "tailwindcss-animate": "^1.0.7"
28
  },
 
21
  "next-themes": "^0.4.4",
22
  "react": "^19.0.0",
23
  "react-dom": "^19.0.0",
24
+ "react-dropzone": "^14.3.8",
 
25
  "tailwind-merge": "^3.0.1",
26
  "tailwindcss-animate": "^1.0.7"
27
  },
tsconfig.json CHANGED
@@ -4,7 +4,7 @@
4
  "lib": ["dom", "dom.iterable", "esnext"],
5
  "allowJs": true,
6
  "skipLibCheck": true,
7
- "strict": true,
8
  "noEmit": true,
9
  "esModuleInterop": true,
10
  "module": "esnext",
 
4
  "lib": ["dom", "dom.iterable", "esnext"],
5
  "allowJs": true,
6
  "skipLibCheck": true,
7
+ "strict": false,
8
  "noEmit": true,
9
  "esModuleInterop": true,
10
  "module": "esnext",